
Unnecessary variants for single address #13

Closed
ghost opened this issue Feb 28, 2016 · 2 comments

Comments

@ghost

ghost commented Feb 28, 2016

grep -a "愛知県名古屋市南区豊田町" mecab-user-dict-seed.20160225.csv

名古屋市豊田町,1293,1293,-5820,名詞,固有名詞,地域,一般,,,愛知県名古屋市南区豊田町,ナゴヤシトヨダチョウ,ナゴヤシトヨダチョー
愛知県南区豊田町,1293,1293,-1981,名詞,固有名詞,地域,一般,,,愛知県名古屋市南区豊田町,アイチケンミナミクトヨダチョウ,アイチケンミナミクトヨダチョー
愛知県名古屋市南区豊田町,1293,1293,-19354,名詞,固有名詞,地域,一般,,,愛知県名古屋市南区豊田町,アイチケンナゴヤシミナミクトヨダチョウ,アイチケンナゴヤシミナミクトヨダチョー
愛知県名古屋市豊田町,1293,1293,-18608,名詞,固有名詞,地域,一般,,,愛知県名古屋市南区豊田町,アイチケンナゴヤシトヨダチョウ,アイチケンナゴヤシトヨダチョー

I don't think we need "名古屋市豊田町", "愛知県南区豊田町", or "愛知県名古屋市豊田町". Exact-phrase Google searches for them return few or no results:
https://www.google.co.jp/search?q="名古屋市豊田町"
4 results
https://www.google.co.jp/search?q="愛知県南区豊田町"
0 results
https://www.google.co.jp/search?q="愛知県名古屋市豊田町"
0 results
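
All four rows in the grep output share one base form (the 11th field) and differ only in surface form and cost. The Python below is a rough, hypothetical sketch, not part of the neologd tooling. It groups user-dictionary rows by base form so that variant sets like this one can be listed. The field positions assume the 13-field IPADIC-style layout shown above; the function name `variants_by_base_form` is illustrative.

```python
import csv
from collections import defaultdict

# The four rows from the grep output, in the IPADIC-style user-dictionary layout:
# surface, left-id, right-id, cost, 4 POS fields, conjugation type,
# conjugation form, base form, reading, pronunciation (13 fields).
LINES = [
    "名古屋市豊田町,1293,1293,-5820,名詞,固有名詞,地域,一般,,,愛知県名古屋市南区豊田町,ナゴヤシトヨダチョウ,ナゴヤシトヨダチョー",
    "愛知県南区豊田町,1293,1293,-1981,名詞,固有名詞,地域,一般,,,愛知県名古屋市南区豊田町,アイチケンミナミクトヨダチョウ,アイチケンミナミクトヨダチョー",
    "愛知県名古屋市南区豊田町,1293,1293,-19354,名詞,固有名詞,地域,一般,,,愛知県名古屋市南区豊田町,アイチケンナゴヤシミナミクトヨダチョウ,アイチケンナゴヤシミナミクトヨダチョー",
    "愛知県名古屋市豊田町,1293,1293,-18608,名詞,固有名詞,地域,一般,,,愛知県名古屋市南区豊田町,アイチケンナゴヤシトヨダチョウ,アイチケンナゴヤシトヨダチョー",
]


def variants_by_base_form(lines):
    """Map each base form (field index 10) to its surface forms, lowest cost first."""
    groups = defaultdict(list)
    for row in csv.reader(lines):
        surface, cost, base = row[0], int(row[3]), row[10]
        groups[base].append((cost, surface))
    # In MeCab a lower word cost makes a token more likely to be chosen,
    # so sorting by cost lists the most preferred surface form first.
    return {base: [s for _, s in sorted(pairs)] for base, pairs in groups.items()}


if __name__ == "__main__":
    for base, surfaces in variants_by_base_form(LINES).items():
        print(base, surfaces)
```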

@neologd
Owner

neologd commented Feb 28, 2016

Thank you for your comment.

The Google Custom Search API costs $5 per 1,000 queries, so we can't filter entries by search hit counts.

We also think these entries are important for aggregating address strings written in forms we cannot predict, so we need these variants.

In the future, we will provide an option for people who want to save RAM, and your suggestion will be reflected at that time.

neologd closed this as completed Feb 28, 2016
@ghost
Author

ghost commented Feb 29, 2016

I thought you generated the entries automatically from
Japan Post's "KEN_ALL.CSV".

For example:
23112,"457 ","4570855","アイチケン","ナゴヤシミナミク","トヨダチョウ","愛知県","名古屋市南区","豊田町",0,0,0,0,0,0

Ken + Chou ("愛知県" + "豊田町") is unnecessary in most cases,
so I thought you could remove the code that generates "Ken + Chou".
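
To make the suggestion concrete, here is a hypothetical Python sketch that derives the four surface forms seen in the grep output from the KEN_ALL.CSV row quoted above. It is not neologd's actual generator, which this thread does not show. It assumes a ward (区) always directly follows a city name ending in 市. The helper `split_city` and the flag `drop_partial_pref` are illustrative names; the flag shows what skipping the prefecture-prefixed partial forms would look like.

```python
import csv
import io
import re

# The KEN_ALL.CSV row quoted in this issue. Columns 6-8 hold the
# prefecture, city (including ward) and town names in kanji.
ROW = '23112,"457 ","4570855","アイチケン","ナゴヤシミナミク","トヨダチョウ","愛知県","名古屋市南区","豊田町",0,0,0,0,0,0'


def split_city(city: str) -> tuple[str, str]:
    """Split '名古屋市南区' into ('名古屋市', '南区').

    Assumption (hypothetical): a ward ending in 区 follows a city ending in 市.
    Returns (city, "") when the field has no ward.
    """
    m = re.fullmatch(r"(.+?市)(.+?区)", city)
    return (m.group(1), m.group(2)) if m else (city, "")


def address_variants(pref: str, city: str, town: str,
                     drop_partial_pref: bool = False) -> list[str]:
    """Build surface-form variants for one address (hypothetical scheme)."""
    city_name, ward = split_city(city)
    variants = [pref + city_name + ward + town]        # 愛知県名古屋市南区豊田町
    if ward:
        variants.append(city_name + town)              # 名古屋市豊田町
        if not drop_partial_pref:
            # Prefecture attached to a partial tail of the address.
            variants.append(pref + ward + town)        # 愛知県南区豊田町
            variants.append(pref + city_name + town)   # 愛知県名古屋市豊田町
    return variants


if __name__ == "__main__":
    fields = next(csv.reader(io.StringIO(ROW)))
    pref, city, town = fields[6], fields[7], fields[8]
    print(address_variants(pref, city, town))
    print(address_variants(pref, city, town, drop_partial_pref=True))
```

With the default flag this yields all four forms from the grep output; with `drop_partial_pref=True` only the full address and "名古屋市豊田町" remain.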

"we think these entries are important to aggregate an unknown address string."

Hmm.
In most cases we don't see "Ken + Chou" forms on the internet,
and I think splitting them into "Ken|Chou" is enough.
Anyway, this is not a bug, just my opinion.

Thank you for the reply.
