Missing cities and countries #20

Closed
pythobot opened this issue May 16, 2020 · 5 comments
Labels
enhancement New feature or request

@pythobot

Hello,

I have started trying out this library, but it seems to be missing cities and countries, mostly from South America. What's the best way to update the cities.json and countries.json files? Is it OK to just add the data there manually?

Also, how can this library map Shanghai to China? Where is that relation mapped, and why does it not behave the same way for Caracas?

>>> geotext.extract(input_text="Living in Caracas", span_info=True)
{'cities': {'Caracas': {'count': 1, 'span_info': [(10, 17)]}}, 'countries': {}}

Thanks in advance!

@iwpnd
Owner

iwpnd commented May 16, 2020

Hi, thanks for using the library. :)

So, the demo data is super incomplete and only there to show the capabilities. You can, however, simply bring your own data, as shown.
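As a rough sketch of what bringing your own data could look like: assuming the lookup files map each canonical name to a list of synonyms (an assumption; check the demo files), missing cities can be kept in a separate mapping and merged in before the data is handed to the library. The entries and the `merge_lookup` helper are hypothetical and not part of flashgeotext's API.

```python
import json
from typing import Dict, List

Lookup = Dict[str, List[str]]

# Hypothetical stand-in for the contents of a cities.json file, assuming each
# canonical name maps to a list of synonyms that should resolve to it.
demo_cities: Lookup = json.loads('{"London": ["London"], "Shanghai": ["Shanghai"]}')

# Hypothetical additions for cities missing from the demo data.
extra_cities: Lookup = {
    "Caracas": ["Caracas"],
    "Bogotá": ["Bogotá", "Bogota"],
}


def merge_lookup(base: Lookup, extra: Lookup) -> Lookup:
    """Return a new lookup with the entries of `extra` added to `base`.

    Synonyms for a name that already exists are appended without duplicates;
    neither input is modified.
    """
    merged = {name: list(synonyms) for name, synonyms in base.items()}
    for name, synonyms in extra.items():
        target = merged.setdefault(name, [])
        for synonym in synonyms:
            if synonym not in target:
                target.append(synonym)
    return merged


cities = merge_lookup(demo_cities, extra_cities)
print(sorted(cities))  # ['Bogotá', 'Caracas', 'London', 'Shanghai']
# ensure_ascii=False keeps accented names such as "Bogotá" readable when written out.
print(json.dumps(cities, ensure_ascii=False, indent=2))
```

Keeping your additions in their own mapping makes it easy to tell them apart from the demo data when either one changes.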

> Also, how can this library map Shanghai to China? Where is that relation mapped, and why does it not behave the same way for Caracas?

In the example, China is found only because it is mentioned explicitly. Relations between cities and countries are not yet part of the library.
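Since relations are not modelled, one way to get Shanghai-to-China behaviour for any city is to post-process the result with your own city-to-country table. The table and the `infer_countries` helper below are hypothetical; they only assume the result shape shown in the Caracas example above.

```python
from typing import Dict

# Hypothetical city -> country table; flashgeotext itself does not ship one.
CITY_TO_COUNTRY: Dict[str, str] = {
    "Shanghai": "China",
    "Caracas": "Venezuela",
    "London": "United Kingdom",
}


def infer_countries(result: dict) -> dict:
    """Return a new result whose 'countries' also include those implied by found cities.

    Assumes the shape {'cities': {name: {...}}, 'countries': {name: {...}}}.
    Countries that were never mentioned get count 0; every country reached via
    a city records that city under 'inferred_from'. The input is not modified.
    """
    countries = {name: dict(info) for name, info in result.get("countries", {}).items()}
    for city in result.get("cities", {}):
        country = CITY_TO_COUNTRY.get(city)
        if country is None:
            continue  # no relation known for this city
        entry = countries.setdefault(country, {"count": 0})
        entry.setdefault("inferred_from", []).append(city)
    return {"cities": result.get("cities", {}), "countries": countries}


caracas_result = {"cities": {"Caracas": {"count": 1, "span_info": [(10, 17)]}}, "countries": {}}
print(infer_countries(caracas_result)["countries"])
# {'Venezuela': {'count': 0, 'inferred_from': ['Caracas']}}
```

Leaving inferred countries at count 0 keeps `count` meaning actual mentions in the text.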

@nfrik

nfrik commented Jun 12, 2020

I agree. Unfortunately, the library doesn't recognize some countries and cities when they are typed in lower case:

geotext.extract(input_text='london is cool tashkent is not bad, but ukraine is not a city')

returns:

{'cities': {'London': {'count': 1, 'span_info': [(0, 6)]}}, 'countries': {}}

@iwpnd
Owner

iwpnd commented Jun 12, 2020

Hi @nfrik and thanks for your interest.

flashgeotext is case-sensitive for two reasons:

  1. Cities and countries are named entities and are therefore capitalized more often than not in sources such as newspaper articles, Wikipedia articles, academic publications, etc.
  2. Deliberately restricting it to a case-sensitive lookup reduces the number of false positives you get when you also want to consider a bunch of synonyms for a city.
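To illustrate the second point, here is a small sketch using plain regular expressions rather than flashgeotext's own lookup. The lookup is hypothetical: it contains Nice and Reading, real city names that are also ordinary English words, so case-insensitive matching turns everyday words into city hits.

```python
import re

# Hypothetical lookup: canonical name -> synonyms.
CITIES = {"Nice": ["Nice"], "Reading": ["Reading"], "London": ["London"]}


def count_hits(text: str, ignore_case: bool) -> int:
    """Count whole-word occurrences of every synonym in `text`."""
    flags = re.IGNORECASE if ignore_case else 0
    total = 0
    for synonyms in CITIES.values():
        for synonym in synonyms:
            # \b stops "Nice" from matching inside longer words such as "Nicest"
            total += len(re.findall(rf"\b{re.escape(synonym)}\b", text, flags))
    return total


text = "I was reading a nice article about London."
print(count_hits(text, ignore_case=False))  # 1  (only London)
print(count_hits(text, ignore_case=True))   # 3  (reading, nice, London)
```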

What I do agree with you on is giving the user at least the option to turn off case-sensitive lookup. This is a quick fix. I'm happy to receive a PR; otherwise I'll work on it when I find the time. :)
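Such an option could look roughly like the following sketch, again built on the standard `re` module rather than the library's internals. The `extract` function, its `ignore_case` parameter and the tiny lookup are hypothetical; the result only mimics the `cities`/`countries`/`count`/`span_info` shape seen earlier in this thread, and matches are still reported under their canonical spelling.

```python
import re
from typing import Dict, List

# Hypothetical lookup: category -> {canonical name: [synonyms]}.
LOOKUP: Dict[str, Dict[str, List[str]]] = {
    "cities": {"London": ["London"], "Tashkent": ["Tashkent"]},
    "countries": {"Ukraine": ["Ukraine"]},
}


def extract(text: str, ignore_case: bool = False) -> dict:
    """Find every synonym in `text`, grouped by category and canonical name.

    Each hit is reported with its count and sorted (start, end) spans; with
    ignore_case=True capitalization is ignored but the canonical name is kept.
    """
    flags = re.IGNORECASE if ignore_case else 0
    result: dict = {}
    for category, names in LOOKUP.items():
        found: dict = {}
        for canonical, synonyms in names.items():
            spans = []
            for synonym in synonyms:
                pattern = rf"\b{re.escape(synonym)}\b"
                spans.extend(match.span() for match in re.finditer(pattern, text, flags))
            if spans:
                found[canonical] = {"count": len(spans), "span_info": sorted(spans)}
        result[category] = found
    return result


tweet = "just landed in tashkent, flying on to london and then ukraine"
print(extract(tweet))                    # {'cities': {}, 'countries': {}}
print(extract(tweet, ignore_case=True))  # finds Tashkent (15, 23), London (38, 44), Ukraine (54, 61)
```

Defaulting `ignore_case` to `False` would keep the current case-sensitive behaviour, with its lower false-positive rate, for existing users.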

@Za-Re

Za-Re commented Jun 30, 2020

Hi @iwpnd

Adding an option to turn off case sensitivity would be great. You're right about named entities in newspapers and articles. However, another potential scenario is extracting country names from social media posts and comments, e.g. tweets in my case. They don't follow capitalization strictly, so this option would be really helpful there.

@iwpnd added the enhancement (New feature or request) label on Jul 2, 2020
@iwpnd self-assigned this on Jul 2, 2020
@iwpnd
Owner

iwpnd commented Feb 28, 2021

I know this is super old, but I finally addressed it in v0.3.0, pending the pip release.

@iwpnd closed this as completed on Feb 28, 2021

4 participants