Advanced Admin area scoring - taking population and popularity into account [PL-CG12] #46

Closed
dianashk opened this issue Mar 3, 2015 · 1 comment

dianashk commented Mar 3, 2015

Consider various possible scoring systems:

  • Score admin areas by population
  • Score admin areas by search frequency/popularity

A lot of work has been done around scoring, and it's crucial that we use population and popularity information correctly: this helps with coarse geocoding when no geobias is supplied.

related issues:

@dianashk dianashk added the story label Mar 3, 2015
@hkrishna hkrishna self-assigned this Mar 4, 2015
@hkrishna hkrishna changed the title from "Admin area scoring [PL-CG12]" to "Advanced Admin area scoring - taking population and popularity into account [PL-CG12]" Mar 5, 2015

hkrishna commented Apr 1, 2015

Things that we attempted (highly experimental):

  1. population: all records were getting population data from Quattroshapes (through pelias-admin-lookup). So a point in New York, NY would have New York's population score.
  2. popularity: all records were getting popularity data from Quattroshapes (through pelias-admin-lookup). So a point in Chelsea, NY would have Chelsea's popularity.
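
The attempted approach can be sketched roughly as follows. This is a minimal illustration, not the pelias-admin-lookup code: the `AdminArea` and `Record` types, the `inherit_admin_scores` function, and the numeric values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdminArea:
    """Hypothetical stand-in for a Quattroshapes admin polygon."""
    name: str
    population: int
    popularity: int


@dataclass
class Record:
    """Hypothetical stand-in for an indexed point."""
    name: str
    admin: AdminArea
    population: Optional[int] = None
    popularity: Optional[int] = None


def inherit_admin_scores(record: Record) -> Record:
    """Copy the enclosing admin area's scores onto the point."""
    record.population = record.admin.population
    record.popularity = record.admin.popularity
    return record


# Placeholder numbers, chosen only for illustration.
new_york = AdminArea(name="New York, NY", population=1000, popularity=50)
cafe = inherit_admin_scores(Record(name="Some cafe", admin=new_york))
```

Under this scheme any point, however minor, ends up carrying the full score of the area that contains it.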

This obviously didn't pan out the way we had hoped. It worked well in a few cases, but it also broke a few cases that had been working.

Examples where it worked:

  • big ben with no geobias would return Big Ben, Whitehall, Greater London
  • wall street with no geobias would return Wall Street, Manhattan, NY as the top result
  • statue of liberty would return Statue of Liberty, Manhattan, NY

Examples where it didn't work (regressions):

  • paris with no geobias would return Paris, Brazil instead of Paris, France because Brazil's population was higher than France's.
  • paris with no geobias, and without taking the population score into account, would return Paris, New York instead of Paris, France because Paris, New York inherited New York's high popularity score.
  • There were 40 other failing test cases.
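
A small sketch of why these regressions appear when candidates are ranked purely by an inherited score. The `rank_by` helper is hypothetical, and the numbers are made-up relative values chosen only to reproduce the ordering described above; they are not real population or popularity figures.

```python
def rank_by(candidates, field):
    """Sort candidates by the given score field, highest first."""
    return sorted(candidates, key=lambda c: c[field], reverse=True)


# Each candidate carries scores inherited from its enclosing admin area.
# Values are illustrative placeholders, not real data.
candidates = [
    {"label": "Paris, France", "population": 2, "popularity": 5},
    {"label": "Paris, Brazil", "population": 3, "popularity": 1},
    {"label": "Paris, New York", "population": 1, "popularity": 9},
]

top_by_population = rank_by(candidates, "population")[0]["label"]
top_by_popularity = rank_by(candidates, "popularity")[0]["label"]

print(top_by_population)
print(top_by_popularity)
```

In both rankings the winner is decided by the score of the surrounding area rather than by anything about the place itself, which is the failure mode seen in the tests.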

Because of the overwhelming evidence of regression, we've decided to close this issue and revert our changes to admin_lookup. admin_lookup should no longer aggregate population/popularity data for a region and assign that value to every point in the region. Doing so clearly results in cases like Paris, NY (which could be a cafe) trumping Paris, France, at least until we have category ranking in place.

In conclusion:

  1. Popularity values in Quattroshapes are unreliable and inconsistent.
  2. Popularity and population cannot be aggregated to all records contained in an admin area (for example, a point in SoHo shouldn't get the population or popularity of SoHo).
  3. It may be useful to add a field called admin_popularity that would contain an aggregated popularity value from the various admin areas that the point belongs to.
  4. Using popularity from Quattroshapes for just the centroids of shapes could help boost popular neighborhoods higher than others. See "Use population and popularity values" (pelias-deprecated/quattroshapes#20).
  5. Use featureClass/featureCode to make sure admin boundaries (town, city, state, country, etc.) from geonames have the right admin names from admin_lookup. Further, we should explore featureClass and featureCode more and use them in conjunction with the category work that @PeliasPete is doing. See "map FeatureClass and FeatureCode to the pelias taxonomy" (geonames#20).
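
One way conclusion 3 might look, as a sketch under stated assumptions: the point keeps its own popularity untouched, and the aggregate from its admin areas goes into a separate `admin_popularity` field. The `Admin` and `Point` types and the `with_admin_popularity` function are hypothetical, and summing is only an assumed aggregation; the right formula is not settled here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Admin:
    """Hypothetical admin area (neighborhood, city, and so on)."""
    name: str
    popularity: int


@dataclass
class Point:
    """Hypothetical indexed point with its own and its admins' popularity."""
    name: str
    admins: List[Admin] = field(default_factory=list)
    popularity: int = 0        # the point's own score, never overwritten
    admin_popularity: int = 0  # aggregate from enclosing admin areas


def with_admin_popularity(point: Point) -> Point:
    """Fill admin_popularity by summing (an assumption) the admins' scores."""
    point.admin_popularity = sum(a.popularity for a in point.admins)
    return point


# Placeholder values for illustration only.
soho_point = with_admin_popularity(
    Point(name="Some shop", admins=[Admin("SoHo", 3), Admin("New York", 4)])
)
```

Keeping the two fields separate lets a ranking function weigh the admin signal independently instead of treating it as the point's own popularity.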
