Advanced Admin area scoring - taking population and popularity into account [PL-CG12] #46

dianashk · 2015-03-03T20:27:44Z

Consider various possible scoring systems:

Score admin areas by population
Score admin areas by search frequency/popularity

A lot of work is being done around scoring, and it's crucial that we use population and popularity information correctly. This helps with coarse geocoding when there is no geobias.

related issues:

Distant Administrative area names should not appear in results unless they represent highly populated metropolitan areas [PL-CG04] #45
provide a popularity/population score pelias-deprecated/admin-lookup#5
Scoring based on the admin area it belongs to [PL-AG05] #51 takes the popularity of the admin area a record belongs to into account. This issue aims to take the population of the admin area into account as well.

hkrishna · 2015-04-01T17:07:00Z

Things that we attempted (highly experimental)

population: all records were getting population data from Quattroshapes (through pelias-admin-lookup). So a point in New York, NY would have New York's population score.
popularity: all records were getting popularity data from Quattroshapes (through pelias-admin-lookup). So a point in Chelsea, NY would have Chelsea's popularity.
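
To make the attempted approach concrete, here is a minimal Python sketch of a record inheriting scores from its enclosing admin area. This is not the actual pelias-admin-lookup code: the `AdminArea` and `Record` types, the `inherit_admin_scores` function, and all numeric values are hypothetical and only illustrate the idea described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdminArea:
    name: str
    population: int   # placeholder value, not real Quattroshapes data
    popularity: int   # placeholder value, not real Quattroshapes data


@dataclass
class Record:
    name: str
    admin: AdminArea
    population: Optional[int] = None
    popularity: Optional[int] = None


def inherit_admin_scores(record: Record) -> Record:
    """Copy the enclosing admin area's scores onto the record (hypothetical helper)."""
    record.population = record.admin.population
    record.popularity = record.admin.popularity
    return record


new_york = AdminArea("New York, NY", population=8_000_000, popularity=100)
point = inherit_admin_scores(Record("Some point in New York", admin=new_york))
print(point.population, point.popularity)
```

Every point inside the area ends up with the same scores as the area itself, regardless of what the point actually is.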

This obviously didn't pan out the way we had hoped. It worked well in a few cases, but it also broke cases that had been working.

Examples where it worked:

big ben with no geobias would return Big Ben, Whitehall, Greater London
wall street without any geobias would return Wall Street, Manhattan, NY as the top result.
Statue of liberty returned Statue of Liberty, Manhattan, NY

Examples where it didn't work (regressions):

paris with no geobias would return Paris, Brazil instead of Paris, France because Brazil's population was higher than France's.
paris with no geobias and not taking population score into account would return Paris, New York and not Paris, France because Paris, New York inherited New York's high popularity score.
There were 40 other failing test cases.
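
The Paris regression can be reproduced with a small Python sketch. It assumes results are sorted purely by an inherited population value; the `rank_by_population` function, the field names, and the rough population figures are illustrative assumptions, not the real scoring code or real index data.

```python
# Each candidate carries the population inherited from its admin hierarchy.
# Figures are rough, illustrative country-level numbers.
candidates = [
    {"label": "Paris, France", "inherited_population": 66_000_000},
    {"label": "Paris, Brazil", "inherited_population": 200_000_000},
]


def rank_by_population(results):
    """Sort results so the highest inherited population comes first (hypothetical)."""
    return sorted(results, key=lambda r: r["inherited_population"], reverse=True)


top = rank_by_population(candidates)[0]
print(top["label"])  # prints "Paris, Brazil"
```

The inherited value says nothing about the place named Paris itself, so the larger country wins.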

Because of the overwhelming evidence of regression, we've decided to close this issue and revert our changes to admin_lookup. admin_lookup should no longer aggregate population/popularity data for a region and assign that value to every point in the region. Doing so clearly results in cases like Paris, NY (which could be a cafe) trumping Paris, France, at least until we have category ranking in place.

In conclusion:

Popularity values in Quattroshapes are unreliable and inconsistent.
Popularity and population cannot be propagated to all records contained in an admin area (for example, a point in SoHo shouldn't get the population or popularity of SoHo).
It may be useful to add a field called admin_popularity that would contain an aggregated popularity value from the various admin areas that the point belongs to.
Using popularity from Quattroshapes for just the centroids of shapes could help boost popular neighborhoods above others. See: Use population and popularity values pelias-deprecated/quattroshapes#20
Use featureClass/featureCode to make sure admin boundaries (town, city, state, country, etc.) from geonames have the right admin names from admin_lookup. Further, we should explore featureClass and featureCode more and use them in conjunction with the category work that @PeliasPete is doing. See: map FeatureClass and FeatureCode to the pelias taxonomy geonames#20
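
As a rough sketch of the admin_popularity and centroid-only ideas from the conclusions, the Python below keeps a record's own popularity separate from the popularity of its enclosing admin areas. The `build_record` function, the `is_centroid` flag, the use of a plain sum as the aggregation, and all values are assumptions made for illustration; nothing here was specified or implemented in this issue.

```python
def build_record(name, admin_popularities, is_centroid, own_popularity=0):
    """Build a record whose own popularity is separate from its admins' (hypothetical)."""
    return {
        "name": name,
        # Only the centroid of a shape keeps a popularity score of its own.
        "popularity": own_popularity if is_centroid else 0,
        # Aggregated popularity of enclosing admin areas; summing is an assumption.
        "admin_popularity": sum(admin_popularities),
    }


paris_fr = build_record("Paris, France", [50], is_centroid=True, own_popularity=90)
cafe = build_record("Paris Cafe, New York", [100, 80], is_centroid=False)

# Ranking on the record's own popularity no longer lets the cafe borrow New York's score.
ranked = sorted([cafe, paris_fr], key=lambda r: r["popularity"], reverse=True)
print(ranked[0]["name"])  # prints "Paris, France"
```

The aggregated value is still available in admin_popularity for any later scoring stage that wants it, but it no longer overrides the record's own score.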

dianashk added the story label Mar 3, 2015

dianashk mentioned this issue Mar 3, 2015

Scoring based on the admin area it belongs to [PL-AG05] #51

Closed

hkrishna self-assigned this Mar 4, 2015

hkrishna changed the title ~~Admin area scoring [PL-CG12]~~ Advanced Admin area scoring - taking population and popularity into account [PL-CG12] Mar 5, 2015

sevko added the in review label Mar 19, 2015

sevko added in progress and removed in review labels Mar 27, 2015

hkrishna closed this as completed Apr 1, 2015

hkrishna removed the in progress label Apr 1, 2015
