This repository has been archived by the owner on Jan 20, 2022. It is now read-only.

provide a popularity/population score #5

Closed
hkrishna opened this issue Feb 24, 2015 · 6 comments

Comments

@hkrishna

When we import geonames, OSM nodes, ways, etc., we look up which admin boundaries each point (lat/lon) belongs to and populate a document object. I think this lookup should return all admin info (admin0, admin1, neighborhood, locality, alpha3, etc.) and a score (based on population, popularity, and category scores of individual admin types).

This way, when we search for "123 main st", "123 main st, new york, ny" gets a higher score than "123 main st, lynxville, wi".
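One way to picture the proposed score is a minimal sketch that maps a population count onto a 0 to 1 scale and ranks candidates by it. Everything here is an assumption for illustration: the `admin_score` function, the log scaling, the `max_population` cap, and the placeholder population values are not taken from any existing code or dataset.

```python
import math


def admin_score(population, max_population=20_000_000):
    """Map a population count to a score in [0, 1] on a log scale.

    Hypothetical helper: the log scaling and the cap are illustrative choices.
    """
    if not population or population <= 0:
        return 0.0
    return min(1.0, math.log10(population + 1) / math.log10(max_population + 1))


# Placeholder populations, chosen only to show the ordering (not real figures).
candidates = [
    {"text": "123 main st, lynxville, wi", "population": 100},
    {"text": "123 main st, new york, ny", "population": 8_000_000},
]

# Rank candidates so the more populous admin area comes first.
ranked = sorted(candidates, key=lambda c: admin_score(c["population"]), reverse=True)
for candidate in ranked:
    print(candidate["text"], round(admin_score(candidate["population"]), 3))
```

A log scale keeps very large cities from drowning out everything else while still ordering places by size; a linear scale would be a reasonable alternative if only relative order matters.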

@sevko
Contributor

sevko commented Feb 24, 2015

This is easily doable. All Quattroshapes shapefiles have a qs_pop (presumably "popularity" or "population") attribute; here's the coverage:

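+------------+---------------------+
|   layer    |    pop coverage     |
+------------+---------------------+
| adm0       | 0                   |
| adm1       | 0                   |
| adm2       | 0                   |
| localadmin | 23478 out of 110119 |
| localities | 95196 out of 163385 |
+------------+---------------------+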
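For reference, coverage figures like the ones above could be produced with a sketch along these lines. It assumes each feature's attributes are available as a plain dict and that "covered" means qs_pop is present and non-zero; the `pop_coverage` helper and the sample records are hypothetical, and how the shapefile is actually read is left out.

```python
def pop_coverage(features, field="qs_pop"):
    """Return (covered, total): features with a non-empty, non-zero population field."""
    covered = sum(1 for feature in features if feature.get(field))
    return covered, len(features)


# Hypothetical attribute records standing in for one layer's features.
sample = [
    {"qs_la": "Chicago", "qs_pop": 2695598},
    {"qs_la": "Unknown A", "qs_pop": 0},
    {"qs_la": "Unknown B", "qs_pop": None},
    {"qs_la": "Unknown C"},
]

covered, total = pop_coverage(sample)
print(f"{covered} out of {total}")
```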

neighborhoods has a bunch of different values. While adm0, adm1, and adm2 don't have any, we can probably provide them for at least adm0 using another dataset (of country populations, for instance). Here are some sample queries for localadmin, localities, and neighborhoods, sorted by the respective popularity ratings:

localadmin

+---------------+---------------+---------------------------------+---------+
|    qs_adm0    |     qs_a1     |              qs_la              | qs_pop  |
+---------------+---------------+---------------------------------+---------+
| United States | Illinois      | Chicago                         | 2695598 |
| United States | New York      | Brooklyn                        | 2504700 |
| United States | New York      | Queens                          | 2230722 |
| United States | New York      | Manhattan                       | 1585873 |
| United States | Pennsylvania  | Philadelphia                    | 1526006 |
| United States | New York      | Bronx                           | 1385108 |
| United States | Ohio          | Columbus                        |  770122 |
| United States | New York      | Hempstead                       |  759757 |
| United States | Michigan      | Detroit                         |  713777 |
| United States | Massachusetts | Boston                          |  617594 |
| United States | Wisconsin     | Milwaukee                       |  594833 |
| United States | New York      | Brookhaven                      |  486040 |
| United States | New York      | Staten Island                   |  468730 |
| United States | Nebraska      | Omaha                           |  408958 |

localities

This one's a little screwy: London is repeated across several counties, each row carrying the same population.

+----------------------------+---------------------------------------+----------------+----------+
|          qs_adm0           |                 qs_a1                 |     qs_loc     |  qs_pop  |
+----------------------------+---------------------------------------+----------------+----------+
| Brazil                     | SP                                    | São Paulo      | 19592271 |
| Brazil                     | RJ                                    | Rio de Janeiro | 11849940 |
| United Kingdom             | City and County of the City of London | London         |  8278251 |
| United Kingdom             | Surrey                                | London         |  8278251 |
| United Kingdom             | Surrey                                | London         |  8278251 |
| United Kingdom             | Essex                                 | London         |  8278251 |
| United Kingdom             | Surrey                                | London         |  8278251 |
| United Kingdom             | Surrey                                | London         |  8278251 |
| United Kingdom             | Kent                                  | London         |  8278251 |
| United Kingdom             | Hertfordshire                         | London         |  8278251 |
| United Kingdom             | Essex                                 | London         |  8278251 |
| Brazil                     | MG                                    | Belo Horizonte |  5100265 |
| Brazil                     | PE                                    | Recife         |  3677355 |
| Brazil                     | BA                                    | Salvador       |  3664096 |
| Bundesrepublik Deutschland | Berlin                                | Berlin         |  3416255 |

neighborhoods

+---------------+-----------+-----------------------+---------+
|   name_adm0   | name_adm1 |         name          |   pop   |
+---------------+-----------+-----------------------+---------+
| United States | New York  | Upper East Side       | 2619480 |
| United States | New York  | Koreatown             | 2619004 |
| United States | New York  | Theatre District      | 2616068 |
| United States | New York  | Garment District      | 2613966 |
| United States | New York  | NoHo                  | 2613472 |
| United States | New York  | Battery Park City     | 2611224 |
| United States | New York  | Midtown West          | 2609547 |
| United States | New York  | Financial District    | 2608558 |
| United States | New York  | Coney Island          | 2608417 |
| United States | New York  | West Side             | 2604324 |
| United States | New York  | Chelsea               | 2602084 |
| United States | New York  | Meat Packing District | 2599539 |
| United States | New York  | Willets Point         | 2599461 |
| United States | New York  | Flatiron District     | 2598346 |
| United States | New York  | San Juan Hill         | 2598246 |
| United States | New York  | North Side            | 2597870 |
| United States | New York  | Tenderloin            | 2592199 |
| United States | New York  | Clinton               | 2591931 |
| United States | New York  | Union Square          | 2590368 |
| United States | New York  | Southern Tip          | 2589176 |

@sevko
Contributor

sevko commented Feb 24, 2015

An admin-lookup that extracts Quattro population data and adds it to incoming Documents (in addition to admin names) now lives in the admin-metadata branch.
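As a rough illustration of what such a lookup step does, the sketch below copies admin names onto a document and keeps the population from the most specific layer that has one. The layer names, their order, the dict shapes, and the `add_admin_metadata` function are all assumptions made for this example; they are not taken from the admin-metadata branch.

```python
# Hypothetical layer order, most specific first.
LAYERS = ["neighborhood", "locality", "localadmin", "admin2", "admin1", "admin0"]


def add_admin_metadata(document, matches):
    """Return a copy of `document` with admin names and one population value.

    `matches` maps a layer name to {"name": ..., "population": ...} for the
    boundary containing the document's point (assumed shape).
    """
    enriched = dict(document)
    for layer in LAYERS:
        match = matches.get(layer)
        if match is None:
            continue
        enriched[layer] = match["name"]
        # Keep the first population found, i.e. the most specific layer's value.
        if "population" not in enriched and match.get("population"):
            enriched["population"] = match["population"]
    return enriched


doc = add_admin_metadata(
    {"name": "123 main st"},
    {
        "localadmin": {"name": "Chicago", "population": 2695598},
        "admin1": {"name": "Illinois", "population": None},
        "admin0": {"name": "United States", "population": None},
    },
)
print(doc)
```

Taking the most specific available value is only one possible policy; combining values across layers would be another.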

@sevko
Contributor

sevko commented Mar 2, 2015

Now that the Quattroshapes popularity pre-processing scripts have shipped, we're roughly ready to run an experimental import with admin popularity/population integrated. @hkrishna, when should we aim for?

@hkrishna
Author

hkrishna commented Mar 2, 2015

How about early next week? We still need the current dev build with the admin field schema change to roll over to prod. Let's give it a few more days.

@sevko
Contributor

sevko commented Mar 2, 2015

Sounds good. Will prepare relevant branches.

@sevko
Contributor

sevko commented Mar 17, 2015

Implemented in #10.

@sevko sevko closed this as completed Mar 17, 2015
@sevko sevko removed the in review label Mar 17, 2015
@sevko sevko changed the title from "Admin lookup should also provide a score" to "provide a popularity/population score" Apr 15, 2015