Matchup yelp IDs with TA IDs not in Crosswalk #25

jhugman · 2016-10-24T14:37:02Z

Because Factual Crosswalk isn't reliable.

This would be a regular crawl, disconnected from any user hitting the prox-server API.

This would also include adding new Crosswalk records to the Factual database.

jhugman · 2016-10-24T14:39:25Z

As a user I want to get information from a variety of sources as often as possible.

jhugman · 2016-10-24T14:40:50Z

Build replacement for Factual Crosswalk API

mcomella · 2017-01-24T23:45:57Z

Prox shows information from all existing sources - Yelp, TripAdvisor and Wikipedia

mcomella · 2017-01-24T23:51:05Z

Steps:

Are we running out of Factual calls during the crawl and missing data because of that? (See also Create tooling so we know when we hit API limits in crawl script #88)
Figure out which Yelp places are missing TA data (re-use No issue: Add scripts/places_missing_data script. #86? Or maybe this depends on ^)
Create automation that matches Yelp -> TA (search TA by place name & GPS coordinates, then do a fuzzy name match to confirm). If it's correct often enough, use it; otherwise...
Do ^, but have a human check that the results are correct (and allow them to add corrections).
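The fuzzy-name confirmation in the automation step could look roughly like the sketch below. It uses Python's standard-library `difflib.SequenceMatcher`; the helper names and the 0.8 cutoff are hypothetical, not taken from prox-server.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff: it would need tuning against hand-checked results.
DEFAULT_THRESHOLD = 0.8


def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so formatting noise doesn't affect the score."""
    return " ".join(name.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two place names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def confirm_match(yelp_name: str, ta_name: str,
                  threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Accept a TA candidate only if its name is close enough to the Yelp name."""
    return name_similarity(yelp_name, ta_name) >= threshold


if __name__ == "__main__":
    print(confirm_match("Cafe Algiers", "cafe  algiers"))               # True
    print(round(name_similarity("Mariposa Baking", "Mariposa"), 2))     # 0.7
```

A plain ratio penalizes names where one is a prefix of the other (about 0.70 for "Mariposa Baking" vs. "Mariposa"), so a real matcher might also check whether one name's words are contained in the other's.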

As James mentions, it'd be great to give back our findings to Factual.

mcomella · 2017-01-26T02:43:17Z

I discovered we can put a place's name and coordinates into TA's location_mapper API and it'll pass back IDs: no dealing with messy search!

In my first experiment, centered on the Ferry Building, 15/50 (30%) places were missing TA data:

8 are roughly uncorrectable (16% of the total)
- 6 are not in TA
- 2 have incorrect data in TA leading to mismatches ("Barbarossa" is named "Bubble Lounge" & TA thinks "Cafe Algiers" is closed)
2 may be caused by a bug in my code (yay unicode), but they could also be name mismatches because Yelp's names include accents that TA's do not (correctable if we remove accents before searching)
5 names are slightly different from Yelp to TA, causing a miss (e.g. "Mariposa Baking" on Yelp vs. "Mariposa" on TA).
- To correct these, we could make a search by coordinates without place names (assuming the API works this way), take the results, and do our own name matching

The remaining 35 places were correct.
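The accent fix suggested for the unicode cases (remove accents before searching) can be done with the standard-library `unicodedata` module. This is a minimal sketch; `strip_accents` is a hypothetical helper name. Note that NFKD also applies compatibility folding, so ligatures such as "ﬁ" become "fi", which is usually harmless for name matching.

```python
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining diacritics, e.g. "Café" -> "Cafe".

    NFKD decomposes each accented character into a base character followed by
    combining marks; filtering out the marks leaves the unaccented base letters.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(strip_accents("Crème Brûlée Café"))  # Creme Brulee Cafe
```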

The full analysis is in a gist.

Next TODO:

~~Try to fix unicode bug~~
~~(?) Make this verification easily reproducible~~
~~Verify results in other locations are consistent (e.g. 30% missing)~~
If consistent, try to correct for name mismatches
(Make code easily runnable, finding missing places from cache, getting our own crosswalk for these places, and updating the cache with these places)

mcomella · 2017-01-27T23:16:27Z

My current implementation puts the info directly into TA's location_mapper, taking the first result, and, if there are no results, strips any accents and tries again. A possible improvement is to remove the name query and do our own place matching.
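A minimal sketch of that query-then-retry flow follows. The TA call is abstracted as a `search` callable because this thread doesn't show the location_mapper request format; `find_ta_id` and the callable's signature are hypothetical. Skipping the retry when accent stripping changes nothing is an extra optimization, not something described above.

```python
import unicodedata
from typing import Callable, List, Optional

# Hypothetical wrapper signature: (place name, lat, lng) -> TA location IDs, best first.
SearchFn = Callable[[str, float, float], List[str]]


def strip_accents(text: str) -> str:
    """Remove combining diacritics via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def find_ta_id(name: str, lat: float, lng: float,
               search: SearchFn) -> Optional[str]:
    """Return the first TA result for a Yelp place, retrying once without accents."""
    results = search(name, lat, lng)
    if not results:
        unaccented = strip_accents(name)
        # Skip the second request when stripping accents changes nothing.
        if unaccented != name:
            results = search(unaccented, lat, lng)
    return results[0] if results else None


if __name__ == "__main__":
    def fake_search(name: str, lat: float, lng: float) -> List[str]:
        # Stand-in for the real API that only knows the unaccented spelling.
        return ["ta-1"] if name == "Cafe Algiers" else []

    print(find_ta_id("Café Algiers", 0.0, 0.0, fake_search))  # ta-1
```

Injecting `search` keeps the matching logic testable without network calls or API keys.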

With the current implementation, using the top 50 "best match" Yelp results in our top-level categories within an 800 m radius of each of the following locations, I got these results:

Ferry Building SF
- 72% (36/50) correct matches
- 5 name mismatches
- 6 places not on TA
- 2 incorrect data in TA causing mismatch
- 1 food truck (so matching location isn't really possible)
YVR office
- 82% (41/50) correct matches (1 place had different address between the two services)
- 8 name mismatches
- 1 place not on TA
Cloud Gate in Chicago
- 78% (39/50) correct matches
- 9 name mismatches
- 2 places not on TA
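As a sanity check on the numbers above, the per-location counts can be tallied in a few lines. The dictionary keys are just labels for the bullets above; each location's counts sum to 50.

```python
# Per-location counts from the results above (50 Yelp places each).
RESULTS = {
    "Ferry Building SF": {"correct": 36, "name_mismatch": 5, "not_on_ta": 6,
                          "incorrect_ta_data": 2, "food_truck": 1},
    "YVR office": {"correct": 41, "name_mismatch": 8, "not_on_ta": 1},
    "Cloud Gate in Chicago": {"correct": 39, "name_mismatch": 9, "not_on_ta": 2},
}


def match_rate(counts: dict) -> float:
    """Fraction of places whose Yelp -> TA match was correct."""
    return counts["correct"] / sum(counts.values())


if __name__ == "__main__":
    for location, counts in RESULTS.items():
        print(f"{location}: {match_rate(counts):.0%} ({sum(counts.values())} places)")
    correct = sum(c["correct"] for c in RESULTS.values())
    total = sum(sum(c.values()) for c in RESULTS.values())
    print(f"Pooled: {correct}/{total} = {correct / total:.0%}")  # Pooled: 116/150 = 77%
```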

Notes:

"Best match" will likely be more prominent locations so there's more likely to be TA matches (I figured these are the locations we'd want to surface anyway).
Bolded name mismatches are potentially correctable (with the improvement mentioned above), but correcting them could introduce more errors
My raw notes (with the specific name mismatches and the SFO, YVR, and CHI place lists) can be found in this gist.
I've been storing a list of name mismatches in docs/yelp_at_name_mismatches.yml

Updated TODO:

We should compare our success rates against Factual Crosswalk (so we have a metric of improvement).
- If we haven't improved much, consider the improvement mentioned above ^
? Test on distance sort, rather than best match, for a more realistic test
Find places missing TA data from place cache, store their crosswalk
Figure out how to merge crosswalk ^ into main code base
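The "find places missing TA data from place cache, store their crosswalk" step might be sketched as below. The cache shape, `places_missing_provider`, and `build_crosswalk` are all hypothetical illustrations, not the actual `scripts.places_missing_provider_data` implementation; `lookup` stands in for whatever Yelp -> TA matcher is used.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical cache shape: place ID -> {provider name -> provider data}.
# The real prox-server cache layout may differ.
PlaceCache = Dict[str, Dict[str, dict]]


def places_missing_provider(cache: PlaceCache, provider: str) -> List[str]:
    """IDs of cached places with no data (absent or empty) for `provider`."""
    return sorted(pid for pid, data in cache.items() if not data.get(provider))


def build_crosswalk(place_ids: Iterable[str],
                    lookup: Callable[[str], Optional[str]]) -> Dict[str, str]:
    """Map each place ID to the provider ID returned by `lookup`, skipping misses."""
    crosswalk = {}
    for pid in place_ids:
        provider_id = lookup(pid)
        if provider_id is not None:
            crosswalk[pid] = provider_id
    return crosswalk


if __name__ == "__main__":
    cache = {
        "yelp-a": {"yelp": {"name": "A"}, "tripadvisor": {"rating": 4.5}},
        "yelp-b": {"yelp": {"name": "B"}},
        "yelp-c": {"yelp": {"name": "C"}, "tripadvisor": {}},
    }
    missing = places_missing_provider(cache, "tripadvisor")
    print(missing)  # ['yelp-b', 'yelp-c']
    print(build_crosswalk(missing, lambda pid: "ta-b" if pid == "yelp-b" else None))
    # {'yelp-b': 'ta-b'}
```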


mcomella · 2017-01-30T23:21:37Z

Update for a non-dense area, plus an analysis of Factual Crosswalk:

Data: a 0.5 km crawl around Nashville (36.162963, -86.780758) returned 41 places. Excluding the 6 places where Yelp lists a service area rather than a specific location leaves 35 places.

74% (26/35) correct matches
- 6 name mismatches
- 1 not on TA
- 2 food trucks
- Taking the union with the Factual Crosswalk results, we get to 31/35 (89%)
  - Excluding the uncorrectable places (1 not on TA, 2 food trucks), only one place is missing ("Crazy Town Nashville"), for 97% (31/32)
  - Factual finds 5 places we miss; we find 15 that Factual misses
  - Factual alone gets 16 places correct

Raw notes added to the gist.

Overall, it seems we're getting about 75% correct with this method. This single test of Factual shows it getting 46% (16/35) correct for TA.
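The Nashville comparison is plain set arithmetic over the IDs each method matched correctly. In this sketch, `compare_matchers` is a hypothetical helper, and the placeholder IDs are chosen only so the set sizes reproduce the counts above; the 11 shared places are derived (26 + 16 - 31 = 11), not stated in the notes.

```python
def compare_matchers(ours: set, factual: set, total: int) -> dict:
    """Overlap statistics for two sets of correctly matched place IDs."""
    union = ours | factual
    return {
        "ours_rate": len(ours) / total,
        "factual_rate": len(factual) / total,
        "union": len(union),
        "union_rate": len(union) / total,
        "ours_only": len(ours - factual),
        "factual_only": len(factual - ours),
        "both": len(ours & factual),
    }


if __name__ == "__main__":
    # Placeholder IDs chosen only so the set sizes match the Nashville counts.
    ours = {f"p{i}" for i in range(26)}          # 26 correct
    factual = {f"p{i}" for i in range(15, 31)}   # 16 correct, 11 shared
    stats = compare_matchers(ours, factual, total=35)
    print(stats["union"], stats["ours_only"], stats["factual_only"])  # 31 15 5
    print(f"{stats['ours_rate']:.0%} vs {stats['factual_rate']:.0%}")  # 74% vs 46%
```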


mcomella · 2017-02-02T23:19:32Z

We can do Yelp -> TA matching; we just need to integrate it (#91).

jhugman changed the title ~~Matchup yelp IDs with TripAdivsor IDs~~ Matchup yelp IDs with TripAdivsor IDs not in Crosswalk Oct 24, 2016

jhugman added the enhancement label Nov 30, 2016

mcomella added this to the Sprint #6 – fix place data milestone Jan 24, 2017

mcomella changed the title ~~Matchup yelp IDs with TripAdivsor IDs not in Crosswalk~~ Matchup yelp IDs with TA IDs not in Crosswalk, wikipedia, websites, etc. Jan 24, 2017

mcomella self-assigned this Jan 24, 2017

mcomella mentioned this issue Jan 24, 2017

Use our crosswalk replacement in our crawls #91

Open

mcomella changed the title ~~Matchup yelp IDs with TA IDs not in Crosswalk, wikipedia, websites, etc.~~ Matchup yelp IDs with TA IDs not in Crosswalk Jan 24, 2017

mcomella mentioned this issue Jan 24, 2017

Match yelp IDs to websites #92

Closed

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 26, 2017

Issue mozilla-mobile#25: Add tripadvisor location_mapper/ search method.

039c31e

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 26, 2017

Issue mozilla-mobile#25: Refactor yelp resolution so can do with key (not just url).

5a0aa9e

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 26, 2017

Issue mozilla-mobile#25: Add scripts/prox_crosswalk with yelp -> TA ids.

5bca3e3

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 26, 2017

Issue mozilla-mobile#25: Add fn to write ta crosswalk to DB.

7478367

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 27, 2017

Issue mozilla-mobile#25: Add scripts/prox_crosswalk with yelp -> TA ids.

f3044d2

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 27, 2017

Issue mozilla-mobile#25: Add fn to write ta crosswalk to DB.

5f5c206

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 27, 2017

Issue mozilla-mobile#25: Add accent correction for yelp->TA queries; add improvements notes.

41d0d5d

See code comments for details and improvement notes.

mcomella mentioned this issue Jan 28, 2017

No issue: Add scripts/places_missing_data script. #86

Closed

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add tripadvisor location_mapper/ search method.

72c0025

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Refactor yelp resolution so can do with key (not just url).

0715b7a

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add scripts/prox_crosswalk with yelp -> TA ids.

b69106b

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add fn to write ta crosswalk to DB.

396e389

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add accent correction for yelp->TA queries; add improvements notes.

a06eb91

See code comments for details and improvement notes.

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add geo.get_place_ids_in_radius.

b8615b7

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add tripadvisor location_mapper/ search method.

0e97b38

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Refactor yelp resolution so can do with key (not just url).

263ebe3

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add scripts/prox_crosswalk with yelp -> TA ids.

c051459

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add fn to write ta crosswalk to DB.

517229d

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add accent correction for yelp->TA queries; add improvements notes.

acb2822

See code comments for details and improvement notes.

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add yelp->ta verification fn.

0584889

This will allow devs to check out yelp places in different areas to see how well TA matches.

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add geo.get_place_ids_in_radius.

ac68145

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add request_handler.readCacheVenueIterableDetails.

e172010

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add scripts.places_missing_provider_data.

248ede4

This will allow us to figure out which places don't have TA data so we can run crosswalk on it.

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 28, 2017

Issue mozilla-mobile#25: Add scripts.places_missing_provider_data.

0f8c055

This will allow us to figure out which places don't have TA data so we can run crosswalk on it.

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 31, 2017

Issue mozilla-mobile#25: Add clarification to TODO.

988fda8

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 31, 2017

Issue mozilla-mobile#25: Add fn to write ta to db.

a70b02f

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 31, 2017

Issue mozilla-mobile#25: Add accent correction for yelp->TA queries; add improvements notes.

b2ed8e6

See code comments for details and improvement notes.

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 31, 2017

Issue mozilla-mobile#25: Add yelp->ta verification fn.

308cbf9

This will allow devs to check out yelp places in different areas to see how well TA matches.

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 31, 2017

Issue mozilla-mobile#25: Add geo.get_place_ids_in_radius.

2e62c3a

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 31, 2017

Issue mozilla-mobile#25: Add request_handler.readCacheVenueIterableDetails.

821eee0

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 31, 2017

Issue mozilla-mobile#25: Add scripts.places_missing_provider_data.

bbe7bee

This will allow us to figure out which places don't have TA data so we can run crosswalk on it.

mcomella added a commit to mcomella/prox-server that referenced this issue Jan 31, 2017

Issue mozilla-mobile#25: Add fn to write ta to db.

32665cc

liuche pushed a commit that referenced this issue Feb 1, 2017

Issue #25: Add tripadvisor location_mapper/ search method.

3c44897

liuche pushed a commit that referenced this issue Feb 1, 2017

Issue #25: Refactor yelp resolution so can do with key (not just url).

4b60054

liuche pushed a commit that referenced this issue Feb 1, 2017

Issue #25: Add scripts/prox_crosswalk with yelp -> TA ids.

c6e44b9

liuche pushed a commit that referenced this issue Feb 1, 2017

Issue #25: Add fn to write ta crosswalk to DB.

40f4b9f

liuche pushed a commit that referenced this issue Feb 1, 2017

Issue #25: Add accent correction for yelp->TA queries; add improvements notes.

053962f

See code comments for details and improvement notes.

liuche pushed a commit that referenced this issue Feb 1, 2017

Issue #25: Add yelp->ta verification fn.

ef5e289

This will allow devs to check out yelp places in different areas to see how well TA matches.

liuche pushed a commit that referenced this issue Feb 1, 2017

Issue #25: Add geo.get_place_ids_in_radius.

01aef58

liuche pushed a commit that referenced this issue Feb 1, 2017

Issue #25: Add request_handler.readCacheVenueIterableDetails.

844cbab

liuche pushed a commit that referenced this issue Feb 1, 2017

Issue #25: Add scripts.places_missing_provider_data.

532cc3e

This will allow us to figure out which places don't have TA data so we can run crosswalk on it.

liuche pushed a commit that referenced this issue Feb 1, 2017

Issue #25: Add fn to write ta to db.

d499591

mcomella closed this as completed Feb 2, 2017
