Reviewing labelled and predicted changesets from feature classifier #58

bkowshik · 2017-06-14T05:34:17Z

With results from #43, I plan to review 15 changesets in each of the following categories:

Labelled problematic and predicted problematic
Labelled problematic and predicted good
Labelled good and predicted good
Labelled good and predicted problematic


bkowshik · 2017-06-14T05:58:34Z

Labelled problematic and predicted problematic

Duplicate looking tags

One of the attributes used for training was the number of duplicate looking tags in the feature
The idea was to flag features with landuse, landuse_1, etc
Not every duplicate looking tag is a 👎 though.
Good tags to have
- building, building:levels, etc
- addr, addr:city, etc
Not so good to have
- landuse, landuse_1, ...
- surface, surface_1, ...
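A minimal sketch of how the good and not so good cases above could be told apart, assuming the feature's tags are available as a plain dict. The function name `count_numbered_duplicates` and the regular expression are illustrative assumptions, not the attribute the model currently uses. The idea is to count only keys that end in `_<number>` and whose base key is also present, so namespaced keys like `addr:city` are left alone.

```python
import re

# Assumed pattern: a base key, an underscore, then a number, e.g. "landuse_1".
NUMBERED_SUFFIX = re.compile(r"^(?P<base>.+)_(?P<n>\d+)$")


def count_numbered_duplicates(tags):
    """Count keys like `landuse_1` whose base key (`landuse`) is also present."""
    count = 0
    for key in tags:
        match = NUMBERED_SUFFIX.match(key)
        if match and match.group("base") in tags:
            count += 1
    return count


# Illustrative tag sets, not taken from a real changeset.
good_tags = {"building": "yes", "building:levels": "2", "addr:city": "Somewhere"}
bad_tags = {
    "landuse": "forest",
    "landuse_1": "meadow",
    "surface": "paved",
    "surface_1": "asphalt",
}

print(count_numbered_duplicates(good_tags))
print(count_numbered_duplicates(bad_tags))
```

Colon-separated keys never match the pattern, so only the numbered suffixes contribute to the count.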

Users with blocks

A feature gets an old_name=Порт-Артура in https://osmcha.mapbox.com/44818408/
On its own this might not look problematic; it only does with the larger context of the user having a user block
User block: http://www.openstreetmap.org/user_blocks/1147
The user does start using the old_name tag, but the changeset is still part of the giant revert
The model doesn't have the additional context of the user block

Adding `leisure=park`

We have seen lots of activity around converting existing features to parks.
This should totally be part of our Regression Test Suite

Personal information

Often, mappers add some personal information onto OSM
Feature names could have things like home, My friend's place, etc
Ex: https://osmcha.mapbox.com/46374766/

bkowshik · 2017-06-14T06:09:59Z

Labelled problematic and predicted good

Additional context

Sometimes, where the feature sits on the map is crucial to understanding it
In https://osmcha.mapbox.com/47898163/, a highway=footway is good as is
But when we take into account the environment the feature is in, we see that a highway=footway already exists there
Similarly, in https://osmcha.mapbox.com/46383265/ we need to know about the existence of a similar feature nearby, a hospital with the same name in this case.
A park with same name nearby: https://osmcha.mapbox.com/47051957/

Impossible tags

Some features should not have certain tags.
Ex: The node Paris gets a shop=bicycle in https://osmcha.mapbox.com/46978170/
Note: Value for name:en=France is inappropriate too

bkowshik · 2017-06-14T06:42:49Z

Labelled good and predicted good

Name translations

The model currently is making a guess when there is a name translation
Ex: In https://osmcha.mapbox.com/48119284/, name:en=Chalcis was added
The only related attributes the model gets are feature_name_translation_new_version=8 and feature_name_translation_old_version=7

Wikidata

Every tag and value has inherent properties. Ex: The value of every Wikidata tag should start with a Q, like Q6529766
In https://osmcha.mapbox.com/48448760/, the Wikidata tag has the value of Wikipedia ru:Церковь_Святого_Филиппа_(Ташкент)
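The "starts with a Q" property could be checked with something like the sketch below. The function name `is_valid_wikidata_value` is hypothetical, and the check assumes a single identifier per tag (it does not handle semicolon-separated lists).

```python
import re

# Assumed shape of a Wikidata item identifier: "Q" followed by a positive integer.
WIKIDATA_ID = re.compile(r"Q[1-9][0-9]*")


def is_valid_wikidata_value(value):
    """Return True when the whole value looks like a Wikidata item ID."""
    return WIKIDATA_ID.fullmatch(value) is not None


print(is_valid_wikidata_value("Q6529766"))
print(is_valid_wikidata_value("ru:Церковь_Святого_Филиппа_(Ташкент)"))
```

A boolean like this could be passed to the model as one more attribute, so a Wikipedia-style value in the Wikidata tag stands out.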

General to specific

When a value goes from a general value to a more specific value, it mostly is a good thing
Ex: in https://osmcha.mapbox.com/48044675/, we go from building=yes -> building=school
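As a rough sketch, a general to specific change could be detected by comparing the old and new values of a key against a small set of generic values. The set `GENERIC_VALUES` and the function name are assumptions for illustration; only `yes` comes from the example above.

```python
# Assumed set of "general" values; only "yes" appears in the example above.
GENERIC_VALUES = {"yes"}


def is_general_to_specific(old_value, new_value):
    """True when a generic value is replaced by a different, non-generic one."""
    return (
        old_value in GENERIC_VALUES
        and new_value not in GENERIC_VALUES
        and new_value != old_value
    )


print(is_general_to_specific("yes", "school"))
print(is_general_to_specific("school", "yes"))
```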

bkowshik · 2017-06-14T07:16:41Z

Labelled good and predicted problematic

Pure geometry modifications

Looks like the model does not have enough information to make this decision
A majority of features we currently have for the model are property based
The attributes that come close to being relevant are:
- feature_area and feature_area_old: Since the feature is a node, both have a value of 0
- The values for leisure and sport remain unchanged so might not be very useful
We definitely need more geometry based attributes. Ex:
- Distance between new version and old version of feature
- Number of nodes in the feature
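The two geometry attributes listed above could be computed along these lines. This is a sketch that assumes both versions of the feature are available as GeoJSON-style geometry dicts (`Point`, `LineString` or `Polygon`). The function names are hypothetical, and the distance is taken between the mean coordinates of each version using the haversine formula, which is one of several reasonable choices.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def flatten(geometry):
    """List the [lon, lat] pairs of a Point, LineString or Polygon."""
    kind, coords = geometry["type"], geometry["coordinates"]
    if kind == "Point":
        return [coords]
    if kind == "LineString":
        return list(coords)
    if kind == "Polygon":
        # Drop the closing coordinate of each ring, which repeats the first.
        return [pt for ring in coords for pt in ring[:-1]]
    raise ValueError("unsupported geometry type: " + kind)


def node_count(geometry):
    """Number of distinct positions that make up the geometry."""
    return len(flatten(geometry))


def shift_m(old_geometry, new_geometry):
    """Distance in metres between the mean coordinates of two versions."""
    def mean(points):
        return (
            sum(p[0] for p in points) / len(points),
            sum(p[1] for p in points) / len(points),
        )
    old_lon, old_lat = mean(flatten(old_geometry))
    new_lon, new_lat = mean(flatten(new_geometry))
    return haversine_m(old_lon, old_lat, new_lon, new_lat)


# Illustrative coordinates, not taken from a real changeset.
old = {"type": "Point", "coordinates": [0.0, 0.0]}
new = {"type": "Point", "coordinates": [0.0, 1.0]}
print(node_count(new), round(shift_m(old, new)))
```

Unlike feature_area, both values stay informative when the feature is a node that only moved.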

Feature name

If the value of name tag does not have any profanity, it should mostly be good.
Percentage of name modification could be a feature we could explore. Ex:
- Levenshtein distance based scores.
- In https://osmcha.mapbox.com/46823062/, the distance between the old and new values of the name tag is 8
- Could be useful here as well: https://osmcha.mapbox.com/46987866/
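A Levenshtein based score could look like the sketch below. It is a plain dynamic-programming implementation with no external dependency. `name_change_ratio` is a hypothetical name, and dividing by the longer string's length is one assumed way of turning the distance into a percentage of modification.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]


def name_change_ratio(old_name, new_name):
    """Distance scaled to 0..1 by the length of the longer name (assumed scaling)."""
    longest = max(len(old_name), len(new_name))
    if longest == 0:
        return 0.0
    return levenshtein(old_name, new_name) / longest


print(levenshtein("kitten", "sitting"))
print(name_change_ratio("Main Street", "Main St"))
```

A ratio close to 0 suggests a small correction, while a ratio close to 1 means the name was essentially replaced.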

Redundant tags

Some tags are for reference only and do not carry the same weight as other properties
In https://osmcha.mapbox.com/47919135/, the feature gets a description.
Skipping checks on such tags could be a wise thing to do at the current state of the project

bkowshik · 2017-06-22T16:34:26Z

No next actions here. Closing.

bkowshik mentioned this issue Jun 21, 2017

Weekly update from Gabbarland #26

Open

bkowshik closed this as completed Jun 22, 2017
