Prototype an anomaly detection model for highways #88

Open
bkowshik opened this issue Jul 5, 2017 · 2 comments


bkowshik commented Jul 5, 2017

Ref: #80 and #69


We all know labelled data is gold in machine learning land. But in the context of OpenStreetMap and osmcha, there are two things to consider:

1. Labelled harmful highways

On osmcha, labelling happens at the changeset level: a changeset is either good or harmful. But not every feature in a harmful changeset is necessarily harmful, so we should not assume that all features of a harmful changeset are harmful. In Gabbar, we worked with changesets that touched exactly one feature, so the changeset's label applied directly to that feature: if the changeset was good, the feature was good, and if the changeset was harmful, the feature was harmful.

This worked OK for a generic classifier, but for the highway classifier the dataset is too small. For example, the latest highway classifier was trained on 2217 good highways and a mere 55 harmful highways. With so few harmful examples, supervised learning algorithms might not be fed enough to be strong and healthy.

2. Labelled good highways

But we have a (comparative) abundance of labelled highways that are good. The 2217 good highways mentioned above are there, but there are even more. When a changeset is labelled good, it is safe to assume that all features in the changeset are good, including the highway features. Yay!
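The labelling rule described in the two points above can be sketched as follows. This is only an illustration: the `Changeset` class, its fields, and the sample data are hypothetical and are not taken from osmcha or Gabbar.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Changeset:
    """Hypothetical, simplified view of a reviewed changeset."""
    id: int
    harmful: bool              # changeset-level label from review
    feature_tags: List[dict]   # tags of each feature touched by the changeset


def labelled_highways(changesets: Iterable[Changeset]) -> List[Tuple[int, dict, str]]:
    """Derive feature-level labels for highways from changeset-level labels."""
    samples = []
    for cs in changesets:
        if not cs.harmful:
            # Good changeset: every feature in it is assumed good.
            label = "good"
        elif len(cs.feature_tags) == 1:
            # Harmful changeset with a single feature: that feature is harmful.
            label = "harmful"
        else:
            # Harmful changeset with several features: we cannot tell which
            # feature is the harmful one, so skip it.
            continue
        for tags in cs.feature_tags:
            if "highway" in tags:
                samples.append((cs.id, tags, label))
    return samples


# Made-up example data.
changesets = [
    Changeset(1, False, [{"highway": "residential"}, {"building": "yes"}, {"highway": "path"}]),
    Changeset(2, True, [{"highway": "primary"}]),
    Changeset(3, True, [{"highway": "track"}, {"highway": "service"}]),
]
samples = labelled_highways(changesets)
```

Here changeset 1 contributes two good highways, changeset 2 contributes one harmful highway, and changeset 3 is dropped because its label cannot be attributed to a single feature.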

There are 50,000+ changesets labelled on osmcha. Assuming every changeset has at least one highway, as highways are among the most frequently edited features on OpenStreetMap, we could potentially have around 50,000+ labelled good highways. This might be an interesting scenario in which to try anomaly detection models.

From https://en.wikipedia.org/wiki/Anomaly_detection

anomaly detection (also outlier detection) is the identification of items, events or observations which do not conform to an expected pattern or other items in a dataset.

Another potentially big advantage of anomaly detection models is that they flag when things are different from what is expected. This means we are not limited to the types of harmful edits we have seen or given the model for training; in a way, we are ready for new and unknown types of anomalies. One important thing about anomaly detection is that these models don't tell you whether a changeset is good or bad; they tell you whether it is something expected or something different.
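To make the idea concrete, here is a minimal sketch of a detector that is fitted on good samples only and then flags anything that looks different. It is a toy z-score detector, not the model used in this issue; the two features (length in km, number of nodes), the threshold, and the sample values are all assumptions made for illustration. The labels follow the common convention of 1 for inliers and -1 for outliers.

```python
from statistics import mean, pstdev
from typing import List, Sequence


class ZScoreDetector:
    """Toy anomaly detector fitted on good (inlier) samples only."""

    def __init__(self, threshold: float = 3.0):
        self.threshold = threshold

    def fit(self, good: Sequence[Sequence[float]]) -> "ZScoreDetector":
        columns = list(zip(*good))
        self.means = [mean(c) for c in columns]
        # Fall back to 1.0 to avoid dividing by zero for constant features.
        self.stds = [pstdev(c) or 1.0 for c in columns]
        return self

    def predict(self, samples: Sequence[Sequence[float]]) -> List[int]:
        labels = []
        for row in samples:
            # Largest per-feature distance from the "expected" value,
            # measured in standard deviations.
            z = max(abs(x - m) / s for x, m, s in zip(row, self.means, self.stds))
            labels.append(-1 if z > self.threshold else 1)
        return labels


# Hypothetical features per highway: [length in km, number of nodes].
good_highways = [[0.4, 12], [0.6, 15], [0.5, 14], [0.7, 18], [0.3, 10], [0.5, 13]]
detector = ZScoreDetector(threshold=3.0).fit(good_highways)

# One typical highway and one that is very unlike the training data.
predictions = detector.predict([[0.5, 14], [42.0, 900]])
```

Note that the detector never sees a harmful example. It only learns what "expected" looks like, so a flagged highway is different, not necessarily harmful.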


cc: @anandthakker @geohacker @batpad


bkowshik commented Jul 6, 2017

We have initial results from the anomaly detection model.

The following are results on the small validation dataset which includes:

  • 399 highways labelled good (potential inliers)
  • 55 highways labelled harmful (potential outliers)

Confusion matrix

                    Predicted harmful    Predicted good
Labelled harmful                   40                15
Labelled good                      41               358

Classification report

                precision    recall  f1-score   support

        -1      0.49        0.73      0.59        55
        1       0.96        0.90      0.93       399

avg / total     0.90        0.88      0.89       454
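As a sanity check, the numbers in the classification report can be recomputed from the confusion matrix counts above. This is a small sketch in plain Python; the helper name `prf` is made up for illustration.

```python
# Counts from the confusion matrix above.
harmful_as_harmful = 40   # labelled harmful, predicted harmful
harmful_as_good = 15      # labelled harmful, predicted good
good_as_harmful = 41      # labelled good, predicted harmful
good_as_good = 358        # labelled good, predicted good


def prf(tp: int, fp: int, fn: int):
    """Return (precision, recall, f1) for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Class -1 (harmful, potential outliers).
harmful = prf(tp=harmful_as_harmful, fp=good_as_harmful, fn=harmful_as_good)
# Class 1 (good, potential inliers).
good = prf(tp=good_as_good, fp=harmful_as_good, fn=good_as_harmful)

total = harmful_as_harmful + harmful_as_good + good_as_harmful + good_as_good
accuracy = (harmful_as_harmful + good_as_good) / total

print("harmful:", [round(v, 2) for v in harmful])
print("good:   ", [round(v, 2) for v in good])
print("accuracy:", round(accuracy, 2))
```

The low precision for the harmful class (0.49) comes from the 41 good highways flagged as outliers, which is consistent with the point that an outlier is different but not necessarily harmful.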


bkowshik commented Jul 8, 2017

Initial results

Anomaly detection algorithms won't tell you whether a feature or a feature modification is good or harmful. Instead, the models flag outliers: data points that are different in comparison to the rest of the sample set.

A highway now open after construction! 🎆

[Screenshot]

Residential highways don't tend to connect towns

[Screenshot]

A highway=path eventually becomes waterway=river

[Screenshot]
