More activity types, and open sourcing the ML classifiers #34
Comments
Okay, I want to start on this now. I'll initially be doing it in several smaller stages, so that the small classifier changes can be tested individually in the wild, for improvements / regressions in accuracy results.

Stage 1 will be items 1 and 2 from the notes list. So the stage 1 goal will be classifier feature / weighting tweaks, to hopefully eliminate the bogus transport type results that've been turning up in locations where they obviously shouldn't.

I'll also probably throw in items 3 and 4, because life is short and people want things now, not later. So it's more important to get results quickly rather than achieving an exact mathematical science.

I'll also probably throw in "tram" as a new transport type. Because literally hundreds of people have asked for it, and adding in one more base type will be an interesting first test for whether the other changes have been beneficial or not.

Oh, I guess I'm supposed to be open sourcing the classifier code too. Hmm. I don't have a solid plan for that yet. So I'll eye the code up while I'm going along, and if there's anything base level that I can cleanly open source along the way, I'll do it then. Basically mix in the open sourcing process along with the classifier improvements.

Alright ... time to get this happening.
Stage 1 Todos
Stage 2 Todos
…seful for much, but it costs battery #34
I gave this one a couple of days on my test devices, and it was a disaster. The pedometer's data is far too gappy to make it feasible. A large portion of walking samples are recorded with zero cadence and zero step counts. Which results in the walking model getting bloated with zero stepHz values, and the classifier treating a large percentage of vaguely moving indoor samples as walking (eg playing with your phone while lying on the couch). If the building produces drifting location data, then the percentage of false positives goes up significantly. It's a mess. So that's a failed experiment, and definitely not going to ship!
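The bloating effect described above can be illustrated with a toy model. This is only a sketch: it assumes the walking model behaves like a simple histogram over stepHz, which the thread does not confirm, and the function name, bin width, and sample values are all made up for illustration.

```python
from collections import Counter

def bin_probabilities(step_hz_samples, bin_width=0.5):
    """Toy histogram model: share of samples that fall in each stepHz bin.

    Hypothetical stand-in for a per-type model, not the real classifier.
    """
    counts = Counter(int(hz // bin_width) for hz in step_hz_samples)
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}

# Made-up walking samples with a plausible cadence.
clean_walking = [1.6, 1.8, 1.9, 2.0, 1.7, 1.8, 2.1, 1.9]

# The same samples plus gappy pedometer readings recorded as zero cadence.
gappy_walking = clean_walking + [0.0] * 6

clean_model = bin_probabilities(clean_walking)
gappy_model = bin_probabilities(gappy_walking)

ZERO_BIN = 0  # bin that holds stepHz == 0
p_zero_clean = clean_model.get(ZERO_BIN, 0.0)
p_zero_gappy = gappy_model.get(ZERO_BIN, 0.0)

print(f"share of walking samples in the zero bin, clean: {p_zero_clean:.2f}")
print(f"share of walking samples in the zero bin, gappy: {p_zero_gappy:.2f}")
```

In this toy version, once the zero bin holds a large share of the walking samples, any sample with zero stepHz looks like a reasonable match for walking, which is the false positive pattern described for vaguely moving indoor samples.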
This one is on the fence. It'll need more time and data before it becomes clear whether it's helped or hindered. I haven't been able to observe any positive difference yet, and there has possibly been some slight negative trend. But the activity types that would benefit from this most are types that are going through a rapid geographical coverage expansion at the moment, so they're going through the usual expected downward slope in reported accuracy, before starting to trend upwards. I'll give the coreMotionActivityType change another week or two before deciding.
When will you be open sourcing the ML classifiers, please?
Hi @dolmens! The ML classifiers are already open sourced. Have a look in https://github.com/sobri909/LocoKit/tree/develop/LocoKit/Timelines/ActivityTypes
Fantastic. Thank you for your excellent work. I'll read that code later.
It's too early to start on this one yet, but implementation ideas have been popping up in my head all week, so I might as well get them written down.
Misc random thoughts
Ditch the coords pseudo counts for transport types that are coordinate bound (eg car, bus, train, and any future road/track/etc bound types).
Only include types in the transport classifier that have non-zero sample/coord counts for the D2 region, to avoid overstuffing the classifiers with possible types.
Look at ditching some model features, due to possibly being a negative influence. coreMotionActivityType is the most likely candidate for ditching, due to being wrong more often than right. Next would be courseVariance, due to being essentially meaningless for all types except stationary, since the introduction of the Kalmans.
Try letting zero step Hz back in for walking. There's certainly enough data for the models now, and phones are newer and smarter, so there's less risk of false zeroes, and more data to cope gracefully with those cases now too.
Look at deepening / broadening the composite classifiers tree. Could consider clustering all "steps" based types in a subtree (walking, running, cycling, horse riding, maybe skateboarding, others). Consider clustering the road bound types (car, bus, tram?). And there's the open question of whether a third depth would make sense in some cases, for when there are too many types to be sensibly clustered in a single classifier.
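A rough sketch of how two of the ideas above might fit together: a two level composite tree with "steps" and "road" clusters, where each subtree only considers types that have non-zero sample counts for the region. Everything here is hypothetical, including the cluster names, the function names, the way scores are combined (cluster score times the normalised type score), and all of the numbers. It is not the LocoKit implementation.

```python
# Hypothetical clusters, loosely following the groupings suggested above.
CLUSTERS = {
    "steps": ["walking", "running", "cycling"],
    "road": ["car", "bus", "tram"],
}

def eligible_types(cluster_types, region_sample_counts):
    """Keep only types that have at least one sample in the region."""
    return [t for t in cluster_types if region_sample_counts.get(t, 0) > 0]

def classify(cluster_scores, type_scores, region_sample_counts):
    """Combine a root (cluster) score with a per-subtree type score.

    Type scores are renormalised within each cluster after filtering,
    so types with no regional data cannot take a share of the result.
    """
    results = {}
    for cluster, types in CLUSTERS.items():
        candidates = eligible_types(types, region_sample_counts)
        subtotal = sum(type_scores.get(t, 0.0) for t in candidates)
        if subtotal == 0:
            continue  # nothing in this subtree is usable for this region
        for t in candidates:
            share = type_scores.get(t, 0.0) / subtotal
            results[t] = cluster_scores.get(cluster, 0.0) * share
    return results

# Made-up inputs: no cycling or tram samples exist for this region.
region_counts = {"walking": 500, "running": 40, "car": 900, "bus": 120}
cluster_scores = {"steps": 0.2, "road": 0.8}
type_scores = {"walking": 0.9, "running": 0.1, "car": 0.6, "bus": 0.3, "tram": 0.1}

results = classify(cluster_scores, type_scores, region_counts)
best_type = max(results, key=results.get)
print(best_type, results)
```

With these inputs tram never appears in the results, even though its raw score is non-zero, because the region has no tram samples. A third depth could be sketched the same way by letting a cluster contain further clusters instead of leaf types.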
I'll come back to this and add more thoughts over the next week or two. There's a bunch more details that have already crossed my mind, but I'm not remembering them right now...