More activity types, and open sourcing the ML classifiers #34

Closed
sobri909 opened this issue Jul 29, 2018 · 6 comments

@sobri909
Owner

sobri909 commented Jul 29, 2018

It's too early to start on this one yet, but implementation ideas have been popping up in my head all week, so I might as well get them written down.

Misc random thoughts

  1. Ditch the coords pseudo counts for transport types that are coordinate bound (eg car, bus, train, and any future road/track/etc bound types).

  2. Only include types in the transport classifier that have non-zero sample/coord counts for the D2 region, to avoid overstuffing the classifiers with possible types.

  3. Look at ditching some model features, due to possibly being a negative influence. coreMotionActivityType is the most likely candidate for ditching, due to being wrong more often than right. Next would be courseVariance, due to being essentially meaningless for all types except stationary, since the introduction of the Kalmans.

  4. Try letting zero step Hz back in for walking. There's certainly enough data for the models now, and phones are newer and smarter, so there's less risk of false zeroes, and more data to cope gracefully with those cases now too.

  5. Look at deepening / broadening the composite classifier tree. Could consider clustering all "steps" based types in a subtree (walking, running, cycling, horse riding, maybe skateboarding, others). Consider clustering the road bound types (car, bus, tram?). And there's the open question of whether a third depth would make sense in some cases, for when there are too many types to be sensibly clustered in a single classifier.

  • Need to flesh out the storage model for user custom types. And for custom types that have the potential to be upgraded to shared types. Things like naming specific train lines, naming specific roads, etc. These might initially come in as custom user types, but there needs to be a privacy-conscious, opt-in path to upgrading these to shared types, if it makes sense to do so.
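
As a rough illustration of the subtree idea in item 5, here is a minimal Python sketch of a classifier tree in which a cluster label (such as "steps" or "road") hands off to a child classifier. Everything in it is hypothetical: the `Node` class, the toy scorers, the fixed scores, and the choice to multiply a cluster's score by its child's scores are assumptions made for illustration, not LocoKit's actual implementation.

```python
from typing import Callable, Dict, Optional

Scores = Dict[str, float]
Scorer = Callable[[dict], Scores]


class Node:
    """One classifier in the tree (hypothetical structure).

    `children` maps a cluster label produced by this node's scorer
    to the child node that splits that cluster into concrete types.
    """

    def __init__(self, scorer: Scorer, children: Optional[Dict[str, "Node"]] = None):
        self.scorer = scorer
        self.children = children or {}

    def classify(self, sample: dict) -> Scores:
        results: Scores = {}
        for label, score in self.scorer(sample).items():
            child = self.children.get(label)
            if child is None:
                # Leaf type: keep this node's score as is.
                results[label] = score
            else:
                # Cluster label: weight the child's scores by the cluster score
                # (an assumed combination rule, chosen for simplicity).
                for sub_label, sub_score in child.classify(sample).items():
                    results[sub_label] = score * sub_score
        return results


# Toy scorers returning fixed scores, standing in for real models.
def root_scorer(sample: dict) -> Scores:
    return {"stationary": 0.1, "steps": 0.6, "road": 0.3}


def steps_scorer(sample: dict) -> Scores:
    return {"walking": 0.7, "running": 0.2, "cycling": 0.1}


def road_scorer(sample: dict) -> Scores:
    return {"car": 0.5, "bus": 0.5}


tree = Node(root_scorer, {"steps": Node(steps_scorer), "road": Node(road_scorer)})
scores = tree.classify({})
best_type = max(scores, key=scores.get)
```

Because `children` can itself contain nodes with children, a third depth needs no extra code in this sketch; whether it helps accuracy is the open question raised above.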

I'll come back to this and add more thoughts over the next week or two. There's a bunch more details that have already crossed my mind, but I'm not remembering them right now...

@sobri909 sobri909 self-assigned this Jul 29, 2018
@sobri909
Owner Author

Okay, I want to start on this now.

I'll initially be doing it in several smaller stages, so that the small classifier changes can be tested individually in the wild, for improvements / regressions in accuracy results.

Stage 1 will be items 1 and 2 from the notes list. So the stage 1 goal will be classifier feature / weighting tweaks, to hopefully eliminate the bogus transport type results that've been turning up in locations where they obviously shouldn't.

I'll also probably throw in items 3 and 4, because life is short and people want things now, not later. So it's more important to get results quickly rather than achieving an exact mathematical science.

I'll also probably throw in "tram" as a new transport type. Because literally hundreds of people have asked for it, and adding in one more base type will be an interesting first test for whether the other changes have been beneficial or not.

Oh, I guess I'm supposed to be open sourcing the classifier code too. Hmm. I don't have a solid plan for that yet. So I'll eye the code up while I'm going along, and if there's anything base level that I can cleanly open source along the way, I'll do it then. Basically mix in the open sourcing process along with the classifier improvements.

Alright ... time to get this happening.

@sobri909
Owner Author

sobri909 commented Aug 11, 2018

Stage 1 Todos

  • Stop using coreMotionActivityType [Live in Arc App v2.1.9]
  • Remove the special case rejection of zero stepHz values for walking (Maybe? As above, number hunt first)
  • Ditch pseudo counts for location bound types (car, train, bus, etc)
  • If best match score is too low, fall back to "transport", to avoid the zero pseudo counts causing new regions to classify everything as walking (or whatever the best bad match happens to be).
  • Add "tram", because I want to give people something more than just "trust me, it's better" in the first iteration
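
To make the interaction between the pseudo count and fallback todos concrete, here is a minimal Python sketch. It assumes each model keeps a count of samples per coordinate bin and scores a location with additive smoothing. The names `COORD_BOUND`, `coord_score`, `classify`, the bin layout, and the `min_score` threshold are all hypothetical, and the real classifier combines many more features than coordinates.

```python
from typing import Dict, Tuple

Bin = Tuple[int, int]

# Assumed set of location bound types that get no pseudo counts.
COORD_BOUND = {"car", "bus", "train", "tram"}


def coord_score(bin_counts: Dict[Bin, int], coord_bin: Bin,
                num_bins: int, pseudo_count: int) -> float:
    """Smoothed fraction of a type's samples that fell in `coord_bin`."""
    total = sum(bin_counts.values()) + pseudo_count * num_bins
    if total == 0:
        return 0.0
    return (bin_counts.get(coord_bin, 0) + pseudo_count) / total


def classify(coord_bin: Bin, models: Dict[str, Dict[Bin, int]], num_bins: int,
             min_score: float = 0.05, fallback: str = "transport") -> str:
    scores = {}
    for name, bin_counts in models.items():
        # Location bound types score zero wherever they have never been seen.
        pseudo = 0 if name in COORD_BOUND else 1
        scores[name] = coord_score(bin_counts, coord_bin, num_bins, pseudo)
    best = max(scores, key=scores.get)
    # Without this check, the best bad match would win in unseen locations.
    return best if scores[best] >= min_score else fallback


# Toy region with 100 coordinate bins (made-up counts).
models = {"walking": {(0, 0): 8}, "car": {(1, 1): 20}}

on_known_road = classify((1, 1), models, num_bins=100)    # "car"
unseen_location = classify((5, 5), models, num_bins=100)  # "transport"
```

In the unseen bin, "car" scores exactly zero while "walking" keeps a small smoothed score, so "walking" is the best match by default. The threshold is what turns that weak match into the "transport" fallback.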

Stage 2 Todos

  • Stop creating empty UD models (ie Arc's private models) for non-core types
  • Ditch the "transport" meta type and consolidate into single classifier
  • Only auto create GD models server side for base types
  • Stop generating / updating GD transport models server side
  • Make sure the zero pseudocount coord maps are definitely happening
  • Only use D2 models for extended types (ie coordinate bound types)

@sobri909
Owner Author

Remove the special case rejection of zero stepHz values for walking (Maybe? As above, number hunt first)

I gave this one a couple of days on my test devices, and it was a disaster.

The pedometer's data is far too gappy to make it feasible. A large portion of walking samples are recorded with zero cadence and zero step counts. Which results in the walking model getting bloated with zero stepHz values, and the classifier treating a large percentage of vaguely moving indoor samples as walking (eg playing with your phone while lying on the couch). If the building produces drifting location data, then the percentage of false positives goes up significantly. It's a mess.
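
The bloating effect can be shown with a small Python sketch. It assumes the walking model scores stepHz from a simple histogram of its training samples; the function name `step_hz_score`, the bin width, and the sample values are made up for illustration and are not taken from LocoKit.

```python
from typing import List


def step_hz_score(training_step_hz: List[float], sample_step_hz: float,
                  bin_width: float = 0.5, reject_zero: bool = True) -> float:
    """Fraction of training samples in the same stepHz bin as the sample.

    With `reject_zero`, zero values are dropped from training and a zero
    stepHz sample can never match (the special case rejection).
    """
    training = training_step_hz
    if reject_zero:
        if sample_step_hz == 0:
            return 0.0
        training = [hz for hz in training if hz > 0]
    if not training:
        return 0.0

    def bin_of(hz: float) -> int:
        return int(hz // bin_width)

    target = bin_of(sample_step_hz)
    matches = sum(1 for hz in training if bin_of(hz) == target)
    return matches / len(training)


# Made-up walking samples: half were recorded with gappy (zero) pedometer data.
walking_training = [0.0, 0.0, 1.8, 0.0, 1.9, 2.0, 0.0, 1.7]

# A zero stepHz sample, eg using the phone while lying on the couch.
couch_with_zeroes_allowed = step_hz_score(walking_training, 0.0, reject_zero=False)  # 0.5
couch_with_rejection = step_hz_score(walking_training, 0.0, reject_zero=True)        # 0.0
```

With zeroes allowed, half of the toy walking model sits in the zero bin, so a zero stepHz sample looks like a strong walking match on this feature.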

So that's a failed experiment, and definitely not going to ship!

Stop using coreMotionActivityType [Live in Arc App v2.1.9]

This one is on the fence. It'll need more time and data before it becomes clear whether it's helped or hindered.

I haven't been able to observe any positive difference yet, and there has possibly been some slight negative trend. But the activity types that would benefit from this most are types that are going through a rapid geographical coverage expansion at the moment, so they're going through the usual expected downward slope in reported accuracy, before starting to trend upwards.

I'll give the coreMotionActivityType change another week or two before deciding.

sobri909 added a commit that referenced this issue Aug 24, 2018
sobri909 added a commit that referenced this issue Sep 20, 2018
@sobri909 sobri909 changed the title from "More activity types, custom types, and open sourcing the ML classifiers" to "More activity types, and open sourcing the ML classifiers" Sep 21, 2018
sobri909 added a commit that referenced this issue Sep 22, 2018
@sobri909 sobri909 closed this as completed Nov 4, 2018
@dolmens

dolmens commented Dec 30, 2018

When will you open source the ML classifiers, please?

@sobri909
Owner Author

Hi @dolmens! The ML classifiers are already open sourced. Have a look in the develop branch under Timelines/ActivityTypes.

https://github.com/sobri909/LocoKit/tree/develop/LocoKit/Timelines/ActivityTypes

@dolmens

dolmens commented Dec 30, 2018

Fantastic. Thank you for your excellent work. I'll read that code later.
