How should ground truth data be classified? #25

Closed
Nate-Wessel opened this issue May 8, 2018 · 1 comment
Labels
discussion: the means of resolving this have yet to be decided

Comments

@Nate-Wessel
Contributor

Nate-Wessel commented May 8, 2018

The purpose of the ground truth data is to test the performance of the algorithm on a known dataset. It seems to me that there are two broad potential approaches to this:

  1. We can classify the data according to what actually happened on the ground, just translated into the required language of discrete trips and activities.
  2. Or we can classify according to what we see in the GPS points, informed by what actually happened on the ground.

The ground truth data we currently have (my own) is a sloppy mix of these.

To give an example, should we include activity locations that we actually visited but that don't look like activities in coordinates.csv, perhaps because of missing or inaccurate data?

The benefit of producing a properly true ground truth is that we can measure how far our algorithm (considered as encompassing the app, the phone, etc.) is from actual reality as interpreted by the one who lived it, or at least from a more traditional activity survey.

The benefit of ground truth as manual classification of input data is that it tells us how far we are from the best possible results we can get from the data we have available.

My Reality > Phone's Reality > Our interpretation of Phone's Reality
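
A minimal sketch of how the two approaches above could give different scores for the same algorithm output. Everything here is an assumption made for illustration: the per-minute "A" (activity) / "T" (trip) labels, the `agreement` helper, and the sample data are hypothetical and not taken from the project or from coordinates.csv.

```python
def agreement(predicted, reference):
    """Return the fraction of time steps where two label sequences match."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must be the same length")
    if not predicted:
        raise ValueError("label sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(predicted)


# Hypothetical per-minute labels: "A" = activity, "T" = trip.
# Approach 1: what actually happened on the ground.
lived = ["A", "A", "T", "T", "A", "T", "T", "A"]
# Approach 2: what is visible in the GPS points; the short stop at
# index 4 is assumed to be lost to missing or inaccurate data.
visible = ["A", "A", "T", "T", "T", "T", "T", "A"]
# Hypothetical algorithm output for the same period.
algorithm = ["A", "T", "T", "T", "T", "T", "T", "A"]

score_vs_lived = agreement(algorithm, lived)      # distance from lived reality
score_vs_visible = agreement(algorithm, visible)  # distance from best reading of the data
data_ceiling = agreement(visible, lived)          # what the data itself can support

print(score_vs_lived, score_vs_visible, data_ceiling)
```

With these made-up labels, the algorithm agrees with the GPS-visible classification more often than with the lived one, and the gap between the two reference sets is the part no algorithm could recover from the data alone.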

@Nate-Wessel added the "discussion" label (the means of resolving this have yet to be decided) May 8, 2018
@Nate-Wessel
Contributor Author

Discussed with Michael and Felipe. Consensus seemed to be that it's our job to interpret Itinerum's data, and Itinerum's job to represent reality.

But I think we have to remember then that we have no idea (quantitatively) how well it does that.
