Allow additional sources for my.location #154

Closed
seanbreckenridge opened this issue Apr 3, 2021 · 7 comments · Fixed by #237

@seanbreckenridge
Contributor

seanbreckenridge commented Apr 3, 2021

Don't expect any changes on this any time soon; just creating this issue to track the problem I'm having.

Currently I have to overlay time.tz.via_location, since I use a different data source (combined from gpslogger and IPs from Google, Facebook, Discord, etc.)

On this repo, it uses location.google to grab that info. Slightly unrelated, but I've also parsed the takeout using lxml instead, so my structure there is different

I would prefer if there was a common entrypoint (like my.location.all) that could take multiple sources as input, falling back to empty iterators if they aren't enabled or fail to be imported, as that would localize my overlaid changes to the my.location package
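A minimal sketch of that fallback idea, not the code that ended up in HPI. The helper names `import_source` and `all_locations`, and the assumption that each source module exposes a `locations()` function, are hypothetical and only for illustration:

```python
import importlib
from typing import Any, Callable, Iterable, Iterator


def _empty() -> Iterator[Any]:
    # Fallback used when a source is missing or not enabled
    yield from ()


def import_source(module_name: str, func_name: str = "locations") -> Callable[[], Iterator[Any]]:
    """Return `func_name` from `module_name`, or an empty-iterator fallback.

    Hypothetical helper: assumes every source module exposes a function
    with the same name.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return _empty
    return getattr(module, func_name, _empty)


def all_locations(module_names: Iterable[str]) -> Iterator[Any]:
    # Chain every source that could be imported; the rest contribute nothing
    for name in module_names:
        yield from import_source(name)()
```

An overlaid all.py could then call something like `all_locations(["my.location.google", "my.location.gpslogger", "my.location.ip"])`, and a missing module would simply yield no locations instead of breaking the import.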

You can see the current structure for my.location here

.
├── all.py
├── gpslogger.py
├── ip.py
└── models.py

I created this following the discussion we had regarding merging pushshift data

I've also slightly modified the Location NT, so it can track whether a location came from an accurate source (e.g. Google or gpslogger) or an estimate (geolocation based on IP)

from datetime import datetime
from typing import NamedTuple

class Location(NamedTuple):
    lng: float
    lat: float
    dt: datetime
    # approximate accuracy
    # True means exact, False means it's based on IP/auxiliary info
    accuracy: bool

Am a bit conflicted on how to handle this many data sources...

Would need some modifications, would probably create individual files for:

  • google
  • apple (locations from gdpr export)
  • ip-based (need individual empty fallbacks for blizzard, facebook, discord)
  • gpslogger

Some of those could stay on my branch if you're not interested in having them here. I think it's more important to have the following here:

  • a common.py file, including:
  • a Location NT which all other location providers would convert their NT/DTs to
  • a merge_locations function, with the typical set/emitted behavior
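As a rough sketch of what such a common.py could contain: the `Location` fields are the ones from the NT above, while the deduplication key (coordinates plus timestamp) is my assumption about what "the typical set/emitted behavior" would mean here, not something settled in this thread.

```python
from datetime import datetime
from typing import Iterable, Iterator, NamedTuple, Set, Tuple


class Location(NamedTuple):
    lng: float
    lat: float
    dt: datetime
    accuracy: bool


def merge_locations(*sources: Iterable[Location]) -> Iterator[Location]:
    """Yield locations from every source, skipping ones already emitted."""
    # Assumed dedup key: same coordinates at the same timestamp
    emitted: Set[Tuple[float, float, datetime]] = set()
    for source in sources:
        for loc in source:
            key = (loc.lng, loc.lat, loc.dt)
            if key in emitted:
                continue
            emitted.add(key)
            yield loc
```

Each provider would convert its own NT/DT into `Location` before being passed in, so all.py only has to call `merge_locations(google(), gpslogger(), ...)` with whichever providers are enabled.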

Then, to enable additional location providers, I could just overlay the all.py file, including my additional imports -- which probably wouldn't ever have to be changed

Something like what was described in the comment here

If there are no issues you foresee here, I'm willing to implement this at some point in the future.

Will probably not touch the location.google file here, except to create a standard interface across all the submodules which all.py would then import from.

Also, unsure if you settled on using all.py or main.py, I tend to prefer all.py for namespace packages which are merging multiple data sources

@seanbreckenridge
Contributor Author

seanbreckenridge commented Feb 9, 2022

Once I'm done with splitting the databases by type, I think it would make sense to move the google_takeout file here?

Since for locations here, this uses the location.google, could perhaps deprecate that and phase in using the full google takeout parser?

Could also merge the gpslogger module here, and then make location.all configurable, so I can remove the customizations I've done to my via_location and just use a custom all.py file in my.location.all

@seanbreckenridge
Contributor Author

seanbreckenridge commented Feb 9, 2022

Another thing I've been thinking about for fallbacks (other than just location.home) is tracking flights?

Haven't looked into how to do that or if it's possible, but it's something I wanted to look into (maybe hooking into some API which has flight data?)

My locations only go back to about 2015, but if I can use old passports to figure out flights or something, it could go back to something like 2000

@Joshfindit

Some other things to consider as well:

  • Flights can have both departure/arrival (with flight numbers, gates, terminals, etc), as well as GPS timestamps in-air (one of my favourite uses of early GPS loggers was someone plotting the entire flight as a 3D line).
  • Estimated location data can typically have a defined radius or zone that covers all possible locations in the estimate.

@karlicoss
Owner

I've also slightly modified the Location NT, so it can track whether a location came from an accurate source (e.g. Google or gpslogger) or an estimate (geolocation based on IP)

btw maybe makes sense to keep accuracy in meters? it's actually present in google takeout data, e.g.

    "accuracy" : 6,
    "source" : "GPS"
    ...
    "accuracy" : 20,
    "source" : "WIFI"
  ...
    "accuracy" : 1014,
    "source" : "CELL"

Then for IP data it could be either some ridiculously big number (city radius? :) ), or just None?
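To illustrate the suggestion, here is a small sketch that swaps the bool for an optional accuracy in metres. The `accuracy_sort_key` helper, and the choice to rank an unknown accuracy (`None`) as the worst, are assumptions made for this example only:

```python
from datetime import datetime
from typing import NamedTuple, Optional


class Location(NamedTuple):
    lng: float
    lat: float
    dt: datetime
    # accuracy radius in metres; None when the source gives no estimate (e.g. IP)
    accuracy: Optional[float]


def accuracy_sort_key(loc: Location) -> float:
    """Smaller is better; an unknown accuracy sorts last (assumed policy)."""
    return float("inf") if loc.accuracy is None else loc.accuracy
```

With this, picking the best of several candidate points for the same time window is `min(candidates, key=accuracy_sort_key)`, so a GPS fix with `accuracy=6` wins over a cell-tower fix with `accuracy=1014` or an IP-based guess with `None`.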

Also, unsure if you settled on using all.py or main.py, I tend to prefer all.py for namespace packages which are merging multiple data sources

yeah let's go for all.py

Once I'm done with seanbreckenridge/google_takeout_parser#5, I think it would make sense to move the google_takeout file here?

Yep, that would be great!
Are you thinking of my/google_takeout.py path? Or somewhere under my/google/takeout/?

@seanbreckenridge
Contributor Author

seanbreckenridge commented Feb 9, 2022

accuracy in meters

Yeah, that makes sense -- the bool was a quick fix on my end

Are you thinking of my/google_takeout.py path? Or somewhere under my/google/takeout/?

The reason I named it my/google_takeout.py to begin with was because otherwise there might be conflicts with the upstream files if someone was trying to use both for something

I think to allow for deprecation of the current google takeout files here, I could name it my/google/takeout/parser.py (to keep with the google_takeout_parser name). If needed, I can create an all.py file to handle merging sources in the future.

@karlicoss
Owner

karlicoss commented Feb 9, 2022

Just a list of things to keep in mind for the migration to the new takeout parser:

  • extract this as a global variable (e.g. ABBR_TIMEZONES) and extend with stuff from user_forced() in the HPI module
  • strip off https://www.google.com/url?q= in Promnesia (currently happening here)
  • Titles might start with Visited now (previously was stripped off) -- might make sense to strip off in Promnesia
  • Activity.titleUrl might be None, need to handle gracefully in Promnesia (e.g. for Used Chrome entries)
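A sketch of how the last three points might be handled on the consuming side. This is not Promnesia's actual code: the function names are hypothetical, and the exact `"Visited "` prefix (with a trailing space) is an assumption about how the titles look.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

REDIRECT_PREFIX = "https://www.google.com/url?q="
VISITED_PREFIX = "Visited "  # assumed form of the title prefix


def clean_url(title_url: Optional[str]) -> Optional[str]:
    """Unwrap Google redirect URLs; pass None through untouched."""
    if title_url is None:
        # e.g. "Used Chrome" entries, which have no URL at all
        return None
    if title_url.startswith(REDIRECT_PREFIX):
        targets = parse_qs(urlsplit(title_url).query).get("q")
        if targets:
            return targets[0]
    return title_url


def clean_title(title: str) -> str:
    """Drop the leading 'Visited ' if present."""
    if title.startswith(VISITED_PREFIX):
        return title[len(VISITED_PREFIX):]
    return title
```

Using `parse_qs` rather than slicing the prefix off also drops trailing redirect parameters and decodes percent-encoding, though it would truncate a target URL that contains an unencoded `&`.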

@seanbreckenridge
Contributor Author

Also updated the location to include accuracy
