Change in Google Takeout format? #78

Closed
lewurm opened this issue Jan 8, 2022 · 6 comments

lewurm commented Jan 8, 2022

Quoting from https://github.com/mholt/timeliner/wiki/Data-Source:-Google-Photos :

  • Google can change the Takeout archive format at any time, breaking this implementation. Please help maintain this feature if you use it!

Has that happened now? Exports larger than 50 GB are now split into multiple archives:
[Screenshots: gtakeout1, gtakeout2]

While the first archive of a split seems to be accepted fine by timeliner import, the remaining archives do not print anything (even with -v) and exit after a few seconds.

I also tried to unpack all the files and repackage them into a single large one, but timeliner import fails right away:

2022/01/08 20:57:38 [ERROR][google_photos/me@gmail.com] Importing: importing: walking metadata.json: walking ._IMG_1337.HEIC.json: decoding item metadata file Takeout/Google Photos/Album2021/._IMG_1337.HEIC.json: invalid character '\x00' looking for beginning of value

Maybe that's related to the way I repackage it? The file headers look like this:

$ file takeout-20220106T172751Z-001.tgz takeout-20220106T172751Z-all.tgz
takeout-20220106T172751Z-001.tgz: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT), original size modulo 2^32 2273460736
takeout-20220106T172751Z-all.tgz: gzip compressed data, last modified: Fri Jan  7 20:52:46 2022, from Unix, original size modulo 2^32 395496448

where takeout-20220106T172751Z-all.tgz is my repackaged archive (on macOS).

Anyway, repackaging would merely be a workaround; it would be great if timeliner import supported the split archives generated by Google.

Owner

mholt commented Jan 8, 2022

Good question. I haven't tried with split takeout files yet. Am mobile right now but want to get this working. Contributions / proposals welcome here 🙂

Author

lewurm commented Jan 8, 2022

From a quick look, it seems like all .json files are in the first split archive only. I might dig into the source code a bit tomorrow 🙂
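If that observation holds, an importer would have to index the .json sidecars across all parts before it can match them to media files in later parts. A rough sketch of that pairing step follows. Everything in it is hypothetical (the `pairing` type, `pairAcrossParts`, the `part-00N.tgz` names); the only thing taken from this thread is the `<media>.json` naming seen in the error message (IMG_1337.HEIC.json). It works on entry names only and does not read any archive.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// pairing records where a media file and its JSON sidecar were found.
// Hypothetical type for illustration; not part of timeliner.
type pairing struct {
	MediaArchive   string
	SidecarArchive string // empty if no sidecar was seen in any part
}

// pairAcrossParts takes a map of archive name -> entry paths and matches each
// media entry with its "<media>.json" sidecar, even when the two entries live
// in different parts of a split export.
func pairAcrossParts(parts map[string][]string) map[string]pairing {
	// First pass: remember which archive holds the sidecar for each media path.
	sidecars := make(map[string]string)
	for archive, entries := range parts {
		for _, e := range entries {
			if strings.HasSuffix(e, ".json") {
				sidecars[strings.TrimSuffix(e, ".json")] = archive
			}
		}
	}
	// Second pass: attach the sidecar location to every non-JSON entry.
	out := make(map[string]pairing)
	for archive, entries := range parts {
		for _, e := range entries {
			if strings.HasSuffix(e, ".json") {
				continue
			}
			out[e] = pairing{MediaArchive: archive, SidecarArchive: sidecars[e]}
		}
	}
	return out
}

func main() {
	// Made-up listing that mimics the layout described above:
	// all sidecars in the first part, some media in the second.
	parts := map[string][]string{
		"part-001.tgz": {
			"Takeout/Google Photos/Album2021/IMG_1337.HEIC.json",
			"Takeout/Google Photos/Album2021/IMG_1338.HEIC.json",
			"Takeout/Google Photos/Album2021/IMG_1337.HEIC",
		},
		"part-002.tgz": {
			"Takeout/Google Photos/Album2021/IMG_1338.HEIC",
		},
	}
	pairs := pairAcrossParts(parts)
	names := make([]string, 0, len(pairs))
	for name := range pairs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		p := pairs[name]
		fmt.Printf("%s: media in %s, sidecar in %q\n", name, p.MediaArchive, p.SidecarArchive)
	}
}
```

An empty sidecar archive in the result would flag media whose metadata never showed up in any part.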

Owner

mholt commented Jan 9, 2022

Ohh that's interesting... hmm, and somewhat problematic. Will think on this. Let me know if you think of something!

Author

lewurm commented Jan 12, 2022

So I tried my repackaging idea again, this time using GNU tar on macOS (brew install gnu-tar), and now timeliner at least doesn't trip:

$ cat takeout-20220106T172751Z-0*.tgz | gtar xzivf -
$ gtar -cvzf takeout-20220106T172751Z-all.tgz Takeout/

However, I still do not see GPS info in most pictures when doing timeliner import ... with the combined archive. Not sure what's going on, but it's definitely quite slow and does a lot of disk reading.

I was looking a bit at takeoutarchive.go with regard to supporting multiple archives, but I think it would be easier and more performant if it operated on the unpacked Takeout folder instead. It even looks like with archiver v4 that should be rather easy to do, while also keeping support for a single archive file?

Owner

mholt commented Jan 12, 2022

Nice find with the gnu-tar fix. I also wonder if filenames like ._* are macOS-only or something weird.
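On the ._* names: these are typically AppleDouble sidecar files, which the tar that ships with macOS writes to carry extended attributes. They are binary rather than JSON, which would fit the invalid character '\x00' error above, and GNU tar not producing them would fit the repackaged archive working. A minimal sketch of filtering them out before decoding, using hypothetical helpers (`isAppleDouble`, `metadataEntries`) that are not part of timeliner:

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// isAppleDouble reports whether an archive entry looks like a macOS
// AppleDouble sidecar, i.e. its base name starts with "._".
// Hypothetical helper for illustration.
func isAppleDouble(entryPath string) bool {
	return strings.HasPrefix(path.Base(entryPath), "._")
}

// metadataEntries keeps only the .json entries worth decoding,
// dropping AppleDouble sidecars and non-JSON files.
func metadataEntries(entries []string) []string {
	var keep []string
	for _, e := range entries {
		if path.Ext(e) != ".json" || isAppleDouble(e) {
			continue
		}
		keep = append(keep, e)
	}
	return keep
}

func main() {
	entries := []string{
		"Takeout/Google Photos/Album2021/IMG_1337.HEIC.json",
		"Takeout/Google Photos/Album2021/._IMG_1337.HEIC.json",
		"Takeout/Google Photos/Album2021/IMG_1337.HEIC",
	}
	for _, e := range metadataEntries(entries) {
		fmt.Println(e)
	}
}
```

The check is purely name-based; a stricter variant could also inspect the file's leading bytes, but that is left out here.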

> However, I still do not see GPS info in most pictures when doing timeliner import ... with the combined archive. Not sure what's going on, but it's definitely quite slow and does a lot of disk reading.

One thought... if they already existed in your timeline, it's possible that timeliner is skipping those ones entirely. Or maybe our EXIF reader just isn't finding the data in some files for some reason.

> It even looks like with archiver v4 that should be rather easy to do, while also keeping support for a single archive file?

Yep, exactly, and I've already got that working locally in Timeliner's successor, Timelinize:

That was also the primary motivation for writing archiver v4.

It's my nights-and-weekends project so I still have a lot to do before it's polished enough to share, but I'm making progress 💪

Owner

mholt commented Jan 19, 2024

I now have more info about Timelinize, as well as a Discord community if you want to help try it out and offer feedback. https://timelinize.com (also updated this project's README).

mholt closed this as completed Jan 19, 2024