Add documentation on how google handles duplicates #22

Closed
johncolby opened this issue Dec 21, 2020 · 3 comments

@johncolby

I was stumped for a while, thinking the script was not grabbing the dates for my emails correctly, but it was actually an issue with how Google handles multiple copies of the same email (apparently it keeps the metadata from the first one for all copies).

What happened was:

  1. Upload an mbox (which just happens to have the wrong datetimes in the default from time field).
  2. Notice that the date is wrong in Gmail.
  3. Trash the email in Gmail.
  4. Try the script again with the correct (for my use case) time-fields=date,received parameter.
  5. Notice on Google that the message has been re-uploaded, but still has the wrong date!
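For anyone trying to work out which time field is the wrong one in their own mbox, here is a minimal sketch using only the Python standard library. It assumes that the names `from`, `date` and `received` refer to the mbox "From " separator line, the `Date` header and the topmost `Received` header respectively; that mapping is my assumption, not something confirmed in this thread. `candidate_times` is a hypothetical helper, not part of the script.

```python
import mailbox
from email.utils import parsedate_to_datetime


def candidate_times(msg):
    """Collect the timestamps one mbox message carries, keyed by their source."""
    times = {}

    # get_from() returns the separator line without the leading "From ",
    # e.g. "sender@example.com Mon Dec 21 10:00:00 2020".
    parts = msg.get_from().split(" ", 1)
    if len(parts) == 2:
        # Kept as the raw string; the format is asctime-like, not RFC 2822.
        times["from"] = parts[1].strip()

    if msg["Date"]:
        times["date"] = parsedate_to_datetime(msg["Date"])

    received = msg.get_all("Received", [])
    if received:
        # In a Received header the timestamp follows the last ";".
        stamp = received[0].rsplit(";", 1)[-1].strip()
        times["received"] = parsedate_to_datetime(stamp)

    return times


# Usage (path is a placeholder):
# for msg in mailbox.mbox("export.mbox", create=False):
#     print(msg["Message-ID"], candidate_times(msg))
```

Printing these side by side for a few messages should make it obvious which field disagrees with the others.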

The issue was that the first (wrong) message was still in my Trash on Google, and not actually deleted. So when I tried the re-upload with the correct date, I think the Google servers saw the duplicate and simply un-trashed it (complete with the original incorrect date).
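One way to check for a lingering copy before re-uploading is to search the Trash folder over IMAP for the same Message-ID. The sketch below is only an illustration: `find_in_trash` is a hypothetical helper, and the folder name `[Gmail]/Trash` is an assumption (it differs for some account languages, e.g. `[Gmail]/Bin`).

```python
def find_in_trash(conn, message_id, trash='"[Gmail]/Trash"'):
    """Return the message numbers in the Trash folder whose Message-ID matches.

    conn is an already logged-in imaplib.IMAP4 (or IMAP4_SSL) connection.
    The folder name is an assumption and may need adjusting per account.
    """
    # Open the folder read-only so the check itself changes nothing.
    typ, _ = conn.select(trash, readonly=True)
    if typ != "OK":
        raise RuntimeError(f"could not open {trash}")

    # Quote the value so the angle brackets survive as one IMAP string.
    quoted = '"' + message_id.replace("\\", "\\\\").replace('"', '\\"') + '"'
    typ, data = conn.search(None, "HEADER", "Message-ID", quoted)
    if typ != "OK":
        raise RuntimeError("search failed")

    return data[0].split() if data and data[0] else []


# Usage (host, credentials and Message-ID are placeholders):
# import imaplib
# conn = imaplib.IMAP4_SSL("imap.gmail.com")
# conn.login("user@example.com", "app-password")
# print(find_in_trash(conn, "<some-id@example.com>"))
# conn.logout()
```

A non-empty result would suggest the old copy is still around and needs to be permanently deleted before the upload is retried.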

Anyway, maybe you could add a mention of this behavior to your docs? Maybe it'll help other people who encounter the same situation.

Thanks for the AWESOME work!

@rgladwell
Owner

Thanks for reporting this.

It is an interesting issue. I'm a bit cautious about adding this to the docs without understanding what is really going on.

What sort of text were you thinking?

@johncolby
Author

From googling around a bit, it seems Gmail only stores a single copy of each message, keyed by its Message-ID header, and discards any future duplicates on arrival.
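If that is how Gmail behaves, then messages sharing a Message-ID inside a single mbox would presumably collapse into one copy as well. A rough sketch for spotting them before upload, using the standard `mailbox` module (`duplicate_message_ids` is a hypothetical helper, not part of the script):

```python
import mailbox
from collections import Counter


def duplicate_message_ids(mbox_path):
    """Map each Message-ID that appears more than once in the mbox to its count."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # Messages without a Message-ID header are counted under "".
        counts = Counter((msg["Message-ID"] or "").strip() for msg in box)
    finally:
        box.close()

    counts.pop("", None)  # ignore messages that have no Message-ID at all
    return {mid: n for mid, n in counts.items() if n > 1}


# Usage (path is a placeholder):
# print(duplicate_message_ids("export.mbox"))
```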

Maybe just a sentence to that effect? No worries if it sounds like too much of an edge case to warrant documentation...even just having this issue documented will be helpful to future searches. Feel free to close. Thanks again for this super useful tool! 👍

rgladwell added a commit that referenced this issue Jan 7, 2021
@rgladwell
Owner

rgladwell commented Jan 7, 2021

Closed by #20. I'm a big fan of DRY and I think this is a nice solution to supplying a list of known issues, with all the relevant information and discussion in one place.
