This repository has been archived by the owner on Nov 11, 2021. It is now read-only.

Cache the diffs #30

Closed
klahnakoski opened this issue Mar 16, 2018 · 3 comments

Comments

@klahnakoski
Contributor

Caching the diffs may be useful. At the very least, knowing which files are touched by which revisions will help us store a sparse table of tuids:

CREATE TABLE diffs (
    file VARCHAR,
    changeset VARCHAR,
    diff VARCHAR
)

The diff column is not the regular diff, but rather some serialization of a structure that tells us which lines are new (+), deleted (-), or unchanged (|). Maybe a string whose length matches the source file:

||||||+++--||||||||||||||||||+|||||-

or a run-length-encoded version of the same

[("|", 6), ("+", 3), ("-", 2), ("|", 18), ("+", 1), ("|", 5), ("-", 1)]

or something smarter.
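
As a rough sketch of the two encodings above, the conversion between the marker string and its run-length-encoded form could look like this in Python. The function names `rle_encode` and `rle_decode` are hypothetical, chosen for illustration only; they are not part of the TUID service.

```python
from itertools import groupby


def rle_encode(markers):
    """Collapse a marker string into (symbol, run_length) pairs."""
    return [(symbol, len(list(run))) for symbol, run in groupby(markers)]


def rle_decode(pairs):
    """Expand (symbol, run_length) pairs back into a marker string."""
    return "".join(symbol * count for symbol, count in pairs)


# The run-length list from the comment above.
pairs = [("|", 6), ("+", 3), ("-", 2), ("|", 18), ("+", 1), ("|", 5), ("-", 1)]

markers = rle_decode(pairs)
# Round trip: encoding the expanded string gives the same pairs back.
assert rle_encode(markers) == pairs
```

The run-length form stays small when a large file has only a few touched lines, which is the common case for a single changeset.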

Even so, this diff may not be needed since it should have been applied to the subsequent changeset, and stored in the tuid table already.

@klahnakoski
Contributor Author

One reason to store the diffs: the logic that collects all the new diffs may be separate from the logic that serially applies them to the files between the known revision and the target revision.
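
To illustrate that separation, here is a minimal sketch of the "apply" half, assuming the diffs have already been collected and cached as marker strings. The marker semantics are an assumption based on the first comment: `|` keeps a line's tuid, `-` drops the line, and `+` gives a new line a fresh tuid. The names `apply_markers`, `apply_serially`, and `new_tuid` are hypothetical, not the service's actual API.

```python
from itertools import count


def apply_markers(tuids, markers, new_tuid):
    """Return the per-line tuids of a file after applying one cached diff.

    Assumed semantics: '|' keeps the old line's tuid, '-' drops the old
    line, '+' assigns a fresh tuid to an added line.
    """
    old = iter(tuids)
    result = []
    for marker in markers:
        if marker == "|":
            result.append(next(old))
        elif marker == "-":
            next(old)  # consume the deleted line, keep nothing
        elif marker == "+":
            result.append(new_tuid())
        else:
            raise ValueError(f"unknown marker: {marker!r}")
    return result


def apply_serially(tuids, cached_diffs, new_tuid):
    """Apply cached diffs in changeset order, from known to target revision."""
    for markers in cached_diffs:
        tuids = apply_markers(tuids, markers, new_tuid)
    return tuids


# Hypothetical usage: a three-line file at the known revision, two cached diffs.
fresh = count(100)
final = apply_serially([1, 2, 3], ["|+|-", "-||"], lambda: next(fresh))
assert final == [100, 2]
```

Because the apply step only reads cached marker strings, the collection step could run elsewhere and on its own schedule.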

@klahnakoski
Contributor Author

If this caching is done in hg_mozilla_org, then the ETL machines can ensure that it is up to date, and the tuid service can pull diffs faster.

@gmierz
Collaborator

gmierz commented May 8, 2018

Diffs are now cached by ES and obtained with HgMozillaOrg: https://github.com/mozilla/TUID/blob/dev/tuid/service.py#L179

@gmierz closed this as completed May 8, 2018