Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fsck: Mark commits with missing or deleted object partial #1533

Closed
wants to merge 3 commits into from

Conversation

alexlarsson
Copy link
Member

This means we can later use various operations to heal the repository because ostree does not assume all objects are there.

This the begining of a fix for #345

@cgwalters
Copy link
Member

For this I'd initially thought of moving the guts of fsck into the library, much like we did in 5c94098

Adding new public API for commit parents feels somewhat special case to this...or, hm, maybe someone would want to know "what commits refer to this object" for space reduction purposes.

old_parents = g_hash_table_lookup (inout_parents, key);
if (old_parents == NULL)
{
/* For the common case of a single pointer we skip using an array to save memory. */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One worry I have with this is that there's also going to be a class of objects which are basically in every single commit. It feels like we may want to distinguish between client-side versus server-side repositories.

In the server side case, if I have say 100 commits of history, and I'm taking the same base component set across my builds (say bash binaries) or whatever, then every one of those objects is going to point to all of the parents.

Throwing together some numbers, currently:

# ostree ls -R fedora-27:fedora/27/x86_64/workstation | wc -l
129280

Say we have server side 100 commits (not unreasonable?) and between them about 75% of the files are in common...that's about:

gjs> (((100000*0.75)*100)*8)/(1024*1024)
57.220458984375

57 megs of pointer data. Not a big deal I guess. I just wanted to work through it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, the flathub repo currently has 2822 refs, and 7530 commits. However, we don't share anywhere near 75% of files, because apps are different, and we don't keep long histories.

There are certain objects that are very common though, like the "default" dirmeta, which will be in essentially every commit... Not going to be a lot of these though.

g_autofree GVariant **new_parents = NULL;
gsize len, i;

len = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor but would you mind using declare-and-initialize for new code?

@alexlarsson
Copy link
Member Author

Yeah, for historical reasons the API with a separate hashtable is a bit ugly, but it seems like the backtracking could be generally useful even outside fsck.

@rh-atomic-bot
Copy link

☔ The latest upstream changes (presumably eb506c7) made this pull request unmergeable. Please resolve the merge conflicts.

This is a version of ostree_repo_traverse_commit_union that also
remembers where the objects came from, by recording the parent
relationships in a hashtable. This can be used to later find which
commits each object was from, which we want to use in fsck.
This means we can later use various operations to heal the repository
because ostree does not assume all objects are there.

This the begining of a fix for ostreedev#345
@cgwalters
Copy link
Member

Rebased 🏄‍♂️

@cgwalters
Copy link
Member

I also added a test to itest-pull.sh which is a more "realistic" data set, and verified that re-doing a pull gets us the data back.

One thing I hit on is that this only works for .file content objects; deleting a .dirtree/.dirmeta will throw an error during traversal. That's something to fix for later; since file objects are hardlinked (and a lot larger) they're the most likely ones to be corrupted.

@rh-atomic-bot r+ ac9a98f

@rh-atomic-bot
Copy link

⌛ Testing commit ac9a98f with merge 474556b...

rh-atomic-bot pushed a commit that referenced this pull request Apr 14, 2018
This means we can later use various operations to heal the repository
because ostree does not assume all objects are there.

This the begining of a fix for #345

Closes: #1533
Approved by: cgwalters
@rh-atomic-bot
Copy link

☀️ Test successful - status-atomicjenkins
Approved by: cgwalters
Pushing 474556b to master...

@@ -30,6 +30,20 @@ ostree --repo=bare-repo init --mode=bare-user
ostree --repo=bare-repo remote add origin --set=gpg-verify=false $(cat ${test_tmpdir}/httpd-address)
log_timestamps ostree --repo=bare-repo pull --disable-static-deltas origin ${host_nonremoteref}

echo "ok pull"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to update some number-of-tests count when you add a test?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests aren't yet being parsed by something that understands TAP...someday later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants