-
Notifications
You must be signed in to change notification settings - Fork 133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reftable support git-core #539
Conversation
Welcome to GitGitGadgetHi @hanwen, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests. Please make sure that your Pull Request has a good description, as it will be used as cover letter. Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:
It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code. Contributing the patchesBefore you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form Both the person who commented An alternative is the channel
Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment If you want to see what email(s) would be sent for a After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail). If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the curl -g --user "<EMailAddress>:<Password>" \
--url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):
To send a new iteration, just add another PR comment with the contents: Need help?New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join. You may also be able to find help in real time in the developer IRC channel, |
hi there @dscho ! this probably will fail CI spectacularly, but it does enough to get some first feedback. Could you do the /allow , /submit dance? |
/allow |
User hanwen is now allowed to use GitGitGadget. |
You'll have to do the
It does. I will leave a couple suggestions to help you with that. |
/submit |
Submitted as pull.539.git.1579808479.gitgitgadget@gmail.com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left a couple of suggestions how you can fix the build. Fixing this will go a long way of attracting reviewers; I, for one, am unlikely to even look at patches that break the build or the test suite.
reftable/file.c
Outdated
struct file_block_source *b = (struct file_block_source *)v; | ||
assert(off + size <= b->size); | ||
dest->data = malloc(size); | ||
if (pread(b->fd, dest->data, size, off) != size) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't compile on Windows, but there is xpread()
which you probably want to use, or pread_in_full()
.
Based on your usage of the pread()
function, I strongly suspect that you may want to use the pread_in_full()
function instead.
Or do you want to keep the code in reftable/
as independent from the rest of Git's source code as possible? If that is the case, you will have to reimplement the pread_in_full()
/xpread()
functionality.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is
#define pread git_pread
in your compat header. Does that mean I can just keep using pread?
in general, is there a way to compile this for windows without actually rigging up a Windows installation?
A little premature, I would think. It does not even compile yet... |
The idea is to keep this separate. Thanks for the pointers on portability; I'll have a look. at this point, I am more interested in resolving the questions in places marked with XXX in the last commit of the chain, though. |
On the Git mailing list, Junio C Hamano wrote (reply to this):
|
On the Git mailing list, Stephan Beyer wrote (reply to this):
|
On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):
|
On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):
|
On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):
|
/submit |
Submitted as pull.539.v2.git.1580134944.gitgitgadget@gmail.com |
@@ -226,8 +226,6 @@ static int create_default_files(const char *template_path, | |||
* We need to create a "refs" dir in any case so that older |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Han-Wen Nienhuys <hanwen@google.com>
> Subject: Re: [PATCH v2 2/5] create .git/refs in files-backend.c
>
> This prepares for supporting the reftable format, which creates a file
> in that place.
The idea is sound, I think. We want to let each backend to be
responsible for creating and maintaining what is at .git/refs on the
filesystem.
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92
Do we need to keep this, which is pretty much private name for the
patch that is not valid for most of the people on the list?
> ---
> builtin/init-db.c | 2 --
> refs/files-backend.c | 4 ++++
> 2 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 944ec77fe1..45bdea0589 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
> * We need to create a "refs" dir in any case so that older
> * versions of git can tell that this is a repository.
> */
> - safe_create_dir(git_path("refs"), 1);
> - adjust_shared_perm(git_path("refs"));
>
> if (refs_init_db(&err))
> die("failed to set up refs db: %s", err.buf);
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 0ea66a28b6..f49b6f2ab6 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -3158,6 +3158,10 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
> files_downcast(ref_store, REF_STORE_WRITE, "init_db");
> struct strbuf sb = STRBUF_INIT;
>
> + files_ref_path(refs, &sb, "refs");
> + safe_create_dir(sb.buf, 1);
> + // XXX adjust_shared_perm ?
I am not sure what's there to wonder about with the question mark.
If this step is meant to be a preparation before we actually allow
a different backend to be used, shouldn't the updated and prepared
code behave identically in externally visible ways?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):
On Mon, Jan 27, 2020 at 11:28 PM Junio C Hamano <gitster@pobox.com> wrote:
> > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
>
>
> > Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92
>
> Do we need to keep this, which is pretty much private name for the
> patch that is not valid for most of the people on the list?
No, I can make do with like gitgitgadget.
> > - safe_create_dir(git_path("refs"), 1);
> > - adjust_shared_perm(git_path("refs"));
> >
> > if (refs_init_db(&err))
> > die("failed to set up refs db: %s", err.buf);
> > diff --git a/refs/files-backend.c b/refs/files-backend.c
> > index 0ea66a28b6..f49b6f2ab6 100644
> > --- a/refs/files-backend.c
> > +++ b/refs/files-backend.c
> > @@ -3158,6 +3158,10 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
> > files_downcast(ref_store, REF_STORE_WRITE, "init_db");
> > struct strbuf sb = STRBUF_INIT;
> >
> > + files_ref_path(refs, &sb, "refs");
> > + safe_create_dir(sb.buf, 1);
> > + // XXX adjust_shared_perm ?
>
> I am not sure what's there to wonder about with the question mark.
I forgot why I put the XXX, but note that safe_create_dirs runs
adjust_shared_perms implicitly.
--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Han-Wen Nienhuys <hanwen@google.com> writes:
>> > + files_ref_path(refs, &sb, "refs");
>> > + safe_create_dir(sb.buf, 1);
>> > + // XXX adjust_shared_perm ?
>
> I forgot why I put the XXX, but note that safe_create_dirs runs
> adjust_shared_perms implicitly.
That piece of information would make an excellent part of the log
message, which is to describe why "questionable" changes are made.
"git init" is designed to be runnable in an existing repository, but
safe_create_dir() would not run adjust_shared_perm() when mkdir()
says EEXIST, and I think that is why the original makes a call to
adjust_shared_perm() after safe_create_dir() returns.
Thanks.
refs/refs-internal.h
Outdated
@@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs, | |||
* to the next entry, ref_iterator_advance() aborts the iteration, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
> From: Han-Wen Nienhuys <hanwen@google.com>
> Subject: Re: [PATCH v2 3/5] Document how ref iterators and symrefs interact
"Subject: refs: document how ...", perhaps?
Also isn't it more like iterators and symrefs do not interact in any
unexpected way, is it? iterators while enumerating refs when they
see a symref, they do not dereference and give the underlying ref
the symref is pointing at---the underlying ref will be listed when
it comes its turn to be shown as an ordinary ref. I am not sure
what is there to single out and document...
> Change-Id: Ie3ee63c52254c000ef712986246ca28f312b4301
> Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> ---
> refs/refs-internal.h | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index ff2436c0fb..fc18b12340 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs,
> * to the next entry, ref_iterator_advance() aborts the iteration,
> * frees the ref_iterator, and returns ITER_ERROR.
> *
> + * Ref iterators cannot return symref targets, so symbolic refs must be
> + * dereferenced during the iteration.
What this says is not techincally incorrect. Iterators do not give
each_ref_fn what underlying ref a symref is pointing at. But it
also is misleading. If your callback wants to know what object each
ref is pointing at do not need to do anything extra when it sees a
symref, as name of the object pointed at by the underlying ref is
given to it. Only callbacks that wants to know the other ref, not
the value of the other ref, needs to dereference when called with a
symref.
But I wonder if we need to even say this. Isn't it obvious from the
each_ref_fn API that nothing other than the refname, object id, and
what kind of ref it is, will be given to the user of the API, so it
would be natural for a caller that wants to do extra things it needs
to do for symrefs must act when it learns a ref is a symref? After
all, that is why the flags word is one of the parameters given to an
each_ref_fn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):
On Mon, Jan 27, 2020 at 11:53 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > From: Han-Wen Nienhuys <hanwen@google.com>
> > Subject: Re: [PATCH v2 3/5] Document how ref iterators and symrefs interact
>
> "Subject: refs: document how ...", perhaps?
>
> Also isn't it more like iterators and symrefs do not interact in any
> unexpected way, is it? iterators while enumerating refs when they
> see a symref, they do not dereference and give the underlying ref
> the symref is pointing at---the underlying ref will be listed when
> it comes its turn to be shown as an ordinary ref. I am not sure
> what is there to single out and document...
I mean, you can't return a zero OID and set REF_ISSYMLINK. Instead,
the ref backend has to follow the symlink recursively until it finds a
ref pointing to a OID.
> > Change-Id: Ie3ee63c52254c000ef712986246ca28f312b4301
> > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
> > ---
> > refs/refs-internal.h | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> > index ff2436c0fb..fc18b12340 100644
> > --- a/refs/refs-internal.h
> > +++ b/refs/refs-internal.h
> > @@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs,
> > * to the next entry, ref_iterator_advance() aborts the iteration,
> > * frees the ref_iterator, and returns ITER_ERROR.
> > *
> > + * Ref iterators cannot return symref targets, so symbolic refs must be
> > + * dereferenced during the iteration.
>
> What this says is not techincally incorrect. Iterators do not give
> each_ref_fn what underlying ref a symref is pointing at. But it
> also is misleading. If your callback wants to know what object each
> ref is pointing at do not need to do anything extra when it sees a
> symref, as name of the object pointed at by the underlying ref is
> given to it. Only callbacks that wants to know the other ref, not
> the value of the other ref, needs to dereference when called with a
> symref.
>
> But I wonder if we need to even say this. Isn't it obvious from the
> each_ref_fn API that nothing other than the refname, object id, and
> what kind of ref it is, will be given to the user of the API, so it
> would be natural for a caller that wants to do extra things it needs
> to do for symrefs must act when it learns a ref is a symref? After
> all, that is why the flags word is one of the parameters given to an
> each_ref_fn.
Maybe it is obvious to you, but it was not obvious to me, coming from
JGit working on the RefDatabase. One could imagine that the caller
needs to do something special to handle a symref, and returning a zero
object ID is OK, and this is a natural assumption if you implement a
ref backend, as dereferencing the symref to find out the final OID may
do more work than the caller needs.
--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Han-Wen Nienhuys <hanwen@google.com> writes:
>> But I wonder if we need to even say this. Isn't it obvious from the
>> each_ref_fn API that nothing other than the refname, object id, and
>> what kind of ref it is, will be given to the user of the API, so it
>> would be natural for a caller that wants to do extra things it needs
>> to do for symrefs must act when it learns a ref is a symref? After
>> all, that is why the flags word is one of the parameters given to an
>> each_ref_fn.
>
> Maybe it is obvious to you, but it was not obvious to me, coming from
> JGit working on the RefDatabase.
If you found each_ref_fn API docs lacking, that probably is where
the information should go (i.e. regardless of type of refs, the
object name is filled correctly). We need to make it obvious to the
developers who needs to use and work on the code with the API
defined and described in refs.h
Thanks.
@@ -813,6 +813,7 @@ TEST_SHELL_PATH = $(SHELL_PATH) | |||
LIB_FILE = libgit.a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):
On Mon, Jan 27, 2020 at 02:22:24PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
> [...]
I'll try to add my 2 cents to all of the XXX spots you flagged.
> @@ -1839,10 +1839,22 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map,
> static struct ref_store *ref_store_init(const char *gitdir,
> unsigned int flags)
> {
> - const char *be_name = "files";
> - struct ref_storage_be *be = find_ref_storage_backend(be_name);
> + struct strbuf refs_path = STRBUF_INIT;
> +
> + /* XXX this should probably come from a git config setting and not
> + default to reftable. */
> + const char *be_name = "reftable";
I think the scheme here needs to be something like:
- "struct repository" gets a new "ref_format" field
- setup.c:check_repo_format() learns about an extensions.refFormat
config key, which we use to set repo->ref_format
- init/clone should take a command-line option for the ref format of
the new repository. Anybody choosing reftables would want to set
core.repositoryformatversion to "1" and set the extensions.refFormat
key.
- likewise, there should be a config setting to choose the default ref
format. It would obviously stay at "files" for now, but we'd
eventually perhaps flip the default to "reftable".
Some thoughts on compatibility:
The idea of the config changes is that older versions of Git will either
complain about repositoryformatversion (if very old), or will complain
that they don't know about extensions.refFormat. But the changes you
made in patch 1 indicate that existing versions of Git won't consider it
to be a Git repository at all!
I think this is slightly non-ideal, in that we'd continue walking up the
tree looking for a Git repo. And either:
- we'd find one, which would be confusing and possibly wrong
- we wouldn't, in which case the error would be something like "no git
repository", and not "your git repository is too new"
So it would be really nice if we continued to have a HEAD file (just
with some sentinel value in it) and a refs/ directory, so that existing
versions of Git realize "there's a repository here, but it's too new for
me".
There's a slight downside in that tools which _aren't_ careful about
repositoryformatversion might try to operate on the repository, writing
into refs/ or whatever. But such tools are broken, and I'm not sure we
should be catering to them (besides which, the packed-refs ship sailed
long ago, so they're already potentially dangerous).
> +/* XXX which ordering are these? Newest or oldest first? */
> int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
> int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
The top one is chronological order (i.e., reading the file from start to
finish), and the latter is reverse-chronological (seeking backwards in
the file).
> +static struct ref_iterator *
> +reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix,
> + unsigned int flags)
> +{
> + struct reftable_ref_store *refs =
> + (struct reftable_ref_store *)ref_store;
> + struct reftable_iterator *ri = xcalloc(1, sizeof(*ri));
> + struct merged_table *mt = NULL;
> + int err = 0;
> + if (refs->err) {
> + /* how to propagate errors? */
> + return NULL;
> + }
> +
> + mt = stack_merged_table(refs->stack);
> +
> + /* XXX something with flags? */
I think the flags here are DO_FOR_EACH_*. You might need to consider
INCLUDE_BROKEN here (or it might be dealt with at a layer above).
There's also PER_WORKTREE_ONLY; I think you'd need to check ref_type()
there, like the files backend does.
> + err = merged_table_seek_ref(mt, &ri->iter, prefix);
> + /* XXX what to do with err? */
Hmm, I don't think you can return an error from iterator_begin(). It
would probably be OK to record the error in your local struct, and then
later return ITER_ERROR from iterator_advance().
I notice that the more-recent dir_iterator_begin() returns NULL in case
of an error, but looking at the callers of the ref iterators, they don't
seem to be ready to handle NULL.
> +static int write_transaction_table(struct writer *writer, void *arg)
> +{
> + struct ref_transaction *transaction = (struct ref_transaction *)arg;
> + struct reftable_ref_store *refs =
> + (struct reftable_ref_store *)transaction->ref_store;
> + uint64_t ts = stack_next_update_index(refs->stack);
> + int err = 0;
> + /* XXX - are we allowed to mutate the input data? */
> + qsort(transaction->updates, transaction->nr,
> + sizeof(struct ref_update *), ref_update_cmp);
I don't offhand know of anything that would break, but it's probably a
bad idea to do so. If you need a sorted view, can you make an auxiliary
array-of-pointers? Also, the QSORT() macro is a little shorter and has
some extra checks.
> + for (int i = 0; i < transaction->nr; i++) {
> + struct ref_update *u = transaction->updates[i];
> + if (u->flags & REF_HAVE_NEW) {
> + struct object_id out_oid = {};
> + int out_flags = 0;
> + /* XXX who owns the memory here? */
> + const char *resolved = refs_resolve_ref_unsafe(
> + transaction->ref_store, u->refname, 0, &out_oid,
> + &out_flags);
In the "unsafe" version, the memory belongs to a static buffer inside
the function. You shouldn't need to free it.
> +static int
> +reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
> [...]
> + /* XXX const? */
> + memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);
You'd want probably want to use hashcpy() here (or better yet, use
"struct object_id" in ri->log, too, and then use oidcpy()).
But that raises a question: how ready are reftables to handle non-sha1
object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
format actually has binary sha1s, right? In theory if those all become
the_hash_algo->rawsz, then it might "Just Work" to read and write
slightly larger entries.
> +reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store,
> + const char *refname,
> + each_reflog_ent_fn fn, void *cb_data)
> [...]
> + /* XXX committer = email? name? */
> + if (fn(&old_oid, &new_oid, log.name, log.time,
> + log.tz_offset, log.message, cb_data)) {
> + err = -1;
> + break;
> + }
The committer entry we pass back to each_reflog_ent_fn should be the
full "Human Name <email@host>".
> +static int reftable_reflog_expire(struct ref_store *ref_store,
> + const char *refname,
> + const struct object_id *oid,
> + unsigned int flags,
> + reflog_expiry_prepare_fn prepare_fn,
> + reflog_expiry_should_prune_fn should_prune_fn,
> + reflog_expiry_cleanup_fn cleanup_fn,
> + void *policy_cb_data)
> +{
> + /*
> + XXX
> +
> + This doesn't fit with the reftable API. If we are expiring for space
> + reasons, the expiry should be combined with a compaction, and there
> + should be a policy that can be called for all refnames, not for a
> + single ref name.
> +
> + If this is for cleaning up individual entries, we'll have to write
> + extra data to create tombstones.
> + */
> + return 0;
> +}
I agree that we'd generally want to expire and compact all entries at
once. Unfortunately I think this per-ref interface is exposed by
git-reflog. I.e., you can do "git reflog expire refs/heads/foo".
Could we add an extra API call for "expire everything", and then:
- have refs/files-backend.c implement that by just iterating over all
of the refs and pruning them one by one
- have builtin/reflog.c trigger the "expire everything" API when it
sees "--all"
- teach refs/reftables-backend.c a crappy version of the per-ref
expiration, where it just inserts tombstones but doesn't do any
compaction. It doesn't have to be fast or produce a good output;
it's just for compatibility.
Alternatively, I'd be OK with die("sorry, reftables doesn't support
per-ref reflog expiration") as a first step.
-Peff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Martin Fick wrote (reply to this):
On Tuesday, January 28, 2020 2:31:00 AM MST Jeff King wrote:
> Some thoughts on compatibility:
>
> The idea of the config changes is that older versions of Git will either
> complain about repositoryformatversion (if very old), or will complain
> that they don't know about extensions.refFormat. But the changes you
> made in patch 1 indicate that existing versions of Git won't consider it
> to be a Git repository at all!
>
> I think this is slightly non-ideal, in that we'd continue walking up the
> tree looking for a Git repo. And either:
>
> - we'd find one, which would be confusing and possibly wrong
>
> - we wouldn't, in which case the error would be something like "no git
> repository", and not "your git repository is too new"
>
> So it would be really nice if we continued to have a HEAD file (just
> with some sentinel value in it) and a refs/ directory, so that existing
> versions of Git realize "there's a repository here, but it's too new for
> me".
>
> There's a slight downside in that tools which _aren't_ careful about
> repositoryformatversion might try to operate on the repository, writing
> into refs/ or whatever. But such tools are broken, and I'm not sure we
> should be catering to them (besides which, the packed-refs ship sailed
> long ago, so they're already potentially dangerous).
Could you elaborate on this a bit because it seems on the surface that these
tools aren't very dangerous, and therefore likely many of them exist?
What are the dangers today of tools that understand/operate on loose and
packed refs without reading repositoryformatversion?
-Martin
--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):
On Tue, Jan 28, 2020 at 8:31 AM Jeff King <peff@peff.net> wrote:
>
> On Mon, Jan 27, 2020 at 02:22:24PM +0000, Han-Wen Nienhuys via GitGitGadget wrote:
>
> > [...]
>
> I'll try to add my 2 cents to all of the XXX spots you flagged.
>
thanks! that was really helpful.
>
> Some thoughts on compatibility:
>
> The idea of the config changes is that older versions of Git will either
> complain about repositoryformatversion (if very old), or will complain
> that they don't know about extensions.refFormat. But the changes you
> made in patch 1 indicate that existing versions of Git won't consider it
> to be a Git repository at all!
>
> I think this is slightly non-ideal, in that we'd continue walking up the
> tree looking for a Git repo. And either:
>
> - we'd find one, which would be confusing and possibly wrong
>
> - we wouldn't, in which case the error would be something like "no git
> repository", and not "your git repository is too new"
>
> So it would be really nice if we continued to have a HEAD file (just
> with some sentinel value in it) and a refs/ directory, so that existing
> versions of Git realize "there's a repository here, but it's too new for
> me".
You have good points.
JGit currently implements what we have here, as this is what's spelled
out in the spec that Shawn posted back in the day. It's probably
acceptable to this, though, as the reftable support has only landed in
JGit very recently and will probably appear very experimental to
folks.
How would the layout be then? We'll have
HEAD - dummy file
reftable/ - the tables
refs/ - dummy dir
where shall we store the reftable list? maybe in a file called
reftable-list
If we have both HEAD/refs + (refable/reftable-list), what should we
put there to ensure that no git version actually manages to use the
repository? (what happens if someone deletes the version setting from
the .git/config file)
> > +/* XXX which ordering are these? Newest or oldest first? */
> > int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data);
> > int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data);
>
> The top one is chronological order (i.e., reading the file from start to
> finish), and the latter is reverse-chronological (seeking backwards in
> the file).
thanks, I put in a commit to clarify this.
> > + err = merged_table_seek_ref(mt, &ri->iter, prefix);
> > + /* XXX what to do with err? */
>
> Hmm, I don't think you can return an error from iterator_begin(). It
> would probably be OK to record the error in your local struct, and then
> later return ITER_ERROR from iterator_advance().
good point. I did that.
> > + /* XXX - are we allowed to mutate the input data? */
> > + qsort(transaction->updates, transaction->nr,
> > + sizeof(struct ref_update *), ref_update_cmp);
>
> I don't offhand know of anything that would break, but it's probably a
> bad idea to do so. If you need a sorted view, can you make an auxiliary
> array-of-pointers? Also, the QSORT() macro is a little shorter and has
> some extra checks.
Done.
> In the "unsafe" version, the memory belongs to a static buffer inside
> the function. You shouldn't need to free it.
OK.
> > +static int
> > +reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator)
> > [...]
> > + /* XXX const? */
> > + memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ);
>
> You'd want probably want to use hashcpy() here (or better yet, use
> "struct object_id" in ri->log, too, and then use oidcpy()).
It's more practical in the library to use pointers rather than fixed
size arrays, so it'll be hashcpy.
> But that raises a question: how ready are reftables to handle non-sha1
> object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
> format actually has binary sha1s, right? In theory if those all become
> the_hash_algo->rawsz, then it might "Just Work" to read and write
> slightly larger entries.
The format fixes the reftable at 20 bytes, and there is not enough
framing information to just write more data. We'll have to encode the
hash size in the version number somehow, eg. we could use the higher
order bit of the version byte to encode it, for example.
But it needs a new version of the spec. I think it's premature to do
this while v1 of reftable isn't in git-core yet.
> The committer entry we pass back to each_reflog_ent_fn should be the
> full "Human Name <email@host>".
thx, done.
> I agree that we'd generally want to expire and compact all entries at
> once. Unfortunately I think this per-ref interface is exposed by
> git-reflog. I.e., you can do "git reflog expire refs/heads/foo".
>
> Could we add an extra API call for "expire everything", and then:
I took note of your remarks, and added a BUG() for now.
I'll focus on getting the CI on gitgitgadget to build this code first.
--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):
On Tue, Jan 28, 2020 at 08:36:53AM -0700, Martin Fick wrote:
> > There's a slight downside in that tools which _aren't_ careful about
> > repositoryformatversion might try to operate on the repository, writing
> > into refs/ or whatever. But such tools are broken, and I'm not sure we
> > should be catering to them (besides which, the packed-refs ship sailed
> > long ago, so they're already potentially dangerous).
>
> Could you elaborate on this a bit because it seems on the surface that these
> tools aren't very dangerous, and therefore likely many of them exist?
>
> What are the dangers today of tools that understand/operate on loose and
> packed refs without reading repositoryformatversion?
I was mostly thinking of hacky scripts that tried to touch .git/refs
directly. And there are a few levels of dangerous there:
- if you're doing "echo $sha1 >.git/refs/heads/master", then you're
not locking properly. But it would probably work most of the time.
- if you're properly taking a lock on ".git/refs/heads/master.lock"
and renaming into place but not looking at packed-refs, then you
might overwrite somebody else's update which is in the packed file
- if you're trying to read refs and not reading packed-refs, obviously
you might miss some values
If you're actually doing the correct locking and packed-refs read (which
"real" implementations like libgit2 do) then no, I don't think that's
dangerous. And I think libgit2 properly complains when it sees a
repositoryformatversion above 0. I don't know offhand about JGit, or any
of the lesser-used ones like dulwich or go-git.
-Peff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):
On Tue, Jan 28, 2020 at 04:56:26PM +0100, Han-Wen Nienhuys wrote:
> JGit currently implements what we have here, as this is what's spelled
> out in the spec that Shawn posted back in the day. It's probably
> acceptable to this, though, as the reftable support has only landed in
> JGit very recently and will probably appear very experimental to
> folks.
>
> How would the layout be then? We'll have
>
> HEAD - dummy file
> reftable/ - the tables
> refs/ - dummy dir
>
> where shall we store the reftable list? maybe in a file called
>
> reftable-list
>
> If we have both HEAD/refs + (refable/reftable-list), what should we
> put there to ensure that no git version actually manages to use the
> repository? (what happens if someone deletes the version setting from
> the .git/config file)
Yeah, it would be nice to have something that an older version of Git
would totally choke on, but I'm not sure we have a lot of leeway. What
we put in HEAD has to be syntactically legitimate enough to appease
validate_headref(), so our options are either "ref:
refs/something/bogus" or an object hash that we don't have (e.g.,
0{40}). The former would be preferable because it would (in theory)
prevent us from writing to HEAD, as well.
I wondered what would happen if you put in a syntactically invalid ref,
like "ref: refs/.not/.valid" (leading dots are not allowed in path
components of refnames). It does cause _some_ parts of Git to choke, but
sadly "git update-ref HEAD $sha1" actually writes to .git/refs/.not/.valid.
Even "refs/../../dangerous" doesn't give it pause. Yikes. It seems we're
pretty willing to accept symref destinations without further checking.
Making "refs" a file instead of a directory does work nicely, as any
attempts to read or write would get ENOTDIR. And we can fool
is_git_directory() as long as it's marked executable. That's OK on POSIX
systems, but I'm not sure how it would work on Windows (or maybe it
would work just fine, since we presumably just say "yep, everything is
executable").
So perhaps that's enough, and what we put in HEAD won't matter (since
nobody will be able to write into refs/ anyway).
> > But that raises a question: how ready are reftables to handle non-sha1
> > object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk
> > format actually has binary sha1s, right? In theory if those all become
> > the_hash_algo->rawsz, then it might "Just Work" to read and write
> > slightly larger entries.
>
> The format fixes the reftable at 20 bytes, and there is not enough
> framing information to just write more data. We'll have to encode the
> hash size in the version number somehow, eg. we could use the higher
> order bit of the version byte to encode it, for example.
>
> But it needs a new version of the spec. I think it's premature to do
> this while v1 of reftable isn't in git-core yet.
I don't know that we technically need the reftables file to say how long
the hashes are. The git config will tell us which hash we're using, and
everything else is supposed to follow. So I think it would work OK as
long as you're able to be told by the rest of Git that hashes are N
bytes, and just use that to compute the fixed-size records.
That said, it might make for easier debugging if the reftables file
declares the size it assumes.
-Peff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):
On Tue, Feb 04, 2020 at 08:06:57PM +0100, Han-Wen Nienhuys wrote:
> On Wed, Jan 29, 2020 at 11:47 AM Jeff King <peff@peff.net> wrote:
>
> > That said, it might make for easier debugging if the reftables file
> > declares the size it assumes.
>
> If there is a mismatch in size, the reftable will look completely
> corrupted data. I think this will be a bad experience.
Yes, but I think it would take a bunch of other failures to get there.
And it's true of all of the other parts of Git, too; the master switch
is a config setting that says "I'm a sha-256 repository".
That said, I'm not at all opposed to a header in the reftables file
saying "I'm a sha-256 reftable". We shouldn't ever see a mismatch there,
but that's what I meant by "easier debugging".
-Peff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):
On Tue, Feb 04, 2020 at 07:54:12PM +0100, Han-Wen Nienhuys wrote:
> > PS I don't know if it's set in stone yet, but if we're changing things
> > around, is it an option to put reftable-list inside the reftable
> > directory, just to keep the top-level uncluttered?
>
> I put up https://git.eclipse.org/r/c/157167/ to update the spec inside
> JGit. Please review.
That looks quite reasonable. The one change I'd make is to put
"refs/heads/.invalid" into HEAD (instead of "refs/.invalid"). In my
experiments, existing Git is happy to write into an invalid refname via
"git update-ref HEAD". But by putting it past the non-directory "heads",
such a write would always fail.
It's also supposed to be a requirement that HEAD only puts to
refs/heads/, though our validate_headref() does not enforce that. I
don't know if any other implementations might, though.
I did compare earlier your "make refs/heads a file" versus "make refs/ a
file and just give it an executable bit". And I think the "refs/heads"
solution does behave slightly better in a few cases. But if I understand
correctly, just giving "refs/" an executable bit would mean we retain
compatibility with the existing JGit implementation.
It does end up playing games with permissions, which I warned against,
but I think it might be tolerable:
1. Unlike read vs write, there's not generally a reason users would
need to separate read and execute.
2. Portability-wise, POSIX permissions give us what we need, and
non-POSIX systems tend to just say everything is executable.
So I could also live with that direction, if switching the JGit behavior
at this point is too much of a pain.
-Peff
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):
On Tue, Jan 28, 2020 at 8:31 AM Jeff King <peff@peff.net> wrote:
>
> > {
> > - const char *be_name = "files";
> > - struct ref_storage_be *be = find_ref_storage_backend(be_name);
> > + struct strbuf refs_path = STRBUF_INIT;
> > +
> > + /* XXX this should probably come from a git config setting and not
> > + default to reftable. */
> > + const char *be_name = "reftable";
>
> I think the scheme here needs to be something like:
>
> - "struct repository" gets a new "ref_format" field
>
> - setup.c:check_repo_format() learns about an extensions.refFormat
> config key, which we use to set repo->ref_format
>
> - init/clone should take a command-line option for the ref format of
> the new repository. Anybody choosing reftables would want to set
> core.repositoryformatversion to "1" and set the extensions.refFormat
> key.
I did this, but are you sure this works? Where would the
repo->ref_storage_format get set? I tried adding it to repo_init(),
but this doesn't get called in a normal startup sequence.
Breakpoint 3, ref_store_init (gitdir=0x555555884b70 ".git",
be_name=0x5555557942ca "reftable", flags=15) at refs.c:1841
warning: Source file is more recent than executable.
1841 {
(gdb) up
#1 0x00005555556de2c8 in get_main_ref_store (r=0x555555871d40
<the_repo>) at refs.c:1862
(gdb) p r->ref_storage_format
$1 = 0x0
}
I'm sending the revised patch series now.
--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):
On Tue, Feb 4, 2020 at 9:06 PM Jeff King <peff@peff.net> wrote:
>
> On Tue, Feb 04, 2020 at 07:54:12PM +0100, Han-Wen Nienhuys wrote:
>
> > > PS I don't know if it's set in stone yet, but if we're changing things
> > > around, is it an option to put reftable-list inside the reftable
> > > directory, just to keep the top-level uncluttered?
> >
> > I put up https://git.eclipse.org/r/c/157167/ to update the spec inside
> > JGit. Please review.
>
> That looks quite reasonable. The one change I'd make is to put
> "refs/heads/.invalid" into HEAD (instead of "refs/.invalid"). In my
> experiments, existing Git is happy to write into an invalid refname via
> "git update-ref HEAD". But by putting it past the non-directory "heads",
> such a write would always fail.
thanks, incorporated.
> It's also supposed to be a requirement that HEAD only puts to
> refs/heads/, though our validate_headref() does not enforce that. I
> don't know if any other implementations might, though.
>
> I did compare earlier your "make refs/heads a file" versus "make refs/ a
> file and just give it an executable bit". And I think the "refs/heads"
> solution does behave slightly better in a few cases. But if I understand
> correctly, just giving "refs/" an executable bit would mean we retain
> compatibility with the existing JGit implementation.
I hadn't looked at the JGit discovery in detail, but it looks like the
currently proposed approach works with JGit. (I haven't adapted JGit
for the new path setup yet.)
--
Han-Wen Nienhuys - Google Munich
I work 80%. Don't expect answers from me on Fridays.
--
Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jeff King wrote (reply to this):
On Tue, Feb 04, 2020 at 09:22:36PM +0100, Han-Wen Nienhuys wrote:
> > - init/clone should take a command-line option for the ref format of
> > the new repository. Anybody choosing reftables would want to set
> > core.repositoryformatversion to "1" and set the extensions.refFormat
> > key.
>
> I did this, but are you sure this works? Where would the
> repo->ref_storage_format get set? I tried adding it to repo_init(),
> but this doesn't get called in a normal startup sequence.
>
> Breakpoint 3, ref_store_init (gitdir=0x555555884b70 ".git",
> be_name=0x5555557942ca "reftable", flags=15) at refs.c:1841
> warning: Source file is more recent than executable.
> 1841 {
> (gdb) up
> #1 0x00005555556de2c8 in get_main_ref_store (r=0x555555871d40
> <the_repo>) at refs.c:1862
> (gdb) p r->ref_storage_format
> $1 = 0x0
> }
Hmm. Looks like repo_init() is only used for submodules. Which is pretty
horrid, because that setup sequence is basically duplicated between
there and setup_git_directory(), which is the traditional mechanism.
That's something that ought to be cleaned up, but it's definitely
outside the scope of your series.
The patch below was enough to make something like this work:
git init --reftable repo
cd repo
git commit --allow-empty -m foo
git for-each-ref
I had to make some tweaks to the init code, too:
- setting the_repository->ref_storage_format based on the --reftable
flag
- we can still recognize a reinit by the presence of a HEAD file
(whether it's a real one or a dummy). And we should call
create_symref() in either case, either to create HEAD or to create
the HEAD symlink inside reftables
I also fixed a bug in your first patch, where the creation of refs/
moves into files_init_db(). The strbuf needs reset there, or we just
append to it, and end up with a doubled path (you could see it by
running the same commands above without --reftable, but of course not
with your patch series as-is because it will _always_ choose reftable).
diff --git a/builtin/init-db.c b/builtin/init-db.c
index dcea74610c..ea2a333c4a 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -213,6 +213,8 @@ static int create_default_files(const char *template_path,
is_bare_repository_cfg = init_is_bare_repository;
if (init_shared_repository != -1)
set_shared_repository(init_shared_repository);
+ if (flags & INIT_DB_REFTABLE)
+ the_repository->ref_storage_format = xstrdup("reftable");
/*
* We would have created the above under user's umask -- under
@@ -222,6 +224,15 @@ static int create_default_files(const char *template_path,
adjust_shared_perm(get_git_dir());
}
+ /*
+ * Check to see if .git/HEAD exists; this must happen before
+ * initializing the ref db, because we want to see if there is an
+ * existing HEAD.
+ */
+ path = git_path_buf(&buf, "HEAD");
+ reinit = (!access(path, R_OK) ||
+ readlink(path, junk, sizeof(junk) - 1) != -1);
+
/*
* We need to create a "refs" dir in any case so that older
* versions of git can tell that this is a repository.
@@ -234,18 +245,14 @@ static int create_default_files(const char *template_path,
* Create the default symlink from ".git/HEAD" to the "master"
* branch, if it does not exist yet.
*/
- if (flags & INIT_DB_REFTABLE) {
- reinit = 0; /* XXX - how do we recognize a reinit,
- * and what should we do? */
- } else {
- path = git_path_buf(&buf, "HEAD");
- reinit = (!access(path, R_OK) ||
- readlink(path, junk, sizeof(junk) - 1) != -1);
- }
-
if (!reinit) {
if (create_symref("HEAD", "refs/heads/master", NULL) < 0)
exit(1);
+ } else {
+ /*
+ * XXX should check whether our ref backend matches the
+ * original one; if not, either die() or convert
+ */
}
/* This forces creation of new config file */
diff --git a/refs.c b/refs.c
index 493d3fb673..d6ddbbcc67 100644
--- a/refs.c
+++ b/refs.c
@@ -1859,10 +1859,9 @@ struct ref_store *get_main_ref_store(struct repository *r)
BUG("attempting to get main_ref_store outside of repository");
r->refs = ref_store_init(r->gitdir,
- /* XXX r->ref_storage_format == NULL. Where
- * should the config file be parsed out? */
- r->ref_storage_format ? r->ref_storage_format :
- "reftable",
+ r->ref_storage_format ?
+ r->ref_storage_format :
+ "files",
REF_STORE_ALL_CAPS);
return r->refs;
}
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0c53b246e8..6f9efde9ca 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -3166,6 +3166,7 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err)
/*
* Create .git/refs/{heads,tags}
*/
+ strbuf_reset(&sb);
files_ref_path(refs, &sb, "refs/heads");
safe_create_dir(sb.buf, 1);
diff --git a/repository.c b/repository.c
index aff47302c9..ff0988dac8 100644
--- a/repository.c
+++ b/repository.c
@@ -174,9 +174,7 @@ int repo_init(struct repository *repo,
if (worktree)
repo_set_worktree(repo, worktree);
- repo->ref_storage_format = format.ref_storage != NULL ?
- xstrdup(format.ref_storage) :
- "files"; /* XXX */
+ repo->ref_storage_format = xstrdup_or_null(format.ref_storage);
clear_repository_format(&format);
return 0;
diff --git a/setup.c b/setup.c
index a339186d83..58c5cd3bf0 100644
--- a/setup.c
+++ b/setup.c
@@ -450,7 +450,7 @@ static int check_repo_format(const char *var, const char *value, void *vdata)
data->partial_clone = xstrdup(value);
} else if (!strcmp(ext, "worktreeconfig")) {
data->worktree_config = git_config_bool(var, value);
- } else if (!strcmp(ext, "refStorage")) {
+ } else if (!strcmp(ext, "refstorage")) {
data->ref_storage = xstrdup(value);
} else
string_list_append(&data->unknown_extensions, ext);
@@ -1184,8 +1184,11 @@ const char *setup_git_directory_gently(int *nongit_ok)
gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
setup_git_env(gitdir);
}
- if (startup_info->have_repository)
+ if (startup_info->have_repository) {
repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
+ the_repository->ref_storage_format =
+ xstrdup_or_null(repo_fmt.ref_storage);
+ }
}
strbuf_release(&dir);
there is
#define pread git_pread
in your compat header. Does that mean I can just keep using pread?
I guess so ;-)
in general, is there a way to compile this for windows without actually rigging up a Windows installation?
The best I can offer is to push updates to this PR and let the PR build go its way. You will have to look at the results, of course ;-)
|
7ab3c4d
to
fd06ade
Compare
The reftable/ directory is structured as a library, so it cannot crash on misuse. Instead, it returns an error codes. In addition, the error code can be used to signal conditions from lower levels of the library to be handled by higher levels of the library. For example, a transaction might legitimately write an empty reftable file, but in that case, we'd want to shortcut the transaction overhead. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
This commit provides basic utility classes for the reftable library. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The reftable format is usually used with files for storage. However, we abstract away this using the blocksource data structure. This has two advantages: * log blocks are zlib compressed, and handling them is simplified if we can discard byte segments from within the block layer. * for unittests, it is useful to read and write in-memory. The blocksource allows us to abstract the data away from on-disk files. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
The reftable format is structured as a sequence of blocks, and each block contains a sequence of prefix-compressed key-value records. There are 4 types of records, and they have similarities in how they must be handled. This is achieved by introducing a polymorphic 'record' type that encapsulates ref, log, index and object records. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
The reftable format is structured as a sequence of block. Within a block, records are prefix compressed, with an index of offsets for fully expand keys to enable binary search within blocks. This commit provides the logic to read and write these blocks. Includes a code snippet copied from zlib Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
The reftable format includes support for an (OID => ref) map. This map can speed up visibility and reachability checks. In particular, various operations along the fetch/push path within Gerrit have ben sped up by using this structure. The map is constructed with help of a binary tree. Object IDs are hashes, so they are uniformly distributed. Hence, the tree does not attempt forced rebalancing. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
This supports reading a single reftable file. The commit introduces an abstract iterator type, which captures the usecases both of reading individual refs, and iterating over a segment of the ref namespace. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
With support for reading and writing files in place, we can construct files (in memory) and attempt to read them back. Because some sections of the format are optional (eg. indices, log entries), we have to exercise this code using multiple sizes of input data Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
For background, see the previous commit introducing the library. This introduces the file refs/reftable-backend.c containing a reftable-powered ref storage backend. It can be activated by passing --ref-storage=reftable to "git init", or setting GIT_TEST_REFTABLE in the environment. Example use: see t/t0031-reftable.sh Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Junio Hamano <gitster@pobox.com> Co-authored-by: Jeff King <peff@peff.net>
In our git-prompt script we strive to use Bash builtins wherever possible, because fork()-ing subshells for command substitutions and fork()+exec()-ing Git commands are expensive on some platforms. We even read and parse '.git/HEAD' using Bash builtins to get the name of the current branch [1]. However, the upcoming reftable refs backend won't use '.git/HEAD' at all, but will write an invalid refname as placeholder for backwards compatibility instead, which will break our git-prompt script. Update the git-prompt script to recognize the placeholder '.git/HEAD' written by the reftable backend (its content is specified in the reftable specs), and then fall back to use 'git symbolic-ref' to get the name of the current branch. [1] 3a43c4b (bash prompt: use bash builtins to find out current branch, 2011-03-31) Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
This command dumps individual tables or a stack of of tables. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
There was a status update about the branch The "reftable" backend for the refs API. |
There was a status update about the branch The "reftable" backend for the refs API. |
There was a status update about the branch The "reftable" backend for the refs API. |
There was a status update about the branch The "reftable" backend for the refs API. |
There was a status update about the branch The "reftable" backend for the refs API. |
There was a status update about the branch The "reftable" backend for the refs API. |
There was a status update about the branch The "reftable" backend for the refs API. What's the status of this topic? |
There was a status update in the "Cooking" section about the branch The "reftable" backend for the refs API. Waiting for reviews. |
There was a status update in the "Cooking" section about the branch The "reftable" backend for the refs API. Waiting for reviews. |
There was a status update in the "Cooking" section about the branch The "reftable" backend for the refs API. Reroll exists; needs picking up. cf. <pull.847.v7.git.git.1618832276.gitgitgadget@gmail.com> |
This patch series was integrated into seen via git@597c232. |
This patch series was integrated into seen via git@eb39b33. |
this is now being worked on further in git#847 |
This adds the reftable library, and hooks it up as a ref backend.
This is still based on next, as there are conflicts basing on master.
Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1
Summary
455 tests fail
v24