Reftable support git-core #539

hanwen · 2020-01-23T19:19:24Z

This adds the reftable library, and hooks it up as a ref backend.

This is still based on next, as there are conflicts basing on master.

Includes testing support, to test: make -C t/ GIT_TEST_REFTABLE=1

Summary
455 tests fail

v24

rebase
FETCH_HEAD special casing

gitgitgadget · 2020-01-23T19:20:13Z

Welcome to GitGitGadget

Hi @hanwen, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that your Pull Request has a good description, as it will be used as cover letter.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

the lines should not exceed 76 columns,
the first line should be like a header and typically start with a prefix like "tests:" or "commit:", and
the commit messages' body should be describing the "why?" of the change.
Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Search: is:pr is:open "/allow"

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the FreeNode IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Freenode. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

hanwen · 2020-01-23T19:25:29Z

hi there @dscho !

this probably will fail CI spectacularly, but it does enough to get some first feedback. Could you do the /allow , /submit dance?

dscho · 2020-01-23T19:34:24Z

/allow

gitgitgadget · 2020-01-23T19:35:15Z

User hanwen is now allowed to use GitGitGadget.

dscho · 2020-01-23T19:36:14Z

Could you do the /allow , /submit dance?

You'll have to do the /submit yourself, GitGitGadget only allows the PR owner to submit (as only the owner can make subsequent changes as required).

this probably will fail CI spectacularly

It does. I will leave a couple suggestions to help you with that.

hanwen · 2020-01-23T19:40:35Z

/submit

gitgitgadget · 2020-01-23T19:41:33Z

Submitted as pull.539.git.1579808479.gitgitgadget@gmail.com

dscho

I left a couple of suggestions how you can fix the build. Fixing this will go a long way of attracting reviewers; I, for one, am unlikely to even look at patches that break the build or the test suite.

reftable/file.c

dscho · 2020-01-23T19:44:23Z

reftable/file.c

+  struct file_block_source *b = (struct file_block_source *)v;
+  assert(off + size <= b->size);
+  dest->data = malloc(size);
+  if (pread(b->fd, dest->data, size, off) != size) {


This won't compile on Windows, but there is xpread() which you probably want to use, or pread_in_full().

Based on your usage of the pread() function, I strongly suspect that you may want to use the pread_in_full() function instead.

Or do you want to keep the code in reftable/ as independent from the rest of Git's source code as possible? If that is the case, you will have to reimplement the pread_in_full()/xpread() functionality.

there is

#define pread git_pread

in your compat header. Does that mean I can just keep using pread?

in general, is there a way to compile this for windows without actually rigging up a Windows installation?

reftable/record.c

reftable/stack.c

reftable/block.c

dscho · 2020-01-23T20:03:16Z

/submit

A little premature, I would think. It does not even compile yet...

hanwen · 2020-01-23T20:33:18Z

The idea is to keep this separate. Thanks for the pointers on portability; I'll have a look.

at this point, I am more interested in resolving the questions in places marked with XXX in the last commit of the chain, though.

gitgitgadget · 2020-01-23T21:45:37Z

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This adds the reftable library, and hooks it up as a ref backend.

Just a quick impression before getting into details of individual
steps.

 * With this series, the reftable backend seems to take over as the
   default and only backend.  We would need to design and decide how
   repositories would specify which backend it uses (I personally do
   not think we need to allow more than one backend to be active at
   the same time) before we take this series out of RFC status.

 * What's reftable/VERSION file?  Does it really have what you
   intended to add?

 * Mixed indentation and many whitespace breakages make the code
   distracting to review.

 * Comparison with 0 is written as "if (!strcmp(a, b))" in this
   codebase, and never "if (0 != strcmp(a, b))".

 * We unfortunately do not use var defn "for (int i = 0; ..." yet.

 * We do not use // comments.

Thanks.


> Han-Wen Nienhuys (5):
>   setup.c: enable repo detection for reftable
>   create .git/refs in files-backend.c
>   Document how ref iterators and symrefs interact
>   Add reftable library
>   Reftable support for git-core
>
>  Makefile                  |   23 +-
>  builtin/init-db.c         |    2 -
>  refs.c                    |   18 +-
>  refs.h                    |    2 +
>  refs/files-backend.c      |    4 +
>  refs/refs-internal.h      |    4 +
>  refs/reftable-backend.c   |  780 ++++++++++++++++++++++++++++
>  reftable/README.md        |   12 +
>  reftable/VERSION          |  293 +++++++++++
>  reftable/basics.c         |  180 +++++++
>  reftable/basics.h         |   38 ++
>  reftable/block.c          |  382 ++++++++++++++
>  reftable/block.h          |   71 +++
>  reftable/block_test.c     |  145 ++++++
>  reftable/blocksource.h    |   20 +
>  reftable/bytes.c          |    0
>  reftable/constants.h      |   27 +
>  reftable/dump.c           |  102 ++++
>  reftable/file.c           |   98 ++++
>  reftable/iter.c           |  211 ++++++++
>  reftable/iter.h           |   56 ++
>  reftable/merged.c         |  262 ++++++++++
>  reftable/merged.h         |   34 ++
>  reftable/merged_test.c    |  247 +++++++++
>  reftable/pq.c             |  116 +++++
>  reftable/pq.h             |   34 ++
>  reftable/reader.c         |  672 ++++++++++++++++++++++++
>  reftable/reader.h         |   52 ++
>  reftable/record.c         | 1030 +++++++++++++++++++++++++++++++++++++
>  reftable/record.h         |   79 +++
>  reftable/record_test.c    |  313 +++++++++++
>  reftable/reftable.h       |  396 ++++++++++++++
>  reftable/reftable_test.c  |  434 ++++++++++++++++
>  reftable/slice.c          |  179 +++++++
>  reftable/slice.h          |   39 ++
>  reftable/slice_test.c     |   38 ++
>  reftable/stack.c          |  931 +++++++++++++++++++++++++++++++++
>  reftable/stack.h          |   40 ++
>  reftable/stack_test.c     |  265 ++++++++++
>  reftable/test_framework.c |   60 +++
>  reftable/test_framework.h |   63 +++
>  reftable/tree.c           |   60 +++
>  reftable/tree.h           |   24 +
>  reftable/tree_test.c      |   54 ++
>  reftable/writer.c         |  586 +++++++++++++++++++++
>  reftable/writer.h         |   46 ++
>  setup.c                   |   20 +-
>  47 files changed, 8530 insertions(+), 12 deletions(-)
>  create mode 100644 refs/reftable-backend.c
>  create mode 100644 reftable/README.md
>  create mode 100644 reftable/VERSION
>  create mode 100644 reftable/basics.c
>  create mode 100644 reftable/basics.h
>  create mode 100644 reftable/block.c
>  create mode 100644 reftable/block.h
>  create mode 100644 reftable/block_test.c
>  create mode 100644 reftable/blocksource.h
>  create mode 100644 reftable/bytes.c
>  create mode 100644 reftable/constants.h
>  create mode 100644 reftable/dump.c
>  create mode 100644 reftable/file.c
>  create mode 100644 reftable/iter.c
>  create mode 100644 reftable/iter.h
>  create mode 100644 reftable/merged.c
>  create mode 100644 reftable/merged.h
>  create mode 100644 reftable/merged_test.c
>  create mode 100644 reftable/pq.c
>  create mode 100644 reftable/pq.h
>  create mode 100644 reftable/reader.c
>  create mode 100644 reftable/reader.h
>  create mode 100644 reftable/record.c
>  create mode 100644 reftable/record.h
>  create mode 100644 reftable/record_test.c
>  create mode 100644 reftable/reftable.h
>  create mode 100644 reftable/reftable_test.c
>  create mode 100644 reftable/slice.c
>  create mode 100644 reftable/slice.h
>  create mode 100644 reftable/slice_test.c
>  create mode 100644 reftable/stack.c
>  create mode 100644 reftable/stack.h
>  create mode 100644 reftable/stack_test.c
>  create mode 100644 reftable/test_framework.c
>  create mode 100644 reftable/test_framework.h
>  create mode 100644 reftable/tree.c
>  create mode 100644 reftable/tree.h
>  create mode 100644 reftable/tree_test.c
>  create mode 100644 reftable/writer.c
>  create mode 100644 reftable/writer.h
>
>
> base-commit: 232378479ee6c66206d47a9be175e3a39682aea6
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-539%2Fhanwen%2Freftable-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-539/hanwen/reftable-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/539

gitgitgadget · 2020-01-23T22:48:41Z

On the Git mailing list, Stephan Beyer wrote (reply to this):

Hi Han-Wen,

On 1/23/20 8:41 PM, Han-Wen Nienhuys via GitGitGadget wrote:
> This adds the reftable library, and hooks it up as a ref backend.
>
> Han-Wen Nienhuys (5):
>   setup.c: enable repo detection for reftable
>   create .git/refs in files-backend.c
>   Document how ref iterators and symrefs interact
>   Add reftable library
>   Reftable support for git-core

I am most of the time just a curious reader on this list but as someone
who has no idea what the reftable library does (except that it can serve
as a "ref backend"), I would expect much more elaborate commit messages.
In particular, patch 4/5 (i.e. the commit message of the "add reftable
library" commit) should describe the purpose of the reftable library at
least briefly; and patch 5/5 should not contain shell commands and
output without context but a short description why the patch is doing
what it is doing. For example, if the use of the reftable library makes
certain git commands a lot faster, it should state that.

Thanks.

Stephan

gitgitgadget · 2020-01-27T13:57:07Z

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Thu, Jan 23, 2020 at 10:44 PM Junio C Hamano <gitster@pobox.com> wrote:
> > This adds the reftable library, and hooks it up as a ref backend.
>
> Just a quick impression before getting into details of individual
> steps.
>
>  * With this series, the reftable backend seems to take over as the
>    default and only backend.  We would need to design and decide how
>    repositories would specify which backend it uses (I personally do
>    not think we need to allow more than one backend to be active at
>    the same time) before we take this series out of RFC status.

Yes, obviously. Would you have concrete ideas on how this should work?

>  * What's reftable/VERSION file?  Does it really have what you
>    intended to add?

Fixed, and explained in reftable/VERSION.

>  * Mixed indentation and many whitespace breakages make the code
>    distracting to review.

Do you mean in the changes to refs/reftable-backend.c? Or the imported code?

The imported code is uniformly formatted with clang-format.

Is there a clang-format setting for uniformly formatting to Linux/Git
style? I really love automated formatting, so we don't have to spend
time debating irrelevant details.


>
>  * Comparison with 0 is written as "if (!strcmp(a, b))" in this
>    codebase, and never "if (0 != strcmp(a, b))".
>
>  * We unfortunately do not use var defn "for (int i = 0; ..." yet.
>
>  * We do not use // comments.
>

fixed.

gitgitgadget · 2020-01-27T14:03:11Z

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

I added some more background in one of the commit messages. You can
find extensive discussion here:

  https://github.com/google/reftable#background-reading


On Thu, Jan 23, 2020 at 11:45 PM Stephan Beyer <s-beyer@gmx.net> wrote:
>
> Hi Han-Wen,
>
> On 1/23/20 8:41 PM, Han-Wen Nienhuys via GitGitGadget wrote:
> > This adds the reftable library, and hooks it up as a ref backend.
> >
> > Han-Wen Nienhuys (5):
> >   setup.c: enable repo detection for reftable
> >   create .git/refs in files-backend.c
> >   Document how ref iterators and symrefs interact
> >   Add reftable library
> >   Reftable support for git-core
>
> I am most of the time just a curious reader on this list but as someone
> who has no idea what the reftable library does (except that it can serve
> as a "ref backend"), I would expect much more elaborate commit messages.
> In particular, patch 4/5 (i.e. the commit message of the "add reftable
> library" commit) should describe the purpose of the reftable library at
> least briefly; and patch 5/5 should not contain shell commands and
> output without context but a short description why the patch is doing
> what it is doing. For example, if the use of the reftable library makes
> certain git commands a lot faster, it should state that.
>
> Thanks.
>
> Stephan

gitgitgadget · 2020-01-27T14:03:12Z

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Mon, Jan 27, 2020 at 2:52 PM Han-Wen Nienhuys <hanwen@xs4all.nl> wrote:
> >  * Mixed indentation and many whitespace breakages make the code
> >    distracting to review.
>
> Do you mean in the changes to refs/reftable-backend.c? Or the imported code?
>
> The imported code is uniformly formatted with clang-format.
>
> Is there a clang-format setting for uniformly formatting to Linux/Git
> style? I really love automated formatting, so we don't have to spend
> time debating irrelevant details.

Ah, there is a .clang-format entry in the git repository already. Thanks!

hanwen · 2020-01-27T14:21:35Z

/submit

gitgitgadget · 2020-01-27T14:22:40Z

Submitted as pull.539.v2.git.1580134944.gitgitgadget@gmail.com

gitgitgadget · 2020-01-27T22:33:32Z

builtin/init-db.c

@@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
 	 * We need to create a "refs" dir in any case so that older


On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Han-Wen Nienhuys <hanwen@google.com> > Subject: Re: [PATCH v2 2/5] create .git/refs in files-backend.c > > This prepares for supporting the reftable format, which creates a file > in that place. The idea is sound, I think. We want to let each backend to be responsible for creating and maintaining what is at .git/refs on the filesystem. > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> > Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92 Do we need to keep this, which is pretty much private name for the patch that is not valid for most of the people on the list? > --- > builtin/init-db.c | 2 -- > refs/files-backend.c | 4 ++++ > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/builtin/init-db.c b/builtin/init-db.c > index 944ec77fe1..45bdea0589 100644 > --- a/builtin/init-db.c > +++ b/builtin/init-db.c > @@ -226,8 +226,6 @@ static int create_default_files(const char *template_path, > * We need to create a "refs" dir in any case so that older > * versions of git can tell that this is a repository. > */ > - safe_create_dir(git_path("refs"), 1); > - adjust_shared_perm(git_path("refs")); > > if (refs_init_db(&err)) > die("failed to set up refs db: %s", err.buf); > diff --git a/refs/files-backend.c b/refs/files-backend.c > index 0ea66a28b6..f49b6f2ab6 100644 > --- a/refs/files-backend.c > +++ b/refs/files-backend.c > @@ -3158,6 +3158,10 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err) > files_downcast(ref_store, REF_STORE_WRITE, "init_db"); > struct strbuf sb = STRBUF_INIT; > > + files_ref_path(refs, &sb, "refs"); > + safe_create_dir(sb.buf, 1); > + // XXX adjust_shared_perm ? I am not sure what's there to wonder about with the question mark. If this step is meant to be a preparation before we actually allow a different backend to be used, shouldn't the updated and prepared code behave identically in externally visible ways? Thanks.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Mon, Jan 27, 2020 at 11:28 PM Junio C Hamano <gitster@pobox.com> wrote: > > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> > > > > Change-Id: I2fc47c89f5ec605734007ceff90321c02474aa92 > > Do we need to keep this, which is pretty much private name for the > patch that is not valid for most of the people on the list? No, I can make do with like gitgitgadget. > > - safe_create_dir(git_path("refs"), 1); > > - adjust_shared_perm(git_path("refs")); > > > > if (refs_init_db(&err)) > > die("failed to set up refs db: %s", err.buf); > > diff --git a/refs/files-backend.c b/refs/files-backend.c > > index 0ea66a28b6..f49b6f2ab6 100644 > > --- a/refs/files-backend.c > > +++ b/refs/files-backend.c > > @@ -3158,6 +3158,10 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err) > > files_downcast(ref_store, REF_STORE_WRITE, "init_db"); > > struct strbuf sb = STRBUF_INIT; > > > > + files_ref_path(refs, &sb, "refs"); > > + safe_create_dir(sb.buf, 1); > > + // XXX adjust_shared_perm ? > > I am not sure what's there to wonder about with the question mark. I forgot why I put the XXX, but note that safe_create_dirs runs adjust_shared_perms implicitly. -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg GeschÃ¤ftsfÃ¼hrer: Paul Manicle, Halimah DeLaine Prado

On the Git mailing list, Junio C Hamano wrote (reply to this):

Han-Wen Nienhuys <hanwen@google.com> writes: >> > + files_ref_path(refs, &sb, "refs"); >> > + safe_create_dir(sb.buf, 1); >> > + // XXX adjust_shared_perm ? > > I forgot why I put the XXX, but note that safe_create_dirs runs > adjust_shared_perms implicitly. That piece of information would make an excellent part of the log message, which is to describe why "questionable" changes are made. "git init" is designed to be runnable in an existing repository, but safe_create_dir() would not run adjust_shared_perm() when mkdir() says EEXIST, and I think that is why the original makes a call to adjust_shared_perm() after safe_create_dir() returns. Thanks.

gitgitgadget · 2020-01-27T22:54:42Z

refs/refs-internal.h

@@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs,
 * to the next entry, ref_iterator_advance() aborts the iteration,


On the Git mailing list, Junio C Hamano wrote (reply to this):

"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Han-Wen Nienhuys <hanwen@google.com> > Subject: Re: [PATCH v2 3/5] Document how ref iterators and symrefs interact "Subject: refs: document how ...", perhaps? Also isn't it more like iterators and symrefs do not interact in any unexpected way, is it? iterators while enumerating refs when they see a symref, they do not dereference and give the underlying ref the symref is pointing at---the underlying ref will be listed when it comes its turn to be shown as an ordinary ref. I am not sure what is there to single out and document... > Change-Id: Ie3ee63c52254c000ef712986246ca28f312b4301 > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> > --- > refs/refs-internal.h | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/refs/refs-internal.h b/refs/refs-internal.h > index ff2436c0fb..fc18b12340 100644 > --- a/refs/refs-internal.h > +++ b/refs/refs-internal.h > @@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs, > * to the next entry, ref_iterator_advance() aborts the iteration, > * frees the ref_iterator, and returns ITER_ERROR. > * > + * Ref iterators cannot return symref targets, so symbolic refs must be > + * dereferenced during the iteration. What this says is not techincally incorrect. Iterators do not give each_ref_fn what underlying ref a symref is pointing at. But it also is misleading. If your callback wants to know what object each ref is pointing at do not need to do anything extra when it sees a symref, as name of the object pointed at by the underlying ref is given to it. Only callbacks that wants to know the other ref, not the value of the other ref, needs to dereference when called with a symref. But I wonder if we need to even say this. Isn't it obvious from the each_ref_fn API that nothing other than the refname, object id, and what kind of ref it is, will be given to the user of the API, so it would be natural for a caller that wants to do extra things it needs to do for symrefs must act when it learns a ref is a symref? After all, that is why the flags word is one of the parameters given to an each_ref_fn.

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Mon, Jan 27, 2020 at 11:53 PM Junio C Hamano <gitster@pobox.com> wrote: > > "Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > From: Han-Wen Nienhuys <hanwen@google.com> > > Subject: Re: [PATCH v2 3/5] Document how ref iterators and symrefs interact > > "Subject: refs: document how ...", perhaps? > > Also isn't it more like iterators and symrefs do not interact in any > unexpected way, is it? iterators while enumerating refs when they > see a symref, they do not dereference and give the underlying ref > the symref is pointing at---the underlying ref will be listed when > it comes its turn to be shown as an ordinary ref. I am not sure > what is there to single out and document... I mean, you can't return a zero OID and set REF_ISSYMLINK. Instead, the ref backend has to follow the symlink recursively until it finds a ref pointing to a OID. > > Change-Id: Ie3ee63c52254c000ef712986246ca28f312b4301 > > Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> > > --- > > refs/refs-internal.h | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/refs/refs-internal.h b/refs/refs-internal.h > > index ff2436c0fb..fc18b12340 100644 > > --- a/refs/refs-internal.h > > +++ b/refs/refs-internal.h > > @@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs, > > * to the next entry, ref_iterator_advance() aborts the iteration, > > * frees the ref_iterator, and returns ITER_ERROR. > > * > > + * Ref iterators cannot return symref targets, so symbolic refs must be > > + * dereferenced during the iteration. > > What this says is not techincally incorrect. Iterators do not give > each_ref_fn what underlying ref a symref is pointing at. But it > also is misleading. If your callback wants to know what object each > ref is pointing at do not need to do anything extra when it sees a > symref, as name of the object pointed at by the underlying ref is > given to it. Only callbacks that wants to know the other ref, not > the value of the other ref, needs to dereference when called with a > symref. > > But I wonder if we need to even say this. Isn't it obvious from the > each_ref_fn API that nothing other than the refname, object id, and > what kind of ref it is, will be given to the user of the API, so it > would be natural for a caller that wants to do extra things it needs > to do for symrefs must act when it learns a ref is a symref? After > all, that is why the flags word is one of the parameters given to an > each_ref_fn. Maybe it is obvious to you, but it was not obvious to me, coming from JGit working on the RefDatabase. One could imagine that the caller needs to do something special to handle a symref, and returning a zero object ID is OK, and this is a natural assumption if you implement a ref backend, as dereferencing the symref to find out the final OID may do more work than the caller needs. -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg GeschÃ¤ftsfÃ¼hrer: Paul Manicle, Halimah DeLaine Prado

On the Git mailing list, Junio C Hamano wrote (reply to this):

Han-Wen Nienhuys <hanwen@google.com> writes: >> But I wonder if we need to even say this. Isn't it obvious from the >> each_ref_fn API that nothing other than the refname, object id, and >> what kind of ref it is, will be given to the user of the API, so it >> would be natural for a caller that wants to do extra things it needs >> to do for symrefs must act when it learns a ref is a symref? After >> all, that is why the flags word is one of the parameters given to an >> each_ref_fn. > > Maybe it is obvious to you, but it was not obvious to me, coming from > JGit working on the RefDatabase. If you found each_ref_fn API docs lacking, that probably is where the information should go (i.e. regardless of type of refs, the object name is filled correctly). We need to make it obvious to the developers who needs to use and work on the code with the API defined and described in refs.h Thanks.

gitgitgadget · 2020-01-28T07:37:02Z

Makefile

@@ -813,6 +813,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
 LIB_FILE = libgit.a


On the Git mailing list, Jeff King wrote (reply to this):

On Mon, Jan 27, 2020 at 02:22:24PM +0000, Han-Wen Nienhuys via GitGitGadget wrote: > [...] I'll try to add my 2 cents to all of the XXX spots you flagged. > @@ -1839,10 +1839,22 @@ static struct ref_store *lookup_ref_store_map(struct hashmap *map, > static struct ref_store *ref_store_init(const char *gitdir, > unsigned int flags) > { > - const char *be_name = "files"; > - struct ref_storage_be *be = find_ref_storage_backend(be_name); > + struct strbuf refs_path = STRBUF_INIT; > + > + /* XXX this should probably come from a git config setting and not > + default to reftable. */ > + const char *be_name = "reftable"; I think the scheme here needs to be something like: - "struct repository" gets a new "ref_format" field - setup.c:check_repo_format() learns about an extensions.refFormat config key, which we use to set repo->ref_format - init/clone should take a command-line option for the ref format of the new repository. Anybody choosing reftables would want to set core.repositoryformatversion to "1" and set the extensions.refFormat key. - likewise, there should be a config setting to choose the default ref format. It would obviously stay at "files" for now, but we'd eventually perhaps flip the default to "reftable". Some thoughts on compatibility: The idea of the config changes is that older versions of Git will either complain about repositoryformatversion (if very old), or will complain that they don't know about extensions.refFormat. But the changes you made in patch 1 indicate that existing versions of Git won't consider it to be a Git repository at all! I think this is slightly non-ideal, in that we'd continue walking up the tree looking for a Git repo. And either: - we'd find one, which would be confusing and possibly wrong - we wouldn't, in which case the error would be something like "no git repository", and not "your git repository is too new" So it would be really nice if we continued to have a HEAD file (just with some sentinel value in it) and a refs/ directory, so that existing versions of Git realize "there's a repository here, but it's too new for me". There's a slight downside in that tools which _aren't_ careful about repositoryformatversion might try to operate on the repository, writing into refs/ or whatever. But such tools are broken, and I'm not sure we should be catering to them (besides which, the packed-refs ship sailed long ago, so they're already potentially dangerous). > +/* XXX which ordering are these? Newest or oldest first? */ > int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data); > int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data); The top one is chronological order (i.e., reading the file from start to finish), and the latter is reverse-chronological (seeking backwards in the file). > +static struct ref_iterator * > +reftable_ref_iterator_begin(struct ref_store *ref_store, const char *prefix, > + unsigned int flags) > +{ > + struct reftable_ref_store *refs = > + (struct reftable_ref_store *)ref_store; > + struct reftable_iterator *ri = xcalloc(1, sizeof(*ri)); > + struct merged_table *mt = NULL; > + int err = 0; > + if (refs->err) { > + /* how to propagate errors? */ > + return NULL; > + } > + > + mt = stack_merged_table(refs->stack); > + > + /* XXX something with flags? */ I think the flags here are DO_FOR_EACH_*. You might need to consider INCLUDE_BROKEN here (or it might be dealt with at a layer above). There's also PER_WORKTREE_ONLY; I think you'd need to check ref_type() there, like the files backend does. > + err = merged_table_seek_ref(mt, &ri->iter, prefix); > + /* XXX what to do with err? */ Hmm, I don't think you can return an error from iterator_begin(). It would probably be OK to record the error in your local struct, and then later return ITER_ERROR from iterator_advance(). I notice that the more-recent dir_iterator_begin() returns NULL in case of an error, but looking at the callers of the ref iterators, they don't seem to be ready to handle NULL. > +static int write_transaction_table(struct writer *writer, void *arg) > +{ > + struct ref_transaction *transaction = (struct ref_transaction *)arg; > + struct reftable_ref_store *refs = > + (struct reftable_ref_store *)transaction->ref_store; > + uint64_t ts = stack_next_update_index(refs->stack); > + int err = 0; > + /* XXX - are we allowed to mutate the input data? */ > + qsort(transaction->updates, transaction->nr, > + sizeof(struct ref_update *), ref_update_cmp); I don't offhand know of anything that would break, but it's probably a bad idea to do so. If you need a sorted view, can you make an auxiliary array-of-pointers? Also, the QSORT() macro is a little shorter and has some extra checks. > + for (int i = 0; i < transaction->nr; i++) { > + struct ref_update *u = transaction->updates[i]; > + if (u->flags & REF_HAVE_NEW) { > + struct object_id out_oid = {}; > + int out_flags = 0; > + /* XXX who owns the memory here? */ > + const char *resolved = refs_resolve_ref_unsafe( > + transaction->ref_store, u->refname, 0, &out_oid, > + &out_flags); In the "unsafe" version, the memory belongs to a static buffer inside the function. You shouldn't need to free it. > +static int > +reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator) > [...] > + /* XXX const? */ > + memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ); You'd want probably want to use hashcpy() here (or better yet, use "struct object_id" in ri->log, too, and then use oidcpy()). But that raises a question: how ready are reftables to handle non-sha1 object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk format actually has binary sha1s, right? In theory if those all become the_hash_algo->rawsz, then it might "Just Work" to read and write slightly larger entries. > +reftable_for_each_reflog_ent_newest_first(struct ref_store *ref_store, > + const char *refname, > + each_reflog_ent_fn fn, void *cb_data) > [...] > + /* XXX committer = email? name? */ > + if (fn(&old_oid, &new_oid, log.name, log.time, > + log.tz_offset, log.message, cb_data)) { > + err = -1; > + break; > + } The committer entry we pass back to each_reflog_ent_fn should be the full "Human Name <email@host>". > +static int reftable_reflog_expire(struct ref_store *ref_store, > + const char *refname, > + const struct object_id *oid, > + unsigned int flags, > + reflog_expiry_prepare_fn prepare_fn, > + reflog_expiry_should_prune_fn should_prune_fn, > + reflog_expiry_cleanup_fn cleanup_fn, > + void *policy_cb_data) > +{ > + /* > + XXX > + > + This doesn't fit with the reftable API. If we are expiring for space > + reasons, the expiry should be combined with a compaction, and there > + should be a policy that can be called for all refnames, not for a > + single ref name. > + > + If this is for cleaning up individual entries, we'll have to write > + extra data to create tombstones. > + */ > + return 0; > +} I agree that we'd generally want to expire and compact all entries at once. Unfortunately I think this per-ref interface is exposed by git-reflog. I.e., you can do "git reflog expire refs/heads/foo". Could we add an extra API call for "expire everything", and then: - have refs/files-backend.c implement that by just iterating over all of the refs and pruning them one by one - have builtin/reflog.c trigger the "expire everything" API when it sees "--all" - teach refs/reftables-backend.c a crappy version of the per-ref expiration, where it just inserts tombstones but doesn't do any compaction. It doesn't have to be fast or produce a good output; it's just for compatibility. Alternatively, I'd be OK with die("sorry, reftables doesn't support per-ref reflog expiration") as a first step. -Peff

On the Git mailing list, Martin Fick wrote (reply to this):

On Tuesday, January 28, 2020 2:31:00 AM MST Jeff King wrote: > Some thoughts on compatibility: > > The idea of the config changes is that older versions of Git will either > complain about repositoryformatversion (if very old), or will complain > that they don't know about extensions.refFormat. But the changes you > made in patch 1 indicate that existing versions of Git won't consider it > to be a Git repository at all! > > I think this is slightly non-ideal, in that we'd continue walking up the > tree looking for a Git repo. And either: > > - we'd find one, which would be confusing and possibly wrong > > - we wouldn't, in which case the error would be something like "no git > repository", and not "your git repository is too new" > > So it would be really nice if we continued to have a HEAD file (just > with some sentinel value in it) and a refs/ directory, so that existing > versions of Git realize "there's a repository here, but it's too new for > me". > > There's a slight downside in that tools which _aren't_ careful about > repositoryformatversion might try to operate on the repository, writing > into refs/ or whatever. But such tools are broken, and I'm not sure we > should be catering to them (besides which, the packed-refs ship sailed > long ago, so they're already potentially dangerous). Could you elaborate on this a bit because it seems on the surface that these tools aren't very dangerous, and therefore likely many of them exist? What are the dangers today of tools that understand/operate on loose and packed refs without reading repositoryformatversion? -Martin -- The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Tue, Jan 28, 2020 at 8:31 AM Jeff King <peff@peff.net> wrote: > > On Mon, Jan 27, 2020 at 02:22:24PM +0000, Han-Wen Nienhuys via GitGitGadget wrote: > > > [...] > > I'll try to add my 2 cents to all of the XXX spots you flagged. > thanks! that was really helpful. > > Some thoughts on compatibility: > > The idea of the config changes is that older versions of Git will either > complain about repositoryformatversion (if very old), or will complain > that they don't know about extensions.refFormat. But the changes you > made in patch 1 indicate that existing versions of Git won't consider it > to be a Git repository at all! > > I think this is slightly non-ideal, in that we'd continue walking up the > tree looking for a Git repo. And either: > > - we'd find one, which would be confusing and possibly wrong > > - we wouldn't, in which case the error would be something like "no git > repository", and not "your git repository is too new" > > So it would be really nice if we continued to have a HEAD file (just > with some sentinel value in it) and a refs/ directory, so that existing > versions of Git realize "there's a repository here, but it's too new for > me". You have good points. JGit currently implements what we have here, as this is what's spelled out in the spec that Shawn posted back in the day. It's probably acceptable to this, though, as the reftable support has only landed in JGit very recently and will probably appear very experimental to folks. How would the layout be then? We'll have HEAD - dummy file reftable/ - the tables refs/ - dummy dir where shall we store the reftable list? maybe in a file called reftable-list If we have both HEAD/refs + (refable/reftable-list), what should we put there to ensure that no git version actually manages to use the repository? (what happens if someone deletes the version setting from the .git/config file) > > +/* XXX which ordering are these? Newest or oldest first? */ > > int for_each_reflog_ent(const char *refname, each_reflog_ent_fn fn, void *cb_data); > > int for_each_reflog_ent_reverse(const char *refname, each_reflog_ent_fn fn, void *cb_data); > > The top one is chronological order (i.e., reading the file from start to > finish), and the latter is reverse-chronological (seeking backwards in > the file). thanks, I put in a commit to clarify this. > > + err = merged_table_seek_ref(mt, &ri->iter, prefix); > > + /* XXX what to do with err? */ > > Hmm, I don't think you can return an error from iterator_begin(). It > would probably be OK to record the error in your local struct, and then > later return ITER_ERROR from iterator_advance(). good point. I did that. > > + /* XXX - are we allowed to mutate the input data? */ > > + qsort(transaction->updates, transaction->nr, > > + sizeof(struct ref_update *), ref_update_cmp); > > I don't offhand know of anything that would break, but it's probably a > bad idea to do so. If you need a sorted view, can you make an auxiliary > array-of-pointers? Also, the QSORT() macro is a little shorter and has > some extra checks. Done. > In the "unsafe" version, the memory belongs to a static buffer inside > the function. You shouldn't need to free it. OK. > > +static int > > +reftable_reflog_ref_iterator_advance(struct ref_iterator *ref_iterator) > > [...] > > + /* XXX const? */ > > + memcpy(&ri->oid.hash, ri->log.new_hash, GIT_SHA1_RAWSZ); > > You'd want probably want to use hashcpy() here (or better yet, use > "struct object_id" in ri->log, too, and then use oidcpy()). It's more practical in the library to use pointers rather than fixed size arrays, so it'll be hashcpy. > But that raises a question: how ready are reftables to handle non-sha1 > object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk > format actually has binary sha1s, right? In theory if those all become > the_hash_algo->rawsz, then it might "Just Work" to read and write > slightly larger entries. The format fixes the reftable at 20 bytes, and there is not enough framing information to just write more data. We'll have to encode the hash size in the version number somehow, eg. we could use the higher order bit of the version byte to encode it, for example. But it needs a new version of the spec. I think it's premature to do this while v1 of reftable isn't in git-core yet. > The committer entry we pass back to each_reflog_ent_fn should be the > full "Human Name <email@host>". thx, done. > I agree that we'd generally want to expire and compact all entries at > once. Unfortunately I think this per-ref interface is exposed by > git-reflog. I.e., you can do "git reflog expire refs/heads/foo". > > Could we add an extra API call for "expire everything", and then: I took note of your remarks, and added a BUG() for now. I'll focus on getting the CI on gitgitgadget to build this code first. -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg GeschÃ¤ftsfÃ¼hrer: Paul Manicle, Halimah DeLaine Prado

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Jan 28, 2020 at 08:36:53AM -0700, Martin Fick wrote: > > There's a slight downside in that tools which _aren't_ careful about > > repositoryformatversion might try to operate on the repository, writing > > into refs/ or whatever. But such tools are broken, and I'm not sure we > > should be catering to them (besides which, the packed-refs ship sailed > > long ago, so they're already potentially dangerous). > > Could you elaborate on this a bit because it seems on the surface that these > tools aren't very dangerous, and therefore likely many of them exist? > > What are the dangers today of tools that understand/operate on loose and > packed refs without reading repositoryformatversion? I was mostly thinking of hacky scripts that tried to touch .git/refs directly. And there are a few levels of dangerous there: - if you're doing "echo $sha1 >.git/refs/heads/master", then you're not locking properly. But it would probably work most of the time. - if you're properly taking a lock on ".git/refs/heads/master.lock" and renaming into place but not looking at packed-refs, then you might overwrite somebody else's update which is in the packed file - if you're trying to read refs and not reading packed-refs, obviously you might miss some values If you're actually doing the correct locking and packed-refs read (which "real" implementations like libgit2 do) then no, I don't think that's dangerous. And I think libgit2 properly complains when it sees a repositoryformatversion above 0. I don't know offhand about JGit, or any of the lesser-used ones like dulwich or go-git. -Peff

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Jan 28, 2020 at 04:56:26PM +0100, Han-Wen Nienhuys wrote: > JGit currently implements what we have here, as this is what's spelled > out in the spec that Shawn posted back in the day. It's probably > acceptable to this, though, as the reftable support has only landed in > JGit very recently and will probably appear very experimental to > folks. > > How would the layout be then? We'll have > > HEAD - dummy file > reftable/ - the tables > refs/ - dummy dir > > where shall we store the reftable list? maybe in a file called > > reftable-list > > If we have both HEAD/refs + (refable/reftable-list), what should we > put there to ensure that no git version actually manages to use the > repository? (what happens if someone deletes the version setting from > the .git/config file) Yeah, it would be nice to have something that an older version of Git would totally choke on, but I'm not sure we have a lot of leeway. What we put in HEAD has to be syntactically legitimate enough to appease validate_headref(), so our options are either "ref: refs/something/bogus" or an object hash that we don't have (e.g., 0{40}). The former would be preferable because it would (in theory) prevent us from writing to HEAD, as well. I wondered what would happen if you put in a syntactically invalid ref, like "ref: refs/.not/.valid" (leading dots are not allowed in path components of refnames). It does cause _some_ parts of Git to choke, but sadly "git update-ref HEAD $sha1" actually writes to .git/refs/.not/.valid. Even "refs/../../dangerous" doesn't give it pause. Yikes. It seems we're pretty willing to accept symref destinations without further checking. Making "refs" a file instead of a directory does work nicely, as any attempts to read or write would get ENOTDIR. And we can fool is_git_directory() as long as it's marked executable. That's OK on POSIX systems, but I'm not sure how it would work on Windows (or maybe it would work just fine, since we presumably just say "yep, everything is executable"). So perhaps that's enough, and what we put in HEAD won't matter (since nobody will be able to write into refs/ anyway). > > But that raises a question: how ready are reftables to handle non-sha1 > > object ids? I see a lot of GIT_SHA1_RAWSZ, and I think the on-disk > > format actually has binary sha1s, right? In theory if those all become > > the_hash_algo->rawsz, then it might "Just Work" to read and write > > slightly larger entries. > > The format fixes the reftable at 20 bytes, and there is not enough > framing information to just write more data. We'll have to encode the > hash size in the version number somehow, eg. we could use the higher > order bit of the version byte to encode it, for example. > > But it needs a new version of the spec. I think it's premature to do > this while v1 of reftable isn't in git-core yet. I don't know that we technically need the reftables file to say how long the hashes are. The git config will tell us which hash we're using, and everything else is supposed to follow. So I think it would work OK as long as you're able to be told by the rest of Git that hashes are N bytes, and just use that to compute the fixed-size records. That said, it might make for easier debugging if the reftables file declares the size it assumes. -Peff

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Feb 04, 2020 at 08:06:57PM +0100, Han-Wen Nienhuys wrote: > On Wed, Jan 29, 2020 at 11:47 AM Jeff King <peff@peff.net> wrote: > > > That said, it might make for easier debugging if the reftables file > > declares the size it assumes. > > If there is a mismatch in size, the reftable will look completely > corrupted data. I think this will be a bad experience. Yes, but I think it would take a bunch of other failures to get there. And it's true of all of the other parts of Git, too; the master switch is a config setting that says "I'm a sha-256 repository". That said, I'm not at all opposed to a header in the reftables file saying "I'm a sha-256 reftable". We shouldn't ever see a mismatch there, but that's what I meant by "easier debugging". -Peff

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Feb 04, 2020 at 07:54:12PM +0100, Han-Wen Nienhuys wrote: > > PS I don't know if it's set in stone yet, but if we're changing things > > around, is it an option to put reftable-list inside the reftable > > directory, just to keep the top-level uncluttered? > > I put up https://git.eclipse.org/r/c/157167/ to update the spec inside > JGit. Please review. That looks quite reasonable. The one change I'd make is to put "refs/heads/.invalid" into HEAD (instead of "refs/.invalid"). In my experiments, existing Git is happy to write into an invalid refname via "git update-ref HEAD". But by putting it past the non-directory "heads", such a write would always fail. It's also supposed to be a requirement that HEAD only puts to refs/heads/, though our validate_headref() does not enforce that. I don't know if any other implementations might, though. I did compare earlier your "make refs/heads a file" versus "make refs/ a file and just give it an executable bit". And I think the "refs/heads" solution does behave slightly better in a few cases. But if I understand correctly, just giving "refs/" an executable bit would mean we retain compatibility with the existing JGit implementation. It does end up playing games with permissions, which I warned against, but I think it might be tolerable: 1. Unlike read vs write, there's not generally a reason users would need to separate read and execute. 2. Portability-wise, POSIX permissions give us what we need, and non-POSIX systems tend to just say everything is executable. So I could also live with that direction, if switching the JGit behavior at this point is too much of a pain. -Peff

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Tue, Jan 28, 2020 at 8:31 AM Jeff King <peff@peff.net> wrote: > > > { > > - const char *be_name = "files"; > > - struct ref_storage_be *be = find_ref_storage_backend(be_name); > > + struct strbuf refs_path = STRBUF_INIT; > > + > > + /* XXX this should probably come from a git config setting and not > > + default to reftable. */ > > + const char *be_name = "reftable"; > > I think the scheme here needs to be something like: > > - "struct repository" gets a new "ref_format" field > > - setup.c:check_repo_format() learns about an extensions.refFormat > config key, which we use to set repo->ref_format > > - init/clone should take a command-line option for the ref format of > the new repository. Anybody choosing reftables would want to set > core.repositoryformatversion to "1" and set the extensions.refFormat > key. I did this, but are you sure this works? Where would the repo->ref_storage_format get set? I tried adding it to repo_init(), but this doesn't get called in a normal startup sequence. Breakpoint 3, ref_store_init (gitdir=0x555555884b70 ".git", be_name=0x5555557942ca "reftable", flags=15) at refs.c:1841 warning: Source file is more recent than executable. 1841 { (gdb) up #1 0x00005555556de2c8 in get_main_ref_store (r=0x555555871d40 <the_repo>) at refs.c:1862 (gdb) p r->ref_storage_format $1 = 0x0 } I'm sending the revised patch series now. -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg GeschÃ¤ftsfÃ¼hrer: Paul Manicle, Halimah DeLaine Prado

On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):

On Tue, Feb 4, 2020 at 9:06 PM Jeff King <peff@peff.net> wrote: > > On Tue, Feb 04, 2020 at 07:54:12PM +0100, Han-Wen Nienhuys wrote: > > > > PS I don't know if it's set in stone yet, but if we're changing things > > > around, is it an option to put reftable-list inside the reftable > > > directory, just to keep the top-level uncluttered? > > > > I put up https://git.eclipse.org/r/c/157167/ to update the spec inside > > JGit. Please review. > > That looks quite reasonable. The one change I'd make is to put > "refs/heads/.invalid" into HEAD (instead of "refs/.invalid"). In my > experiments, existing Git is happy to write into an invalid refname via > "git update-ref HEAD". But by putting it past the non-directory "heads", > such a write would always fail. thanks, incorporated. > It's also supposed to be a requirement that HEAD only puts to > refs/heads/, though our validate_headref() does not enforce that. I > don't know if any other implementations might, though. > > I did compare earlier your "make refs/heads a file" versus "make refs/ a > file and just give it an executable bit". And I think the "refs/heads" > solution does behave slightly better in a few cases. But if I understand > correctly, just giving "refs/" an executable bit would mean we retain > compatibility with the existing JGit implementation. I hadn't looked at the JGit discovery in detail, but it looks like the currently proposed approach works with JGit. (I haven't adapted JGit for the new path setup yet.) -- Han-Wen Nienhuys - Google Munich I work 80%. Don't expect answers from me on Fridays. -- Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg GeschÃ¤ftsfÃ¼hrer: Paul Manicle, Halimah DeLaine Prado

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, Feb 04, 2020 at 09:22:36PM +0100, Han-Wen Nienhuys wrote: > > - init/clone should take a command-line option for the ref format of > > the new repository. Anybody choosing reftables would want to set > > core.repositoryformatversion to "1" and set the extensions.refFormat > > key. > > I did this, but are you sure this works? Where would the > repo->ref_storage_format get set? I tried adding it to repo_init(), > but this doesn't get called in a normal startup sequence. > > Breakpoint 3, ref_store_init (gitdir=0x555555884b70 ".git", > be_name=0x5555557942ca "reftable", flags=15) at refs.c:1841 > warning: Source file is more recent than executable. > 1841 { > (gdb) up > #1 0x00005555556de2c8 in get_main_ref_store (r=0x555555871d40 > <the_repo>) at refs.c:1862 > (gdb) p r->ref_storage_format > $1 = 0x0 > } Hmm. Looks like repo_init() is only used for submodules. Which is pretty horrid, because that setup sequence is basically duplicated between there and setup_git_directory(), which is the traditional mechanism. That's something that ought to be cleaned up, but it's definitely outside the scope of your series. The patch below was enough to make something like this work: git init --reftable repo cd repo git commit --allow-empty -m foo git for-each-ref I had to make some tweaks to the init code, too: - setting the_repository->ref_storage_format based on the --reftable flag - we can still recognize a reinit by the presence of a HEAD file (whether it's a real one or a dummy). And we should call create_symref() in either case, either to create HEAD or to create the HEAD symlink inside reftables I also fixed a bug in your first patch, where the creation of refs/ moves into files_init_db(). The strbuf needs reset there, or we just append to it, and end up with a doubled path (you could see it by running the same commands above without --reftable, but of course not with your patch series as-is because it will _always_ choose reftable). diff --git a/builtin/init-db.c b/builtin/init-db.c index dcea74610c..ea2a333c4a 100644 --- a/builtin/init-db.c +++ b/builtin/init-db.c @@ -213,6 +213,8 @@ static int create_default_files(const char *template_path, is_bare_repository_cfg = init_is_bare_repository; if (init_shared_repository != -1) set_shared_repository(init_shared_repository); + if (flags & INIT_DB_REFTABLE) + the_repository->ref_storage_format = xstrdup("reftable"); /* * We would have created the above under user's umask -- under @@ -222,6 +224,15 @@ static int create_default_files(const char *template_path, adjust_shared_perm(get_git_dir()); } + /* + * Check to see if .git/HEAD exists; this must happen before + * initializing the ref db, because we want to see if there is an + * existing HEAD. + */ + path = git_path_buf(&buf, "HEAD"); + reinit = (!access(path, R_OK) || + readlink(path, junk, sizeof(junk) - 1) != -1); + /* * We need to create a "refs" dir in any case so that older * versions of git can tell that this is a repository. @@ -234,18 +245,14 @@ static int create_default_files(const char *template_path, * Create the default symlink from ".git/HEAD" to the "master" * branch, if it does not exist yet. */ - if (flags & INIT_DB_REFTABLE) { - reinit = 0; /* XXX - how do we recognize a reinit, - * and what should we do? */ - } else { - path = git_path_buf(&buf, "HEAD"); - reinit = (!access(path, R_OK) || - readlink(path, junk, sizeof(junk) - 1) != -1); - } - if (!reinit) { if (create_symref("HEAD", "refs/heads/master", NULL) < 0) exit(1); + } else { + /* + * XXX should check whether our ref backend matches the + * original one; if not, either die() or convert + */ } /* This forces creation of new config file */ diff --git a/refs.c b/refs.c index 493d3fb673..d6ddbbcc67 100644 --- a/refs.c +++ b/refs.c @@ -1859,10 +1859,9 @@ struct ref_store *get_main_ref_store(struct repository *r) BUG("attempting to get main_ref_store outside of repository"); r->refs = ref_store_init(r->gitdir, - /* XXX r->ref_storage_format == NULL. Where - * should the config file be parsed out? */ - r->ref_storage_format ? r->ref_storage_format : - "reftable", + r->ref_storage_format ? + r->ref_storage_format : + "files", REF_STORE_ALL_CAPS); return r->refs; } diff --git a/refs/files-backend.c b/refs/files-backend.c index 0c53b246e8..6f9efde9ca 100644 --- a/refs/files-backend.c +++ b/refs/files-backend.c @@ -3166,6 +3166,7 @@ static int files_init_db(struct ref_store *ref_store, struct strbuf *err) /* * Create .git/refs/{heads,tags} */ + strbuf_reset(&sb); files_ref_path(refs, &sb, "refs/heads"); safe_create_dir(sb.buf, 1); diff --git a/repository.c b/repository.c index aff47302c9..ff0988dac8 100644 --- a/repository.c +++ b/repository.c @@ -174,9 +174,7 @@ int repo_init(struct repository *repo, if (worktree) repo_set_worktree(repo, worktree); - repo->ref_storage_format = format.ref_storage != NULL ? - xstrdup(format.ref_storage) : - "files"; /* XXX */ + repo->ref_storage_format = xstrdup_or_null(format.ref_storage); clear_repository_format(&format); return 0; diff --git a/setup.c b/setup.c index a339186d83..58c5cd3bf0 100644 --- a/setup.c +++ b/setup.c @@ -450,7 +450,7 @@ static int check_repo_format(const char *var, const char *value, void *vdata) data->partial_clone = xstrdup(value); } else if (!strcmp(ext, "worktreeconfig")) { data->worktree_config = git_config_bool(var, value); - } else if (!strcmp(ext, "refStorage")) { + } else if (!strcmp(ext, "refstorage")) { data->ref_storage = xstrdup(value); } else string_list_append(&data->unknown_extensions, ext); @@ -1184,8 +1184,11 @@ const char *setup_git_directory_gently(int *nongit_ok) gitdir = DEFAULT_GIT_DIR_ENVIRONMENT; setup_git_env(gitdir); } - if (startup_info->have_repository) + if (startup_info->have_repository) { repo_set_hash_algo(the_repository, repo_fmt.hash_algo); + the_repository->ref_storage_format = + xstrdup_or_null(repo_fmt.ref_storage); + } } strbuf_release(&dir);

dscho · 2020-01-28T15:50:34Z

there is #define pread git_pread in your compat header. Does that mean I can just keep using pread?

I guess so ;-)

in general, is there a way to compile this for windows without actually rigging up a Windows installation?

The best I can offer is to push updates to this PR and let the PR build go its way. You will have to look at the results, of course ;-)

The reftable/ directory is structured as a library, so it cannot crash on misuse. Instead, it returns an error codes. In addition, the error code can be used to signal conditions from lower levels of the library to be handled by higher levels of the library. For example, a transaction might legitimately write an empty reftable file, but in that case, we'd want to shortcut the transaction overhead. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

This commit provides basic utility classes for the reftable library. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>

The reftable format is usually used with files for storage. However, we abstract away this using the blocksource data structure. This has two advantages: * log blocks are zlib compressed, and handling them is simplified if we can discard byte segments from within the block layer. * for unittests, it is useful to read and write in-memory. The blocksource allows us to abstract the data away from on-disk files. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

The reftable format is structured as a sequence of blocks, and each block contains a sequence of prefix-compressed key-value records. There are 4 types of records, and they have similarities in how they must be handled. This is achieved by introducing a polymorphic 'record' type that encapsulates ref, log, index and object records. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

The reftable format is structured as a sequence of block. Within a block, records are prefix compressed, with an index of offsets for fully expand keys to enable binary search within blocks. This commit provides the logic to read and write these blocks. Includes a code snippet copied from zlib Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

The reftable format includes support for an (OID => ref) map. This map can speed up visibility and reachability checks. In particular, various operations along the fetch/push path within Gerrit have ben sped up by using this structure. The map is constructed with help of a binary tree. Object IDs are hashes, so they are uniformly distributed. Hence, the tree does not attempt forced rebalancing. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

This supports reading a single reftable file. The commit introduces an abstract iterator type, which captures the usecases both of reading individual refs, and iterating over a segment of the ref namespace. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

With support for reading and writing files in place, we can construct files (in memory) and attempt to read them back. Because some sections of the format are optional (eg. indices, log entries), we have to exercise this code using multiple sizes of input data Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

For background, see the previous commit introducing the library. This introduces the file refs/reftable-backend.c containing a reftable-powered ref storage backend. It can be activated by passing --ref-storage=reftable to "git init", or setting GIT_TEST_REFTABLE in the environment. Example use: see t/t0031-reftable.sh Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Helped-by: Junio Hamano <gitster@pobox.com> Co-authored-by: Jeff King <peff@peff.net>

In our git-prompt script we strive to use Bash builtins wherever possible, because fork()-ing subshells for command substitutions and fork()+exec()-ing Git commands are expensive on some platforms. We even read and parse '.git/HEAD' using Bash builtins to get the name of the current branch [1]. However, the upcoming reftable refs backend won't use '.git/HEAD' at all, but will write an invalid refname as placeholder for backwards compatibility instead, which will break our git-prompt script. Update the git-prompt script to recognize the placeholder '.git/HEAD' written by the reftable backend (its content is specified in the reftable specs), and then fall back to use 'git symbolic-ref' to get the name of the current branch. [1] 3a43c4b (bash prompt: use bash builtins to find out current branch, 2011-03-31) Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>

This command dumps individual tables or a stack of of tables. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

gitgitgadget · 2021-03-15T05:54:41Z

There was a status update about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

gitgitgadget · 2021-03-17T22:36:22Z

There was a status update about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

gitgitgadget · 2021-03-20T21:46:49Z

There was a status update about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

gitgitgadget · 2021-03-23T03:49:28Z

There was a status update about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

gitgitgadget · 2021-03-31T00:16:29Z

There was a status update about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

gitgitgadget · 2021-04-06T01:28:07Z

There was a status update about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

gitgitgadget · 2021-04-08T22:51:26Z

There was a status update about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

What's the status of this topic?

gitgitgadget · 2021-04-13T00:06:29Z

There was a status update in the "Cooking" section about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

Waiting for reviews.

gitgitgadget · 2021-04-16T00:06:22Z

There was a status update in the "Cooking" section about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

Waiting for reviews.

gitgitgadget · 2021-04-19T23:46:04Z

There was a status update in the "Cooking" section about the branch hn/reftable on the Git mailing list:

The "reftable" backend for the refs API.

Reroll exists; needs picking up.
cf. <pull.847.v7.git.git.1618832276.gitgitgadget@gmail.com>

gitgitgadget · 2021-04-20T23:50:28Z

This patch series was integrated into seen via git@597c232.

gitgitgadget · 2021-04-21T17:56:00Z

This patch series was integrated into seen via git@eb39b33.

hanwen · 2021-04-26T14:39:26Z

this is now being worked on further in git#847

hanwen force-pushed the reftable branch from f03c268 to a961098 Compare January 23, 2020 19:19

hanwen force-pushed the reftable branch from a961098 to a615afa Compare January 23, 2020 19:21

dscho previously requested changes Jan 23, 2020

View reviewed changes

hanwen force-pushed the reftable branch from a615afa to 7212012 Compare January 27, 2020 14:15

hanwen requested a review from dscho January 27, 2020 19:16

gitgitgadget bot reviewed Jan 27, 2020

View reviewed changes

gitgitgadget bot reviewed Jan 28, 2020

View reviewed changes

hanwen force-pushed the reftable branch 5 times, most recently from 7ab3c4d to fd06ade Compare January 29, 2020 13:48

hanwen added 3 commits March 12, 2021 19:14

reftable: utility functions

5bbbafb

This commit provides basic utility classes for the reftable library. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de>

hanwen force-pushed the reftable branch from 508a942 to 81ba3ba Compare March 12, 2021 18:17

hanwen and others added 10 commits March 12, 2021 20:15

reftable: write reftable files

1da5494

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

reftable: rest of library

3e1c064

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

Add "test-tool dump-reftable" command.

2634784

This command dumps individual tables or a stack of of tables. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>

hanwen force-pushed the reftable branch from 81ba3ba to 2634784 Compare March 12, 2021 19:15

dscho force-pushed the next branch from aa74c7b to 6335e70 Compare March 23, 2021 00:45

hanwen closed this Apr 26, 2021

		@@ -226,8 +226,6 @@ static int create_default_files(const char *template_path,
		* We need to create a "refs" dir in any case so that older

		@@ -269,6 +269,9 @@ int refs_rename_ref_available(struct ref_store *refs,
		* to the next entry, ref_iterator_advance() aborts the iteration,

		@@ -813,6 +813,7 @@ TEST_SHELL_PATH = $(SHELL_PATH)
		LIB_FILE = libgit.a

Reftable support git-core #539

Reftable support git-core #539

Conversation

hanwen commented Jan 23, 2020 • edited Loading

gitgitgadget bot commented Jan 23, 2020

Welcome to GitGitGadget

Contributing the patches

Need help?

hanwen commented Jan 23, 2020

dscho commented Jan 23, 2020

gitgitgadget bot commented Jan 23, 2020

dscho commented Jan 23, 2020

hanwen commented Jan 23, 2020

gitgitgadget bot commented Jan 23, 2020

dscho left a comment

Choose a reason for hiding this comment

dscho Jan 23, 2020

Choose a reason for hiding this comment

hanwen Jan 27, 2020

Choose a reason for hiding this comment

dscho commented Jan 23, 2020

hanwen commented Jan 23, 2020 • edited Loading

gitgitgadget bot commented Jan 23, 2020

gitgitgadget bot commented Jan 23, 2020

gitgitgadget bot commented Jan 27, 2020

gitgitgadget bot commented Jan 27, 2020

gitgitgadget bot commented Jan 27, 2020

hanwen commented Jan 27, 2020

gitgitgadget bot commented Jan 27, 2020

gitgitgadget bot Jan 27, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jan 28, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jan 30, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jan 27, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jan 28, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jan 28, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jan 28, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jan 28, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jan 28, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jan 29, 2020

Choose a reason for hiding this comment

gitgitgadget bot Jan 29, 2020

Choose a reason for hiding this comment

gitgitgadget bot Feb 4, 2020

Choose a reason for hiding this comment

gitgitgadget bot Feb 4, 2020

Choose a reason for hiding this comment

gitgitgadget bot Feb 4, 2020

Choose a reason for hiding this comment

gitgitgadget bot Feb 4, 2020

Choose a reason for hiding this comment

gitgitgadget bot Feb 4, 2020

Choose a reason for hiding this comment

dscho commented Jan 28, 2020 via email

gitgitgadget bot commented Mar 15, 2021

gitgitgadget bot commented Mar 17, 2021

gitgitgadget bot commented Mar 20, 2021

gitgitgadget bot commented Mar 23, 2021

gitgitgadget bot commented Mar 31, 2021

gitgitgadget bot commented Apr 6, 2021

gitgitgadget bot commented Apr 8, 2021

gitgitgadget bot commented Apr 13, 2021

gitgitgadget bot commented Apr 16, 2021

gitgitgadget bot commented Apr 19, 2021

gitgitgadget bot commented Apr 20, 2021

gitgitgadget bot commented Apr 21, 2021

hanwen commented Apr 26, 2021

hanwen commented Jan 23, 2020 •

edited

Loading

hanwen commented Jan 23, 2020 •

edited

Loading