New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minimal patch series to fix the CI runs of hn/reftable
#623
Conversation
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This happens implicitly in the files/packed ref backend; making it explicit simplifies adding alternate ref storage backends, such as reftable. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
This prepares for supporting the reftable format, which will want create its own file system layout in .git Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Shawn Pearce explains: Some repositories contain a lot of references (e.g. android at 866k, rails at 31k). The reftable format provides: - Near constant time lookup for any single reference, even when the repository is cold and not in process or kernel cache. - Near constant time verification a SHA-1 is referred to by at least one reference (for allow-tip-sha1-in-want). - Efficient lookup of an entire namespace, such as `refs/tags/`. - Support atomic push `O(size_of_update)` operations. - Combine reflog storage with ref storage. This file format spec was originally written in July, 2017 by Shawn Pearce. Some refinements since then were made by Shawn and by Han-Wen Nienhuys based on experiences implementing and experimenting with the format. (All of this was in the context of our work at Google and Google is happy to contribute the result to the Git project.) Imported from JGit[1]'s current version (c217d33ff, "Documentation/technical/reftable: improve repo layout", 2020-02-04) of Documentation/technical/reftable.md and converted to asciidoc by running pandoc -t asciidoc -f markdown reftable.md >reftable.txt using pandoc 2.2.1. The result required the following additional minor changes: - removed the [TOC] directive to add a table of contents, since asciidoc does not support it - replaced git-scm.com/docs links with linkgit: directives that link to other pages within Git's documentation [1] https://eclipse.googlesource.com/jgit/jgit Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The format allows for some ambiguity, as a lone footer also starts with a valid file header. However, the current JGit code will barf on this. This commit codifies this behavior into the standard. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reftable is a format for storing the ref database. Its rationale and specification is in the preceding commit. This imports the upstream library as one big commit. For understanding the code, it is suggested to read the code in the following order: * The specification under Documentation/technical/reftable.txt * reftable.h - the public API * record.{c,h} - reading and writing records * block.{c,h} - reading and writing blocks. * writer.{c,h} - writing a complete reftable file. * merged.{c,h} and pq.{c,h} - reading a stack of reftables * stack.{c,h} - writing and compacting stacks of reftable on the filesystem. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
For background, see the previous commit introducing the library. TODO: * Resolve spots marked with XXX * Detect and prevent directory/file conflicts in naming. * Support worktrees (t0002-gitfile "linked repo" testcase) Example use: see t/t0031-reftable.sh Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
* Add GIT_TEST_REFTABLE environment var to control default ref storage * Add test_prerequisite REFTABLE. Skip t/t3210-pack-refs.sh for REFTABLE. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reading and writing .git/refs/* assumes that refs are stored in the 'files' ref backend. Signed-off-by: Han-Wen Nienhuys <hanwen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Close files before trying to unlink them. This is actually required on Windows. Note: this is probably not the only part of the code that requires this type of fix, but it seems to be enough to let the _current_ version of t0031 pass. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Do close the file descriptors to obsolete files before trying to delete them. This is actually required on Windows. Note: this patch is probably a bit heavy-handed, releasing more than necessary (which will then have to be re-read). But it seems to be enough to let t0031 pass on Windows, so it is at least a clue as to what the fix will look like eventually. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
A `void` function cannot return a value. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
`usleep()` is unportable. We need to find a way _not_ to use it. For Visual C, we can use `sleep_millisec()`, but the current design of libreftable seems to be _really_ keen _not_ to depend on anything in libgit.a. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Yet another instance of `= {}` initialization. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This allows Git to be compiled via Visual Studio again after integrating the `hn/reftable` branch. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
/submit |
Submitted as pull.623.git.1588599086.gitgitgadget@gmail.com |
On the Git mailing list, Junio C Hamano wrote (reply to this):
|
On the Git mailing list, Han-Wen Nienhuys wrote (reply to this):
|
@@ -700,7 +700,7 @@ vcxproj: | |||
# Make .vcxproj files and add them |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> This allows Git to be compiled via Visual Studio again after integrating
> the `hn/reftable` branch.
>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
> config.mak.uname | 2 +-
> contrib/buildsystems/Generators/Vcxproj.pm | 11 ++++++++++-
> 2 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/config.mak.uname b/config.mak.uname
> index 0ab8e009383..8a01a0da3f1 100644
> --- a/config.mak.uname
> +++ b/config.mak.uname
> @@ -700,7 +700,7 @@ vcxproj:
> # Make .vcxproj files and add them
> unset QUIET_GEN QUIET_BUILT_IN; \
> perl contrib/buildsystems/generate -g Vcxproj
> - git add -f git.sln {*,*/lib,t/helper/*}/*.vcxproj
> + git add -f git.sln {*,*/lib,*/libreftable,t/helper/*}/*.vcxproj
>
> # Generate the LinkOrCopyBuiltins.targets and LinkOrCopyRemoteHttp.targets file
> (echo '<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">' && \
> diff --git a/contrib/buildsystems/Generators/Vcxproj.pm b/contrib/buildsystems/Generators/Vcxproj.pm
> index 5c666f9ac03..33a08d31652 100644
> --- a/contrib/buildsystems/Generators/Vcxproj.pm
> +++ b/contrib/buildsystems/Generators/Vcxproj.pm
> @@ -77,7 +77,7 @@ sub createProject {
> my $libs_release = "\n ";
> my $libs_debug = "\n ";
> if (!$static_library) {
> - $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
> + $libs_release = join(";", sort(grep /^(?!libgit\.lib|xdiff\/lib\.lib|vcs-svn\/lib\.lib|reftable\/libreftable\.lib)/, @{$$build_structure{"$prefix${name}_LIBS"}}));
> $libs_debug = $libs_release;
> $libs_debug =~ s/zlib\.lib/zlibd\.lib/g;
> $libs_debug =~ s/libcurl\.lib/libcurl-d\.lib/g;
> @@ -231,6 +231,7 @@ sub createProject {
> EOM
> if (!$static_library || $target =~ 'vcs-svn' || $target =~ 'xdiff') {
> my $uuid_libgit = $$build_structure{"LIBS_libgit_GUID"};
> + my $uuid_libreftable = $$build_structure{"LIBS_reftable/libreftable_GUID"};
> my $uuid_xdiff_lib = $$build_structure{"LIBS_xdiff/lib_GUID"};
>
> print F << "EOM";
> @@ -240,6 +241,14 @@ sub createProject {
> <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
> </ProjectReference>
> EOM
> + if (!($name =~ /xdiff|libreftable/)) {
Wow. That is messy. I find it somewhat funny that this is not just
libreftable, i.e. "don't include/refer to libreftable in itself",
but as long as it works, I won't complain ;-)
Thanks for helping the tip of 'pu' moving forward.
> + print F << "EOM";
> + <ProjectReference Include="$cdup\\reftable\\libreftable\\libreftable.vcxproj">
> + <Project>$uuid_libreftable</Project>
> + <ReferenceOutputAssembly>false</ReferenceOutputAssembly>
> + </ProjectReference>
> +EOM
> + }
> if (!($name =~ 'xdiff')) {
> print F << "EOM";
> <ProjectReference Include="$cdup\\xdiff\\lib\\xdiff_lib.vcxproj">
This patch series was integrated into pu via git@7a06b4b. |
This patch series was integrated into pu via git@82b6f41. |
On the Git mailing list, Johannes Schindelin wrote (reply to this):
|
On the Git mailing list, Johannes Schindelin wrote (reply to this):
|
This patch series was integrated into pu via git@3163555. |
This patch series was integrated into pu via git@1699668. |
This patch series was integrated into pu via git@06f931f. |
This patch series was integrated into pu via git@eda5e82. |
This patch series was integrated into pu via git@9f480bd. |
This patch series was integrated into pu via git@cb6edfd. |
This patch series was integrated into pu via git@a774050. |
I'll close this as some of it was accepted, and I do not plan on working on this anymore. |
These patches are not intended to be complete, not by any stretch of imagination. They are just enough to get the CI run to pass, even in the Windows-specific parts.
As I mentioned elsewhere, I would much prefer for
hn/reftable
to not re-inventget_be*()
,struct strbuf
,struct string_list
,struct lock_file
etc.However, in the context of the test failures, I do not think that this would have made a big difference. Apart from the unportable constructs, and from the "delete/rename while there is still a handle on the file" issues, it would appear that one big reason why it was so hard to debug and fix the test is the recursive complexity of the data structures.
To elaborate on that:
struct reftable_stack
has an attribute calledmerged
that has an array ofstruct reftable_reader *
(confusingly called "stack"). Each of these readers has an attribute of typestruct reftable_block_source
that uses a vtable and an opaque pointer to abstract three types of block sources: file, slice (which is apparently unused, i.e. it is apparently just dead weight for now) and malloc.I am not sure that this abstraction fest serves us well here.
Quite honestly, I would have expected the
packed_git
data structure to be imitated more closely, as it strikes me as similar in purpose, and it has seen a ton of testing over the past decade and a half. I could not recognize that design in the reftable, though.It is quite obvious, of course, that the code tries to imitate the object-oriented nature of the Go code, but it seems quite obvious from my difficulties to address the problem where
stack_compact_range()
would try to delete stale reftable files (without even so much as a warning when the files could not be deleted!) without releasing all file handles to all reftable files, even the ones that do not need to be deleted. To be smarter about this, the code has to know more about the nature of the block source than the abstraction layer suggests. It has to know that a block source refers to a file, and that that file is marked for deletion. My heavy-handed work-around, even if it works, is not really a good solution, but it is a testament that there is a lot of opportunity to simplify the code drastically while at the same time simplifying the design, too.I know you have been putting a lot of effort into this library, so I feel a bit bad about saying the following: The
hn/reftable
patches will need substantial work before we can merge it with confidence. Part of the reason why it is so hard to review the reftable patches is that they intentionally refuse to integrate well within Git's source code, such as (re-)implementing its own fundamental data structures, intentionally using a totally different coding style, and it concerns me that the stated requirement for bug fixes is to treat Git's source code as a downstream of the actual project. I am not too bad a C developer and would consider myself competent in debugging issues, even hard ones, in Git, and yet... it was really hard to wade through the layers of abstraction to figure out where the file handles should be closed that were opened and prevented deleting/renaming files.At this point, I don't feel that it makes sense to keep insisting on having this in a separate library. The only other user of that library will be libgit2, and I really think that given libgit2's heritage, it won't be a problem to adapt the code after it stabilized in git.git (and since libgit2 treats git.git as upstream for the libxdiff code, it won't be a problem to do the same for the reftable code, too). I believe that the best course of action is to reuse the data structures libgit.a provides, and to delete the re-implementations in
reftable/
. Only then can we start working effectively on refactoring the code to simplify the data structures in order to clarify resource usage (which was the root cause for the bugs I fixed, although I am sure that there are way more lurking in there, hidden by the fact that the code is not covered thoroughly by our tests).Cc: Han-Wen Nienhuys hanwen@google.com