-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Multiple hash algorithms #4438
Multiple hash algorithms #4438
Conversation
ccf7733
to
a4fe12f
Compare
a4fe12f
to
cb79aa1
Compare
Rebased to fix a whole lot of conflicts |
d590a31
to
62690ae
Compare
Todo: remove this weird separation between |
7eafcf9
to
3b4409e
Compare
src/hash.h
Outdated
union { | ||
git_hash_sha1_ctx sha1; | ||
} ctx; | ||
git_hash_algorithm alg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put the algo first ? I'm thinking we might want a GIT_HASH_ALGORITHM_CUSTOM
, and having that appear first would allow a user-provided struct to be overlayed transparently ?
It may be weird, but the goal is to avoid allocations in The current implementation was chosen to be as similar to the existing init / update / finish pattern as possible, just adding a reusable object on top that you need to allocate and free. (I called that object a context, because our constructors are usually called "init" and needed to disambiguate between hash initialization.) Indeed we could make Certainly moving to having a "hash context" that uses the |
Thanks for the clarafication, @ethomson. I didn't realize that
I think that renaming |
3b4409e
to
d72f112
Compare
Rebased to fix conflicts. I've tried to reduce the scope a bit by only doing the split between generic and SHA1-specific hashing layers. I didn't do any API changes at all |
By the way: I noticed that our "Generic" SHA1 implementation doesn't compile right now. This is being fixed in here, too, but it shows that we're not as thorough with testing the different implementations as we maybe should be. Or, alternatively, that we have too many backends and that some aren't used at all. |
d72f112
to
2c3cfcc
Compare
Rebased to fix conflicts |
2c3cfcc
to
30927c0
Compare
30927c0
to
2ee01f8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor, completely ignorable comments. It's pretty unintrusive as well (for now), so 👍.
src/hash.h
Outdated
typedef struct { | ||
void *data; | ||
size_t len; | ||
} git_buf_vec; | ||
|
||
typedef enum { | ||
GIT_HASH_ALGO_UNDEF = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: UNKNOWN
? It's never really undefined, as it's either SHA1 or something we don't know. Semantics 😉.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed
src/hash.h
Outdated
typedef enum { | ||
GIT_HASH_ALGO_UNDEF = 0, | ||
GIT_HASH_ALGO_SHA1, | ||
} git_hash_algo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Remark: we've started adding _t
suffixes to some of these, but I've yet to grasp the rules.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No idea. From a quick glance it looks like _t
prefixes are more common, so I'll just add it
* a Linking Exception. For full terms see the included COPYING file. | ||
*/ | ||
|
||
#ifndef INCLUDE_hash_sha1_collisiondetect_h__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any chance this could be subsumed as "private implementation detail" inside their respective C files (or sha1.h
) ? I don't think we really care about accessing that hashing context, so apart from a struct-sizing/compilation POV, that would reduce the amount of "small files". Do ignore that if you have a compelling reason though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, no. We allocate these structs on the stack, so their size needs to be known at compile time :/ Would've loved that, too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about moving the required include + struct inside the #ifdef cascade in sha1.h
? At least the ones that are 10-20 lines (ie. keep win32.h separate) ? I think I prefer more "complete", straightforward #define blocks vs 3-4 smaller ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd definitely agree if all of those struct declarations were small. But as you correctly point out, the "win32.h" one is a rather complex beast. I do have the goal of refactoring it in the future to internalize more of its implementation, but I'd like to keep that separate from this PR.
So for now, I think we should keep those headers separate to not have a hybrid of everything in "sha1.h" but not "win32.h".
src/hash/sha1/mbedtls.h
Outdated
@@ -5,20 +5,15 @@ | |||
* a Linking Exception. For full terms see the included COPYING file. | |||
*/ | |||
|
|||
#ifndef INCLUDE_hash_mbedtld_h__ | |||
#define INCLUDE_hash_mbedtld_h__ | |||
#ifndef INCLUDE_hash_sha1_mbedtld_h__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mbedtld
🙄.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, in fact I can point at you here :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(the eyesballs were my own blame-acceptance 😉 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot see any smileys, thus I didn't get that :P
2ee01f8
to
362557a
Compare
Fixed comments by @tiennou. Thanks for your review! |
362557a
to
359017b
Compare
Rebased to fix conflicts with #5055 |
The structure `git_hash_prov` is only ever used by the Win32 SHA1 backend. As such, it doesn't make much sense to expose it via the generic "hash.h" header, as it is an implementation detail of the Win32 backend only. Move the typedef of `git_hash_prov` into "hash/sha1/win32.h" to fix this.
The hash source files have circular include dependencies right now, which shows by our broken generic hash implementation. The "hash.h" header declares two functions and the `git_hash_ctx` typedef before actually including the hash backend header and can only declare the remaining hash functions after the include due to possibly static function declarations inside of the implementation includes. Let's break this cycle and help maintainability by creating a real implementation file for each of the hash implementations. Instead of relying on the exact include order, we now especially avoid the use of `GIT_INLINE` for function declarations.
359017b
to
b7187ed
Compare
Rebased to fix SHA1DC breakage |
As we will include additional hash algorithms in the future due to upstream git discussing a move away from SHA1, we should accomodate for that and prepare for the move. As a first step, move all SHA1 implementations into a common subdirectory. Also, create a SHA1-specific header file that lives inside the hash folder. This header will contain the SHA1-specific header includes, function declarations and the SHA1 context structure.
As a preparatory step to allow multiple hashing APIs to exist at the same time, split the hashing functions into one layer for generic hashing and one layer for SHA1-specific hashing. Right now, this is simply an additional indirection layer that doesn't yet serve any purpose. In the future, the generic API will be extended to allow for choosing which hash to use, though, by simply passing an enum to the hash context initialization function. This is necessary as a first step to be ready for Git's move to SHA256.
Create a separate `git_hash_sha1_ctx` structure that is specific to the SHA1 implementation and move all SHA1 functions over to use that one instead of the generic `git_hash_ctx`. The `git_hash_ctx` for now simply has a union containing this single SHA1 implementation, only, without any mechanism to distinguish between different algortihms.
Create an enum that allows us to distinguish between different hashing algorithms. This enum is embedded into each `git_hash_ctx` and will instruct the code to which hashing function the particular request shall be dispatched. As we do not yet have multiple hashing algorithms, we simply initialize the hash algorithm to always be SHA1. At a later point, we will have to extend the `git_hash_init_ctx` function to get as parameter which algorithm shall be used.
I'm just going ahead now and merge this. It's been open for a rather long time, and I finally want to continue working on this. There's still the one commen from @tiennou about inlining struct decls into "sha1.h", but we might as well tackle that in a future iteration. |
A first step towards an abstraction for the use of multiple of hash algorithms. It lays the groundwork for having
git_hash_ctx
dispatch to different hash algorithms based on which one is being requested. Next steps:git_repository
by agit_hash_algorithm
fieldgit_hash_init
andgit_hash_ctx_init
to accept an algorithm which is then used for hashing by that particular context only