Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sha1-file: split OBJECT_INFO_FOR_PREFETCH #228

Closed

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented May 28, 2019

I found this interaction while testing the VFS for Git fork rebasing onto v2.22.0-rc1 [1]. It seems this new flag meant for partial clone prefetches interacts poorly with the read-object hook we use in our fork.

The issue is that OBJECT_INFO_FOR_PREFETCH has multiple bits on, so testing "flag & OBJECT_INFO_FOR_PREFETCH" can be true even if not all bits are on. My fix simply splits the new bit into a special flag while keeping OBJECT_INFO_FOR_PREFETCH as a union of flags.

Jury is out if this fixes our problem, but it definitely seems like a bug waiting to happen in Git, too.

Thanks,
-Stolee

[1] microsoft#140

Cc: jonathantanmy@google.com

The OBJECT_INFO_FOR_PREFETCH bitflag was added to sha1-file.c in 0f4a4fb
(sha1-file: support OBJECT_INFO_FOR_PREFETCH, 2019-03-29) and is used to
prevent the fetch_objects() method when enabled.

However, there is a problem with the current use. The definition of
OBJECT_INFO_FOR_PREFETCH is given by adding 32 to OBJECT_INFO_QUICK. This is
clearly stated above the definition (in a comment) that this is so
OBJECT_INFO_FOR_PREFETCH implies OBJECT_INFO_QUICK. The problem is that using
"flag & OBJECT_INFO_FOR_PREFETCH" means that OBJECT_INFO_QUICK also implies
OBJECT_INFO_FOR_PREFETCH.

Split out the single bit from OBJECT_INFO_FOR_PREFETCH into a new
OBJECT_INFO_SKIP_FETCH_OBJECT as the single bit and keep
OBJECT_INFO_FOR_PREFETCH as the union of two flags. This allows a clearer use
of flag checking while also keeping the implication of OBJECT_INFO_QUICK.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
@derrickstolee
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented May 28, 2019

Submitted as pull.228.git.gitgitgadget@gmail.com

@@ -282,10 +282,14 @@ struct object_info {
#define OBJECT_INFO_IGNORE_LOOSE 16
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Derrick Stolee <dstolee@microsoft.com>
>
> The OBJECT_INFO_FOR_PREFETCH bitflag was added to sha1-file.c in 0f4a4fb1
> (sha1-file: support OBJECT_INFO_FOR_PREFETCH, 2019-03-29) and is used to
> prevent the fetch_objects() method when enabled.
>
> However, there is a problem with the current use. The definition of
> OBJECT_INFO_FOR_PREFETCH is given by adding 32 to OBJECT_INFO_QUICK. This is
> clearly stated above the definition (in a comment) that this is so
> OBJECT_INFO_FOR_PREFETCH implies OBJECT_INFO_QUICK. The problem is that using
> "flag & OBJECT_INFO_FOR_PREFETCH" means that OBJECT_INFO_QUICK also implies
> OBJECT_INFO_FOR_PREFETCH.

So the right test to see prefetch is in effect is not

	if (!!(flag & TWO_BITS))

but to use

	if ((flag & TWO_BITS) == TWO_BITS)

instead?

> Split out the single bit from OBJECT_INFO_FOR_PREFETCH into a new
> OBJECT_INFO_SKIP_FETCH_OBJECT as the single bit and keep
> OBJECT_INFO_FOR_PREFETCH as the union of two flags. This allows a clearer use
> of flag checking while also keeping the implication of OBJECT_INFO_QUICK.

OK, I guess that would work and with far less damage to the existing
code.

>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  object-store.h | 10 +++++++---
>  sha1-file.c    |  2 +-
>  2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/object-store.h b/object-store.h
> index dd3f9b75f0..c90628d839 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -282,10 +282,14 @@ struct object_info {
>  #define OBJECT_INFO_IGNORE_LOOSE 16
>  /*
>   * Do not attempt to fetch the object if missing (even if fetch_is_missing is
> - * nonzero). This is meant for bulk prefetching of missing blobs in a partial
> - * clone. Implies OBJECT_INFO_QUICK.
> + * nonzero).
>   */
> -#define OBJECT_INFO_FOR_PREFETCH (32 + OBJECT_INFO_QUICK)
> +#define OBJECT_INFO_SKIP_FETCH_OBJECT 32
> +/*
> + * This is meant for bulk prefetching of missing blobs in a partial
> + * clone. Implies OBJECT_INFO_SKIP_FETCH_OBJECT and OBJECT_INFO_QUICK
> + */
> +#define OBJECT_INFO_FOR_PREFETCH (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)
>  
>  int oid_object_info_extended(struct repository *r,
>  			     const struct object_id *,
> diff --git a/sha1-file.c b/sha1-file.c
> index ad02649124..0299fdd516 100644
> --- a/sha1-file.c
> +++ b/sha1-file.c
> @@ -1371,7 +1371,7 @@ int oid_object_info_extended(struct repository *r, const struct object_id *oid,
>  		/* Check if it is a missing object */
>  		if (fetch_if_missing && repository_format_partial_clone &&
>  		    !already_retried && r == the_repository &&
> -		    !(flags & OBJECT_INFO_FOR_PREFETCH)) {
> +		    !(flags & OBJECT_INFO_SKIP_FETCH_OBJECT)) {
>  			/*
>  			 * TODO Investigate having fetch_object() return
>  			 * TODO error/success and stopping the music here.

@@ -282,10 +282,14 @@ struct object_info {
#define OBJECT_INFO_IGNORE_LOOSE 16
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Tue, May 28, 2019 at 08:19:07AM -0700, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The OBJECT_INFO_FOR_PREFETCH bitflag was added to sha1-file.c in 0f4a4fb1
> (sha1-file: support OBJECT_INFO_FOR_PREFETCH, 2019-03-29) and is used to
> prevent the fetch_objects() method when enabled.
> 
> However, there is a problem with the current use. The definition of
> OBJECT_INFO_FOR_PREFETCH is given by adding 32 to OBJECT_INFO_QUICK. This is
> clearly stated above the definition (in a comment) that this is so
> OBJECT_INFO_FOR_PREFETCH implies OBJECT_INFO_QUICK. The problem is that using
> "flag & OBJECT_INFO_FOR_PREFETCH" means that OBJECT_INFO_QUICK also implies
> OBJECT_INFO_FOR_PREFETCH.
> 
> Split out the single bit from OBJECT_INFO_FOR_PREFETCH into a new
> OBJECT_INFO_SKIP_FETCH_OBJECT as the single bit and keep
> OBJECT_INFO_FOR_PREFETCH as the union of two flags. This allows a clearer use
> of flag checking while also keeping the implication of OBJECT_INFO_QUICK.

Oof. I actually suggested splitting these up for review, but thought it
was only a clarity/flexibility issue, and completely missed the
correctness aspect of checking when the bit is set.

I agree with Junio's other response that using "==" would be the right
way for a multi-bit check, in general. But I like the split here,
because I think the result is more clear to read and harder to get
wrong for future checks.

I'd even go so far as to say...

> + * This is meant for bulk prefetching of missing blobs in a partial
> + * clone. Implies OBJECT_INFO_SKIP_FETCH_OBJECT and OBJECT_INFO_QUICK
> + */
> +#define OBJECT_INFO_FOR_PREFETCH (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)

we could dump this, and callers should just say what they mean (i.e.,
specify both flags).

There are only two of them, and I think both would be more readable with
a helper more like:

  int should_prefetch_object(struct repository *r,
                             const struct object_id *oid) {
	return !oid_object_info_extended(r, oid, NULL,
	                                 OBJECT_INFO_SKIP_FETCH_OBJECT |
					 OBJECT_INFO_QUICK);
  }

but unless everybody is immediately on-board with "yes, that is much
nicer", I don't want bikeshedding to hold up your important and
obviously-correct fix.

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 5/28/2019 4:54 PM, Jeff King wrote:
> On Tue, May 28, 2019 at 08:19:07AM -0700, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <dstolee@microsoft.com>
>>
>> The OBJECT_INFO_FOR_PREFETCH bitflag was added to sha1-file.c in 0f4a4fb1
>> (sha1-file: support OBJECT_INFO_FOR_PREFETCH, 2019-03-29) and is used to
>> prevent the fetch_objects() method when enabled.
>>
>> However, there is a problem with the current use. The definition of
>> OBJECT_INFO_FOR_PREFETCH is given by adding 32 to OBJECT_INFO_QUICK. This is
>> clearly stated above the definition (in a comment) that this is so
>> OBJECT_INFO_FOR_PREFETCH implies OBJECT_INFO_QUICK. The problem is that using
>> "flag & OBJECT_INFO_FOR_PREFETCH" means that OBJECT_INFO_QUICK also implies
>> OBJECT_INFO_FOR_PREFETCH.
>>
>> Split out the single bit from OBJECT_INFO_FOR_PREFETCH into a new
>> OBJECT_INFO_SKIP_FETCH_OBJECT as the single bit and keep
>> OBJECT_INFO_FOR_PREFETCH as the union of two flags. This allows a clearer use
>> of flag checking while also keeping the implication of OBJECT_INFO_QUICK.
> 
> Oof. I actually suggested splitting these up for review, but thought it
> was only a clarity/flexibility issue, and completely missed the
> correctness aspect of checking when the bit is set.
> 
> I agree with Junio's other response that using "==" would be the right
> way for a multi-bit check, in general. But I like the split here,
> because I think the result is more clear to read and harder to get
> wrong for future checks.

Thanks, for the feedback, both of you.

> I'd even go so far as to say...
> 
>> + * This is meant for bulk prefetching of missing blobs in a partial
>> + * clone. Implies OBJECT_INFO_SKIP_FETCH_OBJECT and OBJECT_INFO_QUICK
>> + */
>> +#define OBJECT_INFO_FOR_PREFETCH (OBJECT_INFO_SKIP_FETCH_OBJECT | OBJECT_INFO_QUICK)
> 
> we could dump this, and callers should just say what they mean (i.e.,
> specify both flags).

Dropping the _PREFETCH flag also makes oid_object_info_extended() slightly
less "coupled" to the prefetch feature, and instead describes more explicitly
the way the flag is changing the behavior of the method.
 
> There are only two of them, and I think both would be more readable with
> a helper more like:
> 
>   int should_prefetch_object(struct repository *r,
>                              const struct object_id *oid) {
> 	return !oid_object_info_extended(r, oid, NULL,
> 	                                 OBJECT_INFO_SKIP_FETCH_OBJECT |
> 					 OBJECT_INFO_QUICK);
>   }
> 
> but unless everybody is immediately on-board with "yes, that is much
> nicer", I don't want bikeshedding to hold up your important and
> obviously-correct fix.

I'll come back with another series to drop the _PREFETCH flag after the
release calms down. It can give more time for others to chime in here.

Thanks, Junio for the quick turnaround in taking the patch.

Thanks,
-Stolee

@gitgitgadget
Copy link

gitgitgadget bot commented May 28, 2019

This branch is now known as ds/object-info-for-prefetch-fix.

@gitgitgadget
Copy link

gitgitgadget bot commented May 28, 2019

This patch series was integrated into pu via git@fc46a92.

@gitgitgadget gitgitgadget bot added the pu label May 28, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented May 29, 2019

This patch series was integrated into pu via git@0d0baa0.

@gitgitgadget
Copy link

gitgitgadget bot commented May 29, 2019

This patch series was integrated into pu via git@204a6c5.

@gitgitgadget
Copy link

gitgitgadget bot commented May 29, 2019

This patch series was integrated into next via git@ee0055b.

@gitgitgadget gitgitgadget bot added the next label May 29, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented May 30, 2019

This patch series was integrated into pu via git@1ac55e5.

@gitgitgadget
Copy link

gitgitgadget bot commented May 30, 2019

This patch series was integrated into pu via git@77f1162.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 3, 2019

This patch series was integrated into pu via git@0ddd52f.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 6, 2019

This patch series was integrated into pu via git@9d57b99.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 7, 2019

This patch series was integrated into pu via git@d13b851.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 10, 2019

This patch series was integrated into pu via git@823e800.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 12, 2019

This patch series was integrated into pu via git@f1c28e6.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 13, 2019

This patch series was integrated into pu via git@ce29cf8.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 14, 2019

This patch series was integrated into pu via git@ee9457f.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 18, 2019

This patch series was integrated into pu via git@5d5c46b.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 18, 2019

This patch series was integrated into next via git@5d5c46b.

@gitgitgadget
Copy link

gitgitgadget bot commented Jun 18, 2019

This patch series was integrated into master via git@5d5c46b.

@gitgitgadget gitgitgadget bot added the master label Jun 18, 2019
@gitgitgadget
Copy link

gitgitgadget bot commented Jun 18, 2019

Closed via 5d5c46b.

@gitgitgadget gitgitgadget bot closed this Jun 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant