Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

object.c: localize global the_repository variable into r #545

Open
wants to merge 5 commits into
base: maint
Choose a base branch
from

Conversation

ParthGala2k
Copy link

@ParthGala2k ParthGala2k commented Feb 3, 2020

I have created this commit in response to #379 (comment).
All the functions in object.c which relied on 'the_repository' have been modified.
The functions calling these functions in object.c consisted calls to other functions
using 'the_repository' as well, and although I intended to use 'r' at all those instances,
I thought it would make more sense when we would deal with their callee functions
in another similar patch.
What do you think ?

Signed-off-by: Parth Gala parthpgala@gmail.com

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 3, 2020

Welcome to GitGitGadget

Hi @ParthGala2k, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that your Pull Request has a good description, as it will be used as cover letter.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "commit:", and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the FreeNode IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Freenode. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

@dscho
Copy link
Member

dscho commented Feb 3, 2020

/allow

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 3, 2020

User ParthGala2k is now allowed to use GitGitGadget.

WARNING: ParthGala2k has no public email address set on GitHub

dscho
dscho previously requested changes Feb 3, 2020
Copy link
Member

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that quite a few tests fail. Just to pick one at semi-random (I like to look for failures that happen early in the test script, to make it quicker to reproduce, i.e. I look for small YY in the reported tXXXX.YY), I see that t5614.4 fails thusly:

[...]
expecting success of 5614.4 'shallow clone does not imply shallow submodule': 
	test_when_finished "rm -rf super_clone" &&
	git clone --recurse-submodules --depth 2 "file://$pwd/." super_clone &&
	git -C super_clone log --oneline >lines &&
	test_line_count = 2 lines &&
	git -C super_clone/sub log --oneline >lines &&
	test_line_count = 3 lines

+ test_when_finished rm -rf super_clone
+ test 0 = 0
+ test_cleanup={ rm -rf super_clone
		} && (exit "$eval_ret"); eval_ret=$?; :
+ git clone --recurse-submodules --depth 2 file:///home/vsts/work/1/s/t/trash directory.t5614-clone-submodules-shallow/. super_clone
Cloning into 'super_clone'...
Submodule 'sub' (file:///home/vsts/work/1/s/t/trash directory.t5614-clone-submodules-shallow/sub) registered for path 'sub'
Cloning into '/home/vsts/work/1/s/t/trash directory.t5614-clone-submodules-shallow/super_clone/sub'...
error: index-pack died of signal 11
fatal: index-pack failed
fatal: clone of 'file:///home/vsts/work/1/s/t/trash directory.t5614-clone-submodules-shallow/sub' into submodule path '/home/vsts/work/1/s/t/trash directory.t5614-clone-submodules-shallow/super_clone/sub' failed
Failed to clone 'sub'. Retry scheduled
Cloning into '/home/vsts/work/1/s/t/trash directory.t5614-clone-submodules-shallow/super_clone/sub'...
error: index-pack died of signal 11
fatal: index-pack failed
fatal: clone of 'file:///home/vsts/work/1/s/t/trash directory.t5614-clone-submodules-shallow/sub' into submodule path '/home/vsts/work/1/s/t/trash directory.t5614-clone-submodules-shallow/super_clone/sub' failed
Failed to clone 'sub' a second time, aborting
error: last command exited with $?=1
+ rm -rf super_clone
+ exit 1
+ eval_ret=1
+ :

The "signal 11" indicates a segmentation violation, most typically this means that a NULL is being dereferenced.

The reason is easy to see: those local variables repo = NULL should be r = the_repository instead.

builtin/fsck.c Outdated
@@ -375,6 +375,7 @@ static void check_object(struct object *obj)
static void check_connectivity(void)
{
int i, max;
struct repository *repo = NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we settled for the convention to call this r instead of repo.

Besides, initializing this to NULL is incorrect, it should be initialized to the_repository (so as to not introduce any functional changes, the code should behave identically before and after your patch).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will make that change.

@@ -375,6 +375,7 @@ static void check_object(struct object *obj)
static void check_connectivity(void)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at object.c after this patch, it sounds as if the commit message oversells it a bit: the following functions still use the_repository:

  • lookup_unknown_object()
  • parse_object_or_die()
  • clear_object_flags() and
  • clear_commit_marks_all()

Note: I am not saying that you should modify all of those in the same commit. Instead, I would propose to structure this PR into 6 commits, one per function in object.c that needs to take an r parameter. Each commit will then add that parameter to only one function and adjust all callers of said function, either by using an already-available r variable, or by introducing such a variable and initializing it to the_repository (these callers, unless they are in builtin/*.c, will have to be modified later to take an r parameter instead of the local variable).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I failed to notice the

  • clear_object_flags()
  • clear_commit_marks_all()

functions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be wise to make a single commit for functions get_max_object_index and get_indexed_objects given they have a complete overlap in terms of the functions calling them?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do as you wish, of course, but yes, my suggestion was to structure this by function whose signature is modified.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes look good now, even if I would have preferred individual commits for the two functions (it is just so much easier to review that way).

Having said that, I think the commit message needs some work. Have you looked through the commit history (e.g. via git log -G'struct repository \*r') to see how previous commit messages regarding similar changes were written?

Copy link
Author

@ParthGala2k ParthGala2k Feb 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you looked through the commit history (e.g. via git log -G'struct repository \*r')

I did check previous such commits. Probably my messages need to be more comprehensive in describing their change. But since my commits are essentially doing the same stuff for all the different functions, do I have to keep repeating that message in every commit. Can you please elaborate how I can improve the messages ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not mean to imply that you should repeat a whole lot of text in all the commit messages. To the contrary, I wanted to point out that our commit messages try not to repeat what the associated diff already makes clear. In other words, they focus on the "why?" (unless obvious) over the "how?".

In that light, I would say that the following paragraph is even a bit verbose ;-)

'lookup_unknown_object()' and its callers are modified to enable
passing 'r' as an argument to 'lookup_unknown_object()' in an
effort to reduce dependence on global 'the_repository' variable.

Now, let's compare that to the (very short, maybe even too short) commit message of 90d3405:

match-trees.c: remove the_repo from shift_tree*()

My preference would be somewhere in the middle ;-) But then, my preference is also to do this one function at a time (i.e. change one function's signature per commit, adjusting all the callers in the same commit but nothing more), and as you see, the above-mentioned commit does not necessarily follow that rule.

Meaning: you have a lot of leeway how you want to structure and write your commits and commit messages.

You could even /submit right now 😄

@dscho
Copy link
Member

dscho commented Feb 3, 2020

Don't merge this in yet.

No worries at all: the Git project does not accept Pull Requests on GitHub. You can use this Pull Request to contribute a corresponding patch series to the Git mailing list, though, as instructed. I would strongly recommend to fix the test failures first, though, by making the suggested changes, amending and splitting the commit, and force-pushing.

@dscho
Copy link
Member

dscho commented Feb 3, 2020

Please also note that "to localize" means something different: it means to provide translateable messages (see https://github.com/git/git/blob/v2.25.0/po/README).

A better way to express this might be found by looking through Git's own commit history, searching for commits that already performed a similar cleanup (git log -Gthe_repository should get you going).

@ParthGala2k
Copy link
Author

ParthGala2k commented Feb 8, 2020

What needs to be done about these tests that are failing @dscho ?

@dscho
Copy link
Member

dscho commented Feb 9, 2020

What needs to be done about these tests that are failing @dscho ?

The ones in the "Documentation" job? They seem to be caused by rubygems/rubygems#2984 not quite trickling down to a release of "rubygems" yet...

@dscho
Copy link
Member

dscho commented Feb 9, 2020

I modified the PR build definition. Please also compare to git#707, which I intend to contribute to fix the CI builds of the integration branches, too.

@dscho dscho dismissed their stale review February 11, 2020 23:20

My concerns have mostly been addressed

@ParthGala2k
Copy link
Author

/preview

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 12, 2020

Preview email sent as pull.545.git.1581534649.gitgitgadget@gmail.com

@ParthGala2k
Copy link
Author

/submit

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 12, 2020

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 12, 2020

On the Git mailing list, Taylor Blau wrote (reply to this):

Hi Parth,

On Wed, Feb 12, 2020 at 07:19:06PM +0000, Parth Gala via GitGitGadget wrote:
> I have created this commit in response to
> https://github.com/gitgitgadget/git/issues/379#issue-503866117.

Fantastic! Thank you for working on this.

> All the functions in object.c which relied on 'the_repository' have
> been modified.  The functions calling these functions in object.c
> consisted calls to other functions using 'the_repository' as well, and
> although I intended to use 'r' at all those instances, I thought it
> would make more sense when we would deal with their callee functions
> in another similar patch. What do you think ?

That makes sense, and follows the conventions that other similar
refactorings have done in the past.

Any reason why you decided to start with 'object.c'? Not that any such
reason is necessary, I'm just curious about how you came to this
decision.

> Signed-off-by: Parth Gala parthpgala@gmail.com [parthpgala@gmail.com]

Even though you *do* need a 'Signed-off-by' in each of your patches,
adding it in the cover letter is not necessary.

> Parth Gala (5):
>   object.c: get_max_object_index and get_indexed_object accept 'r'
>     parameter
>   object.c: lookup_unknown_object() accept 'r' as parameter
>   object.c: parse_object_or_die() accept 'r' as parameter
>   object.c: clear_object_flags() accept 'r' as parameter
>   object.c: clear_commit_marks_all() accept 'r' as parameter
>
>  builtin/checkout.c               |  3 ++-
>  builtin/fsck.c                   |  8 +++++---
>  builtin/grep.c                   |  6 ++++--
>  builtin/index-pack.c             |  5 +++--
>  builtin/log.c                    |  3 ++-
>  builtin/name-rev.c               |  5 +++--
>  builtin/pack-objects.c           |  3 ++-
>  builtin/prune.c                  |  3 ++-
>  bundle.c                         |  8 +++++---
>  http-push.c                      |  3 ++-
>  object.c                         | 32 ++++++++++++++++----------------
>  object.h                         | 12 ++++++------
>  pack-bitmap.c                    |  5 +++--
>  reachable.c                      |  6 ++++--
>  refs.c                           |  3 ++-
>  revision.c                       |  3 ++-
>  shallow.c                        | 13 ++++++++-----
>  t/helper/test-example-decorate.c |  7 ++++---
>  upload-pack.c                    | 19 ++++++++++++-------
>  walker.c                         |  3 ++-
>  20 files changed, 89 insertions(+), 61 deletions(-)
>
>
> base-commit: 0ad714499976290d9a0229230cbe4efae930b8dc
> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-545%2FParthGala2k%2Flocalize-the_repository-variable-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-545/ParthGala2k/localize-the_repository-variable-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/545
> --
> gitgitgadget

Thanks,
Taylor

@@ -375,6 +375,7 @@ static void check_object(struct object *obj)
static void check_connectivity(void)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Wed, Feb 12, 2020 at 07:19:07PM +0000, Parth Gala via GitGitGadget wrote:
> From: Parth Gala <parthpgala@gmail.com>
>
> Currently the two functions use global variable 'the_repository' to access
> the values stored in it. This makes 'the_repository' to be existent even
> when not required.

Nit: please wrap your commit messages at 72 characters instead of 80.

> This commit replaces it with 'r' which is passed as a parameter to those
> functions

Makes sense.

> Signed-off-by: Parth Gala <parthpgala@gmail.com>
> ---
>  builtin/fsck.c       |  5 +++--
>  builtin/index-pack.c |  5 +++--
>  builtin/name-rev.c   |  5 +++--
>  object.c             |  8 ++++----
>  object.h             |  4 ++--
>  shallow.c            | 10 ++++++----
>  upload-pack.c        | 10 ++++++----
>  7 files changed, 27 insertions(+), 20 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 8d13794b14..d2b4336f7e 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -375,6 +375,7 @@ static void check_object(struct object *obj)
>  static void check_connectivity(void)
>  {
>  	int i, max;
> +	struct repository *r = the_repository;

Is there a reason that you assign use/assign 'r' here? I can find a few
other such instances of it in:

  $ git grep -l 'struct repository \*r = '
  builtin/merge-tree.c
  builtin/sparse-checkout.c
  builtin/update-index.c
  fetch-pack.c
  read-cache.c
  t/helper/test-reach.c
  tree.c

but I'm not sure that it's necessary here. Could you instead pass
'the_repository' directly to the functions that now require it?

>  	/* Traverse the pending reachable objects */
>  	traverse_reachable();
> @@ -400,12 +401,12 @@ static void check_connectivity(void)
>  	}
>
>  	/* Look up all the requirements, warn about missing objects.. */
> -	max = get_max_object_index();
> +	max = get_max_object_index(r);

For example, changing this line to:

  max = get_max_object_index(the_repository);

>  	if (verbose)
>  		fprintf_ln(stderr, _("Checking connectivity (%d objects)"), max);
>
>  	for (i = 0; i < max; i++) {
> -		struct object *obj = get_indexed_object(i);
> +		struct object *obj = get_indexed_object(r, i);
>
>  		if (obj)
>  			check_object(obj);
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 60a5591039..d2115535bc 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -218,14 +218,15 @@ static unsigned check_object(struct object *obj)
>  static unsigned check_objects(void)
>  {
>  	unsigned i, max, foreign_nr = 0;
> +	struct repository *r = the_repository;
>
> -	max = get_max_object_index();
> +	max = get_max_object_index(r);
>
>  	if (verbose)
>  		progress = start_delayed_progress(_("Checking objects"), max);
>
>  	for (i = 0; i < max; i++) {
> -		foreign_nr += check_object(get_indexed_object(i));
> +		foreign_nr += check_object(get_indexed_object(r, i));
>  		display_progress(progress, i + 1);
>  	}
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index 6b9e8c850b..afe9f6df01 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -456,6 +456,7 @@ static void name_rev_line(char *p, struct name_ref_data *data)
>  int cmd_name_rev(int argc, const char **argv, const char *prefix)
>  {
>  	struct object_array revs = OBJECT_ARRAY_INIT;
> +	struct repository *r = the_repository;
>  	int all = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0;
>  	struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP };
>  	struct option opts[] = {
> @@ -553,9 +554,9 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
>  	} else if (all) {
>  		int i, max;
>
> -		max = get_max_object_index();
> +		max = get_max_object_index(r);
>  		for (i = 0; i < max; i++) {
> -			struct object *obj = get_indexed_object(i);
> +			struct object *obj = get_indexed_object(r, i);
>  			if (!obj || obj->type != OBJ_COMMIT)
>  				continue;
>  			show_name(obj, NULL,
> diff --git a/object.c b/object.c
> index 142ef69399..549fbe69ca 100644
> --- a/object.c
> +++ b/object.c
> @@ -10,14 +10,14 @@
>  #include "packfile.h"
>  #include "commit-graph.h"
>
> -unsigned int get_max_object_index(void)
> +unsigned int get_max_object_index(struct repository *r)
>  {
> -	return the_repository->parsed_objects->obj_hash_size;
> +	return r->parsed_objects->obj_hash_size;
>  }
>
> -struct object *get_indexed_object(unsigned int idx)
> +struct object *get_indexed_object(struct repository *r, unsigned int idx)
>  {
> -	return the_repository->parsed_objects->obj_hash[idx];
> +	return r->parsed_objects->obj_hash[idx];
>  }
>
>  static const char *object_type_strings[] = {
> diff --git a/object.h b/object.h
> index 25f5ab3d54..5a8ae274ee 100644
> --- a/object.h
> +++ b/object.h
> @@ -98,12 +98,12 @@ int type_from_string_gently(const char *str, ssize_t, int gentle);
>  /*
>   * Return the current number of buckets in the object hashmap.
>   */
> -unsigned int get_max_object_index(void);
> +unsigned int get_max_object_index(struct repository *);
>
>  /*
>   * Return the object from the specified bucket in the object hashmap.
>   */
> -struct object *get_indexed_object(unsigned int);
> +struct object *get_indexed_object(struct repository *, unsigned int);
>
>  /*
>   * This can be used to see if we have heard of the object before, but
> diff --git a/shallow.c b/shallow.c
> index 7fd04afed1..4537d98482 100644
> --- a/shallow.c
> +++ b/shallow.c
> @@ -510,6 +510,7 @@ static void paint_down(struct paint_info *info, const struct object_id *oid,
>  		       unsigned int id)
>  {
>  	unsigned int i, nr;
> +	struct repository *r = the_repository;
>  	struct commit_list *head = NULL;
>  	int bitmap_nr = DIV_ROUND_UP(info->nr_bits, 32);
>  	size_t bitmap_size = st_mult(sizeof(uint32_t), bitmap_nr);
> @@ -563,9 +564,9 @@ static void paint_down(struct paint_info *info, const struct object_id *oid,
>  		}
>  	}
>
> -	nr = get_max_object_index();
> +	nr = get_max_object_index(r);
>  	for (i = 0; i < nr; i++) {
> -		struct object *o = get_indexed_object(i);
> +		struct object *o = get_indexed_object(r, i);
>  		if (o && o->type == OBJ_COMMIT)
>  			o->flags &= ~SEEN;
>  	}
> @@ -608,6 +609,7 @@ void assign_shallow_commits_to_refs(struct shallow_info *info,
>  	struct object_id *oid = info->shallow->oid;
>  	struct oid_array *ref = info->ref;
>  	unsigned int i, nr;
> +	struct repository *r = the_repository;
>  	int *shallow, nr_shallow = 0;
>  	struct paint_info pi;
>
> @@ -622,9 +624,9 @@ void assign_shallow_commits_to_refs(struct shallow_info *info,
>  	 * Prepare the commit graph to track what refs can reach what
>  	 * (new) shallow commits.
>  	 */
> -	nr = get_max_object_index();
> +	nr = get_max_object_index(r);
>  	for (i = 0; i < nr; i++) {
> -		struct object *o = get_indexed_object(i);
> +		struct object *o = get_indexed_object(r, i);
>  		if (!o || o->type != OBJ_COMMIT)
>  			continue;
>
> diff --git a/upload-pack.c b/upload-pack.c
> index a00d7ece6b..cb7312268f 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -450,6 +450,7 @@ static int do_reachable_revlist(struct child_process *cmd,
>  		"rev-list", "--stdin", NULL,
>  	};
>  	struct object *o;
> +	struct repository *r = the_repository;
>  	char namebuf[GIT_MAX_HEXSZ + 2]; /* ^ + hash + LF */
>  	int i;
>  	const unsigned hexsz = the_hash_algo->hexsz;
> @@ -472,8 +473,8 @@ static int do_reachable_revlist(struct child_process *cmd,
>
>  	namebuf[0] = '^';
>  	namebuf[hexsz + 1] = '\n';
> -	for (i = get_max_object_index(); 0 < i; ) {
> -		o = get_indexed_object(--i);
> +	for (i = get_max_object_index(r); 0 < i; ) {
> +		o = get_indexed_object(r, --i);
>  		if (!o)
>  			continue;
>  		if (reachable && o->type == OBJ_COMMIT)
> @@ -520,6 +521,7 @@ static int get_reachable_list(struct object_array *src,
>  	struct child_process cmd = CHILD_PROCESS_INIT;
>  	int i;
>  	struct object *o;
> +	struct repository *r = the_repository;
>  	char namebuf[GIT_MAX_HEXSZ + 2]; /* ^ + hash + LF */
>  	const unsigned hexsz = the_hash_algo->hexsz;
>
> @@ -538,8 +540,8 @@ static int get_reachable_list(struct object_array *src,
>  			o->flags &= ~TMP_MARK;
>  		}
>  	}
> -	for (i = get_max_object_index(); 0 < i; i--) {
> -		o = get_indexed_object(i - 1);
> +	for (i = get_max_object_index(r); 0 < i; i--) {
> +		o = get_indexed_object(r, i - 1);
>  		if (o && o->type == OBJ_COMMIT &&
>  		    (o->flags & TMP_MARK)) {
>  			add_object_array(o, NULL, reachable);
> --
> gitgitgadget

Otherwise this looks pretty good.

Thanks,
Taylor

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Eric Sunshine wrote (reply to this):

On Wed, Feb 12, 2020 at 3:22 PM Taylor Blau <me@ttaylorr.com> wrote:
> On Wed, Feb 12, 2020 at 07:19:07PM +0000, Parth Gala via GitGitGadget wrote:
> > diff --git a/builtin/fsck.c b/builtin/fsck.c
> > @@ -375,6 +375,7 @@ static void check_object(struct object *obj)
> >  static void check_connectivity(void)
> >  {
> >       int i, max;
> > +     struct repository *r = the_repository;
>
> Is there a reason that you assign use/assign 'r' here? [...]
> but I'm not sure that it's necessary here. Could you instead pass
> 'the_repository' directly to the functions that now require it?

One benefit of doing it this way is that future patches will be much
less noisy which convert these callers to also work with any
'repository' object. For instance, after the current patch, we have:

    static void check_connectivity(void)
    {
        struct repository *r = the_repository;
        max = get_max_object_index(r);
        ...

A future patch which converts check_connectivity() to work with any
repository will then require very little change -- just adding an 'r'
argument to the function declaration, and dropping the line which
declares 'r':

    static void check_connectivity(struct repository *r)
    {
        max = get_max_object_index(r);
        ...

So, I'm fine with this patch series' approach of declaring a variable
'r' like this.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, parth gala wrote (reply to this):


On 13/02/20 1:52 am, Taylor Blau wrote:
> On Wed, Feb 12, 2020 at 07:19:07PM +0000, Parth Gala via GitGitGadget wrote:
>> From: Parth Gala <parthpgala@gmail.com>
>>
>> Currently the two functions use global variable 'the_repository' to access
>> the values stored in it. This makes 'the_repository' to be existent even
>> when not required.
> Nit: please wrap your commit messages at 72 characters instead of 80.

Alright.

>> This commit replaces it with 'r' which is passed as a parameter to those
>> functions
> Makes sense.
>
>> Signed-off-by: Parth Gala <parthpgala@gmail.com>
>> ---
>>   builtin/fsck.c       |  5 +++--
>>   builtin/index-pack.c |  5 +++--
>>   builtin/name-rev.c   |  5 +++--
>>   object.c             |  8 ++++----
>>   object.h             |  4 ++--
>>   shallow.c            | 10 ++++++----
>>   upload-pack.c        | 10 ++++++----
>>   7 files changed, 27 insertions(+), 20 deletions(-)
>>
>> diff --git a/builtin/fsck.c b/builtin/fsck.c
>> index 8d13794b14..d2b4336f7e 100644
>> --- a/builtin/fsck.c
>> +++ b/builtin/fsck.c
>> @@ -375,6 +375,7 @@ static void check_object(struct object *obj)
>>   static void check_connectivity(void)
>>   {
>>   	int i, max;
>> +	struct repository *r = the_repository;
> Is there a reason that you assign use/assign 'r' here? I can find a few
> other such instances of it in:
>
>    $ git grep -l 'struct repository \*r = '
>    builtin/merge-tree.c
>    builtin/sparse-checkout.c
>    builtin/update-index.c
>    fetch-pack.c
>    read-cache.c
>    t/helper/test-reach.c
>    tree.c
>
> but I'm not sure that it's necessary here. Could you instead pass
> 'the_repository' directly to the functions that now require it?

This was the more convenient thing to do, given that the ultimate
goal is to actually use the `r` that is passed to these caller
functions rather than defining them locally, which would require
another layer of refactoring.

Besides it is what was decided in the issue.

Also, it would be easier to differentiate whether or not a
function has been refactored, upon seeing an `r` rather than
`the_repository`(which is otherwise commonly used in a global
scope) ie. you would be confused whether `the_repository` used
here was passed to the function or not(especially in longer
functions where you'd have to scroll up to the function
definition to check).

>>   	/* Traverse the pending reachable objects */
>>   	traverse_reachable();
>> @@ -400,12 +401,12 @@ static void check_connectivity(void)
>>   	}
>>
>>   	/* Look up all the requirements, warn about missing objects.. */
>> -	max = get_max_object_index();
>> +	max = get_max_object_index(r);
> For example, changing this line to:
>
>    max = get_max_object_index(the_repository);
>
>>   	if (verbose)
>>   		fprintf_ln(stderr, _("Checking connectivity (%d objects)"), max);
>>
>>   	for (i = 0; i < max; i++) {
>> -		struct object *obj = get_indexed_object(i);
>> +		struct object *obj = get_indexed_object(r, i);
>>
>>   		if (obj)
>>   			check_object(obj);
>> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
>> index 60a5591039..d2115535bc 100644
>> --- a/builtin/index-pack.c
>> +++ b/builtin/index-pack.c
>> @@ -218,14 +218,15 @@ static unsigned check_object(struct object *obj)
>>   static unsigned check_objects(void)
>>   {
>>   	unsigned i, max, foreign_nr = 0;
>> +	struct repository *r = the_repository;
>>
>> -	max = get_max_object_index();
>> +	max = get_max_object_index(r);
>>
>>   	if (verbose)
>>   		progress = start_delayed_progress(_("Checking objects"), max);
>>
>>   	for (i = 0; i < max; i++) {
>> -		foreign_nr += check_object(get_indexed_object(i));
>> +		foreign_nr += check_object(get_indexed_object(r, i));
>>   		display_progress(progress, i + 1);
>>   	}
>>
>> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
>> index 6b9e8c850b..afe9f6df01 100644
>> --- a/builtin/name-rev.c
>> +++ b/builtin/name-rev.c
>> @@ -456,6 +456,7 @@ static void name_rev_line(char *p, struct name_ref_data *data)
>>   int cmd_name_rev(int argc, const char **argv, const char *prefix)
>>   {
>>   	struct object_array revs = OBJECT_ARRAY_INIT;
>> +	struct repository *r = the_repository;
>>   	int all = 0, transform_stdin = 0, allow_undefined = 1, always = 0, peel_tag = 0;
>>   	struct name_ref_data data = { 0, 0, STRING_LIST_INIT_NODUP, STRING_LIST_INIT_NODUP };
>>   	struct option opts[] = {
>> @@ -553,9 +554,9 @@ int cmd_name_rev(int argc, const char **argv, const char *prefix)
>>   	} else if (all) {
>>   		int i, max;
>>
>> -		max = get_max_object_index();
>> +		max = get_max_object_index(r);
>>   		for (i = 0; i < max; i++) {
>> -			struct object *obj = get_indexed_object(i);
>> +			struct object *obj = get_indexed_object(r, i);
>>   			if (!obj || obj->type != OBJ_COMMIT)
>>   				continue;
>>   			show_name(obj, NULL,
>> diff --git a/object.c b/object.c
>> index 142ef69399..549fbe69ca 100644
>> --- a/object.c
>> +++ b/object.c
>> @@ -10,14 +10,14 @@
>>   #include "packfile.h"
>>   #include "commit-graph.h"
>>
>> -unsigned int get_max_object_index(void)
>> +unsigned int get_max_object_index(struct repository *r)
>>   {
>> -	return the_repository->parsed_objects->obj_hash_size;
>> +	return r->parsed_objects->obj_hash_size;
>>   }
>>
>> -struct object *get_indexed_object(unsigned int idx)
>> +struct object *get_indexed_object(struct repository *r, unsigned int idx)
>>   {
>> -	return the_repository->parsed_objects->obj_hash[idx];
>> +	return r->parsed_objects->obj_hash[idx];
>>   }
>>
>>   static const char *object_type_strings[] = {
>> diff --git a/object.h b/object.h
>> index 25f5ab3d54..5a8ae274ee 100644
>> --- a/object.h
>> +++ b/object.h
>> @@ -98,12 +98,12 @@ int type_from_string_gently(const char *str, ssize_t, int gentle);
>>   /*
>>    * Return the current number of buckets in the object hashmap.
>>    */
>> -unsigned int get_max_object_index(void);
>> +unsigned int get_max_object_index(struct repository *);
>>
>>   /*
>>    * Return the object from the specified bucket in the object hashmap.
>>    */
>> -struct object *get_indexed_object(unsigned int);
>> +struct object *get_indexed_object(struct repository *, unsigned int);
>>
>>   /*
>>    * This can be used to see if we have heard of the object before, but
>> diff --git a/shallow.c b/shallow.c
>> index 7fd04afed1..4537d98482 100644
>> --- a/shallow.c
>> +++ b/shallow.c
>> @@ -510,6 +510,7 @@ static void paint_down(struct paint_info *info, const struct object_id *oid,
>>   		       unsigned int id)
>>   {
>>   	unsigned int i, nr;
>> +	struct repository *r = the_repository;
>>   	struct commit_list *head = NULL;
>>   	int bitmap_nr = DIV_ROUND_UP(info->nr_bits, 32);
>>   	size_t bitmap_size = st_mult(sizeof(uint32_t), bitmap_nr);
>> @@ -563,9 +564,9 @@ static void paint_down(struct paint_info *info, const struct object_id *oid,
>>   		}
>>   	}
>>
>> -	nr = get_max_object_index();
>> +	nr = get_max_object_index(r);
>>   	for (i = 0; i < nr; i++) {
>> -		struct object *o = get_indexed_object(i);
>> +		struct object *o = get_indexed_object(r, i);
>>   		if (o && o->type == OBJ_COMMIT)
>>   			o->flags &= ~SEEN;
>>   	}
>> @@ -608,6 +609,7 @@ void assign_shallow_commits_to_refs(struct shallow_info *info,
>>   	struct object_id *oid = info->shallow->oid;
>>   	struct oid_array *ref = info->ref;
>>   	unsigned int i, nr;
>> +	struct repository *r = the_repository;
>>   	int *shallow, nr_shallow = 0;
>>   	struct paint_info pi;
>>
>> @@ -622,9 +624,9 @@ void assign_shallow_commits_to_refs(struct shallow_info *info,
>>   	 * Prepare the commit graph to track what refs can reach what
>>   	 * (new) shallow commits.
>>   	 */
>> -	nr = get_max_object_index();
>> +	nr = get_max_object_index(r);
>>   	for (i = 0; i < nr; i++) {
>> -		struct object *o = get_indexed_object(i);
>> +		struct object *o = get_indexed_object(r, i);
>>   		if (!o || o->type != OBJ_COMMIT)
>>   			continue;
>>
>> diff --git a/upload-pack.c b/upload-pack.c
>> index a00d7ece6b..cb7312268f 100644
>> --- a/upload-pack.c
>> +++ b/upload-pack.c
>> @@ -450,6 +450,7 @@ static int do_reachable_revlist(struct child_process *cmd,
>>   		"rev-list", "--stdin", NULL,
>>   	};
>>   	struct object *o;
>> +	struct repository *r = the_repository;
>>   	char namebuf[GIT_MAX_HEXSZ + 2]; /* ^ + hash + LF */
>>   	int i;
>>   	const unsigned hexsz = the_hash_algo->hexsz;
>> @@ -472,8 +473,8 @@ static int do_reachable_revlist(struct child_process *cmd,
>>
>>   	namebuf[0] = '^';
>>   	namebuf[hexsz + 1] = '\n';
>> -	for (i = get_max_object_index(); 0 < i; ) {
>> -		o = get_indexed_object(--i);
>> +	for (i = get_max_object_index(r); 0 < i; ) {
>> +		o = get_indexed_object(r, --i);
>>   		if (!o)
>>   			continue;
>>   		if (reachable && o->type == OBJ_COMMIT)
>> @@ -520,6 +521,7 @@ static int get_reachable_list(struct object_array *src,
>>   	struct child_process cmd = CHILD_PROCESS_INIT;
>>   	int i;
>>   	struct object *o;
>> +	struct repository *r = the_repository;
>>   	char namebuf[GIT_MAX_HEXSZ + 2]; /* ^ + hash + LF */
>>   	const unsigned hexsz = the_hash_algo->hexsz;
>>
>> @@ -538,8 +540,8 @@ static int get_reachable_list(struct object_array *src,
>>   			o->flags &= ~TMP_MARK;
>>   		}
>>   	}
>> -	for (i = get_max_object_index(); 0 < i; i--) {
>> -		o = get_indexed_object(i - 1);
>> +	for (i = get_max_object_index(r); 0 < i; i--) {
>> +		o = get_indexed_object(r, i - 1);
>>   		if (o && o->type == OBJ_COMMIT &&
>>   		    (o->flags & TMP_MARK)) {
>>   			add_object_array(o, NULL, reachable);
>> --
>> gitgitgadget
> Otherwise this looks pretty good.
>
> Thanks,
> Taylor

@@ -375,6 +375,7 @@ static void check_object(struct object *obj)
static void check_connectivity(void)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Wed, Feb 12, 2020 at 07:19:08PM +0000, Parth Gala via GitGitGadget wrote:
> From: Parth Gala <parthpgala@gmail.com>
>
> 'lookup_unknown_object()' and its callers are modified to enable
> passing 'r' as an argument to 'lookup_unknown_object()' in an
> effort to reduce dependence on global 'the_repository' variable.

The changes in 'object.[ch]' look sane to me here, again, but I have the
same question about why assigning:

  struct repository *r = the_repository;

and passing 'r' everywhere is preferable to simply passing
'the_repository' in directly.

> Signed-off-by: Parth Gala <parthpgala@gmail.com>
> ---
>  builtin/fsck.c                   | 3 ++-
>  builtin/pack-objects.c           | 3 ++-
>  http-push.c                      | 3 ++-
>  object.c                         | 8 ++++----
>  object.h                         | 2 +-
>  refs.c                           | 3 ++-
>  t/helper/test-example-decorate.c | 7 ++++---
>  upload-pack.c                    | 3 ++-
>  walker.c                         | 3 ++-
>  9 files changed, 21 insertions(+), 14 deletions(-)
>
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index d2b4336f7e..cd0b67f3bc 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -745,7 +745,8 @@ static int fsck_cache_tree(struct cache_tree *it)
>
>  static void mark_object_for_connectivity(const struct object_id *oid)
>  {
> -	struct object *obj = lookup_unknown_object(oid);
> +	struct repository *r = the_repository;
> +	struct object *obj = lookup_unknown_object(r, oid);
>  	obj->flags |= HAS_OBJ;
>  }
>
> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 393c20a2d7..b03f4378a0 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -2891,6 +2891,7 @@ static void add_objects_in_unpacked_packs(void)
>  {
>  	struct packed_git *p;
>  	struct in_pack in_pack;
> +	struct repository *r = the_repository;
>  	uint32_t i;
>
>  	memset(&in_pack, 0, sizeof(in_pack));
> @@ -2910,7 +2911,7 @@ static void add_objects_in_unpacked_packs(void)
>
>  		for (i = 0; i < p->num_objects; i++) {
>  			nth_packed_object_oid(&oid, p, i);
> -			o = lookup_unknown_object(&oid);
> +			o = lookup_unknown_object(r, &oid);
>  			if (!(o->flags & OBJECT_ADDED))
>  				mark_in_pack_object(o, p, &in_pack);
>  			o->flags |= OBJECT_ADDED;
> diff --git a/http-push.c b/http-push.c
> index 822f326599..c26d03b21b 100644
> --- a/http-push.c
> +++ b/http-push.c
> @@ -1416,6 +1416,7 @@ static void one_remote_ref(const char *refname)
>  {
>  	struct ref *ref;
>  	struct object *obj;
> +	struct repository *r = the_repository;
>
>  	ref = alloc_ref(refname);
>
> @@ -1432,7 +1433,7 @@ static void one_remote_ref(const char *refname)
>  	 * may be required for updating server info later.
>  	 */
>  	if (repo->can_update_info_refs && !has_object_file(&ref->old_oid)) {
> -		obj = lookup_unknown_object(&ref->old_oid);
> +		obj = lookup_unknown_object(r, &ref->old_oid);
>  		fprintf(stderr,	"  fetch %s for %s\n",
>  			oid_to_hex(&ref->old_oid), refname);
>  		add_fetch_request(obj);
> diff --git a/object.c b/object.c
> index 549fbe69ca..90338a509c 100644
> --- a/object.c
> +++ b/object.c
> @@ -177,12 +177,12 @@ void *object_as_type(struct repository *r, struct object *obj, enum object_type
>  	}
>  }
>
> -struct object *lookup_unknown_object(const struct object_id *oid)
> +struct object *lookup_unknown_object(struct repository *r, const struct object_id *oid)
>  {
> -	struct object *obj = lookup_object(the_repository, oid);
> +	struct object *obj = lookup_object(r, oid);
>  	if (!obj)
> -		obj = create_object(the_repository, oid,
> -				    alloc_object_node(the_repository));
> +		obj = create_object(r, oid,
> +				    alloc_object_node(r));
>  	return obj;
>  }
>
> diff --git a/object.h b/object.h
> index 5a8ae274ee..375236cec3 100644
> --- a/object.h
> +++ b/object.h
> @@ -144,7 +144,7 @@ struct object *parse_object_or_die(const struct object_id *oid, const char *name
>  struct object *parse_object_buffer(struct repository *r, const struct object_id *oid, enum object_type type, unsigned long size, void *buffer, int *eaten_p);
>
>  /** Returns the object, with potentially excess memory allocated. **/
> -struct object *lookup_unknown_object(const struct object_id *oid);
> +struct object *lookup_unknown_object(struct repository *, const struct object_id *oid);
>
>  struct object_list *object_list_insert(struct object *item,
>  				       struct object_list **list_p);
> diff --git a/refs.c b/refs.c
> index 1ab0bb54d3..a630a8c271 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -379,7 +379,8 @@ static int filter_refs(const char *refname, const struct object_id *oid,
>
>  enum peel_status peel_object(const struct object_id *name, struct object_id *oid)
>  {
> -	struct object *o = lookup_unknown_object(name);
> +	struct repository *r = the_repository;
> +	struct object *o = lookup_unknown_object(r, name);
>
>  	if (o->type == OBJ_NONE) {
>  		int type = oid_object_info(the_repository, name, NULL);
> diff --git a/t/helper/test-example-decorate.c b/t/helper/test-example-decorate.c
> index c8a1cde7d2..6b3262a9d3 100644
> --- a/t/helper/test-example-decorate.c
> +++ b/t/helper/test-example-decorate.c
> @@ -10,6 +10,7 @@ int cmd__example_decorate(int argc, const char **argv)
>  	struct object_id two_oid = { {2} };
>  	struct object_id three_oid = { {3} };
>  	struct object *one, *two, *three;
> +	struct repository *r = the_repository;
>
>  	int decoration_a, decoration_b;
>
> @@ -26,8 +27,8 @@ int cmd__example_decorate(int argc, const char **argv)
>  	 * Add 2 objects, one with a non-NULL decoration and one with a NULL
>  	 * decoration.
>  	 */
> -	one = lookup_unknown_object(&one_oid);
> -	two = lookup_unknown_object(&two_oid);
> +	one = lookup_unknown_object(r, &one_oid);
> +	two = lookup_unknown_object(r, &two_oid);
>  	ret = add_decoration(&n, one, &decoration_a);
>  	if (ret)
>  		BUG("when adding a brand-new object, NULL should be returned");
> @@ -56,7 +57,7 @@ int cmd__example_decorate(int argc, const char **argv)
>  	ret = lookup_decoration(&n, two);
>  	if (ret != &decoration_b)
>  		BUG("lookup should return added declaration");
> -	three = lookup_unknown_object(&three_oid);
> +	three = lookup_unknown_object(r, &three_oid);
>  	ret = lookup_decoration(&n, three);
>  	if (ret)
>  		BUG("lookup for unknown object should return NULL");
> diff --git a/upload-pack.c b/upload-pack.c
> index cb7312268f..6d196e275b 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -962,7 +962,8 @@ static void receive_needs(struct packet_reader *reader, struct object_array *wan
>  static int mark_our_ref(const char *refname, const char *refname_full,
>  			const struct object_id *oid)
>  {
> -	struct object *o = lookup_unknown_object(oid);
> +	struct repository *r = the_repository;
> +	struct object *o = lookup_unknown_object(r, oid);
>
>  	if (ref_is_hidden(refname, refname_full)) {
>  		o->flags |= HIDDEN_REF;
> diff --git a/walker.c b/walker.c
> index 06cd2bd569..098c69ebe1 100644
> --- a/walker.c
> +++ b/walker.c
> @@ -258,6 +258,7 @@ void walker_targets_free(int targets, char **target, const char **write_ref)
>  int walker_fetch(struct walker *walker, int targets, char **target,
>  		 const char **write_ref, const char *write_ref_log_details)
>  {
> +	struct repository *r = the_repository;
>  	struct strbuf refname = STRBUF_INIT;
>  	struct strbuf err = STRBUF_INIT;
>  	struct ref_transaction *transaction = NULL;
> @@ -285,7 +286,7 @@ int walker_fetch(struct walker *walker, int targets, char **target,
>  			error("Could not interpret response from server '%s' as something to pull", target[i]);
>  			goto done;
>  		}
> -		if (process(walker, lookup_unknown_object(&oids[i])))
> +		if (process(walker, lookup_unknown_object(r, &oids[i])))
>  			goto done;
>  	}
>
> --
> gitgitgadget
>

Thanks,
Taylor

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Taylor Blau <me@ttaylorr.com> writes:

> ... same question about why assigning:
>
>   struct repository *r = the_repository;
>
> and passing 'r' everywhere is preferable to simply passing
> 'the_repository' in directly.
> ...
>>  static void mark_object_for_connectivity(const struct object_id *oid)
>>  {
>> -	struct object *obj = lookup_unknown_object(oid);
>> +	struct repository *r = the_repository;
>> +	struct object *obj = lookup_unknown_object(r, oid);
>>  	obj->flags |= HAS_OBJ;
>>  }

I do not claim that it applies to this particular function, and the
function is too small for it to matter, but when a function is large
enough and it always works on one single repository, it would make
it easier to later update the function further to set up 'r'
upfront, making it point at the_repository for now, and to use 'r'
throughout the function.  That way, when the time comes to update
the function to work on an arbitrary repository, the only change
required will be to turn the local variable 'r' to an incoming
parameter the caller can supply.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Wed, Feb 12, 2020 at 01:11:02PM -0800, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > ... same question about why assigning:
> >
> >   struct repository *r = the_repository;
> >
> > and passing 'r' everywhere is preferable to simply passing
> > 'the_repository' in directly.
> > ...
> >>  static void mark_object_for_connectivity(const struct object_id *oid)
> >>  {
> >> -	struct object *obj = lookup_unknown_object(oid);
> >> +	struct repository *r = the_repository;
> >> +	struct object *obj = lookup_unknown_object(r, oid);
> >>  	obj->flags |= HAS_OBJ;
> >>  }
>
> I do not claim that it applies to this particular function, and the
> function is too small for it to matter, but when a function is large
> enough and it always works on one single repository, it would make
> it easier to later update the function further to set up 'r'
> upfront, making it point at the_repository for now, and to use 'r'
> throughout the function.  That way, when the time comes to update
> the function to work on an arbitrary repository, the only change
> required will be to turn the local variable 'r' to an incoming
> parameter the caller can supply.

Right, but my suggestion was that this advice doesn't apply to this
particular instance since I don't expect that we'd ever passing
something other than 'the_repository'.

Specifically, I was worried that we'd get bitten by re-assigning 'r' in
the middle of the function and then end up in some odd broken state.

Maybe I'm worrying too much.


Thanks,
Taylor

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Taylor Blau <me@ttaylorr.com> writes:

> On Wed, Feb 12, 2020 at 01:11:02PM -0800, Junio C Hamano wrote:
>> Taylor Blau <me@ttaylorr.com> writes:
>>
>> > ... same question about why assigning:
>> >
>> >   struct repository *r = the_repository;
>> >
>> > and passing 'r' everywhere is preferable to simply passing
>> > 'the_repository' in directly.
>> > ...
>> >>  static void mark_object_for_connectivity(const struct object_id *oid)
>> >>  {
>> >> -	struct object *obj = lookup_unknown_object(oid);
>> >> +	struct repository *r = the_repository;
>> >> +	struct object *obj = lookup_unknown_object(r, oid);
>> >>  	obj->flags |= HAS_OBJ;
>> >>  }
>> ...
> Right, but my suggestion was that this advice doesn't apply to this
> particular instance since I don't expect that we'd ever passing
> something other than 'the_repository'.
>
> Specifically, I was worried that we'd get bitten by re-assigning 'r' in
> the middle of the function and then end up in some odd broken state.

"git fsck" works only in a single, "the", repository, so I guess you
are right to be worried about unnecessary complexity here.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Feb 13, 2020 at 10:10:45AM -0800, Junio C Hamano wrote:

> > Right, but my suggestion was that this advice doesn't apply to this
> > particular instance since I don't expect that we'd ever passing
> > something other than 'the_repository'.
> >
> > Specifically, I was worried that we'd get bitten by re-assigning 'r' in
> > the middle of the function and then end up in some odd broken state.
> 
> "git fsck" works only in a single, "the", repository, so I guess you
> are right to be worried about unnecessary complexity here.

I think the end-game for this whole repository transition would be to
get rid of the_repository, though. I.e., I'd envision the progression
something like this:

  1. Teach all of the library code to take (and operate on) "struct
     repository".

  2. Teach static local functions like this to pass in the_repository.

  3. Teach top-level commands like cmd_fsck() to pass the_repository to
     all of those static local helpers.

  4. Teach top-level commands to get a real repository pointer, either
     from the git.c wrapper (when RUN_SETUP is used) or by calling
     setup_git_repository() themselves.

  5. Grep for the_repository and drop it everywhere.

Here we're at step 2 now, but declaring "r" makes moving to step 3 just
a little easier. And I think the existence of steps 4 and 5 implies that
it would eventually be worth going through step 3.

Of course I just wrote those steps down for the first time, so maybe
nobody else shares my vision. ;)

-Peff

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Taylor Blau wrote (reply to this):

On Thu, Feb 13, 2020 at 01:52:35PM -0500, Jeff King wrote:
> On Thu, Feb 13, 2020 at 10:10:45AM -0800, Junio C Hamano wrote:
>
> > > Right, but my suggestion was that this advice doesn't apply to this
> > > particular instance since I don't expect that we'd ever passing
> > > something other than 'the_repository'.
> > >
> > > Specifically, I was worried that we'd get bitten by re-assigning 'r' in
> > > the middle of the function and then end up in some odd broken state.
> >
> > "git fsck" works only in a single, "the", repository, so I guess you
> > are right to be worried about unnecessary complexity here.
>
> I think the end-game for this whole repository transition would be to
> get rid of the_repository, though. I.e., I'd envision the progression
> something like this:
>
>   1. Teach all of the library code to take (and operate on) "struct
>      repository".
>
>   2. Teach static local functions like this to pass in the_repository.
>
>   3. Teach top-level commands like cmd_fsck() to pass the_repository to
>      all of those static local helpers.
>
>   4. Teach top-level commands to get a real repository pointer, either
>      from the git.c wrapper (when RUN_SETUP is used) or by calling
>      setup_git_repository() themselves.
>
>   5. Grep for the_repository and drop it everywhere.
>
> Here we're at step 2 now, but declaring "r" makes moving to step 3 just
> a little easier. And I think the existence of steps 4 and 5 implies that
> it would eventually be worth going through step 3.

Ah, the transition to step 3 justifies this, I think. I wasn't aware
that steps 3+ existed. If they didn't, I'd stand by my original advice,
but given that they do, the approach here makes more sense long-term.

> Of course I just wrote those steps down for the first time, so maybe
> nobody else shares my vision. ;)

Thanks for writing it down. I'm sure that it has been loosely discussed
over a while, but this is the first time that I've seen it all in one
place.

> -Peff

Thanks,
Taylor

@gitgitgadget
Copy link

gitgitgadget bot commented Feb 13, 2020

On the Git mailing list, parth gala wrote (reply to this):


On 13/02/20 1:48 am, Taylor Blau wrote:
> Hi Parth,
>
> On Wed, Feb 12, 2020 at 07:19:06PM +0000, Parth Gala via GitGitGadget wrote:
>> I have created this commit in response to
>> https://github.com/gitgitgadget/git/issues/379#issue-503866117.
> Fantastic! Thank you for working on this.
>
>> All the functions in object.c which relied on 'the_repository' have
>> been modified.  The functions calling these functions in object.c
>> consisted calls to other functions using 'the_repository' as well, and
>> although I intended to use 'r' at all those instances, I thought it
>> would make more sense when we would deal with their callee functions
>> in another similar patch. What do you think ?
> That makes sense, and follows the conventions that other similar
> refactorings have done in the past.
>
> Any reason why you decided to start with 'object.c'? Not that any such
> reason is necessary, I'm just curious about how you came to this
> decision.

I am new to the git community and while searching for issues to solve I
found the issue linked above. I figured solving it would give me a good
experience navigating between functions and exploring the huge repository.

Initially I grepped to find all functions using `the_repository` but
randomly chose object.c since the refactoring had to start somewhere.
Special thanks to Johannes Schindelin for this.

>> Signed-off-by: Parth Gala parthpgala@gmail.com [parthpgala@gmail.com]
> Even though you *do* need a 'Signed-off-by' in each of your patches,
> adding it in the cover letter is not necessary.

I probably did not check the preview I sent to myself carefully for
this. Will make note of it.

>> Parth Gala (5):
>>    object.c: get_max_object_index and get_indexed_object accept 'r'
>>      parameter
>>    object.c: lookup_unknown_object() accept 'r' as parameter
>>    object.c: parse_object_or_die() accept 'r' as parameter
>>    object.c: clear_object_flags() accept 'r' as parameter
>>    object.c: clear_commit_marks_all() accept 'r' as parameter
>>
>>   builtin/checkout.c               |  3 ++-
>>   builtin/fsck.c                   |  8 +++++---
>>   builtin/grep.c                   |  6 ++++--
>>   builtin/index-pack.c             |  5 +++--
>>   builtin/log.c                    |  3 ++-
>>   builtin/name-rev.c               |  5 +++--
>>   builtin/pack-objects.c           |  3 ++-
>>   builtin/prune.c                  |  3 ++-
>>   bundle.c                         |  8 +++++---
>>   http-push.c                      |  3 ++-
>>   object.c                         | 32 ++++++++++++++++----------------
>>   object.h                         | 12 ++++++------
>>   pack-bitmap.c                    |  5 +++--
>>   reachable.c                      |  6 ++++--
>>   refs.c                           |  3 ++-
>>   revision.c                       |  3 ++-
>>   shallow.c                        | 13 ++++++++-----
>>   t/helper/test-example-decorate.c |  7 ++++---
>>   upload-pack.c                    | 19 ++++++++++++-------
>>   walker.c                         |  3 ++-
>>   20 files changed, 89 insertions(+), 61 deletions(-)
>>
>>
>> base-commit: 0ad714499976290d9a0229230cbe4efae930b8dc
>> Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-545%2FParthGala2k%2Flocalize-the_repository-variable-v1
>> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-545/ParthGala2k/localize-the_repository-variable-v1
>> Pull-Request: https://github.com/gitgitgadget/git/pull/545
>> --
>> gitgitgadget
> Thanks,
> Taylor

…meter

Currently the two functions use global variable `the_repository` to access
the values stored in it. This makes `the_repository` to be existent even
when not required.

This commit replaces it with `r` which is passed as a parameter to those
functions. This is in line with a general trend to reduce dependence on
usage of global variables wherever possible.

This task will be carried out gradually in stages, wherein the caller
functions of these functions modified in object.c, might themselves also
be using `the_repository` in form of a global variable, shall be passed
the `r` variable in a future patch series, and so on for their callers.

Signed-off-by: Parth Gala <parthpgala@gmail.com>
`lookup_unknown_object()` and its callers are modified to enable
passing `r` as an argument to `lookup_unknown_object()` in an
effort to reduce dependence on global `the_repository` variable as
stated in 60ffcbd.

Signed-off-by: Parth Gala <parthpgala@gmail.com>
`parse_object_or_die()` and its callers are modified to enable
passing `r` as an argument to `parse_object_or_die()`. Refer commit
60ffcbd.

Signed-off-by: Parth Gala <parthpgala@gmail.com>
`clear_object_flags()` and its callers are modified to enable
passing `r` as an argument to `clear_object_flags()`. Refer commit
60ffcbd.

Signed-off-by: Parth Gala <parthpgala@gmail.com>
`clear_commit_marks_all()` and its callers are modified to enable
passing `r` as an argument to `clear_commit_marks_all()`. Refer commit
60ffcbd.

Signed-off-by: Parth Gala <parthpgala@gmail.com>
@ParthGala2k ParthGala2k force-pushed the localize-the_repository-variable branch from 746d9ff to 55a21d0 Compare April 19, 2020 09:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants