submodule fetch config

Heiko Voigt edited this page Nov 11, 2013 · 10 revisions

About reading and applying configuration of non-checked out revisions during fetch

Goal

Our goal is to teach fetch to prepare the repository for recursive checkout, so that it has all the commits needed for the checkout in typical use cases. By default we want git to handle submodules as much like normal files as possible. After we started implementing this, too many questions and decisions arose; we try to clarify and discuss them in this document.

Thanks to Jonathan Nieder and Jens Lehmann for discussing/brainstorming all the different things with me.

Introduction or What The Hell Do We Talk About

Whether git will fetch or clone a submodule depends on the value of submodule.recurseSubmodules or submodule.autoInit and their submodule-specific relatives.

recurseSubmodules configures whether fetch does a recursive fetch in submodules. autoInit configures whether a new submodule is cloned on fetch and its URL copied from the fetched revision's .gitmodules into .git/config, so that everything is prepared for checkout to check out the submodule.

If fetch or clone is given the --recurse-submodules option, the autoInit configuration is assumed to be true.

We add/rely on the following switches:

  1. --recurse-submodules

  2. submodule.autoinit

  3. submodule."name".autoinit

  4. fetch.recurseSubmodules (config only)

  5. submodule."name".fetchRecurseSubmodules (.gitmodules / config)

TODO: Talk about the option values like on, off, on-demand?

For expensive operations we might have a special commandline option like --recurse-submodules=all?
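To make the three values concrete, here is a minimal sketch of how a value could be mapped to a recursion mode. The names Recurse and parse_recurse are hypothetical and only for illustration. The sketch assumes that the usual boolean spellings of git config are accepted next to on-demand, and it compares case-insensitively as a simplification. The proposed value all is deliberately not handled.

```python
from enum import Enum


class Recurse(Enum):
    """Hypothetical recursion modes discussed in this document."""
    OFF = "off"
    ON = "on"
    ON_DEMAND = "on-demand"


# Boolean spellings understood by git config.
_TRUE = {"true", "yes", "on", "1"}
_FALSE = {"false", "no", "off", "0"}


def parse_recurse(value: str) -> Recurse:
    """Map a config or command-line value to a recursion mode."""
    v = value.strip().lower()  # case-insensitive matching is an assumption
    if v == "on-demand":
        return Recurse.ON_DEMAND
    if v in _TRUE:
        return Recurse.ON
    if v in _FALSE:
        return Recurse.OFF
    raise ValueError(f"bad recurse-submodules value: {value!r}")
```

Rejecting unknown values with an error keeps a typo from silently turning recursion off.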

How do we solve conflicting configurations from .gitmodules in commits?

  1. We do it the simple way, warn, skip and rely on the user to solve them.

  2. Later once some more experience is gained we might solve some constellations automatically.

  3. We output a suggestion on how to proceed. E.g.:

	git checkout origin/master
	git submodule sync
	git submodule update --init --recursive

See the "Configuration precedence" section for information about what entries can be used to override others.

User Stories

Here we will describe some typical use cases to support the plan we introduce after this section and hopefully ensure that we do not forget any important situation.

Clone with submodule enabled

E.g.:

$ git clone --recurse-submodules your-cool-project

  1. I expect all submodules contained in the revision that is checked out after the clone finishes to be cloned as well, and their URLs to be copied from .gitmodules to .git/config.

  2. I expect that submodules not contained in the revision to be checked out will not be fetched, since that can be expensive and old submodules likely no longer exist.

  3. I expect all submodules contained in revisions that are not contained in the revision to be checked out to be cloned and their config initialized. Since these revisions are likely to be open branches with new submodules, they should be cloned to prepare for later integration. Otherwise it is likely that the on-demand fetch logic will not pick up these new submodules when they are integrated into the main line, causing a recursive checkout to fail.

Submodule Recursion For Checked Out Submodules

In this description we only care about submodules that are contained in the checked out working dir.

I expect all initialized submodules to behave according to the local or checked out configuration (.git/config and .gitmodules).

Let’s say we have configured submodule.recursesubmodules = on or have given the --recurse-submodules[=on] commandline option to fetch.

Superproject has a linear history:

  1. I expect "git fetch" to run fetch in all submodules checked out in the worktree

  2. I do not expect it to run fetch in old submodules not checked out in the worktree, but I wouldn’t mind if it did either.

  3. I expect "git fetch" to clone any new submodules that I would need to "git merge --recurse-submodules FETCH_HEAD", and any submodules needed for intermediate states.

Superproject has multiple branches:

  1. Same as before, but I’d expect it to clone or fetch new and existing submodules for <existing tip>..<new tip> of all branches.

Submodule Recursion On-Demand (local config)

Let’s say submodule.recursesubmodules = on-demand is configured or the --recurse-submodules=on-demand commandline option is given to fetch.

Superproject has a linear history:

  1. Same as with --recurse-submodules=on, but I would only expect fetch to fetch or clone submodules that have changed or are new in the commits fetched.

Superproject has multiple branches:

  1. Similar to before, but also fetch changed/new submodules between <existing tip>..<new tip> of all branches.

Submodule Recursion Off (local config)

Let’s say submodule.recursesubmodules = off is configured, the commandline option --recurse-submodules=off is given, or there is no config or option at all.

No submodules will be fetched.
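The three user stories for checked out submodules (on, on-demand, off) can be summarized in a small sketch. The function name submodules_to_fetch and its parameters are hypothetical. The sketch assumes the caller has already computed which submodules changed or are new in <existing tip>..<new tip> of all fetched branches.

```python
def submodules_to_fetch(mode, checked_out, changed):
    """Pick submodule names to fetch or clone for one superproject fetch.

    mode        -- "on", "on-demand" or "off"
    checked_out -- names of submodules checked out in the worktree
    changed     -- names that changed or are new in the fetched commits
    """
    if mode == "off":
        # Nothing is fetched without a config or option asking for it.
        return set()
    if mode == "on-demand":
        # Only submodules touched by the fetched commits.
        return set(changed)
    if mode == "on":
        # Everything checked out, plus changed or new submodules.
        return set(checked_out) | set(changed)
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, with checked_out = {"a", "b"} and changed = {"b", "c"}, the on case yields all three names while on-demand yields only "b" and "c".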

Submodule Recursion Not Checked Out

Let’s say the submodule is not in the current index (checked out) and the .gitmodules in all fetched commits agree on submodule <name>'s config submodule.<name>.recursesubmodules = on

  1. I expect the submodule to be fetched/cloned when the recorded submodule-commit was changed in any commit between <existing tip>..<new tip>.

  2. I do not expect the submodule to be fetched/cloned if there was no change of the recorded submodule-commit in the fetched commits but I would not mind if it was either.

The same applies to submodule.<name>.recursesubmodules = on-demand except that I would mind if a submodule was fetched which had no change in the recorded submodule-commit.

Let’s assume the same situation as before but the configurations of submodule.<name>.recursesubmodules in .gitmodules of the fetched commits disagree.

  1. I expect git to warn me that the configuration of submodule <name> was not the same across all fetched commits.

    • I expect git to skip the recursive fetch of those submodules and tell me what I can do to solve this.
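A minimal sketch of these expectations for a submodule that is not checked out follows. The name decide_not_checked_out is hypothetical, and treating an unset or off value as "skip" is an assumption of the sketch, not something decided in this document.

```python
import sys


def decide_not_checked_out(name, values, commit_changed):
    """Decide what fetch does with a submodule that is not in the index.

    name           -- submodule name
    values         -- recursesubmodules value found in the .gitmodules of
                      each fetched commit ("on", "on-demand" or "off")
    commit_changed -- True if the recorded submodule commit changed in
                      <existing tip>..<new tip>
    Returns "fetch" or "skip".
    """
    distinct = set(values)
    if len(distinct) > 1:
        # The fetched commits disagree: warn and skip.
        print(f"warning: configuration of submodule '{name}' is not the "
              "same across all fetched commits, skipping", file=sys.stderr)
        return "skip"
    if not distinct or distinct == {"off"}:
        # Assumption: nothing configured or "off" means no fetch.
        return "skip"
    # "on" may fetch without a change and "on-demand" must not, so
    # skipping when nothing changed satisfies both expectations.
    return "fetch" if commit_changed else "skip"
```

A real implementation would also print the suggestion on how to resolve the conflict; the sketch only prints the warning.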

Let’s say submodule.recursesubmodules = off or --recurse-submodules=off commandline option.

  1. I expect "git fetch" to skip all submodules by default except the ones configured otherwise

Implementation Plan or How Can That Work

This is a rough description of the recursive fetch/clone strategy. We concentrate on the on-demand case, since on and off are quite simple.

changed_submodule_names is a string_list of submodule names that additionally stores the result of the parsed final configuration in its util pointer. It has a conflict marker for entries that have been parsed but need skipping.

For --recurse-submodules=on, instead of the first step (adding only fetched revisions), add all names from .git/modules to changed_submodule_names, and add all submodules that exist in new revisions.

TODO: rename changed_submodule_names variable to reflect the on case.

In superproject

  1. Look up all changed submodule names from the commits received during fetch and add each one to changed_submodule_names if it is not in the list yet. During the collection phase, for each revision that changes a submodule, store in a parse cache:

    1. .gitmodules sha1 (for subsequent readings of the same config)

    2. path

    3. name

  2. For each submodule in the index that is in changed_submodule_names

    1. Read and parse local and checked out configuration and reset conflict marker

    2. Mark entry as finished_parsing.

  3. For each entry in changed_submodule_names not marked finished_parsing

    1. Skip if entry has finished_parsing flag set.

    2. Skip if entry has no conflict.

    3. Read and parse local and checked out configuration and reset conflict marker in entry if the local config solves the conflict. TODO: It seems we need a conflict marker for every single value so we can decide what caused the conflict and when it is resolved.

    4. Mark entry as solved if last step was successful.

  4. Fetch / Clone all submodules in changed_submodule_names depending on stored configuration in that list, the conflict marker and whether referenced commits are already present.

  5. Left for a future extension: if we are given the special option --recurse-submodules=all, it overrides all recurseSubmodules configuration, whether from revisions, checked out or local.

    1. Parse the .gitmodules of all commits.

    2. Try to fetch or clone all found module names by their URL. Warn about and skip submodule names that have conflicting URLs in commits. The user can configure the URL locally with git config submodule."name".url to resolve this situation.
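Steps 1 to 4 of the plan can be sketched as follows. This is not the C implementation: a Python dict stands in for the changed_submodule_names string_list, the Entry class stands in for what would hang off the util pointer, and plan_fetch is a hypothetical name. The sketch uses a single conflict marker per entry (the TODO above suggests one per value may be needed), treats an unset value as "fetch on demand", and ignores whether the referenced commits are already present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    config: Optional[str] = None   # parsed recursion value, None if unset
    conflict: bool = False         # parsed, but needs skipping
    finished_parsing: bool = False


def plan_fetch(commit_values, in_index, local_config):
    """Return (names to fetch or clone, names skipped due to conflicts).

    commit_values -- (name, value) pairs, one per fetched commit that
                     changes a submodule; value comes from that commit's
                     .gitmodules
    in_index      -- names of submodules present in the index
    local_config  -- name -> value from local and checked out config
    """
    changed = {}  # stands in for changed_submodule_names

    # Step 1: collect names and mark entries whose values disagree.
    for name, value in commit_values:
        entry = changed.setdefault(name, Entry(config=value))
        if value != entry.config:
            entry.conflict = True

    # Step 2: submodules in the index follow the local configuration.
    for name in in_index:
        if name in changed:
            entry = changed[name]
            entry.config = local_config.get(name, entry.config)
            entry.conflict = False
            entry.finished_parsing = True

    # Step 3: local config may resolve the remaining conflicts.
    for name, entry in changed.items():
        if entry.finished_parsing or not entry.conflict:
            continue
        if name in local_config:
            entry.config = local_config[name]
            entry.conflict = False

    # Step 4: fetch or clone what is left, conflicts are skipped.
    fetch = sorted(n for n, e in changed.items()
                   if not e.conflict and e.config != "off")
    skipped = sorted(n for n, e in changed.items() if e.conflict)
    return fetch, skipped
```

Returning the skipped names separately lets the caller print the warning and the suggestion on how to proceed for exactly those submodules.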

Revisions, .gitmodules And Local Config

In this section we will discuss the handling of values from .gitmodules that are not in the worktree but in revisions.

All configuration values for initialized and checked out submodules in .git/modules/ like recurseSubmodules, autoInit, … come from local (checked out) config or commandline.

We now describe submodules that are neither in the index nor in the checked out .gitmodules, which means they are only referenced in the .gitmodules of commits. If a submodule is described in the .gitmodules of some of the fetched revisions and we come to the conclusion that it should be fetched, fetch automatically clones it into .git/modules and copies the URL from .gitmodules into .git/config.

If .gitmodules configuration values disagree between revisions and no overriding configuration is provided, we fall back on the configured global default, e.g. submodule.recursesubmodules or the --recurse-submodules-default commandline option. If nothing is configured, the current default is to warn and skip.

If all parsed .gitmodules values from the fetched commits are consistent, we follow them.

Configuration precedence (the latter overrides the earlier)

  1. General config (fetch.recurseSubmodules)

  2. Command line default (--recurse-submodules-default)

  3. Consolidated .gitmodules from each commit (submodule."name".fetchRecurseSubmodules)

  4. Per submodule config (system, user, repo) (submodule."name".fetchRecurseSubmodules)

  5. Command line option (--recurse-submodules)
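The precedence list above amounts to "the last layer that sets a value wins". A minimal sketch, with hypothetical function and parameter names, assuming each layer is passed as None when it sets nothing. For the consolidated .gitmodules layer, None also covers the case where the fetched commits disagree, so that lower layers act as the fallback.

```python
def effective_setting(layers):
    """Return the value of the last layer that sets one, else None.

    layers -- values ordered from lowest to highest precedence
    """
    result = None
    for value in layers:
        if value is not None:
            result = value  # a later layer overrides an earlier one
    return result


def resolve(general=None, cmdline_default=None, gitmodules=None,
            per_submodule=None, cmdline=None):
    """Apply the five layers in the documented order."""
    return effective_setting(
        [general, cmdline_default, gitmodules, per_submodule, cmdline])
```

If resolve returns None for a submodule whose commits disagree, nothing is configured at all, which in the current plan means warn and skip.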

Roadmap

Current state: fetching of initialized submodules, but the .gitmodules config is still taken from the work tree.

Next steps:

  1. Fetch renamed initialized submodules using the path to name mapping of the .gitmodules file of the correct commit.

  2. Make fetch use the consolidated .gitmodules configuration from all fetched commits.

  3. Implement the autoinit config: clone the bare submodule repos into .git/modules and initialize them by putting the consolidated URL into .git/config.
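Steps 1 and 3 both need data from the .gitmodules content of a specific commit: the path-to-name mapping and the URL. A simplified sketch of extracting both from the file content follows. The function names are hypothetical, and the parser only handles plain key = value lines without quoting, escapes or continuation lines. Git itself would use its own config parser, and the content would come from the commit's blob rather than the work tree.

```python
import re

# Matches section headers of the form: [submodule "name"]
_SECTION = re.compile(r'^\[submodule\s+"(?P<name>[^"]+)"\]$')


def parse_gitmodules(text):
    """Parse simple .gitmodules content into name -> {key: value}."""
    modules = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        match = _SECTION.match(line)
        if match:
            current = modules.setdefault(match.group("name"), {})
        elif line.startswith("["):
            current = None  # some other section, ignore its keys
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            current[key.strip().lower()] = value.strip()
    return modules


def path_to_name(modules):
    """Invert the parsed data into the path -> name mapping."""
    return {cfg["path"]: name
            for name, cfg in modules.items() if "path" in cfg}
```

With such a mapping, a renamed submodule can still be found by name even when its path in the fetched commit differs from the path in the work tree.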