Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Incorrect git config parsing (error when cloning local repository with trailing backslash on Windows) #1088

Open
zhansliu opened this issue Nov 8, 2022 · 0 comments

Comments

@zhansliu
Copy link

zhansliu commented Nov 8, 2022

I ran into a ValueError ("escape character followed by unknown character 102...") when trying to clone a local repository on Windows. The issue turned out to be because of how trailing backslashes interact with how dulwich parses git config files.

Reproduction steps

(I think this affects lots of other operations on things that have been cloned from a local remote on Windows, but I'll show a way to reproduce by cloning, which was how I first encountered the issue.)

Environment: Windows 10, with output of pip list being

Package    Version
---------- ---------
certifi    2022.9.24
dulwich    0.20.50
pip        21.1.1
setuptools 56.0.0
urllib3    1.26.12
  1. Create a new git repository that we're going to clone. In Powershell:
mkdir C:\source_repo
cd C:\source_repo
git init
git commit --allow-empty -m "Initial commit"
  1. Try to clone it using dulwich:
python -c "from dulwich import porcelain; porcelain.clone('C:\\source_repo\\', 'C:\\cloned_repo')"

(note the trailing backslash in the path for source_repo -- without it it works fine).

Expected result

The dulwich.porcelain.clone command works and clones the repository.

Actual result

We get an exception:

Traceback (most recent call last):
  File "C:\Users\hliu\venvs\test_dulwich\lib\site-packages\dulwich\config.py", line 413, in _parse_string
    v = _ESCAPE_TABLE[value[i]]
KeyError: 102

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\porcelain.py", line 551, in clone
    return client.clone(
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\client.py", line 761, in clone
    _import_remote_refs(
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\refs.py", line 1211, in _import_remote_refs
    refs_container.import_refs(
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\refs.py", line 187, in import_refs
    self.set_if_equals(
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\refs.py", line 871, in set_if_equals
    self._log(
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\refs.py", line 128, in _log
    self._logger(ref, old_sha, new_sha, committer, timestamp, timezone, message)
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\repo.py", line 1207, in _write_reflog
    config = self.get_config_stack()
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\repo.py", line 713, in get_config_stack
    backends = [self.get_config()] + StackedConfig.default_backends()
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\repo.py", line 1602, in get_config
    return ConfigFile.from_path(path)
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\config.py", line 593, in from_path
    ret = cls.from_file(f)
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\config.py", line 583, in from_file
    value = _parse_string(continuation)
  File "C:\Users\<username>\venvs\test_dulwich\lib\site-packages\dulwich\config.py", line 419, in _parse_string
    raise ValueError(
ValueError: escape character followed by unknown character 102 at 16 in bytearray(b'C:\\\\source_repo\\fetch = +refs/heads/*:refs/remotes/origin/*')

Analysis

I believe this is a result of incorrect parsing of the git config file by dulwich.

When you clone the repository like this, the contents of C:\cloned_repo\.git\config will contain a section like:

[remote "origin"]
	url = C:\\source_repo\\
	fetch = +refs/heads/*:refs/remotes/origin/*

dulwich (incorrectly) treats the final backslash in url = C:\\source_repo\\ as a line continuation character, so joins the two lines, and then gives an error when trying to parse url = C:\\source_repo\\fetch ...

The code to do this is here, where we simply treat every line ending in backslash as continuing onto the next line. This is incorrect in when the actual value on the line ends in an escape sequence \\.

Compare to how the git code parses config files here, which correctly parses the line one character at a time, and ends up treating the final backslash in url = C:\\source_repo\\ as "consumed" as part of an escape sequence \\.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant