Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unhandled IndexError when calling .read() on a malformed config file #1887

Closed
DaveLak opened this issue Mar 30, 2024 · 6 comments · Fixed by #1908
Closed

Unhandled IndexError when calling .read() on a malformed config file #1887

DaveLak opened this issue Mar 30, 2024 · 6 comments · Fixed by #1908

Comments

@DaveLak
Copy link
Contributor

DaveLak commented Mar 30, 2024

Hi, I noticed the fuzzing tests that OSS-Fuzz runs on this project are broken and while I was working on fixing them I believe I came across a minor bug:

The Bug

An Uncaught Python exception: IndexError: string index out of range can be triggered if when trying to call .read() on GitConfigParser if it was initialized with a malformed config file.

Current Behavior

It's easiest to demonstrate, so please consider this example:

import io
from git.config import GitConfigParser


def reproduce_issue():

    malformed_config_content_bytestring = b'[-]\nk:"v\n"'

    problematic_config_file = io.BytesIO(malformed_config_content_bytestring)

    # problematic_config_file looks now like this:
    """
    [-]
    k:"v
    "
    """

    # We have to name the file otherwise we'll trigger
    # `AttributeError: '_io.BytesIO' object has no attribute 'name'` here:
    # https://github.com/gitpython-developers/GitPython/blob/c7675d2cedcd737f20359a4a786e213510452413/git/config.py#L623
    problematic_config_file.name = "fuzzedconfig.config"

    # This is fine
    git_config = GitConfigParser(problematic_config_file)
    
    # The next line raised an unhandled `IndexError: string index out of range`
    git_config.read()


if __name__ == "__main__":
    reproduce_issue()

Assuming that code is in /path/to/example/config_indexerror_reproduction.py then

python config_indexerror_reproduction.py

produces something akin to:

Traceback (most recent call last):
  File "/path/to/example/config_indexerror_reproduction.py", line 30, in <module>
    reproduce_issue()
  File "/path/to/example/config_indexerror_reproduction.py", line 26, in reproduce_issue
    git_config.read()
  File "/path/to/example/.venv/lib/python3.12/site-packages/git/config.py", line 607, in read
    self._read(file_path, file_path.name)
  File "/path/to/example/.venv/lib/python3.12/site-packages/git/config.py", line 514, in _read
    cursect.setlast(optname, optval + string_decode(line))
                                      ^^^^^^^^^^^^^^^^^^^
  File "/path/to/example/.venv/lib/python3.12/site-packages/git/config.py", line 441, in string_decode
    if v[-1] == "\\":
       ~^^^^
IndexError: string index out of range
My Reproduction Environment Details

The reproduction code above was tested on:

Python Version: 3.12.1 (main, Feb  5 2024, 16:23:00) [Clang 15.0.0 (clang-1500.1.0.2.5)]
OS Information: macOS-14.4.1-x86_64-i386-64bit
Installed Packages:
Package   Version
--------- -------
gitdb     4.0.11
GitPython 3.1.42
pip       24.0
smmap     5.0.1

And the fuzzer environment was:

Python Version: 3.8.3 (default, Mar 17 2024, 03:21:27)
[Clang 15.0.0 (https://github.com/llvm/llvm-project.git bf7f8d6fa6f460bf0a16ffe
OS Information: Linux-6.6.16-linuxkit-x86_64-with-glibc2.2.5
Installed Packages:
Package                   Version
------------------------- -------
altgraph                  0.17.4
atheris                   2.3.0
coverage                  6.3.2
importlib_metadata        7.0.2
gitdb                     4.0.11
GitPython                 3.1.42
pip                       24.0
smmap                     5.0.1
packaging                 24.0
pyinstaller               5.0.1
setuptools                41.0.1
six                       1.15.0
zipp                      3.18.1

So, if I'm reading the source correct, it seems like the combination of some header section ([-] above) followed by a key/value assignment that has a value consisting of a double quoted string with a new line inside it confuses the check here which strips the " on the new line:

GitPython/git/config.py

Lines 521 to 528 in c7675d2

else:
line = line.rstrip()
if line.endswith('"'):
is_multi_line = False
line = line[:-1]
# END handle quotations
optval = cursect.getlast(optname)
cursect.setlast(optname, optval + string_decode(line))

and passes an empty string to string_decode which isn't expecting that when it indexes into it's arg:

GitPython/git/config.py

Lines 454 to 455 in c7675d2

def string_decode(v: str) -> str:
if v[-1] == "\\":

Expected Behavior

I'd expect an explicitly raised ParsingError similar to how it's handled a little further up:

GitPython/git/config.py

Lines 514 to 518 in c7675d2

else:
# Check if it's an option with no value - it's just ignored by git.
if not self.OPTVALUEONLY.match(line):
if not e:
e = cp.ParsingError(fpname)

@DaveLak
Copy link
Contributor Author

DaveLak commented Mar 30, 2024

On a related note, do the maintainers of this project have any interest in bringing the fuzzing test code into this repo? I searched the repository for prior discussions about it but didn't come across anything.

I'm happy to:

  • open a new issue or GH discussion thread with more context if it'd be helpful
  • perform the migration/integration work; since I have the necessary changes locally already it's just a matter of what repo I make the PR against with the test improvements
  • help support the fuzz integration if desired

@Byron
Copy link
Member

Byron commented Mar 31, 2024

Thanks a lot for this wonderfully complete bug report, it's much appreciated.

Personally, I think the config-parser implementation here is probably very broken and non-conforming, and the fuzzer will have a very good time with it. A problem I see is that reproductions are usually hidden behind a login, so it requires extra work to make these available here so the community has a chance to fix them. However, I am open to give it a shot and see how it goes.

I definitely welcome you to help with maintaining the fuzzer integration, and would appreciate if that would also entail the maintenance of the issues it finds such that all information for reproduction would be contained within them so it's not only maintainers who can fix them.

@DaveLak
Copy link
Contributor Author

DaveLak commented Mar 31, 2024

Thanks for the prompt reply 🙂

Silly me for not realizing earlier, but I see gitoxide has been integrated for a while with OSS-Fuzz issues mirrored in GH issues and some good results. Given that context, and this project's maintenance mode status I definitely hear where your coming from.

I think it probably is worth opening up a discussion thread to continue the fuzzing convo for the sake of keeping this issue on topic as I do have some follow-up questions for you related to what you think a fix for this might look like (if any). I went ahead and replied to you here: #1889 (comment)

DaveLak added a commit to DaveLak/GitPython that referenced this issue Apr 12, 2024
Migrates the OSS-Fuzz tests and setup scripts from the OSS-Fuzz
repository to GitPython's repo as discussed here:
gitpython-developers#1887 (comment)

These files include the changes that were originally proposed in:
google/oss-fuzz#11763

Additional changes include:
- A first pass at documenting the contents of the fuzzing set up in a
  dedicated README.md
- Adding the dictionary files to this repo for improved visibility. Seed
  corpra zips are still located in an external repo pending further
 discussion regarding where those should live in the long term.
@DaveLak
Copy link
Contributor Author

DaveLak commented Apr 21, 2024

I started to take a closer look at this and now have a better idea of what's going on here as well as a proof-of-concept fix, but I wanted to get your thoughts on this @Byron, before I put it into PR. CC @EliahKagan in case you have thoughts here as well.

Initially, my instinct was to refer to the git-scm.com docs on Git config syntax and use that to inform the fix here, but (ignoring the fact that there isn't much guidance on value formats relevant to this issue) this portion of @Byron's comment above caught my attention and made re-think things a bit:

I think the config-parser implementation here is probably very broken and non-conforming

Specifically, the "non-conforming" part caught my eye. If GitConfigParser's behavior is already known (and to some degree expected) to be non-conforming to canonical Git's behavior, then I don't think I agree with my initial assessment of the "Expected Behavior" that I described in the PR description (i.e., this should throw an exception) regardless of how canonical Git would handle it. Introducing a "fix" that throws an exception seems unnecessarily disruptive to users that may have otherwise working code that was written around this "non-conforming implementation". Preserving the existing behavior and just making it safer around exception cases seems like a better approach.

So instead, I think a more considerate and generally simpler fix would be to maintain the existing parsing behavior, and just make its checks a little more robust to eliminate the risk of the unhandled IndexError described in this issue.

The diff below shows the change I am considering which would:

  • Improve the guarding if check in the string_decode function to safely handle empty strings and prevent index errors when accessing string elements (this resolves the root cause of this issue.)
  • Update the option value parsing logic in the _read method to more robustly handle quoted values, ensuring correct handling even when quotes are not closed properly or when the line continues beyond a quoted segment.
  • Add checks to prevent passing empty strings to string_decode, which could occur in multi-line value scenarios.

Here's the diff:

diff --git a/git/config.py b/git/config.py
index 3ce9b123..91b69531 100644
--- a/git/config.py
+++ b/git/config.py
@@ -452,7 +452,7 @@ class GitConfigParser(cp.RawConfigParser, metaclass=MetaParserBuilder):
         e = None  # None, or an exception.
 
         def string_decode(v: str) -> str:
-            if v[-1] == "\\":
+            if v and v[-1] == "\\":
                 v = v[:-1]
             # END cut trailing escapes to prevent decode error
 
@@ -508,6 +508,8 @@ class GitConfigParser(cp.RawConfigParser, metaclass=MetaParserBuilder):
                     if len(optval) > 1 and optval[0] == '"' and optval[-1] != '"':
                         is_multi_line = True
                         optval = string_decode(optval[1:])
+                    elif len(optval) > 1 and optval[0] == '"' and optval[-1] == '"':
+                        optval = string_decode(optval[1:-1])  # safely decode fully quoted value
                     # END handle multi-line
                     # Preserves multiple values for duplicate optnames.
                     cursect.add(optname, optval)
@@ -524,8 +526,10 @@ class GitConfigParser(cp.RawConfigParser, metaclass=MetaParserBuilder):
                     is_multi_line = False
                     line = line[:-1]
                 # END handle quotations
+                if line:  # Ensure line is not empty before decoding
+                    line = string_decode(line)
                 optval = cursect.getlast(optname)
-                cursect.setlast(optname, optval + string_decode(line))
+                cursect.setlast(optname, optval + line)
             # END parse section or option
         # END while reading

What do you think? Does this sound like a reasonable approach here, or is there some context I might be missing?

If this does sound like something worth introducing, let me know and:

  1. I'll add a Unit test.
  2. Remove the workaround for this error in the fuzz_config.py test so it's covered there as well.
  3. Open a PR.

@Byron
Copy link
Member

Byron commented Apr 22, 2024

So instead, I think a more considerate and generally simpler fix would be to maintain the existing parsing behavior, and just make its checks a little more robust to eliminate the risk of the unhandled IndexError described in this issue.

The above hits home, and perfectly describes how I imagine handling this. It's an improvement, without going overboard and try to fix something that is systematically broken - a conforming parser can't be made based on an INI parser.

If this does sound like something worth introducing, let me know and:

  1. I'll add a Unit test.
  2. Remove the workaround for this error in the fuzz_config.py test so it's covered there as well.
  3. Open a PR.

This sounds like a plan! Thanks again.

DaveLak added a commit to DaveLak/GitPython that referenced this issue Apr 22, 2024
Improve the guarding `if` check in `GitConfigParser`'s `string_decode`
function to safely handle empty strings and prevent `IndexError`s when
accessing string elements.

This resolves an IndexError in the `GitConfigParser`'s `.read()`
method when the config file contains a quoted value ending with
a new line.

Fixes:
gitpython-developers#1887
DaveLak added a commit to DaveLak/GitPython that referenced this issue Apr 22, 2024
Improve the guarding `if` check in `GitConfigParser`'s `string_decode`
function to safely handle empty strings and prevent `IndexError`s when
accessing string elements.

This resolves an IndexError in the `GitConfigParser`'s `.read()`
method when the config file contains a quoted value containing a
trailing new line.

Fixes:
gitpython-developers#1887
@DaveLak
Copy link
Contributor Author

DaveLak commented Apr 22, 2024

PR with the fix: #1908

Only the changes to string_decode were necessary, so that's all I changed.

DaveLak added a commit to DaveLak/GitPython that referenced this issue Apr 22, 2024
Improve the guarding `if` check in `GitConfigParser`'s `string_decode`
function to safely handle empty strings and prevent `IndexError`s when
accessing string elements.

This resolves an IndexError in the `GitConfigParser`'s `.read()`
method when the config file contains a quoted value containing a
trailing new line.

Fixes:
gitpython-developers#1887
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants