-
-
Notifications
You must be signed in to change notification settings - Fork 402
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add some more checks on commit/tag #568
Conversation
dulwich/objects.py
Outdated
time = int(timetext) | ||
timezone, timezone_neg_utc = parse_timezone(timezonetext) | ||
# checking for overflow error | ||
datetime.datetime.fromtimestamp(time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an expensive way of checking the time, and depends on the implementation of datetime. Could you just check if the time was higher than the supported CGit value in .check() ?
This new behaviour also makes it impossible to load commits with a high timestamp; this seems more strict than C Git.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will look into that.
Thanks for the heads up!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need some help here, please. Here are my findings.
According to git fsck's code, here is
the check on parsing the date and failing if error..
The definition of the function is in date.c.
This checks for a TIME_MAX value.
Through the hide-and-seek game of header inclusion, we can then find the definition of that value.
(from: date.c -> cache.h -> git-compat-util.h)
Its value is UINTMAX_MAX
(unsigned integer max value).
So, is 2^32 sounds good enough?
Note: I interpreted the CGIT value you mentioned as the main git repository.
I'm not so sure i interpreted right though :)
dulwich/tests/test_objects.py
Outdated
message=b'an awesome explanation ../b\n')) | ||
|
||
def test_check_commit_with_overflow_date(self): | ||
# committer Kirill A. Korinskiy <catap@catap.ru> 18446743887488505614 +42707004 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is too long.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, even though i read it in the doc, i completely forgot to check for pep8 violations. Sorry about that.
Will fix.
dulwich/objects.py
Outdated
@@ -1072,6 +1072,9 @@ def format_timezone(offset, unnecessary_negative_timezone=False): | |||
(sign, offset / 3600, (offset / 60) % 60)).encode('ascii') | |||
|
|||
|
|||
MAX_TIME = 4294967296 # 2**32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the same limit that C Git uses?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not certain of it.
Thus my question here #568 (comment).
dulwich/objects.py
Outdated
except ValueError as e: | ||
raise ObjectFormatException(e) | ||
# Prevent overflow error | ||
if time > MAX_TIME: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please have this fail on .check(), not at parse time - similar to C Git. In other words, reading this incorrect times will work but .check will fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right!
On Tue, Oct 31, 2017 at 10:58:54AM -0700, Antoine R. Dumont wrote:
ardumont commented on this pull request.
> + :param value: Bytes representing a git commit/tag line
+ :raise: ObjectFormatException in case of parsing error.
+ :return: Tuple of (author, time, (timezone, timezone_neg_utc))
+ """
+ try:
+ sep = value.index(b'> ')
+ except ValueError:
+ return (value, None, (None, False))
+ try:
+ person = value[0:sep+1]
+ rest = value[sep+2:]
+ timetext, timezonetext = rest.rsplit(b' ', 1)
+ time = int(timetext)
+ timezone, timezone_neg_utc = parse_timezone(timezonetext)
+ # checking for overflow error
+ datetime.datetime.fromtimestamp(time)
I need some help here, please. Here are my findings.
According to git fsck's code, [here is
the check on parsing the date and failing if error](https://github.com/git/git/blob/master/fsck.c#L695-L696)..
The definition of the function is in [date.c](https://github.com/git/git/blob/master/date.c#L1206-L1221).
This checks for a TIME_MAX value.
[Through the hide-and-seek game of header inclusion, we can then find the definition of that value](https://github.com/git/git/blob/master/git-compat-util.h#L328).
(from: date.c -> cache.h -> git-compat-util.h)
Its value is `UINTMAX_MAX` (unsigned integer max value).
So, is 2^32 sounds good enough?
On my system, UINTMAX_MAX is 18446744073709551615 (from stdint.h), or
2**64. That's the same as sys.maxint*2 (since int is signed) in
Python; could we use that?
2**32 is only 4294967296.
Note: I interpreted the CGIT value you mentioned as the main git repository.
I'm not so sure i interpreted right though :)
Yep, that's indeed what I meant :)
…--
Jelmer Vernooij <jelmer@jelmer.uk>
PGP Key: https://www.jelmer.uk/D729A457.asc
|
Note, I have pushed multiple commits and all. |
Well, that's where the trouble lies. This is too high (well for I guess the question might be, where do we draw the line between technicals (c/python runtimes) and 2^32 boundary sets us in the future already (but not that much, year 2^64 is not even possible for the python runtime to work with (well as Working on exponents, here is my experiment to see when we can reach
As expected, with that limit (my initial max time btw), the tests created (from the real sample i have mentioned in #567) won't fail.
I'm looking forward to hear your thoughts about it!
Cool. Cheers, |
On Wed, Nov 01, 2017 at 09:45:39AM +0000, Antoine R. Dumont wrote:
> On my system, UINTMAX_MAX is 18446744073709551615 (from stdint.h), or
2**64. That's the same as sys.maxint*2 (since int is signed) in
Python; could we use that?
Well, that's where the trouble lies. This is too high (well for
python3 on my box at least).
I think we mostly need to check that the number fits into an int (rather than
making sure that the resulting value is too large).
I guess the question might be, where do we draw the line between technicals (c/python runtimes) and
reality (we are reading stuff from the past)?
The answer to that is very simple: we need to be consistent with C Git;
Dulwich should accept and reject the same values that C git does.
2^64 is not even possible for the python runtime to work with (well as
you mentioned earlier depending on the date library's implementation
used, i settled on datetime since it's in native python).
It's fine to allow values that are bigger than what fits into a datetime
object. Users of datetime can always catch the OverflowException from datetime.
Cheers,
Jelmer
|
Ack on your latest consistent response. In this current implementation though, that somehow defeats the point of me initiating the discussion and proposing a fix. Well, not totally defeating it :) but for my case it does ;) ... And now, i remember that this implemented check is not totally matching the current C git check. I sense it will answer the part missing for my case. Thanks for your patience. Cheers, |
TL;DR
For the curious, here is my rabbit hole path :)According to the comment linked, they implement a range check to be within the I tried to read the Anyway, I settled for the documentation, which confirms the platform dependency and hint at a possible size. Quoting:
According to my current understanding, I sense a loop in the Force! Anyway, checking again some header file,
Settling on 2^64 as mentioned earlier did not match the reality of my experience with So, I took the problem the other way around. Then And then i realized that in
So there goes the story \m/. For reproductibility:
Note: Cheers, |
33e19fe
to
71505e5
Compare
Codecov Report
@@ Coverage Diff @@
## master #568 +/- ##
==========================================
+ Coverage 90.9% 90.92% +0.02%
==========================================
Files 73 73
Lines 18190 18213 +23
Branches 1945 1946 +1
==========================================
+ Hits 16535 16560 +25
+ Misses 1257 1256 -1
+ Partials 398 397 -1
Continue to review full report at Codecov.
|
I've rebased to work around Which was the good call since some errors occurred for some python3 versions (about bytes and join syntax). |
Hello, Should i squash commits now? Note that i don't get the errors in the appveyor ci either... Cheers, |
Looks good! It'd be great if you could squash the commits, and I'll merge. |
This checks for overflow date errors in objects's check() methods (tag, commit). If such a situation occurs, an ObjectFormatException is raised. Related jelmer#567
9a40658
to
694d27c
Compare
Sure. Cheers, |
Merged, thank you! |
Here is my take on #567.
Cheers,