Add some more checks on commit/tag #568

Merged
jelmer merged 1 commit into jelmer:master
Nov 3, 2017

Conversation

ardumont
Contributor

Here is my take on #567.

Cheers,

time = int(timetext)
timezone, timezone_neg_utc = parse_timezone(timezonetext)
# checking for overflow error
datetime.datetime.fromtimestamp(time)
Owner

@jelmer jelmer Oct 30, 2017

This is an expensive way of checking the time, and depends on the implementation of datetime. Could you just check if the time was higher than the supported CGit value in .check() ?

This new behaviour also makes it impossible to load commits with a high timestamp; this seems more strict than C Git.

Contributor Author

Sure, will look into that.
Thanks for the heads up!

Contributor Author

I need some help here, please. Here are my findings.

According to git fsck's code, here is
the check on parsing the date and failing if error
..

The definition of the function is in date.c.
This checks for a TIME_MAX value.

Through the hide-and-seek game of header inclusion, we can then find the definition of that value.
(from: date.c -> cache.h -> git-compat-util.h)

Its value is UINTMAX_MAX (unsigned integer max value).
So, does 2^32 sound good enough?

Note: I interpreted the CGit value you mentioned as referring to the main git repository.
I'm not so sure I interpreted that right, though :)

message=b'an awesome explanation ../b\n'))

def test_check_commit_with_overflow_date(self):
# committer Kirill A. Korinskiy <catap@catap.ru> 18446743887488505614 +42707004
Owner

This line is too long.

Contributor Author

Ah yes, even though I read it in the doc, I completely forgot to check for PEP 8 violations. Sorry about that.
Will fix.

@@ -1072,6 +1072,9 @@ def format_timezone(offset, unnecessary_negative_timezone=False):
(sign, offset / 3600, (offset / 60) % 60)).encode('ascii')


MAX_TIME = 4294967296 # 2**32
Owner

Is this the same limit that C Git uses?

Contributor Author

I'm not certain of it.
Thus my question here #568 (comment).

except ValueError as e:
raise ObjectFormatException(e)
# Prevent overflow error
if time > MAX_TIME:
Owner

Please have this fail on .check(), not at parse time, similar to C Git. In other words, reading these incorrect times will work, but .check() will fail.
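
A minimal sketch of the parse-versus-check split asked for here. The names `parse_time_field`, `SketchCommit` and `SketchFormatError` are hypothetical stand-ins for illustration, not dulwich's actual classes, and the limit is simply the 2**32 value from the diff under review, which the thread is still debating at this point.

```python
# Hypothetical stand-ins for illustration; not dulwich's real classes.
MAX_TIME = 4294967296  # 2**32, the value in the diff under review (still debated)


class SketchFormatError(Exception):
    """Raised by check() when a stored field is not acceptable."""


def parse_time_field(timetext):
    """Parse leniently: any decimal integer is accepted at load time."""
    return int(timetext)


class SketchCommit:
    def __init__(self, timetext):
        # Loading never rejects a large timestamp, only a non-numeric one.
        self.commit_time = parse_time_field(timetext)

    def check(self, max_time=MAX_TIME):
        """Validate strictly: reject timestamps above the supported limit."""
        if self.commit_time > max_time:
            raise SketchFormatError(
                "date causes integer overflow: %d" % self.commit_time)


def is_valid(timetext):
    """Return True when the object loads and also passes check()."""
    commit = SketchCommit(timetext)  # always loads for numeric input
    try:
        commit.check()
    except SketchFormatError:
        return False
    return True
```

With this split, a commit carrying the timestamp from the test above (18446743887488505614) still loads, and only `check()` reports it.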

Contributor Author

Right!

@jelmer
Owner

jelmer commented Oct 31, 2017 via email

@ardumont
Contributor Author

ardumont commented Nov 1, 2017

Note, I have pushed multiple commits and all.
I'll rebase (and possibly squash commits :) when we find a reasonable implementation.

@ardumont
Contributor Author

ardumont commented Nov 1, 2017

On my system, UINTMAX_MAX is 18446744073709551615 (from stdint.h), or
2**64. That's the same as sys.maxint*2 (since int is signed) in
Python; could we use that?

Well, that's where the trouble lies. This is too high (well for
python3 on my box at least).

I guess the question might be, where do we draw the line between technicals (c/python runtimes) and
reality (we are reading stuff from the past)?

2^32 boundary sets us in the future already (but not that much, year
2106).

2^64 is not even possible for the python runtime to work with (well as
you mentioned earlier depending on the date library's implementation
used, i settled on datetime since it's in native python).

Working on exponents, here is my experiment to see how far datetime can
reach (~ year 6325):

Python 3.5.3 (default, Jan 19 2017, 14:11:04)
[GCC 6.3.0 20170118] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime
>>> for e in range(32, 64): print('2^%s' % e); d= datetime.datetime.fromtimestamp(2**e); print('date: %s' % d);
...
2^32
date: 2106-02-07 07:28:16
2^33
date: 2242-03-16 13:56:32
2^34
date: 2514-05-30 03:53:04
2^35
date: 3058-10-26 05:46:08
2^36
date: 4147-08-20 09:32:16
2^37
date: 6325-04-08 17:04:32
2^38
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: year is out of range

As expected, with that limit (my initial max time btw), the tests created (from the real sample I mentioned in #567) won't fail.
But since 2^37 sets us in year 6325 already... 2^64's year is clearly way out there in terms of current reality :)
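
The same experiment can be sketched without depending on the platform's `fromtimestamp`. This version is an assumption-laden variant, not what the session above ran: it adds a `timedelta` to the UTC epoch, so the times of day differ from the local times printed above and the failure surfaces as `OverflowError` rather than `ValueError`. The helper names are hypothetical.

```python
import datetime

# Unix epoch as a timezone-aware UTC datetime.
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def date_for_timestamp(seconds):
    """Return the UTC datetime for a Unix timestamp, or None if unrepresentable."""
    try:
        return EPOCH + datetime.timedelta(seconds=seconds)
    except OverflowError:
        # Raised when the result is past datetime.MAXYEAR (9999) or when
        # the delta itself is too large for timedelta.
        return None


def largest_representable_exponent(start=32, stop=64):
    """Return the largest e in [start, stop] for which 2**e maps to a datetime."""
    largest = None
    for e in range(start, stop + 1):
        if date_for_timestamp(2 ** e) is not None:
            largest = e
    return largest
```

Under these assumptions the cut-off lands at the same exponent as in the session above, because the limiting factor is datetime's maximum year of 9999.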

$ python3 -m unittest dulwich/tests/test_objects.py
....................F...................................F..........................s..s.s...
======================================================================
FAIL: test_check_commit_with_overflow_date (dulwich.tests.test_objects.CommitParseTests)
Date with overflow should raise an ObjectFormatException when checked
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tony/repo/public/dulwich/dulwich/tests/test_objects.py", line 672, in test_check_commit_with_overflow_date
    commit.check()
AssertionError: ObjectFormatException not raised

======================================================================
FAIL: test_check_tag_with_overflow_time (dulwich.tests.test_objects.TagParseTests)
Date with overflow should raise an ObjectFormatException when checked
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tony/repo/public/dulwich/dulwich/tests/test_objects.py", line 1051, in test_check_tag_with_overflow_time
    tag.check()
AssertionError: ObjectFormatException not raised

----------------------------------------------------------------------
Ran 92 tests in 0.026s

FAILED (failures=2, skipped=3)
python3 -m unittest dulwich/tests/test_objects.py  0.17s user 0.01s system 97% cpu 0.185 total

I'm looking forward to hearing your thoughts about it!

Yep, that's indeed what I meant :)

Cool.
Sometimes, some creepy feeling crawls and makes me doubt my initial understanding.

Cheers,

@jelmer
Owner

jelmer commented Nov 1, 2017 via email

@ardumont
Contributor Author

ardumont commented Nov 1, 2017

Ack on your latest consistent response.
I'll adapt accordingly to that limit then (2^64).

In this current implementation though, that somehow defeats the point of me initiating the discussion and proposing a fix.

Well, not totally defeating it :) but for my case it does ;)

...
(thinking)
...

And now, i remember that this implemented check is not totally matching the current C git check.
That is but the first one.
There is a secondary check done which i did not grasp at the time and which slipped my mind.

I sense it will answer the part missing for my case.
So i will continue digging this and get back to you on this :)

Thanks for your patience.

Cheers,

@ardumont
Contributor Author

ardumont commented Nov 1, 2017

There is a secondary check done which i did not grasp at the time and which slipped my mind.

TL;DR

(2^63)-1 is the limit which matches git fsck.

For the curious, here is my rabbit hole path :)

According to the comment linked, they implement a range check so that the value fits within the time_t type (which is platform-dependent).

I tried to read the time.h header but I'm not so sure of my reading.
I believe it confirms the 2^64 max (well, preprocessor
conditionals, aliasing, includes and all that sort of thing do not
quite help)...

Anyway, I settled for the documentation, which confirms the platform dependency and hints at a possible size.

Quoting:

In the GNU C Library, time_t is equivalent to long int. In other
systems, time_t might be either an integer or floating-point type.

According to my current understanding, that means long int here.
Its max would then be around either 2^32 or 2^64, depending on the platform...

I sense a loop in the Force!
/o\

Anyway, checking again some header file, /usr/include/limits.h:

/* Minimum and maximum values a `signed long int' can hold.  */

/* Maximum value an `unsigned long int' can hold.  (Minimum is 0.)  */
#  if __WORDSIZE == 64
#   define ULONG_MAX	18446744073709551615UL
#  else
#   define ULONG_MAX	4294967295UL
#  endif
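
As a rough cross-check of the word-size dependency in this header excerpt, the limits of the platform's C `long` can be derived from Python with `ctypes`. This is only a sketch; `long_limits` is a hypothetical helper name, and the formulas assume a two's-complement `long`.

```python
import ctypes


def long_limits():
    """Return (bits, LONG_MAX, ULONG_MAX) for the platform's C long."""
    bits = ctypes.sizeof(ctypes.c_long) * 8
    long_max = 2 ** (bits - 1) - 1   # largest signed value
    ulong_max = 2 ** bits - 1        # largest unsigned value
    return bits, long_max, ulong_max
```

On a platform where `long` is 64 bits wide, this yields the 9223372036854775807 and 18446744073709551615 values seen in limits.h; where it is 32 bits wide, it yields 2147483647 and 4294967295.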

Settling on 2^64 as mentioned earlier did not match my actual experience with
git fsck, though (the #567 example again, which is also what the tests I wrote are based on).

So, I took the problem the other way around.
With the current dulwich code, I crafted a dummy repository and iterated
over exponents from 32 to 64 (to create commits with timestamps of 2^32 through 2^64).

Then I ran git fsck on it. It turns out the commits
flagged with overflow errors are those from 2^63 and beyond.

And then I realized that in limits.h, the signed long max limit is defined just before the unsigned long one
(9223372036854775807L is (2^63)-1):

/* Minimum and maximum values a `signed long int' can hold.  */
#  if __WORDSIZE == 64
#   define LONG_MAX	9223372036854775807L
#  else
#   define LONG_MAX	2147483647L
#  endif
#  define LONG_MIN	(-LONG_MAX - 1L)

So there goes the story \m/.
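
The (2^63)-1 boundary can be illustrated with a small sketch that asks whether a timestamp fits in a signed 64-bit integer. This is not git's implementation, only a Python restatement of the observation above, and the function names are hypothetical.

```python
def fits_signed_64(value):
    """Return True if value can be stored in a signed 64-bit integer."""
    try:
        # to_bytes raises OverflowError when 8 signed bytes are not enough.
        value.to_bytes(8, "little", signed=True)
    except OverflowError:
        return False
    return True


def overflowing_exponents(start=32, stop=64):
    """List the exponents e in [start, stop] whose 2**e does not fit."""
    return [e for e in range(start, stop + 1) if not fits_signed_64(2 ** e)]
```

Over the 32 to 64 range, only 63 and 64 are reported, which lines up with the two commits git fsck flags in the reproduction that follows.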

For reproducibility:

$ git init repository
$ cd repository
$ touch README.md ; git add . ; git commit -m 'initial commit'
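$ python3
>>> from dulwich.repo import Repo
>>> r = Repo('.')  # assumption: r is the dulwich Repo opened on the current directory
>>> for i in range(32, 65): r.do_commit(b'basic commit - time in seconds at 2^%s' % str(i).encode(), commit_timestamp=(2**i))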
$ git fsck
error in commit b425efbaa606f2faa7fb125902aec0e5d8964c8c: badDateOverflow: invalid author/committer line - date causes integer overflow
error in commit e82b7260e85dec96d3baf4a7c41fe6ad169d68c8: badDateOverflow: invalid author/committer line - date causes integer overflow
Checking object directories: 100% (256/256), done.
$ git show --format=raw b425efbaa606f2faa7fb125902aec0e5d8964c8c
commit b425efbaa606f2faa7fb125902aec0e5d8964c8c
tree f93e3a1a1525fb5b91020da86e44810c87a2d7bc
parent 325d8cfced8681094edbc91ec46399901d53ab06
author Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> 9223372036854775808 +0000
committer Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> 9223372036854775808 +0000

    basic commit - time in seconds at 2^63
$ git show --format=raw e82b7260e85dec96d3baf4a7c41fe6ad169d68c8
commit e82b7260e85dec96d3baf4a7c41fe6ad169d68c8
tree f93e3a1a1525fb5b91020da86e44810c87a2d7bc
parent b425efbaa606f2faa7fb125902aec0e5d8964c8c
author Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> 18446744073709551616 +0000
committer Antoine R. Dumont (@ardumont) <antoine.romain.dumont@gmail.com> 18446744073709551616 +0000

    basic commit - time in seconds at 2^64

Note:
I forgot how painful reading C headers could be (I have a headache now ;)

Cheers,

@codecov

codecov bot commented Nov 1, 2017

Codecov Report

Merging #568 into master will increase coverage by 0.02%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #568      +/-   ##
==========================================
+ Coverage    90.9%   90.92%   +0.02%     
==========================================
  Files          73       73              
  Lines       18190    18213      +23     
  Branches     1945     1946       +1     
==========================================
+ Hits        16535    16560      +25     
+ Misses       1257     1256       -1     
+ Partials      398      397       -1
Impacted Files                    Coverage Δ
dulwich/objects.py                89.78% <100%> (+0.37%) ⬆️
dulwich/tests/test_objects.py     99.66% <100%> (+0.01%) ⬆️
dulwich/porcelain.py              73.98% <0%> (-0.11%) ⬇️
dulwich/tests/test_porcelain.py   100% <0%> (ø) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3be30ce...694d27c. Read the comment docs.

@ardumont
Contributor Author

ardumont commented Nov 1, 2017

I've rebased to work around the policy of not triggering the CI build when there are conflicts.

Which was a good call, since some errors occurred for some Python 3 versions (about bytes and join syntax).

@ardumont
Contributor Author

ardumont commented Nov 2, 2017

Hello,

Should I squash commits now?
I have left them for now to demonstrate the iterative steps (in case that helps understanding).
Not sure you need it though :)

Note that I don't get the errors in the AppVeyor CI either...

Cheers,

@jelmer
Owner

jelmer commented Nov 2, 2017

Looks good! It'd be great if you could squash the commits, and I'll merge.

This checks for date overflow errors in objects' check()
methods (tag, commit).  If such a situation occurs, an
ObjectFormatException is raised.

Related jelmer#567
@ardumont
Contributor Author

ardumont commented Nov 3, 2017

Looks good! It'd be great if you could squash the commits, and I'll merge.

Sure.
Done (+ rebased to latest master)
Thanks.

Cheers,

@jelmer jelmer merged commit 694d27c into jelmer:master Nov 3, 2017
@jelmer
Owner

jelmer commented Nov 3, 2017

Merged, thank you!
