New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-35066: _dateime.datetime.strftime copies trailing '%' #10692

Merged
merged 14 commits into from Jan 14, 2019

Conversation

Projects
None yet
8 participants
@MichaelSaah
Copy link
Contributor

MichaelSaah commented Nov 23, 2018

@the-knights-who-say-ni

This comment has been minimized.

Copy link

the-knights-who-say-ni commented Nov 23, 2018

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Our records indicate we have not received your CLA. For legal reasons we need you to sign this before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for your contribution, we look forward to reviewing it!

@MichaelSaah

This comment has been minimized.

Copy link
Contributor

MichaelSaah commented Nov 23, 2018

Summary (also posted to tracker):

Modules/_datetimemodule.c and Lib/datetime.py do not behave identically. Specifically, the strftime functions do not match when passed a format string
terminated with a '%'. The C function performs an explicit check for this condition, and raises a ValueError on it. The Py version does not perform this check. Both pass the
format string (after doing substitutions for %z, %Z, and %f tags) to the system strftime or wcfstime, depending on platform. These live within the python time module. The
time module wrapper function does not perform this check.

This situation leads to a scenario in which, for example, "%D %" passed to datetime.strftime (with the C extension included) raises a value error. The same string passed to
time.strftime returns "mm/dd/yy %", at least on OSX. Furthermore, if Python is built without the C module, "mm/dd/yy %" is returned when datetime.strftime is called.

To summarise, there are two problems: (1) datetime does not comply with PEP-399, and (2) a higher-order module raises an exception on a case that the (exposed) lower-order
module has no problem with, causing a mismatch in behavior between datetime.strftime and time.strftime.

This PR attempts to fix this problem by removing the case check from the datetime C module. This solves both (1) and (2).

There was much talk on the issue thread about there existing a test case for time.strftime that documented a platform-dependent failure on a dangling '%'. I wish to note
that my patch does not touch the time module at all, it only removes a seemingly unnecessary check in the datetime C module.

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Nov 26, 2018

@MichaelSaah @tirkarthi Can either of you point to the existing test case on this? I cannot find it.

I think that this still needs a new test, because obviously the existing test case does not cover it specifically. Ideally you would want to test that this does not fail for datetime.strftime specifically. If there's a platform-dependent failure in time.strftime, then you can write the test such that the datetime test is skipped if the time test would fail, e.g.:

# Whether datetime.strftime can handle a trailing % is platform-dependent,
# so detect whether we're on one of the platforms where this fails ([bpo-35066](https://bugs.python.org/issue35066))
try:
    time.strftime('%')
    skip_trailing_percent_strftime = False
except ValueError:
    skip_trailing_percent_strftime = True

This is assuming that we don't already know the platforms where the trailing-% fails. If we know which platforms this fails on, then both the time.strftime and datetime.strftime tests should be skipped on the platform with the known failure.

@tirkarthi

This comment has been minimized.

Copy link
Contributor

tirkarthi commented Nov 26, 2018

@pganssle In the tracker I was referring to

def test_strftime_format_check(self):
which was introduced with e5b5895 and has an explicit comment about the test being platform dependent.

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Nov 26, 2018

@tirkarthi Hm. That is a very unfortunately designed test. Even legitimate failures will be swallowed because it treats all ValueErrors as success, even if there is no known platform where this is a failure case.

I think for now we should try the "detect if time.strftime fails, otherwise test datetime.strftime" method, and maybe in the future we can tighten up the failure conditions for time.strftime.

@MichaelSaah

This comment has been minimized.

Copy link
Contributor

MichaelSaah commented Nov 26, 2018

@pganssle Thanks Paul. I'm not sure if the platform failure modes are known; I didn't run into documentation of them during my work. I'll work on adding a test case for datetime that is aware of time.strftime's functionality, but it would help me to know what exactly would happen if stdlib call in time.strftime failed, so that I can catch the failure cleanly. Otherwise I guess I'll be spinning up a bunch of VMs and trying to induce failure.

It looks like there's some error handling being done by _Py_BEGIN_SUPPRESS_IPH, but I'm not sure how this propagates (format_time is a macro for strftime/wcfstime depending on platform):

_Py_BEGIN_SUPPRESS_IPH
buflen = format_time(outbuf, i, fmt, &buf);
_Py_END_SUPPRESS_IPH

The macro def'ns are a bit cryptic:

cpython/Include/pyport.h

Lines 763 to 773 in 542497a

#if defined _MSC_VER && _MSC_VER >= 1900
extern _invalid_parameter_handler _Py_silent_invalid_parameter_handler;
#define _Py_BEGIN_SUPPRESS_IPH { _invalid_parameter_handler _Py_old_handler = \
_set_thread_local_invalid_parameter_handler(_Py_silent_invalid_parameter_handler);
#define _Py_END_SUPPRESS_IPH _set_thread_local_invalid_parameter_handler(_Py_old_handler); }
#else
#define _Py_BEGIN_SUPPRESS_IPH
#define _Py_END_SUPPRESS_IPH

Any help appreciated.

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Nov 26, 2018

@MichaelSaah I don't think you have to worry about any of that. We know that time.strftime('%') should either raise ValueError or succeed. You can just write a test for datetime.strftime that assumes that datetime_instance.strftime('%') returns '%', and then skip that test conditional on the logic I described in my earlier comment (i.e. skip it on all platforms where time.strftime('%') raises ValueError).

@MichaelSaah

This comment has been minimized.

Copy link
Contributor

MichaelSaah commented Nov 26, 2018

@pganssle Ah ok, I was reading the comment in the test-in-question incorrectly. I see the spec now. Thanks.

added test to make sure a trailing '%' in a format string passed to d…
…atetime.strftime doesn't fail, assuming time.strftime doesn't fail
@MichaelSaah

This comment has been minimized.

Copy link
Contributor

MichaelSaah commented Nov 26, 2018

Just added the test. Will probably open another PR to try to cleanup the datetime.strftime tests, they're all over the place.

Interestingly, looks like there's been some confusion about this before:

#make sure that invalid format specifiers are handled correctly
#self.assertRaises(ValueError, t.strftime, "%e")
#self.assertRaises(ValueError, t.strftime, "%")
#self.assertRaises(ValueError, t.strftime, "%#")
#oh well, some systems just ignore those invalid ones.
#at least, exercise them to make sure that no crashes
#are generated
for f in ["%e", "%", "%#"]:
try:
t.strftime(f)
except ValueError:
pass

Show resolved Hide resolved Lib/test/datetimetester.py Outdated
Show resolved Hide resolved Lib/test/datetimetester.py Outdated
Show resolved Hide resolved Lib/test/datetimetester.py Outdated
Show resolved Hide resolved Lib/test/datetimetester.py Outdated

while ((ch = *pin++) != '\0') {
if (ch != '%') {
do {

This comment has been minimized.

@pganssle

pganssle Nov 29, 2018

Contributor

Hm... It's a bit hard to keep track of where null checking happens in this loop.

I guess the reason that trailing % was forbidden in the original version was that they wanted to check for null termination every time the pin pointer is advanced.

I think there's a change in behavior around null-checking here, because before this change, the new format string did not include the null terminator, after the change, the new format string does include the null terminator. I don't know that this is a problem, but I'm wary of it.

Maybe you can change it like this:

while((ch = *pin++) != '\0') {
    if (ch != '%' || (ch = *pin++) == '\0') {
        ptoappend = pin - 1;
        ntoappend = 1;
    }
    else if (ch == 'z') {
    }

That should keep the logic the same.

This comment has been minimized.

@pganssle

pganssle Nov 29, 2018

Contributor

As an aside, this terse C-style stuff where you have to keep track of null pointers is for the birds. It took me a long time to convince myself that there was a change in behavior, and even still I am not certain about it. It really feels weird to advance the pointer in the conditionals.

This comment has been minimized.

@MichaelSaah

MichaelSaah Nov 29, 2018

Contributor

Just looking at it, I don't think your code does what you think it does. Without a null char check at the end of the loop, you'll read the null char and advance the pointer, and then the check in the while loop condition will actually be reading the char after the null char.

Like you said though, this stuff is tricky, so I'm going to test it and report back.

Also, my first attempt at fixing this, while I think a bit uglier than the do/while solution, comes closer to your desired behavior then the current one. Here it is:

while ((ch = *pin++) != '\0')
{
if (ch != '%') {
ptoappend = pin - 1;
ntoappend = 1;
}
// else if ((ch = *pin++) == '\0') {
/* There's a lone trailing %; doesn't make sense. */
// PyErr_SetString(PyExc_ValueError, "strftime format "
// "ends with raw %");
// goto Done;
// ptoappend = pin - 2;
// ntoappend = 2;
// }
/* A % has been seen and ch is the character after it. */
else if ((ch = *pin++) == 'z') {
if (zreplacement == NULL) {
/* format utcoffset */
char buf[100];
PyObject *tzinfo = get_tzinfo_member(object);
zreplacement = PyBytes_FromStringAndSize("", 0);
if (zreplacement == NULL) goto Done;
if (tzinfo != Py_None && tzinfo != NULL) {
assert(tzinfoarg != NULL);
if (format_utcoffset(buf,
sizeof(buf),
"",
tzinfo,
tzinfoarg) < 0)
goto Done;
Py_DECREF(zreplacement);
zreplacement =
PyBytes_FromStringAndSize(buf,
strlen(buf));
if (zreplacement == NULL)
goto Done;
}
}
assert(zreplacement != NULL);
ptoappend = PyBytes_AS_STRING(zreplacement);
ntoappend = PyBytes_GET_SIZE(zreplacement);
}
else if (ch == 'Z') {
/* format tzname */
if (Zreplacement == NULL) {
Zreplacement = make_Zreplacement(object,
tzinfoarg);
if (Zreplacement == NULL)
goto Done;
}
assert(Zreplacement != NULL);
assert(PyUnicode_Check(Zreplacement));
ptoappend = PyUnicode_AsUTF8AndSize(Zreplacement,
&ntoappend);
if (ptoappend == NULL)
goto Done;
}
else if (ch == 'f') {
/* format microseconds */
if (freplacement == NULL) {
freplacement = make_freplacement(object);
if (freplacement == NULL)
goto Done;
}
assert(freplacement != NULL);
assert(PyBytes_Check(freplacement));
ptoappend = PyBytes_AS_STRING(freplacement);
ntoappend = PyBytes_GET_SIZE(freplacement);
}
else {
/* percent followed by neither z nor Z */
ptoappend = pin - 2;
ntoappend = 2;
}
/* Append the ntoappend chars starting at ptoappend to
* the new format.
*/
if (ntoappend == 0)
continue;
assert(ptoappend != NULL);
assert(ntoappend > 0);
while (usednew + ntoappend > totalnew) {
if (totalnew > (PY_SSIZE_T_MAX >> 1)) { /* overflow */
PyErr_NoMemory();
goto Done;
}
totalnew <<= 1;
if (_PyBytes_Resize(&newfmt, totalnew) < 0)
goto Done;
pnew = PyBytes_AsString(newfmt) + usednew;
}
memcpy(pnew, ptoappend, ntoappend);
pnew += ntoappend;
usednew += ntoappend;
assert(usednew <= totalnew);
printf("%s\n", PyUnicode_AsUTF8(PyObject_Repr(newfmt)));
if (ch == '\0')
break;
} /* end while() */

As far as the pointer juggling in the conditionals goes, it's a pattern I've seen before in C parsing code. I think ideally you only mutate the pointer in one place, as it quickly becomes hard to reason about.

This comment has been minimized.

@MichaelSaah

MichaelSaah Nov 29, 2018

Contributor

As I suspected, your code doesn't do what we want. Passing it "%D %" produces the format string b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb'.

I obtained this by calling printf("%s\n", PyUnicode_AsUTF8(PyObject_Repr(newfmt))); after the while loop. If you put the call in the loop itself, you can watch it happen:

b'%D\xcb\xcb\xcb'
b'%D \xcb\xcb'
b'%D \x00\xcb'
b'%D \x00\xfb'
b'%D \x00\xfb\xfb\xcb\xcb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xcb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb'
b'%D \x00\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xfb\xcb\xcb\xcb\xcb\xcb\xcb\xcb\xcb'

I'm not sure what's causing it to eventually break out of the loop and not spin forever; I assume it's one of the length checks that happens in the loop.

This comment has been minimized.

@pganssle

pganssle Nov 29, 2018

Contributor

@MichaelSaah Good call, sorry, I didn't actually try my version, you are right that I advance but then never check if the next character is a NULL. I think maybe this will do what I was looking for?

while((ch = *pin++) != '\0') {
    if (ch != '%' || *pin == '\0') {
        ptoappend = pin - 1;
        ntoappend = 1;
    }
    else if ((ch = *pin++) == 'z') {
    }

Edit: No, just realized why this won't work. Hm... If I'm right that the behavior is slightly different, we may need to have more than one check in here, unfortunately.

This comment has been minimized.

@MichaelSaah

MichaelSaah Nov 29, 2018

Contributor

Did you check the earlier commit I linked to? That might be a blueprint for what you want. The key is the null check and break at the end of the loop.

@tirkarthi

This comment has been minimized.

Copy link
Contributor

tirkarthi commented Nov 29, 2018

@MichaelSaah You might want to add a NEWS entry given the impact and it will also help in keeping the PR green : https://devguide.python.org/committing/#what-s-new-and-news-entries

@MichaelSaah

This comment has been minimized.

Copy link
Contributor

MichaelSaah commented Dec 3, 2018

@pganssle I think you'll like the latest change. Behavior w/ regard to null bytes now matches existing code, as far as I can tell.

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Dec 3, 2018

@MichaelSaah Sorry for the days of silence.

I think the behavior of this is now fixed. I'm not sure how I feel about decrementing the pointer, though. I think the two most obviously viable solutions (barring any that dramatically refactor the code, which I have not considered):

  1. Checking for a null at the beginning and the end of the loop
  2. Checking for a null at the beginning of the loop, but decrementing the pointer (as you've done in this case) when you find a second null

The first one is nice because it doesn't go through unnecessary iterations of the loop, but the second one keeps the null checking logic as close as possible to the point where the pointer is incremented, which keeps the amount of time between reaching the end of the string and knowing we've reached it to a minimum.

I think the only other decent option would be something like this:

while ((ch = *pin++) != '\0') {
    if(ch != '%') {
        ptoappend = pin - 1;
        ntoappend = 1;
    } else if (*(pin + 1) != '\0') {
        if ((ch = *pin++) == 'z') {
            ...

I am increasingly convincing myself that number 2 (your current version) is the right way to go, so I say we go ahead and merge this if no one else objects.

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Dec 3, 2018

By the way, thank you for your patience with this @MichaelSaah, particularly since I know I created some extra work for you with my ill-considered alternative approaches.

@MichaelSaah

This comment has been minimized.

Copy link
Contributor

MichaelSaah commented Dec 3, 2018

Wow we really commented past each other there. No worries about the extra work, you've been a very clear and thorough reviewer, and I'm doing this for fun while looking for a new job anyway.

I originally didn't want to decrement the pointer either, but I'm convinced it's safe, since we only do that if we've already seen a '%', which means we won't be decrementing past the beginning of the buffer.

After staring at this for hours over the past couple weeks, I'm not convinced that there's any way to make the control flow more intuitive while keeping the null-checking contained.

As far as the difference between (1) and (2), I think (2) fits more naturally with the case-checking semantics that are already there, but it's a toss-up. I don't think either is more confusing (or clearer) than the other.

@MichaelSaah

This comment has been minimized.

Copy link
Contributor

MichaelSaah commented Dec 3, 2018

As an aside, I don't see how my solution leads to unnecessary traversals of the loop. As soon as it sees a '%' followed by a '\0', it copies the '%' and breaks on the next while condition check. Do you mean that it breaks on the while condition instead of at the bottom of the loop body?

@pganssle

This comment has been minimized.

Copy link
Contributor

pganssle commented Dec 19, 2018

CC: @abalkin

I think this is ready to be merged.

@pablogsal pablogsal self-assigned this Jan 9, 2019

@vstinner vstinner merged commit 454b3d4 into python:master Jan 14, 2019

5 checks passed

Azure Pipelines PR #20181203.44 succeeded
Details
bedevere/issue-number Issue number 35066 found
Details
bedevere/news News entry found in Misc/NEWS.d
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@miss-islington

This comment has been minimized.

Copy link

miss-islington commented Jan 14, 2019

Thanks @MichaelSaah for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒🤖

@bedevere-bot

This comment has been minimized.

Copy link

bedevere-bot commented Jan 14, 2019

GH-11550 is a backport of this pull request to the 3.7 branch.

miss-islington added a commit to miss-islington/cpython that referenced this pull request Jan 14, 2019

bpo-35066: _dateime.datetime.strftime copies trailing '%' (pythonGH-1…
…0692)

Previously, calling the strftime() method on a datetime object with a
trailing '%' in the format string would result in an exception. However,
this only occured when the datetime C module was being used; the python
implementation did not match this behavior. Datetime is now PEP-399
compliant, and will not throw an exception on a trailing '%'.
(cherry picked from commit 454b3d4)

Co-authored-by: MichaelSaah <mike.saah@gmail.com>

@vstinner vstinner changed the title bpo-35066 bpo-35066: _dateime.datetime.strftime copies trailing '%' Jan 14, 2019

miss-islington added a commit that referenced this pull request Jan 14, 2019

bpo-35066: _dateime.datetime.strftime copies trailing '%' (GH-10692)
Previously, calling the strftime() method on a datetime object with a
trailing '%' in the format string would result in an exception. However,
this only occured when the datetime C module was being used; the python
implementation did not match this behavior. Datetime is now PEP-399
compliant, and will not throw an exception on a trailing '%'.
(cherry picked from commit 454b3d4)

Co-authored-by: MichaelSaah <mike.saah@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment