Specify format for strptime in to_datetime #2213

wesm opened this Issue Nov 9, 2012 · 6 comments


@paulproteus

The fix for this would begin by adding a new keyword argument to the to_datetime() function in pandas/tseries/tools.py. I suggest calling the keyword argument "time_format".

You would need to:

  • Modify the function's docstring to explain the new argument

  • Modify the code so it calls datetime.strptime() with that format when the argument is given

  • Make sure your new code handles errors in the same way as the function generally does (pay attention to errors='ignore' vs. errors='raise')

  • Write a test case covering this new code.
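The checklist above can be sketched in plain Python. This is a hypothetical, simplified stand-in, not pandas code: the function name `to_datetime_sketch` is invented for illustration, the keyword uses the `time_format` name suggested above, `errors` defaults to 'ignore' only in this sketch, and the existing generic (slower) parsing path is deliberately left out. It shows the strptime fast path, the errors='ignore' vs. errors='raise' handling, and a small test case.

```python
from datetime import datetime


def to_datetime_sketch(values, time_format=None, errors="ignore"):
    """Hypothetical, simplified stand-in for to_datetime() with a
    ``time_format`` keyword.

    If ``time_format`` is given, every string is parsed with
    datetime.strptime(). With errors="raise" a malformed string raises
    ValueError; with errors="ignore" the original input is returned unchanged.
    """
    if errors not in ("ignore", "raise"):
        raise ValueError("errors must be 'ignore' or 'raise'")
    if time_format is None:
        # The existing generic parsing path is out of scope for this sketch.
        raise NotImplementedError("generic parsing path not sketched here")
    try:
        return [datetime.strptime(v, time_format) for v in values]
    except ValueError:
        if errors == "raise":
            raise
        # errors="ignore": hand back the input untouched.
        return values


def test_time_format():
    fmt = "%Y%m%dT%H%M%S.%f"
    # Valid strings parse to the expected datetimes (%f is microseconds).
    assert to_datetime_sketch(
        ["20000101T000000.000000", "20000101T000000.001000"], time_format=fmt
    ) == [datetime(2000, 1, 1), datetime(2000, 1, 1, 0, 0, 0, 1000)]
    # errors="ignore" returns the original object on failure.
    bad = ["not a date"]
    assert to_datetime_sketch(bad, time_format=fmt, errors="ignore") is bad
    # errors="raise" propagates the parsing error.
    try:
        to_datetime_sketch(bad, time_format=fmt, errors="raise")
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")


if __name__ == "__main__":
    test_time_format()
    print("ok")
```

Returning the input unchanged under errors='ignore' mirrors the "don't fail, leave the data alone" intent described above; a real implementation would also need to cover the non-string inputs to_datetime accepts.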

@dundo4he

I wanted to try out the Issue2213 branch, but I got an error message when I tried to compile and install the package. The error message I got is

error: install-base or install-platbase supplied, but installation scheme is incomplete

What I did is

python setup.py build_ext
python setup.py install --install-base=/tmp --install-platbase=/tmp

What does this error mean? How do I solve it?

I googled this error message but could not find an answer. I also asked on the IRC channel, but did not get an answer there either.

@wesm

I've never used those options before. For development I would suggest building the C extensions in place and working directly from the base of the git clone:

python setup.py build_ext --inplace
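When working from the base of the clone like this, a quick way to confirm which checkout Python is actually importing (instead of adding a print statement to tslib.pyx) is to ask the import system where the module would come from. The helper name `module_origin` below is hypothetical; `importlib.util.find_spec` is the standard-library call. When Python or IPython is started from the clone's base directory, the current directory is normally first on `sys.path`, so the reported path should point inside the clone.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be imported from, or None
    if no such module can be found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    # Run from the base of the git clone; for a package this is the path of
    # its __init__.py, which should lie inside the clone.
    print(module_origin("pandas"))
```

Because this only inspects the import path, it avoids rebuilding the extension just to tell two branches apart.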

@dundo4he

@wesm Yes, I used this method to build the C extensions in place. But I did not see the performance improvement Emily mentioned when comparing the Issue2213 branch with the master branch.

Here is what I did:

I first wanted to test the performance of the Issue2213 branch.

mkdir pandas_1224
cd pandas_1224
git clone git://github.com/six5532one/pandas.git
cd pandas
git checkout Issue2213
git status
#### Then I added one line (print "hello world") to pandas/tslib.pyx
python setup.py build_ext --inplace
ipython
import pandas
rng = pandas.date_range('1/1/2000', periods=20000, freq='ms')
strings = [x.strftime("%Y%m%dT%H%M%S.%f") for x in rng]
timeit pandas.to_datetime(strings) 
#### Lots of "hello world" lines are printed, so this is definitely the Issue2213 branch
quit()

Then I wanted to see the performance of the master branch.

git commit -m "Issue2213-print" -a
git checkout master
git status
python setup.py build_ext --inplace
ipython
import pandas
rng = pandas.date_range('1/1/2000', periods=20000, freq='ms')
strings = [x.strftime("%Y%m%dT%H%M%S.%f") for x in rng]
timeit pandas.to_datetime(strings) 
#### This time no "hello world" is printed, so this is definitely the master branch
quit()

I did not see any improvement. What did I do wrong?

@wesm

You did not pass the date format in the first pandas.to_datetime call. Without a format string, it falls back on the slower dateutil parser.
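A sketch of the benchmark with the format passed explicitly, written against a current pandas release, which exposes a `format` keyword on `to_datetime` (rather than the `time_format` name suggested at the start of the thread). This assumes pandas is installed; parsing behaviour has changed a lot since 2012, so timings on a current release will not necessarily reproduce the gap discussed here, and no particular number is implied.

```python
import timeit

import pandas as pd

# Same format string as the benchmark above.
FMT = "%Y%m%dT%H%M%S.%f"

rng = pd.date_range("1/1/2000", periods=20000, freq="ms")
strings = [x.strftime(FMT) for x in rng]

# Passing format= tells to_datetime exactly how the strings are laid out.
parsed = pd.to_datetime(strings, format=FMT)

# Time a few explicit-format calls (equivalent to IPython's %timeit, roughly).
n = 5
elapsed = timeit.timeit(lambda: pd.to_datetime(strings, format=FMT), number=n)
print(f"format={FMT!r}: {elapsed / n:.4f} s per call")
```

The parsed result should round-trip to the original range, which is a cheap sanity check that the format string matches the data before comparing timings.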

@wesm

This is done in 015447a

@wesm wesm closed this Feb 17, 2013