BUG: to_csv issue when multiple datetime64[ns] columns with some NaT #3437

Closed
jreback opened this issue Apr 23, 2013 · 23 comments

Comments

@jreback

Contributor Author

commented Apr 23, 2013

@y-p did we test multiple datetime64[ns] columns with NaT? I think something is getting confused somewhere....

@jreback

Contributor Author

commented Apr 23, 2013

it works if rows < 25000....bug! darn

@ghost

commented Apr 23, 2013

Yeah, we really tested the hell out of it.... you can tell them to use

In [11]: df.to_csv('a.csv',engine='python')

which we prepared with some foresight...

@jreback

Contributor Author

commented Apr 23, 2013

thanks.....see my PR to fix this....but I don't think we have a test case for this (e.g. > 25k rows, multiple datetime columns, some with NaT)

@ghost

commented Apr 23, 2013

keep this open, I'll add a test when I get a chance.

@jreback

Contributor Author

commented Apr 23, 2013

thnxs

@ghost ghost self-assigned this Apr 23, 2013

@jreback

Contributor Author

commented Apr 23, 2013

ok...user reports that it works....(the fix)

@ghost

commented Apr 24, 2013

@jreback, can't repro with:

import numpy as np
from pandas import DataFrame, NaT, date_range

def make_dtnat_arr(n, nnat=None):
    # build n timestamps, then overwrite roughly nnat of them with NaT
    if nnat is None:
        nnat = int(n * 0.1)  # 10%
    s = list(date_range('2000', freq='5min', periods=n))
    if nnat:
        for i in np.random.randint(0, len(s), nnat):
            s[i] = NaT
        # also place a NaT near the start and, when i > 0, near the end
        i = np.random.randint(100)
        s[-i] = NaT
        s[i] = NaT
    return s

N = 30000
s1 = make_dtnat_arr(N)
s2 = make_dtnat_arr(N)
s3 = make_dtnat_arr(N, 0)  # no NaT in this column
df = DataFrame(dict(a=s1, b=s2, c=s3))
df.to_csv('/tmp/1.csv')
s = DataFrame.from_csv('/tmp/1.csv')
s = s.convert_objects('coerce')
In [54]: df.shape == s.shape
Out[54]: True

have you got a repro snippet?

@jreback

Contributor Author

commented Apr 24, 2013

must be something pathological about 30000, try 35000 or more....fails

@ghost

commented Apr 24, 2013

nope.

@jreback

Contributor Author

commented Apr 24, 2013

your function was failing, change len(s1) -> len(s) (inside the function)

@ghost

commented Apr 24, 2013

already did, i'm not missing lines on readback.

@jreback

Contributor Author

commented Apr 24, 2013

are you using the fix?

In [12]: df
Out[12]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 35000 entries, 0 to 34999
Data columns (total 3 columns):
a    31657  non-null values
b    31645  non-null values
c    35000  non-null values
dtypes: datetime64[ns](3)

In [13]: s
Out[13]: 
<class 'pandas.core.frame.DataFrame'>
Int64Index: 35000 entries, 0 to 34999
Data columns (total 3 columns):
a    31657  non-null values
b    30295  non-null values
c    29999  non-null values
dtypes: datetime64[ns](3)

@ghost

commented Apr 24, 2013

I don't think so.

λ gl
b544463 - (56 minutes ago) TST: add df.to_csv multiple dt cols with NaT GH3437 — y-p (HEAD, moar_more)
dfaf365 - (22 hours ago) DOC: mention numpydoc docstrings in CONTRIBUTING.md — y-p (upstream/master)
In [4]: pd.__version__
Out[4]: '0.12.0.dev-b544463'

@jreback

Contributor Author

commented Apr 24, 2013

that's just weird then...I just recloned

@ghost

commented Apr 24, 2013

wait, I misunderstood the reported issue. I thought the len was supposed to be < 35000, but it's the non-null count you meant.

got it.
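For anyone else reading along, here is a minimal sketch of why the `shape` check above could not catch this. It is written against current pandas rather than the 0.11-era API, and the lossy read-back is simulated by hand (the frames `original` and `readback` are made up for illustration): the row count survives, only the non-null count drops.

```python
import pandas as pd

# One datetime column with a genuine NaT in the middle.
original = pd.DataFrame({"b": pd.to_datetime(["2000-01-01", None, "2000-01-03"])})

# Simulate a lossy round trip: a real timestamp comes back as NaT.
readback = original.copy()
readback.loc[2, "b"] = pd.NaT

# Same number of rows and columns, so a shape comparison passes...
shapes_match = original.shape == readback.shape

# ...but the per-column non-null counts differ, which is the actual symptom.
counts_match = bool((original.count() == readback.count()).all())
```

So a regression test for this probably wants to compare `count()` (or the `isna()` masks) rather than `shape`.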

@jreback

Contributor Author

commented Apr 24, 2013

no, the total row count needs to be > 25000 (not sure why 30k works but 35k fails)....the code was setting the mask using the original array size and not the sliced version. numpy should really raise on this....(it just wraps the assignment, I think), which is why it 'usually' works
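A rough illustration of the pattern described above, not the actual pandas code: assume a hypothetical chunked writer, here called `format_in_chunks`, that stringifies datetime values and writes an empty string for NaT. The only point is that the null mask is taken from the sliced chunk, so its length always matches the array being assigned into.

```python
import numpy as np

def format_in_chunks(values, chunksize):
    """Hypothetical helper (not pandas internals): stringify datetime64
    values chunk by chunk, writing '' in place of NaT."""
    out = []
    for start in range(0, len(values), chunksize):
        chunk = values[start:start + chunksize]
        # Mask computed on the slice, so mask.shape == chunk.shape.
        mask = np.isnat(chunk)
        strs = np.datetime_as_string(chunk).astype(object)
        strs[mask] = ''
        out.extend(strs.tolist())
    return out

values = np.array(['2000-01-01', 'NaT', '2000-01-03', 'NaT', '2000-01-05'],
                  dtype='datetime64[ns]')
rows = format_in_chunks(values, chunksize=2)
```

With a mask built from the full-length array instead, the mask and the chunk would disagree in length for every chunk, which is the mismatch the comment above is describing.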

@ghost

commented Apr 24, 2013

I modified the test to generate data based on chunksize, and I'm testing with a small chunksize,
otherwise the test takes too long.

It consistently fails before the fix and passes after it. Can I merge both and close?
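A sketch of what such a chunksize-driven round-trip check might look like. This is not the test that was merged; it assumes current pandas (`to_csv(chunksize=...)` and `read_csv(parse_dates=...)`), and `make_frame` is a made-up helper for this example.

```python
import io
import numpy as np
import pandas as pd

def make_frame(n, nat_fraction=0.1, seed=0):
    """Hypothetical helper: columns a and b get ~nat_fraction NaT, c gets none."""
    rng = np.random.default_rng(seed)
    data = {}
    for name in ("a", "b"):
        col = pd.Series(pd.date_range("2000-01-01", freq="5min", periods=n))
        col[rng.random(n) < nat_fraction] = pd.NaT
        data[name] = col
    data["c"] = pd.Series(pd.date_range("2000-01-01", freq="5min", periods=n))
    return pd.DataFrame(data)

# Size the frame from the chunksize so several chunks (plus a partial one)
# are written, without needing tens of thousands of rows.
chunksize = 100
df = make_frame(chunksize * 3 + 7)

buf = io.StringIO()
df.to_csv(buf, chunksize=chunksize)
buf.seek(0)
back = pd.read_csv(buf, index_col=0, parse_dates=["a", "b", "c"])

# Compare non-null counts and NaT positions, not just the shape.
same_counts = bool((df.count() == back.count()).all())
same_nat_positions = bool((df.isna().values == back.isna().values).all())
```

Tying the row count to the chunksize keeps the test fast while still crossing chunk boundaries.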

@jreback

Contributor Author

commented Apr 24, 2013

yes

@jreback

Contributor Author

commented Apr 24, 2013

thxs

@ghost

commented Apr 24, 2013

5c3ccdb

I guess the RC always reaches fewer people, so an X.1 bug fix release is a good habit,
with new feature PRs merged only after that. live and learn.

@ghost ghost closed this Apr 24, 2013

@jreback

Contributor Author

commented Apr 24, 2013

well the issue is you don't always try the rc on a production system, so you don't find things.. (I know I didn't!)
found a bug when I tried it out on windows...

@ghost

commented Apr 24, 2013

The RC is mostly psychological, basically getting a bunch of users to try a snapshot of master,
enticing them with the new features. But it only works up to a point, which is why we're getting
bug reports a day after final rather than within a week after the RC.

@wesm wesm unassigned ghost Oct 12, 2016

This issue was closed.