Getting a ... in my CSV when using to_csv() #20969

@BrendanMartin

Description

I have read in an HDF file of 4 million+ rows and now I want to write a sample of it to CSV:

df_small = df[:int(1e6)]
df_small.to_csv("X.csv", sep='\t')
len(df_small)
# out: 1,000,000

The dataframe consists of a datetime index and a text column.

When I read the CSV back in, I get more rows than when I saved it:

df2 = pd.read_csv("X.csv",
                  sep='\t',
                  engine='python',
                  parse_dates=['datetime'],
                  index_col='datetime',
                  infer_datetime_format=True)
len(df2)
# out: 1,000,002

And looking at my index, the datetime wasn't actually parsed; it's just dtype object.

I used my own parser and it raised an error when it hit a "..." in my datetime index, which wasn't there before.

I opened up the CSV in Excel and found a "..." in my datetime column, and I also noticed that my datetime index and first column were merged together. Not sure if that's relevant or just the way Excel reads it.

When I use read_csv, the data comes in fine except for a couple of extra rows with "..." in the index. The rows at those index values are also just blank.
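To help narrow this down, here is a minimal sketch of how the offending rows could be located after reading the file back. It is not taken from my real data: the helper name `find_unparsed_index_rows` and the three-line tab-separated sample are made up for illustration, and the sample only mimics the symptom (a "..." index label with a blank row).

```python
import io

import pandas as pd


def find_unparsed_index_rows(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the rows whose index label cannot be parsed as a datetime.

    Hypothetical helper for illustration; not part of pandas.
    """
    # errors="coerce" turns unparseable labels (such as "...") into NaT
    parsed = pd.to_datetime(frame.index.to_series().astype(str), errors="coerce")
    return frame[parsed.isna().to_numpy()]


# Made-up sample that imitates the symptom: one stray "..." row with no text
sample = (
    "datetime\ttext\n"
    "2018-01-01 00:00:00\tfoo\n"
    "...\t\n"
    "2018-01-01 00:01:00\tbar\n"
)

# Read without parse_dates so the raw index labels stay visible as strings
df_check = pd.read_csv(io.StringIO(sample), sep="\t", index_col="datetime")
bad = find_unparsed_index_rows(df_check)
print(bad)
```

Running the same helper on the frame read from X.csv (again without parse_dates) should list the extra rows, which makes it easier to find the matching lines in the file itself.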

Metadata

Labels: IO CSV (read_csv, to_csv); Needs Info (Clarification about behavior needed to assess issue)
