read_csv converters "Buffer dtype mismatch" #546

Closed
mattharrison opened this issue Dec 27, 2011 · 7 comments

Comments

@mattharrison

commented Dec 27, 2011

I am trying to convert a string column to seconds in a DataFrame using a read_csv converter. It looks like there is some sort of off-by-one error.

Here's my CSV:

"id","Time to Calculate"
1,"19:31:15"
2,"19:18:17"
3,"19:31:15"
4,"19:27:42"
5,"19:27:42"
6,"19:28:25"
7,"19:28:25"
8,"19:28:25"
9,"19:28:04"
10,"19:28:04"
11,"19:28:04"
12,"19:25:56"
13,"19:25:56"
14,"19:25:57"
15,"19:25:57"
16,"19:26:41"
17,"19:26:41"
18,"19:26:08"
19,"19:26:08"
20,"1 day, 1:04:33"
21,"1 day, 1:04:33"
22,"1 day, 1:04:33"
23,"2 days, 2:14:33"

Here's my code:

from pandas import read_csv


def sec_to_calc(time_to_calc):
    print("TIMETD", time_to_calc, type(time_to_calc))  # debug: what read_csv passes in
    # Accepts "H:MM:SS", "1 day, H:MM:SS" or "N days, H:MM:SS"
    days = 0
    if 'days' in time_to_calc:
        days, time_to_calc = time_to_calc.split(" days, ")
        days = int(days)
    elif 'day' in time_to_calc:
        days = 1
        _, time_to_calc = time_to_calc.split(" day, ")
    hours, minutes, sec = [int(x) for x in time_to_calc.split(':')]
    return days * 24 * 60 * 60 + hours * 60 * 60 + minutes * 60 + sec  # days multiply, not add


filename = '/tmp/test.csv'
df = read_csv(filename, converters={'Time to Calculate': sec_to_calc})
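The values in this column look like what Python's `str(datetime.timedelta)` produces ("19:31:15", "1 day, 1:04:33", "2 days, 2:14:33"). Assuming every value follows that format, a stricter converter could match the whole string with a regular expression and let `timedelta` do the arithmetic. This is only a sketch; the name `duration_to_seconds` is hypothetical and not part of pandas.

```python
import re
from datetime import timedelta

# Assumed format: optional "N day, " / "N days, " prefix, then H:MM:SS,
# i.e. the shape of str(datetime.timedelta) for non-negative durations.
_DURATION_RE = re.compile(
    r"^(?:(?P<days>\d+) days?, )?(?P<hours>\d+):(?P<minutes>\d{2}):(?P<seconds>\d{2})$"
)


def duration_to_seconds(text):
    """Hypothetical converter: 'N day(s), H:MM:SS' or 'H:MM:SS' -> total seconds."""
    match = _DURATION_RE.match(text.strip())
    if match is None:
        # Fail loudly instead of returning a wrong number for malformed input
        raise ValueError("unrecognized duration: %r" % text)
    # A missing "days" group comes back as None, so treat it as 0
    parts = {name: int(value or 0) for name, value in match.groupdict().items()}
    return int(timedelta(**parts).total_seconds())
```

Raising on unrecognized input means a malformed row stops the parse instead of silently producing a bad total.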

Here's my error:

  File "/tmp/test.py", line 19, in <module>
    df = read_csv(filename, converters={'Time to Calculate': sec_to_calc})
  File "/home/mharrison/work/pandas/env/lib/python2.6/site-packages/pandas/io/parsers.py", line 64, in read_csv
    return parser.get_chunk()
  File "/home/mharrison/work/pandas/env/lib/python2.6/site-packages/pandas/io/parsers.py", line 418, in get_chunk
    data = _convert_to_ndarrays(data, self.na_values)
  File "/home/mharrison/work/pandas/env/lib/python2.6/site-packages/pandas/io/parsers.py", line 462, in _convert_to_ndarrays
    result[c] = _convert_types(values, na_values)
  File "/home/mharrison/work/pandas/env/lib/python2.6/site-packages/pandas/io/parsers.py", line 470, in _convert_types
    lib.sanitize_objects(values, na_values)
  File "parsing.pyx", line 220, in pandas._tseries.sanitize_objects (pandas/src/tseries.c:54380)
ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'
@mattharrison

Author

commented Dec 27, 2011

While I'm at it: how would I add a new column to a DataFrame using the provided function (or the logic from it)?

@wesm

Member

commented Dec 29, 2011

Fixed the bug in the above referenced commit. Thanks for reporting!
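With that fix, a converter is free to return integers rather than strings. A minimal way to check this against a current pandas release, assuming pandas is installed and feeding `read_csv` an in-memory file through `io.StringIO` (`to_seconds` is a hypothetical helper, not a pandas function):

```python
import io

import pandas as pd

# Three rows taken from the CSV in the report, held in memory.
CSV_TEXT = '''"id","Time to Calculate"
1,"19:31:15"
2,"19:18:17"
20,"1 day, 1:04:33"
'''


def to_seconds(text):
    """Hypothetical converter: 'H:MM:SS' or 'N day(s), H:MM:SS' -> seconds."""
    days = 0
    if "day" in text:
        day_part, text = text.split(", ")
        days = int(day_part.split()[0])
    hours, minutes, seconds = (int(x) for x in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# The converter returns ints; read_csv builds the column from those values.
df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"Time to Calculate": to_seconds})
```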

On your second question, you mean something like:

df['time_converted'] = df['Time to Calculate'].map(sec_to_calc)

?
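A small self-contained sketch of that pattern, assuming a recent pandas; `hms_to_seconds` is a hypothetical stand-in for `sec_to_calc` that handles only "H:MM:SS" values:

```python
import pandas as pd

df = pd.DataFrame(
    {"id": [1, 2, 3], "Time to Calculate": ["19:31:15", "19:18:17", "19:27:42"]}
)


def hms_to_seconds(text):
    # "H:MM:SS" -> total seconds; day-prefixed values are out of scope here
    hours, minutes, seconds = (int(x) for x in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


# Series.map calls the function once per element and returns a new Series
# on the same index, so the result can be assigned as a new column.
df["time_converted"] = df["Time to Calculate"].map(hms_to_seconds)
```

Assigning to a label that does not exist yet appends the new column at the end of the frame.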

wesm closed this Dec 29, 2011

@mattharrison

Author

commented Dec 29, 2011

Thanks for the fix. I really appreciate it; I'm using the features I'm asking about to do some reporting with pandas.

WRT adding a column: yes, you have mechanisms for applying some math operators to create new columns, but I want a function that creates an arbitrary new column.

@wesm

Member

commented Dec 29, 2011

DataFrame.insert?
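`DataFrame.insert(loc, column, value)` places a new column at an integer position and modifies the frame in place (it returns `None`). A sketch assuming a recent pandas; the seconds computation here handles only "H:MM:SS" strings:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "Time to Calculate": ["19:31:15", "19:18:17"]})

# Seconds for each "H:MM:SS" value (day-prefixed values are not handled here).
seconds = [
    sum(int(part) * weight for part, weight in zip(text.split(":"), (3600, 60, 1)))
    for text in df["Time to Calculate"]
]

# insert(loc, column, value) adds the column in place at position loc,
# whereas df["seconds"] = ... always appends it as the last column.
df.insert(1, "seconds", seconds)

# By default insert refuses to add a label that already exists.
try:
    df.insert(0, "seconds", seconds)
except ValueError:
    duplicate_rejected = True
else:
    duplicate_rejected = False
```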

@mattharrison

Author

commented Dec 29, 2011

I would prefer .insert_col. It's really hard for me to distinguish between row and column operations. For example, I keep feeling that when I iterate over a DataFrame, I should get the rows, not the column keys.
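To make the distinction concrete: iterating a DataFrame directly yields its column labels, much like iterating a dict yields keys, while row iteration has to be requested explicitly. A sketch assuming a recent pandas, using `itertuples()`:

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "Time to Calculate": ["19:31:15", "19:18:17"]})

# Iterating the frame itself gives column labels, not rows.
columns = [label for label in df]

# itertuples() yields one namedtuple per row, with the index as element 0.
# Labels that are not valid identifiers (like "Time to Calculate") are
# renamed positionally, so index into the tuple rather than use attributes.
rows = [(row[1], row[2]) for row in df.itertuples()]
```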

I imagine I'm coming from a different background/use case. I've done time-series work for BI before, but not for finance, so I feel like I'm trying to do SQL-like operations on DataFrames. In that setting I group by, sort, create columns and tweak columns, but I normally iterate over rows.
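For readers coming from SQL, the group-by and sort operations mentioned here map onto `groupby` and `sort_values`. A rough sketch on the first five rows of the CSV above, assuming a recent pandas; the SQL in the comments is only an analogy:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "Time to Calculate": ["19:31:15", "19:18:17", "19:31:15", "19:27:42", "19:27:42"],
})

# SELECT "Time to Calculate", COUNT(*) FROM df GROUP BY "Time to Calculate"
counts = df.groupby("Time to Calculate").size()

# SELECT * FROM df ORDER BY "Time to Calculate" DESC, id
# String ordering is only correct here because every value has the same
# H:MM:SS width; "1 day, ..." values would need converting to numbers first.
ordered = df.sort_values(["Time to Calculate", "id"], ascending=[False, True])
```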

@wesm

Member

commented Dec 29, 2011

I see. It's very much a column-oriented data structure. You'll be much better off expressing logic as vector operations on columns rather than iterating through the rows. Sometimes it's unavoidable, though.
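A sketch contrasting the two styles on the same column, assuming a recent pandas and only "H:MM:SS" values. Note that `.str.split` avoids an explicit loop in user code, though pandas still processes the strings element by element internally; the arithmetic that follows is genuinely whole-column.

```python
import pandas as pd

df = pd.DataFrame({"Time to Calculate": ["19:31:15", "19:18:17", "19:27:42"]})

# Row by row: one Python-level computation per value.
looped = [
    sum(int(p) * w for p, w in zip(t.split(":"), (3600, 60, 1)))
    for t in df["Time to Calculate"]
]

# Column-wise: split the whole column into three integer columns (labels 0, 1, 2),
# then combine them with whole-column arithmetic.
parts = df["Time to Calculate"].str.split(":", expand=True).astype(int)
df["seconds"] = parts[0] * 3600 + parts[1] * 60 + parts[2]
```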

@mattharrison

This comment has been minimized.

Copy link
Author

commented Dec 29, 2011

Yep, it's an adjustment to think in vector ops on columns.

2 participants