create a community supported set of typical converters for read_csv #1180

Closed
timmie opened this Issue May 2, 2012 · 6 comments

@timmie
timmie commented May 2, 2012

Unfortunately, hardly any input data files use ISO format; most have arbitrary column layouts and date formats.

Following
https://groups.google.com/forum/?fromgroups#!topic/pydata/pZjQMX_avmY

and

#854

I would suggest adding:

https://github.com/pydata/pandas/tree/master/pandas/tseries/converters.py

There, users could contribute and share typical converters for parsing the dates and times in their input files into pandas objects.
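To make the idea concrete, here is a minimal sketch of what one shared converter could look like. The function name `parse_dmy_hm`, the `DD.MM.YYYY HH:MM` layout, and the sample data are all made up for illustration; only the `converters` argument of `read_csv` is existing pandas API.

```python
import io
from datetime import datetime

import pandas as pd


def parse_dmy_hm(text):
    """Hypothetical converter: 'DD.MM.YYYY HH:MM' text -> datetime."""
    return datetime.strptime(text.strip(), "%d.%m.%Y %H:%M")


# Made-up sample in a non-ISO layout, semicolon-delimited.
sample = io.StringIO(
    "timestamp;value\n"
    "01.02.2004 00:00;0.5\n"
    "01.02.2004 01:00;0.7\n"
)

# read_csv calls the converter on each raw string of the named column.
df = pd.read_csv(sample, sep=";", converters={"timestamp": parse_dmy_hm})

# Normalise the converted column to datetime64 and use it as the index.
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.set_index("timestamp")
```

A shared module would then be little more than a set of such functions, each documented with the input layout it expects.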

@timmie
timmie commented May 7, 2012

OK, I assume that this is not core Pandas stuff.

So we could have a separate pandas.contrib package.

There users could upload their converters along with a sample data file.

The converters may then be documented in proper docstrings.

I did this for scikits.timeseries.tsfromtxt and it worked quite well.
The only problem is how to assign meaningful names to the converter functions.

What do you think?

@changhiskhan

I think this is a great idea. We hope to make an announcement about this once v0.8 is released and the API is stable.

In the meantime, would you be interested in taking the lead and creating pandas/io/converters.py with some docs and a few sample converters? Further feedback on the converter API/interface would be greatly appreciated.

@timmie
timmie commented May 7, 2012

Yes, sure.

But I'd rather wait until the multi-column datetime functionality is there:
#1186
#1174

@timmie
timmie commented May 8, 2012

Here is an example (still for tsfromtxt):

def dc_h_0to23_cols(year, month, day, hour):
    """Datetime split across separate columns, with hours counted 0-23.

    .. csv-table:: Hourly Values: 0-23
           :header: "YYYY", "MM", "DD", "HH:MM", "value"
           :delim: ;

           2004;2;1;00:00;0
           2004;2;1;01:00;0
           [...];[...];[...];[...];[...]
           2004;2;1;22:00;0
           2004;2;1;23:00;0

    Note
    -----
    assumed datecols::

        datecols = (0, 1, 2, 3)

    """
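
The post above shows only the docstring. A body for such a converter might look like the sketch below. It assumes that each argument is the text of one field from a single row and that the hour column has the form HH:MM. The scalar calling convention and the `datetime` return type are assumptions for illustration, not the original tsfromtxt implementation.

```python
from datetime import datetime


def dc_h_0to23_cols(year, month, day, hour):
    """Combine separate date columns and an 'HH:MM' column (hours 0-23)."""
    # Assumption: each argument is one field of a single row, as text.
    hh, mm = str(hour).strip().split(":")
    return datetime(int(year), int(month), int(day), int(hh), int(mm))


# Fields of the first and last sample rows from the docstring above.
first = dc_h_0to23_cols("2004", "2", "1", "00:00")
last = dc_h_0to23_cols("2004", "2", "1", "23:00")
```

Because hours run from 0 to 23, they map directly onto `datetime`. A file that counts hours 1-24 would need a separate converter that rolls hour 24 over to the next day.
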
@y-p
y-p commented Jan 1, 2014

It seems to me like this will either stagnate or grow into a melange of tailored
solutions to the 1001 weird data problems found in the wild that most people won't see.

I don't think users will look for these recipes when they encounter these problems
in their own data. They'll either hack a collection of helpers to suit the data they
work with or just solve the problem with a once-off. There's no general pattern here
to grow into a coherent collection of solutions.

The idea of a pandas.contrib is interesting in itself, but there is no clear conception of that project
yet. We'll wait for that consensus to materialize.

closing.

@y-p y-p closed this Jan 1, 2014
@timmie
timmie commented Jan 1, 2014

@y-p
I understand that you want to close this very stalled issue.
But it is not clear to me what the solution is:
What if we add an example file for each converter template?
I think it could be a useful resource...

@dacoex dacoex referenced this issue in pvlib/pvlib-python Mar 12, 2015
Open

add a io module #29
