get_dummies docs #4444

hayd · 2013-08-02T12:45:33Z

Should put this in the online docs and elaborate / add an example to the docstring.

hayd · 2013-08-04T15:53:32Z

will include with pr for #4446

hayd · 2013-08-26T23:46:03Z

I'm tempted to keep this one open (it's in the api rst, but not in reshaping). Should expand on it really, perhaps will consult wes' book (where get_dummies is mentioned).

jreback · 2013-09-28T19:28:48Z

@hayd for 0.13?

hayd · 2013-09-28T21:56:30Z

@jreback When is 0.13?

cpcloud · 2013-09-28T21:56:52Z

i'll spam the dev ML and ask

jreback · 2013-09-28T22:28:27Z

I think that the sql refactor should drive the timeline.
How's that coming?

hayd · 2013-09-29T06:35:40Z

@jreback If that is the main thing we're holding back 0.13 for, perhaps we should push sql back to 0.14... I think it would be sensible to have sql merged into master/tested for a while and better test coverage before releasing new sql anyways... atm @jtratner is hacking apart my refactor...

jreback · 2013-09-29T21:58:33Z

@hayd This all depends on you and @jtratner deciding best option. I would say:

leave existing code, introduce new code in new module as experimental
make new code default and move existing to legacy
use a fallback with new code engine='legacy', to call existing
use just new code
wait till 0.13.1

I don't think 0.13 will be ready for a release candidate for, say, 2 weeks minimum, but it could easily be 1 month or even more. I think it's worthwhile to include an improved version of SQL, and am +1 on using SQLAlchemy.

jtratner · 2013-09-29T22:06:11Z

@hayd can you just clarify how you expect to get an engine for SQLAlchemy? Once you get that to work, you can just pass it to the class and it should be fine.

At least personally, I think there are so many moving parts here that we may not get it together for 0.13 (in particular, the code to generate a connection from passed in parameters is really complicated and doesn't need to be). I propose we put this in as an experimental feature in a new module with a very small API (i.e., two methods, write_frame and read_sql, and that's it); we can call it sql_experimental. Then when we eventually move the new sql code out of experimental, we can leave the aliases there.

To simplify for 0.13 ONLY, I propose that this experimental module:

Not support any driver but SQLAlchemy. (flavor = 'sqla')
Not support direct usage of the class.
Not support using tquery or uquery directly.

Then, for 0.14, we can replace the existing code in io/sql, move the legacy code to sql_legacy, and leave the legacy code in place for the tquery and uquery globals.

That way we can put something together and kick the tires. As soon as 0.13 is released, we can settle on a public API for classes and how that all should work out.

jreback · 2013-10-04T20:28:00Z

@hayd docs on this?

jreback · 2013-10-11T12:00:26Z

@hayd ping?

hayd · 2013-10-11T23:01:26Z

Opening wes' book on get_dummies page...

jreback · 2013-10-16T12:34:17Z

@hayd doc?

jreback · 2013-10-21T12:48:40Z

@hayd docs?

hayd · 2013-10-21T18:14:11Z

Sorry for needing so much pinging! I've shamelessly yoinked one example from wes' book and appended it to the reshape docs. I wanted to add a final example... but maybe I shouldn't.

Wes discusses doing manual get_dummies with delimited strings (of movie categories), by creating an empty DataFrame and filling it. I was going to suggest the following instead, but maybe this discussion is (for now) more of a cookbook example:

In [1]: s = pd.Series(['ab', 'bc', 'c', 'abc']).apply(list)

In [2]: s
Out[2]: 
0       [a, b]
1       [b, c]
2          [c]
3    [a, b, c]
dtype: object

In [3]: s.apply(lambda x: pd.get_dummies(x).sum()).fillna(0)
Out[3]: 
   a  b  c
0  1  1  0
1  0  1  1
2  0  0  1
3  1  1  1

Maybe this would even be more useful as a Series/str method (or is it already?).
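
For readers on a current pandas: a string method along these lines does exist today as `Series.str.get_dummies`, which splits each value on a separator. A minimal sketch, assuming a recent pandas release; the `|`-delimited sample strings are illustrative and simply mirror the lists in the example above:

```python
import pandas as pd

# Delimited category strings, mirroring the lists in the example above.
s = pd.Series(["a|b", "b|c", "c", "a|b|c"])

# Split each string on "|" and build one indicator column per distinct token.
dummies = s.str.get_dummies(sep="|")
print(dummies)
```

Each row carries a 1 in the column of every token it contains and a 0 elsewhere, so no `apply` or `fillna` step is needed.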

jreback · 2013-10-21T18:51:07Z

there is an issue IIRC, but can't find it right now....about implementing that behavior directly
as there is a fair amount of overhead doing it via apply

jreback · 2013-10-21T18:52:14Z

here it is....put it on for 0.14....not hard to do this (and could be cythonized) ... #3695

jreback · 2013-10-21T18:53:59Z

maybe could be something like

s.get_dummies() ?

which could just call pd.get_dummies(split=True) or something

hayd · 2013-10-21T20:50:34Z

I forgot your neat lambda x: Series(1, x) trick! :)

I suppose kinda weird get_dummies isn't already a Series method as it is.

Maybe it should take a delimiter, or are you thinking of split working in the (shudders) case of a Series of lists?
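
The `Series(1, x)` idea mentioned above might look roughly like the following for a Series of lists. This is only a sketch under assumptions: it builds the rows with a list comprehension rather than `apply` (a swapped-in variation, not the exact snippet referred to), and the sample data is taken from the earlier example:

```python
import pandas as pd

# A Series of lists, as in the earlier example.
s = pd.Series([["a", "b"], ["b", "c"], ["c"], ["a", "b", "c"]])

# Each list becomes a Series of ones indexed by its elements. Stacking those
# rows into a DataFrame aligns them on the union of labels, leaving NaN
# wherever an element is absent from a row.
rows = [pd.Series(1, index=x) for x in s]
dummies = (
    pd.DataFrame(rows, index=s.index)
    .fillna(0)           # absent elements become 0
    .astype(int)         # NaN forced floats; restore integer indicators
    .sort_index(axis=1)  # deterministic column order
)
print(dummies)
```

This still constructs one small Series per row, so the overhead concern raised earlier in the thread applies to it as well.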

jreback · 2013-10-21T22:23:36Z

thank you sir!

This was referenced Aug 4, 2013

Get dummies #4458

Merged

Docs for lurking groupby methods #4500

Closed

hayd mentioned this issue Oct 21, 2013

DOC add get_dummies to reshaping.rst #5293

Merged

jreback closed this as completed in #5293 Oct 21, 2013
