get_dummies docs #4444

Closed
hayd opened this issue Aug 2, 2013 · 20 comments · Fixed by #5293

@hayd
Contributor

hayd commented Aug 2, 2013

Should put this in the online docs and elaborate / add an example to the docstring.
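For context, a minimal sketch of the kind of example the docstring could carry. The sample data is made up for illustration, and `dtype=int` assumes a pandas version recent enough to accept that parameter:

```python
import pandas as pd

# Hypothetical sample data: a Series of category labels.
s = pd.Series(list("abca"))

# One indicator column per distinct label; dtype=int asks for 0/1 integers.
dummies = pd.get_dummies(s, dtype=int)
print(dummies)
```

Each row has a 1 in the column matching its label and 0 elsewhere.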

@hayd
Contributor Author

hayd commented Aug 4, 2013

will include with pr for #4446

This was referenced Aug 4, 2013
@hayd
Contributor Author

hayd commented Aug 26, 2013

I'm tempted to keep this one open (it's in the API rst, but not in the reshaping docs). Should expand on it really, perhaps will consult Wes' book (where get_dummies is mentioned).

@jreback
Contributor

jreback commented Sep 28, 2013

@hayd for 0.13?

@hayd
Contributor Author

hayd commented Sep 28, 2013

@jreback When is 0.13?

@cpcloud
Member

cpcloud commented Sep 28, 2013

i'll spam the dev ML and ask

@jreback
Contributor

jreback commented Sep 28, 2013

I think that the sql refactor should drive the timeline.
How's that coming?

@hayd
Contributor Author

hayd commented Sep 29, 2013

@jreback If that is the main thing we're holding back 0.13 for, perhaps we should push sql back to 0.14... I think it would be sensible to have sql merged into master and tested for a while, with better test coverage, before releasing new sql anyway... atm @jtratner is hacking apart my refactor...

@jreback
Contributor

jreback commented Sep 29, 2013

@hayd This all depends on you and @jtratner deciding on the best option. I would say:

  • leave existing code, introduce new code in new module as experimental
  • make new code default and move existing to legacy
  • use a fallback with new code engine='legacy', to call existing
  • use just new code
  • wait till 0.13.1

I don't think 0.13 will be ready for a release candidate for, say, 2 weeks minimum, but it could easily be 1 month or even more. I think it's worthwhile to include an improved version of SQL, and am +1 on using SQLAlchemy.

@jtratner
Contributor

@hayd can you just clarify how you expect to get an engine for SQLAlchemy? Once you get that to work, you can just pass it to the class and it should be fine.

At least personally, I think there are so many moving parts here that we may not get it together for 0.13 (in particular, the code to generate a connection from passed-in parameters is really complicated and doesn't need to be). I propose we put this in as an experimental feature in a new module with a very small API (i.e., two methods, write_frame and read_sql, and that's it); we can call it sql_experimental. Then when we eventually move the new sql code out of experimental, we can leave the aliases there.

To simplify for 0.13 ONLY, I propose that this experimental module:

  1. Not support any driver but SQLAlchemy. (flavor = 'sqla')
  2. Not support direct usage of the class.
  3. Not support using tquery or uquery directly.

Then, for 0.14, we can replace the existing code in io/sql, move the legacy code to sql_legacy, and leave the legacy code in place for the tquery and uquery globals.

That way we can put something together and kick the tires. As soon as 0.13 is released, we can settle on a public API for classes and how that all should work out.

@jreback
Contributor

jreback commented Oct 4, 2013

@hayd docs on this?

@jreback
Contributor

jreback commented Oct 11, 2013

@hayd ping?

@hayd
Contributor Author

hayd commented Oct 11, 2013

Opening wes' book on get_dummies page...

@jreback
Contributor

jreback commented Oct 16, 2013

@hayd doc?

@jreback
Contributor

jreback commented Oct 21, 2013

@hayd docs?

@hayd
Contributor Author

hayd commented Oct 21, 2013

Sorry for needing so much pinging! I've shamelessly yoinked one example from Wes' book and appended it to the reshape docs. I wanted to add a final example... but maybe I shouldn't.

Wes discusses doing manual get_dummies with delimited strings (of movie categories), by creating an empty DataFrame and filling it. I was going to suggest the following instead, but maybe this discussion is (for now) more of a cookbook example:

In [1]: s = pd.Series(['ab', 'bc', 'c', 'abc']).apply(list)

In [2]: s
Out[2]: 
0       [a, b]
1       [b, c]
2          [c]
3    [a, b, c]
dtype: object

In [3]: s.apply(lambda x: pd.get_dummies(x).sum()).fillna(0)
Out[3]: 
   a  b  c
0  1  1  0
1  0  1  1
2  0  0  1
3  1  1  1

Maybe this would even be more useful as a Series/str method (or is it already?).
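On the "or is it already?" question: later pandas releases do ship a string method along these lines, `Series.str.get_dummies`. A minimal sketch, assuming a pandas version that has it and assuming `|`-delimited input (the sample data is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data: categories joined with "|".
s = pd.Series(["a|b", "b|c", "c", "a|b|c"])

# Split each string on the separator and build one indicator column per token.
dummies = s.str.get_dummies(sep="|")
print(dummies)
```

This works on delimited strings rather than on a Series of lists, so it avoids the per-row `apply` entirely.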

@jreback
Contributor

jreback commented Oct 21, 2013

there is an issue IIRC, but can't find it right now....about implementing that behavior directly,
as there is a fair amount of overhead doing it via apply

@jreback
Contributor

jreback commented Oct 21, 2013

here it is....put it on for 0.14....not hard to do this (and could be cythonized) ... #3695

@jreback
Contributor

jreback commented Oct 21, 2013

maybe could be something like

s.get_dummies() ?

which could just call pd.get_dummies(split=True) or something

@hayd
Contributor Author

hayd commented Oct 21, 2013

I forgot your neat lambda x: Series(1, x) trick! :)
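For anyone reading along, a rough sketch of how that trick could be applied to the Series of lists from the earlier comment. Building the frame from a list of per-row Series (rather than through `apply`) and the final `astype`/`sort_index` calls are choices made here for illustration, not necessarily how it was originally written:

```python
import pandas as pd

# Same data as the earlier example: each element is a list of labels.
s = pd.Series(["ab", "bc", "c", "abc"]).apply(list)

# Series(1, index=labels) gives a row of ones keyed by that element's labels.
rows = [pd.Series(1, index=labels) for labels in s]

# Stacking the rows aligns them on the union of labels; missing labels become
# NaN, which is filled with 0 before casting back to integers.
dummies = pd.DataFrame(rows).fillna(0).astype(int).sort_index(axis=1)
print(dummies)
```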

I suppose kinda weird get_dummies isn't already a Series method as it is.

Maybe it should take a delimiter, or are you thinking of split working in the (shudders) case of a Series of lists?

@jreback
Contributor

jreback commented Oct 21, 2013

thank you sir!


4 participants