Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLN: Post Series subclass from NDFrame #4324

Closed
15 of 18 tasks
jreback opened this issue Jul 22, 2013 · 25 comments
Closed
15 of 18 tasks

CLN: Post Series subclass from NDFrame #4324

jreback opened this issue Jul 22, 2013 · 25 comments
Assignees
Labels
Admin Administrative tasks related to the pandas project API Design Internals Related to non-user accessible pandas implementation
Milestone

Comments

@jreback
Copy link
Contributor

jreback commented Jul 22, 2013

After #3482 is merged, some additional methods to fixup

@cpcloud
Copy link
Member

cpcloud commented Jul 23, 2013

@jreback What do you think about a core.common.align_multiple function that essentially does what I do in pd.eval, i.e., align a collection of pandas objects, but it would be exposed as an API method? Would that be useful? I could open a PR for that and maybe a new label "internals" for things involving pandas internals...

This would happen after the eval PR is merged.

@jreback
Copy link
Contributor Author

jreback commented Jul 23, 2013

in #3482 I basically just moved align to generic, and all it really does is make calls out to _align_series and _align_frame (FYI, there is not _align_panel). This is also needed in core/indexing.py mainly by the setitem methods.

So would +1 be in favor of that. maybe create a new module though, pandas.core.align ? and has the 2 cases, get (the align method and set (the setitem in the indexing stuff)

@jreback
Copy link
Contributor Author

jreback commented Jul 23, 2013

is refactor label differen tthan internals?

@cpcloud
Copy link
Member

cpcloud commented Jul 23, 2013

internals could be for things/ideas like discussions around the BlockManager like the one that @wesm raised in your series ndframe PR. refactor i think of as like "clean this up because too much technical debt has been accumulated here"

@cpcloud
Copy link
Member

cpcloud commented Jul 23, 2013

While my alignment does "work" for nd objects it's not clear that there's a "right" way to do it. E.g., what axis should a Series/DataFrame align on with a Panel?

@cpcloud
Copy link
Member

cpcloud commented Jul 23, 2013

i'll open a new issue for this to prevent too much OT stuff here

@jtratner
Copy link
Contributor

@jreback - I have most of the arithmetic refactor done - but I'd like to discuss a few decisions. Does it make sense for me to open it as a separate PR right now (based off your branch) so (hopefully) it can make it in soon after your big Series change?

@jreback
Copy link
Contributor Author

jreback commented Aug 13, 2013

that's totally fine
I keep rebasing to current master so as long as you want to rebase off of mine ok by me

@jtratner
Copy link
Contributor

Yeah, I just tag your current master when I rebase, and then cherry-pick, which makes it pretty painless, basically:

git checkout jreback/series2
git cherry-pick series2_base..arithmetic-refactor
git branch -D arithmetic-refactor
git checkout -b arithmetic-refactor
git tag -d series2_base
git tag series2_base jreback/series2

@jreback
Copy link
Contributor Author

jreback commented Aug 13, 2013

git fu master!

@cpcloud
Copy link
Member

cpcloud commented Aug 13, 2013

@jtratner nice!

@cpcloud
Copy link
Member

cpcloud commented Aug 14, 2013

let me see if i follow that git....

  1. cherry pick commits from arithmetic-refactor back to but not including series2_base on to the series2 branch
  2. make ur branch again
  3. replace your tag with the new merge base

so essentially u r creating a new tag with ur changes every time @jreback rebases

is that correct?

@jtratner
Copy link
Contributor

@cpcloud well, you have to create the tag before @jreback rebases (so it's the base for your current work), but yes, each time I basically move the base commit tag to be the last commit of series2. i.e.

git checkout some-branch-that-might-rebase
git tag base_commit some-branch-that-might-rebase
git checkout -b  my-branch

and then after some-branch-that-might-rebase rebases, you just do:

git fetch # or merge or whatever
git checkout some-branch-that-might-rebase
git cherry-pick base_commit..my-branch
# now, update the tag to be the new base commit
git tag -d base_commit
git tag base_commit some-branch-that-might-rebase

and then you can do whatever else you want

@jtratner
Copy link
Contributor

but yes, that's basically it. I like it because you don't have to remember any hashes.

@cpcloud
Copy link
Member

cpcloud commented Aug 14, 2013

just to play devils advocate how is this different from just git rebase jreback/td? or in your case

git fetch jreback
git checkout arithmethic-refactor
git rebase jreback/td

@jtratner
Copy link
Contributor

@cpcloud what if upstream combines multiple previous commits? I only want to apply my commits on top of wherever it ended up.

@cpcloud
Copy link
Member

cpcloud commented Aug 14, 2013

ah i see, so yours takes into account possible rewritten history and you don't really care if that's happened....that is pretty useful....care to add to git tips when u get a chance?

@jreback
Copy link
Contributor Author

jreback commented Aug 14, 2013

what if the commit that series2_base is deleted, eg rebased away upstream?

@jtratner
Copy link
Contributor

@jreback you'll always have it locally, because that's in the history of the local branch you're working on. the cherry-pick just runs off of the refs you say. and git cherry-pick a..b does not include a, but includes b. (you can use git cherry-pick a^..b to include a as well)

@jtratner
Copy link
Contributor

@cpcloud I added it to the git tips with some additional explanation and a hopefully strongly worded note so newbies to git don't read it and think they need to do that off master. (which would work, but add extra complexity for them)

@TomAugspurger
Copy link
Contributor

I'm not sure if this is the right place to ask, but is there a reason for the Panel constructor not taking a dict of Series?

In [26]: a = pd.Series(np.random.randn(4))

In [27]: b = pd.Series(np.random.randn(4), index=np.arange(1, 5))

In [28]: pd.Panel({'a': a, 'b': b})

will results in

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-28-bb125e2a1174> in <module>()
----> 1 pd.Panel({'a': a, 'b': b})

/Users/tom/python2.7/lib/python2.7/site-packages/pandas-0.12.0_169_gb2ad044-py2.7-macosx-10.8-x86_64.egg/pandas/core/panel.pyc in __init__(self, data, items, major_axis, minor_axis, copy, dtype)
    234         self._init_data(
    235             data=data, items=items, major_axis=major_axis, minor_axis=minor_axis,
--> 236             copy=copy, dtype=dtype)
    237 
    238     def _init_data(self, data, copy, dtype, **kwargs):

/Users/tom/python2.7/lib/python2.7/site-packages/pandas-0.12.0_169_gb2ad044-py2.7-macosx-10.8-x86_64.egg/pandas/core/panel.pyc in _init_data(self, data, copy, dtype, **kwargs)
    252             mgr = data
    253         elif isinstance(data, dict):
--> 254             mgr = self._init_dict(data, passed_axes, dtype=dtype)
    255             copy = False
    256             dtype = None

/Users/tom/python2.7/lib/python2.7/site-packages/pandas-0.12.0_169_gb2ad044-py2.7-macosx-10.8-x86_64.egg/pandas/core/panel.pyc in _init_dict(self, data, axes, dtype)
    293         # extract axis for remaining axes & create the slicemap
    294         raxes = [self._extract_axis(self, data, axis=i)
--> 295                  if a is None else a for i, a in enumerate(axes)]
    296         raxes_sm = self._extract_axes_for_slice(self, raxes)
    297 

/Users/tom/python2.7/lib/python2.7/site-packages/pandas-0.12.0_169_gb2ad044-py2.7-macosx-10.8-x86_64.egg/pandas/core/panel.pyc in _extract_axis(self, data, axis, intersect)
   1560             elif v is not None:
   1561                 have_raw_arrays = True
-> 1562                 raw_lengths.append(v.shape[axis])
   1563 
   1564         if have_frames:

IndexError: tuple index out of range

Obviously I can just recast a and b as DataFrames before passing them off to the Panel, so it's not a big deal; I was just curious.

@jreback
Copy link
Contributor Author

jreback commented Aug 15, 2013

@TomAugspurger its not obvious what the axes should be here (you only specified 2, but you need 3). Maybe need a better error message on that.

@jreback
Copy link
Contributor Author

jreback commented Aug 15, 2013

@TomAugspurger already had one, see #4185, would welcome a PR on this!

@TomAugspurger
Copy link
Contributor

Thanks. I see what you're saying about the axes.

@jreback
Copy link
Contributor Author

jreback commented Sep 29, 2013

moving unchecked to #5044

@jreback jreback closed this as completed Sep 29, 2013
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Admin Administrative tasks related to the pandas project API Design Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

No branches or pull requests

4 participants