Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: concat of Series w/o names #10698 #10723

Merged
merged 1 commit into from
Sep 2, 2015
Merged

BUG: concat of Series w/o names #10698 #10723

merged 1 commit into from
Sep 2, 2015

Conversation

IamGianluca
Copy link

closes #10698

Let the result of 'concat' to inherit the parent Series' names. The Series' name (if present) will be used as the resulting DataFrame column name. When only one of the Series has a valid name, the resulting DataFrame will inherit the name only, and use a column name for the other columns the column index value.

columns = []
for i in range(len(data)):
columns.append(Series(data[i]).name if Series(data[i]).name is not None else i)
tmpdf.columns = columns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use a counter here - first missing should be 0

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi jreback. I see your point. I've erroneously believed we wanted to use the column index as the column name in case the parent Series didn't have a valid name. I'm going to update the pull request to include the change you suggested.

Apologies in advance if my code is not always the neatest. I'm new to software development, although I use pandas everyday for data analysis. I thought actively participating to the project could be a good way for me to learn how to code properly and become a better programmer.

@jreback jreback changed the title Issue 10698 fix BUG: concat of Series w/o names #10698 Aug 2, 2015
@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Aug 2, 2015
@jreback jreback added this to the 0.17.0 milestone Aug 2, 2015
@jreback
Copy link
Contributor

jreback commented Aug 2, 2015

here are some helpful hints FYI: http://pandas.pydata.org/pandas-docs/stable/contributing.html

@IamGianluca
Copy link
Author

Apologies I pushed a few "fixes" for some unit tests which were failing. Looking at the reason why those tests were failing I've noticed it was because of the use of the argument 'keys' in the function 'concat'. Most people are using it to pass the column names, which is something I personally don't do and therefore I didn't consider when writing the code. I'm going to reset my local branch to commit 'b9cba86ec1c11c200c44c62d0a300b7f326a12e4' and make the necessary changes to allow this case. Before sending the next pull request I'm going to make sure all nosetests pass correctly. This should also fix the build issue. Apologies again, this is due to lack of experience. I'll make sure to learn from my mistake.

@@ -1797,6 +1797,15 @@ def test_concat_dataframe_keys_bug(self):
self.assertEqual(list(result.columns), [('t1', 'value'),
('t2', 'value')])

def test_concat_series_partial_columns_names(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the reference to the GitHub issue

@IamGianluca
Copy link
Author

@jreback Thanks for spending the time to review my pull request. I'm going to start working on another bug now.

@jreback
Copy link
Contributor

jreback commented Aug 10, 2015

pls add a whatsnew note. This is just a bug fix, but I think should have a mini-example. pls have squash as well.

@IamGianluca
Copy link
Author

I think I made a mistake. I rebased my local version because was missing some of the recent updates in the upstream version and (after solving some merging conflicts) I pushed. I see there is an alert message saying "This branch has conflicts that must be resolved" on my pull request in GitHub. Apologies for that. Is there a way I can solve this?

@jreback
Copy link
Contributor

jreback commented Aug 11, 2015

you need to rebase / squash. If you have a conflict it will show up when you do this. Conflicts are normal and happen because others have changed code that you are changing. See the contributing docs here

@IamGianluca
Copy link
Author

Okay! So it should be fine right?

@max-sixty
Copy link
Contributor

Conflicts are fine, as in they're not a sign you've done anything wrong. But they do need to be resolved before merge - the docs describe how to squash & merge. Cheers

@IamGianluca
Copy link
Author

Okay guys, I think I did it. Please let me know if I did something wrong. Apologies again for my lack of experience.

@@ -152,6 +153,30 @@ Other enhancements
s.drop_duplicates(keep=False)


- ``concat`` will now inherit the existing series names (even when some are missing), if new ones are not provided through the ``keys`` argument (:issue:`10698`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say something like: will use existing Series names if provided.

@jreback
Copy link
Contributor

jreback commented Aug 15, 2015

might need a doc-example http://pandas.pydata.org/pandas-docs/stable/merging.html#more-concatenating-with-group-keys to explain how keys overrides this (and add a test as well)

@jreback
Copy link
Contributor

jreback commented Aug 18, 2015

can you update and add a doc example?

@IamGianluca
Copy link
Author

Sure! I think I messed up with my branch because lots of test are failing after I rebased last day. I'll fix this ASAP.

@IamGianluca
Copy link
Author

I see the branch has conflicts. I'm assuming in this sort of situations I need to fix those myself right?

In the effort of solving them using git mergetool I've noticed that sometimes there are pieces of code in my LOCAL branch which are not present in the REMOTE, but I haven't create them myself. What I usually do is to update my master branch:

# go to the master branch
git checkout master
# pull changes from github
git fetch upstream
# update the master branch
git rebase upstream/master
# push it to your Github repo
git push

Then I update the local branch:

# go to the feature branch
git checkout my-new-feature
# make a backup in case you mess up
git branch tmp my-new-feature
# rebase on master
git rebase master

The rebase will fail and suggest me to use mergetool to solve the conflicts manually. Now, how should I behave in situations like the one in the attached picture?

screenshot from 2015-08-19 08 42 41

In the picture above the document on the left is my local, the one in the middle is the result of the merge, and the one on the right is the remote. In this occasions I tend to pick from the remote everything I don't have in my local branch, because I assume these are coming from other submitted pull requests or code merged while I was working on my local branch. What about the code which is in my local, hasn't been written by me, and is not in the remote? Should I discard it?


.. ipython:: python

foo = pd.Series([1,2], name='foo')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just create all of these in a separate python block above previous behavior. Then no need to show them twice. (obviously show the results in the previous behavior section as a code-block then in an ipython block in new behavior).

@jreback
Copy link
Contributor

jreback commented Aug 20, 2015

need to rebase / squash. generally you won't have conflicts. other people add code and should be straightforward to accept.


.. ipython:: python

s3 = Series([0, 1, 2, 3], name='foo')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use pd. before all Series and concat calls?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are converting the docs (already partly, but maybe this one not)

@jorisvandenbossche
Copy link
Member

Seems you have a bit too much changes in the whatsnew file?

@IamGianluca
Copy link
Author

Joris, my bad, I messed up when merging. I think I've solved all the issues now.


import pandas.core.common as com

import pandas.lib as lib
import pandas.algos as algos
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra import?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback You're right. I've fixed!

@jreback
Copy link
Contributor

jreback commented Sep 1, 2015

can you update according to comments

jreback added a commit that referenced this pull request Sep 2, 2015
@jreback jreback merged commit 207efc2 into pandas-dev:master Sep 2, 2015
@jreback
Copy link
Contributor

jreback commented Sep 2, 2015

@IamGianluca awesome job! thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Loss of series name attributes in pd.concat
4 participants