New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: added a reference to DataFrame assign in concatenate section of merging #18665

Merged
merged 1 commit into from Feb 20, 2018

Conversation

Projects
None yet
4 participants
@obilodeau
Contributor

obilodeau commented Dec 6, 2017

Because of various stack overflow answers we are directed to the merging
page where a link to DataFrame.assign would be better. Adding the
reference here will make people's code better. At least it would have
for me.

  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

For tests, we'll see with CI.

@codecov

This comment has been minimized.

codecov bot commented Dec 7, 2017

Codecov Report

Merging #18665 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18665      +/-   ##
==========================================
- Coverage   91.61%    91.6%   -0.01%     
==========================================
  Files         150      150              
  Lines       48887    48864      -23     
==========================================
- Hits        44786    44763      -23     
  Misses       4101     4101
Flag Coverage Δ
#multiple 89.98% <ø> (-0.01%) ⬇️
#single 41.77% <ø> (-0.03%) ⬇️
Impacted Files Coverage Δ
pandas/core/ops.py 96.62% <0%> (-0.13%) ⬇️
pandas/core/panel.py 97.3% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 740ad9a...22c1fcf. Read the comment docs.

@@ -310,7 +310,8 @@ Concatenating with mixed ndims
You can concatenate a mix of Series and DataFrames. The
Series will be transformed to DataFrames with the column name as
the name of the Series.
the name of the Series. To concatenate only one Series it is preferable

This comment has been minimized.

@jreback

jreback Dec 7, 2017

Contributor

Can you show a mini-example inline here.

This comment has been minimized.

@obilodeau

obilodeau Dec 8, 2017

Contributor

Can you show a mini-example inline here.

You mean a full example like this one (not fit for our case of course)?

.. ipython:: python

   s1 = pd.Series(['X0', 'X1', 'X2', 'X3'], name='X')
   result = pd.concat([df1, s1], axis=1)

.. ipython:: python
   :suppress:

   @savefig merging_concat_mixed_ndim.png
   p.plot([df1, s1], result,
          labels=['df1', 's1'], vertical=False);
   plt.close('all');

If so, looking at the broader context of that area of the document: the use of pd.concat() I feel like this would confuse the user more. We are in a pd.concat() context for several examples in a row and this would introduce a single df.assign() call example in between them. An in-paragraph example would be ok for me. Otherwise, I think we should have a title and cover the use case completely separately (right now it is under the Concatenating with mixed ndims title).

In any case, I have no strong opinion, just let me know how you want it done.

This comment has been minimized.

@jreback

jreback Dec 9, 2017

Contributor

yes. you can make a small sub-section for this.

@@ -262,7 +262,7 @@ Performance Improvements
Documentation Changes
~~~~~~~~~~~~~~~~~~~~~
-
- Added a reference to `DataFrame.assign` in the concatenate section of the merging documentation

This comment has been minimized.

@jreback

jreback Dec 7, 2017

Contributor

use :func:`DataFrame.assign` ; add this issue number at the end

This comment has been minimized.

@jreback

jreback Dec 7, 2017

Contributor

you can move this to 0.21.1

This comment has been minimized.

@obilodeau

obilodeau Dec 8, 2017

Contributor

done

@@ -262,7 +262,7 @@ Performance Improvements
Documentation Changes
~~~~~~~~~~~~~~~~~~~~~
-
- Added a reference to `DataFrame.assign` in the concatenate section of the merging documentation

This comment has been minimized.

@jreback

jreback Dec 7, 2017

Contributor

you can move this to 0.21.1

@@ -310,7 +310,8 @@ Concatenating with mixed ndims
You can concatenate a mix of Series and DataFrames. The
Series will be transformed to DataFrames with the column name as
the name of the Series.
the name of the Series. To concatenate only one Series it is preferable

This comment has been minimized.

@jreback

jreback Dec 9, 2017

Contributor

yes. you can make a small sub-section for this.

@obilodeau

This comment has been minimized.

Contributor

obilodeau commented Dec 9, 2017

I made the requested changes.

Concatenating using ``assign``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can concatenate one Series to a DataFrame using the DataFrame method

This comment has been minimized.

@jreback

jreback Dec 9, 2017

Contributor

Can you add when you would do this.

This comment has been minimized.

@obilodeau

obilodeau Dec 10, 2017

Contributor

I'm not sure I understand what you want, so let me tell you why I'm sending this PR.

I learned about assign while trying to make a SettingWithCopyWarning warning go away. The proposed solution:

df.loc[:, ('derived_column')] = value * df.loc[:,('X0')]

was still generating a warning in my case. So I've read the doc about merging and stumbled upon a pd.concat solution:

df = pd.concat([df, value * df['X0']], axis=1)
df.columns.values[1] = "derived_column"

Comparing notes with a colleague she told me about assign and I found that solution more elegant:

df = df.assign(derived_column = value * df['X0'])

So that's why I want to add it to merging. Now that I think about it, should I add a reference to assign in the indexing-view-versus-copy doc also? I didn't do it because I don't understand why my original solution using .loc still triggered the warning.

Back to your comment, I guess the when is when I want to add a column to a DataFrame but I already said that, no?

This comment has been minimized.

@obilodeau

obilodeau Dec 10, 2017

Contributor

After submitting my comment, I understood why the .loc() solution is causing a warning. I was already working on a view due to prior processing. To make sure I tested with:

df = df.copy() # making sure it is a full df and not a view
df.loc[:, ('derived_column')] = value * df.loc[:,('X0')]

and it didn't produce a warning.

@jreback jreback added this to the 0.21.1 milestone Dec 9, 2017

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.21.1, 0.22.0 Dec 10, 2017

@jreback

This comment has been minimized.

Contributor

jreback commented Feb 11, 2018

can you rebase

@obilodeau

This comment has been minimized.

Contributor

obilodeau commented Feb 15, 2018

Rebase done. Updated changelog in version 0.23.0 as per milestone attributed.

Concatenating using ``assign``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can concatenate one Series to a DataFrame using the DataFrame method

This comment has been minimized.

@TomAugspurger

TomAugspurger Feb 16, 2018

Contributor

I'm a bit worried about this phrasing, and the section title.

Concatenation already has a well-defined meaning in pandas. This is similar, since df[col] = series, which is what assign is doing, will do alignment like concat, but it's different since you're only adding a single column.

I'd almost prefer a simple like this at the start or end of the concat documentation that says something like "concat combins many pandas objects into a single new one. If you want to to include a Series in a DataFrame as a new column, assign directly like ..."

This comment has been minimized.

@obilodeau

obilodeau Feb 16, 2018

Contributor

Please read my initial explanation as to why I believed this can send newcomers into the right direction quicker. My contribution was for that goal.

Now, it's been a while and I'm no longer new to this problem so I can't put myself back into that mindset anymore. All I remember is that the warning wouldn't go away and that the first solution I stumbled upon seemed too complex and assign should have been proposed, while reading that merging documentation.

If you want to to include a Series in a DataFrame as a new column, assign directly like ...

Is exactly what the section right after my contribution proposes yet it is still in the merging doc. I've put a reference to assign in front of it to redirect people to a cleaner solution when they have that specific problem.

Moving towards patch acceptance: if I change Concatenating using to Add a Series to a DataFrame using in the title and in the text body, would that satisfy your concern?

Thanks for your feedback!

This comment has been minimized.

@jreback

jreback Feb 18, 2018

Contributor

rather than making a separate section, I would be ok with making this a ::note.. I think. Its just a terminology issue, might be helpful to understand what concatenate is different from assign.

Concatenating using ``assign``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can concatenate one Series to a DataFrame using the DataFrame method

This comment has been minimized.

@jreback

jreback Feb 18, 2018

Contributor

rather than making a separate section, I would be ok with making this a ::note.. I think. Its just a terminology issue, might be helpful to understand what concatenate is different from assign.

@jreback jreback removed this from the 0.23.0 milestone Feb 18, 2018

@obilodeau

This comment has been minimized.

Contributor

obilodeau commented Feb 19, 2018

Rewritten the contribution to be a simple note pointer to the DataFrame.assign() function. I noticed that in the meantime the assign() documentation was bonified and includes a column assignment using values derived from the dataframe. This is exactly what I hoped I'd found when I was looking for it in the merging documentation.

@@ -323,6 +323,11 @@ the name of the ``Series``.
labels=['df1', 's1'], vertical=False);
plt.close('all');
.. note::
You can perform the same task using the DataFrame method

This comment has been minimized.

@jreback

jreback Feb 20, 2018

Contributor

this is not true. I would say if you want to add a single series to an existing dataframe, then you want to use assign, otherwise you want to use concat.

This comment has been minimized.

@TomAugspurger

TomAugspurger Feb 20, 2018

Contributor

It's true only in this exact case of concatenating a single series to a single dataframe. @obilodeau it'd be good to point out difference. concat is for concatenating an arbitrary number of pandas objects, while .assign is only defined on DataFrame, and only accepts 1-d argumnents. Something like

..  note::

   Since we're concatenating a Series to a DataFrame, we could have achieved the same
   output with :meth:`DataFrame.assign`. To concatenate an arbitrary number of pandas
   objects (DataFrame or Series), use ``concat``.
DOC: added a reference to DataFrame assign in concatenate section of …
…merging

Because of various stack overflow answers we are directed to the merging
page where a link to DataFrame.assign would be better. Adding the
reference here will make people's code better. At least it would have
for me.
@obilodeau

This comment has been minimized.

Contributor

obilodeau commented Feb 20, 2018

Updated text to reflect @TomAugspurger's suggestion based on @jreback's comments.

@jreback jreback added this to the 0.23.0 milestone Feb 20, 2018

@jreback jreback merged commit 3da7b1f into pandas-dev:master Feb 20, 2018

2 of 3 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
Details
ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@jreback

This comment has been minimized.

Contributor

jreback commented Feb 20, 2018

thanks @obilodeau

harisbal pushed a commit to harisbal/pandas that referenced this pull request Feb 28, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment