@@ -95,7 +95,7 @@ constructed from the sorted keys of the dict, if possible.
9595
9696 NaN (not a number) is the standard missing data marker used in pandas.
9797
98- **From scalar value**
98+ **From scalar value**
9999
100100If ``data`` is a scalar value, an index must be
101101provided. The value will be repeated to match the length of **index**.
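
A minimal sketch of the scalar case, assuming a recent pandas release. The labels ``a``, ``b``, ``c`` and the value ``5.0`` are illustrative and do not come from the diff:

```python
import pandas as pd

# The scalar is repeated once for every label in the index.
s = pd.Series(5.0, index=["a", "b", "c"])
print(s)
```

The resulting Series has one entry per index label, each holding the same value.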
@@ -154,7 +154,7 @@ See also the :ref:`section on attribute access<indexing.attribute_access>`.
154154Vectorized operations and label alignment with Series
155155~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
156156
157- When working with raw NumPy arrays, looping through value-by-value is usually
157+ When working with raw NumPy arrays, looping through value-by-value is usually
158158not necessary. The same is true when working with Series in pandas.
159159Series can also be passed into most NumPy methods expecting an ndarray.
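
The following sketch illustrates both points under the assumption of a recent pandas and NumPy; the Series ``s`` and its labels are made up for illustration. Arithmetic is applied element-wise without a loop, NumPy functions accept a Series directly, and binary operations align on labels rather than on position:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

doubled = s * 2      # element-wise, no explicit loop
expd = np.exp(s)     # NumPy ufunc applied to a Series returns a Series

# Operands are aligned on their labels; labels present in only one
# operand produce NaN in the result.
aligned = s.iloc[1:] + s.iloc[:-1]
print(aligned)
```

Only the label ``b`` appears in both slices, so it is the only non-missing entry in ``aligned``.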
160160
@@ -324,7 +324,7 @@ From a list of dicts
324324From a dict of tuples
325325~~~~~~~~~~~~~~~~~~~~~
326326
327- You can automatically create a multi-indexed frame by passing a dictionary
327+ You can automatically create a multi-indexed frame by passing a dictionary
328328of tuples.
329329
330330.. ipython:: python
@@ -347,7 +347,7 @@ column name provided).
347347**Missing Data**
348348
349349Much more will be said on this topic in the :ref:`Missing data <missing_data>`
350- section. To construct a DataFrame with missing data, we use ``np.nan`` to
350+ section. To construct a DataFrame with missing data, we use ``np.nan`` to
351351represent missing values. Alternatively, you may pass a ``numpy.MaskedArray``
352352as the data argument to the DataFrame constructor, and its masked entries will
353353be considered missing.
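
A short sketch of the masked-array route, assuming a recent pandas and NumPy. The array contents and the column names ``x`` and ``y`` are hypothetical:

```python
import numpy as np
import pandas as pd

# Mask the top-right entry; the other three values stay visible.
masked = np.ma.masked_array(
    [[1.0, 2.0], [3.0, 4.0]],
    mask=[[False, True], [False, False]],
)

df = pd.DataFrame(masked, columns=["x", "y"])
print(df)
```

The masked position appears as a missing value in the frame, while the unmasked entries are carried over unchanged.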
@@ -370,7 +370,7 @@ set to ``'index'`` in order to use the dict keys as row labels.
370370
371371``DataFrame.from_records`` takes a list of tuples or an ndarray with structured
372372dtype. It works analogously to the normal ``DataFrame`` constructor, except that
373- the resulting DataFrame index may be a specific field of the structured
373+ the resulting DataFrame index may be a specific field of the structured
374374dtype. For example:
375375
376376.. ipython:: python
@@ -506,25 +506,70 @@ to be inserted (for example, a ``Series`` or NumPy array), or a function
506506of one argument to be called on the ``DataFrame``. A *copy* of the original
507507DataFrame is returned, with the new values inserted.
508508
509+ .. versionchanged:: 0.23.0
510+
511+ Starting with Python 3.6 the order of ``**kwargs`` is preserved. This allows
512+ for *dependent* assignment, where an expression later in ``**kwargs`` can refer
513+ to a column created earlier in the same :meth:`~DataFrame.assign`.
514+
515+ .. ipython:: python
516+
517+ dfa = pd.DataFrame({"A": [1, 2, 3],
518+                     "B": [4, 5, 6]})
519+ dfa.assign(C=lambda x: x['A'] + x['B'],
520+            D=lambda x: x['A'] + x['C'])
521+
522+ In the second expression, ``x['C']`` will refer to the newly created column,
523+ which is equal to ``dfa['A'] + dfa['B']``.
524+
525+ To write code compatible with all versions of Python, split the assignment in two.
526+
527+ .. ipython:: python
528+
529+ dependent = pd.DataFrame({"A": [1, 1, 1]})
530+ (dependent.assign(A=lambda x: x['A'] + 1)
531+           .assign(B=lambda x: x['A'] + 2))
532+
509533 .. warning::
510534
511- Since the function signature of ``assign`` is ``**kwargs``, a dictionary,
512- the order of the new columns in the resulting DataFrame cannot be guaranteed
513- to match the order you pass in. To make things predictable, items are inserted
514- alphabetically (by key) at the end of the DataFrame.
535+ Dependent assignment may subtly change the behavior of your code between
536+ Python 3.6 and older versions of Python.
537+
538+ If you wish to write code that supports versions of Python before and after 3.6,
539+ you'll need to take care when passing ``assign`` expressions that
540+
541+ * Update an existing column
542+ * Refer to the newly updated column in the same ``assign``
543+
544+ For example, we'll update column "A" and then refer to it when creating "B".
545+
546+ .. code-block:: python
547+
548+ >>> dependent = pd.DataFrame({"A": [1, 1, 1]})
549+ >>> dependent.assign(A=lambda x: x["A"] + 1,
550+                      B=lambda x: x["A"] + 2)
551+
552+ For Python 3.5 and earlier the expression creating ``B`` refers to the
553+ "old" value of ``A``, ``[1, 1, 1]``. The output is then
554+
555+ .. code-block:: python
556+
557+ A B
558+ 0 2 3
559+ 1 2 3
560+ 2 2 3
561+
562+ For Python 3.6 and later, the expression creating ``B`` refers to the
563+ "new" value of ``A``, ``[2, 2, 2]``, which results in
564+
565+ .. code-block:: python
515566
516- All expressions are computed first, and then assigned. So you can't refer
517- to another column being assigned in the same call to ``assign``. For example:
567+ A B
568+ 0 2 4
569+ 1 2 4
570+ 2 2 4
518571
519- .. ipython::
520- :verbatim:
521572
522- In [1]: # Don't do this, bad reference to `C`
523- df.assign(C = lambda x: x['A'] + x['B'],
524- D = lambda x: x['A'] + x['C'])
525- In [2]: # Instead, break it into two assigns
526- (df.assign(C = lambda x: x['A'] + x['B'])
527- .assign(D = lambda x: x['A'] + x['C']))
528573
529574 Indexing / Selection
530575~~~~~~~~~~~~~~~~~~~~
@@ -914,7 +959,7 @@ For example, using the earlier example data, we could do:
914959 Squeezing
915960~~~~~~~~~
916961
917- Another way to change the dimensionality of an object is to ``squeeze`` a 1-len
962+ Another way to change the dimensionality of an object is to ``squeeze`` a 1-len
918963object, similar to ``wp['Item1']``.
919964
920965.. ipython:: python