<img src="../../../images/banners/pandas-cropped.jpeg" width="600"/>

<a class="anchor" id="essential_basic_functionality"></a>
# <img src="../../../images/logos/pandas.png" width="23"/>  Function Application

## <img src="../../../images/logos/toc.png" width="20"/> Table of Contents 

* [Tablewise function application](#tablewise_function_application)
* [Row or column-wise function application](#row_or_column-wise_function_application)
* [Aggregation API](#aggregation_api)
    * [Aggregating with multiple functions](#aggregating_with_multiple_functions)
    * [Aggregating with a dict](#aggregating_with_a_dict)
    * [Mixed dtypes](#mixed_dtypes)
    * [Custom describe](#custom_describe)
* [Transform API](#transform_api)
    * [Transform with multiple functions](#transform_with_multiple_functions)
    * [Transforming with a dict](#transforming_with_a_dict)
* [Applying elementwise functions](#applying_elementwise_functions)
---

In [1]:
import pandas as pd
import numpy as np

To apply your own or another library’s functions to pandas objects,
you should be aware of the methods below. The appropriate
method to use depends on whether your function expects to operate
on an entire `DataFrame` or `Series`, row- or column-wise, or elementwise.

- Tablewise Function Application: `pipe()`
- Row or Column-wise Function Application: `apply()`
- Aggregation API: `agg()` and `transform()`
- Applying Elementwise Functions: `applymap()`

<a class="anchor" id="tablewise_function_application"></a>
## Tablewise function application

`DataFrames` and `Series` can be passed into functions.
However, if the function needs to be called in a chain, consider using the [`pipe()`](../reference/api/pandas.DataFrame.pipe.html#pandas.DataFrame.pipe "pandas.DataFrame.pipe") method.

First some setup:

In [2]:
def extract_city_name(df):
    """
    Chicago, IL -> Chicago for city_name column
    """
    df["city_name"] = df["city_and_code"].str.split(",").str.get(0)
    return df

In [3]:
def add_country_name(df, country_name=None):
    """
    Chicago -> Chicago - US for city_and_country column
    """
    col = "city_name"
    df["city_and_country"] = df[col] + ' - ' + country_name
    return df

In [4]:
df_p = pd.DataFrame({"city_and_code": ["Chicago, IL"]})

In [5]:
df_p

Unnamed: 0,city_and_code
0,"Chicago, IL"


`extract_city_name` and `add_country_name` are functions taking and returning `DataFrames`.

Now compare the following:

In [6]:
add_country_name(extract_city_name(df_p), country_name="US")

Unnamed: 0,city_and_code,city_name,city_and_country
0,"Chicago, IL",Chicago,Chicago - US


This is equivalent to:

In [7]:
df_p.pipe(extract_city_name).pipe(add_country_name, country_name="US")

Unnamed: 0,city_and_code,city_name,city_and_country
0,"Chicago, IL",Chicago,Chicago - US


pandas encourages the second style, which is known as method chaining.
`pipe` makes it easy to use your own or another library’s functions
in method chains, alongside pandas’ methods.
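
Note that the two setup functions above modify the frame they receive in place. As a minimal sketch of a chain that leaves its input untouched, the variants below build new frames with `assign`. The `_copy` function names and the `cities` frame are hypothetical, introduced only for this illustration.

```python
import pandas as pd

def extract_city_name_copy(df):
    """Hypothetical non-mutating variant: 'Chicago, IL' -> 'Chicago'."""
    # assign returns a new DataFrame; the input frame is not modified
    return df.assign(city_name=df["city_and_code"].str.split(",").str.get(0))

def add_country_name_copy(df, country_name):
    """Hypothetical non-mutating variant: 'Chicago' -> 'Chicago - US'."""
    return df.assign(city_and_country=df["city_name"] + " - " + country_name)

cities = pd.DataFrame({"city_and_code": ["Chicago, IL"]})

result = (
    cities
    .pipe(extract_city_name_copy)
    .pipe(add_country_name_copy, country_name="US")
)

print(list(cities.columns))   # the original frame keeps its single column
print(result)
```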

In the example above, the functions `extract_city_name` and `add_country_name` each expected a `DataFrame` as the first positional argument.
What if the function you wish to apply takes its data as, say, the second argument?
In this case, provide `pipe` with a tuple of `(callable, data_keyword)`.
`.pipe` will route the `DataFrame` to the argument specified in the tuple.
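
The tuple form can be sketched as follows. The helper `tag_rows` and its parameter names are hypothetical; the point is only that the frame arrives through the keyword named in the tuple rather than as the first positional argument.

```python
import pandas as pd

def tag_rows(label, data):
    """Hypothetical helper: the DataFrame is the *second* argument."""
    return data.assign(tag=label)

df_demo = pd.DataFrame({"city_and_code": ["Chicago, IL"]})

# The tuple tells pipe to pass the frame as the keyword argument ``data``;
# the remaining positional argument ("us-city") fills ``label``.
tagged = df_demo.pipe((tag_rows, "data"), "us-city")
print(tagged)
```

This call is equivalent to `tag_rows("us-city", data=df_demo)`.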

The pipe method is inspired by Unix pipes and, more recently, by [dplyr](https://github.com/tidyverse/dplyr) and [magrittr](https://github.com/tidyverse/magrittr), which
have introduced the popular `(%>%)` (read: pipe) operator for [R](https://www.r-project.org).
The implementation of `pipe` here is quite clean and feels right at home in Python.
We encourage you to view the source code of [`pipe()`](../reference/api/pandas.DataFrame.pipe.html#pandas.DataFrame.pipe "pandas.DataFrame.pipe").

<a class="anchor" id="row_or_column-wise_function_application"></a>
## Row or column-wise function application

Arbitrary functions can be applied along the axes of a DataFrame
using the [`apply()`](../reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply") method, which, like the descriptive
statistics methods, takes an optional `axis` argument:

In [8]:
df = pd.DataFrame(
    {
        "one": pd.Series(np.random.randn(3), index=["a", "b", "c"]),
        "two": pd.Series(np.random.randn(4), index=["a", "b", "c", "d"]),
        "three": pd.Series(np.random.randn(3), index=["b", "c", "d"]),
    }
)

In [9]:
df

Unnamed: 0,one,two,three
a,-0.493386,0.478888,
b,-0.096872,0.025342,0.921374
c,0.731032,-0.775681,0.399647
d,,-0.003504,0.55923


In [10]:
df.apply(np.mean)

one      0.046925
two     -0.068739
three    0.626751
dtype: float64

In [11]:
df.apply(np.mean, axis=1)

a   -0.007249
b    0.283281
c    0.118333
d    0.277863
dtype: float64

In [12]:
df.apply(lambda x: x.max() - x.min())

one      1.224418
two      1.254569
three    0.521727
dtype: float64

In [13]:
df.apply(np.cumsum)

Unnamed: 0,one,two,three
a,-0.493386,0.478888,
b,-0.590258,0.50423,0.921374
c,0.140774,-0.271452,1.321022
d,,-0.274956,1.880252


In [14]:
df.apply(np.exp)

Unnamed: 0,one,two,three
a,0.610555,1.614278,
b,0.907672,1.025666,2.512741
c,2.077224,0.46039,1.491299
d,,0.996502,1.749325


The [`apply()`](../reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply") method will also dispatch on a string method name.

In [15]:
df.apply("mean")

one      0.046925
two     -0.068739
three    0.626751
dtype: float64

In [16]:
df.apply("mean", axis=1)

a   -0.007249
b    0.283281
c    0.118333
d    0.277863
dtype: float64

The return type of the function passed to [`apply()`](../reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply") affects the
type of the final output from `DataFrame.apply` for the default behaviour:

- If the applied function returns a Series, the final output is a DataFrame. The columns match the index of the Series returned by the applied function.
- If the applied function returns any other type, the final output is a Series.

This default behaviour can be overridden using the `result_type` argument, which
accepts three options: `reduce`, `broadcast`, and `expand`.
These determine how list-like return values expand (or not) to a `DataFrame`.
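
As a small sketch of how `result_type` changes the output, the example below applies a function that returns a plain list to each row. The frame `frame` and the function `min_max` are made up for this illustration.

```python
import pandas as pd

frame = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

def min_max(row):
    # Returns a plain list (a list-like), not a Series
    return [row.min(), row.max()]

# Default: one list per row, collected into a Series of lists
as_series = frame.apply(min_max, axis=1)

# "expand": each list element becomes its own column (labelled 0 and 1)
expanded = frame.apply(min_max, axis=1, result_type="expand")

# "broadcast": results are written back into the original shape,
# keeping the original index and column labels
broadcast = frame.apply(min_max, axis=1, result_type="broadcast")

print(as_series)
print(expanded)
print(broadcast)
```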

[`apply()`](../reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply") combined with some cleverness can be used to answer many questions
about a data set. For example, suppose we wanted to extract the date where the
maximum value for each column occurred:

In [17]:
tsdf = pd.DataFrame(
    np.random.randn(1000, 3),
    columns=["A", "B", "C"],
    index=pd.date_range("1/1/2000", periods=1000),
)

In [18]:
tsdf

Unnamed: 0,A,B,C
2000-01-01,1.171402,0.061935,-1.178171
2000-01-02,-0.160439,0.235054,0.774846
2000-01-03,-0.651207,1.354219,-1.026847
2000-01-04,2.611678,-0.535306,-1.134581
2000-01-05,-0.087390,1.231878,-0.568446
...,...,...,...
2002-09-22,-0.238245,-0.433048,-0.643904
2002-09-23,0.556219,-0.949737,0.906274
2002-09-24,0.142258,1.255590,0.831502
2002-09-25,0.714660,1.318060,-0.019473


In [19]:
tsdf.apply(lambda x: x.idxmax())

A   2000-11-13
B   2001-01-19
C   2002-07-23
dtype: datetime64[ns]

You may also pass additional arguments and keyword arguments to the [`apply()`](../reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply")
method. For instance, consider the following function you would like to apply:

In [20]:
def subtract_and_divide(x, sub, divide=1):
    return (x - sub) / divide

You may then apply this function as follows:

In [21]:
df.apply(subtract_and_divide, args=(5, 3))

Unnamed: 0,one,two,three
a,-1.831129,-1.507037,
b,-1.698957,-1.658219,-1.359542
c,-1.422989,-1.925227,-1.533451
d,,-1.667835,-1.480257


In [22]:
df.apply(subtract_and_divide, args=(5,), divide=3)

Unnamed: 0,one,two,three
a,-1.831129,-1.507037,
b,-1.698957,-1.658219,-1.359542
c,-1.422989,-1.925227,-1.533451
d,,-1.667835,-1.480257


In [23]:
df.apply(subtract_and_divide, sub=5, divide=3)

Unnamed: 0,one,two,three
a,-1.831129,-1.507037,
b,-1.698957,-1.658219,-1.359542
c,-1.422989,-1.925227,-1.533451
d,,-1.667835,-1.480257


Another useful feature is the ability to pass Series methods to carry out some
Series operation on each column or row:

In [24]:
s = pd.Series([0, 2, np.nan, 8])
s.interpolate(method='polynomial', order=2)

0    0.000000
1    2.000000
2    4.666667
3    8.000000
dtype: float64

Let's apply interpolation on tsdf:

In [25]:
# First let's add some null values
tsdf.iloc[3] = np.nan

In [26]:
tsdf.head()

Unnamed: 0,A,B,C
2000-01-01,1.171402,0.061935,-1.178171
2000-01-02,-0.160439,0.235054,0.774846
2000-01-03,-0.651207,1.354219,-1.026847
2000-01-04,,,
2000-01-05,-0.08739,1.231878,-0.568446


In [27]:
tsdf.apply(pd.Series.interpolate, method='linear').head()

Unnamed: 0,A,B,C
2000-01-01,1.171402,0.061935,-1.178171
2000-01-02,-0.160439,0.235054,0.774846
2000-01-03,-0.651207,1.354219,-1.026847
2000-01-04,-0.369299,1.293049,-0.797647
2000-01-05,-0.08739,1.231878,-0.568446


Finally, `apply()` takes an argument `raw`, which is `False` by default. With the default,
each row or column is converted into a Series before the function is applied. When
set to `True`, the passed function instead receives an ndarray object, which
has positive performance implications if you do not need the indexing
functionality.
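
The difference can be seen by recording the type of object the function receives. This is a minimal sketch; the frame `frame`, the function `spread`, and the list `received` are hypothetical names used only here.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(6.0).reshape(3, 2), columns=["a", "b"])

received = []  # records the type of each object handed to the function

def spread(col):
    received.append(type(col).__name__)
    # .max() and .min() exist on both Series and ndarray
    return col.max() - col.min()

with_series = frame.apply(spread)            # each column is a Series
with_arrays = frame.apply(spread, raw=True)  # each column is an ndarray

print(sorted(set(received)))
print(with_series)
print(with_arrays)
```

Both calls return the same values here, because `spread` only uses methods that Series and ndarray share.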

<a class="anchor" id="aggregation_api"></a>
## Aggregation API

The aggregation API allows one to express possibly multiple aggregation operations in a single concise way.
This API is similar across pandas objects; see the [groupby API](https://pandas.pydata.org/docs/user_guide/groupby.html#groupby-aggregate), the
[window API](https://pandas.pydata.org/docs/user_guide/window.html#window-overview), and the [resample API](https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-aggregate).
The entry point for aggregation is [`DataFrame.aggregate()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.aggregate.html#pandas.DataFrame.aggregate), or the alias
[`DataFrame.agg()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.agg.html#pandas.DataFrame.agg).

We will use a starting frame similar to the one above:

In [28]:
tsdf = pd.DataFrame(
    np.random.randn(10, 3),
    columns=["A", "B", "C"],
    index=pd.date_range("1/1/2000", periods=10),
)

In [29]:
tsdf.iloc[3:7] = np.nan

In [30]:
tsdf

Unnamed: 0,A,B,C
2000-01-01,0.977136,0.220022,3.066667
2000-01-02,0.583268,2.382862,-0.659028
2000-01-03,1.446675,-1.873114,1.203079
2000-01-04,,,
2000-01-05,,,
2000-01-06,,,
2000-01-07,,,
2000-01-08,1.428441,-1.159168,0.552958
2000-01-09,-0.708804,0.678452,-1.551369
2000-01-10,0.324199,1.029885,0.159081


Using a single function is equivalent to [`apply()`](../reference/api/pandas.DataFrame.apply.html#pandas.DataFrame.apply "pandas.DataFrame.apply"). You can also
pass named methods as strings. These will return a `Series` of the aggregated
output:

In [31]:
tsdf.agg(np.sum)

A    4.050914
B    1.278939
C    2.771387
dtype: float64

In [32]:
tsdf.agg("sum")

A    4.050914
B    1.278939
C    2.771387
dtype: float64

In [33]:
tsdf.sum()

A    4.050914
B    1.278939
C    2.771387
dtype: float64

A single aggregation on a `Series` will return a scalar value:

In [34]:
tsdf["A"].agg("sum")

4.050913785394702

<a class="anchor" id="aggregating_with_multiple_functions"></a>
### Aggregating with multiple functions

You can pass multiple aggregation arguments as a list.
The results of each of the passed functions will be a row in the resulting `DataFrame`.
These are naturally named from the aggregation function.

In [35]:
tsdf.agg(["sum"])

Unnamed: 0,A,B,C
sum,4.050914,1.278939,2.771387


Multiple functions yield multiple rows:

In [36]:
tsdf.agg(["sum", "mean"])

Unnamed: 0,A,B,C
sum,4.050914,1.278939,2.771387
mean,0.675152,0.213156,0.461898


On a `Series`, multiple functions return a `Series`, indexed by the function names:

In [37]:
tsdf["A"].agg(["sum", "mean"])

sum     4.050914
mean    0.675152
Name: A, dtype: float64

Passing a `lambda` function will yield a `<lambda>` named row:

In [38]:
tsdf["A"].agg(["sum", lambda x: x.mean()])

sum         4.050914
<lambda>    0.675152
Name: A, dtype: float64

Passing a named function will yield that name for the row:

In [39]:
def mymean(x):
    return x.mean()

tsdf["A"].agg(["sum", mymean])

sum       4.050914
mymean    0.675152
Name: A, dtype: float64

<a class="anchor" id="aggregating_with_a_dict"></a>
### Aggregating with a dict

Passing a dictionary that maps column names to a function name or a list of function names to `DataFrame.agg`
allows you to customize which functions are applied to which columns. Note that the results
are not in any particular order; you can use an `OrderedDict` instead to guarantee ordering.

In [40]:
tsdf.agg({"A": "mean", "B": "sum"})

A    0.675152
B    1.278939
dtype: float64

Passing a list-like will generate a `DataFrame` output. You will get a matrix-like output
of all of the aggregators. The output will consist of all unique functions. Those that are
not noted for a particular column will be `NaN`:

In [41]:
tsdf.agg({"A": ["mean", "min"], "B": "sum"})

Unnamed: 0,A,B
mean,0.675152,
min,-0.708804,
sum,,1.278939


<a class="anchor" id="mixed_dtypes"></a>
### Mixed dtypes

Deprecated since version 1.4.0: Attempting to determine which columns cannot be aggregated and silently dropping them from the results is deprecated and will be removed in a future version. If any portion of the columns or operations provided fail, the call to `.agg` will raise.

When presented with mixed dtypes that cannot be aggregated, `.agg` will only take the valid
aggregations. This is similar to how `.groupby.agg` works.

In [42]:
mdf = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": [1.0, 2.0, 3.0],
        "C": ["foo", "bar", "baz"],
        "D": pd.date_range("20130101", periods=3),
    }
)

In [43]:
mdf.dtypes

A             int64
B           float64
C            object
D    datetime64[ns]
dtype: object

In [44]:
mdf.agg(["min", "sum"])


Unnamed: 0,A,B,C,D
min,1,1.0,bar,2013-01-01
sum,6,6.0,foobarbaz,NaT


In [45]:
mdf.drop('D', axis='columns').agg(["min", "sum"])

Unnamed: 0,A,B,C
min,1,1.0,bar
sum,6,6.0,foobarbaz
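
Given the deprecation noted above, one way to avoid relying on columns being dropped silently is to choose the columns explicitly before aggregating. The following is a sketch under that assumption, using `select_dtypes` to keep only the numeric columns; `mdf_demo` is a stand-in copy of the frame above.

```python
import pandas as pd

mdf_demo = pd.DataFrame(
    {
        "A": [1, 2, 3],
        "B": [1.0, 2.0, 3.0],
        "C": ["foo", "bar", "baz"],
        "D": pd.date_range("20130101", periods=3),
    }
)

# Keep only numeric columns, then aggregate; every requested
# aggregation is valid for every remaining column.
numeric_summary = mdf_demo.select_dtypes(include="number").agg(["min", "sum"])
print(numeric_summary)
```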


<a class="anchor" id="custom_describe"></a>
### Custom describe

With `.agg()` it is possible to easily create a custom describe function, similar
to the built-in [describe function](#basics-describe).

In [46]:
from functools import partial
q_25 = partial(pd.Series.quantile, q=0.25)
q_25.__name__ = "25%"
q_75 = partial(pd.Series.quantile, q=0.75)
q_75.__name__ = "75%"

In [47]:
tsdf.agg(["count", "mean", "std", "min", q_25, "median", q_75, "max"])

Unnamed: 0,A,B,C
count,6.0,6.0,6.0
mean,0.675152,0.213156,0.461898
std,0.812506,1.537987,1.596489
min,-0.708804,-1.873114,-1.551369
25%,0.388966,-0.814371,-0.454501
median,0.780202,0.449237,0.35602
75%,1.315614,0.942027,1.040548
max,1.446675,2.382862,3.066667


<a class="anchor" id="transform_api"></a>
## Transform API

The [`transform()`](../reference/api/pandas.DataFrame.transform.html#pandas.DataFrame.transform "pandas.DataFrame.transform") method returns an object that is indexed the same (same size)
as the original. This API allows you to provide *multiple* operations at the same
time rather than one-by-one. Its API is quite similar to the `.agg` API.

We create a frame similar to the one used in the above sections.

In [48]:
tsdf = pd.DataFrame(
    np.random.randn(10, 3),
    columns=["A", "B", "C"],
    index=pd.date_range("1/1/2000", periods=10),
)

tsdf.iloc[3:7] = np.nan
tsdf

Unnamed: 0,A,B,C
2000-01-01,-0.135252,0.174021,-1.235235
2000-01-02,0.780093,-1.399771,-0.66745
2000-01-03,0.088788,0.842491,0.156672
2000-01-04,,,
2000-01-05,,,
2000-01-06,,,
2000-01-07,,,
2000-01-08,0.601409,-1.105925,-0.211198
2000-01-09,-0.781389,-0.520562,-0.396363
2000-01-10,-1.046941,0.857704,-0.510821


Transform the entire frame. `.transform()` accepts a NumPy function, a string
function name, or a user-defined function.

In [49]:
tsdf.transform(np.abs)

Unnamed: 0,A,B,C
2000-01-01,0.135252,0.174021,1.235235
2000-01-02,0.780093,1.399771,0.66745
2000-01-03,0.088788,0.842491,0.156672
2000-01-04,,,
2000-01-05,,,
2000-01-06,,,
2000-01-07,,,
2000-01-08,0.601409,1.105925,0.211198
2000-01-09,0.781389,0.520562,0.396363
2000-01-10,1.046941,0.857704,0.510821


In [50]:
tsdf.transform("abs")

Unnamed: 0,A,B,C
2000-01-01,0.135252,0.174021,1.235235
2000-01-02,0.780093,1.399771,0.66745
2000-01-03,0.088788,0.842491,0.156672
2000-01-04,,,
2000-01-05,,,
2000-01-06,,,
2000-01-07,,,
2000-01-08,0.601409,1.105925,0.211198
2000-01-09,0.781389,0.520562,0.396363
2000-01-10,1.046941,0.857704,0.510821


In [51]:
tsdf.transform(lambda x: x.abs())

Unnamed: 0,A,B,C
2000-01-01,0.135252,0.174021,1.235235
2000-01-02,0.780093,1.399771,0.66745
2000-01-03,0.088788,0.842491,0.156672
2000-01-04,,,
2000-01-05,,,
2000-01-06,,,
2000-01-07,,,
2000-01-08,0.601409,1.105925,0.211198
2000-01-09,0.781389,0.520562,0.396363
2000-01-10,1.046941,0.857704,0.510821


Here [`transform()`](../reference/api/pandas.DataFrame.transform.html#pandas.DataFrame.transform "pandas.DataFrame.transform") received a single function; this is equivalent to a [ufunc](https://numpy.org/doc/stable/reference/ufuncs.html) application.

In [52]:
np.abs(tsdf)

Unnamed: 0,A,B,C
2000-01-01,0.135252,0.174021,1.235235
2000-01-02,0.780093,1.399771,0.66745
2000-01-03,0.088788,0.842491,0.156672
2000-01-04,,,
2000-01-05,,,
2000-01-06,,,
2000-01-07,,,
2000-01-08,0.601409,1.105925,0.211198
2000-01-09,0.781389,0.520562,0.396363
2000-01-10,1.046941,0.857704,0.510821


Passing a single function to `.transform()` with a `Series` will yield a single `Series` in return.

In [53]:
tsdf["A"].transform(np.abs)

2000-01-01    0.135252
2000-01-02    0.780093
2000-01-03    0.088788
2000-01-04         NaN
2000-01-05         NaN
2000-01-06         NaN
2000-01-07         NaN
2000-01-08    0.601409
2000-01-09    0.781389
2000-01-10    1.046941
Freq: D, Name: A, dtype: float64

<a class="anchor" id="transform_with_multiple_functions"></a>
### Transform with multiple functions

Passing multiple functions will yield a column MultiIndexed DataFrame.
The first level will be the original frame column names; the second level
will be the names of the transforming functions.

In [54]:
tsdf.transform([np.abs, lambda x: x + 1])

Unnamed: 0_level_0,A,A,B,B,C,C
Unnamed: 0_level_1,absolute,<lambda>,absolute,<lambda>,absolute,<lambda>
2000-01-01,0.135252,0.864748,0.174021,1.174021,1.235235,-0.235235
2000-01-02,0.780093,1.780093,1.399771,-0.399771,0.66745,0.33255
2000-01-03,0.088788,1.088788,0.842491,1.842491,0.156672,1.156672
2000-01-04,,,,,,
2000-01-05,,,,,,
2000-01-06,,,,,,
2000-01-07,,,,,,
2000-01-08,0.601409,1.601409,1.105925,-0.105925,0.211198,0.788802
2000-01-09,0.781389,0.218611,0.520562,0.479438,0.396363,0.603637
2000-01-10,1.046941,-0.046941,0.857704,1.857704,0.510821,0.489179


Passing multiple functions to a Series will yield a DataFrame. The
resulting column names will be the transforming functions.

In [55]:
tsdf["A"].transform([np.abs, lambda x: x + 1])

Unnamed: 0,absolute,<lambda>
2000-01-01,0.135252,0.864748
2000-01-02,0.780093,1.780093
2000-01-03,0.088788,1.088788
2000-01-04,,
2000-01-05,,
2000-01-06,,
2000-01-07,,
2000-01-08,0.601409,1.601409
2000-01-09,0.781389,0.218611
2000-01-10,1.046941,-0.046941


<a class="anchor" id="transforming_with_a_dict"></a>
### Transforming with a dict

Passing a dict of functions will allow selective transforming per column.

In [56]:
tsdf.transform({"A": np.abs, "B": lambda x: x + 1})

Unnamed: 0,A,B
2000-01-01,0.135252,1.174021
2000-01-02,0.780093,-0.399771
2000-01-03,0.088788,1.842491
2000-01-04,,
2000-01-05,,
2000-01-06,,
2000-01-07,,
2000-01-08,0.601409,-0.105925
2000-01-09,0.781389,0.479438
2000-01-10,1.046941,1.857704


Passing a dict of lists will generate a MultiIndexed DataFrame with these
selective transforms.

In [57]:
tsdf.transform({"A": np.abs, "B": [lambda x: x + 1, "sqrt"]})


Unnamed: 0_level_0,A,B,B
Unnamed: 0_level_1,absolute,<lambda>,sqrt
2000-01-01,0.135252,1.174021,0.417158
2000-01-02,0.780093,-0.399771,
2000-01-03,0.088788,1.842491,0.917873
2000-01-04,,,
2000-01-05,,,
2000-01-06,,,
2000-01-07,,,
2000-01-08,0.601409,-0.105925,
2000-01-09,0.781389,0.479438,
2000-01-10,1.046941,1.857704,0.926123


**Note:** Two major differences between `apply` and `transform`

There are two major differences between the `transform` and `apply` `groupby` methods.

- **Input:**
    - `apply` implicitly passes all the columns for each group as a DataFrame to the custom function.
    - `transform` passes each column for each group individually as a Series to the custom function.
- **Output:**
    - The custom function passed to `apply` can return a scalar, a Series, or a DataFrame (or a NumPy array or even a list).
    - The custom function passed to `transform` must return a sequence (a one-dimensional Series, array, or list) the same length as the group.

So, within each group, `transform` works on just one Series at a time, while `apply` works on the entire group DataFrame at once.
The `DataFrame` examples that follow illustrate the output constraint: `transform` raises when the function reduces its input.
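
The `groupby` behaviour described in this note can be sketched with a small frame. The frame `sales` and its column names are invented for this illustration.

```python
import pandas as pd

sales = pd.DataFrame(
    {
        "store": ["x", "x", "y", "y"],
        "units": [1, 3, 10, 30],
        "price": [2.0, 2.0, 5.0, 5.0],
    }
)
grouped = sales.groupby("store")[["units", "price"]]

# transform: the function sees one column of one group at a time (a Series)
# and must return something of the same length, so the result keeps the
# original index.
centered = grouped.transform(lambda s: s - s.mean())

# apply: the function sees all selected columns of a group as a DataFrame,
# so it can combine columns and may reduce each group to a scalar.
revenue = grouped.apply(lambda g: (g["units"] * g["price"]).sum())

print(centered)
print(revenue)
```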

In [58]:
# df.transform(np.sum) --> raises ValueError: Function did not transform
df.apply(np.sum)

one      0.140774
two     -0.274956
three    1.880252
dtype: float64

In [59]:
def add_two_columns(row):
    # With axis='columns', apply passes each row as a Series
    return row['one'] + row['two']

In [60]:
# df.transform(add_two_columns, axis='columns') --> raises ValueError: Function did not transform
df.apply(add_two_columns, axis='columns')

a   -0.014499
b   -0.071530
c   -0.044649
d         NaN
dtype: float64

In [61]:
def add_1(s):
    return s + 1

In [62]:
df.transform(add_1)

Unnamed: 0,one,two,three
a,0.506614,1.478888,
b,0.903128,1.025342,1.921374
c,1.731032,0.224319,1.399647
d,,0.996496,1.55923


In [63]:
df.apply(add_1)

Unnamed: 0,one,two,three
a,0.506614,1.478888,
b,0.903128,1.025342,1.921374
c,1.731032,0.224319,1.399647
d,,0.996496,1.55923


In [64]:
def mysum(s):
    return sum(s)

In [65]:
# df.transform(mysum) --> raises ValueError: Function did not transform
df.apply(mysum)

one           NaN
two     -0.274956
three         NaN
dtype: float64

<a class="anchor" id="applying_elementwise_functions"></a>
## Applying elementwise functions

Since not all functions can be vectorized (that is, accept NumPy arrays and return
another array or value), the methods [`applymap()`](../reference/api/pandas.DataFrame.applymap.html#pandas.DataFrame.applymap "pandas.DataFrame.applymap") on DataFrame
and analogously [`map()`](../reference/api/pandas.Series.map.html#pandas.Series.map "pandas.Series.map") on Series accept any Python function taking
a single value and returning a single value. For example:

In [66]:
df

Unnamed: 0,one,two,three
a,-0.493386,0.478888,
b,-0.096872,0.025342,0.921374
c,0.731032,-0.775681,0.399647
d,,-0.003504,0.55923


In [67]:
def f(x):
    return len(str(x))

In [68]:
df["one"].map(f)

a    19
b    19
c    18
d     3
Name: one, dtype: int64

In [69]:
df.applymap(f)

Unnamed: 0,one,two,three
a,19,19,3
b,19,19,18
c,18,19,18
d,3,22,18
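
In pandas 2.1 and later, `DataFrame.applymap()` is deprecated in favor of `DataFrame.map()`, which has the same elementwise behaviour. The sketch below picks whichever method the installed version provides; the frame `frame` and the function `n_chars` are hypothetical names for this illustration.

```python
import pandas as pd

frame = pd.DataFrame({"one": [1.5, 22.25], "two": [3.0, 4.125]})

def n_chars(x):
    # Length of the string form of a single value
    return len(str(x))

# Prefer DataFrame.map when this pandas version provides it,
# otherwise fall back to DataFrame.applymap.
elementwise = frame.map if hasattr(frame, "map") else frame.applymap
lengths = elementwise(n_chars)
print(lengths)
```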


[`Series.map()`](../reference/api/pandas.Series.map.html#pandas.Series.map "pandas.Series.map") has an additional feature; it can be used to easily
“link” or “map” values defined by a secondary series:

In [70]:
s = pd.Series(
    ["six", "seven", "six", "seven", "six"], index=["a", "b", "c", "d", "e"]
)

In [71]:
t = pd.Series({"six": 6.0, "seven": 7.0})

In [72]:
s

a      six
b    seven
c      six
d    seven
e      six
dtype: object

In [73]:
s.map(t)

a    6.0
b    7.0
c    6.0
d    7.0
e    6.0
dtype: float64
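
`Series.map()` also accepts a plain `dict` as the lookup. As a minimal sketch, the example below shows that values with no matching key become `NaN`, which can then be filled explicitly. The names `labels` and `lookup` and the fill value `-1.0` are arbitrary choices for this illustration.

```python
import pandas as pd

labels = pd.Series(["six", "seven", "eight"], index=["a", "b", "c"])
lookup = {"six": 6.0, "seven": 7.0}

# "eight" has no entry in the lookup, so it maps to NaN
mapped = labels.map(lookup)

# Replace the unmatched entries with an explicit sentinel value
mapped_filled = mapped.fillna(-1.0)

print(mapped)
print(mapped_filled)
```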