# Apply

The `apply` method is quite powerful: it lets us apply Python functions (whether from external libraries or written by us) to a `DataFrame` in order to modify, manipulate, and query the data, among many other uses.

In [1]:
import pandas as pd

In [2]:
df_books = pd.read_csv('./data_sources/bestsellers-with-categories_e591527f-ae45-4fa5-b0d1-d50142128fa6.csv', sep=',', header=0)
df_books.head(3)

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction


We create a Python function that doubles a value:

In [3]:
def two_times(value):
    return value * 2

We select the column we want to modify, call `apply` on it, and pass the `two_times` function defined above as the argument, without calling it (no parentheses):

In [4]:
df_books['User Rating'].apply(two_times)

0      9.4
1      9.2
2      9.4
3      9.4
4      9.6
      ... 
545    9.8
546    9.4
547    9.4
548    9.4
549    9.4
Name: User Rating, Length: 550, dtype: float64

The previous example performed the calculation but did not change the original `DataFrame`. To modify it, we can assign the result back to the `User Rating` column or create a new column:

In [5]:
df_books['new_colum'] = df_books['User Rating'].apply(two_times)
df_books.head()

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre,new_colum
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction,9.4
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction,9.2
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction,9.4
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction,9.4
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction,9.6
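
The point that `apply` returns a new object instead of changing the column in place can be checked on a small example. The sketch below uses a made-up three-row `DataFrame` in place of the bestsellers file, and the column name `doubled` is only illustrative.

```python
import pandas as pd

# Toy data standing in for the bestsellers CSV (values chosen for illustration).
toy = pd.DataFrame({"User Rating": [4.7, 4.6, 4.8]})

def two_times(value):
    return value * 2

# apply builds and returns a new Series.
doubled = toy["User Rating"].apply(two_times)

# The source column is unchanged until the result is assigned back.
print(toy["User Rating"].tolist())   # [4.7, 4.6, 4.8]
print(doubled.tolist())              # [9.4, 9.2, 9.6]

# Assigning the result creates the new column (hypothetical name).
toy["doubled"] = doubled
print(list(toy.columns))             # ['User Rating', 'doubled']
```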


`apply` can also take `lambda` functions:

In [6]:
df_books['new_colum'] = df_books['User Rating'].apply(lambda x: x * 3)  # x is the value of each row in the User Rating column
df_books.head()

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre,new_colum
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction,14.1
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction,13.8
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction,14.1
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction,14.1
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction,14.4


We can be even more specific and add slightly more complex logic to the functions.

We will multiply the `User Rating` value by 3 when the genre is Fiction; otherwise we keep the `User Rating` value unchanged:

In [7]:
df_books['new_colum'] = df_books.apply(lambda x: x['User Rating'] * 3 if x['Genre'] == 'Fiction' else x['User Rating'])  # axis is not set, so this raises KeyError
df_books.head()

KeyError: 'Genre'

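The previous statement fails because, by default, pandas uses `axis=0`, which passes each column to the function as a `Series` indexed by row labels, so `x['Genre']` looks for a row labeled `'Genre'` that does not exist. In this example the function needs to receive each row, which is what `axis=1` does.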

In [11]:
df_books['new_colum'] = df_books.apply(lambda x: x['User Rating'] * 3 if x['Genre'] == 'Fiction' else x['User Rating'], axis=1)
df_books.head()

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre,new_colum
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction,4.7
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction,13.8
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction,4.7
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction,14.1
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction,4.8
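
To see what the function actually receives under each `axis` value, here is a minimal sketch on a made-up two-row `DataFrame`. The helper `describe_labels` is a hypothetical name; it only reports which labels are available inside the `Series` that `apply` passes in.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "User Rating": [4.7, 4.6],
    "Genre": ["Non Fiction", "Fiction"],
})

def describe_labels(s):
    # Return the labels available inside the Series that apply hands us.
    return ", ".join(str(label) for label in s.index)

# axis=0 (the default): one call per column, each Series is indexed by row labels.
by_column = toy.apply(describe_labels)

# axis=1: one call per row, each Series is indexed by column names.
by_row = toy.apply(describe_labels, axis=1)

print(by_column.tolist())  # ['0, 1', '0, 1']
print(by_row.tolist())     # ['User Rating, Genre', 'User Rating, Genre']
```

Only with `axis=1` do the labels `'User Rating'` and `'Genre'` exist inside the object passed to the function, which is why the lookup by column name works there.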


We can also make the previous example a bit cleaner by storing our `lambda` function in a variable:

In [13]:
three_times_fiction = lambda x : x['User Rating'] * 3 if x['Genre'] == 'Fiction' else x['User Rating']
df_books['new_colum'] = df_books.apply(three_times_fiction, axis=1)
df_books.head()

Unnamed: 0,Name,Author,User Rating,Reviews,Price,Year,Genre,new_colum
0,10-Day Green Smoothie Cleanse,JJ Smith,4.7,17350,8,2016,Non Fiction,4.7
1,11/22/63: A Novel,Stephen King,4.6,2052,22,2011,Fiction,13.8
2,12 Rules for Life: An Antidote to Chaos,Jordan B. Peterson,4.7,18979,15,2018,Non Fiction,4.7
3,1984 (Signet Classics),George Orwell,4.7,21424,6,2017,Fiction,14.1
4,"5,000 Awesome Facts (About Everything!) (Natio...",National Geographic Kids,4.8,7665,12,2019,Non Fiction,4.8
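
As a possible alternative to storing a `lambda` in a variable, the same row-wise logic could be written as a regular named function. This is only a sketch on made-up data, and `triple_if_fiction` is a hypothetical name, not something defined in the notebook.

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame({
    "User Rating": [4.7, 4.6, 4.8],
    "Genre": ["Non Fiction", "Fiction", "Non Fiction"],
})

def triple_if_fiction(row):
    """Return three times the rating for Fiction rows, the rating itself otherwise."""
    if row["Genre"] == "Fiction":
        return row["User Rating"] * 3
    return row["User Rating"]

# axis=1 so that each call receives one row.
toy["new_colum"] = toy.apply(triple_if_fiction, axis=1)
print(toy)
```

A named function leaves room for a docstring and extra branches if the rule grows more complex.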


[`apply` documentation](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html)