<center>
<img src='https://drive.google.com/uc?id=1AcU8-pfRLIe_v1J430lxrEVqp2cqxA2J' height='180px' />
</center>

`apply()` , `map()` & `applymap()` **Summary** functions
--

> **apply()** is used to apply a function along an axis of the DataFrame or on values of Series.

> **applymap()** is used to apply a function to a DataFrame elementwise.

> **map()** is used to substitute each value in a Series with another value.

<img src='https://drive.google.com/uc?id=14LQs_AcDq9PoOFdVLcY3Hilm0_Hf5KJt' height='180px' />

Let's understand all the above functions with simple dummy data.

**How to use apply()?**

The Pandas apply() is used to apply a function along an axis of the DataFrame or on values of Series.

Let’s begin with a simple example: sum each row and save the result to a new column “D”.

In [None]:
## Creating a dummy dataframe
import pandas as pd
df = pd.DataFrame({ 'A': [1,2,3,4], 
                    'B': [10,20,30,40],
                    'C': [20,40,60,80]
                  }, 
                  index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])

df

Unnamed: 0,A,B,C
Row 1,1,10,20
Row 2,2,20,40
Row 3,3,30,60
Row 4,4,40,80


In [None]:
# Let's call our sum function as "custom_sum" as "sum" is a built-in function
def custom_sum(row):
    return row.sum()

df['D'] = df.apply(custom_sum, axis=1)

df

## Remember :  axis=1 => row wise,  axis=0 is default => column wise

Unnamed: 0,A,B,C,D
Row 1,1,10,20,31
Row 2,2,20,40,62
Row 3,3,30,60,93
Row 4,4,40,80,124


In [None]:
## To sum each column instead of each row, use axis=0 and store the result as a new row

df.loc['Row 5'] = df.apply(custom_sum, axis=0)

df

Unnamed: 0,A,B,C,D
Row 1,1,10,20,31
Row 2,2,20,40,62
Row 3,3,30,60,93
Row 4,4,40,80,124
Row 5,10,100,200,310
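
For plain sums, pandas already ships a built-in reduction, so `apply()` is not strictly needed. The sketch below rebuilds the dummy DataFrame and compares `apply(custom_sum, ...)` with `DataFrame.sum()` on both axes. It is only an illustration of the equivalence; the variable names are assumptions made for this example.

```python
import pandas as pd

# Same dummy data as above, rebuilt so this example is self-contained
df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [10, 20, 30, 40],
                   'C': [20, 40, 60, 80]},
                  index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])

def custom_sum(values):
    # Receives one row (axis=1) or one column (axis=0) as a Series
    return values.sum()

# Row-wise: one result per row
row_sums_apply = df.apply(custom_sum, axis=1)
row_sums_builtin = df.sum(axis=1)

# Column-wise: one result per column
col_sums_apply = df.apply(custom_sum, axis=0)
col_sums_builtin = df.sum(axis=0)

print(row_sums_apply.equals(row_sums_builtin))
print(col_sums_apply.equals(col_sums_builtin))
```

`apply()` earns its place when the per-row or per-column logic has no built-in equivalent.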


**Use lambda with apply**

You can also use **lambda expression** with Pandas apply() function.

> The lambda equivalent for the sum of each row of a DataFrame:
<pre>
df['D'] = df.apply(lambda x:x.sum(), axis=1)
</pre>

> The lambda equivalent for the sum of each column of a DataFrame:
<pre>
df.loc['Row 5'] = df.apply(lambda x:x.sum(), axis=0)
</pre>

> And the lambda equivalent for multiplying each value of a single column (a **Series**) by 2:
<pre>
df['D'] = df['C'].apply(lambda x:x*2)
</pre>

In [None]:
## Run this example : column D is overwritten with column C multiplied by 2
df['D'] = df['C'].apply(lambda x:x*2)
df

Unnamed: 0,A,B,C,D
Row 1,1,10,20,40
Row 2,2,20,40,80
Row 3,3,30,60,120
Row 4,4,40,80,160
Row 5,10,100,200,400


In [None]:
## Self-test time - 2 mins - one of the students explains : what is this code doing ??
## there are many ways to apply if-conditions in a pandas DataFrame
## https://datatofish.com/if-condition-in-pandas-dataframe/

## First scan this simple example :
import pandas as pd

numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10]}
df = pd.DataFrame(numbers,columns=['set_of_numbers'])

print (df)
print ("----------------------")

df['equal_or_lower_than_4?'] = df['set_of_numbers'].apply(lambda x: 'True' if x <= 4 else 'False')

print (df)

   set_of_numbers
0               1
1               2
2               3
3               4
4               5
5               6
6               7
7               8
8               9
9              10
----------------------
   set_of_numbers equal_or_lower_than_4?
0               1                   True
1               2                   True
2               3                   True
3               4                   True
4               5                  False
5               6                  False
6               7                  False
7               8                  False
8               9                  False
9              10                  False
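
Note that the lambda above stores the strings `'True'` and `'False'`, not booleans. If real booleans are wanted, a comparison on the whole column is one option, and `numpy.where` can turn the condition into labels. This is a sketch only; the column names `is_le_4` and `label` and the labels `'low'` and `'high'` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Vectorised comparison: produces a boolean column in one step
df['is_le_4'] = df['set_of_numbers'] <= 4

# np.where picks the first value where the condition is True, else the second
df['label'] = np.where(df['is_le_4'], 'low', 'high')

print(df.dtypes)
print(df)
```

A boolean column can be used directly for filtering, for example `df[df['is_le_4']]`.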


Load your **WINE REVIEW DATASET** :

In [None]:
## load the wine_reviews dataset
from google.colab import files
files.upload()

In [None]:
import pandas as pd
wine_reviews = pd.read_csv("wine reviews_small.csv", index_col=0)
wine_reviews.head(2)

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos


<font color='red'> Create a rating column in the **wine_reviews** dataframe, based on the following logic : (solve it 4 mins) </font>

> \>70 points and <81 **=> 3 stars** 

> \>80 and <91 **=> 4 stars** 

> \>90 **=> 5 stars**

In [None]:
## let's assign ratings, with the logic as below :
# >70 points and <81 => 3 stars , >80 and <91 => 4 stars, >90 => 5 stars, In all other cases give it 0 rating
## Chained comparisons are used here because `x > 80 & x < 91` is parsed as `x > (80 & x) < 91`

wine_reviews['rating'] = wine_reviews['points'].apply(lambda x: '5' if x > 90 else
                                                      ('4' if 80 < x < 91 else
                                                      ('3' if 70 < x < 81 else '0')))


## How to verify that rating is applied properly ?? 
## lets print the value_counts of rating column
print(wine_reviews['rating'].value_counts())

print("------------------------------")
print("statistical details of points feature")
print(wine_reviews['points'].describe())

4    22126
5     7469
3       70
Name: rating, dtype: int64
------------------------------
statistical details of points feature
count    29665.000000
mean        88.385539
std          3.010396
min         80.000000
25%         86.000000
50%         88.000000
75%         91.000000
max        100.000000
Name: points, dtype: float64
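
Binning a numeric column into ranges can also be done with `pd.cut`. The sketch below uses a small made-up `points` Series (not the wine dataset) and checks that `pd.cut` agrees with a named function implementing the same rating rules. It assumes points are whole numbers, since `pd.cut` intervals are closed on the right by default, so `(70, 80]` matches ">70 and <81" only for integers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of points, chosen to hit every boundary
points = pd.Series([65, 75, 80, 81, 90, 91, 100])

def to_rating(x):
    # Same rules as above, written with chained comparisons
    if x > 90:
        return '5'
    if 80 < x < 91:
        return '4'
    if 70 < x < 81:
        return '3'
    return '0'

rating_apply = points.apply(to_rating)

# Bins are (-inf, 70], (70, 80], (80, 90], (90, inf)
rating_cut = pd.cut(points,
                    bins=[-np.inf, 70, 80, 90, np.inf],
                    labels=['0', '3', '4', '5']).astype(str)

print(pd.DataFrame({'points': points,
                    'apply': rating_apply,
                    'cut': rating_cut}))
```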


**With result_type parameter**

result_type is a parameter of apply() that can be set to **`'expand'`**, **`'reduce'`**, or **`'broadcast'`** to control the shape of the result. It only takes effect when `axis=1`.

In the above scenario, if **`result_type`** is set to 'broadcast', the output is a DataFrame of the original shape in which every value in a row is replaced by that row's **`custom_sum`** value.

In [None]:
import pandas as pd
df = pd.DataFrame({ 'A': [1,2,3,4], 
                    'B': [10,20,30,40],
                    'C': [20,40,60,80]
                  }, 
                  index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])

dfCopy1 = df.apply(custom_sum, axis=1, result_type='broadcast')
dfCopy1

Unnamed: 0,A,B,C
Row 1,31,31,31
Row 2,62,62,62
Row 3,93,93,93
Row 4,124,124,124


<font color='darkgreen'><b>The result is broadcast to the original shape of the frame; the original index and columns are retained.</b></font>

To understand result_type as **`'expand'`** and **`'reduce'`**, we will first create a function that returns a list.

In [None]:
def cal_multi_col(row):
    return [row['A'] * 2, row['B'] * 3]


## Now apply this function to each row of the DataFrame with result_type as 'expand'
dfExpand = df.apply(cal_multi_col, axis=1, result_type='expand')
dfExpand

Unnamed: 0,0,1
Row 1,2,30
Row 2,4,60
Row 3,6,90
Row 4,8,120


In [None]:
## In order to append this to the existing DataFrame, 
## the result has to be kept in a variable (dfExpand) so the column names 
## can be accessed by dfExpand.columns.

dfCopy2 = df = pd.DataFrame({ 'A': [1,2,3,4], 
                    'B': [10,20,30,40],
                    'C': [20,40,60,80]
                  }, 
                  index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])

dfExpand = dfCopy2.apply(cal_multi_col, axis=1, result_type='expand')
dfCopy2[dfExpand.columns] = dfExpand

dfCopy2

Unnamed: 0,A,B,C,0,1
Row 1,1,10,20,2,30
Row 2,2,20,40,4,60
Row 3,3,30,60,6,90
Row 4,4,40,80,8,120
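
The integer column names `0` and `1` are not very descriptive. One possible alternative is to return a labelled `pd.Series` from the function; `apply()` with `axis=1` then builds a DataFrame whose columns take those labels. This is a sketch, and the names `cal_multi_col_named`, `A_x2` and `B_x3` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [10, 20, 30, 40],
                   'C': [20, 40, 60, 80]},
                  index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])

def cal_multi_col_named(row):
    # Returning a Series: its index labels become the new column names
    return pd.Series({'A_x2': row['A'] * 2, 'B_x3': row['B'] * 3})

named = df.apply(cal_multi_col_named, axis=1)

# join() aligns on the row index and leaves the original df untouched
df_with_new = df.join(named)
print(df_with_new)
```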


In [None]:
## Next, apply the function to each row of the DataFrame with result_type as 'reduce' . 
## result_type='reduce' is the opposite of 'expand' and returns a Series 
## if possible rather than expanding list-like results.

dfCopy3 = df = pd.DataFrame({ 'A': [1,2,3,4], 
                    'B': [10,20,30,40],
                    'C': [20,40,60,80]
                  }, 
                  index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])

dfCopy3['New'] = dfCopy3.apply(cal_multi_col, axis=1, result_type='reduce')

dfCopy3

Unnamed: 0,A,B,C,New
Row 1,1,10,20,"[2, 30]"
Row 2,2,20,40,"[4, 60]"
Row 3,3,30,60,"[6, 90]"
Row 4,4,40,80,"[8, 120]"


**How to use applymap()?**

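applymap() is available only on DataFrames and applies a function element-wise across the whole DataFrame. In some cases it runs much faster than apply(). Note that since pandas 2.1, `DataFrame.applymap()` is deprecated in favour of `DataFrame.map()`, which does the same job.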

For example, to output a DataFrame **`with each number squared`**:

In [None]:
import numpy as np

dfCopy4 = df = pd.DataFrame({ 'A': [1,2,3,4], 
                    'B': [10,20,30,40],
                    'C': [20,40,60,80]
                  }, 
                  index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])

dfCopy4 = dfCopy4.applymap(np.square)

dfCopy4

Unnamed: 0,A,B,C
Row 1,1,100,400
Row 2,4,400,1600
Row 3,9,900,3600
Row 4,16,1600,6400
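
Because `DataFrame.applymap()` is deprecated from pandas 2.1 onwards in favour of `DataFrame.map()`, the sketch below picks whichever method the installed pandas provides. The `elementwise` name is just a helper for this example. It also shows that simple arithmetic such as squaring needs neither method.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': [10, 20, 30, 40],
                   'C': [20, 40, 60, 80]},
                  index=['Row 1', 'Row 2', 'Row 3', 'Row 4'])

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap
elementwise = getattr(df, 'map', None) or df.applymap

# The function is called once per cell
squared = elementwise(lambda value: value ** 2)

# Plain arithmetic operates on the whole DataFrame at once
squared_vectorized = df ** 2

print(squared)
print((squared.to_numpy() == squared_vectorized.to_numpy()).all())
```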


**How to use map()?**

**map()** on a Series substitutes each value with another value, using a function, a dict, or another Series as the mapping. To understand how **map() works**, we first look at Python's built-in `map()` applied to a plain list.

In [None]:
my_laptops = ['lenovo', 'thinkpad', 'ultrabook', 'asus']

## suppose I want to convert all strings to uppercase
uppered_cases = list(map(str.upper, my_laptops))

print(uppered_cases)

['LENOVO', 'THINKPAD', 'ULTRABOOK', 'ASUS']


In [None]:
## what would this code do ??

circle_areas = [3.16773, 5.57668, 4.00914, 56.24241, 9.01344, 32.00013]

result = list(map(round, circle_areas))

print(result)

## If you would like to set the number of decimal places you could do it like this :
## new_list = list(map(lambda x: round(x,precision),old_list))

[3, 6, 4, 56, 9, 32]
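
The two cells above use Python's built-in `map()` on lists. The pandas counterpart is `Series.map()`, sketched below with the same laptop names. It accepts a function, and also a dict, in which case values with no matching key become `NaN`. The `brand_of` lookup table is invented for this illustration.

```python
import pandas as pd

laptops = pd.Series(['lenovo', 'thinkpad', 'ultrabook', 'asus'])

# 1) map() with a function: called once per value
uppered = laptops.map(str.upper)

# 2) map() with a dict: each value is looked up as a key
brand_of = {'lenovo': 'Lenovo', 'thinkpad': 'Lenovo', 'asus': 'ASUS'}  # hypothetical lookup
brands = laptops.map(brand_of)

print(uppered.tolist())
print(brands)  # 'ultrabook' has no key in brand_of, so it maps to NaN
```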


<h3> <font color='red'> <b> Question Time </b> </font>(Solve 4 Qns in 20 mins) </h3>
<hr>

**1. Create variable centered_price containing a version of the price column with the mean price subtracted.**

(Note: *this 'centering' transformation is a common preprocessing step before applying various machine learning algorithms.*)

In [None]:
## find the mean price 
review_price_mean = wine_reviews.price.mean()

## We are centering the price, that is subtracting the mean from each price value
## so that the centered prices have a mean of (approximately) zero. 
## out of curiosity, you can check the min and max values of the price column
minPrice = wine_reviews.price.min()
maxPrice = wine_reviews.price.max()

print("Minimum price before normalising ", minPrice)
print("Maximum price before normalising ", maxPrice)
print("--------------------------------------")

## now finding centered price
## your code here
centered_price = wine_reviews.price.map(lambda x:x - review_price_mean)

#wine_reviews.price[0]
print(centered_price)
print("--------------------------------------")

## Checking the min & max after centering. 
print("Minimum price after normalising ", centered_price.min())
print("Maximum price after normalising ", centered_price.max())
## The range (max - min) is unchanged : centering only shifts the values, it does not rescale them

Minimum price before normalising  4.0
Maximum price before normalising  2500.0
--------------------------------------
0              NaN
1       -19.990123
2       -20.990123
3       -21.990123
4        30.009877
           ...    
29996   -15.990123
29997    19.009877
29998   -11.990123
29999    40.009877
30000     5.009877
Name: price, Length: 29665, dtype: float64
--------------------------------------
Minimum price after normalising  -30.990123456790123
Maximum price after normalising  2465.00987654321
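
Centering can also be written as plain arithmetic on the column, without `map()`. The sketch below uses a tiny made-up `price` Series (not the wine dataset) to show that both forms agree, that `NaN` prices stay `NaN`, and that the spread `max - min` is the same before and after.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, including one missing value
price = pd.Series([4.0, 15.0, np.nan, 65.0, 2500.0], name='price')

mean_price = price.mean()          # NaN is skipped by default

centered_map = price.map(lambda p: p - mean_price)
centered_vec = price - mean_price  # same result, computed in one step

spread_before = price.max() - price.min()
spread_after = centered_vec.max() - centered_vec.min()

print(centered_vec)
print(spread_before, spread_after)
```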


<hr>

**2. I'm an `economical` wine buyer. Which wine is the "best bargain"? Create a variable `bargain_wine` with the title of the wine with the highest `points-to-price` ratio in the dataset.**

In [None]:
## Learn a new function for solving this problem 
## max() returns the max value 
## idxmax() returns the index of the max value.
## https://www.geeksforgeeks.org/python-pandas-dataframe-idxmax/

## your code here
bargain_idx = (wine_reviews['points'] / wine_reviews['price']).idxmax()
bargain_idx

## once we get the index of the row with the highest points-to-price ratio,
## we can look up the 'title' of that wine. It is the best bargain, not necessarily the best wine.  

## your code here
bargain_wine = wine_reviews.loc[bargain_idx,'title']
bargain_wine

'Felix Solis 2013 Flirty Bird Syrah (Vino de la Tierra de Castilla)'
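
A small sketch of how `idxmax()` behaves, on an invented four-row DataFrame rather than the wine dataset. It shows that rows with a missing price produce a `NaN` ratio and are skipped, and that `idxmax()` returns only one label even when several rows share the maximum, so a boolean filter is one way to list all of them.

```python
import numpy as np
import pandas as pd

# Hypothetical wines; 'Wine A' has no price
wines = pd.DataFrame({'title': ['Wine A', 'Wine B', 'Wine C', 'Wine D'],
                      'points': [87, 90, 86, 95],
                      'price': [np.nan, 30.0, 4.0, 100.0]})

ratio = wines['points'] / wines['price']   # NaN where price is missing

best_idx = ratio.idxmax()                  # index label of the largest ratio
best_title = wines.loc[best_idx, 'title']

# All rows that reach the maximum ratio (handles ties)
all_best = wines.loc[ratio == ratio.max(), 'title']

print(best_idx, best_title)
print(all_best.tolist())
```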

<hr>

**3. There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be `"tropical"` or `"fruity"`? Create a Series `descriptor_counts` counting how many times each of these two words appears in the description column in the dataset.**

In [None]:
## we are applying map() over the column "description" to get True/False per review.  
## sum() then counts the True values, i.e. the number of descriptions containing the word
count_tropical = wine_reviews.description.map(lambda d: "tropical" in d).sum()

## repeating above step for finding count_of_fruity
count_fruity = wine_reviews.description.map(lambda d: "fruity" in d).sum()

descriptor_counts = pd.Series([count_tropical, count_fruity], index=['tropical', 'fruity'])

descriptor_counts




tropical     837
fruity      2137
dtype: int64
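
The same count can be sketched with the string accessor `Series.str.contains`, which also copes with missing descriptions through `na=False` (the `in` operator inside a lambda would raise on a missing value). The descriptions below are invented for the example. Like the approach above, this counts descriptions that contain the word, not the total number of occurrences.

```python
import pandas as pd

# Hypothetical descriptions, including one missing entry
descriptions = pd.Series(['Aromas include tropical fruit and broom',
                          'This is ripe and fruity',
                          'Smooth, with tropical and fruity notes',
                          'Light, fruity and fresh',
                          None])

def count_containing(word):
    # regex=False: treat the word as a literal substring (case-sensitive)
    # na=False: missing descriptions count as "does not contain"
    return int(descriptions.str.contains(word, regex=False, na=False).sum())

descriptor_counts = pd.Series({word: count_containing(word)
                               for word in ['tropical', 'fruity']})
print(descriptor_counts)
```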

<hr>

4. We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand - we'd like to translate them into simple star ratings. **A score of `95` or higher counts as `3 stars`, a score of at least `85` but less than `95` is `2 stars`. Any other score is `1 star`.**

Also, the **Canadian Vintners** Association bought a lot of ads on the site, so any wines from Canada should `automatically` get 3 stars, regardless of points.

Create a series **`star_ratings`** with the number of stars corresponding to each review in the dataset. 

In [None]:
## let's define a function
## each time, apply() sends one row to the caln_stars function. 
## By using if-elif-else we check the row and return its number of stars
def caln_stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1
star_ratings = wine_reviews.apply(caln_stars, axis=1)
## Remember :  axis=1 => row wise,  axis=0 is default => column wise

## to check how many reviews fall under each star rating, we use the value_counts() function
star_ratings.value_counts()
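
Row-wise `apply()` calls the Python function once per row, which can be slow on large DataFrames. A possible vectorised alternative is `numpy.select`, which evaluates a list of conditions in order and takes the first match, just like the if-elif-else chain. The sketch below uses an invented five-row `reviews` DataFrame and checks that both approaches give the same stars.

```python
import numpy as np
import pandas as pd

# Hypothetical reviews covering every branch of the rules
reviews = pd.DataFrame({'country': ['Canada', 'Italy', 'US', 'France', 'Canada'],
                        'points': [82, 96, 88, 80, 95]})

def caln_stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

stars_apply = reviews.apply(caln_stars, axis=1)

# Conditions are checked in order; the first True one wins, else the default
is_three = ((reviews['country'] == 'Canada') | (reviews['points'] >= 95)).to_numpy()
is_two = (reviews['points'] >= 85).to_numpy()
stars_vectorized = pd.Series(np.select([is_three, is_two], [3, 2], default=1),
                             index=reviews.index)

print(stars_apply.tolist())
print(stars_vectorized.tolist())
```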

<hr>

<small>The study content is prepared by Rocky Jagtiani ( https://linkedin.com/in/rocky-jagtiani-3b390649/) - <b>rocky@suvenconsultants.com</b> </small>

<img src ="https://drive.google.com/uc?id=1-y7gMwSV7--Bu6Y-piHkdhIrjEkeZFQW"  width = '150px' />



<small><b>Copying this material is prohibited and needs prior permission from the Author & the Management of https://www.suvenconsultants.com/  </b></small>

<hr>

Thank you for going through the Notebook. I am sure it was a fruitful learning experience. You too can earn your **`"Masters in Data Science"`** certification, followed by Internships and Placement calls. Do look at https://datascience.suvenconsultants.com for Online live classroom training programmes from <u>Rocky Sir & his team of data scientists </u>.

![CertificationPic_In_the_NB](https://drive.google.com/uc?id=1lLSFd1O5hrIRBIWztjBiO6RcyPwyDmyF)

