<div class="alert alert-block alert-success">
    <h1 align="center">Pandas Trick 29</h1>
    <h3 align="center">Map, Apply, ApplyMap, and Zip in Pandas</h3>
    <h4 align="center"><a href="http://www.iran-machinelearning.ir">Soheil Tehranipour</a></h4>
</div>

## Map, Apply, ApplyMap and Zip Functions

* In this tutorial, we look at what ```map()```, ```apply()```, and ```applymap()``` do and how to use them.
* We compare the three functions and see when each one is the right choice.
* We then go through them one by one and finish with ```zip()```.

&nbsp;

* **Map Function**

    * It applies a function to each element of a Series.
    * In the pandas version used here (1.5.1), ```map``` is defined only for Series; calling it on a DataFrame raises an error. (pandas 2.1 and later add a ```DataFrame.map``` method.)

&nbsp;

* **Apply Function**
     
     * On a Series, it works much like ```map```.
     * Unlike ```map```, ```apply``` can be used with both Series and DataFrames. On a DataFrame it passes each column (or each row, with ```axis=1```) to the function.

&nbsp;
     
* **Applymap Function**

    * ```applymap``` applies a function to every individual element of a DataFrame, the way ```map``` does for a Series.
    * Use it when a function should be applied to the whole dataset rather than to selected columns.
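
The three descriptions above can be compared side by side on a small table. The following is a minimal sketch on toy data invented for illustration (the names `df` and `divide_by_ten` are not part of the tutorial's dataset). Because newer pandas releases rename the element-wise DataFrame method, the sketch picks whichever one the installed version provides.

```python
import pandas as pd

# Toy data, invented for illustration (not the tutorial's dataset).
df = pd.DataFrame({"a": [10, 20, 30], "b": [40, 50, 60]})


def divide_by_ten(x):
    return x / 10


# Series.map: the function is called once per element of a single column.
mapped = df["a"].map(divide_by_ten)

# DataFrame.apply: the function is called once per column,
# and each column is passed in as a whole Series.
applied = df.apply(divide_by_ten)

# Element-wise over every cell of the DataFrame.
# pandas 1.x calls this DataFrame.applymap; pandas 2.1+ names it DataFrame.map.
elementwise_method = df.map if hasattr(df, "map") else df.applymap
elementwise = elementwise_method(divide_by_ten)

print(mapped.tolist())
print(applied)
print(elementwise)
```

For a simple arithmetic function like this one, all three routes give the same numbers; they differ in what the function receives (a single value or a whole column).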
    

### Importing the libraries

In [2]:
import numpy as np
import pandas as pd

In [4]:
pd.__version__

'1.5.1'

## Import Dataset & Make Dataframe

In [5]:
# let's load the dataset we need
data = pd.read_csv('employee.csv')

# let's also check the shape of the dataset
data.shape

(1470, 35)

In [6]:
data[(data['Department'] == 'Sales') & (data['DailyRate'] > 1000)]

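| | Age | Attrition | BusinessTravel | DailyRate | Department | DistanceFromHome | Education | EducationField | EmployeeCount | EmployeeNumber | ... | RelationshipSatisfaction | StandardHours | StockOptionLevel | TotalWorkingYears | TrainingTimesLastYear | WorkLifeBalance | YearsAtCompany | YearsInCurrentRole | YearsSinceLastPromotion | YearsWithCurrManager |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 41 | Yes | Travel_Rarely | 1102 | Sales | 1 | 2 | Life Sciences | 1 | 1 | ... | 1 | 80 | 0 | 8 | 0 | 1 | 6 | 4 | 0 | 5 |
| 18 | 53 | No | Travel_Rarely | 1219 | Sales | 2 | 4 | Life Sciences | 1 | 23 | ... | 3 | 80 | 0 | 31 | 3 | 3 | 25 | 8 | 3 | 7 |
| 21 | 36 | Yes | Travel_Rarely | 1218 | Sales | 9 | 4 | Life Sciences | 1 | 27 | ... | 2 | 80 | 0 | 10 | 4 | 3 | 5 | 3 | 0 | 3 |
| 39 | 33 | No | Travel_Frequently | 1141 | Sales | 1 | 3 | Life Sciences | 1 | 52 | ... | 1 | 80 | 2 | 10 | 3 | 3 | 5 | 3 | 1 | 3 |
| 46 | 34 | No | Non-Travel | 1065 | Sales | 23 | 4 | Marketing | 1 | 60 | ... | 3 | 80 | 0 | 10 | 2 | 3 | 9 | 5 | 8 | 7 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1417 | 31 | No | Travel_Rarely | 1154 | Sales | 2 | 2 | Life Sciences | 1 | 1996 | ... | 3 | 80 | 1 | 3 | 1 | 3 | 2 | 2 | 1 | 2 |
| 1433 | 25 | No | Travel_Rarely | 1382 | Sales | 8 | 2 | Other | 1 | 2018 | ... | 2 | 80 | 1 | 6 | 3 | 2 | 5 | 3 | 0 | 4 |
| 1453 | 36 | No | Travel_Rarely | 1120 | Sales | 11 | 4 | Marketing | 1 | 2045 | ... | 1 | 80 | 1 | 8 | 2 | 2 | 6 | 3 | 0 | 0 |
| 1464 | 26 | No | Travel_Rarely | 1167 | Sales | 5 | 3 | Other | 1 | 2060 | ... | 4 | 80 | 0 | 5 | 2 | 3 | 4 | 2 | 0 | 0 |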


In [8]:
data['DailyRate']

0       1102
1        279
2       1373
3       1392
4        591
        ... 
1465     884
1466     613
1467     155
1468    1023
1469     628
Name: DailyRate, Length: 1470, dtype: int64

In [16]:
# let's apply a function to one column of the dataset to see how map works

# define a function that divides a value by 10
def function(x):
    return x/10

data['DailyRate'] = data['DailyRate'].map(function)
data['DailyRate']

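KeyError: 'DailyRate'

* This ```KeyError``` most likely comes from re-running the cell after ```data``` was reassigned to a different dataset later in the notebook, one without a ```DailyRate``` column. Run in order against ```employee.csv```, the cell returns the ```DailyRate``` column with every value divided by 10.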
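
Besides a function, `Series.map` also accepts a dictionary, which is handy for recoding categories. The sketch below uses a few hypothetical values shaped like the dataset's `Attrition` column; the variable names are invented for illustration.

```python
import pandas as pd

# Hypothetical values shaped like the Attrition column.
attrition = pd.Series(["Yes", "No", "No", "Yes"], name="Attrition")

# A dict passed to map acts as a lookup table;
# any value that is missing from the dict would become NaN.
attrition_flag = attrition.map({"Yes": 1, "No": 0})

print(attrition_flag.tolist())  # [1, 0, 0, 1]
```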

In [10]:
# let's try to map this function over two columns at once: DailyRate and MonthlyRate

data[['DailyRate', 'MonthlyRate']].map(function)


AttributeError: 'DataFrame' object has no attribute 'map'

* The code above raises an error because, as discussed earlier, in this pandas version (1.5.1) ```map``` is available only on Series, not on DataFrames.
* To apply a function to a DataFrame, use ```apply``` instead.
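
To make the difference concrete: `apply` on a DataFrame hands the function a whole column by default, or a whole row with `axis=1`. The following sketch assumes a tiny table with three rows shaped like the tutorial's two rate columns; `rates`, `column_ranges`, and `row_ratio` are names made up for this example.

```python
import pandas as pd

# Three illustrative rows shaped like the DailyRate and MonthlyRate columns.
rates = pd.DataFrame(
    {"DailyRate": [1102, 279, 1373], "MonthlyRate": [19479, 24907, 2396]}
)

# Default axis=0: the function receives each column as a Series
# and here returns one number per column.
column_ranges = rates.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series
# and here returns one number per row.
row_ratio = rates.apply(lambda row: row["MonthlyRate"] / row["DailyRate"], axis=1)

print(column_ranges)
print(row_ratio)
```

A function such as `x/10` works with either axis because dividing a Series by a number is already element-wise.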

## Apply Method

In [11]:
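# DailyRate was already divided by 10 in the map cell above, so it is divided by 10 a second time here
data[['DailyRate','MonthlyRate']].apply(function)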

| | DailyRate | MonthlyRate |
|---|---|---|
| 0 | 11.02 | 1947.9 |
| 1 | 2.79 | 2490.7 |
| 2 | 13.73 | 239.6 |
| 3 | 13.92 | 2315.9 |
| 4 | 5.91 | 1663.2 |
| ... | ... | ... |
| 1465 | 8.84 | 1229.0 |
| 1466 | 6.13 | 2145.7 |
| 1467 | 1.55 | 517.4 |
| 1468 | 10.23 | 1324.3 |


## ApplyMap

In [17]:
# As discussed earlier, applymap applies a function to every element of a DataFrame

# It is used when the function should be applied to the whole dataset
import pandas as pd
# let's define a small dataset to try it on

dataframe  = pd.DataFrame([[1, 5, 5, 6],[7, 8, 9, 4], [8, 5, 4, 1]])

dataframe.applymap(function)

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 0.1 | 0.5 | 0.5 | 0.6 |
| 1 | 0.7 | 0.8 | 0.9 | 0.4 |
| 2 | 0.8 | 0.5 | 0.4 | 0.1 |
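
For plain arithmetic like dividing by 10, an element-wise call is not strictly needed, because pandas arithmetic on a DataFrame is already vectorized. The sketch below rebuilds the same small table and checks that both routes agree. It chooses between `applymap` and the newer `DataFrame.map` depending on the installed pandas version, which is an assumption added here and not part of the original notebook.

```python
import pandas as pd

dataframe = pd.DataFrame([[1, 5, 5, 6], [7, 8, 9, 4], [8, 5, 4, 1]])


def function(x):
    return x / 10


# applymap on pandas 1.x; DataFrame.map on pandas 2.1+, where applymap is deprecated.
elementwise = dataframe.map if hasattr(dataframe, "map") else dataframe.applymap
by_function = elementwise(function)

# Plain arithmetic on the whole DataFrame is vectorized.
by_arithmetic = dataframe / 10

# Largest absolute difference between the two results.
max_difference = (by_function - by_arithmetic).abs().max().max()
print(max_difference)
```

Element-wise application earns its keep when the function is not expressible as vectorized arithmetic, for example custom formatting of each cell.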


## ZIP In Pandas

One way to create a pandas DataFrame is with Python's built-in zip() function. zip() takes two or more lists and returns an iterator that yields one tuple at a time, pairing up the elements found at the same position in each list. Converting that iterator to a list gives a list of tuples, which can be passed directly to pd.DataFrame. Alternatively, the pairs can be turned into a dictionary and the DataFrame built from that. Suppose you have two lists describing your family: the first holds the names of the family members and the second holds their ages.

In [19]:
Name = ['Soheil', 'Farhad', 'Mahtab', 'Baran']
Age = [33, 38, 3, 1]

In [20]:
zipped = zip(Name, Age)
zipped

<zip at 0x204e33f9940>

In [21]:
type(zipped)

zip

In [22]:
# zip() has paired the two lists; list() collects the pairs into a list of tuples
my_list = list(zipped)
my_list

[('Soheil', 33), ('Farhad', 38), ('Mahtab', 3), ('Baran', 1)]

* If you're working with sequences like lists, tuples, or strings, then your iterables are guaranteed to be evaluated from left to right. This means that the resulting list of tuples will take the form [(Name[0], Age[0]), (Name[1], Age[1]), ..., (Name[n], Age[n])].
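
One consequence of this positional pairing is worth seeing once: when the inputs have different lengths, `zip()` stops at the shortest one without raising an error. A small sketch, with one age deliberately left out (the lowercase names `names` and `ages` are used here only to keep the example separate from the lists above):

```python
names = ["Soheil", "Farhad", "Mahtab", "Baran"]
ages = [33, 38, 3]  # one value short on purpose

# zip() pairs items by position and stops when the shortest input runs out.
pairs = list(zip(names, ages))

print(pairs)  # [('Soheil', 33), ('Farhad', 38), ('Mahtab', 3)]
```

On Python 3.10 or later, `zip(names, ages, strict=True)` raises a `ValueError` instead of silently dropping the unmatched item.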

In [23]:
# Convert the list of tuples into a pandas DataFrame.

df = pd.DataFrame(my_list, columns = ['Name', 'Age'])
df


| | Name | Age |
|---|---|---|
| 0 | Soheil | 33 |
| 1 | Farhad | 38 |
| 2 | Mahtab | 3 |
| 3 | Baran | 1 |
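
The dictionary route mentioned in the introduction to this section can be sketched as follows. Here `zip()` pairs each column label with its list of values, and `dict()` turns those pairs into a mapping that `pd.DataFrame` accepts. The names `columns` and `df_from_dict` are made up for this example.

```python
import pandas as pd

Name = ['Soheil', 'Farhad', 'Mahtab', 'Baran']
Age = [33, 38, 3, 1]

# Pair each column label with its list of values, then build a dict from the pairs.
columns = dict(zip(['Name', 'Age'], [Name, Age]))

# A dict of equal-length lists becomes one DataFrame column per key.
df_from_dict = pd.DataFrame(columns)
print(df_from_dict)
```

Both routes produce the same table; the list-of-tuples form is row-oriented, while the dictionary form is column-oriented.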


## Apply in Action

In [3]:
data = pd.read_csv("NYC_Jobs.csv")
data = data[['Job ID','Civil Service Title','Agency','Posting Type','Job Category','Salary Range From','Salary Range To']]
data.head()

| | Job ID | Civil Service Title | Agency | Posting Type | Job Category | Salary Range From | Salary Range To |
|---|---|---|---|---|---|---|---|
| 0 | 424339 | PUBLIC HEALTH NURSE | DEPT OF HEALTH/MENTAL HYGIENE | External | Health | 84252.0 | 84252.0 |
| 1 | 379094 | CERT IT DEVELOPER (APP) | NYC EMPLOYEES RETIREMENT SYS | External | Technology, Data & Innovation | 82884.0 | 116391.0 |
| 2 | 520417 | EXECUTIVE AGENCY COUNSEL | NYC HOUSING AUTHORITY | External | Legal Affairs | 105000.0 | 125000.0 |
| 3 | 233549 | CERTIFIED IT ADMINISTRATOR (LA | NYC EMPLOYEES RETIREMENT SYS | External | Information Technology & Telecommunications | 87203.0 | 131623.0 |
| 4 | 510256 | ASSOCIATE HUMAN RIGHTS SPECIAL | HUMAN RIGHTS COMMISSION | External | Constituent Services & Community Programs | 58449.0 | 67216.0 |


In [14]:
data.shape

(3773, 8)

In [4]:
# you can apply an existing function, such as NumPy's sqrt, to a column
data['Salary Range From'].apply(np.sqrt)

0       290.261951
1       287.895814
2       324.037035
3       295.301541
4       241.762280
           ...    
3768    292.183162
3769    273.221522
3770    316.227766
3771    205.640463
3772    249.793915
Name: Salary Range From, Length: 3773, dtype: float64
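
For NumPy functions like `np.sqrt`, calling the function on the Series directly is an alternative to `apply`. A brief sketch using the first three salary values from the table above (the variable names are invented for illustration):

```python
import numpy as np
import pandas as pd

# First three "Salary Range From" values shown in the head() output.
salaries = pd.Series([84252.0, 82884.0, 105000.0], name="Salary Range From")

via_apply = salaries.apply(np.sqrt)

# NumPy ufuncs accept a Series and return a Series with the same index.
direct = np.sqrt(salaries)

print(np.allclose(via_apply, direct))
```

The direct call avoids going through `apply` at all, which tends to matter more as the column grows.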

In [11]:
# you can also write a custom function and apply it to a column
def capitalize_position(title):
    # str.capitalize() upper-cases the first character and lower-cases the rest
    return title.capitalize()

In [12]:
# assign the result back to the column so the change is kept
data['Civil Service Title'] = data['Civil Service Title'].apply(capitalize_position)

In [13]:
data['Civil Service Title']

0                  Public health nurse
1              Cert it developer (app)
2             Executive agency counsel
3       Certified it administrator (la
4       Associate human rights special
                     ...              
3768    Certified it administrator (la
3769     Associate housing development
3770               Senior it architect
3771                         Economist
3772           Agency attorney interne
Name: Civil Service Title, Length: 3773, dtype: object
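
For common string operations, pandas also offers vectorized string methods under `.str`, which can stand in for a custom function passed to `apply`. The sketch below is an alternative, not the approach used above. It runs on three sample values, one of them deliberately missing, to show that `.str.capitalize()` leaves missing values as they are, whereas a function that calls string methods on each value would raise an error on a float `NaN`.

```python
import numpy as np
import pandas as pd

# Two titles from the dataset plus a deliberately missing value.
titles = pd.Series(["PUBLIC HEALTH NURSE", "SENIOR IT ARCHITECT", np.nan])

# Vectorized equivalent of applying a capitalize function element by element;
# missing values pass through unchanged.
capitalized = titles.str.capitalize()

print(capitalized.tolist())
```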

<img src="https://webna.ir/wp-content/uploads/2018/08/%D9%85%DA%A9%D8%AA%D8%A8-%D8%AE%D9%88%D9%86%D9%87.png" width=50% />