# D. Methods on Pandas DataFrame

We've looked at the basic data structures of pandas and learned how to process DataFrame objects along an axis.
Pandas provides many easy-to-use, powerful data manipulation operations for DataFrame objects. This section covers what they are and how they make the data processing phase less tedious.

### _Objective_
1. **Pandas operations on DataFrame**: Understanding the basic axis-wise operations you can use on DataFrame objects. 
2. **Non-pandas operations on DataFrame with `.apply()` and `.map()`**: Understanding how `.apply()` and `.map()` let you run non-pandas and user-defined operations on a DataFrame.

In [1]:
import pandas as pd
import numpy as np

# \[1. Basic Pandas Operations on DataFrame\]

Pandas supports axis-wise operations for DataFrame objects.

#### Example Data) Report cards

|Class   | Last Name | First Name| History | English | Math | Social Studies | Science |
|----| --- | ----   | --- |---| --- | --- | --- |
|1 | Smith | John |80 |92 |70 | 65 | 92 |
|1 | Schafer | Elise |91 |75 |90 | 68 | 85 | 
|2 | Zimmermann | Kate |86 |76 |42 | 72 | 88 |
|2 | Mendoza | James |77 |92 |52 | 60 | 80 |
|3 | Park | Jay |75 |85 |85 | 92 | 95 |
|3 | Delcourt | Emma |96 |90 |95 | 81 | 72 |
|4 | Thompson | Sarah |91 |81 |92 | 81 | 73 |

In [2]:
columns = ["class","l_name", "f_name", "history", "english", "math", "social_studies", "science"]
scores = [["1", "Smith", "John", 80, 92, 70, 65, 92],
          ["1", "Schafer", "Elise", 91, 75, 90, 68, 85],
          ["2", "Zimmermann", "Kate", 86, 76, 42, 72, 88],
          ["2", "Mendoza", "James", 77, 92, 52, 60, 80],
          ["3", "Park", "Jay", 75, 85, 85, 92, 95],
          ["3", "Delcourt", "Emma", 96, 90, 95, 81, 72],
          ["4", "Thompson", "Sarah", 91, 81, 92, 81, 73]]
df = pd.DataFrame(scores,columns=columns)
df

Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science
0,1,Smith,John,80,92,70,65,92
1,1,Schafer,Elise,91,75,90,68,85
2,2,Zimmermann,Kate,86,76,42,72,88
3,2,Mendoza,James,77,92,52,60,80
4,3,Park,Jay,75,85,85,92,95
5,3,Delcourt,Emma,96,90,95,81,72
6,4,Thompson,Sarah,91,81,92,81,73


## 1. Row-wise Operations (axis = 0)

A row-wise operation on a DataFrame runs across the rows of selected columns.

### (1) Descriptive statistics over rows

Pandas provides `.sum()`, `.mean()`, `.std()`, `.max()`, and `.min()` for axis-wise descriptive statistics. By default, the `axis` keyword is `0`, so each operation runs downwards across the rows of the selected column(s) and returns one value per column.

In [3]:
# Total scores achieved in history
df.history.sum() # returning the total after adding up all values of the `history` column.

596

In [4]:
# Average score in history
df.history.mean() # the average score in `history`  

85.14285714285714

In [5]:
# Highest score in History
df.history.max()

96

In [6]:
# Lowest score in History
df.history.min()

75

In [7]:
# Standard deviation of scores in history
df.history.std()

7.988086367179802
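
The same methods also work on several columns at once. The following is a minimal sketch, assuming a small frame named `scores` (a name introduced here for illustration) that holds only the `history` and `english` columns of the report cards. It shows that the default `axis=0` yields one value per column.

```python
import pandas as pd

# First two score columns of the report-card data above
scores = pd.DataFrame({
    "history": [80, 91, 86, 77, 75, 96, 91],
    "english": [92, 75, 76, 92, 85, 90, 81],
})

# axis=0 is the default: each statistic is computed down the rows of every column
col_totals = scores.sum()
col_means = scores.mean(axis=0)  # same as scores.mean()

# .agg() gathers several statistics in one table (rows = statistics, columns = subjects)
summary = scores.agg(["sum", "mean", "max", "min"])
print(summary)
```

Each result is indexed by column name, so `col_totals["history"]` gives the same total as `df.history.sum()`.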

### (2) Pandas operations between columns

Pandas supports arithmetic between columns, applied element by element. <br>
Let's calculate the mean of two selected columns: each student's average score across English and math. The result is stored in a new column, `mean_2`.

In [8]:
# Each student's average score in English and math combined 
df["mean_2"] = (df.english + df.math)/2
df

Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science,mean_2
0,1,Smith,John,80,92,70,65,92,81.0
1,1,Schafer,Elise,91,75,90,68,85,82.5
2,2,Zimmermann,Kate,86,76,42,72,88,59.0
3,2,Mendoza,James,77,92,52,60,80,72.0
4,3,Park,Jay,75,85,85,92,95,85.0
5,3,Delcourt,Emma,96,90,95,81,72,92.5
6,4,Thompson,Sarah,91,81,92,81,73,86.5


### (3) Adding new columns to DataFrame
There are several ways to add new columns to an existing DataFrame. First, we can use DataFrame indexing to create a new column and fill it with a single value, using the following syntax.
> df[col_name] = value

This creates a new column `col_name` in the DataFrame `df` and sets every element of the column to `value`.



Let's create a column `pass` with a default value of `True`.

In [9]:
# The value of every element in `pass` is `True`
df["pass"] = True
df

Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science,mean_2,pass
0,1,Smith,John,80,92,70,65,92,81.0,True
1,1,Schafer,Elise,91,75,90,68,85,82.5,True
2,2,Zimmermann,Kate,86,76,42,72,88,59.0,True
3,2,Mendoza,James,77,92,52,60,80,72.0,True
4,3,Park,Jay,75,85,85,92,95,85.0,True
5,3,Delcourt,Emma,96,90,95,81,72,92.5,True
6,4,Thompson,Sarah,91,81,92,81,73,86.5,True


Alternatively, you can assign a list to give each element its own value. The list must have one entry per row.

In [10]:
df["pass_fail"] = [True, True, False, False, True, True, True] # assigning different values to each element of the new column `pass_fail`
df


Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science,mean_2,pass,pass_fail
0,1,Smith,John,80,92,70,65,92,81.0,True,True
1,1,Schafer,Elise,91,75,90,68,85,82.5,True,True
2,2,Zimmermann,Kate,86,76,42,72,88,59.0,True,False
3,2,Mendoza,James,77,92,52,60,80,72.0,True,False
4,3,Park,Jay,75,85,85,92,95,85.0,True,True
5,3,Delcourt,Emma,96,90,95,81,72,92.5,True,True
6,4,Thompson,Sarah,91,81,92,81,73,86.5,True,True
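
A new column can also be derived from existing columns instead of being typed by hand. The sketch below is illustrative only: the frame `df_demo` and the rule "pass when `mean_2` is above 80" are assumptions introduced here, chosen because that rule happens to reproduce the hand-typed `pass_fail` list above.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "english": [92, 75, 76, 92, 85, 90, 81],
    "math": [70, 90, 42, 52, 85, 95, 92],
})

# Column arithmetic is element-wise, so the result has one value per row
df_demo["mean_2"] = (df_demo["english"] + df_demo["math"]) / 2

# Hypothetical rule (not from the text): pass when mean_2 is above 80
df_demo["pass_fail"] = df_demo["mean_2"] > 80
print(df_demo)
```

A comparison on a column returns a boolean Series of the same length, which is why it can be assigned directly as a new column.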


### (4) Deleting columns
There are two ways to remove columns from a DataFrame. The first is the `del` keyword, which deletes a single column in place; the second is `.drop()`, which can remove several columns at once and returns a new DataFrame.

Let's delete the column `mean_2` using **`del`**.

In [11]:
del df['mean_2']
df

Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science,pass,pass_fail
0,1,Smith,John,80,92,70,65,92,True,True
1,1,Schafer,Elise,91,75,90,68,85,True,True
2,2,Zimmermann,Kate,86,76,42,72,88,True,False
3,2,Mendoza,James,77,92,52,60,80,True,False
4,3,Park,Jay,75,85,85,92,95,True,True
5,3,Delcourt,Emma,96,90,95,81,72,True,True
6,4,Thompson,Sarah,91,81,92,81,73,True,True


Let's remove the two columns `pass` and `pass_fail` in one go, using **`.drop()`**.

In [12]:
df = df.drop(columns=["pass","pass_fail"])
df

Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science
0,1,Smith,John,80,92,70,65,92
1,1,Schafer,Elise,91,75,90,68,85
2,2,Zimmermann,Kate,86,76,42,72,88
3,2,Mendoza,James,77,92,52,60,80
4,3,Park,Jay,75,85,85,92,95
5,3,Delcourt,Emma,96,90,95,81,72
6,4,Thompson,Sarah,91,81,92,81,73
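
The two approaches differ in whether the original object changes. Here is a small sketch with a hypothetical three-column frame `demo` (not part of the report-card data):

```python
import pandas as pd

demo = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# del removes the column from `demo` itself
del demo["a"]

# .drop() returns a new DataFrame; `demo` keeps its columns until reassigned
trimmed = demo.drop(columns=["b", "c"])

print(list(demo.columns))     # columns still present in demo
print(list(trimmed.columns))  # columns left in the returned copy
```

This is why the cell above writes `df = df.drop(...)`: without the reassignment, `df` would still contain the dropped columns.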


## 2. Column-wise Operations (axis = 1)


### (1) Descriptive statistics over columns

Descriptive statistics such as `.sum()`, `.mean()`, `.std()`, `.max()`, and `.min()` can also be applied across columns by setting the `axis` keyword to 1 (`axis=1`). Each method then returns one statistic per row.



In [13]:
# Total score of each student
df.iloc[:,3:].sum(axis=1)  # columns from position 3 (history) onward are the numeric scores

0    399
1    409
2    364
3    361
4    432
5    434
6    418
dtype: int64

In [14]:
# Average score of each student
df.iloc[:,3:].mean(axis=1)  # columns from position 3 (history) onward are the numeric scores

0    79.8
1    81.8
2    72.8
3    72.2
4    86.4
5    86.8
6    83.6
dtype: float64

In [15]:
# Highest score from each student
df.iloc[:,3:].max(axis=1)  # columns from position 3 (history) onward are the numeric scores

0    92
1    91
2    88
3    92
4    95
5    96
6    92
dtype: int64

In [16]:
# Lowest score from each student
df.iloc[:,3:].min(axis=1)  # columns from position 3 (history) onward are the numeric scores

0    65
1    68
2    42
3    52
4    75
5    72
6    73
dtype: int64

In [17]:
# Standard deviation of each student's scores
df.iloc[:,3:].std(axis=1)  # columns from position 3 (history) onward are the numeric scores

0    12.377399
1     9.984989
2    18.471600
3    16.068603
4     7.733046
5    10.183320
6     7.924645
dtype: float64
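
Selecting the score columns by name is an alternative to counting positions for `.iloc`. The sketch below assumes a two-row copy of the report cards, called `df_demo`, and a list named `subjects`; both names are introduced here for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame(
    [["1", "Smith", "John", 80, 92, 70, 65, 92],
     ["1", "Schafer", "Elise", 91, 75, 90, 68, 85]],
    columns=["class", "l_name", "f_name", "history", "english",
             "math", "social_studies", "science"],
)

# Naming the score columns avoids counting positions for .iloc
subjects = ["history", "english", "math", "social_studies", "science"]

# axis=1 collapses the columns, leaving one value per row (student)
totals = df_demo[subjects].sum(axis=1)
averages = df_demo[subjects].mean(axis=1)
print(totals.tolist(), averages.tolist())
```

Restricting the selection to numeric columns matters because text columns such as `f_name` cannot be summed together with integer scores.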

### (2) Pandas operations between rows
Just like column-to-column operations, you can perform row-to-row operations on DataFrames.<br>
Let's select two rows and apply a standard arithmetic operation to them.

In [18]:
df

Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science
0,1,Smith,John,80,92,70,65,92
1,1,Schafer,Elise,91,75,90,68,85
2,2,Zimmermann,Kate,86,76,42,72,88
3,2,Mendoza,James,77,92,52,60,80
4,3,Park,Jay,75,85,85,92,95
5,3,Delcourt,Emma,96,90,95,81,72
6,4,Thompson,Sarah,91,81,92,81,73


In [19]:
df.loc[1,"history":"science"] - df.loc[3,"history":"science"] # Difference between two students' scores in each subject, from history through science

history            14
english           -17
math               38
social_studies      8
science             5
dtype: object
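
Row arithmetic works because each selected row is a Series indexed by column name. A minimal sketch, assuming a frame `demo` that holds only the score columns of rows 1 and 3:

```python
import pandas as pd

subjects = ["history", "english", "math", "social_studies", "science"]
demo = pd.DataFrame(
    [[91, 75, 90, 68, 85],
     [77, 92, 52, 60, 80]],
    index=[1, 3],          # same row labels as Elise and James above
    columns=subjects,
)

# Each .loc[...] row is a Series indexed by column name; subtraction aligns on those names
diff = demo.loc[1] - demo.loc[3]
print(diff)
```

The result keeps the column names as its index, one difference per subject.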

### (3) Adding new rows to DataFrame

There are two ways to add a new row to a DataFrame.
1. Assigning a list of values to `.loc[row_name]`, where `row_name` is the label of the new row
2. Passing a dictionary to `.append()`

We've just found out Mina's exam scores had been left out of the student report cards and decided to add her scores to the existing DataFrame.

With the first method, create a list of her details and scores, `["4", "Myeong", "Mina", 95, 83, 85, 87, 0]`, and assign it to the new row `7` as follows.

In [20]:
# A new row labeled 7 (an integer, like the existing row labels 0 to 6)
df.loc[7] = ["4", "Myeong", "Mina", 95, 83, 85, 87, 0]
df

Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science
0,1,Smith,John,80,92,70,65,92
1,1,Schafer,Elise,91,75,90,68,85
2,2,Zimmermann,Kate,86,76,42,72,88
3,2,Mendoza,James,77,92,52,60,80
4,3,Park,Jay,75,85,85,92,95
5,3,Delcourt,Emma,96,90,95,81,72
6,4,Thompson,Sarah,91,81,92,81,73
7,4,Myeong,Mina,95,83,85,87,0


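Or, you can use `.append()`, which takes a dictionary describing the new row and returns a new DataFrame. Note that `DataFrame.append()` was deprecated in pandas 1.4 and removed in pandas 2.0, so the cell below runs only on older versions.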

In [21]:
df.append({
    "class": "4",
    "l_name": "Valenti",
    "f_name": "Michael",
    "history": 95,
    "english": 70,
    "math": 82,
    "social_studies": 91,
    "science": 75
},ignore_index=True)

Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science
0,1,Smith,John,80,92,70,65,92
1,1,Schafer,Elise,91,75,90,68,85
2,2,Zimmermann,Kate,86,76,42,72,88
3,2,Mendoza,James,77,92,52,60,80
4,3,Park,Jay,75,85,85,92,95
5,3,Delcourt,Emma,96,90,95,81,72
6,4,Thompson,Sarah,91,81,92,81,73
7,4,Myeong,Mina,95,83,85,87,0
8,4,Valenti,Michael,95,70,82,91,75
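
On pandas 2.0 and later, where `DataFrame.append()` is no longer available, `pd.concat()` is the usual replacement. The sketch below assumes a one-row stand-in for the report cards, `df_demo`, and a dictionary named `new_row`; both names are introduced here for illustration.

```python
import pandas as pd

df_demo = pd.DataFrame(
    [["4", "Thompson", "Sarah", 91, 81, 92, 81, 73]],
    columns=["class", "l_name", "f_name", "history", "english",
             "math", "social_studies", "science"],
)

new_row = {"class": "4", "l_name": "Valenti", "f_name": "Michael",
           "history": 95, "english": 70, "math": 82,
           "social_studies": 91, "science": 75}

# Wrap the dictionary in a one-row DataFrame, then concatenate;
# ignore_index=True renumbers the rows from 0, as .append(..., ignore_index=True) did
extended = pd.concat([df_demo, pd.DataFrame([new_row])], ignore_index=True)
print(extended)
```

Like `.append()`, `pd.concat()` returns a new DataFrame and leaves `df_demo` unchanged.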


### (4) Deleting rows

Use `.drop()` with the `index` keyword to delete rows from a DataFrame.

#### Deleting a single row

In [22]:
df_del = df.drop(index=0)
df_del

Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science
1,1,Schafer,Elise,91,75,90,68,85
2,2,Zimmermann,Kate,86,76,42,72,88
3,2,Mendoza,James,77,92,52,60,80
4,3,Park,Jay,75,85,85,92,95
5,3,Delcourt,Emma,96,90,95,81,72
6,4,Thompson,Sarah,91,81,92,81,73
7,4,Myeong,Mina,95,83,85,87,0


#### Deleting multiple rows

In [23]:
# Deleting rows 1 and 4
df_del = df.drop(index=[1,4])
df_del

Unnamed: 0,class,l_name,f_name,history,english,math,social_studies,science
0,1,Smith,John,80,92,70,65,92
2,2,Zimmermann,Kate,86,76,42,72,88
3,2,Mendoza,James,77,92,52,60,80
5,3,Delcourt,Emma,96,90,95,81,72
6,4,Thompson,Sarah,91,81,92,81,73
7,4,Myeong,Mina,95,83,85,87,0
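
After rows are dropped, the remaining row labels keep their original values, which leaves gaps in the index. If consecutive labels are wanted, `reset_index(drop=True)` is one option. A small sketch with a hypothetical single-column frame `demo`:

```python
import pandas as pd

demo = pd.DataFrame({"math": [70, 90, 42, 52, 85]})

# Row labels 1 and 4 are removed; the remaining labels keep their original values
kept = demo.drop(index=[1, 4])

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels
renumbered = kept.reset_index(drop=True)
print(list(kept.index), list(renumbered.index))
```

Neither call modifies `demo`; both return new DataFrames.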


# \[2. DataFrame Manipulation\]

+ In addition to pandas functions and methods, you can use user-defined and non-pandas functions on a DataFrame. These functions can be applied either axis-wise or element-wise with `.apply()` or `.applymap()`.

+ User-defined functions can be applied **axis-wise** on a DataFrame with `.apply()`.

+ To apply a user-defined function to every element of a DataFrame, use `.applymap()` (deprecated since pandas 2.1 in favor of `DataFrame.map()`).

+ On a Series, use `.map()` or `.apply()` to apply user-defined functions element-wise.

+ Pandas provides `.transpose()` for swapping rows and columns.

## 1. Apply & Map 

In [24]:
lecture_df = df[["history", "english", "math", "social_studies", "science"]]
lecture_df

Unnamed: 0,history,english,math,social_studies,science
0,80,92,70,65,92
1,91,75,90,68,85
2,86,76,42,72,88
3,77,92,52,60,80
4,75,85,85,92,95
5,96,90,95,81,72
6,91,81,92,81,73
7,95,83,85,87,0


### (1) Applying a user-defined function across the rows (axis = 0)


![](https://cdn.shortpixel.ai/spai/w_750+q_lossy+ret_img+to_webp/https://www.sharpsightlabs.com/wp-content/uploads/2018/10/np-sum-axis0-example.png)

In [25]:
# Rating the difficulty level of each exam
# Subjects with an average score above 80 will be rated as 'easy', otherwise 'hard'
lecture_df.apply(lambda x : "easy" if x.mean()>80 else "hard", axis=0)

history           easy
english           easy
math              hard
social_studies    hard
science           hard
dtype: object

### (2) Applying a user-defined function across the columns (axis = 1)

![](https://cdn.shortpixel.ai/spai/w_563+q_lossy+ret_img+to_webp/https://www.sharpsightlabs.com/wp-content/uploads/2018/10/np-sum-axis1-example_v2.png)

In [26]:
# Students with an average over 80 will be graded `pass`, otherwise `fail`
lecture_df.apply(lambda x : "pass" if x.mean()>80 else "fail", axis=1)

0    fail
1    pass
2    fail
3    fail
4    pass
5    pass
6    pass
7    fail
dtype: object
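
The function passed to `.apply()` does not have to be a lambda. The sketch below uses a hypothetical named helper, `score_range`, on a three-student subset called `demo` to show what the function receives under each `axis` setting.

```python
import pandas as pd

demo = pd.DataFrame({
    "history": [80, 91, 86],
    "english": [92, 75, 76],
    "math": [70, 90, 42],
})

def score_range(values):
    """Hypothetical helper: gap between the highest and lowest score."""
    return values.max() - values.min()

# axis=0: the function receives each column, so the result is indexed by subject
per_subject = demo.apply(score_range, axis=0)

# axis=1: the function receives each row, so the result is indexed by student
per_student = demo.apply(score_range, axis=1)
print(per_subject.tolist(), per_student.tolist())
```

In both cases the function is called with a whole Series, not with individual scores.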

### (3) Element-wise operation of a user-defined function on DataFrame

In [27]:
# Scores over 80 will be graded `Pass`, otherwise `Fail`
lecture_df.applymap(lambda x : "Pass" if x>80 else "Fail")

Unnamed: 0,history,english,math,social_studies,science
0,Fail,Pass,Fail,Fail,Pass
1,Pass,Fail,Pass,Fail,Pass
2,Pass,Fail,Fail,Fail,Pass
3,Fail,Pass,Fail,Fail,Fail
4,Fail,Pass,Pass,Pass,Pass
5,Pass,Pass,Pass,Pass,Fail
6,Pass,Pass,Pass,Pass,Fail
7,Pass,Pass,Pass,Pass,Fail
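
`DataFrame.applymap()` has been deprecated since pandas 2.1, where the same element-wise behavior is available as `DataFrame.map()`. The following sketch, with a hypothetical frame `demo` and helper `grade`, picks whichever method the installed version offers.

```python
import pandas as pd

demo = pd.DataFrame({"history": [80, 91], "math": [70, 90]})

def grade(score):
    """Hypothetical helper mirroring the lambda above."""
    return "Pass" if score > 80 else "Fail"

# Use DataFrame.map where it exists (pandas 2.1 and later), otherwise fall back to .applymap()
elementwise = demo.map if hasattr(demo, "map") else demo.applymap
graded = elementwise(grade)
print(graded)
```

Either way, the function is called once per cell and the result has the same shape as the input.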


### (4) Element-wise operation of a user-defined function on Series

In [28]:
math_score = lecture_df.math
math_score

0    70
1    90
2    42
3    52
4    85
5    95
6    92
7    85
Name: math, dtype: int64

In [29]:
# Math scores over 80 will be graded `pass`, otherwise `fail`
math_score.apply(lambda x : "pass" if x > 80 else "fail")

0    fail
1    pass
2    fail
3    fail
4    pass
5    pass
6    pass
7    pass
Name: math, dtype: object

In [30]:
# A math score over 80 will be graded `pass`, otherwise `fail`
x = math_score.map(lambda x : "pass" if x>80 else "fail")
x

0    fail
1    pass
2    fail
3    fail
4    pass
5    pass
6    pass
7    pass
Name: math, dtype: object

Both `.map()` and `.apply()` can be used on a pandas Series for element-wise operations.<br>One practical difference is that `.map()` also accepts a dictionary, which it uses as a lookup table, whereas `.apply()` is meant to be given a function.

In [31]:
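# The keys must match the values in `x` exactly, so they are lowercase here
x.map({"fail":"See you again next semester",
       "pass":"You may go on another adventure"})

0        See you again next semester
1    You may go on another adventure
2        See you again next semester
3        See you again next semester
4    You may go on another adventure
5    You may go on another adventure
6    You may go on another adventure
7    You may go on another adventure
Name: math, dtype: object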
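
Dictionary keys must match the Series values exactly, including case; any value without a matching key becomes `NaN`. A minimal sketch with a hypothetical `grades` Series:

```python
import pandas as pd

grades = pd.Series(["fail", "pass", "fail"], name="math")

messages = {"fail": "See you again next semester",
            "pass": "You may go on another adventure"}

# Keys match the values exactly, so every element is translated
matched = grades.map(messages)

# Capitalized keys ("Fail", "Pass") match nothing, so every element becomes NaN
capitalized = {key.capitalize(): text for key, text in messages.items()}
unmatched = grades.map(capitalized)
print(matched.tolist(), unmatched.isna().all())
```

Checking the result with `.isna()` is a quick way to spot keys that failed to match.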

## 2. Transposing DataFrame

Pandas provides `.transpose()` to swap the axes.
Since a DataFrame object is two-dimensional, with only rows and columns, `.transpose()` simply swaps the rows and columns. The attribute `.T` is a shorthand for `.transpose()`.

In [32]:
transposed_df = df.T
transposed_df

Unnamed: 0,0,1,2,3,4,5,6,7
class,1,1,2,2,3,3,4,4
l_name,Smith,Schafer,Zimmermann,Mendoza,Park,Delcourt,Thompson,Myeong
f_name,John,Elise,Kate,James,Jay,Emma,Sarah,Mina
history,80,91,86,77,75,96,91,95
english,92,75,76,92,85,90,81,83
math,70,90,42,52,85,95,92,85
social_studies,65,68,72,60,92,81,81,87
science,92,85,88,80,95,72,73,0


You can see that only the index and the columns were swapped; the values in the DataFrame are unchanged.
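
As a small sketch of the same idea, the hypothetical frame `demo` below has three rows and two columns, so the reversal of the shape is easy to see.

```python
import pandas as pd

demo = pd.DataFrame({"f_name": ["John", "Elise", "Kate"], "math": [70, 90, 42]})

flipped = demo.T  # same as demo.transpose()

# Row labels become column labels and vice versa, so the shape is reversed
print(demo.shape, flipped.shape)
print(list(flipped.index))    # the former column names
print(list(flipped.columns))  # the former row labels
```

Transposing twice returns a frame with the original layout, which can be a handy sanity check.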