___

<p style="text-align: center;"><img src="https://docs.google.com/uc?id=1lY0Uj5R04yMY3-ZppPWxqCr5pvBLYPnV" class="img-fluid" 
alt="CLRSWY"></p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:100%; text-align:center; border-radius:10px 10px;">WAY TO REINVENT YOURSELF</p>

<img src=https://i.ibb.co/6gCsHd6/1200px-Pandas-logo-svg.png width="700" height="200">

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#060108; font-size:200%; text-align:center; border-radius:10px 10px;">Data Analysis with Python</p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#060108; font-size:150%; text-align:center; border-radius:10px 10px;">Session - 04</p>

## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#4d77cf; font-size:200%; text-align:center; border-radius:10px 10px;">Pandas DataFrames</p>

<a id="toc"></a>

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Content</p>

* [IMPORTING LIBRARIES NEEDED IN THIS NOTEBOOK](#0)
* [DATA FRAMES](#1)
* [CREATING A DATA FRAME](#2)
    * [Creating a DataFrame Using the Lists of Data & Columns](#2.1)
    * [Creating a DataFrame Using NumPy Arrays](#2.2)
    * [Creating a DataFrame Using a Dictionary](#2.3)
    * [The Examination of Some Attributes on Data](#2.4)
* [INDEXING, SLICING & SELECTION](#3)    
* [CREATING A NEW COLUMN](#4)    
* [REMOVING COLUMNS](#5)
* [REMOVING ROWS](#6)
* [SELECTING ROWS & COLUMNS USING .loc[ ] & .iloc[ ] ](#7)
* [CONDITIONAL SELECTION](#8)
    * [One Conditional Statement](#8.1)
    * [Two or More Conditional Statements](#8.2)
    * [Conditional Selection Using .loc[ ]](#8.3)
* [reset_index() & set_index()](#9)
* [Multi-Index & Index Hierarchy](#10)
* [Some Other Useful Methods with Iris Dataset](#11)
* [THE END OF THE SESSION-04](#12)

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Importing Libraries Needed in This Notebook</p>

<a id="0"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Once you've installed NumPy & Pandas, you can import them as libraries:

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from scipy import stats

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Data Frames</p>

<a id="1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

A DataFrame is a two-dimensional data container, similar to a matrix, but it can hold heterogeneous data, and its rows and columns can carry symbolic names (labels). ``DataFrames`` are the workhorse of pandas and are directly inspired by the R programming language. We can think of a DataFrame as a collection of Series objects that share the same index.
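The idea of "Series objects that share the same index" can be sketched as follows. The names and numbers are made-up illustration data, not part of this session's datasets.

```python
import pandas as pd

# Two Series with partially overlapping labels (hypothetical data)
height = pd.Series([1.70, 1.82, 1.65], index=["ann", "bob", "cem"])
weight = pd.Series([60, 85], index=["ann", "bob"])

# Each Series becomes a column; rows are aligned on the shared index.
# "cem" has no weight, so pandas fills that cell with NaN.
people = pd.DataFrame({"height": height, "weight": weight})
print(people)

# Selecting one column gives back a Series that carries the same index
print(type(people["height"]))
print(people.index.tolist())
```

Because alignment happens on labels rather than on position, the two Series do not need to have the same length or order.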

### Why use Pandas?

Data scientists use Pandas in Python for the **following advantages**:

- It handles missing data easily
- It uses Series as its one-dimensional data structure and DataFrame as its two-dimensional data structure
- It provides an efficient way to slice the data
- It provides a flexible way to merge, concatenate or reshape the data
- It includes powerful tools for working with time series

In a nutshell, Pandas is a useful library for data manipulation and analysis. It provides powerful and easy-to-use data structures, as well as the means to perform operations on these structures quickly.
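A few of the advantages listed above (missing data, concatenating, merging) can be seen in a minimal sketch. The store names and sales figures are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sales figures; one value is missing
q1 = pd.DataFrame({"store": ["A", "B"], "sales": [100.0, np.nan]})
q2 = pd.DataFrame({"store": ["A", "B"], "sales": [120.0, 90.0]})

# Missing data: count it, then fill it with the column mean
print(q1["sales"].isna().sum())
q1_filled = q1.fillna({"sales": q1["sales"].mean()})

# Concatenate: stack the two quarters on top of each other
both = pd.concat([q1_filled, q2], ignore_index=True)

# Merge: join the two quarters side by side on the "store" column
side_by_side = q1_filled.merge(q2, on="store", suffixes=("_q1", "_q2"))

print(both)
print(side_by_side)
```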

[SOURCE01](https://pandas.pydata.org/pandas-docs/stable/user_guide/dsintro.html), 
[SOURCE02](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html), 
[SOURCE03](https://morioh.com/p/2528ac775b1b), 
[SOURCE04](https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python), 
[SOURCE05](https://www.guru99.com/python-pandas-tutorial.html), 
[SOURCE06](https://www.tutorialspoint.com/python_pandas/python_pandas_dataframe.htm), 
[SOURCE07](https://realpython.com/pandas-dataframe/) &
[SOURCE08](https://towardsdatascience.com/a-simple-guide-to-pandas-dataframes-b125f64e1453)<br>
[VIDEO SOURCE01](https://www.youtube.com/watch?v=zmdjNSmRXF4), 
[VIDEO SOURCE02](https://www.youtube.com/watch?v=F6kmIpWWEdU) &
[VIDEO SOURCE03](https://towardsdatascience.com/pandas-dataframe-basics-3c16eb35c4f3)<br>

**Now let's use pandas to explore this topic!**

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Creating a DataFrame</p>

<a id="2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

A **``DataFrame``** is a **two-dimensional collection of data**. It is a data structure in which data is stored in **tabular form**, arranged in rows and columns, and each column can hold a different variable. We can perform various operations on a DataFrame, such as arithmetic, selecting columns/rows, and adding columns/rows.

We can load DataFrames from external storage such as an SQL database, a CSV file, or an Excel file. We can also build them from lists, dictionaries, lists of dictionaries, and so on.
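As a rough sketch of two of these routes, the block below builds one DataFrame from a list of dictionaries and one from CSV text. The CSV is held in memory with `io.StringIO` so the example runs without a file on disk; the names and scores are hypothetical. For Excel files and SQL databases, pandas offers `pd.read_excel` and `pd.read_sql`, which are not shown here.

```python
import io
import pandas as pd

# A list of dictionaries: each dict is one row, keys become column names
records = [
    {"name": "Ada", "score": 90},
    {"name": "Bob", "score": 75},
]
df_records = pd.DataFrame(records)

# CSV text in memory stands in for a real file (hypothetical data);
# pd.read_csv accepts a path or any file-like object
csv_text = "name,score\nAda,90\nBob,75\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_records)
print(df_csv)
```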

In this session, we will learn to create the DataFrame in multiple ways. Let's understand these different ways.

**``pandas.DataFrame(data=None, index=None, columns=None, dtype=None, copy=None)``**
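A small sketch of how the main parameters in this signature fit together. The row and column labels are arbitrary placeholders.

```python
import numpy as np
import pandas as pd

values = np.arange(6).reshape(3, 2)

df_demo = pd.DataFrame(
    data=values,               # the values: list, dict, ndarray, ...
    index=["r1", "r2", "r3"],  # row labels (default: 0, 1, 2, ...)
    columns=["a", "b"],        # column labels
    dtype=float,               # force a single dtype for every column
)
print(df_demo)
print(df_demo.dtypes)
```

When `index` and `columns` are omitted, pandas falls back to integer labels starting at 0, which is what the cells in the following sections rely on.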

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Creating a DataFrame Using the Lists of Data & Columns</p>

<a id="2.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [4]:
data1 = [1, 3, 5, 7, 9]
data1

[1, 3, 5, 7, 9]

In [5]:
pd.DataFrame(data1) # a flat list produces a single-column DataFrame

Unnamed: 0,0
0,1
1,3
2,5
3,7
4,9


In [6]:
pd.Series(data1)

0    1
1    3
2    5
3    7
4    9
dtype: int64

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Creating a DataFrame Using NumPy Arrays</p>

<a id="2.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [9]:
data2 = np.arange(1, 24, 2).reshape(3, 4)
data2

array([[ 1,  3,  5,  7],
       [ 9, 11, 13, 15],
       [17, 19, 21, 23]])

In [13]:
df = pd.DataFrame(data=data2, columns=['var1', 'var2', 'var3', 'var4'])
df

Unnamed: 0,var1,var2,var3,var4
0,1,3,5,7
1,9,11,13,15
2,17,19,21,23


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Creating a DataFrame Using a Dictionary</p>

<a id="2.3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [14]:
s1 = np.random.randint(2, 10, size = 4)
s2 = np.random.randint(3, 10, size = 4)
s3 = np.random.randint(4, 15, size = 4)

In [15]:
s1, s2, s3

(array([5, 4, 3, 4]), array([6, 6, 6, 6]), array([8, 8, 8, 4]))

In [16]:
my_dict = {'var1':s1, 'var2':s2, 'var3':s3}
my_dict

{'var1': array([5, 4, 3, 4]),
 'var2': array([6, 6, 6, 6]),
 'var3': array([8, 8, 8, 4])}

In [18]:
df2 = pd.DataFrame(my_dict)
df2

Unnamed: 0,var1,var2,var3
0,5,6,8
1,4,6,8
2,3,6,8
3,4,6,4


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">The Examination of Some Attributes on Data</p>

<a id="2.4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [19]:
df2

Unnamed: 0,var1,var2,var3
0,5,6,8
1,4,6,8
2,3,6,8
3,4,6,4


In [20]:
df.head(2)

Unnamed: 0,var1,var2,var3,var4
0,1,3,5,7
1,9,11,13,15


In [21]:
df.tail(2)

Unnamed: 0,var1,var2,var3,var4
1,9,11,13,15
2,17,19,21,23


In [22]:
df2.sample(2)

Unnamed: 0,var1,var2,var3
2,3,6,8
1,4,6,8


In [23]:
df2.columns

Index(['var1', 'var2', 'var3'], dtype='object')

In [24]:
for col in df2.columns:
    print(col)

var1
var2
var3


In [25]:
for col in df2.columns:
    print(df2[col].mean())

4.0
6.0
7.0


In [26]:
df2.mean()

var1    4.0
var2    6.0
var3    7.0
dtype: float64

In [28]:
df2.index

RangeIndex(start=0, stop=4, step=1)

In [27]:
df2.columns = ['new1', 'new2', 'new3']
df2

Unnamed: 0,new1,new2,new3
0,5,6,8
1,4,6,8
2,3,6,8
3,4,6,4


In [29]:
df2.index = ['a', 'b', 'c', 'd']
df2

Unnamed: 0,new1,new2,new3
a,5,6,8
b,4,6,8
c,3,6,8
d,4,6,4


In [32]:
df2.rename(columns={'new1':'aaa', 'new2':'bbb'}) # returns a renamed copy; df2 itself is unchanged

Unnamed: 0,aaa,bbb,new3
a,5,6,8
b,4,6,8
c,3,6,8
d,4,6,4


In [33]:
df2.rename(index={'a':1, 'b':2}) # returns a renamed copy; df2 itself is unchanged

Unnamed: 0,new1,new2,new3
1,5,6,8
2,4,6,8
c,3,6,8
d,4,6,4


In [34]:
df2

Unnamed: 0,new1,new2,new3
a,5,6,8
b,4,6,8
c,3,6,8
d,4,6,4


In [35]:
df2.shape

(4, 3)

In [36]:
df2.shape[1]

3

In [37]:
len(df2)

4

In [38]:
df2.size

12

In [39]:
df2.ndim

2

In [40]:
df2.values

array([[5, 6, 8],
       [4, 6, 8],
       [3, 6, 8],
       [4, 6, 4]])

In [41]:
type(df2)

pandas.core.frame.DataFrame

In [42]:
type(df2.values)

numpy.ndarray

In [43]:
type(df2["new1"])

pandas.core.series.Series

In [44]:
'new2' in df2

True

In [45]:
'new5' in df2

False

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Indexing, Slicing & Selection</p>

<a id="3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Let's learn a variety of methods to grab data from a DataFrame.

In [46]:
from numpy.random import randn

In [47]:
np.random.seed(101)
df3 = pd.DataFrame(randn(5, 4), index = 'A B C D E'.split(), columns = 'W X Y Z'.split())
df3

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [48]:
df3['Y']

A    0.907969
B   -0.848077
C    0.528813
D   -0.933237
E    2.605967
Name: Y, dtype: float64

In [49]:
df3.Y

A    0.907969
B   -0.848077
C    0.528813
D   -0.933237
E    2.605967
Name: Y, dtype: float64

In [52]:
df3[["Y"]] # returns a single column dataframe, we use double brackets [[]]

Unnamed: 0,Y
A,0.907969
B,-0.848077
C,0.528813
D,-0.933237
E,2.605967


In [54]:
df3[["X", "Y"]] # we need to pass a list to return more than one column

Unnamed: 0,X,Y
A,0.628133,0.907969
B,-0.319318,-0.848077
C,0.740122,0.528813
D,-0.758872,-0.933237
E,1.978757,2.605967


In [55]:
df3["W":"Y"] # a slice inside [] selects rows by label; no row label lies between "W" and "Y", so the result is empty

Unnamed: 0,W,X,Y,Z


In [57]:
df3["A":"C"] # a slice inside [] selects rows by label; the stop label "C" is included

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001


In [59]:
df3["A":"C"][["Y", "W"]]

Unnamed: 0,Y,W
A,0.907969,2.70685
B,-0.848077,0.651118
C,0.528813,-2.018168


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Creating a New Column</p>

<a id="4"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [74]:
df3["new1"] = df3["W"] * df3["X"]
df3

Unnamed: 0,W,X,Y,Z,new1
A,2.70685,0.628133,0.907969,0.503826,1.700261
B,0.651118,-0.319318,-0.848077,0.605965,-0.207914
C,-2.018168,0.740122,0.528813,-0.589001,-1.493691
D,0.188695,-0.758872,-0.933237,0.955057,-0.143196
E,0.190794,1.978757,2.605967,0.683509,0.377536


In [75]:
df3["new2"] = np.arange(5)
df3

Unnamed: 0,W,X,Y,Z,new1,new2
A,2.70685,0.628133,0.907969,0.503826,1.700261,0
B,0.651118,-0.319318,-0.848077,0.605965,-0.207914,1
C,-2.018168,0.740122,0.528813,-0.589001,-1.493691,2
D,0.188695,-0.758872,-0.933237,0.955057,-0.143196,3
E,0.190794,1.978757,2.605967,0.683509,0.377536,4


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Removing Columns</p>

<a id="5"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [76]:
df3.drop("new2", axis=1)

Unnamed: 0,W,X,Y,Z,new1
A,2.70685,0.628133,0.907969,0.503826,1.700261
B,0.651118,-0.319318,-0.848077,0.605965,-0.207914
C,-2.018168,0.740122,0.528813,-0.589001,-1.493691
D,0.188695,-0.758872,-0.933237,0.955057,-0.143196
E,0.190794,1.978757,2.605967,0.683509,0.377536


In [68]:
df3

Unnamed: 0,W,X,Y,Z,new1,new2
A,2.70685,0.628133,0.907969,0.503826,1.700261,0
B,0.651118,-0.319318,-0.848077,0.605965,-0.207914,1
C,-2.018168,0.740122,0.528813,-0.589001,-1.493691,2
D,0.188695,-0.758872,-0.933237,0.955057,-0.143196,3
E,0.190794,1.978757,2.605967,0.683509,0.377536,4


In [69]:
df3.drop(["new1", "new2"], axis=1)

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [70]:
df3.drop(columns=["new1", "new2"])

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [77]:
df3.drop(columns=["new1", "new2"], inplace=True) # permanently changes
df3

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Removing Rows</p>

<a id="6"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [78]:
df3.drop("C", axis=0) # returns a copy without row "C"; df3 itself is unchanged

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [79]:
df3.drop(index="B")

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [80]:
df3.drop("E")

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057


In [81]:
df3

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


In [82]:
df3.drop(["B", "D"])

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
C,-2.018168,0.740122,0.528813,-0.589001
E,0.190794,1.978757,2.605967,0.683509


In [84]:
df3.drop(df3.index[3])  # df3.drop("D")

Unnamed: 0,W,X,Y,Z
A,2.70685,0.628133,0.907969,0.503826
B,0.651118,-0.319318,-0.848077,0.605965
C,-2.018168,0.740122,0.528813,-0.589001
E,0.190794,1.978757,2.605967,0.683509


In [86]:
df3.drop(df3.index[0:2])

Unnamed: 0,W,X,Y,Z
C,-2.018168,0.740122,0.528813,-0.589001
D,0.188695,-0.758872,-0.933237,0.955057
E,0.190794,1.978757,2.605967,0.683509


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Selecting Rows and Columns Using .loc[ ] and .iloc[ ]</p>

<a id="7"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

#### `.loc[]` → selects data using the **labels** (names) of rows (index) & columns

#### `.iloc[]` → selects data using the **integer positions** of rows & columns, following classical Python indexing logic
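The difference is easiest to see when the row labels are integers that do not match the positions. The sketch below uses a small hypothetical frame built for that purpose.

```python
import pandas as pd

# Hypothetical frame whose integer labels do NOT match the positions
demo = pd.DataFrame({"val": [10, 20, 30, 40]}, index=[3, 2, 1, 0])

# .loc looks up the label 0, which is the last row
print(demo.loc[0, "val"])

# .iloc looks up position 0, which is the first row
print(demo.iloc[0, 0])

# Label slices include the stop label; position slices exclude the stop
by_label = demo.loc[3:1]      # labels 3, 2 and 1
by_position = demo.iloc[0:2]  # positions 0 and 1
print(by_label)
print(by_position)
```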

In [85]:
data4 = np.random.randint(1, 40, size=(8, 4))
df4 = pd.DataFrame(data4, columns = ["var1", "var2", "var3", 'var4'])
df4

Unnamed: 0,var1,var2,var3,var4
0,8,11,39,10
1,19,8,16,1
2,13,18,12,16
3,34,30,25,37
4,20,36,31,11
5,21,28,9,23
6,27,24,38,23
7,10,3,19,29


In [87]:
df4.loc[4]

var1    20
var2    36
var3    31
var4    11
Name: 4, dtype: int32

In [88]:
df4.loc[[4]]

Unnamed: 0,var1,var2,var3,var4
4,20,36,31,11


In [89]:
df4.loc[2:5] # returns stop inclusive range

Unnamed: 0,var1,var2,var3,var4
2,13,18,12,16
3,34,30,25,37
4,20,36,31,11
5,21,28,9,23


In [90]:
df4.iloc[2:5] # returns stop exclusive range

Unnamed: 0,var1,var2,var3,var4
2,13,18,12,16
3,34,30,25,37
4,20,36,31,11


In [91]:
df4.index = 'a b c d e f g h'.split()
df4

Unnamed: 0,var1,var2,var3,var4
a,8,11,39,10
b,19,8,16,1
c,13,18,12,16
d,34,30,25,37
e,20,36,31,11
f,21,28,9,23
g,27,24,38,23
h,10,3,19,29


In [92]:
df4.iloc[1:4] # uses the underlying integer positions (0, 1, 2, ...) and ignores the labels

Unnamed: 0,var1,var2,var3,var4
b,19,8,16,1
c,13,18,12,16
d,34,30,25,37


In [93]:
df4.loc['a':'d']

Unnamed: 0,var1,var2,var3,var4
a,8,11,39,10
b,19,8,16,1
c,13,18,12,16
d,34,30,25,37


In [95]:
df4.loc['d', 'var3'] # row, column via labels

25

In [96]:
df4.iloc[3, 2]

25

In [97]:
df4.loc['d':'g',"var2"]

d    30
e    36
f    28
g    24
Name: var2, dtype: int32

In [99]:
df4.loc['d':'g']["var2"]

d    30
e    36
f    28
g    24
Name: var2, dtype: int32

In [100]:
df4.loc['d':'g'][["var2"]]

Unnamed: 0,var2
d,30
e,36
f,28
g,24


In [101]:
df4.loc['d':'g',["var2"]]

Unnamed: 0,var2
d,30
e,36
f,28
g,24


In [102]:
df4.loc['d':'g',["var2", "var3"]]

Unnamed: 0,var2,var3
d,30,25
e,36,31
f,28,9
g,24,38


In [103]:
df4

Unnamed: 0,var1,var2,var3,var4
a,8,11,39,10
b,19,8,16,1
c,13,18,12,16
d,34,30,25,37
e,20,36,31,11
f,21,28,9,23
g,27,24,38,23
h,10,3,19,29


In [104]:
df4.iloc[2:5, 2]

c    12
d    25
e    31
Name: var3, dtype: int32

In [105]:
df4.iloc[2:5, [2]]

Unnamed: 0,var3
c,12
d,25
e,31


In [108]:
df4.iloc[2:5][["var2"]]

Unnamed: 0,var2
c,18
d,30
e,36


In [109]:
df4.loc["a", "var1"]

8

In [112]:
df4.loc[["a"], ["var1"]] # returns as dataframe

Unnamed: 0,var1
a,8


In [113]:
df4.loc[["a", "c"], ["var1", "var3"]] 

Unnamed: 0,var1,var3
a,8,39
c,13,12


In [114]:
df4

Unnamed: 0,var1,var2,var3,var4
a,8,11,39,10
b,19,8,16,1
c,13,18,12,16
d,34,30,25,37
e,20,36,31,11
f,21,28,9,23
g,27,24,38,23
h,10,3,19,29


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Conditional Selection</p>

<a id="8"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

An important feature of pandas is conditional selection using bracket notation, very similar to numpy:

In [115]:
df4

Unnamed: 0,var1,var2,var3,var4
a,8,11,39,10
b,19,8,16,1
c,13,18,12,16
d,34,30,25,37
e,20,36,31,11
f,21,28,9,23
g,27,24,38,23
h,10,3,19,29


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">One Conditional Statement</p>

<a id="8.1"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [116]:
df4 > 10

Unnamed: 0,var1,var2,var3,var4
a,False,True,True,False
b,True,False,True,False
c,True,True,True,True
d,True,True,True,True
e,True,True,True,True
f,True,True,False,True
g,True,True,True,True
h,False,False,True,True


In [117]:
df4[df4>10]

Unnamed: 0,var1,var2,var3,var4
a,,11.0,39.0,
b,19.0,,16.0,
c,13.0,18.0,12.0,16.0
d,34.0,30.0,25.0,37.0
e,20.0,36.0,31.0,11.0
f,21.0,28.0,,23.0
g,27.0,24.0,38.0,23.0
h,,,19.0,29.0


In [118]:
df4[df4["var1"]>10]

Unnamed: 0,var1,var2,var3,var4
b,19,8,16,1
c,13,18,12,16
d,34,30,25,37
e,20,36,31,11
f,21,28,9,23
g,27,24,38,23


In [120]:
df4[df4["var1"]>10][["var2"]]

Unnamed: 0,var2
b,8
c,18
d,30
e,36
f,28
g,24


In [122]:
df4[df4["var1"]>10][["var2", "var3"]]

Unnamed: 0,var2,var3
b,8,16
c,18,12
d,30,25
e,36,31
f,28,9
g,24,38


In [121]:
df4[df4["var1"]>10][["var2"]].mean()

var2    24.0
dtype: float64

### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Two or More Conditional Statements</p>

<a id="8.2"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

**For two or more conditions, combine them with `|` (or) and `&` (and), wrapping each condition in parentheses:**

In [123]:
df4

Unnamed: 0,var1,var2,var3,var4
a,8,11,39,10
b,19,8,16,1
c,13,18,12,16
d,34,30,25,37
e,20,36,31,11
f,21,28,9,23
g,27,24,38,23
h,10,3,19,29


In [124]:
df4[(df4["var1"] > 10) & (df4["var2"] < 20)]

Unnamed: 0,var1,var2,var3,var4
b,19,8,16,1
c,13,18,12,16


### <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:150%; text-align:LEFT; border-radius:10px 10px;">Conditional Selection Using .loc[ ]</p>

<a id="8.3"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

In [125]:
df4.loc[df4["var1"]>10, ["var2", "var3"]]

Unnamed: 0,var2,var3
b,8,16
c,18,12
d,30,25
e,36,31
f,28,9
g,24,38


In [127]:
df4.loc[((df4["var1"]<10) | (df4["var1"] > 30)),  ["var2", "var3"]]

Unnamed: 0,var2,var3
a,11,39
d,30,25


## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">reset_index() & set_index()</p>

<a id="9"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Let's discuss some more features of indexing, including resetting the index or setting it to something else. We'll also talk about index hierarchy!
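Before applying these methods to `df4`, here is a minimal sketch of what each call does, using a tiny hypothetical frame with placeholder labels.

```python
import pandas as pd

# Hypothetical frame with string row labels
demo = pd.DataFrame({"key": ["k1", "k2"], "val": [1, 2]}, index=["x", "y"])

# reset_index() moves the old index into a column named "index"
moved = demo.reset_index()

# drop=True discards the old index instead of keeping it as a column
dropped = demo.reset_index(drop=True)

# set_index() promotes an existing column to the index
by_key = demo.set_index("key")

# None of these calls changes demo unless inplace=True is passed
print(moved.columns.tolist())
print(dropped.index.tolist())
print(by_key.index.name)
print(demo.index.tolist())
```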

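In [135]:
df4.reset_index() # the old index (a, b, c, ...) becomes a column named "index"

Unnamed: 0,index,var1,var2,var3,var4
0,a,8,11,39,10
1,b,19,8,16,1
2,c,13,18,12,16
3,d,34,30,25,37
4,e,20,36,31,11
5,f,21,28,9,23
6,g,27,24,38,23
7,h,10,3,19,29


In [136]:
df4.reset_index(drop=True) # the old index is discarded instead of being kept as a column

Unnamed: 0,var1,var2,var3,var4
0,8,11,39,10
1,19,8,16,1
2,13,18,12,16
3,34,30,25,37
4,20,36,31,11
5,21,28,9,23
6,27,24,38,23
7,10,3,19,29


In [137]:
df4.reset_index(drop=True, inplace=True)
df4

Unnamed: 0,var1,var2,var3,var4
0,8,11,39,10
1,19,8,16,1
2,13,18,12,16
3,34,30,25,37
4,20,36,31,11
5,21,28,9,23
6,27,24,38,23
7,10,3,19,29


In [138]:
df4.set_index("var4") # the column "var4" becomes the index; df4 itself is unchanged

Unnamed: 0_level_0,var1,var2,var3
var4,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
10,8,11,39
1,19,8,16
16,13,18,12
37,34,30,25
11,20,36,31
23,21,28,9
23,27,24,38
29,10,3,19


In [139]:
df4.set_index("var4", inplace=True)
df4

Unnamed: 0_level_0,var1,var2,var3
var4,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
10,8,11,39
1,19,8,16
16,13,18,12
37,34,30,25
11,20,36,31
23,21,28,9
23,27,24,38
29,10,3,19
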

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Multi-Index & Index Hierarchy</p>

<a id="10"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

Let us go over how to work with a Multi-Index. First, we'll create a quick example of what a Multi-Indexed DataFrame looks like:

In [140]:
outside = ['M1', 'M1', 'M1', 'M2', 'M2', 'M2','M3', 'M3', 'M3']
inside = [1, 2, 3, 1, 2, 3, 5, 6, 7]
multi_index = list(zip(outside, inside))
multi_index

[('M1', 1),
 ('M1', 2),
 ('M1', 3),
 ('M2', 1),
 ('M2', 2),
 ('M2', 3),
 ('M3', 5),
 ('M3', 6),
 ('M3', 7)]

In [141]:
hier_index = pd.MultiIndex.from_tuples(multi_index)
hier_index

MultiIndex([('M1', 1),
            ('M1', 2),
            ('M1', 3),
            ('M2', 1),
            ('M2', 2),
            ('M2', 3),
            ('M3', 5),
            ('M3', 6),
            ('M3', 7)],
           )

In [142]:
np.random.seed(101)
df5 = pd.DataFrame(np.random.randn(9, 4), index = hier_index, columns=['A', 'B', 'C', 'D'])
df5

Unnamed: 0,Unnamed: 1,A,B,C,D
M1,1,2.70685,0.628133,0.907969,0.503826
M1,2,0.651118,-0.319318,-0.848077,0.605965
M1,3,-2.018168,0.740122,0.528813,-0.589001
M2,1,0.188695,-0.758872,-0.933237,0.955057
M2,2,0.190794,1.978757,2.605967,0.683509
M2,3,0.302665,1.693723,-1.706086,-1.159119
M3,5,-0.134841,0.390528,0.166905,0.184502
M3,6,0.807706,0.07296,0.638787,0.329646
M3,7,-0.497104,-0.75407,-0.943406,0.484752


For more information on indexing and selecting data, visit the [**Pandas Official Documentation**](https://pandas.pydata.org/pandas-docs/version/0.13.0/indexing.html).

**``Note``** that all of the MultiIndex constructors accept a `names` argument, which stores string names for the levels themselves. If no names are provided, `None` is assigned to each level:

In [143]:
df5.index.names

FrozenList([None, None])

In [144]:
df5.index.names = ["Group", "Num"]
df5.index.names

FrozenList(['Group', 'Num'])
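Instead of assigning `index.names` afterwards, the level names can also be passed through the `names` argument when the index is built. A short sketch with a reduced set of tuples, which also shows `pd.MultiIndex.from_product` as an alternative constructor that the cells above do not use:

```python
import pandas as pd

pairs = [("M1", 1), ("M1", 2), ("M2", 1), ("M2", 2)]

# names= labels the levels at construction time
idx_tuples = pd.MultiIndex.from_tuples(pairs, names=["Group", "Num"])

# from_product builds every combination of the given level values
idx_product = pd.MultiIndex.from_product(
    [["M1", "M2"], [1, 2]], names=["Group", "Num"]
)

print(list(idx_tuples.names))
print(idx_tuples.equals(idx_product))
```

`from_product` only fits when every outer label is paired with the same inner labels, so it would not reproduce the `M3` group of `df5`, whose inner labels are 5, 6 and 7.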

In [145]:
df5

Unnamed: 0_level_0,Unnamed: 1_level_0,A,B,C,D
Group,Num,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
M1,1,2.70685,0.628133,0.907969,0.503826
M1,2,0.651118,-0.319318,-0.848077,0.605965
M1,3,-2.018168,0.740122,0.528813,-0.589001
M2,1,0.188695,-0.758872,-0.933237,0.955057
M2,2,0.190794,1.978757,2.605967,0.683509
M2,3,0.302665,1.693723,-1.706086,-1.159119
M3,5,-0.134841,0.390528,0.166905,0.184502
M3,6,0.807706,0.07296,0.638787,0.329646
M3,7,-0.497104,-0.75407,-0.943406,0.484752


In [147]:
df5.index

MultiIndex([('M1', 1),
            ('M1', 2),
            ('M1', 3),
            ('M2', 1),
            ('M2', 2),
            ('M2', 3),
            ('M3', 5),
            ('M3', 6),
            ('M3', 7)],
           names=['Group', 'Num'])

In [148]:
df5.index.levels

FrozenList([['M1', 'M2', 'M3'], [1, 2, 3, 5, 6, 7]])

In [149]:
df5.index.levels[0]

Index(['M1', 'M2', 'M3'], dtype='object', name='Group')

In [150]:
df5.index.levels[1]

Int64Index([1, 2, 3, 5, 6, 7], dtype='int64', name='Num')

In [154]:
df5.index.get_level_values(0)

Index(['M1', 'M1', 'M1', 'M2', 'M2', 'M2', 'M3', 'M3', 'M3'], dtype='object', name='Group')

In [155]:
df5.index.get_level_values(1)

Int64Index([1, 2, 3, 1, 2, 3, 5, 6, 7], dtype='int64', name='Num')

In [156]:
df5.index.get_level_values("Group")

Index(['M1', 'M1', 'M1', 'M2', 'M2', 'M2', 'M3', 'M3', 'M3'], dtype='object', name='Group')

Now let's show how to index this! For an index hierarchy on the rows we use ``df.loc[]``. If the hierarchy were on the columns axis, you would use normal bracket notation ``df[]`` instead. Calling one level of the index returns the sub-dataframe:

In [157]:
df5

Unnamed: 0_level_0,Unnamed: 1_level_0,A,B,C,D
Group,Num,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
M1,1,2.70685,0.628133,0.907969,0.503826
M1,2,0.651118,-0.319318,-0.848077,0.605965
M1,3,-2.018168,0.740122,0.528813,-0.589001
M2,1,0.188695,-0.758872,-0.933237,0.955057
M2,2,0.190794,1.978757,2.605967,0.683509
M2,3,0.302665,1.693723,-1.706086,-1.159119
M3,5,-0.134841,0.390528,0.166905,0.184502
M3,6,0.807706,0.07296,0.638787,0.329646
M3,7,-0.497104,-0.75407,-0.943406,0.484752


In [158]:
df5["A"]

Group  Num
M1     1      2.706850
       2      0.651118
       3     -2.018168
M2     1      0.188695
       2      0.190794
       3      0.302665
M3     5     -0.134841
       6      0.807706
       7     -0.497104
Name: A, dtype: float64

In [159]:
df5[["A"]]

Unnamed: 0_level_0,Unnamed: 1_level_0,A
Group,Num,Unnamed: 2_level_1
M1,1,2.70685
M1,2,0.651118
M1,3,-2.018168
M2,1,0.188695
M2,2,0.190794
M2,3,0.302665
M3,5,-0.134841
M3,6,0.807706
M3,7,-0.497104


In [160]:
df5[["A", "B"]]

Unnamed: 0_level_0,Unnamed: 1_level_0,A,B
Group,Num,Unnamed: 2_level_1,Unnamed: 3_level_1
M1,1,2.70685,0.628133
M1,2,0.651118,-0.319318
M1,3,-2.018168,0.740122
M2,1,0.188695,-0.758872
M2,2,0.190794,1.978757
M2,3,0.302665,1.693723
M3,5,-0.134841,0.390528
M3,6,0.807706,0.07296
M3,7,-0.497104,-0.75407


In [161]:
df5.loc["M1"]

Unnamed: 0_level_0,A,B,C,D
Num,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1,2.70685,0.628133,0.907969,0.503826
2,0.651118,-0.319318,-0.848077,0.605965
3,-2.018168,0.740122,0.528813,-0.589001


In [162]:
df5.loc[("M1", 2)]

A    0.651118
B   -0.319318
C   -0.848077
D    0.605965
Name: (M1, 2), dtype: float64

In [163]:
df5.loc[[("M1", 2)]]

Unnamed: 0_level_0,Unnamed: 1_level_0,A,B,C,D
Group,Num,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
M1,2,0.651118,-0.319318,-0.848077,0.605965


In [164]:
df5.loc["M1", "A":"C"]

Unnamed: 0_level_0,A,B,C
Num,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,2.70685,0.628133,0.907969
2,0.651118,-0.319318,-0.848077
3,-2.018168,0.740122,0.528813


In [165]:
df5.loc[[("M1", 2)], "A":"C"]

Unnamed: 0_level_0,Unnamed: 1_level_0,A,B,C
Group,Num,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
M1,2,0.651118,-0.319318,-0.848077


In [166]:
df5.loc["M1":"M2"]

Unnamed: 0_level_0,Unnamed: 1_level_0,A,B,C,D
Group,Num,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
M1,1,2.70685,0.628133,0.907969,0.503826
M1,2,0.651118,-0.319318,-0.848077,0.605965
M1,3,-2.018168,0.740122,0.528813,-0.589001
M2,1,0.188695,-0.758872,-0.933237,0.955057
M2,2,0.190794,1.978757,2.605967,0.683509
M2,3,0.302665,1.693723,-1.706086,-1.159119


In [167]:
df5.loc[("M1", 2):("M2", 1)]

Unnamed: 0_level_0,Unnamed: 1_level_0,A,B,C,D
Group,Num,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
M1,2,0.651118,-0.319318,-0.848077,0.605965
M1,3,-2.018168,0.740122,0.528813,-0.589001
M2,1,0.188695,-0.758872,-0.933237,0.955057


In [168]:
df5.loc["M1":"M2":2]

Unnamed: 0_level_0,Unnamed: 1_level_0,A,B,C,D
Group,Num,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
M1,1,2.70685,0.628133,0.907969,0.503826
M1,3,-2.018168,0.740122,0.528813,-0.589001
M2,2,0.190794,1.978757,2.605967,0.683509


In [170]:
df5.loc[[("M2", 3), ("M3", 5)]]

Unnamed: 0_level_0,Unnamed: 1_level_0,A,B,C,D
Group,Num,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
M2,3,0.302665,1.693723,-1.706086,-1.159119
M3,5,-0.134841,0.390528,0.166905,0.184502
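All of the cells above index a hierarchy on the rows. The remark at the start of this section, that a hierarchy on the columns axis works with plain bracket notation, can be sketched with a small hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a two-level hierarchy on the columns
cols = pd.MultiIndex.from_product([["M1", "M2"], ["A", "B"]],
                                  names=["Group", "Var"])
wide = pd.DataFrame(np.arange(8).reshape(2, 4), columns=cols)

# Plain brackets select on the outer column level and return a sub-DataFrame
sub = wide["M1"]
print(sub)

# A tuple reaches a single column, which comes back as a Series
single = wide[("M2", "B")]
print(single)
```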


For more information on MultiIndex and advanced indexing, visit the [**Pandas Official Documentation**](https://pandas.pydata.org/docs/user_guide/advanced.html)

## <p style="background-color:#9d4f8c; font-family:newtimeroman; color:#FFF9ED; font-size:175%; text-align:center; border-radius:10px 10px;">Some Other Useful Methods with Iris Dataset</p>

<a id="11"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>

### Let's apply the functions, attributes, and methods we have learned to the iris dataset

In [172]:
sns.get_dataset_names()

['anagrams',
 'anscombe',
 'attention',
 'brain_networks',
 'car_crashes',
 'diamonds',
 'dots',
 'exercise',
 'flights',
 'fmri',
 'gammas',
 'geyser',
 'iris',
 'mpg',
 'penguins',
 'planets',
 'taxis',
 'tips',
 'titanic']

In [173]:
df6 = sns.load_dataset("iris")

In [174]:
df6.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
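As a starting point, the sketch below applies several tools from this session to the five rows displayed above. The rows are rebuilt by hand so that the block runs without downloading the dataset; the same calls should work on the full `df6`.

```python
import pandas as pd

# The five rows displayed by df6.head(), rebuilt by hand
iris_head = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

print(iris_head.shape)             # (rows, columns)
print(iris_head.columns.tolist())  # column labels

# Label-based selection: rows 1 to 3 (stop included), two columns
by_label = iris_head.loc[1:3, ["sepal_length", "species"]]

# Position-based selection: rows 1 and 2 (stop excluded), first and last column
by_position = iris_head.iloc[1:3, [0, 4]]

# Conditional selection: rows whose sepal_width exceeds 3.1
wide_sepal = iris_head[iris_head["sepal_width"] > 3.1]

# Column means of the numeric columns only
numeric_means = iris_head.drop(columns="species").mean()

print(by_label)
print(by_position)
print(wide_sepal)
print(numeric_means)
```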


## <p style="background-color:#FDFEFE; font-family:newtimeroman; color:#9d4f8c; font-size:150%; text-align:center; border-radius:10px 10px;">The End of The Session - 04</p>

<a id="12"></a>
<a href="#toc" class="btn btn-primary btn-sm" role="button" aria-pressed="true" 
style="color:blue; background-color:#dfa8e4" data-toggle="popover">Content</a>


________