# Pandas DataFrame Analysis

Pandas DataFrame objects come with built-in methods such as head(), tail(), and info() that let us inspect and summarize a DataFrame.

In [1]:
import numpy as np
import pandas as pd

In [2]:
data = {
    'Name':['John', 'Alice', 'Bob', 'Emma', 'Mike', 'Sarah', 'David', 'Linda', 'Tom', 'Emily'],
    'Age':[25, 30, 35, 28, 32, 27, 40, 33, 29, 31],
    'City':['New York', 'Paris', 'London', 'Sydney', 'Tokyo', 'Berlin', 'Rome', 'Madrid', 'Toronto', 'Moscow'],
}

In [3]:
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Age,City
0,John,25,New York
1,Alice,30,Paris
2,Bob,35,London
3,Emma,28,Sydney
4,Mike,32,Tokyo
5,Sarah,27,Berlin
6,David,40,Rome
7,Linda,33,Madrid
8,Tom,29,Toronto
9,Emily,31,Moscow


In [4]:
# Return the first 3 rows
df.head(3)

Unnamed: 0,Name,Age,City
0,John,25,New York
1,Alice,30,Paris
2,Bob,35,London


In [5]:
# Return the first 5 rows (the default when no argument is given)
df.head()

Unnamed: 0,Name,Age,City
0,John,25,New York
1,Alice,30,Paris
2,Bob,35,London
3,Emma,28,Sydney
4,Mike,32,Tokyo


In [6]:
# Return the last 2 rows
df.tail(2)

Unnamed: 0,Name,Age,City
8,Tom,29,Toronto
9,Emily,31,Moscow


In [7]:
# Return the last 5 rows (the default when no argument is given)
df.tail()

Unnamed: 0,Name,Age,City
5,Sarah,27,Berlin
6,David,40,Rome
7,Linda,33,Madrid
8,Tom,29,Toronto
9,Emily,31,Moscow
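Both methods also accept a negative count, which is easy to overlook. The following is a small sketch on a shortened, made-up version of the table above; the variable names are only for illustration.

```python
import pandas as pd

# A shortened stand-in for the DataFrame used above
df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Emma', 'Mike'],
    'Age': [25, 30, 35, 28, 32],
})

# head(-n): every row except the last n
all_but_last_two = df.head(-2)

# tail(-n): every row except the first n
all_but_first_two = df.tail(-2)

# Asking for more rows than exist returns the whole DataFrame
everything = df.head(100)
```

A negative count is handy when the number of rows to skip is known but the total length is not.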


## Get DataFrame Information

The info() method prints a concise summary of the DataFrame: its class, index range, column names, non-null counts, column data types, and approximate memory usage. For example,

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    10 non-null     object
 1   Age     10 non-null     int64 
 2   City    10 non-null     object
dtypes: int64(1), object(2)
memory usage: 368.0+ bytes
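info() prints its summary and returns None, so the individual pieces are usually read from attributes when they are needed in code. A minimal sketch, assuming a three-row stand-in DataFrame and illustrative variable names:

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

n_rows, n_cols = df.shape      # (rows, columns) tuple
column_types = df.dtypes       # Series mapping column name -> dtype
non_null_counts = df.count()   # non-null values per column

# info() writes to stdout by default; pass a buffer to capture the text
buffer = io.StringIO()
df.info(buf=buffer)
summary_text = buffer.getvalue()
```

Capturing the text this way can be useful for logging, while shape, dtypes, and count() are more convenient for checks inside a script.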


In [10]:
df.describe()  # Summary statistics for the numeric columns only

Unnamed: 0,Age
count,10.0
mean,31.0
std,4.320494
min,25.0
25%,28.25
50%,30.5
75%,32.75
max,40.0
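By default describe() skips text columns when numeric ones are present, which is why only Age appears above. The sketch below, on a small made-up DataFrame, shows two ways to summarize the text columns as well.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Alice', 'Bob', 'Emma'],
    'Age': [25, 30, 35, 28],
    'City': ['Paris', 'Paris', 'London', 'Sydney'],
})

# Default: numeric columns only (count, mean, std, min, quartiles, max)
numeric_stats = df.describe()

# Text columns on their own: count, unique, top (most frequent), freq
text_stats = df[['Name', 'City']].describe()

# Every column at once; statistics that do not apply are filled with NaN
all_stats = df.describe(include='all')
```

Selecting the text columns explicitly keeps the result compact, whereas include='all' produces one wider table with many NaN cells.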


# Pandas DataFrame Manipulation

DataFrame manipulation in Pandas means modifying an existing DataFrame. Some common manipulation operations are:

* Adding rows/columns
* Removing rows/columns
* Renaming rows/columns


## Add a New Column to a Pandas DataFrame

We can add a new column to an existing Pandas DataFrame by assigning a list to a new column label. The list must contain exactly one value per row.

In [11]:
data = {
    'Name':['John', 'Emma', 'Michael', 'Sophia'],
    'Height':[5.5, 6.0, 5.8, 5.3],
    'Qualification':['BSc', 'BBA', 'MBA', 'BSc'],
}

In [12]:
df = pd.DataFrame(data)
df

Unnamed: 0,Name,Height,Qualification
0,John,5.5,BSc
1,Emma,6.0,BBA
2,Michael,5.8,MBA
3,Sophia,5.3,BSc


In [13]:
# declare a new list
address = ['New York', 'London', 'Sydney', 'Toronto']

# assign the list as a column
df['Address'] = address
df

Unnamed: 0,Name,Height,Qualification,Address
0,John,5.5,BSc,New York
1,Emma,6.0,BBA,London
2,Michael,5.8,MBA,Sydney
3,Sophia,5.3,BSc,Toronto
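Direct assignment is not the only option. As a hedged sketch on a trimmed copy of the data above, the following shows what happens when the list length is wrong, and two other DataFrame methods, assign() and insert(), that also add columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Michael', 'Sophia'],
    'Height': [5.5, 6.0, 5.8, 5.3],
})

# A list with the wrong number of items is rejected with a ValueError
try:
    df['Address'] = ['New York', 'London']  # 2 values for 4 rows
    length_error = None
except ValueError as exc:
    length_error = exc

# assign() returns a new DataFrame and leaves df untouched
with_address = df.assign(Address=['New York', 'London', 'Sydney', 'Toronto'])

# insert() modifies df in place and lets us choose the column position
df.insert(1, 'Qualification', ['BSc', 'BBA', 'MBA', 'BSc'])
```

assign() suits method chaining because it never mutates the original, while insert() is the one to reach for when column order matters.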


## Add a New Row to a Pandas DataFrame

Adding a row to a DataFrame is less direct than adding a column. One common approach is to assign the row's values to a new index label through the .loc indexer.
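A minimal sketch of the .loc approach, assuming a two-row stand-in DataFrame with the default integer index (the data here is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma'],
    'Height': [5.5, 6.0],
})

# Assigning to a label that does not exist yet appends a new row.
# len(df) is the next free label only because df uses the default 0..n-1 index.
df.loc[len(df)] = ['Michael', 5.8]
```

If the DataFrame has a custom index, or its labels are no longer consecutive after rows were dropped, len(df) may match an existing label and overwrite that row instead of adding one.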