# Python Pandas Introduction
Pandas is defined as an open-source library that provides high-performance data manipulation in Python. 

Data analysis involves a lot of processing, such as restructuring, cleaning, and merging. Several tools are available for fast data processing, such as NumPy, SciPy, Cython, and Pandas. Pandas is often preferred because it is fast, simple, and more expressive for tabular data than the alternatives. It supports the five typical steps of data processing and analysis, regardless of where the data comes from: load, manipulate, prepare, model, and analyze.


Pandas is built on top of the NumPy package, which means NumPy must be installed for Pandas to work.
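As a small illustration of that dependency, the sketch below (with made-up values) pulls the underlying NumPy array out of a Series and passes the Series to a NumPy function. It is only meant to show how the two libraries fit together.

```python
import numpy as np
import pandas as pd

# A Series of integers keeps its values in a NumPy array.
s = pd.Series([10, 20, 30])
values = s.to_numpy()

print(type(values))   # a NumPy ndarray
print(np.sqrt(s))     # NumPy functions accept a Series and return a Series
```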


# Python Pandas Data Structure
Pandas provides two data structures for processing data, Series and DataFrame, which are discussed below:

# 1 Python Pandas Series
A Series is a one-dimensional labeled array that can store values of various data types. The row labels of a Series are called the index. A list, tuple, or dictionary can easily be converted into a Series using the pd.Series() constructor. A Series cannot contain multiple columns. Besides the data itself, the constructor accepts optional arguments such as index and dtype:

In [None]:
#pip install pandas

In [4]:
import pandas as pd

pd.Series(dtype="object")  # an empty Series; a Series is a single column of a DataFrame

Series([], dtype: object)

# Create a Series from a NumPy array or list

In [1]:
import pandas as pd  
import numpy as np  
info = np.array(['P','a','n','d','a','s'])  



a = pd.Series(info)  
print(a)  

0    P
1    a
2    n
3    d
4    a
5    s
dtype: object


In [3]:
# creating series from a list with the following index  ["one", "two", "three", "four"]

list1 =  ['P','a','n', 'h']


series=   pd.Series( list1 , index = ["one", "two", "three", "four"])
print(series)

one      P
two      a
three    n
four     h
dtype: object


In [None]:
# Create a Series from a dict: when you create a series from a dictionary, the keys become the index of the series
import pandas as pd  
import numpy as np  

info = {'x' : 0, 'y' : 1, 'z' : 2}  


a = pd.Series( info, dtype="float")  
print (a)  
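The same constructor also accepts a tuple. The short sketch below, using illustrative values, compares the index produced for a tuple with the index produced for a dictionary.

```python
import pandas as pd

from_tuple = pd.Series(("P", "a", "n"))           # gets a default integer index
from_dict = pd.Series({"x": 0, "y": 1, "z": 2})   # dictionary keys become the index

print(list(from_tuple.index))
print(list(from_dict.index))
```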

In [24]:
# create a Series with custom index labels and an explicit dtype
x = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype="int32")
print(x)


a    1
b    2
c    3
dtype: int32


In [6]:
# slice the series above from the second row to the last row
x[1:]

b    2
c    3
dtype: int64
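To retrieve a single element rather than a slice, one option is to be explicit about whether a position or a label is meant. The sketch below rebuilds a small Series like the one above and uses .iloc for positions and .loc for labels. The variable names are only for illustration.

```python
import pandas as pd

x = pd.Series([1, 2, 3], index=["a", "b", "c"])

first_by_position = x.iloc[0]   # first element, by integer position
first_by_label = x.loc["a"]     # same element, by index label
tail = x.iloc[1:]               # second row to the last row, by position

print(first_by_position, first_by_label)
print(tail)
```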

In [7]:
# to get access to the index of series
x.index

Index(['a', 'b', 'c'], dtype='object')

In [8]:
# changing the index labels of the series
x.index = ["one", "two", "three"]

In [9]:
print(x)

one      1
two      2
three    3
dtype: int64


In [10]:
# to get the values of the series
x.values

array([1, 2, 3], dtype=int64)

In [8]:
# to get dimension of a series
x.shape

(3,)

# Pandas Series.map()
map() replaces each value of a Series with a corresponding value taken from a dictionary, another Series, or a function.

In [13]:
grade = pd.Series(["A", "A", "C",  "E", "E", "B", "D", "D", "C", "F"]) 
print(grade)

0    A
1    A
2    C
3    E
4    E
5    B
6    D
7    D
8    C
9    F
dtype: object


In [15]:
# convert the series of grades above to points using the relationship below
grade_to_point = {"A":5, "B":4, "C":3, "D":2,  "E": 1, "F":0 }

In [16]:
grade.map(grade_to_point)

0    5
1    5
2    3
3    1
4    1
5    4
6    2
7    2
8    3
9    0
dtype: int64

In [26]:
x=pd.Series(["male", "male", "female", "female", "male", "female", "female", "transgender"])
print(x)

# change the values of the series above using map: all female should be 0, all male should be 1
# and transgender should be 2

0           male
1           male
2         female
3         female
4           male
5         female
6         female
7    transgender
dtype: object


In [22]:
gender_to_number = {"male": 1, "female":0, "transgender":2}
f =   x.map(gender_to_number)

In [23]:
print(f)

0    1
1    1
2    0
3    0
4    1
5    0
6    0
7    2
dtype: int64
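Two details of map() are easy to miss with the dictionary-based mappings above: a value with no matching key becomes NaN, and a function can be passed instead of a dictionary. The sketch below uses a made-up grade "Z" to show both.

```python
import pandas as pd

grade = pd.Series(["A", "C", "Z"])

# "Z" has no entry in the dictionary, so its mapped value is NaN
points = grade.map({"A": 5, "C": 3})

# a function is applied to every value in turn
lower = grade.map(str.lower)

print(points)
print(lower)
```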


# Pandas Series.value_counts()
value_counts() counts how many times each unique value occurs in a Series.

In [28]:
x.value_counts()

female         4
male           3
transgender    1
dtype: int64
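When proportions are more useful than raw counts, value_counts() also accepts normalize=True. The sketch below recreates the gender Series and computes the share of each value.

```python
import pandas as pd

x = pd.Series(["male", "male", "female", "female", "male", "female", "female", "transgender"])

# normalize=True returns the fraction of rows for each unique value
shares = x.value_counts(normalize=True)
print(shares)
```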

# Converting Series to Dataframe

In [31]:
# to_frame
x.to_frame(name = "gender")

Unnamed: 0,gender
0,male
1,male
2,female
3,female
4,male
5,female
6,female
7,transgender


# 2 Python Pandas DataFrame
A DataFrame is a widely used Pandas data structure: a two-dimensional array with labeled axes (rows and columns). It is the standard way to store tabular data and has two indexes, a row index and a column index. Its columns can hold heterogeneous types such as int, bool, and so on.

It can be seen as a dictionary of Series in which both the rows and the columns are indexed. The column labels are available as "columns" and the row labels as "index".
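The "dictionary of Series" view can be made concrete with a minimal sketch. The row labels r1 and r2 and the column names are assumptions chosen for illustration.

```python
import pandas as pd

location = pd.Series(["island", "ikeja"], index=["r1", "r2"])
fare = pd.Series([1000, 150], index=["r1", "r2"])

# each dictionary key becomes a column; the shared Series index becomes the row index
df = pd.DataFrame({"location": location, "fare": fare})

print(df.index.tolist())     # row labels
print(df.columns.tolist())   # column labels
print(type(df["fare"]))      # selecting one column gives back a Series
```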


In [32]:
import pandas as pd  
# a list of strings  
x = ['Python', 'Pandas']  
  
# Calling DataFrame constructor on list  
df = pd.DataFrame(x)  
display(df)  


Unnamed: 0,0
0,Python
1,Pandas


In [35]:
# create a dataframe from the list below; the column name should be "information" and the index
# should be as follows: "A", "B", "C", "D", "E", "F"

list1 = [1,2,3,"Four", "Five", True, ]


df  = pd.DataFrame(list1, columns=["information"], index = list("ABCDEF"))
display(df)

Unnamed: 0,information
A,1
B,2
C,3
D,Four
E,Five
F,True


In [36]:
# create a dataframe from the two lists below
l1 = ["island", "ikeja", "berger"] # location
l2 = [1000, 150, 50]  # T fare

dic = {"location":l1, "T fare": l2}

df1 = pd.DataFrame(dic)
display(df1)

Unnamed: 0,location,T fare
0,island,1000
1,ikeja,150
2,berger,50


In [None]:
import numpy as np

In [39]:
dict_ = {
    
 "Date": ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-01', '2023-01-07', 
             '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12', '2023-01-13', '2023-01-14', 
             '2023-01-15', '2023-01-16', '2023-01-01'],
    
 "Name":["Jamiu", "Olaitan", np.nan, "Pelumi", "Kemi", "Jamiu", "Sanhco", "Richard", "James","Sanhco",  "Lukmon", np.nan, 
         "Idris","Rashford", np.nan, "Olaitan", "Jamiu"],
    
 "Age": [25, 12, 50, 16, 19, 25, 20,30,np.nan, 34,56,78,34, np.nan,12, 34, 25],
    
 "State":["Lagos", "Kano", "Niger", np.nan, "Oyo","Lagos", "Port harcourt", "Delta", np.nan,"Kwara", "Adamawa", "Benue", 
          "Edo", "Jos", "Kano", "Delta", "Lagos"],
    
 "Job":  ["Engineer", "Banker", "Lawyer", np.nan,"Doctor", "Engineer",  "Driver", "CEO", "Farmer", "Nurse", "Artist",
          "Footballer", "Nurse", np.nan, np.nan, "Banker", "Engineer"],
    
  "Salary":[39485.43,20000.3, 7363.00, 67362.00, 7362.73, 39485.43,np.nan, 83663.28, 98382.92, np.nan, np.nan,9448.34, 
            np.nan,8499.00, 20000.3, 3849, 39485.43],
    
        "Review": np.repeat("good", 17)
      }


In [69]:
# read the dictionary above into dataframe using the following as index "abcdefghijklmnopq"
df2 = pd.DataFrame(dict_, index= list("abcdefghijklmnopq"))
df2

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
a,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
b,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
c,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
d,2023-01-04,Pelumi,16.0,,,67362.0,good
e,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
f,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
g,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
h,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
i,2023-01-09,James,,,Farmer,98382.92,good
j,2023-01-10,Sanhco,34.0,Kwara,Nurse,,good


In [66]:
# export the dataframe above in excel format
df2.to_excel( "afternoon.xlsx" , index =False )

In [67]:
# export the dataframe above in csv format
df2.to_csv("class.csv", index=False)

In [68]:
# read the excel and csv files you exported back into dataframes

path = "C:/Users/pc/Documents/powerbi/"

df_display = pd.read_excel( path + "afternoon.xlsx")
df_display

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
0,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
1,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
2,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
3,2023-01-04,Pelumi,16.0,,,67362.0,good
4,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
5,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
6,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
7,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
8,2023-01-09,James,,,Farmer,98382.92,good
9,2023-01-10,Sanhco,34.0,Kwara,Nurse,,good


In [53]:
df_display = pd.read_csv( path + "class.csv")
df_display

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
0,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
1,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
2,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
3,2023-01-04,Pelumi,16.0,,,67362.0,good
4,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
5,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
6,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
7,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
8,2023-01-09,James,,,Farmer,98382.92,good
9,2023-01-10,Sanhco,34.0,Kwara,Nurse,,good
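Because the files were exported with index=False, the letter index is not written out and the data comes back with a default integer index. The sketch below reproduces this with a tiny made-up dataframe and an in-memory buffer in place of a file on disk, so that it runs without any particular folder.

```python
import io
import pandas as pd

df = pd.DataFrame({"Name": ["Jamiu", "Kemi"], "Age": [25, 19]}, index=["a", "b"])

buffer = io.StringIO()            # stands in for a CSV file on disk
df.to_csv(buffer, index=False)    # the "a", "b" labels are not written
buffer.seek(0)

restored = pd.read_csv(buffer)
print(restored.index.tolist())    # default integer index
```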


# Viewing/Inspecting Data

In [54]:
# print the first seven rows of the dataframe
df_display.head(7)

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
0,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
1,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
2,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
3,2023-01-04,Pelumi,16.0,,,67362.0,good
4,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
5,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
6,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good


In [55]:
# print the last seven rows of the dataframe
df_display.tail(7)

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
10,2023-01-11,Lukmon,56.0,Adamawa,Artist,,good
11,2023-01-12,,78.0,Benue,Footballer,9448.34,good
12,2023-01-13,Idris,34.0,Edo,Nurse,,good
13,2023-01-14,Rashford,,Jos,,8499.0,good
14,2023-01-15,,12.0,Kano,,20000.3,good
15,2023-01-16,Olaitan,34.0,Delta,Banker,3849.0,good
16,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good


In [57]:
# print seven randomly chosen rows of the dataframe
df_display.sample(7)

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
2,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
6,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
3,2023-01-04,Pelumi,16.0,,,67362.0,good
12,2023-01-13,Idris,34.0,Edo,Nurse,,good
13,2023-01-14,Rashford,,Jos,,8499.0,good
4,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
16,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good


In [58]:
# print the summary of the dataframe
df_display.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17 entries, 0 to 16
Data columns (total 7 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Date    17 non-null     object 
 1   Name    14 non-null     object 
 2   Age     15 non-null     float64
 3   State   15 non-null     object 
 4   Job     14 non-null     object 
 5   Salary  13 non-null     float64
 6   Review  17 non-null     object 
dtypes: float64(2), object(5)
memory usage: 1.1+ KB


In [64]:
df_display

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
0,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
1,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
2,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
3,2023-01-04,Pelumi,16.0,,,67362.0,good
4,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
5,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
6,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
7,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
8,2023-01-09,James,,,Farmer,98382.92,good
9,2023-01-10,Sanhco,34.0,Kwara,Nurse,,good


In [60]:
# print the dimensions (rows, columns) of the dataframe
df_display.shape

(17, 7)

In [61]:
# print all the columns of the dataframe
df_display.columns

Index(['Date', 'Name', 'Age', 'State', 'Job', 'Salary', 'Review'], dtype='object')

In [63]:
# gives the statistical summary of the dataframe
df_display.describe()

Unnamed: 0,Age,Salary
count,15.0,13.0
mean,31.333333,34183.627692
std,18.010579,31341.77354
min,12.0,3849.0
25%,19.5,8499.0
50%,25.0,20000.3
75%,34.0,39485.43
max,78.0,98382.92


# Row and Column Selection in Dataframe
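Label-based selection follows the pattern df2.loc[start_row:end_row, start_column:end_column], where both end labels are included. Position-based selection uses df2.iloc with integer positions, where the end position is excluded.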
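The difference between the two slicing rules is easiest to see side by side. The sketch below uses a small stand-in dataframe (not df2 itself) and selects the same two rows once by label and once by position.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Kemi", "Jamiu", "Sanhco"], "Age": [19, 25, 20]},
    index=["e", "f", "g"],
)

by_label = df.loc["e":"f", "Name"]   # label slice: "f" is included
by_position = df.iloc[0:2, 0]        # position slice: position 2 is excluded

print(by_label)
print(by_position)
```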

In [71]:
# Using the label name, select the second column of the dataframe and all the rows
df2.loc[ :,  "Name"]

a       Jamiu
b     Olaitan
c         NaN
d      Pelumi
e        Kemi
f       Jamiu
g      Sanhco
h     Richard
i       James
j      Sanhco
k      Lukmon
l         NaN
m       Idris
n    Rashford
o         NaN
p     Olaitan
q       Jamiu
Name: Name, dtype: object

In [73]:
# Using the label name, select the second column of the dataframe and rows 5 to 10

df2.loc["e":"j", "Name"]

e       Kemi
f      Jamiu
g     Sanhco
h    Richard
i      James
j     Sanhco
Name: Name, dtype: object

In [72]:
df2

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
a,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
b,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
c,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
d,2023-01-04,Pelumi,16.0,,,67362.0,good
e,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
f,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
g,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
h,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
i,2023-01-09,James,,,Farmer,98382.92,good
j,2023-01-10,Sanhco,34.0,Kwara,Nurse,,good


In [74]:
# Using the label name, select the second, fourth and sixth columns of the dataframe and all
# the rows
df2.loc[:,    ["Name", "State", "Salary"]      ]

Unnamed: 0,Name,State,Salary
a,Jamiu,Lagos,39485.43
b,Olaitan,Kano,20000.3
c,,Niger,7363.0
d,Pelumi,,67362.0
e,Kemi,Oyo,7362.73
f,Jamiu,Lagos,39485.43
g,Sanhco,Port harcourt,
h,Richard,Delta,83663.28
i,James,,98382.92
j,Sanhco,Kwara,


In [76]:
# Using the label name, select the second, fourth and sixth columns of the dataframe and rows
# 7, 10 and 14

df2.loc[ ["g", "j", "n"],  ["Name", "State", "Salary"]   ]

Unnamed: 0,Name,State,Salary
g,Sanhco,Port harcourt,
j,Sanhco,Kwara,
n,Rashford,Jos,8499.0


In [75]:
df2

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
a,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
b,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
c,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
d,2023-01-04,Pelumi,16.0,,,67362.0,good
e,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
f,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
g,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
h,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
i,2023-01-09,James,,,Farmer,98382.92,good
j,2023-01-10,Sanhco,34.0,Kwara,Nurse,,good


In [77]:
# Using the index number, select the second column of the dataframe and all the rows
df2.iloc[:, 1]

a       Jamiu
b     Olaitan
c         NaN
d      Pelumi
e        Kemi
f       Jamiu
g      Sanhco
h     Richard
i       James
j      Sanhco
k      Lukmon
l         NaN
m       Idris
n    Rashford
o         NaN
p     Olaitan
q       Jamiu
Name: Name, dtype: object

In [78]:
# Using the index number, select the second column of the dataframe and rows 5 to 10
# (positions 4 to 9; the end position of an iloc slice is excluded)

df2.iloc[4:10, 1]

e       Kemi
f      Jamiu
g     Sanhco
h    Richard
i      James
j     Sanhco
Name: Name, dtype: object

In [81]:
# Using the index number, select the second, fourth and sixth columns of the dataframe and all
# the rows

df2.iloc[:, [1, 3, 5]]

Unnamed: 0,Name,State,Salary
a,Jamiu,Lagos,39485.43
b,Olaitan,Kano,20000.3
c,,Niger,7363.0
d,Pelumi,,67362.0
e,Kemi,Oyo,7362.73
f,Jamiu,Lagos,39485.43
g,Sanhco,Port harcourt,
h,Richard,Delta,83663.28
i,James,,98382.92
j,Sanhco,Kwara,


In [None]:
# Using the index number, select the second, fourth and sixth columns of the dataframe and rows from
# 7 to 14 (positions 6 to 13)
df2.iloc[6:14, [1, 3, 5]]

# Modifying Data on the dataframe 

In [None]:
# Mrs Pelumi asked to update her State and Job as follows: Abuja and Data Science. Perform the task.

In [84]:
df2.loc["d", ["State", "Job"]] 

State    NaN
Job      NaN
Name: d, dtype: object

In [85]:
df2.loc["d", ["State", "Job"]] = ["Abuja", "Data Science"]

In [86]:
df2

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
a,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
b,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
c,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
d,2023-01-04,Pelumi,16.0,Abuja,Data Science,67362.0,good
e,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
f,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
g,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
h,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
i,2023-01-09,James,,,Farmer,98382.92,good
j,2023-01-10,Sanhco,34.0,Kwara,Nurse,,good
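The update above relies on knowing that Pelumi's row label is "d". When the label is not known in advance, a boolean mask on the Name column is one alternative. The sketch below uses a two-row stand-in dataframe. The name mask is only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Pelumi", "Kemi"], "State": [np.nan, "Oyo"], "Job": [np.nan, "Doctor"]},
    index=["d", "e"],
)

# True for every row whose Name is "Pelumi"
mask = df["Name"] == "Pelumi"

# update both columns for the matching rows only
df.loc[mask, ["State", "Job"]] = ["Abuja", "Data Science"]
print(df)
```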


In [88]:
player_dicts={"Play":['Kai Harvert', 'Gabriel Jesus',  'Gabriel Jesus', "Declan Rice", "Thomas Patrey", 
         "Bukayo Saka", "Bukayo Saka"], 
              "Agekkkkkk":[24,26, 26,24,30,21, 21],
                }

arsenal = pd.DataFrame(player_dicts)
arsenal

Unnamed: 0,Play,Agekkkkkk
0,Kai Harvert,24
1,Gabriel Jesus,26
2,Gabriel Jesus,26
3,Declan Rice,24
4,Thomas Patrey,30
5,Bukayo Saka,21
6,Bukayo Saka,21


In [None]:
# the list below holds the salaries of the arsenal players in the above dataframe
# [280000, 265000, 265000, 240000, 20000,  195000, 195000]; add it to the dataframe as a column

In [89]:
arsenal["Salary"]=  [280000, 265000, 265000, 240000, 20000,  195000, 195000]

In [90]:
arsenal

Unnamed: 0,Play,Agekkkkkk,Salary
0,Kai Harvert,24,280000
1,Gabriel Jesus,26,265000
2,Gabriel Jesus,26,265000
3,Declan Rice,24,240000
4,Thomas Patrey,30,20000
5,Bukayo Saka,21,195000
6,Bukayo Saka,21,195000


In [93]:
# the column names of the dataframe above are not descriptive; rename them appropriately
arsenal.rename(columns={"Agekkkkkk": "Age"}, inplace=True)

In [100]:
arsenal

Unnamed: 0,Play,Salary
0,Kai Harvert,280000
one,Gabriel Jesus,265000
2,Gabriel Jesus,265000
three,Declan Rice,240000
4,Thomas Patrey,20000
5,Bukayo Saka,195000
6,Bukayo Saka,195000


In [96]:
# rename the index labels of the second and fourth rows to "one" and "three"
arsenal.rename(index = {1:"one", 3:"three"}, inplace=True)

In [104]:
# remove the age column from the dataframe above
arsenal.drop(columns= ["Age"], inplace=True)

In [107]:
# remove the second and  the fourth rows from the dataframe above
arsenal.drop(index=["one", "three"], inplace=True)
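The drop calls above use inplace=True, which changes arsenal directly. A common alternative is to leave the original untouched and keep the result under a new name, as in the sketch below. The name trimmed is only illustrative.

```python
import pandas as pd

arsenal = pd.DataFrame({"Play": ["Kai Harvert", "Declan Rice"], "Age": [24, 24]})

# without inplace=True, drop() returns a new dataframe
trimmed = arsenal.drop(columns=["Age"])

print(trimmed.columns.tolist())   # the Age column is gone here
print(arsenal.columns.tolist())   # the original still has it
```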

In [109]:
# check for duplicate rows in df_display and remove them

df_display.duplicated().sum()

2

In [111]:
df_display[df_display.duplicated()]

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
5,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
16,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good


In [112]:
# drop_duplicates() returns a new dataframe without the repeated rows; df_display itself is unchanged
df_display.drop_duplicates()

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
0,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
1,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
2,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
3,2023-01-04,Pelumi,16.0,,,67362.0,good
4,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
6,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
7,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
8,2023-01-09,James,,,Farmer,98382.92,good
9,2023-01-10,Sanhco,34.0,Kwara,Nurse,,good
10,2023-01-11,Lukmon,56.0,Adamawa,Artist,,good
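By default a row counts as a duplicate only when every column matches. The subset and keep arguments narrow or adjust that rule. The sketch below, on made-up data, treats rows with the same Name as duplicates and keeps the first one.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jamiu", "Jamiu", "Kemi"], "Age": [25, 30, 19]})

# compare rows on the Name column only and keep the first occurrence
deduped = df.drop_duplicates(subset=["Name"], keep="first").reset_index(drop=True)
print(deduped)
```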


In [113]:
# how to check and treat missing values
df_display.isna().sum()

Date      0
Name      3
Age       2
State     2
Job       3
Salary    4
Review    0
dtype: int64

In [None]:
# options for treating missing values:
# drop the column or row
# forward fill
# backward fill
# fill with a constant (mean, median, mode)
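Before applying these options to the full dataframe, it can help to see each one on a tiny Series. The sketch below uses made-up numbers and shows what every strategy does to the same two gaps.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan])

dropped = s.dropna()            # removes the missing entries
forward = s.ffill()             # copies the previous valid value forward
backward = s.bfill()            # copies the next valid value backward; the last gap has none
constant = s.fillna(s.mean())   # fills every gap with the mean of the valid values

print(forward.tolist())
print(backward.tolist())
print(constant.tolist())
```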

In [117]:
# dropping the rows with missing values
df_display.dropna(axis=0)

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
0,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
1,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
4,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
5,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
7,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
15,2023-01-16,Olaitan,34.0,Delta,Banker,3849.0,good
16,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good


In [118]:
# dropping the columns with missing values
df_display.dropna(axis=1)

Unnamed: 0,Date,Review
0,2023-01-01,good
1,2023-01-02,good
2,2023-01-03,good
3,2023-01-04,good
4,2023-01-05,good
5,2023-01-01,good
6,2023-01-07,good
7,2023-01-08,good
8,2023-01-09,good
9,2023-01-10,good


In [121]:
# forward fill
df_display.ffill()

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
0,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
1,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
2,2023-01-03,Olaitan,50.0,Niger,Lawyer,7363.0,good
3,2023-01-04,Pelumi,16.0,Niger,Lawyer,67362.0,good
4,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
5,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
6,2023-01-07,Sanhco,20.0,Port harcourt,Driver,39485.43,good
7,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
8,2023-01-09,James,30.0,Delta,Farmer,98382.92,good
9,2023-01-10,Sanhco,34.0,Kwara,Nurse,98382.92,good


In [122]:
# backward fill
df_display.bfill()

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
0,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
1,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
2,2023-01-03,Pelumi,50.0,Niger,Lawyer,7363.0,good
3,2023-01-04,Pelumi,16.0,Oyo,Doctor,67362.0,good
4,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
5,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
6,2023-01-07,Sanhco,20.0,Port harcourt,Driver,83663.28,good
7,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
8,2023-01-09,James,34.0,Kwara,Farmer,98382.92,good
9,2023-01-10,Sanhco,34.0,Kwara,Nurse,9448.34,good


In [132]:
# fill the missing names with the most frequent name (the mode)
most_frequent = df_display["Name"].mode()[0]
df_display["Name"].fillna(most_frequent)

0        Jamiu
1      Olaitan
2        Jamiu
3       Pelumi
4         Kemi
5        Jamiu
6       Sanhco
7      Richard
8        James
9       Sanhco
10      Lukmon
11       Jamiu
12       Idris
13    Rashford
14       Jamiu
15     Olaitan
16       Jamiu
Name: Name, dtype: object

In [134]:
# mean age, rounded to the nearest whole number
round(df_display["Age"].mean(), 0)


31.0

In [137]:
med= df_display["Age"].median()
med

25.0

In [None]:
# fill the missing ages with the median age
df_display["Age"].fillna(med)
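Note that fillna() returns a new Series, so the dataframe only changes if the result is assigned back to the column. The sketch below does this on a small made-up dataframe, using the median for the numeric column and the mode for the text column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Name": ["Jamiu", np.nan, "Jamiu"], "Age": [25.0, np.nan, 12.0]})

# assign the filled columns back so the dataframe itself is updated
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Name"] = df["Name"].fillna(df["Name"].mode()[0])

print(df)
print(df.isna().sum())
```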