# Python Pandas Introduction
Pandas is an open-source Python library that provides high-performance data manipulation and analysis.

Data analysis involves a lot of processing, such as restructuring, cleaning, and merging data. Several tools are available for fast data processing, including NumPy, SciPy, Cython, and Pandas. We prefer Pandas here because it is fast, simple, and often more expressive than the alternatives when working with tabular data. Regardless of where the data comes from, Pandas supports the five main steps of data processing and analysis: load, manipulate, prepare, model, and analyze.


Pandas is built on top of the NumPy package, which means NumPy must be installed for Pandas to work.
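
The relationship with NumPy can be seen directly. The snippet below is a minimal sketch that assumes a plain numeric Series; the variable names `s` and `values` are only for illustration.

```python
import numpy as np
import pandas as pd

# A plain numeric Series can hand its values over as a NumPy array.
s = pd.Series([1.5, 2.5, 3.5])
values = s.to_numpy()

print(type(values))  # the values come back as a NumPy ndarray
print(np.sqrt(s))    # NumPy functions accept a Series and return a Series
```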


# Python Pandas Data Structure
Pandas provides two main data structures for working with data, Series and DataFrame, which are discussed below:

# 1 Python Pandas Series
A Series is a one-dimensional labeled array that can store any data type. The row labels of a Series are called the index. A list, tuple, or dictionary can easily be converted into a Series with the pd.Series() constructor. A Series cannot contain multiple columns. Its main parameters are the data itself, an optional index, and an optional dtype, as the examples below show.

In [None]:
#pip install pandas

In [4]:
import pandas as pd

pd.Series(dtype="object")  # an empty Series; a Series is like a single column of a DataFrame

Series([], dtype: object)

# Create a Series from an array or list object

In [1]:
import pandas as pd  
import numpy as np  
info = np.array(['P','a','n','d','a','s'])  



a = pd.Series(info)  
print(a)  

0    P
1    a
2    n
3    d
4    a
5    s
dtype: object


In [3]:
# creating series from a list with the following index  ["one", "two", "three", "four"]

list1 = ['P', 'a', 'n', 'h']

series = pd.Series(list1, index=["one", "two", "three", "four"])
print(series)

one      P
two      a
three    n
four     h
dtype: object


In [4]:
# Create a Series from a dict: when you create a Series from a dictionary, the keys become the index of the Series
import pandas as pd  
import numpy as np  

info = {'x' : 0, 'y' : 1, 'z' : 2}  


a = pd.Series( info, dtype="float")  
print (a)  

x    0.0
y    1.0
z    2.0
dtype: float64


In [24]:
x = pd.Series([1, 2, 3], index=['a', 'b', 'c'], dtype="int32")
# print the whole series
print(x)


a    1
b    2
c    3
dtype: int32


In [6]:
# slice the series above from the second row to the last row
x[1:]

b    2
c    3
dtype: int32

In [7]:
# to get access to the index of series
x.index

Index(['a', 'b', 'c'], dtype='object')

In [8]:
# change the index labels of the series
x.index = ["one", "two", "three"]

In [9]:
print(x)

one      1
two      2
three    3
dtype: int32


In [10]:
# to get the values of the series
x.values

array([1, 2, 3], dtype=int32)

In [8]:
# to get the shape (dimensions) of the series
x.shape

(3,)

# Pandas Series.map()
Series.map() replaces each value in a Series with a corresponding value taken from a dictionary, another Series, or the result of a function.

In [13]:
grade = pd.Series(["A", "A", "C",  "E", "E", "B", "D", "D", "C", "F"]) 
print(grade)

0    A
1    A
2    C
3    E
4    E
5    B
6    D
7    D
8    C
9    F
dtype: object


In [15]:
# convert the series of grades above to points using the relationship below
grade_to_point = {"A":5, "B":4, "C":3, "D":2,  "E": 1, "F":0 }

In [16]:
grade.map(grade_to_point)

0    5
1    5
2    3
3    1
4    1
5    4
6    2
7    2
8    3
9    0
dtype: int64

In [26]:
x=pd.Series(["male", "male", "female", "female", "male", "female", "female", "transgender"])
print(x)

# change the values of the series above using map: every female should become 0, every male 1,
# and transgender 2

0           male
1           male
2         female
3         female
4           male
5         female
6         female
7    transgender
dtype: object


In [22]:
gender_to_number = {"male": 1, "female":0, "transgender":2}
f = x.map(gender_to_number)

In [23]:
print(f)

0    1
1    1
2    0
3    0
4    1
5    0
6    0
7    2
dtype: int64
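
The two examples above pass a dictionary to map(). As a small sketch of two other behaviours worth knowing, the snippet below shows what happens to a value that has no entry in the dictionary, and how a function can be used in place of a dictionary. The grade "Z" is made up purely for illustration.

```python
import pandas as pd

grade = pd.Series(["A", "C", "F", "Z"])  # "Z" is a made-up grade with no mapping

# Mapping with a dictionary: values missing from the dictionary become NaN.
points = grade.map({"A": 5, "C": 3, "F": 0})
print(points)

# Mapping with a function: the function is applied to every element.
lowered = grade.map(str.lower)
print(lowered)
```

Because of the NaN, the mapped result is stored as floats rather than integers, so it is worth checking for missing values after a dictionary-based map().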


# Pandas Series.value_counts()
value_counts() counts how many times each unique value occurs in a Series.

In [28]:
x.value_counts()

female         4
male           3
transgender    1
dtype: int64
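
If proportions are more useful than raw counts, value_counts() also accepts normalize=True. The following is a short sketch that rebuilds the gender series under the assumed name `gender` so that it runs on its own.

```python
import pandas as pd

gender = pd.Series(["male", "male", "female", "female",
                    "male", "female", "female", "transgender"])

counts = gender.value_counts()                # absolute counts, most frequent first
shares = gender.value_counts(normalize=True)  # proportions that sum to 1
print(counts)
print(shares)
```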

# Converting Series to Dataframe

In [31]:
# convert the Series to a DataFrame with to_frame
x.to_frame(name = "gender")

Unnamed: 0,gender
0,male
1,male
2,female
3,female
4,male
5,female
6,female
7,transgender


# 2 Python Pandas DataFrame
A DataFrame is a widely used Pandas data structure: a two-dimensional array with labeled axes (rows and columns). It is the standard way to store tabular data and has two indexes, a row index and a column index. It has the following properties:
The columns can hold heterogeneous types such as int, bool, and so on.

It can be seen as a dictionary of Series in which both the rows and the columns are indexed. The column labels are exposed as "columns" and the row labels as "index".
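
The "dictionary of Series" view can be made concrete with a tiny example. This is only a sketch; the frame `people` and its two Series are illustrative and not part of the exercises that follow.

```python
import pandas as pd

names = pd.Series(["Kemi", "James"], index=["a", "b"])
ages = pd.Series([19, 30], index=["a", "b"])

# Each dictionary key becomes a column; the shared Series index becomes the row index.
people = pd.DataFrame({"Name": names, "Age": ages})

print(people.index)         # row labels
print(people.columns)       # column labels
print(people.dtypes)        # each column keeps its own data type
print(type(people["Age"]))  # selecting one column gives back a Series
```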


In [32]:
import pandas as pd  
# a list of strings  
x = ['Python', 'Pandas']  
  
# Calling DataFrame constructor on list  
df = pd.DataFrame(x)  
display(df)  


Unnamed: 0,0
0,Python
1,Pandas


In [35]:
# create a DataFrame from the list below; the column name should be "information" and the index
# should be "A", "B", "C", "D", "E", "F"

list1 = [1, 2, 3, "Four", "Five", True]


df = pd.DataFrame(list1, columns=["information"], index=list("ABCDEF"))
display(df)

Unnamed: 0,information
A,1
B,2
C,3
D,Four
E,Five
F,True


In [36]:
# create a dataframe from the lists below
l1 = ["island", "ikeja", "berger"] # location
l2 = [1000, 150, 50]  # T fare

dic = {"location":l1, "T fare": l2}

df1 = pd.DataFrame(dic)
display(df1)

Unnamed: 0,location,T fare
0,island,1000
1,ikeja,150
2,berger,50


In [None]:
import numpy as np

In [39]:
dict_ = {
    "Date": ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05', '2023-01-01', '2023-01-07',
             '2023-01-08', '2023-01-09', '2023-01-10', '2023-01-11', '2023-01-12', '2023-01-13', '2023-01-14',
             '2023-01-15', '2023-01-16', '2023-01-01'],

    "Name": ["Jamiu", "Olaitan", np.nan, "Pelumi", "Kemi", "Jamiu", "Sanhco", "Richard", "James", "Sanhco", "Lukmon", np.nan,
             "Idris", "Rashford", np.nan, "Olaitan", "Jamiu"],

    "Age": [25, 12, 50, 16, 19, 25, 20, 30, np.nan, 34, 56, 78, 34, np.nan, 12, 34, 25],

    "State": ["Lagos", "Kano", "Niger", np.nan, "Oyo", "Lagos", "Port harcourt", "Delta", np.nan, "Kwara", "Adamawa", "Benue",
              "Edo", "Jos", "Kano", "Delta", "Lagos"],

    "Job": ["Engineer", "Banker", "Lawyer", np.nan, "Doctor", "Engineer", "Driver", "CEO", "Farmer", "Nurse", "Artist",
            "Footballer", "Nurse", np.nan, np.nan, "Banker", "Engineer"],

    "Salary": [39485.43, 20000.3, 7363.00, 67362.00, 7362.73, 39485.43, np.nan, 83663.28, 98382.92, np.nan, np.nan, 9448.34,
               np.nan, 8499.00, 20000.3, 3849, 39485.43],

    "Review": np.repeat("good", 17)
}


In [41]:
# read the dictionary above into dataframe using the following as index "abcdefghijklmnopq"
df2 = pd.DataFrame(dict_, index= list("abcdefghijklmnopq"))
df2

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
a,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
b,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
c,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
d,2023-01-04,Pelumi,16.0,,,67362.0,good
e,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
f,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
g,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
h,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
i,2023-01-09,James,,,Farmer,98382.92,good
j,2023-01-10,Sanhco,34.0,Kwara,Nurse,,good


In [43]:
# export the dataframe above in excel format
df2.to_excel( "afternoon.xlsx" , index =False )

In [44]:
# export the dataframe above in csv format
df2.to_csv("class.csv", index=False)

In [46]:
# read the Excel and CSV files you exported back into DataFrames
df_csv = pd.read_csv("class.csv")
df_display = pd.read_excel("afternoon.xlsx")
df_display

Unnamed: 0,Date,Name,Age,State,Job,Salary,Review
0,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
1,2023-01-02,Olaitan,12.0,Kano,Banker,20000.3,good
2,2023-01-03,,50.0,Niger,Lawyer,7363.0,good
3,2023-01-04,Pelumi,16.0,,,67362.0,good
4,2023-01-05,Kemi,19.0,Oyo,Doctor,7362.73,good
5,2023-01-01,Jamiu,25.0,Lagos,Engineer,39485.43,good
6,2023-01-07,Sanhco,20.0,Port harcourt,Driver,,good
7,2023-01-08,Richard,30.0,Delta,CEO,83663.28,good
8,2023-01-09,James,,,Farmer,98382.92,good
9,2023-01-10,Sanhco,34.0,Kwara,Nurse,,good


# Viewing/Inspecting Data

In [None]:
# print the first seven rows of the dataframe


In [None]:
# print the last seven rows of the dataframe


In [None]:
# print seven randomly chosen rows of the dataframe


In [None]:
# print the summary of the dataframe


In [None]:
# print the dimensions of the dataframe


In [None]:
# print all the columns of the dataframe


In [None]:
# print the statistical summary of the dataframe
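
One possible way to work through the inspection tasks above is sketched below. It uses a small stand-in DataFrame called `demo` (an assumption made so the snippet runs on its own); in the notebook the same calls would be made on `df2`.

```python
import numpy as np
import pandas as pd

# Stand-in data with 17 rows labelled a..q; not the notebook's df2.
demo = pd.DataFrame(
    {"Age": np.arange(20, 37), "Review": np.repeat("good", 17)},
    index=list("abcdefghijklmnopq"),
)

print(demo.head(7))     # first seven rows
print(demo.tail(7))     # last seven rows
print(demo.sample(7))   # seven rows chosen at random
demo.info()             # summary: index, columns, non-null counts, dtypes
print(demo.shape)       # dimensions as (rows, columns)
print(demo.columns)     # column labels
print(demo.describe())  # statistics for the numeric columns
```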


# Row and Column Selection in Dataframe

In [None]:
# Using the label name, select the second column of the dataframe and all the rows

In [None]:
# Using the label name, select the second column of the dataframe and rows 5 to 10

In [None]:
# Using the label name, select the second, fourth and sixth columns of the dataframe and all
# the rows

In [37]:
# Using the label name, select the second, fourth and sixth columns of the dataframe and rows
# 7 to 14

In [None]:
# Using the index number, select the second column of the dataframe and all the rows

In [None]:
# Using the index number, select the second column of the dataframe and rows 5 to 10

In [None]:
# Using the index number, select the second, fourth and sixth columns of the dataframe and all
# the rows

In [None]:
# Using the index number, select the second, fourth and sixth columns of the dataframe and rows
# 7 to 14
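
The selection tasks above come down to two accessors: .loc for labels and .iloc for positions. The sketch below shows the last task both ways on a stand-in frame called `demo`. It assumes that "rows 7 to 14" counts rows from 1 and includes both ends, which is an interpretation of the task rather than something the notebook states.

```python
import pandas as pd

cols = ["Date", "Name", "Age", "State", "Job", "Salary", "Review"]
# Stand-in frame: 17 rows labelled a..q, each cell holds "<column><row number>".
demo = pd.DataFrame(
    {c: [f"{c}{i}" for i in range(1, 18)] for c in cols},
    index=list("abcdefghijklmnopq"),
)

# Label-based selection with .loc (both ends of a label slice are included).
by_label = demo.loc["g":"n", ["Name", "State", "Salary"]]

# Position-based selection with .iloc (the end position is excluded).
by_position = demo.iloc[6:14, [1, 3, 5]]

print(by_label.equals(by_position))
```

Selecting all rows works the same way with a bare colon, for example `demo.loc[:, "Name"]` or `demo.iloc[:, 1]`.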

# Modifying Data in the DataFrame

In [None]:
# Mrs Pelumi has asked for her State and Job to be updated to Abuja and Data science, respectively. Perform the task
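
A common way to update specific cells is to combine a boolean mask with .loc. The snippet below is a sketch on a three-row stand-in frame called `people` (an assumption for illustration); the same pattern would apply to `df2`.

```python
import numpy as np
import pandas as pd

people = pd.DataFrame(
    {
        "Name": ["Olaitan", "Pelumi", "Kemi"],
        "State": ["Kano", np.nan, "Oyo"],
        "Job": ["Banker", np.nan, "Doctor"],
    }
)

# The boolean mask picks the rows to change; the column list picks the cells.
people.loc[people["Name"] == "Pelumi", ["State", "Job"]] = ["Abuja", "Data science"]
print(people)
```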

In [45]:
player_dicts = {
    "Play": ['Kai Harvert', 'Gabriel Jesus', 'Gabriel Jesus', "Declan Rice", "Thomas Patrey",
             "Bukayo Saka", "Bukayo Saka"],
    "Agekkkkkk": [24, 26, 26, 24, 30, 21, 21],
}

arsenal = pd.DataFrame(player_dicts)

In [None]:
# the list below holds the salaries of the Arsenal players in the dataframe above:
# [280000, 265000, 265000, 240000, 20000,  195000, 195000]; add it to the dataframe as a column

In [None]:
# the column names of the dataframe above are not descriptive; rename them appropriately

In [None]:
# remove the second and fourth rows from the dataframe above

In [None]:
# remove the age column from the dataframe above

In [None]:
# check for duplicates in the dataframe above and remove them
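
The remaining tasks can be approached as sketched below. This is one possible solution, not the only one; the new column names "Player", "Age" and "Salary" and the intermediate variables are suggestions chosen for illustration.

```python
import pandas as pd

arsenal = pd.DataFrame(
    {
        "Play": ["Kai Harvert", "Gabriel Jesus", "Gabriel Jesus", "Declan Rice",
                 "Thomas Patrey", "Bukayo Saka", "Bukayo Saka"],
        "Agekkkkkk": [24, 26, 26, 24, 30, 21, 21],
    }
)

# Add the salary list as a new column (its length must match the row count).
arsenal["Salary"] = [280000, 265000, 265000, 240000, 20000, 195000, 195000]

# Give the columns descriptive names ("Player" and "Age" are suggested names).
arsenal = arsenal.rename(columns={"Play": "Player", "Agekkkkkk": "Age"})

# Drop the second and fourth rows by their index labels (1 and 3).
trimmed = arsenal.drop(index=[1, 3])

# Drop the age column.
no_age = trimmed.drop(columns="Age")

# Count duplicate rows, then keep only the first occurrence of each.
print(no_age.duplicated().sum())
deduped = no_age.drop_duplicates()
print(deduped)
```

Each step returns a new DataFrame here, so the intermediate results stay available for inspection.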