# Introduction to Pandas
## Basics of Series and DataFrames

This notebook works through the following tasks:

* Creating Pandas Series from various data structures
* Performing operations on Series and DataFrames
* Filtering and accessing data
* Creating DataFrames from different sources
* Data manipulation and analysis

Together, these exercises demonstrate how to use Pandas for basic data handling and analysis.

In [9]:
import pandas as pd
import numpy as np

1. Create a Pandas Series from a Python list, numpy array, and a dictionary.

In [10]:
#Creating the different data types
lis = [0,1,2,3,4,5,6,7,8,9]
arr = np.random.randint(0,10,10)
dic = {0:0, 1:1, 2:2, 3:3, 4:4, 5:5, 6:6, 7:7, 8:8, 9:9}
#Creating the Pandas Series
pdlis = pd.Series(lis)
pdarr = pd.Series(arr)
pddic = pd.Series(dic)

print(f"Series from Python List: \n {pdlis}")
print(f"Series from Numpy Array: \n {pdarr}")
print(f"Series from Dictionary: \n {pddic}")

Series from Python List: 
 0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
Series from Numpy Array: 
 0    7
1    2
2    1
3    1
4    8
5    0
6    3
7    3
8    1
9    1
dtype: int32
Series from Dictionary: 
 0    0
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
dtype: int64
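
The three constructors differ mainly in where the index comes from: a list or array gets a default integer index, while a dictionary's keys become the index labels. The `int32` dtype in the array output most likely reflects NumPy's platform-dependent default integer type, so it may read `int64` on other systems. A minimal sketch with made-up values (the variable names are illustrative and not part of the notebook):

```python
import numpy as np
import pandas as pd

# Hypothetical data, chosen only to make the index behaviour visible
from_list = pd.Series([10, 20, 30])
from_dict = pd.Series({"a": 10, "b": 20, "c": 30})
# Passing an explicit dtype avoids platform-dependent integer sizes
from_array = pd.Series(np.array([10, 20, 30], dtype="int64"))

print(from_list.index.tolist())   # default positions 0..n-1
print(from_dict.index.tolist())   # dictionary keys become labels
print(from_array.dtype)
```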


2. Assign a custom index to the Series.

In [11]:
#Creating the list
lis = [x for x in range(3)]
#Converting it to the pandas series with custom indexing
pdlis = pd.Series(lis, index = ["Sr. No.", "Name", "Age"])
print(f"Custom Indexing:\n {pdlis}")

Custom Indexing:
 Sr. No.    0
Name       1
Age        2
dtype: int64


3. Perform basic arithmetic operations on Series.

In [12]:
#Creating the list
lis = [x for x in range(3)]
#Converting it to the pandas series with custom indexing
pdlis = pd.Series(lis, index = ["Sr. No.", "Name", "Age"])
print(f"Custom Indexing:\n {pdlis}")

#Arithmetic Operations
print(f"Addition: \n {pdlis+2}")
print(f"Multiplication: \n {pdlis*2}")
print(f"Subtraction: \n {pdlis-2}")
print(f"Division: \n {pdlis/2}")

Custom Indexing:
 Sr. No.    0
Name       1
Age        2
dtype: int64
Addition: 
 Sr. No.    2
Name       3
Age        4
dtype: int64
Multiplication: 
 Sr. No.    0
Name       2
Age        4
dtype: int64
Subtraction: 
 Sr. No.   -2
Name      -1
Age        0
dtype: int64
Division: 
 Sr. No.    0.0
Name       0.5
Age        1.0
dtype: float64
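
The operations above combine a Series with a scalar. When both operands are Series, Pandas first matches values by index label, and labels present on only one side yield `NaN`. The following sketch uses two hypothetical Series, `a` and `b`, to illustrate this:

```python
import pandas as pd

# Hypothetical Series with partially overlapping labels
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["y", "z", "w"])

# Labels are aligned before adding; "x" and "w" have no partner, so they become NaN
total = a + b
print(total)

# The method form accepts fill_value to treat a missing side as 0 instead
filled = a.add(b, fill_value=0)
print(filled)
```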


4. Access elements using index labels and positions.

In [13]:
#Creating the list
lis = [3,5,6]
#Converting it to the pandas series with custom indexing
pdlis = pd.Series(lis, index = ["Sr. No.", "Name", "Age"])
#Label-based access
print(f"Using Index Labels:\n {pdlis['Name']}")
#Position-based access with .iloc; pdlis[1] on a label index is deprecated and raises a FutureWarning
print(f"Using Positions:\n {pdlis.iloc[1]}")

Using Index Labels:
 5
Using Positions:
 5
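
For explicit access, `.loc` always interprets its argument as a label and `.iloc` always as a position. The sketch below reuses the same values as the cell above and also shows how slicing differs between the two (the variable names are illustrative):

```python
import pandas as pd

s = pd.Series([3, 5, 6], index=["Sr. No.", "Name", "Age"])

by_label = s.loc["Name"]      # label-based lookup
by_position = s.iloc[1]       # position-based lookup, counting from 0

# Label slices include the end label; position slices exclude the end position
label_slice = s.loc["Sr. No.":"Name"]
position_slice = s.iloc[0:1]

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```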


5. Filter the Series to include only values greater than a specific threshold.

In [14]:
#Creating the list
lis = [12,23,34,5,4354,5686,653,4,45,4,72,4,467,65,5,4]
#Converting it to the pandas series
pdlis = pd.Series(lis)
print(f"Series:\n {pdlis}")
#Values greater than 30
newlis = pdlis[pdlis>30]
print(f"New List:\n {newlis}")

Series:
 0       12
1       23
2       34
3        5
4     4354
5     5686
6      653
7        4
8       45
9        4
10      72
11       4
12     467
13      65
14       5
15       4
dtype: int64
New List:
 2       34
4     4354
5     5686
6      653
8       45
10      72
12     467
13      65
dtype: int64
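
Filtering works in two steps: the comparison produces a boolean Series (a mask), and indexing with that mask keeps only the rows where it is `True`. A short sketch on a shortened, hypothetical copy of the data makes the intermediate mask visible:

```python
import pandas as pd

s = pd.Series([12, 23, 34, 5, 4354])

# Step 1: the comparison returns one boolean per element
mask = s > 30
print(mask.tolist())

# Step 2: indexing with the mask keeps the True positions and their original labels
print(s[mask])

# between() builds a mask for a closed range (both ends inclusive by default)
in_range = s[s.between(10, 40)]
print(in_range.tolist())
```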


6. Create a DataFrame from a dictionary of lists.

In [15]:
#Creating the dictionary
dic = {
    "fruits": ["apple", "banana", "cherry"],
    "vegetables": ["carrot", "broccoli", "spinach"],
    "colors": ["red", "green", "blue"]
}
#converting into dataframe
df = pd.DataFrame(dic, index = ["day1", "day2", "day3"])
print("Data Frame:")
df

Data Frame:


Unnamed: 0,fruits,vegetables,colors
day1,apple,carrot,red
day2,banana,broccoli,green
day3,cherry,spinach,blue


7. Create a DataFrame from a numpy array, specifying column and index names.

In [16]:
#Creating the numpy array
arr = np.array([200,2002,2003,2005])
#converting into dataframe
df = pd.DataFrame(arr, columns = ["years"], index = ["day1", "day2", "day3", "day4"])
print("Data Frame:")
df

Data Frame:


Unnamed: 0,years
day1,200
day2,2002
day3,2003
day4,2005
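
The same constructor accepts a two-dimensional array, in which case each array row becomes a DataFrame row and `columns` needs one name per array column. A minimal sketch with hypothetical names and values:

```python
import numpy as np
import pandas as pd

# 3 rows x 2 columns: [[0, 1], [2, 3], [4, 5]]
arr2d = np.arange(6).reshape(3, 2)

# The lengths of index and columns must match the array's shape
df2d = pd.DataFrame(arr2d, columns=["a", "b"], index=["r1", "r2", "r3"])

print(df2d.shape)
print(df2d.loc["r2", "b"])
```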


8. Load a DataFrame from a CSV file.

In [17]:
#Loading the CSV file from the local computer
df = pd.read_csv('diabetes (1).csv')
print("DataFrame loaded through CSV File:")
df

DataFrame loaded through CSV File:


Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63,0
764,2,122,70,27,0,36.8,0.340,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1
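
`pd.read_csv` accepts any file-like object as well as a path, which is handy for trying out its options without the file on disk. The sketch below reads a three-row excerpt (a few columns copied from the rows above) from an in-memory string; `usecols` and `nrows` limit which columns and how many rows are parsed:

```python
import io
import pandas as pd

# A small excerpt standing in for the real file
csv_text = """Pregnancies,Glucose,BMI,Outcome
6,148,33.6,1
1,85,26.6,0
8,183,23.3,1
"""

sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
print(sample.dtypes)

# Read only two columns and the first two data rows
subset = pd.read_csv(io.StringIO(csv_text), usecols=["Glucose", "BMI"], nrows=2)
print(subset)
```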


9. Display the first and last five rows of the DataFrame.

In [18]:
#Loading the CSV file from the local computer
df = pd.read_csv('diabetes (1).csv')
print("Printing first 5 rows:")
df.head()

Printing first 5 rows:


Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
0,6,148,72,35,0,33.6,0.627,50,1
1,1,85,66,29,0,26.6,0.351,31,0
2,8,183,64,0,0,23.3,0.672,32,1
3,1,89,66,23,94,28.1,0.167,21,0
4,0,137,40,35,168,43.1,2.288,33,1


In [19]:
#Loading the CSV file from the local computer
df = pd.read_csv('diabetes (1).csv')
print("Printing last 5 rows:")
df.tail()

Printing last 5 rows:


Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
763,10,101,76,48,180,32.9,0.171,63,0
764,2,122,70,27,0,36.8,0.34,27,0
765,5,121,72,23,112,26.2,0.245,30,0
766,1,126,60,0,0,30.1,0.349,47,1
767,1,93,70,31,0,30.4,0.315,23,0
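
Both `head()` and `tail()` default to five rows and take an optional row count. A brief sketch on a hypothetical ten-row frame:

```python
import pandas as pd

demo = pd.DataFrame({"value": range(10)})

first_three = demo.head(3)   # rows at positions 0, 1, 2
last_two = demo.tail(2)      # rows at positions 8, 9

print(first_three["value"].tolist())
print(last_two["value"].tolist())
print(len(demo.head()))      # default row count
```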


10. Get a summary of the DataFrame including the mean, median, and standard deviation of numeric columns.

In [20]:
#Data set already in df
print("Summary of the DataFrame")
df.describe() #describe() summarizes only the numeric columns by default; the 50% row is the median

Summary of the DataFrame


Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,3.845052,120.894531,69.105469,20.536458,79.799479,31.992578,0.471876,33.240885,0.348958
std,3.369578,31.972618,19.355807,15.952218,115.244002,7.88416,0.331329,11.760232,0.476951
min,0.0,0.0,0.0,0.0,0.0,0.0,0.078,21.0,0.0
25%,1.0,99.0,62.0,0.0,0.0,27.3,0.24375,24.0,0.0
50%,3.0,117.0,72.0,23.0,30.5,32.0,0.3725,29.0,0.0
75%,6.0,140.25,80.0,32.0,127.25,36.6,0.62625,41.0,1.0
max,17.0,199.0,122.0,99.0,846.0,67.1,2.42,81.0,1.0
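
`describe()` has no row labelled "median"; the `50%` row plays that role. To get exactly the three requested statistics under their own names, one option is `agg`. The sketch below uses a five-row excerpt of two columns from the dataset above:

```python
import pandas as pd

# First five Glucose and BMI values from the table shown earlier
demo = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137],
    "BMI": [33.6, 26.6, 23.3, 28.1, 43.1],
})

# One row per statistic, one column per numeric column
stats = demo.agg(["mean", "median", "std"])
print(stats)

# The median matches the 50% row of describe()
print(demo.describe().loc["50%"])
```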


11. Extract a specific column as a Series.

In [21]:
#Dataset is already defined
print("Extracting the specific column as series")
#extracting the specific column
sc = df['BMI']
print(f"The Type of the Column is: {type(sc)} and the column is: \n {sc}")

Extracting the specific column as series
The Type of the Column is: <class 'pandas.core.series.Series'> and the column is: 
 0      33.6
1      26.6
2      23.3
3      28.1
4      43.1
       ... 
763    32.9
764    36.8
765    26.2
766    30.1
767    30.4
Name: BMI, Length: 768, dtype: float64
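
Single brackets with a column name return a Series, whereas double brackets (a list of names) return a DataFrame, even when the list has only one entry. A small sketch with hypothetical values:

```python
import pandas as pd

demo = pd.DataFrame({"BMI": [33.6, 26.6], "Age": [50, 31]})

as_series = demo["BMI"]      # one-dimensional Series
as_frame = demo[["BMI"]]     # two-dimensional DataFrame with a single column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```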


12. Filter rows based on column values.

In [22]:
#df is already defined
#filtering rows based on column values
print("Row Based on Columns")
df.loc[df['BMI'] > 50]

Row Based on Columns


Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
120,0,162,76,56,100,53.2,0.759,25,1
125,1,88,30,42,99,55.0,0.496,26,1
177,0,129,110,46,130,67.1,0.319,26,1
193,11,135,0,0,0,52.3,0.578,40,1
247,0,165,90,33,680,52.3,0.427,23,0
303,5,115,98,0,0,52.9,0.209,28,1
445,0,180,78,63,14,59.4,2.42,25,1
673,3,123,100,35,240,57.3,0.88,22,0


13. Select rows based on multiple conditions.

In [23]:
#df is already defined
#filtering rows based on multiple conditions
print("Row Based on Multiple Conditions")
df.loc[(df['BMI'] > 30) & (df['BloodPressure'] > 88) & (df['Insulin'] > 100)]

Row Based on Multiple Conditions


Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
24,11,143,94,33,146,36.6,0.254,51,1
43,9,171,110,24,240,45.4,0.721,54,1
53,8,176,90,34,300,33.7,0.467,58,1
99,1,122,90,51,220,49.7,0.325,31,1
177,0,129,110,46,130,67.1,0.319,26,1
247,0,165,90,33,680,52.3,0.427,23,0
369,1,133,102,28,140,32.8,0.234,45,1
428,0,135,94,46,145,40.6,0.284,26,0
539,3,129,92,49,155,36.4,0.968,32,1
545,8,186,90,35,225,34.5,0.423,37,1
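
Conditions are combined with `&` (and), `|` (or), and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `>` or `==`. The sketch below uses a hypothetical four-row frame and also shows `query`, which expresses the same filter as a string:

```python
import pandas as pd

demo = pd.DataFrame({
    "BMI": [33.6, 26.6, 52.3, 45.4],
    "BloodPressure": [72, 66, 90, 110],
    "Insulin": [0, 0, 680, 240],
})

both = demo[(demo["BMI"] > 30) & (demo["BloodPressure"] > 88)]
either = demo[(demo["BMI"] > 50) | (demo["Insulin"] == 0)]
neither = demo[~((demo["BMI"] > 50) | (demo["Insulin"] == 0))]

# query() evaluates the same condition written as an expression string
same_as_both = demo.query("BMI > 30 and BloodPressure > 88")

print(both.index.tolist(), either.index.tolist(), neither.index.tolist())
```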


14. Add a new column to the DataFrame.

In [24]:
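#Create the values for the new column (only 100 values for 768 rows)
lis = [x for x in range(100)]
#converting into series
pdlis = pd.Series(lis)
#adding the new column; values are matched by index, so rows 100 onward get NaN
df['New'] = pdlis
print("New df with 'New' new column")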
df

New df with 'New' new column


Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome,New
0,6,148,72,35,0,33.6,0.627,50,1,0.0
1,1,85,66,29,0,26.6,0.351,31,0,1.0
2,8,183,64,0,0,23.3,0.672,32,1,2.0
3,1,89,66,23,94,28.1,0.167,21,0,3.0
4,0,137,40,35,168,43.1,2.288,33,1,4.0
...,...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,32.9,0.171,63,0,
764,2,122,70,27,0,36.8,0.340,27,0,
765,5,121,72,23,112,26.2,0.245,30,0,
766,1,126,60,0,0,30.1,0.349,47,1,
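
The empty values at the bottom of the `New` column come from index alignment: assigning a Series matches values by index label, and rows without a matching label receive `NaN` (which also turns the column into floats). A minimal sketch on a hypothetical four-row frame, alongside two other common ways to add a column:

```python
import pandas as pd

demo = pd.DataFrame({"a": [10, 20, 30, 40]})

short = pd.Series([1, 2])            # index labels 0 and 1 only
demo["from_series"] = short          # rows 2 and 3 have no match and become NaN
demo["from_scalar"] = 0              # a scalar is repeated for every row
demo["computed"] = demo["a"] * 2     # derived from an existing column

print(demo)
```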


15. Delete a column from the DataFrame.

In [25]:
#df is already defined
#delete the BMI column
print("The 'BMI' column is deleted: ")
df.drop('BMI', axis = 1, inplace = True)
df

The 'BMI' column is deleted: 


Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,DiabetesPedigreeFunction,Age,Outcome,New
0,6,148,72,35,0,0.627,50,1,0.0
1,1,85,66,29,0,0.351,31,0,1.0
2,8,183,64,0,0,0.672,32,1,2.0
3,1,89,66,23,94,0.167,21,0,3.0
4,0,137,40,35,168,2.288,33,1,4.0
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,0.171,63,0,
764,2,122,70,27,0,0.340,27,0,
765,5,121,72,23,112,0.245,30,0,
766,1,126,60,0,0,0.349,47,1,
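
Without `inplace=True`, `drop` leaves the original DataFrame untouched and returns a new one, which is often easier to reason about when a cell might be re-run. The sketch below contrasts this with `pop`, which removes a column in place and returns it (the frame and names are hypothetical):

```python
import pandas as pd

demo = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# drop() returns a new DataFrame; demo still has all three columns afterwards
trimmed = demo.drop(columns=["b"])
print(list(trimmed.columns), list(demo.columns))

# pop() removes the column from demo itself and returns it as a Series
removed = demo.pop("c")
print(list(demo.columns), removed.tolist())
```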


16. Rename columns in the DataFrame.

In [26]:
#df is already defined
#Renaming the column in df; rename() returns a new DataFrame, so assign the result back
print("'Age' Column name is renamed to 'Years'")
df = df.rename(columns = {'Age': 'Years'})
df

'Age' Column name is renamed to 'Years'


Unnamed: 0,Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,DiabetesPedigreeFunction,Years,Outcome,New
0,6,148,72,35,0,0.627,50,1,0.0
1,1,85,66,29,0,0.351,31,0,1.0
2,8,183,64,0,0,0.672,32,1,2.0
3,1,89,66,23,94,0.167,21,0,3.0
4,0,137,40,35,168,2.288,33,1,4.0
...,...,...,...,...,...,...,...,...,...
763,10,101,76,48,180,0.171,63,0,
764,2,122,70,27,0,0.340,27,0,
765,5,121,72,23,112,0.245,30,0,
766,1,126,60,0,0,0.349,47,1,
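
Note that `rename` returns a new DataFrame and does not modify the one it is called on, so the result has to be assigned to a name to be kept. The `columns` argument accepts either a mapping or a function applied to every column name. A short sketch with a hypothetical frame:

```python
import pandas as pd

demo = pd.DataFrame({"Age": [50, 31], "Outcome": [1, 0]})

renamed = demo.rename(columns={"Age": "Years"})   # mapping: old name -> new name
lowered = demo.rename(columns=str.lower)          # function applied to each name

print(list(demo.columns))      # the original is unchanged
print(list(renamed.columns))
print(list(lowered.columns))
```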


# Student Database Application
## Introduction

This application is a small student database built on the Pandas library. It allows users to:

* Load student data from a dictionary
* Calculate the average grade of each student
* Filter students with an average grade greater than 80
* Add new students to the database
* Delete students from the database
* Update student information
* Generate a report of student names and average grades

This application demonstrates the use of Pandas for data manipulation and analysis in a real-world scenario.

In [27]:
# Create a dictionary to store the student data
student_data = {
    'Name': ['John', 'Mary', 'David', 'Emily', 'Sarah'],
    'Grade1': [85, 90, 80, 95, 88],
    'Grade2': [90, 85, 95, 90, 92],
    'Grade3': [92, 88, 90, 92, 89]
}

# Create the DataFrame
df = pd.DataFrame(student_data)
df

Unnamed: 0,Name,Grade1,Grade2,Grade3
0,John,85,90,92
1,Mary,90,85,88
2,David,80,95,90
3,Emily,95,90,92
4,Sarah,88,92,89


In [28]:
# Create the DataFrame
df = pd.DataFrame(student_data)

# Calculate average grade of each student
df['Average Grade'] = (df['Grade1'] + df['Grade2'] + df['Grade3']) / 3

# Filter students with average grade greater than 80
filtered_students = df[df['Average Grade'] > 80]

# Add new student to database
new_student = pd.DataFrame({'Name': ['Michael'], 'Grade1': [90], 'Grade2': [92], 'Grade3': [88]})
df = pd.concat([df, new_student], ignore_index=True)

# Delete student from database
df = df.drop(df[df['Name'] == 'David'].index)

# Update student information
df.loc[df['Name'] == 'Mary', 'Grade1'] = 95

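# Recalculate the average so it reflects the added and updated rows
df['Average Grade'] = df[['Grade1', 'Grade2', 'Grade3']].mean(axis=1)

# Generate report
report = df[['Name', 'Average Grade']]
report

Unnamed: 0,Name,Average Grade
0,John,89.0
1,Mary,89.333333
3,Emily,92.333333
4,Sarah,89.666667
5,Michael,90.0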
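
Because the average is a derived column, it has to be recomputed whenever grades are added or changed. One way to keep it in sync is a small helper that rebuilds it on demand. The sketch below assumes the same column names as above; `with_average` and `GRADE_COLUMNS` are hypothetical names introduced for illustration:

```python
import pandas as pd

# Assumed grade columns, matching the student_data dictionary above
GRADE_COLUMNS = ["Grade1", "Grade2", "Grade3"]


def with_average(students: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with 'Average Grade' recomputed from the grade columns."""
    result = students.copy()
    result["Average Grade"] = result[GRADE_COLUMNS].mean(axis=1)
    return result


students = pd.DataFrame({
    "Name": ["John", "Mary"],
    "Grade1": [85, 90],
    "Grade2": [90, 85],
    "Grade3": [92, 88],
})

# Add a student, then update a grade
new_student = pd.DataFrame({"Name": ["Michael"], "Grade1": [90], "Grade2": [92], "Grade3": [88]})
students = pd.concat([students, new_student], ignore_index=True)
students.loc[students["Name"] == "Mary", "Grade1"] = 95

# Compute the average only when the report is generated
report = with_average(students)[["Name", "Average Grade"]]
print(report)
```

Computing the average at report time means newly added students and updated grades are always reflected, at the cost of recalculating the column on each call.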
