# Project 1 


This notebook presents basic summary statistics for a dataset of student exam scores. We will examine the mean, median, and mode of the students' math scores.

## Read in Dataframe

In [1]:
import pandas as pd

df = pd.read_csv("StudentsPerformance.csv", encoding="utf-8")


df.head()

Unnamed: 0,gender,race/ethnicity,parental level of education,lunch,test preparation course,math score,reading score,writing score
0,female,group B,bachelor's degree,standard,none,72,72,74
1,female,group C,some college,standard,completed,69,90,88
2,female,group B,master's degree,standard,none,90,95,93
3,male,group A,associate's degree,free/reduced,none,47,57,44
4,male,group C,some college,standard,none,76,78,75


In [2]:
df.columns

Index(['gender', 'race/ethnicity', 'parental level of education', 'lunch',
       'test preparation course', 'math score', 'reading score',
       'writing score'],
      dtype='object')

These are the names of the variables (columns) in our dataset. We will be working with 'math score'.

In [3]:
round(df['math score'].describe(), 2)

count    1000.00
mean       66.09
std        15.16
min         0.00
25%        57.00
50%        66.00
75%        77.00
max       100.00
Name: math score, dtype: float64

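This output contains the five-number summary of student math scores (min, 25%, 50%, 75%, max), along with the count, mean, and standard deviation. We can already read the mean (66.09) and the median (the 50% row, 66.00) directly from it.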
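The quartile rows in the `describe()` output can also be computed directly. The sketch below uses only the standard library and, as a stand-in for the full column, the five math scores visible in the `df.head()` output. `method="inclusive"` interpolates linearly between data points, which to my understanding matches the pandas default, but treat that equivalence as an assumption. The names `sample_scores` and `five_number` are illustrative.

```python
import statistics

# The five math scores shown in the df.head() output (a tiny stand-in sample)
sample_scores = [72, 69, 90, 47, 76]

# Quartiles with linear interpolation between data points
q1, q2, q3 = statistics.quantiles(sample_scores, n=4, method="inclusive")

five_number = {
    "min": min(sample_scores),
    "q1": q1,
    "median": q2,
    "q3": q3,
    "max": max(sample_scores),
}
print(five_number)
```

On the full dataset, these five values correspond to the min, 25%, 50%, 75%, and max rows above.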

## Mean, Median, and Mode - Pandas 

In [4]:
mode = df[["math score"]].mode().round(3)
mean = df[["math score"]].mean().round(3)
median = df[["math score"]].median().round(3)



In [5]:
# Print Mean
print(mean)

math score    66.089
dtype: float64


In [6]:
# Print Median
print(median)

math score    66.0
dtype: float64


In [7]:
# Print Mode
print(mode)

   math score
0          65


The data appear to be fairly symmetrical, as our three measures of center are all close to one another (mean 66.089, median 66.0, mode 65). A histogram would help confirm this; we create one in the visualization section at the end.
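As a rough numeric check on the symmetry claim, one option is Pearson's second skewness coefficient, 3 × (mean − median) / standard deviation. This is a technique added here for illustration, not something the notebook itself computes. The sketch plugs in the mean, median, and standard deviation reported in the outputs above; values near 0 suggest little skew.

```python
# Values copied from the outputs above
mean_score = 66.089
median_score = 66.0
std_score = 15.16  # standard deviation from describe()

# Pearson's second skewness coefficient: 3 * (mean - median) / std
skew = 3 * (mean_score - median_score) / std_score
print(round(skew, 3))  # 0.018
```

A coefficient this close to zero is consistent with a roughly symmetrical distribution, though it is only a coarse summary and does not replace looking at the histogram.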

## Mean, Median, and Mode - "The Hard Way"  

In [8]:
import csv

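# newline="" is the setting the csv module documentation recommends for files passed to its readers
with open("StudentsPerformance.csv", "r", newline="", encoding="utf-8") as file:
    reader = csv.DictReader(file)
    # Keep only rows with a non-empty math score and convert each one to an integer
    math_scores = [int(row["math score"]) for row in reader if row["math score"].strip() != ""]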


In [9]:
# Mean: add up the scores and count them in one pass, then divide
total = 0
count = 0
for score in math_scores:
    total += score
    count += 1
mean = round(total / count, 3)


In [10]:
# Median: middle value of the sorted scores (average of the two middle values when n is even)
scores_sort = sorted(math_scores)
n = len(scores_sort)
mid = n // 2
if n % 2 == 0:
    median = (scores_sort[mid - 1] + scores_sort[mid]) / 2
else:
    median = scores_sort[mid]


In [11]:
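# Mode: tally each score, then keep every score tied for the highest count (a list, since ties are possible)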
frequency = {}
for score in math_scores:
    if score in frequency:
        frequency[score] = frequency[score] + 1
    else:
        frequency[score] = 1

max_count = max(frequency.values())
mode = []
for key, val in frequency.items():  
    if val == max_count:            
        mode.append(key)       

In [12]:
# Output
print("Mean math score: ", str(mean))
print("Median math score: ",   str(median))
print("Mode math score: ",  str(mode))


Mean math score:  66.089
Median math score:  66.0
Mode math score:  [65]
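The hand-written results can be cross-checked against Python's built-in `statistics` module. The sketch below uses a small hypothetical list (`sample_scores`, not taken from the dataset) so that it runs without the CSV file; on the real data you would pass `math_scores` instead. `multimode` returns a list, like the manual mode above, because several values can tie.

```python
import statistics

# Hypothetical scores for illustration only (not from the dataset)
sample_scores = [65, 72, 65, 90, 47, 76]

check_mean = statistics.mean(sample_scores)
check_median = statistics.median(sample_scores)
check_mode = statistics.multimode(sample_scores)  # list of all most-frequent values

print("Mean:", round(check_mean, 3))  # 69.167
print("Median:", check_median)        # 68.5
print("Mode:", check_mode)            # [65]
```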


## Visualization of Math Score Distribution

In [13]:

# Bin edges in steps of 10; with right=False each bin is half-open, e.g. [60, 70)
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
# The last bin is [90, 100), so it is labelled 90-99: a score of exactly 100 falls outside every bin and is not counted
labels = ['0-9', '10-19', '20-29', '30-39', '40-49', '50-59', '60-69', '70-79', '80-89', '90-99']

df['Range'] = pd.cut(df['math score'], bins=bins, labels=labels, right=False)

math_score_distribution = df['Range'].value_counts().sort_index()

print("\nMath Score Distribution\n")

# Draw one "#" for every 4 students in the bin (rounded down)
for score_range, count in math_score_distribution.items():
    bar_length = int(count / 4)
    bar = "#" * bar_length
    print(str(score_range) + ": " + str(bar))


Math Score Distribution

0-9: 
10-19: 
20-29: ##
30-39: ######
40-49: #######################
50-59: ###############################################
60-69: ###################################################################
70-79: ######################################################
80-89: #################################
90-99: ############


As suspected, the distribution is fairly symmetrical, with the mean, median, and mode all falling in the 60-69 bin.
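One detail of the binning above is worth noting: `pd.cut` with `right=False` builds half-open intervals, so with a final edge of 100 the top bin is [90, 100) and a score of exactly 100 is left out. The sketch below shows one way to fold 100 into the top bin, written in plain Python so it runs without the CSV. The helper names `bin_label` and `count_by_bin` and the sample values are hypothetical. In the pandas version, ending `bins` at 101 instead of 100 should have the same effect.

```python
def bin_label(score):
    """Return the ten-point range label for a 0-100 score; 100 joins the top bin."""
    index = min(score // 10, 9)  # 100 // 10 == 10, so clamp it to the last bin
    low = index * 10
    high = 100 if index == 9 else low + 9
    return f"{low}-{high}"


def count_by_bin(scores):
    """Count scores per bin, keeping empty bins so the histogram stays in order."""
    counts = {bin_label(low): 0 for low in range(0, 100, 10)}
    for score in scores:
        counts[bin_label(score)] += 1
    return counts


# Hypothetical scores for illustration only (not from the dataset)
sample_scores = [0, 47, 65, 65, 72, 90, 99, 100]
counts = count_by_bin(sample_scores)

for label, count in counts.items():
    print(f"{label}: {'#' * count}")
```

Here each `#` stands for one score, since the sample is tiny; the notebook's own histogram uses one `#` per four students.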