# Descriptive Statistics - Numerical Techniques Exercise 

Exploring Students' Performance Dataset - Numerical Techniques

Objective: The objective of this exercise is to practice using numerical techniques to analyze the Students' Performance dataset and gain insights into the students' academic performance.

Dataset Description: The Students' Performance dataset contains students' demographic and background attributes (gender, race/ethnicity, parental level of education, lunch type, and whether they completed a test preparation course), together with their test scores in three subjects: Math, Reading, and Writing.

Exercise Steps:

1. **Load the Dataset:** Import the necessary libraries and load the Students' Performance dataset into a pandas DataFrame.

2. **Explore the Dataset:** Use basic pandas functions to get an overview of the dataset, including the number of rows and columns and the number of unique values in each column. For columns with fewer than 10 distinct values, show those unique values. *Hint: the functions and methods you need were covered in the previous lesson.*

3. **Analyze Descriptive Statistics:** Calculate and interpret descriptive statistics: measures of central tendency (mean, median, mode) and dispersion (standard deviation, range) for the numerical variables, and frequency counts for the categorical variables.

```python
# Import the pandas library
import pandas as pd

# URL of the CSV file
url = "https://raw.githubusercontent.com/data-bootcamp-v4/prework_data/main/students_performance.csv"

# Read the CSV file from the URL into the DataFrame "df"
df = pd.read_csv(url)

# Display the first 5 rows of the dataset
df.head()
```

| | gender | race/ethnicity | parental level of education | lunch | test preparation course | math score | reading score | writing score |
|---|---|---|---|---|---|---|---|---|
| 0 | female | group B | bachelor's degree | standard | none | 72 | 72 | 74 |
| 1 | female | group C | some college | standard | completed | 69 | 90 | 88 |
| 2 | female | group B | master's degree | standard | none | 90 | 95 | 93 |
| 3 | male | group A | associate's degree | free/reduced | none | 47 | 57 | 44 |
| 4 | male | group C | some college | standard | none | 76 | 78 | 75 |


```python
# Number of rows and columns in the dataset
df.shape
```

```
(1000, 8)
```

`df.shape` is an attribute (not a method), so it is used without parentheses and returns a `(rows, columns)` tuple. The dataset therefore contains **1000 observations** (students) and **8 variables**: five categorical attributes and three numerical test scores.
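The exercise also asks for the number of unique values in each column and, for columns with fewer than 10 distinct values, the values themselves; the notebook moves on right after `df.shape`. The sketch below is one way to cover that step with `nunique()` and `unique()`. So that it runs on its own, it uses a stand-in `sample` DataFrame built from the five rows shown by `df.head()` above (in the notebook you would pass `df` instead), and the helper name `low_cardinality_values` is hypothetical.

```python
import pandas as pd

# Stand-in for df: the five rows displayed by df.head() above
sample = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "race/ethnicity": ["group B", "group C", "group B", "group A", "group C"],
    "parental level of education": [
        "bachelor's degree", "some college", "master's degree",
        "associate's degree", "some college",
    ],
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
    "test preparation course": ["none", "completed", "none", "none", "none"],
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})


def low_cardinality_values(frame, threshold=10):
    """Hypothetical helper: unique values of every column with fewer than `threshold` distinct values."""
    counts = frame.nunique()  # number of distinct values per column
    return {col: list(frame[col].unique()) for col in counts[counts < threshold].index}


print(sample.shape)      # (rows, columns)
print(sample.nunique())  # distinct values per column
for col, values in low_cardinality_values(sample).items():
    print(f"{col}: {values}")
```

On the full `df`, the score columns should have well over 10 distinct values, so only the five categorical columns would be listed; their categories also appear in the frequency counts further down.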


### Numerical Variable Analysis

```python
# List the numerical columns (the three test scores)
num_cols = ["math score", "reading score", "writing score"]

# Display the first 5 rows of the numerical columns
df[num_cols].head()
```

| | math score | reading score | writing score |
|---|---|---|---|
| 0 | 72 | 72 | 74 |
| 1 | 69 | 90 | 88 |
| 2 | 90 | 95 | 93 |
| 3 | 47 | 57 | 44 |
| 4 | 76 | 78 | 75 |
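Typing the column names works, but the numerical columns can also be selected by dtype, which keeps the list correct if columns are added or renamed. A minimal sketch with `select_dtypes`, assuming (as the `int64` dtype of the range output below suggests) that the scores are read as integers; `sample` is a small stand-in built from the rows above, not the full dataset.

```python
import pandas as pd

# Stand-in for df: a few text columns and the three integer score columns
sample = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

# Numeric columns are picked by dtype instead of by hand-typed names
num_cols = sample.select_dtypes(include="number").columns.tolist()
# Everything that is not numeric is treated as categorical here
cat_cols = sample.select_dtypes(exclude="number").columns.tolist()

print(num_cols)
print(cat_cols)
```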


#### Central Tendency: Mean

```python
df[num_cols].mean()
```

```
math score       66.089
reading score    69.169
writing score    68.054
dtype: float64
```
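The arithmetic mean is the sum of the values divided by how many there are. A quick check on the first five math scores from `df.head()` (a stand-in, not the full column) shows that `Series.mean()` matches the hand computation:

```python
import pandas as pd

# First five math scores from df.head() above (stand-in data)
math = pd.Series([72, 69, 90, 47, 76], name="math score")

# Mean = sum of the values / number of non-missing values
manual_mean = math.sum() / math.count()

print(manual_mean)   # 70.8
print(math.mean())   # 70.8, same result from pandas
```

On the full data, reading has the highest mean (69.169) and math the lowest (66.089), so the three averages differ by only about three points.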

#### Central Tendency: Median

```python
df[num_cols].median()
```

```
math score       66.0
reading score    70.0
writing score    69.0
dtype: float64
```
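Comparing the mean and median outputs, each mean lies within about one point of its median (math 66.089 vs 66.0, reading 69.169 vs 70.0, writing 68.054 vs 69.0). For reading and writing the mean is slightly below the median, which is consistent with a few very low scores pulling the mean down (the minimums reported by `describe()` below are 17 and 10). The median itself is the middle value of the sorted data; with an even number of values, pandas averages the two middle ones. A sketch on the first five reading scores (stand-in data):

```python
import pandas as pd

# First five reading scores from df.head() above (stand-in data)
reading = pd.Series([72, 90, 95, 57, 78], name="reading score")

# Odd count: sorted 57, 72, 78, 90, 95 -> the middle value is 78
print(reading.median())

# Even count: sorted 57, 72, 90, 95 -> average of 72 and 90
print(reading.iloc[:4].median())
```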

#### Central Tendency: Mode

```python
df[num_cols].mode()
```

| | math score | reading score | writing score |
|---|---|---|---|
| 0 | 65 | 72 | 74 |
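`mode()` returns a DataFrame rather than a Series because a column can have several equally frequent values: `DataFrame.mode()` then returns one row per mode, in sorted order, and pads shorter columns with `NaN`. Here each score column has a single mode, so only row 0 appears. The toy scores below are hypothetical, chosen to create a tie:

```python
import pandas as pd

# Hypothetical toy scores: 65 and 70 tie in "math score"; "reading score" has one mode
scores = pd.DataFrame({
    "math score": [65, 65, 70, 70, 80],
    "reading score": [72, 72, 72, 60, 90],
})

modes = scores.mode()
print(modes)  # two rows: math has 65 and 70, reading has 72 and then NaN

# Keep only the first (smallest) mode of each column
first_modes = modes.iloc[0]
print(first_modes)
```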


#### Dispersion: Range

```python
# Range = maximum - minimum, per column
df[num_cols].max() - df[num_cols].min()
```

```
math score       100
reading score     83
writing score     90
dtype: int64
```
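The range depends only on the two most extreme scores, so a single outlier can set it: the math range of 100 comes from a minimum of 0 and a maximum of 100 (see `describe()` below). A sketch that puts the extremes next to the range in one table, using the score columns of the five rows shown earlier as stand-in data:

```python
import pandas as pd

# Stand-in score columns: the five rows shown by df.head() above
scores = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

# One row per column, with its minimum and maximum
summary = scores.agg(["min", "max"]).T
# Range = max - min
summary["range"] = summary["max"] - summary["min"]
print(summary)
```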

#### Dispersion: Variance

```python
df[num_cols].var()
```

```
math score       229.918998
reading score    213.165605
writing score    230.907992
dtype: float64
```
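`DataFrame.var()` computes the *sample* variance: the sum of squared deviations from the mean divided by n - 1 (`ddof=1`), whereas `numpy.var` defaults to the *population* variance (divide by n, `ddof=0`). With 1000 rows the two differ by a factor of only 1000/999, but on a handful of values the gap is visible. The result is in squared points, which is why it is hard to read directly. A sketch on the first five math scores (stand-in data), where the two results are about 241.7 and 193.36:

```python
import numpy as np
import pandas as pd

# First five math scores from df.head() above (stand-in data)
math = pd.Series([72, 69, 90, 47, 76], name="math score")

sample_var = math.var()                   # pandas default: divide by n - 1
population_var = np.var(math.to_numpy())  # NumPy default: divide by n

# Same sample variance by hand: squared deviations from the mean, divided by n - 1
n = math.count()
manual = ((math - math.mean()) ** 2).sum() / (n - 1)

print(sample_var, population_var, manual)
```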

#### Dispersion: Standard Deviation

```python
df[num_cols].std()
```

```
math score       15.163080
reading score    14.600192
writing score    15.195657
dtype: float64
```
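The standard deviation is the square root of the variance, which brings it back to score points: for math, sqrt(229.918998) ≈ 15.163, matching the output above. Read loosely, a typical math score lies about 15 points from the mean of about 66. A check that `std()` and `sqrt(var())` agree, on stand-in score columns taken from the five rows shown earlier:

```python
import numpy as np
import pandas as pd

# Stand-in score columns: the five rows shown by df.head() above
scores = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

# Standard deviation = square root of the (sample) variance
std_from_var = np.sqrt(scores.var())
print(pd.concat({"std": scores.std(), "sqrt(var)": std_from_var}, axis=1))

# The full-data math variance from the output above, converted back to points
print(np.sqrt(229.918998))
```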

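#### Summary Statistics

```python
# count, mean, std, min, quartiles and max in one table
df[num_cols].describe()
```

| | math score | reading score | writing score |
|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 |
| mean | 66.089 | 69.169 | 68.054 |
| std | 15.16308 | 14.600192 | 15.195657 |
| min | 0.0 | 17.0 | 10.0 |
| 25% | 57.0 | 59.0 | 57.75 |
| 50% | 66.0 | 70.0 | 69.0 |
| 75% | 77.0 | 79.0 | 79.0 |
| max | 100.0 | 100.0 | 100.0 |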
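The `25%`, `50%` and `75%` rows of `describe()` are quartiles, and the `50%` row is the median computed earlier. pandas interpolates linearly between neighbouring sorted values by default, which is why the writing score's first quartile is 57.75 rather than a whole score. A sketch confirming that these rows match `quantile()`, on stand-in score columns from the five rows shown earlier:

```python
import pandas as pd

# Stand-in score columns: the five rows shown by df.head() above
scores = pd.DataFrame({
    "math score": [72, 69, 90, 47, 76],
    "reading score": [72, 90, 95, 57, 78],
    "writing score": [74, 88, 93, 44, 75],
})

summary = scores.describe()
# quantile() uses linear interpolation between sorted values by default
quartiles = scores.quantile([0.25, 0.5, 0.75])

print(summary.loc[["25%", "50%", "75%"]])
print(quartiles)
```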


### Categorical Variable Analysis

```python
# List the categorical columns
cat_cols = [
    "gender",
    "race/ethnicity",
    "parental level of education",
    "lunch",
    "test preparation course",
]

df[cat_cols].head()
```

| | gender | race/ethnicity | parental level of education | lunch | test preparation course |
|---|---|---|---|---|---|
| 0 | female | group B | bachelor's degree | standard | none |
| 1 | female | group C | some college | standard | completed |
| 2 | female | group B | master's degree | standard | none |
| 3 | male | group A | associate's degree | free/reduced | none |
| 4 | male | group C | some college | standard | none |
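Before counting each category, `describe()` gives a compact overview of text columns: the number of non-missing values (`count`), the number of distinct categories (`unique`), the most frequent category (`top`) and how often it occurs (`freq`). A sketch on a stand-in built from the five rows above; applied to `df[cat_cols]` it would summarize all 1000 students.

```python
import pandas as pd

# Stand-in for df[cat_cols]: categorical values of the five rows shown above
cats = pd.DataFrame({
    "gender": ["female", "female", "female", "male", "male"],
    "race/ethnicity": ["group B", "group C", "group B", "group A", "group C"],
    "parental level of education": [
        "bachelor's degree", "some college", "master's degree",
        "associate's degree", "some college",
    ],
    "lunch": ["standard", "standard", "standard", "free/reduced", "standard"],
    "test preparation course": ["none", "completed", "none", "none", "none"],
})

# Text-only frame: describe() reports count, unique, top and freq per column
# (when categories tie, as group B and group C do here, "top" shows just one of them)
cat_summary = cats.describe()
print(cat_summary)
```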


#### Frequency Counts

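```python
# Count each unique combination of the five categorical values
df[cat_cols].value_counts()
```

```
gender  race/ethnicity  parental level of education  lunch         test preparation course
female  group C         associate's degree           standard      none                       21
                        some college                 standard      none                       19
male    group C         high school                  standard      none                       17
female  group C         associate's degree           standard      completed                  15
                        high school                  standard      none                       15
                                                                                              ..
male    group D         some college                 free/reduced  completed                   1
                        master's degree              free/reduced  completed                   1
        group E         high school                  free/reduced  none                        1
                        some college
```

Passing several columns to `value_counts()` counts each unique *combination* of their values, not each column's categories separately: the first row means that 21 students are female, in group C, with parents holding an associate's degree, on standard lunch, and without test preparation. To get frequency counts for each categorical variable on its own, loop over the columns: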

```python
# Frequency counts for each categorical column separately
for col in cat_cols:
    print(df[col].value_counts(), "\n")
```

```
gender
female    518
male      482
Name: count, dtype: int64

race/ethnicity
group C    319
group D    262
group B    190
group E    140
group A     89
Name: count, dtype: int64

parental level of education
some college          226
associate's degree    222
high school           196
some high school      179
bachelor's degree     118
master's degree        59
Name: count, dtype: int64

lunch
standard        645
free/reduced    355
Name: count, dtype: int64

test preparation course
none         642
completed    358
Name: count, dtype: int64
```
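Raw counts are easier to compare as shares of all students. `value_counts(normalize=True)` divides each count by the total, so the values sum to 1. A sketch that rebuilds the gender column from the counts printed above (518 female, 482 male) and shows both views side by side:

```python
import pandas as pd

# Gender column rebuilt from the counts printed above: 518 female, 482 male
gender = pd.Series(["female"] * 518 + ["male"] * 482, name="gender")

counts = gender.value_counts()                # absolute frequencies
shares = gender.value_counts(normalize=True)  # relative frequencies: count / total
print(pd.concat({"count": counts, "share": shares}, axis=1))
```

On the full data the same call on `df[col]` would show, for example, that 64.2% of students did not take the test preparation course (642 of 1000).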

