In [1]:
import pandas as pd
from pathlib import Path

In [2]:
data_path = Path('../../datasets/student_mental_health.csv')
df = pd.read_csv(data_path)
df.columns

Index(['Timestamp', 'Choose your gender', 'Age', 'What is your course?',
       'Your current year of Study', 'What is your CGPA?', 'Marital status',
       'Do you have Depression?', 'Do you have Anxiety?',
       'Do you have Panic attack?',
       'Did you seek any specialist for a treatment?'],
      dtype='object')

In [3]:
df.columns = ['Timestamp', 'Gender', 'Age', 'Course',
       'Year of study', 'CGPA', 'Marital status',
       'Depression', 'Anxiety',
       'Panic attack',
       'Under Treatment']

# Exploring Data with Frequency Tables in Python

## Learning objectives:
- Understand what a frequency table is and why it is useful.
- Build frequency tables for discrete, categorical, and continuous variables.
- Compute relative and cumulative frequencies.

## What is a Frequency Table?
A frequency table is a **summary** that shows how often each value (or range of values) occurs.
It shows how the data are distributed across the sample.

**Types:**
- absolute.
- relative.
- cumulative.
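
For a continuous variable, the "range of values" case applies: the values are first grouped into classes (bins) and then counted per class. The snippet below is a minimal sketch of that idea using `pd.cut`. The `heights` values and the bin edges are hypothetical and are not taken from the student dataset.

```python
import pandas as pd

# Hypothetical heights in cm (illustrative values, not from the dataset)
heights = pd.Series([150.5, 162.0, 158.3, 171.2, 165.4, 180.1, 175.0, 168.9])

# Group the values into three classes: [150, 160), [160, 170), [170, 190)
bins = [150, 160, 170, 190]
classes = pd.cut(heights, bins=bins, right=False)

# Count the observations per class, keeping the classes in ascending order
class_freq = classes.value_counts().sort_index()
print(class_freq)
```

The choice of bin edges is a judgment call: wider classes give a smoother summary, while narrower ones keep more detail.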

### Understanding the Three Types of Frequencies

#### Absolute Frequency (f)
The number of occurrences of a value (or class) in your dataset. Used **to count how often something appears**.

**Example of how to use it in data science:**

In [4]:
df['Course'].value_counts().sort_values(ascending=False)

Course
BCS                        18
Engineering                17
BIT                        10
Biomedical science          4
KOE                         4
psychology                  2
Engine                      2
Laws                        2
BENL                        2
ENM                         1
Mathemathics                1
Pendidikan islam            1
Human Resources             1
Irkhs                       1
Psychology                  1
KENMS                       1
Accounting                  1
Law                         1
Marine science              1
Banking Studies             1
Business Administration     1
Benl                        1
KIRKHS                      1
Usuluddin                   1
TAASL                       1
ALA                         1
Islamic education           1
Kirkhs                      1
DIPLOMA TESL                1
koe                         1
CTS                         1
Islamic Education           1
Biotechnology               1
eng

- We can see that Bachelor of Computer Science (BCS) is the most frequent course in our dataset.
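
The output above also shows the same course under several spellings (for example `KOE` and `koe`, or `BENL` and `Benl`), which splits their counts. One possible way to handle this is to normalize the labels before counting. The sketch below uses a small hypothetical sample that reuses a few labels from the output. It only unifies case and whitespace, so other variants would still need a manual mapping.

```python
import pandas as pd

# Hypothetical sample that reuses a few labels seen in the output above
courses = pd.Series(['KOE', 'koe', 'BENL', 'Benl', 'psychology', 'Psychology', 'BCS'])

# Trim whitespace and unify case so spelling variants are counted together
normalized = courses.str.strip().str.upper()

course_freq = normalized.value_counts()
print(course_freq)
```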

#### Relative Frequency (fr)
The **proportion** of occurrences relative to the total number of observations (n), that is, fr = f / n. Used to compare groups as proportions or percentages.

**Example of how to use it in data science:**

In [5]:
df['Age'].value_counts(normalize=True)

Age
18.0    0.32
24.0    0.23
19.0    0.21
23.0    0.13
20.0    0.06
21.0    0.03
22.0    0.02
Name: proportion, dtype: float64

* We can see that **21%** of individuals are 19 years old.
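* Useful for estimating **probabilities**: the relative frequency of a value is the empirical probability of drawing that value at random from the sample.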
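
As a rough illustration of the probability reading, the sketch below rebuilds the relative frequencies printed above by hand (rather than reloading the CSV) and sums them to estimate the chance that a randomly chosen respondent is 19 or younger. The variable names are illustrative.

```python
import pandas as pd

# Relative frequencies copied from the output above
rel_freq = pd.Series({18: 0.32, 24: 0.23, 19: 0.21, 23: 0.13,
                      20: 0.06, 21: 0.03, 22: 0.02})

# Empirical probability that a respondent is 19 or younger
p_19_or_younger = rel_freq[rel_freq.index <= 19].sum()
print(round(p_19_or_younger, 2))
```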

#### Cumulative Frequency (F)
Adds up frequencies in a running total over the ordered values, showing how many data points fall **at** or **below** a particular value or class.

**Example of how to use it in data science:**

In [15]:
# Sort by age (the index), not by count, so that each running total
# means "this age or younger"
abs_freq = df['Age'].value_counts().sort_index()
print('----------------\nAbsolute Frequency')
print(abs_freq)
cum_freq = abs_freq.cumsum()
print('\n----------------\nCumulative Frequency')
print(cum_freq)

----------------
Absolute Frequency
Age
18.0    32
19.0    21
20.0     6
21.0     3
22.0     2
23.0    13
24.0    23
Name: count, dtype: int64

----------------
Cumulative Frequency
Age
18.0     32
19.0     53
20.0     59
21.0     62
22.0     64
23.0     77
24.0    100
Name: count, dtype: int64
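
The three frequencies can also be collected into a single table. The sketch below rebuilds the age counts from the printed output by hand, orders them by age, and adds a cumulative relative frequency column. The column name `Fr` is an assumption introduced here for illustration.

```python
import pandas as pd

# Age counts copied from the absolute frequencies printed above
age_counts = pd.Series({22: 2, 21: 3, 20: 6, 23: 13, 19: 21, 24: 23, 18: 32})

# Order by age so each running total means "this age or younger"
f = age_counts.sort_index()

table = pd.DataFrame({
    'f': f,                        # absolute frequency
    'fr': f / f.sum(),             # relative frequency
    'F': f.cumsum(),               # cumulative frequency
    'Fr': (f / f.sum()).cumsum(),  # cumulative relative frequency
})
table.index.name = 'Age'
print(table)
```

Read this way, the `F` value in the row for age 19 is the number of respondents aged 19 or younger.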
