# Student grades

At the end of the year we can request an Excel sheet containing all grades for all students of ITF (and ACS, ELO-ICT and "WES", short for "working and studying": the program for people who earn a degree while still working four days a week). This sheet lists all students, their courses and their grades.

Unfortunately, we can't pass this information on. Besides violating the GDPR (and thus being illegal), it would show an appalling lack of tact.

Therefore, we changed a few things:
- All student IDs (r-numbers) have been replaced by a random number of the same length
- All names have been replaced by the names of actors
- All grades have been converted to a letter using the function in the next code block

Each student ID and name is always replaced by the same number and the same actor. So if student X became Matt Damon in 20-21, the same student is also Matt Damon in 21-22.


In [None]:
# the function used to change the grades:

def replace_grade(grade):
    if grade in ['NA', '#']:  # NA = student could have taken the exam but didn't; # = no exam
        return grade
    if grade == 'G': # Pass/fail-course, passed
        return "PASS"
    if grade == 'NG': # Pass/fail-course, not passed
        return "FAIL"
    grade = float(grade)  # any other value is a numeric score, so the student took the exam
    if grade == 0:
        return 'ZERO'
    if grade >= 17:
        return 'A'
    if grade >= 14:
        return 'B'
    if grade >= 10:
        return 'C'
    if grade >= 8: 
        return 'D'
    return 'F'
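To see the mapping in action, here is the same function applied to a handful of made-up raw values, one for each branch. The sample values are hypothetical; only the function itself comes from the cell above.

```python
def replace_grade(grade):
    if grade in ['NA', '#']:  # NA = could have taken the exam but didn't; # = no exam
        return grade
    if grade == 'G':  # pass/fail course, passed
        return "PASS"
    if grade == 'NG':  # pass/fail course, not passed
        return "FAIL"
    grade = float(grade)  # any other value is a numeric score
    if grade == 0:
        return 'ZERO'
    if grade >= 17:
        return 'A'
    if grade >= 14:
        return 'B'
    if grade >= 10:
        return 'C'
    if grade >= 8:
        return 'D'
    return 'F'

# Hypothetical raw values, one per branch of the function
samples = ['NA', '#', 'G', 'NG', '0', '18', '15.5', '12', '9', '4']
letters = [replace_grade(s) for s in samples]
print(letters)
```

Note that every threshold is inclusive: a score of exactly 10 becomes a C, while 9.5 becomes a D.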

All files are CSV files. The biggest downside is that they're in Dutch. This mainly affects the program names and column names; the course names are already in English. If you don't understand a column name, use [deepl](http://www.deepl.com) to translate it.

You can open them as follows:

In [None]:
import pandas as pd
import numpy as np

# The CSV files use a semicolon as column separator
df_20_21 = pd.read_csv('files/grades 20-21_anonymous.csv', sep=';')
df_21_22 = pd.read_csv('files/grades 21-22_anonymous.csv', sep=';')
df_22_23 = pd.read_csv('files/grades 22-23_anonymous.csv', sep=';')
df_23_24 = pd.read_csv('files/grades 23-24_anonymous.csv', sep=';')

df_20_21.head(2)
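One thing to watch out for: by default `pd.read_csv` treats the string "NA" as a missing value, so the "NA" grade (the student could have taken the exam but didn't) becomes NaN as soon as the file is read. If you want to keep that distinction, `keep_default_na=False` switches this off. The sketch below uses a small made-up CSV string instead of the real files.

```python
import io

import pandas as pd

# Hypothetical sample in the same semicolon-separated format
sample = "Student;Score\n544120;NA\n544121;#\n544122;C\n"

# Default behaviour: "NA" is parsed as a missing value
default = pd.read_csv(io.StringIO(sample), sep=';')

# keep_default_na=False keeps "NA" as a literal string
kept = pd.read_csv(io.StringIO(sample), sep=';', keep_default_na=False)

print(default["Score"].tolist())
print(kept["Score"].tolist())
```

With `keep_default_na=False`, empty fields are read as empty strings rather than NaN. Also keep in mind that the grade categoricals created later in this notebook have no "NA" category, so those values would still turn into missing values at that point unless you add one.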

In [None]:
df_21_22.head(2)

In [None]:
df_23_24.head(2)

As you can see, there are some differences. All files have the same columns, but in 20-21 "Score 1e kans" (first-attempt score) and "Score 2e kans" (second-attempt score) weren't used. You can fill them in, though: "Score na juni" (score after June) is the "Score 1e kans" and "Score September" is the "Score 2e kans". The column "Score" is always the final score, which is either:

* the highest of "Score 1e kans" and "Score 2e kans", or
* a score from an earlier year: it used to be possible for a student to keep their highest score, even if they got it the year before.
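The "highest of both attempts" rule can be expressed with the best-to-worst letter order that is used for the grade categoricals later in this notebook. The sketch below is not part of the original exercise: `best_grade` is a hypothetical helper, it assumes a missing attempt is `None` or NaN, and it ignores scores carried over from an earlier year.

```python
# Letters from best to worst, matching the ordered categorical used later on
ORDER = ['A', 'B', 'PASS', 'C', 'D', 'F', 'FAIL', 'ZERO']
RANK = {grade: position for position, grade in enumerate(ORDER)}


def best_grade(first, second):
    """Return the better of two attempts, skipping missing ones (None/NaN)."""
    candidates = [g for g in (first, second) if g in RANK]
    if not candidates:
        return None
    # A lower position in ORDER means a better grade
    return min(candidates, key=RANK.__getitem__)


print(best_grade('F', 'C'))       # second attempt was better
print(best_grade('B', None))      # no second attempt needed
print(best_grade('PASS', 'FAIL'))
```

Comparing this value with "Score" row by row could help you spot grades that were carried over from an earlier year, although a mismatch does not prove that on its own.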

Copy the June and September scores to the "Score 1e kans" and "Score 2e kans" columns. You can do this quick and dirty (always overwrite) or make sure you don't overwrite any values already in those columns.

In [None]:
#DELETE

# Quick and dirty:
# df_20_21["Score 1e kans"] = df_20_21["Score na juni"]
# df_20_21["Score 2e kans"] = df_20_21["Score September"]

# Long and clean: only fill cells that are still empty ("#" or " #")
empty_1 = df_20_21["Score 1e kans"].isin(["#", " #"])
df_20_21.loc[empty_1, "Score 1e kans"] = df_20_21.loc[empty_1, "Score na juni"]
empty_2 = df_20_21["Score 2e kans"].isin(["#", " #"])
df_20_21.loc[empty_2, "Score 2e kans"] = df_20_21.loc[empty_2, "Score September"]

df_20_21.head()
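The same "only fill empty cells" idea fits in one line per column with `Series.mask`, which replaces values wherever a condition holds and takes the replacements from another, index-aligned Series. A sketch on a small hypothetical frame; `fill_if_empty` is an illustrative helper name, not something from the notebook.

```python
import pandas as pd

# Hypothetical stand-in for df_20_21
toy = pd.DataFrame({
    "Score na juni": ['C', 'F', 'A'],
    "Score September": ['#', 'D', '#'],
    "Score 1e kans": ['#', ' #', 'B'],
    "Score 2e kans": ['#', '#', '#'],
})


def fill_if_empty(target, source):
    """Take the value from `source` wherever `target` holds '#' or ' #'."""
    return target.mask(target.isin(['#', ' #']), source)


toy["Score 1e kans"] = fill_if_empty(toy["Score 1e kans"], toy["Score na juni"])
toy["Score 2e kans"] = fill_if_empty(toy["Score 2e kans"], toy["Score September"])
print(toy)
```

The existing "B" in the first row of "Score 1e kans" is left alone. Cells that are empty in both columns simply receive the source's "#", which the cleanup step further down turns into an empty value anyway.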

Next up is to join the four dataframes. Make sure to add a column called "Year" so you know which year a grade is from! Check the result by looking up a student and confirming they have courses in multiple years.

In [None]:
#DELETE

df_20_21["Year"] = "2020-2021"
df_21_22["Year"] = "2021-2022"
df_22_23["Year"] = "2022-2023"
df_23_24["Year"] = "2023-2024"

df = pd.concat([df_20_21, df_21_22, df_22_23, df_23_24], ignore_index=True, sort=False)

# df.head()

df.loc[df["Student"] == 544120].head(20)
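Instead of adding "Year" to every frame by hand, `pd.concat` can label the pieces itself when you pass a dict (or use `keys=`). The sketch below does that on two tiny hypothetical frames and then counts in how many years each student appears, which is a quick way to find students to check.

```python
import pandas as pd

# Hypothetical mini versions of the yearly frames
frames = {
    "2020-2021": pd.DataFrame({"Student": [111, 222], "Score": ['A', 'C']}),
    "2021-2022": pd.DataFrame({"Student": [111, 333], "Score": ['B', 'F']}),
}

# The dict keys become the outer index level; turn it into a "Year" column
df = (pd.concat(frames, names=["Year", None])
        .reset_index(level="Year")
        .reset_index(drop=True))

# Students that show up in more than one year
years_per_student = df.groupby("Student")["Year"].nunique()
multi_year = years_per_student[years_per_student > 1].index.tolist()
print(multi_year)
```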

A quick word about tolerances. The possible values are:
* "#": no tolerance needed.
* "AT": the student chose to apply tolerance.
* "TT": the school applied tolerance automatically so the student could graduate. This only happens in the final year.

In [None]:
df["Tolerantie"].unique()
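To see how often each kind of tolerance is used per year, a cross-tabulation works well. The sketch below runs on hypothetical data; it strips the stray space from " #" first, because otherwise "#" and " #" would be counted as two different values.

```python
import pandas as pd

# Hypothetical rows with the same column names as the real data
toy = pd.DataFrame({
    "Year": ["2020-2021", "2020-2021", "2021-2022", "2021-2022"],
    "Tolerantie": ["#", "AT", " #", "TT"],
})

# Normalise '#' and ' #' (no tolerance needed) before counting
tol = toy["Tolerantie"].str.strip().replace("#", "none")
counts = pd.crosstab(toy["Year"], tol)
print(counts)
```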

Some more cleanup:

* You don't need "Score Januari", "Score Juni", "Score na juni" and "Score September" anymore.
* "#" is actually an empty value. Note that "#" sometimes appears as " #" (with a leading space).
* "Score 1e kans", "Score 2e kans" and "Score" should be ordered categoricals.
* Tolerances should also be categoricals, but unordered.


In [None]:
df["Score"].unique()

In [None]:
# DELETE

from pandas.api.types import CategoricalDtype

df.drop(["Score Januari", "Score Juni", "Score na juni", "Score September"], axis=1, inplace=True, errors='ignore')
df.replace({' #': None,'#': None}, inplace=True)

grade = CategoricalDtype(categories=['A', 'B', 'PASS', 'C', 'D', 'F', 'FAIL', 'ZERO'], ordered=True)

for f in ["Score 1e kans","Score 2e kans","Score"]:
    df[f] = df[f].astype(grade)
    
tol = CategoricalDtype(categories=['AT', 'TT'], ordered=False)
df["Tolerantie"] = df["Tolerantie"].astype(tol)

df.head()
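Because the grade categories are listed from best to worst, "better" means "earlier" in category order. The sketch below, on a few made-up values, shows what that implies for comparisons and for `min`/`max`: `<=` selects grades that are at least as good, and `min()` gives the best grade.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

grade = CategoricalDtype(categories=['A', 'B', 'PASS', 'C', 'D', 'F', 'FAIL', 'ZERO'], ordered=True)

# Hypothetical scores; None becomes a missing value
scores = pd.Series(['C', 'A', 'F', None, 'PASS'], dtype=grade)

# "At least a C": PASS sorts before C, so it is included; missing values compare as False
at_least_c = scores <= 'C'

best = scores.min()   # best grade (missing values skipped)
worst = scores.max()  # worst grade
print(at_least_c.tolist(), best, worst)
```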

So how hard is the Programming Essentials course? Its z-code is "Z25499". Show a pretty graph that counts the final grades.

![](files/2023-09-20-12-12-43.png)

In [None]:
# DELETE

import seaborn as sns

sns.countplot(x="Score", data=df.loc[df["ID Opleidingsond."] == "Z25499"])

Split up by year?

In [None]:
# DELETE

import seaborn as sns

sns.countplot(x="Score", data=df.loc[df["ID Opleidingsond."] == "Z25499"], hue="Year")
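Raw counts are hard to compare between years when the number of students differs. One option is to look at the share of each grade per year instead. A sketch with `pd.crosstab(..., normalize="index")` on hypothetical data:

```python
import pandas as pd

# Hypothetical final grades for one course over two years
toy = pd.DataFrame({
    "Year": ["2022-2023"] * 4 + ["2023-2024"] * 2,
    "Score": ['A', 'C', 'C', 'F', 'B', 'F'],
})

# normalize="index" turns each row (year) into fractions that sum to 1
shares = pd.crosstab(toy["Year"], toy["Score"], normalize="index")
print(shares)
```

`shares.plot.bar()` then draws one group of bars per year, at the cost of hiding how many students each year had.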

Maybe compare it to Networking Essentials (Z25068)?

In [None]:
# DELETE

import seaborn as sns

sns.countplot(x="Score", data=df.loc[df["ID Opleidingsond."].isin(["Z25499", "Z25068"])], hue="ID Opleidingsond.")
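A single number per course can make the comparison easier, for example the pass rate. The sketch below assumes a score of 10 or more counts as passed, which corresponds to the letters A, B and C in `replace_grade`, plus PASS for pass/fail courses. The data are hypothetical.

```python
import pandas as pd

PASSING = {'A', 'B', 'PASS', 'C'}  # assumption: 10/20 or more counts as passed

# Hypothetical final grades for the two courses
toy = pd.DataFrame({
    "ID Opleidingsond.": ["Z25499"] * 4 + ["Z25068"] * 4,
    "Score": ['A', 'D', 'F', 'C', 'B', 'C', 'C', 'F'],
})

# Mean of a boolean Series = fraction of True values
pass_rate = (toy["Score"].isin(PASSING)
             .groupby(toy["ID Opleidingsond."])
             .mean())
print(pass_rate)
```

Missing scores count as "not passed" here; drop them first if you only want to look at students who actually took the exam.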

The rest is up to you. Compare courses? Look for the best student ever? And eventually, try to predict where the students who started in 2023-2024 will end up, based on the data of the other students.
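What "best student ever" means is up to you. As a starting point, the sketch below uses one possible definition, the number of A grades, on hypothetical data:

```python
import pandas as pd

# Hypothetical grades for three students
toy = pd.DataFrame({
    "Student": [1, 1, 1, 2, 2, 3],
    "Score": ['A', 'A', 'B', 'A', 'C', 'B'],
})

# Count A grades per student (summing booleans counts the True values)
a_counts = (toy["Score"].eq('A')
            .groupby(toy["Student"])
            .sum()
            .sort_values(ascending=False))
top_student = a_counts.index[0]
print(a_counts, top_student)
```

Students take different numbers of courses, so replacing `.sum()` with `.mean()` (the share of A grades) may give a fairer ranking.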