In [1]:
%%capture
from functions import *

@register_cell_magic
def markdown(line, cell):
    return md(cell.format(**globals()))

# Assessments

The assessments dataframe contains information about the unique assessments in each code module and presentation.

assessments.head()

---

## Assessments Contents

* **code_module**: The code name of the module (course) the assessment was held for.
* **code_presentation**: The code name of the presentation (the specific run of the module) the assessment was held for.
* **id_assessment**: The unique identifier of the assessment.
* **assessment_type**: The kind of assessment.
    - There are three assessment types:
        * TMA: Tutor Marked Assessment
        * CMA: Computer Marked Assessment
        * Exam: The Final Exam
* **date**: The number of days after the start of the course on which the assessment took place.
* **weight**: The weight of the assessment. Exams should have a weight of 100, while the rest of the assessments should add up to 100 in total.
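
The weight rule described above can be checked per module presentation. The following is a minimal sketch on a made-up toy dataframe, not the real `assessments` data; the `weight_group` column and the `weight_totals` and `irregular` names are introduced here only for illustration.

```python
import pandas as pd

# Toy stand-in for the assessments dataframe (values are invented for illustration).
toy = pd.DataFrame({
    "code_module": ["AAA"] * 4,
    "code_presentation": ["2013J"] * 4,
    "assessment_type": ["TMA", "TMA", "CMA", "Exam"],
    "weight": [40.0, 60.0, 0.0, 100.0],
})

# Label each row as exam or non-exam, then total the weights per presentation.
toy["weight_group"] = toy["assessment_type"].eq("Exam").map(
    {True: "exam", False: "non_exam"}
)
weight_totals = (
    toy.groupby(["code_module", "code_presentation", "weight_group"])["weight"]
    .sum()
    .unstack("weight_group")
)

# Presentations where either total differs from 100 deserve a closer look.
irregular = weight_totals[
    (weight_totals["exam"] != 100) | (weight_totals["non_exam"] != 100)
]
print(weight_totals)
print(irregular)
```

Running the same grouping on the real dataframe would show which presentations, if any, depart from the rule.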

**Size**

In [2]:
md(f'''* Number of Rows: {len(assessments)}
* Number of Columns: {len(assessments.columns)}''')

* Number of Rows: 206
* Number of Columns: 6

**Data Types**

In [3]:
assessments.dtypes

code_module           object
code_presentation     object
id_assessment          int64
assessment_type       object
date                 float64
weight               float64
dtype: object

* id_assessment is an identifier (a categorical value) rather than a numeric quantity, so it should be converted to object.

In [4]:
# id_assessment is already int64, so a single cast to object is enough
assessments = assessments.astype({'id_assessment': object})
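
One practical effect of this conversion is that identifier columns drop out of numeric summaries. A small sketch on a made-up dataframe (the values are illustrative, not taken from the dataset):

```python
import pandas as pd

# Toy dataframe with an integer identifier and a numeric weight column.
toy = pd.DataFrame({
    "id_assessment": [1752, 1753, 1754],
    "weight": [10.0, 20.0, 100.0],
})

# While the identifier is int64, describe() treats it as a numeric column.
before = list(toy.describe().columns)

# After casting to object, describe() summarises only the true numeric columns.
toy = toy.astype({"id_assessment": object})
after = list(toy.describe().columns)

print(before)
print(after)
```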

**Null Values**

In [5]:
# count the null values in each column
assessments.isnull().sum()

code_module           0
code_presentation     0
id_assessment         0
assessment_type       0
date                 11
weight                0
dtype: int64

* We have 11 null data points for assessment date. The documentation of this dataset states that if the exam date is missing, the exam takes place at the end of the last presentation week. We can find this information in the courses dataframe.

In [6]:
# fill each missing date with the length of the matching module presentation
for index, row in assessments[assessments['date'].isna()].iterrows():
    match = (courses['code_module'] == row['code_module']) & (courses['code_presentation'] == row['code_presentation'])
    # .iloc[0] pulls the scalar length out of the one-row selection
    assessments.at[index, 'date'] = courses.loc[match, 'module_presentation_length'].iloc[0]

# reprinting to ensure it worked
assessments.isnull().sum()

code_module          0
code_presentation    0
id_assessment        0
assessment_type      0
date                 0
weight               0
dtype: int64
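
The same fill can also be done without a row loop by merging on the two key columns. This is a sketch on made-up toy data, assuming `courses` has exactly one row per module presentation; the `toy_assessments` and `toy_courses` names and their values are illustrative only.

```python
import pandas as pd

# Toy stand-ins for the assessments and courses dataframes (invented values).
toy_assessments = pd.DataFrame({
    "code_module": ["AAA", "AAA", "BBB"],
    "code_presentation": ["2013J", "2013J", "2014B"],
    "assessment_type": ["TMA", "Exam", "Exam"],
    "date": [19.0, None, None],
})
toy_courses = pd.DataFrame({
    "code_module": ["AAA", "BBB"],
    "code_presentation": ["2013J", "2014B"],
    "module_presentation_length": [268, 234],
})

keys = ["code_module", "code_presentation"]

# A left merge keeps every assessment row in its original order;
# validate= raises if a presentation appears more than once in courses.
merged = toy_assessments.merge(
    toy_courses, on=keys, how="left", validate="many_to_one"
)

# Keep existing dates and fall back to the presentation length where missing.
toy_assessments["date"] = (
    merged["date"].fillna(merged["module_presentation_length"]).to_numpy()
)
print(toy_assessments)
```

The positional assignment via `to_numpy()` is a deliberate choice: the merge result has a fresh index, so assigning by position avoids misalignment when the original dataframe does not use a default index.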

* All 11 missing dates have been filled in from the courses dataframe. This leaves us with no null data in assessments.

**Unique Counts**

In [7]:
assessments.nunique()

code_module            7
code_presentation      4
id_assessment        206
assessment_type        3
date                  78
weight                24
dtype: int64

**Unique Categorical Values**

In [8]:
unique_vals(assessments)

| Column | Values |
|---|---|
| code_module | ['AAA' 'BBB' 'CCC' 'DDD' 'EEE' 'FFF' 'GGG'] |
| code_presentation | ['2013J' '2014J' '2013B' '2014B'] |
| assessment_type | ['TMA' 'Exam' 'CMA'] |


**Duplicate Values**

In [9]:
duplicate_vals(assessments)

NameError: name 'duplicate_vals' is not defined
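
The call above fails because `duplicate_vals` is not defined in the session; it is presumably meant to come from the `functions` module, whose contents are not shown here. As a stand-in, a duplicate check can be sketched with pandas' built-in `DataFrame.duplicated`. The `count_duplicates` helper below is hypothetical and is not the author's `duplicate_vals`; the toy data is invented.

```python
import pandas as pd

def count_duplicates(df, subset=None):
    """Hypothetical helper: number of rows that repeat an earlier row.

    With subset=None all columns are compared; otherwise only the listed ones.
    """
    return int(df.duplicated(subset=subset).sum())

# Toy dataframe with one fully repeated row (values are made up).
toy = pd.DataFrame({
    "id_assessment": [1, 2, 2, 3],
    "assessment_type": ["TMA", "CMA", "CMA", "Exam"],
})

full_dupes = count_duplicates(toy)                          # whole-row duplicates
id_dupes = count_duplicates(toy, subset=["id_assessment"])  # repeated identifiers
print(full_dupes, id_dupes)
```

Checking `subset=["id_assessment"]` on the real dataframe would confirm that the identifier is unique, which is consistent with the unique count of 206 reported earlier for 206 rows.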

**Statistics**

In [34]:
assessments.describe()

| | date | weight |
|---|---|---|
| count | 206.0 | 206.0 |
| mean | 150.966019 | 20.873786 |
| std | 78.161395 | 30.384224 |
| min | 12.0 | 0.0 |
| 25% | 81.25 | 0.0 |
| 50% | 159.0 | 12.5 |
| 75% | 227.0 | 24.25 |
| max | 269.0 | 100.0 |
