In [1]:
from functions import *

# Courses

---

The courses dataframe has information for all modules and their presentations.

In [2]:
# show head of courses dataframe
courses.head()

Unnamed: 0,code_module,code_presentation,module_presentation_length
0,AAA,2013J,268
1,AAA,2014J,269
2,BBB,2013J,268
3,BBB,2014J,262
4,BBB,2013B,240


---

## Courses Contents

* **code_module**: The code name of the course. Modules are identified by three capital letters that run sequentially from AAA to GGG.
* **code_presentation**: The presentation code combines the year with a letter for the starting month: B for February and J for October. For example, 2013B is the presentation that started in February 2013.
* **module_presentation_length**: The length of the module presentation in days.
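
The presentation naming scheme described above can be decoded programmatically. The following is a minimal sketch, assuming every code is a four-digit year followed by `B` or `J`; the helper name `parse_presentation` is hypothetical and is not part of the notebook's `functions` module.

```python
# Hypothetical helper: split a presentation code into its year and start month.
START_MONTHS = {"B": "February", "J": "October"}


def parse_presentation(code):
    """Return (year, start_month) for a code such as '2013B'."""
    year, letter = int(code[:4]), code[4:]
    if letter not in START_MONTHS:
        raise ValueError(f"unexpected presentation letter in {code!r}")
    return year, START_MONTHS[letter]


print(parse_presentation("2013B"))  # (2013, 'February')
print(parse_presentation("2014J"))  # (2014, 'October')
```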

---

## Courses Information

**Size**

In [3]:
# get row & column count for courses dataframe
get_size(courses)

Unnamed: 0,Count
Columns,3
Rows,22


In [4]:
md(f'''
Courses has {len(courses.columns)} columns and {len(courses)} rows
''')


Courses has 3 columns and 22 rows
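
The `get_size` helper comes from the `functions` module and its implementation is not shown here. A helper with similar output could plausibly be written as below; the name `get_size_sketch` and the stand-in frame are assumptions for illustration only.

```python
import pandas as pd


def get_size_sketch(df):
    """Return a one-column dataframe with the column and row counts of df."""
    n_rows, n_cols = df.shape
    return pd.DataFrame({"Count": [n_cols, n_rows]}, index=["Columns", "Rows"])


# Stand-in frame built from the first rows shown in courses.head()
sample = pd.DataFrame({
    "code_module": ["AAA", "AAA", "BBB"],
    "code_presentation": ["2013J", "2014J", "2013J"],
    "module_presentation_length": [268, 269, 268],
})
print(get_size_sketch(sample))
```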


**Data Types**

In [5]:
# show data types for courses dataframe
get_dtypes(courses)

index,Type
code_module,object
code_presentation,object
module_presentation_length,int64


The `object` dtype in pandas is a catch-all that can hold any Python object and can behave unexpectedly, so we convert the `object` columns to the dedicated `string` dtype.

In [6]:
# convert object columns to better-supported dtypes; convert_integer=False leaves int64 columns unchanged
courses = courses.convert_dtypes(convert_integer=False)
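
The effect of this conversion can be checked on a small frame. The sketch below uses two rows taken from `courses.head()` as a stand-in for the full dataframe; the variable names are illustrative.

```python
import pandas as pd

# Stand-in frame with one text column and one integer column
sample = pd.DataFrame({
    "code_module": ["AAA", "BBB"],
    "module_presentation_length": [268, 262],
})

# Text columns move to the pandas string dtype; with convert_integer=False
# the integer column keeps its NumPy int64 dtype instead of nullable Int64.
converted = sample.convert_dtypes(convert_integer=False)

print(sample.dtypes)
print(converted.dtypes)
```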

**Null Values**

In [7]:
# show null values for columns in courses
null_vals(courses)

index,Null Values
code_module,0
code_presentation,0
module_presentation_length,0
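
`null_vals` is another helper from `functions` whose source is not shown. A comparable per-column null count can be obtained with `isna().sum()`, as in this sketch; `null_vals_sketch` and the sample frame (which deliberately contains one missing value) are assumptions for illustration.

```python
import pandas as pd


def null_vals_sketch(df):
    """Count missing values per column and return them as a dataframe."""
    return df.isna().sum().to_frame(name="Null Values")


# Illustrative frame with one missing module code (not real course data)
sample = pd.DataFrame({
    "code_module": ["AAA", None],
    "module_presentation_length": [268, 269],
})
print(null_vals_sketch(sample))
```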


**Duplicate Values**

In [8]:
# show duplicate values in courses if any
get_dupes(courses)

There are no Duplicate Values
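
The `get_dupes` helper is likewise defined in `functions` and not shown. One way such a check might work is sketched below with `DataFrame.duplicated`; the function name and return convention are assumptions, not the notebook's actual implementation.

```python
import pandas as pd


def get_dupes_sketch(df):
    """Return all fully duplicated rows, or a message if there are none."""
    dupes = df[df.duplicated(keep=False)]  # keep=False marks every copy
    if dupes.empty:
        return "There are no Duplicate Values"
    return dupes


# Stand-in frame built from rows shown in courses.head()
sample = pd.DataFrame({
    "code_module": ["AAA", "AAA", "BBB"],
    "code_presentation": ["2013J", "2014J", "2013J"],
    "module_presentation_length": [268, 269, 268],
})
print(get_dupes_sketch(sample))

# Appending a copy of the first row produces a detectable duplicate
with_dupe = pd.concat([sample, sample.iloc[[0]]], ignore_index=True)
print(get_dupes_sketch(with_dupe))
```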

**Unique Counts**

In [28]:
# get counts for the unique values in courses columns
count_unique(courses)

index,Count
code_module,7
code_presentation,4
module_presentation_length,7


In [29]:
# store the number of unique modules
mod_count = courses['code_module'].nunique()
# store the number of unique presentations
presentation_count = courses['code_presentation'].nunique()
# store the minimum module length in days
min_mod_count = courses['module_presentation_length'].min()
# store the maximum module length in days
max_mod_count = courses['module_presentation_length'].max()
# store the average module length in days
avg_mod_count = round(courses['module_presentation_length'].mean(), 1)

md(f'''There are {mod_count} unique modules delivered over {presentation_count} presentations''')

There are 7 unique modules delivered over 4 presentations

**Unique Categorical Values**

In [10]:
# get the unique categorical values in courses
unique_vals(courses)

index,Values
code_module,"['AAA', 'BBB', 'CCC', 'DDD', 'EEE', 'FFF', 'GGG']"
code_presentation,"['2013J', '2014J', '2013B', '2014B']"


Here we can see the seven modules, AAA through GGG, and the four presentations in which they were delivered.

In [13]:
# making a crosstab to map each code module to its presentation
modules_dates = pd.crosstab(index=courses['code_presentation'], columns=courses['code_module'])
modules_dates = modules_dates.replace(1, pd.Series(modules_dates.columns, modules_dates.columns))
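# hide the column header row (Styler.hide_columns was deprecated in pandas 1.4 and removed in 2.0)
modules_dates.style.hide(axis="columns")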


code_presentation,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7
2013B,0,BBB,0,DDD,0,FFF,0
2013J,AAA,BBB,0,DDD,EEE,FFF,GGG
2014B,0,BBB,CCC,DDD,EEE,FFF,GGG
2014J,AAA,BBB,CCC,DDD,EEE,FFF,GGG


The table above shows which modules were offered in each presentation; a 0 marks a module that was not offered in that presentation.
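
The same breakdown can also be produced without the `replace` step by grouping module codes per presentation. This is an alternative sketch, not the notebook's method; the pairs below are transcribed from the crosstab output, and the variable names are illustrative.

```python
import pandas as pd

# Modules offered in each presentation, transcribed from the crosstab output
offered = {
    "2013B": ["BBB", "DDD", "FFF"],
    "2013J": ["AAA", "BBB", "DDD", "EEE", "FFF", "GGG"],
    "2014B": ["BBB", "CCC", "DDD", "EEE", "FFF", "GGG"],
    "2014J": ["AAA", "BBB", "CCC", "DDD", "EEE", "FFF", "GGG"],
}

# One row per (presentation, module) pair, mirroring the courses dataframe
pairs = pd.DataFrame(
    [(presentation, module) for presentation, mods in offered.items() for module in mods],
    columns=["code_presentation", "code_module"],
)

# Join the module codes of each presentation into one readable string
modules_by_presentation = (
    pairs.groupby("code_presentation")["code_module"]
    .agg(lambda s: ", ".join(sorted(s)))
)
print(modules_by_presentation)
```

The pair count matches the 22 rows reported for `courses`, one row per module presentation.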

**Numerical Values**

In [11]:
# show statistical breakdown of numerical values in courses
courses.describe().round(1)

Unnamed: 0,module_presentation_length
count,22.0
mean,255.5
std,13.7
min,234.0
25%,241.0
50%,261.5
75%,268.0
max,269.0


In [14]:
md(f"""
* Modules range from {min_mod_count} to {max_mod_count} days in length.
* The average module is {avg_mod_count} days.
""")


* Modules range from 234 to 269 days in length.
* The average module is 255.5 days.
