# Lab 8 - Combining Attendance and Practice Quiz Attempts

## Background

In all of my classes, I use an attendance quiz to track student attendance.  Students take multiple attempts at the same quiz, one per class session, so the number of attempts a student makes on this quiz equals the number of class sessions that student has attended.
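As a minimal sketch of this idea, assuming a table with a `UserName` column like the one in the D2L exports used in this lab (the rows and the second username below are made up for illustration), counting rows per student gives the number of sessions attended:

```python
import pandas as pd

# Made-up attempts: one row per attendance-quiz attempt.
# "zz0000zz" is a hypothetical username used only for this illustration.
attempts = pd.DataFrame({
    "UserName": ["au9747cp", "au9747cp", "au9747cp", "zz0000zz", "zz0000zz"],
    "Attempt #": [1, 2, 3, 1, 2],
})

# Each attempt corresponds to one class session, so the row count per
# student is the number of sessions that student attended.
sessions_attended = attempts.groupby("UserName").size()
print(sessions_attended)
```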

In some, but not all, of my courses I also provide practice quizzes that students can use to prepare for actual quizzes and tests.  These quizzes pull questions randomly from a bank of questions, allow students unlimited attempts, and do not count toward the students' grades.

In this lab, you will collect simulated data from mock classes into one table; in the next lab, you will summarize these data.

## Tasks 

The files found in `attendance_example.zip` contain (made-up and random) examples of the D2L files that I use to summarize my attendance quizzes and practice quizzes.
Make sure you download `attendance_example.zip` to the `data` folder inside the course repository, then unzip the file.

1. Use `glob` to find the path to all csv files.
2. Write functions that use regular expressions to extract the class name, quiz type (`Attendance` or `Practice`), and the module number (if the file is a practice quiz).
3. Write a function that takes a path as an argument and returns a dataframe that contains:
    * All of the original columns
    * A Class column that holds the class identifier
    * A Category column that contains the quiz type
    * A Module column that (a) contains the module number for a practice quiz, or (b) is otherwise empty.
4. Use a loop, `union`, and the accumulator pattern to load all of the data into a single table.
5. Write the resulting table to a csv file.

In [1]:
from glob import glob
files = glob('./data/attendance_example/*/*.csv')
files

['./data/attendance_example/dsci494s7/Attendance Quiz - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 1 - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 2 - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 3 - User Attempts.csv',
 './data/attendance_example/dsci494s7/Practice Quiz - Module 4 - User Attempts.csv',
 './data/attendance_example/stat180s18/Attendance Quiz - User Attempts.csv',
 './data/attendance_example/stat491s1/Attendance Quiz - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 1 - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 2 - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 3 - User Attempts.csv',
 './data/attendance_example/stat491s1/Practice Quiz - Module 4 - User Attempts.csv']
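The listing above happens to come back in alphabetical order, but `glob` returns matches in arbitrary order. The sketch below builds a throwaway folder tree that mimics the layout above (the temporary directory is an assumption made so the example can run anywhere) and sorts the matches to get a reproducible listing:

```python
import os
import tempfile
from glob import glob

# Build a throwaway folder tree that mimics the layout above.
root = tempfile.mkdtemp()
for course in ["stat491s1", "dsci494s7"]:
    os.makedirs(os.path.join(root, course))
    for name in ["Practice Quiz - Module 1 - User Attempts.csv",
                 "Attendance Quiz - User Attempts.csv"]:
        open(os.path.join(root, course, name), "w").close()

# glob makes no promise about order, so sort for a reproducible listing.
found = sorted(glob(os.path.join(root, "*", "*.csv")))

# Show paths relative to the temporary root, with forward slashes.
relative = [os.path.relpath(p, root).replace(os.sep, "/") for p in found]
print(relative)
```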

In [2]:
import re
CLASS_NAME_RE = re.compile(r'^\./data/attendance_example/([a-z]*[0-9]*).*\.csv$')
class_name = lambda p: CLASS_NAME_RE.match(p).group(1) 
class_names = lambda files: [class_name(p) for p in files]
class_names(files)

['dsci494',
 'dsci494',
 'dsci494',
 'dsci494',
 'dsci494',
 'stat180',
 'stat491',
 'stat491',
 'stat491',
 'stat491',
 'stat491']

In [3]:
QUIZ_NAME_RE = re.compile(r'^\./data/attendance_example/.*/([A-Z][a-z]*).*\.csv$')
quiz_name = lambda p: QUIZ_NAME_RE.match(p).group(1) 
quiz_names = lambda files: [quiz_name(p) for p in files]
quiz_names(files)

['Attendance',
 'Practice',
 'Practice',
 'Practice',
 'Practice',
 'Attendance',
 'Attendance',
 'Practice',
 'Practice',
 'Practice',
 'Practice']

In [72]:
# Capture the digits that follow "Module" in a practice quiz file name.
MODULE_NAME_RE = re.compile(r'^\./data/attendance_example/.*/Practice.*Module ([0-9]+).*\.csv$')

def module_name(path):
    """Return the module number as a string, or '' when the path is not a practice quiz."""
    m = MODULE_NAME_RE.match(path)
    return m.group(1) if m is not None else ''

module_names = lambda files: [module_name(p) for p in files if MODULE_NAME_RE.match(p)]
module_names(files)


['1', '2', '3', '4', '1', '2', '3', '4']
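The three extractions above can also be done in one pass. The following is a sketch only, assuming every path follows the naming seen in the file listing; `PATH_RE` and `parse_path` are hypothetical names, not part of the lab's required solution. It uses named groups and an optional group for the module, and raises an error on an unexpected path instead of failing on `None`:

```python
import re

# One pattern with named groups; the module part is optional.
# Assumes paths look like the ones in the file listing for this lab.
PATH_RE = re.compile(
    r'^\./data/attendance_example/'
    r'(?P<cls>[a-z]+[0-9]+)[^/]*/'           # class id, then the section suffix
    r'(?P<category>Attendance|Practice) Quiz'
    r'(?: - Module (?P<module>[0-9]+))?'     # present only for practice quizzes
    r' - User Attempts\.csv$'
)

def parse_path(path):
    """Return (class, category, module); module is '' when absent."""
    m = PATH_RE.match(path)
    if m is None:
        raise ValueError(f"unrecognized path: {path}")
    return m.group('cls'), m.group('category'), m.group('module') or ''

attendance = parse_path('./data/attendance_example/dsci494s7/Attendance Quiz - User Attempts.csv')
practice = parse_path('./data/attendance_example/stat491s1/Practice Quiz - Module 3 - User Attempts.csv')
print(attendance)
print(practice)
```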

In [35]:
import pandas as pd
from dfply import *
df = {path:pd.read_csv(path) for path in files}
df['./data/attendance_example/dsci494s7/Attendance Quiz - User Attempts.csv'].head()

Unnamed: 0,Org Defined ID,UserName,FirstName,LastName,Attempt #,Score,Out Of,Attempt_Start,Attempt_End,Percent
0,14460432,au9747cp,Jericho,Greer,1,1,1,2019-01-14 14:00:00,2019-01-14 14:06:00,100 %
1,14460432,au9747cp,Jericho,Greer,2,1,1,2019-01-16 14:00:00,2019-01-16 14:08:00,100 %
2,14460432,au9747cp,Jericho,Greer,3,1,1,2019-01-18 14:00:00,2019-01-18 14:05:00,100 %
3,14460432,au9747cp,Jericho,Greer,4,1,1,2019-01-23 14:00:00,2019-01-23 14:06:00,100 %
4,14460432,au9747cp,Jericho,Greer,5,1,1,2019-01-25 14:00:00,2019-01-25 14:10:00,100 %


In [73]:
from more_dfply import ifelse
def createdf(path):
    """Return the table for `path` with Class, Category, and Module columns added."""
    return (df[path]
            >> mutate(Class = class_name(path),
                      Category = quiz_name(path))
            >> mutate(Module = ifelse(X.Category == 'Practice', module_name(path), '')))

In [74]:
createdf('./data/attendance_example/dsci494s7/Attendance Quiz - User Attempts.csv').head()

Unnamed: 0,Org Defined ID,UserName,FirstName,LastName,Attempt #,Score,Out Of,Attempt_Start,Attempt_End,Percent,Class,Category,Module
0,14460432,au9747cp,Jericho,Greer,1,1,1,2019-01-14 14:00:00,2019-01-14 14:06:00,100 %,dsci494,Attendance,
1,14460432,au9747cp,Jericho,Greer,2,1,1,2019-01-16 14:00:00,2019-01-16 14:08:00,100 %,dsci494,Attendance,
2,14460432,au9747cp,Jericho,Greer,3,1,1,2019-01-18 14:00:00,2019-01-18 14:05:00,100 %,dsci494,Attendance,
3,14460432,au9747cp,Jericho,Greer,4,1,1,2019-01-23 14:00:00,2019-01-23 14:06:00,100 %,dsci494,Attendance,
4,14460432,au9747cp,Jericho,Greer,5,1,1,2019-01-25 14:00:00,2019-01-25 14:10:00,100 %,dsci494,Attendance,


In [75]:
createdf('./data/attendance_example/dsci494s7/Practice Quiz - Module 1 - User Attempts.csv').head()

Unnamed: 0,Org Defined ID,UserName,FirstName,LastName,Attempt #,Score,Out Of,Attempt_Start,Attempt_End,Percent,Class,Category,Module
0,14460432,au9747cp,Jericho,Greer,1,10,20,2019-01-28 15:26:00,2019-01-28 15:30:00,50 %,dsci494,Practice,1
1,14460432,au9747cp,Jericho,Greer,1,19,20,2019-01-27 15:25:00,2019-01-27 15:34:00,95 %,dsci494,Practice,1
2,14460432,au9747cp,Jericho,Greer,2,11,20,2019-01-27 15:29:00,2019-01-27 15:33:00,55 %,dsci494,Practice,1
3,14460432,au9747cp,Jericho,Greer,3,9,20,2019-01-27 15:37:00,2019-01-27 15:38:00,45 %,dsci494,Practice,1
4,14460432,au9747cp,Jericho,Greer,4,3,20,2019-01-27 15:43:00,2019-01-27 15:49:00,15 %,dsci494,Practice,1
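For comparison, the same labeling step can be sketched in plain pandas with `DataFrame.assign`. The helper name `add_labels` and the toy table are assumptions made for illustration; this is not the `dfply` approach the lab uses.

```python
import pandas as pd

def add_labels(frame, class_id, category, module=""):
    """Return a copy of `frame` with Class, Category, and Module columns added."""
    # assign broadcasts each scalar to every row and leaves `frame` untouched.
    return frame.assign(Class=class_id, Category=category, Module=module)

# Toy table standing in for one practice quiz file.
toy = pd.DataFrame({"UserName": ["au9747cp", "au9747cp"], "Score": [10, 19]})
labeled = add_labels(toy, "dsci494", "Practice", "1")
print(labeled)
```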


In [80]:
alldata = {f:createdf(f) for f in files}

In [84]:
alldata['./data/attendance_example/dsci494s7/Practice Quiz - Module 1 - User Attempts.csv'] >> union(alldata['./data/attendance_example/dsci494s7/Attendance Quiz - User Attempts.csv']) >> head 

Unnamed: 0,Org Defined ID,UserName,FirstName,LastName,Attempt #,Score,Out Of,Attempt_Start,Attempt_End,Percent,Class,Category,Module
0,14460432,au9747cp,Jericho,Greer,1,10,20,2019-01-28 15:26:00,2019-01-28 15:30:00,50 %,dsci494,Practice,1
1,14460432,au9747cp,Jericho,Greer,1,19,20,2019-01-27 15:25:00,2019-01-27 15:34:00,95 %,dsci494,Practice,1
2,14460432,au9747cp,Jericho,Greer,2,11,20,2019-01-27 15:29:00,2019-01-27 15:33:00,55 %,dsci494,Practice,1
3,14460432,au9747cp,Jericho,Greer,3,9,20,2019-01-27 15:37:00,2019-01-27 15:38:00,45 %,dsci494,Practice,1
4,14460432,au9747cp,Jericho,Greer,4,3,20,2019-01-27 15:43:00,2019-01-27 15:49:00,15 %,dsci494,Practice,1


In [94]:
col_names = ['Org Defined ID', 'UserName', 'FirstName', 'LastName', 'Attempt #', 'Score', 'Out Of', 'Attempt_Start', 'Attempt_End', 'Percent', 'Class', 'Category', 'Module']

In [95]:
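# Accumulator pattern: start from an empty table and append each file's rows.
# Note: this rebinds `df`, which previously held the dict of raw tables used by `createdf`.
df = pd.DataFrame(columns=col_names)
for d in alldata.values():
    df = df >> union_all(d)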

In [98]:
df.head()

Unnamed: 0,Org Defined ID,UserName,FirstName,LastName,Attempt #,Score,Out Of,Attempt_Start,Attempt_End,Percent,Class,Category,Module
0,14460432,au9747cp,Jericho,Greer,1,1,1,2019-01-14 14:00:00,2019-01-14 14:06:00,100 %,dsci494,Attendance,
1,14460432,au9747cp,Jericho,Greer,2,1,1,2019-01-16 14:00:00,2019-01-16 14:08:00,100 %,dsci494,Attendance,
2,14460432,au9747cp,Jericho,Greer,3,1,1,2019-01-18 14:00:00,2019-01-18 14:05:00,100 %,dsci494,Attendance,
3,14460432,au9747cp,Jericho,Greer,4,1,1,2019-01-23 14:00:00,2019-01-23 14:06:00,100 %,dsci494,Attendance,
4,14460432,au9747cp,Jericho,Greer,5,1,1,2019-01-25 14:00:00,2019-01-25 14:10:00,100 %,dsci494,Attendance,


In [100]:
df.shape

(3359, 13)
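A common plain-pandas alternative to growing a table inside a loop is to collect the pieces in a list and stack them once with `pd.concat`. The sketch below uses two made-up tables in place of the per-file results; it is offered for comparison and is not the `union`-based approach the task asks for.

```python
import pandas as pd

# Two made-up tables standing in for the per-file results.
parts = [
    pd.DataFrame({"UserName": ["a"], "Category": ["Attendance"], "Module": [""]}),
    pd.DataFrame({"UserName": ["a", "b"], "Category": ["Practice"] * 2, "Module": ["1"] * 2}),
]

# Stack all pieces in one call; ignore_index renumbers the rows from 0.
combined = pd.concat(parts, ignore_index=True)
print(combined.shape)
```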

In [101]:
df.to_csv("attendance_practice_attempts.csv", index = False)
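One thing to watch when reading this file back in the next lab: by default `pd.read_csv` turns empty fields into `NaN`, so a Module column that mixes digits and empty strings comes back as numeric. The sketch below, which uses a small made-up table and an in-memory buffer instead of the real file, shows one way to keep the column as text:

```python
import io
import pandas as pd

# Small made-up table with the same mix of module numbers and empty strings.
table = pd.DataFrame({"Category": ["Practice", "Attendance", "Practice"],
                      "Module": ["1", "", "2"]})

buffer = io.StringIO()
table.to_csv(buffer, index=False)

# Default read: the empty field becomes NaN and the column turns numeric.
buffer.seek(0)
default_read = pd.read_csv(buffer)

# Keep Module as text and keep empty strings empty.
buffer.seek(0)
text_read = pd.read_csv(buffer, dtype={"Module": str}, keep_default_na=False)
print(default_read["Module"].tolist(), text_read["Module"].tolist())
```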