# Import All Necessary Modules And Set Up Project

If any of the imports below fail, run the following command from inside the project directory:
```bash
$ python -m pip install -r requirements.txt
```
This installs all modules required by the project.

Using a virtual environment is recommended so that the project's dependencies do not conflict with package versions already installed on your system.

To get the project running, activate the virtual environment (if you use one), run the pip install command, and then launch Jupyter Lab from the project directory.

In [None]:
# Uncomment the following line to execute the pip install
# %pip install -r requirements.txt

In [None]:
import pandas as pd
import numpy as np

# Visualization
from matplotlib import pyplot as plt
import seaborn as sns

from measure_incremental_development.compute import calculate_mid, classify_snapshots


## Import Auxiliary Modules and Functions

In [None]:
from projectConstants import *

`projectConstants` defines various constants (namely, column names) that are used throughout the project.

In [None]:
from getSubmissionDataframes import *

`getSubmissionDataframes` contains the following functions:

*   `getFileInStudentSubmission`
*   `getStudentSubmission`
*   `filterDownToRunAndEdits`
*   `filterDownToRunAndEditsAndPastes`
*   `getStudentSubmissionRunsAndEdits`
*   `getFileInStudentSubmissionRunsAndEdits`

In [None]:
from reconstructSubmissions import *

`reconstructSubmissions` has the following functions:

*   `reconstructSingleFileDebugger`
*   `reconstructFinalFile`
*   `reconstructFileAtRunEvents`
*   `reconstructProjectAtRunEvents`

In [None]:
from viewReconstructions import *

`viewReconstructions` has the following functions:

*   `viewFinalReconstructedProject`
*   `viewReconstructedProjectStates`

In [None]:
from getStudentProjectInfo import *

`getStudentProjectInfo` has the following function:

*   `getStudentProjectList`

In [None]:
from filterOutBadReconstructions import *

`filterOutBadReconstructions` has the following functions:

*   `getFileReconstructionDF`
    *   Get the raw file reconstructions dataframe
*   `getProjectReconstructionDF`
    *   Same as above, but if any file in a project fails at reconstructing, the whole project is marked as a failed reconstruction
*   `getOnlyBadFileReconstructionsDF`
    *   Get a DF like `getFileReconstructionDF` returns, containing *ONLY* the bad file reconstructions
*   `getOnlyBadProjectReconstructions`
    *   Get a DF like `getProjectReconstructionDF` returns, containing *ONLY* the bad project reconstructions
*   `mergeKeystrokesWithFileReconstructions`
    *   Merge a keystroke dataframe with the *file* reconstruction df on `SubjectID, AssignmentID, CodeStateSection`
*   `mergeKeystrokesWithProjectReconstructions`
    *   Merge a keystroke dataframe with the *project* reconstruction df on `SubjectID, AssignmentID`
*   `getKeystrokesDFWithoutBadFileReconstructions`
    *   Filter down the keystroke dataframe to remove information related to *files* that reconstruct incorrectly
*   `getKeystrokesDFWithoutBadProjectReconstructions`
    *   Filter down the keystroke dataframe to remove information related to *projects* that reconstruct incorrectly

Unless file-level granularity of the keystroke data is needed, `getKeystrokesDFWithoutBadProjectReconstructions` will probably be the only function you need.
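
The file-level versus project-level distinction described above can be sketched with plain pandas. This is only an illustration, not the code in `filterOutBadReconstructions`: the `ReconstructedCorrectly` column, the `EventType` values, and all of the toy data are assumptions made for the example, while `SubjectID`, `AssignmentID`, and `CodeStateSection` are the merge keys named in the list.

```python
import pandas as pd

# Hypothetical file-level reconstruction results.
# `ReconstructedCorrectly` is an assumed column name, not taken from the project.
file_recon_df = pd.DataFrame({
    "SubjectID": ["Student1", "Student1", "Student2", "Student2"],
    "AssignmentID": ["Assign1", "Assign1", "Assign1", "Assign1"],
    "CodeStateSection": ["main.py", "utils.py", "main.py", "utils.py"],
    "ReconstructedCorrectly": [True, True, True, False],
})

# A project counts as correctly reconstructed only if every file in it is.
project_recon_df = (
    file_recon_df
    .groupby(["SubjectID", "AssignmentID"])["ReconstructedCorrectly"]
    .all()
    .reset_index()
)

# Toy keystroke data (illustrative event types).
keystrokes = pd.DataFrame({
    "SubjectID": ["Student1", "Student2", "Student2"],
    "AssignmentID": ["Assign1", "Assign1", "Assign1"],
    "EventType": ["File.Edit", "File.Edit", "Run.Program"],
})

# Merge on the project keys, then keep only rows from good projects.
merged = keystrokes.merge(project_recon_df, on=["SubjectID", "AssignmentID"], how="inner")
clean_keystrokes = merged[merged["ReconstructedCorrectly"]].drop(columns="ReconstructedCorrectly")

print(clean_keystrokes)
```

In this toy data, `Student2` has one failing file, so every keystroke row for that project is dropped, including rows from the file that reconstructed correctly.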

NOTE: The reconstruction data used for these functions was generated by the `checkSubmissions.sh` script and `determineReconstructionFailures.ipynb` notebook.

In [None]:
from midScoreFunctions import *

`midScoreFunctions` has the following functions:

*   `remove_empty_at_start`
*   `get_scores`
*   `get_mid_score_row`
*   `get_mid_score_all`

In [None]:
from timeBetweenRuns import *

`timeBetweenRuns` has the following functions:

*   `getTimestampRow`
*   `getFilteredRunEvents`
*   `getTimeBetweenRuns`
*   `getTimeBetweenRunsDf`

In [None]:
from codingSessionFunctions import *

`codingSessionFunctions` has the following functions:

*   `markEventsByCodingSessions`
*   `getIndividualSessionInfo`
*   `getCodingSessionsDf`
*   `sessionInfoToAssignmentInfo`

## Load Datasets

In [None]:
keystroke_df_unedited = pd.read_csv("data/keystrokes.csv")
student_df_unedited = pd.read_csv("data/students.csv")

#### Copy Datasets For Modification

This preserves the initial datasets in case we ever need to bring an unedited column or row back.

In [None]:
keystroke_df = keystroke_df_unedited.copy()
student_df = student_df_unedited.copy()

#### Filter Keystroke Data To Only Projects That Have Reconstructed Correctly

In [None]:
keystroke_df = getKeystrokesDFWithoutBadProjectReconstructions(keystroke_df)

#### Get Information And Keystroke Dataframes For Each `Student,Project` pair

**NOTE:** This may take a few minutes to compute. 

In [None]:
projects_df, run_events_df, final_data = getStudentProjectList(student_df, keystroke_df)

print(len(projects_df), len(run_events_df))

In [None]:
# List every (student, assignment) pair that has a non-empty keystroke dataframe

for student, assign, df in final_data:
    if len(df) > 0:
        print(student, assign)

In [None]:
for student, assign, df in final_data:
    if len(df) > 0:
        # print(len(df))
        print(50*'=')
        print(student, assign)
        viewFinalReconstructedProject(df)
        print(50*'=')


## Add MID Library

The MID (Measure of Incremental Development) score is interpreted using the following ranges:

- 0-2: Likely Incremental
- 2-2.5: Somewhat Incremental
- 2.5-3: Somewhat Non-Incremental
- 3+: Likely Non-Incremental
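
The ranges above can be turned into a small lookup helper. This is a hedged sketch: `describe_mid_score` is a hypothetical name that is not part of the MID library, and because the list does not say which band a boundary value such as 2.0 belongs to, the sketch assumes each boundary falls into the higher band.

```python
def describe_mid_score(score: float) -> str:
    """Map a MID score to its interpretation band.

    Hypothetical helper; boundary values (2, 2.5, 3) are assumed
    to belong to the higher band.
    """
    if score < 2:
        return "Likely Incremental"
    if score < 2.5:
        return "Somewhat Incremental"
    if score < 3:
        return "Somewhat Non-Incremental"
    return "Likely Non-Incremental"


for example_score in (1.2, 2.3, 2.7, 3.4):
    print(example_score, describe_mid_score(example_score))
```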

#### Calculate MID statistic for student and assignment

In [None]:
print(get_scores('Student10', 'Assign10', student_df))

In [None]:
mid_df = get_mid_score_all(final_data, student_df)

In [None]:
mid_df.to_csv('./data/mid_scores.csv', index=False)

## Code to get the time between runs
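
As a rough illustration of the idea, and not the implementation in `timeBetweenRuns`, the time between consecutive runs can be computed by filtering to run events and differencing timestamps within each student and assignment. The `EventType` value `"Run.Program"`, the `ClientTimestamp` column, and the toy data below are assumptions made for the example.

```python
import pandas as pd

# Toy event log; column names and event labels are assumed for illustration.
events = pd.DataFrame({
    "SubjectID": ["Student1"] * 4 + ["Student2"] * 2,
    "AssignmentID": ["Assign1"] * 6,
    "EventType": ["Run.Program", "File.Edit", "Run.Program",
                  "Run.Program", "Run.Program", "Run.Program"],
    "ClientTimestamp": pd.to_datetime([
        "2021-09-01 10:00:00", "2021-09-01 10:01:00", "2021-09-01 10:03:00",
        "2021-09-01 10:10:00", "2021-09-01 11:00:00", "2021-09-01 11:00:30",
    ]),
})

# Keep only run events, ordered in time within each project.
runs = (
    events[events["EventType"] == "Run.Program"]
    .sort_values(["SubjectID", "AssignmentID", "ClientTimestamp"])
    .copy()
)

# Difference each run's timestamp with the previous run in the same project.
# The first run of each project has no predecessor, so its value is NaN.
runs["SecondsSincePreviousRun"] = (
    runs.groupby(["SubjectID", "AssignmentID"])["ClientTimestamp"]
    .diff()
    .dt.total_seconds()
)

print(runs[["SubjectID", "AssignmentID", "SecondsSincePreviousRun"]])
```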

In [None]:
runEvents = getFilteredRunEvents(keystroke_df)

In [None]:
timeBetweenRunsDf = getTimeBetweenRunsDf(keystroke_df, final_data)

In [None]:
display(timeBetweenRunsDf)

In [None]:
timeBetweenRunsDf.to_csv('./data/timeBetweenRuns.csv', index=False)

## Get Coding Sessions

#### Defined as keypresses within 5 minutes of each other
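
A minimal sketch of this definition follows. It is not the code in `codingSessionFunctions`: the `Timestamp` and `SessionID` column names and the toy data are assumptions, and a gap of exactly 5 minutes is assumed to stay within the same session.

```python
import pandas as pd

SESSION_GAP = pd.Timedelta(minutes=5)

# Toy keypress timestamps for one student and assignment (assumed column name).
events = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2021-09-01 10:00:00",
        "2021-09-01 10:02:00",
        "2021-09-01 10:06:30",
        "2021-09-01 10:20:00",
        "2021-09-01 10:21:00",
    ])
}).sort_values("Timestamp")

# Gap between each keypress and the one before it (NaT for the first row).
gaps = events["Timestamp"].diff()

# A new session starts whenever the gap exceeds the threshold;
# the running count of such breaks serves as the session label.
events["SessionID"] = (gaps > SESSION_GAP).cumsum()

# Summarize each session: start time, end time, and number of keypresses.
sessions = events.groupby("SessionID")["Timestamp"].agg(["min", "max", "count"])
print(sessions)
```

On real data the same steps would presumably be applied separately per student and assignment, for example through a `groupby` on the project keys.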

In [None]:
codingSessions = getCodingSessionsDf(keystroke_df, final_data)

In [None]:
codingSessions.to_csv('./data/codingSessions.csv', index=False)

In [None]:
assignmentKeystrokeInfo = sessionInfoToAssignmentInfo(codingSessions)

In [None]:
assignmentKeystrokeInfo.to_csv('./data/assignmentKeystrokeInfo.csv', index=False)