# Question Data Tutorial
The goal of this tutorial is to introduce some useful functions and show how to do typical tasks when working with quantitative educational data. This tutorial assumes you have basic knowledge of Python and pandas and have completed the Exam data tutorial or mastered the skills in that lesson.

In this lesson, you will learn the following:
* How to recode variables to new values
* How to change the names of data frame columns
* How to concatenate and merge data frames

***
Created by Dr. Nicholas Young

Last modified: April 3, 2025

Python version: 3.11.9

As will likely be the case for most of your files, we start by importing NumPy, pandas, and pyplot.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```

Now let's read in our question files. This time we have two files: one from a morning section of the course and one from an afternoon section. To minimize typing, I'll use `m` for morning and `a` for afternoon in the names of the variables that store the data.

Some background on this data:
* This instructor's exam had 4 multiple choice questions and 2 free response questions. We only have the multiple choice questions.
* A 1 means the answer is correct and a 0 means it is incorrect.
* Q1 and Q4 are the same on both exams, but the instructor swapped the order of Q2 and Q3 between the two exams. This means that Q2 on the morning exam is Q3 on the afternoon exam, and Q3 on the morning exam is Q2 on the afternoon exam.
* The instructor also collected some data about the students taking their exam. This is stored in the `class_demographics.csv` file.
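Because Q2 and Q3 are swapped between the two exams, the afternoon columns will need new names before the two sections can be compared question by question. As a minimal sketch, assuming the column names shown in the outputs below, `DataFrame.rename` with a dictionary can swap the two labels in one step. The small data frame here is made up for illustration and is not the instructor's data.

```python
import pandas as pd

# Hypothetical afternoon-style data; the values are made up for illustration
a_demo = pd.DataFrame({
    "id": [200, 201],
    "Q1": [0, 1],
    "Q2": [0, 0],
    "Q3": [1, 0],
    "Q4": [0, 0],
})

# rename applies the whole mapping at once, so swapping two names is safe:
# afternoon Q2 becomes Q3 and afternoon Q3 becomes Q2 (morning numbering)
a_aligned = a_demo.rename(columns={"Q2": "Q3", "Q3": "Q2"})

# Optionally restore the usual column order
a_aligned = a_aligned[["id", "Q1", "Q2", "Q3", "Q4"]]
print(a_aligned)
```

Renaming, rather than copying values between columns, avoids needing a temporary column to hold one of the two questions during the swap.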

```python
m_data = pd.read_csv('morning_exam.csv')
a_data = pd.read_csv('afternoon_exam.csv')
```

Let's start by inspecting our data to see what we are working with.

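```python
m_data.head()
```

| Unnamed: 0 | id | section | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|---|---|
| 0 | 100 | morning | 0 | 1 | 0 | 0 |
| 1 | 101 | morning | 0 | 1 | 1 | 0 |
| 2 | 102 | morning | 0 | 0 | 0 | 0 |
| 3 | 103 | morning | 0 | 1 | 1 | 0 |
| 4 | 104 | morning | 1 | 0 | 1 | 1 |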


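```python
a_data.head()
```

| Unnamed: 0 | id | section | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|---|---|
| 0 | 200 | afternoon | 0 | 0 | B | 0 |
| 1 | 201 | afternoon | 1 | 0 | C | 0 |
| 2 | 202 | afternoon | 1 | 1 | B | 0 |
| 3 | 203 | afternoon | 0 | 0 | B | 0 |
| 4 | 204 | afternoon | 0 | 1 | D | 0 |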
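Both outputs begin with an `Unnamed: 0` column. This is most likely a row index that was saved into the CSV files when they were written (pandas' `to_csv` writes the index by default), which leaves an empty header field that `read_csv` labels `Unnamed: 0`. If that is the case, passing `index_col=0` to `read_csv` uses that column as the index instead of keeping it as data. The sketch below uses a small in-memory CSV built from the first morning rows above rather than the real file.

```python
import io

import pandas as pd

# Two rows copied from the morning output above; the leading empty
# header field is what pandas reports as "Unnamed: 0"
csv_text = """,id,section,Q1,Q2,Q3,Q4
0,100,morning,0,1,0,0
1,101,morning,0,1,1,0
"""

# Default read: the saved index shows up as an extra data column
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.columns.tolist())

# Treat the first column as the row index rather than as a data column
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())
```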


You'll likely notice a problem: in the afternoon data, Q3 contains the students' letter responses rather than whether they answered correctly. Let's address that first.
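One quick way to spot a column like this without reading every row is to check each column's data type and count its distinct values. The sketch below rebuilds the five afternoon rows shown above (only a subset of the columns) as an illustration; a column of letters is not numeric, so it stands out among the integer columns.

```python
import pandas as pd

# The five afternoon rows shown above, for illustration only
a_demo = pd.DataFrame({
    "id": [200, 201, 202, 203, 204],
    "Q1": [0, 1, 1, 0, 0],
    "Q2": [0, 0, 1, 0, 1],
    "Q3": ["B", "C", "B", "B", "D"],
    "Q4": [0, 0, 0, 0, 0],
})

# Integer columns report int64; the letter responses report object
# (newer pandas versions may show a dedicated string dtype instead)
print(a_demo.dtypes)

# value_counts lists each distinct response and how often it occurs
print(a_demo["Q3"].value_counts())
```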

## Changing the values of variables based on a condition
As is often the case, our data isn't exactly the way we want it. Here, we have a column of student answers rather than whether each answer is correct. We want to convert that column to correct or incorrect (1/0).

Let us assume that "B" is the correct answer to Q3 on the afternoon exam.

There are two ways to do this, and which one to use depends on how many distinct values you need after the conversion. For a binary outcome like correct or incorrect, we can use a logical comparison: we ask whether each Q3 response equals "B", assigning a 1 if it does and a 0 otherwise.

```python
# 1 if the response is "B" (the correct answer), otherwise 0
a_data['Q3'] = (a_data['Q3'] == "B").astype(int)
```

Here, `.astype(int)` is important. Testing whether each response in Q3 equals "B" returns a Series of `True` and `False` values. Because `True` is equivalent to 1 and `False` is equivalent to 0, converting this Boolean Series to integers gives us ones and zeros.
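To see the two steps separately, the sketch below applies the comparison to a short Series of made-up responses, again assuming "B" is the correct answer. The comparison alone returns Booleans, and `astype(int)` turns them into ones and zeros. NumPy's `np.where(condition, 1, 0)` is shown as an equivalent alternative; it is not the approach used in the cell above.

```python
import numpy as np
import pandas as pd

# Made-up responses for illustration; "B" is the assumed correct answer
responses = pd.Series(["B", "C", "B", "D"])

is_correct = responses == "B"     # Boolean Series: True, False, True, False
scored = is_correct.astype(int)   # Integer Series: 1, 0, 1, 0

# Same result with NumPy: 1 where the condition holds, 0 elsewhere
scored_np = np.where(responses == "B", 1, 0)

print(is_correct.tolist())  # [True, False, True, False]
print(scored.tolist())      # [1, 0, 1, 0]
```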

If we had more than two values to assign (say, you wanted to categorize the alternative conception that each response targets), you could use `replace`. With `replace`, you provide a dictionary that maps each current value (the key) to its new value. This is much more flexible than the comparison approach.

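```python
# Alternative to the comparison above: map each letter to its new value.
# Use one approach or the other; if Q3 is already 0/1, this changes nothing.
a_data['Q3'] = a_data['Q3'].replace({"A": 0, "B": 1, "C": 0, "D": 0})
```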
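As an illustration of the multi-valued case, suppose each distractor targeted a different alternative conception. The category labels below are hypothetical placeholders, not codes from the instructor's exam. Note a difference between two similar methods: `replace` leaves any value that is not in the dictionary unchanged, whereas `Series.map` turns unmatched values into `NaN`, which can make unexpected responses (such as a blank or a typo) easier to spot.

```python
import pandas as pd

# Made-up responses, including one unexpected entry ("E")
responses = pd.Series(["B", "C", "A", "D", "E"])

# Hypothetical labels for the idea each answer option targets
conception = {"A": "misconception_1", "B": "correct",
              "C": "misconception_2", "D": "misconception_1"}

# replace: values missing from the dictionary are left as they are
replaced = responses.replace(conception)

# map: values missing from the dictionary become NaN
mapped = responses.map(conception)

print(replaced.tolist())
print(mapped.tolist())
```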

If we look at the afternoon data now, we see that all of the questions are in correct/incorrect format.

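```python
a_data.head()
```

| Unnamed: 0 | id | section | Q1 | Q2 | Q3 | Q4 |
|---|---|---|---|---|---|---|
| 0 | 200 | afternoon | 0 | 0 | 1 | 0 |
| 1 | 201 | afternoon | 1 | 0 | 0 | 0 |
| 2 | 202 | afternoon | 1 | 1 | 1 | 0 |
| 3 | 203 | afternoon | 0 | 0 | 1 | 0 |
| 4 | 204 | afternoon | 0 | 1 | 0 | 0 |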
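Rather than checking by eye, a quick programmatic check can confirm that every question column contains only 0s and 1s. This sketch rebuilds the five afternoon rows shown above; with the full data you would run the same check on `a_data`.

```python
import pandas as pd

# The five afternoon rows shown above, after recoding Q3
a_demo = pd.DataFrame({
    "id": [200, 201, 202, 203, 204],
    "section": ["afternoon"] * 5,
    "Q1": [0, 1, 1, 0, 0],
    "Q2": [0, 0, 1, 0, 1],
    "Q3": [1, 0, 1, 1, 0],
    "Q4": [0, 0, 0, 0, 0],
})

questions = ["Q1", "Q2", "Q3", "Q4"]

# isin marks each cell True if it is 0 or 1; .all() summarizes each column
print(a_demo[questions].isin([0, 1]).all())

# A single True/False answer for the whole block of question columns
all_binary = bool(a_demo[questions].isin([0, 1]).all().all())
print(all_binary)  # True
```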
