### One test descriptive statistics
___

This notebook uses the exercise spreadsheets from *Designing and Analyzing Language Tests* (Oxford University Press).

The purpose of this notebook is to compute each student's total score and percentage-correct score, and then to calculate descriptive statistics for the total scores.

<br>

#### General Setup
___

In [1]:
```python
# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy.stats as ss
```

<br>

#### Load the data
___

In [2]:
```python
test = pd.read_excel('Data/one_test.xlsx')
test.head()
```

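| | Student | Q01 | Q02 | Q03 | Q04 | Q05 | Q06 | Q07 | Q08 | Q09 | ... | Q11 | Q12 | Q13 | Q14 | Q15 | Q16 | Q17 | Q18 | Q19 | Q20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Student01 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | ... | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | Student02 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| 2 | Student03 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 3 | Student04 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 4 | Student05 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |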


In [3]:
```python
# check the dataset info
test.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 21 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Student  30 non-null     object
 1   Q01      30 non-null     int64 
 2   Q02      30 non-null     int64 
 3   Q03      30 non-null     int64 
 4   Q04      30 non-null     int64 
 5   Q05      30 non-null     int64 
 6   Q06      30 non-null     int64 
 7   Q07      30 non-null     int64 
 8   Q08      30 non-null     int64 
 9   Q09      30 non-null     int64 
 10  Q10      30 non-null     int64 
 11  Q11      30 non-null     int64 
 12  Q12      30 non-null     int64 
 13  Q13      30 non-null     int64 
 14  Q14      30 non-null     int64 
 15  Q15      30 non-null     int64 
 16  Q16      30 non-null     int64 
 17  Q17      30 non-null     int64 
 18  Q18      30 non-null     int64 
 19  Q19      30 non-null     int64 
 20  Q20      30 non-null     int64 
dtypes: int64(20), object(1)
memory usage: 5.0
```

The dataset contains dichotomously scored (0/1) responses from 30 students to 20 items (Q01 to Q20), with no missing values.
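Since every item column should hold only 0 or 1, a quick check before summing can catch data-entry errors. A minimal sketch, using a small hypothetical DataFrame in place of `one_test.xlsx` and assuming the item columns are exactly those named `Q` followed by digits:

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: 3 students, 4 items
test = pd.DataFrame({
    "Student": ["Student01", "Student02", "Student03"],
    "Q01": [0, 1, 1],
    "Q02": [1, 1, 0],
    "Q03": [0, 1, 0],
    "Q04": [1, 1, 1],
})

# Assumption: item columns are named Q01, Q02, ...
items = test.filter(regex=r"^Q\d+$")

n_students, n_items = items.shape
# True only if no cell in any item column is missing
no_missing = not items.isna().any().any()
# isin() flags cells equal to 0 or 1; all() over both axes checks the whole block
dichotomous = bool(items.isin([0, 1]).all().all())

print(f"{n_students} students, {n_items} items")
print("no missing values:", no_missing)
print("all responses scored 0/1:", dichotomous)
```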

In [4]:
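```python
# calculate total correct answers and add it to the dataframe
# note: re-running this cell would also sum the Total column created on the first run
test['Total'] = test.loc[:, test.columns != 'Student'].sum(axis=1)
test.head()
```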

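| | Student | Q01 | Q02 | Q03 | Q04 | Q05 | Q06 | Q07 | Q08 | Q09 | ... | Q12 | Q13 | Q14 | Q15 | Q16 | Q17 | Q18 | Q19 | Q20 | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Student01 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | ... | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 7 |
| 1 | Student02 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 16 |
| 2 | Student03 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 8 |
| 3 | Student04 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 7 |
| 4 | Student05 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | ... | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 13 |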
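Summing every column except `Student` gives the right totals on a first run, but once `Total` (and later `% Correct`) exists, re-running the cell adds those columns in as well. Selecting item columns by name keeps the computation repeatable. A minimal sketch on hypothetical data; `add_total` is a hypothetical helper, and the `Q` plus digits naming pattern is an assumption:

```python
import pandas as pd

# Hypothetical three-student, four-item dataset
test = pd.DataFrame({
    "Student": ["Student01", "Student02", "Student03"],
    "Q01": [0, 1, 1],
    "Q02": [1, 1, 0],
    "Q03": [0, 1, 0],
    "Q04": [1, 1, 1],
})

def add_total(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with Total summing only item columns."""
    out = df.copy()
    # Assumption: item columns are named Q followed by digits, so Total is never included
    out["Total"] = out.filter(regex=r"^Q\d+$").sum(axis=1)
    return out

once = add_total(test)
twice = add_total(once)  # applying it again leaves the totals unchanged

print(once["Total"].tolist())
```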


In [5]:
```python
# calculate percentage of correct answers
# columns 1 to 20 are the 20 item columns (Q01 to Q20)
n_items = len(test.columns[1:21])
test['% Correct'] = test['Total'] / n_items * 100
test.head()
```

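| | Student | Q01 | Q02 | Q03 | Q04 | Q05 | Q06 | Q07 | Q08 | Q09 | ... | Q13 | Q14 | Q15 | Q16 | Q17 | Q18 | Q19 | Q20 | Total | % Correct |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Student01 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | ... | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 7 | 35.0 |
| 1 | Student02 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | ... | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 16 | 80.0 |
| 2 | Student03 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 8 | 40.0 |
| 3 | Student04 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 7 | 35.0 |
| 4 | Student05 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | ... | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 13 | 65.0 |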
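The two score columns follow directly from the item responses. Writing $x_{ij} \in \{0, 1\}$ for student $i$'s score on item $j$ and $k$ for the number of items (here $k = 20$):

$$
\text{Total}_i = \sum_{j=1}^{k} x_{ij}, \qquad
\%\,\text{Correct}_i = \frac{\text{Total}_i}{k} \times 100
$$

For Student01, $7 / 20 \times 100 = 35.0$, which matches the first row of the table.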


<br>

#### Descriptive stats
___

In [6]:
```python
# summarize Total with describe(), round to 2 decimals, and drop the 'count' row
stats = pd.DataFrame(np.round(test['Total'].describe(), 2))[1:]
stats
```

| | Total |
|---|---|
| mean | 10.83 |
| std | 3.64 |
| min | 3.0 |
| 25% | 8.0 |
| 50% | 11.0 |
| 75% | 13.0 |
| max | 19.0 |
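Two details of `describe()` matter when comparing these figures with a spreadsheet: `std` is the sample standard deviation (divisor $n - 1$), and the quartiles are linearly interpolated between ordered scores, which can differ from other quartile conventions. A small sketch, on hypothetical totals, that checks both against NumPy:

```python
import numpy as np
import pandas as pd

# Hypothetical total scores for six students
totals = pd.Series([7, 16, 8, 7, 13, 11], name="Total")

summary = totals.describe()

# describe() reports the sample SD (ddof=1); np.std defaults to ddof=0, so pass it explicitly
sample_sd = np.std(totals.to_numpy(), ddof=1)

# describe() quartiles use linear interpolation, which is also NumPy's default percentile method
q1, median, q3 = np.percentile(totals.to_numpy(), [25, 50, 75])

print(summary.round(2))
```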


In [7]:
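```python
# rename std to std(sample): describe() reports the sample SD (ddof=1)
stats.loc['std(sample)'] = stats.loc['std']
stats.drop('std', axis=0, inplace=True)

# add the mode and the variance
# Series.mode() returns the modes in ascending order; [0] takes the smallest
# (ss.mode(...)[0][0] raises an IndexError in SciPy >= 1.11, where mode returns scalars)
stats.loc['mode'] = test['Total'].mode()[0]
# Series.var() uses ddof=1, so this is the sample variance
stats.loc['var(sample)'] = test['Total'].var()
stats
```

| | Total |
|---|---|
| mean | 10.83 |
| min | 3.0 |
| 25% | 8.0 |
| 50% | 11.0 |
| 75% | 13.0 |
| max | 19.0 |
| std(sample) | 3.64 |
| mode | 7.0 |
| var(sample) | 13.247126 |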
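pandas' `var()` and `std()` default to `ddof=1`, dividing by $n - 1$; passing `ddof=0` gives the population figures, which divide by $n$. `Series.mode()` also returns every most-frequent value, so a tie stays visible. A sketch on hypothetical totals that contain a tie:

```python
import pandas as pd

# Hypothetical totals: 7 and 11 each occur twice, so there are two modes
totals = pd.Series([7, 11, 7, 11, 13, 9], name="Total")
n = len(totals)

var_sample = totals.var()       # ddof=1: sum of squared deviations / (n - 1)
var_pop = totals.var(ddof=0)    # ddof=0: sum of squared deviations / n
sd_pop = totals.std(ddof=0)     # population SD, the square root of var_pop

# The two variances differ only by the factor (n - 1) / n
ratio = var_pop / var_sample

modes = totals.mode().tolist()  # all modes, in ascending order

print(round(var_sample, 3), round(var_pop, 3), modes)
```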


In [11]:
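```python
# coefficient of variation: SD / mean, using SciPy's default ddof=0 (population SD)
ss.variation(test['Total'])
```

```
0.3303217008243675
```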
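`ss.variation` returns the coefficient of variation, the standard deviation divided by the mean. SciPy's default is `ddof=0`, so it uses the population SD; with the sample SD from the table (3.64 / 10.83) the ratio would be about 0.336. A NumPy-only sketch of the same ratio on hypothetical totals; `coefficient_of_variation` is a hypothetical helper:

```python
import numpy as np

def coefficient_of_variation(scores, ddof=0):
    """Hypothetical helper: SD divided by mean; ddof=0 mirrors scipy.stats.variation's default."""
    scores = np.asarray(scores, dtype=float)
    return scores.std(ddof=ddof) / scores.mean()

# Hypothetical totals for five students
totals = [7, 16, 8, 7, 13]

cv_pop = coefficient_of_variation(totals)             # population SD / mean
cv_sample = coefficient_of_variation(totals, ddof=1)  # sample SD / mean

print(round(cv_pop, 4), round(cv_sample, 4))
```

The sample version is always slightly larger, by the factor $\sqrt{n / (n - 1)}$.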