# Lab 3 - Online vs. In-Person Classes

### In 2020, the pandemic led to remote work and online classes. There is a debate about whether these changes will stick. Online learning has clear advantages: it is cost-effective and leverages digital resources and content from around the world. However, a question remains about its impact on students' academic performance: is it positive or negative?


The full code is available at: https://github.com/jiwoongim/Causal-Inference-Tutorial/

If you want to run along, add this to the beginning of your first code cell:

!git clone https://github.com/jiwoongim/Causal-Inference-Tutorial.git

Then, before `from util…`, add:

import sys
sys.path.append('/content/Causal-Inference-Tutorial/ci/src/ci')

In [1]:
import sys

sys.path.append('/teamspace/studios/this_studio/2024-causal-inference-machine-learning/Causal-Inference-Tutorial/ci/src/ci')
sys.path.append('/teamspace/studios/this_studio/2024-causal-inference-machine-learning/Causal-Inference-Tutorial/ci/src')

In [2]:
from util import get_dataset

dataset_name = "./online_classroom"
dataset = get_dataset(dataset_name)
print(dataset.data_df)

     gender  asian  black  hawaiian  hispanic  unknown  white  format_ol  \
0         0    0.0    0.0       0.0       0.0      0.0    1.0          0   
1         1    0.0    0.0       0.0       0.0      0.0    1.0          0   
2         1    0.0    0.0       0.0       0.0      0.0    1.0          0   
3         1    0.0    0.0       0.0       0.0      0.0    1.0          0   
4         1    0.0    0.0       0.0       0.0      0.0    1.0          1   
..      ...    ...    ...       ...       ...      ...    ...        ...   
318       0    0.0    0.0       0.0       0.0      0.0    1.0          0   
319       1    NaN    NaN       NaN       NaN      NaN    NaN          1   
320       0    NaN    NaN       NaN       NaN      NaN    NaN          1   
321       1    NaN    NaN       NaN       NaN      NaN    NaN          1   
322       1    0.0    0.0       0.0       0.0      0.0    1.0          0   

     format_blended  falsexam  
0               0.0  63.29997  
1               0.0  79

### The class format was randomized: some students were assigned to face-to-face lectures, others to online-only lessons, and a third group to a blended format combining online and face-to-face classes.

### Action assignment
#### Group 1: online class
#### Group 2: in-class
#### Group 3: both online & in-class


In [3]:
print("...Computing Treatments Stats")
dataset.get_treatment_stats()


...Computing Treatments Stats
Number of online class: 94
Number of in-person class: 120
Number of blended: 109


### Outcome: a standardized exam at the end of the semester.

\begin{align*}
\large
\mathbb{E}[Y[a]] &= \sum_x \mathbb{E}[Y \mid A=a,x]\,p(x)\\
                &\simeq \frac{1}{n} \sum_i \mathbb{E}[Y \mid A=a,x_i]
\end{align*}

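#### Because the class format was randomized, $A$ is independent of $x$, so this reduces to $\mathbb{E}[Y \mid A=a]$: just group by action and average the outcomes.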
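A minimal pandas sketch of that group-and-average step, run on a hypothetical toy DataFrame that reuses the column names shown in `dataset.data_df` (`format_ol`, `format_blended`, `falsexam`). The helper names `label_format` and `mean_outcome_by_format`, and the mapping from the two format flags to a `class_format` label, are assumptions for illustration; this is not the source of `get_potential_outcomes_by_treatment`.

```python
import pandas as pd

def label_format(df: pd.DataFrame) -> pd.Series:
    # Hypothetical mapping (an assumption, not the tutorial's code):
    # format_ol == 1 -> "online", format_blended == 1 -> "blended",
    # neither flag set -> "inclass".
    fmt = pd.Series("inclass", index=df.index, name="class_format")
    fmt.loc[df["format_blended"] == 1] = "blended"
    fmt.loc[df["format_ol"] == 1] = "online"
    return fmt

def mean_outcome_by_format(df: pd.DataFrame, outcome: str = "falsexam") -> pd.Series:
    # Estimate E[Y | A = a] as the sample mean of the outcome within each format.
    return df.groupby(label_format(df))[outcome].mean()

# Hypothetical toy data, not the real dataset: two students per format.
toy = pd.DataFrame({
    "format_ol":      [1, 1, 0, 0, 0, 0],
    "format_blended": [0.0, 0.0, 1.0, 1.0, 0.0, 0.0],
    "falsexam":       [70.0, 74.0, 76.0, 78.0, 80.0, 82.0],
})
means = mean_outcome_by_format(toy)
print(means)
```

Under randomization this plain per-group mean is all that is needed; no covariate adjustment is required for an unbiased estimate.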

### Potential Outcomes

\begin{align*}
\large
\mathbb{E}[Y[\text{'in-person class'}]] = 78.54\\
\large
\mathbb{E}[Y[\text{'online class'}]] = 73.63\\
\large  
\mathbb{E}[Y[\text{'blended class'}]] = 77.09\\
\end{align*}


In [4]:
print("...Computing Potential Outcome")
potential_outcomes = dataset.get_potential_outcomes_by_treatment()
print(potential_outcomes)

...Computing Potential Outcome
                gender     asian     black  hawaiian  hispanic   unknown  \
class_format                                                               
blended       0.550459  0.217949  0.102564  0.025641  0.012821  0.012821   
inclass       0.633333  0.202020  0.070707  0.000000  0.010101  0.000000   
online        0.542553  0.228571  0.028571  0.014286  0.028571  0.000000   

                 white  format_ol  format_blended   falsexam  
class_format                                                  
blended       0.628205        0.0             1.0  77.093731  
inclass       0.717172        0.0             0.0  78.547485  
online        0.700000        1.0             0.0  73.635263  


### What is the effect of having online classes?

\begin{align*}
    ATE_{\text{online vs. in-person}} &= \mathbb{E}[Y[\text{'online class'}]] - \mathbb{E}[Y[\text{'in-person class'}]] = -4.912 \\
    ATE_{\text{online vs. blended}} &= \mathbb{E}[Y[\text{'online class'}]] - \mathbb{E}[Y[\text{'blended class'}]] = -3.458
\end{align*}


In [5]:
print("...Computing ATE")
po_online = potential_outcomes.loc["online"]["falsexam"]
po_inclass = potential_outcomes.loc["inclass"]["falsexam"]
ate = po_online - po_inclass
print(f"... ATE: {ate}")

...Computing ATE
... ATE: -4.912221498226955
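The same subtraction gives the second contrast in the equations above. Below is a stdlib-only sketch that hard-codes the `falsexam` group means printed by `get_potential_outcomes_by_treatment()` (copied from that output, not recomputed from the data); the helper name `ate` is hypothetical.

```python
# falsexam group means, copied from the printed potential_outcomes table.
po = {
    "blended": 77.093731,
    "inclass": 78.547485,
    "online": 73.635263,
}

def ate(treated: str, control: str) -> float:
    # Difference in mean outcomes; it estimates the ATE because the format was randomized.
    return po[treated] - po[control]

ate_vs_inclass = ate("online", "inclass")
ate_vs_blended = ate("online", "blended")
print(f"online vs. in-class: {ate_vs_inclass:.3f}")
print(f"online vs. blended:  {ate_vs_blended:.3f}")
```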


### Overall, online classes have a negative effect: online students scored about 4.9 points lower than in-person students!


# Variance Matters Even if Actions Are Randomized

## Discussion
### Let's continue to assume that students chose between "in-person classes" and "online classes" at random.
### However, suppose that far fewer students are in the "in-person class" than in the "online class".

### Q: Could standardized test scores change depending on the number of students in a class?

#### Possibilities:
1. The teacher can pay more attention to each student when there are fewer students.
2. With only a few students, the average score has high variance, so it can be unusually high or low purely by chance.

### Ironically, with many students the variance of the average score shrinks, but the average performance itself may decrease.
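To make possibility 2 concrete, here is a small simulation under a made-up assumption: every student's score is drawn independently from the same normal distribution with mean 75 and standard deviation 10 (both numbers are illustrative, not estimated from this dataset). The spread of the class average then shrinks roughly as $\sigma/\sqrt{n}$, so small classes produce much noisier averages. The helper name `spread_of_class_average` is hypothetical.

```python
import random
import statistics

MU, SIGMA = 75.0, 10.0  # illustrative score distribution (assumption)

def spread_of_class_average(class_size: int, n_classes: int = 2000,
                            seed: int = 0) -> float:
    # Simulate many classes of the same size and return the standard deviation
    # of their average scores, i.e. an empirical standard error of the mean.
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(class_size))
        for _ in range(n_classes)
    ]
    return statistics.stdev(averages)

for size in (5, 20, 100):
    simulated = spread_of_class_average(size)
    theoretical = SIGMA / size ** 0.5
    print(f"class size {size:3d}: simulated {simulated:.2f}, sigma/sqrt(n) {theoretical:.2f}")
```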

#### Maybe the right setup is to treat "online class" as the control and "(in-person, # students)" as the treatments, then find the point of diminishing returns.


## Back to the real online classroom example

\begin{align*}
\large
\text{standard error}: \hat{se} = \frac{1}{\sqrt{n}}\sqrt{\frac{1}{n-1}\sum^n_{i=1}(x_i-\bar{x})^2}\\
\large
\text{confidence interval}: (\hat{\mu}-2\hat{se}, \hat{\mu}+2\hat{se})
\end{align*}
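A stdlib sketch of these two formulas, applied to a hypothetical list of scores. `statistics.stdev` computes the sample standard deviation with the $n-1$ denominator; dividing it by $\sqrt{n}$ gives the standard error of the mean, and the factor 2 is the usual rounding of 1.96 for an approximate 95% interval. The helper name `mean_se_ci` is an assumption for illustration, not part of the tutorial's `util` module.

```python
import math
import statistics

def mean_se_ci(scores: list[float], z: float = 2.0) -> tuple[float, float, tuple[float, float]]:
    # Return the sample mean, its standard error, and the interval mean +/- z * se.
    n = len(scores)
    mu_hat = statistics.fmean(scores)
    sigma_hat = statistics.stdev(scores)  # sample standard deviation, n - 1 denominator
    se_hat = sigma_hat / math.sqrt(n)     # standard error of the mean
    return mu_hat, se_hat, (mu_hat - z * se_hat, mu_hat + z * se_hat)

# Hypothetical exam scores, not taken from the dataset.
mu, se, ci = mean_se_ci([70.0, 74.0, 78.0, 82.0])
print(mu, round(se, 3), tuple(round(v, 3) for v in ci))
```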

In [6]:
online_se, inclass_se = dataset.get_se_of_outcomes_by_treatment()
online_ci, inclass_ci = dataset.get_ci_of_outcomes_by_treatment()
print("95% CI for Online:", online_ci)
print("95% CI for Inclass:", inclass_ci)

95% CI for Online: (70.62243066639022, 76.64809550382253)
95% CI for Inclass: (76.83767633789478, 80.25729282877188)


### Note that the variance gets larger when you compute the ATE!

\begin{align*}
\large
\mathcal{N}(\mu_1, \sigma_1^2) \pm \mathcal{N}(\mu_2, \sigma_2^2) = \mathcal{N}(\mu_1\pm\mu_2, \sigma_1^2+\sigma_2^2)
\end{align*}
For independent estimates the variances add, so the ATE confidence interval grows as well.
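As a consistency check, the ATE interval can be rebuilt from the two group intervals printed by `In [6]`, assuming the two group means are independent and both intervals use the same multiplier $z$. Each half-width is $z\,\hat{se}$, so the ATE half-width is $\sqrt{h_1^2 + h_2^2}$ and $z$ cancels out. The helper name `ci_of_difference` is hypothetical, not the tutorial's `get_ci_of_average_treatment_effect`.

```python
import math

def ci_of_difference(ci_treated, ci_control):
    # Rebuild the CI of (treated mean - control mean) from two symmetric CIs,
    # assuming independent estimates and the same z multiplier for both.
    mid_t, half_t = sum(ci_treated) / 2, (ci_treated[1] - ci_treated[0]) / 2
    mid_c, half_c = sum(ci_control) / 2, (ci_control[1] - ci_control[0]) / 2
    diff = mid_t - mid_c
    half = math.hypot(half_t, half_c)  # sqrt(half_t**2 + half_c**2)
    return diff - half, diff + half

# Intervals copied from the printed output of In [6].
online_ci = (70.62243066639022, 76.64809550382253)
inclass_ci = (76.83767633789478, 80.25729282877188)
print(ci_of_difference(online_ci, inclass_ci))
```

The result agrees with the ATE interval printed by the next cell up to floating-point rounding.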

In [7]:
ate_ci, _ = dataset.get_ci_of_average_treatment_effect()
print("95% CI for ATE:", ate_ci)

95% CI for ATE: (-8.376410208363385, -1.4480327880905253)


### Even the upper bound of the confidence interval is negative, so the negative effect of online classes is unlikely to be due to chance alone!