
About: This notebook analyses interactions between students. It looks at:
1) student availability for interaction,
2) the presence of others in the space for interaction,
3) collaboration between Dream Toy student pairs.

It allows the user to analyse these interactions for different periods of the week (close to submission deadlines or not) and for different areas of the makerspace (the collaboration area and the technical area).
This notebook is intended to be run with multiprocessing.

How features are computed: Collaboration is interpreted as a combination of proximity (within 0.5m) and the bodies of the two collaborators being turned towards each other, judged from the angles of their shoulders.
1. Dfs containing only the two individuals' data are first compared for proximity.
2. These dfs are then compared for the individuals being turned towards each other.
3. The number of timestamps for which both conditions are true is counted.
4. The count is multiplied by 0.67 to get the total in secs (strictly, this step is unnecessary for correlations, but it is useful for sharing absolute values).
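
Steps 3 and 4 can be sketched on toy data as follows. This is a minimal illustration, not the notebook's implementation: the helper name `frames_to_seconds` and the column names `close` and `turned` are hypothetical, and the per-timestamp duration is inferred from the median spacing of the timestamps as an assumption, whereas the notebook uses a fixed factor of 0.67.

```python
import pandas as pd

def frames_to_seconds(mask, timestamps):
    """Convert the number of True timestamps into seconds.

    The duration of one timestamp is taken to be the median spacing
    between consecutive timestamps (an assumption of this sketch).
    """
    spacing = pd.to_datetime(timestamps).diff().dt.total_seconds().median()
    return float(mask.sum() * spacing)

# toy data: five timestamps, 0.5 s apart, for one pair of students
frames = pd.DataFrame({
    "timestamp": pd.date_range("2022-03-08 10:00:00", periods=5, freq="500ms"),
    "close": [True, True, False, True, True],     # step 1: proximity
    "turned": [True, False, False, True, True],   # step 2: turned towards each other
})

# step 3: timestamps at which both conditions hold
both = frames["close"] & frames["turned"]

# step 4: convert the count to seconds
seconds = frames_to_seconds(both, frames["timestamp"])
```

Inferring the spacing from the data keeps the conversion tied to the actual sampling rate of the recording, at the cost of assuming that the spacing is roughly constant.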

The availability features cover both how often an individual is in the space and how often others are in the space with that individual. They are computed by counting the separate periods in which an individual is in the space, where a new period starts after an absence of more than 2 hrs, and by counting on how many of those occasions another student is also in the space.
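
The counting of separate periods can be sketched as follows. This is a minimal illustration under stated assumptions: `count_visits` and `min_gap` are hypothetical names, the timestamps are assumed to belong to one student, and a gap of exactly 2 hrs is treated as a new period, matching the `>=` comparison used in the functions further down.

```python
import pandas as pd

def count_visits(timestamps, min_gap="2h"):
    """Count the separate periods in which one student is present.

    A new period starts whenever the gap since the previous
    timestamp is at least `min_gap`.
    """
    times = pd.to_datetime(pd.Series(timestamps)).sort_values()
    if times.empty:
        return 0
    gaps = times.diff()  # the first gap is NaT and never counts as a new period
    return 1 + int((gaps >= pd.Timedelta(min_gap)).sum())

visits = count_visits([
    "2022-03-08 09:00:00",
    "2022-03-08 09:00:01",
    "2022-03-08 12:30:00",  # back after roughly 3.5 hrs: second period
    "2022-03-08 12:31:00",
    "2022-03-08 18:00:00",  # back after roughly 5.5 hrs: third period
])
```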




Each function accepts a dataframe as its argument and is expected to be run with multiprocessing, i.e. applied in parallel to a list of individual dfs. For example, to compute a feature for all days of the week, multiprocess over a list containing one df per day; to compute it for the days before a deadline, multiprocess over a list of the dfs for those days.
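
The pattern described above (one df per day, one result dict per df, results summed per student) can be sketched as follows. This is an illustration only: `rows_per_student` and `run_per_day` are hypothetical names, and the sketch uses the thread-backed `multiprocessing.dummy.Pool` from the standard library, which offers the same `map` interface as a process pool, so that it runs in any environment. The notebook itself uses the `multiprocess` package.

```python
from collections import Counter
from multiprocessing.dummy import Pool  # thread-backed pool with the same map() interface

import pandas as pd

def rows_per_student(df):
    """Toy per-day feature: number of rows recorded for each student."""
    return df["student_id"].value_counts().to_dict()

def run_per_day(function, df_list):
    """Apply `function` to each day's df in parallel and sum the per-student results."""
    with Pool(min(4, len(df_list))) as pool:
        per_day = pool.map(function, df_list)
    totals = Counter()
    for day_result in per_day:
        totals.update(day_result)  # adds the day's values to the running totals
    return dict(totals)

# toy data: one df per day
monday = pd.DataFrame({"student_id": ["juan", "juan", "hoa"]})
tuesday = pd.DataFrame({"student_id": ["hoa", "rui"]})

totals = run_per_day(rows_per_student, [monday, tuesday])
```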

# Installations

In [1]:
!pip install multiprocess

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting multiprocess
  Downloading multiprocess-0.70.14-py310-none-any.whl (134 kB)
     134.3/134.3 kB 2.5 MB/s eta 0:00:00
Collecting dill>=0.3.6 (from multiprocess)
  Downloading dill-0.3.6-py3-none-any.whl (110 kB)
     110.5/110.5 kB 7.1 MB/s eta 0:00:00
Installing collected packages: dill, multiprocess
Successfully installed dill-0.3.6 multiprocess-0.70.14


In [2]:
!pip install icalendar

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting icalendar
  Downloading icalendar-5.0.5-py3-none-any.whl (99 kB)
     99.1/99.1 kB 3.2 MB/s eta 0:00:00
Installing collected packages: icalendar
Successfully installed icalendar-5.0.5


# Imports

In [3]:
# standard library
import ast
import collections
import math
import os
import re
import subprocess
import sys
from datetime import datetime

# third-party
import cv2
import multiprocess as mp
import numpy as np
import pandas as pd
import PIL
import seaborn as sns
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw, ImageFont

# pandas tricks for better display
pd.set_option('display.width', 1500)
pd.set_option('display.max_columns', 100)


# Paths

In [4]:
# base folder path
base_path = ''

# if we are on google colab, we mount the drive
if 'google.colab' in str(get_ipython()):
  from google.colab import drive
  drive.mount('/content/drive')
  base_path = './drive/Shareddrives/2020-Makerspace-tracking'

# if we are running it locally, we use the standard gdrive path
# (you will have to update this path)
else: base_path = '/Users/schneibe/Library/CloudStorage/GoogleDrive-bertrand_schneider@g.harvard.edu/Shared drives/2020-Makerspace-tracking/'

Mounted at /content/drive


In [5]:
# folders we'll be working with
agg_path = os.path.join(base_path, 'Data', '2022-Spr-T519', 'aggregated')
data_path = os.path.join(base_path, 'Data', '2022-Spr-T519', 'poseconnect')
analysis_path = os.path.join(base_path, 'Analysis', '2022-Spr-Week7')

# Data

### Summary data

In [6]:
sid_summary_path = os.path.join(agg_path,'sid_summary','sid_summary.csv')
sid_summary_df=pd.read_csv(sid_summary_path)
sid_summary_df

Unnamed: 0.2,Unnamed: 0.1,student_id,hour,collaboration,laser,nothing,office,printer,sewing,soldering,tool,Unnamed: 0,email,mid_gain_se,mid_gain_com,enjoyment,stress_level,mid_gain_se_norm,mid_gain_com_norm,enjoyment_norm,stress_level_norm,score,stu_instructor_time,stu_student_time,stu_marc_time,stu_daniel_time,stu_iulian_time,stu_bertrand_time,stu_pair_time,stu_pair_tech,stu_pair_collab,stumarc_tech,stumarc_collab,stu_availability,others_present,stu_availability_dl,stu_availability_ndl,marctime_dl,marctime_ndl,stupairtechtime_dl,stupairtechtime_ndl,stupaircollabtime_dl,stupaircollabtime_ndl
0,0,aashna,39,51655.0,19352.0,3474.0,64.0,431.0,0.0,1.0,1900.0,1,aashnasaraf@gse.harvard.edu,-5,0.375,2,2,0.25,0.409091,0.0,0.0,0.164773,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1,chali,288,629155.0,106095.0,20278.0,578.0,3649.0,173.0,2139.0,14861.0,11,chalisakaewla@gse.harvard.edu,-1,0.5,2,4,0.583333,0.454545,0.0,1.0,0.50947,12699.18,45096.36,343.04,12354.13,0.0,2.01,42226.08,17.42,13803.34,5.36,131.32,8.0,2.0,5.0,3.0,136.68,0.0,74.37,0.67,12617.44,1185.9
2,2,conner,343,717562.0,1317.0,27330.0,1576.0,2566.0,13381.0,0.0,5176.0,16,ceastman@gse.harvard.edu,-7,2.0,3,3,0.083333,1.0,0.333333,0.5,0.479167,1573.83,25559.16,1088.75,485.08,0.0,0.0,15952.7,265.32,1966.45,0.0,123.28,7.0,3.0,2.0,5.0,0.0,123.28,19.43,435.5,13.4,1953.05
3,3,denise,132,157604.0,67578.0,12990.0,38.0,1244.0,332.0,848.0,12941.0,3,denisefabella@gse.harvard.edu,-4,-0.0625,4,3,0.333333,0.25,0.666667,0.5,0.4375,46.9,24885.81,0.0,46.9,0.0,0.0,22047.02,3482.66,655.93,0.0,0.0,4.0,2.0,3.0,1.0,0.0,0.0,5562.34,18.09,47.57,608.36
4,4,hoa,204,300396.0,5209.0,109129.0,443.0,7613.0,1895.0,272.0,2747.0,9,hoapham@gse.harvard.edu,-5,-0.6875,5,2,0.25,0.022727,1.0,0.0,0.318182,606.35,8152.56,27.47,578.88,0.0,0.0,7349.9,354.43,1448.54,0.0,20.77,3.0,2.0,1.0,2.0,0.0,20.77,7.37,933.31,672.68,775.86
5,5,ji su,328,372448.0,60198.0,34858.0,1386.0,4082.0,724.0,1184.0,40076.0,17,jlee@gse.harvard.edu,-1,0.125,4,4,0.583333,0.318182,0.666667,1.0,0.642045,1310.52,42739.97,406.69,903.83,0.0,0.0,42.21,0.0,0.0,239.19,0.0,11.0,2.0,9.0,2.0,239.19,0.0,0.0,0.0,0.0,0.0
6,6,juan,302,406361.0,95339.0,263948.0,370.0,4630.0,7.0,778.0,14544.0,6,juanpablo_garcesramirez@gse.harvard.edu,0,0.25,3,4,0.666667,0.363636,0.333333,1.0,0.590909,2734.94,9336.45,166.83,2557.39,0.0,10.72,0.0,0.0,0.0,0.67,3.35,10.0,1.0,7.0,3.0,0.0,4.02,0.0,0.0,0.0,0.0
7,7,melissa,164,133094.0,80012.0,63433.0,993.0,1897.0,630.0,6125.0,42036.0,2,mkain@gse.harvard.edu,0,1.5,3,2,0.666667,0.818182,0.333333,0.0,0.454545,979.54,23312.65,979.54,0.0,0.0,0.0,22047.02,3482.66,655.93,108.54,103.85,3.0,1.0,1.0,2.0,0.0,212.39,5562.34,18.09,47.57,608.36
8,8,miaoya,124,135991.0,35153.0,9003.0,0.0,480.0,482.0,109.0,4817.0,5,miaoyazhong@gse.harvard.edu,3,0.875,4,4,0.916667,0.590909,0.666667,1.0,0.793561,261.3,9492.56,0.0,261.3,0.0,0.0,8957.9,182.91,3979.13,0.0,0.0,3.0,1.0,3.0,0.0,0.0,0.0,779.21,0.0,3979.13,0.0
9,9,natalie,310,517890.0,192475.0,25856.0,678.0,3644.0,21.0,1931.0,3644.0,15,nvarkey@gse.harvard.edu,-1,-0.0625,5,3,0.583333,0.25,1.0,0.5,0.583333,24288.84,103232.26,117.92,24170.92,0.0,0.0,42226.08,17.42,13803.34,0.0,56.95,11.0,2.0,8.0,3.0,35.51,21.44,74.37,0.67,12617.44,1185.9


### Survey

In [7]:
scores_path = os.path.join(agg_path, 'participants_scores.csv')
scores_df = pd.read_csv(scores_path)
scores_df.head()

Unnamed: 0.1,Unnamed: 0,student_id,email,mid_gain_se,mid_gain_com,enjoyment,stress_level,mid_gain_se_norm,mid_gain_com_norm,enjoyment_norm,stress_level_norm,score
0,0,rui,ruizhou@gse.harvard.edu,0,0.0,4,3,0.666667,0.272727,0.666667,0.5,0.526515
1,1,aashna,aashnasaraf@gse.harvard.edu,-5,0.375,2,2,0.25,0.409091,0.0,0.0,0.164773
2,2,melissa,mkain@gse.harvard.edu,0,1.5,3,2,0.666667,0.818182,0.333333,0.0,0.454545
3,3,denise,denisefabella@gse.harvard.edu,-4,-0.0625,4,3,0.333333,0.25,0.666667,0.5,0.4375
4,4,rhea,rsharma@gse.harvard.edu,0,0.0,3,3,0.666667,0.272727,0.333333,0.5,0.443182


### Sensor data

In [8]:
# load the script
script = os.path.join(base_path, 'Analysis', 'scripts', 'augment_df.py')
%run "$script"

In [9]:
# go through the poseconnect data
folder = os.path.join(data_path, 'poseconnect_cleaned')
for dir in os.listdir(folder):
    if '2022' in dir:
        subfolder = os.path.join(folder, dir)
        for subfile in os.listdir(subfolder):
            
            # we only care about the 3d reconstructed data
            if subfile.endswith('.csv') and '3d' in subfile:
                path = os.path.join(subfolder, subfile)
                csv = path.replace('3d_', 'summary_')
                
                # if the summary file already exists, we skip it
                if os.path.isfile(csv): continue
                    
                # we read the data and add AOI columns
                data = pd.read_csv(path)
                add_aoi_to_df(data) 
                
                # summarize the data by student, hour, aoi and save it
                summary = data.groupby(['student_id','hour', 'aoi']).size().unstack()
                summary.to_csv(csv)

### Combine the two together

In [10]:
import glob
import pathlib

csv_files = list(pathlib.Path(folder).rglob('*.csv'))
summary_files = [x for x in csv_files if 'summary_' in str(x)]
summary_files

[PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-03/summary_2022-03-03.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-04/summary_2022-03-04.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-05/summary_2022-03-05.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-06/summary_2022-03-06.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-07/summary_2022-03-07.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-08/summary_2022-03-08.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-09/summary_2022-03-09

In [11]:
data_files=[x for x in csv_files if '3d_' in str(x)]
data_files

[PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-03/3d_2022-03-03.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-04/3d_2022-03-04.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-05/3d_2022-03-05.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-06/3d_2022-03-06.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-07/3d_2022-03-07.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-08/3d_2022-03-08.csv'),
 PosixPath('drive/Shareddrives/2020-Makerspace-tracking/Data/2022-Spr-T519/poseconnect/poseconnect_cleaned/2022-03-09/3d_2022-03-09.csv')]

In [12]:
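# combine all the summary dfs together
summary_dfs = []

for csv in summary_files:
    df = pd.read_csv(csv)
    df.insert(0,'file', csv)
    summary_dfs.append(df)

# DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
main_df = pd.concat(summary_dfs, ignore_index=True)

# sum the numeric columns per student (the non-numeric 'file' column is dropped)
main_df = main_df.groupby(['student_id']).sum(numeric_only=True)
main_df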



student_id,hour,collaboration,laser,nothing,office,printer,sewing,soldering,tool
aashna,39,51655.0,19352.0,3474.0,64.0,431.0,0.0,1.0,1900.0
alaa,81,173798.0,7287.0,13550.0,350.0,1341.0,2267.0,7063.0,2330.0
bertrand,43,3779.0,324.0,2801.0,300.0,581.0,0.0,0.0,1707.0
chali,288,629155.0,106095.0,20278.0,578.0,3649.0,173.0,2139.0,14861.0
conner,343,717562.0,1317.0,27330.0,1576.0,2566.0,13381.0,0.0,5176.0
daniel,285,390026.0,23178.0,108692.0,1892.0,55900.0,2059.0,2025.0,13204.0
denise,132,157604.0,67578.0,12990.0,38.0,1244.0,332.0,848.0,12941.0
hoa,204,300396.0,5209.0,109129.0,443.0,7613.0,1895.0,272.0,2747.0
iulian,50,53712.0,1.0,3352.0,108.0,331.0,963.0,10582.0,120.0
ji su,328,372448.0,60198.0,34858.0,1386.0,4082.0,724.0,1184.0,40076.0


In [13]:
master_df = main_df.merge(scores_df, on='student_id')
master_df

Unnamed: 0.1,student_id,hour,collaboration,laser,nothing,office,printer,sewing,soldering,tool,Unnamed: 0,email,mid_gain_se,mid_gain_com,enjoyment,stress_level,mid_gain_se_norm,mid_gain_com_norm,enjoyment_norm,stress_level_norm,score
0,aashna,39,51655.0,19352.0,3474.0,64.0,431.0,0.0,1.0,1900.0,1,aashnasaraf@gse.harvard.edu,-5,0.375,2,2,0.25,0.409091,0.0,0.0,0.164773
1,chali,288,629155.0,106095.0,20278.0,578.0,3649.0,173.0,2139.0,14861.0,11,chalisakaewla@gse.harvard.edu,-1,0.5,2,4,0.583333,0.454545,0.0,1.0,0.50947
2,conner,343,717562.0,1317.0,27330.0,1576.0,2566.0,13381.0,0.0,5176.0,16,ceastman@gse.harvard.edu,-7,2.0,3,3,0.083333,1.0,0.333333,0.5,0.479167
3,denise,132,157604.0,67578.0,12990.0,38.0,1244.0,332.0,848.0,12941.0,3,denisefabella@gse.harvard.edu,-4,-0.0625,4,3,0.333333,0.25,0.666667,0.5,0.4375
4,hoa,204,300396.0,5209.0,109129.0,443.0,7613.0,1895.0,272.0,2747.0,9,hoapham@gse.harvard.edu,-5,-0.6875,5,2,0.25,0.022727,1.0,0.0,0.318182
5,ji su,328,372448.0,60198.0,34858.0,1386.0,4082.0,724.0,1184.0,40076.0,17,jlee@gse.harvard.edu,-1,0.125,4,4,0.583333,0.318182,0.666667,1.0,0.642045
6,juan,302,406361.0,95339.0,263948.0,370.0,4630.0,7.0,778.0,14544.0,6,juanpablo_garcesramirez@gse.harvard.edu,0,0.25,3,4,0.666667,0.363636,0.333333,1.0,0.590909
7,melissa,164,133094.0,80012.0,63433.0,993.0,1897.0,630.0,6125.0,42036.0,2,mkain@gse.harvard.edu,0,1.5,3,2,0.666667,0.818182,0.333333,0.0,0.454545
8,miaoya,124,135991.0,35153.0,9003.0,0.0,480.0,482.0,109.0,4817.0,5,miaoyazhong@gse.harvard.edu,3,0.875,4,4,0.916667,0.590909,0.666667,1.0,0.793561
9,natalie,310,517890.0,192475.0,25856.0,678.0,3644.0,21.0,1931.0,3644.0,15,nvarkey@gse.harvard.edu,-1,-0.0625,5,3,0.583333,0.25,1.0,0.5,0.583333


In [14]:
#import data
#since functions will use multiprocessing and some features are computed according to days of the week, each day's CSV is imported as a separate df
df_list=[]
for csv in data_files:
  if(os.stat(csv).st_size != 0):
    df=pd.read_csv(csv)
    df_list.append(df)

In [15]:
#complete df of one week's dates
#(DataFrame.append was removed in pandas 2.0; pd.concat is the replacement)
df_concat = pd.concat(df_list, ignore_index=True)



In [16]:
df_concat

Unnamed: 0,student_id,timestamp,period_info,pose_2d_ids,pose_track_3d_id,0_x,0_y,0_z,1_x,1_y,1_z,2_x,2_y,2_z,3_x,3_y,3_z,4_x,4_y,4_z,5_x,5_y,5_z,6_x,6_y,6_z,7_x,7_y,7_z,8_x,8_y,8_z,9_x,9_y,9_z,10_x,10_y,10_z,11_x,11_y,11_z,12_x,12_y,12_z,13_x,13_y,13_z,14_x,14_y,14_z,15_x,15_y,15_z,16_x,16_y,16_z
0,alaa,2022-03-03 11:59:58.333000-05:00,[],"['0_2_2022-03-03-12-00-00_32_252', '0_0_2022-0...",702ea188f8ed4baeb082d1de2de36e14,4.587749,3.815725,1.291853,4.608601,3.771621,1.320903,4.605619,3.831106,1.314416,4.638632,3.711567,1.310427,4.776135,3.816300,1.300858,4.626925,3.614881,1.170360,4.808679,3.899865,1.127346,4.546496,3.578651,0.899637,4.725025,3.972797,0.904364,4.427299,3.768063,0.965655,4.504177,3.901629,0.966268,4.673406,3.576978,0.692934,4.772597,3.805176,0.665899,4.562533,3.691466,0.495581,4.641915,3.916310,0.505219,4.723884,3.454788,0.454414,4.729767,3.854114,0.449075
1,conner,2022-03-03 11:59:58.333000-05:00,[],"['0_0_2022-03-03-12-00-00_0_0', '1_2_2022-03-0...",ce3d78a51a3844f28228d6ae44af0095,3.794381,2.385934,1.150455,3.776397,2.391676,1.184838,3.828644,2.405571,1.182714,3.752801,2.271805,1.228534,3.920583,2.290996,1.213750,3.669967,2.135557,1.118720,3.992474,2.149991,1.114669,3.540116,2.194768,0.876354,3.922844,2.289566,0.889181,3.625917,2.502845,0.915856,3.824736,2.450527,0.970192,3.724309,2.037788,0.645411,3.986952,2.045394,0.648115,3.712892,2.216469,0.661014,3.923642,2.314203,0.556552,3.882857,1.968686,0.485327,3.928037,2.083287,0.402023
2,iulian,2022-03-03 11:59:58.333000-05:00,[],"['1_0_2022-03-03-12-00-00_0_1', '2_4_2022-03-0...",3e63845cf48b435e97a1f535790ca3db,2.593531,5.066759,1.120396,2.601009,5.137371,1.154971,2.597127,5.057150,1.158102,2.524500,5.196687,1.157995,2.432163,5.035139,1.171502,2.425943,5.294851,1.038472,2.384366,4.945238,1.056190,2.491426,5.272451,0.768300,2.439769,4.919987,0.786296,2.698739,5.096933,0.752734,2.647414,5.002916,0.773538,1.976159,5.232603,0.664640,1.990143,4.980256,0.680192,2.354763,5.247836,0.545953,2.398035,4.807415,0.668888,2.245312,5.142913,0.540910,2.259032,5.050715,0.496892
3,rhea,2022-03-03 11:59:58.333000-05:00,[],"['0_0_2022-03-03-12-00-00_0_3', '2_4_2022-03-0...",a29925621f4c4607811e94159660f793,4.102947,2.371081,1.255347,4.131122,2.332003,1.286275,4.256654,2.364661,1.284287,4.195360,2.256518,1.291817,4.280587,2.397798,1.274423,4.220375,2.155668,1.144090,4.364625,2.397337,1.110431,4.228372,2.033439,0.942239,4.346208,2.424165,0.886875,4.218369,2.086450,0.804723,4.311809,2.241146,0.799338,4.296697,2.074624,0.724345,4.395891,2.233321,0.710402,4.392623,1.837919,0.670981,4.333614,2.144272,0.561175,4.814688,1.984768,0.523746,4.744262,2.153429,0.503067
4,yani,2022-03-03 11:59:58.333000-05:00,[],"['0_2_2022-03-03-12-00-00_32_256', '0_0_2022-0...",d07126e26ad74a369e8fd7f807ce06e5,3.216531,2.433644,0.906522,3.245860,2.458172,0.918309,3.234221,2.437440,0.925389,3.130995,2.379197,1.041321,3.287013,2.328904,0.985869,2.983588,2.247918,1.025574,3.287602,2.152619,0.868869,2.777681,2.173783,0.876492,3.226568,2.248746,0.634860,3.022058,2.427485,0.691159,3.044499,2.412328,0.691471,2.841066,2.050365,0.571140,3.049495,2.033765,0.522657,2.926181,2.437595,0.403017,3.138803,2.306264,0.370921,2.546202,2.055151,0.293082,3.144051,2.364242,0.118294
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7468867,juan,2022-03-09 20:30:01.667000-05:00,[],"['1_2_2022-03-09-20-30-00_50_138', '1_4_2022-0...",580a2d21dcb04eb1bb0ebbc5e7f8daf1,2.265652,4.484807,1.096176,2.242466,4.516355,1.128116,2.266384,4.473225,1.128860,2.152933,4.579241,1.123077,2.160081,4.318690,1.110754,2.062484,4.700229,0.941409,2.090761,4.243715,0.948824,2.204604,4.787150,0.708433,2.264486,4.363734,0.760783,2.356460,4.631990,0.822942,2.416295,4.388492,0.792708,2.245667,4.673546,0.502725,2.271854,4.395597,0.493795,2.679663,4.562706,0.645804,2.690479,4.598695,0.566205,2.827186,4.864247,0.570728,2.810723,4.921866,0.443011
7468868,juan,2022-03-09 20:30:01.733000-05:00,[],"['1_2_2022-03-09-20-30-00_51_141', '1_4_2022-0...",580a2d21dcb04eb1bb0ebbc5e7f8daf1,2.264086,4.494708,1.095767,2.243711,4.516507,1.128552,2.267013,4.473953,1.126789,2.148699,4.575527,1.123372,2.156222,4.310183,1.105833,2.064542,4.700233,0.943953,2.086644,4.238103,0.945923,2.205473,4.786751,0.708017,2.268632,4.351058,0.753923,2.357335,4.632916,0.829270,2.401071,4.341386,0.803428,2.240975,4.673380,0.501967,2.251942,4.371934,0.492137,2.623251,4.825080,0.640444,2.614206,4.521411,0.624757,2.759041,5.045650,0.579002,2.738804,5.105461,0.590802
7468869,juan,2022-03-09 20:30:01.800000-05:00,[],"['1_2_2022-03-09-20-30-00_52_144', '1_4_2022-0...",580a2d21dcb04eb1bb0ebbc5e7f8daf1,2.260138,4.493280,1.095237,2.237225,4.526172,1.131120,2.262771,4.472199,1.122683,2.154398,4.589249,1.126153,2.158839,4.305725,1.105087,2.066805,4.694863,0.937194,2.092812,4.250522,0.950752,2.201629,4.793428,0.704896,2.263875,4.349868,0.747253,2.349485,4.626553,0.822467,2.426940,4.382216,0.783788,2.238942,4.667286,0.500513,2.260959,4.389292,0.495092,2.631767,4.842475,0.591954,2.673069,4.661146,0.569139,2.765091,5.058871,0.579067,2.664984,4.759627,0.375639
7468870,juan,2022-03-09 20:30:01.867000-05:00,[],"['1_4_2022-03-09-20-30-00_53_164', '1_2_2022-0...",580a2d21dcb04eb1bb0ebbc5e7f8daf1,2.260135,4.488797,1.091614,2.240586,4.530331,1.129090,2.267172,4.472190,1.119941,2.147719,4.584923,1.124945,2.163803,4.311053,1.108457,2.066514,4.696124,0.939219,2.092131,4.250476,0.946573,2.191076,4.773764,0.698911,2.268614,4.328985,0.745228,2.353176,4.627560,0.820725,2.399474,4.327324,0.791138,2.247016,4.653396,0.497994,2.284588,4.402677,0.496688,2.642973,4.685784,0.618432,2.693485,4.600052,0.560875,2.825441,4.883465,0.549795,2.735551,5.063736,0.588888


In [17]:
#split the daily dfs into two lists; positions refer to df_list, from which empty CSVs were skipped
#df_dl: days around the deadline date (Tuesday & Wednesday)
df_dl=[df_list[4], df_list[5]]
#df_ndl: days not around the deadline date (days other than Tuesday & Wednesday)
df_ndl=[df_list[0], df_list[1],df_list[2], df_list[3]]
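
Selecting days by list position depends on the order of `data_files` and on which CSVs were empty. A minimal sketch of an alternative that reads the weekday from each df's first timestamp is given here. The names `split_by_weekday` and `deadline_days` are hypothetical, the Tuesday/Wednesday default follows the comment in the cell above, and each df is assumed to hold a single day with a parseable `timestamp` column.

```python
import pandas as pd

def split_by_weekday(df_list, deadline_days=("Tuesday", "Wednesday")):
    """Split daily dfs into (deadline, non-deadline) lists by weekday name."""
    deadline, non_deadline = [], []
    for df in df_list:
        if df.empty:
            continue  # nothing recorded that day
        day = pd.to_datetime(df["timestamp"].iloc[0]).day_name()
        if day in deadline_days:
            deadline.append(df)
        else:
            non_deadline.append(df)
    return deadline, non_deadline

# toy data: Monday, Tuesday and Wednesday of the recorded week
toy_days = [
    pd.DataFrame({"timestamp": ["2022-03-07 12:00:00.000000-05:00"]}),
    pd.DataFrame({"timestamp": ["2022-03-08 12:00:00.000000-05:00"]}),
    pd.DataFrame({"timestamp": ["2022-03-09 12:00:00.000000-05:00"]}),
]
toy_dl, toy_ndl = split_by_weekday(toy_days)
```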

In [18]:
#create lists of all students' and instructors' names, for use in functions
instructor_list=['daniel', 'iulian','bertrand','marc','alaa']
#students are all tracked ids that are not instructors
student_list=[student for student in df_concat['student_id'].unique() if student not in instructor_list]
#'helen' is added by hand
student_list.append('helen')

In [19]:
import warnings
from pandas.errors import SettingWithCopyWarning
warnings.simplefilter(action="ignore", category=SettingWithCopyWarning)

# **Helper Functions**



**Multiprocess Function**

In [20]:
def multiprocess_functions(function,df_list):
    """Apply `function` to every df in `df_list` in parallel and return the list of results."""
    # use at most one core per df
    num_cores = min(mp.cpu_count(), len(df_list))

    # each worker returns one dict of per-student values
    with mp.Pool(num_cores) as pool:
        times=pool.map(function, df_list)
    return times

**Converting Time List from MP to Dictionary**

In [21]:
def time_list_to_dict(time_list,student_list):
    """Sum the per-df dicts returned by multiprocessing into one dict of per-student totals."""
    times_final = {'conner':0, 'rhea':0, 'yani':0, 'aashna':0, 'sara':0, 'chali':0,
                   'natalie':0, 'rachel':0, 'xiaoyi':0, 'hoa':0, 'melissa':0, 'denise':0,
                   'rui':0, 'ji su':0, 'juan':0, 'rebecca':0, 'miaoya':0, 'helen':0}
    for student in student_list:
        for time_dict in time_list:
            # a student missing from one df's dict contributes 0 instead of raising a KeyError
            times_final[student]=times_final[student]+time_dict.get(student,0)

    return times_final

**Add dict to df**

In [22]:
def dict_to_col(df,time_dict, col_name):
    # new column with each student's value from time_dict (NaN if the student is not a key)
    df[col_name]=df['student_id'].map(time_dict)
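
As a small illustration of the mapping step on toy data (the df, the dict and the column names here are made up): `Series.map` with a dict looks up each `student_id` and leaves `NaN` where the student has no entry, which can then be filled if a default is wanted.

```python
import pandas as pd

# toy data: three students, only two of them have a value
summary = pd.DataFrame({"student_id": ["juan", "hoa", "rui"]})
visits = {"juan": 3, "hoa": 1}

# look up each student's value; students without an entry get NaN
summary["visits"] = summary["student_id"].map(visits)

# optional: treat a missing entry as zero
summary["visits_filled"] = summary["visits"].fillna(0)
```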

# **Student Availability in the Space** (no. of times that students show up in the space)

In [23]:
def student_availability(df):
    """Count, for each student, the separate visits to the space within one day's df."""
    student_list=['conner','rhea','yani','aashna','sara','chali','natalie','rachel','xiaoyi','hoa','melissa','denise',
 'rui','ji su','juan','rebecca','miaoya','helen']
    student_appearances={}
    for student in student_list:
        arrival_counter=0
        last_time=None
        #one row per timestamp for this student
        df_stu = df.loc[(df['student_id']==student)].drop_duplicates(subset=['timestamp'])
        #keep only the HH:MM:SS part of each timestamp string
        times = df_stu['timestamp'].map(lambda x:datetime.strptime(x[11:19],'%H:%M:%S'))

        for curr_time in times:
            #the first sighting of the day counts as an arrival
            if last_time is None:
                arrival_counter=arrival_counter+1
            #a gap of 2 hrs or more since the last sighting counts as a new arrival
            elif (curr_time-last_time).total_seconds()>=2*3600:
                arrival_counter=arrival_counter+1
            last_time=curr_time
        student_appearances[student]=arrival_counter

    return student_appearances

In [None]:
#multiprocess function for given list of dfs
availability_time_list=multiprocess_functions(student_availability,df_dl)

In [None]:
#convert list from mp to dict
availability_time_dict=time_list_to_dict(availability_time_list,student_list)
availability_time_dict

In [None]:
#add dict to a df
dict_to_col(sample_df,availability_time_dict,'stu_availability_time')

# **Others Available** (presence of others in the space)

In [None]:
def others_present(df):
    """Count, for each student, the arrivals in one day's df at which someone else was also tracked."""
    student_list=['conner','rhea','yani','aashna','sara','chali','natalie','rachel','xiaoyi','hoa','melissa','denise',
 'rui','ji su','juan','rebecca','miaoya','helen']
    student_others={}
    #time of day (HH:MM:SS) is computed once, on a copy, so that the caller's df is not modified
    #(truncating df in place inside the loop would leave empty timestamps from the second student onwards)
    df_temp=df[['student_id','timestamp']].copy()
    df_temp['timestamp']=df_temp['timestamp'].map(lambda x:x[11:19])

    for student in student_list:
        others_around=0
        last_time=None
        #one row per second for this student
        df_stu = df_temp.loc[(df_temp['student_id']==student)].drop_duplicates(subset=['timestamp'])
        #seconds at which anyone other than this student is tracked
        other_seconds=set(df_temp.loc[df_temp['student_id']!=student,'timestamp'])

        for stamp in df_stu['timestamp']:
            curr_time=datetime.strptime(stamp,'%H:%M:%S')
            #an arrival is the first sighting of the day or a sighting after a gap of 2 hrs or more
            is_arrival=(last_time is None) or ((curr_time-last_time).total_seconds()>=2*3600)
            if is_arrival and stamp in other_seconds:
                others_around=others_around+1
            last_time=curr_time
        student_others[student]=others_around

    return student_others

In [None]:
#multiprocess function for given list of dfs
others_time_list=multiprocess_functions(others_present,sample_df_list)

In [None]:
#convert list from mp to dict
others_time_dict=time_list_to_dict(others_time_list,student_list)
others_time_dict

In [None]:
#add dict to a df
dict_to_col(sample_df,others_time_dict,'others_present_time')

# **Student Pair Collaboration**

In [None]:
def stu_pair_time(df):
    #student_pairs is expected to be a dict mapping each student to their Dream Toy partner
    stu_pair_times={}
    for student in list(student_pairs.keys()):
        #creating df for each student
        df_stu = df.loc[(df['student_id']==student)]
        df_stu.drop_duplicates(subset=['timestamp'], inplace=True)
        df_stu = df_stu.reset_index(drop=True)

        #collab time of this student with their partner
        stu_pair_time=0
        other_stu=student_pairs[student]

        #get timestamps for student in focus
        stu_times=df_stu['timestamp'].tolist()

        #create temp df for other student
        df_other_temp=df.loc[(df['student_id']==other_stu)& (df['timestamp'].isin(stu_times))]
        df_other_temp.drop_duplicates(subset=['timestamp'],inplace=True)
        df_other_temp = df_other_temp.reset_index(drop=True)
        other_times_temp=df_other_temp['timestamp'].tolist()

        #create temp df for student in focus
        df_stu_temp = pd.DataFrame(df_stu.loc[df_stu['timestamp'].isin(other_times_temp)])
        df_stu_temp = df_stu_temp.reset_index(drop=True)

        #compare the x-y coords of the noses (keypoint 0) of both students to establish proximity: the other's nose must lie within +-1 of the student's nose on each axis (note that the notebook introduction states 0.5m)
        df_stu_temp['x_compare']=np.where((df_stu_temp['0_x']+1 > df_other_temp['0_x']) & (df_other_temp['0_x'] >= df_stu_temp['0_x']-1), True, False)
        df_stu_temp['y_compare']=np.where((df_stu_temp['0_y']+1 > df_other_temp['0_y']) & (df_other_temp['0_y'] >= df_stu_temp['0_y']-1), True, False)

        #get student & other student shoulders' slopes - (y2-y1) / (x2-x1)
        df_stu_temp['slope']=(df_stu_temp['6_y']-df_stu_temp['5_y'])/(df_stu_temp['6_x']-df_stu_temp['5_x'])
        df_other_temp['slope']=(df_other_temp['6_y']-df_other_temp['5_y'])/(df_other_temp['6_x']-df_other_temp['5_x'])

        #initialising 'turned-to-each-other' column of student in focus to False
        df_stu_temp['turned']=False

        #print('outer loop: '+ str(len(df_other_temp)))

        #comparing the shoulder slopes of the two students to see if they are turned towards each other
        for row in df_other_temp.itertuples():
            x_dist=abs((df_stu_temp.at[row.Index,'0_x'])-(df_other_temp.at[row.Index,'0_x']))
            y_dist=abs((df_stu_temp.at[row.Index,'0_y'])-(df_other_temp.at[row.Index,'0_y']))
            #checking whether people are adjacent along x or y axis 
            #(if dist. between x_coords is greater than dist. between y_coords - adjacent along x-axis & vice versa)
            #this check is needed to interpret how shoulder slopes would be if turned to each other
            if x_dist>y_dist:
                #checking who is on the left side of the other
                if df_stu_temp.at[row.Index,'0_x']<df_other_temp.at[row.Index,'0_x']:
                    #check if turned to each other
                    if (df_stu_temp.at[row.Index,'slope']<0) and (df_other_temp.at[row.Index,'slope']>0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False
                else:
                    if (df_stu_temp.at[row.Index,'slope']>0) and (df_other_temp.at[row.Index,'slope']<0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False
            else:
                #checking who has the smaller y coordinate
                if df_stu_temp.at[row.Index,'0_y']<df_other_temp.at[row.Index,'0_y']:
                    #check if turned to each other
                    if (df_stu_temp.at[row.Index,'slope']>0) and (df_other_temp.at[row.Index,'slope']<0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False
                else:
                    if (df_stu_temp.at[row.Index,'slope']<0) and (df_other_temp.at[row.Index,'slope']>0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False

        #count the timestamps at which the pair is close on both axes and turned towards each other
        time_counts=len(df_stu_temp.loc[(df_stu_temp['x_compare']==True)& (df_stu_temp['y_compare']==True) & (df_stu_temp['turned']==True)])

        #convert the number of timestamps to secs
        stu_pair_time=(time_counts*0.67)

        #both members of the pair get the same collaboration time
        stu_pair_times[student]=stu_pair_time
        stu_pair_times[other_stu]=stu_pair_time
    return stu_pair_times
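
The row-by-row loop in `stu_pair_time` can be slow on a full day of data. A vectorised sketch of the same four-way rule is given here as an illustration; `turned_towards` is a hypothetical helper, and both dfs are assumed to be aligned row by row on a shared index (as after `reset_index`) and to already hold the `slope` column.

```python
import numpy as np
import pandas as pd

def turned_towards(stu, other):
    """Vectorised form of the shoulder-slope rule used in the loop above."""
    # are the two noses further apart along x than along y?
    along_x = (stu["0_x"] - other["0_x"]).abs() > (stu["0_y"] - other["0_y"]).abs()
    # does the student have the smaller coordinate on that axis?
    stu_first = np.where(along_x, stu["0_x"] < other["0_x"], stu["0_y"] < other["0_y"])
    neg_pos = (stu["slope"] < 0) & (other["slope"] > 0)
    pos_neg = (stu["slope"] > 0) & (other["slope"] < 0)
    # along x: student first -> neg_pos, otherwise pos_neg
    # along y: student first -> pos_neg, otherwise neg_pos
    return pd.Series(np.where(along_x == stu_first, neg_pos, pos_neg), index=stu.index)

# toy data: one row per branch of the rule
toy_stu = pd.DataFrame({
    "0_x": [0.0, 1.0, 0.0, 0.0],
    "0_y": [0.0, 0.0, 0.0, 1.0],
    "slope": [-1.0, -1.0, 1.0, 1.0],
})
toy_other = pd.DataFrame({
    "0_x": [1.0, 0.0, 0.0, 0.0],
    "0_y": [0.0, 0.0, 1.0, 0.0],
    "slope": [1.0, 1.0, -1.0, -1.0],
})
turned = turned_towards(toy_stu, toy_other)
```

Rows with a missing slope or coordinate evaluate to `False`, as they do in the loop, because every comparison against `NaN` is false.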

In [None]:
#multiprocess function for given list of dfs
pair_time_list=multiprocess_functions(stu_pair_time,sample_df_list)

In [None]:
#convert list from mp to dict
pair_time_dict=time_list_to_dict(pair_time_list,student_list)
pair_time_dict

In [None]:
#add dict to a df
dict_to_col(sample_df,pair_time_dict,'pair_time')

# **Student Pair Collaboration in Technical Area**

In [None]:
def stu_pair_time_tech(df):
    stu_pair_times={}
    for student in list(student_pairs.keys()):
        #creating df for each student
        df_stu = df.loc[(df['student_id']==student)].drop_duplicates(subset=['timestamp'])
        df_stu = df_stu.loc[((df_stu['0_x']<1.2) | (df_stu['0_x']>6)) | ((df_stu['0_y']<1.5) | (df_stu['0_y']>11))]
        df_stu = df_stu.reset_index(drop=True)

        #collaboration time (in secs) for this student pair
        stu_pair_time=0
        other_stu=student_pairs[student]

        #get timestamps for student in focus
        stu_times=df_stu['timestamp'].tolist()

        #create temp df for other student
        df_other_temp=df.loc[(df['student_id']==other_stu)& (df['timestamp'].isin(stu_times))]
        df_other_temp=df_other_temp.drop_duplicates(subset=['timestamp'])
        df_other_temp = df_other_temp.reset_index(drop=True)
        other_times_temp=df_other_temp['timestamp'].tolist()

        #create temp df for student in focus
        df_stu_temp = pd.DataFrame(df_stu.loc[df_stu['timestamp'].isin(other_times_temp)])
        df_stu_temp = df_stu_temp.reset_index(drop=True)

        #compare the x-y coords of noses of both students to establish proximity, range of +-1 within nose of other counts
        df_stu_temp['x_compare']=np.where((df_stu_temp['0_x']+1 > df_other_temp['0_x']) & (df_other_temp['0_x'] >= df_stu_temp['0_x']-1), True, False)
        df_stu_temp['y_compare']=np.where((df_stu_temp['0_y']+1 > df_other_temp['0_y']) & (df_other_temp['0_y'] >= df_stu_temp['0_y']-1), True, False)

        #get student & other student shoulders' slopes - (y2-y1) / (x2-x1)
        df_stu_temp['slope']=(df_stu_temp['6_y']-df_stu_temp['5_y'])/(df_stu_temp['6_x']-df_stu_temp['5_x'])
        df_other_temp['slope']=(df_other_temp['6_y']-df_other_temp['5_y'])/(df_other_temp['6_x']-df_other_temp['5_x'])

        #initialising 'turned-to-each-other' column of student in focus to False
        df_stu_temp['turned']=False

        #print('outer loop: '+ str(len(df_other_temp)))

        #comparing the shoulder slopes of the two students to see if turned towards each other
        for row in df_other_temp.itertuples():
            #print(str(row.Index))
            x_dist=abs((df_stu_temp.at[row.Index,'0_x'])-(df_other_temp.at[row.Index,'0_x']))
            y_dist=abs((df_stu_temp.at[row.Index,'0_y'])-(df_other_temp.at[row.Index,'0_y']))
            #checking whether people are adjacent along x or y axis 
            #(if dist. between x_coords is greater than dist. between y_coords - adjacent along x-axis & vice versa)
            #this check is needed to interpret how shoulder slopes would be if turned to each other
            if x_dist>y_dist:
                #checking who is on the left side of the other
                if df_stu_temp.at[row.Index,'0_x']<df_other_temp.at[row.Index,'0_x']:
                    #check if turned to each other
                    if (df_stu_temp.at[row.Index,'slope']<0) and (df_other_temp.at[row.Index,'slope']>0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False
                else:
                    if (df_stu_temp.at[row.Index,'slope']>0) and (df_other_temp.at[row.Index,'slope']<0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False
            else:
                #checking who has the smaller y-coordinate (pair is adjacent along the y-axis)
                if df_stu_temp.at[row.Index,'0_y']<df_other_temp.at[row.Index,'0_y']:
                    #check if turned to each other
                    if (df_stu_temp.at[row.Index,'slope']>0) and (df_other_temp.at[row.Index,'slope']<0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False
                else:
                    if (df_stu_temp.at[row.Index,'slope']<0) and (df_other_temp.at[row.Index,'slope']>0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False

        time_counts=len(df_stu_temp.loc[(df_stu_temp['x_compare']==True)& (df_stu_temp['y_compare']==True) & (df_stu_temp['turned']==True)])

        #convert the number of matching timestamps to seconds (0.67 s per timestamp)
        stu_pair_time=(time_counts*0.67)

        stu_pair_times[student]=stu_pair_time
        stu_pair_times[other_stu]=stu_pair_time
    return stu_pair_times

In [None]:
#multiprocess function for given list of dfs
pair_techtime_list=multiprocess_functions(stu_pair_time_tech,sample_df_list)

In [None]:
#convert list from mp to dict
pair_techtime_dict=time_list_to_dict(pair_techtime_list,student_list)
pair_techtime_dict

In [None]:
#add dict to a df
dict_to_col(sample_df,pair_techtime_dict,'pair_tech_time')
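
The area filter in `stu_pair_time_tech` keeps rows whose nose keypoint lies outside a central rectangle. A minimal sketch of that split is given below, assuming the collaboration area is exactly the inside of the same rectangle (the complement of the technical area). The helper names `in_technical_area` and `in_collab_area` are hypothetical; the boundary values are the ones used in the filter above.

```python
import pandas as pd

# rectangle boundaries taken from the technical-area filter in this notebook
X_MIN, X_MAX = 1.2, 6
Y_MIN, Y_MAX = 1.5, 11

def in_technical_area(df):
    """True where the nose keypoint is outside the central rectangle."""
    return ((df['0_x'] < X_MIN) | (df['0_x'] > X_MAX) |
            (df['0_y'] < Y_MIN) | (df['0_y'] > Y_MAX))

def in_collab_area(df):
    """True where the nose keypoint is strictly inside the central rectangle.
    Assumption: the collaboration area is the inside of that rectangle."""
    return ((df['0_x'] > X_MIN) & (df['0_x'] < X_MAX) &
            (df['0_y'] > Y_MIN) & (df['0_y'] < Y_MAX))
```

Note that the inside test has to combine its four conditions with `&`: joining `x > 1.2` and `x < 6` with `|` is true for every non-missing value and so filters nothing.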

# **Student Pair Collaboration in Collaboration Area**

In [None]:
def stu_pair_time_collab(df):
    stu_pair_times={}
    for student in list(student_pairs.keys()):
        #creating df for each student
        df_stu = df.loc[(df['student_id']==student)].drop_duplicates(subset=['timestamp'])
        #keep only rows inside the central rectangle, i.e. the complement of the technical-area filter
        df_stu = df_stu.loc[(df_stu['0_x']>1.2) & (df_stu['0_x']<6) & (df_stu['0_y']>1.5) & (df_stu['0_y']<11)]
        df_stu = df_stu.reset_index(drop=True)

        #collaboration time (in secs) for this student pair
        stu_pair_time=0
        other_stu=student_pairs[student]

        #get timestamps for student in focus
        stu_times=df_stu['timestamp'].tolist()

        #create temp df for other student
        df_other_temp=df.loc[(df['student_id']==other_stu)& (df['timestamp'].isin(stu_times))]
        df_other_temp=df_other_temp.drop_duplicates(subset=['timestamp'])
        df_other_temp = df_other_temp.reset_index(drop=True)
        other_times_temp=df_other_temp['timestamp'].tolist()

        #create temp df for student in focus
        df_stu_temp = pd.DataFrame(df_stu.loc[df_stu['timestamp'].isin(other_times_temp)])
        df_stu_temp = df_stu_temp.reset_index(drop=True)

        #compare the x-y coords of noses of both students to establish proximity, range of +-1 within nose of other counts
        df_stu_temp['x_compare']=np.where((df_stu_temp['0_x']+1 > df_other_temp['0_x']) & (df_other_temp['0_x'] >= df_stu_temp['0_x']-1), True, False)
        df_stu_temp['y_compare']=np.where((df_stu_temp['0_y']+1 > df_other_temp['0_y']) & (df_other_temp['0_y'] >= df_stu_temp['0_y']-1), True, False)

        #get student & other student shoulders' slopes - (y2-y1) / (x2-x1)
        df_stu_temp['slope']=(df_stu_temp['6_y']-df_stu_temp['5_y'])/(df_stu_temp['6_x']-df_stu_temp['5_x'])
        df_other_temp['slope']=(df_other_temp['6_y']-df_other_temp['5_y'])/(df_other_temp['6_x']-df_other_temp['5_x'])

        #initialising 'turned-to-each-other' column of student in focus to False
        df_stu_temp['turned']=False

        #print('outer loop: '+ str(len(df_other_temp)))

        #comparing the shoulder slopes of the two students to see if turned towards each other
        for row in df_other_temp.itertuples():
            #print(str(row.Index))
            x_dist=abs((df_stu_temp.at[row.Index,'0_x'])-(df_other_temp.at[row.Index,'0_x']))
            y_dist=abs((df_stu_temp.at[row.Index,'0_y'])-(df_other_temp.at[row.Index,'0_y']))
            #checking whether people are adjacent along x or y axis 
            #(if dist. between x_coords is greater than dist. between y_coords - adjacent along x-axis & vice versa)
            #this check is needed to interpret how shoulder slopes would be if turned to each other
            if x_dist>y_dist:
                #checking who is on the left side of the other
                if df_stu_temp.at[row.Index,'0_x']<df_other_temp.at[row.Index,'0_x']:
                    #check if turned to each other
                    if (df_stu_temp.at[row.Index,'slope']<0) and (df_other_temp.at[row.Index,'slope']>0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False
                else:
                    if (df_stu_temp.at[row.Index,'slope']>0) and (df_other_temp.at[row.Index,'slope']<0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False
            else:
                #checking who has the smaller y-coordinate (pair is adjacent along the y-axis)
                if df_stu_temp.at[row.Index,'0_y']<df_other_temp.at[row.Index,'0_y']:
                    #check if turned to each other
                    if (df_stu_temp.at[row.Index,'slope']>0) and (df_other_temp.at[row.Index,'slope']<0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False
                else:
                    if (df_stu_temp.at[row.Index,'slope']<0) and (df_other_temp.at[row.Index,'slope']>0):
                        df_stu_temp.at[row.Index,'turned']=True
                    else:
                        df_stu_temp.at[row.Index,'turned']=False

        time_counts=len(df_stu_temp.loc[(df_stu_temp['x_compare']==True)& (df_stu_temp['y_compare']==True) & (df_stu_temp['turned']==True)])

        #convert the number of matching timestamps to seconds (0.67 s per timestamp)
        stu_pair_time=(time_counts*0.67)

        stu_pair_times[student]=stu_pair_time
        stu_pair_times[other_stu]=stu_pair_time
    return stu_pair_times

In [None]:
#multiprocess function for given list of dfs
pair_collabtime_list=multiprocess_functions(stu_pair_time_collab,sample_df_list)

In [None]:
#convert list from mp to dict
pair_collabtime_dict=time_list_to_dict(pair_collabtime_list,student_list)
pair_collabtime_dict

In [None]:
#add dict to a df
dict_to_col(sample_df,pair_collabtime_dict,'pair_collabtime_time')
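
The pair functions above line up the two students by filtering on shared timestamps and then resetting the index, which relies on both dataframes being in the same timestamp order. One possible alternative, sketched here as an assumption and not as the notebook's method, is to join the two students on `timestamp` so that each row explicitly holds both positions. `align_pair` and `proximity_seconds` are hypothetical helpers, and the sketch covers only the proximity condition, not the shoulder-slope check.

```python
import pandas as pd

def align_pair(df, student, other_stu):
    """Join two students' nose coordinates on their shared timestamps."""
    cols = ['timestamp', '0_x', '0_y']
    a = df.loc[df['student_id'] == student, cols].drop_duplicates(subset=['timestamp'])
    b = df.loc[df['student_id'] == other_stu, cols].drop_duplicates(subset=['timestamp'])
    # inner join keeps only timestamps at which both students were detected
    return a.merge(b, on='timestamp', suffixes=('_stu', '_other'))

def proximity_seconds(pair, radius=1.0, seconds_per_frame=0.67):
    """Seconds during which the other student's nose is within +-radius on both axes.
    Same bounds as x_compare / y_compare above: -radius <= other - stu < radius."""
    dx = pair['0_x_other'] - pair['0_x_stu']
    dy = pair['0_y_other'] - pair['0_y_stu']
    near = (dx >= -radius) & (dx < radius) & (dy >= -radius) & (dy < radius)
    return int(near.sum()) * seconds_per_frame
```

For example, `proximity_seconds(align_pair(df, student, student_pairs[student]))` would give the time in proximity for one pair, before the "turned towards each other" condition is applied.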

# Correlations

In [None]:
sid_summary_df.columns

In [None]:
# load the script for generating correlation heatmaps
script_heatmap = os.path.join(base_path, 'Analysis', 'scripts', 'heatmap.py')
%run "$script_heatmap"

In [None]:
script_heatmap

In [None]:
# define our predictors (rows) and outcomes (columns) - only including those predictors relevant for student-student collaboration
predictors = ['hour', 'collaboration', 'laser', 'nothing','office', 'printer', 'sewing', 'soldering', 'tool','stu_student_time','stu_pair_time', 'stu_pair_tech', 'stu_pair_collab','stu_availability', 'others_present', 'stu_availability_dl', 'stu_availability_ndl', 'stupairtechtime_dl', 'stupairtechtime_ndl', 'stupaircollabtime_dl', 'stupaircollabtime_ndl']
outcomes = ['mid_gain_se', 'mid_gain_com', 'enjoyment', 'stress_level','mid_gain_se_norm', 'mid_gain_com_norm', 'enjoyment_norm','stress_level_norm', 'score']

compute_correlation(sid_summary_df, predictors, outcomes)
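
`compute_correlation` is defined in the external `heatmap.py` script, so its internals are not shown in this notebook. As a rough illustration of what a predictor-by-outcome table looks like, the sketch below uses plain pandas. `correlation_table` is a hypothetical helper and not the function from `heatmap.py`; it assumes all selected columns are numeric and that no column appears in both lists, and it computes Pearson correlations, the pandas default.

```python
import pandas as pd

def correlation_table(df, predictors, outcomes):
    """Pearson correlations with predictors as rows and outcomes as columns.
    Names that are missing from df are skipped instead of raising a KeyError."""
    rows = [c for c in predictors if c in df.columns]
    cols = [c for c in outcomes if c in df.columns]
    corr = df[rows + cols].corr()
    return corr.loc[rows, cols]
```

Skipping missing names is a design choice for exploration; for a final analysis it may be safer to fail loudly when a predictor listed above is absent from `sid_summary_df`.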