# Final Project

## Framing

**Introduction**: describe your dataset, and why you're interested in it

In this dataset, Kinect sensors captured students' body postures, locations, and gestures in a makerspace over a 13-week semester, recording nearly half a million observations from 16 students enrolled in a class.

I am interested in this dataset because I am currently involved in the next iteration of the Makerspace project, and analyzing the data in this final project will help me become more familiar with it.

**Research question(s)**: describe the overall research question of your project

Can we generate “profiles” (or personas) for students, based on their behavior in the space?

**Hypotheses**:
    * Describe 2-3 hypotheses that you're planning to test with your dataset
    * Each hypothesis should be based on academic research (cite a paper) and/or background knowledge that you have about the dataset if you've collected it yourself (e.g., if you've conducted interviews)
    * Each hypothesis should be formulated as an affirmation (and not a question)
    * You can also describe alternative hypotheses, if you think that your results could go either way (but again, have a rationale for why)

    
**Hypothesis 1**
Higher velocity movements are correlated with personality traits such as extraversion and openness. 

**Hypothesis 2**
Leaning towards a partner when collaborating in the makerspace is correlated with the personality trait of agreeableness, while leaning away from a partner is correlated with neuroticism.
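Hypothesis 2 needs a numeric definition of "leaning towards" a partner. One possible operationalization, offered only as a sketch, is the horizontal offset of the shoulders relative to the hips, projected onto the direction of the partner. The function name, the joint names, and the choice of `y` as the vertical axis are assumptions for illustration and are not taken from the dataset.

```python
import math

def lean_toward_partner(spine_base, shoulder_center, partner_spine_base):
    """Signed horizontal lean toward the partner, in the units of the coordinates.

    Positive: the upper body is tilted toward the partner; negative: tilted away.
    Points are (x, y, z) tuples with y as the vertical axis (an assumption).
    """
    # Horizontal offset of the shoulders relative to the hips.
    torso_x = shoulder_center[0] - spine_base[0]
    torso_z = shoulder_center[2] - spine_base[2]
    # Horizontal direction from this student to the partner.
    dir_x = partner_spine_base[0] - spine_base[0]
    dir_z = partner_spine_base[2] - spine_base[2]
    norm = math.hypot(dir_x, dir_z)
    if norm == 0:
        return 0.0  # partner at the same horizontal position: direction undefined
    # Project the torso offset onto the unit vector pointing at the partner.
    return (torso_x * dir_x + torso_z * dir_z) / norm

# Hypothetical coordinates: the partner stands 1 unit away along the x axis.
toward = lean_toward_partner((0, 0, 0), (0.1, 0.5, 0), (1, 0, 0))
away = lean_toward_partner((0, 0, 0), (-0.1, 0.5, 0), (1, 0, 0))
print(toward > 0, away < 0)
```

Averaging this value per student over the periods in which they collaborate would give one lean feature per student to relate to the personality scores.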


**Papers**
Wache, J. (2014). The secret language of our body: Affect and personality recognition using physiological signals. In Proceedings of the 16th International Conference on Multimodal Interaction (ICMI) (pp. 389–393).

McCrae, R. R., & John, O. P. (1992). An introduction to the five-factor model and its applications. Journal of Personality, 60(2), 175–215.

Srivastava, R., Feng, J., Roy, S., Sim, T., & Yan, S. (2012). Don't Ask Me What I'm Like, Just Watch and Listen. In Proceedings of the 20th ACM International Conference on Multimedia (pp. 329–338). ACM.


**Results**:
    * how are you planning to test each hypothesis? What models are you thinking of using?
I plan to use clustering methods to group students by characteristics of their body movement (e.g., low or high velocity, leaning towards or away from a partner) and then use deep learning models to predict their personality traits.

    * what are the best results you can hope for? Is that interesting / relevant for other researchers?
The best results would be a strong correlation between the students' body language and personality traits. This would be interesting for other researchers as personality profiling could lead to other areas of research such as personalized interventions and quality of social interactions.

    * what are implications of your potential findings for practitioners?
Personality traits could be inferred from body language data without the traditional use of surveys. This gives practitioners a way of understanding their students so that they can cater to individual learning needs.
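The grouping step in the plan above (low versus high velocity) could be sketched as follows. This is a minimal illustration using only the standard library, not the project's actual pipeline: the sample tuples are made-up values, the `(person_id, seconds, x, y, z)` layout is an assumption about how one joint's position might be arranged, and `two_means` is a tiny stand-in for a library clustering routine.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical samples for one tracked joint: (person_id, seconds, x, y, z).
samples = [
    (1, 0.0, 0.0, 0.0, 2.0), (1, 1.0, 0.5, 0.0, 2.0), (1, 2.0, 1.0, 0.0, 2.0),
    (2, 0.0, 0.0, 0.0, 2.0), (2, 1.0, 2.0, 0.0, 2.0), (2, 2.0, 4.0, 0.0, 2.0),
    (3, 0.0, 0.0, 0.0, 2.0), (3, 1.0, 0.25, 0.0, 2.0), (3, 2.0, 0.5, 0.0, 2.0),
    (4, 0.0, 0.0, 0.0, 2.0), (4, 1.0, 3.0, 0.0, 2.0), (4, 2.0, 6.0, 0.0, 2.0),
]

def mean_speed(track):
    """Average speed between consecutive (seconds, x, y, z) samples."""
    speeds = []
    for (t0, *p0), (t1, *p1) in zip(track, track[1:]):
        if t1 > t0:  # skip duplicate or out-of-order timestamps
            speeds.append(math.dist(p0, p1) / (t1 - t0))
    return mean(speeds) if speeds else 0.0

def two_means(values, iterations=20):
    """Tiny 1-D k-means with k=2 on a non-empty list; label 0 is the lower cluster."""
    low, high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer centre, then recompute the centres.
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        low_members = [v for v, label in zip(values, labels) if label == 0]
        high_members = [v for v, label in zip(values, labels) if label == 1]
        if low_members:
            low = mean(low_members)
        if high_members:
            high = mean(high_members)
    return labels

# Group the samples into one time-ordered track per student.
tracks = defaultdict(list)
for person_id, *sample in samples:
    tracks[person_id].append(tuple(sample))

speeds = {pid: mean_speed(sorted(track)) for pid, track in tracks.items()}
labels = two_means(list(speeds.values()))
groups = {pid: ("high" if label else "low") for pid, label in zip(speeds, labels)}
print(groups)  # {1: 'low', 2: 'high', 3: 'low', 4: 'high'}
```

With the real data, a multi-feature clustering method would likely replace `two_means`, and the number of clusters would need to be chosen from the data instead of fixed at two.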

**Threats**
    * Describe issues that might arise during the analyses above
The clustering may not produce clear clusters that separate the students, or the body-movement characteristics we chose may not be the ones that carry information about personality traits.

    * Come up with backup plans in case you run into these issues
Consider alternative characteristics of the students' body movement, or let the models uncover the relevant characteristics without making any prior assumptions.

## Data Exploration

Describe your raw data below; provide definition / explanations for the measures you're using

The raw data consists of student information (e.g., person_id, name), skeletal joint data, facial expressions (e.g., isSmiling), and collection information (e.g., timestamp, kinect_id). For this investigation, we drop the facial-expression variables because (1) they are dichotomous and (2) they cover only a limited range of facial expressions.

## Data Cleaning

Clean your data in this section, and make sure it's ready to be analyzed for next week!

In [1]:
import os

files = []

for file in os.listdir('./dataset'):
    if file.endswith('.csv'):
        files.append(file)

print(files)

['6.csv', '7.csv', '5.csv', '4.csv', '3.csv', '2.csv', '10.csv', '11.csv', '13.csv', '12.csv', '9.csv']


In [2]:
# exist_ok=True lets this cell be re-run without raising FileExistsError
os.makedirs('./data_cleaned', exist_ok=True)
os.listdir('./')

['.DS_Store',
 'dataset',
 'data_cleaned',
 'Wee9-Final-Project.ipynb',
 '.ipynb_checkpoints',
 '.git']

In [3]:
import pandas as pd

for file in os.listdir('./dataset'):
    if file.endswith('.csv'):
        file_input = os.path.join('./dataset',file)
        # Use the timestamp column as the index and parse it as dates
        df = pd.read_csv(file_input,parse_dates=True,index_col='timestamp')
        # Drop the first non-index column plus the metadata and facial-expression columns
        cleaned = df.drop([df.columns[0],'name_aga_conf','confidence_value','name_aga_freq','frequency','freq_count',
                           'isTalking','isWearingGlasses','isSmiling','leftHandRaised','rightHandRaised',
                           'ip','et_timestamp','skeleton','face_detected'], axis=1)
        # Remove rows with any missing values
        cleaned.dropna(inplace=True)
        # Write the cleaned file under the same name in ./data_cleaned
        file_output = os.path.join('./data_cleaned',file)
        cleaned.to_csv(file_output)

print(os.listdir('./data_cleaned'))

['6.csv', '7.csv', '5.csv', '4.csv', '3.csv', '2.csv', '10.csv', '11.csv', '13.csv', '12.csv', '9.csv']
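As a quick sanity check on the cleaning step, the same drop-and-dropna logic can be run on a small in-memory frame. This is only a sketch: `clean_frame`, `DROP_COLS`, the `head_x` column, and the timestamps are made up for illustration, and `errors='ignore'` is a deliberate variation that skips columns missing from a given file instead of raising.

```python
import pandas as pd

# Hypothetical subset of the columns dropped during cleaning.
DROP_COLS = ['isTalking', 'isSmiling', 'face_detected']

def clean_frame(df, drop_cols=DROP_COLS):
    """Drop unwanted columns (skipping absent ones) and rows with missing values."""
    return df.drop(columns=drop_cols, errors='ignore').dropna()

# Tiny made-up frame: one row has a missing joint coordinate.
raw = pd.DataFrame(
    {
        'person_id': [1, 1, 2],
        'head_x': [0.1, None, 0.3],  # hypothetical joint column
        'isSmiling': [True, False, True],
    },
    index=pd.to_datetime([
        '2018-01-01 10:00:00',
        '2018-01-01 10:00:01',
        '2018-01-01 10:00:02',
    ]),
)
raw.index.name = 'timestamp'

cleaned = clean_frame(raw)
print(list(cleaned.columns))        # ['person_id', 'head_x']
print(cleaned.isna().any().any())   # False
print(cleaned.shape)                # (2, 2)
```

The same two checks (no dropped column left, no missing values left) could be applied to each file in `./data_cleaned` before the analysis starts.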
