# Social Follow Up Analysis

The second study, called `social-follow-up`, first replicated the results of the first study on a different set of randomly chosen questions. We then extended the questions asked in the first study to explore whether the social influence on participants' curiosity was reflected in their desire to see the answers to the questions.

In [1]:
%load_ext pycodestyle_magic

In [2]:
# Analytical Tools
import numpy as np
import pandas as pd
import scipy
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Utilities
import math
import json
import pprint
import utilities.processing as processing

# Make printing much more convenient
log = pprint.pprint

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

### Loading Data

In [10]:
FILE_NAMES = [
    'raw-data/question-setup-social-follow-up.json',
    'raw-data/social-follow-up-entries.json'
]

with open(FILE_NAMES[0]) as file:
    literals = json.load(file)

q_text = literals['questions_and_answers']['questions']
j_text = literals['judgements']
QUESTIONS = {ques: 'q' + str(num) for num, ques in enumerate(q_text)}
JUDGEMENTS = {judge: 'j' + str(num) for num, judge in enumerate(j_text)}
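The two dictionaries above map each question or judgement text to a short column prefix (`q0`, `q1`, ... and `j0`, `j1`, ...). As a minimal sketch of what that mapping looks like, the snippet below applies the same comprehension to a made-up list; the question strings are placeholders and are not taken from the study's setup file.

```python
# Placeholder question texts, for illustration only (not from the real setup file)
example_questions = ['Why is the sky blue?', 'How do magnets work?']

# Same pattern as QUESTIONS above: text -> 'q' + position in the list
example_map = {ques: 'q' + str(num) for num, ques in enumerate(example_questions)}

print(example_map)
```

Because the keys are the full texts, a response that records a question's wording can be translated directly into the matching column prefix.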

In [11]:
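with open(FILE_NAMES[1]) as file:
    # One JSON object per line; skip blank lines, which json.loads would reject
    master_responses = [json.loads(line) for line in file if line.strip()]
# Legacy alias: later cells refer to the responses as real_responses
real_responses = master_responses
len(real_responses)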

110

### Reading Responses into Data
Creates a `DataFrame` based on the survey data.

In [12]:
# Create dictionary to represent future DataFrame
num_questions = len(QUESTIONS)
num_judgements = len(JUDGEMENTS)
col_labels = processing.get_col_labels(num_questions,
                                       num_judgements,
                                       choice=True)
data = {label: [] for label in col_labels}
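The helper `processing.get_col_labels` lives in a local module that is not shown here. Judging only from the column names in the resulting `DataFrame`, a rough stand-in might look like the sketch below. The function name `sketch_col_labels` and its body are assumptions, not the author's implementation; the counts of 10 questions and 5 judgements are inferred from the `q0`..`q9` and `j0`..`j4` column names.

```python
def sketch_col_labels(num_questions, num_judgements, choice=False):
    """Hypothetical stand-in for processing.get_col_labels (assumed layout)."""
    labels = ['condition', 'consent']
    for q in range(num_questions):
        if choice:
            # Whether the participant chose to see this question's answer
            labels.append(f'q{q}choice')
        # One column per judgement for this question
        labels.extend(f'q{q}j{j}' for j in range(num_judgements))
        # The score displayed alongside the question
        labels.append(f'q{q}score')
    return labels


labels = sketch_col_labels(10, 5, choice=True)
print(len(labels))
```

Under these assumptions the sketch yields 72 labels, which agrees with the reported `data.size` of 7920 for 110 responses (110 × 72).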

In [13]:
processing.fill_social_follow_up_data(data,
                                      real_responses,
                                      QUESTIONS,
                                      JUDGEMENTS)

In [14]:
data = pd.DataFrame(data)
log(data.size)
data.head()

7920


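| | condition | consent | q0choice | q0j0 | q0j1 | q0j2 | q0j3 | q0j4 | q0score | q1choice | ... | q8j3 | q8j4 | q8score | q9choice | q9j0 | q9j1 | q9j2 | q9j3 | q9j4 | q9score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | B | 1 | 1 | 5 | 4 | 5 | 2 | 3 | 22 | 1 | ... | 6 | 6 | 3375 | 0 | 4 | 4 | 2 | 4 | 4 | 3370 |
| 1 | B | 1 | 0 | 0 | 0 | 0 | 4 | 0 | 5 | 0 | ... | 3 | 5 | 3375 | 1 | 0 | 0 | 0 | 3 | 5 | 3394 |
| 2 | A | 1 | 0 | 3 | 3 | 4 | 4 | 4 | 3365 | 1 | ... | 3 | 3 | 38 | 1 | 3 | 2 | 5 | 4 | 5 | 36 |
| 3 | B | 1 | 1 | 3 | 0 | 0 | 4 | 0 | 36 | 0 | ... | 5 | 5 | 3364 | 1 | 1 | 0 | 0 | 3 | 3 | 3365 |
| 4 | B | 1 | 0 | 0 | 0 | 3 | 5 | 1 | 40 | 0 | ... | 2 | 2 | 3359 | 0 | 2 | 0 | 4 | 6 | 6 | 3360 |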


### Analyzing Data

We first split the data into two groups: those for whom the group A questions received high scores, and those for whom the group B questions received high scores. The question groups themselves are listed in the constants section for reference, where the first five entries correspond to group A and the last five to group B.

In [15]:
# Remove participants without consent
data = data[data.consent == 1]
# Separate into dataframes for each condition
high_a = data[data.condition == 'A']
high_b = data[data.condition == 'B']
a_size, b_size = len(high_a), len(high_b)
a_size, b_size

(59, 51)
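The two group sizes sum to 110, the total number of responses loaded, so no participant was dropped by the consent filter. The same filter-then-split pattern can be sketched on a small made-up frame, with a check that the groups partition the consenting rows. The toy values below are illustrative only, and `groupby` is used here as an alternative to the two boolean masks, not as the notebook's own method.

```python
import pandas as pd

# Toy data for illustration only; not taken from the study
toy = pd.DataFrame({
    'condition': ['A', 'B', 'A', 'B', 'A'],
    'consent':   [1, 1, 0, 1, 1],
    'q0score':   [22, 5, 3365, 36, 40],
})

# Keep consenting participants, then split by condition
consented = toy[toy.consent == 1]
groups = {cond: frame for cond, frame in consented.groupby('condition')}
sizes = {cond: len(frame) for cond, frame in groups.items()}

# Every consenting participant should land in exactly one group
assert sum(sizes.values()) == len(consented)
print(sizes)
```

A check like the final assertion can catch rows whose `condition` value is missing or misspelled, since those rows would fall outside both groups.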