# Exploratory CSCW Gig
A first look at some of the data from the independent condition in the CSCW paper.
This notebook loads the CSV files generated from the log files and reports, for each dataset, the share of raw hover counts falling on the query, results, recent and saved components.

## States Considered
* **Query counts** are approximated by the first hit on a `QUERYSUGGESTIONS_GET` occurrence. 
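One possible reading of this rule, sketched below, is that a query starts at the first `QUERYSUGGESTIONS_GET` event of each consecutive run of such events, since suggestions are typically fetched repeatedly while a query is typed. This is an assumption, not the notebook's actual log processing. The helper `count_queries` and the `OTHER_EVENT` placeholder are hypothetical, and a session log is assumed to be an ordered list of event names.

```python
from itertools import groupby


def count_queries(events):
    """
    Hypothetical helper: approximate the number of queries in one session.

    Assumes `events` is the ordered list of event names for a session and that
    each run of consecutive QUERYSUGGESTIONS_GET events belongs to one query,
    so only the first event of each run is counted.
    """
    # groupby() collapses consecutive identical event names into one group.
    return sum(1 for name, _ in groupby(events) if name == 'QUERYSUGGESTIONS_GET')


# Made-up session log; OTHER_EVENT stands in for any non-suggestion event.
log = ['QUERYSUGGESTIONS_GET', 'QUERYSUGGESTIONS_GET', 'OTHER_EVENT',
       'QUERYSUGGESTIONS_GET', 'OTHER_EVENT']
print(count_queries(log))  # 2
```

If the logs instead mark each query with a distinct identifier, counting distinct identifiers would be more robust than relying on event adjacency.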

## Data
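* `full_session` (`data/fullsession.csv`) considers the entire search session, over all groups (individuals) and over each topic.
* `first5minutes` (`data/first5minutes.csv`) and `second5minutes` (`data/second5minutes.csv`), judging by their file names, restrict the same data to the first and second five minutes of each session.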

In [1]:
import pandas as pd

In [108]:
path_full_session = 'data/fullsession.csv'
path_five_session = 'data/first5minutes.csv'
path_second_five_session = 'data/second5minutes.csv'

In [109]:
full_df = pd.read_csv(path_full_session)
first5_df = pd.read_csv(path_five_session)
second5_df = pd.read_csv(path_second_five_session)


## Compute Raw Hover Percentages
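For each dataset, the raw hover count of each component (query, results, recent, saved) is summed over all observations and then divided by the total number of hovers across all four components. The percentages are therefore pooled: observations with more hovers carry more weight than observations with few.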
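To make the pooling concrete, here is a minimal sketch on made-up hover counts (not study data). It compares the pooled share described above with the alternative of averaging each observation's own share, which weights every observation equally.

```python
import pandas as pd

# Made-up hover counts for two observations (illustrative only, not study data).
toy = pd.DataFrame({
    'raw_query':   [9, 1],
    'raw_results': [1, 1],
    'raw_recent':  [0, 1],
    'raw_saved':   [0, 1],
})
components = ['raw_query', 'raw_results', 'raw_recent', 'raw_saved']
row_totals = toy[components].sum(axis=1)  # 10 and 4 hovers

# Pooled share: sum each component over all rows, divide by the grand total.
pooled = toy[components].sum() / row_totals.sum()

# Alternative: mean of per-row shares (every observation weighted equally).
mean_of_fractions = toy[components].div(row_totals, axis=0).mean()

print(pooled['raw_query'])             # 10 / 14
print(mean_of_fractions['raw_query'])  # (0.9 + 0.25) / 2
```

The two numbers differ because the first observation has more hovers and so dominates the pooled figure.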

In [110]:
def compute_raw_percentages(df, name):
    """
    Adds per-observation hover columns to `df` (in place) and returns one
    summary row for the dataset.

    Columns added to `df`:
      * `raw_sum`: total hovers per observation across all four components.
      * `raw_<component>_fraction`: that component's share of the row's hovers
        (NaN for observations with no hovers at all).

    The returned dict holds the dataset label, the summed raw counts per
    component, and each component's percentage of all hovers in the dataset.
    """
    raw_cols = ['raw_query', 'raw_results', 'raw_recent', 'raw_saved']

    df['raw_sum'] = df[raw_cols].sum(axis=1)
    for col in raw_cols:
        df[f'{col}_fraction'] = df[col] / df['raw_sum']

    total = df['raw_sum'].sum()
    row = {'name': name, 'raw_sum': total}
    row.update({f'{col}_sum': df[col].sum() for col in raw_cols})
    row.update({f'{col}_percentage': df[col].sum() / total for col in raw_cols})
    return row

# DataFrame.append() was removed in pandas 2.0, so collect one summary row per
# dataset and build the totals table in a single call.
totals_df = pd.DataFrame([
    compute_raw_percentages(full_df, 'full'),
    compute_raw_percentages(first5_df, 'first5'),
    compute_raw_percentages(second5_df, 'second5'),
])


In [111]:
# Drop the raw values, show only the percentages.
percentages_df = totals_df.drop(columns=['raw_query_sum', 'raw_results_sum', 'raw_recent_sum', 'raw_saved_sum'])
percentages_df

|   | name | raw_sum | raw_query_percentage | raw_results_percentage | raw_recent_percentage | raw_saved_percentage |
|---|---|---|---|---|---|---|
| 0 | full | 3333 | 0.247225 | 0.448845 | 0.125713 | 0.178218 |
| 1 | first5 | 654 | 0.366972 | 0.460245 | 0.084098 | 0.088685 |
| 2 | second5 | 1246 | 0.244783 | 0.453451 | 0.142857 | 0.158909 |
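A quick sanity check on this output: the four component shares of each dataset should add up to 1, up to display rounding. The sketch below copies the values from the table so it runs on its own; in the notebook the same check could be run directly on `percentages_df`.

```python
import pandas as pd

# Percentages copied from the output table above.
percentages = pd.DataFrame({
    'name': ['full', 'first5', 'second5'],
    'raw_query_percentage':   [0.247225, 0.366972, 0.244783],
    'raw_results_percentage': [0.448845, 0.460245, 0.453451],
    'raw_recent_percentage':  [0.125713, 0.084098, 0.142857],
    'raw_saved_percentage':   [0.178218, 0.088685, 0.158909],
}).set_index('name')

# Each row's component shares should sum to 1 (values are rounded to 6 places).
row_totals = percentages.sum(axis=1)
print(row_totals)
assert ((row_totals - 1).abs() < 1e-5).all()
```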


## Output Raw Hover Fractions for Each Dataset

In [26]:
# Per-observation hover fractions for the full session.
full = full_df[['raw_sum', 'raw_query_fraction', 'raw_results_fraction', 'raw_recent_fraction', 'raw_saved_fraction']]
full
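Per-observation fractions behave differently from the pooled percentages when an observation has no hovers at all: the division is 0 / 0, which pandas turns into NaN rather than raising an error. The minimal sketch below uses made-up counts to show this, and one possible choice (an assumption, not the notebook's method) of dropping such observations before summarising.

```python
import pandas as pd

# Made-up counts: the second observation has no hovers at all.
df = pd.DataFrame({
    'raw_query':   [3, 0],
    'raw_results': [1, 0],
    'raw_recent':  [0, 0],
    'raw_saved':   [0, 0],
})
raw_cols = ['raw_query', 'raw_results', 'raw_recent', 'raw_saved']
df['raw_sum'] = df[raw_cols].sum(axis=1)

# Per-row fractions; the all-zero row yields NaN in every fraction column.
fractions = df[raw_cols].div(df['raw_sum'], axis=0).add_suffix('_fraction')

# One option: keep only observations with at least one hover.
valid = fractions[df['raw_sum'] > 0]
print(valid)
```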
