# Radar chart - Data generation

> This notebook preprocesses the Speed Dating data into the format expected by `scriptRadar.js` and `radarChart.js`.

**Libraries**

In [1]:
import pandas as pd
import json
import numpy as np

**Functions**

In [7]:
def attributes_by_gender(df, attributes):
    ''' Given a list of attributes, computes the mean of each attribute's values from the questionnaires, by gender.
    
        Parameters 
        -----------
            df : [DataFrame] : Initial df containing iid, wave, gender and the attribute columns
            attributes : [List] : Column names of the attributes to average
            
        Output
        -----------
            DataFrame with one row per gender, containing the mean value of each attribute in the "attributes" list '''
    
    df_temp = df[['iid', 'wave', 'gender'] + attributes]
    # Keep one row per candidate (the initial dataframe has several entries for each one) and drop rows with missing values
    df_res = df_temp.drop_duplicates().dropna()
    df_res = df_res[['gender'] + attributes].groupby('gender').mean()
    df_res = df_res.reset_index()
    
    return df_res
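
To illustrate why the deduplication step matters, here is a minimal sketch on a hypothetical toy DataFrame (the values are made up, not taken from `SpeedDating.csv`). The function is repeated so that the snippet runs on its own. Candidate 1 appears twice, as in the real file where each candidate has one row per date; without `drop_duplicates()` that candidate would count twice in the mean.

```python
import pandas as pd


def attributes_by_gender(df, attributes):
    """Mean of each attribute by gender, using one row per candidate (same logic as above)."""
    df_temp = df[['iid', 'wave', 'gender'] + attributes]
    df_res = df_temp.drop_duplicates().dropna()
    df_res = df_res[['gender'] + attributes].groupby('gender').mean()
    return df_res.reset_index()


# Hypothetical toy data: candidate 1 has two identical rows
toy = pd.DataFrame({
    'iid':     [1, 1, 2, 3],
    'wave':    [1, 1, 1, 1],
    'gender':  [0, 0, 0, 1],
    'attr1_1': [10.0, 10.0, 20.0, 30.0],
    'sinc1_1': [5.0, 5.0, 15.0, 25.0],
})

result = attributes_by_gender(toy, ['attr1_1', 'sinc1_1'])
print(result)
# gender 0 -> attr1_1 = 15.0 (mean of 10 and 20), sinc1_1 = 10.0
# gender 1 -> attr1_1 = 30.0, sinc1_1 = 25.0
```

Without the deduplication, the `attr1_1` mean for gender 0 would be (10 + 10 + 20) / 3 instead of 15.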

### 1. Load Data

In [3]:
# Path to data
path = '../data'

In [4]:
df = pd.read_csv(f'{path}/SpeedDating.csv', encoding='latin1')
df.head()

Unnamed: 0,iid,id,gender,idg,condtn,wave,round,position,positin1,order,...,attr3_3,sinc3_3,intel3_3,fun3_3,amb3_3,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3
0,1,1.0,0,1,1,1,10,7,,4,...,5.0,7.0,7.0,7.0,7.0,,,,,
1,1,1.0,0,1,1,1,10,7,,3,...,5.0,7.0,7.0,7.0,7.0,,,,,
2,1,1.0,0,1,1,1,10,7,,10,...,5.0,7.0,7.0,7.0,7.0,,,,,
3,1,1.0,0,1,1,1,10,7,,5,...,5.0,7.0,7.0,7.0,7.0,,,,,
4,1,1.0,0,1,1,1,10,7,,7,...,5.0,7.0,7.0,7.0,7.0,,,,,


### 2. Preprocessing

To be rendered as a **radar chart** in JavaScript, the data must follow a specific shape.  
Below, we compute the mean value of the following attributes:  
* `attr` : Attractiveness  
* `sinc` : Sincerity  
* `intel` : Intelligence  
* `fun` : Fun  
* `amb` : Ambition  

These values are collected at 3 different points of the Speed Dating process: **before**, **during** and **after** it.  

Candidates answer four questions; for each one, they assign a value to every attribute. Refer to the [official documentation](https://perso.telecom-paristech.fr/eagan/class/igr204/data/SpeedDatingKey.pdf) for more information about these questions.  
In the table below, these questions are numbered from 1 to 4.  

**Note**: The `question` column in the result table is encoded. Each value is made of 2 digits:  
* first digit : question number (from 1 to 4)  
* second digit : time at which the question was asked (from 1 to 3)
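
The encoding can be sketched as follows. The helper names `source_columns`, `encode_question` and `decode_question` are hypothetical and exist only for illustration; the column pattern (`attr1_1`, `sinc1_1`, ...) and the two-digit code follow what the notebook cell below does.

```python
NAME_ATTRIBUTES = ['attr', 'sinc', 'intel', 'fun', 'amb']


def source_columns(question, time):
    """Column names in the raw data for one question at one time, e.g. 'attr2_1'."""
    return [f'{attr}{question}_{time}' for attr in NAME_ATTRIBUTES]


def encode_question(question, time):
    """Two-digit code: first digit is the question number, second digit is the time."""
    return int(f'{question}{time}')


def decode_question(code):
    """Inverse of encode_question: returns (question, time)."""
    return divmod(code, 10)


print(source_columns(2, 1))   # ['attr2_1', 'sinc2_1', 'intel2_1', 'fun2_1', 'amb2_1']
print(encode_question(2, 1))  # 21
print(decode_question(21))    # (2, 1)
```

Because both the question number and the time are single digits, the code can be split back unambiguously with an integer division by 10.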

In [30]:
key = ['iid', 'wave', 'gender']
name_attributes = ['attr', 'sinc', 'intel', 'fun', 'amb'] # Names of attributes to deal with
num_times = 3
num_questions = 4

# Collect one block of rows per (time, question) pair, then concatenate once
frames = []

for time in range(1, num_times + 1):
    for question in range(1, num_questions + 1):

        # Source columns for this question and time, e.g. 'attr1_1'
        list_attributes = [f'{attr}{question}_{time}' for attr in name_attributes]

        # Preprocess data : compute the mean of each attribute by gender
        df_res = attributes_by_gender(df, list_attributes)

        # Preprocess data : overall row, i.e. the mean of the gender rows (its gender value is 0.5)
        df_res_tot = pd.DataFrame(df_res.mean(axis=0)).T

        # Add the two-digit code (question number, then time) to both dataframes
        code = int(f'{question}{time}')
        df_res['question'] = code
        df_res_tot['question'] = code

        # Strip the question/time suffix so that all blocks share the same column names
        columns = ['gender'] + name_attributes + ['question']
        df_res.columns = columns
        df_res_tot.columns = columns

        frames += [df_res, df_res_tot]

# Radar DataFrame
df_radar = pd.concat(frames, axis=0)
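
One point worth noting about the loop above: the overall row comes from `df_res.mean(axis=0)`, so it is the mean of the two gender rows, with each gender weighted equally regardless of how many candidates it contains. The sketch below uses hypothetical toy values to show how this can differ from a mean pooled over all candidates; which of the two the chart should display is a design choice that the notebook does not discuss.

```python
import pandas as pd

# Hypothetical toy data: three candidates of gender 0, one of gender 1
toy = pd.DataFrame({
    'gender': [0, 0, 0, 1],
    'attr':   [10.0, 20.0, 30.0, 40.0],
})

# Same construction as in the loop: mean by gender, then mean of those rows
by_gender = toy.groupby('gender').mean().reset_index()
total_of_means = pd.DataFrame(by_gender.mean(axis=0)).T

# Alternative: mean over all candidates at once
pooled_mean = toy['attr'].mean()

print(total_of_means)  # gender = 0.5, attr = 30.0
print(pooled_mean)     # 25.0
```

The `gender` value of 0.5 in the overall row is a side effect of averaging the gender codes 0 and 1; it acts as a marker for "all candidates".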

In [32]:
df_radar.head()

Unnamed: 0,gender,attr,sinc,intel,fun,amb,question
0,0.0,18.020372,18.22223,20.971004,17.299108,12.818476,11
1,1.0,27.008864,16.389707,19.41956,17.592051,8.823956,11
0,0.5,22.514618,17.305969,20.195282,17.44558,10.821216,11
0,0.0,35.600632,11.284535,12.478439,19.051636,9.114387,21
1,1.0,24.884526,15.108467,16.35427,18.044416,14.357482,21


### 3. Save data

In [33]:
preprocessed_path = '../webapp/preprocessed_data'

In [35]:
df_radar.to_csv(f'{preprocessed_path}/radar_all.csv')
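
Since `df_radar` is built with `pd.concat`, its index repeats (0, 1, 0, ...), and `to_csv` writes that index as an unnamed first column by default. The sketch below, on hypothetical toy rows written to an in-memory buffer, shows the difference with `index=False`. Whether the JavaScript side relies on that first column is not shown in this notebook, so treat dropping it as an option to check rather than a required change.

```python
import io

import pandas as pd

# Hypothetical toy rows; concatenating repeats the index (0, 1, 0, 1)
block = pd.DataFrame({'gender': [0.0, 1.0], 'attr': [18.0, 27.0], 'question': [11, 11]})
toy = pd.concat([block, block], axis=0)

# Default behaviour: the index is written as an unnamed first column
with_index = io.StringIO()
toy.to_csv(with_index)

# index=False: only the data columns are written
without_index = io.StringIO()
toy.to_csv(without_index, index=False)

header_with = with_index.getvalue().splitlines()[0]
header_without = without_index.getvalue().splitlines()[0]

print(header_with)     # ,gender,attr,question
print(header_without)  # gender,attr,question
```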