# Prepare Dataset

This notebook extracts the information stored in the Excel files in the `data/GAM/Result ResArtEmotion` folder. The result will be one JSON file with the users' personal data and another JSON file with each user's responses to the artworks shown in the survey.

In [1]:
import pandas as pd

In [2]:
# Read the users' personal data

path = '../../data/GAM/ResArtEmotion/Personal data.xlsx'

personal_data_it_df = pd.read_excel(path, sheet_name="IT")
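# The IT sheet spells this question with a lowercase "art"; rename it so the column lines up with the header used in the combined table
personal_data_it_df.rename(columns={'How would you define your relationship with art?': 'How would you define your relationship with Art?'}, inplace=True)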

personal_data_en_df = pd.read_excel(path, sheet_name="EN")
personal_data_es_df = pd.read_excel(path, sheet_name="ES")
personal_data_he_df = pd.read_excel(path, sheet_name="HE")
personal_data_fi_df = pd.read_excel(path, sheet_name="FI")

In [3]:
personal_data_df = pd.concat([personal_data_it_df, personal_data_en_df, personal_data_es_df, personal_data_he_df, personal_data_fi_df])
personal_data_df.head()

| | Gender | Age | How would you define your relationship with Art? | Do you like going to museums or art exhibitions? |
|---|---|---|---|---|
| 0 | Male | 53 | I am passionate about the art | I go occasionally to museums or art exhibitions |
| 1 | Female | 23 | I am a little interested in art | I go occasionally to museums or art exhibitions |
| 2 | Male | 55 | I am a little interested in art | I go occasionally to museums or art exhibitions |
| 3 | Male | 38 | I am a little interested in art | I rarely visit museums or art exhibitions |
| 4 | Female | 54 | I am passionate about the art | I go occasionally to museums or art exhibitions |
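Once the five sheets are concatenated, the language each row came from is no longer recorded. If that information is needed later, one option is the `keys` argument of `pd.concat`. The following is a minimal sketch on toy frames; the sample values and the `Language` and `Row` level names are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for two of the per-language sheets (hypothetical values).
it_df = pd.DataFrame({"Gender": ["Male", "Female"], "Age": [53, 23]})
en_df = pd.DataFrame({"Gender": ["Female"], "Age": [41]})

# `keys` adds an outer index level recording which frame each row came from.
combined = pd.concat(
    [it_df, en_df],
    keys=["IT", "EN"],
    names=["Language", "Row"],
)

# Turn the language label into a regular column and renumber the rows.
combined = combined.reset_index(level="Language").reset_index(drop=True)
print(combined)
```

Renumbering the rows also avoids the repeated index values (0, 1, 2, ... once per sheet) that a plain concatenation produces.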


In [4]:
new_columns = {
    'How would you define your relationship with Art?': 'Art Relationship',
    'Do you like going to museums or art exhibitions?': 'Visit museums'
}

personal_data_df.rename(columns=new_columns, inplace=True)

In [5]:
# Replace the answers with numeric values:

relationship_with_art = {
    'My job is related to the art world': 3,
    'I am passionate about the art': 2,
    'I am a little interested in art': 1,
    'I am not interested in art': 0
}

visit_museums = {
    'I like to visit museums frequently': 2,
    'I go occasionally to museums or art exhibitions': 1,
    'I rarely visit museums or art exhibitions': 0
}

In [6]:
personal_data_df.groupby(by='Art Relationship').count()

| Art Relationship | Gender | Age | Visit museums |
|---|---|---|---|
| I am a little interested in art | 39 | 39 | 39 |
| I am a little interested in art, I am not interested in art | 1 | 1 | 1 |
| I am not interested in art | 4 | 4 | 4 |
| I am passionate about the art | 41 | 39 | 40 |
| I am passionate about the art, I am a little interested in art | 1 | 1 | 1 |
| My job is related to the art world | 24 | 23 | 24 |
| My job is related to the art world, I am a little interested in art | 1 | 1 | 1 |
| My job is related to the art world, I am passionate about the art | 14 | 14 | 14 |
| My job is related to the art world, I am passionate about the art, I am passionate about the art | 1 | 1 | 1 |
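The grouped counts above treat every combination of answers as its own category, because some users selected several options and these were stored as one comma-separated string. To count each option on its own, the strings can be split and exploded first. This is a sketch on a small hypothetical sample, and it relies on the assumption that no single option contains `", "` itself, which holds for the four options listed in `relationship_with_art`.

```python
import pandas as pd

# Hypothetical sample of raw answers, including a multi-select and a blank.
answers = pd.Series([
    "I am passionate about the art",
    "My job is related to the art world, I am passionate about the art",
    "I am a little interested in art",
    None,  # unanswered question
])

option_counts = (
    answers.dropna()      # ignore unanswered rows
    .str.split(", ")      # one list of options per user
    .explode()            # one row per selected option
    .value_counts()       # how often each option was chosen
)
print(option_counts)
```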


In [7]:
def change_art_relationship(row):
    if isinstance(row['Art Relationship'], str):
        options = row['Art Relationship'].split(', ')
        return [relationship_with_art[o] for o in options]
    else:
        return []

def change_visit_museums(row):
    if isinstance(row['Visit museums'], str):
        return visit_museums[row['Visit museums']]
    else:
        return -1

In [8]:
personal_data_df['Art Relationship'] = personal_data_df.apply(change_art_relationship, axis=1)
personal_data_df['Visit museums'] = personal_data_df.apply(change_visit_museums, axis=1)
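As an illustration of what the two conversions produce, here is a column-wise sketch on a two-row toy frame (the rows are made up). It is not the notebook's code: it uses `Series.apply` and `Series.map` instead of row-wise `DataFrame.apply`, and `encode_relationship` is a hypothetical helper name. One behavioural difference to be aware of: `map` turns an unrecognised museum answer into `-1`, whereas `change_visit_museums` would raise a `KeyError`.

```python
import pandas as pd

relationship_with_art = {
    'My job is related to the art world': 3,
    'I am passionate about the art': 2,
    'I am a little interested in art': 1,
    'I am not interested in art': 0,
}
visit_museums = {
    'I like to visit museums frequently': 2,
    'I go occasionally to museums or art exhibitions': 1,
    'I rarely visit museums or art exhibitions': 0,
}

# Toy data: one multi-select answer and one row with both answers missing.
toy = pd.DataFrame({
    'Art Relationship': [
        'My job is related to the art world, I am passionate about the art',
        None,
    ],
    'Visit museums': ['I rarely visit museums or art exhibitions', None],
})

def encode_relationship(value):
    """Map a comma-separated answer to a list of codes; missing -> []."""
    if isinstance(value, str):
        return [relationship_with_art[o] for o in value.split(', ')]
    return []

toy['Art Relationship'] = toy['Art Relationship'].apply(encode_relationship)
# Answers missing from the dictionary become NaN, then -1.
toy['Visit museums'] = toy['Visit museums'].map(visit_museums).fillna(-1).astype(int)
print(toy)
```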

In [9]:
# Store this data in a CSV file
personal_data_df.to_csv('../../data/GAM/users.csv', index=False)
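Because `Art Relationship` now holds Python lists, `to_csv` writes each one as its string representation (for example `"[3, 2]"`), and a plain `pd.read_csv` returns strings rather than lists. A sketch of one way to recover the lists when loading, using an in-memory buffer and made-up rows in place of the real `users.csv`:

```python
import ast
import io

import pandas as pd

# Hypothetical rows shaped like the processed table.
users = pd.DataFrame({
    "Age": [53, 23],
    "Art Relationship": [[2], [3, 1]],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
users.to_csv(buffer, index=False)
buffer.seek(0)

# `converters` parses each cell of the column back into a Python list.
restored = pd.read_csv(
    buffer,
    converters={"Art Relationship": ast.literal_eval},
)
print(restored["Art Relationship"].tolist())
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for parsing these cells.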