**Description**: This notebook takes clustered datasets and creates melted versions of them for visualization. Each dataset starts in wide format, with the eight tone preferences in columns `sample_1` through `sample_8`, and is reshaped into long format so that every tone preference occupies its own row.

The notebook begins by loading the original dataset and extracting its tone columns, then loads each clustered dataset (PCA, t-SNE, UMAP) and attaches those columns to it. Melting converts a dataset from wide format, where each variable has its own column, to long format, where the variables are stored as key-value pairs, one pair per row. Here the key column (`variable`) is dropped after melting, so only the value is kept, in a column named `tone`. This tidy structure is particularly useful for plots built with libraries like Seaborn and Matplotlib.
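
As a minimal sketch of the wide-to-long reshaping described above, the following uses a made-up two-respondent frame. The `cluster` column name and the values are assumptions for illustration only; they are not taken from the actual datasets.

```python
import pandas as pd

# Hypothetical two-respondent frame; the sample_* names mirror the notebook's pattern.
wide = pd.DataFrame({
    "cluster": [0, 1],
    "sample_1": ["Persuasive", "Original"],
    "sample_2": ["Empathetic", "Persuasive"],
})

# Each sample_* column is stacked into rows:
# 'variable' holds the key (the source column), 'tone' holds the value.
long = pd.melt(
    wide,
    id_vars=["cluster"],
    value_vars=["sample_1", "sample_2"],
    value_name="tone",
)
print(long)
```

Two respondents with two sample columns yield four rows, with `cluster` repeated on each row.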

After melting, each respondent contributes one row per sample, eight in total, with the remaining columns repeated on every row. This makes it straightforward to examine how tones are distributed within each cluster and to build bar plots, line charts, and heatmaps directly from the long-format dataset.
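
One way to summarize tone distribution per cluster from long-format data is a cross-tabulation, sketched below. The data is invented and the `cluster` column name is an assumption; the real clustered files may label it differently.

```python
import pandas as pd

# Hypothetical long-format data; 'cluster' is an assumed column name.
long_df = pd.DataFrame({
    "cluster": [0, 0, 0, 1, 1, 1],
    "tone": ["Original", "Persuasive", "Original",
             "Empathetic", "Empathetic", "Original"],
})

# Rows are clusters, columns are tones, cells are counts of responses.
tone_counts = pd.crosstab(long_df["cluster"], long_df["tone"])
print(tone_counts)
```

A table like `tone_counts` could then be passed to a bar plot or heatmap.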

Overall, this notebook converts clustered datasets into a format that is easier to visualize, interpret, and present.

In [1]:
import pandas as pd

In [2]:
original = pd.read_excel('datasets/FoodInsecurity_Hispanic_Demographics_Tone_Preferences_Dataset.xlsx')

In [3]:
original.head(2)

Unnamed: 0,age,gender,ethnicity,race,education,marital_status,income,employment,language,disability,states,sample_1,sample_2,sample_3,sample_4,sample_5,sample_6,sample_7,sample_8
0,45-54,female,non hispanic,native american,High School,na,"$25,000 - $49,999",Employed Part time,both,i do not have a disability,indiana,Persuasive,Simplier,Empathetic,Persuasive,Original,Original,Persuasive,Original
1,18-24,male,hispanic,white,High School,single,"Less than $25,000",Employed Part time,english,i do not have a disability,illinois,Original,Simplier,Empathetic,Simplier,Simplier,Original,Original,Persuasive


In [4]:
original.columns

Index(['age', 'gender', 'ethnicity', 'race', 'education', 'marital_status',
       'income', 'employment', 'language', 'disability', 'states', 'sample_1',
       'sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7',
       'sample_8'],
      dtype='object')

In [5]:
tones = original[['sample_1','sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7','sample_8']]

In [6]:
tones.head(2)

Unnamed: 0,sample_1,sample_2,sample_3,sample_4,sample_5,sample_6,sample_7,sample_8
0,Persuasive,Simplier,Empathetic,Persuasive,Original,Original,Persuasive,Original
1,Original,Simplier,Empathetic,Simplier,Simplier,Original,Original,Persuasive


## PCA

In [12]:
df = pd.read_excel('datasets/pca-dem-clusters.xlsx')

In [13]:
df[['sample_1','sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7','sample_8']] = tones
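
The assignment above relies on the clustered file and the original file describing the same respondents. In recent pandas versions, assigning a DataFrame to a list of columns matches rows by index label, not by position, so a quick check of the two indexes can catch a silent mismatch. The sketch below uses invented frames (`clusters` and the toy `tones` are hypothetical stand-ins) to show the alignment.

```python
import pandas as pd

sample_cols = ["sample_1", "sample_2"]

# Hypothetical frames: same respondents, but 'clusters' is in a different row order.
tones = pd.DataFrame(
    {
        "sample_1": ["Original", "Persuasive", "Empathetic"],
        "sample_2": ["Persuasive", "Original", "Original"],
    },
    index=[0, 1, 2],
)
clusters = pd.DataFrame({"cluster": [2, 0, 1]}, index=[2, 0, 1])

# Guard: both frames must carry exactly the same index labels.
assert clusters.index.sort_values().equals(tones.index.sort_values())

# Rows are matched on index labels, so the tones for label 2
# land on the row labelled 2 in 'clusters', wherever that row sits.
clusters[sample_cols] = tones
print(clusters)
```

If the two files were not in the same row order and had only default integer indexes, this label-based matching would pair the wrong rows, so merging on a shared key would be the safer choice in that case.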

In [14]:
id_vars = df.columns.difference(['sample_1', 'sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7', 'sample_8'])

In [15]:
# Melt the sample columns into one 'tone' column, then drop the 'variable' column that names the source sample
pca_graphs = pd.melt(df, id_vars=id_vars, value_vars=['sample_1', 'sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7', 'sample_8'],value_name='tone')
pca_graphs.drop('variable',axis=1, inplace=True)

In [16]:
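# index=False keeps the row index out of the file, so it does not reappear as 'Unnamed: 0' on the next read
pca_graphs.to_excel('datasets/pca-dem-clusters.xlsx', index=False)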

## TSNE

In [17]:
df_tsne = pd.read_excel('datasets/tsne-dem-clusters.xlsx')

In [18]:
df_tsne[['sample_1','sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7','sample_8']] = tones

In [19]:
id_vars = df_tsne.columns.difference(['sample_1', 'sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7', 'sample_8'])

In [20]:
# Melt the sample columns into one 'tone' column, then drop the 'variable' column that names the source sample
tsne_graphs = pd.melt(df_tsne, id_vars=id_vars, value_vars=['sample_1', 'sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7', 'sample_8'],value_name='tone')
tsne_graphs.drop('variable',axis=1, inplace=True)

In [21]:
tsne_graphs.to_excel('datasets/tsne-dem-clusters.xlsx', index=False)

## UMAP

In [22]:
df_umap = pd.read_excel('datasets/umap-dem-clusters.xlsx')

In [23]:
df_umap[['sample_1','sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7','sample_8']] = tones

In [24]:
id_vars = df_umap.columns.difference(['sample_1', 'sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7', 'sample_8'])

In [25]:
# Melt the sample columns into one 'tone' column, then drop the 'variable' column that names the source sample
umap_graphs = pd.melt(df_umap, id_vars=id_vars, value_vars=['sample_1', 'sample_2', 'sample_3', 'sample_4', 'sample_5', 'sample_6', 'sample_7', 'sample_8'],value_name='tone')
umap_graphs.drop('variable',axis=1, inplace=True)

In [26]:
umap_graphs.to_excel('datasets/umap-dem-clusters.xlsx', index=False)
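
The PCA, TSNE, and UMAP sections repeat the same attach, melt, and drop steps. A possible refactoring is sketched below as a single helper. The function name `attach_and_melt` and the toy frames are hypothetical and not part of the notebook. One deliberate difference: the helper builds `id_vars` with a list comprehension, which preserves the original column order, whereas `Index.difference` returns the columns sorted.

```python
import pandas as pd


def attach_and_melt(clustered, tones, value_name="tone"):
    """Attach the tone columns to a clustered frame and melt them into one column."""
    sample_cols = list(tones.columns)
    combined = clustered.copy()  # leave the caller's frame untouched
    combined[sample_cols] = tones
    # Every column that is not a sample column is kept as an identifier.
    id_vars = [c for c in combined.columns if c not in sample_cols]
    melted = pd.melt(
        combined, id_vars=id_vars, value_vars=sample_cols, value_name=value_name
    )
    # Drop the key column, as the notebook does, keeping only the tone values.
    return melted.drop(columns="variable")


# Toy data standing in for a clustered file and the tone columns.
clustered = pd.DataFrame({"age": ["18-24", "45-54"], "cluster": [1, 0]})
tones = pd.DataFrame({
    "sample_1": ["Original", "Persuasive"],
    "sample_2": ["Empathetic", "Original"],
})

melted = attach_and_melt(clustered, tones)
print(melted)
```

With the real data, the same call could be applied to each of the three clustered frames before writing the result with `to_excel(..., index=False)`.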