The paper [Exploring Chart Choices for High Dimensional Projections](https://) discusses the need for alternative ways to visualize Multi-Dimensional Projections (MDP). This project demonstrates alternative chart types for high-dimensional projections through three use cases. Because the relationships between data instances can be complex, different design choices need to be explored to make these complexities visible. The goal of the project is not to propose new visualizations but to explore the design space of alternative chart types.

</br>

## Use Case 1: Class Separation
#### Class separation through visualization
In this notebook, we explore class separation with visualization techniques, a crucial step in understanding how the classes within a dataset relate to one another. Class separation is essential for assessing classification models because it shows how distinguishable the classes are based on their features.
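As a numeric companion to visual inspection, class separation in a 2D projection can be summarized with the silhouette coefficient, treating the class labels as cluster assignments. This is an illustrative addition, not a method from the paper: the sketch assumes scikit-learn is installed, uses synthetic points in place of a real projection, and `class_separation_score` is a hypothetical helper name. Because t-SNE preserves local rather than global distances, such a score is only a rough indicator for t-SNE layouts.

```python
import numpy as np
from sklearn.metrics import silhouette_score


def class_separation_score(embedding, labels):
    """Mean silhouette coefficient of a projection, using class labels as clusters.

    Values close to 1 indicate compact, well-separated classes;
    values near 0 indicate overlapping classes.
    """
    return silhouette_score(embedding, labels)


rng = np.random.default_rng(0)
# Synthetic stand-in for a 2D projection: one Gaussian blob per class
benign = rng.normal(loc=(-5.0, 0.0), scale=1.0, size=(100, 2))
malignant = rng.normal(loc=(5.0, 0.0), scale=1.0, size=(100, 2))
labels = np.array(["benign"] * 100 + ["malignant"] * 100)

separated = class_separation_score(np.vstack([benign, malignant]), labels)

# Same points, but with the second blob shifted on top of the first
shifted = malignant - np.array([10.0, 0.0])
overlapping = class_separation_score(np.vstack([benign, shifted]), labels)
print(f"separated: {separated:.2f}, overlapping: {overlapping:.2f}")
```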

##### Dataset Description

1. Title: Wisconsin Breast Cancer Database (January 8, 1991)

2. Sources:
   -- Dr. William H. Wolberg (physician)
      University of Wisconsin Hospitals
      Madison, Wisconsin
      USA
   -- Donor: Olvi Mangasarian (mangasarian@cs.wisc.edu)
      Received by David W. Aha (aha@cs.jhu.edu)
   -- Date: 15 July 1992



<a id='top'></a>
<div class="list-group" id="list-tab" role="tablist">
<h3 data-toggle="list"  role="tab" aria-controls="home"><p style="font-size : 30px"><font color="darkgrey">Content</font></p></h3>

1. [<font color="darkgrey"> Dataset</font>](#1)
    - 1.1 [<font color="darkgrey"> Overview</font>](#1.1)
    - 1.2 [<font color="darkgrey"> Preprocessing</font>](#1.2)
    - 1.3 [<font color="darkgrey"> Dimensionality Reduction</font>](#1.3)

2. [<font color="darkgrey">Visual Perspectives</font>](#2)
    - 2.1 [<font color="darkgrey"> One Dimension (1D)</font>](#2.1)
        - 2.1.1 [<font color="darkgrey"> 1D Strip Plot </font>](#2.1.1)
        - 2.1.2 [<font color="darkgrey"> Box Plot </font>](#2.1.2)
        - 2.1.3 [<font color="darkgrey"> Histogram </font>](#2.1.3)
        - 2.1.4 [<font color="darkgrey"> Violin Plot </font>](#2.1.4)
    - 2.2 [<font color="darkgrey"> Two Dimensions (2D)</font>](#2.2)
        - 2.2.1 [<font color="darkgrey"> Scatter Plot</font>](#2.2.1)
        - 2.2.2 [<font color="darkgrey"> Contour</font>](#2.2.2)
        - 2.2.3 [<font color="darkgrey"> Heatmap</font>](#2.2.3)
        - 2.2.4 [<font color="darkgrey"> Dendrogram </font>](#2.2.4)
        - 2.2.5 [<font color="darkgrey"> Parallel Coordinate Plot </font>](#2.2.5)

    

<font size="+3" color="grey"><b>1. Dataset </b></font><br><a id="1"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

In [1]:
# Import packages for the data overview
import pandas as pd

# Set up the package environment
import sys, os
# Add the parent directory to the import path so the local `Implementations` package can be imported
sys.path.append(os.path.abspath(os.path.join(os.getcwd(), '..')))
# Create a directory for exported images
output_dir = "images"
os.makedirs(output_dir, exist_ok=True)

# Path to the raw .data file (adjust to your local layout)
data_file_path = '../Data/Dataset_2/Breast_Cancer_Wisconsin_Original/breast-cancer-wisconsin.data'

# Column headers in the order given by the dataset's attribute information
# (Single Epithelial Cell Size is attribute 6, Bare Nuclei is attribute 7)
column_headers = [
    "Sample code number", "Clump Thickness", "Uniformity of Cell Size",
    "Uniformity of Cell Shape", "Marginal Adhesion", "Single Epithelial Cell Size",
    "Bare Nuclei", "Bland Chromatin", "Normal Nucleoli",
    "Mitoses", "Class"
]

# Read the .data file and set the headers
# (in the original UCI file, missing Bare Nuclei values are encoded as '?')
df = pd.read_csv(data_file_path, delimiter=',', header=None, names=column_headers)

<font size="+2" color="grey"><b>1.1. Overview </b></font><br><a id="1.1"></a>

In [2]:
df

|     | Sample code number | Clump Thickness | Uniformity of Cell Size | Uniformity of Cell Shape | Marginal Adhesion | Single Epithelial Cell Size | Bare Nuclei | Bland Chromatin | Normal Nucleoli | Mitoses | Class |
|-----|---|---|---|---|---|---|---|---|---|---|---|
| 0   | 1000025 | 5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | 2 |
| 1   | 1002945 | 5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | 2 |
| 2   | 1015425 | 3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | 2 |
| 3   | 1016277 | 6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | 2 |
| 4   | 1017023 | 4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | 2 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 694 | 776715 | 3 | 1 | 1 | 1 | 3 | 2 | 1 | 1 | 1 | 2 |
| 695 | 841769 | 2 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 |
| 696 | 888820 | 5 | 10 | 10 | 3 | 7 | 3 | 8 | 10 | 2 | 4 |
| 697 | 897471 | 4 | 8 | 6 | 4 | 3 | 4 | 10 | 6 | 1 | 4 |


<font size="+2" color="grey"><b>1.2. Preprocessing </b></font><br><a id="1.2"></a>

In [4]:
from Implementations.imputation import Preprocessor
# Initialize the Preprocessor class with the dataset
preprocessor = Preprocessor(df)
# Exempt the target variable from preprocessing
exempt_columns = ['Class']
processed_data = preprocessor.preprocess(
    strategy='mean', 
    remove_missing=False, 
    exempt_columns=exempt_columns
)
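`Preprocessor` lives in this project's `Implementations` package, so its internals are not shown here. As a rough illustration of what `strategy='mean'` combined with `exempt_columns` might do, the hypothetical helper `mean_impute` below first turns the `'?'` markers used in the original UCI file into missing values, then fills each non-exempt column's gaps with that column's mean. Both steps are assumptions, not a description of `Preprocessor`.

```python
import numpy as np
import pandas as pd


def mean_impute(df, exempt_columns=()):
    """Hypothetical mean imputation; not the project's Preprocessor implementation."""
    out = df.replace("?", np.nan)  # the raw UCI file marks missing values with '?'
    for col in out.columns:
        if col in exempt_columns:
            continue  # e.g. the target column is left untouched
        values = pd.to_numeric(out[col], errors="coerce")
        out[col] = values.fillna(values.mean())
    return out


raw = pd.DataFrame({
    "Bare Nuclei": ["1", "?", "3"],
    "Class": [2, 4, np.nan],
})
imputed = mean_impute(raw, exempt_columns=["Class"])
print(imputed)
```

Exempting the target keeps class labels from being averaged into values such as 3.0, which would be neither benign (2) nor malignant (4).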

<font size="+2" color="grey"><b>1.3. Dimensionality Reduction </b></font><br><a id="1.3"></a>

In [5]:
from Implementations.dimensionality_reduction import DimensionalityReduction
import pandas as pd
# Initialize the DimensionalityReduction class
dr = DimensionalityReduction(data=df, target_column='Class')
# Apply different dimensionality reduction techniques
pca_df = dr.apply_pca()
tsne_df = dr.apply_tsne()
umap_df = dr.apply_umap()
# Combine the reduced dimensions into one DataFrame
complete_dataset = pd.concat([pca_df, tsne_df, umap_df], axis=1)

# Keep the t-SNE components (uncomment lines to use PCA or UMAP instead) and the target
merged_datasets = complete_dataset[[
    # "PCA_Component_1", "PCA_Component_2",
    "t-SNE_Component_1", "t-SNE_Component_2",
    # "UMAP_Component_1", "UMAP_Component_2",
    "Class"
]]
# The concatenated frames can repeat column names (e.g. several "Class" columns); keep the first
# occurrence of each (replaces the deprecated groupby(..., axis=1).first()) and copy to avoid
# SettingWithCopyWarning on the assignment below
merged_datasets = merged_datasets.loc[:, ~merged_datasets.columns.duplicated()].copy()
# Map the numeric class codes to readable labels (2 = benign, 4 = malignant)
merged_datasets['Class2'] = merged_datasets['Class'].replace({2.0: 'benign', 4.0: 'malignant'})


  warn(f"n_jobs value {self.n_jobs} overridden to 1 by setting random_state. Use no seed for parallelism.")
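`DimensionalityReduction` is project code whose internals are not shown. Assuming it wraps standard implementations, a comparable PCA and t-SNE projection can be sketched with scikit-learn (UMAP needs the separate `umap-learn` package and is left out). The helper `project`, the standardization step, and `perplexity=10` are illustrative assumptions; the output column names mirror the ones selected above. In real data an identifier column such as `Sample code number` would usually be excluded from the features as well.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def project(data, target_column, random_state=42):
    """Hypothetical sketch: 2D PCA and t-SNE projections plus the target column."""
    features = data.drop(columns=[target_column])
    scaled = StandardScaler().fit_transform(features)  # put features on a common scale
    pca = PCA(n_components=2, random_state=random_state).fit_transform(scaled)
    # perplexity must be smaller than the number of samples
    tsne = TSNE(n_components=2, perplexity=10, random_state=random_state).fit_transform(scaled)
    return pd.DataFrame({
        "PCA_Component_1": pca[:, 0], "PCA_Component_2": pca[:, 1],
        "t-SNE_Component_1": tsne[:, 0], "t-SNE_Component_2": tsne[:, 1],
        target_column: data[target_column].to_numpy(),
    })


rng = np.random.default_rng(0)
# Synthetic table with nine 1-10 scored features, like the attributes above
toy = pd.DataFrame(rng.integers(1, 11, size=(60, 9)), columns=[f"f{i}" for i in range(9)])
toy["Class"] = rng.choice([2, 4], size=60)
reduced = project(toy, "Class")
print(reduced.head())
```

Returning a single frame with one `Class` column avoids the duplicate-column cleanup needed when several reduced frames are concatenated.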


<font size="+9" color="grey"><b> 2.1 1D</b></font><font size="+1" color="grey"><b> (Class Separation) </b></font><br><a id="2"></a><a id="2.1"></a>


<h2 style="font-size: 1.2em;">1D Types</h2>
<ul>
    <li>Strip Plot</li>
    <li>Box Plot</li>
    <li>Histogram</li>
    <li>Violin Plot</li>
</ul>
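Each 1D chart looks at one projected component at a time, split by class. The `attr_color_domain` values passed to `create_combined_chart` below (`"benign t-SNE 1"`, `"malignant t-SNE 1"`, ...) suggest the data is reshaped into a long format with one row per (point, component) pair. Assuming that is what happens internally, the reshape could look like the hypothetical `to_long_format` sketch below, built on `pandas.DataFrame.melt` with toy values.

```python
import pandas as pd


def to_long_format(df, components, names, label_column="Class2"):
    """Hypothetical reshape: one row per (point, component), with a combined legend label."""
    long_df = df.melt(
        id_vars=[label_column],
        value_vars=components,
        var_name="component",
        value_name="value",
    )
    # Map raw column names to short display names, e.g. "t-SNE_Component_1" -> "t-SNE 1"
    long_df["component"] = long_df["component"].map(dict(zip(components, names)))
    # Combined label such as "benign t-SNE 1", matching the attr_color_domain entries
    long_df["series"] = long_df[label_column] + " " + long_df["component"]
    return long_df


points = pd.DataFrame({
    "t-SNE_Component_1": [0.5, 3.0],
    "t-SNE_Component_2": [1.0, -2.0],
    "Class2": ["benign", "malignant"],
})
long_points = to_long_format(
    points, ["t-SNE_Component_1", "t-SNE_Component_2"], ["t-SNE 1", "t-SNE 2"]
)
print(long_points)
```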


In [6]:
from Implementations.visualization import create_combined_chart
scatter_plot_single, gaussian_jitter, box_with_jitter, histogram_shade, combined_chart = create_combined_chart(
    merged_datasets, "t-SNE_Component_1", "t-SNE_Component_2",
    main_attr_color_range=["#1f78b4", "#b2df8a"],
    main_color_range=["#66c2a5", "#fc8d62"],
    attr_color_range=["#66c2a5", "#fc8d62", "#66c2a5", "#fc8d62"],
    attribute_nomeclature=["t-SNE 1", "t-SNE 2"],
    attr_color_domain=[
        "benign t-SNE 1",
        "malignant t-SNE 1",
        "benign t-SNE 2",
        "malignant t-SNE 2",
    ],
    width_single=800,
    height_single=400, 
    jitter_size=50
)
# combined_chart
combined_chart.configure_title(
    fontSize=18  # Title font size
).configure_axis(
    labelFontSize=15,
    titleFontSize=18
).configure_legend(
    labelFontSize=15,
    titleFontSize=18,
    orient="bottom", title=None, padding=5, labelLimit=200, 
    # columns=3
)
# .save('../../images/case1_combined.png', format='png', scale_factor=2)


In [8]:
scatter_plot_single.configure_title(
    fontSize=18
).configure_axis(
    labelFontSize=15,
    titleFontSize=18
).configure_legend(
    labelFontSize=15,
    titleFontSize=18,
    orient="bottom", title=None, padding=5, labelLimit=200, 
    # columns=3
)
# .save('../../images/case1_scatter_plot.png', format='png', scale_factor=2)

<font size="+1" color="grey"><b> 2.1.1 Strip Plot </b></font><a id="2.1.1"></a>

In [9]:
gaussian_jitter.configure_title(
    fontSize=18
).configure_axis(
    labelFontSize=15,
    titleFontSize=18
).configure_legend(
    labelFontSize=15,
    titleFontSize=18,
    orient="bottom", title=None, padding=5, labelLimit=200, 
    # columns=3
)
# .save('../../images/case1_strip_plot.png', format='png', scale_factor=2)
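The name `gaussian_jitter` suggests the strip plot spreads points perpendicular to the value axis with normally distributed offsets, as in Altair's strip-plot gallery example, which computes `sqrt(-2*log(random()))*cos(2*PI*random())` (the Box-Muller transform). Whether `create_combined_chart` uses exactly this is an assumption. The NumPy sketch below reproduces the transform; the function name and `scale` parameter are hypothetical.

```python
import numpy as np


def gaussian_jitter_offsets(n, scale=1.0, seed=None):
    """Box-Muller transform: turn two uniform samples into one standard-normal offset.

    Mirrors sqrt(-2*log(u1)) * cos(2*pi*u2), the expression used for jittered strip plots.
    """
    rng = np.random.default_rng(seed)
    u1 = 1.0 - rng.random(n)  # shift [0, 1) to (0, 1] so log(u1) is finite
    u2 = rng.random(n)
    return scale * np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)


offsets = gaussian_jitter_offsets(100_000, seed=0)
print(round(float(offsets.mean()), 3), round(float(offsets.std()), 3))
```

Compared with uniform jitter, Gaussian offsets cluster points near the category's center line, so dense regions of a class still read as dense.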

<font size="+1" color="grey"><b> 2.1.2 Box Plot </b></font><a id="2.1.2"></a>

In [10]:
box_with_jitter.configure_title(
    fontSize=18
).configure_axis(
    labelFontSize=15,
    titleFontSize=18
).configure_legend(
    labelFontSize=15,
    titleFontSize=18,
    orient="bottom", title=None, padding=5, labelLimit=200, 
    # columns=3
)
# .save('../../images/case1_box_plot_with_strip_plot.png', format='png', scale_factor=2)
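A box plot summarizes each class's distribution on a component by its quartiles. The sketch below computes, per class, the median, the interquartile range (IQR), and whiskers at the most extreme points within 1.5 times the IQR of the box (the Tukey convention). Which whisker rule the project's `box_with_jitter` chart uses is not shown, so treat this as an assumption; `box_stats` is a hypothetical helper.

```python
import pandas as pd


def box_stats(df, value_column, group_column="Class2", k=1.5):
    """Per-group quartiles and Tukey-style whiskers (most extreme points within k * IQR)."""
    rows = {}
    for group, values in df.groupby(group_column)[value_column]:
        q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
        iqr = q3 - q1
        inside = values[(values >= q1 - k * iqr) & (values <= q3 + k * iqr)]
        rows[group] = {
            "q1": q1, "median": median, "q3": q3,
            "whisker_low": inside.min(), "whisker_high": inside.max(),
            "n_outliers": int(len(values) - len(inside)),  # points drawn beyond the whiskers
        }
    return pd.DataFrame(rows).T


toy = pd.DataFrame({
    "t-SNE_Component_1": [1, 2, 3, 4, 5, 40, -3, -2, -1, 0],
    "Class2": ["benign"] * 6 + ["malignant"] * 4,
})
stats = box_stats(toy, "t-SNE_Component_1")
print(stats)
```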

<font size="+1" color="grey"><b> 2.1.3 Histogram Plot</b></font><a id="2.1.3"></a>

In [11]:
histogram_shade.configure_title(
    fontSize=18
).configure_axis(
    labelFontSize=15,
    titleFontSize=18
).configure_legend(
    labelFontSize=15,
    titleFontSize=18,
    orient="bottom", title=None, padding=5, labelLimit=200, 
    # columns=3
)
# .save('../../images/case1_histogram_plot.png', format='png', scale_factor=2)
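For overlaid per-class histograms to be comparable, every class should be binned on the same edges; otherwise bar heights from different classes describe different intervals. A NumPy sketch, assuming a fixed number of bins over the combined range (the binning used by `histogram_shade` is not shown; `shared_histograms` is a hypothetical helper and the data are synthetic):

```python
import numpy as np


def shared_histograms(values_by_class, bins=20):
    """Bin every class on the same edges so the per-class counts line up bar for bar."""
    all_values = np.concatenate(list(values_by_class.values()))
    edges = np.histogram_bin_edges(all_values, bins=bins)
    counts = {
        label: np.histogram(values, bins=edges)[0]
        for label, values in values_by_class.items()
    }
    return edges, counts


rng = np.random.default_rng(1)
values_by_class = {
    "benign": rng.normal(-10, 4, size=400),
    "malignant": rng.normal(15, 6, size=200),
}
edges, counts = shared_histograms(values_by_class, bins=20)
for label, c in counts.items():
    print(label, c.sum())
```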

<font size="+1" color="grey"><b> 2.1.4 Violin Plot</b></font><br/><a id="2.1.4"></a>
***Violin Plot (1D): This chart is built with Plotly because we could not create it in Altair.***

In [12]:
from Implementations.visualization import create_plotly_violin_plots
violin_plot = create_plotly_violin_plots(merged_datasets, x="t-SNE_Component_1", y="t-SNE_Component_2")
violin_plot.update_layout(
    font=dict(size=18),
    title_font=dict(size=24),
    xaxis=dict(title_font=dict(size=20), tickfont=dict(size=16)),
    yaxis=dict(title_font=dict(size=20), tickfont=dict(size=16))

)
# .write_image("../../images/case1_violin_plot.png", width=800, height=400, scale=2)
display(violin_plot)
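A violin plot draws a kernel density estimate (KDE) of each class's values, mirrored around the category axis. The sketch below computes a Gaussian KDE by hand with the normal-reference bandwidth h = std * n^(-1/5) (Scott's factor in one dimension). Plotly's violin traces compute their own KDE, and its bandwidth may differ; `gaussian_kde_1d` is a hypothetical helper and the data are synthetic.

```python
import numpy as np


def gaussian_kde_1d(values, grid):
    """Gaussian kernel density estimate of `values`, evaluated at each point of `grid`."""
    values = np.asarray(values, dtype=float)
    n = values.size
    h = values.std(ddof=1) * n ** (-1 / 5)  # normal-reference (Scott) bandwidth
    z = (grid[:, None] - values[None, :]) / h  # shape (len(grid), n)
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * h)


rng = np.random.default_rng(2)
benign = rng.normal(-10, 4, size=300)
grid = np.linspace(-40, 20, 601)
density = gaussian_kde_1d(benign, grid)
dx = grid[1] - grid[0]
print(f"area under the curve: {density.sum() * dx:.3f}")
# A violin outline is this density drawn on both sides of the axis: +density and -density.
```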

<font size="+9" color="grey"><b>2.2 2D</b></font><font size="+1" color="grey"><b> (Class Separation) </b></font><br><a id="2.2"></a>

<h2 style="font-size: 1.2em;">2D and Higher-Dimensional Types</h2>
<ul>
    <li>Scatter Plot</li>
    <li>Density Contour Plot</li>
    <li>Heatmap Plot</li>
    <li>Dendrogram Plot</li>
    <li>Parallel Coordinate Plot</li>
</ul>

In [13]:
from Implementations.visualization import create_2Dinteractive_plots
scatter_widget, contour_widget, density_widget, parallel_widget, dendro_widget, grid_layout = create_2Dinteractive_plots(
    merged_datasets, 't-SNE_Component_1', 't-SNE_Component_2',
    target_numeric="Class", target="Class2"
)


<font size="+1" color="grey"><b> 2.2.1 Scatter Plot</b></font><br/><a id="2.2.1"></a>

In [14]:
scatter_plot_2d = scatter_widget.update_layout(
    # width=800,
    # height=400,
    xaxis=dict(
        range=[-24, 34],
        gridcolor='LightGray',
        showgrid=True,
        zeroline=True,         # Show 0 line
        zerolinecolor="gray",  # Set color for 0 line
        zerolinewidth=1,
        title_font=dict(size=20), 
        tickfont=dict(size=16)
    ),
    yaxis=dict(
        range=[-15, 15],
        gridcolor='LightGray',
        showgrid=True,
        zeroline=True,         # Show 0 line
        zerolinecolor="gray",  # Set color for 0 line
        zerolinewidth=1,
        title_font=dict(size=20), 
        tickfont=dict(size=16)  
    ),
    plot_bgcolor='rgba(0,0,0,0)',
    paper_bgcolor='rgba(0,0,0,0)',
    font=dict(size=18),
    title_font=dict(size=24),
)

# scatter_plot_2d.write_image("../../images/case1_2d_scatter_plot.png", width=800, height=400, scale=2)
display(scatter_plot_2d)

*[Interactive figure (FigureWidget): 2D scatter plot of t-SNE_Component_1 vs. t-SNE_Component_2, colored by Class2 (benign, malignant)]*

<font size="+1" color="grey"><b> 2.2.2 Contour Plot</b></font><br/><a id="2.2.2"></a>

In [15]:
contour_plot_2d = contour_widget.update_traces(line=dict(width=2)).update_layout(
    # width=800,
    # height=600,
    xaxis=dict(
        range=[-35, 40],
        gridcolor='grey',
        showgrid=True,
        zeroline=True,         # Show 0 line
        zerolinecolor="gray",  # Set color for 0 line
        zerolinewidth=1,        # Set width for 0 line
        title_font=dict(size=20), 
        tickfont=dict(size=16)
    ),
    yaxis=dict(
        range=[-20, 20],
        gridcolor='grey',
        showgrid=True,
        zeroline=True,         # Show 0 line
        zerolinecolor="gray",  # Set color for 0 line
        zerolinewidth=1,        # Set width for 0 line
        title_font=dict(size=20), 
        tickfont=dict(size=16)
    ),
    plot_bgcolor='rgba(0,0,0,0)',
    paper_bgcolor='rgba(0,0,0,0)',
    font=dict(size=18),
    title_font=dict(size=24)

)

# contour_plot_2d.write_image("../../images/case1_2d_contour_plot.png", width=800, height=400, scale=2)
display(contour_plot_2d)

*[Interactive figure (FigureWidget): density contour plot (`histogram2dcontour`) of t-SNE_Component_1 vs. t-SNE_Component_2, one set of contour lines per Class2 (benign, malignant)]*
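Plotly's `histogram2dcontour` trace bins the points on a 2D grid and draws contour lines over the bin counts. A NumPy sketch of that binning step for each class on one shared grid follows; the bin count, the synthetic data, the helper `per_class_counts_2d`, and the "shared bins" overlap count are illustrative assumptions, not part of the project code.

```python
import numpy as np


def per_class_counts_2d(points_by_class, bins=25):
    """Count each class's points on one shared 2D grid, the grid a contour trace is drawn from."""
    all_points = np.vstack(list(points_by_class.values()))
    x_edges = np.histogram_bin_edges(all_points[:, 0], bins=bins)
    y_edges = np.histogram_bin_edges(all_points[:, 1], bins=bins)
    counts = {
        label: np.histogram2d(p[:, 0], p[:, 1], bins=[x_edges, y_edges])[0]
        for label, p in points_by_class.items()
    }
    return x_edges, y_edges, counts


rng = np.random.default_rng(3)
points_by_class = {
    "benign": rng.normal((-10, 0), (5, 4), size=(300, 2)),
    "malignant": rng.normal((15, 2), (6, 5), size=(150, 2)),
}
x_edges, y_edges, counts = per_class_counts_2d(points_by_class)
# Bins occupied by both classes hint at where the two classes' contours overlap
shared_bins = int(np.sum((counts["benign"] > 0) & (counts["malignant"] > 0)))
print(counts["benign"].shape, shared_bins)
```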

<font size="+1" color="grey"><b> 2.2.3 Heatmap Plot</b></font><br/><a id="2.2.3"></a>

In [16]:

heatmap_plot_2d = density_widget.update_layout(
    # width=800,
    # height=600,
    font=dict(size=18),
    title_font=dict(size=24),
    xaxis=dict(title_font=dict(size=20), tickfont=dict(size=16)),
    yaxis=dict(title_font=dict(size=20), tickfont=dict(size=16))
)

# heatmap_plot_2d.write_image("../../images/case1_2d_heatmap_plot.png", width=800, height=400, scale=2)
display(heatmap_plot_2d)

*[Interactive figure (FigureWidget): 2D histogram heatmap of t-SNE_Component_1 vs. t-SNE_Component_2; color encodes the point count per bin on a white-to-black scale]*
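The heatmap view counts all points together, regardless of class, and colors each 2D bin by its count on a white-to-black scale. A NumPy sketch of the same computation, with an illustrative bin count, synthetic data, and a hypothetical `count_heatmap` helper:

```python
import numpy as np


def count_heatmap(x, y, bins=30):
    """2D histogram of all points, plus counts rescaled to [0, 1] for a white-to-black scale."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    intensity = counts / counts.max()  # 0 -> white (empty bin), 1 -> black (densest bin)
    return counts, intensity, x_edges, y_edges


rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-10, 5, 300), rng.normal(15, 6, 150)])
y = np.concatenate([rng.normal(0, 4, 300), rng.normal(2, 5, 150)])
counts, intensity, x_edges, y_edges = count_heatmap(x, y)
print(int(counts.sum()), intensity.max())
```

`np.histogram2d` puts x along the first axis, so pass `counts.T` when drawing it as a heatmap whose rows correspond to y.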

<font size="+1" color="grey"><b> 2.2.4 Dendrogram Plot</b></font><br/><a id="2.2.4"></a>

In [17]:
dendrogram_plot_2d = dendro_widget.update_layout(
    # width=800,
    # height=600,
    font=dict(size=18),
    title_font=dict(size=24),
    xaxis=dict(title_font=dict(size=20), tickfont=dict(size=16)),
    yaxis=dict(title_font=dict(size=20), tickfont=dict(size=16))
)

# dendrogram_plot_2d.write_image("../../images/case1_2d_treemap_plot.png", width=800, height=400, scale=2)
display(dendrogram_plot_2d)

*[Interactive figure (FigureWidget): dendrogram drawn as black line segments]*
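Plotly's `figure_factory.create_dendrogram` builds on SciPy's `scipy.cluster.hierarchy`, where each U-shaped link of the tree is one (`icoord`, `dcoord`) pair and leaves sit at x = 5, 15, 25, and so on. Which points and which linkage method the project clusters are not shown; the sketch below assumes Ward linkage on a handful of synthetic 2D points.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

rng = np.random.default_rng(5)
# Two small, well-separated groups of 2D points (synthetic stand-ins for projected samples)
points = np.vstack([rng.normal(-10, 1, size=(4, 2)), rng.normal(10, 1, size=(4, 2))])

# Each row of Z records one merge: [cluster_a, cluster_b, merge_distance, new_cluster_size]
Z = linkage(points, method="ward")

# no_plot=True returns the segment coordinates instead of drawing them;
# each (icoord[i], dcoord[i]) pair is one U-shaped link, like the line traces in the figure
tree = dendrogram(Z, no_plot=True)
print(len(tree["icoord"]), tree["ivl"])
```

With two well-separated groups, the last (tallest) merge joins them, which is what a clear class split looks like in a dendrogram.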

<font size="+1" color="grey"><b> 2.2.5 Parallel Coordinate Plot</b></font><br/><a id="2.2.5"></a>

In [18]:
parallel_plot_2d = (parallel_widget.update_layout(
    # width=800,
    # height=600,
    font=dict(size=18),
    title_font=dict(size=24),
    xaxis=dict(title_font=dict(size=20), tickfont=dict(size=16)),
    yaxis=dict(title_font=dict(size=20), tickfont=dict(size=16))
))


# parallel_plot_2d.write_image("../../images/case1_2d_parallel_plot.png", width=800, height=400, scale=2)
display(parallel_plot_2d)

*[Interactive figure (FigureWidget): parallel coordinate plot with axes t-SNE_Component_1 and t-SNE_Component_2, lines colored by Class]*
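In a parallel coordinate plot each dimension gets its own vertical axis spanning that dimension's range, and each point becomes a polyline across the axes; here the lines are colored by the numeric `Class` code. The sketch below performs the per-axis min-max scaling explicitly to show where each line crosses each axis. It illustrates the idea rather than Plotly's internal code, and `to_parallel_coordinates` is a hypothetical helper.

```python
import pandas as pd


def to_parallel_coordinates(df, dimensions):
    """Min-max scale each dimension to [0, 1]: the height of a polyline on that axis."""
    scaled = df[dimensions].astype(float)
    for col in dimensions:
        lo, hi = scaled[col].min(), scaled[col].max()
        span = hi - lo
        # A constant dimension has no range, so draw it at mid-height
        scaled[col] = (scaled[col] - lo) / span if span else 0.5
    return scaled


toy = pd.DataFrame({
    "t-SNE_Component_1": [-20.0, 0.0, 20.0],
    "t-SNE_Component_2": [5.0, -5.0, 15.0],
    "Class": [2, 2, 4],
})
lines = to_parallel_coordinates(toy, ["t-SNE_Component_1", "t-SNE_Component_2"])
print(lines.assign(Class=toy["Class"]))
```

With only two projected components the plot has two axes, so class separation shows up as whether benign and malignant lines occupy different heights on either axis.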