The paper ([The Visual Language of Multidimensional Data Projection: A Visualization Taxonomy and Informed Insights](https://)) discusses the need for alternative ways to think about visualizing multidimensional projections (MDPs). This project demonstrates such alternatives using three different use cases. Because the relationships between data instances are complex, different visualization techniques (encodings and interactions) need to be explored to make these complexities clear. The goal of the project is not to propose new visualizations but to explore the design space of alternative visualization techniques.


## Use case 2: t-distributed stochastic neighbour embedding (t-SNE)

In this use case, we concentrate on how the parameters of an MDP method affect the stability of its projection in 2D space. A stable projection is important for
- ensuring trust
- reproducing results
- removing uncertainty due to the MDP process
- preserving context and the mental map, since points that are neighbours in one view stay close to one another when different parameters are used.
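
One way to make the last point measurable is to check how many nearest neighbours each point keeps between two projections. The sketch below is a minimal, dependency-free illustration of that idea; the function names, the choice of `k`, and the toy coordinates are assumptions made for this example and are not part of the notebook.

```python
import math

def knn_indices(points, k):
    """Return, for every point, the set of indices of its k nearest neighbours."""
    neighbours = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with that point's index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append({j for _, j in dists[:k]})
    return neighbours

def neighbourhood_overlap(embedding_a, embedding_b, k=2):
    """Mean fraction of k nearest neighbours shared between two embeddings."""
    knn_a = knn_indices(embedding_a, k)
    knn_b = knn_indices(embedding_b, k)
    shared = [len(a & b) / k for a, b in zip(knn_a, knn_b)]
    return sum(shared) / len(shared)

# Toy 2D embeddings of the same five instances (hypothetical values).
emb_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
emb_rotated = [(-y, x) for x, y in emb_a]  # same layout, rotated by 90 degrees
emb_scrambled = [(0.0, 0.0), (10.0, 10.0), (5.0, -3.0), (1.0, 0.0), (-7.0, 2.0)]

overlap_rotated = neighbourhood_overlap(emb_a, emb_rotated)
overlap_scrambled = neighbourhood_overlap(emb_a, emb_scrambled)
```

A rotation leaves every neighbourhood intact, so `overlap_rotated` stays at 1.0, while the scrambled layout scores lower. The same comparison could be applied to two columns of t-SNE coordinates obtained with different perplexities.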

Here, we used the [Breast Cancer](https://www.semanticscholar.org/paper/Nuclear-feature-extraction-for-breast-tumor-Street-Wolberg/53f0fbb425bc14468eb3bf96b2e1d41ba8087f36) dataset from the [UCI machine learning](https://archive.ics.uci.edu/dataset/17/breast+cancer+wisconsin+diagnostic) repository.


<a id='top'></a>
<div class="list-group" id="list-tab" role="tablist">
<h3 data-toggle="list"  role="tab" aria-controls="home"><p style="font-size : 30px"><font color="darkgrey">Content</font></p></h3>

1. [<font color="darkgrey">One Dimension</font>](#1)   
    - 1.1 [<font color="darkgrey"> Dot plot</font>](#1.1)
2. [<font color="darkgrey"> Two Dimensions</font>](#2)
    - 2.1 [<font color="darkgrey">Scatterplot</font>](#2.1)
    - 2.2 [<font color="darkgrey"> Histogram Heatmap</font>](#2.2)
    - 2.3 [<font color="darkgrey"> 2D Density plot</font>](#2.3)
3. [<font color="darkgrey"> N Dimensions</font>](#3)
    - 3.1 [<font color="darkgrey"> Parallel Coordinates</font>](#3.1)    
4. [<font color="darkgrey">Multiple views</font>](#4)

In [1]:
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
import numpy as np
import pandas as pd
import altair as alt

import matplotlib.pyplot as plt
import seaborn as sns
from plotly.subplots import make_subplots
import plotly.graph_objects as go
import plotly.express as px


In [2]:
breast = pd.read_csv('https://raw.githubusercontent.com/pkmklong/Breast-Cancer-Wisconsin-Diagnostic-DataSet/master/data.csv')
breast.drop(['Unnamed: 32'], axis = 1, inplace = True)
breast.drop(['id'], axis = 1, inplace = True)
breast

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
0,M,17.99,10.38,122.80,1001.0,0.11840,0.27760,0.30010,0.14710,0.2419,...,25.380,17.33,184.60,2019.0,0.16220,0.66560,0.7119,0.2654,0.4601,0.11890
1,M,20.57,17.77,132.90,1326.0,0.08474,0.07864,0.08690,0.07017,0.1812,...,24.990,23.41,158.80,1956.0,0.12380,0.18660,0.2416,0.1860,0.2750,0.08902
2,M,19.69,21.25,130.00,1203.0,0.10960,0.15990,0.19740,0.12790,0.2069,...,23.570,25.53,152.50,1709.0,0.14440,0.42450,0.4504,0.2430,0.3613,0.08758
3,M,11.42,20.38,77.58,386.1,0.14250,0.28390,0.24140,0.10520,0.2597,...,14.910,26.50,98.87,567.7,0.20980,0.86630,0.6869,0.2575,0.6638,0.17300
4,M,20.29,14.34,135.10,1297.0,0.10030,0.13280,0.19800,0.10430,0.1809,...,22.540,16.67,152.20,1575.0,0.13740,0.20500,0.4000,0.1625,0.2364,0.07678
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
564,M,21.56,22.39,142.00,1479.0,0.11100,0.11590,0.24390,0.13890,0.1726,...,25.450,26.40,166.10,2027.0,0.14100,0.21130,0.4107,0.2216,0.2060,0.07115
565,M,20.13,28.25,131.20,1261.0,0.09780,0.10340,0.14400,0.09791,0.1752,...,23.690,38.25,155.00,1731.0,0.11660,0.19220,0.3215,0.1628,0.2572,0.06637
566,M,16.60,28.08,108.30,858.1,0.08455,0.10230,0.09251,0.05302,0.1590,...,18.980,34.12,126.70,1124.0,0.11390,0.30940,0.3403,0.1418,0.2218,0.07820
567,M,20.60,29.33,140.10,1265.0,0.11780,0.27700,0.35140,0.15200,0.2397,...,25.740,39.42,184.60,1821.0,0.16500,0.86810,0.9387,0.2650,0.4087,0.12400


In [3]:
# Check for null values
null_values = breast.isnull().values.any()
if null_values:
    print("There are some missing values in data")
else:
    print("There are no missing values in the dataset")

There are no missing values in the dataset


In [4]:
# Define the feature matrix X_breast and the target series y (y labels the projected points later on)
X_breast =  breast.drop(columns='diagnosis')
all_features = X_breast.columns
y = breast['diagnosis']

### t-SNE transformation

In [5]:
scaler = StandardScaler()
scaled_X = scaler.fit_transform(X_breast)
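
`StandardScaler` rescales every feature to zero mean and unit variance, so that large-valued features such as `area_mean` do not dominate the distances t-SNE works with. The plain-Python sketch below reproduces that arithmetic on a toy matrix; the function name `standardize_columns` and the four-row sample (the `radius_mean` and `area_mean` values of the first rows shown above) are illustrative assumptions, not part of the notebook.

```python
import math

def standardize_columns(rows):
    """Z-score each column: subtract the column mean, divide by the column std."""
    n_rows = len(rows)
    n_cols = len(rows[0])
    means = [sum(row[c] for row in rows) / n_rows for c in range(n_cols)]
    # Population standard deviation (divide by n), as StandardScaler does by default.
    # Assumption: no column is constant, otherwise this would divide by zero.
    stds = [
        math.sqrt(sum((row[c] - means[c]) ** 2 for row in rows) / n_rows)
        for c in range(n_cols)
    ]
    return [
        [(row[c] - means[c]) / stds[c] for c in range(n_cols)]
        for row in rows
    ]

# Two features on very different scales: radius_mean and area_mean.
toy = [[17.99, 1001.0], [20.57, 1326.0], [19.69, 1203.0], [11.42, 386.1]]
scaled = standardize_columns(toy)
```

After scaling, both columns have mean 0 and variance 1, so neither feature outweighs the other in a Euclidean distance.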

In [6]:
tsne_results = {}      # perplexity -> array of shape (n_samples, 2)
df = pd.DataFrame()    # long format: one row per point and perplexity
perplexities = [5, 30, 50, 100]
for perplexity in perplexities:
    tsne = TSNE(
            n_components=2,
            perplexity=perplexity,
            init="random",
            random_state=0,  # fixed seed for reproducibility
            n_iter=300       # newer scikit-learn versions call this argument max_iter
        )

    X = tsne.fit_transform(scaled_X)
    tsne_results[perplexity] = X

    # Collect the embedding together with its perplexity and the diagnosis label
    df_temp = pd.DataFrame(X, columns=['tsne 1', 'tsne 2'])
    df_temp['Perplexity'] = perplexity
    df_temp['diagnosis'] = y
    df = pd.concat([df, df_temp])
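
The perplexity varied in the loop above can be read as a smooth count of effective neighbours: for each point, t-SNE picks a Gaussian bandwidth so that the perplexity of its neighbour distribution, defined as 2 raised to the Shannon entropy in bits, matches the requested value. The sketch below only illustrates that definition; the squared distances and bandwidths are hypothetical, and it does not reproduce the bandwidth search scikit-learn performs internally.

```python
import math

def conditional_probabilities(sq_distances, sigma):
    """Gaussian similarities p(j|i) from squared distances to one point's neighbours."""
    weights = [math.exp(-d / (2.0 * sigma ** 2)) for d in sq_distances]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(probabilities):
    """Perplexity = 2 ** H(P), with the Shannon entropy H measured in bits."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2.0 ** entropy

# Hypothetical squared distances from one point to its six neighbours.
sq_distances = [1.0, 1.5, 2.0, 9.0, 16.0, 25.0]

narrow = perplexity(conditional_probabilities(sq_distances, sigma=0.5))
wide = perplexity(conditional_probabilities(sq_distances, sigma=5.0))
uniform = perplexity([1 / 6] * 6)
```

A narrow bandwidth concentrates the probability on the closest neighbours and gives a low perplexity; a wide bandwidth approaches the uniform case, whose perplexity equals the number of neighbours.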

In [7]:
df_tsne = breast.copy()
for perplexity, tsne_result in tsne_results.items():
    # Add a column to the dataframe with the current perplexity value
    df_tsne['tsne1_'+str(perplexity)] = tsne_result[:, 0]
    df_tsne['tsne2_'+str(perplexity)] = tsne_result[:, 1]

In [8]:
df_tsne.head()

Unnamed: 0,diagnosis,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,...,symmetry_worst,fractal_dimension_worst,tsne1_5,tsne2_5,tsne1_30,tsne2_30,tsne1_50,tsne2_50,tsne1_100,tsne2_100
0,M,17.99,10.38,122.8,1001.0,0.1184,0.2776,0.3001,0.1471,0.2419,...,0.4601,0.1189,14.607053,-9.053056,-12.83877,7.983828,-12.938224,4.403008,-10.35476,4.212507
1,M,20.57,17.77,132.9,1326.0,0.08474,0.07864,0.0869,0.07017,0.1812,...,0.275,0.08902,5.453405,-20.839455,-4.786514,10.559366,-9.369542,-2.75216,-5.206512,6.041427
2,M,19.69,21.25,130.0,1203.0,0.1096,0.1599,0.1974,0.1279,0.2069,...,0.3613,0.08758,12.331613,-10.087982,-10.63205,8.905945,-12.002336,1.495626,-8.293961,5.199122
3,M,11.42,20.38,77.58,386.1,0.1425,0.2839,0.2414,0.1052,0.2597,...,0.6638,0.173,19.300211,-2.625523,-15.023999,1.324092,-8.653552,7.927435,-9.326034,-0.063851
4,M,20.29,14.34,135.1,1297.0,0.1003,0.1328,0.198,0.1043,0.1809,...,0.2364,0.07678,-0.373876,-14.699286,-8.168822,7.791937,-10.871114,-1.533025,-5.932167,7.020203


In [10]:
tsne_para = df_tsne[['tsne1_5', 'tsne2_5', 'tsne1_30', 'tsne2_30', 'tsne1_50', 'tsne2_50', 'tsne1_100', 'tsne2_100', 'diagnosis']]


### Taxonomy

![FlowMDP](sankey_full.png)


<font size="+3" color="grey"><b>1. One Dimension </b></font><br><a id="1"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

<font size="+2" color="grey"><b>1.1 Dot plot  </b></font><br><a id="1.1"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

From the taxonomy, the dot plot has the following properties:
- Dimension: D = 1
- Data abstraction: one numeric variable
- Encoding: position (x-axis), points, size
- Interaction: tooltip
- Layout: Juxtaposition (vertical and horizontal concatenation)

In [14]:
base = alt.Chart(tsne_para).encode()

pca_scatter_bin_5 = base.mark_point().encode(
    x=alt.X('tsne1_5:Q', bin=True, title="tsne 1_5"),
    y=alt.Y('count():Q', scale=alt.Scale(domain=[0, 200])),
    size='count()',
    tooltip=['diagnosis', 'count()'],
    #opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
)

pca_scatter_bin2_5 = base.mark_point().encode(
    x=alt.X('tsne2_5:Q', bin=True, title="tsne 2_5"),
    y=alt.Y('count():Q', scale=alt.Scale(domain=[0, 200])),
    size='count()',
    tooltip=['diagnosis', 'count()'],
    #opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
)

pca_scatter_bin_30 = base.mark_point().encode(
    x=alt.X('tsne1_30:Q', bin=True, title="tsne 1_30"),
    y=alt.Y('count():Q', scale=alt.Scale(domain=[0, 200])),
    size='count()',
    tooltip=['diagnosis', 'count()'],
    #opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
)

pca_scatter_bin2_30 = base.mark_point().encode(
    x=alt.X('tsne2_30:Q', bin=True, title="tsne 2_30"),
    y=alt.Y('count():Q', scale=alt.Scale(domain=[0, 200])),
    size=alt.Size('count()', scale=alt.Scale(range=[100, 600])),
    tooltip=['diagnosis', 'count()'],
    #opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
)

pca_scatter_bin_50 = base.mark_point().encode(
    x=alt.X('tsne1_50:Q', bin=True, title="tsne 1_50"),
    y=alt.Y('count():Q', scale=alt.Scale(domain=[0, 200])),
    size='count()',
    tooltip=['diagnosis', 'count()'],
    #opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
)

pca_scatter_bin2_50 = base.mark_point().encode(
    x=alt.X('tsne2_50:Q', bin=True, title="tsne 2_50"),
    y=alt.Y('count():Q', scale=alt.Scale(domain=[0, 200])),
    size=alt.Size('count()', scale=alt.Scale(range=[100, 600])),
    tooltip=['diagnosis', 'count()'],
    #opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
)

pca_scatter_bin_100 = base.mark_point().encode(
    x=alt.X('tsne1_100:Q', bin=True, title="tsne 1_100"),
    y=alt.Y('count():Q', scale=alt.Scale(domain=[0, 200])),
    size='count()',
    tooltip=['diagnosis', 'count()'],
    #opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
)

pca_scatter_bin2_100 = base.mark_point().encode(
    x=alt.X('tsne2_100:Q', bin=True, title="tsne 2_100"),
    y=alt.Y('count():Q', scale=alt.Scale(domain=[0, 200])),
    size=alt.Size('count()', scale=alt.Scale(range=[100, 600])),
    tooltip=['diagnosis', 'count()'],
    #opacity=alt.condition(selection, alt.value(1), alt.value(0.2))
)

all_viz= (pca_scatter_bin_5 | pca_scatter_bin2_5) & (pca_scatter_bin_30 | pca_scatter_bin2_30) & (pca_scatter_bin_50 | pca_scatter_bin2_50) & (pca_scatter_bin_100 | pca_scatter_bin2_100)


all_viz.properties(
    title= 'tSNE projection for different perplexities using dot plot',
).configure_title(
    anchor= 'middle',
    fontSize=16  
)

The t-SNE values for each perplexity are binned along the x-axis, and the y-axis and the point size show the number of records in each bin. The tooltip provides more information about each dot.
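
The binning behind this dot plot can be sketched in a few lines. The example below assumes fixed-width bins, whereas Altair's `bin=True` picks "nice" bin boundaries automatically; the function name `bin_counts` and the bin width are assumptions made for illustration.

```python
import math
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter()
    for v in values:
        bin_start = math.floor(v / bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

# First five tsne1_5 values from the table above.
tsne1_5 = [14.607053, 5.453405, 12.331613, 19.300211, -0.373876]
counts = bin_counts(tsne1_5, bin_width=5)
```

Each key is the left edge of a bin and each value is the count that the dot plot maps to vertical position and size.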

### Further points

- Is this representation acceptable? It helps to make the data less cluttered and reduces occlusion. However, the binning compacts information away, and what is out of sight is out of mind.

- Interaction through the tooltip helps to recover some of the compacted information, since hovering over a point shows more details about the data.
- Would a color encoding channel help here?

<font size="+3" color="grey"><b>2. Two Dimensions </b></font><br><a id="2"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>


<font size="+2" color="grey"><b>2.1  2D Scatterplot  </b></font><br><a id="2.1"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

From the taxonomy, the scatterplot has the following properties:
- Dimension: D = 2
- Data abstraction: two numeric variables
- Encoding: position (x- and y-axis), points, color
- Interaction: selection and filter
- Layout: Juxtaposition (vertical and horizontal concatenation)

In [16]:

#scatter plot for all the parameters
brush = alt.selection(type='interval', resolve='global')

base = alt.Chart(df_tsne).mark_circle().encode(
    color=alt.condition(brush, 'diagnosis:N', alt.ColorValue('gray')),
).add_selection(
    brush
)

chart1 = base.encode(
    x=alt.X('tsne1_5:Q', scale=alt.Scale(domain=[-25, 20])),
    y=alt.Y('tsne2_5:Q', scale=alt.Scale(domain=[-25, 25]))
)

chart2 = base.encode(
    x=alt.X('tsne1_30:Q', scale=alt.Scale(domain=[-25, 20])),
    y=alt.Y('tsne2_30:Q', scale=alt.Scale(domain=[-25, 25]))
)

chart3 = base.encode(
    x=alt.X('tsne1_50:Q', scale=alt.Scale(domain=[-25, 20])),
    y=alt.Y('tsne2_50:Q', scale=alt.Scale(domain=[-25, 25]))
)

chart4 = base.encode(
    x=alt.X('tsne1_100:Q', scale=alt.Scale(domain=[-25, 20])),
    y=alt.Y('tsne2_100:Q', scale=alt.Scale(domain=[-25, 25]))
)

all_viz_scatter= (chart1 | chart2) & (chart3 | chart4) 


all_viz_scatter.properties(
    title= 'tSNE projection for different perplexities using scatterplot',
).configure_title(
    anchor= 'middle',
    fontSize=16  
)

Besides the scatterplot, other charts may be considered for encoding 2D numeric data, such as the 2D density plot and the 2D histogram. These plots can handle overplotting, that is, situations in which points overlap one another and hide patterns (occlusion), so that information is lost because of the way the data is rendered.


<font size="+2" color="grey"><b>2.2 2D Histogram Heatmap  </b></font><br><a id="2.2"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>


From the taxonomy, the 2D histogram heatmap has the following properties:
- Dimension: D = 2
- Data abstraction: two numeric variables
- Encoding: position (x- and y-axis), shape, color
- Interaction: select and filter, tooltip
- Layout: Juxtaposition (vertical and horizontal concatenation)

In [18]:
#histogram heatmap for all the parameters
selection = alt.selection_multi(fields=['diagnosis'], bind='legend')

points = alt.Chart(tsne_para).mark_rect().encode(
    color = alt.condition(selection, 'diagnosis:O', alt.ColorValue('gray')),
    tooltip = 'diagnosis'
).add_selection(selection)

chart1 = points.encode(
    x=alt.X('tsne1_5:Q', bin=alt.Bin(maxbins=25), title="tsne 1_5"),
    y=alt.Y('tsne2_5:Q', bin=True, title="tsne 2_5")
)

chart2 = points.encode(
    x=alt.X('tsne1_30:Q', bin=alt.Bin(maxbins=25), title="tsne 1_30"),
    y=alt.Y('tsne2_30:Q', bin=True, title="tsne 2_30")
)

chart3 = points.encode(
    x=alt.X('tsne1_50:Q',bin=alt.Bin(maxbins=25), title="tsne 1_50"),
    y=alt.Y('tsne2_50:Q',bin=True, title="tsne 2_50")
)

chart4 = points.encode(
    x=alt.X('tsne1_100:Q', bin=alt.Bin(maxbins=25), title="tsne 1_100"),
    y=alt.Y('tsne2_100:Q', bin=True, title="tsne 2_100")
)

all_viz_hist= (chart1 | chart2) & (chart3 | chart4) 

all_viz_hist.properties(
    title= 'tSNE projection for different perplexities using 2D histogram heatmap',
).configure_title(
    anchor= 'middle',
    fontSize=16  
)

The chart above represents the t-SNE values for the different perplexities as 2D histogram heatmaps, with tsne1 and tsne2 plotted on the x- and y-axes respectively. Selecting a diagnosis in the legend activates the select-and-filter interaction, and its effect is visible across all the charts.

### Further points

- Some overlapping still occurs: patterns are likely to be hidden when a dataset contains so many points that they occlude one another, as in this case. Nevertheless, interaction may come in handy here.
- Additionally, this kind of representation reveals certain patterns in the dataset, making it easy to identify clusters (labels) and the regions where a particular cluster is dominant.
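
The idea of a region where one cluster is dominant can be made concrete by binning the projected points into square cells and reporting the most frequent label per cell. This is a minimal sketch under stated assumptions: the function name, the cell size, and the toy points are hypothetical and only mimic what the heatmap aggregates visually.

```python
import math
from collections import Counter, defaultdict

def dominant_label_per_cell(points, labels, cell_size):
    """Bin 2D points into square cells; return (dominant label, point count) per cell."""
    cells = defaultdict(Counter)
    for (x, y), label in zip(points, labels):
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[cell][label] += 1
    return {
        cell: (counter.most_common(1)[0][0], sum(counter.values()))
        for cell, counter in cells.items()
    }

# Hypothetical projected points with their diagnosis labels.
points = [(1.0, 1.5), (2.0, 3.0), (3.5, 0.5), (12.0, 11.0), (13.0, 14.0), (4.0, 4.0)]
labels = ["M", "M", "B", "B", "B", "M"]
summary = dominant_label_per_cell(points, labels, cell_size=5)
```

Cells holding more than one point are exactly the places where a scatterplot would suffer from overplotting, and the per-cell counts show how much is hidden there.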



<font size="+2" color="grey"><b>2.3 2D density plot  </b></font><br><a id="2.3"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>


From the taxonomy, the 2D density plot has the following properties:
- Dimension: D = 2
- Data abstraction: two numeric variables
- Encoding: position (x- and y-axis), size, color
- Interaction: selection and filter, tooltip
- Layout: Juxtaposition (horizontal and vertical concatenation)

In [23]:

fig = make_subplots(
    rows=2, cols=2,
)

fig.add_trace(go.Histogram2dContour(
    x=tsne_para.iloc[:,0], 
    y=tsne_para.iloc[:,1], 
    colorscale='Viridis',
    zmin=0,
    zmax=20
), row=1, col=1)

fig.add_trace(go.Histogram2dContour( 
    x=tsne_para.iloc[:,2], 
    y=tsne_para.iloc[:,3],
    colorscale='Viridis',
    zmin=0,
    zmax=20
), row=1, col=2)

fig.add_trace(go.Histogram2dContour(
   x=tsne_para.iloc[:,4],  
   y=tsne_para.iloc[:,5],
   colorscale='Viridis',
   zmin=0,
   zmax=20
), row=2, col=1)

fig.add_trace(go.Histogram2dContour(
    x=tsne_para.iloc[:,6], 
    y=tsne_para.iloc[:,7],
    colorscale='Viridis',
    zmin=0,
    zmax=20
), row=2, col=2)

fig.update_xaxes(title_text='tsne1_5', row=1, col=1, range=[-25, 20])
fig.update_yaxes(title_text='tsne2_5', row=1, col=1, range=[-20, 20])
fig.update_xaxes(title_text='tsne1_30', row=1, col=2, range=[-25, 20])
fig.update_yaxes(title_text='tsne2_30', row=1, col=2, range=[-20, 20])
fig.update_xaxes(title_text='tsne1_50', row=2, col=1, range=[-25, 20])
fig.update_yaxes(title_text='tsne2_50', row=2, col=1, range=[-20, 20])
fig.update_xaxes(title_text='tsne1_100', row=2, col=2, range=[-25, 20])
fig.update_yaxes(title_text='tsne2_100', row=2, col=2, range=[-20, 20])

fig.update_layout(height=500, width=800, hovermode='closest', title={
        'text': 'tSNE Projection using 2D density plot',
        'y': 0.95,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top'
    }, 
    font=dict(family="Helvetica Neue, Arial", size=12), )

fig.show()


In [36]:
df

Unnamed: 0,tsne 1,tsne 2,Perplexity,scripts
0,58.547855,-1.767002,5,BUD
1,-0.042514,5.258186,5,BUD
2,8.316894,12.551376,5,BUD
3,34.925705,-24.832233,5,BUD
4,-14.084832,-19.311710,5,BUD
...,...,...,...,...
585,-18.267729,-18.668276,100,BOW
586,-36.430855,-0.544510,100,BOW
587,5.219565,-34.929501,100,BOW
588,-41.226734,2.116818,100,BOW


In [25]:
# Create a faceted density plot with Plotly
fig = px.density_contour(df, x='tsne 1', y='tsne 2', facet_col='Perplexity', color='diagnosis')

#fig.update_xaxes(range=[-70, 70])
#fig.update_yaxes(range=[-100, 100])


fig.update_layout(
    #height=500, width=1000, 
    hovermode='closest', 
    title={
        'text': 'tSNE Projection using Contour plot',
        'y': 0.98,
        'x': 0.5,
        'xanchor': 'center',
        'yanchor': 'top'
    }, 
    font=dict(family="Helvetica Neue, Arial", size=20)
)

fig.show()

### Further points

- Here, we represented the t-SNE values as a 2D density plot. This kind of plot readily shows whether clusters exist in a dataset, which is not the case here.






<font size="+3" color="grey"><b>3. N Dimensions </b></font><br><a id="3"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

<font size="+2" color="grey"><b>3.1 Parallel coordinates  </b></font><br><a id="3.1"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

From the taxonomy, the parallel coordinates plot has the following properties:
- Dimension: D > 2
- Data abstraction: n numeric variables, one categorical
- Encoding: position (y-axis), lines, color
- Interaction: axis reordering, highlighting, selecting
- Layout: None

In [29]:
tsne_PCP = tsne_para.copy()
tsne_PCP['diagnosis']= tsne_PCP["diagnosis"].map(lambda row: 1 if row=='M' else 0)
tsne_PCP

Unnamed: 0,tsne1_5,tsne2_5,tsne1_30,tsne2_30,tsne1_50,tsne2_50,tsne1_100,tsne2_100,diagnosis
0,14.607053,-9.053056,-12.838770,7.983828,-12.938224,4.403008,-10.354760,4.212507,1
1,5.453405,-20.839455,-4.786514,10.559366,-9.369542,-2.752160,-5.206512,6.041427,1
2,12.331613,-10.087982,-10.632050,8.905945,-12.002336,1.495626,-8.293961,5.199122,1
3,19.300211,-2.625523,-15.023999,1.324092,-8.653552,7.927435,-9.326034,-0.063851,1
4,-0.373876,-14.699286,-8.168822,7.791937,-10.871114,-1.533025,-5.932167,7.020203,1
...,...,...,...,...,...,...,...,...,...
564,13.914875,-17.890726,-10.511149,11.993457,-14.470928,0.007433,-8.637575,7.351347,1
565,3.411049,-15.240687,-7.428903,13.261080,-11.855868,-2.907466,-6.920950,7.357049,1
566,-0.456518,-10.870454,-1.151504,7.537772,-5.464025,-3.358994,-2.956374,-2.144295,1
567,14.755898,-11.000519,-14.158130,9.720449,-13.310723,3.482843,-10.022770,3.606786,1


In [31]:
# Create a parallel coordinates plot with Plotly
fig = px.parallel_coordinates(tsne_PCP, color='diagnosis', color_continuous_scale=px.colors.diverging.Tealrose,)


fig.update_layout(height=500, hovermode='closest',   title={
        'text': 't-SNE Projection using Parallel Coordinates',
        'y': 0.95,
        'x': 0.55,
        'xanchor': 'center',
        'yanchor': 'bottom'
    },
        font=dict(family="Helvetica Neue, Arial", size=10), )
fig.show()


### Further points

- Here, the projection values for the different perplexities are displayed as a parallel coordinates plot. This lets us compare the individual clusters (i.e., diagnoses) across the different perplexities.

- With axis reordering, the number of line crossings between adjacent axes can be minimized, which decreases the clutter in the plot.

- Did highlighting and axis reordering make it easier to identify and compare similarities? Does it make sense to compare different parameters of projected data this way?
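
The effect of axis reordering on clutter can be quantified by counting line crossings: two records cross between a pair of adjacent axes when their order on one axis is reversed on the other. The sketch below counts crossings and brute-forces the best axis order; the column names and values are hypothetical, and the exhaustive search is an assumption that only remains practical for a handful of axes, such as the eight t-SNE columns used here.

```python
from itertools import combinations, permutations

def crossings_between(axis_a, axis_b):
    """Number of record pairs whose lines cross between two adjacent axes."""
    return sum(
        1
        for i, j in combinations(range(len(axis_a)), 2)
        if (axis_a[i] - axis_a[j]) * (axis_b[i] - axis_b[j]) < 0
    )

def total_crossings(columns, order):
    """Sum the crossings over every pair of neighbouring axes in the given order."""
    return sum(
        crossings_between(columns[left], columns[right])
        for left, right in zip(order, order[1:])
    )

# Hypothetical values for three axes and four records.
columns = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [4.0, 3.0, 2.0, 1.0],
    "c": [1.1, 2.2, 2.9, 4.5],
}
original_order = ("a", "b", "c")
best_order = min(permutations(columns), key=lambda order: total_crossings(columns, order))
```

Placing the two axes that rank the records identically next to each other halves the number of crossings in this toy example, which is the kind of improvement interactive axis reordering aims for.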


<font size="+3" color="grey"><b>4. Multiple Views </b></font><br><a id="4"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>


- Dimension: D > 2
- Data abstraction: n numeric, one categorical
- Encoding: position (x- and y-axis), area, color, points
- Interaction: tooltip, brushing and linking
- Layout: Juxtaposition (horizontal and vertical concatenation)

In [33]:
#for when perplexity is 5
brush = alt.selection_interval()
color = alt.Color('count()', scale=alt.Scale(scheme='greenblue'))

base_heatmap = alt.Chart(tsne_para).mark_rect().encode(
    color=color,
    tooltip= ['diagnosis:N', 'count():Q']
).add_selection(
    brush
)

chart_heatmap = base_heatmap.encode(
    x = alt.X('tsne1_5:Q',bin=alt.Bin(maxbins=20), title="tsne1_5"),
    y = alt.Y('tsne2_5:Q',bin=alt.Bin(maxbins=20), title="tsne2_5"),
)


base_scatter = alt.Chart(tsne_para).mark_circle().encode(
    color= alt.Color('diagnosis:N', legend=None),
    tooltip= ['diagnosis:N']
).transform_filter(
    brush
)

chart2 = base_scatter.encode(
    x=alt.X('tsne1_30:Q',),
    y=alt.Y('tsne2_30:Q',),
    color= alt.Color('diagnosis:N',),
)

chart3 = base_scatter.encode(
    x=alt.X('tsne1_50:Q',),
    y=alt.Y('tsne2_50:Q',)
)

chart4 = base_scatter.encode(
    x=alt.X('tsne1_100:Q',),
    y=alt.Y('tsne2_100:Q',)
)


all_viz_5= (chart_heatmap | chart2) & (chart3 | chart4) 
#tsne_scatter= alt.hconcat(chart2, chart3, chart4)

all_viz_5.properties(
    title= 'Decomposition of tSNE into multiple views for perplexity = 5'
).configure_title(
    anchor= 'middle',
    fontSize=16  
)



In [38]:
#for when perplexity is 30
brush = alt.selection_interval()
color = alt.Color('count()', scale=alt.Scale(scheme='greenblue'))

base_heatmap = alt.Chart(tsne_para).mark_rect().encode(
    color=color,
    #alt.Color('type:N', scale=alt.Scale(scheme='greenblue')),
    tooltip= ['diagnosis:N', 'count():Q']
).add_selection(
    brush
)

chart_heatmap = base_heatmap.encode(
    x = alt.X('tsne1_30:Q',bin=alt.Bin(maxbins=20), title="tsne1_30"),
    y = alt.Y('tsne2_30:Q',bin=alt.Bin(maxbins=20), title="tsne2_30"),
)

base_scatter = alt.Chart(tsne_para).mark_circle().encode(
    color= alt.Color('diagnosis:N', legend=None),
    tooltip= ['diagnosis:N']
).transform_filter(
    brush
)

chart1 = base_scatter.encode(
    x=alt.X('tsne1_5:Q', scale=alt.Scale(domain=[-25, 20])),
    y=alt.Y('tsne2_5:Q', scale=alt.Scale(domain=[-25, 25])),
    color= alt.Color('diagnosis:N',)
)

chart3 = base_scatter.encode(
    x=alt.X('tsne1_50:Q', scale=alt.Scale(domain=[-20, 20])),
    y=alt.Y('tsne2_50:Q', scale=alt.Scale(domain=[-20, 20]))
)

chart4 = base_scatter.encode(
    x=alt.X('tsne1_100:Q', scale=alt.Scale(domain=[-20, 20])),
    y=alt.Y('tsne2_100:Q', scale=alt.Scale(domain=[-20, 20]))
)


all_viz_30= (chart_heatmap | chart1) & (chart3 | chart4) 


all_viz_30.properties(
    title= 'Decomposition of tSNE into multiple views for perplexity = 30'
).configure_title(
    anchor= 'middle',
    fontSize=16  
)

In [39]:
#for when perplexity is 50
brush = alt.selection_interval()
color = alt.Color('count()', scale=alt.Scale(scheme='greenblue'))

base_heatmap = alt.Chart(tsne_para).mark_rect().encode(
    color=color,
    tooltip= ['diagnosis:N', 'count():Q']
).add_selection(
    brush
)

chart_heatmap = base_heatmap.encode(
    x = alt.X('tsne1_50:Q',bin=alt.Bin(maxbins=20), title="tsne1_50"),
    y = alt.Y('tsne2_50:Q',bin=alt.Bin(maxbins=20), title="tsne2_50"),
)

base_scatter = alt.Chart(tsne_para).mark_circle().encode(
    color= alt.Color('diagnosis:N', legend=None),
    tooltip= ['diagnosis:N']
).transform_filter(
    brush
)

chart1 = base_scatter.encode(
    x=alt.X('tsne1_5:Q', scale=alt.Scale(domain=[-25, 20])),
    y=alt.Y('tsne2_5:Q', scale=alt.Scale(domain=[-25, 25])),
    color= alt.Color('diagnosis:N',)
)

chart2 = base_scatter.encode(
    x=alt.X('tsne1_30:Q', scale=alt.Scale(domain=[-20, 20])),
    y=alt.Y('tsne2_30:Q', scale=alt.Scale(domain=[-20, 20]))
)


chart4 = base_scatter.encode(
    x=alt.X('tsne1_100:Q', scale=alt.Scale(domain=[-20, 20])),
    y=alt.Y('tsne2_100:Q', scale=alt.Scale(domain=[-20, 20]))
)


all_viz_50= (chart_heatmap | chart1) & (chart2 | chart4) 


all_viz_50.properties(
    title= 'Decomposition of tSNE into multiple views for perplexity = 50'
).configure_title(
    anchor= 'middle',
    fontSize=16  
)

In [40]:
#for when perplexity is 100
brush = alt.selection_interval()


color = alt.Color('count()', scale=alt.Scale(scheme='greenblue'))

base_heatmap = alt.Chart(tsne_para).mark_rect().encode(
    color=color,
    tooltip= ['diagnosis:N', 'count():Q']
).add_selection(
    brush
)

chart_heatmap = base_heatmap.encode(
    alt.X('tsne1_100:Q',bin=alt.Bin(maxbins=20), title="tsne1_100"),
    alt.Y('tsne2_100:Q',bin=alt.Bin(maxbins=20), title="tsne2_100"),
)

base_scatter = alt.Chart(tsne_para).mark_circle().encode(
    color= alt.Color('diagnosis:N', legend=None),
    tooltip= ['diagnosis:N']
).transform_filter(
    brush
)

chart1 = base_scatter.encode(
    x=alt.X('tsne1_5:Q', scale=alt.Scale(domain=[-25, 20])),
    y=alt.Y('tsne2_5:Q', scale=alt.Scale(domain=[-25, 25])),
    color= alt.Color('diagnosis:N',)
)

chart2 = base_scatter.encode(
    x=alt.X('tsne1_30:Q', scale=alt.Scale(domain=[-20, 20])),
    y=alt.Y('tsne2_30:Q', scale=alt.Scale(domain=[-20, 20]))
)

chart3 = base_scatter.encode(
    x=alt.X('tsne1_50:Q', scale=alt.Scale(domain=[-20, 20])),
    y=alt.Y('tsne2_50:Q', scale=alt.Scale(domain=[-20, 20]))
)



all_viz_100= (chart_heatmap | chart1) & (chart2 | chart3) 


all_viz_100.properties(
    title= 'Decomposition of tSNE into multiple views for perplexity = 100'
).configure_title(
    anchor= 'middle',
    fontSize=16  
)

### Further points

- Composition operators (e.g., juxtaposition) can be used to construct a variety of multi-view visualizations.
- For MDPs, there is a demand for layouts that properly explore the user's graphical perception abilities while being enriched with interactive idioms and techniques.
- Linking and brushing allows for comparison across multiple views. Here, for each perplexity, the 2D histogram of its projection serves as an overview, while brushing and linking connects it to the other plots, in this case scatterplots.
- This kind of interaction reduces cluttered points and the occlusion of points.
- Although these representations show alternatives, it is very difficult to compare different perplexities with the current layout.
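
The mechanics of brushing and linking can be reduced to two steps: a rectangular brush in one view yields a set of row indices, and those indices filter every other view of the same records. The sketch below illustrates this on the first five rows of the table above; the helper names and the brush extent are assumptions, and Altair's interval selection on a binned heatmap may resolve the brushed region differently.

```python
def brush(points, x_range, y_range):
    """Return the row indices whose point lies inside the rectangular brush."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [
        i for i, (x, y) in enumerate(points)
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]

def linked_view(points, selected):
    """Keep only the brushed rows in another projection of the same records."""
    return [points[i] for i in selected]

# First five rows: coordinates for perplexity 5 and perplexity 30.
proj_5 = [(14.607053, -9.053056), (5.453405, -20.839455), (12.331613, -10.087982),
          (19.300211, -2.625523), (-0.373876, -14.699286)]
proj_30 = [(-12.83877, 7.983828), (-4.786514, 10.559366), (-10.63205, 8.905945),
           (-15.023999, 1.324092), (-8.168822, 7.791937)]

selected = brush(proj_5, x_range=(10, 20), y_range=(-12, 0))
highlighted = linked_view(proj_30, selected)
```

Because the link is carried by row indices rather than coordinates, the same selection can be followed through any number of projections.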


## Citation

If you found the examples in this notebook useful and you have used these alternatives in your research, please cite...