The paper ([The Visual Language of Multidimensional Data Projection: A Visualization Taxonomy and Informed Insights](https://)) discusses the need for alternative ways to think about visualizing multidimensional projections (MDPs). This project illustrates such alternatives with three use cases. Because data instances can be related in complex ways, different visualization techniques (encodings and interactions) are needed to make those relationships clear. The goal of the project is not to propose new visualizations but to explore the design space of alternative visualization techniques.


## Use case 3: t-SNE for time series





Time series are a special case: they are often non-stationary, and often only the series itself is available, without further explanatory variables.

Here, we used the [Numenta Anomaly Benchmark](https://www.kaggle.com/boltzmannbrain/nab) dataset provided by [boltzmannbrain](https://www.kaggle.com/boltzmannbrain).

<a id='top'></a>
<div class="list-group" id="list-tab" role="tablist">
<h3 data-toggle="list"  role="tab" aria-controls="home"><p style="font-size : 30px"><font color="darkgrey">Content<font/></p></h3>

1. [<font color="darkgrey">One Dimension<font/>](#1)   
    - 1.1 [<font color="darkgrey"> 1D Stripe plot<font/>](#1.1)
2. [<font color="darkgrey"> Two Dimensions<font/>](#2)
    - 2.1 [<font color="darkgrey"> 2D stripe plot<font/>](#2.1)
    - 2.2 [<font color="darkgrey"> Bar plot <font/>](#2.2)
3. [<font color="darkgrey"> N Dimensions<font/>](#3)
    - 3.1 [<font color="darkgrey">Scatterplot<font/>](#3.1)    
4. [<font color="darkgrey">Multiple views<font/>](#4)
    - 4.1 [<font color="darkgrey">Multi view 1<font/>](#4.1) 
    - 4.2 [<font color="darkgrey">Multi view 2<font/>](#4.2) 

In [2]:
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
import altair as alt
import numpy as np 
import pandas as pd


In [3]:
def scale_data(X):
    scaler = StandardScaler()
    scaled_X = scaler.fit_transform(X)
    return scaled_X

In [4]:
# Load the anomaly-free training series and the two series with anomalies,
# standardise each value column, and parse the timestamps.
# pd.to_datetime is used because astype("datetime64") without a unit fails on recent pandas.
train = pd.read_csv("https://raw.githubusercontent.com/numenta/NAB/master/data/artificialNoAnomaly/art_daily_small_noise.csv")
train['value'] = scale_data(train['value'].values.reshape(-1,1))
train["timestamp"] = pd.to_datetime(train["timestamp"])

test1 = pd.read_csv("https://raw.githubusercontent.com/numenta/NAB/master/data/artificialWithAnomaly/art_daily_jumpsdown.csv")
test1['value'] = scale_data(test1['value'].values.reshape(-1,1))
test1["timestamp"] = pd.to_datetime(test1["timestamp"])

test2 = pd.read_csv("https://raw.githubusercontent.com/numenta/NAB/master/data/artificialWithAnomaly/art_daily_jumpsup.csv")
test2['value'] = scale_data(test2['value'].values.reshape(-1,1))
test2["timestamp"] = pd.to_datetime(test2["timestamp"])

In [5]:
time = test1.timestamp

In [6]:
pltDf = pd.concat([train[["timestamp", "value"]], test1[["timestamp", "value"]], test2[["timestamp", "value"]]]).reset_index(drop=True)

# Label each row with the frame it came from, using that frame's own length
l = train.shape[0]*["train"]
l.extend(test1.shape[0]*["test1"])
l.extend(test2.shape[0]*["test2"])

pltDf["sample"] = l


alt.data_transformers.enable('default', max_rows=None)

base = alt.Chart(pltDf).mark_line().encode(
    x=alt.X('timestamp:T', axis=alt.Axis(format="%d %B")),
    y=alt.Y('value:Q'),
    color=alt.Color('sample:N'),
    tooltip=["timestamp", "value", "sample"]
).interactive()




base 



In [7]:
def perform_tsne(data, n_components=2, perplexity=50, n_iter=300, random_state=0):
    """
    Perform t-SNE dimensionality reduction on the input data.
    Parameters:
        data (pandas.DataFrame): The data to perform t-SNE on.
        n_components (int): The number of dimensions in the reduced space (default=2).
        perplexity (float): The perplexity parameter for t-SNE (default=50).
        n_iter (int): The number of iterations for t-SNE (default=300).
        random_state (int): The random seed for t-SNE (default=0).
    Returns:
        numpy.ndarray: The reduced data in the new space.
    """
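    # init="random" matches the default of scikit-learn releases before 1.2. The newer "pca"
    # default needs at least n_components input features, and the series here has one column.
    # Note: scikit-learn 1.5 renamed n_iter to max_iter; adjust the keyword on newer releases.
    tsne = TSNE(
        n_components=n_components,
        perplexity=perplexity,
        n_iter=n_iter,
        random_state=random_state,
        init="random",
    )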
    tsne_result = tsne.fit_transform(data)
    return tsne_result


In [8]:
tsne_result_train1 = perform_tsne(train.drop(columns=["timestamp"]))

In [9]:
tsne_result_train = perform_tsne(train.drop(columns=["timestamp"]))
tsne_result_test1 = perform_tsne(test1.drop(columns=["timestamp"]))
tsne_result_test2 = perform_tsne(test2.drop(columns=["timestamp"]))

In [10]:
train['tsne1'] = tsne_result_train[:,0]
train['tsne2'] = tsne_result_train[:,1]
test1['tsne1'] = tsne_result_test1[:,0]
test1['tsne2'] = tsne_result_test1[:,1]
test2['tsne1'] = tsne_result_test2[:,0]
test2['tsne2'] = tsne_result_test2[:,1]


In [11]:
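# Label every reading after 08:00 and up to and including 20:50 on 2014-04-11 as an outlier
window_start = pd.to_datetime("2014-04-11 08:00")
window_end = pd.to_datetime("2014-04-11 20:50")
sample = np.where((time > window_start) & (time <= window_end), "yes", "no")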


In [12]:
test1['outlier'] = sample
test2['outlier'] = sample
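
As a quick sanity check of the labelling step above, the following sketch applies the same window to a synthetic timestamp grid. The `demo` series and its 5-minute spacing are assumptions made for illustration; they stand in for `test1.timestamp` and are not read from the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for test1.timestamp: one day sampled every 5 minutes (assumed spacing).
demo = pd.Series(pd.date_range("2014-04-11 00:00", periods=288, freq="5min"))

# Same window as the notebook: after 08:00 and up to and including 20:50.
window_start = pd.Timestamp("2014-04-11 08:00")
window_end = pd.Timestamp("2014-04-11 20:50")
labels = np.where((demo > window_start) & (demo <= window_end), "yes", "no")

# Count how many readings fall inside and outside the window.
counts = pd.Series(labels).value_counts()
print(counts)
```

On this synthetic grid the window covers 154 of the 288 readings for that day.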


### Taxonomy

![FlowMDP](sankey_full.png)


<font size="+3" color="grey"><b>1. One Dimension </b></font><br><a id="1"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

<font size="+2" color="grey"><b>1.1 1D Stripe plot  </b></font><br><a id="1.1"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>


From the taxonomy, a 1D stripe plot is described by:
- Dimension: D = 1
- Data abstraction: one numeric variable
- Encoding: position (x-axis), lines, color
- Interaction: selection and filter
- Layout: Juxtaposition (vertical concatenation)

In [54]:
selection = alt.selection_multi(fields=['outlier'], bind='legend')

base = alt.Chart(test1).encode().properties(
    width=1000,)


tsne1_tick = base.mark_tick().encode(
    x=alt.X('tsne1'),
    color=alt.Color('outlier:N'),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.1))
).add_selection(
    selection
)

tsne2_tick = base.mark_tick().encode(
    x=alt.X('tsne2'),
    color=alt.Color('outlier:N'),
    opacity=alt.condition(selection, alt.value(1), alt.value(0.1))
).add_selection(
    selection
)

# The & operator stacks the two tick charts vertically
(tsne1_tick & tsne2_tick).properties(
    title='Decomposition of tSNE values using tick marks',
   ).configure_title(
    anchor= 'middle',
    fontSize=16  
)

### Further points

- Each t-SNE component is drawn on its own one-dimensional axis as tick marks. Occlusion is a problem in this view and can hide many data points; however, labels can still be identified through the color encoding combined with select and filter interactions.
- For outlier detection, this representation shows how the outliers are distributed across the reduced dataset.
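
The occlusion noted above can be roughly quantified by counting how many ticks land on a pixel column that is already taken. The sketch below is an approximation under stated assumptions: `tick_overplotting` is a hypothetical helper, the values are synthetic rather than real t-SNE output, and the linear value-to-pixel mapping ignores the axis padding a plotting library may add. The width of 1000 pixels mirrors the chart width used in the cell above.

```python
import numpy as np

def tick_overplotting(values, width_px=1000):
    """Estimate the share of ticks drawn on a pixel column that an earlier tick already occupies."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values are identical, so every tick lands on the same column.
        columns = np.zeros(values.size, dtype=int)
    else:
        # Map each value linearly onto the pixel columns 0 .. width_px - 1.
        columns = np.floor((values - values.min()) / span * (width_px - 1)).astype(int)
    occupied = np.unique(columns).size
    return 1.0 - occupied / values.size

# Hypothetical stand-in for test1["tsne1"]: 4032 made-up coordinates (assumed series length).
rng = np.random.default_rng(0)
fake_tsne1 = rng.normal(scale=20.0, size=4032)
share_hidden = tick_overplotting(fake_tsne1, width_px=1000)
print(f"{share_hidden:.0%} of ticks are drawn on top of an earlier tick")
```

With more points than pixel columns, at least `1 - width_px / n` of the ticks must overlap, which is why the opacity and legend-selection interactions matter for this view.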

<font size="+3" color="grey"><b>2. Two Dimensions </b></font><br><a id="2"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>


<font size="+2" color="grey"><b>2.1  2D Stripe plot  </b></font><br><a id="2.1"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>


From the taxonomy, a 2D stripe plot is described by:
- Dimension: D = 2
- Data abstraction: one numeric and one categorical variable
- Encoding: position (x&y-axis), lines, color
- Interaction: none
- Layout: Juxtaposition (vertical concatenation)

In [53]:
# One row of ticks per outlier label, for each t-SNE component
tsne1_tick = base.mark_tick().encode(
    x=alt.X('tsne1'),
    y=alt.Y('outlier:O'),
    color=alt.Color('outlier:N'),
)

tsne2_tick = base.mark_tick().encode(
    x=alt.X('tsne2'),
    y=alt.Y('outlier:O'),
    color=alt.Color('outlier:N'),
)

# The & operator stacks the two tick charts vertically
(tsne1_tick & tsne2_tick).properties(
    title='Decomposition of tSNE values to show outliers using tick marks',
   ).configure_title(
    anchor= 'middle',
    fontSize=16  
)

<font size="+2" color="grey"><b>2.2 Barplot  </b></font><br><a id="2.2"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>


From the taxonomy, a bar plot is described by:
- Dimension: D = 2
- Data abstraction: one numeric and one temporal variable
- Encoding: position (x&y-axis), length, color
- Interaction: none
- Layout: Juxtaposition (horizontal concatenation)

In [33]:


tsne1 = alt.Chart(test1).mark_bar().encode(
    x=alt.X('timestamp:T', axis=alt.Axis(format="%d %B")),
    y=alt.Y("tsne1:Q", stack=None),
    color="outlier:N"
).properties(
    width=500,
    height=500
)

tsne2 = alt.Chart(test1).mark_bar().encode(
    x=alt.X('timestamp:T', axis=alt.Axis(format="%d %B")),
    y=alt.Y("tsne2:Q"),
    color="outlier:N"
).properties(
    width=500,
    height=500
)

(tsne1 | tsne2).properties(
    title='tSNE values plotted against the timestamp',
   ).configure_title(
    anchor= 'middle',
    fontSize=16  
)

### Further points

- The t-SNE values are plotted against the time variable. This allows direct comparison over time and shows how the projected data behaves with respect to time.
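
With one bar per reading, a long series produces far more bars than a 500-pixel-wide chart can resolve. One option, which this notebook does not use, is to aggregate before plotting. A minimal sketch, assuming a synthetic frame `demo` in place of `test1` and an hourly mean as an arbitrary choice of aggregate:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for test1: two days of 5-minute readings with made-up t-SNE coordinates.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "timestamp": pd.date_range("2014-04-10", periods=576, freq="5min"),
    "tsne1": rng.normal(size=576),
    "tsne2": rng.normal(size=576),
})

# Average each coordinate per hour so that one bar stands for twelve readings.
hourly = (
    demo.set_index("timestamp")[["tsne1", "tsne2"]]
    .resample("60min")
    .mean()
    .reset_index()
)
print(hourly.head())
```

`hourly` could then be passed to `alt.Chart(...)` in place of `test1`. The `outlier` label would need its own aggregation rule, for example "yes" if any reading in the hour is labelled "yes".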



<font size="+3" color="grey"><b>3. N Dimensions </b></font><br><a id="3"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

<font size="+2" color="grey"><b>3.1 Scatterplot  </b></font><br><a id="3.1"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>


From the taxonomy, a scatterplot is described by:
- Dimension: D > 2
- Data abstraction: 2 numeric variables, temporal and categorical variable
- Encoding: position (x & y-axis), shape, color
- Interaction: selection and filter
- Layout: Juxtaposition (horizontal concatenation)

In [56]:

chart = alt.Chart(test1).mark_point().encode(
    x='timestamp:T',
    y='value:Q',
    #strokeDash = 'outlier',
    color=alt.Color('tsne1:Q', scale=alt.Scale(scheme='brownbluegreen')),
    shape='outlier:N',
    tooltip=['tsne1', 'tsne2']
).properties(
    #title='Decomposition of tSNE values to show outliers',
    width=1000,
    height=500
).interactive()


chart.properties(
    title='Decomposition of tSNE values to show outliers',
   ).configure_title(
    anchor= 'middle',
    fontSize=16  
)


<font size="+3" color="grey"><b>4. Multiple Views </b></font><br><a id="4"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

<font size="+2" color="grey"><b>4.1 Multi view 1  </b></font><br><a id="4.1"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

- Dimension: D > 2
- Data abstraction: n numeric, temporal and categorical
- Encoding: position (x & y-axis), line, points,  color.
- Interaction: brushing and linking
- Layout: Juxtaposition (vertical & horizontal concatenation)

In [60]:
# Configure the options common to all layers
brush = alt.selection(type='interval')
base = alt.Chart(test1).add_selection(brush)

# Configure the points(to change colorscale use: scale=alt.Scale(scheme='set2'))
points = base.mark_point().encode(
    x=alt.X('timestamp', title='Timestamp', axis=alt.Axis(format="%d %B")),
    y=alt.Y('value', title='Value'),
    color=alt.condition(brush, 'outlier', alt.value('grey'), )
).properties(
    width=700,
    height=500
)



# Configure the ticks
tick_axis = alt.Axis(labels=False, domain=False, ticks=False)

x_ticks = base.mark_tick().encode(
    alt.X('tsne1', ),
    alt.Y('outlier', title='', axis=tick_axis),
    color=alt.condition(brush, 'outlier', alt.value('lightgrey'))
).properties(
    width=700,
)


y_ticks = base.mark_tick().encode(
    alt.X('outlier', title='', axis=tick_axis),
    alt.Y('tsne2', ),
    color=alt.condition(brush, 'outlier', alt.value('lightgrey'))
).properties(
    height=500
)


# Build the chart
(y_ticks | (points & x_ticks)).properties(
    title= 'Multi-view for tSNE time series'
).configure_title(
    anchor= 'middle',
    fontSize=16  
)


### Further points

- Layout and interaction techniques are useful when the t-SNE components are shown individually, because they keep those views connected to the original variables of the dataset.
- Brushing in any view triggers a corresponding response in the remaining views.
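
In data terms, brushing and linking amounts to turning the brushed rectangle into a row filter that every view then applies. The sketch below illustrates that idea with plain pandas; `brush_mask`, the `demo` frame, and the `colour_key` column are hypothetical names, and the values are made up. It mirrors the `alt.condition(brush, 'outlier', ...)` rule from the cell above, not Altair's internal implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of test1 with made-up values.
demo = pd.DataFrame({
    "value": np.arange(10, dtype=float),
    "tsne1": np.arange(10, dtype=float)[::-1],
    "outlier": ["no"] * 7 + ["yes"] * 3,
})

def brush_mask(df, ranges):
    """Return True for rows inside every (low, high) range of a rectangular brush."""
    mask = pd.Series(True, index=df.index)
    for field, (low, high) in ranges.items():
        mask &= df[field].between(low, high)
    return mask

# A brush drawn on the tsne1 tick view, covering tsne1 values from 0 to 3.
selected = brush_mask(demo, {"tsne1": (0, 3)})

# Every linked view applies the same rule: outlier colour inside the brush, grey elsewhere.
demo["colour_key"] = np.where(selected, demo["outlier"], "grey")
print(demo[["value", "tsne1", "colour_key"]])
```

Because the mask is computed per row rather than per view, a brush drawn on the `tsne1` ticks also recolours the matching points in the timestamp/value view.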

<font size="+2" color="grey"><b>4.2 Multi view 2 </b></font><br><a id="4.2"></a>
<a href="#top" class="btn-xs btn-danger" role="button" aria-pressed="true" style="color:white" data-toggle="popover">Go back to the TOP</a>

- Dimension: D > 2
- Data abstraction: 2 numeric, one temporal
- Encoding: position (x & y-axis), color, length.
- Interaction: brushing and linking
- Layout: Juxtaposition (vertical concatenation)

In [57]:
brush = alt.selection_interval(
    encodings=['x'],
    resolve='intersect'
)

base = alt.Chart(test1)

hist = base.mark_bar().encode(
    alt.X(alt.repeat('row'), type='quantitative', 
          bin=alt.Bin(maxbins=100, minstep=1),
          axis=alt.Axis(format="d"),
    ),
    alt.Y('count():Q', title=None),
    color=alt.Color('outlier'),
    tooltip= 'count()'
)

alt.layer(
    hist.add_selection(brush).encode(color=alt.value('lightgrey')),
    hist.transform_filter(brush)
).properties(
    width=900,
    height=100
).repeat(
    row=['tsne1', 'tsne2', 'timestamp'],
).transform_calculate(
    timestamp='hours(datum.timestamp) + minutes(datum.timestamp) / 60' #fractional hours
).configure_view(
    stroke='transparent' #no outline
).properties(
    title='Multi-view for tSNE time series',
).configure_title(
    anchor= 'middle',
    fontSize=16  
)



### Further points

- Multi-view composition comes in many variations. Its combinations allow a fuller exploration of a user's graphical perception abilities.
- In the chart above, the t-SNE components are vertically concatenated, and the brushing and linking interaction carries a selection across the charts.
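
In the repeated histogram above, the `timestamp` row is not binned by date: the `transform_calculate` step first converts each timestamp to its time of day in fractional hours. The pandas sketch below reproduces that conversion on a few hypothetical timestamps so the effect is easier to see; `stamps` is an illustrative name, not part of the notebook.

```python
import pandas as pd

# Hypothetical timestamps standing in for test1["timestamp"].
stamps = pd.Series(pd.to_datetime([
    "2014-04-11 08:00",
    "2014-04-11 20:45",
    "2014-04-12 08:00",
]))

# Pandas equivalent of the Vega expression hours(t) + minutes(t) / 60.
fractional_hours = stamps.dt.hour + stamps.dt.minute / 60
print(fractional_hours.tolist())  # [8.0, 20.75, 8.0]
```

Readings taken at the same time on different days share a bin, so a brush on that row selects a time-of-day range across all days.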



## Citation

If you found the examples in this notebook useful and you have used these alternatives in your research, please cite...