## High-Dimensional Cytometry Data with Python
Methods used in this notebook for exploratory analysis of high-dimensional single-cell data:
- Nonlinear dimensionality reduction with t-SNE. scikit-learn's `TSNE` uses the Barnes-Hut approximation by default, which suits a fairly large data set; running PCA first and passing its output to t-SNE could reduce computation time further.
- k-means clustering, in the hope that cells with similar phenotypes will be grouped together
- Matplotlib for static plots of the clusters, plus plotly for interactive 3D views of the clusters
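
The PCA-then-t-SNE cascade mentioned in the list above can be sketched as follows. This is a minimal illustration on synthetic data, not the pipeline run later in this notebook; the matrix size, the choice of 10 principal components, and the `random_state` values are assumptions made for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Synthetic stand-in for a cells x markers matrix (assumed shape: 150 x 30)
rng = np.random.RandomState(0)
X = rng.normal(size=(150, 30))

# Step 1: PCA compresses the 30 input dimensions to 10 (illustrative choice)
X_reduced = PCA(n_components=10, random_state=0).fit_transform(X)

# Step 2: t-SNE embeds the PCA output in 2D.
# method="barnes_hut" is scikit-learn's default and is spelled out for clarity.
embedding = TSNE(n_components=2, method="barnes_hut", perplexity=30,
                 random_state=0).fit_transform(X_reduced)

print(X_reduced.shape, embedding.shape)
```

t-SNE then computes pairwise similarities in 10 dimensions instead of 30, which is where any time saving would come from; how many components to keep is a judgment call for the real data.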

In [None]:
import matplotlib.pyplot as plt
import pandas as pd
%matplotlib inline

In [None]:
# Tab-separated input with column names in the first row
data = pd.read_csv("test.txt", sep='\t', header=0)
# Data from healthy bone marrow
visne_data = pd.read_csv("viSNE_Marrow1_nsub1000.txt", sep='\t', header=0)

The data come from [here](https://github.com/lmweber/Rtsne-example), an example use case of the viSNE software. The same data are used in this notebook so that the resulting clusters can be compared with that example's output.
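
One way such a comparison could be quantified is the adjusted Rand index, which scores the agreement between two label assignments regardless of how the cluster IDs are numbered. The sketch below is an assumption about how the comparison might be done, not something this notebook performs; the label lists are made-up toy values.

```python
from sklearn.metrics import adjusted_rand_score

# Toy labels for nine cells (hypothetical values, not taken from the data)
labels_a = [0, 0, 0, 1, 1, 1, 2, 2, 2]
# Same grouping as labels_a, but with the cluster IDs permuted
labels_b = [2, 2, 2, 0, 0, 0, 1, 1, 1]
# A grouping that disagrees with labels_a for some cells
labels_c = [0, 0, 1, 1, 1, 2, 2, 2, 0]

ari_same = adjusted_rand_score(labels_a, labels_b)
ari_diff = adjusted_rand_score(labels_a, labels_c)

print(ari_same, ari_diff)
```

Identical groupings score 1 even when the IDs differ, which matters here because k-means numbers its clusters arbitrarily.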

In [64]:
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

In [None]:
# Six clusters for k-means; three output dimensions for the t-SNE embedding
kmeans = KMeans(n_clusters=6)
tsn = TSNE(n_components=3)
visne_data.head()

In [None]:
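# Cluster the cells with k-means (fitted on the unscaled measurements)
kmeans.fit(data)

# Standardise each column to zero mean and unit variance before embedding
scaler = StandardScaler()
scaled_data = scaler.fit_transform(data)

# 3D t-SNE embedding; the name x_pca is kept because the plotting cells use it
x_pca = tsn.fit_transform(scaled_data)
x_pca.shape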

In [None]:
kmeans.cluster_centers_
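
`cluster_centers_` holds one row per cluster and one column per input feature, so each row can be read as the mean profile of the cells assigned to that cluster. The sketch below shows this on synthetic blobs together with the per-cluster cell counts; the data, the choice of three clusters, and the `random_state` values are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data: 300 points, 4 features, drawn around 3 centres
X, _ = make_blobs(n_samples=300, centers=3, n_features=4, random_state=0)

model = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# One centre per cluster, one coordinate per feature
print(model.cluster_centers_.shape)

# Number of points assigned to each cluster
sizes = np.bincount(model.labels_, minlength=3)
print(sizes)
```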

In [None]:
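# Two panels sharing a y-axis; only the left one is drawn, ax2 stays empty
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True, figsize=(10, 6))
ax1.set_title('K Means')
# First two embedding dimensions, coloured by k-means cluster label
ax1.scatter(x_pca[:, 0], x_pca[:, 1], c=kmeans.labels_, cmap='rainbow')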

In [None]:
from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
# `import plotly.plotly as py` is dropped: it was unused here and the module
# is no longer part of plotly from version 4 onwards
import plotly.graph_objs as go
print(__version__) # requires version >= 1.9.0
import cufflinks as cf
init_notebook_mode(connected=True)
cf.go_offline()

In [63]:
trace1 = go.Scatter3d(
    x=x_pca[:,0],
    y=x_pca[:,1],
    z=x_pca[:,2],
    mode='markers',
    marker=dict(
        size=12,
        color=kmeans.labels_,                # set color to an array/list of desired values
        colorscale='Viridis',   # choose a colorscale
        opacity=0.8
    )
)

# Named `traces` so the `data` DataFrame loaded earlier is not overwritten
traces = [trace1]
layout = go.Layout(
    margin=dict(
        l=0,
        r=0,
        b=0,
        t=0
    )
)
fig = go.Figure(data=traces, layout=layout)
cf.iplot(fig, filename='3d-scatter-colorscale')