Example Notebook 2: Loading and Visualization of MS2LDA Analysis
==============================================

This notebook demonstrates how to load an existing MS2LDA analysis (see example_notebook_1). The analysis contains the discovered LDA topics (Mass2Motifs) alongside the lists of MS1 and MS2 peaks that have been putatively annotated with their elemental formulae. It then shows how the MS2LDA project file can be examined, either by displaying the Mass2Motif contents or by loading it into MS2LDAvis, the visualization module.

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import numpy as np
import pandas as pd
from IPython.display import display

In [None]:
import os
import sys
basedir = '../MS2LDA/python'
sys.path.append(basedir)

from lda_for_fragments import Ms2Lda

If the cell above raises an error, please ensure that basedir correctly points to the location of the MS2LDA Python code.
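A quick way to check the path before importing is sketched below. The helper `find_module_file` is hypothetical, and the sketch assumes that `lda_for_fragments` lives in a single file named `lda_for_fragments.py` directly under basedir, which may not match your checkout.

```python
import os

def find_module_file(basedir, module_name='lda_for_fragments'):
    """Return the path to <module_name>.py inside basedir, or None if absent."""
    candidate = os.path.join(basedir, module_name + '.py')
    return candidate if os.path.isfile(candidate) else None

basedir = '../MS2LDA/python'
if find_module_file(basedir) is None:
    # Show the absolute path so a wrong relative location is easy to spot.
    print('lda_for_fragments.py not found under ' + os.path.abspath(basedir))
```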

<h2>1. Loading Existing MS2LDA</h2>

Here it is shown how to load an existing MS2LDA project, for Beer3 in positive ionization mode, that was used for the analysis in the paper.

In [None]:
ms2lda = Ms2Lda.resume_from('projects/Manuscript_Beer3POSmode_EFassigner_ALLextended.project')

In [None]:
# display(ms2lda.ms1)

In [None]:
# display(ms2lda.ms2)

<h2>2. Thresholding</h2>

To study the MS2LDA project in a sensible manner, the document-topic and topic-word distributions produced by LDA have to be thresholded. The thresholds set below are the ones used to produce the results described in the manuscript. They are expected to form a good starting point for other analyses as well.

In [None]:
ms2lda.do_thresholding(th_doc_topic=0.05, th_topic_word=0.01)
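As a rough illustration of what thresholding a probability matrix means, the sketch below zeroes out every entry of a small, made-up document-topic matrix that falls below a cutoff. The function `threshold_matrix` and the toy values are assumptions for illustration only; the actual `do_thresholding` implementation in MS2LDA may work differently.

```python
def threshold_matrix(rows, threshold):
    """Return a copy of rows in which entries below threshold are set to 0.0."""
    return [[p if p >= threshold else 0.0 for p in row] for row in rows]

# Toy document-topic matrix: 2 documents (rows) x 3 topics (columns).
doc_topic = [[0.90, 0.07, 0.03],
             [0.02, 0.48, 0.50]]

# Same cutoff as th_doc_topic in the cell above.
thresholded = threshold_matrix(doc_topic, 0.05)
print(thresholded)
```

Lowering the cutoff keeps more, weaker document-topic links; raising it keeps only the strongest ones.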

From this point onwards, we will refer to an LDA topic as a **Mass2Motif** (or M2M or motif for short) when interpreting the results.

<h2>3. Print Results</h2>

Print, in the notebook, the fragment/loss features that occur in each Mass2Motif with a probability above the thresholds defined above.

In [None]:
ms2lda.print_motif_features()

It is also possible to save the output to CSV files.

In [None]:
ms2lda.write_results('beer3pos_csv_out')

<h2>4. Visualisation</h2>

A visualisation module, MS2LDAvis, can be used to explore the results interactively. Please note that the MS2LDA project has to be thresholded before it can be visualised. Nodes can be coloured as explained in the cells below. Furthermore, annotations stored in CSV files can be loaded into the visualisation module. The visualisation module opens in a browser tab. To stop it, close the tab and then interrupt the kernel from the Kernel menu at the top of the notebook.

In [None]:
# If True, an interactive visualisation is shown in a separate tab. 
# You need to interrupt the kernel to stop it once you're done with it 
# (from the menu above, Kernel > Interrupt).
interactive = True

In [None]:
# Used for graph visualisation in the interactive mode only. 
# Specifies the 'special' nodes to be coloured differently.
special_nodes = [
    # you can colour an MS1 peak in the graph using
    # 'doc_peakid', where peakid is the peak ID of the MS1 peak
    # ('doc_21758', 'gold'),
    # you can also colour a Mass2Motif in the graph
    # ('motif_0', '#ff1493'),
]

# If there is nothing to highlight ..
# special_nodes = None
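The comments above suggest that each entry is a `(node_id, colour)` tuple, with node IDs of the form `doc_<peakid>` or `motif_<number>`. A minimal sketch for building such a list from IDs follows; the helper `make_special_nodes` is hypothetical and not part of MS2LDA, and the default colours are simply the ones used in the comments.

```python
def make_special_nodes(peak_ids=(), motif_ids=(),
                       peak_colour='gold', motif_colour='#ff1493'):
    """Build (node_id, colour) tuples for MS1 peaks and Mass2Motifs."""
    nodes = [('doc_' + str(pid), peak_colour) for pid in peak_ids]
    nodes += [('motif_' + str(mid), motif_colour) for mid in motif_ids]
    return nodes

# Same two nodes as in the commented-out examples above.
example_nodes = make_special_nodes(peak_ids=[21758], motif_ids=[0])
print(example_nodes)
```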

In [None]:
# read the annotation assigned to each Mass2Motif from a CSV file
import csv
motif_annotation = {}
# use a context manager so that the file is closed after reading
with open("annotations/beer3pos_annotation_Nov2015.csv") as f:
    for item in csv.reader(f, skipinitialspace=True):
        key = int(item[0])  # Mass2Motif number
        val = item[1]       # annotation text
        print(str(key) + '\t' + val)
        motif_annotation[key] = val

# here we set all the motifs having annotations as special nodes too
motif_colour = '#CC0000'
to_add_list = ['motif_' + str(x) for x in motif_annotation.keys()]
for item in to_add_list:
    special_nodes.append((item, motif_colour))

# If there are no annotations ..
# motif_annotation = {}
# or just leave the 'additional_info' parameter out when calling plot_lda_fragments below
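The loading loop above reads two columns per row: an integer Mass2Motif number, then the annotation text. The sketch below parses a made-up, in-memory sample in the same way to show that layout. The sample rows and the names `sample_csv`, `example_annotation` and `example_labels` are placeholders, not real annotations from the Beer3 file.

```python
import csv
import io

# Made-up sample in the assumed layout: motif number, annotation text.
sample_csv = u'0, example annotation A\n1, example annotation B\n'

example_annotation = {}
for row in csv.reader(io.StringIO(sample_csv), skipinitialspace=True):
    # skipinitialspace=True drops the space that follows each comma
    example_annotation[int(row[0])] = row[1]

# Node IDs for the annotated motifs, in the 'motif_<number>' form used above.
example_labels = ['motif_' + str(k) for k in sorted(example_annotation)]
print(example_labels)
```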

<h4>Run Visualisation</h4>

In [None]:
ms2lda.plot_lda_fragments(interactive=interactive, to_highlight=special_nodes, 
                          additional_info=motif_annotation)