Example Notebook
================

<h2>1. Run LDA the first time</h2>

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import sys
basedir = '../'
sys.path.append(basedir)

from lda_for_fragments import Ms2Lda

In [None]:
fragment_filename = basedir + 'input/final/Beer_3_full1_5_2E5_pos_fragments.csv'
neutral_loss_filename = basedir + 'input/final/Beer_3_full1_5_2E5_pos_losses.csv'
mzdiff_filename = None
ms1_filename = basedir + 'input/final/Beer_3_full1_5_2E5_pos_ms1.csv'
ms2_filename = basedir + 'input/final/Beer_3_full1_5_2E5_pos_ms2.csv'
ms2lda = Ms2Lda.lcms_data_from_R(fragment_filename, neutral_loss_filename, mzdiff_filename,
                                 ms1_filename, ms2_filename)

In [None]:
### All the parameters you need to specify to run LDA ###

n_topics = 300 # cross-validation suggested 300-400 topics
n_samples = 500 # 100 is probably okay for testing; for the manuscript, use at least 500-1000
n_burn = 250 # number of burn-in samples to discard; if 0, only the last sample is used
n_thin = 5 # after burn-in, keep every n-th sample for averaging
alpha = 50.0/n_topics # hyper-parameter for the document-topic distributions
beta = 0.1 # hyper-parameter for the topic-word distributions

ms2lda.run_lda(n_topics, n_samples, n_burn, n_thin, alpha, beta)
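
To get a feel for how n_samples, n_burn and n_thin interact, the sketch below counts the samples that would be averaged. It is only an illustration: retained_sample_indices is a hypothetical helper, not part of Ms2Lda, and the exact indexing used inside run_lda() is an assumption here.

```python
def retained_sample_indices(n_samples, n_burn, n_thin):
    """Hypothetical helper: indices of the Gibbs samples kept for averaging."""
    if n_burn == 0:
        # Per the parameter comment, no burn-in means only the last sample is used.
        return [n_samples - 1]
    # Assumption: discard the first n_burn samples, then keep every n_thin-th one.
    return list(range(n_burn, n_samples, n_thin))

kept = retained_sample_indices(n_samples=500, n_burn=250, n_thin=5)
print(len(kept))
```

Under this assumed scheme, the settings above would average over 50 samples. Increasing n_thin or n_burn reduces that number.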

In [None]:
ms2lda.write_results('beer3_test_method3')
ms2lda.save_project('results/beer3pos.project')

<h2>2. Resuming from Previous Run</h2>

If you called save_project() above, you can resume directly from this step the next time you load the notebook.

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import sys
basedir = '../'
sys.path.append(basedir)

from lda_for_fragments import Ms2Lda

In [None]:
ms2lda = Ms2Lda.resume_from('results/beer3pos.project')

<h2>3. Results</h2>

We need to threshold the document-topic and topic-word distributions produced by LDA so that we can say which topics are used in which documents, and which words belong to which topic.
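
As a rough illustration of what a fixed threshold does, the sketch below keeps, for each row of a small made-up matrix, the columns whose probability reaches the threshold. The helper threshold_rows and the toy values are hypothetical, and whether do_thresholding() compares with >= or > is an assumption.

```python
def threshold_rows(matrix, threshold):
    """Hypothetical helper: per row, the column indices with value >= threshold."""
    return [[j for j, value in enumerate(row) if value >= threshold]
            for row in matrix]

# Toy document-topic matrix: 2 documents x 3 topics (made-up values).
doc_topic = [
    [0.90, 0.07, 0.03],
    [0.02, 0.50, 0.48],
]

# Topics considered "used" by each document at a threshold of 0.05.
topics_per_doc = threshold_rows(doc_topic, 0.05)
print(topics_per_doc)
```

The same idea applies to the topic-word matrix, where each row is a topic and the surviving columns are the words assigned to it.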

In [None]:
# Fixed threshold of 0.05 for both the doc_topic and topic_word matrices
# NOTE: this is what we used previously
ms2lda.do_thresholding(th_doc_topic=0.05, th_topic_word=0.05)

# The doc_topic matrix is thresholded at 0.05
# The topic_word matrix is thresholded by the smallest value in each row
# ms2lda.do_thresholding(th_doc_topic=0.05, th_topic_word=0.0)

# Both matrices are thresholded by the smallest value in each row
# The results seem difficult to visualise effectively because of the very high number of MS1 peaks per topic
# ms2lda.do_thresholding(th_doc_topic=0.0, th_topic_word=0.0)

Print the words in each topic.

In [None]:
ms2lda.print_topic_words()

Save the output CSV files.

In [None]:
ms2lda.write_results('beer3_test_method3')

In the list below, enter the MS1 peaks that you want to color differently in the graph page. You can find the names in the node labels on the graph page or in the CSV matrices written above. On the graph page, you can also press the keyboard shortcuts 'C', 'S' and 'T' to hide all circles (topics), squares (documents) and triangles (special documents).

In [None]:
special_peaks = [
    'doc_372.18877_540.996',
    'doc_291.66504_547_239',
    'doc_308.17029_289.13'
]
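
Because the labels are typed by hand, a quick sanity check can catch entries that do not follow the usual shape. The sketch below assumes labels look like doc_<number>_<number>, as the first and third entries suggest. That pattern is an assumption, not something Ms2Lda documents here, and the names LABEL_PATTERN and suspicious are hypothetical.

```python
import re

# Assumed label shape: 'doc_', a decimal number, '_', another decimal number.
LABEL_PATTERN = re.compile(r"doc_\d+(\.\d+)?_\d+(\.\d+)?")

special_peaks = [
    'doc_372.18877_540.996',
    'doc_291.66504_547_239',
    'doc_308.17029_289.13'
]

# Collect labels that do not match the assumed pattern in full.
suspicious = [label for label in special_peaks
              if LABEL_PATTERN.fullmatch(label) is None]
print(suspicious)
```

Any label reported here is worth double-checking against the node labels on the graph page or the CSV matrices before plotting.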

If the 'interactive' parameter below is True, an interactive visualisation of the results opens in a separate tab. Interrupt the kernel to stop it once you are done (from the menu above, Kernel > Interrupt).

In [43]:
ms2lda.plot_lda_fragments(consistency=0.50, sort_by="h_index", interactive=True, to_highlight=special_peaks)
# ms2lda.plot_lda_fragments(consistency=0.50, sort_by="in_degree")


75, topic_251 degree=6 added
76, topic_252 degree=8 added
77, topic_253 degree=10 added
78, topic_254 degree=4 added
79, topic_255 degree=9 added
82, topic_258 degree=8 added
83, topic_259 degree=8 added
84, topic_8 degree=8 added
136, topic_237 degree=6 added
137, topic_36 degree=16 added
138, topic_68 degree=19 added
139, topic_69 degree=5 added
141, topic_62 degree=11 added
143, topic_60 degree=15 added
144, topic_61 degree=13 added
145, topic_66 degree=5 added
146, topic_67 degree=7 added
148, topic_65 degree=36 added
149, topic_16 degree=10 added
156, topic_231 degree=9 added
170, topic_197 degree=15 added
214, topic_225 degree=4 added
215, topic_224 degree=11 added
217, topic_226 degree=20 added
218, topic_221 degree=7 added
220, topic_223 degree=8 added
221, topic_222 degree=8 added
223, topic_229 degree=7 added
224, topic_228 degree=9 added
225, topic_74 degree=11 added
232, topic_139 degree=9 added
233, topic_138 degree=2 added
236, topic_131 degree=4 added
237, topic_130 deg

127.0.0.1 - - [13/Aug/2015 22:07:44] "GET /graph.json HTTP/1.1" 200 -
