Example Notebook
================

<h2>1. Running LDA for the First Time</h2>

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import sys
basedir = '../'
sys.path.append(basedir)

from lda_for_fragments import Ms2Lda

In [2]:
fragment_filename = basedir + 'input/final/Beer_3_full1_5_2E5_pos_fragments.csv'
neutral_loss_filename = basedir + 'input/final/Beer_3_full1_5_2E5_pos_losses.csv'
mzdiff_filename = None
ms1_filename = basedir + 'input/final/Beer_3_full1_5_2E5_pos_ms1.csv'
ms2_filename = basedir + 'input/final/Beer_3_full1_5_2E5_pos_ms2.csv'
ms2lda = Ms2Lda.lcms_data_from_R(fragment_filename, neutral_loss_filename, mzdiff_filename, 
                             ms1_filename, ms2_filename)

Data shape (1588, 3171)


In [3]:
### Parameters required to run LDA ###

n_topics = 300 # cross-validation suggests 300-400 topics
n_samples = 500 # 100 is probably enough for testing; for a manuscript, use at least 500-1000
n_burn = 250 # number of burn-in samples; if 0, only the last sample is used
n_thin = 5 # after burn-in, use every n_thin-th sample for averaging
alpha = 50.0/n_topics # hyperparameter for document-topic distributions
beta = 0.1 # hyperparameter for topic-word distributions

ms2lda.run_lda(n_topics, n_samples, n_burn, n_thin, alpha, beta)

Fitting model...
CGS LDA initialising
...............................................................................................................................................................
Using Numba for LDA sampling
Preparing words
Preparing Z matrix
DONE
Burn-in 1 
Burn-in 2 
Burn-in 3 
Burn-in 4 
Burn-in 5 
Burn-in 6 
Burn-in 7 
Burn-in 8 
Burn-in 9 
Burn-in 10 
Burn-in 11 
Burn-in 12 
Burn-in 13 
Burn-in 14 
Burn-in 15 
Burn-in 16 
Burn-in 17 
Burn-in 18 
Burn-in 19 
Burn-in 20 
Burn-in 21 
Burn-in 22 
Burn-in 23 
Burn-in 24 
Burn-in 25 
Burn-in 26 
Burn-in 27 
Burn-in 28 
Burn-in 29 
Burn-in 30 
Burn-in 31 
Burn-in 32 
Burn-in 33 
Burn-in 34 
Burn-in 35 
Burn-in 36 
Burn-in 37 
Burn-in 38 
Burn-in 39 
Burn-in 40 
Burn-in 41 
Burn-in 42 
Burn-in 43 
Burn-in 44 
Burn-in 45 
Burn-in 46 
Burn-in 47 
Burn-in 48 
Burn-in 49 
Burn-in 50 
Burn-in 51 
Burn-in 52 
Burn-in 53 
Burn-in 54 
Burn-in 55 
Burn-in 56 
Burn-in 57 
Burn-in 58 
Burn-in 59 
Burn-in 60 
Burn-in 61 
Burn-in 62
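
To make the interplay of n_samples, n_burn and n_thin concrete, the sketch below lists which Gibbs sweeps would contribute to the averaged estimate. The helper retained_samples is hypothetical and assumes one common convention (discard the first n_burn sweeps, then keep every n_thin-th one); the exact indexing inside run_lda may differ.

```python
def retained_samples(n_samples, n_burn, n_thin):
    """Return the 0-based sweep indices kept for averaging (assumed convention)."""
    if n_burn == 0:
        # Per the parameter comments above: with no burn-in, only the last sample is used.
        return [n_samples - 1]
    # Skip the burn-in sweeps, then keep every n_thin-th sweep.
    return list(range(n_burn, n_samples, n_thin))


n_topics = 300
kept = retained_samples(n_samples=500, n_burn=250, n_thin=5)
alpha = 50.0 / n_topics  # same formula as in the parameter cell

print(len(kept), kept[0], kept[-1])
print(round(alpha, 6))
```

Under this assumption, the settings used above would average over 50 samples, and alpha evaluates to roughly 0.1667, which matches the value reported when the project is reloaded.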

In [5]:
ms2lda.write_results('beer3_test_method3')
ms2lda.save_project('results/beer3pos.project')

Writing topics to results/beer3_test_method3/beer3_test_method3_topics.csv
Writing fragments x topics to results/beer3_test_method3/beer3_test_method3_all.csv
Writing topic docs to results/beer3_test_method3/beer3_test_method3_docs.csv
Project saved to results/beer3pos.project time taken = 21.3921849728


<h2>2. Resuming from Previous Run</h2>

If you called save_project() above, you can resume directly from this step the next time you load the notebook.

In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

import sys
basedir = '../'
sys.path.append(basedir)

from lda_for_fragments import Ms2Lda

In [2]:
ms2lda = Ms2Lda.resume_from('results/beer3pos.project')

Project loaded from results/beer3pos.project time taken = 17.6787488461
 - input_filenames = 
	../input/final/Beer_3_full1_5_2E5_pos_fragments.csv
	../input/final/Beer_3_full1_5_2E5_pos_losses.csv
	../input/final/Beer_3_full1_5_2E5_pos_ms1.csv
	../input/final/Beer_3_full1_5_2E5_pos_ms2.csv
 - df.shape = (1588, 3171)
 - K = 300
 - alpha = 0.166666666667
 - beta = 0.1
 - last_saved_timestamp = Thu Aug  6 16:13:04 2015
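
The save-then-resume pattern can be wrapped in a small guard that only reruns LDA when no saved project exists. This is a minimal sketch: load_or_run and run_first_time are hypothetical names, not part of Ms2Lda, and the two callables are passed in so the helper itself makes no assumptions about the library.

```python
import os


def load_or_run(project_path, resume, run):
    """Resume from project_path if the file exists, otherwise run from scratch.

    resume: callable taking the project path (e.g. Ms2Lda.resume_from).
    run:    zero-argument callable that performs the full first-time run.
    """
    if os.path.exists(project_path):
        return resume(project_path)
    return run()


# Hypothetical usage in this notebook, where run_first_time() would wrap
# the loading, run_lda() and save_project() cells from section 1:
# ms2lda = load_or_run('results/beer3pos.project', Ms2Lda.resume_from, run_first_time)
```

Passing the callables explicitly keeps the guard easy to test without the real data files.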


<h2>3. Visualisation</h2>

If the 'interactive' parameter below is True, an interactive visualisation of the results opens in a separate tab. Once you are done with it, interrupt the kernel to stop it (from the menu above, Kernel > Interrupt).

In [20]:
ms2lda.print_topic_words()

Topic 0: fragment_176.87617 (0.305917347625), fragment_119.04873 (0.236429857941), loss_54.01014 (0.0593820676499),
Topic 1: fragment_275.11062 (0.562559598374), fragment_159.02737 (0.190113652915),
Topic 2: fragment_119.04988 (0.159031248426), fragment_272.14799 (0.120978493019), fragment_240.12245 (0.106465355378), loss_96.04228 (0.0558866742727),
Topic 3: fragment_121.06488 (0.632489599641), fragment_103.05448 (0.112052512239), fragment_93.06981 (0.0846015130716),
Topic 4: fragment_130.05044 (0.871631350453),
Topic 5: fragment_53.00259 (0.49781597501), fragment_85.06476 (0.366667217937),
Topic 6: loss_143.05788 (0.664422815706), fragment_215.13975 (0.117796346517),
Topic 7: fragment_118.08616 (0.638593921988), fragment_132.11306 (0.0966651989898),
Topic 8: fragment_153.06589 (0.191235471369), loss_119.06984 (0.126144913845), fragment_143.01705 (0.105913741549), loss_115.02683 (0.0635469443177), fragment_366.08148 (0.0608730622739),
Topic 9: loss_53.04741 (0.299999048735), fragment_2
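
If the printed topics need further processing, each output line can be parsed back into structured data. The sketch below is an illustration only: parse_topic_line is a hypothetical helper, and it assumes the "Topic k: word (probability), ..." layout shown above with values in plain decimal notation. Reading the CSV files produced by write_results is likely the more robust route.

```python
import re

# Matches entries such as "fragment_275.11062 (0.562559598374)".
WORD_RE = re.compile(r"(fragment|loss)_(\d+(?:\.\d+)?) \((\d+(?:\.\d+)?)\)")


def parse_topic_line(line):
    """Parse one printed topic line into (topic_id, [(kind, value, probability), ...])."""
    header, _, body = line.partition(":")
    topic_id = int(header.split()[1])  # header looks like "Topic 6"
    words = [(kind, float(value), float(prob))
             for kind, value, prob in WORD_RE.findall(body)]
    return topic_id, words


line = "Topic 6: loss_143.05788 (0.664422815706), fragment_215.13975 (0.117796346517),"
topic_id, words = parse_topic_line(line)
print(topic_id, words)
```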

In [None]:
ms2lda.plot_lda_fragments(consistency=0.50, sort_by="h_index", interactive=True)
# ms2lda.plot_lda_fragments(consistency=0.50, sort_by="in_degree")