MaLTE

Machine Learning of Transcript Expression

TODO:

  • in per-gene-tuning, add tuning parameters to results in TT.Seq.Gene objects
  • leave-N-out cross-validation
  • direct comparison to median-polish et al. (DONE)
  • add ComBat stuff (NOPE)
  • make prepare_data parallel (DONE)

News

0.3-3:

  • Added quantile normalisation argument (based on limma R package) to the get.predictions() function together with corresponding documentation
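For readers unfamiliar with quantile normalisation, the idea can be sketched in a few lines of plain Python. This is only an illustration of the general technique, not the code used by get.predictions() or limma; the function name `quantile_normalise` and the list-of-columns input are assumptions made for the example, and ties are not averaged the way a full implementation would.

```python
def quantile_normalise(columns):
    """Give every sample (column) the same empirical distribution.

    `columns` is a list of equal-length lists, one list per sample.
    Hypothetical helper for illustration only; ties are not averaged.
    """
    if not columns:
        return []
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of values")

    # Sort each sample, then average the k-th smallest values across samples.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)
    ]

    # Replace each value by the mean that corresponds to its rank.
    normalised = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalised.append(out)
    return normalised
```

After normalisation every sample contains the same set of values, arranged according to each sample's original ranking.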

0.3-2: 23-12-2013

  • Added filtering by Spearman correlation in 'oob.filter()' and added required documentation

0.3-1:

  • Added filtering by FPKM and count to compare.correlations(...) for comparison with median-polish and PLIER
  • Added filtering by FPKM and count to within_correlations(...) for comparison with median-polish and PLIER
  • Made within_correlations(...) much faster by replacing the for{...} loop with mclapply(...)

0.3-0: FINAL (merged 'princomps' branch)

0.3-0: EXPERIMENTAL (forked as 'princomps' branch)

  • Augmented samples.txt to now include principal components

0.2-5: FINAL (merged 'quantreg' to 'master')

0.2-5: EXPERIMENTAL (forked as 'quantreg' branch)

  • Incorporated quantregForest for interval estimation of predictions
  • Incorporated gene-specific tuning; modified TT.Params to have a slot for the tuned OOB Pearson correlation for comparison (without the p-value)
  • Fixed a bug that affected OOB filtering
  • Delete models immediately after tuning: does this save memory? (Naa...)
  • Added accessor functions to get lower and upper prediction intervals when using quantreg=TRUE

0.2-4:

  • OOB filter for genes now takes into account significant p-values (<=0.05)

0.2-3:

  • Bug: '*NA' in samples.txt fails (FIXED)

0.2-2: 10-10-2013

  • Bug: avoid computing correlations when the number of test samples <=2
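The guard described above can be illustrated with a small Python sketch. It is not the package's R code; `safe_pearson` and its `min_samples` parameter are hypothetical names chosen for the example. With two or fewer samples a Pearson correlation is either undefined or trivially ±1, so the sketch returns `None` instead of a number.

```python
import math


def safe_pearson(x, y, min_samples=3):
    """Pearson correlation, or None when it cannot be computed meaningfully.

    Hypothetical helper: returns None when there are fewer than
    `min_samples` paired observations or when either vector is constant.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < min_samples:
        return None  # too few test samples to report a correlation

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # zero variance: correlation is undefined
    return sxy / math.sqrt(sxx * syy)
```

Returning `None` rather than raising lets a caller skip the affected genes while still processing the rest.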

0.2-1: 10-09-2013

  • Added a utility function to estimate within-sample correlations + documentation

0.2: 08-09-2013

  • Added TT.Seq.[Gene/Tx] methods: 'cor.P()' and 'cor.S()' to get Pearson and Spearman correlations with RNA-Seq expression
  • Added utility functions: 'cors.P()' and 'cors.S()' to collate vectors of correlation
  • Added utility functions to compare MaLTE with microarray summarisation results: 'compare.correlations()'
  • Added necessary documentation

0.1-8: 03-09-2013

  • Experimental: modified 'get.predictions()' to work on non-filtered tt.seq lists
  • Experimental: added a 'get.trues()' function that can work on non-filtered tt.seq.lists

0.1-7: 22-08-2013

  • made 'get.predictions()' much faster using unlist( list, F, F ) and *apply functions instead of a for loop

17-08-2013

  • fixed a bug that prevented 'get.predictions()' for transcript isoform expression with more than one isoform filtered (DONE)

0.1-6: 15-08-2013

  • refactored 'prepare_txs_data.py'
  • fixed a bug in the OOB filter function that prevented filtering of genes with single transcripts that didn't pass OOB
  • added a feature to include transcript correlations in the rare event that both RNA-Seq and array data is available for test samples

16-08-2013

  • fixed the above bug on filtering transcripts

0.1-5: 14-08-2013

  • refactored 'prepare_data.py'
  • corrected stats produced by 'prepare_data.py'

0.1-4: 13-08-2013

  • included parallel execution to create resource files used to build training and test data

0.1-3: 05-08-2013

  • major fixes to documentation except documenting datasets

Checklist:

  • 80 characters

  • comments

  • replace 'NA' by NULL appropriately

  • make sure user cannot enter samples.txt with samples that don't exist

  • make sure that the return status of each Python script is set appropriately under all conditions of either success or failure

  • a data package?

  • what if you try to filter and the other is not oob? (DONE)

  • consistency in show() methods for all classes (DONE)

  • handle warnings to do with correlation computations e.g. run( tt.ready.txs[[10]], tt.params )

  • add keywords in the documentation (DONE)

  • replace 'T' with 'TRUE' (DONE)

  • Later: provide a graphical description of which probes perform best (I'm thinking of integrating the gene annotation model information with a colour-key map that shows the reliability of a probe in, say, a particular tissue)
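For the checklist item on script return statuses, one possible pattern is sketched below. It is an assumption about how this could be handled, not the current behaviour of prepare_data.py; the `main` function, the exit codes, and the line-counting task are all hypothetical placeholders.

```python
import sys

# Hypothetical exit codes for illustration.
EXIT_OK = 0
EXIT_FAILURE = 1
EXIT_USAGE = 2


def main(argv):
    """Return an explicit status for every success and failure path."""
    if len(argv) != 2:
        print("usage: script.py <samples-file>", file=sys.stderr)
        return EXIT_USAGE
    try:
        # Placeholder task: count non-empty lines in the given file.
        with open(argv[1]) as handle:
            count = sum(1 for line in handle if line.strip())
    except OSError as exc:
        print("error: %s" % exc, file=sys.stderr)
        return EXIT_FAILURE
    print(count)
    return EXIT_OK
```

A script would then end with `sys.exit(main(sys.argv))` under its `if __name__ == "__main__":` guard, so the calling R code can check the status reliably.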