So far we have run:

* `1 Experimental data prep using rinfino.ipynb`
* `1.1 combine Cell Features.ipynb`

# Chunk up our data

In [1]:
!chunker -h

usage: chunker [-h] --data DATA [--output_file OUTPUT_FILE] --pagesize
               PAGESIZE [--min_chunk_size MIN_CHUNK_SIZE]

optional arguments:
  -h, --help            show this help message and exit
  --data DATA           path to test data (tab delimited genes x samples)
  --output_file OUTPUT_FILE
                        output file prefix (optional)
  --pagesize PAGESIZE   page size (default 10)
  --min_chunk_size MIN_CHUNK_SIZE
                        minimum number of samples in a chunk (default 3),
                        achieved via rebalancing (set to 1 to disable)


In [2]:
!ls ../data/

all_cohorts.cibersort_results.tsv
bladder.tpm.tsv
cohort_newbladder.cibersort.input.classes.datatype_est_counts.txt
provenance.md
rcctils.cellfeatures.tsv
rcctils.celltypes.tsv
singleorigin.cellfeatures.tsv
tcgakirc.tpm.tsv


In [4]:
!ls out

experiment_bladder.test.expression.tsv
experiment_bladder.training.expression.tsv
experiment_bladder.training.xdata.tsv
experiment_rcc.test.chunked.chunk-1-of-2.tsv
experiment_rcc.test.chunked.chunk-2-of-2.tsv
experiment_rcc.test.expression.tsv
experiment_rcc.training.expression.tsv
experiment_rcc.training.xdata.tsv
singleorigin_plus_rcctils.combined.cellfeatures.tsv


In [2]:
!chunker \
--data out/experiment_rcc.test.expression.tsv \
--output_file out/experiment_rcc.test.chunked \
--pagesize 40 --min_chunk_size 10;

Output:
out/experiment_rcc.test.chunked.chunk-1-of-2.tsv  -- 40 samples (X953473f4.9927.4fd4.bfee.8f5908638cc7, ..., a0f62f75.1a15.413e.9960.3cfc09f52b9e)
out/experiment_rcc.test.chunked.chunk-2-of-2.tsv  -- 39 samples (X116f7723.de02.4a81.bec1.cc068019ff38, ..., X0153168a.39ce.47d6.b7f5.ac1e0f312245)
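
The chunk sizes above (40 and 39 samples from 79 total) are consistent with simple paging, with rebalancing applied only when the trailing chunk would fall below `--min_chunk_size`. The following is a minimal sketch of one way such a rule could work. It is an assumption for illustration, not the actual `chunker` implementation; the function name `chunk_sizes` and the even-spread rebalancing strategy are hypothetical.

```python
import math


def chunk_sizes(n_samples, pagesize, min_chunk_size=3):
    """Return the number of samples per chunk.

    Hypothetical rule (not taken from chunker's source): page through the
    samples `pagesize` at a time; if the last chunk would be smaller than
    `min_chunk_size`, spread the samples evenly over the same number of chunks.
    """
    if n_samples <= 0:
        return []
    n_chunks = math.ceil(n_samples / pagesize)
    # Plain paging: full pages, then whatever is left over.
    sizes = [pagesize] * (n_chunks - 1) + [n_samples - pagesize * (n_chunks - 1)]
    # Rebalance only when the trailing chunk is too small.
    if n_chunks > 1 and sizes[-1] < min_chunk_size:
        base, extra = divmod(n_samples, n_chunks)
        sizes = [base + 1 if i < extra else base for i in range(n_chunks)]
    return sizes


print(chunk_sizes(79, 40, 10))  # [40, 39], matching the run above
print(chunk_sizes(82, 40, 10))  # [28, 27, 27] instead of [40, 40, 2]
```

Under this rule, `min_chunk_size=1` never triggers rebalancing, which matches the help text's "set to 1 to disable".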


# Compile our Stan models

To create an executable from `modelname.stan`, run `make -C $HOME/cmdstan $(pwd)/modelname`. (Note that the target omits the `.stan` suffix.)

This creates the `modelname` executable. To force `make` to rebuild, we may need to remove the compiled files manually first: `rm modelname.hpp modelname`
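
The compile and clean-up steps can also be scripted. This is a minimal sketch, assuming a POSIX path layout; the helper names `make_command` and `stale_artifacts` are hypothetical and are not part of CmdStan or `infino`. It only builds the command and lists the files to delete; running them is left to the caller.

```python
import os


def make_command(stan_file, cmdstan_home):
    """Build the `make` invocation for a Stan model (the target omits `.stan`)."""
    target, ext = os.path.splitext(os.path.abspath(stan_file))
    if ext != ".stan":
        raise ValueError("expected a .stan file, got: " + stan_file)
    return ["make", "-C", cmdstan_home, target]


def stale_artifacts(stan_file):
    """List the compiled files to delete in order to force a rebuild."""
    target, _ = os.path.splitext(os.path.abspath(stan_file))
    return [target + ".hpp", target]


# Example with a placeholder model name; pass the list to subprocess.run to build.
print(make_command("../models/modelname.stan", os.path.expanduser("~/cmdstan")))
print(stale_artifacts("../models/modelname.stan"))
```

`os.path.splitext` splits on the last dot only, so model names that contain dots (such as `model6.4_...`) keep their full stem.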

In [15]:
!ls ../models/*.stan

../models/model6.4_negbinom_matrix_correlation_features_oos_optim_otherbucket.stan
../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance.stan


In [16]:
!make -C $HOME/cmdstan $(pwd)/../models/model6.4_negbinom_matrix_correlation_features_oos_optim_otherbucket

make: Entering directory '/home/jovyan/cmdstan'

--- Translating Stan model to C++ code ---
bin/stanc  /home/jovyan/work/infino-private-2/notebooks/../models/model6.4_negbinom_matrix_correlation_features_oos_optim_otherbucket.stan --o=/home/jovyan/work/infino-private-2/notebooks/../models/model6.4_negbinom_matrix_correlation_features_oos_optim_otherbucket.hpp
Model name=model6_4_negbinom_matrix_correlation_features_oos_optim_otherbucket_model
Input file=/home/jovyan/work/infino-private-2/notebooks/../models/model6.4_negbinom_matrix_correlation_features_oos_optim_otherbucket.stan
Output file=/home/jovyan/work/infino-private-2/notebooks/../models/model6.4_negbinom_matrix_correlation_features_oos_optim_otherbucket.hpp

--- Linking C++ model ---
g++ -I src -I stan/src -isystem stan/lib/stan_math/ -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.62.0 -isystem stan/lib/stan_math/lib/cvodes_2.9.0/include -Wall -DEIGEN_NO_DEBUG  -DBOOST_RESULT_OF_USE_TR1 -DBO

In [17]:
!ls ../models/model6.4*

../models/model6.4_negbinom_matrix_correlation_features_oos_optim_otherbucket
../models/model6.4_negbinom_matrix_correlation_features_oos_optim_otherbucket.hpp
../models/model6.4_negbinom_matrix_correlation_features_oos_optim_otherbucket.stan


In [18]:
!make -C $HOME/cmdstan $(pwd)/../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance

make: Entering directory '/home/jovyan/cmdstan'

--- Translating Stan model to C++ code ---
bin/stanc  /home/jovyan/work/infino-private-2/notebooks/../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance.stan --o=/home/jovyan/work/infino-private-2/notebooks/../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance.hpp
Model name=model6_5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance_model
Input file=/home/jovyan/work/infino-private-2/notebooks/../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance.stan
Output file=/home/jovyan/work/infino-private-2/notebooks/../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance.hpp

--- Linking C++ model ---
g++ -I src -I stan/src -isystem stan/lib/stan_math/ -isystem stan/lib/stan_math/lib/eigen_3.3.3 -isystem stan/lib/stan_math/lib/boost_1.62.0 -isystem stan/lib/stan_math/lib/

In [19]:
!ls ../models/model6.5*

../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance
../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance.hpp
../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance.stan


In [20]:
!../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance

Usage: ../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance <arg1> <subarg1_1> ... <subarg1_m> ... <arg_n> <subarg_n_1> ... <subarg_n_m>

Begin by selecting amongst the following inference methods and diagnostics,
  sample      Bayesian inference with Markov Chain Monte Carlo
  optimize    Point estimation
  variational  Variational inference
  diagnose    Model diagnostics

Or see help information with
  help        Prints help
  help-all    Prints entire argument tree

Additional configuration available by specifying
  id          Unique process identifier
  data        Input data options
  init        Initialization method: "x" initializes randomly between [-x, x], "0" initializes to 0, anything else identifies a file of values
  random      Random number configuration
  output      File output options

See ../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance <arg1> [ help | help-all ]

# Execute model

In [11]:
!execute-model -h

usage: execute-model [-h] --train_samples TRAIN_SAMPLES --train_xdata
                     TRAIN_XDATA --train_cellfeatures TRAIN_CELLFEATURES
                     --test_samples TEST_SAMPLES [--n_chains N_CHAINS]
                     --output_name OUTPUT_NAME --model_executable
                     MODEL_EXECUTABLE [--dry_run] [--num_samples NUM_SAMPLES]
                     [--num_warmup NUM_WARMUP]

optional arguments:
  -h, --help            show this help message and exit
  --train_samples TRAIN_SAMPLES
                        training matrix filename
  --train_xdata TRAIN_XDATA
                        map from training samples to subsets (xdata design
                        matrix
  --train_cellfeatures TRAIN_CELLFEATURES
                        cell features matrix
  --test_samples TEST_SAMPLES
                        test matrix filename (to be deconvolved)
  --n_chains N_CHAINS   number of MCMC chains
  --output_name OUTPUT_NAME
                        pre

In [19]:
!ls out

experiment_bladder.test.expression.tsv
experiment_bladder.training.expression.tsv
experiment_bladder.training.xdata.tsv
experiment_rcc.test.chunked.chunk-1-of-2.tsv
experiment_rcc.test.chunked.chunk-2-of-2.tsv
experiment_rcc.test.expression.tsv
experiment_rcc.training.expression.tsv
experiment_rcc.training.xdata.tsv
singleorigin_plus_rcctils.combined.cellfeatures.tsv


In [1]:
!execute-model \
--train_samples out/experiment_rcc.training.expression.tsv \
--train_xdata out/experiment_rcc.training.xdata.tsv \
--train_cellfeatures out/singleorigin_plus_rcctils.combined.cellfeatures.tsv \
--test_samples out/experiment_rcc.test.chunked.chunk-1-of-2.tsv \
--n_chains 4 \
--output_name out/experiment_rcc.test.chunked.chunk-1-of-2 \
--model_executable ../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance \
--dry_run;

Traceback (most recent call last):
  File "/opt/conda/bin/execute-model", line 11, in <module>
    load_entry_point('infino', 'console_scripts', 'execute-model')()
  File "/home/jovyan/pyinfino/infino/execute_model.py", line 178, in main
    assert len(set(train_df.index.values).symmetric_difference(set(test_df.index.values))) == 0
AssertionError


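The assertion requires the training and test matrices to have identical gene indices. The error above means we had regenerated the data without rechunking it, so the stale chunk contained a different random set of genes than the new training data. Oops!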
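
A bare `AssertionError` does not say which genes differ. The sketch below is one way to inspect the mismatch before launching a model. It assumes the tab-delimited genes x samples files keep gene IDs in the first column under a header row; the helper names `read_gene_ids` and `gene_mismatch` and the gene labels in the example are hypothetical.

```python
import csv


def read_gene_ids(path):
    """Read the first column of a tab-delimited file, skipping the header row.

    Assumes gene IDs are stored in the first column (an assumption about the
    file layout, not confirmed by the tool's documentation).
    """
    with open(path, newline="") as handle:
        rows = csv.reader(handle, delimiter="\t")
        next(rows, None)  # skip header
        return {row[0] for row in rows if row}


def gene_mismatch(train_genes, test_genes):
    """Return (genes only in train, genes only in test), each sorted."""
    train_genes, test_genes = set(train_genes), set(test_genes)
    return sorted(train_genes - test_genes), sorted(test_genes - train_genes)


# Toy example with made-up gene labels:
print(gene_mismatch(["A", "B", "C"], ["B", "C", "D"]))  # (['A'], ['D'])
```

Both lists being empty corresponds to the symmetric difference of zero length that the assertion checks for.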

In [21]:
!execute-model \
--train_samples out/experiment_rcc.training.expression.tsv \
--train_xdata out/experiment_rcc.training.xdata.tsv \
--train_cellfeatures out/singleorigin_plus_rcctils.combined.cellfeatures.tsv \
--test_samples out/experiment_rcc.test.chunked.chunk-1-of-2.tsv \
--n_chains 4 \
--output_name out/experiment_rcc.test.chunked.chunk-1-of-2.model6_5 \
--model_executable ../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance \
--dry_run;

Launching chain:  1
Launching chain:  2
Launching chain:  3
Launching chain:  4
[Chain 1] ('../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance method=sample num_samples=1000 num_warmup=1000 save_warmup=0 thin=1 random seed=1973908125 id=1 data file=out/experiment_rcc.test.chunked.chunk-1-of-2.model6_5.standata.Rdump output file=out/experiment_rcc.test.chunked.chunk-1-of-2.model6_5.samples.1.csv refresh=25\n', '')
[Chain 2] ('../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance method=sample num_samples=1000 num_warmup=1000 save_warmup=0 thin=1 random seed=607365375 id=2 data file=out/experiment_rcc.test.chunked.chunk-1-of-2.model6_5.standata.Rdump output file=out/experiment_rcc.test.chunked.chunk-1-of-2.model6_5.samples.2.csv refresh=25\n', '')
[Chain 3] ('../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance method=sample num_samples=1000 num_warmup=1000 save_
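
The dry-run output shows one CmdStan command per chain, differing only in the seed, the chain `id`, and the output file suffix. The sketch below reconstructs that command layout from the echoed text. It is not `execute-model`'s actual code; the function name `chain_command` and the seed range are assumptions.

```python
import random


def chain_command(model_executable, output_name, chain_id,
                  num_samples=1000, num_warmup=1000, seed=None):
    """Build the argument list for one chain, mirroring the echoed command."""
    if seed is None:
        # Assumed seed range; the dry run only shows that seeds differ per chain.
        seed = random.randint(1, 2**31 - 1)
    return [
        model_executable,
        "method=sample",
        f"num_samples={num_samples}",
        f"num_warmup={num_warmup}",
        "save_warmup=0",
        "thin=1",
        "random", f"seed={seed}",
        f"id={chain_id}",
        "data", f"file={output_name}.standata.Rdump",
        "output", f"file={output_name}.samples.{chain_id}.csv",
        "refresh=25",
    ]


# Print the four commands for a placeholder model and output prefix.
for chain_id in range(1, 5):
    print(" ".join(chain_command("./modelname", "out/example", chain_id)))
```

All chains share the same `.standata.Rdump` input and write to separate `.samples.<id>.csv` files, as in the output above.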

Instead of a dry run that only echoes the commands, do a manual dry run that actually samples, using two iterations and no warmup:

In [22]:
!execute-model \
--train_samples out/experiment_rcc.training.expression.tsv \
--train_xdata out/experiment_rcc.training.xdata.tsv \
--train_cellfeatures out/singleorigin_plus_rcctils.combined.cellfeatures.tsv \
--test_samples out/experiment_rcc.test.chunked.chunk-1-of-2.tsv \
--n_chains 4 \
--output_name out/experiment_rcc.test.chunked.chunk-1-of-2.model6_5 \
--model_executable ../models/model6.5_negbinom_matrix_correlation_features_oos_optim_otherbucket_samplevariance \
--num_samples 2 \
--num_warmup 0;

Launching chain:  1
Launching chain:  2
Launching chain:  3
Launching chain:  4
[Chain 1] ('method = sample (Default)\n  sample\n    num_samples = 2\n    num_warmup = 0\n    save_warmup = 0 (Default)\n    thin = 1 (Default)\n    adapt\n      engaged = 1 (Default)\n      gamma = 0.050000000000000003 (Default)\n      delta = 0.80000000000000004 (Default)\n      kappa = 0.75 (Default)\n      t0 = 10 (Default)\n      init_buffer = 75 (Default)\n      term_buffer = 50 (Default)\n      window = 25 (Default)\n    algorithm = hmc (Default)\n      hmc\n        engine = nuts (Default)\n          nuts\n            max_depth = 10 (Default)\n        metric = diag_e (Default)\n        stepsize = 1 (Default)\n        stepsize_jitter = 0 (Default)\nid = 1\ndata\n  file = out/experiment_rcc.test.chunked.chunk-1-of-2.model6_5.standata.Rdump\ninit = 2 (Default)\nrandom\n  seed = 1706812208\noutput\n  file = out/experiment_rcc.test.chunked.chunk-1-of-2.model6_5.samples.1.csv\n  diagnostic_file =  (Default