1. Usage Overview
From the bin directory, run ./RealTrace [-options] with the following options:
-h, --help this help message
-i, --infile (required) input data file
-b, --parameter_bounds (required) file(s) setting the type, step, bounds of the parameters
-c, --csv_config file that sets the columns that will be used from the input file
-o, --outdir output directory, overrides the default
-t, --tolerance_maximization absolute tolerance of maximization between optimization steps, default: 1e-10
-r, --rel_tolerance_joints relative tolerance of joint calculation, default: 1e-10
-n, --n_samples number of samples, default: 10
-space, --search_space search parameter space in {'log'|'linear'} space, default: 'log'
-noise, --noise_model measurement noise of fp content {'scaled'|'const'}, default: 'scaled'
-div, --cell_division_model cell division model {'binomial'|'gauss'}, default: 'binomial'
-m, --maximize run maximization
-p, --predict run prediction
-s, --samples sample trajectories
-j, --joints run calculation of joint probabilities
Example: ./RealTrace -c csv_config.txt -b parameters.txt -i data/example.csv -o out/ -l 1 -t 1e-10 -m -p
- infile sets the input file that contains the data, e.g., as provided by MOMA.
- parameter_bounds sets the file that defines the parameter space.
- Likelihood maximization: -m, --maximize
- Prediction: -p, --predict
- Sample trajectories: -s, --samples
- Run joints calculation: -j, --joints
The input file is assumed to fulfill the following:
- The data points of a cell appear as consecutive rows and are in the correct order with respect to time.
- The data set has to include all columns that are set via the csv_config file, i.e., time_col, length_col, fp_col.
- The cells can be uniquely identified via the tags provided via parent_tags and cell_tags, and each mother cell has at most 2 daughter cells. If that is not the case, the parent_tags and cell_tags are not sufficient and a warning will be printed.
- To estimate the initial covariance matrix, the data set needs to contain at least (!) 2 cells.
- An optional column may be added for the usage of segments, see below for more information.
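The assumptions above can be checked before running the tool. The following is a minimal Python sketch, not part of RealTrace itself: the function name check_input is hypothetical, the column names are the defaults of the csv_config file, and the restriction to parents that appear as cells in the data is an assumption of this sketch (so that placeholder parent ids of root cells are not counted).

```python
import csv
import io
from collections import defaultdict


def check_input(rows, time_col="time", length_col="length", fp_col="gfp",
                cell_tags=("cell_id",), parent_tags=("parent_id",)):
    """Return a list of problems found in rows (a list of dicts, one per data point)."""
    if not rows:
        return ["no data rows"]

    # All configured columns must exist in the data set.
    required = [time_col, length_col, fp_col, *cell_tags, *parent_tags]
    missing = [c for c in required if c not in rows[0]]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    last_time = {}                 # last time point seen for each cell
    daughters = defaultdict(set)   # parent id -> set of daughter ids
    current = None
    for row in rows:
        cell = tuple(row[c] for c in cell_tags)
        parent = tuple(row[c] for c in parent_tags)
        t = float(row[time_col])
        if cell != current:
            # A cell that shows up again after another cell is not consecutive.
            if cell in last_time:
                problems.append(f"rows of cell {cell} are not consecutive")
            current = cell
        elif t < last_time[cell]:
            problems.append(f"time goes backwards within cell {cell}")
        last_time[cell] = t
        daughters[parent].add(cell)

    # Sketch assumption: only parents that appear as cells themselves are checked.
    for parent, kids in daughters.items():
        if parent in last_time and len(kids) > 2:
            problems.append(f"cell {parent} has {len(kids)} daughters")

    if len(last_time) < 2:
        problems.append("fewer than 2 cells in the data set")
    return problems


# Made-up example data: mother cell "a" and one daughter "b".
example = """time,length,gfp,cell_id,parent_id
0,2.0,100,a,
3,2.2,110,a,
6,1.2,60,b,a
9,1.3,66,b,a
"""
rows = list(csv.DictReader(io.StringIO(example)))
print(check_input(rows))  # prints []
```

An empty list means none of the checked assumptions is violated; the optional segment column is not covered by this sketch.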
How the different parameters are treated during the likelihood maximization is defined by the following syntax:
- free_parameter = init, step
- bound_parameter = init, step, lower, upper
- fixed_parameter = init
An example file can look like this:
mean_lambda = 0.01, 1e-3
gamma_lambda = 0.01, 1e-3, 1e-4, 0.05
var_lambda = 1e-07
mean_q = 10, 1e-1
gamma_q = 0.01, 1e-3, 1e-4, 0.05
var_q = 1, 1e-2
beta = 5e-2
var_x = 1e-3, 1e-5
var_g = 1, 1e-3
var_dx = 1e-4, 1e-5
var_dg = 1, 1e-2
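To make the syntax concrete, the sketch below classifies each line of such a file by the number of values it carries (1 = fixed, 2 = free, 4 = bound). It is an illustration in Python, not the parser used by RealTrace; the function name parse_parameters and the returned dictionary layout are assumptions of this sketch.

```python
def parse_parameters(text):
    """Parse 'name = init[, step[, lower, upper]]' lines into a dict."""
    kinds = {1: "fixed", 2: "free", 4: "bound"}
    keys = ("init", "step", "lower", "upper")
    params = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        name, sep, values = line.partition("=")
        numbers = [float(v) for v in values.split(",")] if sep else []
        if len(numbers) not in kinds:
            raise ValueError(f"cannot interpret line: {line!r}")
        entry = {"kind": kinds[len(numbers)], **dict(zip(keys, numbers))}
        # For bound parameters the initial value should lie within the bounds.
        if entry["kind"] == "bound" and not entry["lower"] <= entry["init"] <= entry["upper"]:
            raise ValueError(f"initial value outside bounds: {line!r}")
        params[name.strip()] = entry
    return params


example = """
mean_lambda = 0.01, 1e-3
gamma_lambda = 0.01, 1e-3, 1e-4, 0.05
var_lambda = 1e-07
"""
for name, entry in parse_parameters(example).items():
    print(name, entry["kind"])
```

For the three lines used here, mean_lambda is free, gamma_lambda is bound and var_lambda is fixed.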
All parameters are restricted to positive numbers by default, which avoids unphysical or meaningless parameter ranges. This default can be overridden by setting bounds explicitly.
During the maximization, step is used as the initial step size. From the NLopt documentation: "For derivative-free local-optimization algorithms, the optimizer must somehow decide on some initial step size to perturb x by when it begins the optimization. This step size should be big enough that the value of the objective changes significantly, but not too big if you want to find the local optimum nearest to x."
Model parameters: The two Ornstein-Uhlenbeck (OU) processes are each described by a mean value (the mean growth/production rate), a gamma parameter that determines how fast the process is driven back towards its mean after a deviation, and a characteristic kick size that scales the noise term (parameterized by the squared kick sizes var_lambda and var_q).
- Growth rate fluctuations parameters:
  - mean_lambda
  - gamma_lambda
  - var_lambda
- Fluorescence production fluctuation parameters:
  - mean_q
  - gamma_q
  - var_q
- Bleaching rate:
  - beta
- Cell division noise parameters:
  - var_dx
  - var_dg
- Measurement noise parameters:
  - var_x
  - var_g
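To illustrate the roles of the mean, gamma and var parameters of an OU process, here is a minimal simulation sketch in Python. It assumes the standard form dx = -gamma (x - mean) dt + sqrt(var) dW and a simple Euler-Maruyama discretization; how exactly RealTrace scales the noise term is not stated here, so treat this as an assumption rather than the tool's implementation. The function name simulate_ou and the arguments dt, n_steps and seed are hypothetical.

```python
import math
import random


def simulate_ou(mean, gamma, var, dt=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama path of dx = -gamma * (x - mean) * dt + sqrt(var) * dW."""
    rng = random.Random(seed)
    x = mean                      # start at the mean value
    path = [x]
    for _ in range(n_steps):
        kick = math.sqrt(var * dt) * rng.gauss(0.0, 1.0)   # random kick
        x += -gamma * (x - mean) * dt + kick               # relaxation + noise
        path.append(x)
    return path


# Growth-rate-like example using the values from the parameter file above.
path = simulate_ou(mean=0.01, gamma=0.01, var=1e-7, dt=1.0, n_steps=5000)
print(min(path), max(path))
```

A larger gamma pulls the path back to the mean faster, while a larger var produces larger kicks; in this form the stationary variance is var / (2 * gamma).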
(Defaults are in brackets.)
- csv_config sets the file that contains information on which columns will be used from the input file (see 2.3.1).
- tolerance_maximization (1e-10) sets the stopping criterion of the maximization: stop when an optimization step changes the function value by less than the tolerance. With very low tolerances, one might encounter rounding issues; in that case, the last valid step is taken, and a warning is printed to stderr.
- rel_tolerance_joints (1e-10) sets the stopping criterion for the joint calculation. The calculation is stopped when the cross-covariances between the two time points are smaller than the product of the corresponding means times the set tolerance: $\frac{\text{Cov}(z_{n+m}, z_n)_{i,j}}{\langle z_{n+m}\rangle_i \langle z_n\rangle_j} < \text{tolerance}$
- outdir overwrites the default output directory, which is dir/example_out/ for the infile dir/example.csv.
- search_space (log) sets the search space of the parameters to be either log space or linear space. The parameter file does not need to be changed, as everything is done internally.
- noise_model (scaled) defines how the measurement noise depends on the fluorescence protein content. 'const' means that the measurement noise is constant with variance var_g. 'scaled' means that the variance of the measurement noise scales linearly with the fluorescence protein content; in this case, var_g is the prefactor of the scaling.
- cell_division_model (binomial) defines the model for cell division. 'binomial' splits the FP content according to the cell sizes of the daughter cells and binomial sampling; in this case, the parameter var_dg is the conversion factor between the FP input and the physical number of independent molecules that can be distributed across cells. 'gauss' refers to a model where the FP contents of the daughter cells are drawn from a Gaussian with variance var_dg centered around half of the mother cell FP content.
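The stopping criterion of rel_tolerance_joints described above can be sketched in Python as follows. The function name joints_converged is hypothetical, and two details are assumptions of this sketch rather than documented behavior: the criterion is required to hold for every pair (i, j), and absolute values are compared.

```python
def joints_converged(cov, mean_later, mean_earlier, tolerance=1e-10):
    """Check |Cov(z_{n+m}, z_n)_{i,j}| < |<z_{n+m}>_i <z_n>_j| * tolerance for all i, j.

    cov[i][j] is the cross-covariance between component i at the later
    time point and component j at the earlier time point.
    """
    for i, row in enumerate(cov):
        for j, c in enumerate(row):
            scale = abs(mean_later[i] * mean_earlier[j])
            if abs(c) >= scale * tolerance:
                return False
    return True


# Made-up numbers: two components with means 1.0 and 100.0 at both time points.
means = [1.0, 100.0]
print(joints_converged([[1e-14, 0.0], [0.0, 1e-12]], means, means))  # prints True
print(joints_converged([[1e-3, 0.0], [0.0, 0.0]], means, means))     # prints False
```

Dividing by the product of the means makes the criterion independent of the units of the individual components.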
Example:
time_col = time_min
rescale_time = 60
length_col = length_um
fp_col = GFP
cell_tags = date, cell_id
parent_tags = date, parent_id
The following settings define how the input file will be interpreted. (Defaults are in brackets.)
- time_col (time): column from which the time is read
- rescale_time (1): the factor by which time will be divided at the start, thus changing the time unit (e.g., rescale_time=60 may change the time unit from sec to min)
- length_col (length): column from which the length of the cell is read
- length_islog (false): indicates if the cell length in the data file is in log scale (true) or not (false)
- fp_col (gfp): column from which the fluorescence protein content is read
- delm (,): delimiter between columns, probably ',' or ';'
- segment_col (): column from which the segment index is read. Not setting segment_col in the file indicates that segment indices will not be used
- filter_col (): column from which the filter will be read. To include a data point, set the entry in this column to True, true, TRUE or 1; to exclude a data point, set it to False, false, FALSE or 0. Not setting filter_col in the file indicates that the input file will not be filtered
- cell_tags (cell_id): columns that will make up the unique cell id, separated by ','
- parent_tags (parent_id): columns that will make up the unique cell id of the parent cell, separated by ','
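As an illustration of how these settings could be applied to an input file, the following Python sketch reads a csv_config text, fills in the defaults listed above, and loads the selected columns. It is not the reader used by RealTrace: the names read_config and load_rows and the layout of the returned rows are assumptions, and length_islog and segment_col are not handled.

```python
import csv
import io

# Defaults as listed above (empty string = not set).
DEFAULTS = {
    "time_col": "time", "rescale_time": "1", "length_col": "length",
    "length_islog": "false", "fp_col": "gfp", "delm": ",",
    "segment_col": "", "filter_col": "",
    "cell_tags": "cell_id", "parent_tags": "parent_id",
}
TRUE_VALUES = {"True", "true", "TRUE", "1"}


def read_config(text):
    """Parse 'key = value' lines on top of the defaults."""
    config = dict(DEFAULTS)
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            config[key.strip()] = value.strip()
    return config


def load_rows(csv_text, config):
    """Return the data points selected and rescaled according to config."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=config["delm"])
    tags = [t.strip() for t in config["cell_tags"].split(",")]
    rows = []
    for row in reader:
        # Drop data points whose filter entry is not one of the "true" values.
        if config["filter_col"] and row[config["filter_col"]] not in TRUE_VALUES:
            continue
        rows.append({
            "cell": tuple(row[t] for t in tags),
            "time": float(row[config["time_col"]]) / float(config["rescale_time"]),
            "length": float(row[config["length_col"]]),
            "fp": float(row[config["fp_col"]]),
        })
    return rows


config = read_config("""time_col = time_min
rescale_time = 60
length_col = length_um
fp_col = GFP
cell_tags = date, cell_id
parent_tags = date, parent_id
""")
# Made-up data with the column names used in the example config.
data = """date,cell_id,parent_id,time_min,length_um,GFP
20200101,1,0,0,2.0,100
20200101,1,0,180,2.3,120
"""
for row in load_rows(data, config):
    print(row)
```

With rescale_time = 60, the time value 180 in the second row becomes 3.0.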