Home
From the bin directory, run ./RealTrace [-options] with the following options:
-h, --help this help message
-i, --infile (required) input data file
-b, --parameter_bounds (required) file(s) setting the type, step, bounds of the parameters
-c, --csv_config file that sets the columns that will be used from the input file
-o, --outdir specify output directory and do not use default
-t, --tolerance_maximization absolute tolerance of maximization between optimization steps, default: 1e-10
-r, --rel_tolerance_joints relative tolerance of joint calculation: default 1e-10
-space, --search_space search parameter space in {'log'|'linear'} space, default: 'log'
-noise, --noise_model measurement noise of fp content {'const'|'scaled'} default: 'const'
-div, --cell_division cell division model {'gauss'|'binomial'} default: 'gauss'
-m, --maximize run maximization
-p, --predict run prediction
-j, --joints run calculation of joint probabilities
Example: ./RealTrace -c csv_config.txt -b parameters.txt -i data/example.csv -o out/ -l 1 -t 1e-10 -m -p
- `infile` sets the input file that contains the data, e.g. as provided by MOMA (see 2.1.1)
- `parameter_bounds` sets the file that defines the parameter space (see 2.1.2)
The input file is assumed to fulfill the following:
- The data points of a cell appear as consecutive rows and are in the correct order with respect to time.
- The data set has to include all columns that are set via the `csv_config` file, i.e. `time_col`, `length_col`, `fp_col`.
- The cells can be uniquely identified via the tags provided via `parent_tags` and `cell_tags`, and each mother cell has at most 2 daughter cells. If that is not the case, the `parent_tags` and `cell_tags` are not sufficient and a warning will be printed.
- In order to estimate the initial covariance matrix, the data set needs to contain at least (!) 2 cells.
- An optional column may be added for the usage of segments, see below for more information.
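The requirements above can be checked before starting a run. The following is a minimal sketch, not part of RealTrace: the function name `check_rows` and the example data are hypothetical, and the daughter count is only checked for parents that appear as cells in the data set themselves (an assumption of this sketch).

```python
import csv
import io
from collections import defaultdict


def check_rows(rows, time_col="time", cell_tags=("cell_id",), parent_tags=("parent_id",)):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    finished = set()              # cells whose block of rows has ended
    daughters = defaultdict(set)  # parent id -> ids of its daughter cells
    current, last_time = None, None
    for row in rows:
        cell = tuple(row[tag] for tag in cell_tags)
        parent = tuple(row[tag] for tag in parent_tags)
        time = float(row[time_col])
        if cell != current:
            if cell in finished:
                problems.append(f"rows of cell {cell} are not consecutive")
            if current is not None:
                finished.add(current)
            current, last_time = cell, None
        if last_time is not None and time < last_time:
            problems.append(f"rows of cell {cell} are not ordered in time")
        last_time = time
        daughters[parent].add(cell)
    cells = finished | ({current} if current is not None else set())
    if len(cells) < 2:
        problems.append("fewer than 2 cells in the data set")
    for parent, kids in daughters.items():
        # Assumption: only parents that are cells in the data set are checked.
        if parent in cells and len(kids) > 2:
            problems.append(f"cell {parent} has more than 2 daughter cells")
    return problems


# Hypothetical example data using the default column names.
example = """cell_id,parent_id,time,length,gfp
a,,0,2.0,100
a,,3,2.2,110
b,a,6,1.1,55
c,a,6,1.1,56
"""
problems = check_rows(csv.DictReader(io.StringIO(example)))
print(problems)
```

When several columns make up the cell id, pass them as tuples, for example `cell_tags=("date", "cell_id")` and `parent_tags=("date", "parent_id")`.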
How the different parameters are treated during the likelihood maximization is defined by the following syntax:
- free_parameter = init, step
- bound_parameter = init, step, lower, upper
- fixed_parameter = init
An example file can look like this:
mean_lambda = 0.01, 1e-3
gamma_lambda = 0.01, 1e-3, 1e-4, 0.05
var_lambda = 1e-07
mean_q = 10, 1e-1
gamma_q = 0.01, 1e-3, 1e-4, 0.05
var_q = 1, 1e-2
beta = 5e-2
var_x = 1e-3, 1e-5
var_g = 1, 1e-3
var_dx = 1e-4, 1e-5
var_dg = 1, 1e-2
ALL parameters are restricted to positive numbers by default, which avoids unphysical/meaningless parameter ranges. However, this can be overridden by setting bounds.
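The three forms of the syntax differ only in the number of values after the `=`. As an illustration, the sketch below classifies each line by that count. It is a hypothetical helper (`parse_parameter_line` is not a RealTrace function) and makes no claim about how RealTrace itself reads the file.

```python
def parse_parameter_line(line):
    """Split 'name = v1, v2, ...' and classify the parameter by its value count."""
    name, _, values = line.partition("=")
    numbers = [float(v) for v in values.split(",")]
    kinds = {1: "fixed", 2: "free", 4: "bound"}
    if len(numbers) not in kinds:
        raise ValueError(f"expected 1, 2 or 4 values, got {len(numbers)}: {line!r}")
    keys = ("init", "step", "lower", "upper")
    settings = {"kind": kinds[len(numbers)]}
    settings.update(zip(keys, numbers))  # zip stops at the shorter sequence
    return name.strip(), settings


lines = [
    "mean_lambda = 0.01, 1e-3",
    "gamma_lambda = 0.01, 1e-3, 1e-4, 0.05",
    "var_lambda = 1e-07",
]
parameters = dict(parse_parameter_line(line) for line in lines)
for name, settings in parameters.items():
    print(name, settings)
```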
During the maximization, the step is used as the initial step size. From the NLopt documentation: "For derivative-free local-optimization algorithms, the optimizer must somehow decide on some initial step size to perturb x by when it begins the optimization. This step size should be big enough that the value of the objective changes significantly, but not too big if you want to find the local optimum nearest to x."
To analyze data sets that contain data points that need to be fitted by different sets of underlying parameters, segment indices can be used. For that, a segment_col can be specified in the csv_config file. This column should contain, for each data point, the index of the segment it belongs to. The segment indices are required to be consecutive and start at index 0.
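The requirement on the segment indices can be tested with a few lines. This sketch reads "consecutive" as: the distinct index values are 0, 1, 2, ... without gaps. Both that reading and the function name `segment_indices_valid` are assumptions of this example.

```python
def segment_indices_valid(indices):
    """True if the distinct segment indices are exactly 0, 1, ..., n-1."""
    unique = sorted(set(indices))
    return bool(unique) and unique == list(range(len(unique)))


print(segment_indices_valid([0, 0, 1, 1, 2]))  # indices 0, 1, 2
print(segment_indices_valid([1, 1, 2]))        # does not start at 0
print(segment_indices_valid([0, 0, 2]))        # index 1 is missing
```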
The likelihood maximization that determines the parameter estimates is run independently for each segment. That means there is no difference between running different segments in separate runs or as part of the same data set. The same applies to 1d scans. However, the predictions as well as the joint probabilities that are used for the correlation functions are calculated by iterating through the entire data set. For that, the following scheme is used:
Note that the prior calculations to go from time point 2 to 3 and vice versa both take the parameters of the 0th segment.
For each segment in the data set, one parameter file is required, and the files have to be passed in the order of the segment indices. For example:
./RealTrace -b parametersA.txt parametersB.txt ...
will use the parameters in the file parametersA.txt for the segment with index 0, the parameters in the file parametersB.txt for the segment with index 1, and so on.
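When runs are scripted, the one-file-per-segment rule can be enforced before the program is started. The sketch below only assembles the argument list from options documented on this page; the helper `build_command` is hypothetical, and placing `-b` with its files at the end is an assumption of this example.

```python
def build_command(infile, parameter_files, n_segments, executable="./RealTrace"):
    """Assemble the argument list, one parameter file per segment, in index order."""
    if len(parameter_files) != n_segments:
        raise ValueError(
            f"{n_segments} segment(s) need {n_segments} parameter file(s), "
            f"got {len(parameter_files)}"
        )
    # parameter_files[k] is used for the segment with index k.
    return [executable, "-i", infile, "-m", "-b", *parameter_files]


command = build_command(
    "data/example.csv", ["parametersA.txt", "parametersB.txt"], n_segments=2
)
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the bin directory.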
(Defaults are in brackets.)
- `csv_config` sets the file that contains information on which columns will be used from the input file (see 2.3.1)
- `tolerance_maximization` (1e-10) sets the stopping criterion of the maximization: the optimization stops when a step changes the function value by less than the tolerance. Very low tolerances can lead to rounding issues; in that case, the last valid step is taken and a warning is printed to stderr.
- `rel_tolerance_joints` (1e-10) sets the stopping criterion for the joint calculation. The calculation is stopped when the cross covariances between the two time points are smaller than the product of the corresponding means times the set tolerance: $\frac{\text{Cov}(z_{n+m}, z_n)_{i,j}}{\langle z_{n+m}\rangle_i \langle z_n\rangle_j} < \text{tolerance}$
- `outdir` overrides the default output directory, which is `dir/example_out/` for the infile `dir/example.csv`
- `search_space` (log) sets the search space of the parameters to be either log space or linear space. The parameter file does not need to be changed as everything is done internally.
- `noise_model` (scaled) defines how the measurement noise depends on the fluorescence protein content. `const` means that the measurement noise is constant with variance `var_g`. `scaled` means that the variance of the measurement scales linearly with the fluorescence protein content. In this case `var_g` is the prefactor of the scaling.
- `cell_division` (binomial) defines the model for cell division. `binomial` splits the FP content according to the cell sizes of the daughter cells and binomial sampling. In this case, the parameter `var_dg` is the conversion factor between the FP input and the physical number of independent molecules that can be distributed across cells. `gauss` refers to a model where the FP contents of the daughter cells are drawn from a Gaussian with variance `var_dg` centered around half of the mother cell FP content.
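To make the stopping criterion of the joint calculation concrete, the sketch below applies it element-wise to a small cross-covariance matrix. It is an illustration only: the function name `joints_converged` and the numbers are hypothetical, and comparing absolute values is an assumption of this sketch.

```python
def joints_converged(cross_cov, mean_later, mean_earlier, tolerance=1e-10):
    """Check Cov(z_{n+m}, z_n)_{i,j} < tolerance * <z_{n+m}>_i * <z_n>_j for all i, j."""
    for i, row in enumerate(cross_cov):
        for j, cov in enumerate(row):
            # Assumption: absolute values are compared.
            if abs(cov) >= tolerance * abs(mean_later[i] * mean_earlier[j]):
                return False
    return True


means = [2.0, 100.0]  # hypothetical means of the two components
weakly_correlated = [[1e-12, 0.0], [0.0, 2e-12]]
strongly_correlated = [[1e-3, 0.0], [0.0, 0.0]]
print(joints_converged(weakly_correlated, means, means))
print(joints_converged(strongly_correlated, means, means))
```

Because the covariance is divided by the product of the means, the criterion does not depend on the units of the data.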
Example:
time_col = time_min
rescale_time = 60
length_col = length_um
fp_col = GFP
cell_tags = date, cell_id
parent_tags = date, parent_id
The following settings define how the input file will be interpreted. (Defaults are in brackets.)
- `time_col` (time): column from which the time is read
- `rescale_time` (1): the factor by which time will be divided at the start, thus changing the time unit (e.g. `rescale_time=60` converts seconds to minutes)
- `length_col` (length): column from which the length of the cell is read
- `length_islog` (false): indicates if the cell length in the data file is in log scale (true) or not (false)
- `fp_col` (gfp): column from which the fluorescence protein content is read
- `delm` (,): delimiter between columns, probably ',' or ';'
- `segment_col` (): column from which the segment index is read. Not setting `segment_col` in the file indicates that segment indices will not be used
- `filter_col` (): column from which the filter will be read. To include a data point, set the entry in this column to `True`, `true`, `TRUE` or `1`; to exclude a data point, set the entry in this column to `False`, `false`, `FALSE` or `0`. Not setting `filter_col` in the file indicates that the input file will not be filtered
- `cell_tags` (cell_id): columns that will make up the unique cell id, separated by ','
- `parent_tags` (parent_id): columns that will make up the unique cell id of the parent cell, separated by ','
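The settings and defaults listed above can be mirrored in a small reader, for example to inspect a data file with the same column choices before a run. This is a sketch under stated assumptions, not RealTrace's own parser: the names `read_csv_config` and `keep_row` are hypothetical, and raising an error for filter values outside the listed ones is a choice of this example.

```python
DEFAULTS = {
    "time_col": "time", "rescale_time": "1", "length_col": "length",
    "length_islog": "false", "fp_col": "gfp", "delm": ",",
    "segment_col": "", "filter_col": "",
    "cell_tags": "cell_id", "parent_tags": "parent_id",
}
INCLUDE = {"True", "true", "TRUE", "1"}
EXCLUDE = {"False", "false", "FALSE", "0"}


def read_csv_config(text):
    """Parse 'key = value' lines and fall back to the defaults listed above."""
    config = dict(DEFAULTS)
    for line in text.splitlines():
        if "=" in line:
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()
    for key in ("cell_tags", "parent_tags"):
        config[key] = [tag.strip() for tag in config[key].split(",")]
    config["rescale_time"] = float(config["rescale_time"])
    return config


def keep_row(row, filter_col):
    """Decide whether a data point is used, following the filter_col values."""
    if not filter_col:
        return True  # no filter column set: nothing is filtered
    value = row[filter_col]
    if value in INCLUDE:
        return True
    if value in EXCLUDE:
        return False
    raise ValueError(f"unexpected filter value: {value!r}")


config = read_csv_config("""time_col = time_min
rescale_time = 60
length_col = length_um
fp_col = GFP
cell_tags = date, cell_id
parent_tags = date, parent_id
""")
print(config)
print(120 / config["rescale_time"])  # time after dividing by rescale_time
```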