How to interpret the output file for GRNmap

Kam Dahlquist edited this page Jan 21, 2016 · 18 revisions

All input worksheets are preserved in the output Excel file. See this page for a description of those sheets. This wiki page describes the new worksheets created in the GRNmap output workbook:

STRAIN_log2_optimized_expression sheet

One worksheet is created for each strain specified in the optimization_parameters sheet in the input workbook. The word "STRAIN" is replaced by the actual strain designation, e.g., wt_log2_optimized_expression. The first two columns contain the "SystematicName" and the "StandardName", matching the "STRAIN_log2_expression" sheets in the input file. The rest of row 1 contains the time points specified by the "simtime" parameter in the optimization_parameters worksheet in the input workbook. The remaining cells contain the simulated log2 fold change expression values for each gene at each time point. The program computes these values by evaluating the differential equation for each gene using the w, P, and b parameters. If the user has chosen to run a forward simulation only, these parameters are taken from the network_weights, production_rates, and threshold_b worksheets, respectively. If the user has chosen to estimate parameters, the w parameter is taken from the network_optimized_weights sheet. If P has been estimated, its value is taken from the optimized_production_rates sheet; if b has been estimated, its value is taken from the optimized_threshold_b sheet.
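The sheet-selection rules above can be summarized in a short Python sketch. This is only an illustration of the logic as described on this page, not GRNmap's own code: the function name parameter_sources is hypothetical, and the fallback to the input production_rates and threshold_b sheets when P or b is held fixed during estimation is an assumption, since the text only states where estimated values come from.

```python
def parameter_sources(estimate_params, fix_P, fix_b):
    """Return the worksheet each parameter (w, P, b) is read from.

    Hypothetical helper: the arguments mirror the estimateParams, fix_P,
    and fix_b settings in the optimization_parameters sheet.
    """
    if not estimate_params:
        # Forward simulation only: all parameters come from the input sheets.
        return {
            "w": "network_weights",
            "P": "production_rates",
            "b": "threshold_b",
        }
    return {
        "w": "network_optimized_weights",
        # Assumption: a fixed (not estimated) parameter keeps its input sheet.
        "P": "production_rates" if fix_P else "optimized_production_rates",
        "b": "threshold_b" if fix_b else "optimized_threshold_b",
    }
```

For example, parameter_sources(1, 0, 1) reports that P is read from optimized_production_rates while b stays with threshold_b.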

STRAIN_sigmas sheet

One worksheet is created for each strain specified in the optimization_parameters sheet in the input workbook. The word "STRAIN" is replaced by the actual strain designation, e.g., wt_sigmas. The first two columns contain the "SystematicName" and the "StandardName", matching the "STRAIN_log2_expression" sheets in the input file. There is then one column for each time point specified by the "time" parameter in the optimization_parameters sheet in the input workbook. The values are the standard deviations of the log2 fold changes of expression for each gene and time point, computed from the data in the corresponding STRAIN_log2_expression sheet.
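As a rough illustration of how such per-time-point standard deviations could be computed for one gene, here is a minimal Python sketch. It assumes that replicate measurements appear as repeated time point headers in the STRAIN_log2_expression sheet and that the sample standard deviation (n - 1 denominator) is used; neither detail is stated on this page, and the function name sigmas_for_gene is hypothetical.

```python
from collections import defaultdict
from statistics import stdev


def sigmas_for_gene(timepoints, values):
    """Group one gene's replicate values by time point and return
    {time point: sample standard deviation}.

    Time points with fewer than two replicates are skipped, because a
    sample standard deviation is undefined for a single value.
    """
    groups = defaultdict(list)
    for t, v in zip(timepoints, values):
        groups[t].append(v)
    return {t: stdev(vs) for t, vs in groups.items() if len(vs) > 1}


# Made-up data: three replicates at each of two time points.
example = sigmas_for_gene([15, 15, 15, 30, 30, 30],
                          [1.0, 2.0, 3.0, 0.5, 0.5, 0.5])
```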

optimized_production_rates sheet

If fix_P = 0 in the optimization_parameters sheet in the input workbook, the production rates are estimated and this sheet is created. This sheet contains the "SystematicName" and "StandardName" just like in the production_rates sheet in the input workbook. The "prorate" column contains the optimized production rates from the estimation procedure.

optimized_threshold_b sheet

If fix_b = 0 in the optimization_parameters sheet in the input workbook, the threshold (b) parameters are estimated and this sheet is created. This sheet contains the "SystematicName" and "StandardName" just like in the worksheet described above. The third column, entitled "b", contains the optimized threshold b parameters for each gene in the network.

network_optimized_weights sheet

If estimateParams = 1 in the optimization_parameters sheet in the input workbook, the weight parameters, w, are estimated and this sheet is created. The format of this sheet is the same as the "network" sheet in the input workbook, except that the 1s of the adjacency matrix are replaced by the estimated w parameters. These values represent the magnitude and sign of the regulatory relationship between the transcription factors (genes in the top row) and the target genes (genes in the leftmost column). Cell A1 contains the text "rows genes affected/cols genes controlling" as a reminder of the direction of the regulatory relationships specified by the adjacency matrix. If the value in a cell is negative, the target gene is repressed. If the value is positive, the target gene is activated. A value of 0 means that there is no regulatory relationship between that transcription factor and that target gene.
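The sign convention above can be illustrated with a small Python sketch. The function names and the weight values are made up for the example. The layout follows the sheet description: each row is a target gene and each column is a controlling transcription factor.

```python
def describe_regulation(weight):
    """Translate an optimized weight into the relationship it encodes."""
    if weight > 0:
        return "activated"
    if weight < 0:
        return "repressed"
    return "no regulatory relationship"


def list_edges(regulators, weights_by_target):
    """Return (regulator, target, effect) for every nonzero weight.

    regulators: column headers (top row of the sheet).
    weights_by_target: {target gene: row of weights, one per regulator}.
    """
    edges = []
    for target, row in weights_by_target.items():
        for regulator, w in zip(regulators, row):
            if w != 0:
                edges.append((regulator, target, describe_regulation(w)))
    return edges


# Illustrative values only, not real GRNmap output.
regulators = ["ACE2", "CIN5"]
weights_by_target = {"ACE2": [0.0, -0.8], "CIN5": [1.2, 0.0]}
edges = list_edges(regulators, weights_by_target)
```

Here edges lists CIN5 as repressing ACE2 and ACE2 as activating CIN5. The two zero entries produce no edge.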

optimization_diagnostics sheet

This worksheet contains some diagnostic information about the estimation that can be used to evaluate the performance of the model. The following information is presented.

  • Top block
    • LSE: the value of the overall least squares error for the estimation
    • Penalty: the value of the sum of squares of all parameters being estimated
    • min LSE: the minimum least squares error that the estimation could achieve given this particular set of expression data. This value is obtained from the variance of the flask data for each time point and each gene.
    • iteration count: the total number of times the least squares function has been evaluated by the optimization algorithm when the program terminates
  • The bottom block of data has gene-specific information.
    • The left column, entitled "Gene", lists the IDs of the genes in the regulatory network, in the order they appear in the other worksheets.
    • Each column to the right is entitled "STRAIN MSE", where the word STRAIN is replaced by the strain designation from the optimization_parameters sheet in the input workbook, for example "wt MSE". The MSE value is the mean squared error comparing the simulated expression data found in that particular strain's STRAIN_log2_optimized_expression sheet to the experimental data found in the corresponding STRAIN_log2_expression sheet. The MSE indicates how well the model fits each individual gene's expression data, whereas the LSE value in the top block indicates the overall performance of the model across all the genes.
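
To make two of these quantities concrete, the following Python sketch computes a per-gene mean squared error and the Penalty term as defined above (the sum of squares of the estimated parameters). It is a simplified illustration, not GRNmap's implementation: it assumes one observed value per simulated time point, and this page does not say how replicates or any normalization are handled. The function names are hypothetical.

```python
def mean_squared_error(simulated, observed):
    """Mean of squared differences between simulated and observed
    log2 fold changes for one gene (values matched by time point)."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(s - o) ** 2 for s, o in zip(simulated, observed)]
    return sum(squared) / len(squared)


def penalty(parameters):
    """Sum of squares of all parameters being estimated."""
    return sum(p ** 2 for p in parameters)


# Made-up numbers for illustration.
gene_mse = mean_squared_error([1.0, 2.0], [1.0, 4.0])   # (0 + 4) / 2
penalty_value = penalty([1.0, -2.0, 0.5])                # 1 + 4 + 0.25
```

A gene whose MSE is much larger than the others is one whose simulated curve fits its data poorly, even when the overall LSE looks acceptable.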