The main functions of the file_parser module are devoted to data import/export and the generation of synthetic data.
file_parser.read_excel(file_name, trim=False, diff=False, rescale_x=False, independent=True, remove_outliers=False)
Reads the measurements from an Excel (xlsx) file. Each group of measurements (e.g., each gene) should be written in a separate sheet. The first column corresponds to the time points and the second column to the measured quantities. Replicates should be given in consecutive order, i.e., first replicate, second replicate, etc. Within each replicate, times should be sorted. See an example of a valid file here.
file_name (str): name of the file.
trim (bool, default=False): if True, the first line of the file should include the times describing the interval of the measurements to include in the analysis; other measurements are removed. If the interval is set to [-1, -1], all measurements are included in the analysis.
diff (bool, default=False): if True, differentials of the measurements are calculated (e.g., to remove the accumulation of luminescence).
rescale_x (bool, default=False): if True, timepoints with the same values within the same replicate are rescaled to an interval between the current timepoint and the next timepoint.
independent (bool, default=True): if False, replicates are stored separately under the names x_rep1, x_rep2, etc., where x corresponds to the current measurement.
remove_outliers (bool, default=False): if True, outliers are removed.
Returns a pandas DataFrame with three columns:
test: names of groups of measurements
x: timepoints
y: measurements
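The sheet layout described above (replicates listed consecutively, times sorted within each replicate) implies that a new replicate starts wherever the time value drops. The following is a minimal sketch of that idea in plain Python. The helper name split_replicates and the use of a decrease in time as the replicate boundary are assumptions made for illustration; the package may detect replicates differently.

```python
def split_replicates(times, values, name="x"):
    """Group consecutive (time, value) pairs into replicates.

    Assumption: a new replicate begins whenever the time value decreases.
    Returns a dict mapping names such as 'x_rep1' to lists of (time, value).
    """
    replicates = {}
    rep = 0
    previous = None
    for t, v in zip(times, values):
        if previous is None or t < previous:
            # Time went backwards (or this is the first row): start a new replicate.
            rep += 1
            replicates[f"{name}_rep{rep}"] = []
        replicates[f"{name}_rep{rep}"].append((t, v))
        previous = t
    return replicates


times = [0, 4, 8, 0, 4, 8]
values = [1.0, 2.0, 1.5, 1.1, 2.2, 1.4]
groups = split_replicates(times, values, name="gene1")
```

Equal consecutive times are kept in the same replicate here, which matches the case that rescale_x is meant to handle.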
Reads the measurements from a csv file in the format used by other methods, such as JTK_CYCLE and RAIN. The first column should include the names of the groups of measurements (e.g., genes). The first row should include the data describing the timepoints and replicates. If replicates are available, they should be labeled with increasing numbers (starting with 1), separated from the timepoints by the underscore symbol (_). See an example of a valid file here.
file_name (str): name of the file.
sep (str): separator.
Returns a pandas DataFrame with three columns:
test: names of groups of measurements
x: timepoints
y: measurements
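To make the wide csv layout concrete, here is a small sketch that converts such a table into long-format rows with the test, x and y fields, using only the standard library. The function name parse_wide_csv is hypothetical, and the header convention it assumes (timepoint first, then an underscore, then the replicate number, e.g. 4_2) is one reading of the description above; check it against the example file before relying on it.

```python
import csv
import io


def parse_wide_csv(text, sep=","):
    """Turn a wide table (one row per group) into long-format rows.

    Assumption: column headers look like '<time>_<replicate>' (e.g. '4_2'),
    or just '<time>' when there are no replicates.
    """
    reader = csv.reader(io.StringIO(text), delimiter=sep)
    header = next(reader)
    # Keep only the part before the underscore: the timepoint.
    times = [float(label.split("_")[0]) for label in header[1:]]
    rows = []
    for record in reader:
        test = record[0]
        for x, y in zip(times, record[1:]):
            rows.append({"test": test, "x": x, "y": float(y)})
    return rows


example = "id,0_1,0_2,4_1,4_2\ngeneA,1.0,1.2,2.0,2.1\n"
rows = parse_wide_csv(example)
```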
Exports the measurements into an xlsx file.
df (DataFrame): DataFrame with the columns test, x and y.
file_name (str): name of the file.
independent (bool, default=True): if False, replicates that are stored separately under the names x_rep1, x_rep2, etc., are merged together into a single measurement group.
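The merging of separately stored replicates can be pictured as stripping the _rep suffix from the group names, so that all replicates fall back under one name. The sketch below illustrates this under the assumption that the suffix always has the form _rep followed by a number; merge_replicate_names is a hypothetical helper, not a function of the package.

```python
import re


def merge_replicate_names(names):
    """Strip a trailing '_rep<number>' so replicates share one group name."""
    return [re.sub(r"_rep\d+$", "", name) for name in names]


merged = merge_replicate_names(["gene1_rep1", "gene1_rep2", "gene2_rep1", "gene3"])
```

Names without the suffix are left unchanged.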
Exports the measurements into a csv file.
df (DataFrame): DataFrame with the columns test, x and y.
file_name (str): name of the file.
independent (bool, default=True): if False, replicates that are stored separately under the names x_rep1, x_rep2, etc., are merged together into a single measurement group.
file_parser.export_cosinor2(input_file_name, output_file_name, period = 24, trim=False, diff=False, remove_outliers=False, rescale_median = False, remove_lin_comp = False)
Converts the xlsx file with the name input_file_name into an xlsx file with the name output_file_name that can be used in combination with the population-mean cosinor tests implemented in the cosinor2 R package (see https://cran.r-project.org/web/packages/cosinor2/).
input_file_name (str): input file name (xlsx); see file_parser.read_excel for the description of the required file formatting.
output_file_name (str): name of the output file (xlsx).
period (int, default=24): presumed period; only used if remove_lin_comp is set to True.
trim (bool, default=False): if True, the first line of the file should include the times describing the interval of the measurements to include in the analysis; other measurements are removed. If the interval is set to [-1, -1], all measurements are included in the analysis.
diff (bool, default=False): if True, differentials of the measurements are calculated (e.g., to remove the accumulation of luminescence).
remove_outliers (bool, default=False): if True, outliers are removed.
rescale_median (bool, default=False): if True, timepoints with the same values within the same replicate are rescaled to an interval between the current timepoint and the next timepoint using the median values of each timepoint.
remove_lin_comp (bool, default=False): if True, the linear component is identified and removed from the data.
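As an illustration of what removing a linear component can mean, the sketch below fits a straight line by ordinary least squares and subtracts the slope term, keeping the intercept. This is a simplified stand-in and not the package's procedure: the parameter description indicates that period is used when remove_lin_comp is True, so the package presumably estimates the trend together with a rhythmic term, which this sketch omits. The name remove_linear_component is hypothetical.

```python
def remove_linear_component(x, y):
    """Subtract the least-squares slope term from y, keeping the intercept.

    Simplification: the trend is fitted on its own, without a cosine term.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return [yi - slope * xi for xi, yi in zip(x, y)]


x = [0, 2, 4, 6, 8]
y = [1.0, 2.0, 3.0, 4.0, 5.0]  # pure linear trend with slope 0.5
detrended = remove_linear_component(x, y)
```

For this purely linear input, every detrended value equals the intercept.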
file_parser.generate_test_data(n_components=1, period = 24, amplitudes = 0, baseline = 0, phase = 0, min_time = 0, max_time = 48, time_step = 2, replicates = 1, independent = True, name="test", noise = 0)
Synthetic test-data generator.
n_components (int, default=1): number of components in the cosinor data generator.
period (int, default=24): period of the generated data.
amplitudes (list, default=0): amplitudes of each cosinor; the default sets the amplitudes to 1, 1/2, 2/3, 1/4.
baseline (int, default=0): baseline of the cosinor.
phase (int, default=0): acrophase.
min_time (int, default=0): minimal time.
max_time (int, default=48): maximal time.
time_step (int, default=2): time between measurements.
replicates (int, default=1): number of replicates.
independent (bool, default=True): if False, replicates are stored separately under the names x_rep1, x_rep2, etc., where x corresponds to the current measurement.
name (str, default="test"): name of the group of measurements.
noise (int, default=0): noise amplitude.
Returns a pandas DataFrame with three columns:
test: name of the group of measurements
x: timepoints
y: measurements
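The parameters above map onto a multi-component cosinor model. The following standard-library sketch shows one plausible form of such a generator; it is not the package's code. Several choices are assumptions: component i oscillates with period period / i, the phase is added inside the cosine, time runs over the half-open interval from min_time up to (but not including) max_time, the noise is Gaussian with standard deviation noise, and missing amplitudes default to 1 for simplicity (the documented default differs). The name sketch_cosinor_data and the seed argument are hypothetical.

```python
import math
import random


def sketch_cosinor_data(n_components=1, period=24, amplitudes=None, baseline=0,
                        phase=0, min_time=0, max_time=48, time_step=2,
                        replicates=1, name="test", noise=0, seed=None):
    """Generate long-format rows (test, x, y) from a sum of cosines."""
    rng = random.Random(seed)
    if amplitudes is None:
        amplitudes = [1.0] * n_components  # assumption: unit amplitudes
    n_points = math.ceil((max_time - min_time) / time_step)
    rows = []
    for _ in range(replicates):
        for k in range(n_points):
            t = min_time + k * time_step
            y = baseline
            for i, amplitude in enumerate(amplitudes[:n_components], start=1):
                # Assumption: component i has period `period / i`.
                y += amplitude * math.cos(2 * math.pi * i * t / period + phase)
            if noise:
                y += rng.gauss(0, noise)  # assumption: Gaussian noise
            rows.append({"test": name, "x": t, "y": y})
    return rows


rows = sketch_cosinor_data(period=24, max_time=24, time_step=6)
```

With a single noise-free component, the generated values trace one cosine cycle sampled every 6 hours, starting at its peak.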