If you've collected data from ablations of multiple samples and standards in a single, long data file, read on.
To work with this data, you first have to split it up into a number of shorter files, each containing the ablations of a single sample. This can be done using ~latools.preprocessing.split.long_file. To use it, you will need:
- A single data file containing multiple analyses
- A Data Format description <data_format_description> for that file (you can also use a pre-configured format)
- A list of names for each ablation in the file
To keep things organised, we suggest keeping each long data file in its own folder, alongside its sample list and data format description.
When run, ~latools.preprocessing.split.long_file will:

- Import your data, and take the list of sample names you provide.
- Apply ~latools.processes.signal_id.autorange to identify the ablations in the file.
- Match the sample names up to the ablations.
- Save a single file for each sample in an output folder, which can be imported by ~latools.latools.analyse.
- Plot a graph showing how the file has been split, so you can check that everything has worked as expected.
After you've applied ~latools.preprocessing.split.long_file, a few more files will have been created: one file per sample, in a new output folder. If you have multiple consecutive ablations with the same name (i.e. repeat ablations of the same sample), these will be saved to a single file containing all the ablations of that sample.
To try this example at home, download this zip file <resources/long_example.zip>, which contains all the files you'll need: the long data file, a data format description, and the sample list in two formats (plain text and Excel). Unzip it before continuing.
First, read in the list of samples. The sample list is provided in two formats here: plain text and Excel. It doesn't matter which format you use, as long as you can read it into Python as an array or a list.
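For the plain-text version, numpy's `genfromtxt` does the job. The sketch below writes a small stand-in file first so it is self-contained; the file name and sample names are illustrative, not the ones in the zip:

```python
import numpy as np

# Create a stand-in sample list (one ablation name per line).
# In the real example, use the plain-text sample list from the zip.
with open('samplelist.txt', 'w') as f:
    f.write('NIST610\njcp\njcp\njct\nNIST610\n')

# One name per line -> 1-D array of strings.
samples = np.genfromtxt('samplelist.txt', dtype=str, delimiter='\n')
print(samples)
```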
This loads the sample list into a numpy array of strings.
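If you'd rather use the Excel version, pandas can read it directly. This sketch creates a stand-in spreadsheet first so it is self-contained; reading and writing .xlsx files needs the openpyxl package, and the file name and the 'Sample' column header are assumptions, not the names in the zip:

```python
import pandas as pd

# Create a stand-in spreadsheet; in the real example, read the
# Excel sample list from the zip instead.
pd.DataFrame(
    {'Sample': ['NIST610', 'jcp', 'jcp', 'jct', 'NIST610']}
).to_excel('samplelist.xlsx', index=False)

df = pd.read_excel('samplelist.xlsx')
print(df)
```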
This loads the data into a pandas DataFrame.
The sample names can be accessed using:
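A sketch of this step and the split itself, assuming the names sit in a DataFrame column called 'Sample' (check your own file's header). The file names are placeholders for the data file and dataformat description in the zip, and the argument names follow the long_file documentation; check them against your installed version:

```python
from latools.preprocessing.split import long_file

# Pull the sample names out of the DataFrame loaded above.
# 'Sample' is an assumed column header.
samples = df.loc[:, 'Sample'].values

# Split the long file. Both file names below are placeholders.
long_file('long_example.csv',
          dataformat='long_example_dataformat.json',
          sample_list=samples)
```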
This will produce some output telling you what it's done:
The single long file has been split into 13 component files in the format that latools expects: each file contains the ablations of a single sample. Note that consecutive ablations with the same sample name are combined into a single file, and if a sample name is repeated later in the file, _N is appended to the name to make each file name unique.
The function also produces a plot showing how it has split the files:
So far so good, right? NO! This split has not worked properly.
Take a look at the printed output. On the second line, it says that the number of samples in the list and the number of ablations don't match. This is a red flag - either your sample list is wrong, or latools is not correctly identifying the number of ablations.
The key to diagnosing these problems lies in the plot showing how the data has been split. Take a look at the right-hand side of this plot:
Something has gone wrong with the separation of the jcp and jct ablations. This is most likely because the signal decreases to near zero mid-way through the second-to-last ablation, causing it to be identified as two separate ablations.
In this case, a simple solution could be to smooth the data before splitting.
The ~latools.preprocessing.split.long_file function uses ~latools.processes.signal_id.autorange to identify the ablations in a file, and you can modify any of the autorange parameters by passing them directly to ~latools.preprocessing.split.long_file.
Take a look at the ~latools.processes.signal_id.autorange documentation. Notice that the input parameter swin applies a smoothing window to the data before the signal is processed. So, to smooth the data before splitting it, we can simply add a swin argument to ~latools.preprocessing.split.long_file:
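A sketch of the modified call. The file names are placeholders as before, samples is the sample-name array read in earlier, and swin=10 is an arbitrary window width to tune for your own data:

```python
from latools.preprocessing.split import long_file

# swin is passed through to autorange, smoothing the signal before
# ablations are identified. 10 points is an arbitrary choice.
long_file('long_example.csv',
          dataformat='long_example_dataformat.json',
          sample_list=samples,  # the array of sample names from above
          swin=10)
```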
This produces the output:
You can see in the image that this has fixed the issue:
You can now continue with your latools analysis as normal.