Long File Splitting

If you've collected data from ablations of multiple samples and standards in a single, long data file, read on.

To work with this data, you have to split it up into numerous shorter files, each containing ablations of a single sample. This can be done using latools.preprocessing.split.long_file.

Ingredients

  • A single data file containing multiple analyses
  • A Data Format description <data_format_description> for that file (you can also use pre-configured formats).
  • A list of names for each ablation in the file.

To keep things organised, we suggest creating a file structure like this:

Method

  1. Import your data, and provide a list of sample names.
  2. Apply latools.processes.signal_id.autorange to identify ablations.
  3. Match the sample names up to the ablations.
  4. Save a single file for each sample in an output folder, which can be imported by latools.latools.analyse.
  5. Plot a graph showing how the file has been split, so you can make sure everything has worked as expected.
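Steps 2 and 3 can be illustrated with a toy sketch. This is not the latools implementation: a plain threshold stands in for autorange, and the signal values, threshold, and sample names are all made up for illustration.

```python
def find_ablations(signal, threshold):
    """Return (start, end) index pairs where signal exceeds threshold.

    A simple stand-in for ablation detection; `end` is exclusive.
    """
    ablations = []
    start = None
    for i, value in enumerate(signal):
        if value > threshold and start is None:
            start = i  # signal has just risen above the threshold
        elif value <= threshold and start is not None:
            ablations.append((start, i))  # signal has just dropped back
            start = None
    if start is not None:
        # the file ended while the signal was still high
        ablations.append((start, len(signal)))
    return ablations


# Hypothetical signal with three ablations separated by background.
signal = [0, 0, 9, 10, 9, 0, 0, 8, 9, 0, 0, 7, 8, 8, 0]
sample_names = ["STD-1", "sample-A", "sample-B"]  # made-up names

ablations = find_ablations(signal, threshold=5)

# Pair each name with the ablation at the same position in the file.
labelled = list(zip(sample_names, ablations))
print(labelled)
```

The pairing is purely positional, which is why the order and length of the sample list matter so much in the steps that follow.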

Output

After you've applied latools.preprocessing.split.long_file, a few more files will have been created, and your directory structure will look like this:

If you have multiple consecutive ablations with the same name (i.e. repeat ablations of the same sample), these will be saved to a single file that contains all of those ablations.
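The grouping of consecutive repeats can be pictured with a short sketch. It uses `itertools.groupby` from the standard library as an illustration of the idea, not as a description of how latools does it internally, and the names are made up.

```python
from itertools import groupby


def merge_consecutive(names):
    """Collapse runs of identical consecutive names.

    Returns (name, number_of_ablations_in_run) pairs, in file order.
    """
    return [(name, len(list(group))) for name, group in groupby(names)]


# Hypothetical sample list with repeat ablations.
names = ["STD-1", "STD-1", "sample-A", "sample-A", "sample-A", "STD-1"]

merged = merge_consecutive(names)
print(merged)
```

Only adjacent repeats are merged: the final `STD-1` here is separated from the first two by other ablations, so it forms its own group.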

Example

To try this example at home, this zip file <resources/long_example.zip> contains all the files you'll need.

Unzip this file, and you should see the following files:

1. Load Sample List

First, read in the list of samples in the file. Examples are provided here in two formats: plain text and an Excel file. The format of the sample list doesn't matter, as long as you can read it into Python as an array or a list. In the case of these examples:

Text File

This loads the sample list into a numpy array, which looks like this:
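A minimal sketch of this kind of load, assuming a plain-text file with one sample name per line and no spaces in the names. The file name and the names themselves are hypothetical and are not the contents of the example zip; the file is written to a temporary folder so the snippet runs on its own.

```python
import os
import tempfile

import numpy as np

# Made-up sample names, one per line in the text file.
names = ["STD-1", "sample-A", "sample-B", "STD-2"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_names.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("\n".join(names) + "\n")

    # dtype=str keeps the names as text instead of trying to parse numbers.
    sample_list = np.genfromtxt(path, dtype=str)

print(sample_list)
```

By default `numpy.genfromtxt` splits on whitespace and treats `#` as the start of a comment, so names containing either would need a different reader.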

Excel File

This will load the data into a DataFrame, which looks like this:

The sample names can be accessed using:

2. Split the Long File

This will produce some output telling you what it's done:

The single long file has been split into 13 component files in the format that latools expects: each file contains ablations of a single sample. Note that consecutive ablations of the same sample are combined into single files, and if a sample name is repeated, _N is appended to the sample name to make the file name unique.
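One way to picture the renaming is the sketch below. The exact numbering convention is an assumption made for illustration (the first occurrence keeps its name and later ones get `_1`, `_2`, ...); check the file names latools actually writes rather than relying on it. The sample names are made up.

```python
from collections import Counter


def unique_names(names):
    """Append _N to repeated names so every name is unique.

    Assumed convention: first occurrence unchanged, then _1, _2, ...
    """
    seen = Counter()
    result = []
    for name in names:
        if seen[name] == 0:
            result.append(name)
        else:
            result.append(f"{name}_{seen[name]}")
        seen[name] += 1
    return result


# Hypothetical list after consecutive repeats have been merged.
grouped = ["STD-1", "sample-A", "STD-1", "sample-B", "STD-1"]

file_names = unique_names(grouped)
print(file_names)
```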

The function also produces a plot showing how it has split the files:

3. Check Output

So far so good, right? NO! This split has not worked properly.

Take a look at the printed output. On the second line, it says that the number of samples in the list and the number of ablations don't match. This is a red flag - either your sample list is wrong, or latools is not correctly identifying the number of ablations.
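The same sanity check is easy to reproduce by hand. The helper below is a hypothetical illustration, not part of latools: it compares the length of a sample list against the number of detected ablations, given here as made-up (start, end) pairs.

```python
def check_counts(sample_names, ablations):
    """Compare the number of sample names with the number of ablations.

    Returns (ok, message) so the caller can decide how to react.
    """
    n_names = len(sample_names)
    n_ablations = len(ablations)
    if n_names == n_ablations:
        return True, f"{n_names} names match {n_ablations} ablations"
    return False, (
        f"mismatch: {n_names} names but {n_ablations} ablations; "
        "check the sample list and the split plot"
    )


# Two names, but three ablations were detected: something is wrong.
ok, message = check_counts(["STD-1", "sample-A"], [(2, 5), (7, 9), (11, 14)])
print(ok, message)
```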

The key to diagnosing these problems lies in the plot showing how the data file has been split. Take a look at the right-hand side of this plot:

Something has gone wrong with the separation of the jcp and jct ablations. This is most likely because the signal drops close to zero midway through the second-to-last ablation, causing it to be identified as two separate ablations.

4. Troubleshooting

In this case, a simple solution could be to smooth the data before splitting.
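Why smoothing helps can be seen in a toy example. The sketch below uses a simple centred moving average and a fixed threshold as stand-ins; it is not the smoothing or detection that latools applies, and the signal values are made up.

```python
def rolling_mean(values, window):
    """Centred moving average; near the edges, average what is available."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def count_ablations(values, threshold):
    """Count the runs of consecutive values above the threshold."""
    count = 0
    above = False
    for value in values:
        if value > threshold and not above:
            count += 1  # a new run has started
        above = value > threshold
    return count


# One ablation whose signal briefly dips close to zero in the middle.
signal = [0, 0, 10, 10, 10, 1, 10, 10, 10, 0, 0]

raw_count = count_ablations(signal, threshold=5)
smooth_count = count_ablations(rolling_mean(signal, window=3), threshold=5)
print(raw_count, smooth_count)
```

Without smoothing, the dip splits the ablation in two; after averaging over three points, the dip no longer falls below the threshold and a single ablation is found.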

The latools.preprocessing.split.long_file function uses latools.processes.signal_id.autorange to identify ablations in a file, and you can modify any of the autorange parameters by passing them directly to latools.preprocessing.split.long_file.

Take a look at the latools.processes.signal_id.autorange documentation. Notice how the input parameter swin applies a smoothing window to the data before the signal is processed. So, to smooth the data before splitting it, we can simply add an swin argument to latools.preprocessing.split.long_file:

This produces the output:

You can see in the image that this has fixed the issue:

5. Analyse

You can now continue with your latools analysis as normal.