<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Cleaning-files-individually" data-toc-modified-id="Cleaning-files-individually-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Cleaning files individually</a></span></li><li><span><a href="#Cleaning-files-by-batch" data-toc-modified-id="Cleaning-files-by-batch-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Cleaning files by batch</a></span><ul class="toc-item"><li><span><a href="#Start/stop-time-log-file" data-toc-modified-id="Start/stop-time-log-file-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Start/stop-time log file</a></span></li><li><span><a href="#Reading-SST-log-files-with-the-Reader-class" data-toc-modified-id="Reading-SST-log-files-with-the-Reader-class-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Reading SST log files with the Reader class</a></span></li></ul></li><li><span><a href="#Test-with-real-data" data-toc-modified-id="Test-with-real-data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Test with real data</a></span></li></ul></div>

# How to discard invalid sequences in actigraphy recordings before analysis

Quite frequently, actigraphy recordings contain sequences that do not correspond to the participant's activities but rather to artefacts, such as bringing the actimeter to the participant's home or removing the actimeter from the participant to read the acquired data.
Such meaningless data occur either at the beginning of the recording, when the acquisition starts before the participant actually wears the actimeter, or at the end, when the participant removes the actimeter before the end of the data acquisition. Or even at both ends...


In any case, these invalid sequences of activity counts must be discarded before any analysis of the data, and *pyActigraphy* provides the appropriate tools for that.

In [None]:
import pyActigraphy
import os

In [None]:
import plotly.graph_objs as go

In [None]:
# retrieve path to example files
fpath = os.path.join(os.path.dirname(pyActigraphy.__file__),'tests/data/')

In [None]:
# Read test file
raw = pyActigraphy.io.read_raw_awd(fpath+'example_01.AWD')

In [None]:
# Check the start time of the actigraphy recording
raw.start_time

In [None]:
# Check the duration of the recording
raw.duration()

In [None]:
go.Figure(data=[go.Scatter(x=raw.data.index, y=raw.data)], layout=go.Layout())

The initial recording starts on the 23rd of January, around 13:00, and lasts 12 days, 18 hours and 41 minutes. At both the beginning and the end of the recording, the recorded activity does not seem to be valid:
- at the beginning, the mixture of zero and non-zero activity counts with no cyclic pattern is most likely due to the fact that the acquisition started before the participant actually wore the actimeter, which was nonetheless manipulated for demonstration purposes;
- towards the end of the recording, the activity counts suddenly fall to zero when the participant removed the actimeter, and remain so until the last epochs, when the actimeter was most likely manipulated to read out the acquired data.
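
The second kind of artefact, a long flat run of zero counts, can also be located programmatically. The sketch below uses a hypothetical helper, `longest_zero_run`, which is not part of *pyActigraphy*. It assumes the recording is available as two parallel lists of epoch timestamps and activity counts, and returns the first and last epochs of the longest run of consecutive zeros. Deciding whether such a run is long enough to indicate that the actimeter was removed is left to the user.

```python
from datetime import datetime, timedelta

def longest_zero_run(timestamps, counts):
    """Return (first_epoch, last_epoch, n_epochs) of the longest run of zero counts.

    Hypothetical helper, not a pyActigraphy function.
    Returns (None, None, 0) if there is no zero count at all.
    """
    best = (None, None, 0)
    run_start, run_len = None, 0
    for i, count in enumerate(counts):
        if count == 0:
            if run_len == 0:
                run_start = i  # a new run of zeros begins here
            run_len += 1
            if run_len > best[2]:
                best = (timestamps[run_start], timestamps[i], run_len)
        else:
            run_len = 0  # any activity breaks the current run
    return best

# Ten one-minute epochs containing a four-epoch flat run
t0 = datetime(2020, 1, 1, 13, 0)
stamps = [t0 + timedelta(minutes=i) for i in range(10)]
counts = [5, 0, 3, 12, 8, 0, 0, 0, 0, 2]
first, last, n = longest_zero_run(stamps, counts)
print(first, last, n)  # 2020-01-01 13:05:00 2020-01-01 13:08:00 4
```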


## Cleaning files individually

With *pyActigraphy*, it is quite easy to mask these invalid sequences by specifying start and stop times for the analysis of the recordings.

In our example above, to discard all the data acquired before 08:00 on the 24th of January (i.e. most of the first day of recording) and keep only 9 days of recording:

In [None]:
raw_cropped = pyActigraphy.io.read_raw_awd(
    fpath+'example_01.AWD', 
    start_time='1918-01-24 08:00:00',
    period='9 days')
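
Conceptually, `start_time` and `period` define a window of length `period` beginning at `start_time`, and only the epochs falling inside it are kept. The following sketch illustrates this idea on plain Python lists of `(timestamp, count)` pairs. The helper `crop` is hypothetical, and the half-open interval it uses is an assumption: *pyActigraphy*'s own boundary handling may differ.

```python
from datetime import datetime, timedelta

def crop(records, start_time, period):
    """Keep the (timestamp, count) pairs with start_time <= t < start_time + period.

    Hypothetical helper; the half-open interval is chosen for illustration only.
    """
    stop_time = start_time + period
    return [(t, c) for t, c in records if start_time <= t < stop_time]

# Three days of hourly epochs with dummy counts
t0 = datetime(2020, 1, 1, 0, 0)
records = [(t0 + timedelta(hours=i), i % 24) for i in range(72)]

# Keep one day, starting at 08:00 on the first day
cropped = crop(records, datetime(2020, 1, 1, 8, 0), timedelta(days=1))
print(len(cropped), cropped[0][0], cropped[-1][0])
# 24 2020-01-01 08:00:00 2020-01-02 07:00:00
```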

In [None]:
raw_cropped.start_time

In [None]:
raw_cropped.duration()

In [None]:
go.Figure(data=[go.Scatter(x=raw_cropped.data.index, y=raw_cropped.data)], layout=go.Layout())

Let us now verify the impact of discarding invalid activity counts on a usual rest-activity rhythm variable, the interdaily stability (IS):

In [None]:
raw.IS()

In [None]:
raw_cropped.IS()

The difference is quite substantial, isn't it?
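
Why such a difference? For hourly data, the interdaily stability (IS) is commonly defined as $IS = \frac{N \sum_{h=1}^{p} (\bar{x}_h - \bar{x})^2}{p \sum_{i=1}^{N} (x_i - \bar{x})^2}$, where $N$ is the number of hourly values, $p = 24$ the number of values per day, $\bar{x}_h$ the mean activity at hour $h$ across days and $\bar{x}$ the overall mean. Days of flat, zero activity add variance that does not follow the average daily profile and therefore pull IS down. The sketch below implements this textbook formula in plain Python. It is not *pyActigraphy*'s implementation, which may, for instance, resample the data first; it simply shows the effect of appending two days of zeros to a perfectly periodic signal.

```python
from statistics import mean

def interdaily_stability(hourly_counts, hours_per_day=24):
    """Textbook IS computed on a list of hourly activity counts."""
    n = len(hourly_counts)
    if n < hours_per_day:
        raise ValueError("at least one full day of data is required")
    grand_mean = mean(hourly_counts)
    # Average profile: mean activity at each hour of the day across all days
    profile = [mean(hourly_counts[h::hours_per_day]) for h in range(hours_per_day)]
    between = n * sum((m - grand_mean) ** 2 for m in profile)
    total = hours_per_day * sum((x - grand_mean) ** 2 for x in hourly_counts)
    if total == 0:
        raise ValueError("IS is undefined for a constant signal")
    return between / total

# One idealised day: 8 h of rest followed by 16 h of activity
day = [0] * 8 + [100] * 16
clean = day * 3                    # three identical days
with_zeros = clean + [0] * 48      # plus two days of non-wear zeros

print(round(interdaily_stability(clean), 3))       # 1.0
print(round(interdaily_stability(with_zeros), 3))  # 0.333
```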

## Cleaning files by batch

It is certainly manageable to invalidate activity sequences in a few files by specifying a start and a stop time. However, it is much less so when one wants to apply this procedure to hundreds or thousands of recordings. Fortunately, *pyActigraphy* allows users to specify start and stop times for actigraphy recordings by batch.


All it needs is a start/stop-time (SST) log file.



A start/stop-time log file consists of a list of participant IDs, each associated with two datetimes corresponding to the requested start and end of the actigraphy recording, respectively. These datetimes override the actual start and stop times of the recordings.

-------------

A start/stop-time log file *must* be formatted as follows:

|Subject_id|Start_time|Stop_time|Remarks|
|----------|----------|---------|-------|
|name_in_header | YYYY-MM-DD HH:MM| YYYY-MM-DD HH:MM| WhateverTextYouWant|
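
To make the expected layout concrete, the sketch below parses a small comma-separated file following the table above and maps each subject to its start and stop datetimes. It assumes a comma delimiter and the `YYYY-MM-DD HH:MM` format shown in the table; the subject IDs are made up, and `parse_sst_csv` is a hypothetical helper for illustration only, since *pyActigraphy* provides its own reader, `read_sst_log`.

```python
import csv
import io
from datetime import datetime

def parse_sst_csv(text):
    """Map each Subject_id to a (start, stop) pair of datetimes.

    Hypothetical parser assuming a comma-separated file with the header
    Subject_id,Start_time,Stop_time,Remarks (Remarks is ignored).
    """
    entries = {}
    for row in csv.DictReader(io.StringIO(text)):
        start = datetime.strptime(row["Start_time"], "%Y-%m-%d %H:%M")
        stop = datetime.strptime(row["Stop_time"], "%Y-%m-%d %H:%M")
        if stop <= start:
            raise ValueError(f"{row['Subject_id']}: stop time is not after start time")
        entries[row["Subject_id"]] = (start, stop)
    return entries

# Made-up log with two subjects
example = """Subject_id,Start_time,Stop_time,Remarks
subject_A,2020-01-02 08:00,2020-01-09 08:00,device removed early
subject_B,2020-01-03 12:30,2020-01-10 12:30,
"""
log = parse_sst_csv(example)
start, stop = log["subject_A"]
print(stop - start)  # 7 days, 0:00:00
```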

### Start/stop-time log file

As an example, let us first create an SSTLog object in order to inspect it:

In [None]:
from pyActigraphy.log import SSTLog

In [None]:
sstlog_ods = pyActigraphy.log.read_sst_log(fpath+'example_sstlog.ods')

In [None]:
sstlog_csv = pyActigraphy.log.read_sst_log(fpath+'example_sstlog.csv')

In [None]:
sstlog_xls = pyActigraphy.log.read_sst_log(fpath+'example_sstlog.xlsx')

To access the data contained in the log file:

In [None]:
sstlog_ods.log

In [None]:
sstlog_csv.log

In [None]:
sstlog_xls.log

Please notice that the duration of the time interval between the start and stop times is calculated automatically and therefore does not need to be entered in the log file.

A `summary` method is available. It returns the min/max/median/mean of the durations found in the log file:

In [None]:
sstlog_ods.summary()

In [None]:
sstlog_csv.summary()

In [None]:
sstlog_xls.summary()
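
The statistics reported by `summary` can be reproduced by hand from the start/stop pairs. The sketch below computes the durations as `stop - start` and their min/max/median/mean with the standard library. `summarize_durations` is hypothetical, and the exact output format of `summary` may differ.

```python
from datetime import datetime, timedelta
from statistics import median

def summarize_durations(intervals):
    """Min/max/median/mean of the durations stop - start (hypothetical helper)."""
    durations = [stop - start for start, stop in intervals]
    return {
        "min": min(durations),
        "max": max(durations),
        "median": median(durations),
        # sum() needs a timedelta start value instead of the default 0
        "mean": sum(durations, timedelta(0)) / len(durations),
    }

t0 = datetime(2020, 1, 1, 8, 0)
intervals = [(t0, t0 + timedelta(days=d)) for d in (7, 9, 10)]
for key, value in summarize_durations(intervals).items():
    print(key, value)
# min 7 days, 0:00:00
# max 10 days, 0:00:00
# median 9 days, 0:00:00
# mean 8 days, 16:00:00
```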

### Reading SST log files with the Reader class

Just as actigraphy files can be read by batch with the `Reader` class, this class has been extended to read an SST log file.

As an illustration, let us first read a small batch of files:

In [None]:
rawReader = pyActigraphy.io.read_raw(input_path=fpath+'example_0[0-3]*.AWD', reader_type='AWD')
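
The `input_path` argument is a wildcard pattern: `example_0[0-3]*.AWD` matches names starting with `example_0`, followed by a digit between 0 and 3, then any characters, and ending in `.AWD`. Assuming the pattern is expanded with the usual shell-style rules (as implemented by Python's `glob` and `fnmatch` modules), the following sketch shows which of a few made-up file names would be selected:

```python
from fnmatch import fnmatchcase

pattern = "example_0[0-3]*.AWD"
# Made-up file names, for illustration only
candidates = ["example_00.AWD", "example_01.AWD", "example_03.AWD",
              "example_04.AWD", "example_01.csv", "example_10.AWD"]
matched = [name for name in candidates if fnmatchcase(name, pattern)]
print(matched)  # ['example_00.AWD', 'example_01.AWD', 'example_03.AWD']
```

`fnmatchcase` is used here so that the result does not depend on the operating system's case-sensitivity rules.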

Check how many files were found and read:

In [None]:
len(rawReader.readers)

To read an SST log file containing the start and stop times for these recordings:

In [None]:
rawReader.read_sst_log(fpath+'example_sstlog.csv')

Now, verify the data found in this log file by accessing and displaying the log associated with the reader object:

In [None]:
rawReader.sst_log.log

Then, let us check the IS values of these recordings:

In [None]:
rawReader.IS()

They are quite low. Indeed, the SST log has been read by the reader object, but the start and stop times have not yet been applied to the actigraphy recordings.

To do so:

In [None]:
rawReader.apply_sst(verbose=True)

By using the `verbose` option, one can easily check if there is an entry in the SST log file for each actigraphy recording.
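
Conceptually, applying the SST log amounts to looking up each recording by its name (the `Subject_id` column must match the name found in the file header) and cropping it to the corresponding interval. Recordings without an entry are left untouched, and this is what the `verbose` option helps to detect. The sketch below mimics this logic on plain dictionaries. `apply_sst_sketch` is hypothetical and is not how `Reader.apply_sst` is implemented; the half-open interval is also an assumption.

```python
from datetime import datetime, timedelta

def apply_sst_sketch(recordings, sst, verbose=False):
    """Crop each recording to its SST interval (hypothetical helper).

    recordings: dict mapping a recording name to a list of (timestamp, count)
    sst: dict mapping a Subject_id to a (start, stop) pair of datetimes
    Returns the cropped recordings and the names missing from the SST log.
    """
    cropped, missing = {}, []
    for name, records in recordings.items():
        if name not in sst:
            missing.append(name)
            cropped[name] = records  # no entry: keep the recording as is
            continue
        start, stop = sst[name]
        cropped[name] = [(t, c) for t, c in records if start <= t < stop]
    if verbose:
        for name in missing:
            print(f"No SST entry found for {name}: recording left unchanged")
    return cropped, missing

# Two recordings of 48 hourly epochs; only subject_A has an SST entry
t0 = datetime(2020, 1, 1)
epochs = [(t0 + timedelta(hours=i), 10) for i in range(48)]
recordings = {"subject_A": epochs, "subject_B": epochs}
sst = {"subject_A": (datetime(2020, 1, 1, 6), datetime(2020, 1, 1, 18))}

cropped, missing = apply_sst_sketch(recordings, sst, verbose=True)
print(len(cropped["subject_A"]), len(cropped["subject_B"]), missing)
# 12 48 ['subject_B']
```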

Now, the IS values change substantially for the recordings found in the SST log file:

In [None]:
rawReader.IS()

The actigraphy recordings have been truncated according to the start and stop times specified in the SST log file.

Et voilà! Easy, isn't it?