# Actigraph + Qstarz

## Accelerometer data

- Loads accelerometer data originating from ActiGraph devices.
- This CSV parsing functionality is currently experimental. Because ActiGraph CSV files can be structured in many different ways, some files may need adjustments before they can be processed successfully.
- Assuming the exported timestamps reflect local time, the timezone is set accordingly (here, `Europe/Amsterdam`).

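```python
from labda.parsers import Actigraph

# Replace with the path to your own ActiGraph CSV export.
# path = "actigraph.csv"
path = "../temp/teun/acc/3002_Fontys_277 (2025-01-27)10sec.csv"

acc = Actigraph.from_csv(
    path,
    # Map CSV column positions to column names; the positions depend on
    # how the file was exported and may differ for your files.
    columns={
        0: "datetime",
        3: "counts_y",
        4: "counts_x",
        5: "counts_z",
    },
    # The exported timestamps are assumed to be in local time.
    timezone="Europe/Amsterdam",
)
```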

Next, you can review the parsed data, starting with the metadata.

```python
acc.metadata
```

You can also inspect the resulting dataframe.

*Note: While the parser is equipped to extract additional information from the file, detailed documentation for these advanced features is currently unavailable. For now, let's focus on the basic counts.*

```python
acc.df
```

Although both datasets (accelerometer and GPS) are recorded in 10-second epochs, their timestamps aren't aligned with each other, which could complicate merging them later. The simpler approach is therefore to align each dataset to a uniform 10-second grid using nearest-neighbor matching.

*Note: Calling `resample()` without a target sampling frequency defaults to "uniform" alignment. Rules for uniform alignment, upsampling, and downsampling are defined but not yet documented.*

```python
acc.resample()
```
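The alignment can be pictured as snapping each sample onto a regular 10-second grid. The sketch below is a minimal pandas illustration, not labda's `resample()` implementation (whose rules aren't documented yet). It assumes a datetime-indexed DataFrame and a tolerance of half an epoch (5 s), so grid points with no sample within 5 s are left empty.

```python
import pandas as pd

# Hypothetical 10-s counts whose timestamps are offset by 3 s from the
# :00/:10/:20 grid (e.g. because the device started at 12:00:03).
raw = pd.DataFrame(
    {"counts": [5, 0, 12, 7]},
    index=pd.to_datetime(
        [
            "2025-01-27 12:00:03",
            "2025-01-27 12:00:13",
            "2025-01-27 12:00:23",
            "2025-01-27 12:00:33",
        ]
    ),
)

# Uniform 10-second grid covering the recording.
grid = pd.date_range(raw.index[0].floor("10s"), raw.index[-1].ceil("10s"), freq="10s")

# Match each grid point to its nearest original sample, but only if that
# sample lies within 5 s; otherwise the epoch stays empty (NaN).
aligned = raw.reindex(grid, method="nearest", tolerance=pd.Timedelta("5s"))
```

Here the last grid point (12:00:40) has no sample within 5 s and stays `NaN`, which is how gaps would show up when the datasets are merged later.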

Next, we'll identify non-wear time using the Choi (2011) algorithm, a method based on the vector magnitude (VM) derived from activity counts.

- The vector magnitude (VM) needs to be calculated as a prerequisite for the non-wear detection algorithm.
- Next, you can examine daily wear times and remove any invalid days. I aim to make this function more robust in the future, but it should suffice for now. Here, the minimum wear time is set to 10 hours per day: days on which the accelerometer (counts) data doesn't meet this threshold are excluded.

*Note: The non-wear detection algorithms will likely be updated. Instead of the current threshold scaling, best practice supported by research is to first downsample the 10-second epochs to 1-minute epochs (the epoch length the Choi algorithm was designed for), perform the non-wear detection, and then upsample the results back to 10-second epochs. This straightforward approach should improve agreement with existing validation findings.*
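The downsample, detect, upsample approach described in the note could look roughly like this. It is a sketch, not labda's code: it works on a hypothetical vector-magnitude series and applies a simplified Choi-style rule (at least 90 consecutive minutes of zero counts) that omits the full algorithm's tolerance for short non-zero spikes.

```python
import pandas as pd

# Hypothetical 10-s vector-magnitude counts: 30 min of activity, 100 min of
# zeros (device taken off), then 30 min of activity.
idx = pd.date_range("2025-01-27 08:00", periods=160 * 6, freq="10s")
vm = pd.Series(0.0, index=idx)
vm.iloc[: 30 * 6] = 20.0
vm.iloc[130 * 6 :] = 20.0

# 1) Downsample to 60-s epochs by summing the counts.
vm_60 = vm.resample("60s").sum()

# 2) Simplified non-wear rule: runs of >= 90 consecutive zero-count minutes.
#    The full Choi (2011) algorithm also tolerates short (<= 2 min) non-zero
#    spikes inside such a run; that refinement is omitted here.
is_zero = vm_60.eq(0)
run_id = is_zero.ne(is_zero.shift()).cumsum()  # label consecutive runs
run_len = is_zero.groupby(run_id).transform("size")
non_wear_60 = is_zero & (run_len >= 90)

# 3) Upsample the 1-min decision back to the original 10-s epochs.
wear_10 = ~non_wear_60.reindex(vm.index, method="ffill").astype(bool)
```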

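```python
from labda.accelerometer import WEAR_ALGORITHMS

# Vector magnitude of the tri-axial counts (required by the Choi algorithm).
acc.add_vector_magnitude("counts", overwrite=True)
acc.detect_wear(WEAR_ALGORITHMS["choi_2011"], overwrite=True)

# Daily wear times; days with less than 10 h of accelerometer wear are dropped.
days, valid = acc.get_wear_times(duration="10h", drop="acc")
days
```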
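For intuition, the vector magnitude and the daily wear-time check boil down to a few lines of pandas. This is a minimal sketch on made-up data: the column names, the `wear` flag, and the rule "worn epochs times epoch length of at least 10 h" are assumptions, and `get_wear_times()` may count wear time differently.

```python
import numpy as np
import pandas as pd

EPOCH = pd.Timedelta("10s")
MIN_WEAR = pd.Timedelta("10h")

# Hypothetical recording: 12 h of 10-s epochs on day 1, 6 h on day 2.
day1 = pd.date_range("2025-01-27 08:00", periods=12 * 360, freq="10s")
day2 = pd.date_range("2025-01-28 08:00", periods=6 * 360, freq="10s")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.integers(0, 50, size=(len(day1) + len(day2), 3)),
    index=day1.append(day2),
    columns=["counts_x", "counts_y", "counts_z"],
)

# Vector magnitude: Euclidean norm of the three axis counts in each epoch.
df["counts_vm"] = np.sqrt((df[["counts_x", "counts_y", "counts_z"]] ** 2).sum(axis=1))

# Hypothetical wear flag; the first hour of day 1 is marked as non-wear.
df["wear"] = True
df.loc[day1[:360], "wear"] = False

# Daily wear time = number of worn epochs times the epoch length.
day = df.index.normalize()
wear_time = df["wear"].groupby(day).sum() * EPOCH

# Keep only days that reach the 10-hour minimum.
valid_days = wear_time[wear_time >= MIN_WEAR].index
df_valid = df[day.isin(valid_days)]
```

Day 1 ends up with 11 h of wear and is kept; day 2 has only 6 h and is dropped.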

- Activity intensity will now be determined based on established cut-points. Several implementations are currently available, and new ones can be easily integrated or modified upon request. If you have specific cut-points in mind, please let me know; otherwise, I can offer some recommendations.
- Next, you can review the activity summaries. By default, these are aggregated to a daily level, allowing you to assess the participant's overall activity levels.

*Note: Similar to the non-wear detection, the algorithm for scaled cut-points is also likely to be updated in the near future.*

```python
from labda.accelerometer import COUNTS_CUT_POINTS

acc.detect_activity_intensity(
    COUNTS_CUT_POINTS["evenson_children_2018"], overwrite=True
)
acc.get_summary("activity_intensity")
```
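Cut-point classification itself is a binning step. The sketch below uses `pandas.cut` with hypothetical per-minute thresholds, linearly scaled to 10-second epochs (the scaling approach the note above says is likely to change). The values actually used by `evenson_children_2018` are stored in `COUNTS_CUT_POINTS` and may differ.

```python
import pandas as pd

# Hypothetical cut-points as upper bounds in counts per minute
# (illustrative only; not read from labda's COUNTS_CUT_POINTS).
CUT_POINTS_PER_MIN = {"sedentary": 100, "light": 2295, "moderate": 4011}
EPOCH_SECONDS = 10

# Linear scaling from 60-s to 10-s epochs.
scale = EPOCH_SECONDS / 60
bins = [float("-inf")] + [v * scale for v in CUT_POINTS_PER_MIN.values()] + [float("inf")]
labels = ["sedentary", "light", "moderate", "vigorous"]

counts_10s = pd.Series([0, 10, 17, 200, 400, 900])
# Bins are right-inclusive, e.g. "sedentary" covers counts <= 100 / 6.
intensity = pd.cut(counts_10s, bins=bins, labels=labels)
```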

## Spatial data

- Loading GPS Data: The process for loading GPS data is similar to that of accelerometer data.
- Aligning to 10-Second Epochs: The GPS data will also be aligned to 10-second epochs for consistency.

*Note: I'm currently developing a method to impute missing GPS data caused by signal loss. Once implemented, this will allow for the imputation of a reasonable amount of data: a maximum of 5-15 minutes for transport and up to 1 hour for stationary periods (e.g., being inside a building). Data exceeding these limits will not be imputed and will be classified as "non-wear." This imputation process will also enable us to filter out inaccurate GPS points resulting from poor signal quality.*
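Until that method is available, the core idea of filling only short gaps can be sketched as follows. `impute_short_gaps` is a hypothetical helper, not part of labda: it linearly interpolates runs of missing values up to `max_gap` epochs and leaves longer gaps missing. The planned method distinguishes transport from stationary periods, whereas this sketch uses a single limit.

```python
import numpy as np
import pandas as pd


def impute_short_gaps(s: pd.Series, max_gap: int) -> pd.Series:
    """Hypothetical helper (not part of labda): linearly interpolate runs of
    missing values no longer than `max_gap` epochs; leave longer gaps missing."""
    missing = s.isna()
    # Label consecutive runs of missing / non-missing values and measure them.
    run_id = missing.ne(missing.shift()).cumsum()
    run_len = missing.groupby(run_id).transform("size")
    # Interpolate between valid points, then keep the result only inside
    # short gaps; original values are always kept.
    filled = s.interpolate(method="linear", limit_area="inside")
    return s.where(~missing, filled.where(run_len <= max_gap))


# Latitude per 10-s epoch with a 2-epoch gap and a 4-epoch gap.
lat = pd.Series([52.0, np.nan, np.nan, 52.3, np.nan, np.nan, np.nan, np.nan, 52.8])
out = impute_short_gaps(lat, max_gap=3)
```

At 10-second epochs, a 5-minute limit corresponds to `max_gap=30` and a 1-hour limit to `max_gap=360`.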

```python
from labda.parsers import Qstarz

# path = "qstarz.csv"
path = "../temp/teun/gps/3002.csv"

gps = Qstarz.from_csv(path)
gps.resample()
```

As a bonus, GPS data allows the timezone and Coordinate Reference System (CRS) to be inferred automatically.

```python
gps.set_timezone()
gps.set_crs()
```
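How `set_crs()` chooses a CRS isn't documented here. One common approach for projecting GPS tracks is to use the WGS 84 / UTM zone containing the track, identified by EPSG code 326xx (northern hemisphere) or 327xx (southern hemisphere). The sketch below is an assumption about a reasonable method rather than labda's logic: it derives that code from the median coordinates and ignores the irregular zones around Norway and Svalbard.

```python
import statistics


def utm_epsg(lon: float, lat: float) -> int:
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat).
    Ignores the irregular zones around Norway and Svalbard."""
    zone = int((lon + 180) // 6) + 1
    zone = min(zone, 60)  # longitude 180 belongs to zone 60
    return (32600 if lat >= 0 else 32700) + zone


def infer_utm_epsg(lons, lats) -> int:
    """Pick the UTM zone at the median position of a GPS track."""
    return utm_epsg(statistics.median(lons), statistics.median(lats))
```

For a track around Eindhoven (about 5.5° E, 51.4° N) this gives EPSG:32631, i.e. UTM zone 31N.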

```python
gps.metadata
```

```python
gps.df
```

Drawing on some legacy methods from PALMS, it's possible to roughly determine whether a person is indoors or outdoors. While not highly precise, this can assist trip detection algorithms. However, it doesn't work for data collected with phones.

```python
gps.detect_indoor()
gps.plot("map", color="indoor")
```
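As an illustration of the kind of heuristic involved, indoor detection can threshold the total signal strength received from satellites: buildings attenuate GPS signals, so a low summed signal-to-noise ratio (SNR) suggests the receiver is indoors. This is a hedged sketch: the `snr_sum` column, the threshold of 250, and the 3-epoch smoothing are hypothetical, and it isn't confirmed that `detect_indoor()` works this way. Per-satellite SNR is typically not included in phone exports, which would be consistent with the method not working for phone data.

```python
import pandas as pd

# Hypothetical per-epoch sum of satellite SNR values.
gps_df = pd.DataFrame({"snr_sum": [420, 390, 180, 150, 160, 400]})

SNR_THRESHOLD = 250  # hypothetical cut-off; tune against your own data

# Smooth over 3 epochs so a single noisy epoch doesn't flip the label.
smoothed = gps_df["snr_sum"].rolling(3, center=True, min_periods=1).median()

# A weak total signal suggests the receiver is shielded, e.g. by a roof.
gps_df["indoor"] = smoothed < SNR_THRESHOLD
```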

To prepare for merging, we need to ensure that the accelerometer and GPS data have consistent participant IDs, timezones, and CRSs, resolving any inconsistencies first.

*Note: There are several ways to merge the data. An "inner join" will only include data points present in both the accelerometer and GPS datasets. Alternatively, an "outer join" will retain all data points from both datasets, even if there isn't a corresponding entry in the other.*
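The difference between the two joins is easiest to see on a toy example. This pandas sketch uses two hypothetical 10-second datasets that share three of four timestamps; `DataFrame.join` is used for illustration and isn't necessarily what `merge()` calls internally.

```python
import pandas as pd

idx = pd.date_range("2025-01-27 12:00", periods=4, freq="10s")
# Hypothetical accelerometer epochs covering all four timestamps...
acc_df = pd.DataFrame({"counts_vm": [10, 0, 35, 20]}, index=idx)
# ...and GPS epochs missing the first timestamp (e.g. no fix yet).
gps_df = pd.DataFrame({"speed_kmh": [4.8, 5.1, 30.2]}, index=idx[1:])

inner = acc_df.join(gps_df, how="inner")  # 3 rows: epochs present in both
outer = acc_df.join(gps_df, how="outer")  # 4 rows: speed is NaN at 12:00:00
```

Both joins match rows on identical timestamps, which is why aligning both datasets to the same 10-second grid beforehand matters.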

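```python
# Use the same participant ID for both datasets (named participant_id
# to avoid shadowing Python's built-in id).
participant_id = "test"
acc.id = participant_id
gps.id = participant_id

# Give the accelerometer data the same CRS as the GPS data.
acc.metadata.crs = gps.metadata.crs

# Inner join: keep only epochs present in both datasets.
sbj = acc.merge(gps, how="inner")
sbj.df
```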

Similar to our previous checks, we now need to verify the wear times for the multi-modal data and remove any invalid days.

```python
days, valid = sbj.get_wear_times(duration="10h", drop="acc+gnss")
days
```

Finally, trip detection. The algorithm has many configurable parameters; based on my validation study, I recommend specific values, and these are set as the defaults.

*Note: I plan to release a new, hopefully even more precise, algorithm in early 2026.*

```python
sbj.detect_trips()
sbj.plot("map", color="trip_status")
```
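The recommended defaults aren't listed here, but the basic idea behind many trip detection algorithms can be sketched in a few lines: mark epochs as moving above a speed threshold and keep only sufficiently long runs of movement. `label_trips`, `min_speed`, and `min_epochs` are hypothetical; real algorithms, including the one behind `detect_trips()`, typically also handle pauses, stops, and signal loss.

```python
import pandas as pd


def label_trips(speed_kmh: pd.Series, min_speed: float = 2.0, min_epochs: int = 12) -> pd.Series:
    """Hypothetical helper: return a trip number per epoch (0 = no trip).
    Epochs faster than `min_speed` count as moving; runs of moving epochs
    shorter than `min_epochs` (12 x 10 s = 2 min) are discarded as noise."""
    moving = speed_kmh > min_speed
    run_id = moving.ne(moving.shift()).cumsum()
    run_len = moving.groupby(run_id).transform("size")
    is_trip = moving & (run_len >= min_epochs)
    # A new trip starts wherever is_trip switches from False to True.
    starts = is_trip & ~is_trip.shift(fill_value=False)
    return starts.cumsum().where(is_trip, 0)


# 10-s speeds: a 15-epoch trip, a 3-epoch blip, then a 20-epoch trip.
speeds = pd.Series([0.0] * 5 + [10.0] * 15 + [0.0] * 5 + [10.0] * 3 + [0.0] * 5 + [8.0] * 20)
trips = label_trips(speeds)
```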

Let's look at the summaries to understand participant behavior. You can also combine them with other variables for a more detailed picture. The results are in long format, so they can be reshaped into wide format if needed.

*Note: `freq="YE"` (a pandas-style year-end alias) aggregates the data into calendar-year intervals, which in practice pools all days together as long as the measurement period doesn't cross a year boundary. I will most likely add options to calculate daily means. In the meantime, you can get the data in daily format and do your own calculations.*

```python
sbj.get_summary(["trip_status", "activity_intensity"], freq="YE")
```
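Reshaping a long-format result into wide format is a pivot. The column names below (`period`, `variable`, `category`, `minutes`) and the categories are hypothetical stand-ins; check the output of `sbj.get_summary(...)` for the actual structure and adjust `index`, `columns`, and `values` accordingly.

```python
import pandas as pd

# Hypothetical long-format summary: one row per period, variable and category.
long = pd.DataFrame(
    {
        "period": ["2025"] * 4,
        "variable": ["trip_status", "trip_status", "activity_intensity", "activity_intensity"],
        "category": ["stationary", "transport", "sedentary", "mvpa"],
        "minutes": [900.0, 60.0, 700.0, 45.0],
    }
)

# One row per period, one column per variable/category combination.
wide = long.pivot_table(index="period", columns=["variable", "category"], values="minutes")
wide.columns = [f"{variable}_{category}" for variable, category in wide.columns]
```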

Next is transportation mode detection based on multi-modal data. The algorithm uses fuzzy inference based on GPS speed and physical activity intensity, and, as before, its thresholds can be adapted. I have made substantial improvements to this algorithm, so its validity still needs to be re-checked, which means the current thresholds might not be optimal.

```python
from labda.spatial import TRANSPORTATION_CUT_POINTS

sbj.detect_transportation(TRANSPORTATION_CUT_POINTS["heidler_intensity_2025"])
sbj.plot("map", color="trip_transport")
```
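To illustrate what fuzzy inference on speed and intensity means, here is a small sketch with trapezoidal membership functions, `min` for AND, `max` for OR, and the mode whose rule fires most strongly winning. All membership breakpoints, the rule base, and the use of raw 10-second counts as the intensity input are hypothetical; the thresholds in `heidler_intensity_2025` and the exact inference scheme in `detect_transportation()` may differ.

```python
INF = float("inf")


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Trapezoidal membership: 1 on [b, c], rising on (a, b),
    falling on (c, d), 0 elsewhere."""
    if b <= x <= c:
        return 1.0
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)


def classify(speed_kmh: float, counts_10s: float) -> str:
    # Hypothetical fuzzy sets for GPS speed (km/h)...
    speed = {
        "low": trapezoid(speed_kmh, 0, 0, 6, 9),
        "medium": trapezoid(speed_kmh, 6, 9, 20, 30),
        "high": trapezoid(speed_kmh, 20, 30, INF, INF),
    }
    # ...and for activity counts per 10-s epoch.
    intensity = {
        "low": trapezoid(counts_10s, 0, 0, 50, 100),
        "high": trapezoid(counts_10s, 50, 100, INF, INF),
    }
    # Rule strengths: AND = min, OR = max.
    strength = {
        "walk": min(speed["low"], intensity["high"]),
        "bike": min(speed["medium"], intensity["high"]),
        "vehicle": min(max(speed["medium"], speed["high"]), intensity["low"]),
        "stationary": min(speed["low"], intensity["low"]),
    }
    # Pick the mode whose rule fires most strongly (ties go to the first key).
    return max(strength, key=strength.get)
```

For example, 15 km/h with high activity counts fires the "bike" rule, while 50 km/h with low counts fires "vehicle".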

Let's look at the daily summaries for transportation.

```python
sbj.get_summary("trip_transport")
```

You might also be interested in behavior at specific locations. For example, if you provide the shapes of homes, schools, etc., I have functions to perform those analyses.
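Location-based analyses like this start with a point-in-polygon test: which GPS epochs fall inside a given location? The sketch below uses the classic ray-casting test on a hypothetical square polygon in projected metre coordinates (one reason a projected CRS is useful). It isn't labda's implementation; in practice, a library such as shapely or geopandas would handle this.

```python
EPOCH_SECONDS = 10


def point_in_polygon(x: float, y: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: count how many polygon edges a horizontal ray from (x, y)
    crosses to the right; an odd count means the point is inside."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical school polygon and GPS track (one point per 10-s epoch).
school = [(0, 0), (100, 0), (100, 100), (0, 100)]
track = [(50, 50), (60, 40), (150, 50), (99, 1), (-5, 20)]
seconds_at_school = EPOCH_SECONDS * sum(point_in_polygon(x, y, school) for x, y in track)
```

Three of the five epochs fall inside the polygon, so `seconds_at_school` is 30.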

I have a few more features that aren't fully tested yet and still need to be packaged for easy use. Features planned for the coming weeks include:

1) Automatic location detection using OpenStreetMap – great for initial exploration.
2) Automatic detection of home and school/work locations (so you won't need to provide the shapes yourself).
3) More transportation modes, mainly focusing on public transport.
4) Dyads – exploring relationships, like who spends time with whom, how much, what they do together, how active they are, etc.