# Appendix A: Important Considerations in DHT Data Collection


Standard practices for actigraphy measurements are as important as the reliability assessment itself. Here we present additional information on how to prepare and design the most effective validation method for DHTs.

A.1 Practical Considerations

To collect accurate and reliable actigraphy data, researchers should carefully consider the constraints of their cohorts and study design, such as participants' physical limitations, DHT placement, and the duration of the measurement. Validation of actigraphy DHTs may require manual measurements (such as 6MWD or step count) or other validated DHTs (for sleep quality or activity rate). The first step in the validation is ensuring consistency in the measurement setting and data quality.

Guidelines for sampling actigraphy data cover not only data collection from the unvalidated DHT but also data collection practices for the gold standard. The most common issues in actigraphy measurements are inter-rater and intra-rater variability, as well as changes due to the environment (time of day, location, etc.).

Furthermore, data acquisition practices should report the sampling rate of the DHT as well as that of the gold standard to ensure that no aliasing or undersampling occurs during feature extraction. For example, activity counts computed at sampling rates of 30 Hz and 100 Hz for the same device can differ by as much as 1,000–3,000 counts per minute, a difference of more than 10%.18 In addition, handling outliers and preprocessing the data will minimize the effect of rater-based errors, giving a clearer picture of the DHT reliability.
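To illustrate why the sampling rate matters, the following minimal sketch applies the standard frequency-folding rule to a pure tone. The 20 Hz component is a hypothetical value chosen for illustration only; it is not taken from the cited study, and the function names are our own.

```python
def is_undersampled(signal_hz: float, sampling_hz: float) -> bool:
    """Nyquist criterion: the sampling rate must exceed twice the signal frequency."""
    return sampling_hz <= 2 * signal_hz


def apparent_frequency(signal_hz: float, sampling_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling (folded into 0..fs/2)."""
    folded = signal_hz % sampling_hz
    return min(folded, sampling_hz - folded)


# Hypothetical 20 Hz component recorded at the two sampling rates mentioned in the text.
component_hz = 20.0
for sampling_hz in (30.0, 100.0):
    print(
        f"fs = {sampling_hz:.0f} Hz: undersampled = {is_undersampled(component_hz, sampling_hz)}, "
        f"appears at {apparent_frequency(component_hz, sampling_hz):.0f} Hz"
    )
```

Under these assumptions, the same component is preserved at 100 Hz but folds to a lower frequency at 30 Hz, which is one way features extracted from the two recordings could diverge.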

A.2 The Difference Between Agreement and Correlation

As noted in the main text, correlation measures the strength of the linear relationship between two variables, whereas agreement measures the consistency between them, meaning how well they match.

Two variables can be strongly linearly related, and hence highly correlated, without agreeing. For example, temperatures in °C and °F are perfectly correlated because they are linearly related, but they do not agree, since 0°C is not the same temperature as 0°F, as illustrated by Appendix Figure 1. The same applies to °C and K: although one degree has the same size on both scales, the 273.15-degree offset results in a large discrepancy and thus no agreement. Hence, because the use case study aims to evaluate the agreement between manual and digital 6MWD measurements, a high correlation coefficient (ρ) is insufficient to demonstrate the Apple Watch's accuracy. However, correlation is still an important measure to consider, as it provides a sense of the dataset before assessing agreement.

The relationships between °C and °F and between °C and K demonstrate the distinction between correlation and agreement. All three variables measure the same quantity, temperature, but on different scales, as illustrated by the departures of the °F (red) and K (blue) lines from the 45° (y=x) line of perfect agreement (black). Hence, they are perfectly correlated (ρ=1) but do not agree, as demonstrated by the extreme CCC and MAPE values, which reflect how different 1°C is from 1°F or 1 K as a temperature.
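The temperature example can be reproduced with a short sketch. It assumes Lin's concordance correlation coefficient with population (1/n) variances and a MAPE computed relative to the °C values, which are arbitrarily treated as the reference here. The temperature range (10 to 40°C) is our own choice, so the printed values will not necessarily match those in Appendix Figure 1.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def moments(x, y):
    """Return means, population variances, and population covariance of x and y."""
    mx, my = mean(x), mean(y)
    var_x = mean([(a - mx) ** 2 for a in x])
    var_y = mean([(b - my) ** 2 for b in y])
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return mx, my, var_x, var_y, cov


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    _, _, var_x, var_y, cov = moments(x, y)
    return cov / sqrt(var_x * var_y)


def ccc(x, y):
    """Lin's concordance correlation coefficient: penalizes shifts in location and scale."""
    mx, my, var_x, var_y, cov = moments(x, y)
    return 2 * cov / (var_x + var_y + (mx - my) ** 2)


def mape(reference, measured):
    """Mean absolute percentage error relative to the reference (reference must be nonzero)."""
    return 100 * mean([abs(m - r) / abs(r) for r, m in zip(reference, measured)])


# Illustrative temperatures; zero is avoided because MAPE divides by the reference.
celsius = [10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
fahrenheit = [c * 9 / 5 + 32 for c in celsius]
kelvin = [c + 273.15 for c in celsius]

for name, other in (("°F", fahrenheit), ("K", kelvin)):
    print(
        f"°C vs {name}: rho = {pearson(celsius, other):.3f}, "
        f"CCC = {ccc(celsius, other):.3f}, MAPE = {mape(celsius, other):.0f}%"
    )
```

In this sketch the correlation stays at 1 for both comparisons while the CCC falls far below 1 and the MAPE exceeds 100%, which is the pattern the temperature example is meant to convey.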

To extend this logic to DHT validation, comparing the measurements of a digital device with the gold standard (e.g., manual measurements) requires assessing the agreement between the measurements yielded by both methods. For example, when assessing the reliability of a smartwatch in measuring walking distance, we want to know whether the smartwatch's measured distance is equal to (i.e., agrees with) the corresponding manual measurement; assessing correlation alone is therefore an incomplete methodological approach.

A.3 Individual-Level and Systematic-Level Reliability

Evaluating the reliability of a DHT implies assessing its fitness-for-purpose; however, given that DHTs can serve diverse purposes, the statistical methodology employed in their analytical validation must be tailored according to the context of use of the device of interest.

Thus, we divided the metrics into two categories, depending on whether they assess reliability on an individual or a systematic level. On the one hand, individual-level reliability, assessed by CCC, ICC, and BA LoAs, pertains to the reproducibility of measurements on an individual level, meaning how well each individual's corresponding measurements (device vs gold standard) agree. On the other hand, systematic-level reliability, assessed through MAPE and Bland-Altman analysis focused on bias, relates to agreement between measurements at the level of the tested group as a whole.

Consider the following example: two different thermometers are used to measure room temperature every hour over the course of a week. High individual-level reliability would indicate that both thermometers yield similar measurements at each hourly reading, whereas high systematic-level reliability would be indicated by both thermometers yielding similar mean temperatures each day without necessarily providing identical readings every hour.
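The thermometer example can be sketched with a basic Bland-Altman calculation, assuming the conventional limits of agreement of bias ± 1.96 standard deviations of the paired differences. All readings below are invented for illustration: hypothetical thermometer B scatters around the reference with no net offset, whereas hypothetical thermometer C tracks every reading but with a constant offset.

```python
from statistics import mean, stdev


def bland_altman(reference, device):
    """Return (bias, lower LoA, upper LoA) for paired measurements."""
    diffs = [d - r for r, d in zip(reference, device)]
    bias = mean(diffs)                # systematic-level: average difference
    spread = 1.96 * stdev(diffs)      # individual-level: scatter of the differences
    return bias, bias - spread, bias + spread


# Hypothetical hourly readings in °C from a reference thermometer.
reference = [20.0, 20.5, 21.0, 21.5, 22.0, 21.5, 21.0, 20.5]

# Thermometer B: alternating +1 / -1 °C errors that cancel out on average.
thermometer_b = [r + (1.0 if i % 2 == 0 else -1.0) for i, r in enumerate(reference)]

# Thermometer C: constant +0.5 °C offset on every reading.
thermometer_c = [r + 0.5 for r in reference]

for name, readings in (("B", thermometer_b), ("C", thermometer_c)):
    bias, lower, upper = bland_altman(reference, readings)
    print(f"Thermometer {name}: bias = {bias:.2f}, LoA = [{lower:.2f}, {upper:.2f}]")
```

In this sketch, thermometer B shows no bias but wide limits of agreement, so its mean matches the reference while individual readings do not. Thermometer C shows the opposite pattern: a nonzero bias with limits of agreement that collapse onto it.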

A.4 Regulatory Considerations

The FDA guidance on the integration of DHTs into clinical investigations is a key reference for industry professionals, investigators, and other stakeholders on the use of DHTs for remote data acquisition in clinical investigations.32

Currently, the FDA has approved 62 artificial intelligence-driven DHTs in the field of cardiology that either measure functional status or improve HF treatment tracking in patients.33 However, to date, there has been little systematic evaluation of actigraphy-based endpoints collected by DHTs against recognized functional endpoints.

The FDA's guidance emphasizes that the application of DHTs must be "fit-for-purpose," signifying that the level of validation accompanying these tools should be commensurate with their intended use and interpretability in clinical investigation. Key considerations encompass data accuracy, reliability, and security. DHTs enlisted for clinical trials must be meticulously selected, considering a range of factors, including the nature of data to be gathered, the target patient population, and any relevant regulatory prerequisites. The rationale for selecting a specific DHT and its intended use should be substantiated, thereby ensuring alignment with the study's objectives.

Clinical endpoints must be rigorously justified and precisely defined, encompassing the timing and tools employed for assessments. DHTs may replace established functional endpoints, necessitating validation to guarantee data reliability. The statistical methodologies used for analyzing data collected via DHTs should be clearly and pre-emptively defined, with consideration given to potential intercurrent events that may impact data integrity.

The rise of decentralized clinical trials (DCTs) heralds a transformative shift in clinical trial methodology. One such trial on SGLT2 inhibitors in HF highlights the importance of DCTs in novel therapies.34,35 These trials empower participants to receive treatments, undergo assessments, and provide data from the comfort of their homes or local healthcare facilities. The use of DHTs in DCTs offers several advantages, including increased patient access, enhanced convenience, personalized engagement, efficient data collection, and cost reduction.

Despite the evident benefits of DCTs, adherence to applicable regulations and guidelines remains paramount. Study design, patient recruitment, data collection, monitoring, and ethical considerations must align with regulatory standards to safeguard human subjects, ensure patient safety, and maintain data quality while upholding regulatory compliance in clinical trials. A summary of the interpretation and considerations is given in Appendix Figure 2.