# Assignment: Data Analysis of Vital Signs in Surgical Patients using VitalDB

## Domain Relevance
Monitoring vital signs is crucial in evaluating a patient's physiological state during surgical interventions. VitalDB, an extensive repository, offers access to detailed vital sign data recorded from surgical patients. This assignment aims to harness previously acquired skills by developing a client application for processing and analyzing vital signs data. 

## About the data
The VitalDB (Vital Signs DataBase) is an open dataset created specifically to facilitate machine learning studies related to monitoring vital signs in surgical patients. This dataset contains high-resolution multi-parameter data from 6,388 cases, including 486,451 waveform and numeric data tracks of 196 intraoperative monitoring parameters, 73 perioperative clinical parameters, and 34 time-series laboratory result parameters. All data is stored in the public cloud after anonymization. The dataset can be freely accessed and analysed using application programming interfaces and a Python library [1,2,3]. Documentation for VitalDB can be found here: https://physionet.org/content/vitaldb/1.0.0/vital_files/#files-panel


---
## Learning outcomes

1. You gain proficiency in interpreting instructions and documentation to access data from non-standard sources, such as VitalDB.
2. You demonstrate a high level of competence in applying Python and relevant libraries, as well as appropriate mathematical and statistical methods, to effectively identify patterns, causal relationships, and actionable insights.
3. You adeptly select appropriate data analysis methods and provide sound justifications for your choices, or creatively adapt existing methods to develop solutions for the problems.
4. You can integrate diverse knowledge domains, effectively handle complexity, and extract meaningful information from data, even in the presence of incomplete or challenging datasets.
5. You develop a maintainable, well-structured solution comprising modules for parsing, feature extraction, analysis, and visualization of signal data. You adhere to [the FAIR principles](https://en.wikipedia.org/wiki/FAIR_data). Your code is organized, well written, well documented, traceable via version control management systems, and suitably licensed.

--- 

## Instructions
- Read [the assessment criteria](#assessment-criteria) carefully.
- Conduct the tasks [as described below](#tasks).
- Upload the solution to this case study to a repository and submit the link to the Blackboard assignment. Make sure that your repository is private and invite your teachers and tutors.
- Make sure that you do not upload data or API keys to your repository. Use a config file and explain in the README how to access the data.
- Ensure that you acknowledge other people's contributions and any use of AI tools.
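
One possible way to keep data locations and keys out of the repository is sketched below, using Python's standard `configparser`. The file name `config.ini`, the section name `vitaldb`, and the keys `data_dir` and `api_key` are assumptions chosen for illustration; they are not prescribed by this assignment or by VitalDB.

```python
import configparser
from pathlib import Path


def load_settings(path="config.ini"):
    """Read data-access settings from a local config file kept out of version control."""
    config_path = Path(path)
    if not config_path.is_file():
        raise FileNotFoundError(
            f"{config_path} not found; see the README for how to create it."
        )
    parser = configparser.ConfigParser()
    parser.read(config_path)
    section = parser["vitaldb"]  # hypothetical section name
    return {
        "data_dir": Path(section["data_dir"]),  # hypothetical key: local data folder
        "api_key": section.get("api_key", fallback=None),  # hypothetical, optional key
    }
```

With this approach, the config file itself would be listed in `.gitignore`, and the README would describe which keys a user has to fill in.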


## Tasks:

### Data Acquisition and Preparation
- Read the instructions provided to access VitalDB and select one of the datasets.
- Identify at least two vital signals (tracks) from the chosen dataset for univariate and multivariate signal analysis. Choose signals that are clinically relevant to analyse together.
- Thoroughly inspect and clean the dataset, addressing datatype formats, missing values, outliers, and other anomalies.
- Using the insights from the previous steps, construct a parser module (a Python script that loads, cleans, and reformats the data into the required format).
  
**Deliverable: Parser module**
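
As a rough illustration of the cleaning step, the sketch below takes a pandas DataFrame with one column per track, coerces values to numbers, masks implausible values, and fills short gaps. The column names `HR` and `MBP`, the plausibility limits, and the `max_gap` parameter are assumptions made for this example, not clinical reference values or part of the assignment; loading the data from VitalDB is deliberately left out.

```python
import pandas as pd

# Illustrative plausibility limits per track; assumptions, not clinical reference values.
VALID_RANGES = {"HR": (20, 250), "MBP": (20, 200)}


def clean_tracks(raw, valid_ranges=VALID_RANGES, max_gap=5):
    """Return a cleaned copy of `raw`, a DataFrame with one column per track."""
    # Convert every column to numeric; unparseable entries become NaN.
    cleaned = raw.apply(pd.to_numeric, errors="coerce")
    for name, (low, high) in valid_ranges.items():
        if name in cleaned.columns:
            in_range = cleaned[name].between(low, high)
            # Values outside the plausible range are treated as missing.
            cleaned[name] = cleaned[name].where(in_range)
    # Linearly interpolate at most `max_gap` consecutive missing samples,
    # never extrapolating before the first or after the last valid value.
    return cleaned.interpolate(method="linear", limit=max_gap, limit_area="inside")
```

Whether interpolation, forward filling, or leaving gaps untouched is appropriate depends on the signal and the research question, so the choice should be justified in the documentation.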

### Feature Extraction and Analysis
- Explore the dataset to extract relevant features from the vital sign signals, enhancing the depth of data understanding.
- Using the insights from the previous steps, design and implement an analyzer module capable of summarizing signal data and derived features over specified time frames.

**Deliverable: Analyzer module**
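
Summarizing a signal over specified time frames could, for instance, be done by grouping samples into fixed-length windows. The sketch below assumes a pandas Series whose index holds elapsed time in seconds; the function name `summarize_windows` and the choice of statistics are illustrative only.

```python
import numpy as np
import pandas as pd


def summarize_windows(signal, window_s=60):
    """Summarize a numeric Series indexed by elapsed time in seconds."""
    # Label every sample with the start time of the window it falls into.
    window_start = (np.asarray(signal.index) // window_s) * window_s
    summary = signal.groupby(window_start).agg(["mean", "std", "min", "max", "count"])
    summary.index.name = "window_start_s"
    return summary
```

For example, `summarize_windows(hr, window_s=300)` would yield one row of statistics per five-minute window for a hypothetical heart-rate Series `hr`. Missing samples are ignored by these aggregations, so the `count` column indicates how much valid data each window contains.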

### Visualization
- Visualize the vital signal information using appropriate visualization techniques.
  
**Deliverable: Visualization module**
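
A minimal static starting point, assuming Matplotlib is used, is to plot each track in its own panel on a shared time axis. The function name `plot_tracks`, its parameters, and the axis label are assumptions for illustration; an interactive library could be substituted to meet the interactivity criterion.

```python
import matplotlib.pyplot as plt


def plot_tracks(time_s, tracks, units=None):
    """Plot each track in its own panel against a shared time axis."""
    units = units or {}
    fig, axes = plt.subplots(
        len(tracks), 1, sharex=True, figsize=(10, 2.5 * len(tracks)), squeeze=False
    )
    for ax, (name, values) in zip(axes[:, 0], tracks.items()):
        ax.plot(time_s, values, linewidth=0.8)
        unit = units.get(name)
        # Show the unit in the axis label when one is supplied.
        ax.set_ylabel(f"{name} ({unit})" if unit else name)
    # Assumes time is expressed in seconds since the start of the case.
    axes[-1, 0].set_xlabel("Time since case start (s)")
    fig.tight_layout()
    return fig
```

Sharing the x-axis keeps the panels aligned in time, which makes it easier to relate events in one signal to changes in another.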
  
### Bonus: Streaming Mode Solution
- Develop a streaming mode solution for real-time data analysis. You can choose to do this in a notebook or a script.
  
**Deliverable: Client module**
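
One simple way to prototype streaming behaviour without a live data source is to replay recorded samples in small chunks and update a statistic incrementally. The sketch below uses only the standard library; `stream_samples` and `RollingMean` are hypothetical names, and a real client would replace the replay generator with its actual data feed.

```python
import math
from collections import deque


def stream_samples(values, chunk_size=5):
    """Yield successive chunks of samples, imitating data arriving over time."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


class RollingMean:
    """Keep the mean of the most recent `size` valid samples."""

    def __init__(self, size):
        # A bounded deque drops the oldest sample automatically when full.
        self.window = deque(maxlen=size)

    def update(self, chunk):
        """Add a chunk of samples and return the current rolling mean."""
        for value in chunk:
            # Skip missing samples so they do not distort the mean.
            if value is not None and not math.isnan(value):
                self.window.append(value)
        return sum(self.window) / len(self.window) if self.window else None
```

A client loop would then call `update` once per incoming chunk, for example `for chunk in stream_samples(recorded_hr): print(monitor.update(chunk))`, where `recorded_hr` and `monitor` are placeholders.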

---

## Assessment Criteria:

* **Data Quality and Quantity (15 pt):** Rigorous evaluation of data quality, emphasizing accuracy, completeness, and consistency for time series analysis. Python code, leveraging the Pandas and NumPy libraries, implements specialized preprocessing steps tailored for time series datasets.

* **Interactive Visualization (10 pt):** Develop interactive visualizations with a temporal perspective, incorporating time-specific axes, intervals, and trends. Captions provide insights into time-related patterns, ensuring clarity and relevance to time series characteristics.

* **Design Alignment and Functionality (10 pt):** Project design aligns with time series research questions. The information extracted from time series data is presented in a manner that is insightful and pertinent. Tables and visualizations emphasize temporal patterns, offering a strong foundation for time series-focused research conclusions.

* **Maintainable solution (20 pt):** Programming code is logically organized in modules, and classes are implemented to support efficiency, readability, and maintainability.

* **Efficient and Error-free Code (10 pt):** Code adheres to coding standards, ensuring efficiency, readability, and error-free processing. The code aims for maintainability and flexibility, allowing for future modifications without compromising functionality.

* **Repository and Documentation (10 pt):** All code is stored in a repository with a comprehensive README file. The README emphasizes implementation details specific to time series analysis, addressing nuances in the data and methodologies. The codebase is well-documented, facilitating easy understanding and implementation.

* **Conduct Critical Research (20 pt):** Transparently state assumptions, justifying design choices with a focus on time series-specific considerations. Argumentative documentation, within the code or in a separate document, critically questions and engages with elements crucial for time series analysis.

* **References (5 pt):** Clearly documented references, following scientific citation standards, highlighting their relevance to time series analysis methodologies and techniques.

---
## References
[1] Lee, H., & Jung, C. (2022). VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients (version 1.0.0). PhysioNet. https://doi.org/10.13026/czw8-9p62.

[2] Lee, H.-C., Park, Y., Yoon, S. B., et al. (2022). VitalDB, a high-fidelity multi-parameter vital signs database in surgical patients. Scientific Data, 9, 279.

[3] Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation, 101(23), e215–e220.