laurajochim/MasterThesis

Methodology for generating and evaluating realistic synthetic healthcare data

This repository contains the code, data, and documentation for my Master's Thesis on generating and evaluating realistic synthetic healthcare datasets. It demonstrates a reproducible pipeline for:

Data Cleaning: preparing clinical (EHR) and proteomic (TCGA) datasets.

Synthetic Data Generation: creating synthetic patient records using multiple methods (synthpop, vine copula, CTGAN).

Evaluation: comparing univariate distributions and other metrics between real and synthetic data.
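To make the evaluation step concrete, here is a minimal sketch of one possible univariate comparison: the two-sample Kolmogorov-Smirnov statistic computed per column. It is an illustration only and is not taken from the thesis code. The function names (`ks_statistic`, `compare_univariate`), the dictionary-of-lists table layout, and the toy `age` values are all assumptions made for this example.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 means identical empirical distributions, 1.0 means
    the samples do not overlap at all.
    """
    real_sorted = sorted(real)
    synth_sorted = sorted(synthetic)
    n, m = len(real_sorted), len(synth_sorted)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    max_gap = 0.0
    # The largest gap is always attained at an observed value,
    # so it is enough to evaluate both CDFs at every data point.
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect_right(real_sorted, x) / n
        cdf_synth = bisect_right(synth_sorted, x) / m
        max_gap = max(max_gap, abs(cdf_real - cdf_synth))
    return max_gap


def compare_univariate(real_table, synthetic_table):
    """Return {column: KS statistic} for numeric columns found in both tables.

    Tables are plain dicts mapping a column name to a list of numbers
    (a hypothetical layout chosen to keep the sketch dependency-free).
    """
    shared = [col for col in real_table if col in synthetic_table]
    return {col: ks_statistic(real_table[col], synthetic_table[col]) for col in shared}


if __name__ == "__main__":
    # Made-up toy values, not data from this repository.
    real = {"age": [34, 45, 51, 62, 70]}
    synthetic = {"age": [30, 44, 53, 60, 75]}
    print(compare_univariate(real, synthetic))
```

A smaller statistic indicates that the synthetic column follows the real column's distribution more closely. The statistic only covers numeric columns and says nothing about relationships between variables, so it would complement rather than replace the other metrics mentioned above.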

The study was approved by the ethics committee of the Faculty of Social and Behavioural Sciences of Utrecht University:

FETC: 24-2032

This archive is hosted on GitHub and will remain accessible indefinitely. I am responsible for the research archive. If you have any questions, feel free to contact me at l.jochim@students.uu.nl
