Modeling with patient biomarker data

This repository is a self-contained demonstration of my approach to exploring a dataset and building a machine learning model for a binary classification task with missing data.
Author: Zachary Levonian
A good entry point to this analysis is the Jupyter notebook that trains and evaluates models to predict the binary outcome. Initial exploration and description of the data are in a separate Jupyter notebook.
The synthetic patient data was provided by Tempus. I don't have permission to share the data, although you can see excerpts in the analysis notebooks.
Data is assumed to be present in the data folder.
Just run make install. Requires Python 3.10 or greater.
Poetry is used for managing Python dependencies, and will be installed if it isn't already available.
The directory layout is:
- notebook contains the analysis notebooks.
- src contains the bcs Python package with helper functions and classes to support the analysis.
- data is presumed to be the location of the input data; see the Data section for more details.
- figures contains any images produced within the analysis notebooks.
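
Since the data itself is not distributed, it can help to confirm the layout before opening the notebooks. The following is a minimal sketch, not part of the bcs package: the `missing_dirs` function and `EXPECTED_DIRS` constant are hypothetical names introduced here, and the check assumes the four directories listed above sit directly under the repository root.

```python
from pathlib import Path

# Directory names taken from the layout described above.
EXPECTED_DIRS = ("notebook", "src", "data", "figures")


def missing_dirs(repo_root: Path) -> list[str]:
    """Return the expected top-level directories that are absent under repo_root."""
    return [name for name in EXPECTED_DIRS if not (repo_root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: this script is run from the repository root.
    missing = missing_dirs(Path.cwd())
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```

A missing data directory is the case most likely to matter, because the input data has to be supplied separately.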