This repository contains the prototype implementation of my 2019 Master's thesis at the University of Heidelberg, written in the medical context of the German Cancer Research Center (DKFZ):
Platform to Assist Medical Experts in Training, Application, and Control of Machine Learning Models Using Patient Data from a Clinical Information System
The full thesis and further information are published in the Heidelberg Document Repository: HeiDOKs.
Changes and developments after July 2019 were made independently of Heidelberg University and the DKFZ.
All data objects are stored using UUIDs to avoid conflicts with similar objects.
- /data is created and mounted by default to store the platform's data.
- /data/MMLP/models contains all data related to models, including training snapshots.
- /data/MMLP/datasets contains all data sets uploaded by the user.
- /data/MMLP/results contains the predictions produced when a user uploads data and applies a model.
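As a rough illustration of the layout above, the following sketch creates a UUID-named directory under one of the three folders. The helper `new_object_dir`, the `KINDS` mapping, and the `root` parameter are hypothetical names chosen for this example; they are not taken from the backend, whose storage code may work differently.

```python
import tempfile
import uuid
from pathlib import Path

# Sub-folders named in the list above; the mapping keys are illustrative only.
KINDS = {"model": "models", "dataset": "datasets", "result": "results"}


def new_object_dir(kind: str, root: Path = Path("/data/MMLP")) -> Path:
    """Create and return a fresh UUID-named directory for one data object.

    Hypothetical helper: it only illustrates how UUID names keep
    similar objects from colliding on disk.
    """
    object_dir = root / KINDS[kind] / str(uuid.uuid4())
    # exist_ok=False makes an (extremely unlikely) name clash fail loudly.
    object_dir.mkdir(parents=True, exist_ok=False)
    return object_dir


if __name__ == "__main__":
    # Use a temporary root so the demo does not touch a real /data folder.
    with tempfile.TemporaryDirectory() as tmp:
        first = new_object_dir("dataset", Path(tmp))
        second = new_object_dir("dataset", Path(tmp))
        print(first)
        print(second)
```

Two uploads of the same file would end up in two distinct directories, which is the property the UUID naming is meant to provide.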
The configuration is part of the backend; see backend/README and backend/mmlp/config.py.
This prototype platform supports on-premise, hybrid, and public cloud deployments. It has been tested on Amazon Web Services, Microsoft Azure, and Google Cloud. If you need assistance, please contact me.
Before you attempt to deploy the platform, please ensure your system meets the following requirements:
- Docker is installed
- GPU support is available within Docker (if you run machine learning on GPUs). For Nvidia GPUs: https://github.com/NVIDIA/nvidia-docker. For AMD GPUs: https://rocm.github.io/
- If you do not change the default configuration, the global folder /data is used to store all data related to the platform. It can consume a lot of disk space, depending on your model and data set. If you use a distributed computing environment, please ensure this folder is appropriately shared between the computing nodes. Note: distribution and scaling are planned but not yet implemented. Please contact me for further information.
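The requirements above can be checked roughly before deploying. The sketch below is not part of the platform; the function name `check_prerequisites` and the 10 GiB default threshold are assumptions made for illustration (the README states no minimum disk size), and GPU support inside Docker is not checked.

```python
import shutil
from pathlib import Path
from typing import List


def check_prerequisites(data_dir: str = "/data", min_free_gib: float = 10.0) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK.

    The default threshold is an arbitrary placeholder, not a documented
    requirement of the platform.
    """
    problems = []
    for tool in ("docker", "docker-compose"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    # Walk up to the nearest existing parent, since data_dir may not exist yet.
    path = Path(data_dir).resolve()
    while not path.exists():
        path = path.parent
    free_gib = shutil.disk_usage(path).free / 1024**3
    if free_gib < min_free_gib:
        problems.append(f"only {free_gib:.1f} GiB free for {data_dir}")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```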
- Clone the repository:
git clone https://github.com/magreiner/MMLP
cd MMLP
- Adjust the configuration
vi backend/mmlp/Config.py
- Deploy the platform. The platform can be deployed using docker-compose:
# build the containers (repeat this step every time you change the code or the configuration)
docker-compose build --parallel
# foreground deployment (useful for development, showing the logs directly):
docker-compose up
# background deployment as a service (follow the logs via docker-compose logs -f)
docker-compose up -d
- Enjoy. If deployed locally, you can access the platform on port 80 at http://localhost
Note:
- HTTPS is not enabled by default because of the added complexity of managing certificates. Let's Encrypt is recommended for creating certificates.
- Sometimes the browser tries to switch to https automatically and fails. If the platform is not showing as expected, check that your browser is using http:// rather than https://.
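To rule out browser behaviour such as the automatic switch to https, the deployment can also be probed over plain HTTP from a script. This is a minimal sketch using only the Python standard library; the function name `is_reachable` and the timeout value are assumptions, and the URL is the local default mentioned above.

```python
import urllib.error
import urllib.request


def is_reachable(url: str = "http://localhost", timeout: float = 5.0) -> bool:
    """Return True if the URL answers with any HTTP response."""
    # Ignore proxy settings so a local deployment is contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, or similar.
        return False


if __name__ == "__main__":
    print("platform reachable:", is_reachable())
```

If this reports the platform as reachable while the browser shows an error, the browser is the more likely culprit.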
- Option to switch between the Clinical Data Scientist (Developer) view and the Medical Expert (User) view
- Training Pipeline: these pages have dynamic content based on the model used, so this view can vary greatly depending on the model's functionality. Due to copyright, no model is currently included in this prototype.
- Method Overview: a method represents a model snapshot that is exported and made available to a medical expert. It can be used without further configuration.
- Result View: an overview of the results produced when a user applies a method. This is intended to allow further debugging by the clinical data scientist.
Various containers were helpful during development. Maybe they can be useful for you, too:
- PACS Container Stack (based on https://www.dcm4che.org)
- Dataset Generators
- Port Redirect
- Postprocessing
- Preprocessing
- Visdom-Docker
The platform (as of July 2019) was evaluated by clinical data scientists and medical experts. For details, consult the thesis published here: http://www.ub.uni-heidelberg.de/archiv/27446