sifrproject/UMLS-Types-assignor

Python Jupyter Notebook MySQL TensorFlow Pandas UMLS

UMLS Metathesaurus - Semantic Network Machine Learning

📖 Description :

A machine learning model that learns from the Unified Medical Language System (UMLS) Metathesaurus database in order to tag new graphs with types from the UMLS Semantic Network.

🚀 How to use :

0- Complete the .env file with the following variables :

HOST=<host_of_your_umls_database>
USER=<user_of_your_umls_database>
PASSWORD=<password_of_your_umls_database>
DB=<name_of_your_umls_database>
UMLS_API_KEY=<your_api_key>
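
Before launching the pipeline, it can help to confirm that all five variables are filled in. The snippet below is a minimal sketch and not part of the repository: `parse_env_text`, `missing_keys` and `load_env` are hypothetical helper names, and the project itself may load the file differently. It reads the file directly instead of the process environment, since `USER` is commonly already set by the shell.

```python
from pathlib import Path

# The variable names listed above; everything else here is illustrative.
REQUIRED_KEYS = ("HOST", "USER", "PASSWORD", "DB", "UMLS_API_KEY")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


def load_env(path=".env"):
    """Read the .env file and fail early if a required variable is missing."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    missing = missing_keys(values)
    if missing:
        raise KeyError(f"Missing .env variables: {', '.join(missing)}")
    return values
```

Calling `load_env()` from the project root returns a dictionary of the settings, or raises a `KeyError` naming the variables that still need a value.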

1- Install the required packages

$ pip install -r requirements.txt

2.1- Modify the configuration file as needed

It is very important to check all the parameters before launching the pipeline.

2.2- Launch pipeline

You can use flags to customize the pipeline's arguments:

$ python main.py -h

> usage: main.py [-h] [--verbose] [--only_source] [--only_preprocess] [--from_preprocess] [--only_training]
               [--limit LIMIT] [--debug_output_path DEBUG_OUTPUT_PATH] --run_name RUN_NAME

optional arguments:
  -h, --help            show this help message and exit
  --verbose             Active verbose mode.
  --only_source         Pipeline launchs only the generation of the source data.
  --only_preprocess     Pipeline launchs only the preprocess of the source data.
  --from_preprocess     Pipeline launchs from the preprocess of the source data.
  --only_training       Pipeline launchs only the training of the preprocessed data.
  --limit LIMIT         Limit of the source data number generated.
  --debug_output_path DEBUG_OUTPUT_PATH
                        Path of the output log.
  --run_name RUN_NAME   REQUIRED: Name of the run.
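
For readers who want to see how such an interface fits together, the following is a minimal `argparse` sketch that mirrors the flags listed above. It is an illustration only, not the repository's `main.py`: the `stages_to_run` helper and the stage names `source`, `preprocess` and `training` are assumptions about how the flags map to pipeline steps.

```python
import argparse


def build_parser():
    """Build a parser exposing the same flags as the help text above."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--only_source", action="store_true")
    parser.add_argument("--only_preprocess", action="store_true")
    parser.add_argument("--from_preprocess", action="store_true")
    parser.add_argument("--only_training", action="store_true")
    parser.add_argument("--limit", type=int, default=None)
    parser.add_argument("--debug_output_path", type=str, default=None)
    parser.add_argument("--run_name", type=str, required=True)
    return parser


def stages_to_run(args):
    """Hypothetical mapping from flags to pipeline stages (an assumption)."""
    if args.only_source:
        return ["source"]
    if args.only_preprocess:
        return ["preprocess"]
    if args.from_preprocess:
        return ["preprocess", "training"]
    if args.only_training:
        return ["training"]
    # No restricting flag: run the whole pipeline.
    return ["source", "preprocess", "training"]


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    print(cli_args.run_name, stages_to_run(cli_args))
```

Because `--run_name` is declared with `required=True`, omitting it makes `argparse` exit with a usage error, which matches the `REQUIRED` note in the help text.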

Examples:

  • Launching all pipeline (data generation + preprocess + training & test + graph prediction)
$ python main.py --run_name="NAME_OF_THE_EXPERIMENT_RUN"

  • Launching only the source data generation in verbose mode, limited to 100 entries, which generates a new artefact/data.csv
$ python main.py --run_name="NAME_OF_THE_EXPERIMENT_RUN" --only_source --limit=100 --verbose

  • Launching from the preprocess step, which generates a new artefact/preprocessed_data.csv, then runs training & test + graph prediction
$ python main.py --run_name="NAME_OF_THE_EXPERIMENT_RUN" --from_preprocess

The most used command:

  • Launching training & test + graph prediction in verbose mode
$ python main.py --run_name="NAME_OF_THE_EXPERIMENT_RUN" --only_training --verbose

3- Use the MLflow UI to visualize the data at localhost:5000

$ mlflow ui

⚡ UMLS API :

We built our own UMLS API to get data from the UMLS Metathesaurus database. To use it, you need to install the UMLS database locally. You can download the database from here and install it following these instructions. Then, you need to import the umls_api Python package.

We also use the UMLS REST API to get the data from UMLS Metathesaurus.
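
As an illustration of the REST side, the sketch below builds a concept lookup URL for the public UTS REST service and fetches it with the standard library. It is not taken from this repository: `concept_url` and `fetch_concept` are hypothetical names, and the base URL, path layout and `apiKey` query parameter reflect the UTS REST API as commonly documented, so they should be checked against the current NLM documentation before use.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base URL of the UTS REST service (verify against the NLM docs).
BASE_URL = "https://uts-ws.nlm.nih.gov/rest"


def concept_url(cui, api_key, version="current"):
    """Build the lookup URL for one concept unique identifier (CUI)."""
    query = urlencode({"apiKey": api_key})
    return f"{BASE_URL}/content/{version}/CUI/{cui}?{query}"


def fetch_concept(cui, api_key, version="current"):
    """Fetch the concept record as parsed JSON (performs a network call)."""
    with urlopen(concept_url(cui, api_key, version), timeout=30) as response:
        return json.load(response)
```

The key passed as `api_key` would be the `UMLS_API_KEY` value from the .env file; `fetch_concept` needs network access and a valid key, while `concept_url` can be checked offline.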

💾 Model used :

Keras Visualization model
