Skip to content

CoronAI: A Machine Learning Toolkit for Coronavirus

Notifications You must be signed in to change notification settings

shayanfazeli/coronai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

31 Commits
 
 
 
 
 
 
 
 

Repository files navigation

CoronAI: A Machine Learning Toolkit for Coronavirus

An ERLab Project

Introdcution

Please view this article in that regard.

Installation

Clone this github repository and run:

python3 setup.py install

Usage

Please find sample experiment commands below:

Segment to Vector

To generate the mathematical vectors for our text segments, the following command can be used:

coronai_segment2vector --gpu=2 --input_csv=.../text_segment_dataset.csv \
--output_pkl=.../output.pkl \
--path_to_bert_weights=.../NCBI_BERT_pubmed_mimic_uncased_L-24_H-1024_A-16 \
--batch_size=25

This batch size works fine on a 2080Ti GPU. Note that part of this code is responsible for preprocessing and storing AllenNLP compatible instances to be fed to the BERT-based pipeline. There is no need to create them more than once, so if the input_text_sequence_instances.pkl file exists in the folder of your output_pkl file, it will be used directly. You can also get it from our shared drive, along with some representations and other files.

Unsupervised Segment Group Discovery

To do this, you need the mathematical representations of course. When you have those, the following command should be used:

coronai_unsupervised_clustering --method_name=kmeans \
--method_params=[random_state:int:2019] --method_searchspace=n_clusters:5:100 \
--output_bundle=...path_to_output_folder/kmeans.pkl \
--input_files=...path_to/full_representations_dataset.pkl
coronai_unsupervised_clustering --method_name=birch \
--method_params=[branching_factor:int:50,threshold:float:0.5] \
--method_searchspace=n_clusters:3:100 \
--output_bundle=...path_to_output_folder/birch.pkl \
--input_files=...path_to/full_representations_dataset.pkl

The method searchspace allows you to choose one parameter and do a search over it. In our case, we use number of clusters. In every step, the output file will be updated (so you do not need to wait forever to be able to check things). Also, the format for inputting method parameters are obvious.

Documentation

The documentation for this project is available in here

username: user
password: mlisLife

CoronAI Shared Drive

Please view here

About

CoronAI: A Machine Learning Toolkit for Coronavirus

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published