Performing NER on Medical Datasets
-
Install PyTorch
PyTorch is required by some of the libraries. The correct build depends on your machine (operating system, package manager, and CPU or CUDA support), so it has to be installed manually.
See https://pytorch.org/get-started/locally/ for installation instructions.
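Once PyTorch is installed, a quick check like the one below can confirm that it imports and show whether CUDA is visible. This is only a sketch: the helper name torch_status is made up for illustration and is not part of this repository.

```python
import importlib.util


def torch_status():
    """Describe the local PyTorch install (hypothetical helper, not part of this repo)."""
    # find_spec returns None when the top-level package cannot be found
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed"
    import torch

    return f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"


if __name__ == "__main__":
    print(torch_status())
```

If CUDA is reported as unavailable on a GPU machine, the installed build probably does not match the local CUDA setup, and the selector on the PyTorch page is the place to pick a different one.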
-
Installing requirements
The requirements can be installed with the pipenv dependency manager. Install pipenv with the command:
pip install pipenv
Then you can install the requirements using the command:
pipenv install
Note: The dependencies are fairly large, so a stable internet connection and some patience are required.
-
Installing language model
We need to download a spaCy model. Download it with the command:
python -m spacy download en_core_web_lg
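To confirm the model downloaded correctly, it can be loaded and run on a sample sentence. The sketch below is an assumption about how one might test the install, not the code this project uses; extract_entities and the sample sentence are hypothetical. Note that en_core_web_lg is a general-purpose English model, so on its own it produces general labels such as dates and organisations.

```python
import importlib.util


def extract_entities(text, model_name="en_core_web_lg"):
    """Return (entity text, label) pairs, or None if spaCy or the model is missing.

    Hypothetical helper for checking the install; not part of this repo.
    """
    if importlib.util.find_spec("spacy") is None:
        return None
    import spacy

    try:
        nlp = spacy.load(model_name)
    except OSError:
        # spaCy raises OSError when the named model has not been downloaded
        return None
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


if __name__ == "__main__":
    # Made-up sample sentence, used only to exercise the pipeline
    print(extract_entities("The patient was seen in Boston on 12 March 2021."))
```

A result of None means either spaCy or the model is missing; in that case, rerun the install and download steps.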
-
We need Jupyter installed to run the notebook:
pip install jupyter
-
Launch the notebook interface using the command:
jupyter notebook
-
Navigate to the required notebook to view it.
The NER data extractor can also be run from the CLI, using this command from the base directory:
python ner/medical_ner.py <location-to-file>
For instance,
python ner/medical_ner.py assets/txt_reports/report_0.txt
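To process several reports in one go, the same command can be called once per file. The sketch below assumes only what the command above shows, namely that ner/medical_ner.py takes a single file path as its argument; the helper names build_commands and run_all are hypothetical, and the directory is taken from the example path.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(report_dir, script="ner/medical_ner.py"):
    """Build one command per .txt file in report_dir, in sorted order."""
    reports = sorted(Path(report_dir).glob("*.txt"))
    # sys.executable keeps the call inside the currently active environment
    return [[sys.executable, script, str(path)] for path in reports]


def run_all(report_dir):
    """Run the extractor on each report; stops at the first failing command."""
    for cmd in build_commands(report_dir):
        subprocess.run(cmd, check=True)
```

Calling run_all("assets/txt_reports") from the base directory would then run the extractor on every .txt file in that folder.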