# Preprocessor proof of concept

This notebook is a proof of concept for a preprocessor that can automatically preprocess a text before it is used downstream. The preprocessor is based on [medSpaCy](https://github.com/medspacy/medspacy), a package that extends [spaCy](https://spacy.io/) with additional functionality for processing clinical text.

## 1. Importing the necessary libraries

We will start by importing the necessary libraries. As mentioned above, we will use `medspacy` to preprocess the text, so it must be installed before running this notebook.

In [3]:
import medspacy
# Needed to visualize the matched entities
from medspacy.visualization import visualize_ent

The preprocessor needs rules to apply to the text. We will use a simple set of rules for this proof of concept. The rules are specified as a list of `TargetRule` objects. Each `TargetRule` object has, among others, the following attributes:

- `literal`: The literal string to match in the text
- `category`: The category to assign to the matched text

The preprocessor will look for the `literal` string in the text and assign the `category` to the matched text.

We will import the `TargetRule` class.

In [None]:
from medspacy.ner import TargetRule
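
To make the matching behaviour concrete, here is a minimal sketch in plain Python of what a literal rule does conceptually. It is not medspaCy's implementation (medspaCy matches on spaCy tokens rather than raw characters). `SimpleRule` and `find_matches` are hypothetical names used only for illustration, and the case-insensitive comparison is an assumption made for this sketch.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleRule:
    """Hypothetical stand-in for a rule with a literal and a category."""
    literal: str
    category: str


def find_matches(text, rules):
    """Return (start, end, matched_text, category) tuples, sorted by position."""
    matches = []
    for rule in rules:
        # re.escape treats the literal as plain text, not as a regex.
        pattern = re.compile(re.escape(rule.literal), re.IGNORECASE)
        for m in pattern.finditer(text):
            matches.append((m.start(), m.end(), m.group(0), rule.category))
    return sorted(matches)


rules = [SimpleRule("Biktarvy", "DRUG"), SimpleRule("Emtricitabine", "INGREDIENT")]
sample = "Do not take Biktarvy if you are allergic to emtricitabine."
for start, end, matched, category in find_matches(sample, rules):
    print(f"{start:>3}-{end:<3} {matched!r} -> {category}")
```

Each tuple carries the character span of the match together with the category of the rule that produced it, which is the same information the preprocessor attaches to the text.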

## 2. The sample text

As an example, we will use the following XHTML fragment:

```text
<div xmlns=\"http://www.w3.org/1999/xhtml\"><p><strong>Do not take Biktarvy</strong></p><ul><li>If you are allergic to bictegravir, emtricitabine, tenofovir alafenamide or any of the other ingredients of this medicine (listed in section 6).</li><li><p>If you are currently taking any of the following medicines:</p><ul><li>rifampicin used to treat some bacterial infections such as tuberculosis</li><li>St. John’s wort (Hypericum perforatum), a herbal remedy used for depression and anxiety, or products that contain it.</li></ul></li><li><p>If any of these apply to you, <strong>do not take Biktarvy and tell your doctor immediately.</strong></p></li></ul><p><strong>Warnings and precautions</strong></p><p><strong>Talk to your doctor before taking Biktarvy:</strong></p><ul><li>If you have liver problems or a history of liver disease, including hepatitis. Patients with liver disease including chronic hepatitis B or C, who are treated with antiretrovirals, have a higher risk of severe and potentially fatal liver complications. If you have hepatitis B infection, your doctor will carefully consider the best treatment regimen for you.</li><li><p>If you have hepatitis B infection. Liver problems may become worse after you stop taking Biktarvy.</p></li><li><p>Do not stop taking Biktarvy if you have hepatitis B. Talk to your doctor first. For more details, see section 3, Do not stop taking Biktarvy.</p></li><li><p>If you have had kidney disease or if tests have shown problems with your kidneys. Your doctor may order blood tests to monitor how your kidneys work when starting and during treatment with Biktarvy.</p></li></ul><p><strong>While you are taking Biktarvy</strong></p><p>Once you start taking Biktarvy, look out for:</p><ul><li>Signs of inflammation or infection</li><li>Joint pain, stiffness or bone problems</li></ul><p><strong>If you notice any of these symptoms, tell your doctor immediately. 
For more information see section 4, Possible side effects.</strong></p><p>There is a possibility that you may experience kidney problems when taking Biktarvy over a long period of time (see Warnings and precautions).</p><p>This medicine is not a cure for HIV infection. While taking Biktarvy you may still develop infections or other illnesses associated with HIV infection.</p><p><strong>Children and adolescents</strong></p><p>Do not give this medicine to children under 2 years of age, or weighing less than 14 kg regardless of age. The use of Biktarvy in children under 2 years of age, or weighing less than 14 kg has not yet been studied. For children and adolescents who weigh 25 kg or more, Biktarvy 50 mg/200 mg/25 mg film-coated tablets are available.</p><p>Loss of bone mass has been reported in some children from 3 to less than 12 years of age who received one of the medicinal products (tenofovir alafenamide) contained in Biktarvy. The effects on long term bone health and future fracture risk in children is uncertain. Your doctor will monitor your child’s bone health as needed.</p><p><strong>Other medicines and Biktarvy</strong></p><p>Tell your doctor or pharmacist if you are taking, have recently taken or might take any other medicines. Biktarvy may interact with other medicines. As a result, the amounts of Biktarvy or other medicines in your blood may change. This may stop your medicines from working properly, or may make any side effects worse. In some cases, your doctor may need to adjust your dose or check your blood levels.</p><p><strong>Medicines that must never be taken with Biktarvy:</strong></p><ul><li>rifampicin used to treat some bacterial infections such as tuberculosis</li><li>St. 
John’s wort (Hypericum perforatum), a herbal remedy used for depression and anxiety, or products that contain it.</li><li>If you are taking any of these medicines, do not take Biktarvy and tell your doctor immediately.</li></ul><p><strong>Talk to your doctor if you are taking:</strong></p><ul><li>medicines used for treating HIV and/or hepatitis B, containing:<ul><li>adefovir dipivoxil, atazanavir, bictegravir, emtricitabine, lamivudine, tenofovir alafenamide, or tenofovir disoproxil</li></ul></li><li>antibiotics used to treat bacterial infections, containing:<ul><li>azithromycin, clarithromycin, rifabutin or rifapentine</li></ul></li><li>anticonvulsants used to treat epilepsy, containing:<ul><li>carbamazepine, oxcarbazepine, phenobarbital or phenytoin</li></ul></li><li>immunosuppressants used to control your body’s immune response after a transplant, containing ciclosporin</li><li><p>ulcer-healing medicines containing sucralfate</p></li><li><p>Tell your doctor if you are taking any of these medicines. Do not stop your treatment without contacting your doctor.</p></li></ul><p><strong>Get advice from a doctor or pharmacist if you are taking:</strong></p><ul><li>antacids to treat stomach ulcers, heartburn, or acid reflux, containing aluminium and/or magnesium hydroxide</li><li>mineral supplements or vitamins containing magnesium or iron</li><li>Get advice from your doctor or pharmacist before taking Biktarvy if you are taking any of these medicines.</li></ul><p>Antacids and magnesium supplements: you will need to take Biktarvy at least 2 hours before antacids or supplements containing aluminium and/or magnesium. 
Or you can take Biktarvy with food at least 2 hours after.</p><p>Iron supplements: you will need to take Biktarvy at least 2 hours before iron supplements, or you can take them together with food.</p><p><strong>Pregnancy and breast-feeding</strong></p><ul><li>If you are pregnant or breast-feeding, think you may be pregnant or are planning to have a baby, ask your doctor or pharmacist for advice before taking this medicine.</li><li>Tell your doctor immediately if you become pregnant and ask about the potential benefits and risks of your antiretroviral therapy to you and your child.</li></ul><p>If you have taken Biktarvy during your pregnancy, your doctor may request regular blood tests and other diagnostic tests to monitor the development of your child. In children whose mothers took nucleoside reverse transcriptase inhibitors (NRTIs) during pregnancy, the benefit from the protection against HIV outweighed the risk of side effects.</p><p><strong>Do not breast-feed during treatment with Biktarvy.</strong> This is because some of the active substances in this medicine pass into human breast milk. Breast-feeding is not recommended in women living with HIV because HIV infection can be passed on to the baby in breast milk. If you are breast-feeding, or thinking about breast-feeding, you should discuss it with your doctor as soon as possible.</p><p><strong>Driving and using machines</strong></p><p>Biktarvy can cause dizziness. If you feel dizzy when taking Biktarvy, do not drive or ride a bicycle and do not use any tools or machines.</p><p><strong>Biktarvy contains sodium</strong></p><p>This medicine contains less than 1 mmol sodium (23 mg) per tablet, that is to say essentially ‘sodium-free’.</p></div>
```

This text is extracted from the Biktarvy leaflet. It serves as the running example to show how the preprocessor works.


## 3. Defining the rules

Now we will define the rules to be used by the preprocessor. For this proof of concept a small set is enough, defined as a list of `TargetRule` objects.

In [11]:
target_rules = [
    TargetRule("Biktarvy", "DRUG"),
    TargetRule("bictegravir", "INGREDIENT"),
    TargetRule("Emtricitabine", "INGREDIENT"),
    TargetRule("Tenofovir alafenamide", "INGREDIENT"),
    TargetRule("Rifampicin", "INGREDIENT"),
    TargetRule("St. John’s wort", "DRUG"),
    TargetRule("Hypericum perforatum", "INGREDIENT"),
    TargetRule("Pregnancy", "CONDITION"),
    TargetRule("Breast-feeding", "CONDITION"),
    TargetRule("HIV", "CONDITION"),
]

Later in this project, we will define a more complex set of rules for the preprocessor and load them from a JSON file.
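
As a rough sketch of that later step, the rules could be stored as JSON and turned into `(literal, category)` pairs. The `target_rules` key and the `load_rule_specs` helper are assumptions made for this illustration, not a format fixed by this project.

```python
import json

# Hypothetical file content; in practice this would be read from disk.
RULES_JSON = """
{
  "target_rules": [
    {"literal": "Biktarvy", "category": "DRUG"},
    {"literal": "bictegravir", "category": "INGREDIENT"},
    {"literal": "HIV", "category": "CONDITION"}
  ]
}
"""


def load_rule_specs(raw_json):
    """Parse JSON text and return a list of (literal, category) pairs."""
    data = json.loads(raw_json)
    specs = []
    for entry in data["target_rules"]:
        # Fail early on malformed entries instead of building broken rules.
        if not entry.get("literal") or not entry.get("category"):
            raise ValueError(f"Rule needs 'literal' and 'category': {entry!r}")
        specs.append((entry["literal"], entry["category"]))
    return specs


rule_specs = load_rule_specs(RULES_JSON)
print(rule_specs)
```

In the notebook, `[TargetRule(literal, category) for literal, category in rule_specs]` would then rebuild a rule list like the one defined above.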

## 4. Preprocessing the text

We will now preprocess the text with the `medspacy` package, using the rules defined above. The preprocessor will look for each `literal` string in the text and assign the corresponding `category` to every match.

In [4]:
nlp = medspacy.load()


In [5]:
# Triple quotes let the sample text span several lines
text = """<div xmlns=\"http://www.w3.org/1999/xhtml\"><p><strong>Do not take Biktarvy</strong></p><ul><li>If you are allergic to bictegravir, emtricitabine, tenofovir alafenamide or any of the other ingredients of this medicine (listed in section 6).</li><li><p>If you are currently taking any of the following medicines:</p><ul><li>rifampicin used to treat some bacterial infections such as tuberculosis</li><li>St. John’s wort (Hypericum perforatum), a herbal remedy used for depression and anxiety, or products that contain it.</li></ul></li><li><p>If any of these apply to you, <strong>do not take Biktarvy and tell your doctor immediately.</strong></p></li></ul><p><strong>Warnings and precautions</strong></p><p><strong>Talk to your doctor before taking Biktarvy:</strong></p><ul><li>If you have liver problems or a history of liver disease, including hepatitis. Patients with liver disease including chronic hepatitis B or C, who are treated with antiretrovirals, have a higher risk of severe and potentially fatal liver complications. If you have hepatitis B infection, your doctor will carefully consider the best treatment regimen for you.</li><li><p>If you have hepatitis B infection. Liver problems may become worse after you stop taking Biktarvy.</p></li><li><p>Do not stop taking Biktarvy if you have hepatitis B. Talk to your doctor first. For more details, see section 3, Do not stop taking Biktarvy.</p></li><li><p>If you have had kidney disease or if tests have shown problems with your kidneys. Your doctor may order blood tests to monitor how your kidneys work when starting and during treatment with Biktarvy.</p></li></ul><p><strong>While you are taking Biktarvy</strong></p><p>Once you start taking Biktarvy, look out for:</p><ul><li>Signs of inflammation or infection</li><li>Joint pain, stiffness or bone problems</li></ul><p><strong>If you notice any of these symptoms, tell your doctor immediately. 
For more information see section 4, Possible side effects.</strong></p><p>There is a possibility that you may experience kidney problems when taking Biktarvy over a long period of time (see Warnings and precautions).</p><p>This medicine is not a cure for HIV infection. While taking Biktarvy you may still develop infections or other illnesses associated with HIV infection.</p><p><strong>Children and adolescents</strong></p><p>Do not give this medicine to children under 2 years of age, or weighing less than 14 kg regardless of age. The use of Biktarvy in children under 2 years of age, or weighing less than 14 kg has not yet been studied. For children and adolescents who weigh 25 kg or more, Biktarvy 50 mg/200 mg/25 mg film-coated tablets are available.</p><p>Loss of bone mass has been reported in some children from 3 to less than 12 years of age who received one of the medicinal products (tenofovir alafenamide) contained in Biktarvy. The effects on long term bone health and future fracture risk in children is uncertain. Your doctor will monitor your child’s bone health as needed.</p><p><strong>Other medicines and Biktarvy</strong></p><p>Tell your doctor or pharmacist if you are taking, have recently taken or might take any other medicines. Biktarvy may interact with other medicines. As a result, the amounts of Biktarvy or other medicines in your blood may change. This may stop your medicines from working properly, or may make any side effects worse. In some cases, your doctor may need to adjust your dose or check your blood levels.</p><p><strong>Medicines that must never be taken with Biktarvy:</strong></p><ul><li>rifampicin used to treat some bacterial infections such as tuberculosis</li><li>St. 
John’s wort (Hypericum perforatum), a herbal remedy used for depression and anxiety, or products that contain it.</li><li>If you are taking any of these medicines, do not take Biktarvy and tell your doctor immediately.</li></ul><p><strong>Talk to your doctor if you are taking:</strong></p><ul><li>medicines used for treating HIV and/or hepatitis B, containing:<ul><li>adefovir dipivoxil, atazanavir, bictegravir, emtricitabine, lamivudine, tenofovir alafenamide, or tenofovir disoproxil</li></ul></li><li>antibiotics used to treat bacterial infections, containing:<ul><li>azithromycin, clarithromycin, rifabutin or rifapentine</li></ul></li><li>anticonvulsants used to treat epilepsy, containing:<ul><li>carbamazepine, oxcarbazepine, phenobarbital or phenytoin</li></ul></li><li>immunosuppressants used to control your body’s immune response after a transplant, containing ciclosporin</li><li><p>ulcer-healing medicines containing sucralfate</p></li><li><p>Tell your doctor if you are taking any of these medicines. Do not stop your treatment without contacting your doctor.</p></li></ul><p><strong>Get advice from a doctor or pharmacist if you are taking:</strong></p><ul><li>antacids to treat stomach ulcers, heartburn, or acid reflux, containing aluminium and/or magnesium hydroxide</li><li>mineral supplements or vitamins containing magnesium or iron</li><li>Get advice from your doctor or pharmacist before taking Biktarvy if you are taking any of these medicines.</li></ul><p>Antacids and magnesium supplements: you will need to take Biktarvy at least 2 hours before antacids or supplements containing aluminium and/or magnesium. 
Or you can take Biktarvy with food at least 2 hours after.</p><p>Iron supplements: you will need to take Biktarvy at least 2 hours before iron supplements, or you can take them together with food.</p><p><strong>Pregnancy and breast-feeding</strong></p><ul><li>If you are pregnant or breast-feeding, think you may be pregnant or are planning to have a baby, ask your doctor or pharmacist for advice before taking this medicine.</li><li>Tell your doctor immediately if you become pregnant and ask about the potential benefits and risks of your antiretroviral therapy to you and your child.</li></ul><p>If you have taken Biktarvy during your pregnancy, your doctor may request regular blood tests and other diagnostic tests to monitor the development of your child. In children whose mothers took nucleoside reverse transcriptase inhibitors (NRTIs) during pregnancy, the benefit from the protection against HIV outweighed the risk of side effects.</p><p><strong>Do not breast-feed during treatment with Biktarvy.</strong> This is because some of the active substances in this medicine pass into human breast milk. Breast-feeding is not recommended in women living with HIV because HIV infection can be passed on to the baby in breast milk. If you are breast-feeding, or thinking about breast-feeding, you should discuss it with your doctor as soon as possible.</p><p><strong>Driving and using machines</strong></p><p>Biktarvy can cause dizziness. If you feel dizzy when taking Biktarvy, do not drive or ride a bicycle and do not use any tools or machines.</p><p><strong>Biktarvy contains sodium</strong></p><p>This medicine contains less than 1 mmol sodium (23 mg) per tablet, that is to say essentially ‘sodium-free’.</p></div>"""

In [12]:
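# Register the rules with the target matcher component of the pipeline
nlp.get_pipe("medspacy_target_matcher").add(target_rules)

# Run the pipeline on the sample text
doc = nlp(text)

# Render the text with the matched entities highlighted
visualize_ent(doc)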
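
`visualize_ent` renders the matches inline; for a quick textual check it can also help to count them. The sketch below relies only on `doc.ents`, with each entity exposing `text` and `label_` as spaCy spans do. `summarize_entities` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so that the example runs without loading a pipeline.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_entities(doc):
    """Count entities per (label, lower-cased text) pair."""
    return Counter((ent.label_, ent.text.lower()) for ent in doc.ents)


# Stand-in for a processed document; in the notebook, pass the real `doc`.
fake_doc = SimpleNamespace(
    ents=[
        SimpleNamespace(text="Biktarvy", label_="DRUG"),
        SimpleNamespace(text="biktarvy", label_="DRUG"),
        SimpleNamespace(text="HIV", label_="CONDITION"),
    ]
)

for (label, text), count in summarize_entities(fake_doc).most_common():
    print(f"{label:<12} {text:<12} {count}")
```

Lower-casing the text groups spelling variants such as "Biktarvy" and "biktarvy" under one entry, which makes it easier to spot rules that never matched.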