# Honours Project Notebook

This notebook contains the code for my honours project. 

To run this notebook, `ollama` must be installed on the system and its daemon must be running in the background (start it with `ollama serve`). The model named in `MODELNAME` must also be available locally.
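Before running the cells that call the model, it can help to confirm that the daemon is actually reachable. The following is a minimal sketch using only the standard library. The helper name `ollama_is_running` is hypothetical, and the URL assumes Ollama is listening on its default local address, `http://localhost:11434`; adjust it if the server is configured differently.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with status 200 at base_url.

    Hypothetical helper: base_url is assumed to be Ollama's default address.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


# Example use at the top of the notebook:
# if not ollama_is_running():
#     print("Ollama does not appear to be running. Start it with `ollama serve`.")
```

The check only confirms that something answers on that address; it does not verify that the model itself has been downloaded.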

In [55]:
# Dependencies:
import ollama
import pandas as pd
import nltk
import re
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/harryk/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [56]:
MODELNAME = "deepseek-r1:1.5b"

SYSTEM = """
You help correct radiology reports. 
"""

INCONSISTENCIES = """Only point out inconsistencies in the report. Use the template below.
\n\n """

INCONSISTENCY_TEMPLATE = (
    """\n Inconsistency: { } \n This does not make sense with this line: { }"""
)

dataframe = pd.read_csv(
    "datasets/Clinical Info, Removed Correction, Issue - Sanitied_Text & Error Type Only.csv"
)

removedCorrection = dataframe["Removed Correction"]

correctedData = dataframe["Re-dictated"]

errors = dataframe["Type of Error"]


# Sanity check: each report with its correction removed must differ from
# the re-dictated (corrected) version of the same report.
for i, (removed, corrected) in enumerate(zip(removedCorrection, correctedData)):
    if removed == corrected:
        raise ValueError(f"Correction not removed properly in {i} : \n{removed}")

# Each report's error types are stored as a single ";"-separated string.
for i, errorTypes in enumerate(errors):
    errorNumber = len(errorTypes.split(";"))
    print(f"{i} : {errorNumber} errors.")

0 : 2 errors.
1 : 4 errors.
2 : 4 errors.
3 : 2 errors.
4 : 2 errors.
5 : 7 errors.
6 : 4 errors.
7 : 1 errors.
8 : 2 errors.
9 : 2 errors.
10 : 1 errors.
11 : 4 errors.
12 : 6 errors.
13 : 9 errors.
14 : 5 errors.
15 : 1 errors.
16 : 2 errors.
17 : 1 errors.
18 : 3 errors.
19 : 5 errors.
20 : 2 errors.
21 : 1 errors.
22 : 2 errors.
23 : 1 errors.
24 : 1 errors.
25 : 1 errors.
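The counts above come from splitting each `Type of Error` string on `;`. A slightly more defensive variant is sketched below: it ignores empty labels (for example, from a trailing `;`) and uses the singular "error" for a count of one. The helper names `count_errors` and `describe` are hypothetical, and the label strings in the usage comment are made up for illustration; whether the dataset contains empty labels at all is not known from this notebook.

```python
def count_errors(error_types: str) -> int:
    """Count the non-empty ';'-separated labels in one cell (hypothetical helper)."""
    return len([label for label in error_types.split(";") if label.strip()])


def describe(index: int, error_types: str) -> str:
    """Format one summary line, using 'error' for exactly one label."""
    count = count_errors(error_types)
    noun = "error" if count == 1 else "errors"
    return f"{index} : {count} {noun}."


# Illustrative labels only, not taken from the dataset:
# describe(0, "Laterality; Gender")  ->  "0 : 2 errors."
# describe(7, "Laterality")          ->  "7 : 1 error."
```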


In [58]:
# Print differences:
from difflib import unified_diff

CURRENT_INDEX = 3

diff = unified_diff(
    removedCorrection[CURRENT_INDEX].splitlines(),
    correctedData[CURRENT_INDEX].splitlines(),
    lineterm="",
)

print("Differences : ")
# Lines starting with "+" come from the corrected report. This test also
# matches the "+++" file header that unified_diff emits.
for line in diff:
    if line.startswith("+"):
        print(line)

Differences : 
+++ 
+70-year-old male (female). Fall  4 days ago with trauma to the left side of the chest/upper limb. New Hb drop 123>> 86.  Significant tenderness in the left chest wall and left iliac fossa. History of diverticulosis, no PR bleeding. Confirmed haematoma in the left arm. To check for source of bleeding in thorax/abdomen
+No acute injury to bony cranium (pelvis).  Degenerative changes in lumbar spine and sacroiliac joints.  No suspicious bony lesion.
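The first line printed above, `+++`, is the file header that `unified_diff` emits, not a changed line from the report. One way to return only the corrected lines is sketched below. The helper name `added_lines` is hypothetical, and the two short report strings in the usage comment are shortened stand-ins rather than dataset rows. The sketch skips the two header lines by position rather than by prefix, so a report line that itself begins with `++` would still be kept.

```python
from difflib import unified_diff
from itertools import islice


def added_lines(original: str, corrected: str) -> list[str]:
    """Return the lines present only in `corrected` (hypothetical helper)."""
    diff = unified_diff(original.splitlines(), corrected.splitlines(), lineterm="")
    # The first two lines of a unified diff are the "---" and "+++" headers.
    body = islice(diff, 2, None)
    # Strip the leading "+" marker from each added line.
    return [line[1:] for line in body if line.startswith("+")]


# Shortened stand-in text, for illustration only:
# added_lines("70-year-old male.\nNo injury.", "70-year-old female.\nNo injury.")
# returns ["70-year-old female."]
```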


In [60]:
tokenisedSentence = nltk.sent_tokenize(removedCorrection[CURRENT_INDEX])
tokenisedSentence = [sentence for sentence in tokenisedSentence if len(sentence) > 1]

print()

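# Join outside the f-string: a backslash inside an f-string expression is a
# SyntaxError before Python 3.12.
reportText = "\n".join(tokenisedSentence)
PROMPT = f"""\n
  Find inconsistencies with this report: \n {reportText}
"""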

# print(PROMPT)
try:
    # Generate a response from the model.
    responses: str = ollama.generate(
        model=MODELNAME,
        system=SYSTEM + INCONSISTENCY_TEMPLATE,
        prompt=INCONSISTENCIES + "\n\n" + PROMPT + "\n\n" + INCONSISTENCY_TEMPLATE,
        options={"temperature": 0.5},
    )["response"]
    # deepseek-r1 wraps its reasoning in <think>...</think> tags.
    regex = r"<think>(.*?)</think>"
    # Capture the thinking portion, then remove it from the response.
    thinking = re.findall(pattern=regex, string=responses, flags=re.S)
    print(f"Thinking : {thinking}")
    responses = re.sub(regex, "", responses, flags=re.S)
    print(responses)
except ConnectionError as ce:
    print(ce)
    print("Could not reach Ollama. Start the server with `ollama serve` and re-run this cell.")



Thinking : ["\nOkay, so I need to help correct this radiology report. The user mentioned an inconsistency and provided a template for finding them. Let me see what the report says.\n\nFirst, looking through the clinical information provided. It's about a 70-year-old male who fell four days ago with trauma to his left chest/upper limb. That seems straightforward. Then there are some blood tests: New Hb drop is 123>>86. Wait, that doesn't make sense. Hgb should be the hemoglobin level, not a rate. Maybe it's supposed to be 123<86? But that would mean Hgb increased, which could indicate an infection or something else.\n\nNext, there are findings in the chest: no acute injury to bony thorax or thoracic spine, degenerative changes, etc. That seems consistent. In the abdomen/pelvis, it's a small to moderate volume of hemoperitoneum around spleen and subphrenic space. I'm not entirely sure what that refers to, but maybe it's related to bleeding.\n\nIn the conclusion, it says there was no act