# Honours Project Notebook

This notebook contains the code for my honours project. 

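To run the notebook, `ollama` must be installed on the system and its daemon must be running in the background (`ollama serve`). The model used below, `deepseek-r1:1.5b`, must also be available locally, for example via `ollama pull deepseek-r1:1.5b`.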
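
Before running the cells below, it can help to confirm that the daemon is reachable. The following is a minimal sketch using only the standard library. It assumes Ollama is listening on its default address, `http://localhost:11434`, and the helper name `ollama_is_running` is hypothetical rather than part of the `ollama` package.

```python
import http.client
import urllib.error
import urllib.request


def ollama_is_running(host: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `host` (assumed default Ollama address)."""
    try:
        with urllib.request.urlopen(host, timeout=timeout) as reply:
            return reply.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or a malformed/HTTP error reply: treat as not running.
        return False


if not ollama_is_running():
    print("Ollama is not reachable; start it with `ollama serve`.")
```

This only checks that something responds at that address; it does not confirm that the model itself has been downloaded.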

In [4]:
# Dependencies:
import ollama
import pandas as pd
import nltk
import re

nltk.download("punkt_tab")

[nltk_data] Downloading package punkt_tab to
[nltk_data]     /Users/harryk/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


True

In [5]:
MODELNAME = "deepseek-r1:1.5b"

SYSTEM = """
You help correct radiology reports. 
"""

INCONSISTENCIES = """Only point out inconsistencies in the report. Use the template below.
\n\n """

INCONSISTENCY_TEMPLATE = (
    """\n Inconsistency: { } \n This does not make sense with this line: { }"""
)

dataframe = pd.read_csv(
    "datasets/Clinical Info, Removed Correction, Issue - Sanitied_Text & Error Type Only.csv"
)

removedCorrection = dataframe["Removed Correction"]

correctedData = dataframe["Re-dictated"]

errors = dataframe["Type of Error"]


# Sanity check: each report with its correction removed must differ from the re-dictated version.
for i, (removed, corrected) in enumerate(zip(removedCorrection, correctedData)):
    if removed == corrected:
        raise ValueError(f"Correction not removed properly in {i} : \n{removed}")

# Count the semicolon-separated error labels recorded for each report.
for i, errorTypes in enumerate(errors):
    errorNumber = len(errorTypes.split(";"))
    print(f"{i} : {errorNumber} errors.")

0 : 2 errors.
1 : 4 errors.
2 : 4 errors.
3 : 2 errors.
4 : 2 errors.
5 : 7 errors.
6 : 4 errors.
7 : 1 errors.
8 : 2 errors.
9 : 2 errors.
10 : 1 errors.
11 : 4 errors.
12 : 6 errors.
13 : 9 errors.
14 : 5 errors.
15 : 1 errors.
16 : 2 errors.
17 : 1 errors.
18 : 3 errors.
19 : 5 errors.
20 : 2 errors.
21 : 1 errors.
22 : 2 errors.
23 : 1 errors.
24 : 1 errors.
25 : 1 errors.
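
The per-report counts above can also be aggregated by error type. This is a minimal sketch, assuming the `Type of Error` column holds semicolon-separated labels, as the `split(";")` call implies. The sample strings and the `count_error_types` helper are hypothetical and do not come from the dataset.

```python
from collections import Counter
from collections.abc import Iterable


def count_error_types(error_cells: Iterable[str]) -> Counter:
    """Tally each semicolon-separated label across all reports."""
    counts: Counter = Counter()
    for cell in error_cells:
        labels = [label.strip() for label in cell.split(";")]
        # Skip empty labels, e.g. those produced by a trailing semicolon.
        counts.update(label for label in labels if label)
    return counts


# Hypothetical labels, for illustration only.
sample = ["Laterality; Gender", "Gender", "Omission;Laterality;"]
print(count_error_types(sample).most_common())
```

Unlike a plain `split(";")`, this version ignores empty labels, so a trailing semicolon would not inflate the count for a report.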


In [20]:
def generate(prompt: str = "") -> tuple[str, str]:
    # Custom ollama generation function that splits DeepSeek's thinking from its answer.
    response: str = ollama.generate(
        model=MODELNAME,
        system="",
        prompt=prompt,
        options={"temperature": 0},
    )["response"]
    regex = r"<think>(.*?)</think>"
    # Extract the thinking portion of the response.
    thinking = re.findall(pattern=regex, string=response, flags=re.S)
    # Remove the thinking portion, leaving only the answer.
    responses = re.sub(pattern=regex, repl="", string=response, flags=re.S)
    # Guard against replies that contain no <think> block.
    thinking = thinking[0].strip() if thinking else ""
    responses = responses.strip()
    return (thinking, responses)

In [21]:
# Print differences:
from difflib import unified_diff

CURRENT_INDEX = 3

diff = unified_diff(
    removedCorrection[CURRENT_INDEX].splitlines(),
    correctedData[CURRENT_INDEX].splitlines(),
    lineterm="",
)

print("Differences : ")
for line in diff:
    # Keep added lines only, skipping the "+++" file header.
    if line.startswith("+") and not line.startswith("+++"):
        print(line)

Differences : 
+70-year-old male (female). Fall  4 days ago with trauma to the left side of the chest/upper limb. New Hb drop 123>> 86.  Significant tenderness in the left chest wall and left iliac fossa. History of diverticulosis, no PR bleeding. Confirmed haematoma in the left arm. To check for source of bleeding in thorax/abdomen
+No acute injury to bony cranium (pelvis).  Degenerative changes in lumbar spine and sacroiliac joints.  No suspicious bony lesion.
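
Because each report section is a single long line, the line-based diff above prints whole paragraphs. A word-level comparison can narrow this down to the changed words. This is a sketch using `difflib.SequenceMatcher`, not the method used in this notebook; the `changed_words` helper and the two example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def changed_words(before: str, after: str) -> list[tuple[str, str]]:
    """Return (old, new) word spans that differ between two texts."""
    old_words, new_words = before.split(), after.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Inserted or deleted spans appear with an empty string on one side.
            changes.append((" ".join(old_words[i1:i2]), " ".join(new_words[j1:j2])))
    return changes


# Hypothetical sentences, for illustration only.
print(changed_words("No acute injury to bony cranium.", "No acute injury to bony pelvis."))
```

In the notebook this could be called as `changed_words(removedCorrection[CURRENT_INDEX], correctedData[CURRENT_INDEX])`. Note that splitting on whitespace keeps punctuation attached to the neighbouring word.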


In [22]:
tokenisedSentence = nltk.sent_tokenize(removedCorrection[CURRENT_INDEX])
tokenisedSentence = [sentence for sentence in tokenisedSentence if len(sentence) > 1]


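# Join outside the f-string: a backslash inside an f-string expression is a SyntaxError before Python 3.12.
reportText = "\n".join(tokenisedSentence)
PROMPT = f"""\n
  Find inconsistencies with this report: \n {reportText}
"""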

# print(PROMPT)
try:
    # Generates a response from the AI as a (thinking, answer) pair.
    answer: tuple[str, str] = generate(
        INCONSISTENCIES + "\n\n" + PROMPT + "\n\n" + INCONSISTENCY_TEMPLATE
    )
    print(answer[0])
    print(answer[1])
except ConnectionError as ce:
    print(ce)
    print("Could not reach Ollama. Start the daemon with `ollama serve`.")


Okay, so I need to figure out the inconsistencies in the report based on the provided template. Let me start by reading through both the Clinical Information Provided and the Findings sections carefully.

First, looking at the Clinical Information Provided, it mentions a 70-year-old male who fell four days ago with trauma to the left side of the chest/upper limb. The patient had a significant drop in new Hgb of 123>>86, which is quite concerning because a decrease in Hgb can indicate either infection or anemia from other causes like chronic diseases.

Next, there's a history of diverticulosis with no PR bleeding mentioned. Diverticulosis typically involves the small intestine and can cause blood loss if not treated properly. The patient didn't report any PR bleeding, which is a red pain in the leg, so that might be an issue because diverticulosis can lead to such symptoms.

Then, there's a confirmed haematoma in the left arm. Haematoma means a blood vessel, and it's usually seen as a s

In [None]:
# Use the LLM to extract key phrases (a rough form of named entity recognition).
imPhrase: tuple[str, str] = generate(
    "Isolate any important phrases about the patient. Display as a comma separated list.\n"
    + "\n".join(tokenisedSentence)
)

# print(imPhrase[1])

# Ask the model to cross-check the extracted phrases against each other.
csvPhrases: tuple[str, str] = generate(
    f"Are there any inconsistencies between the individual phrases below? \n {imPhrase[1]}"
)

print(csvPhrases[1])
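
The first prompt asks for a comma-separated list, but nothing guarantees that the model returns one cleanly. Below is a minimal sketch of tidying such a reply before reusing it. Both `parse_phrases` and the sample reply are hypothetical.

```python
import re


def parse_phrases(reply: str) -> list[str]:
    """Split a model reply on commas and newlines, dropping blanks and duplicates."""
    phrases: list[str] = []
    seen: set[str] = set()
    for piece in re.split(r"[,\n]+", reply):
        # Trim whitespace and common list decoration such as "-" or "*".
        phrase = piece.strip(" \t-*.")
        if phrase and phrase.lower() not in seen:
            seen.add(phrase.lower())
            phrases.append(phrase)
    return phrases


# Hypothetical reply, for illustration only.
sample_reply = "70-year-old male, fall 4 days ago,\n- Hb drop, fall 4 days ago"
print(parse_phrases(sample_reply))
```

Splitting on every comma is a simplification: a phrase that itself contains a comma would be broken into two entries.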
