# Clinical Report Structurization
Extract structured information from clinical reports by answering specific questions about them.

## Process
1. Prepare the inputs: a system prompt (instructions) and a user prompt (report and questions).
2. Send the prompts to a GPT model through the OpenAI API and receive the response as a JSON string.
3. Validate the response with `json.loads` and Pydantic.
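
The three steps can be sketched end to end as follows. This is only an illustration of the flow: `structurize` and `fake_model` are hypothetical names, and the stub stands in for the real API call so the sketch runs offline.

```python
import json
from typing import Callable

Dialog = list[dict[str, str]]


def structurize(system_prompt: str, user_prompt: str,
                ask: Callable[[Dialog], str]) -> list[dict]:
    # (1) Prepare the inputs as a chat dialog.
    dialog = [
        {'role': 'system', 'content': system_prompt},
        {'role': 'user', 'content': user_prompt},
    ]
    # (2) Send the dialog and receive the raw response text.
    raw = ask(dialog)
    # (3) Validate: the text must parse as a JSON list of objects.
    parsed = json.loads(raw)
    if not isinstance(parsed, list) or not all(isinstance(p, dict) for p in parsed):
        raise ValueError('expected a JSON list of objects')
    return parsed


def fake_model(dialog: Dialog) -> str:
    # Hypothetical stand-in for the API call; always returns one fixed answer.
    return '[{"qid": 1, "basis": "example sentence", "answer": 4}]'


result = structurize('instructions', 'report and questions', fake_model)
```

Passing the model call in as a function keeps the parsing and checking logic testable without network access.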

## Concept JSON Schema
```python
[
    {
        "qid": int,  # index of the question
        "basis": str,  # fact in the report that the answer is based on
        "answer": {type}  # {instruction for answering}
    },
    {
        ...
    },
    ...
]
```
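
One way to read the concept schema is as a per-item type contract. The sketch below checks that contract with plain Python, assuming `answer` is an `int` as in the prompt used later in this notebook. `check_item` is a hypothetical helper, not part of any library.

```python
def check_item(item: dict) -> None:
    """Raise if `item` does not match the concept schema (with an int answer)."""
    expected = {'qid': int, 'basis': str, 'answer': int}
    if set(item) != set(expected):
        raise ValueError(f'unexpected keys: {sorted(item)}')
    for key, typ in expected.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(item[key], bool) or not isinstance(item[key], typ):
            raise TypeError(f'{key!r} should be {typ.__name__}')


check_item({'qid': 1, 'basis': 'some sentence from the report', 'answer': 4})
```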


## Resources
- OpenAI API Docs: https://platform.openai.com/docs/guides/text-generation

In [None]:
import json

from openai import OpenAI


PATH_TO_ACCESS_KEY: str = 'openai-key'

with open(PATH_TO_ACCESS_KEY) as f:
    access_key = f.read().strip()  # drop the trailing newline, if any
    

client = OpenAI(api_key=access_key)

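def chat(client: OpenAI, dialog: list[dict[str, str]]) -> str:
    """Send the dialog to the model and return the text of its first reply."""
    result = client.chat.completions.create(
        model='gpt-4o',
        messages=dialog,
        top_p=0.0  # keep sampling as close to deterministic as possible
    )
    return result.choices[0].message.content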
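
A slightly more defensive way to load the key could look like the sketch below: it strips surrounding whitespace (key files often end with a newline) and falls back to an environment variable when the file is missing. `load_access_key` is a hypothetical helper; `OPENAI_API_KEY` is the variable the `openai` client itself looks up by default.

```python
import os
from pathlib import Path


def load_access_key(path: str, env_var: str = 'OPENAI_API_KEY') -> str:
    """Return the key from `path`, or from `env_var` if the file is missing."""
    file = Path(path)
    if file.is_file():
        key = file.read_text().strip()
    else:
        key = os.environ.get(env_var, '').strip()
    if not key:
        raise RuntimeError(f'no access key found in {path!r} or ${env_var}')
    return key
```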

In [2]:
# (1) Prepare

system_prompt = """
You are an assistant designed to convert a clinical report into JSON-formatted outputs based on specific questions, following the provided schema and the listed instructions.
```python
[
    {
        "qid": int,  # index of the question
        "basis": str,  # fact in the report that the answer is based on
        "answer": int  # the index of the chosen option
    },
    {
        ...
    },
    ...
]
```
- Do not use the markdown syntax to wrap the JSON output.
"""

user_prompt = """
Report:
{{report}}

Questions:
{{questions}}
"""

report = r"""
 1. The stress imaging:
   The stress imaging following dipyridamole I.V. infusion and   
   the post-dipyridamole SPECT images reveal:
    
   Moderate hypoperfusion over the apex, septal, inferior and  
   lateral walls of LV. 
    
   (extent: 20% LAD, 30% of RCA & 20% of LCx territories) 
 2. The resting imaging: 
    
   As compared to the stress images, this 4-hour redistribution 
   images reveal partial reperfusion to the aforementioned areas. 
    
 Conclusion : 
     
    1. The current study demonstrates partially reversible,
       moderate hypoperfusion over the apex, septal, inferior
       and lateral walls of LV, suggests moderate CAD involving 3VD 
       territories. Please evaluate clinically. 
     
    2. We would like to follow up closely. 

"""

questions = r"""
Please answer the following questions based on the Tl-201 report information.

Options for the following questions: 
1. No
2. Minimal
3. Mild
4. Moderate
5. Severe
6. Extensive

Q1. Perfusion defect severity in Basal Anterior (apex)?
Q2. Perfusion defect severity in Basal Anteroseptal?
Q3. Perfusion defect severity in Basal Inferoseptal?
"""

user_prompt = user_prompt.replace('{{report}}', report)
user_prompt = user_prompt.replace('{{questions}}', questions)
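
The two `replace` calls can be generalized into a small helper that also catches a placeholder left unfilled by mistake. This is a sketch under the assumption that placeholders always have the form `{{name}}`; `render_prompt` is a hypothetical name.

```python
import re


def render_prompt(template: str, **fields: str) -> str:
    """Fill `{{name}}` placeholders and fail loudly if any are left over."""
    for name, value in fields.items():
        template = template.replace('{{' + name + '}}', value)
    leftover = re.findall(r'\{\{(\w+)\}\}', template)
    if leftover:
        raise KeyError(f'unfilled placeholders: {leftover}')
    return template


demo = render_prompt('{{report}} | {{questions}}',
                     report='(report text)', questions='(question text)')
```

Note that the leftover check would also trigger on a report that itself contains `{{word}}`, which is unlikely for clinical text but worth keeping in mind.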

In [3]:
# (2) Send and receive
dialog = [
    {'role': 'system', 'content': system_prompt},
    {'role': 'user', 'content': user_prompt}
]
output = chat(client, dialog)

print(output)

[
    {
        "qid": 1,
        "basis": "Moderate hypoperfusion over the apex, septal, inferior and lateral walls of LV.",
        "answer": 4
    },
    {
        "qid": 2,
        "basis": "Moderate hypoperfusion over the apex, septal, inferior and lateral walls of LV.",
        "answer": 4
    },
    {
        "qid": 3,
        "basis": "Moderate hypoperfusion over the apex, septal, inferior and lateral walls of LV.",
        "answer": 4
    }
]
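
The system prompt asks the model not to wrap the JSON in markdown, but an instruction is not a guarantee. A minimal sketch of a defensive pre-processing step before `json.loads`, with `strip_code_fence` as a hypothetical helper:

```python
import json

FENCE = '`' * 3  # three backticks, built this way so the example has no literal fence


def strip_code_fence(text: str) -> str:
    """Remove one surrounding Markdown code fence, if present."""
    lines = text.strip().splitlines()
    if len(lines) >= 2 and lines[0].startswith(FENCE) and lines[-1].strip() == FENCE:
        lines = lines[1:-1]
    return '\n'.join(lines)


wrapped = FENCE + 'json\n[{"qid": 1, "basis": "x", "answer": 4}]\n' + FENCE
parsed = json.loads(strip_code_fence(wrapped))
```

Text without a fence passes through unchanged apart from trimmed outer whitespace, so the helper is safe to apply to every response.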


In [4]:
# # Validate the JSON format (kept commented out; note that the loop has no retry limit)
# dialog = [
#     {'role': 'system', 'content': system_prompt},
#     {'role': 'user', 'content': user_prompt}
# ]

# while True:
#     output = chat(client, dialog)
    
#     try:
#         structured_answers = json.loads(output)
#         break
#     except json.JSONDecodeError as e:
#         dialog.append({'role': 'assistant', 'content': output})
#         dialog.append({'role': 'system', 'content': f'JSONDecodeError: {e}.'})
#         print(f'JSONDecodeError {e}: request re-generating.')

# structured_answers
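
The retry idea in the commented-out cell can be sketched with an upper bound on attempts, so that a model which never returns valid JSON cannot loop forever. `ask_until_valid_json` and `max_attempts` are hypothetical names, and the stub below replaces the real `chat` call.

```python
import json
from typing import Any, Callable

Dialog = list[dict[str, str]]


def ask_until_valid_json(ask: Callable[[Dialog], str], dialog: Dialog,
                         max_attempts: int = 3) -> Any:
    """Call `ask` until its output parses as JSON, at most `max_attempts` times."""
    dialog = list(dialog)  # copy so the caller's list is not mutated
    for _ in range(max_attempts):
        output = ask(dialog)
        try:
            return json.loads(output)
        except json.JSONDecodeError as e:
            # Feed the failure back so the next attempt can correct it.
            dialog.append({'role': 'assistant', 'content': output})
            dialog.append({'role': 'system', 'content': f'JSONDecodeError: {e}.'})
    raise RuntimeError(f'no valid JSON after {max_attempts} attempts')


# Hypothetical stub: fails once, then returns valid JSON.
replies = iter(['not json', '[{"qid": 1, "basis": "x", "answer": 4}]'])
recovered = ask_until_valid_json(lambda d: next(replies), [])
```

With the real client, the first argument would be something like `lambda d: chat(client, d)`.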

In [5]:
# Validate data types of the response
from pydantic import BaseModel

structured_answers = json.loads(output)

class Answer(BaseModel):
    qid: int
    basis: str
    answer: int

answers = [Answer(**a) for a in structured_answers]

answers[0]

Answer(qid=1, basis='Moderate hypoperfusion over the apex, septal, inferior and lateral walls of LV.', answer=4)
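
Type validation confirms that `answer` is an integer, but not that it is one of the six options listed in the questions. A minimal sketch of that extra check, which also maps the index back to its label; `OPTIONS` and `label_for` are hypothetical names built from the option list in the prompt.

```python
OPTIONS = {1: 'No', 2: 'Minimal', 3: 'Mild', 4: 'Moderate', 5: 'Severe', 6: 'Extensive'}


def label_for(answer: int) -> str:
    """Translate an option index into its label, rejecting unknown indices."""
    if answer not in OPTIONS:
        raise ValueError(f'answer {answer} is not one of {sorted(OPTIONS)}')
    return OPTIONS[answer]


print(label_for(4))  # Moderate
```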