## 🐍 Install Python dependencies

The following cells create a pip requirements file and then install the libraries.

In [None]:
%%writefile requirements.txt

olive-ai
onnxruntime-genai==0.6.0rc1
transformers==4.44.2
optimum==1.23.1
bitsandbytes==0.45.1
peft==0.13.2
accelerate>=0.30.0
scipy==1.14.1
torchvision==0.18.1
tabulate==0.9.0

In [None]:
%%capture

%pip install -r requirements.txt
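Because `%%capture` hides pip's output, you may want to confirm afterwards that the pinned versions were installed. The sketch below uses only the standard library's `importlib.metadata`. `parse_pins` and `check_pins` are hypothetical helper names. The comparison is a plain string match, not full PEP 440 version handling, and only `==` pins are checked.

```python
from importlib import metadata


def parse_pins(requirements_text):
    """Return {package: version} for lines pinned with '=='; other lines are skipped."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return (package, expected, installed) for every mismatch; installed is None if missing."""
    mismatches = []
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches.append((name, expected, installed))
    return mismatches
```

For example, after the install cell, run `check_pins(parse_pins(open("requirements.txt").read()))`. An empty list means every `==` pin matched.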

## 🧠 Inference

The code below creates a test app that runs the fine-tuned model from the console. It sends one medical question and streams the response token by token. Whilst the inference code uses the Python API for ONNX Runtime GenAI, other language bindings are available for [Java, C#, and C++](https://github.com/microsoft/onnxruntime-genai/tree/main/examples).

By default, the code uses the fine-tuned adapter, which delivers more concise responses. To see the output of the original model (no fine-tuning), set `USE_ADAPTER = False`.
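The cell below answers a single hard-coded question. For a back-and-forth console session, one option is to wrap the same onnxruntime-genai calls in an `input()` loop, as in the following sketch. `build_prompt` and `chat` are hypothetical names. Each question is answered independently, so no conversation history carries over. `do_sample=True` is an assumption, added so that `temperature` takes effect. `onnxruntime_genai` is imported inside `chat` so that `build_prompt` works without it.

```python
import textwrap

PROMPT_TEMPLATE = """
Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning.
Please answer the following medical question.

### Question:
{}

### Response:
<think>
"""


def build_prompt(question):
    """Insert the question into the template, removing indentation from multi-line strings."""
    return PROMPT_TEMPLATE.format(textwrap.dedent(question).strip())


def chat(model_path="./model", adapter_name="en_medical_reasoning", use_adapter=True, max_length=1200):
    """Answer questions typed at the console until a blank line is entered."""
    import onnxruntime_genai as og  # lazy import: only needed when actually chatting

    model = og.Model(model_path)
    adapters = og.Adapters(model)
    adapters.load(f"{model_path}/adapter_weights.onnx_adapter", adapter_name)
    tokenizer = og.Tokenizer(model)

    while True:
        question = input("\nQuestion (blank line to quit): ").strip()
        if not question:
            break
        params = og.GeneratorParams(model)
        params.set_search_options(past_present_share_buffer=False, do_sample=True,
                                  temperature=0.6, max_length=max_length)
        generator = og.Generator(model, params)
        if use_adapter:
            generator.set_active_adapter(adapters, adapter_name)
        generator.append_tokens(tokenizer.encode(build_prompt(question)))
        stream = tokenizer.create_stream()  # fresh decoding stream for each answer
        while not generator.is_done():
            generator.generate_next_token()
            print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
        print()
        del generator  # free this answer's generator before the next question
```

Once the model files are in `./model`, call `chat()`. To compare against the base model, call `chat(use_adapter=False)`.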

In [None]:
import onnxruntime_genai as og

# Set to False to see the output of the original model (no fine-tuning)
USE_ADAPTER = True

model_path = "./model"

# Load the base model and register the fine-tuned adapter under a name
model = og.Model(model_path)
adapters = og.Adapters(model)
adapters.load(f"{model_path}/adapter_weights.onnx_adapter", "en_medical_reasoning")
tokenizer = og.Tokenizer(model)
tokenizer_stream = tokenizer.create_stream()  # decodes tokens incrementally for streamed output

# The prompt ends with an opening <think> tag so the model starts with its chain of thought
prompt_template = """
Below is an instruction that describes a task, paired with an input that provides further context. 
Write a response that appropriately completes the request. 
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a medical expert with advanced knowledge in clinical reasoning, diagnostics, and treatment planning. 
Please answer the following medical question. 

### Question:
{}

### Response:
<think>
"""

question = (
    "A 54-year-old construction worker with a long history of smoking presents with swelling in his "
    "upper extremity and face, along with dilated veins in this region. After conducting a CT scan and "
    "venogram of the neck, what is the most likely diagnosis for the cause of these symptoms?"
)
prompt = prompt_template.format(question)
input_tokens = tokenizer.encode(prompt)

params = og.GeneratorParams(model)
# temperature only takes effect when do_sample=True;
# max_length counts the prompt tokens plus the generated tokens
params.set_search_options(past_present_share_buffer=False, do_sample=True, temperature=0.6, max_length=1200)
generator = og.Generator(model, params)

# Activate the fine-tuned adapter for this response; otherwise the base model answers
if USE_ADAPTER:
    generator.set_active_adapter(adapters, "en_medical_reasoning")

generator.append_tokens(input_tokens)

print("Output: ", end="", flush=True)

# Generate and print one token at a time until end of sequence or max_length
while not generator.is_done():
    generator.generate_next_token()
    new_token = generator.get_next_tokens()[0]
    print(tokenizer_stream.decode(new_token), end="", flush=True)
print()
print()

# Free the generator before running another prompt
del generator
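The prompt ends with an opening `<think>` tag, so the model streams its reasoning first. If the fine-tuned model closes its reasoning with `</think>` (the convention of DeepSeek-R1-style reasoning models; this notebook does not confirm it), a small helper can separate the chain of thought from the final answer once the decoded tokens are collected into a string. `split_reasoning` is a hypothetical name.

```python
def split_reasoning(text):
    """Split generated text into (reasoning, answer) at the first '</think>' tag.

    If no closing tag is found (e.g. generation stopped at max_length),
    the whole text is treated as reasoning and the answer is empty.
    """
    reasoning, sep, answer = text.partition("</think>")
    if not sep:
        return text.strip(), ""
    return reasoning.strip(), answer.strip()
```

To use it, append each `tokenizer_stream.decode(new_token)` result to a list inside the generation loop, then call `split_reasoning("".join(pieces))`.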