# Example generated by the GPT-4o model to run a Phi-3 model

The `input_data` should be structured according to the inputs the Phi-3 model declares. Because Phi-3 is a language model, `input_data` is typically a tokenized text prompt, that is, an array of integer token IDs.

Here's how you might handle text input:

1. **Tokenization**: Convert the text prompt into tokens using a tokenizer compatible with the model.

2. **Input Format**: Ensure the tokens are in the correct format (e.g., a sequence of integers) and shape expected by the model.
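To make the two steps concrete, here is a minimal sketch with a toy whitespace tokenizer. The vocabulary `TOY_VOCAB` and the helper `toy_encode` are hypothetical stand-ins for a real tokenizer. Only the resulting layout, a 2-D `int64` array of shape `(batch, sequence_length)`, reflects what ONNX language models commonly expect.

```python
import numpy as np

# Hypothetical toy vocabulary; a real tokenizer ships its own.
TOY_VOCAB = {"<unk>": 0, "tell": 1, "me": 2, "a": 3, "joke": 4}

def toy_encode(prompt):
    """Map a prompt to token IDs shaped (1, sequence_length)."""
    ids = [TOY_VOCAB.get(word, TOY_VOCAB["<unk>"]) for word in prompt.lower().split()]
    # Add a leading batch dimension and use int64, the usual dtype for token IDs.
    return np.array([ids], dtype=np.int64)

input_ids = toy_encode("Tell me a joke")
print(input_ids, input_ids.shape, input_ids.dtype)
```

A real tokenizer produces the same kind of array, only with IDs drawn from the model's own vocabulary.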

Here's an example of how you might modify the script to handle text input:

```python
from transformers import AutoTokenizer
import onnxruntime as ort
import numpy as np

def load_model(model_path):
    session = ort.InferenceSession(model_path)
    return session

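def prepare_input(prompt, tokenizer):
    # Tokenize the prompt; return_tensors='np' yields NumPy arrays of shape (1, seq_len)
    tokens = tokenizer(prompt, return_tensors='np')
    # ONNX language models typically expect int64 token IDs
    return tokens['input_ids'].astype(np.int64)

def run_inference(session, input_data):
    # Feed only the model's first declared input; exports that also require
    # attention_mask or past_key_values need a fuller input feed
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: input_data})
    return outputs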

def main():
    model_path = 'path/to/phi-3-model.onnx'
    tokenizer = AutoTokenizer.from_pretrained('phi-3-tokenizer')  # Replace with actual tokenizer

    session = load_model(model_path)
    
    prompt = "Your text prompt here"
    prepared_input = prepare_input(prompt, tokenizer)
    
    outputs = run_inference(session, prepared_input)
    
    print("Model outputs:", outputs)

if __name__ == "__main__":
    main()
```
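The script above feeds only the first input the model declares. Exported decoder models often declare more, for example an attention mask and key/value cache tensors, so it can help to list them before building the feed. The following is a minimal sketch: `describe_inputs` is a hypothetical helper that relies only on `InferenceSession.get_inputs()`, whose entries expose `name`, `shape`, and `type`. The stub session is invented so the snippet runs without a model file.

```python
from types import SimpleNamespace

def describe_inputs(session):
    """Return (name, shape, type) for every input the session declares."""
    return [(inp.name, inp.shape, inp.type) for inp in session.get_inputs()]

# Invented stand-in for ort.InferenceSession so the sketch runs without a model.
stub_session = SimpleNamespace(get_inputs=lambda: [
    SimpleNamespace(name="input_ids",
                    shape=["batch_size", "sequence_length"],
                    type="tensor(int64)"),
    SimpleNamespace(name="attention_mask",
                    shape=["batch_size", "total_sequence_length"],
                    type="tensor(int64)"),
])

for name, shape, dtype in describe_inputs(stub_session):
    print(f"{name}: shape={shape}, type={dtype}")
```

With a real model, pass the object returned by `load_model(model_path)` instead of the stub.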

### Key Points:

- **Tokenizer**: Use a tokenizer that matches the model's training setup. Replace `'phi-3-tokenizer'` with the appropriate tokenizer identifier.

- **Prompt**: Set the `prompt` variable to the text you want the model to process.

- **Tokenization**: The `prepare_input` function uses the tokenizer to convert the text into a format suitable for the model.

- **Model Compatibility**: Ensure the tokenizer and model are compatible in terms of vocabulary and tokenization strategy.

The following cell extends the script to also supply `attention_mask` and an empty key/value cache (`past_key_values`):

```python
from transformers import AutoTokenizer
import onnxruntime as ort
import numpy as np

def load_model(model_path):
    session = ort.InferenceSession(model_path)
    return session

def prepare_input(prompt, tokenizer):
    # Tokenize the input prompt
    tokens = tokenizer(prompt, return_tensors='np')
    input_ids = tokens['input_ids'].astype(np.int64)
    attention_mask = tokens['attention_mask'].astype(np.int64)

    # Start with an empty cache: one zero-length array per layer, shaped
    # (batch, num_kv_heads, past_seq_len, head_size). The 32 layers, 32 heads, and
    # head size of 96 are assumed for Phi-3.5-mini; confirm with session.get_inputs().
    past_key_values = [np.zeros((1, 32, 0, 96), dtype=np.float32) for _ in range(32)]

    return input_ids, attention_mask, past_key_values

def run_inference(session, input_ids, attention_mask, past_key_values):
    input_feed = {
        'input_ids': input_ids,
        'attention_mask': attention_mask
    }

    # Add the empty cache for every layer; keys and values share the same
    # zero-length array because nothing has been cached yet
    for i, past in enumerate(past_key_values):
        input_feed[f'past_key_values.{i}.key'] = past
        input_feed[f'past_key_values.{i}.value'] = past

    outputs = session.run(None, input_feed)
    return outputs

def main():
    model_path = 'C:\\ai\\models\\Phi-3.5-mini-instruct-onnx\\cpu_and_mobile\\cpu-int4-awq-block-128-acc-level-4\\phi-3.5-mini-instruct-cpu-int4-awq-block-128-acc-level-4.onnx'
    tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3.5-mini-instruct")  # Tokenizer matching the model

    session = load_model(model_path)

    prompt = "Tell me a joke"  # Hello World of Prompts
    input_ids, attention_mask, past_key_values = prepare_input(prompt, tokenizer)

    outputs = run_inference(session, input_ids, attention_mask, past_key_values)

    print("Model outputs:", outputs)

if __name__ == "__main__":
    main()
```

  from .autonotebook import tqdm as notebook_tqdm
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
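The cache dimensions in the script are hard-coded and depend on the model. One alternative is to derive them from the session's declared inputs. The sketch below is an illustration under stated assumptions, not the library's API. `empty_past_from_inputs` is a hypothetical helper. It assumes that cache inputs are named with a `past_key_values.` prefix (as in the script above), that they are float32, that fixed dimensions are declared as integers, and that a symbolic dimension containing `batch` is the batch axis while any other symbolic dimension is the past sequence length, which is zero on the first call. The stub inputs stand in for `session.get_inputs()`.

```python
from types import SimpleNamespace
import numpy as np

def empty_past_from_inputs(session_inputs, batch_size=1):
    """Build zero-length key/value cache arrays keyed by input name."""
    past = {}
    for inp in session_inputs:
        if not inp.name.startswith("past_key_values."):
            continue
        shape = []
        for dim in inp.shape:
            if isinstance(dim, int):
                shape.append(dim)         # fixed size, e.g. heads or head size
            elif "batch" in str(dim):
                shape.append(batch_size)  # symbolic batch axis
            else:
                shape.append(0)           # symbolic past length: empty at start
        past[inp.name] = np.zeros(shape, dtype=np.float32)
    return past

# Invented stand-ins for session.get_inputs(); the sizes are illustrative only.
stub_inputs = [
    SimpleNamespace(name="input_ids", shape=["batch_size", "sequence_length"]),
    SimpleNamespace(name="past_key_values.0.key",
                    shape=["batch_size", 4, "past_sequence_length", 8]),
    SimpleNamespace(name="past_key_values.0.value",
                    shape=["batch_size", 4, "past_sequence_length", 8]),
]

past = empty_past_from_inputs(stub_inputs)
for name, array in past.items():
    print(name, array.shape)
```

The returned dictionary can be merged into the input feed with `input_feed.update(past)`.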


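The script prints the raw output list. Assuming the first output holds logits of shape `(batch, sequence_length, vocab_size)`, which is common for decoder exports but should be checked with `session.get_outputs()`, the next token can be picked greedily from the last position. `greedy_next_token` is a hypothetical helper, and the logits below are made up for illustration.

```python
import numpy as np

def greedy_next_token(logits):
    """Return the highest-scoring token ID at the last position of each sequence."""
    last_position = logits[:, -1, :]          # shape (batch, vocab_size)
    return np.argmax(last_position, axis=-1)  # shape (batch,)

# Made-up logits: batch of 1, 2 positions, vocabulary of 5 tokens.
fake_logits = np.array([[[0.1, 0.2, 0.3, 0.0, 0.4],
                         [0.0, 2.5, 0.1, 0.3, 0.2]]], dtype=np.float32)

next_id = greedy_next_token(fake_logits)
print(next_id)
```

In a real run, `logits` would be `outputs[0]` if the assumption above holds, and `tokenizer.decode(next_id)` would turn the ID back into text.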