# Local Inference with Azure Foundry Local

This notebook demonstrates how to use Azure Foundry Local to run inference with your optimized model on your local machine. Foundry Local provides a simple way to serve and interact with large language models on your own device, including models you have fine-tuned and exported from Azure ML.

![](../../lab_manual/images/step-4.png)

## What You'll Learn
- How to install and configure Azure Foundry Local
- How to launch a local model server using Foundry
- How to send prompts and receive completions from your model
- How to use the Foundry Python SDK for local inference


## 1. Prerequisites
- Completed the previous notebooks and have a model exported in ONNX or another format supported by Foundry Local (see 05.Local_Download.ipynb)
- Windows, macOS, or Linux
- Python 3.10+ installed locally
- Sufficient disk space and memory for your model
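Before continuing, you can run a quick sanity check from a notebook cell. This sketch only checks the Python version and free disk space (it does not check memory), and the 20 GB threshold is a hypothetical placeholder: adjust it to the size of your exported model.

```python
import shutil
import sys


def check_prerequisites(path=".", min_python=(3, 10), min_free_gb=20.0, version=None):
    """Return simple pass/fail checks for this lab.

    min_free_gb is a hypothetical threshold; set it to suit your model size.
    """
    if version is None:
        version = sys.version_info[:2]  # e.g. (3, 11)
    free_gb = shutil.disk_usage(path).free / 1024**3
    return {
        "python_ok": tuple(version) >= tuple(min_python),
        "free_gb": round(free_gb, 1),
        "disk_ok": free_gb >= min_free_gb,
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```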

## References
- [Azure Foundry Local Documentation](https://github.com/microsoft/Foundry-Local/tree/main/docs)


## 2. Prepare Your Model and Config for Foundry Local

- Ensure your model and any adapters (such as LoRA) are exported in a format supported by Foundry Local. Foundry Local runs models with ONNX Runtime, so Hugging Face Transformers checkpoints need to be converted to ONNX first.
- You should already have downloaded an ONNX model to the `fine-tuning-phi-4-mini-onnx-int4-cpu` folder (see 05.Local_Download.ipynb).
- Copy the contents of the `model` folder in `fine-tuning-phi-4-mini-onnx-int4-cpu` to the `LocalFoundryEnv/models` folder.
- Create or update an `inference_model.json` config file in the `models` directory, following the [Foundry Local model config guide](https://github.com/microsoft/Foundry-Local/blob/main/docs/model-config.md).
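If you prefer to do the copy from Python rather than from File Explorer, a minimal sketch with `shutil` follows. It assumes the notebook runs from the lab root, that the exported files sit in `fine-tuning-phi-4-mini-onnx-int4-cpu/model`, and that `LocalFoundryEnv/models` is the target; `copy_model_files` is a hypothetical helper, so adjust the paths to your layout.

```python
import shutil
from pathlib import Path


def copy_model_files(src, dst):
    """Copy everything inside src into dst (created if missing).

    Returns the relative paths of all files now present in dst.
    """
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"Model folder not found: {src}")
    # dirs_exist_ok=True merges into an existing models folder (Python 3.8+)
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.relative_to(dst).as_posix() for p in dst.rglob("*") if p.is_file())


if __name__ == "__main__":
    source = Path("fine-tuning-phi-4-mini-onnx-int4-cpu/model")  # assumed location
    if source.is_dir():
        for name in copy_model_files(source, "LocalFoundryEnv/models"):
            print(name)
    else:
        print(f"{source} not found; update the path to your exported model")
```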

Example `inference_model.json`:

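```json
{
    "Name": "phi-4-mini-reasoning-onnx",
    "PromptTemplate": {
        "assistant": "{Content}",
        "prompt": "<|user|>{Content}<|end|><|assistant|>"
    }
}
```

`{Content}` is the placeholder that Foundry Local fills with your message, so keep it in the `prompt` template rather than hard-coding a question.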

> Tip: If you used 05.Local_Download.ipynb, your model files should already be in a suitable directory. Just add or edit the config file as above.
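The config file can also be written from Python. The sketch below writes `inference_model.json` into a models folder and shows how a `{Content}` placeholder in the prompt template is filled with a question. It assumes, per the Foundry Local model config guide, that the runtime substitutes the user message for `{Content}`; `write_config` and `render_prompt` are hypothetical helpers, and `render_prompt` only mimics that substitution for illustration.

```python
import json
from pathlib import Path

# Phi-4-style chat template; {Content} marks where the user's message goes.
CONFIG = {
    "Name": "phi-4-mini-reasoning-onnx",
    "PromptTemplate": {
        "assistant": "{Content}",
        "prompt": "<|user|>{Content}<|end|><|assistant|>",
    },
}


def write_config(models_dir, config=CONFIG):
    """Write inference_model.json into models_dir and return its path."""
    path = Path(models_dir) / "inference_model.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=4), encoding="utf-8")
    return path


def render_prompt(config, user_message):
    """Hypothetical helper: fill {Content} the way the runtime is assumed to."""
    return config["PromptTemplate"]["prompt"].replace("{Content}", user_message)
```

For example, `write_config("LocalFoundryEnv/models")` creates the file in the folder used in this lab, and `render_prompt(CONFIG, "Explain the Pythagorean Theorem")` shows the text the model would receive.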


## 3. Open the Integrated Terminal in VS Code

Right-click the `LocalFoundryEnv` folder and select **Open in Integrated Terminal**.

You should now see an integrated terminal in VS Code with a prompt similar to:
`PS C:\Users\LabUser\Desktop\lab\Build25-LAB329\Lab329\LocalFoundryEnv>`




## 4. Run the Fine-Tuned Model with Foundry Local


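```bash
# Point the Foundry Local model cache at the local models folder
foundry cache cd models

# List the models in the cache; your fine-tuned model should appear here
foundry cache list

# Load the model and start an interactive chat session (--verbose prints detailed logs)
foundry model run model --verbose
```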
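If you would rather drive the setup from a notebook cell, the commands can be issued with `subprocess`. This sketch assumes the `foundry` executable is on your `PATH` and that the cell runs from the `LocalFoundryEnv` folder. Because `foundry model run` opens an interactive chat, the sketch only automates the two cache commands and leaves the run step for the terminal; `run_setup` is a hypothetical helper.

```python
import shutil
import subprocess

# Non-interactive setup commands from the cell above.
SETUP_STEPS = [
    ["foundry", "cache", "cd", "models"],  # point the model cache at ./models
    ["foundry", "cache", "list"],          # confirm the copied model is visible
]
# Opens an interactive chat, so run this one in the terminal instead.
RUN_STEP = ["foundry", "model", "run", "model", "--verbose"]


def run_setup(steps=SETUP_STEPS):
    """Run each command, stopping at the first failure; return each command's stdout."""
    if shutil.which("foundry") is None:
        raise RuntimeError("The foundry CLI was not found on PATH")
    outputs = []
    for argv in steps:
        result = subprocess.run(argv, capture_output=True, text=True, check=True)
        outputs.append(result.stdout)
    return outputs
```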

Once the chat session starts, paste the following question to test the model:


```txt

Answer the following multiple-choice question by selecting the correct option.\n\nQuestion: Sammy wanted to go to where the people were.  Where might he go?\nAnswer Choices:\n(A) race track\n(B) populated areas\n(C) the desert\n(D) apartment\n(E) roadblock

```
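To check a reply against the answer choices, you could pull the option letter out of the model's text. The helper below is a simple heuristic, not part of Foundry Local: it assumes the model names its choice as a parenthesized letter such as `(B)`, and it returns the last one it finds because reasoning models often discuss several options before concluding.

```python
import re

# Matches a choice written as "(A)" through "(E)".
_CHOICE = re.compile(r"\(([A-E])\)")


def extract_choice(reply):
    """Return the last parenthesized option letter in reply, or None if there is none."""
    matches = _CHOICE.findall(reply)
    return matches[-1] if matches else None
```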


## 5. Explore Foundry Local CLI Commands

The `foundry` CLI is structured into several categories:

- Model: commands for managing and running models
- Service: commands for managing the AI Foundry Local service
- Cache: commands for managing the local cache where models are stored

To see all available commands, use the help option: `foundry --help`

## 6. Try Your Own Questions

You can now send any prompt to your local model in the running chat session. Try your own multiple-choice questions or other tasks your model supports.
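To write your own questions in the same layout as the sample prompt in section 4, a small formatter can assemble the text. The function name and signature are illustrative; it reproduces the sample's wording and, by default, writes line breaks as a literal `\n` so the result can be pasted into the single-line chat prompt just like the sample.

```python
from string import ascii_uppercase

INSTRUCTION = "Answer the following multiple-choice question by selecting the correct option."


def format_question(question, choices, single_line=True):
    """Build a prompt in the same layout as the sample question.

    With single_line=True, line breaks are written as a literal backslash-n,
    matching the sample so it can be pasted into the one-line chat prompt.
    """
    if not 1 <= len(choices) <= len(ascii_uppercase):
        raise ValueError("expected between 1 and 26 choices")
    lines = [INSTRUCTION, "", f"Question: {question}", "Answer Choices:"]
    lines += [f"({letter}) {text}" for letter, text in zip(ascii_uppercase, choices)]
    separator = "\\n" if single_line else "\n"
    return separator.join(lines)
```

Calling it with the Sammy question and its five choices reproduces the sample prompt character for character.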


## 7. Next Steps

- Explore more advanced prompt engineering and system instructions
- Benchmark your model's performance locally
- Integrate the local Foundry server into your applications
- For more details, see the [Foundry Local documentation](https://github.com/microsoft/Foundry-Local/tree/main/docs)
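For application integration, Foundry Local serves models through an OpenAI-compatible REST API. The sketch below uses only the Python standard library and assumes a `/chat/completions` route under the service's base URL. The port is chosen by the service, so the base URL and model name in the usage line are placeholders: use the endpoint and model name reported by your local Foundry service (see the Service commands in section 5). `build_chat_request` and `ask` are hypothetical helpers.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, max_tokens=512):
    """Build a POST request for an OpenAI-style /chat/completions endpoint."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, model, prompt, timeout=300):
    """Send the request and return the first choice's text (needs a running service)."""
    with urllib.request.urlopen(build_chat_request(base_url, model, prompt), timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Usage: `ask("http://localhost:<port>/v1", "<model-name>", "Explain the Pythagorean Theorem")`, replacing both placeholders. Wrapping the call with `time.perf_counter()` gives a rough latency figure if you want to benchmark locally.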


**Congratulations!** You have successfully run local inference with your optimized model using Azure Foundry Local.

