This repository demonstrates prompt optimization using both local LLMs and a GPT-4o endpoint authenticated via an Azure AD Service Principal.
A dataset was created to identify Personally Identifiable Information (PII) using ChatGPT with the following prompt:
```
Create an Excel file that contains three columns: 'id', 'text', and 'contains_pii'.
Add around 1,000 rows of text in the 'text' column, ensuring some rows contain PII and others do not.
Fill the 'contains_pii' column with 'yes' or 'no' accordingly. Be consistent.
```
The generated dataset is saved in the data folder.
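As a minimal sketch of the dataset schema described above (the sample rows and validation helper here are illustrative, not taken from the real dataset):

```python
# Expected schema of the generated PII dataset.
EXPECTED_COLUMNS = ("id", "text", "contains_pii")

# Illustrative rows only — the real dataset lives in the data folder.
sample_rows = [
    {"id": 1, "text": "Contact John Doe at john.doe@example.com", "contains_pii": "yes"},
    {"id": 2, "text": "The meeting starts at 10 AM tomorrow.", "contains_pii": "no"},
]

def validate_rows(rows):
    """Check each row has exactly the expected columns and a yes/no label."""
    for row in rows:
        assert tuple(row) == EXPECTED_COLUMNS, f"unexpected columns: {tuple(row)}"
        assert row["contains_pii"] in ("yes", "no"), row["contains_pii"]
    return True

validate_rows(sample_rows)
```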
Using this dataset, we aim to optimize prompts by:
- Running a local LLM.
- Utilizing a GPT-4o endpoint with an Azure AD Service Principal.
This section demonstrates how to optimize prompts using a locally hosted LLM. The `deepseek-r1:8b` model must already be downloaded.
- Start the Ollama server:

  ```
  ollama serve
  ```

- Run the first part of the notebook.
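A minimal sketch of calling the local model through Ollama's `/api/generate` endpoint; the prompt wording and the `classify` helper are illustrative, not the optimized prompt used in the notebook:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint

def build_payload(text: str) -> dict:
    """Build a request payload asking the model for a yes/no PII verdict."""
    return {
        "model": "deepseek-r1:8b",
        "prompt": f"Does the following text contain PII? Answer 'yes' or 'no'.\n\n{text}",
        "stream": False,  # return one complete response instead of a token stream
    }

def classify(text: str) -> str:
    """POST the payload to the local Ollama server and return the model's reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]

if __name__ == "__main__":
    # Requires `ollama serve` to be running locally.
    print(classify("Call me at 555-0100."))
```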
This section demonstrates how to optimize prompts using a GPT-4o endpoint with an Azure AD Service Principal. Configure the following environment variables:
- TENANT_ID: Your Azure tenant ID.
- CLIENT_ID: Your Azure client ID.
- CLIENT_SECRET: Your Azure client secret.
- AZURE_OPENAI_ENDPOINT: The endpoint URL, e.g., `https://gpt4o...`
- AZURE_OPENAI_DEPLOYMENT: The deployment name (not the raw model name), e.g., `gpt-4o`.
- AZURE_API_VERSION: The API version, e.g., `2023-05-15`.
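For example, the variables can be exported in the shell before launching the notebook (all values below are placeholders — substitute your own tenant, app registration, and resource details):

```shell
# Placeholder values — replace with your own Azure AD and OpenAI resource details.
export TENANT_ID="00000000-0000-0000-0000-000000000000"
export CLIENT_ID="00000000-0000-0000-0000-000000000000"
export CLIENT_SECRET="<your-client-secret>"
export AZURE_OPENAI_ENDPOINT="https://<your-resource>.openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
export AZURE_API_VERSION="2023-05-15"
```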
```
prompt_optimization/
├── data/        # Contains the PII dataset
├── notebooks/   # Jupyter notebooks for optimization
└── README.md    # Project documentation
```
- Ensure the dataset is available in the `data` folder.
- Follow the steps in the notebook to perform prompt optimization.
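Since the dataset is labeled, candidate prompts can be compared by how often the model's yes/no answers match the `contains_pii` column. A minimal sketch of such a scoring helper (the function name and sample values are illustrative):

```python
def accuracy(predictions, labels):
    """Fraction of yes/no predictions matching the gold labels.

    A simple metric for comparing candidate prompts against the
    labeled `contains_pii` column of the dataset.
    """
    assert len(predictions) == len(labels), "prediction/label count mismatch"
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Illustrative values — real predictions would come from the model under test.
preds = ["yes", "no", "yes", "no"]
golds = ["yes", "no", "no", "no"]
print(f"accuracy = {accuracy(preds, golds):.2f}")  # → accuracy = 0.75
```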
This project is licensed under the MIT License. See the LICENSE file for details.