This repository demonstrates how to implement a Machine Learning Development and Operations (MLOps) process for Azure AI Search applications that use a pull model to index data. It creates an indexer that pulls PDF documents from a blob storage container and uses two custom skills to chunk them and create embeddings for the chunks, then adds the chunks to an index. Finally, it performs search evaluation on a collection of data and uploads the results to an AI Studio project so that evaluations can be compared across multiple runs to continue improving the custom skills.
- Azure AI Search
- Azure OpenAI
- Azure AI Studio project
- Azure Function App
  - For the best performance of the skillset functions and slot deployments, it is recommended to use an App Service Plan with a level of at least Standard S3.
- Azure Storage Account
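The indexer pulls PDFs from blob storage, runs them through the two custom skills, and writes the results to the index. The Python below is a rough, hypothetical sketch of how the four Azure AI Search entities reference each other, built as simplified REST-style definitions. The following are all assumptions: the entity names, the container name, the field list, and the /api/chunk and /api/embed skill URIs. The repository's real definitions are the files in mlops/acs_config. A real chunk index would also need a vector field and a way to project each chunk into its own search document, and both are omitted here.

```python
"""Hypothetical sketch of how the four Azure AI Search entities reference each other.

All names, the container, the fields, and the skill URIs are illustrative
assumptions, not the repository's actual configuration.
"""


def build_entities(suffix: str, function_base_url: str) -> dict[str, dict]:
    """Return simplified REST-style definitions keyed by entity type."""
    datasource = {
        "name": f"pdf-datasource-{suffix}",
        "type": "azureblob",  # pull model: the indexer reads from a blob container
        "credentials": {"connectionString": "<storage-connection-string>"},
        "container": {"name": "pdfs"},
    }
    index = {
        "name": f"pdf-index-{suffix}",
        # Simplified: a real chunk index would also define a vector field.
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {"name": "content", "type": "Edm.String", "searchable": True},
        ],
    }
    skillset = {
        "name": f"pdf-skillset-{suffix}",
        "skills": [
            # First custom skill: split the document text into chunks.
            {
                "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
                "uri": f"{function_base_url}/api/chunk",
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "chunks", "targetName": "chunks"}],
            },
            # Second custom skill: create an embedding for each chunk.
            {
                "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
                "uri": f"{function_base_url}/api/embed",
                "context": "/document/chunks/*",
                "inputs": [{"name": "text", "source": "/document/chunks/*"}],
                "outputs": [{"name": "embedding", "targetName": "embedding"}],
            },
        ],
    }
    indexer = {
        "name": f"pdf-indexer-{suffix}",
        "dataSourceName": datasource["name"],
        "targetIndexName": index["name"],
        "skillsetName": skillset["name"],
    }
    return {"datasource": datasource, "index": index, "skillset": skillset, "indexer": indexer}


# The indexer references the other three entities by name, so it is created last.
DEPLOYMENT_ORDER = ["datasource", "index", "skillset", "indexer"]


def dangling_references(entities: dict[str, dict]) -> list[str]:
    """List indexer reference keys that do not match the defined entity names."""
    indexer = entities["indexer"]
    expected = {
        "dataSourceName": entities["datasource"]["name"],
        "targetIndexName": entities["index"]["name"],
        "skillsetName": entities["skillset"]["name"],
    }
    return [key for key, name in expected.items() if indexer.get(key) != name]
```

The shared suffix keeps one branch's entities from colliding with another's.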
Below are some key folders within the project:
- src/custom_skills: Contains the function app which has the chunking and embedding skillset functions used by the indexer
- mlops: Contains the scripts for implementing MLOps flows
- config: Configuration for the MLOps scripts
- data: Sample data for testing the indexer
- .github: GitHub workflows that can be used to run an MLOps pipeline
- .devcontainer: Contains a development container that can help you work with the repo and develop Azure functions
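The chunking function in src/custom_skills is called by the indexer as a custom Web API skill. Azure AI Search sends a batch of records as {"values": [{"recordId": ..., "data": {...}}]}. It expects a response whose values echo each recordId alongside data, errors, and warnings. The sketch below is a minimal, hypothetical handler for that envelope and is not the repository's code. The chunking strategy is fixed-size character windows with overlap, a stand-in technique. The "text" input and "chunks" output names, the chunk size, and the overlap are all assumptions.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def handle_skill_request(body: dict) -> dict:
    """Process a custom-skill request body and build the matching response body."""
    results = []
    for record in body.get("values", []):
        record_id = record.get("recordId")
        text = record.get("data", {}).get("text")
        if not isinstance(text, str):
            # Report a per-record error instead of failing the whole batch.
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": "Missing 'text' input"}],
                "warnings": [],
            })
            continue
        results.append({
            "recordId": record_id,
            "data": {"chunks": chunk_text(text)},
            "errors": [],
            "warnings": [],
        })
    return {"values": results}
```

Returning a per-record error lets the indexer keep processing the rest of the batch. For example, a 1,200-character input with the defaults produces three chunks.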
Additionally, the root folder contains some important files:
- .env.sample: This file should be renamed to .env, and sensitive parameters (parameters that cannot be hardcoded in config.yaml) should be populated there.
- setup.cfg: The repo uses strict rules to validate code quality using flake8. This file contains the applied rules and exceptions.
- requirements.txt: This file lists all the packages that the repo is using.
The deployment scripts and GitHub workflows use the git branch name to create a unique naming scheme for all of the deployed entities.
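The repository's exact naming rule is not described here. As a hypothetical sketch, a branch-scoped name can be derived by sanitizing the branch into a lowercase slug and appending it to a base name. GITHUB_HEAD_REF and GITHUB_REF_NAME are standard GitHub Actions variables, and git rev-parse --abbrev-ref HEAD prints the local branch. The function names and the 60-character limit are assumptions, and real length and character limits differ per Azure resource type.

```python
import os
import re
import subprocess


def current_branch() -> str:
    """Best-effort branch name: GitHub Actions variables first, then local git."""
    # GITHUB_HEAD_REF is only set for pull request events; GITHUB_REF_NAME is
    # the short ref name otherwise (for PRs it looks like "<number>/merge").
    for var in ("GITHUB_HEAD_REF", "GITHUB_REF_NAME"):
        value = os.environ.get(var)
        if value:
            return value
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def branch_scoped_name(base: str, branch: str, max_length: int = 60) -> str:
    """Append a sanitized branch slug to base, e.g. for index or slot names."""
    # Lowercase, then collapse every run of characters outside [a-z0-9] into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    name = f"{base}-{slug}" if slug else base
    # Placeholder limit; check the naming rules of each Azure resource type.
    return name[:max_length].rstrip("-")
```

For example, branch_scoped_name("pdf-index", "feature/New_Skill") yields "pdf-index-feature-new-skill". Truncation can make two long branch names collide, so a short hash suffix is a possible refinement.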
- Create a .env file based on .env.sample and populate the appropriate values.
- Modify config/config.yaml to reflect any changes that have been made within the project.
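How the scripts read .env is not shown here. Libraries such as python-dotenv (load_dotenv) are a common choice. The standard-library sketch below is a hypothetical stand-in that illustrates the idea. It parses simple KEY=VALUE lines, skips blank lines and # comments, strips one pair of surrounding quotes, and copies the values into os.environ without overriding variables that are already set. It does not handle config/config.yaml, and the variable names in the example are assumptions.

```python
import os
from collections.abc import Iterable


def parse_env_lines(lines: Iterable[str]) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        # Strip one pair of matching surrounding quotes.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Read path into os.environ; existing variables win unless override is True."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env_lines(handle)
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Letting existing environment variables win means values injected by a CI runner take precedence over a local .env file.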
Sample PDFs are available in the data folder for testing the indexer. To upload the data to blob storage, run the following:
python -m mlops.deployment_scripts.upload_data
The following deployment script deploys the custom skillset functions to a function app deployment slot and polls the functions until they are ready to be tested:
python -m mlops.deployment_scripts.deploy_azure_functions
To test the two skillset functions after they are deployed, run the following script:
python -m mlops.deployment_scripts.run_functions
More information about local development of the skillset functions can be found in the custom skills readme.
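The deploy_azure_functions script polls the functions until they are ready, but its exact probe and timings are not documented here. A minimal, hypothetical version of such a wait loop is sketched below. It calls a probe until it reports ready or a timeout elapses, doubling the wait between attempts up to a cap. Connection errors and HTTP error statuses are treated as "not ready yet". The GET health-check URL and all timing defaults are assumptions. Functions that use function-level authorization would also need a key supplied with the request.

```python
import time
import urllib.request
from collections.abc import Callable


def http_probe(url: str, timeout_s: float = 10.0) -> Callable[[], bool]:
    """Build a probe that reports ready when a GET on url returns HTTP 200."""
    def probe() -> bool:
        # urlopen raises HTTPError for 4xx/5xx and URLError when unreachable;
        # both are OSError subclasses, which wait_until_ready treats as not ready.
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    return probe


def wait_until_ready(
    probe: Callable[[], bool],
    timeout_s: float = 600.0,
    interval_s: float = 10.0,
    max_interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Call probe until it returns True or timeout_s elapses, backing off exponentially."""
    deadline = clock() + timeout_s
    delay = interval_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # cold starts and slot warm-up often surface as connection errors
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_interval_s)
```

Injecting sleep and clock keeps the loop testable without real waiting. A call might look like wait_until_ready(http_probe("https://<app>-<slot>.azurewebsites.net/api/health")), where the health route is a placeholder.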
An indexer deployment is composed of four entities: index, datasource, skillset, and indexer. The configuration for each is defined by the files in mlops/acs_config. To deploy the indexer and begin indexing the data in blob storage, run the following:
python -m mlops.deployment_scripts.build_indexer
The following command performs search evaluation and uploads the results to the specified AI Studio project. For more information about evaluation, see the search evaluation readme.
python -m mlops.evaluation.search_evaluation --gt_path "./mlops/evaluation/data/search_evaluation_data.jsonl" --semantic_config my-semantic-config
Since the git branch name was used to create the deployed entities, the following deployment script cleans up everything by deleting the deployment slot in the function app and the indexer entities:
python -m mlops.deployment_scripts.cleanup_pr
This project contains GitHub workflows for PR validation and Continuous Integration (CI).
The PR workflow runs quality checks using flake8, followed by unit tests. It then deploys the skillset functions to a deployment slot of the function app. Once the functions are deployed and tested, an indexer is deployed and all of the test data is ingested from blob storage. Finally, search evaluation is run and the results are uploaded to an AI Studio project.
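The metrics computed by the search evaluation step are documented in the search evaluation readme and are not listed here. As a hypothetical illustration of the kind of retrieval metric that can be compared across runs, the sketch below computes recall@k and mean reciprocal rank (MRR). Each query contributes a pair: a list of ranked document ids returned by the index, and the set of ids the ground truth marks as relevant. These metric choices, the function names, and the input shape are assumptions, not the repository's evaluation code.

```python
from statistics import mean


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: list[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


def evaluate(runs: list[tuple[list[str], set[str]]], k: int = 5) -> dict[str, float]:
    """Average recall@k and MRR over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return {f"recall@{k}": 0.0, "mrr": 0.0}
    return {
        f"recall@{k}": mean(recall_at_k(ranked, rel, k) for ranked, rel in runs),
        "mrr": mean(reciprocal_rank(ranked, rel) for ranked, rel in runs),
    }
```

For two queries, where the first finds its only relevant document at rank 2 and the second finds nothing, evaluate(..., k=2) returns {"recall@2": 0.5, "mrr": 0.25}. Tracking such averages per run is what makes evaluations comparable over time.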
The CI workflow executes a similar workflow to the PR workflow, but the skillset functions are deployed to the main function app, not a deployment slot.
For the cleanup step of the CI workflow to work correctly, the development branch of a pull request must not be deleted until the cleanup step has run.
The following variables and secrets must be provided to execute the GitHub workflows (they are largely the same ones used in the .env file for local execution):
- azure_credentials
- subscription_id
- resource_group_name
- storage_account_name
- acs_service_name
- acs_api_key
- aoai_base_endpoint
- aoai_api_key
- ai_studio_project_name
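A missing secret typically surfaces only partway through a workflow run. One way to catch it earlier is a fail-fast check before any deployment step. The sketch below is a hypothetical example. It maps the settings listed above to uppercase environment variable names, which is an assumption since the actual .env keys are defined in .env.sample. It reports every unset or blank setting in a single message.

```python
import os
from collections.abc import Mapping, Sequence
from typing import Optional

# Hypothetical environment variable names for the settings listed above.
REQUIRED_SETTINGS = [
    "AZURE_CREDENTIALS", "SUBSCRIPTION_ID", "RESOURCE_GROUP_NAME",
    "STORAGE_ACCOUNT_NAME", "ACS_SERVICE_NAME", "ACS_API_KEY",
    "AOAI_BASE_ENDPOINT", "AOAI_API_KEY", "AI_STUDIO_PROJECT_NAME",
]


def missing_settings(required: Sequence[str], env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required names that are unset or blank in env (default: os.environ)."""
    source = os.environ if env is None else env
    return [name for name in required if not source.get(name, "").strip()]


def require_settings(required: Sequence[str] = REQUIRED_SETTINGS) -> None:
    """Exit with one message listing every missing setting, if any are missing."""
    missing = missing_settings(required)
    if missing:
        raise SystemExit(f"Missing required settings: {', '.join(missing)}")
```

Listing all missing names at once avoids fixing secrets one failed run at a time.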
- mlops-promptflow-prompt - This repository demonstrates how AI Studio and Prompt flow can be used in the Machine Learning Development and Operations (MLOps) process for LLM-based applications (aka LLMOps). It includes base examples of inference evaluation using Prompt flow. When combined with mlops-aisearch-pull for search evaluation, it provides a full end-to-end MLOps workflow.
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.