page_type | languages | name | description | products
---|---|---|---|---
sample | | Azure Functions ETL application with Python | The server application demonstrates how to use Azure Functions as part of an ETL (extract-transform-load) pipeline. Search Bing news, clean the results, and store in Azure Data Lake. |
The server application demonstrates how to use Azure Functions as part of an ETL (extract-transform-load) pipeline.
- Search news with Bing News search service
- Save search results to Azure Blob Storage service
- Clean search results
- Store in Azure Data Lake service
This project uses the following:
- Azure Functions
  - HTTP Trigger: http://localhost:7071/api/search?search_term=seattle&count=5
    - Search
      - Get Bing news authentication key from Azure Key Vault
      - Query Bing news search for top 5 results for seattle
      - Get results as JSON
    - Save
      - Get Blob Storage authentication with DefaultAzureCredential from environment
      - Save JSON results into Blob Storage with name like search_results_seattle_OmD9AQrCJvjieqd.json
    - Success in debug console looks like
      Executed 'Functions.api_search' (Succeeded, Id=989f1745-e1a8-4d31-845b-293c8de2601b, Duration=3810ms)
  - Blob Trigger: triggered on file upload to container listed in function.json
    - Get Blob
      - Get Blob Storage authentication with connection string named in function.json
      - Read blob contents passed in to function
    - Clean data for each article
    - Send JSON to Azure Data Lake
      - Get Data Lake authentication with DefaultAzureCredential from environment
      - Save JSON to Data Lake with name like processed_search_results_seattle_OmD9AQrCJvjieqd.json
    - Success in debug console looks like
      Executed 'Functions.api_blob_trigger' (Succeeded, Id=9f8e23e9-d61f-41e2-af2f-4898ce4562f4, Duration=5594ms)
- Azure Blob Storage
  - Store initial search results as JSON file
- Azure Data Lake Gen 2
  - Store final processed search results
- Azure Key Vault
  - Securely store Bing Search key
- Bing Search
  - Search Bing News
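
The search, save, and clean steps listed above can be sketched with the Python standard library alone. This is a minimal sketch, not the sample's actual code. The function names (`build_news_request`, `make_blob_name`, `processed_blob_name`, `clean_articles`), the 15-character random suffix, and the choice of which article fields to keep are assumptions made for illustration. The `q` and `count` query parameters and the `Ocp-Apim-Subscription-Key` header are the ones Bing News Search v7 expects.

```python
import secrets
import string
from urllib.parse import urlencode

# Same endpoint as the BING_SEARCH_URL setting in local.settings.json.
BING_SEARCH_URL = "https://api.bing.microsoft.com/v7.0/news/search"


def build_news_request(search_term, count, subscription_key):
    """Return the (url, headers) pair for a Bing News Search query."""
    query = urlencode({"q": search_term, "count": count})
    headers = {"Ocp-Apim-Subscription-Key": subscription_key}
    return f"{BING_SEARCH_URL}?{query}", headers


def make_blob_name(search_term, suffix_length=15):
    """Build a name like search_results_seattle_OmD9AQrCJvjieqd.json.

    Assumption: the suffix is a random alphanumeric string.
    """
    alphabet = string.ascii_letters + string.digits
    suffix = "".join(secrets.choice(alphabet) for _ in range(suffix_length))
    return f"search_results_{search_term}_{suffix}.json"


def processed_blob_name(blob_name):
    """Name of the cleaned file written to the Data Lake."""
    return f"processed_{blob_name}"


def clean_articles(search_results):
    """Keep a small set of fields from each article.

    Assumption: the fields kept here are illustrative; the sample's
    real cleaning rules may differ.
    """
    kept = ("name", "url", "description", "datePublished")
    articles = search_results.get("value", [])
    return [{key: article.get(key) for key in kept} for article in articles]
```

In this sketch the random suffix keeps repeated searches for the same term from overwriting each other, and the processed file reuses the raw file's name so the two can be matched up later.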
Prerequisites:
- Python 3.9
- Node.js LTS
- Azure resources
  - Local Identity (either User Identity or Service Principal)
    - User identity (your Azure identity), signed in with Azure CLI
    - Azure service principal for local development
      - Save service principal information to local.settings.json
        "AZURE_CLIENT_ID": "", "AZURE_TENANT_ID": "", "AZURE_CLIENT_SECRET": "", "AZURE_SERVICE_PRINCIPAL_NAME": ""
  - Key Vault
    - Save Key Vault information to local.settings.json
      "KEY_VAULT_RESOURCE_NAME": "",
    - You don't need to change these settings in local.settings.json
      "KEY_VAULT_SECRET_NAME": "bing-search-sub-key1",
  - Bing Search
    - Save Bing Search v7 key to Key Vault secret with name bing-search-sub-key1
    - You don't need to change these settings in local.settings.json
      "BING_SEARCH_URL": "https://api.bing.microsoft.com/v7.0/news/search", "BING_SEARCH_KIND": "Bing.Search.v7", "BING_SEARCH_KIND_NAME": "Bing Search",
  - Azure Blob Storage
    - You can use a single Blob Storage resource as long as a hierarchical namespace is enabled. This sample assumes Gen 2 Data Lake.
    - Save Blob Storage for search results to local.settings.json
      "BLOB_STORAGE_RESOURCE_NAME": "", "BLOB_STORAGE_CONTAINER_NAME": "", "AzureWebJobsStorage": "",
      The AzureWebJobsStorage setting is used by the Function Blob Trigger to access Blob Storage.
  - Azure Data Lake
    - Set Data Lake resource values in local.settings.json
      "DATALAKE_GEN_2_RESOURCE_NAME": "", "DATALAKE_GEN_2_CONTAINER_NAME": "", "DATALAKE_GEN_2_DIRECTORY_NAME": "",
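
Because the settings above are spread across several resources, a quick check that none were left empty can save a failed first run. The following is a minimal sketch, not part of the sample. The helper names (`missing_settings`, `load_settings`) are hypothetical, and it assumes the settings sit under the top-level `Values` object that Azure Functions Core Tools uses in local.settings.json. The service principal values are left out of the required list because signing in with a user identity is a listed alternative.

```python
import json

# Setting names taken from the resource list above; each must be non-empty.
REQUIRED_SETTINGS = [
    "KEY_VAULT_RESOURCE_NAME",
    "BLOB_STORAGE_RESOURCE_NAME",
    "BLOB_STORAGE_CONTAINER_NAME",
    "AzureWebJobsStorage",
    "DATALAKE_GEN_2_RESOURCE_NAME",
    "DATALAKE_GEN_2_CONTAINER_NAME",
    "DATALAKE_GEN_2_DIRECTORY_NAME",
]


def missing_settings(settings):
    """Return the required setting names that are absent or empty."""
    values = settings.get("Values", {})
    return [name for name in REQUIRED_SETTINGS if not values.get(name)]


def load_settings(path="local.settings.json"):
    """Read the settings file from the Functions app folder."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `missing_settings(load_settings())` from the Azure Functions app folder returns an empty list when every required value is filled in.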
- Install Azure Functions core tools for local development
  npm i -g azure-functions-core-tools@4 --unsafe-perm true
- Install Azurite for storage emulation
  npm install -g azurite
- Change directory to Azure Functions App folder
  cd AzureFunctionsApp
- Create virtual environment
  virtualenv --python="/usr/local/bin/python3.9" .venv
- Activate virtual environment
  source .venv/bin/activate
- Install Python packages
  pip install -r requirements-dev.txt
- Start local storage emulation
  azurite -s -l azurite -d azurite/debug.log
- Start function
  func start
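
Once the function host is running, the HTTP trigger can be exercised from a second terminal. The following is a minimal sketch using only the standard library. The helper names (`local_search_url`, `call_search`) and the 30-second timeout are assumptions, and the shape of the response body is not specified by this document, so the sketch returns it as raw text.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def local_search_url(search_term, count, host="http://localhost:7071"):
    """Build the local HTTP trigger URL for a search term and result count."""
    query = urlencode({"search_term": search_term, "count": count})
    return f"{host}/api/search?{query}"


def call_search(search_term="seattle", count=5):
    """Call the locally running function; return (HTTP status, body text)."""
    with urlopen(local_search_url(search_term, count), timeout=30) as response:
        return response.status, response.read().decode("utf-8")


# Usage, with `func start` running in another terminal:
#   status, body = call_search()
```

A successful call should be followed by the `Functions.api_search` and `Functions.api_blob_trigger` success lines in the debug console.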
- Extraneous names: The service principal name and the Bing Search service name and kind aren't necessary in the local.settings.json for this sample application to work. These values are helpful when you need to:
  - Assign the service principal in the IAM of a resource
  - Verify the correct Bing Search service was created
- Logging: most logging is suppressed in the ./host.json file, where the logging.logLevel.default property is set to error. To see verbose logs, change the value to Information.
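
The log level change described above can also be scripted. The following is a minimal sketch; the helper name `set_default_log_level` is hypothetical, and it assumes host.json holds the level at `logging.logLevel.default`, as the note above describes.

```python
import copy


def set_default_log_level(host_config, level="Information"):
    """Return a copy of the host.json content with the default log level changed."""
    updated = copy.deepcopy(host_config)
    # Create the nested objects if they are missing, then set the level.
    updated.setdefault("logging", {}).setdefault("logLevel", {})["default"] = level
    return updated
```

To apply it, load ./host.json with `json.load`, pass the result through the helper, and write it back with `json.dump`.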