
---
page_type: sample
languages:
- python
name: Azure Functions ETL application with Python
description: The server application demonstrates how to use Azure Functions as part of an ETL (extract-transform-load) pipeline. Search Bing news, clean the results, and store in Azure Data Lake.
products:
- azure
- azure-functions
- azure-blob-storage
- azure-data-lake
- azure-data-lake-gen2
- azure-data-lake-storage
- azure-key-vault
- vs-code
- bing-search-services
---

Azure Functions ETL application with Python

The server application demonstrates how to use Azure Functions as part of an ETL (extract-transform-load) pipeline.

Architecture

  • Search news with Bing News search service
  • Save search results to Azure Blob Storage service
  • Clean search results
  • Store in Azure Data Lake service

Azure Services

This project uses the following services (minimal sketches of the HTTP trigger and blob trigger functions follow this list):

  • Azure Functions
    • HTTP Trigger: http://localhost:7071/api/search?search_term=seattle&count=5
      • Search
        • Get the Bing news authentication key from Azure Key Vault
        • Query Bing News search for the top 5 results for seattle
        • Get the results as JSON
      • Save
        • Authenticate to Blob Storage with DefaultAzureCredential from the environment
        • Save the JSON results into Blob Storage with a name like search_results_seattle_OmD9AQrCJvjieqd.json
      • Success in the debug console looks like Executed 'Functions.api_search' (Succeeded, Id=989f1745-e1a8-4d31-845b-293c8de2601b, Duration=3810ms)
    • Blob Trigger: triggered on file upload to the container listed in function.json
      • Get blob
        • Authenticate to Blob Storage with the connection string named in function.json
        • Read the blob contents passed in to the function
      • Clean the data for each article
      • Send JSON to Azure Data Lake
        • Authenticate to Data Lake with DefaultAzureCredential from the environment
        • Save the JSON to Data Lake with a name like processed_search_results_seattle_OmD9AQrCJvjieqd.json
      • Success in the debug console looks like Executed 'Functions.api_blob_trigger' (Succeeded, Id=9f8e23e9-d61f-41e2-af2f-4898ce4562f4, Duration=5594ms)
  • Azure Blob Storage
    • Store the initial search results as a JSON file
  • Azure Data Lake Gen 2
    • Store the final processed search results
  • Azure Key Vault
    • Securely store Bing Search key
  • Bing Search
    • Search Bing News
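
As a concrete illustration, here is a minimal sketch of the HTTP-triggered search function described above. The module path api_search/__init__.py, the random-suffix scheme, and the parameter defaults are assumptions rather than the repository's actual code, and error handling is omitted.

    # api_search/__init__.py (illustrative path, not necessarily the repo layout)
    import json
    import os
    import uuid

    import azure.functions as func
    import requests
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient
    from azure.storage.blob import BlobServiceClient


    def main(req: func.HttpRequest) -> func.HttpResponse:
        search_term = req.params.get("search_term", "seattle")
        count = req.params.get("count", "5")

        # DefaultAzureCredential picks up AZURE_CLIENT_ID / AZURE_TENANT_ID /
        # AZURE_CLIENT_SECRET (service principal) or your Azure CLI sign-in.
        credential = DefaultAzureCredential()

        # Get the Bing Search key from Azure Key Vault.
        vault_url = f"https://{os.environ['KEY_VAULT_RESOURCE_NAME']}.vault.azure.net"
        secret_client = SecretClient(vault_url=vault_url, credential=credential)
        bing_key = secret_client.get_secret(os.environ["KEY_VAULT_SECRET_NAME"]).value

        # Query Bing News Search for the requested term and count.
        response = requests.get(
            os.environ["BING_SEARCH_URL"],
            headers={"Ocp-Apim-Subscription-Key": bing_key},
            params={"q": search_term, "count": count},
        )
        response.raise_for_status()
        results = response.json()

        # Save the raw JSON results to Blob Storage.
        account_url = f"https://{os.environ['BLOB_STORAGE_RESOURCE_NAME']}.blob.core.windows.net"
        blob_service = BlobServiceClient(account_url=account_url, credential=credential)
        blob_name = f"search_results_{search_term}_{uuid.uuid4().hex[:15]}.json"
        blob_service.get_blob_client(
            container=os.environ["BLOB_STORAGE_CONTAINER_NAME"], blob=blob_name
        ).upload_blob(json.dumps(results))

        return func.HttpResponse(f"Saved {blob_name}", status_code=200)

The blob trigger's container and connection string come from its function.json binding. A typical binding for this scenario might look like the following sketch; the binding name myblob and the container path are placeholders:

    {
      "scriptFile": "__init__.py",
      "bindings": [
        {
          "name": "myblob",
          "type": "blobTrigger",
          "direction": "in",
          "path": "<search results container>/{name}",
          "connection": "AzureWebJobsStorage"
        }
      ]
    }

And a minimal sketch of the blob-triggered function itself, again with the cleaning step and file naming as assumptions:

    # api_blob_trigger/__init__.py (illustrative path)
    import json
    import os

    import azure.functions as func
    from azure.identity import DefaultAzureCredential
    from azure.storage.filedatalake import DataLakeServiceClient


    def main(myblob: func.InputStream) -> None:
        # The blob contents are handed to the function by the trigger binding.
        results = json.loads(myblob.read())

        # Clean each article (illustrative: keep only a few fields).
        cleaned = [
            {
                "name": article.get("name"),
                "url": article.get("url"),
                "datePublished": article.get("datePublished"),
            }
            for article in results.get("value", [])
        ]

        # Authenticate to Data Lake with DefaultAzureCredential from the environment.
        credential = DefaultAzureCredential()
        account_url = f"https://{os.environ['DATALAKE_GEN_2_RESOURCE_NAME']}.dfs.core.windows.net"
        datalake = DataLakeServiceClient(account_url=account_url, credential=credential)

        directory = datalake.get_file_system_client(
            os.environ["DATALAKE_GEN_2_CONTAINER_NAME"]
        ).get_directory_client(os.environ["DATALAKE_GEN_2_DIRECTORY_NAME"])

        # myblob.name is "<container>/<blob name>"; reuse the blob name with a prefix.
        file_name = "processed_" + myblob.name.split("/")[-1]
        data = json.dumps(cleaned).encode("utf-8")
        file_client = directory.create_file(file_name)
        file_client.append_data(data, offset=0, length=len(data))
        file_client.flush_data(len(data))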

Getting Started

Prerequisites

  • Python 3.9
  • Node.js LTS
  • Azure resources (a consolidated local.settings.json example follows this list)
    • Local identity (either a user identity or a service principal)

      • User identity (your Azure identity), signed in with Azure CLI
      • Azure service principal for local development
        • Save service principal information to local.settings.json

          "AZURE_CLIENT_ID":"",
          "AZURE_TENANT_ID":"",
          "AZURE_CLIENT_SECRET":"",
          "AZURE_SERVICE_PRINCIPAL_NAME":""         
          
    • Key Vault

      • Save Key Vault information to local.settings.json

        "KEY_VAULT_RESOURCE_NAME": "",
        
      • You don't need to change these settings in local.settings.json

        "KEY_VAULT_SECRET_NAME": "bing-search-sub-key1",
        
    • Bing Search

      • Save the Bing Search v7 key to a Key Vault secret named bing-search-sub-key1

      • You don't need to change these settings in local.settings.json

        "BING_SEARCH_URL": "https://api.bing.microsoft.com/v7.0/news/search",
        "BING_SEARCH_KIND": "Bing.Search.v7",
        "BING_SEARCH_KIND_NAME": "Bing Search",            
        
    • Azure Blob Storage

      • You can use a single Blob Storage resource as long as the hierarchical namespace is enabled. This sample assumes a Data Lake Storage Gen2 account.

      • Save Blob Storage for search results to local.settings.json

        "BLOB_STORAGE_RESOURCE_NAME": "",
        "BLOB_STORAGE_CONTAINER_NAME": "",
        "AzureWebJobsStorage": "",
        

        The AzureWebJobsStorage connection string is used by the function's blob trigger to access Blob Storage.

    • Azure Data Lake

      • Set Data Lake resource values in local.settings.json

        "DATALAKE_GEN_2_RESOURCE_NAME": "",
        "DATALAKE_GEN_2_CONTAINER_NAME": "",
        "DATALAKE_GEN_2_DIRECTORY_NAME": "",            
        

Installation


  1. Install Azure Functions core tools for local development

    npm i -g azure-functions-core-tools@4 --unsafe-perm true
    
  2. Install Azurite for storage emulation

    npm install -g azurite
    
  3. Change directory to Azure Functions App folder

    cd AzureFunctionsApp
    
  4. Create virtual environment

    virtualenv --python="/usr/local/bin/python3.9" .venv
    
  5. Activate virtual environment

    source .venv/bin/activate
    
  6. Install Python packages

    pip install -r requirements-dev.txt
    

Quickstart

  1. Start local storage emulation

    azurite -s -l azurite -d azurite/debug.log
    
  2. Start the function

    func start
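
Once Azurite and the function host are running, you can exercise the end-to-end pipeline by calling the HTTP trigger, for example http://localhost:7071/api/search?search_term=seattle&count=5, and then watching the debug console for the success messages shown in the Azure Services section above.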
    

Troubleshooting

  • Extraneous names: The service principal name and the Bing Search service name and kind aren't necessary in the local.settings.json for this sample application to work. These values are helpful when you need to:
    • Assign the service principal in the IAM of a resource
    • Verify the correct Bing Search service was created
  • Logging: verbose logging is turned off in the ./host.json file, where the logging.logLevel.default property is set to Error. To see verbose logs, change the value to Information.
