This repository contains a template Azure Synapse Analytics project designed to ingest data that is available through REST APIs.
- Use GitHub Actions to deploy all the Azure resources you need to get started with API data ingestion in a matter of minutes
- Explore the OpenAQ API and get insights from sensor data
- Understand watermarking and metadata-driven data ingestion techniques
- Use the templates as a starting point to integrate your own APIs
Bringing data from third-party APIs into a Data Lake for further processing and reporting is a very common scenario in analytics. Azure Synapse and Data Factory pipelines contain dozens of different connectors to help you integrate these data sources, but what happens when you need to integrate a custom API, or one that does not have a built-in connector? For those cases, the REST connector can help you bring in data from virtually any REST API, without depending on a specific connector.
In this repository, you will find tools to add a custom API and incrementally load data from it. We have used the OpenAQ API as an example so you can quickly test the pipelines. The API exposes air quality data collected from multiple sensors. In this example, we will copy this data to our own Azure Data Lake and perform simple analytics and visualizations to prove out the concepts.
You may then adapt the pipelines to collect data from any REST API of your choice, even private endpoints behind firewalls and authentication.
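To make the idea of an incremental, windowed REST request concrete, here is a minimal Python sketch. It is not the Synapse REST connector and it does not use the real OpenAQ API: the endpoint `api.example.com` and the query parameter names `sensor_id`, `date_from` and `date_to` are hypothetical placeholders that you would replace with whatever your own API expects.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not the OpenAQ API).
BASE_URL = "https://api.example.com/v1/measurements"


def build_window_url(sensor_id: int, start: datetime, end: datetime) -> str:
    """Return a URL requesting one sensor's records between start and end."""
    if end <= start:
        raise ValueError("window end must be after window start")
    # Hypothetical parameter names; timestamps are sent in ISO 8601 format.
    query = urlencode({
        "sensor_id": sensor_id,
        "date_from": start.isoformat(),
        "date_to": end.isoformat(),
    })
    return f"{BASE_URL}?{query}"


# One-hour window for a made-up sensor id.
url = build_window_url(
    42,
    datetime(2024, 1, 1, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 1, tzinfo=timezone.utc),
)
print(url)
```

Each incremental run only changes the window bounds, which is what lets a single parameterized pipeline serve many sensors and many time ranges.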
To execute the steps below, you will need:
- An active Azure subscription
- Contributor access to a resource group
The following pipelines are available for you to test and integrate your own APIs:
- 1_OpenAQ_Set_Up_Metadata: Checks if the metadata table exists and creates it if not.
- 2_OpenAQ_Incremental_Load: Loads one chunk of data from the API for one sensor.
  - Starts from the current watermark, or from the MinDate parameter if no watermark is found.
  - Loads a window whose length in minutes is set by the DeltaMinutes parameter. Fails if this would put the end of the window past the MaxDate parameter.
  - Saves data in parquet format, partitioned by manufacturer, device, year, month and day. Check out the Copy Data activity for more details.
  - Updates the watermark date to the end of the loaded window.
- 3_OpenAQ_Sync: Sets up metadata and runs Incremental Load for one sensor until the MaxDate parameter is reached.
  - This will load all new data, from the current watermark until MaxDate.
  - Please monitor this pipeline closely, as this can be a large operation. With the default MinDate and MaxDate parameters, it will not consume significant resources.
- 4_OpenAQ_Metadata_Driven: Lists sensors and runs 3_OpenAQ_Sync for each one.
  - The default parameters limit the listing operation to 20 sensors. In a real-world scenario, you'd have all active sensors listed and synced.
- 5_OpenAQ_Cleanup_Location: Convenience pipeline to clear metadata and parquet files for a specific sensor. Useful for testing.
- 6_OpenAQ_Full Load: Runs Cleanup and Sync operations in sequence for one sensor. Should rarely be used, only in case metadata gets out of sync or for testing purposes.
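The watermarking logic described in the pipeline list can be sketched in plain Python. This is an in-memory illustration, not the pipeline definitions themselves: the function names, the `load` callback (standing in for the Copy Data activity), the dictionary used as the metadata table, and the Hive-style folder layout in `partition_path` are all assumptions made for the example. The stopping rule in `sync` (stop once the next window would pass MaxDate) is also an assumption about how the loop ends.

```python
from datetime import date, datetime, timedelta
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical stand-in for the Copy Data activity: copies one window of data.
LoadFn = Callable[[int, datetime, datetime], None]


def next_window(watermark: Optional[datetime], min_date: datetime,
                max_date: datetime, delta_minutes: int) -> Tuple[datetime, datetime]:
    """Window for one incremental load; raises if it would pass max_date."""
    if delta_minutes <= 0:
        raise ValueError("delta_minutes must be positive")
    start = watermark if watermark is not None else min_date
    end = start + timedelta(minutes=delta_minutes)
    if end > max_date:
        raise ValueError("window end would fall past max_date")
    return start, end


def incremental_load(sensor_id: int, watermarks: Dict[int, datetime],
                     min_date: datetime, max_date: datetime,
                     delta_minutes: int, load: LoadFn) -> datetime:
    """Load one chunk for one sensor, then move the watermark forward."""
    start, end = next_window(watermarks.get(sensor_id), min_date,
                             max_date, delta_minutes)
    load(sensor_id, start, end)
    watermarks[sensor_id] = end  # updated only after the load succeeded
    return end


def sync(sensor_id: int, watermarks: Dict[int, datetime], min_date: datetime,
         max_date: datetime, delta_minutes: int, load: LoadFn) -> int:
    """Repeat incremental loads while a full window still fits before max_date."""
    if delta_minutes <= 0:
        raise ValueError("delta_minutes must be positive")
    step = timedelta(minutes=delta_minutes)
    loads = 0
    while watermarks.get(sensor_id, min_date) + step <= max_date:
        incremental_load(sensor_id, watermarks, min_date, max_date,
                         delta_minutes, load)
        loads += 1
    return loads


def metadata_driven(sensor_ids: Iterable[int], watermarks: Dict[int, datetime],
                    min_date: datetime, max_date: datetime, delta_minutes: int,
                    load: LoadFn, limit: int = 20) -> Dict[int, int]:
    """Sync at most `limit` sensors; returns the number of loads per sensor."""
    return {sid: sync(sid, watermarks, min_date, max_date, delta_minutes, load)
            for sid in list(sensor_ids)[:limit]}


def partition_path(manufacturer: str, device: str, day: date) -> str:
    """Assumed folder layout for the partitioned parquet output."""
    return (f"manufacturer={manufacturer}/device={device}/"
            f"year={day:%Y}/month={day:%m}/day={day:%d}")


watermarks: Dict[int, datetime] = {}
windows = []
counts = metadata_driven(
    [7, 8], watermarks,
    min_date=datetime(2024, 1, 1), max_date=datetime(2024, 1, 1, 3),
    delta_minutes=60,
    load=lambda s, a, b: windows.append((s, a, b)),
)
print(counts)
print(partition_path("acme", "dev1", date(2024, 1, 5)))
```

In this sketch the watermark moves only after `load` returns, so a failed copy leaves the watermark unchanged and a rerun retries the same window.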
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.