
Stacks Azure Data Platform

See the official documentation: Stacks Azure Data Platform.

Overview

The Ensono Stacks Azure Data Platform solution provides a framework for accelerating the deployment of a production-ready modern data platform in Azure.

(Diagram: Ensono Stacks Data Overview)

  1. Use the Ensono Stacks CLI to generate a new data platform project.
  2. Build and deploy the data platform infrastructure into your Azure environment.
  3. Accelerate development of data workloads and ELT pipelines with the Datastacks CLI.
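
As a rough illustration of this workflow, the sketch below shows how these steps might look from a terminal. The exact command names, flags and config file paths are assumptions for the sketch; consult the linked documentation for the definitive syntax.

```sh
# 1. Scaffold a new data platform project with the Stacks CLI
#    (the config file path is illustrative)
stacks-cli scaffold -c ./stacks-config.yml

# 2. Deploy the platform infrastructure into Azure by running the generated
#    CI/CD pipelines, which apply the Terraform under the deploy/ directory

# 3. Generate a new data engineering workload with the Datastacks CLI
#    (example config files live under de_workloads/generate_examples;
#    the file name here is a placeholder)
datastacks generate ingest --config ./de_workloads/generate_examples/my_ingest_config.yaml
```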

The Ensono Stacks Data Platform delivers a modern Lakehouse solution, based upon the medallion architecture, with Bronze, Silver and Gold layers for various stages of data preparation. The platform utilises tools including Azure Data Factory for data ingestion and orchestration, Databricks for data processing and Azure Data Lake Storage Gen2 for data lake storage. It provides a foundation for data analytics and reporting through Microsoft Fabric and Power BI.

Key elements of the solution include:

  • Infrastructure as code (IaC) for all infrastructure components (Terraform).
  • Deployment pipelines to enable CI/CD and DataOps for the platform and all data workloads.
  • Sample data ingest pipelines that transfer data from a source into the landing (Bronze) data lake zone.
  • Sample data processing pipelines performing data transformations from Bronze to Silver and Silver to Gold layers.

The solution utilises the Stacks Data Python library, which offers a suite of utilities to support the development of data workloads across the platform.

High-level architecture

(Diagram: High-level architecture)

Repository structure

stacks-azure-data
├── build           # Deployment pipeline configuration for building and deploying the core infrastructure
├── de_build        # Deployment pipeline configuration for building and deploying data engineering resources
├── de_workloads    # Data engineering workload resources, including data pipelines, tests and deployment configuration
│   ├── generate_examples    # Example config files for generating data engineering workloads using Datastacks
│   ├── ingest               # Data ingestion workloads
│   ├── processing           # Data processing and transformation workloads
│   └── shared_resources     # Shared resources used across data engineering workloads
├── deploy          # TF modules to deploy core Azure resources (used by `build` directory)
├── docs            # Documentation
├── stacks-cli      # Example config to use when scaffolding a project using stacks-cli
├── utils           # Python utilities package used across the solution for local testing
├── .pre-commit-config.yaml         # Configuration for pre-commit hooks
├── Makefile        # Includes commands for environment setup
├── pyproject.toml  # Project dependencies
├── README.md       # This file
├── stackscli.yml   # Tells the Stacks CLI what operations to perform when the project is scaffolded
├── taskctl.yaml    # Controls the independent runner
└── yamllint.conf   # Linter configuration for YAML files used by the independent runner

Terraform Locals

The deploy/azure/infra and deploy/azure/networking directories each contain a locals.tf file. This file computes values based on the supplied variable values and/or determines whether to source them from a data block.

It is also used to store complex object variables that would traditionally be kept in the vars.tf file. For example, network_details in deploy/azure/networking/locals.tf describes all of the networking information required to establish the hub and spoke network.

Because these details are stored in the locals file, they cannot be overridden using variables; if the defaults do not meet your specific network requirements, the locals file will need to be updated directly.
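
As a minimal, illustrative sketch of how such a locals.tf can be structured (the variable and attribute names below are hypothetical, not the repository's actual definitions):

```hcl
# Illustrative only - names and address ranges are placeholders, not the
# repository's actual configuration.
locals {
  # Value computed from supplied input variables
  resource_prefix = lower("${var.name_company}-${var.name_project}-${var.name_environment}")

  # Complex object variable kept in locals rather than vars.tf; edit here
  # directly if the defaults do not meet your network requirements
  network_details = {
    hub_vnet_address_space   = ["10.1.0.0/16"]
    spoke_vnet_address_space = ["10.2.0.0/16"]
    subnets = {
      data  = "10.2.1.0/24"
      build = "10.2.2.0/24"
    }
  }
}
```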

Developing the solution

Please refer to the documentation for getting started with developing Stacks: Local Development Quickstart.