
# FHIR to Synapse Sync Agent

FHIR to Synapse Sync Agent enables you to perform analytics and machine learning on FHIR data by moving it to Azure Data Lake in near real time and making it available to a Synapse workspace.

It is an Azure Container App that extracts data from a FHIR server using FHIR Resource APIs, converts it to hierarchical Parquet files, and writes them to Azure Data Lake in near real time. This solution also contains a script to create External Tables and Views in a Synapse serverless SQL pool pointing to the Parquet files. For more information about External Tables and Views, see Data mapping from FHIR to Synapse.

This solution enables you to query the entire FHIR dataset with tools such as Synapse Studio, SSMS, and Power BI. You can also access the Parquet files directly from a Synapse Spark pool. Consider this solution if you want to access all of your FHIR data in near real time and want to defer custom transformations to downstream systems.

Note: API usage charges will be incurred on the FHIR server if you use this tool to copy data from the FHIR server to Azure Data Lake.

## Deployment

### Prerequisites

- An instance of Azure API for FHIR, FHIR server for Azure, or the FHIR service in Azure Healthcare APIs. The pipeline will sync data from this FHIR server.
- A Synapse workspace.

### Steps at a high level

1. Deploy the pipeline to Azure Container Apps using the given ARM template.
2. Provide access of the FHIR service to the Container App deployed in the previous step.
3. Verify that the data gets copied to the Storage Account. If data is copied to the Storage Account, the pipeline is working successfully.
4. Provide access of the Storage Account and the Synapse workspace to your account for running the PowerShell script mentioned below.
5. Provide access of the Storage Account to the Synapse workspace so Synapse can access the data.
6. Run the provided PowerShell script to create the following artifacts:
   1. Resource-specific folders in the Azure Storage Account.
   2. A database in the Synapse serverless SQL pool with External Tables and Views pointing to the files in the Storage Account.
7. Query data from Synapse Studio.

## 1. Deploy the pipeline

1. To deploy the FHIR to datalake sync pipeline, use the button below to deploy through the Azure Portal.

   Or you can browse to the Custom deployment page in the Azure portal, select Build your own template in the editor, then copy the content of the provided ARM template into the edit box and click Save.

   The deployment page should open the following form.

   (screenshot: the deployment form)

2. Fill in the form based on the table below and click Review + create to start the deployment. (A command-line alternative is sketched after this list.)

   | Parameter | Description |
   | --- | --- |
   | Resource Group | Name of the resource group where you want the pipeline-related resources to be created. |
   | Location | The location to deploy the FhirToDatalake pipeline. |
   | Pipeline Name | A name for the FhirToDatalake pipeline; it needs to be unique within your subscription. |
   | Fhir Server Url | The URL of the FHIR server. If the baseUri has relative parts (like http://www.example.org/r4), then the relative part must be terminated with a slash (like http://www.example.org/r4/). |
   | Authentication | Whether to access the FHIR server with managed identity authentication. Set it to false if you are using an instance of the FHIR server for Azure with public access. |
   | Fhir Version | Version of the FHIR server. Currently only R4 is supported. |
   | Data Start | Start timestamp of the data to be exported. |
   | Data End | End timestamp of the data to be exported. Leave it empty if you want to periodically export data in near real time. |
   | Container Name | A name for the Storage Account container to which Parquet files will be written. A Storage Account with an autogenerated name will be created automatically during the installation. |
   | Job Concurrency | The number of jobs executing in parallel. |
   | Scheduler Interval (new) | The scheduler interval. To customize the scheduler, set the value to "customized" and specify the value in Scheduler Crontab Expression below. |
   | Scheduler Crontab Expression (new) | Use a crontab expression to set the scheduler. A crontab expression is a six-part crontab format (sec min hour day month day-of-week). Refer to Crontab expressions for pipeline scheduler to learn how to write crontab expressions to schedule a job. |
   | Customized Schema Image Reference | The customized schema image reference for the image on Container Registry. Refer to TemplateManagement for how to manage your template images. |
   | Filter Config Image Reference | The filter config image reference for the image on Container Registry. Refer to TemplateManagement for how to manage your template images. |
   | Filter Scope | For data filtering use. The export scope can be System or Group. The default value is System if no filter is applied. |
   | Group Id | For data filtering use. If Filter Scope is set to Group, you need to fill in the Group ID; otherwise leave it blank. |
   | Required Types | For data filtering use. Specifies which types of resources will be included. For example, type=Patient would return only Patient resources. All resource types will be exported if not specified. Leave it blank if no filter is applied. |
   | Type Filters | For data filtering use. Use along with the Required Types configuration. The value is a comma-separated list of FHIR queries that further restrict the results. All data of the required types will be exported if not specified. Leave it blank if no filter is applied. |
   | Image | The FhirToDatalake container image to deploy. You do not need to change this. |
   | Max Instance Count | Maximum number of replicas running for the pipeline Container App. |
   | Storage Account Type | The Azure Storage Account type to deploy. |
3. Make a note of the names of the Storage Account and the Azure Container App created during the deployment.

4. Refer here for more information about using a customized schema to handle FHIR extensions.

5. Refer here for more information about filtering.
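As an alternative to the portal form, the same ARM template can be deployed from the command line. Below is a minimal sketch using the Az PowerShell module; the template file path and the parameter names and values are illustrative assumptions that mirror the portal form above, so verify them against the ARM template in the repo before running:

```powershell
# Minimal sketch, not the official deployment path. The template path and all
# parameter names/values below are assumptions -- check the ARM template in
# your clone of the repo for the exact names.
Connect-AzAccount -SubscriptionId 'yyyy-yyyy-yyyy-yyyy'

New-AzResourceGroup -Name 'my-fhir-analytics-rg' -Location 'eastus' -Force

New-AzResourceGroupDeployment `
    -ResourceGroupName 'my-fhir-analytics-rg' `
    -TemplateFile '.\FhirToDataLake\deploy\templates\FhirToDatalake.json' `
    -TemplateParameterObject @{
        pipelineName  = 'myfhirpipeline'                                # must be unique in your subscription
        fhirServerUrl = 'https://myfhirserver.azurehealthcareapis.com'  # note the trailing-slash rule for relative base URIs
        fhirVersion   = 'R4'                                            # only R4 is currently supported
    }
```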

NOTE: You can also deploy the FhirToDatalake pipeline to an Azure Function; see Deploy-FhirToDatalake-Function for more information.

## 2. Provide access of the FHIR server to the Azure Container App

If you are using the Azure API for FHIR or the FHIR service in Azure Healthcare APIs, assign the FHIR Data Reader role to the Azure Container App deployed above.

If you are using the FHIR server for Azure with anonymous access, then you can skip this step.
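If you prefer to assign the role from the command line, here is a minimal sketch with the Az PowerShell module; all names and IDs are placeholders, and the scope shown is for Azure API for FHIR (the FHIR service in Azure Healthcare APIs lives under .../workspaces/<workspace>/fhirservices/<name> instead):

```powershell
# Sketch only: grant the Container App's system-assigned managed identity the
# FHIR Data Reader role on the FHIR server. Copy the identity's object
# (principal) ID from the Container App's Identity blade in the portal.
New-AzRoleAssignment `
    -ObjectId '<container-app-principal-id>' `
    -RoleDefinitionName 'FHIR Data Reader' `
    -Scope '/subscriptions/<sub-id>/resourceGroups/<fhir-rg>/providers/Microsoft.HealthcareApis/services/<fhir-service-name>'
```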

## 3. Verify data movement

The Azure Container App runs automatically. You can monitor its progress in the Azure portal. The time taken to write the data to the Storage Account depends on the amount of data in the FHIR server. After the Azure Container App execution completes, you should have Parquet files in the Storage Account. Browse to the results folder inside the container. You should see folders corresponding to different FHIR resources. Note that you will see folders only for those resources that are present in your FHIR server. Running the PowerShell script described further below will create folders for the other resources.

(screenshot: blob result folders in the Storage Account container)
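You can also check for the Parquet output without leaving the console. A small sketch, assuming your signed-in account has read access to the Storage Account (names are placeholders):

```powershell
# Sketch only: list the per-resource output folders the pipeline has written.
# Verify the exact output folder name in your container (the portal shows it
# inside the results folder).
$ctx = New-AzStorageContext -StorageAccountName '<storage-account-name>' -UseConnectedAccount
Get-AzStorageBlob -Container '<container-name>' -Prefix 'result' -Context $ctx |
    Select-Object -ExpandProperty Name
```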

## 4. Provide privileges to your account

You must assign the following roles to your account to run the PowerShell script in the next step. You may revoke these roles after the installation is complete. (A command-line equivalent is sketched after this list.)

1. In your Synapse workspace, select Synapse Studio > Manage > Access control, and then assign the Synapse Administrator role to your account.
2. In the Storage Account created during the pipeline installation, select Access Control (IAM) and assign the Storage Blob Data Contributor role to your account.
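The same two role assignments can be made from the command line, as sketched below; the workspace and storage account names are placeholders, and the snippet assumes the Az and Az.Synapse modules from step 6 are installed:

```powershell
# Sketch only: assign the two roles needed to run the script to your account.
$me = (Get-AzADUser -SignedIn).Id

# Synapse RBAC: Synapse Administrator on the workspace.
New-AzSynapseRoleAssignment `
    -WorkspaceName '<synapse-workspace-name>' `
    -RoleDefinitionName 'Synapse Administrator' `
    -ObjectId $me

# Azure RBAC: Storage Blob Data Contributor on the pipeline's Storage Account.
New-AzRoleAssignment `
    -ObjectId $me `
    -RoleDefinitionName 'Storage Blob Data Contributor' `
    -Scope '/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>'
```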

## 5. Provide access of the Storage Account to the Synapse workspace

To enable Synapse to read the data from the Storage Account, assign the Storage Blob Data Contributor role to the Synapse workspace. You can do this by selecting Managed identity while adding members to the role; you should be able to pick your Synapse workspace instance from the list of managed identities shown on the portal.
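A command-line sketch of the same assignment, assuming the workspace uses its system-assigned managed identity (names are placeholders):

```powershell
# Sketch only: grant the Synapse workspace's managed identity access to the
# Storage Account holding the Parquet files.
$workspace = Get-AzSynapseWorkspace -Name '<synapse-workspace-name>'

New-AzRoleAssignment `
    -ObjectId $workspace.Identity.PrincipalId `
    -RoleDefinitionName 'Storage Blob Data Contributor' `
    -Scope '/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage-account-name>'
```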

## 6. Run the PowerShell script

Run the PowerShell script to create the following artifacts:

1. Resource-specific folders in the Azure Storage Account.
2. A database in the Synapse serverless SQL pool with External Tables and Views pointing to the files in the Storage Account.

To run the PowerShell script, perform the following steps (an optional verification snippet follows the list):

1. Clone this FHIR-Analytics-Pipelines repo to your local machine.
2. Open the PowerShell console and ensure that you have the latest version of PowerShell 7 or PowerShell 5.1.
3. Install the PowerShell Az module and the separate Az.Synapse module if they are not already installed.

   ```powershell
   Install-Module -Name Az
   Install-Module -Name Az.Synapse
   ```

4. Install the PowerShell SqlServer module if it is not already installed.

   ```powershell
   Install-Module -Name SqlServer
   ```

5. Sign in to your Azure account, selecting the subscription where your Synapse workspace is located.

   ```powershell
   Connect-AzAccount -SubscriptionId 'yyyy-yyyy-yyyy-yyyy'
   ```

6. Browse to the scripts folder under this path (..\FhirToDataLake\scripts).
7. Run the following PowerShell script.

   ```powershell
   ./Set-SynapseEnvironment.ps1 -SynapseWorkspaceName "{Name of your Synapse workspace instance}" -StorageName "{Name of your storage account where Parquet files are written}"
   ```

   For more details, refer to the complete syntax in Set-SynapseEnvironment Syntax.
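Once the script finishes, you can optionally confirm from the same console that the database was created. This is a sketch, not part of the official steps; it reuses the SqlServer module installed above, and the `-ondemand` host name is the workspace's serverless SQL endpoint:

```powershell
# Optional check: confirm the fhirdb database exists in the serverless SQL pool.
# On newer Az.Accounts versions Get-AzAccessToken returns a SecureString token;
# convert it to plain text if Invoke-Sqlcmd rejects it.
$token = (Get-AzAccessToken -ResourceUrl 'https://database.windows.net').Token
Invoke-Sqlcmd `
    -ServerInstance '<synapse-workspace-name>-ondemand.sql.azuresynapse.net' `
    -Database 'master' `
    -AccessToken $token `
    -Query "SELECT name FROM sys.databases WHERE name = 'fhirdb'"
```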

## 7. Query data from Synapse Studio

Go to the serverless SQL pool in your Synapse workspace. You should see a new database named fhirdb. Expand External Tables and Views to see the entities. Your FHIR data is now ready to be queried.
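For example, the following sketch queries a few rows from the Patient entity using Invoke-Sqlcmd (the same SQL works in Synapse Studio). The `fhir` schema name is an assumption based on the default setup, so check the entities the script actually created in your database:

```powershell
# Sketch only: read a sample of Patient rows from the fhirdb database.
$token = (Get-AzAccessToken -ResourceUrl 'https://database.windows.net').Token
Invoke-Sqlcmd `
    -ServerInstance '<synapse-workspace-name>-ondemand.sql.azuresynapse.net' `
    -Database 'fhirdb' `
    -AccessToken $token `
    -Query 'SELECT TOP 10 * FROM [fhir].[Patient]'
```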

As you add more data to the FHIR server, it will be synced automatically to the Data Lake and become available for querying.