Skip to content

shawnweisfeld/AzureStorageTierDemo

AzureStorageTierDemo

The goal of this sample application is to show an efficient way to queue all the objects in an Azure Storage Container for moving from Archive to Hot or Cool storage. The goal is to enqueue objects to be moved from Archive to Hot or Cool as fast as possible. The time it takes to do the actual move will depend on a number of factors.

More info on the rehydration process can be found here

The sample leverages:

  1. Multi-threaded Architecture to increase total throughput. Threads are spawned based on the naming convention in your storage account using an / as the delimiter, see here for more info.
  2. Use of the Batch API to reduce calls to Azure
  3. Deployment to an Azure Container Instance to reduce network latency vs running over the internet

Setup

This application makes use of Log Analytics and Application Insights for logging. Follow the instructions here to deploy both of these services.

You will also need a storage account filled with objects that you want to move from Archive to Hot/Cool.

For ease of deployment I publish a build of this code to my Docker Hub account. However, please feel free to download and compile the code yourself.

Deploy to ACI

  • I will typically run this from the bash Azure Cloud Shell.
  • Update the values to match your environment

Explanation of what each of the arguments in the deployment script do

az container create \
    --name "azurestoragetierdemo" \ <-- the name you want to use for the ACI instance
    --resource-group "tierdemo" \ <-- the name of the resource group that you want to deploy the ACI instance to
    --location southcentralus \ <-- the region you want to deploy the ACI instance to (this should match the region the storage account is in)
    --cpu 2 \ <-- the amount of CPU to give your ACI instance
    --memory 4 \ <-- the amount of Memory to give your ACI instance
    --image "sweisfel/azurestoragetierdemo:latest" \ <-- the docker container to deploy into the ACI instance
    --restart-policy Never \ <-- We dont want ACI to restart if it fails/finishes as every time we run the container it starts over
    --no-wait \ <-- We dont want the CLI to wait for the deployment to complete before it returns control back to you
    --environment-variables \
        APPINSIGHTS_INSTRUMENTATIONKEY="key" \ <-- the key from App Insights (its a GUID)
        StorageConnectionString="connection string" \ <-- the connection string to your storage account (looks like this "DefaultEndpointsProtocol=https;AccountName=foo;AccountKey=bar;EndpointSuffix=core.windows.net")
        Container="container name" \ <-- the storage container that your blobs are in
        Prefix="" \ <-- the path to the blobs if you only want to scan a subset of the container
        WhatIf="false" \ <-- runs the app in whatif mode, in this mode it will do the scan, but not make any changes
        ThreadCount="0" \ <-- set to 0 to use the default number of threads, or enter a custom value
        TargetAccessTier="Hot" <-- what tier you want the blobs moved to, either "Hot" or "Cool"

Script without comments for you to copy/paste

az container create \
    --name "azurestoragetierdemo" \
    --resource-group "my rg" \
    --location southcentralus \
    --cpu 2 \
    --memory 4 \
    --image "sweisfel/azurestoragetierdemo:latest" \
    --restart-policy Never \
    --no-wait \
    --environment-variables \
        APPINSIGHTS_INSTRUMENTATIONKEY="my key" \
        StorageConnectionString="my connection string" \
        Container="container name" \
        Prefix="" \
        WhatIf="false" \
        ThreadCount="0" \
        TargetAccessTier="Hot"

Tips

  • Deploy the ACI instance to the SAME region your storage account is in. This will reduce network latency on the calls between the app and the storage account.
  • Running the application in WhatIf mode is a good way to get an idea of if the files have been read off of archive and put back in your tier of choice (hot/cool). However, it needs to scan each object in the container to do this. For larger containers this will take time and consume storage transactions.
  • You can rerun the above bash script with different values for your environmental variables to change them, without needing to delete and recreate the ACI instance.
  • You can run multiple instances of ACI (with different names) if you want to process multiple storage accounts/containers at the same time.

Monitoring

You can run this Kusto query against the Log Analytics Workspace to see logs for the last run.

let runs = AppDependencies | summarize TimeGenerated=max(TimeGenerated) by run=tostring(Properties["Run"]) | order by TimeGenerated | take 1 ;
AppDependencies
| where tostring(Properties["Run"]) in (runs)
| extend run=tostring(Properties["Run"])
| order by TimeGenerated 

Clean up

When you finish you can delete the ACI instances, App Insights and Log Analytics. However, you might want to hang onto the App Insights and Log Analytics logs incase you might need the info later or might need to do another run later.

About

No description, website, or topics provided.

Resources

License

CC-BY-4.0, MIT licenses found

Licenses found

CC-BY-4.0
LICENSE
MIT
LICENSE-CODE

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published