# How to download files from Azure Blob Storage in the QESD Platform using Python in a local IDE

To access files stored in Azure Blob Storage within the QESD Platform from a local IDE (such as Visual Studio Code), you first need to set up authentication. Azure offers several authentication methods for accessing resources. In the QESD Platform, the preferred method for most users is Microsoft Entra ID (previously known as Azure Active Directory, or Azure AD). See the `BlobServiceClient` class in the `azure.storage.blob` package for other features, such as writing and uploading blobs to a storage account.

The following example shows how to download a blob from Azure Blob Storage using the Azure SDK for Python. Search for `TODO` to find the places where you need to enter values for your own use case. Note that there is currently no suitable method available for R.

### Prerequisites:

- Data Reader permissions on the storage account you are trying to access (for example, the Storage Blob Data Reader role).
- A Python installation on your local machine.
- An IDE (such as Visual Studio Code).
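
Before running the cells below, it can help to confirm that the Python prerequisites are actually present in the active environment. The following is a minimal sketch using only the standard library; the helper name `find_missing_modules` and the `REQUIRED_MODULES` list are illustrative and not part of the Azure SDK.

```python
from importlib.util import find_spec

# Modules imported by the examples in this notebook
REQUIRED_MODULES = ["azure.identity", "azure.storage.blob", "pandas"]


def find_missing_modules(module_names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in module_names:
        try:
            # For dotted names, find_spec imports the parent package first
            # and raises ModuleNotFoundError if that parent is absent.
            spec = find_spec(name)
        except (ModuleNotFoundError, ValueError):
            spec = None
        if spec is None:
            missing.append(name)
    return missing


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

If any modules are reported as missing, install the corresponding packages in the same environment that your IDE uses to run the notebook.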

In [1]:
# Install required libraries. We recommend using your preferred package manager or virtual environment to install the libraries in the correct environment and avoid conflicts with other packages.

# %pip install azure-identity
# %pip install azure-storage-blob
# %pip install pandas

In [13]:
# In the code below, we import the necessary modules from the Azure SDK and create an instance of `InteractiveBrowserCredential` for interactive authentication. 
# The first token request opens your default browser so that you can sign in interactively.
# We then create a `BlobServiceClient` object by providing the account URL and credential.


# Import the necessary modules from the Azure SDK 

from azure.identity import InteractiveBrowserCredential
from azure.storage.blob import BlobServiceClient

# Create an instance of InteractiveBrowserCredential for interactive authentication 
# This will prompt you to sign in interactively and obtain the necessary token
# The credential object will be used to authenticate with Azure services

try:
    credential = InteractiveBrowserCredential()

    # Request a token for Azure Storage from Microsoft Entra ID.
    # This checks that the credential can authenticate before any blob call is made.
    credential.get_token("https://storage.azure.com/.default")

except Exception as ex:
    # If authentication fails, print the error message and stop here
    # instead of continuing with an unusable credential
    print(f"Error: {ex}")
    raise

# Create a BlobServiceClient object by providing the account URL and credential
# The account URL is the URL of your Azure Blob Storage account
# The credential is the authentication credential used to access the storage account


blob_service_client = BlobServiceClient(
    # TODO: Replace the account_url parameter with the URL of your Azure Blob Storage account, for example "https://soilslakeuat.blob.core.windows.net"
    account_url="https://sboxlakedev.blob.core.windows.net", 
    credential=credential
)


# TODO: Specify the name of the container and the blob you want to download

container_name = "lake-userupload"
blob_name = "wqm17"

# Get a reference to the blob by calling the get_blob_client method of the BlobServiceClient
# Provide the container name and blob name as parameters

blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)


# Download the blob by calling the download_blob method of the blob client
downloaded_blob = blob_client.download_blob()

# TODO: Specify the local file path where you want to save the downloaded blob

local_file_path = "C:/Users/riveraarayam/Downloads/20231218_training_wqm17.csv"

# Open the local file in write binary mode and write the contents of the downloaded blob to the file

with open(local_file_path, "wb") as file:
    file.write(downloaded_blob.readall())

# Print a success message indicating that the blob has been downloaded
print("Blob downloaded successfully.")


Blob downloaded successfully.
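
The example above calls `readall()`, which holds the whole blob in memory before writing it to disk. For large files, one alternative is to write the blob chunk by chunk. The sketch below assumes the object returned by `download_blob()` exposes a `chunks()` iterator, as `StorageStreamDownloader` does in `azure-storage-blob` version 12; the helper name `save_blob_in_chunks` is hypothetical.

```python
def save_blob_in_chunks(blob_client, local_file_path):
    """Stream a blob to a local file chunk by chunk and return the bytes written.

    `blob_client` is expected to behave like azure.storage.blob.BlobClient:
    download_blob() must return an object with a chunks() iterator.
    """
    downloader = blob_client.download_blob()
    bytes_written = 0
    with open(local_file_path, "wb") as file:
        for chunk in downloader.chunks():
            file.write(chunk)
            bytes_written += len(chunk)
    return bytes_written


# Example usage with the blob_client and local_file_path defined above
# (requires network access and a signed-in credential):
# size = save_blob_in_chunks(blob_client, local_file_path)
# print(f"Wrote {size} bytes to {local_file_path}")
```

This keeps memory use roughly bounded by the chunk size rather than by the blob size, at the cost of a few extra lines.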


# How to read files in a Blob storage account in the QESD Platform from a local IDE using Python (without saving them to disk)

This guide explains how to read files stored in a Blob storage account within the QESD Platform using Python without saving them to your local disk. Using the Azure Blob Storage SDK for Python, you can load a blob's content into memory and work with it directly from your local IDE. Note that the content is still transferred over the network; it is simply never written to a local file. The code below sets up authentication, connects to the Blob storage account, and reads the file as a pandas DataFrame. Search for `TODO` to find the places where you need to enter values for your own use case.


In [None]:
# Import required libraries 

import pandas as pd
from io import BytesIO
from azure.identity import InteractiveBrowserCredential
from azure.storage.blob import BlobServiceClient

# Create an instance of InteractiveBrowserCredential for interactive authentication 
# This will prompt you to sign in interactively and obtain the necessary token
# The credential object will be used to authenticate with Azure services

try:
    credential = InteractiveBrowserCredential()

    # Request a token for Azure Storage from Microsoft Entra ID.
    # This checks that the credential can authenticate before any blob call is made.
    credential.get_token("https://storage.azure.com/.default")

except Exception as ex:
    # If authentication fails, print the error message and stop here
    # instead of continuing with an unusable credential
    print(f"Error: {ex}")
    raise


# Create a BlobServiceClient object

blob_service_client = BlobServiceClient(
    # TODO: Replace the account_url parameter with the URL of your Azure Blob Storage account, for example "https://soilslakeuat.blob.core.windows.net"
    account_url="https://sboxlakedev.blob.core.windows.net",
    credential=credential
)


# TODO: Specify the name of the container and the blob you want to read

container_name = "lake-userupload"
blob_name = "wqm17"

# Get a BlobClient object for the specified blob

blob_client = blob_service_client.get_blob_client(container=container_name, blob=blob_name)

# Read the blob data as bytes
blob_data = blob_client.download_blob().readall()

# Create a BytesIO object from the blob data
bytes_io = BytesIO(blob_data)

# Read the BytesIO object as a Pandas DataFrame
df = pd.read_csv(bytes_io)

# Use the DataFrame as needed
print(df.head())
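
The cell above still transfers the whole blob into memory. When only a preview is needed, a ranged download is one way to fetch just the beginning of the file. The sketch below assumes that `download_blob` accepts `offset` and `length` keyword arguments, as it does in `azure-storage-blob` version 12; the helper name `read_blob_head` and its `max_bytes` parameter are hypothetical.

```python
def read_blob_head(blob_client, max_bytes=64 * 1024):
    """Return the complete lines found in the first `max_bytes` of a blob.

    `blob_client` is expected to behave like azure.storage.blob.BlobClient.
    """
    data = blob_client.download_blob(offset=0, length=max_bytes).readall()
    if len(data) < max_bytes:
        # The whole blob fit in the requested range, so nothing was cut off
        return data
    # The range may end mid-line: drop the trailing partial line if there is one
    last_newline = data.rfind(b"\n")
    return data[: last_newline + 1] if last_newline != -1 else data


# Example usage with the blob_client created above
# (requires network access and a signed-in credential):
# import pandas as pd
# from io import BytesIO
# preview_df = pd.read_csv(BytesIO(read_blob_head(blob_client)))
# print(preview_df.head())
```

Trimming at the last newline avoids handing pandas a truncated final row. This works for line-oriented text formats such as CSV, but not for binary formats such as Parquet or Excel.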