# Episode VII Notebook

This episode connects to Microsoft Azure Storage and shows how to upload and download a CSV file.

First, import the packages required to connect to Azure ML.

In [None]:
from azureml.core import Workspace
import azureml.core
import configparser

config = configparser.ConfigParser()
config.read('episode_vii.ini')  
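
The notebook reads `episode_vii.ini` but never shows it. Below is a minimal sketch of the layout that the lookups in this notebook imply. The section name and keys are taken from the code cells; every value is a made-up placeholder, and `read_string` is used only so the sketch runs without a file on disk.

```python
import configparser

# Placeholder values only; the section and key names match the lookups
# used in this notebook, but every value here is made up for illustration.
EXAMPLE_INI = """
[azure.settings]
workspace_name = my-workspace
subscription_id = 00000000-0000-0000-0000-000000000000
resource_group = my-resource-group
storage_connection_string = DefaultEndpointsProtocol=https;AccountName=example;AccountKey=PLACEHOLDER;EndpointSuffix=core.windows.net
storage_container_name = 11111111-2222-3333-4444-555555555555
"""

example_config = configparser.ConfigParser()
example_config.read_string(EXAMPLE_INI)

settings = example_config['azure.settings']
print(settings['workspace_name'])
print(settings['storage_container_name'])
```

Because the connection string contains an account key, the real .ini file is probably best kept out of version control.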

## Connect to the Workspace

In [None]:
ws = Workspace.get(name=config['azure.settings']['workspace_name'], 
                   subscription_id=config['azure.settings']['subscription_id'], 
                   resource_group=config['azure.settings']['resource_group'])
print('Workspace loaded.')

## Have a Look at the CSV Data

The data is deliberately unremarkable so that it does not distract from the Azure Storage topic.

In [None]:
import pandas as pd 
pd.read_csv('./data/regression.csv')

## Connect to the Storage Account and get a Blob Service Client Object

I want to save this CSV to Azure Storage so my fictional FranklyAI team can all see the raw data.

In [2]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

storage_connection_string = config['azure.settings']['storage_connection_string']

# Create the BlobServiceClient object, which is used later to create and look up containers
blob_service_client = BlobServiceClient.from_connection_string(storage_connection_string)
print('Got the Blob Service Client object.')

Got the Blob Service Client object.
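
If `from_connection_string` fails, a quick look at the connection string can help. The sketch below assumes the usual `Key=Value;Key=Value` layout of a storage connection string; `parse_connection_string` is a hypothetical helper, not part of the Azure SDK, and the example value is a placeholder.

```python
def parse_connection_string(connection_string):
    """Split 'Key=Value;Key=Value' pairs into a dict (hypothetical helper)."""
    parts = {}
    for segment in connection_string.split(';'):
        if not segment.strip():
            continue  # tolerate a trailing semicolon
        # Split on the first '=' only: base64 account keys often end in '='.
        key, sep, value = segment.partition('=')
        if not sep:
            raise ValueError('Malformed segment: missing "="')
        parts[key.strip()] = value.strip()
    return parts


# Placeholder connection string, not a real account.
example = ('DefaultEndpointsProtocol=https;AccountName=example;'
           'AccountKey=cGxhY2Vob2xkZXI=;EndpointSuffix=core.windows.net')
parsed = parse_connection_string(example)
print(parsed['AccountName'])
print(sorted(parsed))
```

Printing only the key names, as in the last line, avoids echoing the account key into the notebook output.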


## Create a Blob Container 

The container holds the CSV file; every upload and download in this notebook goes through it.

This code only needs to be run once, because a container can be reused. NOTE: save the generated UUID, which becomes the name
of your container. Put it in the .ini file (as `storage_container_name`) so you don't have to create a container each time.

In [6]:
import uuid

# Run the following the first time, to create the container.
# Create a unique name for the container
container_name = str(uuid.uuid4())

# Create the container
container_client = blob_service_client.create_container(container_name)
print('Container created:', container_name)
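
One way to save the generated name is to write it back with `configparser`. The sketch below assumes the `azure.settings` section and the `storage_container_name` key used elsewhere in this notebook; `is_valid_container_name` is a hypothetical helper, and the config is written to an in-memory buffer so nothing on disk is touched.

```python
import configparser
import io
import re
import uuid

# Azure container names: 3-63 characters, lowercase letters, digits and
# hyphens, starting and ending with a letter or digit, no doubled hyphens.
_CONTAINER_NAME = re.compile(r'^(?!.*--)[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$')


def is_valid_container_name(name):
    """Check a candidate name against the container naming rules above."""
    return bool(_CONTAINER_NAME.match(name))


new_name = str(uuid.uuid4())
assert is_valid_container_name(new_name)

# Stand-in for the real .ini contents; the value is a placeholder.
cfg = configparser.ConfigParser()
cfg.read_string('[azure.settings]\nworkspace_name = my-workspace\n')
cfg['azure.settings']['storage_container_name'] = new_name

# Write to a buffer here; in practice, open the .ini file in 'w' mode instead.
buffer = io.StringIO()
cfg.write(buffer)
print(buffer.getvalue())
```

Note that `configparser` does not preserve comments when it rewrites a file, so editing the .ini by hand may be preferable if yours is annotated.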

## Get a Container Client from the Blob Service Client and Upload the File

This code assumes the container name from the last code block was saved in the .ini file.

In [None]:
csv_file_name = './data/regression.csv'
container_name = config['azure.settings']['storage_container_name']

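# Look up the container by name among the containers in the storage account.
container_client = None
for container in blob_service_client.list_containers():
    if container.name == container_name:
        container_client = blob_service_client.get_container_client(container)
        break

# Fail early with a clear message instead of an AttributeError on None.
if container_client is None:
    raise RuntimeError(f'Container not found: {container_name}')

# Use a context manager so the file handle is closed after the upload.
# upload_blob raises an error if the blob already exists; pass overwrite=True to replace it.
with open(csv_file_name, 'rb') as csv_file:
    container_client.upload_blob(name='regression.csv', data=csv_file)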
print('Blob upload complete.')
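
If the upload is run more than once, it may be convenient to wrap it in a small function. The following is a minimal sketch; `upload_csv` is a hypothetical helper, not part of the SDK, and it assumes it is handed a `ContainerClient` like the one obtained above.

```python
def upload_csv(container_client, local_path, blob_name, overwrite=False):
    """Upload a local file as a blob, closing the file afterwards.

    `container_client` is expected to be an azure.storage.blob.ContainerClient;
    `upload_csv` itself is a hypothetical helper, not part of the SDK.
    """
    with open(local_path, 'rb') as csv_file:
        # With overwrite=False the SDK raises if the blob already exists.
        return container_client.upload_blob(
            name=blob_name, data=csv_file, overwrite=overwrite)


# Usage in this notebook (requires the container_client obtained above):
# upload_csv(container_client, './data/regression.csv', 'regression.csv',
#            overwrite=True)
```

Defaulting `overwrite` to `False` keeps an accidental re-run from silently replacing the shared file.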

## Get the CSV and Create DataFrame Using Only Data
There's no need to save a local copy of the .csv file: the downloaded data stream (which can be read as bytes or as a string) is sufficient. No duplicate CSV files!

In [None]:
import pandas as pd 
from io import StringIO

# download_blob accepts the blob name directly; with encoding='utf-8',
# readall() returns a str instead of bytes.
download_stream = container_client.download_blob('regression.csv', encoding='utf-8')
download_str = download_stream.readall()
df = pd.read_csv(StringIO(download_str))
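
The cell above takes the string route. As a rough sketch of both routes, the hypothetical helper below (`frame_from_payload` is not part of pandas or the Azure SDK) accepts either bytes or a string and builds the DataFrame in memory; the sample data is made up.

```python
from io import BytesIO, StringIO

import pandas as pd


def frame_from_payload(payload):
    """Build a DataFrame from CSV content held in memory (hypothetical helper)."""
    if isinstance(payload, (bytes, bytearray)):
        return pd.read_csv(BytesIO(payload))
    return pd.read_csv(StringIO(payload))


# Made-up sample standing in for the downloaded blob content.
sample = 'x,y\n1,2\n3,4\n'
from_text = frame_from_payload(sample)
from_bytes = frame_from_payload(sample.encode('utf-8'))
print(from_text.equals(from_bytes))
```

Either way, no temporary file is written.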

## Look at the Downloaded CSV

Check that the downloaded data looks correct and matches the original file.

In [None]:
print(df)
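
Printing the DataFrame is a visual check only. For a stricter comparison, one option is to hash the local file and the downloaded payload. This is a sketch under the assumption that the file is UTF-8 encoded; `same_content` is a hypothetical helper.

```python
import hashlib


def same_content(local_path, downloaded):
    """Return True if the local file and the downloaded payload match byte for byte."""
    if isinstance(downloaded, str):
        # Assumes the original file is UTF-8 encoded.
        downloaded = downloaded.encode('utf-8')
    with open(local_path, 'rb') as local_file:
        local_digest = hashlib.sha256(local_file.read()).hexdigest()
    return local_digest == hashlib.sha256(downloaded).hexdigest()


# Usage in this notebook (requires download_str from the download cell):
# print(same_content('./data/regression.csv', download_str))
```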

## References
* https://franklyai.medium.com/fix-modulenotfounderror-no-module-named-ruamel-when-importing-azureml-core-7264d1860612
* https://docs.microsoft.com/en-us/azure/machine-learning/how-to-access-data
* https://docs.microsoft.com/en-us/python/api/overview/azure/ml/install?view=azure-ml-py
* https://code.visualstudio.com/docs/python/environments
* https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py
* https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python