# Step 3: Upload audios and transcriptions to Azure Storage

**Content**

* Create Containers
* Upload Audios and Transcriptions
* Check Blob Files

References:
   * Quickstart: Manage blobs with Python v12 SDK: https://docs.microsoft.com/pt-br/azure/storage/blobs/storage-quickstart-blobs-python

In [1]:
! pip install azure-storage-blob





In [2]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient, __version__
import os, shutil, yaml

Load config

In [3]:
config_file = os.path.join("config","config.yaml")
with open(config_file, 'r') as ymlfile:
    # safe_load only builds plain Python objects, which is all a config file needs
    config = yaml.safe_load(ymlfile)
    

##### Azure Storage settings
container_name_audios= config['azure_storage']['container_name_audios']
container_name_transcricoes= config['azure_storage']['container_name_transcricoes']
AZURE_STORAGE_CONNECTION_STRING = config['azure_storage']['conn_string']
az_storage_sas_token = config['azure_storage']['sas_token']
az_storage_name = config['azure_storage']['storage_name']
az_storage_uri = "https://{name}.dfs.core.windows.net/{container}/".format(name=az_storage_name, container=container_name_audios)
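The cell above reads five keys from the `azure_storage` section of `config/config.yaml`, and a missing key only surfaces as a `KeyError`. A minimal sketch of an early check is shown below; the helper name `missing_storage_keys` and the placeholder values are hypothetical and not part of the original notebook.

```python
# Keys read from config['azure_storage'] by the notebook cell above.
REQUIRED_STORAGE_KEYS = (
    "container_name_audios",
    "container_name_transcricoes",
    "conn_string",
    "sas_token",
    "storage_name",
)

def missing_storage_keys(config):
    """Return the required azure_storage keys that are absent or empty (hypothetical helper)."""
    section = config.get("azure_storage") or {}
    return [key for key in REQUIRED_STORAGE_KEYS if not section.get(key)]

# Placeholder values only; real values come from config.yaml.
example = {"azure_storage": {"container_name_audios": "audios", "storage_name": "mystorage"}}
print(missing_storage_keys(example))
```

Calling it right after loading the file and stopping when the returned list is non-empty keeps configuration problems separate from Azure connection errors.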


In [4]:
blob_service_client = BlobServiceClient.from_connection_string(AZURE_STORAGE_CONNECTION_STRING)

## Create Containers
Create two containers:
* audios
* transcricoes

In [5]:
from azure.core.exceptions import HttpResponseError, ResourceExistsError

def create_blob_container(connect_str, containers):
    '''Create containers in Azure Storage'''
    # Create the BlobServiceClient once and reuse it for every container
    blob_service_client = BlobServiceClient.from_connection_string(connect_str)
    for name in containers:
        try:
            # Create the container
            blob_service_client.create_container(name)
        except ResourceExistsError:
            print("The container {} had already been created.".format(name))
        except HttpResponseError as e:
            # Any other service error (for example, an invalid name or failed authentication)
            print("Container {}: {}".format(name, e.message))
        else:
            print("Container {} successfully added.".format(name))

In [6]:
create_blob_container(AZURE_STORAGE_CONNECTION_STRING, [container_name_audios, container_name_transcricoes])


Container audios successfully added.
Container transcricoes successfully added.
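Container creation fails when a name breaks the Azure naming rules: 3 to 63 characters, lowercase letters, digits and hyphens only, starting and ending with a letter or digit, with no consecutive hyphens. The sketch below checks names locally before calling the service. The helper `is_valid_container_name` is hypothetical, and the rules are encoded as described in the Azure naming documentation, so treat it as an approximation rather than the service's own validation.

```python
import re

# A letter or digit, then any number of letters/digits each optionally
# preceded by a single hyphen. This rules out leading, trailing and
# consecutive hyphens.
_CONTAINER_NAME_RE = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")

def is_valid_container_name(name):
    """Check a container name against the naming rules locally (hypothetical helper)."""
    return 3 <= len(name) <= 63 and _CONTAINER_NAME_RE.fullmatch(name) is not None

for candidate in ["audios", "transcricoes", "Audios", "my--container", "ab"]:
    print(candidate, is_valid_container_name(candidate))
```

Both names used in this notebook, `audios` and `transcricoes`, pass the check.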


## Upload Audios and Transcriptions

In [7]:
def list_files(directory):
    '''List the files in a given directory on the local file system'''
    return [f for f in os.listdir(directory) if os.path.isfile(os.path.join(directory, f))]

In [8]:
def upload_files_to_storage(container_name):
    '''Upload every file from the local folder named after the container to that container'''
    for i in list_files(container_name):
        blob_client = blob_service_client.get_blob_client(container=container_name, blob=i)
        print("\nUploading to Azure Storage as blob:\n\t" + i)

        # Upload the local file, replacing any existing blob with the same name
        with open(os.path.join(container_name, i), "rb") as data:
            blob_client.upload_blob(data, overwrite=True)
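The upload function above does not set a content type, so the blobs are typically stored with a generic binary type. If downstream tools need to tell audio from JSON, a content type can be chosen from the file extension. This is a minimal sketch; the mapping and the helper `guess_content_type` are illustrative assumptions, not part of the original notebook.

```python
import os

# Illustrative mapping for the two file types used in this notebook (an assumption).
CONTENT_TYPES = {".wav": "audio/wav", ".json": "application/json"}

def guess_content_type(filename, default="application/octet-stream"):
    """Pick a content type from the file extension (hypothetical helper)."""
    extension = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(extension, default)

for name in ["id_1.wav", "id_1.json", "notes.txt"]:
    print(name, "->", guess_content_type(name))
```

The result could then be passed to `upload_blob` through its `content_settings` argument, using `ContentSettings(content_type=...)` from `azure.storage.blob`.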

In [10]:
#Audios
upload_files_to_storage(container_name_audios)



Uploading to Azure Storage as blob:
	id_1.wav

Uploading to Azure Storage as blob:
	id_2.wav

Uploading to Azure Storage as blob:
	id_3.wav

Uploading to Azure Storage as blob:
	id_4.wav

Uploading to Azure Storage as blob:
	id_5.wav


In [11]:
#Transcriptions
upload_files_to_storage(container_name_transcricoes)



Uploading to Azure Storage as blob:
	id_1.json

Uploading to Azure Storage as blob:
	id_2.json

Uploading to Azure Storage as blob:
	id_3.json

Uploading to Azure Storage as blob:
	id_4.json

Uploading to Azure Storage as blob:
	id_5.json


## Check Blob Files


In [12]:
def list_blobs_from_container(container_name):
    '''List the names of the blobs inside an Azure Storage container'''
    container_client = blob_service_client.get_container_client(container_name)
    return [blob.name for blob in container_client.list_blobs()]

In [13]:
#Audios
list_blobs_from_container(container_name_audios)


['id_1.wav', 'id_2.wav', 'id_3.wav', 'id_4.wav', 'id_5.wav']

In [14]:
#Transcriptions
list_blobs_from_container(container_name_transcricoes)


['id_1.json', 'id_2.json', 'id_3.json', 'id_4.json', 'id_5.json']
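The two listings can also be compared programmatically to confirm that every audio has a transcription with the same id, and vice versa. This is a minimal sketch, assuming files are paired by their name without the extension (as in `id_1.wav` and `id_1.json`); the helper `unmatched_files` is hypothetical.

```python
import os

def unmatched_files(audio_names, transcription_names):
    """Return (audio ids without a transcription, transcription ids without an audio)."""
    audio_ids = {os.path.splitext(name)[0] for name in audio_names}
    transcription_ids = {os.path.splitext(name)[0] for name in transcription_names}
    return sorted(audio_ids - transcription_ids), sorted(transcription_ids - audio_ids)

# The blob names listed in the two outputs above.
audios = ['id_1.wav', 'id_2.wav', 'id_3.wav', 'id_4.wav', 'id_5.wav']
transcriptions = ['id_1.json', 'id_2.json', 'id_3.json', 'id_4.json', 'id_5.json']
print(unmatched_files(audios, transcriptions))  # ([], [])
```

In the notebook, the two lists would come from `list_blobs_from_container(container_name_audios)` and `list_blobs_from_container(container_name_transcricoes)`.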