Getting to your Data in Azure Notebooks
Jupyter provides the basis of the Azure Notebooks user experience, but it does not come with any data. This notebook shows several ways to retrieve data for use in your own notebooks.

There are many ways to get data into your notebooks, from downloading a file with curl to using the Azure package to access a variety of data sources, all without leaving a Jupyter Notebook. See the table of contents below to jump to a particular example.

Table of Contents
Use curl to retrieve a file from GitHub
Interacting with Azure Blobs
Using Azure Table Storage
Providing Read Only Access to Azure Storage through Shared Access Signatures
Cleaning up created blobs and tables
Using SQL
Other Resources

Interacting with Azure Blobs 
We can also use Azure Storage to store our data, which makes it straightforward to keep that data private or public. The code below uses private account keys first. Later, the shared access section creates a shared access signature for read-only access.

Before we can do anything, though, we need an Azure Storage account. Read the documentation article on creating storage accounts, or create a storage account using the Azure SDK.

You can put content into blobs using AzCopy or by using the Python Azure SDK as shown in the example below.

Once you have your account name and key, enter them below. The code then connects to the Azure Storage account you provide and downloads the Tweets.csv blob from the airline-data container to a local file.

In [1]:
azure_storage_account_name = "mlworkshop"
azure_storage_account_key = ""

# An empty string is as unusable as None, so test truthiness rather than identity
if not azure_storage_account_name or not azure_storage_account_key:
    raise Exception("You must provide a name and key for an Azure Storage account")

In [2]:
!pip install azure-storage==0.32.0

distributed 1.21.8 requires msgpack, which is not installed.
You are using pip version 10.0.1, however version 18.0 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.


In [3]:
from azure.storage.blob import BlockBlobService
import csv
import json
import requests
import pandas as pd

In [4]:
# First, connect to the storage account through BlockBlobService
blob_service = BlockBlobService(azure_storage_account_name, azure_storage_account_key)

# The service also has methods to list containers and blobs
containers = blob_service.list_containers()
#blobs = blob_service.list_blobs('mlworkshop')

# Download the Tweets.csv blob from the airline-data container to a local file
blob_service.get_blob_to_path('airline-data', 'Tweets.csv', 'Tweets.csv')


<azure.storage.blob.models.Blob at 0x10f182d68>
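
The cell above downloads a blob that already exists. On a fresh storage account there is nothing to download yet, so a minimal sketch of creating a container, writing a text blob, and reading it back might look like the following. It assumes the legacy azure-storage package pinned above, whose BlockBlobService offers create_container, create_blob_from_text, and get_blob_to_text. The function name, container name, and blob name are hypothetical.

```python
def round_trip_text_blob(blob_service, container_name, blob_name, text):
    """Create a container, upload text as a blob, and return the text read back.

    blob_service is assumed to be a BlockBlobService from the legacy
    azure-storage package; any object with the same three methods works.
    """
    # With the default arguments, an existing container is reported by a
    # False return value rather than by an exception
    blob_service.create_container(container_name)
    # Upload the string as a block blob
    blob_service.create_blob_from_text(container_name, blob_name, text)
    # Download the blob again; the returned object carries the text in .content
    blob = blob_service.get_blob_to_text(container_name, blob_name)
    return blob.content

# Usage with the service created above (hypothetical container and blob names):
# print(round_trip_text_blob(blob_service, 'notebook-demo', 'hello.txt',
#                            'Hello from Azure Notebooks'))
```

Passing the service in as an argument keeps the helper independent of how the credentials were obtained.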

Now we convert the rows of the CSV file into dicts (the JSON equivalent) so we can send them in a POST request. A single request can hold at most 1000 items, so we batch the POST requests. Then we write the scores out to a file so they can later be joined back into the CSV.
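
The batching step can be looked at on its own, without any network calls. The sketch below is one way to split a list of texts into request payloads of a bounded size. The function name batch_documents is hypothetical, and sending the ids as strings is an assumption about what the service expects.

```python
def batch_documents(texts, batch_size=1000, language='en'):
    """Yield request payloads holding at most batch_size documents each."""
    documents = []
    for doc_id, text in enumerate(texts, start=1):
        # Ids keep counting across batches so each one maps to a single row
        documents.append({'language': language, 'id': str(doc_id), 'text': text})
        if len(documents) == batch_size:
            yield {'documents': documents}
            documents = []
    # Emit the final partial batch, but never an empty one
    if documents:
        yield {'documents': documents}

payloads = list(batch_documents(['great flight', 'lost my bag', 'on time'], batch_size=2))
print([len(p['documents']) for p in payloads])  # prints [2, 1]
```

Using a generator means only one batch is held in memory at a time, which matters more as the input file grows.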

In [5]:
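def get_sentiment_scores(json_request):
    """POST one batch of documents and return the scored documents from the response."""
    sentiment_analysis_endpoint = "https://eastus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment"
    text_analytics_key = ""  # enter your Text Analytics subscription key here
    text_analytics_key_name = "Ocp-Apim-Subscription-Key"
    headers = {
        text_analytics_key_name: text_analytics_key,
        "Content-Type": "application/json"
    }
    response = requests.post(sentiment_analysis_endpoint, data=json.dumps(json_request), headers=headers)
    response.raise_for_status()  # surface HTTP errors here instead of as a KeyError below
    # print(response.json())
    # Return each scored document as a dict; a later cell reads its 'score' field
    return response.json()['documents']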

In [8]:
documents = []
sentiment_scores = []
data_file_name = 'Tweets.csv'
max_request_volume = 1000  # most documents allowed in one request
with open(data_file_name, mode='r', newline='') as infile:
    reader = csv.reader(infile)
    headers = next(reader)  # skip the header row
    # Number the rows from 1 so each document id maps back to a CSV row
    for row_id, row in enumerate(reader, start=1):
        documents.append({
            'language': 'en',
            'id': row_id,
            'text': row[10]
        })
        # Send a batch as soon as it reaches the request limit
        if len(documents) == max_request_volume:
            sentiment_scores += get_sentiment_scores({'documents': documents})
            documents = []
# Send whatever is left over, skipping the request if the last batch is empty
if documents:
    sentiment_scores += get_sentiment_scores({'documents': documents})
with open('sentiment_scores.txt', 'w') as f:
    for item in sentiment_scores:
        f.write("%s\n" % item)

In [52]:
print(len(sentiment_scores))

14640
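
The count matters because the next cell attaches the scores to the CSV rows by position. If the service ever returned fewer documents than were sent, every later score would shift onto the wrong row. A more defensive sketch matches scores to rows by document id instead. It assumes each scored document is a dict that echoes the request id under 'id' alongside its 'score'. The function name scores_in_row_order is hypothetical.

```python
def scores_in_row_order(scored_documents, row_count, missing=None):
    """Return one score per row, matched by document id rather than list position."""
    # Ids may come back as strings, so normalise them to ints before the lookup
    by_id = {int(doc['id']): doc['score'] for doc in scored_documents}
    # Rows were numbered from 1; rows without a score get the placeholder
    return [by_id.get(row_id, missing) for row_id in range(1, row_count + 1)]

example = [{'id': '1', 'score': 0.9}, {'id': '3', 'score': 0.2}]
print(scores_in_row_order(example, 3))  # prints [0.9, None, 0.2]
```

The resulting list always has exactly row_count entries, so it can be assigned to a DataFrame column without a length mismatch.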


In [54]:
csv_input = pd.read_csv(data_file_name)
# Scores are attached by position, so they must be in the same order as the rows
csv_input['azure_sentiment'] = [data['score'] for data in sentiment_scores]
csv_input.to_csv('airline_with_azure_sentiment.csv', index=False)

[0.7560583353042603, 0.73664790391922, 0.8193238973617554, 0.21620216965675354, 0.028564900159835815, 0.1923532485961914, 0.0436822772026062, 0.9808833599090576, 0.16980689764022827, 0.9988923072814941, 0.8604477643966675, 0.17874675989151, 0.9999823570251465, 0.8031506538391113, 0.17012923955917358, 0.987318217754364, 0.27715355157852173, 0.8213831782341003, 0.07769662141799927, 0.9999998807907104, 0.7888412475585938, 0.8145174980163574, 0.8507183790206909, 0.8966660499572754, 0.867006778717041, 0.024740606546401978, 0.21840545535087585, 0.10248401761054993, 0.8758406639099121, 0.12329471111297607, 0.1788865625858307, 0.20265507698059082, 0.752072811126709, 0.029622822999954224, 0.09244513511657715, 0.8156232833862305, 0.9543756246566772, 0.9959747791290283, 0.8999937176704407, 0.7459190487861633, 0.14532065391540527, 0.5, 0.8048480749130249, 0.8587582111358643, 0.20514383912086487, 0.5, 0.9735881090164185, 0.973930835723877, 0.8500254154205322, 0.5, 0.9953728318214417, 0.5, 0.9028596