[![AnalyticsDojo](https://s3.amazonaws.com/analyticsdojo/logo/final-logo.png)](http://www.analyticsdojo.com)
<center><h1>Introduction to Python - Test Azure Jupyter Notebook</h1></center>
<center><h3><a href = 'http://www.analyticsdojo.com'>www.analyticsdojo.com</a></h3></center>

### Links: [github](https://github.com/AnalyticsDojo/materials/blob/master/analyticsdojo/classes/01-overview/azureml-intro.ipynb) [viewer](http://nbviewer.jupyter.org/format/html/github/AnalyticsDojo/materials/blob/master/analyticsdojo/classes/01-overview/azureml-intro.ipynb#/) [slides](http://nbviewer.jupyter.org/format/slides/github/AnalyticsDojo/materials/blob/master/analyticsdojo/classes/01-overview/azureml-intro.ipynb#/) [anaconda](https://anaconda.org/analyticsdojo/azureml-intro/notebook)

## Test Notebook

The goal of this notebook is to show how to work with datasets on Azure. Make sure you watch the associated video under Microsoft Challenges. This notebook shows how to load a dataset into a dataframe and how to save a dataframe as a new dataset.

In [None]:
# It is possible to access Azure workspaces from a local instance of Jupyter.
# However, when running on Azure we can access workspace objects without passing config data.
from azureml import Workspace
ws = Workspace()  # Because we are working on Azure, we don't have to pass any details; Azure already knows who we are.
ds = ws.datasets['iris.csv']  # This is a dataset object. Use whatever name you gave the dataset when uploading it.
frame = ds.to_dataframe()
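
The comments above mention that a workspace can also be reached from a local Jupyter instance. The following is a minimal sketch of that case, assuming the `Workspace` constructor accepts `workspace_id` and `authorization_token` keyword arguments as described in the client library's documentation. The environment variable names and the helper functions `workspace_kwargs` and `connect_locally` are hypothetical and exist only for illustration.

```python
import os


def workspace_kwargs(env=os.environ):
    """Collect the keyword arguments for Workspace() from environment variables.

    The variable names below are hypothetical; use whatever names you prefer.
    """
    return {
        "workspace_id": env["AZUREML_WORKSPACE_ID"],
        "authorization_token": env["AZUREML_AUTH_TOKEN"],
    }


def connect_locally(env=os.environ):
    """Create a Workspace from a local machine (requires the azureml package)."""
    from azureml import Workspace  # imported here so the helper above works without it
    return Workspace(**workspace_kwargs(env))
```

Keeping the credentials in environment variables rather than in the notebook avoids publishing the authorization token when the notebook is shared.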

In [38]:
# Datasets have properties, which are set when the dataset is uploaded.
print(ds.data_type_id)  # the data type, e.g. 'GenericCSV'
print(ds.name)          # the dataset name
print(ds.description)   # the dataset description

GenericCSV
Iris-Dataset
IRIS Dataset from http://archive.ics.uci.edu/ml/datasets/Iris


In [2]:
# Evaluating the dataframe on the last line of a cell displays its contents.
frame

Unnamed: 0,sepalLength,sepalWidth,petalLength,petalWidth,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
5,5.4,3.9,1.7,0.4,setosa
6,4.6,3.4,1.4,0.3,setosa
7,5.0,3.4,1.5,0.2,setosa
8,4.4,2.9,1.4,0.2,setosa
9,4.9,3.1,1.5,0.1,setosa
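
The `Unnamed: 0` column in the output above is the name pandas assigns to a CSV column whose header is empty, which typically happens when a row index was written to the file. A minimal sketch of dropping it is shown below. The inline CSV text is a two-row stand-in for the uploaded file, not the actual Azure dataset.

```python
import io

import pandas as pd

# Stand-in for the uploaded file: the first header field is empty,
# as it would be if a dataframe index had been written to the CSV.
csv_text = (
    ",sepalLength,sepalWidth,petalLength,petalWidth,species\n"
    "0,5.1,3.5,1.4,0.2,setosa\n"
    "1,4.9,3.0,1.4,0.2,setosa\n"
)

raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))  # the first column is named 'Unnamed: 0'

# Drop the redundant index column before analysis or re-upload.
clean = raw.drop(columns=["Unnamed: 0"])
print(list(clean.columns))
```

The same `drop` call could be applied to `frame` before calling `update_from_dataframe`, if the extra column is not wanted in the stored dataset.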


In [35]:
# This is how you would update an existing dataset from a dataframe.
# ds is the dataset object.
ds = ws.datasets['Iris-Dataset']
ds.update_from_dataframe(frame)


In [36]:
#This will also update the description of the dataset.  

ds.update_from_dataframe(
    dataframe=frame,
    description='IRIS Dataset from http://archive.ics.uci.edu/ml/datasets/Iris',
)

In [39]:
# This lists all of the datasets that are available in Azure workspaces by default
# (as well as any that you added), along with their descriptions. Note that some of the
# datasets are of type ARFF, a data format used by WEKA. Avoid these for now.
for ds in ws.datasets:
    print(ds.name, '\t\t\t Type:', ds.data_type_id, '\n', ds.description, '\n')
   

Iris-Dataset 			 Type: GenericCSV 
 IRIS Dataset from http://archive.ics.uci.edu/ml/datasets/Iris 

text.preprocessing.zip 			 Type: Zip 
 Utility R script for text preprocessing to use with text classification template 

fraudTemplateUtil.zip 			 Type: Zip 
 Utility R script to use with online fraud detection template 

Sample Named Entity Recognition Articles 			 Type: GenericTSVNoHeader 
 Sample news articles for use with the Named Entity Recognition module 

Breast cancer data 			 Type: ARFF 
 Breast cancer diagnosis data against features from cell samples 

Forest fires data 			 Type: ARFF 
 Forest fire sizes in northeast Portugal against weather and other data 

Iris Two Class Data 			 Type: ARFF 
 Iris Two Class Data 

Adult Census Income Binary Classification dataset 			 Type: GenericCSV 
 Census Income dataset 

Steel Annealing multi-class dataset 			 Type: GenericCSV 
 Steel annealing data 

Automobile price data (Raw) 			 Type: GenericCSV 
 Missing Value Scrubber module requ
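
The listing above includes the ARFF datasets that the cell's comment suggests avoiding. A minimal sketch of filtering them out follows. The helper `skip_arff` is hypothetical, and the `SimpleNamespace` objects are stand-ins for the dataset objects yielded by `ws.datasets`, using names from the listing.

```python
from types import SimpleNamespace


def skip_arff(datasets):
    """Return only the datasets whose data_type_id is not 'ARFF'."""
    return [d for d in datasets if d.data_type_id != "ARFF"]


# Stand-ins for workspace dataset objects; on Azure you would pass ws.datasets.
sample = [
    SimpleNamespace(name="Iris-Dataset", data_type_id="GenericCSV"),
    SimpleNamespace(name="Breast cancer data", data_type_id="ARFF"),
    SimpleNamespace(name="Automobile price data (Raw)", data_type_id="GenericCSV"),
]

for d in skip_arff(sample):
    print(d.name, "\t Type:", d.data_type_id)
```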

In [49]:
# Explore one of the CSV datasets.
ds2 = ws.datasets['Automobile price data (Raw)']  # This is a dataset object.

print(ds2.data_type_id)  # the data type, e.g. 'GenericCSV'
print(ds2.name)          # the dataset name
print(ds2.description)   # the dataset description
frame2 = ds2.to_dataframe()

GenericCSV
Automobile price data (Raw)
Missing Value Scrubber module required. Prices of various automobiles against make, model and technical specifications


In [50]:
frame2

Unnamed: 0,symboling,normalized-losses,make,fuel-type,aspiration,num-of-doors,body-style,drive-wheels,engine-location,wheel-base,...,engine-size,fuel-system,bore,stroke,compression-ratio,horsepower,peak-rpm,city-mpg,highway-mpg,price
0,3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.6,...,130,mpfi,3.47,2.68,9.00,111,5000,21,27,13495
1,3,?,alfa-romero,gas,std,two,convertible,rwd,front,88.6,...,130,mpfi,3.47,2.68,9.00,111,5000,21,27,16500
2,1,?,alfa-romero,gas,std,two,hatchback,rwd,front,94.5,...,152,mpfi,2.68,3.47,9.00,154,5000,19,26,16500
3,2,164,audi,gas,std,four,sedan,fwd,front,99.8,...,109,mpfi,3.19,3.40,10.00,102,5500,24,30,13950
4,2,164,audi,gas,std,four,sedan,4wd,front,99.4,...,136,mpfi,3.19,3.40,8.00,115,5500,18,22,17450
5,2,?,audi,gas,std,two,sedan,fwd,front,99.8,...,136,mpfi,3.19,3.40,8.50,110,5500,19,25,15250
6,1,158,audi,gas,std,four,sedan,fwd,front,105.8,...,136,mpfi,3.19,3.40,8.50,110,5500,19,25,17710
7,1,?,audi,gas,std,four,wagon,fwd,front,105.8,...,136,mpfi,3.19,3.40,8.50,110,5500,19,25,18920
8,1,158,audi,gas,turbo,four,sedan,fwd,front,105.8,...,131,mpfi,3.13,3.40,8.30,140,5500,17,20,23875
9,0,?,audi,gas,turbo,two,hatchback,4wd,front,99.5,...,131,mpfi,3.13,3.40,7.00,160,5500,16,22,?
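
The rows above use `?` to mark missing values, which is why the dataset description says a missing-value step is required. A minimal sketch of handling this in pandas is shown below, assuming the affected columns were loaded as strings. The small dataframe copies three rows and three columns from the output above and is only a stand-in for `frame2`.

```python
import pandas as pd

# Stand-in for frame2, using values from rows 0, 3 and 9 of the output above.
sample = pd.DataFrame({
    "make": ["alfa-romero", "audi", "audi"],
    "normalized-losses": ["?", "164", "?"],
    "price": ["13495", "13950", "?"],
})

cleaned = sample.copy()
for col in ["normalized-losses", "price"]:
    # errors="coerce" turns anything that is not a number, such as "?", into NaN.
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

# Count the missing values per column.
print(cleaned.isna().sum())
```

Note that `errors="coerce"` converts every unparseable value to `NaN`, not only `?`, so it is worth checking the missing-value counts against what you expect.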


In [54]:
#Save this as a separate dataset. 
from azureml import DataTypeIds
dataset = ws.datasets.add_from_dataframe(
    dataframe=frame2,
    data_type_id=DataTypeIds.GenericCSV,
    name='Automobile-price-data',
    description='This is one of the general datasets provided by Azure.'
)

## Extend your Knowledge

1. [Access datasets with Python using the Azure Machine Learning Python client library (Microsoft Tutorial)](https://azure.microsoft.com/en-us/documentation/articles/machine-learning-python-data-access/)
2. [Azure machine learning Python Library on Github](https://github.com/Azure/Azure-MachineLearning-ClientLibrary-Python)

### CREDITS

Copyright [AnalyticsDojo](http://www.analyticsdojo.com) 2016.
This work is licensed under the [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0/) license agreement.