# 4 - Data Analysis & Reporting with IDEAR

In this notebook you will use the IDEAR (Interactive Data Exploratory Analysis and Reporting) tool, which is included in the [TDSP utilities](https://github.com/Azure/Azure-TDSP-Utilities/) that we reviewed in a previous module. IDEAR runs inside a Jupyter notebook that we will download and run.

## Downloading IDEAR Assets

To try out IDEAR, we need to download two sets of assets: the sample data and the IDEAR notebook itself.

### Downloading Sample Data

First, we need some sample data. Our MNIST data set isn't suited to this kind of analysis and reporting, so we will use a different one. Microsoft includes a few sample data sets in the [TDSP Utilities Git repository on GitHub](https://github.com/Azure/Azure-TDSP-Utilities/).

The data set we will use is the [UCI Census Income data set](https://archive.ics.uci.edu/ml/datasets/Census+Income). We will download it, along with its YAML description file, onto the Jupyter server and then upload both to the workspace's blob storage account:

In [None]:
import os  # needed for the path handling below
import urllib.request
import azureml.core
from azureml.core import Workspace, Datastore

# Get a reference to the workspace
ws = Workspace.from_config()

# Get a reference to the Datastore - blob storage for the workspace
datastore = Datastore.get(ws, datastore_name='workspaceblobstore')

# Create a new data directory for this data
data_folder = os.path.join(os.getcwd(), 'idear-data')
os.makedirs(data_folder, exist_ok=True)

# Download this content onto the Jupyter notebook server
data_url = 'https://raw.githubusercontent.com/Azure/Azure-TDSP-Utilities/master/DataScienceUtilities/DataReport-Utils/Python/adult-income.csv'
urllib.request.urlretrieve(data_url, f'{data_folder}/adult-income.csv')
data_description_url = 'https://raw.githubusercontent.com/Azure/Azure-TDSP-Utilities/master/DataScienceUtilities/DataReport-Utils/Python/para-adult.yaml'
urllib.request.urlretrieve(data_description_url, f'{data_folder}/para-adult.yaml')

# Upload this into blob storage
datastore.upload(src_dir=data_folder,
                 overwrite=True,
                 show_progress=True)
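
Before moving on, it can be worth confirming that the CSV arrived intact. The following is a minimal sketch using only the standard library; `preview_csv` is a hypothetical helper (not part of IDEAR or the Azure ML SDK), and it simply reports the first row of the file without assuming whether that row is a header.

```python
import csv
import os
from itertools import islice


def preview_csv(path, n_rows=5):
    """Return the first row and up to n_rows of the rows that follow it."""
    with open(path, newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        first_row = next(reader, [])         # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # read only a handful of rows
    return first_row, rows


# Path written by the download cell above
csv_path = os.path.join(os.getcwd(), 'idear-data', 'adult-income.csv')

if os.path.exists(csv_path):
    first_row, rows = preview_csv(csv_path)
    print(f'{len(first_row)} columns; first row: {first_row}')
    for row in rows:
        print(row)
else:
    print(f'{csv_path} not found; run the download cell first')
```

If the first row prints as an HTML fragment or the column count is 1, the download most likely returned an error page rather than the data file.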

### Downloading IDEAR Notebook

Next, we need to download the IDEAR notebook, which is provided in the TDSP Utilities. This notebook contains the code needed to run the analysis against your data set. Two versions of the notebook are included: one that is [designed to be run within Azure notebooks](https://github.com/Azure/Azure-TDSP-Utilities/blob/master/DataScienceUtilities/DataReport-Utils/Python/IDEAR-Python-AzureNotebooks.ipynb), and [one that can be used anywhere](https://github.com/Azure/Azure-TDSP-Utilities/blob/master/DataScienceUtilities/DataReport-Utils/Python/IDEAR.ipynb) (but which requires additional setup).

We will use the version designed for Azure Notebooks. The code below downloads it to your Jupyter server and saves it as `IDEAR.ipynb`:

In [None]:
idear_url = 'https://raw.githubusercontent.com/Azure/Azure-TDSP-Utilities/master/DataScienceUtilities/DataReport-Utils/Python/IDEAR-Python-AzureNotebooks.ipynb'
urllib.request.urlretrieve(idear_url, './IDEAR.ipynb')
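
As an optional sanity check, the downloaded file can be parsed as JSON to confirm that it is a usable notebook. This is a sketch, assuming the file follows the nbformat 4 layout with a top-level `cells` list; `summarize_notebook` is a hypothetical helper introduced here for illustration.

```python
import json
import os
from collections import Counter


def summarize_notebook(path):
    """Parse a .ipynb file and count its cells by type."""
    with open(path, encoding='utf-8') as f:
        nb = json.load(f)  # raises json.JSONDecodeError if the file is not JSON
    cells = nb.get('cells', [])  # assumption: nbformat 4 keeps cells at the top level
    return {
        'nbformat': nb.get('nbformat'),
        'cell_counts': dict(Counter(c.get('cell_type') for c in cells)),
    }


if os.path.exists('./IDEAR.ipynb'):
    print(summarize_notebook('./IDEAR.ipynb'))
else:
    print('IDEAR.ipynb not found; run the download cell first')
```

An empty `cell_counts` dictionary would suggest that the file uses an older notebook format or that the download did not return the notebook itself.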


## Running the IDEAR Notebook

The next step is to switch over to the `IDEAR.ipynb` notebook that we downloaded. Once you load the notebook, clear all output and run the cells. You will need a few pieces of information:

* Azure Storage Account Name
* Azure Storage Account Key
* Container Name
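
These three values can usually be read from the datastore object retrieved earlier rather than looked up in the Azure portal. The sketch below assumes that the blob datastore returned by `Datastore.get` exposes `account_name`, `account_key`, and `container_name` attributes; `blob_connection_info` is a hypothetical helper, and it masks the key by default so that it does not end up in saved notebook output.

```python
def blob_connection_info(datastore, show_key=False):
    """Collect the storage details IDEAR asks for from a blob datastore object.

    Assumes the object exposes account_name, account_key and container_name;
    any attribute that is missing is returned as None.
    """
    key = getattr(datastore, 'account_key', None)
    if key and not show_key:
        key = '*' * 8  # mask the secret unless explicitly requested
    return {
        'account_name': getattr(datastore, 'account_name', None),
        'account_key': key,
        'container_name': getattr(datastore, 'container_name', None),
    }


# Usage in this notebook (requires the `ws` workspace object from the first cell):
# datastore = Datastore.get(ws, datastore_name='workspaceblobstore')
# print(blob_connection_info(datastore))
# print(blob_connection_info(datastore, show_key=True))  # only when you need to copy the key
```

Treat the account key as a secret: clear the cell output after copying it into the IDEAR notebook.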