# Automated ML

Detailed package dependencies are listed in [`env.yml`](envs/env.yml).
Run `conda env create --file envs/env.yml` in your terminal
to reproduce the conda environment used in this notebook.

In [None]:
from azureml.core import Workspace, Experiment, Environment, Datastore, Dataset
import pandas as pd
import os

# Setting up the workspace
ws = Workspace.from_config()

# Registering and building the environment
env = Environment.from_conda_specification(name="az-ml-pers", file_path="envs/env.yml")
env = env.register(workspace=ws)
env_build = env.build(workspace=ws)

# Setup the experiment
experiment_name = 'udacity-capstone'
experiment = Experiment(ws, experiment_name)

# Enable logs
run = experiment.start_logging()

## Dataset

### Overview

The dataset is the output of the [previous notebook](1-data-sourcing.ipynb), where
we sourced the data and did some processing ahead of this task. It
consists of financial data, including OHLCV (open, high, low, close, volume) for diverse instruments (indices,
commodities, interest rates...) and technical indicators (moving averages, RSI, standard deviation...), which we will
use to create an ML-based trading model that gives BUY, HOLD or SELL signals for Bitcoin trading.

If you want to dig into what the dataset looks like or
how the above-mentioned signals are generated, please refer to the "labelling the data" section of
the [data sourcing notebook](1-data-sourcing.ipynb) or to the latest printed view of the DataFrame's head and/or tail
in the same file.

The task we are trying to solve is a **classification problem**.
We want to predict whether the next-day Bitcoin return will fall in the top 25% most positive returns (BUY, 1),
the 25% most negative (SELL, -1), or somewhere in between (HOLD, 0).
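The quantile-based labelling described above can be sketched as follows. This is a minimal illustration on synthetic returns, not the labelling code from the data sourcing notebook: the function name `label_returns`, the thresholds computed over the whole sample, and the random data are all assumptions.

```python
import numpy as np
import pandas as pd


def label_returns(next_day_returns: pd.Series,
                  lower_q: float = 0.25,
                  upper_q: float = 0.75) -> pd.Series:
    """Map next-day returns to SELL (-1), HOLD (0) or BUY (1) by quantile."""
    lower = next_day_returns.quantile(lower_q)
    upper = next_day_returns.quantile(upper_q)
    # Start with HOLD everywhere, then overwrite the two tails
    labels = pd.Series(0, index=next_day_returns.index, dtype="int64")
    labels[next_day_returns >= upper] = 1
    labels[next_day_returns <= lower] = -1
    return labels


# Synthetic returns stand in for the real Bitcoin data (assumption)
rng = np.random.default_rng(seed=0)
returns = pd.Series(rng.normal(loc=0.0, scale=0.03, size=1000))

labels = label_returns(returns)
label_share = labels.value_counts(normalize=True).sort_index()
print(label_share)
```

Note that computing the thresholds over the full sample, as done here for brevity, uses information from the future; a real pipeline would probably fit them on the training period only.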

Since AutoML searches over featurization and normalization procedures on its own, we will feed the joint, unaltered data
into the model. The only change we make is dropping the remaining features and labels that are not needed for the task.

In [None]:
# Access the data and drop unneeded columns for AutoML exercise
df = pd.read_csv('data/df.csv')
drop_col_list = ['', '']
df.drop(columns=drop_col_list, inplace=True)

# Register the dataset
datastore = ws.get_default_datastore()
dataset = Dataset.Tabular.register_pandas_dataframe(df, datastore, "automl_dataset", show_progress=True)
df = dataset.to_pandas_dataframe()

## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.
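As a starting point, the settings for the three-class problem described above might look like the sketch below. All values are illustrative assumptions rather than the configuration actually used in this project. The keys are keyword arguments accepted by `AutoMLConfig`, while the label column name `"signal"` and the `compute_target` variable are hypothetical.

```python
# Illustrative values only; none of them come from the original notebook.
automl_settings = {
    "experiment_timeout_minutes": 30,   # cap on total experiment time
    "max_concurrent_iterations": 4,     # should not exceed the cluster's node count
    "primary_metric": "AUC_weighted",   # one of the classification metrics AutoML accepts
    "n_cross_validations": 5,           # k-fold cross-validation
    "enable_early_stopping": True,      # stop when the score stops improving
    "featurization": "auto",            # let AutoML handle scaling and encoding
}

# With the SDK installed, the settings could be passed on like this
# ("signal" and compute_target are hypothetical names):
# from azureml.train.automl import AutoMLConfig
# automl_config = AutoMLConfig(
#     task="classification",
#     training_data=dataset,
#     label_column_name="signal",
#     compute_target=compute_target,
#     **automl_settings,
# )
```

A weighted metric is one reasonable choice here because the HOLD class is, by construction, about twice as frequent as BUY or SELL.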

In [None]:
from azureml.train.automl import AutoMLConfig

# TODO: Put your automl settings here
automl_settings = {}

# TODO: Put your automl config here
automl_config = AutoMLConfig()

In [None]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [None]:
# TODO: Save the best model

## Model Deployment

Remember that you only have to deploy one of the two models you trained, but you still need to register both. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.
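A request body for such a test could be built as sketched below. The payload schema (a `"data"` key holding a list of records) is an assumption that depends on the scoring script, and the feature names `close` and `rsi_14` are hypothetical. The actual HTTP call is left commented out because it needs a deployed service.

```python
import json

import pandas as pd

# Hypothetical feature rows; the real columns come from the registered dataset
sample = pd.DataFrame({"close": [42000.0, 42500.0], "rsi_14": [55.2, 61.8]})

# Assumed payload schema: a "data" key holding a list of records
payload = json.dumps({"data": sample.to_dict(orient="records")})
headers = {"Content-Type": "application/json"}

# With a deployed service, the request could then be sent like this:
# import requests
# response = requests.post(service.scoring_uri, data=payload, headers=headers)
# print(response.json())
```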

TODO: In the cell below, print the logs of the web service and delete the service.

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a web service.
- I have tested the web service by sending a request to the model endpoint.
- I have deleted the web service and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.
