Throughout this lesson, you've been trying different models on the same two datasets, wine and diabetes. Now we're going to speed up that workflow by using AutoGluon. In this exercise, train two different AutoGluon models and see how they compare to the previous iterations in exercises 1 and 2.

You're tasked with completing the following steps:
1. Load the wine dataset from scikit-learn.
2. For the wine dataset, create a train and test split, 80% train / 20% test.
3. Create an AutoGluon classifier model with these hyperparameters:
    1. time_limit: 120
    2. presets: best_quality
4. Output the model table summary.
5. Evaluate the trained model on the test dataset.
6. Load the diabetes dataset from scikit-learn.
7. For the diabetes dataset, create a train and test split, 80% train / 20% test.
8. Create an AutoGluon regression model with these hyperparameters:
    1. eval_metric: r2
    2. time_limit: 120
    3. presets: best_quality
9. Output the model table summary.
10. Evaluate the trained model on the test dataset.

## Setup

### Open up SageMaker Studio

1. The notebook should be using an `ml.t3.medium` instance (2 vCPU + 4 GiB).
2. The notebook should be using the kernel `Python 3 (MXNet 1.8 Python 3.7 CPU Optimized)`.

In [4]:
!pip install -U pip



In [5]:
!pip install -U setuptools wheel



In [6]:
!pip install "mxnet<2.0.0" bokeh==2.0.1

Collecting mxnet<2.0.0
  Using cached mxnet-1.7.0.post2-py2.py3-none-win_amd64.whl (33.1 MB)
Collecting bokeh==2.0.1
  Using cached bokeh-2.0.1-py3-none-any.whl
Collecting numpy>=1.11.3
  Using cached numpy-1.16.6.zip (5.1 MB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Collecting graphviz<0.9.0,>=0.8.1
  Using cached graphviz-0.8.4-py2.py3-none-any.whl (16 kB)
Building wheels for collected packages: numpy
  Building wheel for numpy (setup.py): started
  Building wheel for numpy (setup.py): finished with status 'error'
  Running setup.py clean for numpy
Failed to build numpy
Installing collected packages: numpy, graphviz, mxnet, bokeh
  Attempting uninstall: numpy
    Found existing installation: numpy 1.24.3
    Uninstalling numpy-1.24.3:
      Successfully uninstalled numpy-1.24.3
  Running setup.py install for numpy: started
  Running setup.py install for numpy: finished with status 'error'
  Rolling back uninstall of numpy
 

  error: subprocess-exited-with-error
  
  python setup.py bdist_wheel did not run successfully.
  exit code: 1
  
  [264 lines of output]
  Running from numpy source directory.
    return is_string(s) and ('*' in s or '?' is s)
  blas_opt_info:
  blas_mkl_info:
  No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
  customize MSVCCompiler
    libraries mkl_rt not found in ['C:\\Users\\HP\\anaconda3\\lib', 'C:\\', 'C:\\Users\\HP\\anaconda3\\libs']
    NOT AVAILABLE
  
  blis_info:
  No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
  customize MSVCCompiler
    libraries blis not found in ['C:\\Users\\HP\\anaconda3\\lib', 'C:\\', 'C:\\Users\\HP\\anaconda3\\libs']
    NOT AVAILABLE
  
  openblas_info:
  No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
  customize MSVCCompiler
  No module named 'numpy.distutils._msvccompiler' in numpy.distutils; trying from distutils
  cu

In [None]:
!pip install autogluon --no-cache-dir
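
Once the install cells have finished, it can help to confirm that the packages this notebook imports are actually visible to the kernel before going further. The following is a minimal sketch using only the standard library; the `REQUIRED` list and the `missing_packages` helper are illustrative names introduced here, not part of AutoGluon.

```python
import importlib.util
import sys

# Top-level packages imported later in this notebook (assumed list; adjust as needed).
REQUIRED = ["numpy", "pandas", "sklearn", "autogluon"]


def missing_packages(names):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
print(f"Python {sys.version_info.major}.{sys.version_info.minor}")
print("Missing packages:", missing if missing else "none")
```

If anything is reported as missing, rerun the relevant install cell and restart the kernel before importing.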

In [None]:
import numpy as np
import pandas as pd
from sklearn import datasets
from sklearn.metrics import r2_score, accuracy_score
from sklearn.model_selection import train_test_split
from autogluon.tabular import TabularDataset, TabularPredictor

## AutoGluon Classifier

In [None]:
# Load in the wine dataset
wine = datasets.load_wine()

In [None]:
# Create the wine `data` dataset as a dataframe and name the columns with `feature_names`
df = pd.DataFrame(wine.data, columns=wine.feature_names)

# Include the target as well
df['target'] = wine.target

In [None]:
# Split your data with these ratios: train: 0.8 | test: 0.2
df_train, df_test = train_test_split(
    df,
    test_size=0.2,
    random_state=0
)
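
As a quick sanity check on the split, the sketch below rebuilds the wine dataframe and confirms the row counts. The wine dataset has 178 rows, so an 80/20 split should yield 142 training rows and 36 test rows. The class-share printout is an extra check added here, not part of the exercise requirements.

```python
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split

# Rebuild the dataframe exactly as in the cells above.
wine = datasets.load_wine()
df = pd.DataFrame(wine.data, columns=wine.feature_names)
df["target"] = wine.target

df_train, df_test = train_test_split(df, test_size=0.2, random_state=0)

# Row counts for each split (13 feature columns plus the target column).
print("train:", df_train.shape, "test:", df_test.shape)

# Share of each class in the two splits; large differences would suggest
# trying the `stratify` argument of `train_test_split`.
print(df_train["target"].value_counts(normalize=True).sort_index())
print(df_test["target"].value_counts(normalize=True).sort_index())
```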

In [None]:
# How does the model perform on the training dataset and default model parameters?
# Using the hyperparameters in the requirements, is there improvement?
# Remember we use the test dataset to score the model
# No need to explicitly say this is a classifier, autogluon will pick it up
# `label` names the target column; `time_limit` and `presets` are arguments to `fit`
predictor = TabularPredictor(label="target").fit(
    train_data=df_train,
    time_limit=120,
    presets="best_quality",
)

In [None]:
# Output the fit summary of the training run
predictor.fit_summary()

In [None]:
# Evaluate the model's performance on the test dataset
performance = predictor.evaluate(df_test)
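
The notebook imports `accuracy_score` from scikit-learn, which can be used to cross-check a classification score by hand. The sketch below uses a tiny hand-made example rather than real predictions, so the labels are illustrative only; `manual_accuracy` is a helper name introduced here.

```python
from sklearn.metrics import accuracy_score


def manual_accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Toy labels: three of the four predictions match.
y_true = [0, 1, 2, 2]
y_pred = [0, 1, 1, 2]

acc = accuracy_score(y_true, y_pred)
print(acc, manual_accuracy(y_true, y_pred))

# With a trained predictor, the same call would take the real labels and
# predictions, for example (assuming the label column is named "target"):
#   y_pred = predictor.predict(df_test.drop(columns=["target"]))
#   accuracy_score(df_test["target"], y_pred)
```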

## AutoGluon Regression

In [None]:
# Load in the diabetes dataset
diabetes = datasets.load_diabetes()

In [None]:
# Create the diabetes `data` dataset as a dataframe and name the columns with `feature_names`
dfd = pd.DataFrame(diabetes.data, columns=diabetes['feature_names'])

# Include the target as well
dfd['target'] = diabetes.target

In [None]:
# Split your data with these ratios: train: 0.8 | test: 0.2
dfd_train, dfd_test = train_test_split(
    dfd,
    test_size=0.2,
    random_state=0
)

In [None]:
# How does the model perform on the training dataset and default model parameters?
# Using the hyperparameters in the requirements, is there improvement?
# Remember we use the test dataset to score the model
# No need to explicitly say this is a regression, autogluon will pick it up
# `eval_metric` is set on the predictor; `time_limit` and `presets` are arguments to `fit`
predictor = TabularPredictor(label="target", eval_metric="r2").fit(
    train_data=dfd_train,
    time_limit=120,
    presets="best_quality",
)

In [None]:
# Output the fit summary of the training run
predictor.fit_summary()

In [None]:
# Evaluate the model's performance on the test dataset
performance = predictor.evaluate(dfd_test)
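
Since the regression model is scored with R², it may help to see how that number is computed. The sketch below implements the usual definition, R² = 1 - SS_res / SS_tot, and compares it with scikit-learn's `r2_score` on a small made-up example; the values are illustrative and are not diabetes predictions, and `manual_r2` is a helper name introduced here.

```python
import numpy as np
from sklearn.metrics import r2_score


def manual_r2(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # error left after prediction
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # error of always predicting the mean
    return 1.0 - ss_res / ss_tot


# Toy targets and predictions.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]

r2 = r2_score(y_true, y_pred)
print(round(r2, 4), round(manual_r2(y_true, y_pred), 4))

# With a trained predictor, the same cross-check would look like
# (assuming the label column is named "target"):
#   y_pred = predictor.predict(dfd_test.drop(columns=["target"]))
#   r2_score(dfd_test["target"], y_pred)
```

An R² of 1 means perfect predictions, and 0 means the model does no better than always predicting the mean of the targets.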