# AutoGluon - Hands On

In [2]:
# Load coding libraries
from sklearn.model_selection import train_test_split
import pandas as pd

## Objective
This hands-on notebook is meant to let you practice the concepts you have learned in this course so far.
Here we explore a large dataset of books spanning different genres and thousands of authors.<br/>

We want to predict book prices from book features such as genre, release date, ratings, and number of reviews.
This is a regression problem: we have a book price column in our dataset that we can use as labels.

## 1. <a name="5">AutoGluon Installation</a>

We need to begin by installing AutoGluon (documentation [here](https://auto.gluon.ai/stable/install.html)).  


__NOTE__: Installation may take a few minutes. It has finished once the `[*]` symbol next to the cell turns into a number.

In [3]:
#!python3 -m pip install -qU pip
#!python3 -m pip install -qU setuptools wheel
#!python3 -m pip install -qU "mxnet<2.0.0"
#!python3 -m pip install -qU autogluon
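
Once the install cell has run, it can help to confirm that the package is visible to the current kernel. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this illustration and is not part of AutoGluon.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "autogluon" is the distribution name used in the pip command above.
version = installed_version("autogluon")
if version is None:
    print("autogluon is not installed in this kernel")
else:
    print(f"autogluon version: {version}")
```

If the message says the package is missing, uncomment the install commands above and rerun that cell before continuing.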

Now we load the libraries needed to work with our Tabular dataset.

In [4]:
# Import the AutoGluon code library
from autogluon.tabular import TabularPredictor, TabularDataset

## 2. <a name="5">Business Problem Summary</a>

Let's output an overview of our book price predicting __business problem__. <br/>

<h2>Price Prediction for Books</h2>
Predicting a book's price from its listing attributes is a supervised regression task: the target, Price, is a continuous value.

<h2>Books Reviews Data Dictionary</h2>
This Book Dataset involves predicting the price of books based on a given set of features.

It is a regression problem. The variable names are as follows:

<h3>FEATURES:</h3>
<ul>
<li>Title: The title of the book.
<li>Author: The author(s) of the book.
<li>Edition: The edition of the book, e.g. (Paperback,– Import, 26 Apr 2018).
<li>Reviews: The average customer rating, stored as text (e.g. "4.3 out of 5 stars"), despite the column name.
<li>Ratings: The number of customer reviews, stored as text (e.g. "5 customer reviews"), despite the column name.
<li>Synopsis: The synopsis of the book.
<li>Genre: The genre the book belongs to.
<li>BookCategory: The department the book is usually available at.
</ul>
<h3>The Output variable or TARGET is:</h3>
<ul>
<li>Price: The price of the book
</ul>
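
Several of these fields hold numbers inside text. In the sample rows of this notebook, Reviews looks like `4.3 out of 5 stars` and Ratings looks like `5 customer reviews`. The sketch below shows one way to pull the numeric part out for exploration. The helper names are hypothetical, and the patterns assume every value follows the format seen in those sample rows.

```python
import re
from typing import Optional


def parse_star_rating(text: str) -> Optional[float]:
    """Extract the leading number from text like '4.3 out of 5 stars'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of", text)
    return float(match.group(1)) if match else None


def parse_review_count(text: str) -> Optional[int]:
    """Extract the leading count from text like '5 customer reviews'."""
    match = re.match(r"\s*(\d[\d,]*)\s+customer review", text)
    return int(match.group(1).replace(",", "")) if match else None


print(parse_star_rating("4.3 out of 5 stars"))   # → 4.3
print(parse_review_count("5 customer reviews"))  # → 5
print(parse_star_rating("no rating yet"))        # → None
```

With a pandas DataFrame, the same helpers can be applied column-wise, for example `df_train["Reviews"].map(parse_star_rating)`. Values that do not match the assumed format come back as `None`, which makes unexpected entries easy to spot.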

## 3. <a name="5">Getting the Data</a>

Let's get the data for our business problem.

>  Run the cell below to load and take a look at the first samples of our train dataset. <br/>
Compare it with your data dictionary to see if everything is there and if the data makes sense. This is a very basic check when performing __Data Exploration__.

In [7]:
df_train = TabularDataset(data="./datasets/training.csv")
df_train.info()
df_train.head()

<class 'autogluon.core.dataset.TabularDataset'>
RangeIndex: 5051 entries, 0 to 5050
Data columns (total 10 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   ID            5051 non-null   int64  
 1   Title         5051 non-null   object 
 2   Author        5051 non-null   object 
 3   Edition       5051 non-null   object 
 4   Reviews       5051 non-null   object 
 5   Ratings       5051 non-null   object 
 6   Synopsis      5051 non-null   object 
 7   Genre         5051 non-null   object 
 8   BookCategory  5051 non-null   object 
 9   Price         5051 non-null   float64
dtypes: float64(1), int64(1), object(8)
memory usage: 394.7+ KB


| | ID | Title | Author | Edition | Reviews | Ratings | Synopsis | Genre | BookCategory | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 542 | Foe (Penguin Essentials) | J. M. Coetzee | Paperback,– 21 Sep 2010 | 5.0 out of 5 stars | 2 customer reviews | Nobel Laureate and two-time Booker prize-winni... | Action & Adventure (Books) | Action & Adventure | 2.52763 |
| 1 | 2380 | Of Blood and Bone (Chronicles of The One) | Nora Roberts | Paperback,– 25 Jan 2019 | 4.3 out of 5 stars | 5 customer reviews | Thirteen years ago, a catastrophic pandemic kn... | Action & Adventure (Books) | Romance | 2.555094 |
| 2 | 5529 | Then She Was Gone | Lisa Jewell | Paperback,– Import, 14 Dec 2017 | 4.0 out of 5 stars | 9 customer reviews | BESTSELLING PSYCHOLOGICAL SUSPENSE, AND A TOP ... | Action & Adventure (Books) | Crime, Thriller & Mystery | 2.531479 |
| 3 | 4511 | Mongodb: The Definitive Guide- Powerful and Sc... | Kristina Chodorow | Paperback,– 2013 | 4.7 out of 5 stars | 11 customer reviews | Manage the huMONGOus amount of data collected ... | Computer Databases (Books) | Computing, Internet & Digital Media | 2.845718 |
| 4 | 1305 | Jerusalem: The Biography | Simon Sebag Montefiore | Paperback,– 1 Mar 2012 | 4.6 out of 5 stars | 18 customer reviews | The epic story of Jerusalem told through the l... | History of Civilization & Culture | Biographies, Diaries & True Accounts | 2.733197 |
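
The comparison against the data dictionary can also be done programmatically. The following is a small sketch, not part of the original notebook: `compare_columns` is a hypothetical helper, and the column list is copied from the `info()` output above so that the snippet runs on its own.

```python
from typing import Iterable, List, Tuple

# Columns named in the data dictionary (features plus the Price target).
DATA_DICTIONARY = [
    "Title", "Author", "Edition", "Reviews", "Ratings",
    "Synopsis", "Genre", "BookCategory", "Price",
]


def compare_columns(actual: Iterable[str], expected: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (missing, extra): expected names absent from the data, and unexpected names present."""
    actual_list = list(actual)
    expected_list = list(expected)
    missing = [name for name in expected_list if name not in actual_list]
    extra = [name for name in actual_list if name not in expected_list]
    return missing, extra


# Column names as reported by df_train.info() above.
columns = [
    "ID", "Title", "Author", "Edition", "Reviews", "Ratings",
    "Synopsis", "Genre", "BookCategory", "Price",
]
missing, extra = compare_columns(columns, DATA_DICTIONARY)
print("missing:", missing)  # → missing: []
print("extra:", extra)      # → extra: ['ID']
```

In the notebook itself, `df_train.columns` can be passed in place of the hard-coded list. The only column outside the data dictionary is `ID`, a row identifier.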
