<img width="10%" alt="Naas" src="https://landen.imgix.net/jtci2pxwjczr/assets/5ice39g4.png?w=160"/>

# PyCaret - AutoML regression

**Tags:** #automl #pandas #snippet #regression #dataframe #visualize #pycaret #operations

**Author:** [Minura Punchihewa](https://www.linkedin.com/in/minurapunchihewa/)

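**Description:** This notebook demonstrates how to use PyCaret's automated machine learning (AutoML) workflow for regression: set up a dataset, compare the supported models with cross-validation, evaluate and finalize the best one, then save, reload and package it.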

## Input

### Import libraries

In [2]:
import pandas as pd

try:
    from pycaret.regression import (
        setup, compare_models, evaluate_model, predict_model, finalize_model,
        save_model, load_model, create_docker,
    )
except ImportError:
    # PyCaret is not installed yet: install it, then retry the import
    !pip install --user pycaret
    from pycaret.regression import (
        setup, compare_models, evaluate_model, predict_model, finalize_model,
        save_model, load_model, create_docker,
    )

### Variables

In [3]:
csv_path = "https://raw.githubusercontent.com/MinuraPunchihewa/pycaret-automl/main/data/wine-quality.csv"
target_column = "quality"

## Model

### Read the CSV from path

In [4]:
df = pd.read_csv(csv_path)

### View a sample of the data

In [5]:
df.head()

### Set up the dataset

In [6]:
# setup() must be called before any other PyCaret function
# change target_column in the Variables section as required
# many preprocessing transformations can be configured through setup() parameters
# by default, missing value imputation, one-hot encoding and a train-test split are performed
# if prompted to confirm the inferred data types (older PyCaret versions), press Enter to continue
grid = setup(data=df, target=target_column)
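
The train-test split mentioned above can be pictured with a small standard-library sketch. It assumes a shuffled 70/30 split, which is PyCaret's documented default at the time of writing; `holdout_split` is a hypothetical helper for illustration, not a PyCaret function.

```python
import random


def holdout_split(rows, train_size=0.7, seed=42):
    """Shuffle the rows and split them into a training part and a hold-out part."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(rows)       # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_size)
    return shuffled[:cut], shuffled[cut:]


# 100 stand-in row indices instead of real wine samples
train_rows, test_rows = holdout_split(range(100))
print(len(train_rows), len(test_rows))
```

Models are trained and cross-validated on the training part only; the hold-out part stays unseen until evaluation.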

### Train and compare all supported models

In [7]:
# every supported model is scored with cross-validation and the top-ranked one is returned
best_model = compare_models()
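
To illustrate what cross-validation means here, the sketch below generates plain k-fold index splits with the standard library. It is not PyCaret's implementation, and `k_fold_indices` is a hypothetical helper; PyCaret's own default is understood to be 10 folds, with regression models ranked by R2.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs; each sample is used for validation exactly once."""
    # Spread the remainder so that fold sizes differ by at most one
    fold_sizes = [
        n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        for i in range(n_folds)
    ]
    start = 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size


folds = list(k_fold_indices(n_samples=10, n_folds=5))
for train_idx, valid_idx in folds:
    print("train:", train_idx, "validate:", valid_idx)
```

Each candidate model is fitted once per fold and its metrics are averaged, which gives a steadier ranking than a single split would.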

### Report the best model

In [8]:
print(best_model)

### Evaluate the model using a number of different plots

In [9]:
# click on the different plot types to explore
# some plots may not be available, depending on the data and the model
evaluate_model(best_model)

### Make predictions on new data

In [10]:
# new_data should be a DataFrame with the same feature columns as the training data and no target column
# predict_model(best_model, data=new_data)
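
As a rough sketch of preparing such a frame, the snippet below drops the label from a tiny in-memory DataFrame. The rows and the feature names `alcohol` and `pH` are illustrative assumptions, not values taken from the wine-quality CSV.

```python
import pandas as pd

# Illustrative rows only; the feature column names are assumptions
labelled = pd.DataFrame(
    {"alcohol": [9.4, 9.8], "pH": [3.51, 3.20], "quality": [5, 6]}
)
target_column = "quality"

# Drop the label so that the frame looks like unseen data
new_data = labelled.drop(columns=[target_column])
print(new_data)

# With a fitted model in scope, predictions would then come from:
# predictions = predict_model(best_model, data=new_data)
```

The returned frame should contain the input columns plus a prediction column, whose name differs between PyCaret versions.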

### Finalize model

In [11]:
# trains the model on the entire dataset including the hold-out set
# does not change any parameter of the model
final_model = finalize_model(best_model)

## Output

### Save model as a pickle file

In [12]:
save_model(final_model, "regression_model")

### Load saved model from pickle file

In [13]:
model = load_model("regression_model")
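
`save_model` appends `.pkl` to the name it is given, and `load_model` expects the same name without the extension. The round trip can be sketched with the standard library alone; the dictionary below is a hypothetical stand-in for a fitted pipeline, and plain `pickle` is used for illustration where PyCaret may rely on a different serializer internally.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for a fitted pipeline; any picklable object works
stand_in_model = {"name": "regression_model", "coefficients": [0.5, -1.2]}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "regression_model.pkl"

    # Serialize the object to disk
    with model_path.open("wb") as f:
        pickle.dump(stand_in_model, f)

    # Read it back into a new object
    with model_path.open("rb") as f:
        restored = pickle.load(f)

print(restored == stand_in_model)
```

Pickle files can execute arbitrary code when loaded, so only load model files from sources you trust.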

### Create Dockerfile for model

In [14]:
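# writes a Dockerfile and a requirements.txt file for dependencies
# the argument is treated as an API name: the Dockerfile is expected to run regression_model.py,
# a script that PyCaret's create_api would normally generate (verify this for your PyCaret version)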
create_docker("regression_model")