Hi,
I want to take this opportunity to thank every wonderful person out there who shares their work publicly to help beginners learn.
While the world of Kaggle can seem daunting, breaking it down into baby steps helps.
One such step is building a baseline model.
Simply put, a baseline model is the MVP (minimum viable product) of the data science world.
It is the quickest, dirtiest model you could make.
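
To make "quick and dirty" concrete, here is a minimal sketch of about the simplest baseline there is: always predict the most frequent class. The labels are made up for illustration and are not taken from the TPS data, and the helper name `majority_class_baseline` is hypothetical.

```python
from collections import Counter


def majority_class_baseline(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    accuracy = majority_count / len(labels)
    return majority_label, accuracy


# Toy labels, invented for illustration only
baseline_toy_labels = ["Class_2", "Class_6", "Class_6", "Class_8", "Class_6", "Class_2"]
baseline_label, baseline_accuracy = majority_class_baseline(baseline_toy_labels)
print(baseline_label, baseline_accuracy)
```

Any model worth keeping should beat this number; if it does not, something is off in the features or the pipeline.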

Using AutoML lets us experiment (fail often and fail fast), so I highly recommend that beginners start with it.

Let's try doing this together with the friendly and approachable TPS June 2021 dataset. I have a similar notebook up for [TPS May2021](https://www.kaggle.com/kritidoneria/automl-evalml-tps-may21-starter) as well, along with notebooks for a few other AutoML libraries.
Do leave comments and feedback.

Thanks for reading. Read more about EvalML [here](https://evalml.alteryx.com/en/stable/user_guide/automl.html).

# Installation and imports

In [None]:
!pip install evalml

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import evalml
from evalml import AutoMLSearch

# Loading the dataset

In [None]:
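# Training data: feature columns plus 'id' and 'target'
X = pd.read_csv('../input/tabular-playground-series-jun-2021/train.csv')
# Competition test data (no 'target' column). Note: despite its name, `y` is not the label vector.
y = pd.read_csv('../input/tabular-playground-series-jun-2021/test.csv')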

# Splitting

In [None]:
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(
    X.drop(columns=['target', 'id']), X['target'], problem_type='multiclass'
)
X_train.shape, X_test.shape, y_train.shape, y_test.shape
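
For intuition about what a stratified hold-out split does for a multiclass problem, here is a small pure-Python sketch. It is not EvalML's implementation: the function name `stratified_split`, the 20% test fraction, and the toy labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split row indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(seed)

    # Group row indices by their class label
    indices_by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        indices_by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels, invented for illustration: 10 rows of class "a", 5 rows of class "b"
split_toy_labels = ["a"] * 10 + ["b"] * 5
toy_train_idx, toy_test_idx = stratified_split(split_toy_labels)
print(len(toy_train_idx), len(toy_test_idx))
```

Because the split is done per class, a rare class cannot end up entirely in one part by bad luck, which keeps the hold-out score more trustworthy.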

# Run the search for the best classification model

I have a discussion thread for understanding the evaluation metric better [here](https://www.kaggle.com/c/tabular-playground-series-jun-2021/discussion/243636).
I'll set ensembling=True for magic :D
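
Since this competition is scored with multi-class log loss, it helps to see how that number is computed. Below is a minimal sketch in plain Python. The function name `multiclass_log_loss`, the clipping constant `eps`, and the toy probabilities are my own assumptions for illustration, not the competition's or EvalML's code.

```python
import math


def multiclass_log_loss(true_class_indices, predicted_probs, eps=1e-15):
    """Average negative log of the probability assigned to the true class."""
    total = 0.0
    for true_idx, probs in zip(true_class_indices, predicted_probs):
        # Clip to avoid log(0) when a model is confidently wrong
        p = min(max(probs[true_idx], eps), 1 - eps)
        total += -math.log(p)
    return total / len(true_class_indices)


# Toy example with 3 classes, invented for illustration
metric_toy_true = [0, 2, 1]
confident_probs = [[0.8, 0.1, 0.1], [0.1, 0.1, 0.8], [0.1, 0.8, 0.1]]
uniform_probs = [[1 / 3, 1 / 3, 1 / 3]] * 3

confident_loss = multiclass_log_loss(metric_toy_true, confident_probs)
uniform_loss = multiclass_log_loss(metric_toy_true, uniform_probs)
print(confident_loss, uniform_loss)
```

Lower is better: the confident and correct predictions score well below the uniform guess, whose loss equals the natural log of the number of classes.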

In [None]:
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type='multiclass',
                      ensembling=True, max_batches=100)
automl.search()

# Model ranking and Best pipeline

In [None]:
automl.rankings

In [None]:
automl.describe_pipeline(automl.rankings.iloc[0]["id"])

Now, this tells us a lot, including the improvement over the baseline. Isn't that cool?

# Feature Importance
Getting the feature importance for the best pipeline found by the search

In [None]:
automl.best_pipeline.graph_feature_importance()

# Explaining Best and Worst Predictions
The `explain_predictions_best_worst` function displays the output of `explain_predictions` for the best 2 and worst 2 predictions. By default, the best and worst predictions are determined by the absolute error for regression problems and cross entropy for classification problems.
It uses SHAP values. I have a notebook on SHAP for XAI [here](https://www.kaggle.com/kritidoneria/responsible-ai-model-explainability)
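
To see how "best" and "worst" can be picked for a classifier, here is a rough sketch that ranks rows by their per-row cross entropy. It only illustrates the idea and is not EvalML's internal code; `rank_by_cross_entropy` and the toy numbers are hypothetical.

```python
import math


def rank_by_cross_entropy(true_class_indices, predicted_probs, num_to_explain=2):
    """Return row positions of the lowest-loss and highest-loss predictions."""
    losses = []
    for row, (true_idx, probs) in enumerate(zip(true_class_indices, predicted_probs)):
        # Cross entropy for one row: -log of the probability given to the true class
        loss = -math.log(max(probs[true_idx], 1e-15))
        losses.append((loss, row))
    losses.sort()
    best_rows = [row for _, row in losses[:num_to_explain]]
    worst_rows = [row for _, row in losses[-num_to_explain:][::-1]]
    return best_rows, worst_rows


# Toy binary example, invented for illustration
explain_toy_true = [0, 1, 1, 0, 1]
explain_toy_probs = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.6, 0.4], [0.95, 0.05]]
best_rows, worst_rows = rank_by_cross_entropy(explain_toy_true, explain_toy_probs)
print(best_rows, worst_rows)
```

The worst rows are the ones where the model put very little probability on the true class, which is exactly where an explanation is most informative.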

In [None]:
from evalml.model_understanding.prediction_explanations import explain_predictions_best_worst

report = explain_predictions_best_worst(pipeline=automl.best_pipeline, input_features=X_test, y_true=y_test,
                                        include_shap_values=True, top_k_features=6, num_to_explain=2)

print(report)

# Making predictions

In [None]:
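winner = automl.best_pipeline
# `y` holds the competition test set loaded earlier; drop 'id' so the columns match the training features
df_submission = winner.predict_proba(y.drop(columns=['id']))
# Attach the row ids and write them as the index, followed by one probability column per class
df_submission['id'] = y['id']
df_submission.set_index('id').to_csv('submission.csv')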

# Saving the entire automl search

In [None]:
automl.save("automl.cloudpickle")