## Notebook - Table of Contents


1. [**Basic**](#1.-Basic)  
    1.1 [**Importing the necessary libraries & installing EvalML**](#1.1-Importing-the-necessary-libraries-&-installing-EvalML)    
    1.2 [**Loading the income data and basic analysis**](#1.2-Loading-the-income-data-and-basic-analysis)         
2. [**Splitting into train-test set**](#2.-Splitting-into-train-test-set) 
3. [**Finding best ML model pipeline using EvalML**](#3.-Finding-best-ML-model-pipeline-using-EvalML)  
4. [**Interacting with pipelines**](#4.-Interacting-with-pipelines)                         
5. [**Evaluating the best pipeline on test data**](#5.-Evaluating-the-best-pipeline-on-test-data)  

### 1. Basic

#### 1.1 Importing the necessary libraries & installing EvalML

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
# Install EvalML (the PyPI package name is lowercase)
!pip install evalml

In [None]:
# List the problem types EvalML supports
import evalml
evalml.problem_types.ProblemTypes.all_problem_types

In [None]:
# List the names of all objectives EvalML can optimize or score
evalml.objectives.get_all_objective_names()

#### 1.2 Loading the income data and basic analysis

In [None]:
# Note: column names in this CSV carry a leading space (e.g. " income")
df = pd.read_csv("/kaggle/input/income-classification/income_evaluation.csv")
df.head()

In [None]:
df.shape

In [None]:
df[" income"].value_counts()

In [None]:
# Share of each income class, shown as a pie chart
df[" income"].value_counts().plot.pie(autopct="%.1f%%", figsize=(7, 7))
plt.title("Pie chart of Income")

### 2. Splitting into train-test set

In [None]:
X = df.drop(" income", axis=1)
y = df[" income"]

In [None]:
# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = evalml.preprocessing.split_data(
    X, y, problem_type="binary", test_size=0.2
)
print("Size of training data:", X_train.shape[0])
print("Size of test data:", X_test.shape[0])
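
For a classification problem, the split is usually stratified so that both parts keep roughly the same class ratio. The sketch below illustrates that idea in plain Python. It is not EvalML's implementation: `stratified_split` and the toy labels are hypothetical names and made-up values used only for illustration, and whether `split_data` stratifies in your installed version is an assumption worth checking against its documentation.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_size=0.2, seed=0):
    """Return (train_idx, test_idx) with roughly equal class ratios in both parts.

    Hypothetical helper for illustration only; not part of EvalML.
    """
    rng = random.Random(seed)

    # Group row indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels with a made-up 75/25 imbalance (not the real dataset).
toy_labels = [" <=50K"] * 75 + [" >50K"] * 25
train_idx, test_idx = stratified_split(toy_labels, test_size=0.2)

train_counts = Counter(toy_labels[i] for i in train_idx)
test_counts = Counter(toy_labels[i] for i in test_idx)
print("train:", dict(train_counts))
print("test:", dict(test_counts))
```

Both parts keep the 3:1 ratio of the toy labels, which is the property that matters when one class is much rarer than the other.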

In [None]:
# Check the container type returned by split_data (it depends on the EvalML version)
type(X_train)

In [None]:
X_train[[" workclass", " occupation"]].head().T

### 3. Finding best ML model pipeline using EvalML

In [None]:
from evalml import AutoMLSearch

In [None]:
# Search binary-classification pipelines, ranking them by F1 score
automl = AutoMLSearch(X_train=X_train, y_train=y_train, problem_type="binary", objective="F1")
automl.search()
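
The search above ranks pipelines by F1, the harmonic mean of precision and recall. As a rough illustration of why F1 can be more informative than accuracy on imbalanced labels, here is a minimal plain-Python sketch. The function names and toy labels are hypothetical and are not taken from EvalML or from the income data.

```python
def f1_from_labels(y_true, y_pred, positive=1):
    """Compute F1 for the `positive` class from paired true/predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Made-up labels: 3 positives out of 8 rows.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_model = [1, 1, 0, 1, 0, 0, 0, 0]      # a model that finds 2 of 3 positives
y_all_negative = [0] * len(y_true)       # a trivial "always negative" baseline

f1_model = f1_from_labels(y_true, y_model)
f1_baseline = f1_from_labels(y_true, y_all_negative)
acc_baseline = accuracy(y_true, y_all_negative)
print(f1_model, f1_baseline, acc_baseline)
```

The trivial baseline still gets a non-trivial accuracy because most rows are negative, yet its F1 is zero because it never identifies a positive row.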

### 4. Interacting with pipelines

In [None]:
automl.rankings

In [None]:
# Describe the best pipeline (first row of the rankings table)
automl.describe_pipeline(automl.rankings.iloc[0]["id"])

In [None]:
# Describe the second-best pipeline (second row of the rankings table)
automl.describe_pipeline(automl.rankings.iloc[1]["id"])

In [None]:
# Visualize the component graph of the best pipeline (needs graphviz installed)
best_pipeline = automl.best_pipeline
best_pipeline.graph()

### 5. Evaluating the best pipeline on test data

In [None]:
best_pipeline.score(X_test, y_test, objectives=["auc", "f1"])
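
The `auc` objective requested above summarizes how well predicted scores rank positive rows above negative ones. One way to see what the number means is the pairwise definition sketched below, in plain Python. `pairwise_auc` is a hypothetical helper, the labels and scores are made up, and a library implementation will normally use a faster method.

```python
def pairwise_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of rows,
    so this is only meant to illustrate the definition.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative row")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Made-up labels and predicted probabilities for the positive class.
toy_true = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]

auc_value = pairwise_auc(toy_true, toy_scores)
print(auc_value)
```

Here three of the four positive/negative pairs are ordered correctly, so the value is 0.75. A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking.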