<p style="text-align:center;">
<img src="https://github.com/digital-futures-academy/DataScienceMasterResources/blob/main/Resources/datascience-notebook-header.png?raw=true"
     alt="DigitalFuturesLogo"
     style="float: center; margin-right: 10px;" />
</p>

## Digital Futures Data Programme

### DT & RF with YDF
https://ydf.readthedocs.io/en/latest/

#### V4

In [None]:
# Install YDF
# What is YDF: https://ydf.readthedocs.io/en/stable/
!pip install ydf -U

In [None]:
# Import required modules
import ydf
import pandas as pd
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

In [None]:
# Read in the customer churn data
# REMINDER: Since we're working in Colab, we first need to upload the file to the Colab session before we can read it in
df = pd.read_excel("1 - Project Data.xlsx")

In [None]:
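# Convert 'Total Charges' to numeric; values that cannot be parsed become NaN
df['Total Charges'] = pd.to_numeric(df['Total Charges'], errors='coerce')
# Replace those NaNs with 0 (assigning back instead of inplace=True avoids pandas' chained-assignment warning)
df['Total Charges'] = df['Total Charges'].fillna(0)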
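
To see what `errors='coerce'` does before trusting it on the real column, here is a minimal sketch on a toy Series. The values are made up for illustration and are not taken from the project data.

```python
import pandas as pd

# Toy stand-in for a numeric column that was read in as text (hypothetical values)
raw = pd.Series(["29.5", "1889.5", " ", "108.25"], name="Total Charges")

# errors='coerce' turns anything that cannot be parsed into NaN instead of raising
as_numbers = pd.to_numeric(raw, errors="coerce")
n_missing = int(as_numbers.isna().sum())

# Assign the filled result back rather than filling in place
cleaned = as_numbers.fillna(0)

print(f"Unparseable entries: {n_missing}")
print(cleaned.tolist())
```

Counting the NaNs before filling them is a cheap sanity check: it shows how many rows are about to be set to 0.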

In [None]:
# View the data
df.head()

> We are now going to skip _a lot_ of steps...

> Ready to see the simplicity of YDF? Ok - let's go!

In [None]:
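# As usual, select our features & target
# Nulls? Non-numerical data? Mixed data types? No problem when you're using YDF: it handles them natively
# 'Churn Label' and 'Churn Reason' describe the churn outcome itself, so we drop them along with the target to avoid leakage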
features = list(df.columns)
features.remove('Churn Value')
features.remove('Churn Label')
features.remove('Churn Reason')

y = df['Churn Value']
X = df[features]

In [None]:
# We do the split, then add the targets back because YDF takes the features and the label in a single DataFrame
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=1204, stratify=y)

# .assign returns a new DataFrame instead of modifying a slice in place
X_train = X_train.assign(**{'Churn Value': y_train})  # Add the label/target back to X_train
X_test = X_test.assign(**{'Churn Value': y_test})     # Same for X_test
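
Since YDF reads the label from a column name, an alternative is to split the whole DataFrame in one go and skip the re-attaching step. Below is a minimal sketch on a toy frame; the rows and the `toy`, `train_df` and `test_df` names are hypothetical, while the split settings mirror the ones used in this notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame with the label kept alongside the features (made-up rows)
toy = pd.DataFrame({
    "Tenure Months": [1, 34, 2, 45, 8, 22, 10, 28, 62, 13],
    "Contract": ["Month-to-month", "One year"] * 5,
    "Churn Value": [1, 0, 1, 0, 1, 0, 1, 0, 0, 1],
})

# Split rows, not columns: stratify on the label so both parts keep the class balance
train_df, test_df = train_test_split(
    toy, test_size=0.2, random_state=1204, stratify=toy["Churn Value"]
)

print(train_df.shape, test_df.shape)
```

Both resulting frames still contain `Churn Value`, so either one can be passed straight to a learner created with `label="Churn Value"`.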

In [None]:
X_train.head()

### Modelling

In [None]:
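# Ready? Let's model - just as simple as sklearn
# Note: this trains gradient boosted trees; ydf.RandomForestLearner and ydf.CartLearner accept the same label argument
learner = ydf.GradientBoostedTreesLearner(label="Churn Value")
model = learner.train(X_train)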

In [None]:
# What is this?
model

In [None]:
# How well did we do on X_train?
model.evaluate(X_train)

In [None]:
# What about X_test?
model.evaluate(X_test)
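
The reason for evaluating on both sets is to compare them: a model that scores much better on train than on test is probably overfitting. Here is a small sketch of that comparison. It assumes the object returned by `model.evaluate(...)` exposes an `accuracy` attribute; the `accuracy_gap` helper and the numbers are hypothetical.

```python
from types import SimpleNamespace


def accuracy_gap(train_eval, test_eval):
    """Return train accuracy minus test accuracy; a large positive gap hints at overfitting."""
    return train_eval.accuracy - test_eval.accuracy


# Stand-in objects with made-up numbers, used only so the sketch runs without a trained model
fake_train = SimpleNamespace(accuracy=0.90)
fake_test = SimpleNamespace(accuracy=0.80)

gap = accuracy_gap(fake_train, fake_test)
print(f"Accuracy gap (train - test): {gap:.2f}")
```

With the model from this notebook, the call would look like `accuracy_gap(model.evaluate(X_train), model.evaluate(X_test))`.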

In [None]:
# Cherry on top - a visual of one of the trained trees (the first tree by default)
model.plot_tree()

In [None]:
# Lastly, one of the major advantages of YDF: a comprehensive model summary
model.describe()

### Fine-tuning

In [None]:
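# YDF has a built-in tuner to optimise your model, so there is very little for us to do!
# automatic_search_space=True lets YDF choose which hyperparameters to search over
tuner = ydf.RandomSearchTuner(num_trials=50, automatic_search_space=True)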
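
The search space can also be spelled out by hand with `tuner.choice(...)`, which makes it explicit what the random search is sampling from. The sketch below is only an illustration: `SEARCH_SPACE`, `build_tuner` and the candidate values are hypothetical choices, not recommendations. `ydf` is imported inside the function so the search space can be inspected even where YDF is not installed.

```python
# Hypothetical search space for a gradient boosted trees learner (illustrative values only)
SEARCH_SPACE = {
    "shrinkage": [0.2, 0.1, 0.05],
    "subsample": [1.0, 0.9, 0.8],
    "max_depth": [3, 4, 5, 6],
}


def build_tuner(search_space, num_trials=20):
    """Create a random-search tuner and register each hyperparameter's candidate values."""
    import ydf  # imported lazily; requires `pip install ydf`

    tuner = ydf.RandomSearchTuner(num_trials=num_trials)
    for name, values in search_space.items():
        tuner.choice(name, values)
    return tuner


# Number of distinct combinations the random search can draw from
n_combinations = 1
for values in SEARCH_SPACE.values():
    n_combinations *= len(values)

print(f"Search space size: {n_combinations}")
```

The tuner returned by `build_tuner(SEARCH_SPACE)` can then be passed to the learner through its `tuner` argument, in the same way as the tuner defined in this notebook.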

In [None]:
# Fit the tuned model (this trains one model per trial, so it takes longer than a single fit)
new_model = ydf.GradientBoostedTreesLearner(tuner=tuner, label='Churn Value').train(X_train)

In [None]:
# Evaluate the tuned model on Train
new_model.evaluate(X_train)

In [None]:
# And finally, on Test
new_model.evaluate(X_test)