The following is from [this article](https://medium.com/towards-data-science/130-ml-tricks-and-resources-curated-carefully-from-3-years-plus-free-ebook-7832ca4a37ef) on Medium.

# 1. Permutation Importance with ELI5

Permutation importance is one of the most reliable ways to see the important features in a model.

Its advantages:

1. Works on any model structure
2. Easy to interpret and implement
3. Consistent and reliable

Permutation importance of a feature is defined as the change in model performance when that feature is randomly shuffled.

PI is available through the eli5 package. Below are PI scores for an XGBoost Regressor model👇

The show_weights function displays the features that hurt the model’s performance the most after being shuffled — i.e. the most important features.
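To make the definition concrete, here is a minimal sketch that uses scikit-learn's own `permutation_importance` function rather than eli5, and a `LinearRegression` on synthetic data rather than XGBoost. Both swaps, the dataset, and all variable names are assumptions chosen so the example runs with scikit-learn alone.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): with shuffle=False, only the first two columns are informative.
X, y = make_regression(
    n_samples=300, n_features=5, n_informative=2, shuffle=False, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)

# Shuffle each feature 10 times and record the drop in the model's score (R^2 here).
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

# Rank features from most to least important.
ranking = np.argsort(result.importances_mean)[::-1]
for idx in ranking:
    mean = result.importances_mean[idx]
    std = result.importances_std[idx]
    print(f"feature {idx}: {mean:.3f} +/- {std:.3f}")
```

The features whose shuffling hurts the score the most end up at the top of the ranking, which is the same reading as the eli5 output described above.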

# 2. ConfusionMatrixDisplay for better confusion matrices

If you want much more control over how you display your confusion matrix in Sklearn, use the ConfusionMatrixDisplay class.

With the class, you can control how X and Y labels look, what texts they display, the colormap of the matrix, and much more.

Besides, it has a from_estimator function that enables you to plot the matrix without having to generate predictions beforehand.

# 4. Default RMSE in Sklearn

I always found it strange that Root Mean Squared Error wasn’t available in Sklearn given that it was such a popular metric.

Later, I found that I didn’t look long enough because it was available as a parameter inside mean_squared_error (squared=False)👇

In [1]:
from sklearn.metrics import mean_squared_error

In [2]:
mean_squared_error([10, 10, 10], [11, 12, 13])

4.666666666666667

In [3]:
mean_squared_error([10, 10, 10], [11, 12, 13], squared=False)

2.160246899469287
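One caveat: recent Sklearn releases deprecate the `squared` argument in favor of a dedicated `root_mean_squared_error` function, so the cell above may fail depending on your version. A version-independent sketch is to take the square root of the MSE yourself (the variable names are just for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

y_true = [10, 10, 10]
y_pred = [11, 12, 13]

mse = mean_squared_error(y_true, y_pred)  # mean of squared errors: (1 + 4 + 9) / 3
rmse = np.sqrt(mse)  # RMSE is simply the square root of the MSE

print(mse, rmse)
```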

# 10. Getting a scorer object from just the name

In a single project, you may evaluate your models using multiple metrics. Instead of importing them one by one from sklearn and polluting your namespace, you can use the “get_scorer” function of the metrics module.

Just pass the name of the metric you want, and you get a scorer object ready to use👇

In [4]:
from sklearn import metrics

In [5]:
metrics.get_scorer("roc_auc")

make_scorer(roc_auc_score, needs_threshold=True)

In [6]:
metrics.get_scorer("precision")

make_scorer(precision_score, average=binary)

In [7]:
metrics.get_scorer("r2")

make_scorer(r2_score)
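A scorer object is called with a fitted estimator and the data, not with predictions. Here is a small sketch, assuming a linear regression on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import get_scorer

X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)
model = LinearRegression().fit(X, y)

r2_scorer = get_scorer("r2")
neg_mse_scorer = get_scorer("neg_mean_squared_error")

# Scorers share the signature scorer(estimator, X, y) and follow
# the "greater is better" convention, hence the negated MSE.
r2 = r2_scorer(model, X, y)
neg_mse = neg_mse_scorer(model, X, y)

print(r2, neg_mse)
```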

# 14. Get all scorer names in Sklearn

Sklearn has over 50 metrics to evaluate the performance of its models. To pass those metrics inside pipelines or GridSearch instances, you have to remember their text names.

If you forget any of them, here is how you can print out the names of all the metrics👇

In [8]:
from sklearn import metrics

In [9]:
scorers = list(metrics.SCORERS.keys())

In [10]:
len(scorers)

54

In [11]:
scorers[:10]

['explained_variance',
 'r2',
 'max_error',
 'matthews_corrcoef',
 'neg_median_absolute_error',
 'neg_mean_absolute_error',
 'neg_mean_absolute_percentage_error',
 'neg_mean_squared_error',
 'neg_mean_squared_log_error',
 'neg_root_mean_squared_error']

# 22. The difference between two time series dates

How do you find the difference between the dates of two time series?

As long as they have the same format, you can use the difference method of Pandas DatetimeIndex objects.

Below, we create two time series: one for a full year and one for only business days. The rest is fairly easy👇

In [12]:
import pandas as pd

In [13]:
# Regular days
ts1 = pd.date_range(start="2022-01-01", end="2022-12-31")

In [14]:
# Business days
ts2 = pd.bdate_range(start="2022-01-01", end="2022-12-31")

In [15]:
difference = ts1.difference(ts2)

In [16]:
len(difference)

105

In [17]:
difference

DatetimeIndex(['2022-01-01', '2022-01-02', '2022-01-08', '2022-01-09',
               '2022-01-15', '2022-01-16', '2022-01-22', '2022-01-23',
               '2022-01-29', '2022-01-30',
               ...
               '2022-11-27', '2022-12-03', '2022-12-04', '2022-12-10',
               '2022-12-11', '2022-12-17', '2022-12-18', '2022-12-24',
               '2022-12-25', '2022-12-31'],
              dtype='datetime64[ns]', length=105, freq=None)

# 31. Chaining multiple Pandas functions with Pandas pipe

Pandas has a “pipeline” feature similar to the one in Sklearn. By chaining multiple “pipe” calls together, you can apply several preprocessing functions in a single statement, which makes your code much more readable and easier to debug.

In [18]:
def explode(df, column):
    df = df.explode(column=column)
    return df


def fill_na(df, value):
    df = df.fillna(value)
    return df


def encode(df):
    df = pd.get_dummies(df)
    return df
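Chaining the three functions above with `pipe` could look like this sketch. The toy dataframe and its column names are made up for illustration, and the functions are repeated in compact form so the snippet runs on its own.

```python
import pandas as pd


def explode(df, column):
    return df.explode(column=column)


def fill_na(df, value):
    return df.fillna(value)


def encode(df):
    return pd.get_dummies(df)


# Hypothetical data: a list-valued column and a numeric column with a missing value.
df = pd.DataFrame({"tags": [["a", "b"], ["c"]], "score": [1.0, None]})

# Each pipe call passes the dataframe as the first argument,
# followed by any extra arguments given after the function.
result = df.pipe(explode, "tags").pipe(fill_na, 0).pipe(encode)

print(result)
```

Without `pipe`, the same chain would be written inside out as `encode(fill_na(explode(df, "tags"), 0))`, which is harder to read.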

# 33. Format dates in Matplotlib plots

Have you ever visualized a time series only to find the dates on the x-axis smooshed together and illegible? You can avoid that by calling the “`autofmt_xdate()`” method on the figure object, which automatically formats date labels in Matplotlib.
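A minimal sketch, assuming a year of made-up daily values:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: a random walk over one year of daily dates.
dates = pd.date_range(start="2022-01-01", periods=365)
values = np.random.default_rng(0).normal(size=365).cumsum()

fig, ax = plt.subplots()
ax.plot(dates, values)

# Rotates and right-aligns the date labels so they no longer overlap.
fig.autofmt_xdate()
```

The method also accepts a `rotation` argument if the default angle doesn't suit your plot.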

# 38. Mlxtend — plot decision boundaries of classifiers

One of the most fun things you can do with your classifier is plot its decision boundaries. But, you will quickly realize that the code to generate such a plot is, put mildly, a giant pain in the keyboard.

Fortunately, the mlxtend package collapses all that code into a function, so that you can draw decision boundaries of any classifier in a single line of code👇

# 39. Displaying ROC Curve without generating predictions

Can you spell out ROC curve without looking it up? If yes, don’t flatter yourself, because many people can😁.

But not a lot of people know that you can draw the ROC curve without even generating predictions. Just use the RocCurveDisplay class and its from_estimator method👇

# 46. Filtering by partial date components

If you have a DatetimeIndex in your Pandas dataframes, you can filter it by partial date components.

For example, from 1995 to 1997, from the 5th month of 1995 to the end of 2000, from the beginning of 2015 to the 17th of July of 2018, etc.

And these all work regardless of the time series index granularity — all courtesy of Pandas.

In [19]:
df = pd.DataFrame()

df["date"] = pd.date_range(start="2022-01-01", end="2025-12-31")
df["fake_data"] = 1
df.set_index("date", inplace=True)

In [20]:
df

date,fake_data
2022-01-01,1
2022-01-02,1
2022-01-03,1
2022-01-04,1
2022-01-05,1
...,...
2025-12-27,1
2025-12-28,1
2025-12-29,1
2025-12-30,1
2025-12-31,1


In [21]:
df["2022":"2023"]

date,fake_data
2022-01-01,1
2022-01-02,1
2022-01-03,1
2022-01-04,1
2022-01-05,1
...,...
2023-12-27,1
2023-12-28,1
2023-12-29,1
2023-12-30,1
2023-12-31,1


In [22]:
df["2022-01":"2022-05"]

date,fake_data
2022-01-01,1
2022-01-02,1
2022-01-03,1
2022-01-04,1
2022-01-05,1
...,...
2022-05-27,1
2022-05-28,1
2022-05-29,1
2022-05-30,1
2022-05-31,1


In [23]:
df["2022-11":"2022"]

date,fake_data
2022-11-01,1
2022-11-02,1
2022-11-03,1
2022-11-04,1
2022-11-05,1
...,...
2022-12-27,1
2022-12-28,1
2022-12-29,1
2022-12-30,1
2022-12-31,1


In [24]:
df["2022":"2022-01-05"]

date,fake_data
2022-01-01,1
2022-01-02,1
2022-01-03,1
2022-01-04,1
2022-01-05,1


# 47. Displaying Precision/Recall curve without generating predictions

The area under the Precision/Recall curve is one of the best metrics to evaluate the performance of models in imbalanced classification problems.

Precision measures the percentage of positive predictions that are correct (true positives / (true positives + false positives)).

Recall is the same as sensitivity (true positives / (true positives + false negatives)).

In an imbalanced problem, we are interested in correctly classifying as much of the minority class (positive class or 1) as possible, i.e. true positives. As both of the above metrics focus on true positives and don’t care about correctly classifying the majority class (true negatives), they are among the best metrics in this context.

By varying the decision threshold of the classifier and plotting precision and recall for each threshold, we get a Precision/Recall curve.

A perfect classifier for an imbalanced problem would have an area of 1.

Below is how you can plot the curve in the easiest way possible in Sklearn👇

# 55. Decomposing time series into trend, seasonality, and residuals

A time series has three core components: seasonality, trend, and noise (residuals).

These components aren’t easily discernible by looking at the plot of the series itself. So, we often use decomposition to isolate each of these components.

Seasonality lets you see repeating patterns over the time period of the series.

Trend shows you the general upwards or downwards progress of the time series from the beginning of its earliest date to the latest.

Anything left out from these two components is noise.
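To make the three components concrete, here is a hand-rolled additive decomposition in plain Pandas. It is only a sketch of the idea, not the algorithm any particular library uses, and the synthetic series (linear trend, weekly pattern, small noise) is an assumption.

```python
import numpy as np
import pandas as pd

period = 7  # weekly seasonality in daily data
index = pd.date_range(start="2022-01-03", periods=8 * period, freq="D")
t = np.arange(len(index))
rng = np.random.default_rng(0)

# Hypothetical series = linear trend + repeating weekly pattern + noise.
weekly_pattern = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0, -3.0])  # sums to zero
series = pd.Series(
    0.1 * t + weekly_pattern[t % period] + rng.normal(scale=0.1, size=len(t)),
    index=index,
)

# Trend: a centered moving average over one full period smooths out the seasonality.
trend = series.rolling(window=period, center=True).mean()

# Seasonality: the average detrended value at each position in the weekly cycle.
detrended = series - trend
seasonal = detrended.groupby(t % period).transform("mean")

# Residuals: whatever the trend and the seasonality do not explain.
residual = series - trend - seasonal
```

The first and last few values of `trend` and `residual` are NaN because the centered window does not fit at the edges of the series.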

You can use statsmodels’ seasonal_decompose function (in statsmodels.tsa.seasonal) to perform this operation and plot the results. The first subplot displays the series itself while the rest show the individual components.

You can learn more about time series decomposition in my separate article on the topic.

# 61. Encoding rare labels with RareLabelEncoder

Often, when a categorical variable has a high cardinality (too many categories), many of the categories represent only a tiny proportion of the total.

Having too many classes with very few samples is noise. For ML models to generalize well for all classes, each class must have enough samples.

One solution to the problem is to group rare categories into a single category called “rare” or “other”. The “rarity” can be chosen by selecting a proportion threshold.
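The grouping idea can be sketched by hand with plain Pandas. The toy column, the 5% threshold, and the “Rare” label below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical categorical column: two frequent and two rare categories.
s = pd.Series(["a"] * 50 + ["b"] * 45 + ["c"] * 3 + ["d"] * 2)

threshold = 0.05  # categories below 5% of the rows count as rare

# Proportion of rows that belong to each category.
freq = s.value_counts(normalize=True)
rare_categories = freq[freq < threshold].index

# Keep frequent categories as-is and replace the rare ones with a single label.
grouped = s.where(~s.isin(rare_categories), "Rare")

print(grouped.value_counts())
```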

You can do this manually in Python but there is a better way. Using the feature-engine library, you can perform the operation using a Sklearn-like transformer.

Useful parameters of RareLabelEncoder:

- tol: the proportion threshold below which a category counts as rare
- replace_with: custom text to replace rare categories
- ignore_format: when True, the transformer will work on numerically-encoded features as well. By default, it only works on Pandas “object” or “category” data types.

RareLabelEncoder: https://bit.ly/3vfjNkv

# 76. Voting classifier/regressor

How to reach democracy in machine learning? By using a voting ensemble!

Max voting is a common ensembling technique that uses the majority vote to label new classification samples. Suppose we have three models with the following predictions in a binary classification problem:

- Model 1 -> class 1
- Model 2 -> class 2
- Model 3 -> class 1

The final prediction would be class 1. VotingClassifier of Sklearn can be used to build such an ensemble.

It takes a list of individual classifiers and ensembles them with the max voting technique when its “voting” parameter is set to “hard”. When it is set to “soft”, the ensemble averages the predicted class probabilities of the individual classifiers and picks the class with the highest average probability.

VotingRegressor applies the same averaging idea to regression: it averages the predictions of the individual regressors.

In [25]:
from sklearn.ensemble import VotingClassifier
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier
from sklearn.linear_model import LogisticRegression

In [26]:
X, y = make_classification(n_samples=1000, n_features=5)

In [27]:
estimators = [
    ("dtree", DecisionTreeClassifier()),
    ("etree", ExtraTreeClassifier()),
    ("log_reg", LogisticRegression()),
]

In [28]:
ensemble = VotingClassifier(
    estimators=estimators, voting="soft", n_jobs=-1
)

In [29]:
ensemble.fit(X, y)
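For the regression counterpart, here is a minimal sketch that also checks the averaging by hand. The two base regressors and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

ensemble = VotingRegressor(
    estimators=[
        ("lin_reg", LinearRegression()),
        ("dtree", DecisionTreeRegressor(max_depth=3, random_state=0)),
    ]
)
ensemble.fit(X, y)

# Average the fitted base regressors' predictions manually for comparison.
individual = np.column_stack([est.predict(X) for est in ensemble.estimators_])
averaged = individual.mean(axis=1)

print(np.allclose(ensemble.predict(X), averaged))
```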

# 77. Stacking ensemble/regressor

People use stacking to silently win competitions on Kaggle. How does it work?

As a rule, multiple performant models whose learning algorithms differ as much as possible are chosen to form an ensemble. Then, using KFold cross-validation, out-of-fold predictions are generated for each model.

As an example, with 5 models in a stack doing a 5-fold CV on the data, each model is fitted 5 times, and its predictions on the 5 held-out folds are stitched together into a single column. That gives 5 columns of predictions, one per model. This concludes level 1 of the stack.

In the next level, using these columns of predictions as features, a final meta estimator is trained and final predictions are made.

This leverages the strength of each individual model in the stack and uses their output as inputs to the final estimator. This helps greatly reduce bias in the predictions.

This complicated ensembling technique is implemented in its basic form in Sklearn as StackingClassifier and StackingRegressor. You pass a list of base estimators and one final lightweight meta estimator like Logistic Regression. They work just like any other Sklearn model.

In [30]:
from sklearn.ensemble import StackingClassifier
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier
from sklearn.linear_model import LogisticRegression

In [31]:
X, y = make_classification(n_samples=1000, n_features=5)

In [32]:
estimators = [
    ("dtree", DecisionTreeClassifier()),
    ("etree", ExtraTreeClassifier()),
]

log_reg = LogisticRegression()

In [33]:
ensemble = StackingClassifier(
    estimators=estimators, final_estimator=log_reg, n_jobs=-1
)

In [34]:
ensemble.fit(X, y)
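To peek at what the final estimator actually receives, you can call `transform` on the fitted stack. The sketch below repeats the setup with fixed random seeds (an assumption, for reproducibility). In this binary problem, each base model should contribute a single probability column.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, ExtraTreeClassifier

X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

ensemble = StackingClassifier(
    estimators=[
        ("dtree", DecisionTreeClassifier(random_state=0)),
        ("etree", ExtraTreeClassifier(random_state=0)),
    ],
    final_estimator=LogisticRegression(),
    cv=5,  # folds used to build the out-of-fold predictions during fit
)
ensemble.fit(X, y)

# Base-model outputs that serve as input features for the final estimator.
meta_features = ensemble.transform(X)

print(meta_features.shape)
print(ensemble.predict(X[:5]))
```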