Human-Learn is a library for building and benchmarking rule-based systems that are compatible with scikit-learn. Because each rule is wrapped as a scikit-learn estimator, it can be evaluated with the usual tools (grid search, metrics, pipelines) and combined with machine learning models.

In [None]:
# First, install the package if you haven't already
# python -m pip install human-learn

from hulearn.datasets import load_titanic
import numpy as np


In [None]:
# Load the Titanic dataset
df = load_titanic(as_frame=True)
X, y = df.drop(columns=['survived']), df['survived']

In [None]:
# Define the fare_based function for classification
def fare_based(dataf, threshold=10):
    return np.array(dataf['fare'] > threshold).astype(int)

In [None]:
# Example usage:
# Predict survival based on whether the fare is above a threshold (default threshold is 10)
predictions = fare_based(X)

In [None]:
# Print the predictions
print(predictions)
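
Because fare_based only indexes dataf['fare'] and compares it with a threshold, it can be sanity-checked on a few hand-written fares before running it on the full dataset. The sketch below assumes a small made-up sample (the fares are illustrative, not taken from the Titanic data) and passes a plain dictionary of NumPy arrays in place of a DataFrame.

```python
import numpy as np

def fare_based(dataf, threshold=10):
    # Same rule as above: predict 1 (survived) when the fare exceeds the threshold
    return np.array(dataf['fare'] > threshold).astype(int)

# Hypothetical fares chosen to sit on both sides of the thresholds
sample = {'fare': np.array([7.25, 71.28, 8.05, 53.10, 10.0])}

default_preds = fare_based(sample)               # default threshold of 10
strict_preds = fare_based(sample, threshold=60)  # a stricter, made-up threshold

print(default_preds)
print(strict_preds)
```

Note that the comparison is strict, so a fare exactly equal to the threshold is predicted as 0.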

The next example wraps the rule in a FunctionClassifier from Human-Learn, which turns it into a scikit-learn compatible classifier, and then uses grid search on the Titanic dataset to tune the fare threshold:

In [None]:
import pandas as pd
import numpy as np
from hulearn.datasets import load_titanic
from hulearn.classification import FunctionClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import precision_score, recall_score, accuracy_score, make_scorer

# Load the Titanic dataset
df = load_titanic(as_frame=True)
X, y = df.drop(columns=['survived']), df['survived']

# Define the fare_based function for classification
def fare_based(dataf, threshold=10):
    return np.array(dataf['fare'] > threshold).astype(int)

# Convert the function into a scikit-learn compatible model
mod = FunctionClassifier(fare_based, threshold=10)

# Set up the GridSearchCV for hyperparameter tuning
grid = GridSearchCV(mod,
                    cv=2,
                    param_grid={'threshold': np.linspace(0, 100, 30)},
                    scoring={'accuracy': make_scorer(accuracy_score),
                            'precision': make_scorer(precision_score),
                            'recall': make_scorer(recall_score)},
                    refit='accuracy'
                )
grid.fit(X, y)

In [None]:
# Create a DataFrame to visualize scores vs. threshold
score_df = pd.DataFrame(grid.cv_results_).set_index('param_threshold')[['mean_test_accuracy', 'mean_test_precision', 'mean_test_recall']]

# Plot the scores vs. threshold
score_df.plot(figsize=(12, 5), title="Scores vs. Fare Threshold")
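
For each candidate threshold, the grid search above scores the rule's predictions against the labels. The following is a minimal sketch of that computation for a single data split, written in plain Python on made-up fares and labels. It skips cross-validation, and the helper name score_threshold is hypothetical.

```python
def score_threshold(fares, labels, threshold):
    """Score the rule `fare > threshold` against binary labels."""
    preds = [int(f > threshold) for f in fares]
    tp = sum(p == 1 and t == 1 for p, t in zip(preds, labels))
    fp = sum(p == 1 and t == 0 for p, t in zip(preds, labels))
    fn = sum(p == 0 and t == 1 for p, t in zip(preds, labels))
    tn = sum(p == 0 and t == 0 for p, t in zip(preds, labels))
    return {
        'accuracy': (tp + tn) / len(labels),
        # Report 0.0 when a denominator is empty (no positive predictions or labels)
        'precision': tp / (tp + fp) if tp + fp else 0.0,
        'recall': tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical data, not taken from the Titanic dataset
fares = [5.0, 8.0, 12.0, 30.0, 45.0, 80.0]
labels = [0, 0, 1, 0, 1, 1]

scores = {t: score_threshold(fares, labels, t) for t in (0, 10, 40, 100)}
for t, s in scores.items():
    print(t, s)
```

In this toy data, raising the threshold from 10 to 40 increases precision and lowers recall, which is the kind of trade-off the plot is meant to expose.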

In [None]:
import numpy as np
from hulearn.classification import FunctionClassifier

# Assume you have trained outlier_detector and classifier beforehand

# Define the make_decision function for decision-making with fallback mechanisms
def make_decision(dataf, proba_threshold=0.8):
    # First, create a resulting array with all the predictions
    res = classifier.predict(dataf)

    # Check confidence level, if below threshold, use fallback
    proba = classifier.predict_proba(dataf)
    res = np.where(proba.max(axis=1) < proba_threshold, "doubt_fallback", res)

    # Check for outliers and use fallback if detected
    res = np.where(outlier_detector.predict(dataf) == -1, "outlier_fallback", res)

    # The `res` array contains the output of the decision-making process.
    return res

# Create a FunctionClassifier with the make_decision function
fallback_model = FunctionClassifier(make_decision, proba_threshold=0.8)


In this code, outlier_detector and classifier stand for an outlier detection model and a classification model that you have already trained. The make_decision function starts from the classifier's predictions, replaces every prediction whose highest class probability is below proba_threshold with "doubt_fallback", and then replaces every prediction for a detected outlier with "outlier_fallback". Wrapping make_decision in a FunctionClassifier yields fallback_model, which can be used like any other scikit-learn classifier.
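
To see the layering in isolation, the sketch below replaces the trained models with two hypothetical stand-ins, StubClassifier and StubOutlierDetector (neither is part of Human-Learn or scikit-learn), that return fixed outputs. It assumes the classifier predicts string labels, so the fallback strings share the array's dtype.

```python
import numpy as np

class StubClassifier:
    """Hypothetical stand-in for a trained classifier with fixed outputs."""
    classes_ = np.array(['no', 'yes'])

    def predict_proba(self, dataf):
        return np.array([[0.95, 0.05], [0.40, 0.60], [0.10, 0.90], [0.85, 0.15]])

    def predict(self, dataf):
        return self.classes_[self.predict_proba(dataf).argmax(axis=1)]

class StubOutlierDetector:
    """Hypothetical stand-in for a trained outlier detector."""
    def predict(self, dataf):
        # scikit-learn convention: -1 marks an outlier, 1 an inlier
        return np.array([1, 1, -1, -1])

classifier = StubClassifier()
outlier_detector = StubOutlierDetector()

def make_decision(dataf, proba_threshold=0.8):
    res = classifier.predict(dataf)
    proba = classifier.predict_proba(dataf)
    # Low-confidence rows fall back first ...
    res = np.where(proba.max(axis=1) < proba_threshold, 'doubt_fallback', res)
    # ... then outlier rows overwrite whatever was there
    res = np.where(outlier_detector.predict(dataf) == -1, 'outlier_fallback', res)
    return res

# The stubs ignore their input, so no real data is needed here
decisions = make_decision(dataf=None)
print(decisions)
```

Because the outlier check runs last, it takes precedence: the third row is a confident prediction (0.90) but is still routed to the outlier fallback.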

The next example uses Human-Learn to draw a machine learning model with interactive charts, using the penguins dataset from scikit-lego:

In [None]:
from sklego.datasets import load_penguins
from hulearn.experimental.interactive import InteractiveCharts
from hulearn.classification import InteractiveClassifier
import matplotlib.pyplot as plt

# Load the Penguin dataset and drop NaN values
df = load_penguins(as_frame=True).dropna()

# Create interactive charts for visualization
clf = InteractiveCharts(df, labels="species")

# Add interactive charts for bill_length_mm vs. bill_depth_mm
clf.add_chart(x="bill_length_mm", y="bill_depth_mm")

# Add interactive charts for flipper_length_mm vs. body_mass_g
clf.add_chart(x="flipper_length_mm", y="body_mass_g")

# Create an InteractiveClassifier from the drawn shapes
# (draw the shapes in the charts above before calling clf.data())
model = InteractiveClassifier(json_desc=clf.data())

# Split the data into features and labels
X, y = df.drop(columns=['species']), df['species']
preds = model.fit(X, y).predict_proba(X)

# Plot the interactive charts with predictions
plt.figure(figsize=(12, 3))
for i in range(3):
    plt.subplot(131 + i)
    plt.scatter(X['bill_length_mm'], X['bill_depth_mm'], c=preds[:, i])
    plt.xlabel('bill_length_mm')
    plt.ylabel('bill_depth_mm')
    plt.title(model.classes_[i])

plt.figure(figsize=(12, 3))
for i in range(3):
    plt.subplot(131 + i)
    plt.scatter(X['flipper_length_mm'], X['body_mass_g'], c=preds[:, i])
    plt.xlabel('flipper_length_mm')
    plt.ylabel('body_mass_g')
    plt.title(model.classes_[i])

plt.show()


This code utilizes Human-Learn to create interactive charts for the Penguin dataset, converts these charts into an InteractiveClassifier model, and then visualizes the predictions using scatter plots based on the features in the dataset.
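
Conceptually, a drawn shape is a polygon in the plane of two features, and a data point supports a class when it falls inside a polygon drawn for that class. The sketch below illustrates that idea with a plain ray-casting test. It is an assumption about the general approach rather than Human-Learn's actual implementation, and the polygon coordinates and the point_in_polygon and class_hits helpers are made up for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: toggle `inside` at each edge crossed by a ray going right from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical polygons in (bill_length_mm, bill_depth_mm) space, one per class
polygons = {
    'Adelie': [(32, 16), (42, 16), (42, 22), (32, 22)],
    'Gentoo': [(40, 13), (55, 13), (55, 17), (40, 17)],
}

def class_hits(x, y):
    # 1 if the point lies inside the class's polygon, else 0
    return {label: int(point_in_polygon(x, y, poly)) for label, poly in polygons.items()}

print(class_hits(39.1, 18.7))
print(class_hits(48.0, 15.0))
print(class_hits(60.0, 25.0))
```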

In [None]:
import pandas as pd
import numpy as np

# Define a complete example (old) and a copy with bill_length_mm missing (new)
old_example = pd.DataFrame([{
    'island': 'Torgersen',
    'bill_length_mm': 39.1,
    'bill_depth_mm': 18.7,
    'flipper_length_mm': 220.0,
    'body_mass_g': 5750.0,
    'sex': 'male'}
])

new_example = pd.DataFrame([{
    'island': 'Torgersen',
    'bill_length_mm': np.nan,
    'bill_depth_mm': 18.7,
    'flipper_length_mm': 220.0,
    'body_mass_g': 5750.0,
    'sex': 'male'}
])

# Predict probabilities for the old and new examples
old_preds = model.predict_proba(old_example)
new_preds = model.predict_proba(new_example)

# Print the predicted probabilities for both examples
print("Predicted probabilities for old example:", old_preds)
print("Predicted probabilities for new example with missing values:", new_preds)


This snippet shows that the model still returns predictions when a value is missing: it predicts probabilities for a complete example (old_example) and for the same example with bill_length_mm set to NaN (new_example).
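
One plausible way to picture why a missing value is tolerable: every chart casts its own vote, and a chart with a missing coordinate simply casts none. The sketch below assumes that mechanism and simplifies drawn shapes to axis-aligned boxes. The box coordinates, the chart_hits and predict_proba helpers, and the smoothing default are illustrative assumptions, not Human-Learn's internals.

```python
def chart_hits(x, y, boxes):
    """Per-class hit (0 or 1) for one chart; boxes are (xmin, xmax, ymin, ymax)."""
    # Comparisons with NaN are always False, so a missing coordinate yields no hits
    return {label: int(xmin <= x <= xmax and ymin <= y <= ymax)
            for label, (xmin, xmax, ymin, ymax) in boxes.items()}

def predict_proba(row, charts, smoothing=0.001):
    """Sum the hits over all charts, smooth, and normalise to probabilities."""
    totals = {}
    for (x_col, y_col), boxes in charts.items():
        for label, hit in chart_hits(row[x_col], row[y_col], boxes).items():
            totals[label] = totals.get(label, 0) + hit
    smoothed = {label: count + smoothing for label, count in totals.items()}
    norm = sum(smoothed.values())
    return {label: value / norm for label, value in smoothed.items()}

# Hypothetical boxes standing in for shapes drawn on two charts
charts = {
    ('bill_length_mm', 'bill_depth_mm'): {
        'Adelie': (32, 42, 16, 22), 'Gentoo': (40, 55, 13, 17)},
    ('flipper_length_mm', 'body_mass_g'): {
        'Adelie': (170, 205, 2800, 4800), 'Gentoo': (205, 235, 4000, 6500)},
}

old_row = {'bill_length_mm': 39.1, 'bill_depth_mm': 18.7,
           'flipper_length_mm': 220.0, 'body_mass_g': 5750.0}
new_row = dict(old_row, bill_length_mm=float('nan'))

old_proba = predict_proba(old_row, charts)
new_proba = predict_proba(new_row, charts)
print(old_proba)
print(new_proba)
```

In this sketch the complete row receives one vote from each chart, while the row with the missing bill length is decided by the flipper and body-mass chart alone.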

Human-Learn can also turn drawn shapes into an outlier detection model:

In [None]:
from hulearn.experimental.interactive import InteractiveCharts
from hulearn.outlier import InteractiveOutlierDetector
import matplotlib.pyplot as plt

# Create interactive charts for visualization
charts = InteractiveCharts(df, labels="species")
charts.add_chart(x="bill_length_mm", y="bill_depth_mm")

# Create an InteractiveOutlierDetector using the drawn data
outlier_model = InteractiveOutlierDetector(json_desc=charts.data())

In [None]:
# Split the data into features and labels
X, y = df.drop(columns=['species']), df['species']
preds = outlier_model.fit(X, y).predict(X)

# Plot the scatter plot with outliers detected based on drawn shapes
plt.scatter(X['bill_length_mm'], X['bill_depth_mm'], c=preds)
plt.xlabel('bill_length_mm')
plt.ylabel('bill_depth_mm')
plt.title('Outlier Detection based on Drawn Shapes')
plt.show()
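
The idea behind outlier detection from drawn shapes can be sketched as counting how many shapes contain a point and flagging points that fall inside too few of them. The example below assumes that rule, simplifies shapes to axis-aligned boxes, and follows the scikit-learn convention of -1 for outliers and 1 for inliers. The box coordinates, the helper names, and the threshold parameter are assumptions of this sketch.

```python
def count_hits(x, y, boxes):
    """Number of boxes (xmin, xmax, ymin, ymax) that contain the point."""
    return sum(xmin <= x <= xmax and ymin <= y <= ymax
               for xmin, xmax, ymin, ymax in boxes)

def predict_outliers(points, boxes, threshold=1):
    """Return 1 (inlier) when a point lies in at least `threshold` boxes, else -1."""
    return [1 if count_hits(x, y, boxes) >= threshold else -1 for x, y in points]

# Hypothetical boxes in (bill_length_mm, bill_depth_mm) space
boxes = [(32, 42, 16, 22), (40, 55, 13, 17), (40, 52, 16, 21)]
points = [(39.1, 18.7), (48.0, 15.0), (60.0, 25.0), (41.0, 16.5)]

lenient = predict_outliers(points, boxes, threshold=1)
strict = predict_outliers(points, boxes, threshold=2)
print(lenient)
print(strict)
```

With the stricter threshold, only the point that lies inside several overlapping boxes is still treated as an inlier.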

In [None]:
from hulearn.experimental.interactive import InteractiveCharts
from hulearn.preprocessing import InteractivePreprocessor

# Create interactive charts with a list of group names instead of a label column
charts = InteractiveCharts(df, labels=["group_one", "group_two"])
charts.add_chart(x="bill_length_mm", y="bill_depth_mm")

# Create an InteractivePreprocessor using the drawn data
tfm = InteractivePreprocessor(json_desc=charts.data())

# The flow for scikit-learn pipeline
tfm.fit(df).transform(df)

# The flow for pandas pipeline
df.pipe(tfm.pandas_pipe)


Here the InteractiveCharts are created with a list of group names instead of a label column, and the shapes drawn for each group are used to create an InteractivePreprocessor. The preprocessor turns the drawn groups into features and can be used both in scikit-learn pipelines (fit and transform) and in pandas pipelines (pandas_pipe).
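
As a rough picture of what featurization from drawn groups means, the sketch below adds one 0/1 feature per group that records whether a row falls inside that group's shape. It is a simplified illustration in plain Python: shapes are reduced to axis-aligned boxes, and the add_group_features helper and the box coordinates are hypothetical, not the library's implementation.

```python
def add_group_features(rows, groups, x_col, y_col):
    """Return copies of the rows with one 0/1 feature per drawn group."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows stay untouched
        for name, (xmin, xmax, ymin, ymax) in groups.items():
            inside = xmin <= row[x_col] <= xmax and ymin <= row[y_col] <= ymax
            new_row[name] = int(inside)
        out.append(new_row)
    return out

# Hypothetical boxes standing in for the two drawn groups
groups = {'group_one': (32, 42, 16, 22), 'group_two': (40, 55, 13, 17)}

rows = [
    {'bill_length_mm': 39.1, 'bill_depth_mm': 18.7},
    {'bill_length_mm': 48.0, 'bill_depth_mm': 15.0},
]

features = add_group_features(rows, groups, 'bill_length_mm', 'bill_depth_mm')
for row in features:
    print(row)
```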