RANSAC stands for RANdom SAmple Consensus. It uses repeated random sampling and fitting to reduce the influence of outliers.
RANSAC selects a random subset of the examples, treats them as inliers, and fits a linear regression to them. The remaining points are then tested against the fitted model: those whose residual falls within a user-defined tolerance (a hyperparameter) are classified as inliers, and the rest as outliers. The model is then refit using all inliers, and its error is estimated against them. If the result is not accurate enough, the process restarts with a new random subset.
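
The loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under simplifying assumptions (a 1D line fit, a fixed number of trials, and the largest consensus set as the selection rule), not how scikit-learn implements it. The function name `ransac_line`, its parameters, and the synthetic data are hypothetical and exist only for this example.

```python
import numpy as np

def ransac_line(x, y, n_sample=2, threshold=1.0, max_trials=100, seed=0):
    """Fit y ~ slope * x + intercept with a minimal RANSAC loop (illustrative only)."""
    rng = np.random.default_rng(seed)
    best_mask = None
    for _ in range(max_trials):
        # 1. Draw a random subset and fit a candidate line to it
        idx = rng.choice(len(x), size=n_sample, replace=False)
        slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
        # 2. Points whose absolute residual is within the threshold count as inliers
        mask = np.abs(y - (slope * x + intercept)) <= threshold
        # 3. Keep the candidate with the largest consensus set
        if best_mask is None or mask.sum() > best_mask.sum():
            best_mask = mask
    # 4. Refit on all inliers of the best candidate
    slope, intercept = np.polyfit(x[best_mask], y[best_mask], deg=1)
    return slope, intercept, best_mask

# Synthetic data (an assumption for this sketch): y = 3x + 2 plus noise,
# with every 10th point shifted far off the line
rng = np.random.default_rng(42)
x = np.linspace(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(scale=0.3, size=x.size)
y[::10] += 40.0

slope, intercept, inliers = ransac_line(x, y)
print(slope, intercept, inliers.sum())
```

Because the corrupted points lie far outside the threshold of any line fitted to clean points, they are left out of the consensus set and do not affect the final refit.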

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# We will use the Ames Housing dataset compiled by Dean De Cock

df = pd.read_csv('http://jse.amstat.org/v19n3/decock/AmesHousing.txt', sep='\t') # Tab separated
# The full dataset is larger than we need, so let's pick out a subset of 5 variables, along with the target
columns = ['Overall Qual', 'Overall Cond', 'Gr Liv Area', 'Central Air', 'Total Bsmt SF', 'SalePrice']

df = df[columns]

# There is only one missing entry. Let's drop that row, since we have a large enough dataset
df = df.dropna(axis=0)
# Let's encode the central air conditioning variable
df['Central Air'] = df['Central Air'].map({'N': 0, 'Y': 1})

In [None]:
from sklearn.model_selection import train_test_split
# Let's start with a 1D example, with the Gr Liv Area
from sklearn.linear_model import RANSACRegressor, LinearRegression

X = df[['Gr Liv Area']].values
y = df['SalePrice'].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

ransac = RANSACRegressor(LinearRegression(),
                         max_trials=100, # Maximum number of RANSAC iterations to perform
                         min_samples=0.95, # A float is a fraction: each random sample contains 95% of the data points
                         residual_threshold=None, # sklearn defaults to the Median Absolute Deviation (MAD) of the target
                         random_state=1)

# The residual threshold is a problem-specific criterion, expressed in units of the target, that needs to be tuned for the data at hand

ransac.fit(X, y)
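
To make the default threshold concrete, the sketch below computes the median absolute deviation of the target by hand and passes it explicitly. With the same `random_state`, this should select the same inliers as `residual_threshold=None`. The arrays `X_demo` and `y_demo` are synthetic stand-ins invented for this example, not the housing data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RANSACRegressor

# Synthetic data (an assumption for this sketch): y = 2x + 1 plus noise,
# with the first 20 targets shifted far off the line
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = 2.0 * X_demo[:, 0] + 1.0 + rng.normal(scale=0.5, size=200)
y_demo[:20] += 30.0

# Median absolute deviation of the target values
mad = np.median(np.abs(y_demo - np.median(y_demo)))

# Same random_state so both fits draw the same random subsets
default_fit = RANSACRegressor(LinearRegression(),
                              residual_threshold=None,
                              random_state=1).fit(X_demo, y_demo)
explicit_fit = RANSACRegressor(LinearRegression(),
                               residual_threshold=mad,
                               random_state=1).fit(X_demo, y_demo)

# Compare the inlier masks and inspect the slope recovered from the inliers
print(np.array_equal(default_fit.inlier_mask_, explicit_fit.inlier_mask_))
print(default_fit.estimator_.coef_)
```

Since the MAD is measured in units of the target, the default threshold for `SalePrice` is a dollar amount. Whether that is a sensible cut-off depends on the spread of prices, which is why the threshold usually needs tuning.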