# Random Probe Feature Selection


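This notebook demonstrates how a "random probe" feature can be used to rank and select features in a machine learning model. A random probe is a column of pure noise added to the dataset: because it carries no information about the target, any real feature that scores no better than the probe is a candidate for removal.
We use a synthetic dataset and a simple linear regression model for the demonstration.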

## Steps:
1. Generate a Synthetic Dataset
2. Introduce a Random Probe
3. Fit a Model
4. Rank Features


In [1]:

# Import necessary libraries
from sklearn.datasets import make_regression
import pandas as pd
import numpy as np

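# Generate a synthetic dataset: 5 features, of which only 3 are informative.
# make_regression is controlled by its own random_state; np.random.seed fixes
# NumPy's global RNG, which is used later to draw the random probe.
np.random.seed(42)
X, y = make_regression(n_samples=200, n_features=5, n_informative=3, noise=0.1, random_state=42)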

# Convert the dataset to a DataFrame
df = pd.DataFrame(X, columns=[f'Feature_{i}' for i in range(1, 6)])
df['Target'] = y

df.head()




|   | Feature_1 | Feature_2 | Feature_3 | Feature_4 | Feature_5 | Target |
|---|-----------|-----------|-----------|-----------|-----------|--------|
| 0 | -0.385314 | 0.19906 | -0.600217 | 0.462103 | 0.069802 | -28.346743 |
| 1 | 0.130741 | 1.632411 | -1.430141 | -1.247783 | -0.440044 | -94.823426 |
| 2 | -0.77301 | 0.224092 | 0.012592 | -0.40122 | 0.097676 | -3.452789 |
| 3 | -0.576771 | -0.050238 | -0.238948 | 0.270457 | -0.907564 | -11.095716 |
| 4 | -0.575818 | 0.614167 | 0.757508 | -0.22097 | -0.530501 | 51.040021 |



## Introduce a Random Probe
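We add a column of standard normal noise to serve as the probe. It is generated independently of the target, so any weight the model assigns to it reflects chance alone.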


In [2]:

# Add a random probe feature
df['Random_Probe'] = np.random.randn(df.shape[0])

df.head()


|   | Feature_1 | Feature_2 | Feature_3 | Feature_4 | Feature_5 | Target | Random_Probe |
|---|-----------|-----------|-----------|-----------|-----------|--------|--------------|
| 0 | -0.385314 | 0.19906 | -0.600217 | 0.462103 | 0.069802 | -28.346743 | 0.496714 |
| 1 | 0.130741 | 1.632411 | -1.430141 | -1.247783 | -0.440044 | -94.823426 | -0.138264 |
| 2 | -0.77301 | 0.224092 | 0.012592 | -0.40122 | 0.097676 | -3.452789 | 0.647689 |
| 3 | -0.576771 | -0.050238 | -0.238948 | 0.270457 | -0.907564 | -11.095716 | 1.52303 |
| 4 | -0.575818 | 0.614167 | 0.757508 | -0.22097 | -0.530501 | 51.040021 | -0.234153 |
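A probe is only useful if it is unrelated to the target. As a quick sanity check, the sketch below rebuilds the dataset with the same calls used above and compares each column's absolute Pearson correlation with the target. The name `abs_corr` is illustrative and not part of the original notebook, and marginal correlation is only a rough proxy for what the regression will see.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression

# Rebuild the notebook's dataset so this block runs on its own
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=0.1, random_state=42)
df = pd.DataFrame(X, columns=[f"Feature_{i}" for i in range(1, 6)])

# Same probe construction as above: seed the global RNG, then draw noise
np.random.seed(42)
df["Random_Probe"] = np.random.randn(len(df))

# Absolute Pearson correlation of every column with the target
# (abs_corr is a hypothetical name used only in this sketch)
target = pd.Series(y, name="Target")
abs_corr = df.corrwith(target).abs().sort_values(ascending=False)
print(abs_corr)
```

The probe's correlation should be small but, with only 200 samples, it will not be exactly zero. That residual is the chance-level noise the probe is meant to expose.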



## Fit a Model
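We fit an ordinary least squares linear regression on all six columns (the five features plus the probe) and examine the coefficients. The features produced by `make_regression` and the probe are all drawn from a standard normal distribution, so coefficient magnitudes are directly comparable. The table below is sorted by absolute coefficient.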


In [3]:

from sklearn.linear_model import LinearRegression

# Prepare the features and target variable
features = df.drop('Target', axis=1).columns
X = df[features]
y = df['Target']

# Fit a linear regression model
model = LinearRegression()
model.fit(X, y)

# Pair each feature with its coefficient, sorted by absolute value (largest first)
coefficients = pd.DataFrame({
    'Feature': features,
    'Coefficient': model.coef_
}).sort_values('Coefficient', key=abs, ascending=False)

coefficients


|   | Feature | Coefficient |
|---|---------|-------------|
| 2 | Feature_3 | 63.648034 |
| 3 | Feature_4 | 16.758093 |
| 1 | Feature_2 | 10.458872 |
| 4 | Feature_5 | -0.003535 |
| 5 | Random_Probe | 0.00167 |
| 0 | Feature_1 | 0.001627 |
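The ranking above can be turned into an explicit comparison against the probe. The following is a minimal sketch, assuming the same dataset and probe construction as in the cells above; the `report` table and its column names (`abs_coef`, `ratio_to_probe`, `beats_probe`) are hypothetical and introduced here only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Rebuild the notebook's dataset and probe so this block runs on its own
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=0.1, random_state=42)
df = pd.DataFrame(X, columns=[f"Feature_{i}" for i in range(1, 6)])
np.random.seed(42)
df["Random_Probe"] = np.random.randn(len(df))

# Fit on the five features plus the probe and take absolute coefficients
model = LinearRegression().fit(df, y)
abs_coef = pd.Series(np.abs(model.coef_), index=df.columns)

# Use the probe's absolute coefficient as the noise floor
probe_level = abs_coef["Random_Probe"]
real = abs_coef.drop("Random_Probe")

# Hypothetical summary table: magnitude, ratio to the probe, and a keep flag
report = pd.DataFrame({
    "abs_coef": real,
    "ratio_to_probe": real / probe_level,
    "beats_probe": real > probe_level,
}).sort_values("abs_coef", ascending=False)
print(report)
```

Reporting the ratio alongside the boolean flag is a deliberate choice: a feature that beats the probe by a factor of two is much weaker evidence than one that beats it by a factor of thousands.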



## Interpretation
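- Any feature whose absolute coefficient is no larger than the random probe's is a candidate for removal, because the model weights it no more than pure noise. Here that is Feature_1 (0.001627 versus 0.00167 for the probe).
- Features whose absolute coefficients are far larger than the probe's are likely to be important. Here Feature_2, Feature_3, and Feature_4 exceed the probe by factors in the thousands or more, which matches `n_informative=3`.
- Feature_5 sits just above the probe (0.003535 in absolute value) and is on the same scale as it, so a single probe draw is better read as a noise floor than as a sharp cutoff.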
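Because the probe is itself random, its coefficient changes from draw to draw. One way to make the comparison less dependent on a single draw is to repeat it with many independent probes and record how often each feature beats the probe. This is a sketch under stated assumptions, not part of the original notebook: the number of repetitions (`n_probes = 50`), the seed, and the name `win_rate` are all arbitrary choices made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression

# Rebuild the notebook's dataset so this block runs on its own
X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       noise=0.1, random_state=42)
feature_names = [f"Feature_{i}" for i in range(1, 6)]

n_probes = 50                   # assumption: arbitrary number of repetitions
rng = np.random.default_rng(0)  # assumption: arbitrary seed
wins = np.zeros(len(feature_names))

for _ in range(n_probes):
    # Draw a fresh standard normal probe and append it as the last column
    probe = rng.standard_normal((X.shape[0], 1))
    X_probe = np.hstack([X, probe])
    coef = np.abs(LinearRegression().fit(X_probe, y).coef_)
    # Count the features whose |coefficient| exceeds the probe's in this draw
    wins += coef[:-1] > coef[-1]

# Fraction of draws in which each feature beat the probe
win_rate = pd.Series(wins / n_probes, index=feature_names).sort_values(ascending=False)
print(win_rate)
```

Features that beat the probe in every draw are strong candidates to keep, while features that win only some of the time are behaving like noise. Where to place the cutoff on `win_rate` is a judgment call that this sketch does not settle.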

