# Machine Learning
### Pearson correlation for feature selection
**Pearson correlation** measures the *strength* and *direction* of the **linear** relationship between a numerical feature $x$ and a numerical target $y$.
For a feature vector $\boldsymbol{x}$ and a target vector $\boldsymbol{y}$, each containing $n$ samples, the **Pearson correlation** is computed by:
<br>$\large r=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2}\sqrt{\sum_{i=1}^n (y_i-\bar{y})^2}}$
<br> Where:
- $\bar{x}=\frac{1}{n}\sum_{i=1}^n x_i$ is the **sample mean** of $\boldsymbol{x}$.
- $\bar{y}=\frac{1}{n}\sum_{i=1}^n y_i$ is the **sample mean** of $\boldsymbol{y}$.
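
As a quick check of the formula, the sketch below evaluates it term by term on a small made-up dataset (the values of `x` and `y` are illustrative assumptions, not data from this notebook) and compares the result with NumPy's `np.corrcoef`.

```python
import numpy as np

# Toy data, made up for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Deviations from the sample means
dx = x - x.mean()
dy = y - y.mean()

# Numerator: sum of products of deviations
numerator = np.sum(dx * dy)
# Denominator: product of the square roots of the summed squared deviations
denominator = np.sqrt(np.sum(dx ** 2)) * np.sqrt(np.sum(dy ** 2))

r = numerator / denominator
r_numpy = np.corrcoef(x, y)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix

print("r from the formula:", r)
print("r from np.corrcoef:", r_numpy)
```

Both values should agree up to floating-point rounding.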

<hr>

**Reminder 1:** If we view $\boldsymbol{x}$ as samples of a random variable $X$ and $\boldsymbol{y}$ as samples of a random variable $Y$, then the **Pearson correlation** may be written as:
<br>$\large r=\frac{Cov(X,Y)}{\sigma_X \sigma_Y}$
<br> Where:
- $Cov(X,Y)$ is the **covariance** between $X$ and $Y$.
- $\sigma_X$ is the **standard deviation** of $X$.
- $\sigma_Y$ is the **standard deviation** of $Y$.
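
The covariance form can be checked numerically. The sketch below uses the same kind of made-up toy data (an assumption for illustration) and shows that the normalization factor cancels: using `ddof=1` (sample estimates) or `ddof=0` (population estimates) consistently in both the covariance and the standard deviations gives the same $r$.

```python
import numpy as np

# Toy data, made up for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

def r_from_covariance(x, y, ddof):
    """Pearson r as Cov(X, Y) / (sigma_X * sigma_Y) with a chosen ddof."""
    cov_xy = np.cov(x, y, ddof=ddof)[0, 1]  # off-diagonal entry is Cov(X, Y)
    sigma_x = np.std(x, ddof=ddof)
    sigma_y = np.std(y, ddof=ddof)
    return cov_xy / (sigma_x * sigma_y)

r_sample = r_from_covariance(x, y, ddof=1)      # divides by n - 1
r_population = r_from_covariance(x, y, ddof=0)  # divides by n

print("r with ddof=1:", r_sample)
print("r with ddof=0:", r_population)
```

This is why the implementation can skip the $1/n$ factors altogether and work with plain sums of deviations.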

<hr>

For **feature selection** by **Pearson correlation**:
- Compute $|r|$ for each feature against the target.
- Rank features by $|r|$ (higher = more linearly predictive).
- Select the top $k$ features, or those above a threshold.

**Hint 1:** Some points on feature selection by Pearson correlation:
- It measures only linear association; a monotonic but nonlinear relationship is captured only partially.
- It is sensitive to outliers.
- It can miss non-monotonic patterns entirely (e.g., $y=x^2$ with $x$ symmetric around zero gives $r \approx 0$).
- It is intended for regression problems with numerical features and a numerical target, not for classification problems.
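
The sensitivity to outliers can be illustrated with a small sketch. The data below are made up for this purpose (an assumption, not part of the notebook's example): eight points with a weak correlation, then the same points plus one extreme observation.

```python
import numpy as np

# Made-up data: y alternates between 4 and 5, so it is only weakly related to x
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([4, 5, 4, 5, 4, 5, 4, 5], dtype=float)
r_clean = np.corrcoef(x, y)[0, 1]

# Append a single extreme point far from the rest of the data
x_out = np.append(x, 30.0)
y_out = np.append(y, 30.0)
r_outlier = np.corrcoef(x_out, y_out)[0, 1]

print("r without the outlier:", r_clean)
print("r with one outlier:   ", r_outlier)
```

A single point is enough to move $r$ from weak to very strong here, so it can be worth inspecting the data before trusting a ranking based on $|r|$.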

**Hint 2:** A near-zero Pearson correlation does not mean "no relationship".
- It only means there is no linear relationship. Nonlinear dependencies, especially non-monotonic ones, can still be strong.
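
A minimal sketch of this point, assuming a grid of $x$ values that is symmetric around zero (the grid itself is an illustrative choice): $y=x^2$ is fully determined by $x$, yet its Pearson correlation with $x$ is essentially zero. Restricting to $x>0$, where the relationship is monotonic, gives a strong correlation again.

```python
import numpy as np

# Symmetric grid around zero (illustrative choice)
x = np.linspace(-3, 3, 61)
y = x ** 2  # y is a deterministic function of x

r_full = np.corrcoef(x, y)[0, 1]

# On the positive half the relationship is monotonically increasing
mask = x > 0
r_positive = np.corrcoef(x[mask], y[mask])[0, 1]

print("r on the symmetric range:", r_full)
print("r on x > 0 only:         ", r_positive)
```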

<hr>

**Reminder 2:** **Monotonic Relationship** (for contrast):
- Monotonically increasing: As $x$ increases, $y$ never decreases (may stay flat or go up).
- Monotonically decreasing: As $x$ increases, $y$ never increases.
- Examples: $y=x$, $y=\log(x)$, and $y=e^x$.
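
To see how Pearson correlation treats monotonic relationships, the sketch below compares a linear function with $y=e^x$ on an illustrative range of $x$ (the range and the linear coefficients are assumptions). Both are monotonically increasing, but only the linear one reaches $r=1$.

```python
import numpy as np

# Illustrative range of x values
x = np.linspace(0, 5, 50)

y_linear = 2 * x + 1  # linear, so r should be 1 up to rounding
y_exp = np.exp(x)     # monotonic but strongly nonlinear

r_linear = np.corrcoef(x, y_linear)[0, 1]
r_exp = np.corrcoef(x, y_exp)[0, 1]

print("r for y = 2x + 1:", r_linear)
print("r for y = exp(x):", r_exp)
```

The exponential case yields a positive $r$ that is clearly below 1, even though $y$ never decreases as $x$ increases.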

**Reminder 3:** **Non-monotonic** patterns describe relationships between two variables where the direction of change is not consistently increasing or decreasing. 
- Examples:
    - $y=x^2$ → decreases for $x<0$, increases for $x>0$.
    - $y=\sin(x)$ → oscillates up and down.
<hr>

In the following:
- We first implement the function `pearson_correlation` to compute the **Pearson correlation** for a given dataset `X` and target vector `y`.
    - `X` is a matrix of shape `(number-of-samples, number-of-features)` such that each row is a data point.
- We also implement the function `select_features_by_pearson`
    - to select the top `k` features with the highest absolute correlations,
    - or to select all features whose absolute correlation is at least a threshold.
- Finally, we test the feature selection on a simple dataset.
- As a bonus, we use **scikit-learn** functions to perform feature selection by Pearson correlation.

<hr>

https://github.com/ostad-ai/Machine-Learning
<br> Explanation: https://www.pinterest.com/HamedShahHosseini/Machine-Learning/

In [1]:
# Import required module
import numpy as np

In [2]:
def pearson_correlation(X, y):
    """
    Compute Pearson correlation coefficient between each feature in X and target y.
    
    Parameters:
        X : ndarray of shape (n_samples, n_features) — numerical features
        y : ndarray of shape (n_samples,) — numerical target
    
    Returns:
        correlations : ndarray of shape (n_features,) — Pearson r for each feature
    """
    X = np.asarray(X)
    
    y = np.asarray(y).flatten()
    
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must have the same number of samples.")
    
    # Center the variables (subtract the mean of each column and of y)
    X_centered = X - np.mean(X, axis=0, keepdims=True)
    y_centered = y - np.mean(y)
    
    # Numerator: sum of products of deviations (unnormalized covariance)
    cov_xy = np.sum(X_centered * y_centered[:, np.newaxis], axis=0)
    
    # Denominator: square roots of the sums of squared deviations
    # (unnormalized standard deviations; the 1/n factors cancel in r)
    std_x = np.sqrt(np.sum(X_centered ** 2, axis=0))
    std_y = np.sqrt(np.sum(y_centered ** 2))
    
    # Suppress warnings here; a constant feature or target gives a zero denominator
    with np.errstate(divide='ignore', invalid='ignore'):
        correlations = cov_xy / (std_x * std_y)
    
    # Replace NaN (0/0) and inf (non-zero/0) with 0
    correlations = np.nan_to_num(correlations, nan=0.0, posinf=0.0, neginf=0.0)
    
    return correlations

In [3]:
def select_features_by_pearson(X, y, k=None, threshold=None):
    """
    Select top-k or threshold-based features using Pearson correlation.
    
    Parameters:
        X : feature matrix
        y : target vector
        k : int, optional — number of top features to select
        threshold : float, optional — minimum |correlation| to keep
    
    Returns:
        selected_X : selected features
        selected_indices : indices of selected features
        scores : correlation scores (absolute values)
    """
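    X = np.asarray(X)  # allow list input; needed for column indexing below
    corr = pearson_correlation(X, y)
    scores = np.abs(corr)
    
    if k is not None:
        # Guard against k=0 (the slice [-0:] would return every feature)
        if not 1 <= k <= scores.size:
            raise ValueError("k must be between 1 and the number of features.")
        selected_indices = np.argsort(scores)[-k:]  # top-k, in ascending score order
    elif threshold is not None:
        selected_indices = np.where(scores >= threshold)[0]
    else:
        raise ValueError("Either k or threshold must be specified.")
    
    return X[:, selected_indices], selected_indices, scores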

In [4]:
# Example
# Sample regression data
X = np.array([
    [1, 10, 5],
    [2, 20, 3],
    [3, 30, 8],
    [4, 40, 2],
    [5, 50, 7]
], dtype=float)

y = np.array([2.1, 4.0, 6.2, 8.1, 10.0])  # roughly y ≈ 2*x1

# Compute correlations
corr = pearson_correlation(X, y)
print("Pearson correlations:", corr)
# The third feature (index 2) is only weakly correlated with y

# Select top k=2 features
X_selected, idx, scores = select_features_by_pearson(X, y, k=2)
print("Selected feature indices:", idx) 
print("New Dataset:\n",X_selected)

Pearson correlations: [0.99965927 0.99965927 0.19626949]
Selected feature indices: [0 1]
New Dataset:
 [[ 1. 10.]
 [ 2. 20.]
 [ 3. 30.]
 [ 4. 40.]
 [ 5. 50.]]
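
The example above selects by `k`. Threshold-based selection can be sketched the same way. To keep the snippet standalone, it uses `np.corrcoef` in place of the notebook's `pearson_correlation`, and the threshold value `0.5` is an arbitrary assumption chosen for illustration.

```python
import numpy as np

# Same sample regression data as in the example above
X = np.array([
    [1, 10, 5],
    [2, 20, 3],
    [3, 30, 8],
    [4, 40, 2],
    [5, 50, 7]
], dtype=float)
y = np.array([2.1, 4.0, 6.2, 8.1, 10.0])

# Absolute Pearson correlation of each column with y
scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])

threshold = 0.5  # hypothetical cut-off, chosen only for this illustration
selected = np.where(scores >= threshold)[0]
X_threshold = X[:, selected]

print("Scores:", scores)
print("Selected feature indices:", selected)
```

With this threshold, the two strongly correlated features (indices 0 and 1) are kept and the weakly correlated one (index 2) is dropped. Unlike top-`k` selection, the number of kept features depends on the data.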


<hr style="height:3px; background-color:lightblue">

# Bonus
#### Feature selection by Pearson correlation with scikit-learn

In [5]:
from sklearn.feature_selection import SelectKBest, f_regression

# Note: f_regression scores each feature with a univariate F-statistic.
# For a single feature, F increases with |Pearson r|, so the ranking is the same.
selector = SelectKBest(score_func=f_regression, k=2)
X_selected_SL = selector.fit_transform(X, y)
print("New Dataset with Scikit-learn:\n",X_selected_SL)

New Dataset with Scikit-learn:
 [[ 1. 10.]
 [ 2. 20.]
 [ 3. 30.]
 [ 4. 40.]
 [ 5. 50.]]
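
Why the ranking by F-statistic matches the ranking by $|r|$ can be sketched with NumPy alone. For a simple linear regression with an intercept, the F-statistic is $F = \frac{r^2}{1-r^2}(n-2)$, which increases monotonically with $|r|$. The snippet below assumes this is the relationship behind the scores of `f_regression` in its default (centered) setting, and uses made-up data with three distinct correlations.

```python
import numpy as np

# Made-up data with three features of clearly different correlation to y
X_demo = np.array([
    [1, 2, 5],
    [2, 1, 3],
    [3, 4, 8],
    [4, 3, 2],
    [5, 5, 7]
], dtype=float)
y_demo = np.array([2.1, 4.0, 6.2, 8.1, 10.0])
n = y_demo.size

# Pearson r of each feature with the target
r = np.array([np.corrcoef(X_demo[:, j], y_demo)[0, 1]
              for j in range(X_demo.shape[1])])

# F-statistic of a one-feature linear regression with intercept
F = r ** 2 / (1 - r ** 2) * (n - 2)

print("|r|:", np.abs(r))
print("F:  ", F)
print("Ranking by |r|:", np.argsort(np.abs(r)))
print("Ranking by F:  ", np.argsort(F))
```

Both rankings come out identical, so selecting the top `k` features by F-statistic or by $|r|$ gives the same subset when there are no ties.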
