# Computing Group-Wise L1 Regularization Strengths from SHAP Values  
# L1-Grouped-Complete


This notebook computes group-wise L1 regularization strengths for feature groups from SHAP values. The procedure has five steps: extract the SHAP values and split them into groups, compute the mean contribution per group, normalize the contributions, reverse them, and scale the result to a target range of regularization strengths.

In [1]:
import numpy as np
import pandas as pd

## Step 1: Extract SHAP Values and Split into Groups

Compute the mean absolute SHAP value of each feature across samples, then organize the features into their respective groups (features, epigenetic codes, etc.).

In [2]:
shap_values = np.load("path/to/shap_values.npy")
print(f"SHAP Values Shape: {shap_values.shape}")

SHAP Values Shape: (1, 50, 264)


In [3]:
shap_values = shap_values.squeeze()
print(f"Squeezed SHAP shape: {shap_values.shape}")

Squeezed SHAP shape: (50, 264)


In [4]:
mean_shap_values = np.mean(np.abs(shap_values), axis=0)
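The absolute value is taken before averaging so that positive and negative contributions do not cancel. A minimal sketch of the difference, using a made-up toy matrix (the values and the names `toy_shap`, `signed_mean`, and `abs_mean` are illustrative and not part of the notebook's data):

```python
import numpy as np

# Toy SHAP matrix (hypothetical values): 4 samples x 2 features.
# Feature 0 has large effects with alternating sign;
# feature 1 has small effects that always point the same way.
toy_shap = np.array([
    [0.5, 0.1],
    [-0.5, 0.1],
    [0.5, 0.1],
    [-0.5, 0.1],
])

signed_mean = toy_shap.mean(axis=0)        # signs cancel for feature 0
abs_mean = np.abs(toy_shap).mean(axis=0)   # magnitude is preserved

print("signed mean:", signed_mean)
print("mean |SHAP|:", abs_mean)
```

With a signed mean, feature 0 would look unimportant even though it moves the prediction the most on every sample.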

In [5]:
shap_df = pd.DataFrame({
    "Feature": [f"Feature {i+1}" for i in range(mean_shap_values.shape[0])],
    "Mean_SHAP_Value": mean_shap_values
})

In [6]:
f_length = 24
group_names = ["features", "feature_ont", "feature_offt", "on_epigenetic_code", "off_epigenetic_code"]
group_ranges = [0, f_length, 2*f_length, 3*f_length, 7*f_length, len(shap_df)]

In [7]:
# Split the SHAP values into named groups
feature_groups = {
    name: shap_df.iloc[start:end]
    for name, start, end in zip(group_names, group_ranges[:-1], group_ranges[1:])
}
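Because the split relies on hard-coded boundaries, a quick size check can catch an off-by-one in `group_ranges`. The sketch below rebuilds the split on placeholder data, assuming the 264-column layout shown in the output above; `demo_df`, `demo_groups`, and `group_sizes` are hypothetical names used only here.

```python
import numpy as np
import pandas as pd

# Assumed layout from the cells above: 264 features, f_length = 24.
n_features = 264
f_length = 24
group_names = ["features", "feature_ont", "feature_offt",
               "on_epigenetic_code", "off_epigenetic_code"]
group_ranges = [0, f_length, 2 * f_length, 3 * f_length, 7 * f_length, n_features]

# Placeholder zeros stand in for the real mean |SHAP| values.
demo_df = pd.DataFrame({"Mean_SHAP_Value": np.zeros(n_features)})

demo_groups = {
    name: demo_df.iloc[start:end]
    for name, start, end in zip(group_names, group_ranges[:-1], group_ranges[1:])
}
group_sizes = {name: len(group) for name, group in demo_groups.items()}
print(group_sizes)

# Boundaries must be strictly increasing and cover every column exactly once.
assert all(a < b for a, b in zip(group_ranges[:-1], group_ranges[1:]))
assert sum(group_sizes.values()) == n_features
```

Under these assumptions the three feature groups hold 24 columns each and the two epigenetic-code groups hold 96 each, which is four factors of 24 columns.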

In [8]:
features = feature_groups['features']
feature_ont = feature_groups['feature_ont']
feature_offt = feature_groups['feature_offt']
on_epigenetic_code = feature_groups['on_epigenetic_code']
off_epigenetic_code = feature_groups['off_epigenetic_code']

In [9]:
# 24 features for each epigenetic factor
factors = ["CTCF", "DNase", "H3K4me3", "RRBS"]
features_per_factor = 24
num_of_factors = 4

In [10]:
on_target_factors = {
    f"On-Target - {factors[i]}": on_epigenetic_code.iloc[i * features_per_factor:(i + 1) * features_per_factor]
    for i in range(num_of_factors)
}

In [11]:
off_target_factors = {
    f"Off-Target - {factors[i]}": off_epigenetic_code.iloc[i * features_per_factor:(i + 1) * features_per_factor]
    for i in range(num_of_factors)
}

In [12]:
data = {**on_target_factors, **off_target_factors}

# for factor_name, factor_data in data.items():
#     print(f"\n{factor_name}:\n")
#     print(factor_data.head())

## Step 2: Compute Mean SHAP Contribution per Group

Average the per-feature mean absolute SHAP values within each functional group: the three feature groups (features, on-target, off-target) and each epigenetic factor for on-target and off-target.

In [13]:
factor_mean_contribution = {
    factor: group['Mean_SHAP_Value'].mean()
    for factor, group in data.items()
}

In [14]:
features_mean_contribution = features['Mean_SHAP_Value'].mean()
feature_ont_mean_contribution = feature_ont['Mean_SHAP_Value'].mean()
feature_offt_mean_contribution = feature_offt['Mean_SHAP_Value'].mean()

In [15]:
total_mean_contributions = {
    "features": features_mean_contribution,
    "feature_ont": feature_ont_mean_contribution,
    "feature_offt": feature_offt_mean_contribution,
    **factor_mean_contribution  
}

# pd.DataFrame.from_dict(total_mean_contributions, orient='index', columns=["Mean_SHAP_Contribution"])
print("Mean SHAP Contribution per Group:\n")
for name, value in total_mean_contributions.items():
    print(f"{name:<20}  {value:.8f}")

Mean SHAP Contribution per Group:

features              0.00236280
feature_ont           0.00076624
feature_offt          0.00082277
On-Target - CTCF      0.00039243
On-Target - DNase     0.00056323
On-Target - H3K4me3   0.00044109
On-Target - RRBS      0.00046129
Off-Target - CTCF     0.00059545
Off-Target - DNase    0.00068344
Off-Target - H3K4me3  0.00072015
Off-Target - RRBS     0.00073305


In [16]:
mean_contributions_array = np.array(list(total_mean_contributions.values()))

## Step 3: Normalize Group Contributions

Apply min-max normalization so that the group contributions fall in the range 0 to 1: `x_norm = (x - min) / (max - min)`.

In [17]:
contribution_min = mean_contributions_array.min()
contribution_max = mean_contributions_array.max()
normalized_group_contributions = (mean_contributions_array - contribution_min) / (contribution_max - contribution_min)
# pd.Series(normalized_group_contributions, index=total_mean_contributions.keys())
print("Normalized SHAP Contribution per Group:\n")
for name, value in zip(total_mean_contributions.keys(), normalized_group_contributions):
    print(f"{name:<20}  {value:.8f}")

Normalized SHAP Contribution per Group:

features              1.00000000
feature_ont           0.18971717
feature_offt          0.21840700
On-Target - CTCF      0.00000000
On-Target - DNase     0.08668313
On-Target - H3K4me3   0.02469607
On-Target - RRBS      0.03494938
Off-Target - CTCF     0.10303925
Off-Target - DNase    0.14769449
Off-Target - H3K4me3  0.16632368
Off-Target - RRBS     0.17286938
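Min-max normalization divides by `max - min`, which is zero when every group has the same mean contribution. A minimal sketch of a guarded variant follows; the helper name `min_max_normalize` and the all-zeros fallback are assumptions made for illustration, not part of the notebook.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values to [0, 1]; return zeros if all values are equal."""
    values = np.asarray(values, dtype=float)
    value_range = values.max() - values.min()
    if value_range == 0:
        # Hypothetical fallback: with no spread, treat every group alike.
        return np.zeros_like(values)
    return (values - values.min()) / value_range

spread_result = min_max_normalize([2.0, 4.0, 6.0])   # values 0, 0.5 and 1
flat_result = min_max_normalize([3.0, 3.0, 3.0])     # all zeros, no NaN

print(spread_result)
print(flat_result)
```

Whether zeros are the right fallback depends on how the strengths are used downstream. Here it would give every group the maximum penalty after the reversal step.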


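## Step 4: Reverse Normalized Values

Subtract each normalized contribution from 1 so that groups with low SHAP importance receive large weights and the most important group receives the smallest weight.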

In [18]:
regularization_weights = 1 - normalized_group_contributions
for name, value in zip(total_mean_contributions.keys(), regularization_weights):
    print(f"{name:<20} {value:.8f}")

features             0.00000000
feature_ont          0.81028283
feature_offt         0.78159300
On-Target - CTCF     1.00000000
On-Target - DNase    0.91331687
On-Target - H3K4me3  0.97530393
On-Target - RRBS     0.96505062
Off-Target - CTCF    0.89696075
Off-Target - DNase   0.85230551
Off-Target - H3K4me3 0.83367632
Off-Target - RRBS    0.82713062
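The reversal keeps the spacing between groups and only flips their order. A small sketch with made-up normalized values (the arrays below are illustrative, not the notebook's results):

```python
import numpy as np

# Hypothetical normalized contributions for three groups.
normalized = np.array([1.0, 0.19, 0.0])
weights = 1.0 - normalized

# Ranking groups from least to most important ...
importance_order = np.argsort(normalized)
# ... matches ranking them from most to least penalized.
penalty_order = np.argsort(weights)[::-1]

print(importance_order, penalty_order)
assert np.array_equal(importance_order, penalty_order)
```

The equality of the two orderings holds as long as no two groups tie.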


## Step 5: Scale Reversed Values to Regularization Strengths

Map the reversed weights linearly onto a target range of regularization strengths (e.g., 0.001 to 0.01): a weight of 0 maps to the minimum penalty and a weight of 1 maps to the maximum penalty.
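As an arithmetic check of this mapping, the sketch below scales one group by hand. It assumes the 0.001 to 0.01 range mentioned above and copies the `feature_ont` values from the Step 3 and Step 4 outputs; the variable names are illustrative only.

```python
min_k, max_k = 0.001, 0.01

x_norm = 0.18971717   # normalized contribution of feature_ont (Step 3 output)
weight = 0.81028283   # reversed weight of feature_ont (Step 4 output)

# Scaling the reversed weight up from min_k ...
k_from_weight = min_k + (max_k - min_k) * weight
# ... is the same as scaling the normalized contribution down from max_k.
k_from_norm = max_k - (max_k - min_k) * x_norm

print(f"{k_from_weight:.8f}")
print(f"{k_from_norm:.8f}")
```

Both expressions agree, so Steps 4 and 5 together amount to a single decreasing linear map from normalized contribution to penalty strength.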

In [19]:
min_k = 0.001  # Min penalty
max_k = 0.01  # Max penalty

In [20]:
scaled_k_values = min_k + (max_k - min_k) * regularization_weights
print("Scaled Regularization Strengths per Group:\n")
for name, value in zip(total_mean_contributions.keys(), scaled_k_values):
    print(f"{name:<20}  {value:.8f}")

Scaled Regularization Strengths per Group:

features              0.00100000
feature_ont           0.00829255
feature_offt          0.00803434
On-Target - CTCF      0.01000000
On-Target - DNase     0.00921985
On-Target - H3K4me3   0.00977774
On-Target - RRBS      0.00968546
Off-Target - CTCF     0.00907265
Off-Target - DNase    0.00867075
Off-Target - H3K4me3  0.00850309
Off-Target - RRBS     0.00844418
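The notebook stops at the strengths themselves and does not show how they are consumed. One possible use, sketched here as an assumption, is to expand them into a per-feature vector and use it in a weighted L1 penalty. The helper `group_l1_penalty` and the random `demo_weights` are hypothetical; the strengths are copied from the output above.

```python
import numpy as np

# Strengths copied from the output above, in group order (11 groups).
group_strengths = np.array([
    0.00100000, 0.00829255, 0.00803434,
    0.01000000, 0.00921985, 0.00977774, 0.00968546,
    0.00907265, 0.00867075, 0.00850309, 0.00844418,
])
features_per_group = 24  # each group spans 24 consecutive columns in this layout

# Expand to one strength per input feature (11 * 24 = 264 values).
per_feature_strength = np.repeat(group_strengths, features_per_group)

def group_l1_penalty(weights, strengths):
    """Weighted L1 penalty: sum over i of strengths[i] * |weights[i]|."""
    return float(np.sum(strengths * np.abs(weights)))

# Hypothetical model weights, for illustration only.
rng = np.random.default_rng(0)
demo_weights = rng.normal(size=per_feature_strength.shape[0])

print(per_feature_strength.shape)
print(group_l1_penalty(demo_weights, per_feature_strength))
```

This relies on the groups appearing in the same order as the input columns, which matches the slicing used in Step 1. A model with a different column layout would need an explicit index map instead of `np.repeat`.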
