In [6]:
import numpy as np
import pandas as pd
import plotly
import plotly.express as px

Class weights modify the cost function by multiplying each sample's contribution to the loss, and therefore to the gradient, by a weight that depends on the sample's class.  
1. For imbalanced datasets, this puts more weight on the minority class, so that both classes influence the weight updates about equally.  
2. It also allows the cost to reflect the real-world loss of each kind of error. A False Negative may be more costly than a False Positive, and attaching a monetary value to each could drive a more optimal solution.
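As a rough illustration of point 1, the sketch below compares how much each class contributes to the log-loss gradient of a single feature weight, with and without class weights. The 5:95 dataset, the constant predictions of 0.5, and the helper name `per_class_gradient_magnitude` are all assumptions made for illustration.

```python
import numpy as np

def per_class_gradient_magnitude(y, y_hat, x, sample_weight):
    """Sum of absolute weighted gradient contributions per class (single feature)."""
    # Per-sample gradient of the log loss w.r.t. the feature weight: (y_hat - y) * x
    contrib = sample_weight * (y_hat - y) * x
    return {label: float(np.abs(contrib[y == label]).sum()) for label in (0, 1)}

# Hypothetical data: 5 positives, 95 negatives, every prediction at 0.5
y = np.array([1] * 5 + [0] * 95)
x = np.ones_like(y, dtype=float)
y_hat = np.full(y.shape, 0.5)

unweighted = per_class_gradient_magnitude(y, y_hat, x, np.ones(len(y)))

# Balanced weights: n_samples / (n_classes * n_samples_j)
class_weight = {0: 100 / (2 * 95), 1: 100 / (2 * 5)}
weights = np.array([class_weight[label] for label in y])
weighted = per_class_gradient_magnitude(y, y_hat, x, weights)

print(unweighted)  # the majority class dominates the update
print(weighted)    # both classes contribute about equally
```

Without weights the majority class contributes 19 times as much to the update as the minority class. With balanced weights the two contributions match.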

Example: the cost function results in an update of +0.2 for a given feature weight.
If the sample belongs to the minority class of a dataset imbalanced at a 5:95 ratio, the update could be scaled as follows
If true_case == 

$$w_j = \frac{n_{\text{samples}}}{n_{\text{classes}} \times n_{\text{samples},j}}$$
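A small worked example of this formula for the 5:95 case may help. It assumes a hypothetical dataset of 100 samples and assumes that the +0.2 update is simply multiplied by the class weight, which is one possible reading of the example above.

```python
# Hypothetical 5:95 dataset with 100 samples
n_samples = 100
class_counts = {"minority": 5, "majority": 95}
n_classes = len(class_counts)

# Balanced weight per class: n_samples / (n_classes * n_samples_j)
weights = {name: n_samples / (n_classes * count)
           for name, count in class_counts.items()}

# Assumption: the raw update is scaled linearly by the class weight
base_update = 0.2
scaled_updates = {name: base_update * w for name, w in weights.items()}

for name in class_counts:
    print(f"{name}: weight={weights[name]:.3f}, "
          f"scaled update={scaled_updates[name]:.3f}")
# minority: weight=10.000, scaled update=2.000
# majority: weight=0.526, scaled update=0.105
```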

Takes a list of class labels and a list of their respective counts in the dataset, and returns the balanced weight for each class.

In [3]:
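def calculate_class_weights(classes, class_counts):
    # Total number of samples across all classes
    n_samples = sum(class_counts)

    # Balanced weight per class: n_samples / (n_classes * n_samples_j)
    weights = []
    for count in class_counts:
        weights.append(n_samples / (len(classes) * count))

    return weights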

Loops through possible conversion rates and calculates the balanced class weights for each class at each conversion rate.

In [65]:
def simulate_conversion_rates(n_samples=10000, start=0.01, stop=1.00, step=0.01):
 
    class_weights = []
    conversion_rates = []
    for conversion_rate in np.arange(start, stop, step):
        conversion_rates.append(conversion_rate)
        class_weights.append(
            calculate_class_weights(
                classes=[0, 1],
                class_counts=[n_samples - (n_samples * conversion_rate), (n_samples * conversion_rate)]))

    x_y = np.array(class_weights)
    df_class_weights = pd.DataFrame(x_y).assign(conversion_rate=conversion_rates)
    
    return df_class_weights

In [66]:
df_class_weights = simulate_conversion_rates()

In [67]:
df_plot = (
    df_class_weights
    .melt(id_vars=['conversion_rate'], 
          var_name='Class Label', value_name='Class Weight')
    .rename(columns={'conversion_rate': 'Conversion Rate'}))

In [68]:
fig = px.line(
    df_plot,
    x='Conversion Rate', y='Class Weight', 
    color='Class Label',
    title='Balanced Class Weights for Given Conversion Rate',
    )
fig.show()
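For two classes, `n_samples` cancels out of the balanced-weight formula, so the plotted curves depend only on the conversion rate `r`: the weight for class 1 is `1 / (2 * r)` and the weight for class 0 is `1 / (2 * (1 - r))`. The vectorised sketch below is an illustrative alternative to the loop above, not a replacement for it, and its variable names are assumptions.

```python
import numpy as np

# Conversion rates from 1% to 99% in 1% steps
rates = np.linspace(0.01, 0.99, 99)

# Closed-form balanced weights for a binary problem
weight_class_1 = 1 / (2 * rates)        # converters
weight_class_0 = 1 / (2 * (1 - rates))  # non-converters

# The two curves cross where the classes are perfectly balanced
mid = int(np.argmin(np.abs(rates - 0.5)))
print(rates[mid], weight_class_0[mid], weight_class_1[mid])
```

At a conversion rate of 0.5 both weights are 1, and the weight for class 1 grows sharply as the conversion rate approaches 0.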

# Log Loss
Logistic Regression loss function

$$-\frac{1}{N}\sum_{i=1}^N\left(y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i)\right)$$

Adding weights to the cost function, where $w_0$ and $w_1$ are the weights for classes 0 and 1

$$-\frac{1}{N}\sum_{i=1}^N\left(w_1 \, y_i \log(\hat{y}_i) + w_0 \, (1 - y_i) \log(1 - \hat{y}_i)\right)$$
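A minimal NumPy sketch of the weighted log loss follows. Here `w1` multiplies the positive-class term ($y_i = 1$) and `w0` the negative-class term, matching the class indices used by the balanced-weight formula. The function name `weighted_log_loss`, the `eps` clipping, and the sample values are assumptions for illustration.

```python
import numpy as np

def weighted_log_loss(y_true, y_pred, w0=1.0, w1=1.0, eps=1e-15):
    """Class-weighted binary log loss, averaged over the N samples."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 to avoid log(0)
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1 - eps)
    pos_term = w1 * y_true * np.log(y_pred)            # samples with y = 1
    neg_term = w0 * (1 - y_true) * np.log(1 - y_pred)  # samples with y = 0
    return float(-np.mean(pos_term + neg_term))

# Hypothetical labels and predictions: 1 positive, 3 negatives
y_true = [1, 0, 0, 0]
y_pred = [0.3, 0.2, 0.1, 0.4]

plain = weighted_log_loss(y_true, y_pred)
# Balanced weights: n_samples / (n_classes * n_samples_j)
balanced = weighted_log_loss(y_true, y_pred, w0=4 / (2 * 3), w1=4 / (2 * 1))
print(plain, balanced)
```

With both weights set to 1 the function reduces to the unweighted log loss. With balanced weights, the poorly predicted minority sample is penalised more heavily, so the loss is larger.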

# Resources
https://www.analyticsvidhya.com/blog/2020/10/improve-class-imbalance-class-weights/