# Scikit-learn Linear Regression Tutorial using Workflow Interface with Lasso Regularization


This tutorial demonstrates how to train a scikit-learn linear regression model with Lasso (L1) regularization using the OpenFL Workflow Interface. The Workflow Interface lets you compose a federated learning experiment as a flow of steps that run on an aggregator and on multiple collaborators. In this tutorial, you will learn how to set up the participants, split the data among them, define the flow, and run training with federated averaging on a local runtime.

## We will use MSE as the loss function with Lasso (L1) weight regularization
![image.png](https://www.analyticsvidhya.com/wp-content/uploads/2016/01/eq5-1.png)
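
For reference, scikit-learn's `Lasso`, which the code in this notebook uses, minimizes the squared error scaled by `1 / (2 * n_samples)` plus `alpha` times the L1 norm of the weights. The sketch below evaluates that objective for a single feature in plain Python; `lasso_objective` and its arguments are illustrative names, not part of scikit-learn or OpenFL.

```python
def lasso_objective(x, y, weight, bias, alpha):
    """Lasso objective for one feature: squared-error term plus L1 penalty."""
    n_samples = len(x)
    squared_error = sum((yi - (weight * xi + bias)) ** 2 for xi, yi in zip(x, y))
    data_term = squared_error / (2 * n_samples)  # scaling used by sklearn's Lasso
    penalty = alpha * abs(weight)                # the bias is not penalized
    return data_term + penalty

# Tiny hand-made example: a perfect fit leaves only the penalty term
x = [0.0, 1.0, 2.0]
y = [1.0, 3.0, 5.0]
perfect_fit = lasso_objective(x, y, weight=2.0, bias=1.0, alpha=0.1)
shrunk_fit = lasso_objective(x, y, weight=1.0, bias=2.0, alpha=0.1)
print(perfect_fit, shrunk_fit)
```

A larger `alpha` makes the penalty dominate, which pushes the weight toward zero at the cost of a larger data term.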

## What is it?

The Workflow Interface is a new way of composing federated learning experiments with OpenFL. It was developed through conversations with researchers and existing users who had novel use cases that didn't quite fit the standard horizontal federated learning paradigm.

## Getting Started

First, install the dependencies required by the Workflow Interface:

In [None]:
!pip install -r workflow_interface_requirements.txt

Next, import the relevant modules and do some basic initialization:

In [None]:
from typing import List, Union

from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import numpy as np
import random
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 7, 5

## Implementing a Scikit Linear Regression Model with Lasso Regularization

The following section provides an implementation of a linear regression model using scikit-learn's Lasso (L1 regularization). The SklearnLinearRegressionLasso class includes methods for fitting the model, making predictions, calculating mean squared error (MSE), and printing the model parameters.

In [None]:
class SklearnLinearRegressionLasso:
    def __init__(self, n_feat: int, alpha: float = 1.0) -> None:
        self.model = Lasso(alpha=alpha)
        self.scaler = StandardScaler()
        
    def predict(self, feature_vector: Union[np.ndarray, List[float]]) -> np.ndarray:
        '''
        feature_vector may be a 1-D list or array of shape (n_samples,),
        which is treated as n_samples values of a single feature,
        or a 2-D array of shape (n_samples, n_feat)
        '''
        feature_vector = np.array(feature_vector)
        if len(feature_vector.shape) == 1:
            feature_vector = feature_vector[:,np.newaxis]
            
        feature_vector = self.scaler.transform(feature_vector)
        return self.model.predict(feature_vector)
    
    def mse(self, X: np.ndarray, Y: np.ndarray) -> float:
        Y_predict = self.predict(X)
        return mean_squared_error(Y, Y_predict)
    
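    def fit(self, X: np.ndarray, Y: np.ndarray, silent: bool=False) -> None:
        # Fit the scaler and the model on standardized features
        X_scaled = self.scaler.fit_transform(X)
        self.model.fit(X_scaled, Y)
        # self.mse() scales its input itself, so pass the unscaled X
        mse = self.mse(X, Y)
        if not silent:
            print(f'MSE: {mse}')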
            
    def print_parameters(self) -> None:
        print('Final parameters: ')
        print(f'Weights: {self.model.coef_}')
        print(f'Bias: {self.model.intercept_}')
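
The reshape in `predict` matters because scikit-learn estimators expect a 2-D array of shape `(n_samples, n_features)`. A minimal NumPy sketch of what `[:, np.newaxis]` does to a 1-D input (the array values are made up for illustration):

```python
import numpy as np

angles = np.array([1.0, 2.0, 3.0])   # shape (3,): three samples, one feature each
as_column = angles[:, np.newaxis]    # shape (3, 1): the layout sklearn expects

print(angles.shape, as_column.shape)
```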
            

In [None]:
# Define input array with angles from 60deg to 396deg in steps of 4deg, converted to radians
x = np.array([i*np.pi/180 for i in range(60,400,4)])
np.random.seed(10)  # Setting seed for reproducibility
y = np.sin(x) + np.random.normal(0,0.15,len(x))
# plt.plot(x,y,'.')

In [None]:
# Initialize the model
lr_model = SklearnLinearRegressionLasso(n_feat=1, alpha=0.1)

# Fit the model
lr_model.fit(x[:,np.newaxis], y)

#print the final parameters
lr_model.print_parameters()

In [None]:
# We can also solve this 1D problem using Numpy
numpy_solution = np.polyfit(x,y,1)
predictor_np = np.poly1d(numpy_solution)
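
Note that the model standardizes `x` before fitting, so its printed weight is expressed per standard deviation of `x` and is not directly comparable to the slope returned by `np.polyfit`. The sketch below shows the conversion for an unregularized fit, where the two must agree; with `alpha > 0` the Lasso weight is additionally shrunk. The data and variable names here are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
x_demo = np.linspace(1.0, 7.0, 50)
y_demo = 0.5 * x_demo - 1.0 + rng.normal(0, 0.1, x_demo.size)

# Standardize the feature the same way StandardScaler does (population std)
mean, scale = x_demo.mean(), x_demo.std()
x_scaled = (x_demo - mean) / scale

# Least-squares fit in the scaled space, then map back to original units
w_scaled, b_scaled = np.polyfit(x_scaled, y_demo, 1)
slope = w_scaled / scale
intercept = b_scaled - w_scaled * mean / scale

# Direct fit on the raw feature for comparison
slope_raw, intercept_raw = np.polyfit(x_demo, y_demo, 1)
print(np.allclose([slope, intercept], [slope_raw, intercept_raw]))
```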

In [None]:
# Predict using the model
y_hat = lr_model.predict(x)
# Plot the results
y_np = predictor_np(x)
plt.plot(x,y,'.')
plt.plot(x,y_hat,'.')
plt.plot(x,y_np,'--')

## Now we run the same training with the federated learning Workflow API

## Import required libraries for federated learning

In [None]:
# Import necessary libraries
from openfl.experimental.workflow.interface import FLSpec
from openfl.experimental.workflow.placement import aggregator, collaborator
from openfl.experimental.workflow.runtime import FederatedRuntime
from sklearn.model_selection import train_test_split


VALID_PERCENT = 0.3

# Splitting dataset into train and test set 
X_train, X_test, Y_train, Y_test = train_test_split(x, y, test_size=1/3, random_state=0)


print("Training matrix shape", X_train.shape)
print("Test matrix shape", X_test.shape)

## Federated Learning Helper Functions
Define helper functions for training and validating the federated models.

In [None]:
# Import necessary libraries for federated learning
from openfl.experimental.workflow.interface import Collaborator, Aggregator
from openfl.experimental.workflow.runtime import LocalRuntime

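# Define a callable to initialize collaborator private attributes.
# It relies on the ShardSplitter class from the "Shard Splitter Class" section below,
# so run that cell before this one.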
def callable_to_initialize_collaborator_private_attributes(n_collaborators, index, train_dataset, test_dataset, batch_size):
   
    train_splitter = ShardSplitter(n_collaborators)
    X_train, Y_train = train_dataset
    X_test, Y_test = test_dataset

    train_idx = train_splitter.split(X_train, Y_train)
    valid_idx = train_splitter.split(X_test, Y_test)

    train_dataset = X_train[train_idx[index]], Y_train[train_idx[index]]
    test_dataset = X_test[valid_idx[index]], Y_test[valid_idx[index]]

    return {
        "train_loader": train_dataset, "test_loader": test_dataset,
        "batch_size": batch_size
    }
    
# Set up participants
aggregator = Aggregator()
# aggregator.private_attributes = {}
collaborators = []
collaborator_names = ['Portland', 'Seattle', 'Chandler','Bangalore']
for idx, collaborator_name in enumerate(collaborator_names):
    collaborators.append(
        Collaborator(
            name=collaborator_name, num_cpus=0, num_gpus=0.3,
            private_attributes_callable=callable_to_initialize_collaborator_private_attributes,
            n_collaborators=len(collaborator_names), index=idx, train_dataset=(X_train, Y_train),
            test_dataset=(X_test, Y_test), batch_size=32
        )
    )
local_runtime = LocalRuntime(aggregator=aggregator, collaborators=collaborators, backend='single_process')
print(f'Local runtime collaborators = {local_runtime.collaborators}')


## Shard Splitter Class
Define a helper class to split the data into shards for federated learning.

In [None]:
# Define a class to split the dataset into shards
class ShardSplitter:
    def __init__(self, num_shards):
        self.num_shards = num_shards

    def split(self, X, y):
        """Shuffle the sample indexes of X and split them into num_shards near-equal shards; return a list of index arrays (y is not used)."""
        num_samples = X.shape[0]
        shard_size = num_samples // self.num_shards
        indexes = np.arange(num_samples)
        np.random.shuffle(indexes)
        
        shards = []
        for i in range(self.num_shards):
            start_idx = i * shard_size
            if i == self.num_shards - 1:
                # Include any remaining samples in the last shard
                end_idx = num_samples
            else:
                end_idx = start_idx + shard_size
            shards.append(indexes[start_idx:end_idx])
        
        return shards
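
To see how the remainder is handled, the following sketch reproduces the index arithmetic of `split` without the shuffling. The function name `shard_bounds` and the sample count of 29 are illustrative assumptions.

```python
def shard_bounds(num_samples, num_shards):
    """Return (start, end) index pairs; the last shard absorbs the remainder."""
    shard_size = num_samples // num_shards
    bounds = []
    for i in range(num_shards):
        start = i * shard_size
        end = num_samples if i == num_shards - 1 else start + shard_size
        bounds.append((start, end))
    return bounds

bounds = shard_bounds(num_samples=29, num_shards=4)
sizes = [end - start for start, end in bounds]
print(bounds)
print(sizes)
```

Every sample lands in exactly one shard, and only the last collaborator receives the extra samples when the count is not divisible by the number of shards.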

## Define Federated Averaging Method
The `FedAvg` function averages the coefficients and intercepts of the collaborators' models after each round of training.

In [None]:
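# Federated Averaging for Lasso models
def FedAvg(models):
    # Reuse the first wrapper (including its fitted scaler) as the container
    new_model = models[0]
    coef_list = [model.model.coef_ for model in models]
    intercept_list = [model.model.intercept_ for model in models]
    # Write the averages to the wrapped sklearn estimator, not to the wrapper
    new_model.model.coef_ = np.mean(coef_list, axis=0)
    new_model.model.intercept_ = np.mean(intercept_list, axis=0)
    return new_model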
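
As a small numeric illustration of the averaging step, the sketch below averages made-up coefficient vectors and intercepts with NumPy. It also shows a sample-count-weighted variant, which is a common FedAvg formulation but is not what this notebook's `FedAvg` does; the `sample_counts` values are hypothetical.

```python
import numpy as np

# Made-up parameters from three collaborators (one feature each)
coefs = [np.array([0.2]), np.array([0.4]), np.array([0.9])]
intercepts = [0.1, 0.3, 0.2]

# Unweighted mean, as in the FedAvg helper above
mean_coef = np.mean(coefs, axis=0)
mean_intercept = np.mean(intercepts, axis=0)

# Hypothetical variant: weight each collaborator by its number of samples
sample_counts = [10, 10, 20]
weighted_coef = np.average(coefs, axis=0, weights=sample_counts)
weighted_intercept = np.average(intercepts, weights=sample_counts)

print(mean_coef, mean_intercept)
print(weighted_coef, weighted_intercept)
```

With equally sized shards, as produced by the shard splitter in this notebook, the two variants give nearly the same result.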

## Define Federated Learning Workflow
Define the workflow for federated learning using OpenFL's FLSpec.

In [None]:
# Define the federated learning workflow
from openfl.experimental.workflow.placement import aggregator, collaborator

# Federated Learning Workflow using OpenFL's Workflow API
class FederatedLassoFlow(FLSpec):
    def __init__(self, model, num_rounds=3):
        # Initialize FLSpec state; checkpointing is disabled for this tutorial
        super().__init__(checkpoint=False)
        self.model = model
        self.num_rounds = num_rounds

    @aggregator
    def start(self):
        self.current_round = 0
        self.collaborators = self.runtime.collaborators  # Fetch the collaborators dynamically
        self.next(self.aggregated_model_validation, foreach='collaborators')


    @collaborator
    def aggregated_model_validation(self):
        x_test, y_test = self.test_loader
        mse = self.model.mse(x_test, y_test)
        print(f"Aggregated model validation MSE: {mse:.4f}")
        self.aggregated_mse = mse
        self.next(self.train)

    @collaborator
    def train(self):
        x_train, y_train = self.train_loader
        print(f'x_train shape: {x_train.shape}, y_train shape: {y_train.shape}')
        self.model.fit(x_train[:,np.newaxis], y_train)
        self.next(self.local_model_validation)

    @collaborator
    def local_model_validation(self):
        """Validate the model on local test data."""
        
        x_test, y_test = self.test_loader
        mse = self.model.mse(x_test, y_test)
        print(f"Local model validation MSE: {mse:.4f}")
        self.local_mse = mse
        self.next(self.join)


    @aggregator
    def join(self, inputs):

        self.aggregated_model_mse = sum(
            input.aggregated_mse for input in inputs) / len(inputs)
        self.local_model_mse = sum(
            input.local_mse for input in inputs) / len(inputs)
        print(f'Average aggregated model MSE = {self.aggregated_model_mse}')
        print(f'Average local model MSE = {self.local_model_mse}')
        
        print("Taking FedAvg of models of all collaborators")
        self.model = FedAvg([input.model for input in inputs])

        self.next(self.internal_loop)

    @aggregator
    def internal_loop(self):
        # Count the round that just finished, then stop after num_rounds rounds
        self.current_round += 1
        if self.current_round >= self.num_rounds:
            self.next(self.end)
        else:
            print(f"current round : {self.current_round}")
            self.next(self.aggregated_model_validation, foreach='collaborators')

    @aggregator
    def end(self):
        print(f"Federated learning complete after {self.num_rounds} rounds.")
        # mse() expects (features, targets) and calls predict() itself
        final_mse = self.model.mse(X_test, Y_test)
        print(f"Final aggregated model MSE on test data: {final_mse:.4f}")
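
The loop condition in `internal_loop` determines how many training rounds actually run, and it is easy to be off by one. The plain-Python sketch below (no OpenFL involved; `count_rounds` is an illustrative helper) counts the passes for two orderings of the check and the increment when the counter starts at 0.

```python
def count_rounds(num_rounds, increment_first):
    """Count how many times the train/join cycle runs before the flow ends."""
    current_round = 0
    passes = 0
    while True:
        passes += 1  # one validation -> train -> validation -> join cycle
        if increment_first:
            current_round += 1
            if current_round >= num_rounds:
                break
        else:
            if current_round == num_rounds:
                break
            current_round += 1
    return passes

print(count_rounds(3, increment_first=True))
print(count_rounds(3, increment_first=False))
```

Incrementing before comparing yields exactly `num_rounds` cycles, while comparing first yields one extra cycle.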

## Start the Federated Learning Process
Create an instance of `FederatedLassoFlow`, attach the local runtime, and run it.

In [None]:
# Initialize and run the federated learning workflow
federated_flow = FederatedLassoFlow(model=lr_model, num_rounds=10)

# Set the runtime for federated learning
federated_flow.runtime = local_runtime

# Start the federated learning process
federated_flow.run()

Now we can check how the final trained model performs on randomly generated data.

In [None]:
n_cols = 20
n_samples = 4
interval = 240
x_start = 60
noise = 0.3

X = None
final_model = federated_flow.model # Get the final model after training
for rank in range(n_cols):
    np.random.seed(rank)  # Setting seed for reproducibility
    x = np.random.rand(n_samples, 1) * interval + x_start
    x *= np.pi / 180
    X = x if X is None else np.vstack((X,x))
    y = np.sin(x) + np.random.normal(0, noise, size=(n_samples, 1))
    plt.plot(x,y,'+')
    
X.sort(axis=0)  # sort the samples; the default axis=-1 leaves an (n, 1) array unchanged
Y_hat = final_model.predict(X)
plt.plot(X,Y_hat,'--')

## 🎉 Congratulations! 🎉

You have now completed the Workflow Interface notebook for **scikit-learn Linear Regression** using federated learning.

### Happy learning and happy coding with OpenFL! 🎉