
# Installing dependencies


Install the necessary packages: PyTorch (torch and torchvision), Flower (flwr, with its simulation extras), matplotlib, and pandas.

In [11]:
!pip install -q flwr[simulation] torch torchvision matplotlib pandas

Import everything we need

In [22]:
from collections import OrderedDict
from typing import List, Tuple

from google.colab import drive
from pathlib import Path
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import CIFAR10

import flwr as fl
from flwr.common import Metrics

DEVICE = torch.device("cpu")  # Try "cuda" to train on GPU
print(
    f"Training on {DEVICE} using PyTorch {torch.__version__} and Flower {fl.__version__}"
)

Training on cpu using PyTorch 2.0.1+cu118 and Flower 1.4.0


# Loading the data

Mounting drive

In [None]:
# Mount Google Drive
drive.mount('/content/drive')


Setting the paths to the dataset files

In [23]:
# Path to the folder containing the daily energy dataset (block_*.csv files)
daily_dataset_path = Path('/content/drive/MyDrive/Federated learning implementation/dataset/dataset_archive/daily_dataset/daily_dataset')

# Path to the daily weather dataset CSV file
weather_daily_dataset_path = Path('/content/drive/MyDrive/Federated learning implementation/dataset/dataset_archive/weather_daily_dataset.csv')


## Loading daily energy consumption data


In [14]:
# Collect the block_<n>.csv files in numeric order instead of hard-coding
# the number of blocks, so no file is skipped
block_files = sorted(daily_dataset_path.glob('block_*.csv'),
                     key=lambda p: int(p.stem.split('_')[1]))

# Read each CSV file into a DataFrame
dfs = [pd.read_csv(f) for f in block_files]

# Concatenate all the DataFrames into a single DataFrame
all_data = pd.concat(dfs, ignore_index=True)

unique_identifier_column = 'LCLid'
# Group the data by LCLid and build a dictionary {LCLid: DataFrame}
grouped_data = dict(tuple(all_data.groupby(unique_identifier_column)))
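For readers unfamiliar with the `dict(tuple(all_data.groupby(...)))` idiom, here is a minimal sketch on a few made-up rows showing what it builds, next to an equivalent dict comprehension. The household IDs and values are illustrative only.

```python
import pandas as pd

# Made-up miniature version of all_data (illustrative values only)
all_data = pd.DataFrame({
    "LCLid": ["MAC000002", "MAC000003", "MAC000002"],
    "day": ["2012-10-12", "2012-10-12", "2012-10-13"],
    "energy_sum": [7.098, 5.0, 11.087],
})

# Iterating a GroupBy yields (key, sub-DataFrame) pairs; tuple() collects them
# and dict() turns the pairs into {LCLid: DataFrame}
grouped_data = dict(tuple(all_data.groupby("LCLid")))

# Equivalent, arguably more readable, dict comprehension
grouped_alt = {lclid: group for lclid, group in all_data.groupby("LCLid")}

print(grouped_data.keys() == grouped_alt.keys())  # True
# Each sub-DataFrame keeps its row labels from all_data
print(grouped_data["MAC000002"].index.tolist())   # [0, 2]
```

Because each value is a slice of `all_data`, its row labels are positions in the concatenated frame rather than 0..n-1 per household; `.reset_index(drop=True)` gives fresh labels if needed.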


Loading one household's data using its 'LCLid'

In [15]:
# Each household's DataFrame can now be accessed using its LCLid as the key,
# here "MAC000002" (an arbitrarily chosen household)
mac000002_data = grouped_data['MAC000002']

# Display the data for 'MAC000002'
print(mac000002_data)

         LCLid         day  energy_median  energy_mean  energy_max  \
0    MAC000002  2012-10-12         0.1385     0.154304       0.886   
1    MAC000002  2012-10-13         0.1800     0.230979       0.933   
2    MAC000002  2012-10-14         0.1580     0.275479       1.085   
3    MAC000002  2012-10-15         0.1310     0.213688       1.164   
4    MAC000002  2012-10-16         0.1450     0.203521       0.991   
..         ...         ...            ...          ...         ...   
500  MAC000002  2014-02-24         0.1345     0.261000       0.891   
501  MAC000002  2014-02-25         0.1925     0.246375       0.802   
502  MAC000002  2014-02-26         0.1515     0.256833       1.028   
503  MAC000002  2014-02-27         0.2180     0.427458       1.350   
504  MAC000002  2014-02-28         1.3870     1.387000       1.387   

     energy_count  energy_std  energy_sum  energy_min  
0              46    0.196034       7.098       0.000  
1              48    0.192329      11.087      
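Looking a household up directly raises a bare `KeyError` for a mistyped ID, and the `day` column stays a plain string. A minimal sketch of a helper that checks the ID, parses `day` into datetimes, and sorts by date; `get_household` is a hypothetical name (not part of pandas or Flower) and the sample data is made up.

```python
import pandas as pd


def get_household(grouped: dict, lclid: str) -> pd.DataFrame:
    """Return one household's rows sorted by date (hypothetical helper)."""
    if lclid not in grouped:
        raise KeyError(f"{lclid!r} not found; example IDs: {sorted(grouped)[:3]}")
    df = grouped[lclid].copy()
    # Strings such as '2012-10-12' become datetime64 values
    df["day"] = pd.to_datetime(df["day"])
    return df.sort_values("day").reset_index(drop=True)


# Made-up stand-in for grouped_data, with rows deliberately out of order
grouped_data = {
    "MAC000002": pd.DataFrame({"LCLid": ["MAC000002"] * 2,
                               "day": ["2012-10-13", "2012-10-12"],
                               "energy_sum": [11.087, 7.098]}),
}

house = get_household(grouped_data, "MAC000002")
print(house[["day", "energy_sum"]])
print(house["day"].min().date(), "to", house["day"].max().date())  # 2012-10-12 to 2012-10-13
```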

Loading data using a file name

In [16]:
# Loading data from a specific CSV file
specific_file_data = pd.read_csv(daily_dataset_path / 'block_0.csv')

# Displaying data
print(specific_file_data)

           LCLid         day  energy_median  energy_mean  energy_max  \
0      MAC000002  2012-10-12         0.1385     0.154304       0.886   
1      MAC000002  2012-10-13         0.1800     0.230979       0.933   
2      MAC000002  2012-10-14         0.1580     0.275479       1.085   
3      MAC000002  2012-10-15         0.1310     0.213688       1.164   
4      MAC000002  2012-10-16         0.1450     0.203521       0.991   
...          ...         ...            ...          ...         ...   
25569  MAC005492  2014-02-24         0.1690     0.175042       0.378   
25570  MAC005492  2014-02-25         0.1550     0.160792       0.545   
25571  MAC005492  2014-02-26         0.1490     0.178542       0.687   
25572  MAC005492  2014-02-27         0.1140     0.146167       0.478   
25573  MAC005492  2014-02-28         0.0880     0.088000       0.088   

       energy_count  energy_std  energy_sum  energy_min  
0                46    0.196034       7.098       0.000  
1                48
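When only a few columns are needed, `pd.read_csv` can parse dates and drop unused columns while reading. A minimal sketch, assuming the block files share the column names shown above; the CSV text is a toy stand-in for `block_0.csv`, read through `io.StringIO` so the example runs without Google Drive.

```python
import io

import pandas as pd

# Toy stand-in for the contents of a block file (not real rows)
csv_text = """LCLid,day,energy_mean,energy_sum
MAC000002,2012-10-12,0.15,7.1
MAC000002,2012-10-13,0.23,11.1
MAC005492,2014-02-28,0.09,0.09
"""

# usecols keeps only the listed columns; parse_dates turns 'day' into datetime64
block = pd.read_csv(io.StringIO(csv_text),
                    usecols=["LCLid", "day", "energy_sum"],
                    parse_dates=["day"])

print(block.dtypes)
print(block["LCLid"].unique())        # households stored in this block
print(block.groupby("LCLid").size())  # rows (days) per household in this block
```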

## Loading daily weather data

Creating a 'day' column that stores only the date from the 'time' column, so that the weather dataset can be linked to the daily energy dataset on 'day'.

In [25]:
# Load the weather dataset into a DataFrame
weather_daily_data = pd.read_csv(weather_daily_dataset_path)

# Convert the 'time' column to datetime format
weather_daily_data['time'] = pd.to_datetime(weather_daily_data['time'])

# Extract the date from 'time' as a 'YYYY-MM-DD' string, the same format as the
# 'day' column of the energy data, so the two datasets can be merged on 'day'
weather_daily_data['day'] = weather_daily_data['time'].dt.strftime('%Y-%m-%d')

# Approximate each day's mean temperature as the midpoint of its max and min
weather_daily_data['mean_temp'] = (weather_daily_data['temperatureMax'] + weather_daily_data['temperatureMin']) / 2

# Print the updated DataFrame
print(weather_daily_data)

     temperatureMax   temperatureMaxTime  windBearing                 icon  \
0             11.96  2011-11-11 23:00:00          123                  fog   
1              8.59  2011-12-11 14:00:00          198    partly-cloudy-day   
2             10.33  2011-12-27 02:00:00          225    partly-cloudy-day   
3              8.07  2011-12-02 23:00:00          232                 wind   
4              8.22  2011-12-24 23:00:00          252  partly-cloudy-night   
..              ...                  ...          ...                  ...   
877            9.03  2014-01-26 16:00:00          233    partly-cloudy-day   
878           10.31  2014-02-27 14:00:00          224    partly-cloudy-day   
879           18.97  2014-03-09 14:00:00          172  partly-cloudy-night   
880            8.83  2014-02-12 16:00:00          210                 wind   
881            9.90  2014-02-15 12:00:00          233                 wind   

     dewPoint   temperatureMinTime  cloudCover  windSpeed  pres
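Linking the two datasets is a merge on `day`, and both keys must have the same type first: a `datetime.date` object never equals the string `'2012-10-12'`. A minimal sketch on toy rows (temperatures made up), formatting the weather date as a `'YYYY-MM-DD'` string to match the energy data and attaching `mean_temp` with a left join.

```python
import pandas as pd

# Toy energy rows; 'day' is read from the CSV as 'YYYY-MM-DD' strings
energy = pd.DataFrame({
    "LCLid": ["MAC000002", "MAC000002"],
    "day": ["2012-10-12", "2012-10-13"],
    "energy_sum": [7.1, 11.1],
})

# Toy weather rows; 'time' carries a time-of-day component
weather = pd.DataFrame({
    "time": ["2012-10-12 00:00:00", "2012-10-13 00:00:00"],
    "temperatureMax": [12.0, 10.0],
    "temperatureMin": [6.0, 4.0],
})
weather["time"] = pd.to_datetime(weather["time"])

# Format the weather date exactly like the energy 'day' strings so the keys match
weather["day"] = weather["time"].dt.strftime("%Y-%m-%d")
weather["mean_temp"] = (weather["temperatureMax"] + weather["temperatureMin"]) / 2

# Left join: keep every energy row and attach that day's mean temperature
merged = energy.merge(weather[["day", "mean_temp"]], on="day", how="left")
print(merged)
```

Converting both `day` columns with `pd.to_datetime` (and `.dt.normalize()` on the weather side) is an equally valid choice if date arithmetic is needed later.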

Available datasets

Two datasets are available, one at daily and one at half-hourly granularity, covering 500+ days (5,566 households in the daily dataset).

Each household has a different amount of data (i.e., some households have more days of data than others).
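The uneven amount of data per household can be quantified by counting days per `LCLid`. A minimal sketch on made-up rows; `MAC_A` and `MAC_B` are hypothetical IDs, and in the notebook the same `groupby` would run on `all_data` after converting its `day` column with `pd.to_datetime`.

```python
import pandas as pd

# Made-up rows: household MAC_A has 3 days of data, MAC_B only 1
all_data = pd.DataFrame({
    "LCLid": ["MAC_A", "MAC_A", "MAC_A", "MAC_B"],
    "day": ["2012-10-12", "2012-10-13", "2012-10-14", "2013-01-05"],
})
all_data["day"] = pd.to_datetime(all_data["day"])

# One row per household: number of days recorded plus the first and last day
coverage = all_data.groupby("LCLid")["day"].agg(
    n_days="count", first_day="min", last_day="max"
)
print(coverage.sort_values("n_days"))
```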

I need to write code that loads all the data from the .csv files correctly.

To demonstrate this, I need to load one household's data using its ID.

Then I need to take two random household IDs (to begin with), combine all the data for those IDs, and display the result as well.
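The last step (two random households, combined and displayed) could be sketched as follows. It assumes `grouped_data` is the `{LCLid: DataFrame}` dictionary built earlier; here a small made-up stand-in is used so the example runs on its own, and the seed value 42 is arbitrary.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for grouped_data: {LCLid: DataFrame}
grouped_data = {
    lclid: pd.DataFrame({"LCLid": [lclid] * 2,
                         "day": ["2012-10-12", "2012-10-13"],
                         "energy_sum": [float(i), float(i) + 1]})
    for i, lclid in enumerate(["MAC000002", "MAC000003", "MAC000004", "MAC000005"])
}

# Fixed seed so the same pair is picked on every run
rng = np.random.default_rng(seed=42)
chosen_ids = rng.choice(sorted(grouped_data), size=2, replace=False).tolist()

# Stack the chosen households' rows into one long DataFrame
combined = pd.concat([grouped_data[i] for i in chosen_ids], ignore_index=True)
print(chosen_ids)
print(combined)

# Wide view: one energy_sum column per household, indexed by day
print(combined.pivot(index="day", columns="LCLid", values="energy_sum"))
```

Sampling with `replace=False` guarantees two distinct households; removing the fixed seed gives a different pair each run. The wide `pivot` view places the households side by side, and any day recorded for only one of them shows up as NaN for the other.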



