### **Defrosting Simple Simulation TimeSeries**





### Import the libraries that we will need 

In [82]:
import pandas as pd
import numpy as np
from collections import Counter


### **Method generate_segment()**:
1. **Use**: Generates one segment of the time series.
2. **Length input**: The number of points in the segment.
3. **Label input**: The class of the segment, either 'Normal' or 'defrosting'.
4. **Temp input**: A (min, max) tuple giving the temperature range for this segment.
5. **Humidity input**: A (min, max) tuple giving the humidity range for this segment.
6. **Ice thickness input**: A (min, max) tuple giving the ice thickness range for this segment.
7. The method **uses** these parameters to generate the values of each variable in the segment: temperature and humidity are drawn uniformly at random from their ranges, while ice thickness changes linearly. If the label is **defrosting**, the ice thickness **decreases** from max to min; otherwise it **increases** from min to max.

In [83]:
# Function to generate data for a segment
def generate_segment(length, label, temp=(20, 30), humidity=(40, 80), ice_thickness=(0, 10)):
    # Temperature and humidity are sampled uniformly within their (min, max) ranges
    temperature = np.random.uniform(*temp, length)
    humidity = np.random.uniform(*humidity, length)
    if label == 'Normal':
        ice_thickness = np.linspace(*ice_thickness, length)  # Simulate ice building up (min -> max)
    else:  # 'defrosting'
        ice_thickness = np.linspace(*ice_thickness[::-1], length)  # Simulate ice melting (max -> min)
    return temperature, humidity, ice_thickness, [label] * length
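
As a quick sanity check of the behaviour described above, the sketch below repeats the function so that it runs on its own and then generates two short segments. The segment length of 5 is an arbitrary choice for illustration; with the default ice thickness range of (0, 10), a 'Normal' segment should rise from 0 to 10 and a 'defrosting' segment should fall from 10 to 0.

```python
import numpy as np

def generate_segment(length, label, temp=(20, 30), humidity=(40, 80), ice_thickness=(0, 10)):
    temperature = np.random.uniform(*temp, length)
    humidity = np.random.uniform(*humidity, length)
    if label == 'Normal':
        ice_thickness = np.linspace(*ice_thickness, length)        # min -> max
    else:
        ice_thickness = np.linspace(*ice_thickness[::-1], length)  # max -> min
    return temperature, humidity, ice_thickness, [label] * length

# Illustrative call: 5 points per segment (arbitrary length for this example)
_, _, ice_normal, labels_normal = generate_segment(5, 'Normal')
_, _, ice_defrost, _ = generate_segment(5, 'defrosting')

print(ice_normal.tolist())   # [0.0, 2.5, 5.0, 7.5, 10.0]
print(ice_defrost.tolist())  # [10.0, 7.5, 5.0, 2.5, 0.0]
```

Only the ice thickness is deterministic here; temperature and humidity differ on every call because they are sampled at random.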

### **Method creating_defrosting_synthetic_data()**:
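1. **Use:** Generates the full dataset that simulates the simple defrosting process.
2. **Input**: the desired number of points (num_timestamps), the segment_length_range, and the start timestamp (start_date). Segment lengths are drawn with np.random.randint, so the upper bound of segment_length_range is exclusive.
3. **Calls** the generate_segment method once for each segment length generated, truncating the last segment so the total equals num_timestamps.
4. Uses all the data generated to return a dataframe, indexed by timestamp, that **represents the simulated data**.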

In [84]:

def creating_defrosting_synthetic_data(num_timestamps,segment_length_range=(10,100),start_date='2023-01-01' ):

    # Generate timestamps with one-minute frequency ('min' replaces the deprecated 'T' alias)
    timestamps = pd.date_range(start=start_date, periods=num_timestamps, freq='min')
    # Initialize empty lists to collect data
    temperature_data = []
    humidity_data = []
    ice_thickness_data = []
    segment_id_data =[]
    labels = []
    length_data = []
    # Generate data with variable segment lengths
    i = 0
    segment_id =0
    while i < num_timestamps:
        segment_length = np.random.randint(*segment_length_range)
        if i + segment_length > num_timestamps:
            segment_length = num_timestamps - i
        label = np.random.choice(['Normal', 'defrosting'])
        # generate_segment returns the label repeated once per point in the segment
        temp, hum, ice, segment_labels = generate_segment(segment_length, label)
        segment_id_data.extend([segment_id] * segment_length)
        length_data.extend([segment_length] * segment_length)
        temperature_data.extend(temp)
        humidity_data.extend(hum)
        ice_thickness_data.extend(ice)
        labels.extend(segment_labels)
        segment_id += 1

        i += segment_length
    # Create DataFrame
    data ={
        'timestamp': timestamps,
        'segment_length':length_data,
        'segment_id': segment_id_data,
        'temperature': temperature_data,
        'humidity': humidity_data,
        'ice_thickness': ice_thickness_data,
        
        'class': labels
    }
    df = pd.DataFrame(data)
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df.set_index('timestamp', inplace=True)
    return df

defrosting_df = creating_defrosting_synthetic_data(num_timestamps = 20000,segment_length_range =(10,100),start_date = '2023-01-01' ) 
defrosting_df.head()

timestamp,segment_length,segment_id,temperature,humidity,ice_thickness,class
2023-01-01 00:00:00,71,0,27.568287,62.383818,10.0,defrosting
2023-01-01 00:01:00,71,0,20.436334,46.813533,9.857143,defrosting
2023-01-01 00:02:00,71,0,28.583875,55.166274,9.714286,defrosting
2023-01-01 00:03:00,71,0,20.585185,52.539204,9.571429,defrosting
2023-01-01 00:04:00,71,0,21.017681,47.254363,9.428571,defrosting
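
The generator relies on NumPy's global random state, so every run produces different segment lengths and labels. If repeatable output is wanted, one option is to seed the generator before creating the data. The sketch below is only an illustration: the seed value 42 and the helper `draw_segment_plan` are hypothetical and not part of the notebook; the helper simply uses the same NumPy calls to draw lengths and labels.

```python
import numpy as np

def draw_segment_plan(n_segments, segment_length_range=(10, 100)):
    """Hypothetical helper: draw segment lengths and labels with the same NumPy calls."""
    # randint excludes the upper bound, so lengths fall in [10, 100)
    lengths = [int(np.random.randint(*segment_length_range)) for _ in range(n_segments)]
    labels = [str(np.random.choice(['Normal', 'defrosting'])) for _ in range(n_segments)]
    return lengths, labels

np.random.seed(42)            # assumed seed value, chosen arbitrarily
first = draw_segment_plan(5)

np.random.seed(42)            # re-seeding replays the same random sequence
second = draw_segment_plan(5)

print(first == second)  # True
```

Calling `np.random.seed(...)` once before `creating_defrosting_synthetic_data(...)` would make that call repeatable in the same way.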


### **Handling Variable Length Segments**  
1. Typically in a time series process, the segments **are not all of equal length**.
2. There are 2 ways to handle this: **padding** each segment to a desired length, or **creating sliding windows**.
3. Here I'll implement **sliding windows**.
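
For comparison, the padding alternative could look roughly like the sketch below. The function name `pad_segment` and the parameters `desired_length` and `pad_value` are assumptions made for this illustration; the notebook itself uses sliding windows and does not define a padding routine.

```python
import numpy as np

def pad_segment(values, desired_length, pad_value=0.0):
    """Hypothetical helper: force a 1-D segment to exactly desired_length points."""
    values = np.asarray(values, dtype=float)
    if len(values) >= desired_length:
        # Segment is long enough: truncate to the desired length
        return values[:desired_length]
    # Segment is too short: append pad_value at the end
    missing = desired_length - len(values)
    return np.pad(values, (0, missing), constant_values=pad_value)

short = pad_segment([1.0, 2.0, 3.0], desired_length=5)
long = pad_segment([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], desired_length=5)

print(short.tolist())  # [1.0, 2.0, 3.0, 0.0, 0.0]
print(long.tolist())   # [1.0, 2.0, 3.0, 4.0, 5.0]
```

Padding keeps one row per original segment but introduces artificial values, whereas sliding windows keep only real observations at the cost of mixing segments inside a window.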


### **Method create_sliding_windows_df()**:
1. The method is **used** to **divide** the time series into windows of the **required window size**.
2. Takes as **input**: the dataframe to split into windows, the size of the dataframe, and the window_size.
3. There are 2 ways to create a sliding-window dataframe, **overlapping and non-overlapping**. Here we implement the non-overlapping version, so trailing rows that do not fill a complete window are dropped.
4. The **different windows** are created and each is given a specific window_id.
5. Another column, **"majority_class"**, is added; it holds the label for that specific window, determined by the most frequent class among its rows.
6. Using all the windows created, **a dataframe is returned**.

In [85]:

def create_sliding_windows_df(df,datafram_size, window_size):
    windows_id = 0
    df_windows = []
    # Last valid start index so that every window is complete
    last_start = datafram_size - window_size + 1
    for start in range(0, last_start, window_size):
        end = start + window_size
        window = df.iloc[start:end].copy()
        # Assign window_id to the window  
        window["window_id"] = windows_id
        # Determine the majority class in the window
        class_counts = Counter(window['class'])
        majority_class = class_counts.most_common(1)[0][0]
        window["majority_class"] = majority_class 
        df_windows.append(window)
        windows_id = windows_id+1
    # Create a DataFrame from the windows
    concatenated_df = pd.concat(df_windows)
    return concatenated_df

# Define the window size as the mean of the distinct segment lengths (windows do not overlap, so the step equals the window size)
window_size = int(defrosting_df['segment_length'].unique().mean())
sliding_windows_df = create_sliding_windows_df(defrosting_df,len(defrosting_df),window_size)
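
The overlapping variant mentioned in the description differs only in how far the window start advances each time. The sketch below is an illustration on toy data, not the notebook's method: the function `create_overlapping_windows_df` and its `step` parameter are hypothetical names introduced here.

```python
from collections import Counter

import numpy as np
import pandas as pd

def create_overlapping_windows_df(df, window_size, step):
    """Hypothetical variant: windows start every `step` rows, so they overlap when step < window_size."""
    windows = []
    starts = range(0, len(df) - window_size + 1, step)
    for window_id, start in enumerate(starts):
        window = df.iloc[start:start + window_size].copy()
        window["window_id"] = window_id
        # Most frequent class among the rows of this window
        window["majority_class"] = Counter(window["class"]).most_common(1)[0][0]
        windows.append(window)
    return pd.concat(windows)

# Toy data: 10 rows, first half 'Normal', second half 'defrosting'
toy = pd.DataFrame({
    "ice_thickness": np.arange(10, dtype=float),
    "class": ["Normal"] * 5 + ["defrosting"] * 5,
})

overlapping = create_overlapping_windows_df(toy, window_size=4, step=2)
sizes = overlapping.groupby("window_id").size().tolist()
majorities = overlapping.groupby("window_id")["majority_class"].first().tolist()

print(sizes)       # [4, 4, 4, 4]
print(majorities)  # ['Normal', 'Normal', 'defrosting', 'defrosting']
```

With `step` equal to `window_size` this reduces to the non-overlapping case. Overlapping windows yield more training examples, but rows are repeated across windows, so the resulting index is no longer unique.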



In [86]:
#sliding_windows_df
group_lengths = sliding_windows_df.groupby('window_id').size()
print(group_lengths)

# Check if all lengths are the same
all_equal_length = group_lengths.nunique() == 1
print(all_equal_length)

window_id
0      55
1      55
2      55
3      55
4      55
       ..
358    55
359    55
360    55
361    55
362    55
Length: 363, dtype: int64
True


In [7]:
sliding_windows_df[sliding_windows_df["window_id"]==1 ]

timestamp,segment_length,segment_id,temperature,humidity,ice_thickness,class,window_id,majority_class
2023-01-01 00:54:00,61,0,24.713244,55.815735,1.0,defrosting,1,Normal
2023-01-01 00:55:00,61,0,27.176574,55.441494,0.833333,defrosting,1,Normal
2023-01-01 00:56:00,61,0,28.488716,75.146234,0.666667,defrosting,1,Normal
2023-01-01 00:57:00,61,0,26.535951,61.494793,0.5,defrosting,1,Normal
2023-01-01 00:58:00,61,0,25.699279,75.205826,0.333333,defrosting,1,Normal
2023-01-01 00:59:00,61,0,28.899369,40.837526,0.166667,defrosting,1,Normal
2023-01-01 01:00:00,61,0,27.771327,60.581881,0.0,defrosting,1,Normal
2023-01-01 01:01:00,52,1,26.027166,78.011441,0.0,Normal,1,Normal
2023-01-01 01:02:00,52,1,22.033941,79.017571,0.196078,Normal,1,Normal
2023-01-01 01:03:00,52,1,29.196378,49.173472,0.392157,Normal,1,Normal


### Method extract_features():
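1.  Takes a segment and produces extra statistical features for it: the mean, standard deviation, and 25th and 75th percentiles of each column.
2.  It takes as input the segment as well as the columns we want to extract features from; majority_class is passed through unchanged.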

 

In [75]:
# Feature extraction: summary statistics per window
def extract_features(df_segment,columns):
    features = {}
    for column in columns:
        if(column == 'majority_class'):
            features[column] = df_segment[column].iloc[0]
        else:
            features[column+"_mean"] = df_segment[column].mean()
            features[column+"_std"] = df_segment[column].std()
            features[column+"_percentile_25"] = np.percentile(df_segment[column], 25)
            features[column+"__percentile_75"] = np.percentile(df_segment[column], 75)
    return pd.Series(features)

grouped = sliding_windows_df.groupby('window_id')
columns_of_interest = ['temperature', 'humidity',"ice_thickness",'majority_class']
features_df = grouped.apply(lambda x: extract_features(x, columns_of_interest)).reset_index()
features_df[features_df["window_id"]==0]



,window_id,temperature_mean,temperature_std,temperature_percentile_25,temperature__percentile_75,humidity_mean,humidity_std,humidity_percentile_25,humidity__percentile_75,ice_thickness_mean,ice_thickness_std,ice_thickness_percentile_25,ice_thickness__percentile_75,majority_class
0,0,24.954695,2.861978,22.147278,27.212433,62.942396,11.575521,52.37479,73.406292,5.583333,2.622022,3.375,7.791667,defrosting
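
The same per-window statistics can also be computed with a single pandas aggregation instead of a row-wise `apply`. The sketch below is one possible alternative on toy data, not the notebook's implementation: `summarise_windows`, `percentile_25`, and `percentile_75` are hypothetical names, and the resulting columns use a single underscore (`temperature_percentile_75`), unlike the `__percentile_75` keys produced above.

```python
import numpy as np
import pandas as pd

def percentile_25(series):
    return series.quantile(0.25)

def percentile_75(series):
    return series.quantile(0.75)

def summarise_windows(df, numeric_columns):
    """Hypothetical helper: per-window mean, std and quartiles via groupby.agg."""
    grouped = df.groupby("window_id")
    stats = grouped[numeric_columns].agg(["mean", "std", percentile_25, percentile_75])
    # Flatten the (column, statistic) MultiIndex into names like 'temperature_mean'
    stats.columns = [f"{col}_{stat}" for col, stat in stats.columns]
    # majority_class is constant within a window, so the first value is enough
    stats["majority_class"] = grouped["majority_class"].first()
    return stats.reset_index()

# Toy data: two windows of four rows each
toy = pd.DataFrame({
    "window_id": [0, 0, 0, 0, 1, 1, 1, 1],
    "temperature": [20.0, 22.0, 24.0, 26.0, 21.0, 21.0, 25.0, 25.0],
    "majority_class": ["Normal"] * 4 + ["defrosting"] * 4,
})

features = summarise_windows(toy, ["temperature"])
print(features)
```

`Series.quantile` and `np.percentile` both default to linear interpolation, so the quartiles should agree with those from `extract_features`.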
