<h4>AutoML Sliding-Window Classifier</h4>
<p>Uses AutoML (FLAML) and a sliding-window classifier to count the occurrences of each identified phase.</p>

In [None]:
import pandas as pd
df = pd.read_csv("June01.csv")

In [33]:
# Naive approach: count every change of phase between consecutive rows
phase_counts = {phase: 0 for phase in df['Phase'].unique()}  # count map, e.g. {"Climb": 0, ...}

phases = df['Phase'].values
number_of_phases = len(phases)
# Compare each row with its successor; stop one short so phases[i+1] stays in range
for i in range(number_of_phases - 1):
    # A phase change ends one occurrence of phases[i]
    if phases[i] != phases[i+1]:
        phase_counts[phases[i]] += 1
# The final run is not followed by a change, so it is not counted here
phase_counts

{'Hover In Ground Effect': 499,
 'LandingOrTakeOff': 61,
 'Standing': 467,
 'Hover Descent': 2,
 'Surface Taxi': 1,
 'Climb': 212,
 'Cruise': 423,
 'Descent': 236,
 'Hover Lift': 2}
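
<p>The loop above increments a phase only when it is followed by a different phase, so the last run in the file is never counted. A minimal sketch of an alternative that counts every contiguous run, using <code>itertools.groupby</code> from the standard library. The function name and the toy label list are hypothetical and are not taken from June01.csv.</p>

```python
from collections import Counter
from itertools import groupby


def count_phase_segments(labels):
    """Count contiguous runs of each label, including the final run."""
    # groupby collapses consecutive equal labels into one group per run
    return Counter(label for label, _run in groupby(labels))


# Hypothetical toy sequence, standing in for df['Phase'].values
toy_phases = ["Standing", "Standing", "Climb", "Cruise", "Cruise", "Descent", "Standing"]
segment_counts = count_phase_segments(toy_phases)
print(dict(segment_counts))
```

<p>On the real data this could be called as <code>count_phase_segments(df['Phase'].values)</code>; the result should differ from the transition count by at most one, for the phase of the last run.</p>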

In [7]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Select the feature columns and the phase label from the loaded DataFrame
data = df[['Groundspeed', 'Vert. Speed', 'Altitude(AGL)', 'Phase']]

# Define the window size
window_size = 5

# Create lists to store windowed data and corresponding phases
windowed_data = []
phases = []

# Slide a window of `window_size` consecutive rows over the dataset
for i in range(len(data) - window_size + 1):
    window_data = data.iloc[i:i+window_size]

    # Keep only "pure" windows, where every row has the same phase
    if window_data['Phase'].nunique() == 1:
        phase = window_data['Phase'].iloc[0]

        # Store one sample per run: skip windows whose phase matches the last stored one
        if not phases or phase != phases[-1]:
            # Flatten the window into one feature vector of window_size * 3 values
            windowed_data.append(window_data[['Vert. Speed', 'Groundspeed', 'Altitude(AGL)']].values.flatten())
            phases.append(phase)


In [8]:
# Convert lists to numpy arrays
import numpy as np

X = np.array(windowed_data)
y = np.array(phases)

In [9]:
from flaml import AutoML

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Configure the FLAML search and initialize the AutoML object
automl = AutoML()
settings = {
    "time_budget": 60,  # total search time in seconds
    "task": "classification",
    "metric": "accuracy",  # metric to optimize
}

# Search for the best classification model using FLAML
automl.fit(X_train=X_train, y_train=y_train, **settings)

# Predict using the best model found by FLAML
y_pred = automl.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.2f}")

[flaml.automl.logger: 10-24 02:26:02] {1679} INFO - task = classification
[flaml.automl.logger: 10-24 02:26:02] {1690} INFO - Evaluation method: cv


INFO:flaml.automl.task.generic_task:class 3 augmented from 2 to 20
INFO:flaml.automl.task.generic_task:class 5 augmented from 2 to 20


[flaml.automl.logger: 10-24 02:26:02] {1788} INFO - Minimizing error metric: 1-accuracy
[flaml.automl.logger: 10-24 02:26:02] {1900} INFO - List of ML learners in AutoML Run: ['lgbm', 'rf', 'xgboost', 'extra_tree', 'xgb_limitdepth', 'lrl1']
[flaml.automl.logger: 10-24 02:26:02] {2218} INFO - iteration 0, current learner lgbm
[flaml.automl.logger: 10-24 02:26:02] {2344} INFO - Estimated sufficient time budget=1405s. Estimated necessary time budget=32s.
[flaml.automl.logger: 10-24 02:26:02] {2391} INFO -  at 0.2s,	estimator lgbm's best error=0.0448,	best estimator lgbm's best error=0.0448
[flaml.automl.logger: 10-24 02:26:02] {2218} INFO - iteration 1, current learner lgbm
[flaml.automl.logger: 10-24 02:26:02] {2391} INFO -  at 0.3s,	estimator lgbm's best error=0.0448,	best estimator lgbm's best error=0.0448
[flaml.automl.logger: 10-24 02:26:02] {2218} INFO - iteration 2, current learner lgbm
[flaml.automl.logger: 10-24 02:26:02] {2391} INFO -  at 0.4s,	estimator lgbm's best error=0.0078

INFO:flaml.tune.searcher.blendsearch:No low-cost partial config given to the search algorithm. For cost-frugal search, consider providing low-cost values for cost-related hps via 'low_cost_partial_config'. More info can be found at https://microsoft.github.io/FLAML/docs/FAQ#about-low_cost_partial_config-in-tune


[flaml.automl.logger: 10-24 02:27:02] {2391} INFO -  at 60.2s,	estimator lrl1's best error=0.5259,	best estimator rf's best error=0.0009




[flaml.automl.logger: 10-24 02:27:02] {2627} INFO - retrain rf for 0.1s
[flaml.automl.logger: 10-24 02:27:02] {2630} INFO - retrained model: RandomForestClassifier(criterion='entropy', max_features=0.3280876938798049,
                       max_leaf_nodes=14, n_estimators=10, n_jobs=-1,
                       random_state=12032022)
[flaml.automl.logger: 10-24 02:27:02] {1930} INFO - fit succeeded
[flaml.automl.logger: 10-24 02:27:02] {1931} INFO - Time taken to find the best model: 14.89856767654419
Accuracy: 1.00
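
<p>An overall accuracy of 1.00 says little about rare phases: the log above shows two classes being augmented from 2 to 20 samples, so they may contribute few or no rows to the test split. A minimal sketch of per-class recall in plain Python, using hypothetical toy labels rather than the notebook's <code>y_test</code> and <code>y_pred</code>. scikit-learn's <code>classification_report</code> reports the same quantity.</p>

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Hypothetical toy labels: a common class and a rare one
toy_true = ["Cruise"] * 8 + ["Hover Lift"] * 2
toy_pred = ["Cruise"] * 8 + ["Cruise", "Hover Lift"]

overall_accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)
recall = per_class_recall(toy_true, toy_pred)
print(f"Accuracy: {overall_accuracy:.2f}")
for label, value in recall.items():
    print(f"{label}: recall {value:.2f}")
```

<p>In this toy case the overall accuracy is high while the rare class is recovered only half the time, which is the pattern worth checking for on the real split.</p>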


In [11]:
# Get the best model found by FLAML
best_model = automl.model
print(best_model)

<flaml.automl.model.RandomForestEstimator object at 0x7e8d9899bd00>


In [10]:
# Count the stored windows per phase (one window per detected run of that phase)
phase_counts = {}
for phase in phases:
    phase_counts[phase] = phase_counts.get(phase, 0) + 1

# Print results
print('Occurrences of each phase:')
for phase, count in phase_counts.items():
    print(f'{phase}: {count}')

Occurrences of each phase:
Hover In Ground Effect: 253
LandingOrTakeOff: 61
Standing: 220
Hover Descent: 2
Climb: 210
Cruise: 421
Descent: 236
Hover Lift: 2
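
<p>The windowed counts are lower than the naive counts for some phases (for example Hover In Ground Effect and Standing), and Surface Taxi disappears. One plausible explanation, sketched below on hypothetical toy labels, is that a run shorter than the window never produces a pure window, so it is dropped and the two runs around it merge into one. The function names are illustrative and mimic the filter logic on plain lists instead of the DataFrame.</p>

```python
def contiguous_runs(labels):
    """Phase of every contiguous run, one entry per run."""
    runs = []
    for label in labels:
        if not runs or label != runs[-1]:
            runs.append(label)
    return runs


def windowed_runs(labels, window_size):
    """Keep a window only if it is pure and its phase differs from the last kept one."""
    kept = []
    for i in range(len(labels) - window_size + 1):
        window = labels[i:i + window_size]
        if len(set(window)) == 1 and (not kept or window[0] != kept[-1]):
            kept.append(window[0])
    return kept


# Hypothetical toy sequence: a 2-row hover interrupts a long Standing run
toy = (["Standing"] * 6 + ["Hover In Ground Effect"] * 2
       + ["Standing"] * 6 + ["Climb"] * 5)

naive = contiguous_runs(toy)
windowed = windowed_runs(toy, window_size=5)
print(naive)
print(windowed)
```

<p>Here the short hover run is filtered out and the two Standing runs collapse into one, so the windowed approach reports fewer occurrences of both phases than the transition count does.</p>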
