# SAMIRA Process Hardware Requirements

## Overview

The SAMIRA (Statistical Analysis and Machine Learning Integrated Research Application) process involves complex computations that demand significant computational resources. Running it accurately and efficiently requires evaluating the hardware it runs on.

## Current Challenges

- **Computational Intensity:** SAMIRA involves various data preprocessing, modeling, and evaluation steps that can be computationally intensive, especially with large datasets.
  
- **Memory Usage:** Some stages in the SAMIRA process might require substantial memory, especially when handling large data structures or when applying specific algorithms.

- **Parallel Processing:** Although SAMIRA can benefit from parallel processing, our current hardware might not fully support or optimize parallel tasks, leading to suboptimal performance.
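
The memory concern can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a Kalman-filter-based implementation that carries the differencing inside the state vector and multiplies the seasonal polynomials out to their full lag order; both function names are hypothetical, and `n_obs = 8760` is a placeholder because the length of the training series is not stated here. Treat the output as an order-of-magnitude guide, not a measurement.

```python
def sarima_state_dim(order, seasonal_order):
    """Approximate state-vector length for a SARIMA(p,d,q)(P,D,Q,s) model.

    Assumption: differencing is carried inside the state vector and the
    seasonal polynomials are expanded to their full lag order.
    """
    p, d, q = order
    P, D, Q, s = seasonal_order
    arma_states = max(p + P * s, q + Q * s + 1)
    diff_states = d + D * s
    return arma_states + diff_states


def covariance_history_gib(k_states, n_obs, bytes_per_float=8):
    """Size in GiB of one k_states x k_states covariance matrix per time step."""
    return k_states ** 2 * n_obs * bytes_per_float / 2 ** 30


n_obs = 8760  # hypothetical: one year of hourly readings
for s in (12, 24, 168):
    # Largest candidate in the notebook's grid for each seasonal period
    k = sarima_state_dim((2, 1, 5), (2, 2, 2, s))
    gib = covariance_history_gib(k, n_obs)
    print(f"s={s:>3}: ~{k} states, ~{gib:.2f} GiB per stored covariance history")
```

Under these assumptions the state dimension grows roughly linearly with the seasonal period and the stored covariances grow with its square, which is why the weekly period (168) dominates the memory budget, and why running many such fits in parallel multiplies the requirement.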

## Recommendations

1. **Upgrade Hardware:** Consider investing in more advanced hardware with:
   - Faster CPUs with more cores to enhance parallel processing capabilities.
   - Increased RAM to handle large datasets and complex computations without memory bottlenecks.
   - Efficient GPU support if certain machine learning algorithms can leverage GPU acceleration.
   
2. **Extended Runtime:** If hardware upgrades are not immediately feasible, expect the SAMIRA process to take longer on the existing hardware, and allocate enough time for it to run to completion so that the results are accurate.

3. **Cluster Computing:** Consider leveraging cluster computing or cloud-based solutions that allow scaling of resources based on the computational demand of the SAMIRA process.
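
To plan for the extended-runtime option, it can help to estimate wall-clock time before launching the search. The following is a minimal sketch, not a benchmark: `effective_workers` and `estimate_wall_clock_hours` are hypothetical helpers, and the 1800 seconds per fit is a placeholder that should be replaced with a timing measured on the actual hardware. The grid mirrors the one defined in the notebook.

```python
import math
import os
from itertools import product


def effective_workers(requested, available=None):
    """Cap the requested worker count at the number of cores actually present."""
    if available is None:
        available = os.cpu_count() or 1
    return max(1, min(requested, available))


def estimate_wall_clock_hours(n_models, seconds_per_fit, n_workers):
    """Rough estimate: fits run in waves of n_workers, each wave taking one fit time."""
    waves = math.ceil(n_models / n_workers)
    return waves * seconds_per_fit / 3600


# Same grid as the notebook: 6 non-seasonal orders x 3 seasonal periods
pdq = list(product([2], [1], range(0, 6)))
seasonal_pdq = [(2, 2, 2, s) for s in (24, 12, 168)]
n_models = len(pdq) * len(seasonal_pdq)

workers = effective_workers(16)  # 16 is the n_jobs value requested in the notebook
hours = estimate_wall_clock_hours(n_models, seconds_per_fit=1800, n_workers=workers)
print(f"{n_models} fits on {workers} workers: about {hours:.1f} h")
```

Capping the worker count at the available cores matters because requesting more workers than cores does not shorten the run, while each extra worker still holds its own copy of the model in memory.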

## Conclusion

To carry out the SAMIRA process effectively, it's imperative to either upgrade the hardware to meet the computational demands or allocate longer runtimes on the current setup. Proper planning and resource allocation are crucial to achieving the best results from the SAMIRA process.


In [1]:
import os
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX
from itertools import product
from joblib import Parallel, delayed, Memory
import gc

train_data_path = os.path.join('training_data.csv')
output_path = os.path.join('sarima_parameters.txt')


train_data = pd.read_csv(train_data_path)
energy_data_train = train_data['Building Semtex OFFICE (kWh)']

# Define the parameter grid for SARIMA
p = [2]
d = [1]
q = range(0, 6)
P = D = Q = [2]
S_values = [24, 12, 168]

pdq = list(product(p, d, q))
seasonal_pdq = [comb for S in S_values for comb in list(product(P, D, Q, [S]))]

# Setup joblib Memory caching to a specified folder
memory = Memory("sarima_cache", verbose=0)

In [1]:

@memory.cache
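def fit_sarima(param, param_seasonal):
    """Fit one SARIMA candidate and return its fit statistics, or the error raised.

    The series is read from the global energy_data_train, so it is not part of
    the joblib cache key: clear "sarima_cache" whenever the training data changes.
    """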
    try:
        model = SARIMAX(energy_data_train, order=param, seasonal_order=param_seasonal, enforce_stationarity=False,
                        enforce_invertibility=False)
        results = model.fit()
        return {
            "Parameters": param,
            "Seasonal Parameters": param_seasonal,
            "AIC": results.aic,
            "BIC": results.bic,
            "HQIC": results.hqic,
            "Log Likelihood": results.llf,
            "Converged": results.mle_retvals['converged'],
            "Num Iterations": results.mle_retvals['iterations'],
            "Summary": str(results.summary())
        }
    except Exception as e:
        return {
            "Parameters": param,
            "Seasonal Parameters": param_seasonal,
            "Error": str(e)
        }


f = open(output_path, 'w')

batch_size = 10
all_results = []

for i in range(0, len(pdq), batch_size):
    for j in range(0, len(seasonal_pdq), batch_size):
        batch_pdq = pdq[i:i + batch_size]
        batch_seasonal_pdq = seasonal_pdq[j:j + batch_size]

        n_jobs = 16
        batch_results = Parallel(n_jobs=n_jobs, pre_dispatch='2*n_jobs', max_nbytes='1M')(
            delayed(fit_sarima)(param, param_seasonal) for param in batch_pdq for param_seasonal in batch_seasonal_pdq)

        all_results.extend(batch_results)

        # Write and print each batch result immediately after computation
        for result in batch_results:
            for key, value in result.items():
                if key != "Summary":
                    f.write(f"{key}: {value}\n")
                    print(f"{key}: {value}")
            # Failed fits carry an "Error" entry and no "Summary"
            summary = result.get("Summary", "")
            f.write(f"\n{summary}\n")
            f.write("="*80 + "\n\n")  # Separator
            print(summary)
        f.flush()  # Make sure data is written to disk

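        # Ceiling division, so a partial final batch still counts as a batch
        total_pdq_batches = -(-len(pdq) // batch_size)
        total_seasonal_batches = -(-len(seasonal_pdq) // batch_size)
        print(
            f"Completed batch {i // batch_size + 1}-{j // batch_size + 1}/{total_pdq_batches}-{total_seasonal_batches}")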
        gc.collect()

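# Rank only the fits that succeeded; failed fits have no "AIC" entry
successful_results = [r for r in all_results if "AIC" in r]
results_sorted = sorted(successful_results, key=lambda x: x["AIC"])
best_result = results_sorted[0]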

f.write(f"\nBest SARIMA Parameters: {best_result['Parameters']}\n")
f.write(f"Best Seasonal Parameters: {best_result['Seasonal Parameters']}\n")
f.write(f"Best AIC Value: {best_result['AIC']}\n\n")
f.close()

print(f"Best SARIMA Parameters: {best_result['Parameters']}")
print(f"Best Seasonal Parameters: {best_result['Seasonal Parameters']}")
print(f"Best AIC Value: {best_result['AIC']}\n")
print(f"Optimal parameters and all optimization outputs saved to {output_path}")


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            7     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.75543D+00    |proj g|=  1.44396D-01


 This problem is unconstrained.


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           10     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.82802D+00    |proj g|=  6.44068D-01


 This problem is unconstrained.


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.14713D+00    |proj g|=  2.14455D-01
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            9     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.14266D+00    |proj g|=  2.13488D-01


 This problem is unconstrained.
 This problem is unconstrained.


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            8     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.75368D+00    |proj g|=  1.44598D-01


 This problem is unconstrained.


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            9     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.75200D+00    |proj g|=  1.44919D-01


 This problem is unconstrained.


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =            7     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.15507D+00    |proj g|=  2.12401D-01


 This problem is unconstrained.


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           11     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.65974D+00    |proj g|=  1.59582D-01


 This problem is unconstrained.


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           12     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.14072D+00    |proj g|=  2.13729D-01


 This problem is unconstrained.


RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           10     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.14338D+00    |proj g|=  2.13369D-01


 This problem is unconstrained.



At iterate    5    f=  2.66158D+00    |proj g|=  1.54085D-01
RUNNING THE L-BFGS-B CODE

           * * *

Machine precision = 2.220D-16
 N =           11     M =           10

At X0         0 variables are exactly at the bounds

At iterate    0    f=  2.14205D+00    |proj g|=  2.13632D-01


 This problem is unconstrained.


KeyboardInterrupt: 