# Data Prediction and Machine Learning

### Data Prediction

1.  **Time Series Forecasting:**
    
    *   **ARIMA (AutoRegressive Integrated Moving Average):** For forecasting future values of each variable based on its own past values.
    *   **SARIMA (Seasonal ARIMA):** An extension of ARIMA that supports univariate time series data with a seasonal component.
    *   **Prophet:** Developed by Facebook, it's useful for forecasting with daily observations that display patterns on different time scales.
2.  **Machine Learning Models:**
    
    *   **Regression Models:** Linear Regression, Polynomial Regression for predicting continuous values.
    *   **Random Forest and Gradient Boosting Machines (GBM):** For capturing non-linear relationships in the data.
    *   **Neural Networks:** LSTM (Long Short-Term Memory) networks are especially good for sequences like time series.
3.  **Multivariate Time Series Forecasting:**
    
    *   **Vector AutoRegression (VAR):** Models the relationship between multiple variables and their lagged values.
    *   **Multivariate LSTM:** A deep learning approach to handle multiple inputs for forecasting.
4.  **Evaluation:**
    
    *   Use metrics like MAE (Mean Absolute Error), RMSE (Root Mean Square Error), and MAPE (Mean Absolute Percentage Error) to evaluate the performance of your forecasting models.

### Tools and Libraries

*   **Pandas:** For data manipulation and analysis.
*   **NumPy:** For numerical computations.
*   **Matplotlib** and **Seaborn:** For data visualization.
*   **Statsmodels:** For implementing statistical models.
*   **Scikit-learn:** For machine learning models.
*   **TensorFlow** or **Keras:** For deep learning models.

### Steps to Get Started

1.  **Preprocess the Data:** Clean the data by handling missing values, outliers, and normalizing or standardizing the values if necessary.
2.  **Perform EDA:** Use visualization and statistical analysis to understand the data.
3.  **Model Selection:** Based on EDA, choose appropriate models for forecasting.
4.  **Model Training:** Train the model on historical data.
5.  **Model Evaluation:** Evaluate the model's performance using appropriate metrics.
6.  **Forecasting:** Use the model to make predictions.

In [1]:
# Helper Function
# Code to check if seaborn is installed and install it via pip if it is not installed
import subprocess
import pkg_resources
import sys

def install_package(package_name):
    try:
        # Check if the package is already installed
        pkg_resources.get_distribution(package_name)
        print(f"{package_name} is already installed.")
    except pkg_resources.DistributionNotFound:
        # If the package is not installed, install it using pip
        print(f"{package_name} is not installed, installing now...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
        print(f"{package_name} has been successfully installed.")


In [4]:
install_package('seaborn')
install_package('statsmodels')
install_package('prophet')
install_package('matplotlib')

seaborn is already installed.
statsmodels is already installed.
prophet is already installed.
matplotlib is already installed.


### 1\. ARIMA Forecasting

To forecast using ARIMA, call arima_forecast('data01.csv', 'Temperature', (p,d,q)) where (p,d,q) are the ARIMA model parameters for the Temperature variable.

In [14]:
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
import matplotlib.pyplot as plt

def arima_forecast(filename, variable, order):
    df = pd.read_csv(filename, parse_dates=['Lastupdatetime'], dayfirst=True)
    df.set_index('Lastupdatetime', inplace=True)
    df = df.asfreq('m')
    
    model = ARIMA(df[variable], order=order)
    model_fit = model.fit()
    
    df['forecast'] = model_fit.predict(start=len(df), end=len(df)+5, dynamic=True)
    df[[variable, 'forecast']].plot(figsize=(12, 8))
    
    plt.title(f'ARIMA Forecast for {variable}')
    plt.show()


In [15]:
# arima_forecast('data01.csv', 'Temperature', (p,d,q))
#arima_forecast('data02.csv', 'Temperature', (2,1,2))
arima_forecast('data01.csv', 'Temperature', (1, 1, 1))

  df = df.asfreq('30m')


ValueError: cannot reindex on an axis with duplicate labels

### 2\. SARIMA Forecasting

For SARIMA, use sarima_forecast('data01.csv', 'Temperature', (p,d,q), (P,D,Q,s)) where (P,D,Q,s) are the seasonal components of the SARIMA model.

### 3\. Prophet Forecasting