# Rolling Window vs Expanding Window

The difference between the two implementations lies in the type of window they use to calculate the mean:

1. `stock_data['PLUG']['2019':'2024']['2. high'].rolling(window=20).mean()`: This line uses a rolling window to calculate the mean. A rolling window of size `n` means that, for each point, the mean is computed over the `n` most recent points (including the current one). Here it calculates a 20-day rolling mean of the '2. high' prices for PLUG from 2019 through 2024. By default pandas requires a full window, so the first 19 values of the result are `NaN`.

2. `microsoft.High.expanding().mean()`: This line uses an expanding window to calculate the mean. An expanding window starts at the first point and grows to include every subsequent point, so each value of the result is the mean of all data up to and including that point. Here it calculates the expanding mean of Microsoft's 'High' prices.

In summary, a rolling mean is a moving average where the window size stays constant and moves along with the data, while an expanding mean includes more and more data points as it moves along the data.
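A minimal sketch of the difference, assuming pandas and a made-up five-point series in place of the stock data above:

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Rolling: mean of the 3 most recent values; NaN until a full window exists
rolling_mean = prices.rolling(window=3).mean()

# Expanding: mean of every value from the start up to and including the current point
expanding_mean = prices.expanding().mean()

print(pd.DataFrame({"price": prices, "rolling_3": rolling_mean, "expanding": expanding_mean}))
```

Passing `min_periods=1` to `rolling` would fill the first two positions with partial-window means instead of `NaN`.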

Both rolling and expanding windows have their strengths and weaknesses, and the choice between them depends on the specific use case.

**Rolling Window:**

Strengths:
- It provides a "localized" view of the data, which can be useful for identifying short-term trends or patterns.
- It can smooth out short-term fluctuations, which can make it easier to see the underlying trend.
- It's more sensitive to recent changes because it only considers the most recent `n` data points.

Weaknesses:
- The choice of window size can significantly affect the results. A larger window will smooth out more fluctuations, but it might also smooth out important details.
- It doesn't consider all past data, so it might miss long-term trends.

**Expanding Window:**

Strengths:
- It considers all past data, so it can capture long-term trends.
- The mean from an expanding window can provide a "cumulative" view of the data, which can be useful for understanding the overall trend over time.

Weaknesses:
- It's less sensitive to recent changes because it considers all past data.
- Extreme values early in the series are never dropped from the calculation, so they keep affecting the mean at every later point (although their weight shrinks as more data accumulates).

In the context of stock price prediction, a rolling window might be more useful if you're interested in short-term trends or if the market conditions are changing rapidly. An expanding window might be more useful if you're interested in the long-term trend or if the market conditions are relatively stable.

# LSTM vs GRU

Choosing between Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) layers for a neural network depends on the specific requirements of your application, the nature of your dataset, and computational resources. Both are types of Recurrent Neural Network (RNN) architectures designed to capture dependencies in sequential data, but they have some key differences:

## LSTM (Long Short-Term Memory)
- **Architecture**: LSTMs have a more complex architecture, with three gates (input, forget, and output) and a separate cell state that carries information across time steps. This design helps them capture long-range dependencies and maintain a longer memory.
- **Performance**: They can provide higher accuracy in problems where the dataset has long-range temporal dependencies.
- **Computational Cost**: Due to their complexity, LSTMs generally require more computational resources and time to train.
- **Parameters**: They have more parameters to train, which can be a drawback in terms of computational efficiency and the risk of overfitting on smaller datasets.

## GRU (Gated Recurrent Unit)
- **Architecture**: GRUs are simpler, with two gates (update and reset) and no separate cell state. This generally makes them faster to train than LSTMs and sometimes easier to optimize.
- **Performance**: They can perform equally well or even better than LSTMs on datasets where long-range dependencies are less important.
- **Computational Cost**: GRUs are computationally more efficient due to their simpler structure.
- **Parameters**: They have fewer parameters than LSTMs, which can be beneficial in terms of memory usage and training time, especially on smaller datasets.

## Which to Choose?
- **Dataset and Problem Complexity**: If your problem involves learning very long-range dependencies, an LSTM might be more suitable. For less complex problems or datasets where long-range dependencies are less critical, a GRU might be the better choice.
- **Computational Resources**: If you have limited computational resources, or if you need to train a model quickly, GRUs might be more practical.
- **Experimentation**: Often, the best way to decide is through empirical testing. In many cases, both LSTMs and GRUs can perform similarly, and other aspects of the network architecture or the training process might have a more significant impact on performance.

In summary, there's no definitive answer to which is better overall; it depends on the specifics of your task and constraints. In practice, it's advisable to try both architectures and compare their performance on your specific dataset.


### Hidden Dimension and Number of Layers

- **`hidden_dim`**: Defines the size of the hidden state in each recurrent layer. Here, 32 units are chosen, which determines the model's capacity to learn representations from the data. A higher `hidden_dim` can allow the model to learn more complex patterns, but it also increases the risk of overfitting, where the model learns the training data too well, including its noise, and generalizes poorly to unseen data.

- **`num_layers`**: Sets the number of stacked recurrent layers in the network. Using 2 layers here means the second layer takes the first layer's hidden states as its input, which can help the model learn hierarchical representations and capture more complex patterns. However, each additional layer increases the model's complexity and computational cost, and can make training harder, both because the model becomes more difficult to optimize and because the risk of overfitting grows.
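To get a feel for how `hidden_dim` and `num_layers` drive model size, and why a GRU is lighter than an LSTM, the sketch below counts trainable parameters. It assumes PyTorch's layout for `nn.LSTM` and `nn.GRU` (an input weight matrix, a recurrent weight matrix, and two bias vectors per gate block), a single input feature (`input_size=1`), and it ignores any output layer placed on top. Other frameworks can differ slightly, for example by using a single bias vector. The function name is hypothetical.

```python
def recurrent_param_count(cell, input_size, hidden_dim, num_layers=1):
    """Trainable parameters of a stacked LSTM/GRU, assuming PyTorch's layout:
    per gate block, an input weight matrix, a recurrent weight matrix and two bias vectors."""
    # LSTM: input, forget, cell-candidate and output blocks; GRU: reset, update and candidate blocks
    gate_blocks = {"lstm": 4, "gru": 3}[cell]
    total = 0
    for layer in range(num_layers):
        # Layers after the first read the previous layer's hidden state
        layer_input = input_size if layer == 0 else hidden_dim
        per_block = hidden_dim * layer_input + hidden_dim * hidden_dim + 2 * hidden_dim
        total += gate_blocks * per_block
    return total


for cell in ("lstm", "gru"):
    for hidden_dim, num_layers in [(32, 1), (32, 2), (64, 2)]:
        n = recurrent_param_count(cell, input_size=1, hidden_dim=hidden_dim, num_layers=num_layers)
        print(f"{cell.upper():4s} hidden_dim={hidden_dim:3d} num_layers={num_layers}: {n:,} parameters")
```

The LSTM/GRU ratio stays at 4:3 because an LSTM has four gate blocks and a GRU three, while doubling `hidden_dim` roughly quadruples the size of every layer, since the recurrent weight matrices are `hidden_dim × hidden_dim`.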

#### Benefits of Higher Hidden Units/Layers:
- Increased model capacity to capture complex patterns and relationships in the data.
- Potential for improved accuracy on complex problem domains.

#### Detriments of Higher Hidden Units/Layers:
- Higher risk of overfitting, especially if the training data is not sufficient to support the increased model complexity.
- Increased computational cost and memory usage, leading to longer training times.
- Potential for training difficulties, including slower convergence and the need for more sophisticated regularization techniques.

#### Benefits of Lower Hidden Units/Layers:
- Reduced risk of overfitting, making the model potentially more generalizable to unseen data.
- Lower computational cost and faster training, which can be particularly beneficial in resource-constrained environments or when rapid prototyping is required.

#### Detriments of Lower Hidden Units/Layers:
- Limited model capacity, which might hinder the model's ability to learn and represent complex patterns in the data.
- Potential underfitting, where the model fails to capture the underlying structure of the data, leading to poor performance on both training and unseen data.

Ultimately, the choice of `hidden_dim` and `num_layers` should be guided by the specific requirements of the task, the complexity of the data, and the available computational resources. Experimentation, along with validation on a separate dataset, is essential to finding the optimal configuration that balances model complexity with generalization ability.


## Comparing Plotly with Matplotlib/Seaborn for Data Visualization

When considering data visualization tools, Plotly and Matplotlib/Seaborn serve overlapping but distinct purposes. The choice between these libraries depends on specific project requirements, including the need for interactivity, the visualization environment, and the target audience.

### Advantages of Plotly:
- **Interactivity**: Plotly excels in creating interactive plots with features like zooming, panning, and tooltips, making it ideal for web applications and interactive reports.
- **Web Integration**: Designed with web integration in mind, Plotly facilitates embedding plots into web pages or Jupyter notebooks.
- **Advanced Chart Types**: Supports a wide range of chart types, including interactive 3D charts, maps, and other advanced visualizations that take considerably more effort to build in Matplotlib/Seaborn, or are not interactive there.

### Advantages of Matplotlib/Seaborn:
- **Publication-Quality Figures**: Widely used for creating static, publication-quality figures suitable for academic papers, reports, and articles.
- **Fine-grained Control**: Offers detailed control over plot aspects, allowing for highly customized visualizations, albeit with potentially complex code.
- **Community and Support**: Older, more established libraries with a large user base and extensive community support, which makes troubleshooting and finding examples easier.

### Considerations:
- **Learning Curve**: Transitioning between Plotly and Matplotlib/Seaborn may require adjusting to different syntaxes and visualization philosophies.
- **Environment and Context**: Matplotlib/Seaborn might be preferred for static reporting or academic publications, while Plotly could be more advantageous for interactive, web-based visualizations.
- **Performance**: For very large datasets, interactive Plotly charts can become slow in the browser, because the data is embedded in the figure and rendered client-side. WebGL-based traces (such as `Scattergl`) or downsampling the data can help.

### Conclusion:
Plotly can replace Matplotlib/Seaborn in many cases, especially those requiring interactivity. However, due to the specific strengths of Matplotlib/Seaborn in static visualization and detailed customization, Plotly might not be a complete replacement. The choice often comes down to the specific needs of your project and the preferences of your team. It's common in data science to use both tools: Plotly for interactive visualizations and Matplotlib/Seaborn for static, publication-quality figures.


# Understanding Prediction Skew in Time Series Forecasting

When the predicted curve appears shifted to the right of the actual values (that is, the predictions lag behind them), and the shift grows as the forecast horizon increases, it often indicates a lag effect in the predictive model. This lag effect can be due to several reasons:

1. **Autocorrelation in the Time Series**: If the series is strongly autocorrelated, meaning each value is close to the previous ones, a model can achieve a low error simply by predicting something close to the most recent observation. It then effectively "chases" the data, producing predictions that are always a step behind the actual values.

2. **Model Reactivity**: The model might react to trends and changes in the data with a delay. This lag is common in models that rely heavily on past values to make future predictions, such as autoregressive models and RNNs fed with lagged inputs.

3. **Insufficient Learning from Short-Term Fluctuations**: If the model does not capture short-term fluctuations well, it might be overly reliant on longer-term trends, which could cause a delay in prediction adjustments relative to actual changes.

4. **Over-Smoothing**: Efforts to reduce noise and make the model more generalizable can sometimes lead to over-smoothing in the data preprocessing or in the model's learning process, resulting in predictions that are too "smooth" compared to the actual values.

5. **Forecast Horizon Impact**: As the forecast horizon increases, so does the uncertainty and complexity of accurately predicting future values. The model has to account for more variables and potential changes over a longer period, which can naturally lead to less accurate and more delayed predictions.
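One way to check whether a model is lagging is to shift the predictions back by a few steps and see which shift lines them up best with the actual values. The sketch below is a simple hand-rolled diagnostic, not a standard library function; the helper name `best_lag` is hypothetical, and only non-negative shifts (predictions trailing the actuals) are tested.

```python
import numpy as np


def best_lag(y_true, y_pred, max_lag=5):
    """Hypothetical diagnostic: the shift k (0..max_lag) that minimizes the MSE
    between y_pred[t + k] and y_true[t]. k > 0 suggests the predictions trail
    the actual values by about k steps, i.e. they look shifted to the right."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = {}
    for k in range(max_lag + 1):
        n = len(y_true) - k
        errors[k] = np.mean((y_pred[k:] - y_true[:n]) ** 2)
    return min(errors, key=errors.get)


# A naive "persistence" forecast (predict the previous value) lags by exactly one step
y_true = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
persistence = np.concatenate(([y_true[0]], y_true[:-1]))
print(best_lag(y_true, persistence, max_lag=3))  # 1
```

If the best shift is 1 and the model's error is close to that of a persistence forecast, the model is most likely "chasing" the series as described in point 1.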

## Addressing Prediction Skew

To mitigate this issue, consider the following approaches:

- **Incorporate More Recent Data**: Adjust the model to give more weight to recent observations, either through feature engineering or by using models that emphasize more recent data, like Exponential Smoothing or certain configurations of Recurrent Neural Networks (RNNs) such as LSTM and GRU.

- **Adjust the Model's Sensitivity**: Experiment with the model's parameters to make it more responsive to short-term changes, either by tuning hyperparameters or modifying the model architecture.

- **Use Ensemble Methods**: Combining multiple models can sometimes help address individual model weaknesses, potentially balancing out the lag.

- **Error Correction Models**: Consider adding an error-correction step that adjusts each new prediction based on the discrepancy between recent predictions and the actual observations.

- **Real-time Data Adjustment**: Implementing a system where predictions are regularly updated as new data become available can help keep forecasts aligned with the most recent trends.
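To illustrate the first suggestion, here is a minimal sketch of simple exponential smoothing, where the smoothing factor `alpha` (assumed to lie in (0, 1]) controls how much weight the most recent observation receives. The function name is hypothetical.

```python
import numpy as np


def simple_exponential_smoothing(y, alpha):
    """Level l_t = alpha * y_t + (1 - alpha) * l_{t-1}, initialized with l_0 = y_0.
    Each level serves as the one-step-ahead forecast for t + 1."""
    y = np.asarray(y, dtype=float)
    level = np.empty_like(y)
    level[0] = y[0]
    for t in range(1, len(y)):
        # Larger alpha puts more weight on the newest observation, so the level reacts faster
        level[t] = alpha * y[t] + (1 - alpha) * level[t - 1]
    return level


print(simple_exponential_smoothing([10.0, 20.0, 20.0], alpha=0.5))
```

The same recursion is available in pandas as `series.ewm(alpha=alpha, adjust=False).mean()`. Note that even `alpha = 1` simply repeats the last observation, so a higher `alpha` makes forecasts more responsive but does not by itself remove a one-step lag.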

Understanding and mitigating the lag in predictions, especially for longer forecast horizons, is crucial for improving model performance and making more accurate future predictions.


# Adding Images to Jupyter Cells

## 1. Using Markdown Cells
You can add images to a Markdown cell in a Jupyter Notebook using the following Markdown syntax:
```markdown
![alt text](URL_or_path_to_image "Optional title")
```

- `alt text` is the alternative text that describes the image (useful for accessibility and when the image cannot be displayed).

- `URL_or_path_to_image` is the URL of the image if it's hosted online, or the file path if the image is stored locally. For local files, the path is relative to the location of the Jupyter Notebook file.

- `"Optional title"` is an optional title that is displayed as a tooltip on hover. It can be omitted.

Example for an online image:

```markdown
![Stock Prediction Graph](https://example.com/path/to/image.png "Stock Prediction")
```

Example for a local image:

```markdown
![Stock Prediction Graph](./images/stock_prediction.png "Stock Prediction")
```

## 2. Using HTML in Markdown Cells

You can also use HTML tags to embed images in Markdown cells, which gives you more control over the image's appearance (size, alignment, etc.):

```html
<img src="URL_or_path_to_image" alt="alt text" title="Optional title" width="300"/>
```

## 3. Using IPython.display in Code Cells


If you want to add images programmatically or as part of executing Python code, you can use the IPython.display module:

```python
from IPython.display import Image
# Displayed inline when this is the last expression in the cell
Image(url="https://example.com/path/to/image.png", width=300, height=200)
```

Or for a local file:

```python
from IPython.display import Image
# Path is relative to the directory the notebook is running from
Image(filename='./images/stock_prediction.png', width=300, height=200)
```

This method is particularly useful when you're generating images or charts programmatically and want to display them inline in your notebook.

### Note on Using Local Images
When using local images, make sure the images are located in a directory that's accessible from the Jupyter Notebook's running directory. A common approach is to place images in a subdirectory (like ./images/) relative to your notebook. This way, you ensure that your notebook remains portable and the images can be accessed correctly as long as the relative structure is maintained.

### Displaying Plots Directly
For plots generated using libraries like matplotlib, seaborn, or plotly, you typically don't need to save and then load the image. Instead, you can display the plot directly in the notebook by creating the plot in a code cell. Most plotting libraries are designed to integrate seamlessly with Jupyter Notebooks, displaying the generated plots inline automatically.


# Statistical Methods to Explore

While RMSE is a popular metric for quantifying the difference between predicted and actual values, especially in regression and forecasting tasks, it primarily measures the magnitude of errors without necessarily providing insights into the direction of errors or how well the model captures trends. Here are other metrics and methods you can use to evaluate the predictive performance of your model more comprehensively:

1. **Mean Absolute Error (MAE)**: This metric provides the average absolute difference between predicted and actual values, offering a more straightforward interpretation than RMSE. It's less sensitive to outliers than RMSE.

    ```python
    from sklearn.metrics import mean_absolute_error
    mae = mean_absolute_error(y_true, y_pred)
    ```

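2. **Mean Absolute Percentage Error (MAPE)**: This expresses the error as a percentage of the actual values, which makes it scale-independent, easy to interpret (especially for business stakeholders), and useful for comparing models or datasets with different scales. It is undefined when any actual value is zero and can blow up when actual values are close to zero.

    ```python
    import numpy as np
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    # Mean of |error| / |actual| as a percentage; undefined if any actual value is 0
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100
    ```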

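3. **R-squared (Coefficient of Determination)**: This measures the proportion of the variance in the actual values that the predictions explain, computed as 1 minus the ratio of the residual sum of squares to the total sum of squares. A score of 1 is a perfect fit, 0 is what always predicting the mean would achieve, and on held-out data the score can be negative.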

    ```python
    from sklearn.metrics import r2_score
    r_squared = r2_score(y_true, y_pred)
    ```

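4. **Explained Variance Score**: This is computed as 1 minus the variance of the residuals divided by the variance of the actual values. It is similar to R-squared, but because it uses the variance of the residuals rather than their sum of squares, it does not penalize a constant bias in the predictions; if the two scores differ noticeably, the predictions are systematically offset.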

    ```python
    from sklearn.metrics import explained_variance_score
    explained_variance = explained_variance_score(y_true, y_pred)
    ```

5. **Directional Accuracy**: Beyond magnitude-based metrics, it's often useful to evaluate whether your model predicts the correct direction of change. This can be calculated as the percentage of times the model correctly predicts the direction of movement (up or down) in the time series.

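    ```python
    import numpy as np
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    # Share of steps where the predicted change has the same sign as the actual change
    directional_accuracy = np.mean(np.sign(np.diff(y_pred)) == np.sign(np.diff(y_true))) * 100
    ```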

6. **Profitability Metrics (if applicable)**: In financial applications, the ultimate goal may be to maximize profit rather than minimize error. Metrics like total return, Sharpe ratio, or drawdown can be more relevant in these contexts.

Remember, no single metric can capture all aspects of a model's performance. It's often beneficial to evaluate models using multiple metrics to gain a comprehensive understanding of their strengths and weaknesses. Additionally, visualizing predictions vs. actual values can provide qualitative insights into how well the model captures trends and reacts to changes in the data.
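A small worked example of why RMSE and MAE can disagree, using made-up numbers: both prediction sets below have the same MAE, but RMSE doubles when the error is concentrated in a single large miss, because squaring weights large errors more heavily. The helper names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


y_true = np.array([10.0, 10.0, 10.0, 10.0])
uniform_errors = np.array([11.0, 11.0, 11.0, 11.0])  # every prediction off by 1
one_big_error = np.array([10.0, 10.0, 10.0, 14.0])   # three perfect, one off by 4

print(mae(y_true, uniform_errors), rmse(y_true, uniform_errors))  # 1.0 1.0
print(mae(y_true, one_big_error), rmse(y_true, one_big_error))    # 1.0 2.0
```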


# Financial metrics to explore

To directly evaluate how much profit or loss a model has generated for you, especially in the context of trading or investment strategies, traditional error metrics like RMSE, MAE, or R-squared might not be as informative. Instead, focusing on **profitability metrics** specifically designed to measure financial performance can give you a clearer picture. Here are a few key metrics:

1. **Total Return**: This metric calculates the total profit or loss generated by your model over a certain period as a percentage of the investment. It includes both capital gains and dividends (if any) in the calculation.

    ```python
    total_return = ((final_portfolio_value - initial_portfolio_value) / initial_portfolio_value) * 100
    ```

2. **Sharpe Ratio**: The Sharpe ratio measures the performance of an investment compared to a risk-free asset, after adjusting for its risk. It's a good way to understand how much excess return you are receiving for the extra volatility that you endure for holding a riskier asset.

    ```python
    sharpe_ratio = (mean_portfolio_return - risk_free_rate) / standard_deviation_of_portfolio_return
    ```

3. **Maximum Drawdown**: This measures the largest peak-to-trough decline in the value of a portfolio before a new peak is reached. It's an indicator of downside risk over a specified time period.

    ```python
    # Drawdown for a single peak/trough pair; the maximum drawdown is the largest such value over the whole history
    drawdown = ((peak_value - trough_value) / peak_value) * 100
    ```

4. **Sortino Ratio**: Similar to the Sharpe ratio but only considers downside volatility. This is useful for investors who are only worried about declines in their investment value.

    ```python
    # downside_deviation: standard deviation computed only from returns below the target (often the risk-free rate)
    sortino_ratio = (mean_portfolio_return - risk_free_rate) / downside_deviation
    ```

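5. **Cumulative Returns**: The cumulative return is the total change in an investment's value over a set period, i.e., the aggregated return on the investment. It is essentially the total return above expressed as a fraction, here computed from prices (so dividends are not included).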

    ```python
    cumulative_returns = (current_price / initial_price) - 1
    ```
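The formulas above can be combined into one helper that works from a series of per-period returns. This is a minimal sketch under several assumptions that are not from the original text: simple (not log) returns, a constant per-period risk-free rate, annualization by the square root of `periods_per_year`, downside deviation measured against the risk-free rate over all periods, and maximum drawdown reported as a positive fraction. The function name is hypothetical.

```python
import numpy as np


def performance_summary(returns, risk_free_rate=0.0, periods_per_year=252):
    """Hypothetical helper: profitability and risk metrics from per-period simple returns.

    returns          : e.g. daily strategy returns (0.01 means +1%)
    risk_free_rate   : risk-free return per period (same frequency as `returns`)
    periods_per_year : used to annualize the ratios (252 assumes daily trading data)
    """
    r = np.asarray(returns, dtype=float)
    excess = r - risk_free_rate

    # Equity curve starting from 1.0, so a loss in the very first period counts as a drawdown
    equity = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    running_peak = np.maximum.accumulate(equity)
    drawdowns = (running_peak - equity) / running_peak

    # Downside deviation: root-mean-square of the negative excess returns (positives count as 0)
    downside_deviation = np.sqrt(np.mean(np.minimum(excess, 0.0) ** 2))

    scale = np.sqrt(periods_per_year)
    return {
        "total_return": equity[-1] - 1.0,
        "sharpe_ratio": excess.mean() / excess.std(ddof=1) * scale,
        "sortino_ratio": excess.mean() / downside_deviation * scale,
        "max_drawdown": drawdowns.max(),
    }


daily_returns = [0.01, -0.01, 0.02, 0.0]  # made-up returns for illustration
print(performance_summary(daily_returns, periods_per_year=1))
```

If a return series has no losing periods, the downside deviation is zero and the Sortino ratio comes out as infinite, so it is worth checking for that case before comparing strategies.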

These metrics are particularly relevant if you're using your model to make investment decisions. Total and cumulative returns measure profitability, while the Sharpe ratio, Sortino ratio, and maximum drawdown also take risk into account, giving a more nuanced view of the model's performance.

It's also essential to backtest your model against historical data to see how it would have performed in the past. This can give you insights into potential future performance and help you understand the risk/return profile of your investment strategy.
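As a starting point for backtesting, here is a minimal sketch of a hypothetical long/flat strategy driven by next-day price predictions: hold the stock when the model predicts a rise, otherwise stay in cash. The prices and forecasts are made up, and the sketch assumes trades at the close with no transaction costs or slippage; the function name is hypothetical.

```python
import numpy as np


def backtest_long_flat(prices, predicted_next):
    """Per-period returns of a hypothetical long/flat strategy.

    prices[t]         : actual closing price on day t
    predicted_next[t] : model forecast of prices[t + 1], made with data up to day t
    Assumes trades happen at the close, with no transaction costs or slippage.
    """
    prices = np.asarray(prices, dtype=float)
    predicted_next = np.asarray(predicted_next, dtype=float)
    # Decide at day t: hold the stock (1) if a rise is predicted, otherwise stay in cash (0)
    position = (predicted_next[:-1] > prices[:-1]).astype(float)
    # Realized return of the stock from day t to day t + 1
    asset_returns = prices[1:] / prices[:-1] - 1.0
    return position * asset_returns


prices = [100.0, 110.0, 99.0, 108.9]
predicted_next = [105.0, 100.0, 105.0, 110.0]  # made-up forecasts; the last one is never used
strategy_returns = backtest_long_flat(prices, predicted_next)
strategy_total = np.prod(1.0 + strategy_returns) - 1.0
buy_and_hold_total = prices[-1] / prices[0] - 1.0
print(f"strategy: {strategy_total:.2%}, buy and hold: {buy_and_hold_total:.2%}")
```

The key requirement is that `predicted_next[t]` is produced only from data available at day `t`; otherwise the backtest leaks future information (look-ahead bias) and overstates performance. The resulting `strategy_returns` can then be fed into the risk metrics described above.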


# Summary of "Stock Price Prediction using Dynamic Neural Networks"

This summary provides an overview of the research paper that explores the application of dynamic neural networks, specifically the Non-linear Autoregressive with Exogenous Inputs (NARX) model, for predicting daily closing stock prices. The study demonstrates the potential of neural networks to identify patterns within chaotic and non-linear data, making them highly suitable for financial market predictions.

## Paper URL
[https://arxiv.org/pdf/2306.12969.pdf](https://arxiv.org/pdf/2306.12969.pdf)

## Key Points

### Introduction and Background
- The paper discusses the capabilities of neural networks in identifying patterns in chaotic and non-linear data, which are common in stock market price movements.
- It contrasts traditional stock analysis methods and the Efficient Market Hypothesis (EMH) with Chaos theory, arguing that the predictive success of neural networks supports Chaos theory and contradicts the EMH.

### Methodology
- **Algorithm**: Utilizes the Levenberg-Marquardt algorithm for efficient training of the neural network.
- **Network Architecture**: Employs a NARX network, known for its accuracy in modeling non-linear time series data. The architecture includes a two-layer feedforward network with specific transfer functions in the hidden and output layers.
- **Data Acquisition**: Uses four years of stock data from finance.yahoo.com, prepared for the network using MATLAB's `preparets` function.
- **Training, Testing, and Validation**: The network is trained on 70% of the data, and the remainder is split equally between validation and testing (15% each). The MSEREG performance function, which adds a weight-regularization term to the mean squared error, is used to improve generalization.

### Results
- **Model Training and Validation**: Demonstrates high efficiency with an RSE of 0.998 and an MSE of 0.0242, indicating a close match between training targets and outputs.
- **Simulation and Prediction**: After training, the network is converted to a closed-loop (parallel) network to predict target prices for 100 days, showing minimal errors with a maximum error difference of 1.17%.

### Conclusion
- The authors conclude that neural networks can effectively predict stock prices by learning patterns in large datasets, which they take as support for Chaos theory over the EMH. They argue that such networks minimize the apparent randomness in prices and can make accurate predictions beyond human capabilities.

### Recommendations
- Suggests preprocessing input data using techniques like Independent Component Analysis or filtering the input data using an extended Kalman filter for improved predictions.

## Exogenous Inputs
The exogenous (external) inputs for the NARX model were:
- Opening price
- High price
- Low price
- Volume

These inputs, along with the history of the primary target (the closing price), let the model draw on both the closing-price series itself and the other daily price and volume series.

This comprehensive study showcases the potential of neural networks in financial market predictions, offering valuable insights and suggesting further research avenues to enhance prediction accuracy.


# Non-linear Autoregressive with Exogenous Inputs (NARX) Model

The **Non-linear Autoregressive with Exogenous Inputs (NARX) model** is a type of neural network used for time series forecasting, designed to handle situations where the future value of a variable not only depends on its own previous values but also on external inputs. This makes the NARX model particularly useful in scenarios where external factors significantly influence the system you're trying to model.

## Structure
The NARX model combines non-linear autoregression with exogenous (external) inputs, which lets it capture complex, possibly non-linear relationships that would not be visible from the target series alone.

A NARX model is defined by the following equation:
$$ y(t+1) = f(y(t), y(t-1), ..., y(t-n_y), x(t), x(t-1), ..., x(t-n_x)) $$

where:
- $y(t+1)$ is the predicted value of the series at time $t+1$,
- $f$ is a non-linear function (often realized through a neural network),
- $y(t), y(t-1), ..., y(t-n_y)$ are past values of the target series up to lag $n_y$,
- $x(t), x(t-1), ..., x(t-n_x)$ are past values of the exogenous inputs up to lag $n_x$.
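The equation above maps directly onto a supervised-learning dataset: each row holds the lagged target and exogenous values, and the label is the next target value. A minimal numpy sketch, assuming the target and exogenous series are aligned on the same time index; the function name is hypothetical. Any regressor (for example a small feedforward network) can then play the role of $f$.

```python
import numpy as np


def narx_design_matrix(y, x, n_y, n_x):
    """Build (features, targets) for a one-step-ahead NARX model.

    The row for time t holds [y(t), y(t-1), ..., y(t-n_y), x(t), ..., x(t-n_x)]
    and its target is y(t+1). x may be 1-D or shaped (T, num_exogenous).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a single exogenous series as one column
    start = max(n_y, n_x)  # first t with a full lag history
    rows, targets = [], []
    for t in range(start, len(y) - 1):
        y_lags = y[t - n_y:t + 1][::-1]          # y(t), ..., y(t-n_y)
        x_lags = x[t - n_x:t + 1][::-1].ravel()  # x(t) for every input, then x(t-1), ...
        rows.append(np.concatenate([y_lags, x_lags]))
        targets.append(y[t + 1])
    return np.array(rows), np.array(targets)


y = np.arange(6.0)           # target series y(0..5)
x = np.arange(10.0, 16.0)    # one exogenous series x(0..5)
features, targets = narx_design_matrix(y, x, n_y=1, n_x=0)
print(features[0], targets[0])  # row for t = 1: [y(1), y(0), x(1)] with target y(2)
```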

## Implementation
This model can be implemented using various types of neural networks, including feedforward neural networks, recurrent neural networks (RNNs), and more sophisticated architectures like Long Short-Term Memory (LSTM) networks, depending on the complexity of the time series and the task at hand.
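Once $f$ is trained, multi-step forecasts are typically made in closed-loop (parallel) mode: each prediction is fed back as the newest $y$ lag for the next step. A minimal sketch under stated assumptions: `f` is any fitted one-step model that takes the feature vector `[y(t), ..., y(t-n_y), x(t), ..., x(t-n_x)]`, and future exogenous values must be supplied by the caller. For stock data, where the next day's open, high, low, and volume are not known in advance, those values would themselves have to be forecast or assumed. The function name is hypothetical.

```python
import numpy as np


def closed_loop_forecast(f, y_history, x_history, x_future, n_y, n_x, steps):
    """Multi-step NARX forecast that feeds each prediction back in as a y lag.

    y_history, x_history : observed values up to the forecast origin t0 (same length)
    x_future             : exogenous values for t0+1, ..., t0+steps-1 (assumed known)
    """
    y = [float(v) for v in y_history]
    x = [np.atleast_1d(np.asarray(v, dtype=float)) for v in x_history]
    preds = []
    for k in range(steps):
        if k > 0:
            # Exogenous value at the newly reached time step
            x.append(np.atleast_1d(np.asarray(x_future[k - 1], dtype=float)))
        features = np.concatenate([np.array(y[-(n_y + 1):][::-1]),
                                   np.concatenate(x[-(n_x + 1):][::-1])])
        y_next = float(f(features))
        preds.append(y_next)
        y.append(y_next)  # feedback: the prediction becomes the newest y lag
    return np.array(preds)


def toy_f(features):
    # Stand-in for a trained model: next value = current y + current x
    return features[0] + features[1]


print(closed_loop_forecast(toy_f, y_history=[1.0], x_history=[1.0],
                           x_future=[10.0], n_y=0, n_x=0, steps=2))
```

Because errors are fed back into later inputs, closed-loop forecasts usually degrade faster with the horizon than one-step (open-loop) predictions, which is one reason to report accuracy per horizon.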

## Applications
NARX models have been successfully applied in various fields such as finance, meteorology, and engineering, where forecasting accuracy is crucial and the influencing factors are not solely internal to the series being forecasted.
