---
title: "Using TensorFlow for Stock Price Prediction: A Practical Guide"
author: "Iván de Luna-Aldape"
date: "3/9/2025"
categories:
    - machine-learning
    - tutorial
freeze: true
draft: true
---

## Introduction to TensorFlow

TensorFlow is an open-source machine learning framework developed by Google. It is widely used for building and training machine learning models, particularly deep learning models. TensorFlow provides a comprehensive ecosystem of tools, libraries, and community resources that make it easier to develop and deploy machine learning solutions.

### Key Features of TensorFlow

- **Flexibility**: TensorFlow supports both high-level APIs (like Keras) and low-level operations, making it suitable for beginners and experts alike.
- **Scalability**: It can run on CPUs, GPUs, and even distributed systems for large-scale training.
- **Ecosystem**: TensorFlow offers extensions like TensorFlow Lite (for mobile devices), TensorFlow.js (for browser-based applications), and TensorFlow Extended (for production pipelines).

### Applications of TensorFlow

TensorFlow is used in a wide range of domains, including:

- **Computer Vision**: Image classification, object detection, and facial recognition.
- **Natural Language Processing (NLP)**: Text generation, sentiment analysis, and language translation.
- **Finance and Business**: Stock price prediction, fraud detection, customer churn analysis, and sales forecasting.
- **Healthcare**: Disease diagnosis, medical image analysis, and drug discovery.

## Stock Price Prediction Using TensorFlow

Stock price prediction is a classic machine learning example, and a challenging one, because financial markets are volatile and hard to predict. Nevertheless, deep learning models such as **Long Short-Term Memory (LSTM) networks** can capture temporal patterns in historical data and use them to make informed predictions.

### Step 1: Downloading Data from Yahoo Finance

To get started, we will use historical stock price data from Yahoo Finance. The `yfinance` library in Python makes it easy to download this data.

Data downloaded with `yfinance` includes fields such as `Date`, `Open`, `High`, `Low`, `Close`, and `Volume`. For forecasting, the `Close` values are usually the ones used.

In [1]:
import yfinance as yf

# Download historical stock data for Apple Inc. (AAPL)
data = yf.download('AAPL', start='2010-01-01', end='2025-01-01')


If `yfinance` raises an error about failing to download the requested data, try upgrading the library by running the following command in a terminal:

```bash
pip install --upgrade yfinance
```

### Step 2: Preprocessing the Data

Before feeding the data into a model, we need to preprocess it. This involves the following actions:

- **Normalization**: Scaling the data to a range of 0 to 1 to improve model performance.
- **Sequence Creation**: Creating input-output pairs where the input is a sequence of historical prices, and the output is the next day's price.
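
As a rough illustration of the normalization step, min-max scaling maps each value `x` to `(x - min) / (max - min)`. The sketch below applies that formula with plain NumPy to a few made-up prices. The values and the helper names `min_max_scale` and `min_max_inverse` are illustrative assumptions, not part of this tutorial's pipeline; the goal is only to show what a 0-to-1 scaling does and how it can be reversed.

```python
import numpy as np

def min_max_scale(values):
    """Scale values to [0, 1]; also return the min and max that were used."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi

def min_max_inverse(scaled, lo, hi):
    """Map scaled values back to the original range."""
    return scaled * (hi - lo) + lo

# Made-up closing prices, for illustration only
toy_prices = np.array([10.0, 12.0, 15.0, 20.0])

toy_scaled, lo, hi = min_max_scale(toy_prices)
toy_restored = min_max_inverse(toy_scaled, lo, hi)

print(toy_scaled)    # smallest price maps to 0, largest to 1
print(toy_restored)  # matches the original prices
```

The inverse mapping is what later lets predictions be converted back into price units.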

To normalize the data and create sequences, we first extract the closing prices and convert them into a 2D array with a single column using the `reshape()` method, which is the shape `MinMaxScaler` expects.

In [2]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Load the data
prices = data['Close'].values.reshape(-1,1)


Next, we normalize the data with `MinMaxScaler`, scaling the closing prices to a range of 0 to 1. This matters because LSTMs and other neural networks generally train better on normalized data.

In [4]:
# Normalize the data
scaler = MinMaxScaler(feature_range=(0,1))
# Skip the first two rows: when the data is loaded, the first value is the
# ticker name ('AAPL') and the second is null (missing).
scaled_prices = scaler.fit_transform(prices[2:])


Now we define a function, `create_sequences`, to create input-output pairs for the LSTM model.

For each day, the input `X` is a sequence of the previous `seq_length` days' prices, and the output `y` is the price of the next day. For example, with `seq_length = 60`, the model uses 60 days of historical data to predict the 61st day's price.

Finally, we split the data into training and testing sets, with 80% of the data used for training and 20% for testing.

In [5]:
# Create sequences for LSTM
def create_sequences(data, seq_length):
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i+seq_length])
        y.append(data[i+seq_length])
    return np.array(X), np.array(y)

# Use 60 days of data to predict the next day
seq_length = 60
X, y = create_sequences(scaled_prices, seq_length)

# Split into training and testing sets
split = int(0.8 * len(X))
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
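
To make the windowing concrete, the sketch below runs the same `create_sequences` logic on a toy column of six values with a window of 3 instead of 60. The toy array and the names `toy`, `X_toy`, and `y_toy` are assumptions made for illustration; only the function body mirrors the one above.

```python
import numpy as np

def create_sequences(data, seq_length):
    """Build (window, next value) pairs from a 2D column of values."""
    X, y = [], []
    for i in range(len(data) - seq_length):
        X.append(data[i:i + seq_length])   # seq_length consecutive values
        y.append(data[i + seq_length])     # the value right after the window
    return np.array(X), np.array(y)

# Toy "prices" 0, 1, ..., 5 as a single column, like the scaled prices
toy = np.arange(6, dtype=float).reshape(-1, 1)
X_toy, y_toy = create_sequences(toy, seq_length=3)

print(X_toy.shape, y_toy.shape)       # (samples, seq_length, 1) and (samples, 1)
print(X_toy[0].ravel(), y_toy[0])     # first window and its target
```

Six values with a window of 3 yield three samples: the first window holds 0, 1, 2 and its target is 3. The trailing dimension of 1 is the single feature per time step, which is the layout an LSTM input of shape `(seq_length, 1)` expects.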

### Step 3: Building and Training the LSTM Model

LSTM networks are a type of Recurrent Neural Network (RNN) well suited to time-series data, because their gated memory cells can retain information across many time steps.

1. **Model Architecture**

We use the `Sequential` API to build the model layer by layer. The first LSTM layer has 50 units and returns sequences, which is necessary when stacking LSTM layers.

The second LSTM layer also has 50 units but does not return sequences.

Finally, two dense layers are added to produce the final output. The last dense layer has 1 unit, which corresponds to the predicted stock price.

In [None]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# Build the LSTM Model
model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(seq_length, 1)),
    LSTM(50, return_sequences=False),
    Dense(25),
    Dense(1)
])
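
To get a feel for the size of this architecture, the following back-of-the-envelope sketch counts trainable parameters by hand. It assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and a bias) and dense layers with biases; the helper names `lstm_params` and `dense_params` are hypothetical, and the result should be checked against `model.summary()` rather than taken as authoritative.

```python
def lstm_params(units, input_dim):
    """Parameters of an LSTM layer: 4 gates x (input + recurrent weights + bias)."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(units, input_dim):
    """Parameters of a dense layer: weight matrix plus one bias per unit."""
    return input_dim * units + units

layer_params = {
    "lstm_1": lstm_params(50, 1),     # input is 1 feature per time step
    "lstm_2": lstm_params(50, 50),    # input is the 50 outputs of lstm_1
    "dense_1": dense_params(25, 50),
    "dense_2": dense_params(1, 25),
}
total_params = sum(layer_params.values())

print(layer_params)
print(total_params)
```

Under these assumptions, most of the parameters sit in the second LSTM layer, since its input is 50-dimensional rather than 1-dimensional.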

2. **Compilation**

We compile the model using the Adam optimizer, which is a popular choice for training neural networks.

The loss function is set to `mean_squared_error`, which measures the difference between the predicted and actual stock prices.

In [1]:
# Compile the model
model.compile(optimizer='adam', loss='mean_squared_error')
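
For intuition about the loss, mean squared error is the average of the squared differences between actual and predicted values. The sketch below computes it with NumPy on three made-up scaled prices; the function and the toy numbers are assumptions for illustration, not the Keras implementation.

```python
import numpy as np

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

# Made-up scaled prices, for illustration only
toy_true = [0.50, 0.60, 0.70]
toy_pred = [0.40, 0.60, 0.90]

# Errors are 0.1, 0.0, and -0.2, so the squared errors are 0.01, 0.0, and 0.04
toy_mse = mean_squared_error(toy_true, toy_pred)
print(toy_mse)
```

Because errors are squared, the single 0.2 miss contributes four times as much as the 0.1 miss, so large errors dominate the loss.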

3. **Training**

The model is trained on the training data (`X_train` and `y_train`) for 20 epochs. The batch size is set to 32, meaning the model updates its weights after processing each batch of 32 samples. Lastly, the `validation_data` parameter lets us evaluate the model on the test set after each epoch.

In [None]:
# Train the model
model.fit(X_train, y_train, batch_size=32, epochs=20, validation_data=(X_test, y_test))
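
As a quick sanity check on what these settings imply, the sketch below estimates how many weight updates happen during training. The sample count of 3000 is a hypothetical figure chosen for illustration (the real number depends on the downloaded data), and the calculation assumes the last, partially filled batch of each epoch still triggers an update.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of batches (and so weight updates) in one pass over the data."""
    return math.ceil(n_samples / batch_size)

n_train_example = 3000   # hypothetical number of training sequences
batch_size = 32
epochs = 20

steps = updates_per_epoch(n_train_example, batch_size)
total_updates = steps * epochs

print(steps, total_updates)
```

With these assumed numbers, each epoch consists of 94 updates, for 1880 updates over the full run.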

### Step 4: Making Predictions

Once the model is trained, we can use it to predict future stock prices and visualize the results.

In [None]:
# Make predictions
predictions = model.predict(X_test)
# Reverse normalization
predictions = scaler.inverse_transform(predictions)
# Reverse normalization for actual values
y_test_actual = scaler.inverse_transform(y_test)

In [None]:
# Visualize the results
import matplotlib.pyplot as plt

plt.figure(figsize=(14,5))
plt.plot(y_test_actual, color = 'blue', label = 'Actual Stock Price')
plt.plot(predictions, color='red', label='Predicted Stock Price')
plt.title('Stock Price Prediction')
plt.xlabel('Time')
plt.ylabel('Stock Price')
plt.legend()
plt.show()
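
A plot alone can make predictions look better than they are, so it may help to put a number on the error and compare it with a simple baseline. The sketch below computes the root mean squared error (RMSE) in price units and compares it with a naive forecast that repeats the previous day's price. The arrays are made-up stand-ins for `y_test_actual` and `predictions`, and the `rmse` helper is an assumption for illustration, not part of the tutorial's code.

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error between two arrays of prices."""
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

# Made-up prices standing in for y_test_actual and predictions
toy_actual = np.array([100.0, 102.0, 101.0, 105.0, 107.0])
toy_predicted = np.array([99.0, 101.0, 103.0, 104.0, 108.0])

# Score the model on days 1..n-1 so both forecasts cover the same days
model_rmse = rmse(toy_actual[1:], toy_predicted[1:])

# Naive baseline: each day's forecast is simply the previous day's actual price
naive_rmse = rmse(toy_actual[1:], toy_actual[:-1])

print(model_rmse, naive_rmse)
```

If the model's RMSE on real data is not clearly below the naive baseline, the predicted curve may be doing little more than tracking yesterday's price.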

### Considerations and Key Insights