This project analyzes high-frequency trading data to identify patterns, anomalies, and insights. The analysis covers data preprocessing, exploratory data analysis (EDA), model training, impact analysis, and visualization, and uses the Python libraries Pandas, Matplotlib, Seaborn, and Scikit-learn. Note that data from the Alpha Vantage API is very unreliable, as seen in the AAPL example in prediction.ipynb. That notebook is only meant as an example of how the prediction function can be used.
- Fetch Data: Use the Alpha Vantage API to fetch intraday trading data for multiple companies.
- Load and Preprocess Data: Convert date columns to datetime, normalize closing prices, and create lagged features.
- Combine Data: Combine data from multiple companies into a single dataset for training.
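The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names `date` and `close`, the helper `preprocess`, the min-max normalization, the number of lags, and the toy tickers `AAA` and `BBB` are all assumptions.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, symbol: str, n_lags: int = 2) -> pd.DataFrame:
    """Parse dates, normalize closing prices, and add lagged features."""
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out = out.sort_values("date").reset_index(drop=True)

    # Min-max scaling per symbol (an assumption; another scheme may be used).
    lo, hi = out["close"].min(), out["close"].max()
    out["close_norm"] = (out["close"] - lo) / (hi - lo) if hi > lo else 0.0

    # Lagged features: the normalized close from k rows earlier.
    for k in range(1, n_lags + 1):
        out[f"close_lag_{k}"] = out["close_norm"].shift(k)

    out["symbol"] = symbol
    # The first n_lags rows have no complete history, so drop them.
    return out.dropna().reset_index(drop=True)


# Toy stand-ins for the per-company data (values are made up).
raw = {
    "AAA": pd.DataFrame({
        "date": ["2024-01-02 09:35", "2024-01-02 09:30", "2024-01-02 09:40",
                 "2024-01-02 09:45", "2024-01-02 09:50"],
        "close": [101.0, 100.0, 103.0, 102.0, 104.0],
    }),
    "BBB": pd.DataFrame({
        "date": ["2024-01-02 09:30", "2024-01-02 09:35", "2024-01-02 09:40",
                 "2024-01-02 09:45", "2024-01-02 09:50"],
        "close": [50.0, 52.0, 51.0, 54.0, 53.0],
    }),
}

# Combine all companies into a single training dataset.
combined = pd.concat(
    [preprocess(frame, symbol) for symbol, frame in raw.items()],
    ignore_index=True,
)
print(combined)
```

Normalizing each symbol separately keeps companies with very different price levels on a comparable scale before they are combined.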
- Basic Statistics: Calculate and display basic statistics for the data.
- Time Series Plots: Plot the raw closing prices and trading volumes for each symbol.
- Distribution Analysis: Analyze the distribution of closing prices and trading volumes.
- Correlation Analysis: Calculate and visualize the correlation matrix for each symbol.
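As a rough illustration of the statistics and correlation steps above, the sketch below computes per-symbol summaries and correlation matrices with Pandas. The helpers `summarize` and `correlation_by_symbol`, the column names, and the toy data are assumptions made for this example, not taken from EDA.ipynb.

```python
import pandas as pd


def summarize(df: pd.DataFrame, cols=("close", "volume")) -> pd.DataFrame:
    """Basic statistics (count, mean, std, quartiles, ...) per symbol."""
    return df.groupby("symbol")[list(cols)].describe()


def correlation_by_symbol(df: pd.DataFrame, cols=("close", "volume")) -> dict:
    """One correlation matrix per symbol, keyed by ticker."""
    return {sym: group[list(cols)].corr() for sym, group in df.groupby("symbol")}


# Toy data: AAA has volume rising with price, BBB has it falling.
data = pd.DataFrame({
    "symbol": ["AAA"] * 4 + ["BBB"] * 4,
    "close": [1.0, 2.0, 3.0, 4.0, 1.0, 2.0, 3.0, 4.0],
    "volume": [10.0, 20.0, 30.0, 40.0, 40.0, 30.0, 20.0, 10.0],
})

stats = summarize(data)
correlations = correlation_by_symbol(data)
print(stats)
print(correlations["AAA"])
```

Each matrix in `correlations` could then be passed to a plotting call such as `seaborn.heatmap` for the visualization step.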
- Train-Test Split: Split the combined data into training and testing sets.
- Train Models: Train multiple models, including Linear Regression, Decision Tree, Random Forest, and Gradient Boosting.
- Evaluate Models: Evaluate the performance of each model on the test set.
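The training and evaluation steps above might look roughly like this with Scikit-learn. The synthetic features, the 80/20 split, the choice of `shuffle=False` (to keep time order), the hyperparameters, and the metrics (MSE and R²) are assumptions for the sketch; model_training.ipynb may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the combined dataset: 3 lagged features, 1 target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = X @ np.array([0.6, 0.3, 0.1]) + rng.normal(scale=0.01, size=300)

# shuffle=False keeps the rows in order, so the test set is the latest 20%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
}

# Fit each model and score it on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: MSE={scores['mse']:.5f}, R2={scores['r2']:.3f}")
```

For time-ordered data, an unshuffled split avoids training on rows that come after the ones being tested.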
- Volatility Analysis: Calculate and visualize the rolling volatility of the stock prices.
- Price Change and Volume Relationship: Analyze the relationship between price changes and trading volume.
- Feature Importance: Identify and visualize the importance of different features in the model.
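A minimal sketch of the three impact-analysis steps above, under stated assumptions: volatility is taken as the rolling standard deviation of percentage returns, the price/volume relationship is summarized as a single correlation, and feature importance comes from a random forest's `feature_importances_`. The helper names, window size, and synthetic data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def rolling_volatility(close: pd.Series, window: int = 5) -> pd.Series:
    """Rolling standard deviation of period-over-period returns."""
    return close.pct_change().rolling(window).std()


def price_volume_correlation(df: pd.DataFrame) -> float:
    """Correlation between absolute price change and trading volume."""
    return df["close"].pct_change().abs().corr(df["volume"])


def feature_importance(X: pd.DataFrame, y: pd.Series) -> pd.Series:
    """Impurity-based importances from a random forest, largest first."""
    model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    return pd.Series(model.feature_importances_, index=X.columns).sort_values(
        ascending=False
    )


# Synthetic random-walk prices and random volumes (made-up data).
rng = np.random.default_rng(0)
n = 120
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, n)))
volume = pd.Series(rng.integers(1_000, 5_000, n).astype(float))
prices = pd.DataFrame({"close": close, "volume": volume})

volatility = rolling_volatility(prices["close"], window=5)
pv_corr = price_volume_correlation(prices)

# One informative feature (previous close) and one pure-noise feature.
features = pd.DataFrame(
    {"close_lag_1": close.shift(1), "noise": rng.normal(size=n)}
).dropna()
importance = feature_importance(features, close.loc[features.index])
print(importance)
```

The first `window` volatility values are missing because a full window of returns is not yet available.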
To predict tomorrow's closing price for a specific company, follow these steps:
- Prepare the Latest Data: Ensure you have the most recent trading data available for the specific company.
- Train the Model: Follow the steps in the model_training.ipynb notebook to train the model on historical data for the specific company.
- Make Predictions: Use the trained model to predict the next day's closing price based on the most recent data.
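The prediction steps above can be illustrated with a small sketch. It assumes one closing price per trading day, a plain linear regression on the last few closes, and hypothetical helpers `make_lagged` and `predict_next_close`; the actual prediction function in prediction.ipynb may work differently.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def make_lagged(close: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    """Build a table where each row holds the previous n_lags closes."""
    frame = pd.DataFrame(
        {f"lag_{k}": close.shift(k) for k in range(1, n_lags + 1)}
    )
    frame["target"] = close
    return frame.dropna()


def predict_next_close(close: pd.Series, n_lags: int = 3) -> float:
    """Fit on the history, then predict the close after the last row."""
    columns = [f"lag_{k}" for k in range(1, n_lags + 1)]
    data = make_lagged(close, n_lags)
    model = LinearRegression().fit(data[columns], data["target"])

    # Most recent closes, newest first, so lag_1 is the latest close.
    latest = pd.DataFrame(
        [close.iloc[-n_lags:][::-1].to_numpy()], columns=columns
    )
    return float(model.predict(latest)[0])


# Made-up daily closes that rise by exactly 1.0 per day.
history = pd.Series(np.arange(100.0, 130.0))
prediction = predict_next_close(history)
print(f"Predicted next close: {prediction:.2f}")
```

On real market data the prediction will be far less clean than on this artificial trend, which is consistent with the reliability caveat given for the AAPL example.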
- Clone the repository.
- Install the required packages.
- Run fetch_data.py:
python scripts/fetch_data.py
(Note that you can change the symbols to choose which companies' data is used to train the model. Make sure that the symbols in fetch_data.py contain all company tickers, that data_preprocessing.ipynb contains all training company tickers, and that model_training.ipynb contains the one testing company ticker.)
- Run the notebooks in the following order:
  - data_preprocessing.ipynb
  - EDA.ipynb
  - model_training.ipynb
  - impact_analysis.ipynb
- Run the prediction notebook if desired:
  - prediction.ipynb
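For readers curious about what the fetch step involves, the following is a hypothetical sketch of requesting intraday data from Alpha Vantage and turning the JSON response into a table. It is not the contents of scripts/fetch_data.py: the helper names, the 5-minute interval, and the sample payload are assumptions, and `fetch_intraday` needs a valid API key and network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

import pandas as pd

BASE_URL = "https://www.alphavantage.co/query"


def build_url(symbol: str, api_key: str, interval: str = "5min") -> str:
    """Build the query URL for the TIME_SERIES_INTRADAY endpoint."""
    params = {
        "function": "TIME_SERIES_INTRADAY",
        "symbol": symbol,
        "interval": interval,
        "apikey": api_key,
    }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_intraday(payload: dict, interval: str = "5min") -> pd.DataFrame:
    """Convert the JSON payload into a DataFrame sorted by timestamp."""
    key = f"Time Series ({interval})"
    if key not in payload:
        # Error or rate-limit responses do not contain the time series.
        raise ValueError(f"unexpected response keys: {list(payload)}")
    frame = pd.DataFrame.from_dict(payload[key], orient="index")
    # Field names look like "4. close"; keep only the part after the number.
    frame = frame.rename(columns=lambda c: c.split(". ", 1)[1]).astype(float)
    frame.index = pd.to_datetime(frame.index)
    frame.index.name = "date"
    return frame.sort_index().reset_index()


def fetch_intraday(symbol: str, api_key: str, interval: str = "5min") -> pd.DataFrame:
    """Download and parse intraday data (requires network access)."""
    with urlopen(build_url(symbol, api_key, interval), timeout=30) as resp:
        return parse_intraday(json.load(resp), interval)


# Made-up payload in the shape of an intraday response, for offline use.
sample = {
    "Meta Data": {"2. Symbol": "AAA"},
    "Time Series (5min)": {
        "2024-01-02 09:35:00": {"1. open": "100.5", "2. high": "101.0",
                                "3. low": "100.2", "4. close": "100.8",
                                "5. volume": "1200"},
        "2024-01-02 09:30:00": {"1. open": "100.0", "2. high": "100.6",
                                "3. low": "99.9", "4. close": "100.5",
                                "5. volume": "1500"},
    },
}
frame = parse_intraday(sample)
print(frame)
```

Looping `fetch_intraday` over a list of tickers would give one table per company. Raising on an unexpected payload makes unreliable responses visible early instead of silently producing empty data.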
This project provides a comprehensive analysis of high-frequency trading data, including data preprocessing, EDA, model training, impact analysis, and visualization. The insights gained from this analysis can help in understanding market behavior and making informed trading decisions.