TESLA Option Prediction Pipeline

This repository contains a pipeline that builds regression models to predict the strike prices of options. The pipeline consists of data preprocessing, data loading, model construction and visualization. It is built with Hadoop ecosystem tools (Spark ML, Hive, Sqoop) together with PostgreSQL and Streamlit.

Dataset

The dataset comprises several million options published on the market between 2019 and 2022. The dataset fields are described here: https://www.optionsdx.com/option-chain-field-definitions/. The source dataset is available here: https://www.optionsdx.com/option-chain-field-definitions/.

Pipeline

To run the pipeline, execute the following script:

bash main.sh

This runs the whole pipeline stage by stage. Intermediate outputs are written to the output folder. Once the pipeline has finished, a Streamlit application starts so that you can explore the insights and results.
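For readers who want a picture of what a stage-by-stage driver can look like, here is a minimal sketch. It is not the contents of main.sh: the `run_stage` helper, the stage names (`preprocess`, `load`, `train`) and the `echo` placeholder commands are all assumptions made for illustration. The only detail taken from this README is that intermediate outputs go to the output folder.

```shell
#!/usr/bin/env bash
# Sketch of a stage-by-stage driver. Stage names and commands are
# hypothetical placeholders, not the repository's actual main.sh.
set -euo pipefail

OUTPUT_DIR="${OUTPUT_DIR:-output}"   # folder for intermediate outputs
mkdir -p "$OUTPUT_DIR"

# Run one named stage, capture its output in a per-stage log file,
# and report failure so the pipeline stops at the first broken stage.
run_stage() {
    local name="$1"
    shift
    echo "[stage] $name: started"
    if "$@" > "$OUTPUT_DIR/$name.log" 2>&1; then
        echo "[stage] $name: done"
    else
        echo "[stage] $name: failed, see $OUTPUT_DIR/$name.log" >&2
        return 1
    fi
}

# Placeholder commands; a real driver would call the project's own scripts here.
run_stage preprocess echo "preprocessing placeholder"
run_stage load       echo "loading placeholder"
run_stage train      echo "training placeholder"
```

Because of `set -e`, a stage that returns a non-zero status ends the run immediately, so later stages never operate on missing intermediate outputs. Each stage's log stays in the output folder for inspection.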

