Skip to content

lxbanov/big_data_project

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

TESLA Option Prediction Pipeline

This repository contains a pipeline developed to build regressor models and predict strike prices of options. The pipeline consists of data preprocessing, data loading, model construction and visualization. We used a Hadoop ecosystem: SparkML, HIVE, Sqoop, PostgreSQL, Streamlit to create this pipeline.

Dataset

The dataset used comprises of several millions options published on the market in the period from 2019 to 2022. The description of the dataset can be found here: https://www.optionsdx.com/option-chain-field-definitions/. And the source dataset is available here: https://www.optionsdx.com/option-chain-field-definitions/.

Pipeline

To run pipeline you might run the following script:

bash main.sh

That will run the whole pipeline stage by stage. Intermediate outputs will be available in the folder output. After running the whole pipeline a Streamlit application would be run and available for you to explore insights and results

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •