The project aims to build a big data pipeline for our business idea Stock Sense, which combines stock market data with social media data to give investors a better sense of their investments. The detailed idea can be seen here - Stock Sense Slides
To install the requirements, run
pip install -r requirements.txt
The Data Pipeline:
------------------------------------------------------------------------------------------------------------------
The high-level view of the Phase 1 data pipeline is shown below -
1. Twitter API
Set up a Twitter developer account to access the API keys. This can be done from create developer account
Detailed Video - How to setup Tweet
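To sanity-check the keys before wiring them into the pipeline, a minimal sketch using the tweepy library could look like the following (tweepy and the sample query are our illustration, not part of this repo):

import tweepy

# Hypothetical placeholder - paste the bearer token from your developer account.
BEARER_TOKEN = "<your-bearer-token>"

client = tweepy.Client(bearer_token=BEARER_TOKEN)
# Pull a few recent tweets about one NSE company as a smoke test.
resp = client.search_recent_tweets(query="Reliance Industries", max_results=10)
for tweet in resp.data or []:
    print(tweet.text)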
2. Stock API
Clone the repo
https://github.com/vsjha18/nsetools
nsetools provides real-time stock market data for the National Stock Exchange of India. Further details can be found here - nsedocs
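As a quick usage example, a live quote lookup with nsetools (the symbol 'infy' is just an illustration):

from nsetools import Nse

nse = Nse()
quote = nse.get_quote('infy')   # Infosys, as an example symbol
print(quote['lastPrice'])       # last traded price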
3. Companies List
We used the top 25 companies listed here in NSE Top 100. We have provided a CSV file with the top 25 companies listed on the NSE.
The granularity can be adapted when scaling the project.
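As an illustration of how the list plugs into the pipeline, the CSV can be read with pandas and fed to nsetools; the file name and column name below are assumptions, not the repo's actual ones:

import pandas as pd
from nsetools import Nse

companies = pd.read_csv('nse_top25.csv')   # hypothetical file name
nse = Nse()
for symbol in companies['Symbol']:         # hypothetical column name
    print(symbol, nse.get_quote(symbol)['lastPrice'])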
We are using the databases provided in the Virtual Machine supplied by UPC on its cloud systems, so the databases are already set up in the VM. We use the following databases in our pipeline.
To start the server on the VM, use
/home/bdm/BDM_Software/hadoop/sbin/start-dfs.sh
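To verify that HDFS came up, you can list its root directory (the bin path here is an assumption, mirroring the sbin path above):

/home/bdm/BDM_Software/hadoop/bin/hdfs dfs -ls /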
a. Modify the libcurl package using the commands below
sudo apt-get remove libcurl4
sudo apt-get install libcurl3
b. Open a tmux session in detached mode and keep the MongoDB server running in the background
tmux new -s mongodb
BDM_Software/mongodb/bin/mongod --bind_ip_all --dbpath /home/bdm/BDM_Software/data/mongodb_data/
More on tmux sessions - Tmux
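To confirm the server is accepting connections, a quick ping with pymongo (assuming the default port 27017):

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
print(client.admin.command('ping'))   # {'ok': 1.0} means the server is up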
1. Clone the repo
git clone https://github.com/himanshudce/sense-stock-bdm
2. Run the files below to fetch the data locally
a. For tweets (place your API keys)
python3 tw_load_local.py
** For ease, I have provided my API keys for tweets
b. For Stock data
python3 st_load_local.py
After loading, the following file structure is produced -
3. Load the data into MongoDB and HDFS (a sketch of both loaders follows the commands)
a. To load the tweet data into MongoDB
python3 tw_load_mongo.py
b. To load the stock data into HDFS
python3 st_load_hdfs.py
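Both loaders boil down to standard client calls; the sketch below shows the general shape, with database, collection, file, and path names as assumptions rather than the repo's actual choices:

import json
from pymongo import MongoClient
from hdfs import InsecureClient   # 'hdfs' (hdfscli) package, assumed installed on the VM

# a. Insert the locally collected tweets into MongoDB.
mongo = MongoClient('mongodb://localhost:27017/')
with open('tweets.json') as f:                          # hypothetical local dump file
    mongo['stocksense']['tweets'].insert_many(json.load(f))

# b. Upload the local stock CSV into HDFS over WebHDFS.
# Port 9870 is the Hadoop 3 NameNode web port; older versions use 50070.
hdfs_client = InsecureClient('http://localhost:9870', user='bdm')
hdfs_client.upload('/user/bdm/stocks/stocks.csv', 'stocks.csv')  # hypothetical paths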
The above files run automatically every day while the stock market is open, from 9 to 4 IST (Indian Standard Time, tz = Asia/Kolkata).
We run these files on the VM using crontab in the Unix environment, which executes each file every day according to the schedule pattern. Crontab
Below are the instructions to set it up. Simply type
crontab -e
to edit the file in any editor (nano, vim, etc.) and add the following entries:
0 4-11 * * * python3 /home/bdm/tw_load_local.py >> /home/bdm/logs/tw_logs.log;
0 12 * * * python3 /home/bdm/tw_load_mongo.py >> /home/bdm/logs/tw_mongo_logs.log;
*/1 4-11 * * * python3 /home/bdm/st_load_local.py >> /home/bdm/logs/st_local_logs.log;
0 12 * * * python3 /home/bdm/st_load_hdfs.py >> /home/bdm/logs/st_hdfs_logs.log
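Note that crontab interprets the hour fields in the VM's local time zone: the 4-11 window above lines up with the 9-to-4 IST trading window only if the VM clock is set to UTC (04:00 UTC = 09:30 IST), so adjust the hours if your VM runs in a different time zone.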