DATA-PIPELINE USING AWS SERVICES

Objective

This project collects real-time Twitter data on the topic of COVID-19, processes the tweets by mapping their attributes to the desired columns, and then stores the data for further analysis.
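
As a rough illustration of the "mapping the attributes to the desired columns" step, here is a minimal sketch that flattens tweet JSON into CSV rows. The column names in COLUMNS and the helper names tweet_to_row and tweets_to_csv are assumptions made for this example; the project's actual column layout is not listed in this README, and the real processing may be done in AWS Glue rather than in local Python. The input field names (id_str, created_at, text, user.screen_name, user.location) follow the Twitter API v1.1 tweet object.

```python
import csv
import io
import json

# Hypothetical column layout; the project's real columns are not listed here.
COLUMNS = ["tweet_id", "created_at", "user_name", "location", "text"]


def tweet_to_row(tweet: dict) -> dict:
    """Flatten one tweet object (Twitter API v1.1 field names) into a row."""
    user = tweet.get("user") or {}
    return {
        "tweet_id": tweet.get("id_str", ""),
        "created_at": tweet.get("created_at", ""),
        "user_name": user.get("screen_name", ""),
        "location": user.get("location") or "",
        # Collapse whitespace so each tweet stays on a single CSV line.
        "text": " ".join((tweet.get("text") or "").split()),
    }


def tweets_to_csv(json_lines: str) -> str:
    """Convert newline-delimited tweet JSON into CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for line in json_lines.splitlines():
        if line.strip():  # skip blank lines between records
            writer.writerow(tweet_to_row(json.loads(line)))
    return out.getvalue()
```

Missing attributes fall back to empty strings, so a tweet without a user location still produces a complete row.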

Dependencies

boto3

  pip install boto3

tweepy

  pip install tweepy

AWS CLI

Cloud services used in this project:

Kinesis Firehose
AWS S3
AWS Glue
IAM (For role creation)
CloudWatch

How to use?

Use twittercred.py to store your Twitter developer account credentials.
Use buildFirehose_AWS.py, which uses the boto3 API, to create a data delivery stream in Kinesis Firehose.
Use ingesttwitterdata.py to ingest data into the data delivery stream.
Delete the data delivery stream after the data has been successfully ingested into the S3 bucket.
To set the credentials for your AWS account, run the command below at a command prompt after installing the AWS CLI.

  aws configure
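
The create, ingest, and delete steps above can be sketched with the boto3 Firehose client as follows. This is an illustration under stated assumptions, not the contents of buildFirehose_AWS.py or ingesttwitterdata.py: the helper names, the stream name, and the placeholder ARNs are hypothetical, and the sketch assumes a DirectPut stream that delivers to S3. The client is passed in as a parameter so the helpers can be exercised without an AWS account.

```python
import json


def create_stream(firehose, name, bucket_arn, role_arn):
    """Create a DirectPut delivery stream that writes to an S3 bucket."""
    return firehose.create_delivery_stream(
        DeliveryStreamName=name,
        DeliveryStreamType="DirectPut",
        ExtendedS3DestinationConfiguration={
            "RoleARN": role_arn,      # IAM role that Firehose assumes
            "BucketARN": bucket_arn,  # destination S3 bucket
        },
    )


def send_tweet(firehose, name, tweet):
    """Send one tweet as a newline-terminated JSON record."""
    data = (json.dumps(tweet) + "\n").encode("utf-8")
    return firehose.put_record(DeliveryStreamName=name, Record={"Data": data})


def delete_stream(firehose, name):
    """Delete the delivery stream once the data has landed in S3."""
    return firehose.delete_delivery_stream(DeliveryStreamName=name)


def main():
    # boto3 is imported here so the helpers above work with any client object.
    import boto3

    client = boto3.client("firehose")
    stream = "covid-tweets"  # hypothetical stream name
    create_stream(
        client,
        stream,
        bucket_arn="arn:aws:s3:::example-bucket",                     # placeholder
        role_arn="arn:aws:iam::123456789012:role/example-firehose",  # placeholder
    )
    # A new stream starts in CREATING state; wait until
    # describe_delivery_stream reports ACTIVE before sending records.
    send_tweet(client, stream, {"id_str": "1", "text": "example"})
    delete_stream(client, stream)
```

Calling main() requires AWS credentials to be configured and real values in place of the placeholder ARNs. Terminating each record with a newline keeps the objects that Firehose writes to S3 readable as one JSON document per line.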

Architecture

The data pipeline has two parts:

Collecting Data
Processing Data

gchatterjee-git/Data-Pipeline-AWS
