Twitter-Data-Fetching-Pipeline-Using-Azure-Blob-Storage

Overview

This project is a scalable data pipeline that extracts tweets for a user-defined hashtag and uploads the data to Azure Blob Storage in CSV format.
The pipeline is divided into two modules: a sender that extracts tweets for a particular hashtag, and a receiver that receives the data and uploads it to Blob Storage. The two modules are connected through a Service Bus queue to prevent request timeouts and allow the two modules to scale independently.
Project Website: https://prathameshmahankal.github.io/tracking-online-disinformation/

Installation

Prerequisites

Before getting started with implementing this pipeline, make sure you have access to at least the following tools, which the pipeline's components rely on:

  • Twitter API access (for fetching tweets)
  • An Azure Storage account with Blob Storage
  • An Azure Service Bus namespace with a queue
  • Python 3
  • Docker (optional, for containerized deployment)

Configurations

The config.json file in this repository can be used to configure the following parameters for fetching tweets (an illustrative example follows the list):

  • num_results - The total number of tweets to collect for a given hashtag
  • start_date - Collect tweets posted from the given start date
  • end_date - Collect tweets posted up to the given end date
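A minimal sketch of what such a config.json could look like, using the three parameters above; the values and the YYYY-MM-DD date format are illustrative assumptions, not taken from the repository:

{
  "num_results": 500,
  "start_date": "2020-01-01",
  "end_date": "2020-01-31"
}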

Running Code

Local execution
Since the code consists of two modules, a sender and a receiver, connected through a Service Bus queue, the receiver needs to run continuously in the background to receive tweets for different user requests from the sender and upload them to Blob Storage.
Note: Configuration changes should be made before executing the code.

Executing the Receiver Module

To execute the receiver module, navigate to the /receiver/receiver_module/src/ folder and run the following command:
python -u receiver.py

The code will then be listening on the Service Bus queue for requests from the sender.
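For reference, a minimal sketch of what such a receiver loop could look like, assuming the azure-servicebus and azure-storage-blob Python packages, connection strings supplied via environment variables, and a CSV payload in each queue message; the queue, container, and variable names are illustrative assumptions, not the repository's actual code:

import os

from azure.servicebus import ServiceBusClient
from azure.storage.blob import BlobServiceClient

# Connection strings and names are assumed to come from the environment.
SERVICEBUS_CONN = os.environ["SERVICEBUS_CONNECTION_STRING"]
STORAGE_CONN = os.environ["STORAGE_CONNECTION_STRING"]
QUEUE_NAME = os.environ.get("QUEUE_NAME", "tweet-requests")
CONTAINER = os.environ.get("BLOB_CONTAINER", "tweets")

def main():
    blob_service = BlobServiceClient.from_connection_string(STORAGE_CONN)
    with ServiceBusClient.from_connection_string(SERVICEBUS_CONN) as sb_client:
        with sb_client.get_queue_receiver(queue_name=QUEUE_NAME) as receiver:
            # Block waiting for messages sent by the sender module.
            for message in receiver:
                csv_payload = str(message)  # message body: CSV data for one request
                blob_name = f"tweets_{message.message_id}.csv"
                blob_client = blob_service.get_blob_client(container=CONTAINER, blob=blob_name)
                blob_client.upload_blob(csv_payload, overwrite=True)
                receiver.complete_message(message)  # remove the message from the queue

if __name__ == "__main__":
    main()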

Executing the Sender Module

The sender module is built as a lightweight Flask application so that any application can call it by triggering the Flask endpoint and passing a hashtag of interest. To start the Flask application, navigate to the fetch_tweets/sender_module folder and execute the following command:
python -u manage.py run
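As a rough sketch, the sender might expose the /search/ endpoint described below, fetch tweets for the given hashtag, and forward the data to the Service Bus queue. The code here is an assumption-based illustration using flask and azure-servicebus, not the repository's actual manage.py; fetch_tweets_as_csv is a hypothetical stand-in for the real Twitter API fetching logic:

import os

from azure.servicebus import ServiceBusClient, ServiceBusMessage
from flask import Flask, jsonify, request

app = Flask(__name__)

# Connection string and queue name are assumed to come from the environment.
SERVICEBUS_CONN = os.environ["SERVICEBUS_CONNECTION_STRING"]
QUEUE_NAME = os.environ.get("QUEUE_NAME", "tweet-requests")

def fetch_tweets_as_csv(hashtag):
    # Hypothetical stand-in for the actual Twitter API fetching logic,
    # which would honor num_results, start_date, and end_date from config.json.
    return f"tweet_id,text,hashtag\n1,example tweet,{hashtag}\n"

@app.route("/search/", methods=["POST"])
def search():
    hashtag = request.get_json()["hashtag"]
    csv_data = fetch_tweets_as_csv(hashtag)
    # Enqueue the CSV payload for the receiver module to upload to Blob Storage.
    with ServiceBusClient.from_connection_string(SERVICEBUS_CONN) as client:
        with client.get_queue_sender(queue_name=QUEUE_NAME) as sender:
            sender.send_messages(ServiceBusMessage(csv_data))
    return jsonify({"status": "success"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8889)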

The Flask application will run on localhost and will print the endpoint it is listening on. You can click the endpoint or paste it into your browser to open the Swagger UI, then trigger the sender module to fetch tweets by clicking the Try it out button and passing your hashtag of interest.
The endpoint can also be triggered from a tool such as Postman by calling http://localhost:port_number/search/ and passing the hashtag in the request body as JSON, for example:
{ "hashtag": "#YourHastagOfInterest" }

Once you have triggered the endpoint, you will see a "success" message as a response, indicating your request has been processed successfully. You can then navigate to your Azure Blob Storage account and view the uploaded data.
Note - Ensure your receiver module is running in the background before triggering the sender module, or the data will pile up in the queue and not be uploaded to Blob Storage.

Deployment with Docker

Both the sender and receiver modules can be deployed as Docker images that can subsequently be pushed to Azure Container Instances to run in the cloud. To build the Docker image for each module, navigate to its Dockerfile path (the Dockerfile for the sender module can be found under fetch_tweets/Dockerfile and for the receiver module under receiver/Dockerfile) and run the docker build command on your local Docker system. Once you have built a Docker image locally, you can run it with docker run -p [host_port]:[container_port] [image_name]; note that docker run -p maps a host port to a container port, in that order. The current code runs the Flask application on port 8889.
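For example, a sketch of building and running the sender image from the repository root, assuming the image tag tweet-sender (an illustrative name, not from the repository) and the Flask port 8889 from above:

docker build -t tweet-sender -f fetch_tweets/Dockerfile .
docker run -p 8889:8889 tweet-sender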

Once you have built your Docker image locally, you can push it to Azure Container Instances so that your code runs in the cloud. This can be done by following the steps in this video - https://www.youtube.com/watch?v=cW34LTeogAg&t=5s
