A webapp that reads, geolocates, and visualizes tweets from disaster regions, currently focused on Florida for Hurricane Irma.
The webapp consists of four primary components:
- A tweet reading module, which reads Twitter's streaming API and publishes the JSON response for each tweet into a Google Pub/Sub stream (a minimal sketch of this flow follows this list).
- A tweet recording module, which reads these JSON messages from the Google Pub/Sub stream, parses them, and records them in a Cloud SQL Postgres database.
- A set of processing scripts, which periodically work through the recorded tweets to extract and filter location information and record it into a GeoJSON file.
- A Flask webapp, which visualizes the GeoJSON file on a map using the Mapbox API. The webapp also has a page where users can label tweets to train the machine learning model, improving its results over time.
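
A minimal sketch of the tweet reading module's flow, assuming tweepy 3.x and the google-cloud-pubsub client; the project ID, topic name, credentials, and filter terms below are placeholders, not the values used by readTweetsPublishToPubSub.py:

```python
import json

import tweepy
from google.cloud import pubsub_v1

# Placeholder credentials and names -- replace with your own.
CONSUMER_KEY, CONSUMER_SECRET = "...", "..."
ACCESS_TOKEN, ACCESS_SECRET = "...", "..."
PROJECT_ID, TOPIC = "my-gcp-project", "raw-tweets"

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, TOPIC)


class PubSubListener(tweepy.StreamListener):
    """Publishes each incoming tweet's raw JSON payload to the Pub/Sub topic."""

    def on_status(self, status):
        publisher.publish(topic_path, data=json.dumps(status._json).encode("utf-8"))

    def on_error(self, status_code):
        # Returning False on 420 stops the stream instead of hammering the API.
        return status_code != 420


auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
stream = tweepy.Stream(auth=auth, listener=PubSubListener())
# Rough Florida bounding box (SW lon, SW lat, NE lon, NE lat) plus keyword filter.
stream.filter(track=["hurricane", "irma"], locations=[-87.6, 24.4, -80.0, 31.0])
```

Publishing the raw JSON keeps the reader thin; all parsing happens downstream in the recording module.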
- A Google Cloud account, with a Cloud SQL Postgres database and the Pub/Sub service ready to use. The current implementation can run on any webserver, as long as that server is authorized to access the SQL database.
- Enable PostGIS in the Postgres server (it comes preinstalled in Google Cloud SQL); a sketch follows this list.
- Use osm2pgsql to load the OpenStreetMap dataset that the project queries. The Florida extract was loaded from geofabrik.de.
- Authentication can be handled by installing the Google Cloud SDK on these servers, which lets them access services in the Google Cloud account directly, without usernames and passwords.
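
For the PostGIS step above, a minimal sketch using psycopg2, assuming the Cloud SQL instance is reachable from the server (e.g. via the Cloud SQL Proxy); the connection parameters are placeholders:

```python
import psycopg2

# Placeholder connection parameters for the Cloud SQL Postgres instance.
conn = psycopg2.connect(host="127.0.0.1", dbname="tweets",
                        user="postgres", password="...")
conn.autocommit = True
with conn.cursor() as cur:
    # PostGIS ships with Cloud SQL Postgres but must be enabled per database.
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgis;")
    cur.execute("SELECT PostGIS_Version();")
    print(cur.fetchone()[0])
conn.close()
```

The OSM extract itself (e.g. florida-latest.osm.pbf from geofabrik.de) is loaded separately with the osm2pgsql command-line tool.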
- Start readTweetsPublishToPubSub.py with the correct Pub/Sub names and Twitter API credentials to begin publishing filtered tweets to the Pub/Sub queue.
- Run createCloudDBandTables.py once to create the database and table schema required for processing the tweets.
- Start insertTweetsIntoDB.py to read the Pub/Sub messages, parse them, and insert them into the database. This script can run on multiple servers to increase throughput (a subscriber sketch follows this list of steps).
- Run the following scripts periodically, at whatever interval you prefer:
  - Run searchAndInsertLocationDataFromTweets.py every few hours to search the OSM data for the location nodes referenced in the tweets (an example query appears below).
  - Run doSpacyEntityRecogOnTweets.py every few hours (on a separate server, if preferred) to run spaCy named-entity recognition on the tweets (a sketch appears below).
  - After the two scripts above have run, run the Jupyter notebook "2017-09-24 Make a model to categorize tweets with a location about hurricane-related events" to retrain the model on any new data generated from the website, re-predict the tweet flags for all location tweets, and generate the GeoJSON file for the website (a sketch of the GeoJSON assembly appears below).
- Run the Flask website behind nginx/gunicorn using standard deployment methods.
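
A minimal sketch of the recording step performed by insertTweetsIntoDB.py, assuming a google-cloud-pubsub subscription and a hypothetical tweets table; the actual schema is created by createCloudDBandTables.py and may differ:

```python
import json

import psycopg2
from google.cloud import pubsub_v1

PROJECT_ID, SUBSCRIPTION = "my-gcp-project", "raw-tweets-sub"  # placeholders

conn = psycopg2.connect(host="127.0.0.1", dbname="tweets",
                        user="postgres", password="...")
conn.autocommit = True

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION)


def callback(message):
    tweet = json.loads(message.data.decode("utf-8"))
    with conn.cursor() as cur:
        # Hypothetical columns; the real schema comes from createCloudDBandTables.py.
        cur.execute(
            "INSERT INTO tweets (tweet_id, created_at, text, raw) "
            "VALUES (%s, %s, %s, %s) ON CONFLICT DO NOTHING",
            (tweet["id"], tweet["created_at"], tweet.get("text", ""), json.dumps(tweet)),
        )
    message.ack()


# Several copies of this subscriber can run on different servers; Pub/Sub
# distributes messages across subscribers that share a subscription.
# (A production version would use a thread-safe connection pool, since the
# client library invokes callbacks from a thread pool.)
future = subscriber.subscribe(sub_path, callback=callback)
future.result()
```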
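
The location search in searchAndInsertLocationDataFromTweets.py can be pictured roughly as below, assuming the default osm2pgsql schema (a planet_osm_point table whose way geometry is in EPSG:3857); the candidate place names would come from the tweet text:

```python
import psycopg2

conn = psycopg2.connect(host="127.0.0.1", dbname="tweets",
                        user="postgres", password="...")  # placeholders


def lookup_place(name):
    """Returns (name, lon, lat) rows for OSM point features matching a place name."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT name,
                   ST_X(ST_Transform(way, 4326)) AS lon,
                   ST_Y(ST_Transform(way, 4326)) AS lat
            FROM planet_osm_point
            WHERE name ILIKE %s
            """,
            (name,),
        )
        return cur.fetchall()


print(lookup_place("Key West"))
```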
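
doSpacyEntityRecogOnTweets.py runs named-entity recognition on the tweet text; a minimal sketch with spaCy that keeps only location-like entity labels (GPE, LOC, FAC):

```python
import spacy

# Requires a downloaded model, e.g.: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")


def extract_locations(text):
    """Returns the location-like entities spaCy finds in a tweet's text."""
    doc = nlp(text)
    return [ent.text for ent in doc.ents if ent.label_ in {"GPE", "LOC", "FAC"}]


print(extract_locations("Flooding reported near Naples and the Florida Keys"))
```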
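
The notebook's final output is the GeoJSON file consumed by the Flask map; assembling it from (lon, lat, text) rows looks roughly like this, with the property names as assumptions:

```python
import json


def rows_to_geojson(rows, path="tweets.geojson"):
    """Writes rows of (lon, lat, tweet_text) as a GeoJSON FeatureCollection."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": {"text": text},
        }
        for lon, lat, text in rows
    ]
    with open(path, "w") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f)


rows_to_geojson([(-81.38, 28.54, "Power outage reported in Orlando")])
```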