A look at some tweets collected during Thanksgiving 2015, analyzed using Apache Zeppelin and the Spark interpreter.
GetTweets.scala
README.md
thanksgiving_tweets_zeppelin.json
transform_thanksgiving_tweets.pig
tweets.txt

Thanksgiving Tweets

Overview

I will admit I am a huge fan of Twitter and of the data it produces, because it holds many interesting facts and opinions on almost any subject, from people all around the world. During Thanksgiving 2015 (November 26), while everyone was eating turkey, I fired up Spark to capture tweets containing the keyword ‘thanksgiving’.

I did this work because I was interested in exploring the tweets generated during that period, in particular questions such as the most common retweet and the most common hashtag. I also wanted to try Apache Zeppelin, a web-based notebook (similar to IPython or Jupyter) for interactive data analytics.
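
As a rough illustration of the kind of question mentioned above, here is a minimal sketch of how the most common hashtags could be counted. It is not the code from the notebook: it uses plain Scala collections instead of Spark, the `TopHashtags` object and `topHashtags` function are hypothetical names, and the sample tweets are made up rather than taken from the dataset. It also assumes each tweet is available as a plain string of text.

```scala
// Minimal sketch: count hashtags in a handful of tweet texts.
// Assumption: each tweet is a plain string; the samples below are invented.
object TopHashtags {
  // Matches a '#' followed by one or more word characters (ASCII only).
  private val Hashtag = "#\\w+".r

  /** Returns the n most frequent hashtags (lower-cased) with their counts. */
  def topHashtags(tweets: Seq[String], n: Int): Seq[(String, Int)] =
    tweets
      .flatMap(t => Hashtag.findAllIn(t).map(_.toLowerCase).toSeq)
      .groupBy(identity)
      .map { case (tag, occurrences) => (tag, occurrences.size) }
      .toSeq
      // Highest count first; ties are broken alphabetically by tag.
      .sortBy { case (tag, count) => (-count, tag) }
      .take(n)

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "Happy #Thanksgiving everyone! #turkey",
      "RT @someone: So much food #thanksgiving",
      "Watching football #NFL #Thanksgiving"
    )
    topHashtags(sample, 2).foreach { case (tag, count) => println(s"$tag\t$count") }
  }
}
```

Hashtags are lower-cased before counting so that `#Thanksgiving` and `#thanksgiving` are treated as the same tag; whether the original analysis does this is not stated here.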

Tools used

  • Apache Zeppelin and the Spark interpreter
  • Spark Streaming
  • Pig

The data

The dataset consists of 177,955 tweets collected on November 26, 2015.

Repository

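This repo holds an export of the Zeppelin notebook (thanksgiving_tweets_zeppelin.json), the Scala code used to capture the tweets (GetTweets.scala), a Pig script used for merging all the tweets into one file (transform_thanksgiving_tweets.pig), and the dataset (tweets.txt).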

Report

Report: A look at some Tweets from Thanksgiving 2015