# MapReduce Implementation in PySpark

In this work, we will perform a basic data processing task using PySpark.

The **Amazon_Responded_Oct05.csv** file contains information about 400K tweets. The following 3 columns will be used for this implementation:


1.   **user_id_str**: user ID
2.   **user_followers_count**: the number of followers
3.   **text_**: the text of tweets


**Tasks:** 
1. Find popular users, i.e. users who have more than 5000 followers, and
2. Get the top 10 most popular words from the tweets posted by these popular users

Specifically, we need to do the following steps:



1.   **Read/load data**
2.   **Extract the columns** (user_id_str, user_followers_count, and text_)
3.   **Remove duplicated user IDs**: some users have different numbers of followers in different rows. In this case, we keep only the maximum number of followers for each user.
4.   **Find popular users**: create a filter to find popular users who have more than 5000 followers, using the pairs from step 3.
5.   **Count word frequencies**: count the word frequencies of the tweets posted by the popular users from step 4, and get the top 10 most popular words and their frequencies.

# Installing Spark on Google Colab

In [1]:
# Checking folder
!ls

sample_data


In [0]:
# Installing JDK
!apt-get install openjdk-8-jdk-headless -qq > /dev/null

In [0]:
# Getting Spark installer (check the path on spark.apache.org)
!wget -q http://apache.mirrors.pair.com/spark/spark-2.4.4/spark-2.4.4-bin-hadoop2.7.tgz

In [4]:
# Checking if the file is copied
!ls

sample_data  spark-2.4.4-bin-hadoop2.7.tgz


In [5]:
# Untar the Spark installer
!tar -xvf spark-2.4.4-bin-hadoop2.7.tgz

spark-2.4.4-bin-hadoop2.7/
spark-2.4.4-bin-hadoop2.7/R/
spark-2.4.4-bin-hadoop2.7/R/lib/
spark-2.4.4-bin-hadoop2.7/R/lib/sparkr.zip
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/INDEX
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/html/
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/html/R.css
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/html/00Index.html
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/help/
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/help/aliases.rds
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/help/AnIndex
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/help/SparkR.rdx
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/help/SparkR.rdb
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/help/paths.rds
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/worker/
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/worker/worker.R
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/worker/daemon.R
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/tests/
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/tests/testthat/
spark-2.4.4-bin-hadoop2.7/R/lib/SparkR/tests/testthat/te

In [6]:
# Checking the Spark folder after untar
!ls 

sample_data  spark-2.4.4-bin-hadoop2.7	spark-2.4.4-bin-hadoop2.7.tgz


In [0]:
# Installing findspark - a python library to find Spark
!pip install -q findspark

In [0]:
# Setting environment variables: Setting Java and Spark home based on the location where they are stored
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.4.4-bin-hadoop2.7"
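
If either path is wrong, the failure typically surfaces later, when the Spark session is created. A small check like the one below can catch it earlier. This is only a sketch: the helper name `missing_dirs` is hypothetical and not part of findspark or PySpark.

```python
import os

def missing_dirs(var_names, env=None):
    """Return the variables that are unset or do not point to an existing directory."""
    env = os.environ if env is None else env
    problems = []
    for name in var_names:
        path = env.get(name)
        if not path or not os.path.isdir(path):
            problems.append(name)
    return problems

# In the notebook, this would run right after the variables are set;
# an empty list means both directories were found.
print(missing_dirs(["JAVA_HOME", "SPARK_HOME"]))
```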

# Step 1: Loading data

In [0]:
# Creating a local Spark session
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()

In [3]:
# Mounting google drive
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [4]:
# Loading data in pandas dataframe
import pandas as pd
amazon_df = pd.read_csv("/content/gdrive/Amazon_Responded_Oct05.csv") 
amazon_df.head()

Unnamed: 0,id_str,tweet_created_at,user_screen_name,user_id_str,user_statuses_count,user_favourites_count,user_protected,user_listed_count,user_following,user_description,user_location,user_verified,user_followers_count,user_friends_count,user_created_at,tweet_language,text_,favorite_count,favorited,in_reply_to_screen_name,in_reply_to_status_id_str,in_reply_to_user_id_str,retweet_count,retweeted,text
0,,,,,,,,,,,,,,,,,,,,,,,,,
1,'793270689780203520',Tue Nov 01 01:57:25 +0000 2016,SeanEPanjab,143515471.0,51287.0,4079.0,False,74.0,False,Content marketer; Polyglot; Beard aficionado; ...,غریب الوطن,False,1503.0,850.0,Thu May 13 17:43:52 +0000 2010,en,@AmazonHelp Can you please DM me? A product I ...,0.0,False,AmazonHelp,,85741735.0,0.0,False,
2,'793281386912354304',Tue Nov 01 02:39:55 +0000 2016,AmazonHelp,85741735.0,2225450.0,11366.0,False,796.0,False,We answer Amazon support questions 7 days a we...,,True,149569.0,53.0,Wed Oct 28 04:17:54 +0000 2009,en,"@SeanEPanjab I'm sorry, we're unable to DM you...",0.0,False,SeanEPanjab,7.93e+17,143515471.0,0.0,False,
3,'793501578766319616',Tue Nov 01 17:14:53 +0000 2016,SeanEPanjab,143515471.0,51287.0,4079.0,False,74.0,False,Content marketer; Polyglot; Beard aficionado; ...,غریب الوطن,False,1503.0,850.0,Thu May 13 17:43:52 +0000 2010,en,@AmazonHelp It was purchased on https://t.co/g...,0.0,False,AmazonHelp,7.93e+17,85741735.0,0.0,False,@AmazonHelp It was purchased on https://t.co/g...
4,'793501657346682880',Tue Nov 01 17:15:12 +0000 2016,SeanEPanjab,143515471.0,51287.0,4079.0,False,74.0,False,Content marketer; Polyglot; Beard aficionado; ...,غریب الوطن,False,1503.0,850.0,Thu May 13 17:43:52 +0000 2010,en,"@AmazonHelp I am following you now, if it help...",0.0,False,AmazonHelp,7.93e+17,85741735.0,0.0,False,


In [5]:
amazon_df.shape

(462029, 25)

In [6]:
# Dropping all null rows
amazon_df.dropna(how="all", inplace=True)
amazon_df.shape

(378134, 25)

In [7]:
# Replacing each carriage return + line feed sequence (\r\n) with a space
amazon_df = amazon_df.replace({r'\r\n': ' '}, regex=True)
amazon_df.head()

Unnamed: 0,id_str,tweet_created_at,user_screen_name,user_id_str,user_statuses_count,user_favourites_count,user_protected,user_listed_count,user_following,user_description,user_location,user_verified,user_followers_count,user_friends_count,user_created_at,tweet_language,text_,favorite_count,favorited,in_reply_to_screen_name,in_reply_to_status_id_str,in_reply_to_user_id_str,retweet_count,retweeted,text
1,'793270689780203520',Tue Nov 01 01:57:25 +0000 2016,SeanEPanjab,143515471.0,51287.0,4079.0,False,74.0,False,Content marketer; Polyglot; Beard aficionado; ...,غریب الوطن,False,1503.0,850.0,Thu May 13 17:43:52 +0000 2010,en,@AmazonHelp Can you please DM me? A product I ...,0.0,False,AmazonHelp,,85741735.0,0.0,False,
2,'793281386912354304',Tue Nov 01 02:39:55 +0000 2016,AmazonHelp,85741735.0,2225450.0,11366.0,False,796.0,False,We answer Amazon support questions 7 days a we...,,True,149569.0,53.0,Wed Oct 28 04:17:54 +0000 2009,en,"@SeanEPanjab I'm sorry, we're unable to DM you...",0.0,False,SeanEPanjab,7.93e+17,143515471.0,0.0,False,
3,'793501578766319616',Tue Nov 01 17:14:53 +0000 2016,SeanEPanjab,143515471.0,51287.0,4079.0,False,74.0,False,Content marketer; Polyglot; Beard aficionado; ...,غریب الوطن,False,1503.0,850.0,Thu May 13 17:43:52 +0000 2010,en,@AmazonHelp It was purchased on https://t.co/g...,0.0,False,AmazonHelp,7.93e+17,85741735.0,0.0,False,@AmazonHelp It was purchased on https://t.co/g...
4,'793501657346682880',Tue Nov 01 17:15:12 +0000 2016,SeanEPanjab,143515471.0,51287.0,4079.0,False,74.0,False,Content marketer; Polyglot; Beard aficionado; ...,غریب الوطن,False,1503.0,850.0,Thu May 13 17:43:52 +0000 2010,en,"@AmazonHelp I am following you now, if it help...",0.0,False,AmazonHelp,7.93e+17,85741735.0,0.0,False,
5,'793502854459879424',Tue Nov 01 17:19:57 +0000 2016,AmazonHelp,85741735.0,2225450.0,11366.0,False,796.0,False,We answer Amazon support questions 7 days a we...,,True,149569.0,53.0,Wed Oct 28 04:17:54 +0000 2009,en,@SeanEPanjab Please give us a call/chat so we ...,0.0,False,SeanEPanjab,7.94e+17,143515471.0,0.0,False,@SeanEPanjab Please give us a call/chat so we ...
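
Note that the pattern `\r\n` matches only the two-character CRLF sequence, so a lone `\n` or `\r` inside a tweet would stay in place. The sketch below uses Python's `re` module on a made-up string to show the difference. The broader pattern `[\r\n]+` is an assumption about what might be wanted, not something the notebook uses.

```python
import re

# Made-up sample containing all three kinds of line break
sample = "line one\r\nline two\nline three\rline four"

# Pattern from the cell above: only the CRLF pair is replaced
crlf_only = re.sub(r"\r\n", " ", sample)

# Broader alternative (assumption): any run of CR and/or LF characters
any_newline = re.sub(r"[\r\n]+", " ", sample)

print(repr(crlf_only))
print(repr(any_newline))
```

If stray line breaks turn out to matter for this dataset, the broader pattern could be tried in the `replace` call in place of the original one.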


In [8]:
# Converting Pandas Dataframe into Spark Dataframe
amazon_df = amazon_df.astype(str) # Converting pandas df to string first
amazon_sdf = spark.createDataFrame(amazon_df)
amazon_sdf.show(10, False) # False allows us to show entire content of the columns

+--------------------+------------------------------+----------------+-----------+-------------------+---------------------+--------------+-----------------+--------------+----------------------------------------------------------------------------------------------------------------------------------------------+-------------+-------------+--------------------+------------------+------------------------------+--------------+------------------------------------------------------------------------------------------------------------------------------------------+--------------+---------+-----------------------+-------------------------+-----------------------+-------------+---------+-----------------------------------------------------------------------------------------------------------------------------------------+
|id_str              |tweet_created_at              |user_screen_name|user_id_str|user_statuses_count|user_favourites_count|user_protected|user_listed_count|user_followin

In [16]:
# Columns in df
amazon_sdf.columns

['id_str',
 'tweet_created_at',
 'user_screen_name',
 'user_id_str',
 'user_statuses_count',
 'user_favourites_count',
 'user_protected',
 'user_listed_count',
 'user_following',
 'user_description',
 'user_location',
 'user_verified',
 'user_followers_count',
 'user_friends_count',
 'user_created_at',
 'tweet_language',
 'text_',
 'favorite_count',
 'favorited',
 'in_reply_to_screen_name',
 'in_reply_to_status_id_str',
 'in_reply_to_user_id_str',
 'retweet_count',
 'retweeted',
 'text']

In [17]:
# Schema: Datatypes associated with columns
amazon_sdf.printSchema()

root
 |-- id_str: string (nullable = true)
 |-- tweet_created_at: string (nullable = true)
 |-- user_screen_name: string (nullable = true)
 |-- user_id_str: string (nullable = true)
 |-- user_statuses_count: string (nullable = true)
 |-- user_favourites_count: string (nullable = true)
 |-- user_protected: string (nullable = true)
 |-- user_listed_count: string (nullable = true)
 |-- user_following: string (nullable = true)
 |-- user_description: string (nullable = true)
 |-- user_location: string (nullable = true)
 |-- user_verified: string (nullable = true)
 |-- user_followers_count: string (nullable = true)
 |-- user_friends_count: string (nullable = true)
 |-- user_created_at: string (nullable = true)
 |-- tweet_language: string (nullable = true)
 |-- text_: string (nullable = true)
 |-- favorite_count: string (nullable = true)
 |-- favorited: string (nullable = true)
 |-- in_reply_to_screen_name: string (nullable = true)
 |-- in_reply_to_status_id_str: string (nullable = true)
 |--

In [18]:
# Total number of rows
amazon_sdf.count()

378134

# Step 2: Extracting the three columns

In [9]:
# Extracting columns 'user_id_str', 'user_followers_count', and 'text_'
amazon_sub_df = amazon_sdf.select(amazon_sdf.user_id_str, amazon_sdf.user_followers_count.cast('int').alias('user_followers_count'), amazon_sdf.text_)
amazon_sub_df.show(20, False)

+-----------+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
|user_id_str|user_followers_count|text_                                                                                                                                                        |
+-----------+--------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------+
|143515471.0|1503                |@AmazonHelp Can you please DM me? A product I ordered last year never arrived.                                                                               |
|85741735.0 |149569              |@SeanEPanjab I'm sorry, we're unable to DM you. Was this order purchased on https://t.co/nUUp5MLhYl, or one of our other sites? ^CL                          |
|143515471.0|1503                |@

In [20]:
# Checking datatype of columns
amazon_sub_df.printSchema()

root
 |-- user_id_str: string (nullable = true)
 |-- user_followers_count: integer (nullable = true)
 |-- text_: string (nullable = true)



In [21]:
# Number of distinct users 
import pyspark.sql.functions as f
amazon_sub_df.select(f.countDistinct("user_id_str")).show()

+---------------------------+
|count(DISTINCT user_id_str)|
+---------------------------+
|                      71417|
+---------------------------+



# Step 3: Removing the duplicated records

In [22]:
# Checking number of rows (tweets) for a particular user
amazon_sub_df.filter(amazon_sub_df.user_id_str == '85741735.0').count()

170691

In [10]:
# Removing duplicate records by keeping just the maximum number of followers for any user
# Step 1: First let us get the max followers for every user
import pyspark.sql.functions as f
maxf = amazon_sub_df.groupBy("user_id_str").agg(f.max("user_followers_count").alias("max")).alias("maxf")
maxf.show()

+------------+-----+
| user_id_str|  max|
+------------+-----+
| 182037498.0|   38|
|  16669765.0|   57|
| 264218502.0|   70|
|1873374613.0|  174|
|1305569118.0|  128|
|3255474476.0|    1|
|  29207516.0|  213|
| 123058781.0|  380|
|1079912580.0|  673|
| 277017869.0| 1103|
| 553031167.0|  491|
|  43597889.0| 1915|
|  45659327.0|  338|
| 101381642.0|  602|
|2811064347.0|11588|
|2325986832.0|  141|
| 998124780.0|  102|
| 220146294.0|  497|
| 7.11572e+17|  143|
|1362528962.0|  567|
+------------+-----+
only showing top 20 rows



In [24]:
# Total number of rows
maxf.count()

71417

In [0]:
# Step 2: Let us now join this with the original df (amazon_sub_df) to get all the rows of users which match their max follower count obtained in 'maxf'
from pyspark.sql.functions import col 
amazon_sub_df = amazon_sub_df.alias("amazon_sub_df") # defining alias for original df
amazon_sub_df2 = amazon_sub_df.join(maxf, (col("user_followers_count") == col("max")) & 
                                    (col("amazon_sub_df.user_id_str") == col("maxf.user_id_str"))).select(
                                     col("amazon_sub_df.user_id_str"), col("amazon_sub_df.user_followers_count"), col("amazon_sub_df.text_"))

In [26]:
amazon_sub_df2.show(20, False)

+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
|user_id_str |user_followers_count|text_                                                                                                                                       |
+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------+
|8.32319e+17 |0                   |@Amazon poor service by Amazon...my friend is waiting for the gift since a week still not delivered...really disappointed                   |
|8.14921e+17 |1                   |@AmazonHelp My order says its been delivered since dec 30 &amp; I still haven't received it. I called CS &amp; they said to wait 3 + days...|
|8.14921e+17 |1                   |@AmazonHelp I'm so confused because I also have amazon prime &amp; now its the 6

In [27]:
# Total number of rows
amazon_sub_df2.count()

194305

In [28]:
# Checking number of rows (tweets) for that user again
amazon_sub_df2.filter(amazon_sub_df2.user_id_str == '85741735.0').count()

13
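
The drop from 170691 rows to 13 for this user follows from the join condition: only rows whose follower count equals the user's maximum are kept, so tweets recorded below that peak are discarded. The plain-Python sketch below reproduces this on a few hypothetical rows and contrasts it with an alternative that keeps every tweet and only overwrites the follower count. The alternative is an assumption about a possible design, not what the notebook does.

```python
from collections import defaultdict

# Hypothetical rows: (user_id, followers, text)
rows = [
    ("u1", 100, "first tweet"),
    ("u1", 120, "second tweet"),
    ("u1", 120, "third tweet"),
    ("u2", 9000, "hello"),
    ("u2", 8500, "older tweet"),
]

# groupBy + max: highest follower count seen for each user
max_followers = defaultdict(int)
for user, followers, _ in rows:
    max_followers[user] = max(max_followers[user], followers)

# Inner join on (user, followers == max): rows below the peak are dropped
peak_rows = [r for r in rows if r[1] == max_followers[r[0]]]

# Alternative (assumption): keep every tweet, attach the user's max to each
all_rows_with_max = [(u, max_followers[u], t) for u, _, t in rows]

print(len(rows), len(peak_rows), len(all_rows_with_max))
```

Which variant is appropriate depends on whether the word counts in step 5 should cover all tweets of a popular user or only those posted at the peak follower count.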

# Step 4: Finding popular users

In [12]:
# Creating a filter to find popular users who have more than 5000 followers
popular_df = amazon_sub_df2.filter(amazon_sub_df2.user_followers_count > 5000)
popular_df.count() # number of rows

5063

In [13]:
popular_df.show(20, False)

+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|user_id_str |user_followers_count|text_                                                                                                                                                         |
+------------+--------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------+
|3380217501.0|5163                |Looks like my parcel isn't coming today @AmazonHelp 😒 I live 5 mins from a warehouse how can it be late?                                                     |
|3380217501.0|5163                |@AmazonHelp It says it was dispatched on the 30th and should arrive today, but nothing. When something I ordered after this arrived on Sat                    |
|23069141.0  |6550        

In [14]:
# Check: Sorting follower count Ascending
popular_df.sort(popular_df.user_followers_count).show(20, False)

+-----------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
|user_id_str|user_followers_count|text_                                                                                                                                          |
+-----------+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
|26359772.0 |5003                |@AmazonHelp any chance you can tell me what time American Gods is going online on May 1st?                                                     |
|213353996.0|5007                |@AmazonHelp Last delivery I ordered was marked as "handed to resident" when it was really dropped over fence in rain. https://t.co/8bJgpAiU1m  |
|213353996.0|5007                |@AmazonHelp Had two shipments marked as "delivered" yesterday that neve

In [15]:
# Check: Sorting follower count Descending
popular_df.sort((popular_df.user_followers_count).desc()).show(20, False)

+------------+--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|user_id_str |user_followers_count|text_                                                                                                                                                     |
+------------+--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------+
|20793816.0  |2776204             |@Liberal_Lisa You're all set. We'll remind you when #PrimeDay is here! https://t.co/kQdsFAczL6                                                            |
|83554234.0  |1958810             |@AmazonHelp Over 1k properties in this block, where should I start? 4 orders/ 7 items to be delivered YESTERDAY to me and still... https://t.co/CdIavyHYD0|
|83554234.0  |1958810             |@AmazonHel

In [17]:
# Number of distinct users
popular_df.select('user_id_str').distinct().count()

2130

In [0]:
# Finding number of tweets per user using groupBy
groupedUsers = popular_df.groupby('user_id_str').count().withColumnRenamed("count","tweet_count")

In [19]:
# Sorting in descending order of count
groupedUsers.sort((groupedUsers.tweet_count).desc()).show(20)

+------------+-----------+
| user_id_str|tweet_count|
+------------+-----------+
|  17535421.0|         23|
|2454614098.0|         20|
|  16789906.0|         18|
|2950894703.0|         18|
| 518389533.0|         17|
| 568744096.0|         15|
|  17225620.0|         14|
| 153426866.0|         14|
|  22486545.0|         14|
|  85741735.0|         13|
|   2551781.0|         13|
|  27326436.0|         12|
|  31697976.0|         11|
|  14247313.0|         11|
|  94249055.0|         11|
| 313202447.0|         11|
|  17742722.0|         11|
|  74177248.0|         11|
|  15658980.0|         11|
|  60964525.0|         11|
+------------+-----------+
only showing top 20 rows



# Step 5: Finding the top words and saving the results

In [16]:
# Counting words frequency of the tweets posted by the popular users from step 4
# Reading column 'text_' from spark df (popular_df) to a python list, which will then be read into RDD object 
tweet = popular_df.select("text_").rdd.flatMap(lambda x: x).collect()
tweet[0:5]

["Looks like my parcel isn't coming today @AmazonHelp 😒 I live 5 mins from a warehouse how can it be late?",
 '@AmazonHelp It says it was dispatched on the 30th and should arrive today, but nothing. When something I ordered after this arrived on Sat',
 "@AmazonUK hi! I didn't have an option in Jan for overnight, just standard, any way to change this without mucking up my preorder? Thx! https://t.co/L4QEBPBbCi",
 "@AmazonHelp hi I've had money go out of my account and says it's for amazon prime which I have never signed up for???",
 '@AmazonHelp it says I can end it but not cancel will this work?']

In [0]:
# Creating Spark Context
from pyspark import SparkContext
sc = SparkContext.getOrCreate()

In [21]:
# Reading 'tweet' to a RDD object using spark context
tweet_rdd = sc.parallelize(tweet)
tweet_rdd.take(5)

["Looks like my parcel isn't coming today @AmazonHelp 😒 I live 5 mins from a warehouse how can it be late?",
 '@AmazonHelp It says it was dispatched on the 30th and should arrive today, but nothing. When something I ordered after this arrived on Sat',
 "@AmazonUK hi! I didn't have an option in Jan for overnight, just standard, any way to change this without mucking up my preorder? Thx! https://t.co/L4QEBPBbCi",
 "@AmazonHelp hi I've had money go out of my account and says it's for amazon prime which I have never signed up for???",
 '@AmazonHelp it says I can end it but not cancel will this work?']

In [0]:
# Function for cleaning the text of tweets
import string
import re

def clean_tweet(x):
  
  # Delete all the URLs in the tweets
  text00 = re.sub(r'www\S+', '', x)
  text01 = re.sub(r'http\S+', '', text00)
  
  # Delete all the numbers in the tweets
  text1 = ''.join([i for i in text01 if not i.isdigit()])
  
  # Delete all the punctuation marks in the tweets
  text2 = text1.translate(str.maketrans('','',string.punctuation))
  
  # Convert text to LOWERCASE
  text3 = text2.lower()

  return text3
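
Some tweets in this dataset contain HTML entities such as `&amp;`. Because `clean_tweet` strips punctuation, `&amp;` is reduced to the token `amp`, which then gets counted as a word. One possible adjustment, sketched below, is to decode entities with `html.unescape` before cleaning. The function name `clean_tweet_unescaped` and the sample string are hypothetical.

```python
import html
import re
import string

# Translation table that deletes every ASCII punctuation character
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)

def clean_tweet_unescaped(text):
    """Variant of clean_tweet that decodes HTML entities before cleaning."""
    text = html.unescape(text)              # "&amp;" becomes "&"
    text = re.sub(r"www\S+", "", text)      # delete URLs starting with www
    text = re.sub(r"http\S+", "", text)     # delete URLs starting with http
    text = "".join(ch for ch in text if not ch.isdigit())  # delete digits
    text = text.translate(_PUNCT_TABLE)     # delete punctuation
    return text.lower()

sample = "I called CS &amp; they said to wait 3 days https://t.co/abc"
print(clean_tweet_unescaped(sample).split())
```

With the entity decoded first, the `&` is removed as ordinary punctuation and no `amp` token is left behind.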

In [23]:
# Cleaning the text of tweets: 1. Removing URLs, 2. Removing digits and punctuation, 3. Lowercase
clean_tweet_rdd = tweet_rdd.map(clean_tweet)
clean_tweet_rdd.take(10)

['looks like my parcel isnt coming today amazonhelp 😒 i live  mins from a warehouse how can it be late',
 'amazonhelp it says it was dispatched on the th and should arrive today but nothing when something i ordered after this arrived on sat',
 'amazonuk hi i didnt have an option in jan for overnight just standard any way to change this without mucking up my preorder thx ',
 'amazonhelp hi ive had money go out of my account and says its for amazon prime which i have never signed up for',
 'amazonhelp it says i can end it but not cancel will this work',
 'amazon one of your hard working ky employees lost her id and other cards while packing my prime box how can i return it',
 'amazonin order  has not delivered almost  days gone how many months do you need more',
 'amazonhelp what i have to fill could you tell me more clearly',
 'amazonin amazonhelp i didnt ask for a gift voucher i asked you to deliver my product ordered shameonyouamazon',
 'amazonhelp dont regret try to deliver the produ

In [24]:
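# Map phase: splitting each tweet into words and emitting a (word, 1) pair per word
# (named word_pairs so that Python's built-in map is not shadowed)
word_pairs = clean_tweet_rdd.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1))
word_pairs.take(5)

[('looks', 1), ('like', 1), ('my', 1), ('parcel', 1), ('isnt', 1)]

In [25]:
# Reduce phase: summing the counts of each word
counts = word_pairs.reduceByKey(lambda a, b: a + b)
counts.take(5)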

[('looks', 29), ('like', 137), ('😒', 6), ('i', 2896), ('live', 34)]
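
To make the two phases concrete, the following plain-Python sketch imitates what `flatMap`, `map`, and `reduceByKey` do, using two made-up lines. It runs on a single machine and only illustrates the data flow; Spark performs the same steps across partitions.

```python
from functools import reduce
from itertools import groupby

# Made-up input standing in for the cleaned tweets
lines = ["it says it was late", "it was late"]

# flatMap + map: each line yields many words, each word yields (word, 1)
pairs = [(word, 1) for line in lines for word in line.split(" ")]

# Shuffle: bring pairs with the same key next to each other
pairs.sort(key=lambda kv: kv[0])

# reduceByKey: fold the values of each key with the same binary function
word_counts = {
    key: reduce(lambda a, b: a + b, (value for _, value in group))
    for key, group in groupby(pairs, key=lambda kv: kv[0])
}

print(word_counts)
```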

In [26]:
# Total number of distinct words (counted on the cluster instead of collecting the full list to the driver)
print(counts.count())

6855


In [27]:
# Getting the top 10 most popular words and their frequencies
# takeOrdered returns the 10 elements with the smallest keys, so the key is the negated count.
# Empty tokens (produced by consecutive spaces) get the key False, which equals 0 and therefore
# sorts after every negative count, so they never reach the top 10.
countsSortedTopTen = counts.takeOrdered(10, lambda a: -a[1] if len(a[0]) > 0 else False)
countsSortedTopTen

[('amazonhelp', 3647),
 ('i', 2896),
 ('to', 2491),
 ('the', 2356),
 ('a', 1616),
 ('it', 1551),
 ('my', 1482),
 ('amazon', 1434),
 ('and', 1379),
 ('you', 1185)]
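
The key function passed to `takeOrdered` in the cell above relies on `False` being equal to `0` in Python. The sketch below shows the effect with `heapq.nsmallest` on hypothetical counts, next to a more explicit variant that drops empty tokens first. `heapq` is used here only to imitate a "smallest keys first" selection locally.

```python
import heapq

# Hypothetical (word, count) pairs; the empty token has the highest count
sample_counts = [("", 500), ("i", 40), ("to", 30), ("the", 35)]

# Key from the cell above: negated count, or False (== 0) for the empty token
notebook_key = lambda a: -a[1] if len(a[0]) > 0 else False
top_two = heapq.nsmallest(2, sample_counts, key=notebook_key)

# More explicit variant: filter out empty tokens, then order by negated count
non_empty = [kv for kv in sample_counts if kv[0]]
top_two_explicit = heapq.nsmallest(2, non_empty, key=lambda kv: -kv[1])

print(top_two)
print(top_two_explicit)
```

Both variants select the same words. The explicit filter may be easier to read, and in PySpark it would correspond to calling `filter` on the RDD before `takeOrdered`.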

In [0]:
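# Writing the list to a text file, one "word, count" pair per line
# (the handle is named out_file so it does not shadow the pyspark.sql.functions alias f)
with open('/content/gdrive/Output.txt', 'w') as out_file:
    for word, count in countsSortedTopTen:
        out_file.write(word + ', ' + str(count) + '\n')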