# BDCC HW: Task 1

Task: Implement a MapReduce job that creates a list of followers for each user in the dataset.

Example: the list of followers of user 534 is: [2, 16, 37, 73, 156, 210, 308, 347, 446, 455, 487, 519].
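
Before involving mrjob, the grouping idea can be sketched in plain Python. This is a minimal sketch, assuming each input line has the form `follower followee` separated by whitespace (as the mapper in `src/task1.py` expects). The names `map_edge`, `followers_by_user`, and `sample_edges` are hypothetical, and the sample edges are made up for illustration rather than taken from `data/graph.txt`.

```python
from collections import defaultdict


def map_edge(line):
    """Map step: turn one 'follower followee' line into (followee, follower)."""
    follower, followee = line.split()
    return followee, follower


def followers_by_user(lines):
    """Shuffle and reduce steps: collect all followers under each followee."""
    grouped = defaultdict(list)
    for line in lines:
        followee, follower = map_edge(line)
        grouped[followee].append(follower)
    return dict(grouped)


# Hypothetical sample edges, not taken from the real dataset.
sample_edges = ["2 534", "16 534", "534 2"]
result = followers_by_user(sample_edges)
print(result)  # {'534': ['2', '16'], '2': ['534']}
```

Note that `line.split()` returns strings, so user IDs stay strings unless they are converted with `int()`.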

In [None]:
%%file src/task1.py
#! /usr/bin/env python3

from mrjob.job import MRJob


# Implement a MapReduce job that creates a list of followers for each user in the dataset.
class Followers(MRJob):

    # Arg 1: self: the job instance (like `this` in Java)
    # Arg 2: Input key to the map function
    # Arg 3: Input value to the map function (one line from the input file)
    def mapper(self, _, line):
        # Each line is "follower followee"; emit (followee, follower) so that
        # all followers of the same user are grouped under one key.
        follower, followee = line.split()
        yield followee, follower


    # Arg 1: self: the job instance (like `this` in Java)
    # Arg 2: Input key to the reduce function (here: the key that was emitted by the mapper)
    # Arg 3: Input value to the reduce function (here: a generator over ALL values
    # associated with the same key; it can be consumed only once, and the order
    # of the values is not guaranteed)
    def reducer(self, followee, followers):
        # Materialize the generator into a list so it can be emitted as one value
        followers_list = list(followers)
        yield followee, followers_list


if __name__ == '__main__':
    Followers.run()


### Run in Standalone Mode

In [None]:
!python3 src/task1.py data/graph.txt

### Run on the Hadoop cluster in fully or pseudo-distributed mode

In [None]:
!python3 src/task1.py -r hadoop data/graph.txt -o task1_output

### Copy the output from HDFS to the local file system

In [None]:
!hdfs dfs -copyToLocal task1_output /home/bdccuser/bdcc-assignment1/output/task1
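
Once the output is on the local file system, it can be read back in Python. The job above does not override mrjob's default output protocol, so each output line should be a JSON-encoded key, a tab, and a JSON-encoded value. The following is a minimal sketch under that assumption. `parse_output_line` and `sample_line` are hypothetical names, and the sample line is illustrative, not copied from a real part file.

```python
import json


def parse_output_line(line):
    """Split one 'key<TAB>value' output line and decode both JSON parts."""
    key_json, value_json = line.rstrip("\n").split("\t", 1)
    return json.loads(key_json), json.loads(value_json)


# Hypothetical line in the shape the job is expected to write.
sample_line = '"534"\t["16", "2", "37"]\n'
user, followers = parse_output_line(sample_line)

# IDs are strings in the output; sort numerically for display.
print(user, sorted(followers, key=int))  # 534 ['2', '16', '37']
```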