# BDCC HW: Task 2

Task: Implement a MapReduce job that creates a list of followees for each user in the dataset.

Example: the list of followees of user 534 is obtained by reading the values in the second column of lines 97097 to 97187 of the input file.
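For a single user, this amounts to filtering the edge list on its first column. The sketch below does that directly in plain Python. It assumes that each line of the input holds a whitespace-separated `follower followee` pair; the function name `followees_of` and the sample edges are hypothetical and are not taken from the real dataset.

```python
def followees_of(lines, user):
    """Return the second-column values of all lines whose first column is `user`.

    Assumes each line is a whitespace-separated "follower followee" pair;
    blank or malformed lines are skipped.
    """
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) == 2 and fields[0] == user:
            result.append(fields[1])
    return result


# Hypothetical sample edges (not from data/graph.txt).
sample = [
    "534 12",
    "534 78",
    "600 534",
    "",
    "534 901",
]
print(followees_of(sample, "534"))  # ['12', '78', '901']
```

This scans the whole input once per user. The MapReduce job below produces the lists for every user in a single pass.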

In [None]:
%%file src/task2.py
#!/usr/bin/env python3


from mrjob.job import MRJob


# Implement a MapReduce job that creates a list of followees for each user in the dataset.
class Followees(MRJob):

    # Arg 1: self: the job instance
    # Arg 2: input key to the map function (here: None, since each input line arrives without a key)
    # Arg 3: input value to the map function (here: one line of the input file)
    def mapper(self, _, line):
        # Each line is expected to hold "follower followee", separated by whitespace.
        fields = line.split()
        if len(fields) != 2:
            # Skip blank or malformed lines instead of crashing the job.
            return
        follower, followee = fields
        # Emit a (follower, followee) pair; the shuffle groups the pairs by follower.
        yield follower, followee


    # Arg 1: self: the job instance
    # Arg 2: input key to the reduce function (here: a follower ID emitted by the mapper)
    # Arg 3: input values to the reduce function (here: an iterator over ALL followees
    # emitted for that follower; their order is not guaranteed)
    def reducer(self, follower, followees):
        # Materialize the iterator so the whole list can be emitted as a single value.
        yield follower, list(followees)


if __name__ == '__main__':
    Followees.run()
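To see what the job computes without Hadoop or mrjob, the sketch below imitates the three phases in plain Python: map each line to a `(follower, followee)` pair, shuffle by sorting and grouping on the key, then reduce each group to a list. The names `map_phase` and `run_followees` are hypothetical and are not part of mrjob. The sketch only models the data flow; the framework's own serialization, partitioning, and sort details differ.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Mapper logic: emit one (follower, followee) pair per well-formed line."""
    for line in lines:
        fields = line.split()
        if len(fields) == 2:
            yield fields[0], fields[1]


def run_followees(lines):
    """Simulate map -> shuffle/sort -> reduce and return {follower: [followees]}."""
    # Shuffle: sort all pairs by key so equal followers become adjacent.
    pairs = sorted(map_phase(lines), key=itemgetter(0))
    result = {}
    # Group adjacent pairs that share the same follower.
    for follower, group in groupby(pairs, key=itemgetter(0)):
        # Reducer logic: collect every followee emitted for this follower.
        result[follower] = [followee for _, followee in group]
    return result


sample = ["534 12", "600 534", "534 78", "600 12"]
print(run_followees(sample))  # {'534': ['12', '78'], '600': ['534', '12']}
```

The `sorted` call plays the role of the shuffle. It brings all pairs with the same follower together so that each reduce step sees every followee of that user at once.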

### Run in Standalone Mode

In [None]:
!python3 src/task2.py data/graph.txt

### Run on the Hadoop cluster in fully or pseudo-distributed mode

In [None]:
!python3 src/task2.py -r hadoop data/graph.txt -o task2_output

### Copy the output from HDFS to the local file system

In [None]:
!hdfs dfs -copyToLocal task2_output /home/bdccuser/bdcc-assignment1/output/task2
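By default, mrjob writes each output record as a JSON-encoded key and a JSON-encoded value separated by a tab, and Hadoop splits the job output across `part-*` files. The sketch below loads the copied directory back into a dictionary for inspection. It assumes the default `JSONProtocol` output and the local path used in the command above; `load_followees` is a hypothetical helper.

```python
import glob
import json
import os


def load_followees(output_dir):
    """Read all part-* files in output_dir into {follower: [followees]}.

    Assumes mrjob's default JSONProtocol output: one record per line,
    a JSON key and a JSON value separated by a single tab.
    Non-data files such as _SUCCESS are ignored by the part-* pattern.
    """
    result = {}
    for path in sorted(glob.glob(os.path.join(output_dir, "part-*"))):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if not line:
                    continue
                key_json, value_json = line.split("\t", 1)
                result[json.loads(key_json)] = json.loads(value_json)
    return result


if __name__ == "__main__":
    followees = load_followees("/home/bdccuser/bdcc-assignment1/output/task2")
    print(len(followees), "users")
    print(followees.get("534"))
```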