# Data Challenge

This app contains a solution to the following problem: given a set S of username pairs, each representing a first-degree relationship in a social network, write a program that outputs each user’s i-th degree friends for every positive i less than or equal to N, for some fixed N. A successful solution scales to very large S by using parallel and distributed computing techniques.
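To make the expected output concrete, here is a minimal single-machine sketch in Python. It is not the Pig implementation from this repository. It assumes that relationships are symmetric and that a user's "i-th degree friends" are the users whose shortest path to that user has exactly i edges; the problem statement does not spell out either point. The function name `friends_by_degree` and the sample usernames are hypothetical.

```python
from collections import defaultdict, deque


def friends_by_degree(pairs, n):
    """Return {user: {i: set of users at shortest distance i}} for 1 <= i <= n.

    Assumption: each pair is an undirected first-degree relationship.
    """
    # Build an undirected adjacency map from the first-degree pairs.
    adjacency = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-relationships
            adjacency[a].add(b)
            adjacency[b].add(a)

    result = {}
    for user in adjacency:
        # Breadth-first search from this user, stopping at depth n.
        distance = {user: 0}
        queue = deque([user])
        by_degree = defaultdict(set)
        while queue:
            current = queue.popleft()
            if distance[current] == n:
                continue  # do not expand past degree n
            for neighbour in adjacency[current]:
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    by_degree[distance[neighbour]].add(neighbour)
                    queue.append(neighbour)
        result[user] = dict(by_degree)
    return result


# Hypothetical sample data: a chain alice - bob - carol - dave.
pairs = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
degrees = friends_by_degree(pairs, 2)
```

With this chain and N = 2, `degrees["alice"]` holds bob at degree 1 and carol at degree 2, while dave (three steps away) is left out. A per-user search like this does not scale to very large S, which is why the actual solution uses a distributed dataflow instead.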

Implementation: I use Apache Pig, a high-level dataflow scripting language that runs on MapReduce, driven by a Bash wrapper script.
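The Pig script itself is not reproduced here. As a rough illustration of how this kind of problem can be phrased as repeated relational joins, which is the style of operation Pig expresses with `JOIN` and `GROUP`, the following Python sketch expands the frontier one degree at a time. It is an assumption about the general approach, not a transcription of the author's script. The names `expand_one_degree` and `degrees_via_joins` are hypothetical, and the sketch assumes symmetric relationships and shortest-path degrees.

```python
from collections import defaultdict


def expand_one_degree(frontier, edges, seen):
    """Join frontier rows (user, friend) with edges on friend == edge source.

    Keeps only (user, candidate) rows that were not reached at a lower degree.
    """
    # Index edges by source, standing in for the grouped side of a join.
    by_source = defaultdict(set)
    for src, dst in edges:
        by_source[src].add(dst)

    next_frontier = set()
    for user, friend in frontier:
        for candidate in by_source.get(friend, ()):
            if candidate != user and (user, candidate) not in seen:
                next_frontier.add((user, candidate))
    return next_frontier


def degrees_via_joins(pairs, n):
    """Return {i: set of (user, i-th degree friend) rows} for 1 <= i <= n."""
    # Store every relationship in both directions (assumed symmetric).
    edges = set()
    for a, b in pairs:
        if a != b:
            edges.add((a, b))
            edges.add((b, a))

    frontier = set(edges)   # degree-1 rows
    seen = set(edges)       # all rows found at any lower degree
    levels = {1: frontier}
    for i in range(2, n + 1):
        frontier = expand_one_degree(frontier, edges, seen)
        seen |= frontier
        levels[i] = frontier
    return levels


# Hypothetical sample data: a chain alice - bob - carol - dave.
levels = degrees_via_joins(
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave")], 3
)
```

Each pass depends only on the previous degree's rows and the edge list, so every step is a join followed by a filter. That shape is what lets a framework like MapReduce partition the work across machines.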

Running the program: The program is currently set to run locally on a single machine, which must have Pig installed. On Mac OS X, Pig can be installed with the Homebrew package manager using the command `brew install pig`. For other operating systems, Cloudera provides an installation guide.

All other dependencies are included in this directory.

The program can be run through the following command: bash


  • names.tsv -- A list of tab-separated name pairs, one first-degree relationship per line. Replace it with the data to be processed.
  • -- The main program file. The number of degrees i to be traversed can be set using the N_DEGREE variable.
  • solution_file_n/part-r-00000 -- The final solution file for the n-th degree calculation, where n is the value set using the N_DEGREE variable.
  • grouped_data_n -- Intermediary files generated in the solution generation process.


Pig and bash tool to find ith-degree connections for a list of 1st-degree connections between nodes.


