# Data Challenge

This repository contains a solution to the following problem: given a set S of username pairs, each representing a first-degree relationship in a social network, write a program that outputs each user’s i-th degree friends for every positive i less than or equal to some fixed N. A successful solution scales to very large S using parallel and distributed computing techniques.
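To make the problem statement concrete, here is a minimal single-machine sketch in Python. It is not the Pig solution in this repository. It assumes that relationships are undirected and that an "i-th degree friend" is a user whose shortest path from the starting user has exactly i hops; the function name `friends_by_degree` and the sample usernames are hypothetical.

```python
from collections import defaultdict, deque


def friends_by_degree(pairs, max_degree):
    """Return {user: {i: sorted users at distance exactly i}} for 1 <= i <= max_degree."""
    # Build an undirected adjacency map from the first-degree pairs.
    adjacency = defaultdict(set)
    for a, b in pairs:
        if a != b:  # ignore self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)

    result = {}
    for start in adjacency:
        # Breadth-first search from each user, stopping at max_degree hops.
        distance = {start: 0}
        queue = deque([start])
        levels = defaultdict(list)
        while queue:
            user = queue.popleft()
            if distance[user] == max_degree:
                continue  # do not expand beyond N
            for friend in adjacency[user]:
                if friend not in distance:
                    distance[friend] = distance[user] + 1
                    levels[distance[friend]].append(friend)
                    queue.append(friend)
        result[start] = {i: sorted(levels[i]) for i in range(1, max_degree + 1)}
    return result


# Hypothetical sample data: a chain alice - bob - carol - dave.
pairs = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
degrees = friends_by_degree(pairs, 2)
print(degrees["alice"])  # {1: ['bob'], 2: ['carol']}
```

A per-user search like this is only practical when the whole graph fits in memory, which is why a distributed approach is needed for very large S.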

Implementation: The solution uses Apache Pig, a high-level dataflow scripting language that runs on MapReduce, driven by a Bash wrapper script.
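One common way to express this kind of traversal as a dataflow is to join the current frontier of (origin, user) pairs with the edge list once per degree, then discard pairs already reached. The Python sketch below illustrates that join-and-filter pattern on in-memory sets. It is an illustration only, not a transcription of the Pig script, and the names `expand_one_degree` and `degrees_by_join` are hypothetical.

```python
def expand_one_degree(edges, frontier, seen):
    """Join the current frontier with the edge list to reach the next degree.

    edges:    set of (user, friend) pairs, stored in both directions
    frontier: set of (origin, user) pairs where user is exactly i hops from origin
    seen:     set of (origin, user) pairs already assigned a degree <= i
    """
    # Index edges by their left key, as a GROUP BY would.
    by_user = {}
    for user, friend in edges:
        by_user.setdefault(user, set()).add(friend)

    # JOIN frontier with edges on the shared user, then project (origin, friend).
    candidates = {
        (origin, friend)
        for origin, user in frontier
        for friend in by_user.get(user, ())
    }
    # Drop each origin itself and any pair already reached at a smaller degree.
    return {(o, f) for o, f in candidates if o != f and (o, f) not in seen}


def degrees_by_join(pairs, max_degree):
    """Return {i: set of (origin, friend) pairs at degree exactly i}."""
    # Store every relationship in both directions and drop self-loops.
    edges = {(a, b) for a, b in pairs if a != b}
    edges |= {(b, a) for a, b in edges}
    frontier = set(edges)  # degree 1 is the edge list itself
    seen = set(edges)
    out = {1: frontier}
    for i in range(2, max_degree + 1):
        frontier = expand_one_degree(edges, frontier, seen)
        seen |= frontier
        out[i] = frontier
    return out


# Hypothetical sample data: a chain a - b - c - d.
by_degree = degrees_by_join([("a", "b"), ("b", "c"), ("c", "d")], 3)
print(sorted(by_degree[3]))  # [('a', 'd'), ('d', 'a')]
```

Each pass is a join followed by a filter, both of which a MapReduce engine can partition by key, so the same shape carries over to a cluster.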

Running the program: The program is currently configured to run locally on a single machine, which must have Pig installed. On Mac OS X, Pig can be installed with the Homebrew package manager using the following command: brew install pig. For other operating systems, Cloudera provides an installation guide.

All other dependencies are included in this directory.

The program can be run through the following command: bash


  • names.tsv -- The input file, a list of tab-separated names. Replace it with the names to be processed.
  • -- The main program file. The number of degrees N to traverse is set with the N_DEGREE variable.
  • solution_file_n/part-r-00000 -- The final solutions file for the nth degree calculation, where N was set using the N_DEGREE variable.
  • grouped_data_n -- Intermediary files generated in the solution generation process.
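As a rough sketch of how these files could be handled from Python, the snippet below parses tab-separated input and builds the path of a solution file. It assumes that each line of names.tsv holds two usernames separated by a tab and that the n in solution_file_n is replaced by the degree number; both are assumptions, and the helper names `read_pairs` and `solution_path` are hypothetical.

```python
import csv
import io


def read_pairs(tsv_file):
    """Yield (user, friend) tuples from a tab-separated file object."""
    for row in csv.reader(tsv_file, delimiter="\t"):
        # Skip blank or incomplete lines rather than failing.
        if len(row) >= 2 and row[0].strip() and row[1].strip():
            yield row[0].strip(), row[1].strip()


def solution_path(n):
    """Build the assumed output path for the n-th degree results."""
    return f"solution_file_{n}/part-r-00000"


# Hypothetical sample standing in for the contents of names.tsv.
sample = io.StringIO("alice\tbob\nbob\tcarol\n\n")
pairs = list(read_pairs(sample))
print(pairs)
print(solution_path(2))
```

To read a real file, pass an open file handle instead, for example `open("names.tsv", newline="")`.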


Pig and Bash tool that finds i-th degree connections from a list of first-degree connections between nodes.





