Pig and bash tool to find ith-degree connections for a list of 1st-degree connections between nodes.
data
grouped_data_2
grouped_data_3
grouped_data_4
solution_file_4
README.md
group_data.pig
lengthwise_reformatting.pig
load_data.pig
merged_data_from_grouping_round
names.tsv
nth_degree.sh

# Data Challenge

This app contains a solution to the following problem: given a set S of pairs of usernames, each pair representing a first-degree relationship in a social network, write a program that outputs each user's i-th degree friends for every positive integer i less than or equal to N, for some fixed N. A successful solution must scale to very large S by using parallel and distributed computing techniques.

Implementation: I use Apache Pig, a high-level dataflow scripting language that compiles to MapReduce jobs, driven by a Bash wrapper script.
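To make the problem statement concrete, here is a minimal single-machine sketch of the expected result on a toy input. It is not the Pig implementation in this repository and it does not scale, since it holds the whole graph in memory. It assumes that each input line holds one pair in the form `userA<TAB>userB`, that relationships are symmetric, and that a user's i-th degree friends are the users whose shortest path to that user has exactly i edges. The function name `nth_degree_sketch` and the output layout `user<TAB>degree<TAB>friend` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: breadth-first expansion up to N degrees.
# Usage: nth_degree_sketch N edges.tsv
# Assumption: each input line is "userA<TAB>userB" and edges are symmetric.
nth_degree_sketch() {
  local n="$1" edges="$2"
  awk -F'\t' -v n="$n" '
    # Append b to the tab-separated neighbor list of a.
    function add_edge(a, b) {
      adj[a] = (a in adj) ? (adj[a] "\t" b) : b
    }
    NF >= 2 { add_edge($1, $2); add_edge($2, $1) }
    END {
      for (u in adj) {
        split("", seen)          # portable way to clear an array
        seen[u] = 1
        frontier = u
        # Expand the frontier one degree at a time.
        for (d = 1; d <= n && frontier != ""; d++) {
          nf = split(frontier, cur, "\t")
          frontier = ""
          for (i = 1; i <= nf; i++) {
            nv = split(adj[cur[i]], nbr, "\t")
            for (j = 1; j <= nv; j++) {
              v = nbr[j]
              if (!(v in seen)) {
                # First time v is reached from u, so d is its degree.
                seen[v] = 1
                frontier = (frontier == "") ? v : (frontier "\t" v)
                print u "\t" d "\t" v
              }
            }
          }
        }
      }
    }' "$edges" | LC_ALL=C sort
}
```

For the chain a-b, b-c, c-d with N = 2, this reports b as a first-degree friend of a and c as a second-degree friend of a, while d (three edges away) is not reported for a.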

Running the program: The program is currently configured to run locally on a single machine, which must have Pig installed. On Mac OS X, Pig can be installed with the Homebrew package manager: `brew install pig`. For other operating systems, Cloudera provides an installation guide at http://www.cloudera.com/content/cloudera/en/documentation/cdh4/v4-2-2/CDH4-Installation-Guide/cdh4ig_topic_16_2.html

All other dependencies are included in this directory.

The program can be run with the following command: `bash nth_degree.sh`
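Because the only external dependency is Pig, a small preflight check can catch a missing installation before the job starts. The following is a sketch only: the function name `run_nth_degree` and its optional directory argument are hypothetical, and it assumes the script is meant to be started from the repository root.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: verify prerequisites, then start the job.
# Usage: run_nth_degree [repo_dir]   (repo_dir defaults to the current directory)
run_nth_degree() {
  local repo_dir="${1:-.}"

  # Pig must be on PATH for the wrapper script to work.
  if ! command -v pig >/dev/null 2>&1; then
    echo "pig not found on PATH; install it first (for example: brew install pig)" >&2
    return 1
  fi

  # The main program file must exist in the chosen directory.
  if [ ! -f "$repo_dir/nth_degree.sh" ]; then
    echo "nth_degree.sh not found in $repo_dir" >&2
    return 1
  fi

  # Assumption: the script expects to run from the repository root.
  (cd "$repo_dir" && bash nth_degree.sh)
}
```

Calling `run_nth_degree` from the repository root is then equivalent to `bash nth_degree.sh`, with a clearer error message if Pig or the script is missing.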

Files:

  • names.tsv -- The input list of tab-separated names. Replace it with the names to be processed.
  • nth_degree.sh -- The main program file. Set the number of degrees to traverse with the N_DEGREE variable.
  • solution_file_n/part-r-00000 -- The final solution file for the nth-degree calculation, where n is the value of the N_DEGREE variable.
  • grouped_data_n -- Intermediate files generated while computing the solution.
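After a run, the file naming described above can be used to locate the result. The sketch below makes two assumptions that are not confirmed by this README: that `nth_degree.sh` sets the variable on a line of the form `N_DEGREE=<number>`, and that the solution file is plain text. The function names `configured_n_degree` and `show_solution` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the N_DEGREE value found in nth_degree.sh.
# Assumption: the script contains a line such as "N_DEGREE=4".
configured_n_degree() {
  local repo_dir="${1:-.}"
  sed -n 's/^N_DEGREE=\([0-9][0-9]*\).*/\1/p' "$repo_dir/nth_degree.sh" | head -n 1
}

# Hypothetical helper: print the first lines of the matching solution file.
# Usage: show_solution [repo_dir] [line_count]
show_solution() {
  local repo_dir="${1:-.}" lines="${2:-10}" n file
  n=$(configured_n_degree "$repo_dir")
  if [ -z "$n" ]; then
    echo "could not read N_DEGREE from $repo_dir/nth_degree.sh" >&2
    return 1
  fi
  file="$repo_dir/solution_file_$n/part-r-00000"
  if [ ! -f "$file" ]; then
    echo "no solution file at $file; has the job been run?" >&2
    return 1
  fi
  head -n "$lines" "$file"
}
```

With N_DEGREE set to 4, `show_solution . 5` would print the first five lines of `solution_file_4/part-r-00000`.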