
Bumped version up to 1.0.5-SNAPSHOT.

lintool committed Sep 3, 2018
1 parent 67be2d4 commit aba9ffa3b3cda144cf07c868534ccfe0d6b72bc7
Showing with 23 additions and 23 deletions.
  1. +22 −22 README.md
  2. +1 −1 pom.xml
--- a/README.md
+++ b/README.md
@@ -33,7 +33,7 @@ The datasets are stored in the [Bespin data repo](https://github.com/lintool/bes
 Make sure you've downloaded the Shakespeare collection (see "Getting Started" above). Running word count in Java MapReduce:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.wordcount.WordCount \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.wordcount.WordCount \
 -input data/Shakespeare.txt -output wc-jmr-combiner
 ```
@@ -42,7 +42,7 @@ To enable the "in-mapper combining" optimization, use the `-imc` option.
 Running word count in Scala MapReduce:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.scala.mapreduce.wordcount.WordCount \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.scala.mapreduce.wordcount.WordCount \
 --input data/Shakespeare.txt --output wc-smr-combiner
 ```
@@ -51,7 +51,7 @@ To enable the "in-mapper combining" optimization, use the `--imc` option.
 And finally, running word count in Spark:
 ```
-$ spark-submit --class io.bespin.scala.spark.wordcount.WordCount target/bespin-1.0.4-SNAPSHOT-fatjar.jar \
+$ spark-submit --class io.bespin.scala.spark.wordcount.WordCount target/bespin-1.0.5-SNAPSHOT-fatjar.jar \
 --input data/Shakespeare.txt --output wc-spark-default
 ```
@@ -74,21 +74,21 @@ $ diff counts.jmr.combiner.txt counts.spark.default.txt
 Make sure you've downloaded the Shakespeare collection (see "Getting Started" above). Running a simple bigram count:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bigram.BigramCount \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bigram.BigramCount \
 -input data/Shakespeare.txt -output bigram-count
 ```
 Computing bigram relative frequencies using the "pairs" implementation:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bigram.ComputeBigramRelativeFrequencyPairs \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bigram.ComputeBigramRelativeFrequencyPairs \
 -input data/Shakespeare.txt -output bigram-freq-mr-pairs -textOutput
 ```
 Computing bigram relative frequencies using the "stripes" implementation:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bigram.ComputeBigramRelativeFrequencyStripes \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bigram.ComputeBigramRelativeFrequencyStripes \
 -input data/Shakespeare.txt -output bigram-freq-mr-stripes -textOutput
 ```
@@ -140,14 +140,14 @@ $ diff freq.mr.stripes.txt freq.mr.pairs.txt
 Make sure you've downloaded the Shakespeare collection (see "Getting Started" above). Running the "pairs" implementation:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.cooccur.ComputeCooccurrenceMatrixPairs \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.cooccur.ComputeCooccurrenceMatrixPairs \
 -input data/Shakespeare.txt -output cooccur-pairs -window 2
 ```
 Running the "stripes" implementation:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.cooccur.ComputeCooccurrenceMatrixStripes \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.cooccur.ComputeCooccurrenceMatrixStripes \
 -input data/Shakespeare.txt -output cooccur-stripes -window 2
 ```
@@ -170,21 +170,21 @@ $ hadoop fs -cat cooccur-stripes/part* | awk '/^dream\t/'
 Make sure you've downloaded the Shakespeare collection (see "Getting Started" above). Building the inverted index:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.search.BuildInvertedIndex \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.search.BuildInvertedIndex \
 -input data/Shakespeare.txt -output index
 ```
 Looking up an individual postings list:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.search.LookupPostings \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.search.LookupPostings \
 -index index -collection data/Shakespeare.txt -term "star-cross'd"
 ```
 Running a boolean retrieval:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.search.BooleanRetrieval \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.search.BooleanRetrieval \
 -index index -collection data/Shakespeare.txt -query "white red OR rose AND pluck AND"
 ```
@@ -195,14 +195,14 @@ Note that the query must be in [Reverse Polish notation](https://en.wikipedia.or
 Make sure you've grabbed the sample graph data (see "Getting Started" above). First, convert the plain-text adjacency list representation into Hadoop `Writable` records:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.EncodeBfsGraph \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.EncodeBfsGraph \
 -input data/p2p-Gnutella08-adj.txt -output graph-BFS/iter0000 -src 367
 ```
 In the current implementation, you have to run a MapReduce job for every iteration, like this:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.IterateBfs \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.IterateBfs \
 -input graph-BFS/iter0000 -output graph-BFS/iter0001 -partitions 5
 ```
@@ -215,7 +215,7 @@ for i in `seq 0 14`; do
 cur=`echo $i | awk '{printf "%04d\n", $0;}'`
 next=`echo $(($i+1)) | awk '{printf "%04d\n", $0;}'`
 echo "Iteration $i: reading graph-BFS/iter$cur, writing: graph-BFS/iter$next"
-hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.IterateBfs -input "graph-BFS/iter$cur" -output "graph-BFS/iter$next" -partitions 5
+hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.IterateBfs -input "graph-BFS/iter$cur" -output "graph-BFS/iter$next" -partitions 5
 done
 ```
@@ -243,7 +243,7 @@ The MapReduce job counters tell you how many nodes are reachable at each iteration
 To find all the nodes that are reachable at a particular iteration, run the following job:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.FindReachableNodes \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.FindReachableNodes \
 -input graph-BFS/iter0005 -output graph-BFS/reachable-iter0005
 $ hadoop fs -cat 'graph-BFS/reachable-iter0005/part*' | wc
@@ -254,7 +254,7 @@ These values should be the same as those in the second column of the table above
 To find all the nodes that are at a particular distance (e.g., the search frontier), run the following job:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.FindNodeAtDistance \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.FindNodeAtDistance \
 -input graph-BFS/iter0005 -output graph-BFS/d0005 -distance 5
 $ hadoop fs -cat 'graph-BFS/d0005/part*' | wc
@@ -270,7 +270,7 @@ Here's a simple bash script for iterating through the reachability jobs:
 for i in `seq 0 15`; do
 cur=`echo $i | awk '{printf "%04d\n", $0;}'`
 echo "Iteration $i: reading graph-BFS/iter$cur"
-hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.FindReachableNodes -input graph-BFS/iter$cur -output graph-BFS/reachable-iter$cur
+hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.FindReachableNodes -input graph-BFS/iter$cur -output graph-BFS/reachable-iter$cur
 done
 ```
@@ -282,7 +282,7 @@ Here's a simple bash script for extracting nodes at each distance:
 for i in `seq 0 15`; do
 cur=`echo $i | awk '{printf "%04d\n", $0;}'`
 echo "Iteration $i: reading graph-BFS/iter$cur"
-hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.FindNodeAtDistance -input graph-BFS/iter$cur -output graph-BFS/d$cur -distance $i
+hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.bfs.FindNodeAtDistance -input graph-BFS/iter$cur -output graph-BFS/d$cur -distance $i
 done
 ```
@@ -291,7 +291,7 @@ done
 Make sure you've grabbed the sample graph data (see "Getting Started" above). First, convert the plain-text adjacency list representation into Hadoop `Writable` records:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.pagerank.BuildPageRankRecords \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.pagerank.BuildPageRankRecords \
 -input data/p2p-Gnutella08-adj.txt -output graph-PageRankRecords -numNodes 6301
 ```
@@ -304,21 +304,21 @@ $ hadoop fs -mkdir graph-PageRank
 Partition the graph:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.pagerank.PartitionGraph \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.pagerank.PartitionGraph \
 -input graph-PageRankRecords -output graph-PageRank/iter0000 -numPartitions 5 -numNodes 6301
 ```
 Run 15 iterations:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.pagerank.RunPageRankBasic \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.pagerank.RunPageRankBasic \
 -base graph-PageRank -numNodes 6301 -start 0 -end 15 -useCombiner
 ```
 Extract the top 20 nodes by PageRank value and examine the results:
 ```
-$ hadoop jar target/bespin-1.0.4-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.pagerank.FindMaxPageRankNodes \
+$ hadoop jar target/bespin-1.0.5-SNAPSHOT-fatjar.jar io.bespin.java.mapreduce.pagerank.FindMaxPageRankNodes \
 -input graph-PageRank/iter0015 -output graph-PageRank-top20 -top 20
 $ hadoop fs -cat graph-PageRank-top20/part-r-00000
--- a/pom.xml
+++ b/pom.xml
@@ -7,7 +7,7 @@
 <artifactId>bespin</artifactId>
 <packaging>jar</packaging>
 <name>Bespin</name>
-<version>1.0.4</version>
+<version>1.0.5-SNAPSHOT</version>
 <description>Code for the big data course at the University of Waterloo.</description>
 <url>http://bespin.io/</url>
 <licenses>
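
A bump like this can also be scripted rather than edited by hand. A minimal sketch, assuming GNU `sed` and that the only occurrences are the fat-jar paths in `README.md` and the `<version>` element in `pom.xml` (the variable names are illustrative and not part of this commit):

```shell
#!/bin/sh
# Illustrative version-bump script (assumes GNU sed; on macOS use `sed -i ''`).
OLD_JAR="bespin-1.0.4-SNAPSHOT-fatjar.jar"
NEW_JAR="bespin-1.0.5-SNAPSHOT-fatjar.jar"

# Rewrite every command-line example in the README that names the fat jar.
sed -i "s/${OLD_JAR}/${NEW_JAR}/g" README.md

# The old pom carried <version>1.0.4</version> (no -SNAPSHOT), so match it exactly.
sed -i "s|<version>1.0.4</version>|<version>1.0.5-SNAPSHOT</version>|" pom.xml
```

Maven users could alternatively update the pom with the `versions:set` goal of the versions-maven-plugin, though that would leave the README examples untouched.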
