# GraphLab Create PageRank Benchmark - CommonCrawl 2012 Dataset
## AWS EC2 Benchmark Notebook
This notebook is intended for running the GraphLab Create PageRank Benchmark [on an EC2 instance as described here](http://www.qwantz.com). If you are using a different environment

### Initialize and mount SSDs that will be used as cache locations

In [None]:
%%bash
# initialize filesystem on SSD drives
sudo mkfs -t ext4 /dev/xvdb
sudo mkfs -t ext4 /dev/xvdc

# create mount points for SSD drives
sudo mkdir -p /mnt/tmp1
sudo mkdir -p /mnt/tmp2

# mount SSD drives on created points and temporary file locations
sudo mount /dev/xvdb /mnt/tmp1
sudo mount /dev/xvdc /mnt/tmp2
sudo mount /dev/xvdb /tmp
sudo mount /dev/xvdc /var/tmp

# set permissions for mounted locations
sudo chown ubuntu:ubuntu /mnt/tmp1
sudo chown ubuntu:ubuntu /mnt/tmp2

### Initialize GraphLab Create

In [None]:
# Replace YOUR_PRODUCT_KEY with the product key you received from Dato, and
# YOUR_ACCESS_KEY and YOUR_SECRET_KEY with your AWS credentials.
import graphlab as gl
gl.product_key.set_product_key("YOUR_PRODUCT_KEY")
gl.aws.set_credentials(access_key_id='YOUR_ACCESS_KEY', 
                       secret_access_key='YOUR_SECRET_KEY')

In [None]:
# Set the cache locations to the SSDs.
gl.set_runtime_config("GRAPHLAB_CACHE_FILE_LOCATIONS", "/mnt/tmp1:/mnt/tmp2")
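
If a mount step failed, the cache directories may be missing or not writable. A quick sanity check along the following lines can catch that early. This is a minimal sketch using only the standard library; `unusable_cache_dirs` is a hypothetical helper and not part of GraphLab Create.

```python
import os

def unusable_cache_dirs(locations):
    """Return entries of a colon-separated path list that are not writable directories."""
    bad = []
    for path in locations.split(":"):
        if not (os.path.isdir(path) and os.access(path, os.W_OK)):
            bad.append(path)
    return bad

# Same colon-separated value that is passed to GRAPHLAB_CACHE_FILE_LOCATIONS above.
problems = unusable_cache_dirs("/mnt/tmp1:/mnt/tmp2")
if problems:
    print("Cache locations not usable: " + ", ".join(problems))
```

An empty list means every listed directory exists and is writable by the current user. The check does not confirm that the directory is actually backed by an SSD.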

### Run the Benchmark

In [None]:
# Load the CommonCrawl 2012 SGraph
s3_sgraph_path = "s3://dato-datasets-oregon/webgraphs/sgraph/common_crawl_2012_sgraph"
g = gl.load_sgraph(s3_sgraph_path)

In [None]:
# Run PageRank over the SGraph
pr = gl.pagerank.create(g)
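
For readers unfamiliar with what is being computed, the idea behind PageRank can be sketched as a plain power iteration on a toy edge list. This is an illustration only and not GraphLab Create's implementation. The function name `toy_pagerank`, its parameter names, the default values, and the handling of vertices without out-links are all assumptions made for the example.

```python
def toy_pagerank(edges, reset_probability=0.15, threshold=1e-6, max_iterations=100):
    """Power-iteration PageRank over a list of (src, dst) edges (illustrative only)."""
    nodes = sorted({v for edge in edges for v in edge})
    out_degree = {v: 0 for v in nodes}
    for src, _ in edges:
        out_degree[src] += 1

    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        # Rank held by vertices with no out-links is spread evenly (an assumption).
        dangling = sum(rank[v] for v in nodes if out_degree[v] == 0)
        base = reset_probability / n + (1.0 - reset_probability) * dangling / n
        new_rank = {v: base for v in nodes}
        # Each vertex passes its rank along its out-edges in equal shares.
        for src, dst in edges:
            new_rank[dst] += (1.0 - reset_probability) * rank[src] / out_degree[src]
        # Stop once the total change in rank falls below the threshold.
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < threshold:
            break
    return rank, iteration

# Tiny hypothetical graph: both "a" and "b" link to "c".
ranks, iterations = toy_pagerank([("a", "c"), ("b", "c")])
print("Highest-ranked vertex: " + max(ranks, key=ranks.get))
```

In this sketch the ranks sum to 1. GraphLab Create may normalize its scores differently, so the absolute values are not comparable with the benchmark output.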

### Review the Results

In [None]:
# Print results
print "Done! Resulting PageRank model:"
print
print pr

In [None]:
# Print timings
from datetime import timedelta
training_time_secs = pr['training_time']
print "Total training time:", timedelta(seconds=training_time_secs)
print "Avg. time per iteration:", timedelta(seconds=(training_time_secs / float(pr['num_iterations'])))
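
The timing arithmetic can be checked on its own, without running the benchmark. In the sketch below, `per_iteration_time` is a hypothetical helper and the numbers are made up for illustration; they are not measured results.

```python
from datetime import timedelta

def per_iteration_time(training_time_secs, num_iterations):
    """Average wall-clock time per iteration, as a timedelta."""
    return timedelta(seconds=training_time_secs / float(num_iterations))

# Hypothetical values, not benchmark measurements.
example_total_secs = 90.0
example_iterations = 3
print("Total training time: " + str(timedelta(seconds=example_total_secs)))
print("Avg. time per iteration: " + str(per_iteration_time(example_total_secs, example_iterations)))
```

The `float()` conversion guards against integer division when both values are integers, which matters under Python 2.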