# Using Ray for Web Scraping

This example shows how to use Ray to scrape information from the web. Sophisticated Python libraries exist for this task (for example [https://scrapy.org/](https://scrapy.org/)). Here we keep things simple: we adapt existing code from [https://www.scrapingbee.com/blog/crawling-python/](https://www.scrapingbee.com/blog/crawling-python/) and show how little it takes to parallelize it with Ray.

First, install the required dependencies:

```
pip install requests bs4
```

We can then run the example from [https://www.scrapingbee.com/blog/crawling-python/](https://www.scrapingbee.com/blog/crawling-python/) out of the box:

In [3]:
import logging
from urllib.parse import urljoin
import requests
from bs4 import BeautifulSoup

logging.basicConfig(
    format='%(asctime)s %(levelname)s:%(message)s',
    level=logging.INFO)

class Crawler:

    def __init__(self, urls=None):
        self.visited_urls = []
        self.urls_to_visit = list(urls) if urls else []

    def download_url(self, url):
        return requests.get(url).text

    def get_linked_urls(self, url, html):
        soup = BeautifulSoup(html, 'html.parser')
        for link in soup.find_all('a'):
            path = link.get('href')
            if path and path.startswith('/'):
                path = urljoin(url, path)
            yield path

    def add_url_to_visit(self, url):
        if url not in self.visited_urls and url not in self.urls_to_visit:
            self.urls_to_visit.append(url)

    def crawl(self, url):
        html = self.download_url(url)
        for url in self.get_linked_urls(url, html):
            self.add_url_to_visit(url)

    def run(self):
        while self.urls_to_visit:
            url = self.urls_to_visit.pop(0)
            logging.info(f'Crawling: {url}')
            try:
                self.crawl(url)
            except Exception:
                logging.exception(f'Failed to crawl: {url}')
            finally:
                self.visited_urls.append(url)

if __name__ == '__main__':
    Crawler(urls=['https://en.wikipedia.org/']).run()

2022-06-21 20:09:40,024 INFO:Crawling: https://en.wikipedia.org/
2022-06-21 20:09:40,211 INFO:Crawling: None
2022-06-21 20:09:40,212 ERROR:Failed to crawl: None
Traceback (most recent call last):
  File "<ipython-input-3-6ce3446d43f3>", line 41, in run
    self.crawl(url)
  File "<ipython-input-3-6ce3446d43f3>", line 32, in crawl
    html = self.download_url(url)
  File "<ipython-input-3-6ce3446d43f3>", line 17, in download_url
    return requests.get(url).text
  File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 76, in get
    return request('get', url, params=params, **kwargs)
  File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 528, in request
    prep = self.prepare_request(req)
  File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/session

KeyboardInterrupt: 
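
The `Crawling: None` line in the log above appears because `get_linked_urls` yields whatever `link.get('href')` returns, including `None` for anchors without an `href`. Below is a minimal sketch of a stricter filter. It assumes a hypothetical helper named `normalize_link`, which is not part of the original example, and uses only `urllib.parse` from the standard library:

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_link(base_url, href):
    """Return an absolute http(s) URL for href, or None if it should be skipped.

    Hypothetical helper for illustration only.
    """
    if not href:
        # Anchors without an href (or with an empty one) carry no link.
        return None
    # Resolve relative paths against the page URL and drop any #fragment.
    absolute, _fragment = urldefrag(urljoin(base_url, href))
    # Skip schemes that requests cannot fetch, such as mailto: or javascript:.
    if urlparse(absolute).scheme not in ('http', 'https'):
        return None
    return absolute


base = 'https://en.wikipedia.org/'
print(normalize_link(base, '/wiki/Ray'))
print(normalize_link(base, None))
```

Inside `get_linked_urls`, one could call such a helper on every `href` and yield only the results that are not `None`.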

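To parallelize the above logic, we are going to use two data structures, both implemented as Ray actors: a remote set that keeps track of the URLs already visited, and a remote queue that keeps track of the work still to be done.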

In [5]:
import ray
ray.init()

2022-06-22 11:40:35,831	INFO services.py:1476 -- View the Ray dashboard at http://127.0.0.1:8265


RayContext(dashboard_url='127.0.0.1:8265', python_version='3.7.4', ray_version='1.13.0', ray_commit='e4ce38d001dbbe09cd21c497fedd03d692b2be3e', address_info={'node_ip_address': '127.0.0.1', 'raylet_ip_address': '127.0.0.1', 'redis_address': None, 'object_store_address': '/tmp/ray/session_2022-06-22_11-40-32_931758_30346/sockets/plasma_store', 'raylet_socket_name': '/tmp/ray/session_2022-06-22_11-40-32_931758_30346/sockets/raylet', 'webui_url': '127.0.0.1:8265', 'session_dir': '/tmp/ray/session_2022-06-22_11-40-32_931758_30346', 'metrics_export_port': 58737, 'gcs_address': '127.0.0.1:56024', 'address': '127.0.0.1:56024', 'node_id': '682ac70a0b51b49377105eb68dbce976c3f79893d0b68205d9851f9c'})

In [34]:
import collections

@ray.remote
class RaySet:
    def __init__(self):
        self.data = set()

    def contains(self, element):
        return element in self.data

    def add(self, element):
        self.data.add(element)

@ray.remote
class RayQueue:
    def __init__(self, data):
        self.data = collections.deque(data)
        
    def contains(self, element):
        return element in self.data
        
    def append(self, element):
        self.data.append(element)
    
    def pop(self):
        try:
            return self.data.popleft()
        except IndexError:
            return None
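
With the two actors above, a worker has to call `RaySet.contains` and `RayQueue.contains` separately before appending, so two workers can both decide that the same URL is new. In addition, `element in self.data` on a `deque` scans the whole queue. A possible alternative keeps both structures behind a single method. It is sketched here as a plain Python class with the hypothetical name `Frontier`, which is not part of the original example. Because a Ray actor by default executes one method call at a time, the check and the append would happen as one step if the class were turned into an actor, for instance with `ray.remote(Frontier)`.

```python
import collections


class Frontier:
    """Hypothetical helper: pending URLs plus the set of every URL ever queued."""

    def __init__(self, seeds=()):
        self.pending = collections.deque()
        self.seen = set()
        for url in seeds:
            self.add_if_new(url)

    def add_if_new(self, url):
        """Queue url unless it is None or was queued before; report what happened."""
        if url is None or url in self.seen:
            return False
        self.seen.add(url)
        self.pending.append(url)
        return True

    def pop(self):
        """Return the next pending URL, or None if the queue is empty."""
        return self.pending.popleft() if self.pending else None


frontier = Frontier(['https://en.wikipedia.org/'])
print(frontier.add_if_new('https://en.wikipedia.org/'))  # already queued
print(frontier.pop())
```

Since `seen` records a URL as soon as it is queued, a separate "visited" set is not needed for deduplication in this sketch.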

<!-- #raw -->
```{eval-rst}
.. code-block:: python
    :emphasize-lines: 19, 20, 21

    class RayCrawler:

        def __init__(self, visited_urls, urls_to_visit):
            self.visited_urls = visited_urls
            self.urls_to_visit = urls_to_visit

        def download_url(self, url):
            return requests.get(url).text

        def get_linked_urls(self, url, html):
            soup = BeautifulSoup(html, 'html.parser')
            for link in soup.find_all('a'):
                path = link.get('href')
                if path and path.startswith('/'):
                    path = urljoin(url, path)
                yield path

        def add_url_to_visit(self, url):
            if (not ray.get(self.visited_urls.contains.remote(url)) and
                not ray.get(self.urls_to_visit.contains.remote(url))):
                self.urls_to_visit.append.remote(url)

        def crawl(self, url):
            html = self.download_url(url)
            for url in self.get_linked_urls(url, html):
                self.add_url_to_visit(url)

        def run(self):
            while True:
                url = ray.get(self.urls_to_visit.pop.remote())
                logging.error(f'Crawling: {url}')
                try:
                    self.crawl(url)
                except Exception:
                    logging.exception(f'Failed to crawl: {url}')
                finally:
                    self.visited_urls.add.remote(url)
                
    @ray.remote
    def worker(visited_urls, urls_to_visit):
        RayCrawler(visited_urls, urls_to_visit).run()

    if __name__ == '__main__':
        visited_urls = RaySet.remote()
        urls_to_visit = RayQueue.remote(['https://en.wikipedia.org/'])
        ray.get([worker.remote(visited_urls, urls_to_visit) for i in range(5)])
```
<!-- #endraw -->

[2m[36m(worker pid=48556)[0m ERROR:root:Crawling: https://en.wikipedia.org/wiki/Final_Fantasy_Tactics
[2m[36m(worker pid=48601)[0m ERROR:root:Crawling: https://en.wikipedia.org/
[2m[36m(worker pid=48600)[0m ERROR:root:Crawling: None
[2m[36m(worker pid=48600)[0m ERROR:root:Failed to crawl: None
[2m[36m(worker pid=48600)[0m Traceback (most recent call last):
[2m[36m(worker pid=48600)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 33, in run
[2m[36m(worker pid=48600)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 24, in crawl
[2m[36m(worker pid=48600)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 8, in download_url
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 76, in get
[2m[36m(worker pid=48600)[0m     return request('get', url, params=params, **kwargs)
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 61, in reques

[2m[36m(worker pid=48600)[0m ERROR:root:Failed to crawl: None
[2m[36m(worker pid=48600)[0m Traceback (most recent call last):
[2m[36m(worker pid=48600)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 33, in run
[2m[36m(worker pid=48600)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 24, in crawl
[2m[36m(worker pid=48600)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 8, in download_url
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 76, in get
[2m[36m(worker pid=48600)[0m     return request('get', url, params=params, **kwargs)
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 61, in request
[2m[36m(worker pid=48600)[0m     return session.request(method=method, url=url, **kwargs)
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 528, in request
[2m[36m

[2m[36m(worker pid=48600)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 8, in download_url
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 76, in get
[2m[36m(worker pid=48600)[0m     return request('get', url, params=params, **kwargs)
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 61, in request
[2m[36m(worker pid=48600)[0m     return session.request(method=method, url=url, **kwargs)
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 528, in request
[2m[36m(worker pid=48600)[0m     prep = self.prepare_request(req)
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 466, in prepare_request
[2m[36m(worker pid=48600)[0m     hooks=merge_hooks(request.hooks, self.hooks),
[2m[36m(worker pid=48600

[2m[36m(worker pid=48602)[0m ERROR:root:Failed to crawl: None
[2m[36m(worker pid=48602)[0m Traceback (most recent call last):
[2m[36m(worker pid=48602)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 33, in run
[2m[36m(worker pid=48602)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 24, in crawl
[2m[36m(worker pid=48599)[0m ERROR:root:Crawling: None
[2m[36m(worker pid=48599)[0m ERROR:root:Failed to crawl: None
[2m[36m(worker pid=48599)[0m Traceback (most recent call last):
[2m[36m(worker pid=48599)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 33, in run
[2m[36m(worker pid=48599)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 24, in crawl
[2m[36m(worker pid=48599)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 8, in download_url
[2m[36m(worker pid=48599)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 76, in get
[2m[36m(worker pid=48599)[0m     return request('get', url, params=params, **kwargs)


[2m[36m(worker pid=48600)[0m Traceback (most recent call last):
[2m[36m(worker pid=48600)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 33, in run
[2m[36m(worker pid=48600)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 24, in crawl
[2m[36m(worker pid=48600)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 8, in download_url
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 76, in get
[2m[36m(worker pid=48600)[0m     return request('get', url, params=params, **kwargs)
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 61, in request
[2m[36m(worker pid=48600)[0m     return session.request(method=method, url=url, **kwargs)
[2m[36m(worker pid=48600)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 528, in request
[2m[36m(worker pid=48600)[0m     prep = self.prepare_request(req)
[2m

[2m[36m(worker pid=48598)[0m ERROR:root:Failed to crawl: None
[2m[36m(worker pid=48598)[0m Traceback (most recent call last):
[2m[36m(worker pid=48598)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 33, in run
[2m[36m(worker pid=48598)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 24, in crawl
[2m[36m(worker pid=48598)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 8, in download_url
[2m[36m(worker pid=48598)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 76, in get
[2m[36m(worker pid=48598)[0m     return request('get', url, params=params, **kwargs)
[2m[36m(worker pid=48598)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/api.py", line 61, in request
[2m[36m(worker pid=48598)[0m     return session.request(method=method, url=url, **kwargs)
[2m[36m(worker pid=48598)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 528, in request
[2m[36m

[2m[36m(worker pid=48598)[0m     prep = self.prepare_request(req)
[2m[36m(worker pid=48598)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/sessions.py", line 466, in prepare_request
[2m[36m(worker pid=48598)[0m     hooks=merge_hooks(request.hooks, self.hooks),
[2m[36m(worker pid=48598)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/models.py", line 316, in prepare
[2m[36m(worker pid=48598)[0m     self.prepare_url(url, params)
[2m[36m(worker pid=48598)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/models.py", line 390, in prepare_url
[2m[36m(worker pid=48598)[0m     raise MissingSchema(error)
[2m[36m(worker pid=48598)[0m requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?
[2m[36m(worker pid=48598)[0m ERROR:root:Crawling: None
[2m[36m(worker pid=48598)[0m ERROR:root:Failed to crawl: None
[2m[36m(worker pid=48598)[0m Trace

[2m[36m(worker pid=48602)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/models.py", line 316, in prepare
[2m[36m(worker pid=48602)[0m     self.prepare_url(url, params)
[2m[36m(worker pid=48602)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/models.py", line 390, in prepare_url
[2m[36m(worker pid=48602)[0m     raise MissingSchema(error)
[2m[36m(worker pid=48602)[0m requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?
[2m[36m(worker pid=48602)[0m ERROR:root:Crawling: None
[2m[36m(worker pid=48602)[0m ERROR:root:Failed to crawl: None
[2m[36m(worker pid=48602)[0m Traceback (most recent call last):
[2m[36m(worker pid=48602)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 33, in run
[2m[36m(worker pid=48602)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 24, in crawl
[2m[36m(worker pid=48602)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 8, in 

[2m[36m(worker pid=48602)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/models.py", line 316, in prepare
[2m[36m(worker pid=48602)[0m     self.prepare_url(url, params)
[2m[36m(worker pid=48602)[0m   File "/Users/pcmoritz/anaconda3/lib/python3.7/site-packages/requests/models.py", line 390, in prepare_url
[2m[36m(worker pid=48602)[0m     raise MissingSchema(error)
[2m[36m(worker pid=48602)[0m requests.exceptions.MissingSchema: Invalid URL 'None': No schema supplied. Perhaps you meant http://None?
[2m[36m(worker pid=48599)[0m ERROR:root:Failed to crawl: None
[2m[36m(worker pid=48599)[0m Traceback (most recent call last):
[2m[36m(worker pid=48599)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 33, in run
[2m[36m(worker pid=48599)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 24, in crawl
[2m[36m(worker pid=48599)[0m   File "<ipython-input-37-6db0c4f2e1b0>", line 8, in download_url
[2m[36m(worker pid=48599)[0m   File "/User

[2m[36m(worker pid=48600)[0m ERROR:root:Crawling: https://en.wikipedia.org/wiki/File:Portrait_of_Henry,_Duke_of_Lancaster_-_William_Bruges%27s_Garter_Book_(c.1440-1450),_f.8_-_BL_Stowe_MS_594_(cropped).jpg
[2m[36m(worker pid=48600)[0m ERROR:root:Crawling: https://en.wikipedia.org/wiki/Lancaster%27s_Normandy_chevauch%C3%A9e_of_1356
[2m[36m(worker pid=48556)[0m ERROR:root:Crawling: https://en.wikipedia.org/wiki/Wikipedia:Today%27s_featured_article/June_2022
[2m[36m(worker pid=48598)[0m ERROR:root:Crawling: https://en.wikipedia.org/wiki/Henry_of_Grosmont,_1st_Duke_of_Lancaster
[2m[36m(worker pid=48602)[0m ERROR:root:Crawling: https://en.wikipedia.org/wiki/Hundred_Years%27_War
[2m[36m(worker pid=48600)[0m ERROR:root:Crawling: https://en.wikipedia.org/wiki/Chevauch%C3%A9e
[2m[36m(worker pid=48556)[0m ERROR:root:Crawling: https://lists.wikimedia.org/postorius/lists/daily-article-l.lists.wikimedia.org/
[2m[36m(worker pid=48556)[0m ERROR:root:Crawling: https://en.wikipe

KeyboardInterrupt: 
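
Many of the `Crawling: None` lines above come from `RayQueue.pop`, which returns `None` when the queue is empty. The `while True` loop in `run` has no exit condition and passes that `None` straight to `requests.get`. One way to handle this is to skip empty polls and stop after several in a row. The sketch below uses hypothetical names (`run_worker`, `max_idle_polls`, `idle_sleep`) and plain callables in place of the actor handles, so that it runs without a cluster:

```python
import collections
import logging
import time


def run_worker(pop, crawl, max_idle_polls=3, idle_sleep=0.1):
    """Crawl URLs returned by pop() until the queue stays empty for a while.

    pop and crawl are plain callables here; they stand in for the actor
    call and the crawler method used in the example.
    """
    idle_polls = 0
    attempted = 0
    while idle_polls < max_idle_polls:
        url = pop()
        if url is None:
            # Empty queue: wait briefly instead of trying to crawl None.
            idle_polls += 1
            time.sleep(idle_sleep)
            continue
        idle_polls = 0
        attempted += 1
        try:
            crawl(url)
        except Exception:
            logging.exception(f'Failed to crawl: {url}')
    return attempted


# Local stand-in for the remote queue.
queue = collections.deque(['https://en.wikipedia.org/'])
crawled = []
count = run_worker(lambda: queue.popleft() if queue else None,
                   crawled.append, max_idle_polls=2, idle_sleep=0.0)
print(count, crawled)
```

In the Ray version, `pop` could be `lambda: ray.get(urls_to_visit.pop.remote())` and `crawl` the crawler's own `crawl` method. Note that stopping after a fixed number of empty polls is a heuristic: a worker may give up while another worker is still about to add links.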

In [24]:
visited_urls = RaySet.remote()
urls_to_visit = RayQueue.remote(['https://en.wikipedia.org/'])
ray.get(urls_to_visit.pop.remote())

'https://en.wikipedia.org/'

We use the ``@ray.remote`` decorator to create a Ray task.
A task is like a function, except that its result is returned asynchronously.

A task also does not have to run on the local machine; it may run elsewhere in the cluster.
This way you can run many tasks in parallel,
beyond the number of processors available in a single machine.

In [4]:
import math
import random
import time
from fractions import Fraction

@ray.remote
def pi4_sample(sample_count):
    """Run sample_count experiments and return the fraction of
    random points that fell inside the unit circle.
    """
    in_count = 0
    for _ in range(sample_count):
        # Draw a point uniformly from the unit square.
        x = random.random()
        y = random.random()
        if x*x + y*y <= 1:
            in_count += 1
    return Fraction(in_count, sample_count)
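
The reason the returned fraction approximates π/4 (hence the name `pi4_sample`) can be written out as follows, assuming `random.random()` draws `x` and `y` independently and uniformly from the unit interval:

$$
P\left(x^2 + y^2 \le 1\right)
  = \frac{\text{area of the quarter disc of radius } 1}{\text{area of the unit square}}
  = \frac{\pi/4}{1}
  = \frac{\pi}{4},
\qquad\text{so}\qquad
\pi \approx 4 \cdot \frac{\text{in\_count}}{\text{sample\_count}}.
$$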


To get the result of a future, we use ``ray.get()``, which
blocks until the result is available.

In [5]:
SAMPLE_COUNT = 1000 * 1000
start = time.time() 
future = pi4_sample.remote(sample_count = SAMPLE_COUNT)
pi4 = ray.get(future)
end = time.time()
dur = end - start
print(f'Running {SAMPLE_COUNT} tests took {dur} seconds')

Running 1000000 tests took 1.4935967922210693 seconds


Now let's see how good our approximation is.

In [7]:
pi = pi4 * 4

In [8]:
float(pi)

3.143024

In [9]:
abs(pi-math.pi)/pi

0.0004554042254233261

Meh. A little off -- the estimate only agrees with pi to about 3 significant digits.
Why don't we use 100,000 times as many samples? Let's do 100 billion!

In [10]:
FULL_SAMPLE_COUNT = 100 * 1000 * 1000 * 1000  # 100 billion samples!
BATCHES = FULL_SAMPLE_COUNT // SAMPLE_COUNT
print(f'Doing {BATCHES} batches')
results = []
for _ in range(BATCHES):
    # pi4_sample requires sample_count, so pass the batch size explicitly.
    results.append(pi4_sample.remote(sample_count=SAMPLE_COUNT))
output = ray.get(results)

Doing 100000 batches


Notice that in the above, we generated a list with 100,000 futures.
Now all we have to do is wait for the results.

Depending on your Ray cluster's size, this might take a few minutes.
To give you some idea of the work involved: when I ran a single batch
on a single machine, it took 0.4 seconds.

On a single core, that means we're looking at 0.4 s * 100,000 batches = 40,000 seconds, or about 11 hours.

Here's what the Dashboard looks like: 

![View of the dashboard](../images/dashboard.png)

So now, rather than just a single core working on this,
I have 168 working on the task together, and it's ~80% efficient.
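
The numbers quoted above can be combined into a rough estimate of the wall-clock time on the cluster. The sketch below assumes that "~80% efficient" means the cluster achieves about 80% of the ideal linear speedup over one core. That reading is an assumption, as are the variable names:

```python
SECONDS_PER_BATCH = 0.4   # single-batch time quoted in the text
BATCHES = 100_000
CORES = 168
EFFICIENCY = 0.8          # assumed: fraction of ideal linear speedup

# Total work if one core processed every batch in sequence.
single_core_seconds = SECONDS_PER_BATCH * BATCHES
# Same work spread over the cluster, discounted by the efficiency.
cluster_seconds = single_core_seconds / (CORES * EFFICIENCY)

print(f'single core: {single_core_seconds / 3600:.1f} hours')
print(f'{CORES} cores: {cluster_seconds / 60:.1f} minutes')
```

Under that assumption the estimate comes out at roughly five minutes, which is consistent with the earlier remark that the run might take a few minutes.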

In [12]:
pi = sum(output)*4/len(output)

In [13]:
float(pi)

3.14159518188

In [14]:
abs(pi-math.pi)/pi

8.047791203506436e-07

Not bad at all -- we're off by less than one part in a million.
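
How close should we expect to get? Each sample is a Bernoulli trial with success probability π/4, so the standard error of the estimate shrinks like one over the square root of the number of samples. A small sketch of that calculation (the function name `expected_relative_error` is ours, not part of the original notebook):

```python
import math


def expected_relative_error(sample_count):
    """Relative standard error of the Monte Carlo estimate of pi."""
    p = math.pi / 4                                   # chance a point lands in the circle
    std_fraction = math.sqrt(p * (1 - p) / sample_count)
    # The estimate is 4 * fraction, so its standard error scales by 4.
    return 4 * std_fraction / math.pi


for n in (1_000_000, 100_000_000_000):
    print(n, expected_relative_error(n))
```

For one million samples this gives a relative standard error of roughly 5e-4, and for 100 billion samples roughly 1.7e-6. Both are of the same order of magnitude as the errors observed above (about 4.6e-4 and 8.0e-7).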