`@async` is your friend
===

Example 1: Sleep Sort
---
The [sleep sort algorithm](http://www.hemingwayapp.com/) sorts an array of real values. Each value represents the amount of time to `sleep` before it can be `push!`ed onto the sorted array. It doesn't always work. It depends on how accurate and precise `sleep` is, and how the implementor parallelizes the `sleep`s. Consider an unsorted array containing 0.1 and 0.11, representing seconds. It is possible that 0.11 comes before 0.1 in the "sorted" array. Maybe the garbage collector ran and introduced some jitter, and both sleeps woke up simultaneously.

But, it's a fun and illustrative example. The following `sleep_sort` uses `@async`. I'll explain it after demonstrating that it works (in the sense of sleep sort). 

In [1]:
function sleep_sort(items)
    result = []
    @sync for x in items
        @async begin
            sleep(x)
            push!(result, x)
        end
    end
    result
end

sleep_sort (generic function with 1 method)

In [2]:
xs = rand(1:10, 5)' # Ignore the transpose. It's just for presentation.

1x5 Array{Int64,2}:
 2  5  3  2  5

In [3]:
sleep_sort(xs)' # Ignore the transpose. It's just for presentation.

1x5 Array{Any,2}:
 2  2  3  5  5

You may assume that the `@async` macro created a thread or `fork()`ed a process. It didn't. Only one process executed each `@async`ed block: the main one. `@async` wraps the block in a task (a coroutine) and schedules it, and the task and the main process form a cooperative agreement. When the `@async` block has something to do, it can. But if it's waiting for something to happen, it gives control back to the main process.

In this example, `sleep()`ing is "doing nothing." When it wakes up, it has something to do. 
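To make the hand-off concrete, here is a minimal sketch of two `@async` blocks taking turns. The function name `interleave_demo` and the labels "A" and "B" are made up for illustration; only `@sync`, `@async`, `sleep`, and `push!` come from the example above.

```julia
# Hypothetical demo: two tasks record when they start and finish.
function interleave_demo()
    events = String[]
    @sync begin
        @async begin
            push!(events, "A start")
            sleep(0.2)              # A gives up control while it waits
            push!(events, "A end")
        end
        @async begin
            push!(events, "B start")
            sleep(0.1)              # B waits for less time than A
            push!(events, "B end")
        end
    end
    events                          # @sync returned, so both tasks are done
end

events = interleave_demo()
println(events)
```

Both "start" entries should land before either "end" entry, because each task hands control back as soon as it sleeps. B, with the shorter sleep, should finish before A even though A started first.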

Example 2: Fetching IP Addresses
---

Let's say you're doing a study on where (geographically) major websites locate their servers. To do so, you need to grab their IP addresses.[1] In this example, I grab the top 100, according to an [Alexa](http://www.alexa.com/) snapshot. For each domain, I call `getaddrinfo`. As [the manual warns](http://julia.readthedocs.org/en/latest/stdlib/io-network/#Base.getaddrinfo), this may require a DNS lookup. This matters because a lookup can be a relatively slow operation. But it's also a blocking one. That is, it can signal when it is just waiting around.

In [5]:
# Remove the following error if you want to run this notebook yourself. Your 
# employer probably frowns upon DNS requests to `xhamster.com`.
error("!NSFW Guard!")

top_100 = [
    "google.com", "facebook.com", "youtube.com", "baidu.com", 
    "yahoo.com", "amazon.com", "wikipedia.org", "qq.com", 
    "twitter.com", "google.co.in", "live.com", "taobao.com", 
    "sina.com.cn", "linkedin.com", "yahoo.co.jp", "weibo.com", 
    "ebay.com", "google.co.jp", "yandex.ru", "vk.com", 
    "hao123.com", "blogspot.com", "t.co", "bing.com", 
    "google.de", "instagram.com", "aliexpress.com", 
    "msn.com", "amazon.co.jp", "google.co.uk", "reddit.com", 
    "ask.com", "pinterest.com", "google.com.br", "google.fr", 
    "wordpress.com", "tmall.com", "onclickads.net", "paypal.com", 
    "mail.ru", "microsoft.com", "sohu.com", "tumblr.com", 
    "imgur.com", "google.ru", "xvideos.com", "imdb.com", 
    "apple.com", "google.it", "fc2.com", "google.es", 
    "googleadservices.com", "netflix.com", "amazon.de", 
    "360.cn", "stackoverflow.com", "tianya.cn", "craigslist.org", 
    "alibaba.com", "ok.ru", "google.com.mx", "google.ca", 
    "gmw.cn", "google.com.hk", "pornhub.com", "naver.com", 
    "diply.com", "amazon.co.uk", "rakuten.co.jp", "go.com", 
    "xhamster.com", "blogger.com", "kat.cr", "outbrain.com", 
    "cnn.com", "adcash.com", "soso.com", "google.com.tr", 
    "nicovideo.jp", "xinhuanet.com", "amazon.in", 
    "flipkart.com", "cntv.cn", "google.co.id", "booking.com", 
    "people.com.cn", "bbc.co.uk", "github.com", 
    "googleusercontent.com", "pixnet.net", "google.com.au", 
    "dropbox.com", "google.co.kr", "espn.go.com", "google.pl",
    "ebay.de", "popads.net", "dailymotion.com", "livedoor.jp", 
    "ebay.co.uk"
];

### Serial and Async Versions


In [6]:
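# On Julia 0.7 and later, run `using Sockets` first; `getaddrinfo`, `IPv4`,
# and the `ip"..."` string macro live in that standard library.
# Fall back to 0.0.0.0 when the lookup fails.
fetch_ip(domain) = try getaddrinfo(domain) catch; ip"0.0.0.0" end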

function serial_version(domains)
    domain_to_ip = Dict{String, IPv4}()
    for domain in domains
        domain_to_ip[domain] = fetch_ip(domain)
    end
    domain_to_ip
end

function async_version(domains)
    domain_to_ip = Dict{String, IPv4}()
    @sync for domain in domains
        @async domain_to_ip[domain] = fetch_ip(domain)
    end
    domain_to_ip
end

async_version (generic function with 1 method)

The following code just helps level the playing field. I call both functions with an empty array to make sure compilation expense isn't part of `@time`. Then, I call `async_version` with `top_100`, without timing it. This is to warm up the relevant DNS caches on my computer and router. Otherwise, whichever function is called first would have an inherent disadvantage with respect to time.

In [7]:
serial_version([]); async_version([]); async_version(top_100);

Now, the `@time`ings.

In [14]:
@time async_version(top_100);

  0.033475 seconds (2.51 k allocations: 246.438 KB)


In [15]:
@time serial_version(top_100);

  0.851270 seconds (1.29 k allocations: 42.344 KB)


The `@async`-based function is much faster. It doesn't have to wait for a slow request to finish before starting the next request. In the async case, the total run time should be a little more than that of the slowest `getaddrinfo` call. For the serial version, the run time is the sum of the run times of all the calls.
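The "slowest call versus sum of all calls" reasoning can be checked without touching the network. This is a rough sketch in which `sleep` stands in for a slow lookup. The names `fake_lookup`, `serial_total`, and `async_total`, and the delay values, are assumptions made up for illustration; they are not part of the notebook above.

```julia
# Hypothetical stand-in for a slow, blocking call such as a DNS lookup.
fake_lookup(delay) = (sleep(delay); delay)

# Each call has to finish before the next one starts.
function serial_total(delays)
    for d in delays
        fake_lookup(d)
    end
end

# Every call is started right away; @sync waits for all of them.
function async_total(delays)
    @sync for d in delays
        @async fake_lookup(d)
    end
end

# Warm up both functions so compilation isn't part of the measurement.
serial_total(Float64[]); async_total(Float64[])

delays = [0.1, 0.2, 0.3]
serial_time = @elapsed serial_total(delays)   # roughly the sum of the delays
async_time  = @elapsed async_total(delays)    # roughly the longest delay

println("serial: ", serial_time, " s, async: ", async_time, " s")
```

With these delays, the serial time should be near 0.6 seconds and the async time near 0.3 seconds, give or take scheduler and timer jitter.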

Why Should You Care?
---
If you haven't already, you should peruse the manual on parallel processing. Async is useful in a lot of cases. For data scientists, it's often used for feeding tasks to different processes. But, comparatively, it's a more impressive solution for parallelizing operations bound by IO. As the manual points out, the underlying IO calls are asynchronous. They are just presented synchronously for convenience. Wrapping those operations in `@async` makes them work asynchronously again. If you are writing something like a server or a web scraper, this is useful. Here, "an event-driven, non-blocking I/O model [is] lightweight and efficient."[2]

#### Footnotes

1. Actually, you have to do way more than this. Major sites are multi-homed. And, they are probably geographically dispersed, too. Take this example with a grain of salt.
2. Yes, I'm just straight up quoting [Node.js](https://nodejs.org/en/).