Getting Data using CURL
-----------------------

We now move into a more interesting topic: how to get data from Internet sources. For that, we will use `curl`, a Unix command-line tool for transferring data with URLs. (Later in class, we will learn how to achieve the same using Python, but for quick testing, curl is the standard tool.)

_curl does not always come preinstalled, so before using it for the first time we may need to install it. On Debian/Ubuntu systems, simply type:_

`$ sudo apt-get install curl`

In [None]:
!sudo apt-get install -y curl  # -y answers "yes" to the confirmation prompt, which a notebook cell cannot answer

Let's start by retrieving a simple text file that we will use later in class to illustrate how different shell commands work. The sample data file is hosted online, and we can fetch it directly from the terminal. The `-L` option tells curl to follow redirects, in case the server points us to a different location. Simply type:

In [1]:
!curl -L 'https://dl.dropboxusercontent.com/u/16006464/IPDS/sample.txt'

123	1346699925	11122	foo bar
222	1346699955	11145	biz baz
140	1346710000	11122	hee haw
234	1346700000	11135	bip bop
146	1346699999	11123	foo bar
99	1346750000	11135	bip bop
99	1346750000	11135	bip bop


The columns in this tab-separated data correspond to [order id] [time of order] [user id] [ordered item], similar to what one might encounter in practice. If you wish, you can copy and paste the data above into a text editor, making sure that each line ends with a newline right after the ordered item (the column with alphabetic characters).
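As a preview of the Python route mentioned earlier, here is a minimal sketch that parses the sample data with the standard `csv` module. The column names (`order_id`, `order_time`, `user_id`, `item`) are our own hypothetical labels, and treating the time column as Unix epoch seconds is an assumption based on the values (they all fall in 2012).

```python
import csv
import io
from datetime import datetime, timezone

# The sample data shown above, copied verbatim (tab-separated).
SAMPLE = (
    "123\t1346699925\t11122\tfoo bar\n"
    "222\t1346699955\t11145\tbiz baz\n"
    "140\t1346710000\t11122\thee haw\n"
    "234\t1346700000\t11135\tbip bop\n"
    "146\t1346699999\t11123\tfoo bar\n"
    "99\t1346750000\t11135\tbip bop\n"
    "99\t1346750000\t11135\tbip bop\n"
)

# Hypothetical column names for the four fields.
FIELDS = ["order_id", "order_time", "user_id", "item"]


def parse_orders(text):
    """Parse tab-separated order lines into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS, delimiter="\t")
    orders = []
    for row in reader:
        orders.append({
            "order_id": int(row["order_id"]),
            # Assumption: the time column holds Unix epoch seconds (UTC).
            "order_time": datetime.fromtimestamp(int(row["order_time"]), tz=timezone.utc),
            "user_id": int(row["user_id"]),
            "item": row["item"],
        })
    return orders


orders = parse_orders(SAMPLE)
print(len(orders))  # 7
print(orders[0]["item"], orders[0]["order_time"].isoformat())
```

Note that the last two rows are identical; the parser keeps both, so any de-duplication has to be done explicitly.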

To store the output in a file, we add the `-o [output file]` option to the command. (In the next session, we will also see how to use _output redirection_ to store the output in a file.)

In [2]:
!curl -L 'https://dl.dropboxusercontent.com/u/16006464/IPDS/sample.txt' -o /home/ubuntu/data/sample.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   201  100   201    0     0    961      0 --:--:-- --:--:-- --:--:--   966


In [4]:
!curl "http://www.nyu.edu" -o nyu.html

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 39861  100 39861    0     0   411k      0 --:--:-- --:--:-- --:--:--  414k


In [3]:
!ls /home/ubuntu/data/

bank-names.txt	facebook.sql		 restaurant-names.txt  sample.txt
baseball.csv	hospital-discharges.csv  restaurants.csv       tips.csv
companies.txt	imdb.sql		 restaurants.xlsx      titanic.xls


The `-o` option pulled the file into the directory `/home/ubuntu/data/`, creating a new file called `sample.txt`, as the listing above confirms. If we do not want to see the download statistics (the progress meter), we can use the `-s` (silent) option:

In [None]:
!curl -s -L 'https://dl.dropboxusercontent.com/u/16006464/IPDS/sample.txt' -o /home/ubuntu/data/sample.txt

Now, let's use curl to access some real data. A key component of today's data ecosystem is the existence of web APIs, which provide functionality for a wide variety of tasks and return their results in machine-readable form.

#### Where am I?

For example, let's figure out programmatically the location of the computer where the IPython server is running. We can call the API by issuing the following command:

In [6]:
!curl -s "http://freegeoip.net/json/" | jq .

{
  "metro_code": 511,
  "longitude": -77.4838,
  "latitude": 39.0335,
  "ip": "54.174.159.22",
  "country_code": "US",
  "country_name": "United States",
  "region_code": "VA",
  "region_name": "Virginia",
  "city": "Ashburn",
  "zip_code": "20147",
  "time_zone": "America/New_York"
}


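While this may not look nice to a human, for a computer it is a perfectly legitimate answer. This format is called "JSON", and it is an efficient and very common way to transfer data on the Internet today. The `| jq .` at the end of the command pipes the response through `jq`, a command-line JSON processor; the filter `.` simply pretty-prints (and colorizes) the JSON, so it only controls the presentation.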
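To see why JSON is convenient for a computer, here is a minimal Python sketch that turns the response above into a dictionary with the standard `json` module. It assumes the response has already been captured as a string (here it is simply pasted in), so no network call is made.

```python
import json

# The geolocation response shown above (without jq's colors), as a string.
RESPONSE = """
{
  "metro_code": 511,
  "longitude": -77.4838,
  "latitude": 39.0335,
  "ip": "54.174.159.22",
  "country_code": "US",
  "country_name": "United States",
  "region_code": "VA",
  "region_name": "Virginia",
  "city": "Ashburn",
  "zip_code": "20147",
  "time_zone": "America/New_York"
}
"""

location = json.loads(RESPONSE)  # JSON object -> Python dict
print(location["city"], location["region_code"])  # Ashburn VA
print(type(location["latitude"]).__name__)  # float
```

JSON numbers become Python `int`/`float` values and JSON strings become `str`, so no manual parsing is needed.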

Now, let's examine a few more web APIs, just for fun:

#### What's the weather?

Next, let's use the OpenWeatherMap API to get the current weather in New York. (The details of the API calls are available at http://openweathermap.org/api.)

In [7]:
!curl -s "http://api.openweathermap.org/data/2.5/weather?q=New%20York,NY,USA&units=imperial&mode=json&appid=ffb7b9808e07c9135bdcc7d1e867253d" | jq .

{
  "cod": 200,
  "name": "New York",
  "id": 5128581,
  "coord": {
    "lat": 40.71,
    "lon": -74.01
  },
  "weather": [
    {
      "icon": "50d",
      "description": "haze",
      "main": "Haze",
      "id": 721
    }
  ],
  "base": "cmc stations",
  "main": {
    "temp_max": 30.2,
    "temp_min": 28.4,
    "humidity": 42,
    "pressur

Notice that we asked the service to return the data in JSON format (`mode=json`). This API can also return the data in a different format, called XML, which is wordier. (We will get back to these formats later in the semester.)
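The JSON call above and the XML call below differ only in the `mode` parameter of the query string. A sketch of building both URLs in Python with `urllib.parse.urlencode`; the `weather_url` helper and the `YOUR_APPID` placeholder are hypothetical, and passing `quote_via=quote` (instead of the default `quote_plus`) reproduces the `%20` encoding of the space in `New York`.

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def weather_url(city, mode, appid, units="imperial"):
    """Hypothetical helper: build the weather query URL; only `mode` varies between JSON and XML."""
    params = {"q": city, "units": units, "mode": mode, "appid": appid}
    # quote encodes spaces as %20; safe="," keeps the commas in the city literal.
    return BASE_URL + "?" + urlencode(params, safe=",", quote_via=quote)


KEY = "YOUR_APPID"  # placeholder: substitute your own API key
print(weather_url("New York,NY,USA", "json", KEY))
print(weather_url("New York,NY,USA", "xml", KEY))
```

Building the query string programmatically avoids hand-encoding mistakes when the city name or other parameters change.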

In [13]:
!curl -s "http://api.openweathermap.org/data/2.5/weather?q=New%20York,NY,USA&units=imperial&mode=xml&appid=ffb7b9808e07c9135bdcc7d1e867253d"

<current><city id="5128581" name="New York"><coord lon="-74.01" lat="40.71"></coord><country>US</country><sun rise="2016-01-24T12:12:36" set="2016-01-24T22:04:16"></sun></city><temperature value="29.23" min="28.4" max="30.2" unit="fahrenheit"></temperature><humidity value="42" unit="%"></humidity><pressure value="1015" unit="hPa"></pressure><wind><speed value="11.02" name="Strong breeze"></speed><gusts></gusts><direction value="300" code="WNW" name="West-northwest"></direction></wind><clouds value="20" name="few clouds"></clouds><visibility></visibility><precipitation mode="no"></precipitation><weather number="721" value="haze" icon="50d"></weather><lastupdate value="2016-01-24T18:13:05"></lastupdate></current>
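In the XML version, the values are stored as attributes of elements such as `<temperature>`. A minimal sketch of reading them with Python's `xml.etree.ElementTree`, using a trimmed copy of the response above; attribute values come back as strings, so numbers need an explicit `float()`.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the XML response shown above.
XML = (
    '<current><city id="5128581" name="New York">'
    '<coord lon="-74.01" lat="40.71"></coord><country>US</country></city>'
    '<temperature value="29.23" min="28.4" max="30.2" unit="fahrenheit"></temperature>'
    '<humidity value="42" unit="%"></humidity>'
    '<weather number="721" value="haze" icon="50d"></weather></current>'
)

root = ET.fromstring(XML)  # root is the <current> element
city = root.find("city").get("name")
temp = root.find("temperature")
print(city, float(temp.get("value")), temp.get("unit"))  # New York 29.23 fahrenheit
print(root.find("weather").get("value"))  # haze
```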


#### What's the sentiment?

Now let's use a web service to automatically analyze the sentiment of a piece of text. (The service comes from [IBM's AlchemyAPI](http://www.alchemyapi.com/api/sentiment/textc.html#textsentiment).)

In [19]:
!curl "http://access.alchemyapi.com/calls/text/TextGetTextSentiment" \
-d "outputMode=json" \
-d "apikey=3d0b6858f7ef32fdf27ad402f4a9c270c9685d84" \
-d "text=I did not dislike it. " | jq .

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   454  100   362  100    92   2487    632 --:--:-- --:--:-- --:--:--  2496
{
  "docSentiment": {
    "type": "negative",
    "score": "-0.254732"
  },
  "language": "english",
  "totalTransactions": "1",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "status": "OK"
}
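The `-d` options above make curl send an HTTP POST whose body is the form fields joined with `&`. A sketch of the equivalent request in Python, plus parsing of the response. The `YOUR_API_KEY` placeholder is hypothetical; note also that `urlencode` percent-encodes the values (similar to curl's `--data-urlencode`), whereas plain `-d` sends them as typed. The request is only constructed, not sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

SENTIMENT_URL = "http://access.alchemyapi.com/calls/text/TextGetTextSentiment"

# The three -d fields from the cell above; the key is a placeholder.
form = {
    "outputMode": "json",
    "apikey": "YOUR_API_KEY",
    "text": "I did not dislike it. ",
}

# urlencode joins the fields with "&" and percent-encodes the values
# (spaces become "+").
body = urlencode(form)
print(body)  # outputMode=json&apikey=YOUR_API_KEY&text=I+did+not+dislike+it.+

# Attaching a body makes urllib use POST, just as -d does for curl.
req = Request(SENTIMENT_URL, data=body.encode("ascii"))
print(req.get_method())  # POST

# An abridged copy of the JSON response shown above.
response = json.loads(
    '{"docSentiment": {"type": "negative", "score": "-0.254732"}, "status": "OK"}'
)
score = float(response["docSentiment"]["score"])  # the score arrives as a string
print(response["docSentiment"]["type"], score)  # negative -0.254732
```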


#### And a few synonyms

And now, just a demo of a web API that I created myself a few years back. It analyzes Wikipedia to figure out the different ways that people refer to the same entity.

In [23]:
!curl -s "http://wikisynonyms.ipeirotis.com/api/Hillary_Clinton" | jq .

{
  "terms": [
    {
      "oskill": 0,
      "canonical": 1,
      "term": "Hillary Rodham Clinton"
    },
    {
      "oskill": 0,
      "canonical": 0,
      "term": "Hillary R. Clinton"
    },
    {
      "oskill": 0,
      "canonical": 0,
      "term": "Hilary clinton"
    },
    {
      "oskill": 0,
      "canonical": 0,
      "term": "Hilary Clinton"
    },
    {
      "oskill":

## Exercise

The following websites list many useful APIs:

* https://www.mashape.com 
* http://www.programmableweb.com/
* http://www.mashery.com/
* http://apigee.com/ 

Mashape is my personal favorite in terms of user-friendliness, and it also shows its examples directly as curl commands, but the others are pretty nice as well. Your task: search through these websites and find a web API that does something you like. Use curl to issue a web API call to this service.
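Many of these services authenticate with a key sent in a request header, as in the example below: `-H` adds the `X-Mashape-Key` header, and `--get` makes curl use GET (which is also curl's default when no `-d` data is given). A sketch of the same request built with Python's `urllib.request`; `YOUR_MASHAPE_KEY` is a placeholder, and the sketch only constructs the request without sending it. urllib normalizes the header name to `X-mashape-key`, which is harmless because HTTP header names are case-insensitive.

```python
from urllib.request import Request

API_KEY = "YOUR_MASHAPE_KEY"  # placeholder: use the key from your own account
URL = ("https://drrobotmck-nyc-health-inspection-results-v1.p.mashape.com"
       "/restaurants?boro=manhattan&dba=momofuku")

# curl -H 'X-Mashape-Key: ...'  ->  an entry in the headers dict
# curl --get (with no -d data)  ->  a request without a body, i.e. GET
req = Request(URL, headers={"X-Mashape-Key": API_KEY})

print(req.get_method())    # GET
print(req.header_items())  # [('X-mashape-key', 'YOUR_MASHAPE_KEY')]

# Sending it (requires network access and a valid key):
# import json, urllib.request
# data = json.loads(urllib.request.urlopen(req).read())
```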

In [29]:
!curl --get -s 'https://drrobotmck-nyc-health-inspection-results-v1.p.mashape.com/restaurants?boro=manhattan&dba=momofuku' \
  -H 'X-Mashape-Key: zG3wec50exmshxNoF1NMHNRH37GYp1d7oW8jsnWwIMTeMmALxg' | jq .

{
  "response": {
    "query_params": {
      "controller": "api/restaurants",
      "action": "index",
      "format": "json",
      "boro": "manhattan",
      "dba": "momofuku"
    },
    "data": [
      {
        "last_updated": "2014-09-27T19:42:02.657Z",
        "total_inspections": 5,
        "cuisine_type": "American",
        "cuisine_code": "03",
        "boro_name": "Manhattan",
        "boro_code": 1,
        "id":