<a href="https://colab.research.google.com/github/nhwhite212/DealingwithDataSpring2021/blob/master/4-UNIX_Basics/B-Fetching_Data_Using_CURL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Getting Data using CURL
-----------------------

We now move to a more interesting topic: how to get data from Internet sources. For that, we will use a Unix command-line tool called `curl`. (Later in class, we will learn how to do the same using Python, but for quick testing, curl is often the standard tool.) We will also use a tool called `jq` to work with JSON output. (Do not worry, we will revisit both of these later in class.)

_Often, curl and jq do not come preinstalled, so the first time we use them, we need to install them. To do so, simply type:_

In [2]:
!sudo apt-get -y install curl
!sudo apt-get -y install jq

Reading package lists... Done
Building dependency tree       
Reading state information... Done
curl is already the newest version (7.58.0-2ubuntu3.12).
0 upgraded, 0 newly installed, 0 to remove and 30 not upgraded.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following additional packages will be installed:
  libjq1 libonig4
The following NEW packages will be installed:
  jq libjq1 libonig4
0 upgraded, 3 newly installed, 0 to remove and 30 not upgraded.
Need to get 276 kB of archives.
After this operation, 930 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libonig4 amd64 6.7.0-1 [119 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libjq1 amd64 1.5+dfsg-2 [111 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/universe amd64 jq amd64 1.5+dfsg-2 [45.6 kB]
Fetched 276 kB in 1s (264 kB/s)
debconf: unable to initialize frontend: Dialog
debconf: (No usable dialog-l

Let's start by retrieving a simple text file, which we will use later in the class to illustrate how different shell commands work. The sample file is hosted online, and we can copy it with `curl`. The `-L` option tells curl to follow any redirects. Simply type:

In [None]:
!curl -L 'http://pages.stern.nyu.edu/~nwhite/DealingwithDataSpring2021/sample.txt'

123	1346699925	11122	foo bar
222	1346699955	11145	biz baz
140	1346710000	11122	hee haw
234	1346700000	11135	bip bop
146	1346699999	11123	foo bar
99	1346750000	11135	bip bop
99	1346750000	11135	bip bop


The columns in this tab-separated data correspond to [order id] [time of order] [user id] [ordered item], similar to what might be encountered in practice. If you wish, you can copy-paste the data above into a text editor, making sure that the columns are separated by tabs and that each line ends with a newline after the ordered item (the column with alphabetic characters).
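As a small preview of the Python we will use later, here is a minimal sketch of how these four columns could be read. The column names (`order_id`, `order_time`, `user_id`, `item`) are labels chosen for this illustration and are not part of the file. The data is pasted inline rather than downloaded.

```python
import csv
import io
from collections import Counter

# The sample data shown above, pasted inline (tab-separated, one order per line).
SAMPLE = (
    "123\t1346699925\t11122\tfoo bar\n"
    "222\t1346699955\t11145\tbiz baz\n"
    "140\t1346710000\t11122\thee haw\n"
    "234\t1346700000\t11135\tbip bop\n"
    "146\t1346699999\t11123\tfoo bar\n"
    "99\t1346750000\t11135\tbip bop\n"
    "99\t1346750000\t11135\tbip bop\n"
)

# Illustrative labels only: the file itself has no header row.
FIELDS = ["order_id", "order_time", "user_id", "item"]


def parse_orders(text):
    """Split tab-separated text into a list of dicts, one per order."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(FIELDS, row)) for row in reader if row]


orders = parse_orders(SAMPLE)
orders_per_user = Counter(order["user_id"] for order in orders)

print(len(orders), "orders")
for user_id, count in sorted(orders_per_user.items()):
    print(user_id, count)
```

Note that the last order appears twice in the file, so it is counted twice here.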

To store the output in a file, we add the `-o [output file]` option to the command. (In the next session, we will also see how to use _output redirection_ to store the output in a file.)

In [None]:
!curl -L 'http://pages.stern.nyu.edu/~nwhite/DealingwithDataSpring2021/sample.txt' -o sample.txt

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   201  100   201    0     0   2284      0 --:--:-- --:--:-- --:--:--  2258


In [None]:
!ls -al 

total 20
drwxr-xr-x 1 root root 4096 Apr  4 14:48 .
drwxr-xr-x 1 root root 4096 Apr  4 14:44 ..
drwxr-xr-x 4 root root 4096 Mar 25 13:38 .config
drwxr-xr-x 1 root root 4096 Mar 25 13:38 sample_data
-rw-r--r-- 1 root root  201 Apr  4 14:48 sample.txt


This pulls the file into the current working directory, creating a new file called `sample.txt`. If we do not want to see any statistics about the download, we can use the `-s` (silent) option:

In [None]:
!curl -s -L 'http://pages.stern.nyu.edu/~nwhite/DealingwithDataSpring2021/sample.txt' -o sample.txt
!ls -al  

total 20
drwxr-xr-x 1 root root 4096 Apr  4 14:48 .
drwxr-xr-x 1 root root 4096 Apr  4 14:44 ..
drwxr-xr-x 4 root root 4096 Mar 25 13:38 .config
drwxr-xr-x 1 root root 4096 Mar 25 13:38 sample_data
-rw-r--r-- 1 root root  201 Apr  4 14:48 sample.txt


Now, let's use curl to access some real data. A key component of today's data ecosystem is the existence of Web APIs, which expose data and functionality for a variety of tasks over HTTP.

#### Where am I?

For example, let's try to figure out programmatically the location of a computer from its IP address. We can call the API by issuing the following command (128.122.85.5 is the IP address of a Stern server):



In [None]:
!curl -s "http://api.ipstack.com/128.122.85.5?access_key=c2192e9aa79a13153a328f383b810862"|jq

{
  "ip": "128.122.85.5",
  "type": "ipv4",
  "continent_code": "NA",
  "continent_name": "North America",
  "country_code": "US",
  "country_name": "United States",
  "region_code": "NY",
  "region_name": "New York",
  "city": "Manhattan",
  "zip": "10003",
  "latitude": 40.73139190673828,
  "longitude": -73.9884033203125,
  "location": {
    "geoname_id": 5125771,
    "cap

While this may not look nice to a human, for a computer it is a perfectly legitimate answer. This format is called "JSON", and it is an efficient and very commonly used way to transfer data on the Internet today.
Piping the response through `jq` controls the presentation: it parses the JSON and prints it back with indentation (and, in a terminal, colors), which makes it easier to read.
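To see why JSON is convenient for a program, here is a minimal Python sketch that parses a shortened copy of the response above with the standard `json` module. Only a few of the fields are included, and the text is pasted inline rather than fetched from the API.

```python
import json

# A shortened copy of the response shown above (only a few of its fields).
raw = """{
  "ip": "128.122.85.5",
  "country_code": "US",
  "region_name": "New York",
  "city": "Manhattan",
  "zip": "10003",
  "latitude": 40.73139190673828,
  "longitude": -73.9884033203125,
  "location": {"geoname_id": 5125771}
}"""

# JSON objects become dicts, strings become str, numbers become int or float.
record = json.loads(raw)

print(record["city"], record["region_name"], record["zip"])
print(record["latitude"], record["longitude"])
print(record["location"]["geoname_id"])  # nested objects become nested dicts
```

Once parsed, each field is available by name, with no need to search through the text by hand.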

Now, let's examine a few more web APIs, just for fun:

#### What's the weather?

Now, let's use the OpenWeather API to get the weather details in our location. (The details of the API calls are available at http://openweathermap.org/api.)

In [None]:
!curl -s "http://api.openweathermap.org/data/2.5/weather?\
&appid=ffb7b9808e07c9135bdcc7d1e867253d\
&q=New%20York,NY,USA\
&units=imperial\
&mode=json"|jq .

{
  "coord": {
    "lon": -73.9866,
    "lat": 40.7306
  },
  "weather": [
    {
      "id": 804,
      "main": "Clouds",
      "description": "overcast clouds",
      "icon": "04d"
    }
  ],
  "base": "stations",
  "main": {
    "temp": 49.91,
    "feels_like": 48.36,
    "temp_min": 48,
    "temp_max": 52,
    "pressure": 

You will notice that we asked the service to return the data in JSON format (`mode=json`). For this API, we can also ask for the data to be returned in a different format, called XML, which is wordier. (We will get back to these formats later in the semester.)
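Switching formats only changes the `mode` query parameter; the rest of the URL stays the same. As a rough Python sketch of how such a URL is assembled from its parameters (the helper name `weather_url` and the placeholder key `YOUR_API_KEY` are assumptions made for this illustration):

```python
from urllib.parse import quote, urlencode

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def weather_url(city, mode, api_key):
    """Return the request URL for one city in the given output mode."""
    params = {"q": city, "units": "imperial", "mode": mode, "appid": api_key}
    # quote (instead of the default quote_plus) encodes spaces as %20;
    # safe="," leaves the commas unescaped, as in the curl commands in this section.
    return BASE_URL + "?" + urlencode(params, safe=",", quote_via=quote)


# "YOUR_API_KEY" is a placeholder, not a working key.
json_url = weather_url("New York,NY,USA", "json", "YOUR_API_KEY")
xml_url = weather_url("New York,NY,USA", "xml", "YOUR_API_KEY")

print(json_url)
print(xml_url)
```

The two URLs differ only in the value of `mode`.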

In [None]:
!curl -s "http://api.openweathermap.org/data/2.5/weather?\
&q=New%20York,NY,USA\
&units=imperial\
&mode=xml\
&appid=ffb7b9808e07c9135bdcc7d1e867253d"

<?xml version="1.0" encoding="UTF-8"?>
<current><city id="5128581" name="New York"><coord lon="-73.9866" lat="40.7306"></coord><country>US</country><timezone>-14400</timezone><sun rise="2021-04-04T10:33:59" set="2021-04-04T23:23:19"></sun></city><temperature value="49.91" min="48" max="52" unit="fahrenheit"></temperature><feels_like value="48.36" unit="fahrenheit"></feels_like><humidity value="46" unit="%"></humidity><pressure value="1020" unit="hPa"></pressure><wind><speed value="4.61" unit="mph" name="Light breeze"></speed><gusts></gusts><direction value="270" code="W" name="West"></direction></wind><clouds value="90" name="overcast clouds"></clouds><visibility value="10000"></visibility><precipitation mode="no"></precipitation><weather number="804" value="overcast clouds" icon="04d"></weather><lastupdate value="2021-04-04T14:49:28"></lastupdate></current>
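XML can be read programmatically too. The sketch below parses a shortened copy of the response above with Python's standard `xml.etree.ElementTree` module. Only a few elements are kept, and the text is pasted inline rather than fetched from the API.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the XML response shown above.
raw = (
    "<current>"
    '<city id="5128581" name="New York"><country>US</country></city>'
    '<temperature value="49.91" min="48" max="52" unit="fahrenheit"></temperature>'
    '<humidity value="46" unit="%"></humidity>'
    '<weather number="804" value="overcast clouds" icon="04d"></weather>'
    "</current>"
)

root = ET.fromstring(raw)

# In this response, most values are stored as attributes, not as element text.
city = root.find("city").get("name")
temperature = float(root.find("temperature").get("value"))
unit = root.find("temperature").get("unit")
conditions = root.find("weather").get("value")

print(f"{city}: {temperature} {unit}, {conditions}")
```

Unlike JSON, XML attribute values always arrive as strings, so numbers such as the temperature have to be converted explicitly.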

#### What's the sentiment?

Now let's use a web service to automatically analyze the sentiment of a piece of text. (The calls below go to IBM Watson's Natural Language Understanding service; an earlier version of this example used [IBM's Alchemy API](http://www.alchemyapi.com/api/sentiment/textc.html#textsentiment).)

Note that you can register for a free account at IBM Cloud Services and try out many of their machine learning and data science APIs:
https://www.ibm.com/cloud/

In [None]:
# Create the parameter file for the call to Watson's Natural Language Understanding service
!echo "{" >parameters.json
!echo  "   \"text\":\"I think that IBM watson is a wonderful service.\"," >>parameters.json
!echo "    \"features\":{" >>parameters.json
!echo "      \"entities\": {" >>parameters.json
!echo "        \"emotion\": true," >>parameters.json
!echo "        \"sentiment\": true," >> parameters.json
!echo "        \"limit\":2" >>parameters.json
!echo "      }," >>parameters.json
!echo "     \"keywords\": {" >>parameters.json
!echo "         \"emotion\":true," >>parameters.json
!echo "         \"sentiment\":true," >>parameters.json
!echo "         \"limit\": 2" >> parameters.json
!echo "    }" >> parameters.json
!echo "   }"  >> parameters.json
!echo "}" >> parameters.json
!cat parameters.json

{
   "text":"I think that IBM watson is a wonderful service.",
    "features":{
      "entities": {
        "emotion": true,
        "sentiment": true,
        "limit":2
      },
     "keywords": {
         "emotion":true,
         "sentiment":true,
         "limit": 2
    }
   }
}
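Building JSON line by line with `echo` works, but it is easy to misplace a quote or a comma. As an alternative sketch, the same request body can be written from Python with the standard `json` module, which takes care of the quoting and escaping. The dictionary below mirrors the file shown above.

```python
import json

# The same request body as above, written as a Python dict.
parameters = {
    "text": "I think that IBM watson is a wonderful service.",
    "features": {
        "entities": {"emotion": True, "sentiment": True, "limit": 2},
        "keywords": {"emotion": True, "sentiment": True, "limit": 2},
    },
}

# json.dump writes valid JSON; the file name matches the one used above.
with open("parameters.json", "w") as f:
    json.dump(parameters, f, indent=2)

# Reading the file back confirms that it round-trips to the same structure.
with open("parameters.json") as f:
    print(json.load(f) == parameters)
```

The resulting `parameters.json` can be passed to curl with `-d @parameters.json` exactly as before.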


In [None]:
#!curl -s "https://gateway-a.watsonplatform.net/calls/text/TextGetTextSentiment" \
#-d "outputMode=json" \
#-d "apikey=4b46c7859a7be311b6f9389b12504e302cac0a55" \
#-d "text=I did not dislike it. " 
!curl -X POST -H "Content-Type: application/json" \
-u "apikey":"JjBk20E2nz3jxv9tUkasj1CIcyPOwhcb1uhMhkXdTWEn" \
    -d @parameters.json  \
    "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze?version=2018-11-16"


!curl -G -u  "apikey":"JjBk20E2nz3jxv9tUkasj1CIcyPOwhcb1uhMhkXdTWEn" -d "version=2018-11-16" -d "url=pages.stern.nyu.edu/~nwhite" -d "features=keywords,entities" -d "entities.emotion=true" -d "entities.sentiment=true" -d "keywords.emotion=true" -d "keywords.sentiment=true" "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze"      




{
  "usage": {
    "text_units": 1,
    "text_characters": 47,
    "features": 2
  },
  "language": "en",
  "keywords": [
    {
      "text": "IBM watson",
      "sentiment": {
        "score": 0.983705,
        "label": "positive"
      },
      "relevance": 0.844343,
      "emotion": {
        "sadness": 0.006155,
        "joy": 0.113473,
        "fear": 0.002814,
        "disgust": 0.008891,
        "anger": 0.022145
      },
      "count": 1
    },
    {
      "text": "wonderful service",
      "sentiment": {
        "score": 0.983705,
        "label": "positive"
      },
      "relevance": 0.155657,
      "emotion": {
        "sadness": 0.006155,
        "joy": 0.113473,
        "fear": 0.002814,
        "disgust": 0.008891,
        "anger": 0.022145
      },
      "count": 1
    }
  ],
  "entities": [
    {
      "type": "Company",
      "text": "IBM",
      "sentiment": {
        "score": 0.728278,
        "label": "positive"
      },
      "relevance": 0.33,
      "emotion": {


## Exercise

The following websites contain listings of many useful APIs:

* https://github.com/public-apis/public-apis
* https://www.programmableweb.com/category/all/apis
* http://www.mashery.com/
* http://apigee.com/ 


#### Your task: Search through public APIs and find a web API that does something you like. Use curl to issue a call to this service. Note that you may need to create an account to get an API key. Here is an example:

In [5]:
!curl  -s 'https://quote-garden.herokuapp.com/api/v3/quotes'|jq


{
  "statusCode": 200,
  "message": "Quotes",
  "pagination": {
    "currentPage": 1,
    "nextPage": 2,
    "totalPages": 7268
  },
  "totalQuotes": 72672,
  "data": [
    {
      "_id": "5eb17aadb69dc744b4e70d23",
      "quoteText": "Age is an issue of mind over matter. If you don't mind, it doesn't matter.",
      "quoteAuthor": "Mark Twain",
      "quoteGenre": "age",
      "__v": 0
    },
   