<a href="https://colab.research.google.com/github/nhwhite212/DealingwithDataSpring2021/blob/master/4-UNIX_Basics/B-Fetching_Data_Using_CURL.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Getting Data using CURL
-----------------------

We now move on to a more interesting topic: how to get data from Internet sources. For that, we will use a Unix command-line tool called `curl`. (Later in class, we will learn how to achieve the same using Python, but for quick testing, curl is often the standard tool.) We will also use a tool called `jq` to work with JSON output. (Do not worry, we will revisit both of these later in class.)

_Often, curl and jq do not come preinstalled, so the first time we use them we need to install them. To do so, uncomment and run the following commands:_

In [None]:
#!sudo apt-get -y install curl
#!sudo apt-get -y install jq

Let's start by retrieving a simple text file, which we will use later in the class to illustrate how different shell commands work. The sample data file is hosted online, and you can use a terminal command to copy this remote file. Simply type:

In [None]:
!curl -L 'http://pages.stern.nyu.edu/~nwhite/DealingwithDataFall2018/sample.txt'

The columns in this tab-separated data correspond to [order id] [time of order] [user id] [ordered item], similar to what might be encountered in practice. If you wish, you can copy-paste the data printed above into a text editor, making sure there is a newline following each of the ordered item columns (the columns with alphabetic characters).
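As a minimal sketch of how such four-column, tab-separated lines could be read in Python, the example below parses two made-up rows. The row values and the column names in `COLUMNS` are assumptions for illustration only; they are not the contents of the real `sample.txt`.

```python
import csv
import io

# Made-up rows in the same four-column, tab-separated layout
# (order id, time of order, user id, ordered item).
# These are NOT the real contents of sample.txt.
sample_text = (
    "1\t2014-01-01 10:00:00\t101\tapple\n"
    "2\t2014-01-01 10:05:00\t102\tbanana\n"
)

# Hypothetical column names, chosen to match the description above.
COLUMNS = ["order_id", "order_time", "user_id", "item"]


def parse_orders(text):
    """Return one dict per non-empty tab-separated line, keyed by COLUMNS."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(COLUMNS, row)) for row in reader if row]


orders = parse_orders(sample_text)
for order in orders:
    print(order["order_id"], order["item"])
```

All values stay as strings here; converting the ids to integers or the timestamps to dates would be a separate step.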

To store the output in a file, we add the `-o [output file]` option to the command. (In the next session we will also see how to use _output redirection_ to store the output in a file.)

In [None]:
!curl -L 'http://pages.stern.nyu.edu/~nwhite/DealingwithDataFall2018/sample.txt' -o data/sample.txt

In [None]:
!ls -al data/

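This downloads the file into the `data` directory, creating a new file called `sample.txt`. (The `data` directory must already exist; alternatively, curl's `--create-dirs` option creates it when used together with `-o`.) If we do not want to see any statistics about the download, we can use the `-s` option: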

In [None]:
!curl  -s -L 'http://pages.stern.nyu.edu/~nwhite/DealingwithDataFall2018/sample.txt' -o data/sample.txt
!ls -al  data/

Now, let's use curl to access some real data. A key component of today's data ecosystem is the existence of Web APIs, which provide functionality for a variety of tasks.

#### Where am I?

For example, let's try to figure out programmatically the location of the computer where the Jupyter server is running. We can call the API by issuing the following command (128.122.85.5 is the IP address of a Stern server):



In [None]:
!curl -s "http://api.ipstack.com/128.122.85.5?access_key=c2192e9aa79a13153a328f383b810862"|jq

While raw JSON does not look nice to a human, for a computer it is a perfectly legitimate answer. This format is called "JSON", and it is an efficient and very commonly used way to transfer data on the Internet today.
The trailing `| jq` in the command pipes curl's output through `jq`, which formats the JSON for display.
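To make the "legitimate answer for a computer" point concrete, here is a small sketch of how a program might consume a JSON response using Python's standard `json` module. The response body in `raw` is made up for illustration; the field names are assumptions and are not taken from the actual API output.

```python
import json

# A made-up response body, standing in for what a geolocation API might return.
# The field names and values here are illustrative assumptions.
raw = (
    '{"ip": "203.0.113.7", "city": "Example City", '
    '"location": {"languages": [{"code": "en"}]}}'
)

record = json.loads(raw)  # JSON text -> nested Python dicts and lists

# Once parsed, individual fields are reached with ordinary indexing.
print(record["city"])
print(record["location"]["languages"][0]["code"])

# Re-serialize with indentation, similar in spirit to what jq displays.
pretty = json.dumps(record, indent=2)
print(pretty)
```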

Now, let's examine a few more web APIs, just for fun:

#### What's the weather?

Now, let's use the OpenWeather API to get the weather details in our location. (The details of the API calls are available at http://openweathermap.org/api.)

In [None]:
!curl -s "http://api.openweathermap.org/data/2.5/weather?\
appid=ffb7b9808e07c9135bdcc7d1e867253d\
&q=New%20York,NY,USA\
&units=imperial\
&mode=json"|jq .

You will notice that we asked the service to return the data in JSON format. For this API, we can also ask for the data to be returned in a different format, called XML, which is more verbose. (We will get back to these formats later in the semester.)

In [None]:
!curl -s "http://api.openweathermap.org/data/2.5/weather?\
q=New%20York,NY,USA\
&units=imperial\
&mode=xml\
&appid=ffb7b9808e07c9135bdcc7d1e867253d"
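The query strings in the two weather calls above are just `name=value` pairs joined by `&`, with special characters such as spaces percent-encoded. As a rough sketch, the same kind of URL could be assembled in Python with the standard library. The helper name `build_weather_url` and the `YOUR_API_KEY` placeholder are assumptions for illustration; no request is actually sent.

```python
from urllib.parse import urlencode, quote, urlsplit, parse_qs

BASE_URL = "http://api.openweathermap.org/data/2.5/weather"


def build_weather_url(city, api_key, units="imperial", mode="json"):
    """Hypothetical helper: assemble the weather URL from its parameters."""
    params = {"q": city, "units": units, "mode": mode, "appid": api_key}
    # quote_via=quote encodes a space as %20 (the default would use '+').
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


# "YOUR_API_KEY" is a placeholder, not a working key.
url = build_weather_url("New York,NY,USA", "YOUR_API_KEY")
print(url)

# Decoding the query string recovers the original parameter values.
decoded = parse_qs(urlsplit(url).query)
print(decoded)
```

Passing `mode="xml"` to the helper would produce the URL for the XML variant of the call.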

#### What's the sentiment?

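Now let's use a web service to automatically analyze the sentiment of a piece of text. (The service comes from IBM: the active calls below use Watson's Natural Language Understanding endpoint, and the commented-out call targets the older [Alchemy API](http://www.alchemyapi.com/api/sentiment/textc.html#textsentiment).)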

Note that you can register for a free account at IBM Cloud Services and try out many of their machine learning and data science APIs: https://www.ibm.com/cloud/

In [None]:
# Create the parameter file for the call to Watson's Natural Language Understanding service
!echo "{" >parameters.json
!echo  "   \"text\":\"I think that IBM watson is a wonderful service.\"," >>parameters.json
!echo "    \"features\":{" >>parameters.json
!echo "      \"entities\": {" >>parameters.json
!echo "        \"emotion\": true," >>parameters.json
!echo "        \"sentiment\": true," >> parameters.json
!echo "        \"limit\":2" >>parameters.json
!echo "      }," >>parameters.json
!echo "     \"keywords\": {" >>parameters.json
!echo "         \"emotion\":true," >>parameters.json
!echo "         \"sentiment\":true," >>parameters.json
!echo "         \"limit\": 2" >> parameters.json
!echo "    }" >> parameters.json
!echo "   }"  >> parameters.json
!echo "}" >> parameters.json
!cat parameters.json

{
   "text":"I think that IBM watson is a wonderful service.",
    "features":{
      "entities": {
        "emotion": true,
        "sentiment": true,
        "limit":2
      },
     "keywords": {
         "emotion":true,
         "sentiment":true,
         "limit": 2
    }
   }
}
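Building the file line by line with `echo` works, but the quoting is easy to get wrong. As an alternative sketch, the same structure can be written from a Python dictionary with the `json` module. The helper name `write_parameters` is an assumption for illustration, and the file is written to a temporary directory here only to keep the sketch free of side effects; in the notebook, the path would simply be `parameters.json`.

```python
import json
import tempfile
from pathlib import Path

# Same structure as the parameters.json shown above.
parameters = {
    "text": "I think that IBM watson is a wonderful service.",
    "features": {
        "entities": {"emotion": True, "sentiment": True, "limit": 2},
        "keywords": {"emotion": True, "sentiment": True, "limit": 2},
    },
}


def write_parameters(path, params):
    """Hypothetical helper: serialize params as indented JSON at path."""
    with open(path, "w") as f:
        json.dump(params, f, indent=2)


# Temporary location for this sketch; use Path("parameters.json") in the notebook.
out_path = Path(tempfile.mkdtemp()) / "parameters.json"
write_parameters(out_path, parameters)

# Read the file back to confirm that it is valid JSON.
with open(out_path) as f:
    reloaded = json.load(f)
print(json.dumps(reloaded, indent=2))
```

Python's `True` is written out as JSON `true`, so the resulting file has the same content as the one produced by the `echo` commands.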


In [None]:
#!curl -s "https://gateway-a.watsonplatform.net/calls/text/TextGetTextSentiment" \
#-d "outputMode=json" \
#-d "apikey=4b46c7859a7be311b6f9389b12504e302cac0a55" \
#-d "text=I did not dislike it. " 
!curl -X POST -H "content-Type: application/json"   \
-u "apikey":"JjBk20E2nz3jxv9tUkasj1CIcyPOwhcb1uhMhkXdTWEn" \
    -d @parameters.json  \
    "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze?version=2018-11-16"


!curl -G -u "apikey":"JjBk20E2nz3jxv9tUkasj1CIcyPOwhcb1uhMhkXdTWEn" \
    -d "version=2018-11-16" \
    -d "url=pages.stern.nyu.edu/~nwhite" \
    -d "features=keywords,entities" \
    -d "entities.emotion=true" \
    -d "entities.sentiment=true" \
    -d "keywords.emotion=true" \
    -d "keywords.sentiment=true" \
    "https://gateway.watsonplatform.net/natural-language-understanding/api/v1/analyze"




{
  "usage": {
    "text_units": 1,
    "text_characters": 47,
    "features": 2
  },
  "language": "en",
  "keywords": [
    {
      "text": "IBM watson",
      "sentiment": {
        "score": 0.983705,
        "label": "positive"
      },
      "relevance": 0.844343,
      "emotion": {
        "sadness": 0.006155,
        "joy": 0.113473,
        "fear": 0.002814,
        "disgust": 0.008891,
        "anger": 0.022145
      },
      "count": 1
    },
    {
      "text": "wonderful service",
      "sentiment": {
        "score": 0.983705,
        "label": "positive"
      },
      "relevance": 0.155657,
      "emotion": {
        "sadness": 0.006155,
        "joy": 0.113473,
        "fear": 0.002814,
        "disgust": 0.008891,
        "anger": 0.022145
      },
      "count": 1
    }
  ],
  "entities": [
    {
      "type": "Company",
      "text": "IBM",
      "sentiment": {
        "score": 0.728278,
        "label": "positive"
      },
      "relevance": 0.33,
      "emotion": {


## Exercise

The following websites contain listings of many useful APIs:

* https://market.mashape.com/explore
* https://www.programmableweb.com/category/all/apis
* http://www.mashery.com/
* http://apigee.com/ 

Mashape is my personal favorite in terms of user-friendliness, and it also provides examples expressed directly as curl commands, but the others are pretty nice as well.

#### Your task

Search through Mashape and find a web API that does something that you like. Use curl to issue a web API call to this service. Note that you will need to create a Mashape account to get a key to use with your application.

In [None]:
!curl -X POST --include 'https://andruxnet-random-famous-quotes.p.mashape.com/?cat=movies' \
-H 'X-Mashape-Key: OzEG3Zp1kqmshUxcOFT9hhn1LFbmp19ubr2jsnLlnpE40EIlCp' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-H 'Accept: application/json'
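Each `-H` option in the curl call above adds one HTTP header to the request. As a loose sketch of the same idea in Python, the example below builds (but does not send) a POST request with custom headers using the standard library. The URL, the header name `X-Api-Key`, and the key value are placeholders assumed for illustration, not a real service.

```python
from urllib.request import Request

# Hypothetical endpoint and key; replace with the values for the API you pick.
req = Request(
    "https://api.example.com/quotes?cat=movies",
    data=b"",  # empty request body
    headers={
        "X-Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    },
    method="POST",  # plays the role of curl's -X POST
)

# Inspect what would be sent. Header names are case-insensitive in HTTP,
# so they are lowercased here for easy lookup.
sent_headers = {name.lower(): value for name, value in req.header_items()}
print(req.get_method(), req.full_url)
print(sent_headers)
```

Actually sending the request would take a call such as `urllib.request.urlopen(req)`, which is left out here because the endpoint is only a placeholder.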
