# Requesting Data from the Internet

These notes draw on [several excellent Coursera videos](https://www.coursera.org/learn/data-collection-processing-python/home/week/3) about the internet, getting data, and REST APIs.

## REST API

### REST

* doesn't mean it rests
* REST is just an acronym: __REpresentational State Transfer__
* REST is a style for clients and servers to pass info back and forth over HTTP without having to render anything in a browser

### API

* Application Programming Interface
* __application__ refers to __software__ that performs a particular function
* __interface__ refers to a contract of service between two applications that defines how they communicate with each other using requests and responses
* when two software applications want to talk to each other, each must follow the rules the other has set for how it can be talked to

## Requesting in Python

The __requests module__ is required.  It must be __installed__ using pip (`pip install --upgrade requests`)

Responses don't have to be JSON, but that is a common format. __If JSON__ is the format, the response's `.json()` method will parse it for you. You only need to __`import json`__ if you also want `json.loads()` or `json.dumps()`.

### `requests.get()`

`requests.get()` returns an __instance__ of a class called __`Response`__ defined in the `requests` module.

Use `requests.get(<URL_TO_GET_THE_RESPONSE_FROM>)` to receive data from the URL.

(You can test what you will get by pasting the url into a browser first.)

#### passing parameters during get()

There are two ways of doing this:
1. pass them within the url by adding `?foo=bar&greeting=hi+there` to the end of the url

    e.g. `requests.get('https://foobar.com?foo=bar&greeting=hi+there')`
    

2. pass a __dictionary__ of parameters as the __`params`__ keyword argument: `.get(<URL>, params=<PARAMETER DICTIONARY>)`.

    e.g. `requests.get('https://foobar.com', params={'foo':'bar', 'greeting': 'hi there'})`
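
Both forms should end up as the same query string. The sketch below uses only the standard library's `urllib.parse.urlencode` to show how a parameter dictionary turns into the `?foo=bar&greeting=hi+there` form. It builds the url by hand rather than calling `requests`, and `build_url` is a hypothetical helper name used just for illustration.

```python
from urllib.parse import urlencode

def build_url(base_url, params):
    """Append a URL-encoded query string to base_url (hypothetical helper)."""
    # urlencode escapes special characters; a space becomes '+'
    return base_url + '?' + urlencode(params)

url = build_url('https://foobar.com', {'foo': 'bar', 'greeting': 'hi there'})
print(url)
```

Passing `params=` to `requests.get()` does this encoding for you, which is why the dictionary form is usually easier to read and maintain.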

#### Response attributes and methods

Some of the most commonly used are:
* the attribute __`.text`__ gets the text that was returned with the response
* the attribute __`.url`__ gets the url that was used to get the response
* the method __`.json()`__ parses the `.text` as JSON, similar to `json.loads()`, and __returns__ the result as a Python list or dictionary

Other attributes include:
* `.status_code`
* `.headers`
* `.history`

#### response codes and their meanings

Use `.status_code` to view the status code the server returned with the response.
Here are some of the most common ones:
* `200` responded successfully (returned information)
* `404` not found (server didn't recognize the path you asked for)
* `401` not authorized to access content
* `301` content has moved permanently to a different url
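
The standard library already knows the official phrase for each code. As a small sketch, the hypothetical helper `describe_status` below looks a code up with `http.HTTPStatus`, so no network call is needed to try it.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short label such as '404 Not Found' for an HTTP status code."""
    # HTTPStatus raises ValueError for codes the standard library doesn't know
    status = HTTPStatus(code)
    return f'{status.value} {status.phrase}'

for code in (200, 301, 401, 404):
    print(describe_status(code))
```

With a real `Response` object, the same helper could be called as `describe_status(page.status_code)`.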

In [2]:
import requests
import json

In [23]:
page = requests.get('https://api.datamuse.com/words?rel_rhy=funny')
page

<Response [200]>

In [5]:
print(type(page))

<class 'requests.models.Response'>


In [21]:
# the first 155 characters of the response text
print(page.text[:155])

[{"word":"money","score":4415,"numSyllables":2},{"word":"honey","score":1206,"numSyllables":2},{"word":"sunny","score":717,"numSyllables":2},{"word":"bunny


In [22]:
# the url I used to get the response
print(page.url)

https://api.datamuse.com/words?rel_rhy=funny


In [11]:
# TWO ways to get the json
# using json.loads()
json.loads(page.text)

[{'word': 'money', 'score': 4415, 'numSyllables': 2},
 {'word': 'honey', 'score': 1206, 'numSyllables': 2},
 {'word': 'sunny', 'score': 717, 'numSyllables': 2},
 {'word': 'bunny', 'score': 702, 'numSyllables': 2},
 {'word': 'blini', 'score': 613, 'numSyllables': 2},
 {'word': 'gunny', 'score': 449, 'numSyllables': 2},
 {'word': 'tunny', 'score': 301, 'numSyllables': 2},
 {'word': 'sonny', 'score': 286, 'numSyllables': 2},
 {'word': 'dunny', 'score': 245, 'numSyllables': 2},
 {'word': 'runny', 'score': 225, 'numSyllables': 2},
 {'word': 'thunny', 'score': 222, 'numSyllables': 2},
 {'word': 'aknee', 'score': 179, 'numSyllables': 2},
 {'word': 'squinny', 'score': 170, 'numSyllables': 2},
 {'word': 'fiat money', 'score': 160, 'numSyllables': 4},
 {'word': 'gunnie', 'score': 156, 'numSyllables': 2},
 {'word': 'blood money', 'score': 152, 'numSyllables': 3},
 {'word': 'squiny', 'score': 151, 'numSyllables': 2},
 {'word': 'tunney', 'score': 119, 'numSyllables': 2},
 {'word': 'spinny', 'score'

In [12]:
# using the json function from the response
# (the requests module has this shortcut)
page.json()

[{'word': 'money', 'score': 4415, 'numSyllables': 2},
 {'word': 'honey', 'score': 1206, 'numSyllables': 2},
 {'word': 'sunny', 'score': 717, 'numSyllables': 2},
 {'word': 'bunny', 'score': 702, 'numSyllables': 2},
 {'word': 'blini', 'score': 613, 'numSyllables': 2},
 {'word': 'gunny', 'score': 449, 'numSyllables': 2},
 {'word': 'tunny', 'score': 301, 'numSyllables': 2},
 {'word': 'sonny', 'score': 286, 'numSyllables': 2},
 {'word': 'dunny', 'score': 245, 'numSyllables': 2},
 {'word': 'runny', 'score': 225, 'numSyllables': 2},
 {'word': 'thunny', 'score': 222, 'numSyllables': 2},
 {'word': 'aknee', 'score': 179, 'numSyllables': 2},
 {'word': 'squinny', 'score': 170, 'numSyllables': 2},
 {'word': 'fiat money', 'score': 160, 'numSyllables': 4},
 {'word': 'gunnie', 'score': 156, 'numSyllables': 2},
 {'word': 'blood money', 'score': 152, 'numSyllables': 3},
 {'word': 'squiny', 'score': 151, 'numSyllables': 2},
 {'word': 'tunney', 'score': 119, 'numSyllables': 2},
 {'word': 'spinny', 'score'

In [13]:
x = page.json()

In [14]:
print(type(x))

<class 'list'>


In [17]:
print(x[0])

{'word': 'money', 'score': 4415, 'numSyllables': 2}


In [19]:
# pretty print
print(json.dumps(x, indent = 2))

[
  {
    "word": "money",
    "score": 4415,
    "numSyllables": 2
  },
  {
    "word": "honey",
    "score": 1206,
    "numSyllables": 2
  },
  {
    "word": "sunny",
    "score": 717,
    "numSyllables": 2
  },
  {
    "word": "bunny",
    "score": 702,
    "numSyllables": 2
  },
  {
    "word": "blini",
    "score": 613,
    "numSyllables": 2
  },
  {
    "word": "gunny",
    "score": 449,
    "numSyllables": 2
  },
  {
    "word": "tunny",
    "score": 301,
    "numSyllables": 2
  },
  {
    "word": "sonny",
    "score": 286,
    "numSyllables": 2
  },
  {
    "word": "dunny",
    "score": 245,
    "numSyllables": 2
  },
  {
    "word": "runny",
    "score": 225,
    "numSyllables": 2
  },
  {
    "word": "thunny",
    "score": 222,
    "numSyllables": 2
  },
  {
    "word": "aknee",
    "score": 179,
    "numSyllables": 2
  },
  {
    "word": "squinny",
    "score": 170,
    "numSyllables": 2
  },
  {
    "word": "fiat money",
    "score": 160,
    "numSyllables": 4
  },
  {
    

In [28]:
# passing query parameters via dictionary rather than in the url
query_params = {'rel_rhy':'funny'}
page2 = requests.get('https://api.datamuse.com/words', params=query_params)
page2.text[:155]

'[{"word":"money","score":4415,"numSyllables":2},{"word":"honey","score":1206,"numSyllables":2},{"word":"sunny","score":717,"numSyllables":2},{"word":"bunny'

In [29]:
page2.status_code

200
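
Once parsed, the result is an ordinary list of dictionaries, so the usual list tools apply. A small sketch, using the first four entries from the response shown above (copied in by hand so it runs without a network call). The cutoff of 1000 is an arbitrary value chosen for illustration.

```python
# First four entries from the response shown above, copied by hand
results = [
    {'word': 'money', 'score': 4415, 'numSyllables': 2},
    {'word': 'honey', 'score': 1206, 'numSyllables': 2},
    {'word': 'sunny', 'score': 717, 'numSyllables': 2},
    {'word': 'bunny', 'score': 702, 'numSyllables': 2},
]

# Pull out just the words
words = [entry['word'] for entry in results]

# Keep only the words whose score is above the (arbitrary) cutoff
high_scoring = [entry['word'] for entry in results if entry['score'] > 1000]

print(words)
print(high_scoring)
```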

## Using API Documentation

An API's documentation tells you which parameters to pass, and how to pass them, to get the results you want.

### Example: DataMuse

Here is what we learn from the [DataMuse API documentation](https://www.datamuse.com/api/)

#### what is datamuse:

* The Datamuse API is a __word-finding query engine__
* You use it in your apps to find __words that match__ a given set of __constraints__ and that are likely in a given __context__
* You can specify a wide variety of constraints on __meaning__, __spelling__, __sound__, and __vocabulary__ in your queries, in any __combination__.

#### what url to use:

`https://api.datamuse.com`

#### `/words` endpoint:

* `/words` is the endpoint that returns a list of words (and multiword expressions)
* `rd`, `sl`, `sp`, `rel_[code]`, and `v` can be thought of as hard constraints on the result set.
* `topics`, `lc`, and `rc` can be thought of as context hints. They impact the order in which results are returned.
* All parameters are optional.

##### `/words` query parameters

Too many to list here.  Read that section in the DataMuse API documentation for specifics

#### `/sug` endpoint:

* `/sug` produces JSON output similar to the `/words` resource and is suitable for widgets
* useful as a backend for “autocomplete” widgets on websites and apps when the vocabulary of possible search terms is very large.
* It provides word suggestions given a partially-entered query using a combination of the operations described in the “/words” resource above.
* The suggestions perform live spelling correction and intelligently fall back to choices that are phonetically or semantically similar when an exact prefix match can't be found.

##### `/sug` query parameters

`s`, `max`, `v`

#### interpreting the results:

(see the documentation)

#### usage limits:

(see the documentation)

In [1]:
import requests
import json

In [22]:
def get_rhymes(word, max_res, details=True): # details=True will print out other info
    base_url = "https://api.datamuse.com/words" # we will use the /words endpoint
    params_d = {} # set up empty parameters dictionary
    # add parameters:
    params_d['rel_rhy'] = word # rel_[code] where [code] is rhy (related word rhymes)
    params_d['max'] = max_res # maximum number of results to return
    # get a response object with results in JSON
    resp = requests.get(base_url, params=params_d)
    if details:
        print('The response object:',resp)
        print('')
    # parse the JSON body of the response into a Python list of dictionaries
    words_json = resp.json()
    if details:
        print('The JSON results are:')
        print(json.dumps(words_json, indent=2))
    print('')
    print(len(words_json), 'words that rhyme with', word, 'are:')
    # loop through the JSON list and get the actual words returned
    for wd in words_json:
        print('*', wd['word'])

In [23]:
get_rhymes('flow', '3')

The response object: <Response [200]>

The JSON results are:
[
  {
    "word": "go",
    "score": 7263,
    "numSyllables": 1
  },
  {
    "word": "blow",
    "score": 4758,
    "numSyllables": 1
  },
  {
    "word": "show",
    "score": 4706,
    "numSyllables": 1
  }
]

3 words that rhyme with flow are:
* go
* blow
* show


In [24]:
get_rhymes('pillow', 4, False)


4 words that rhyme with pillow are:
* billow
* armadillo
* amarillo
* castillo


## `requests_with_caching` custom module

**IMPORTANT:** `requests_with_caching` can only be used in Runestone.  Using it in a local Python environment requires a large rewrite.

I explained why in my [Stack Overflow answer](https://stackoverflow.com/questions/70291508/python-attributeerror-module-requests-has-no-attribute-requesturl/74295872#74295872).

The instructors of the Coursera Course ["Data Collection and Processing"](https://www.coursera.org/learn/data-collection-processing-python/home/welcome) wrote a helper module called `requests_with_caching`.

To avoid re-requesting the same data, we will use a programming pattern known as caching. It works like this:

1. Before doing some expensive operation (like calling requests.get to get data from a REST API), check whether you have already saved (“cached”) the results that would be generated by making that request.
2. If so, return that same data.
3. If not, perform the expensive operation and save (“cache”) the results (e.g. the complicated data) in your cache so you won’t have to perform it again the next time.
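
The three steps above can be sketched in a few lines. This is not the course's module: it keeps the cache in an in-memory dictionary, and `make_cached_fetcher` and `fake_fetch` are hypothetical names, with `fake_fetch` standing in for a real network call so the example runs offline.

```python
def make_cached_fetcher(fetch):
    """Wrap fetch(url) so repeated calls for the same url reuse the saved result."""
    cache = {}  # in-memory cache: url -> previously fetched result

    def cached_fetch(url):
        if url in cache:          # steps 1 and 2: already cached, so return it
            return cache[url]
        result = fetch(url)       # step 3: perform the expensive operation...
        cache[url] = result       # ...and save the result for next time
        return result

    return cached_fetch

# Stand-in for a slow network call; it records every url it is asked for
calls = []
def fake_fetch(url):
    calls.append(url)
    return 'data for ' + url

get = make_cached_fetcher(fake_fetch)
first = get('https://api.datamuse.com/words?rel_rhy=funny')
second = get('https://api.datamuse.com/words?rel_rhy=funny')
print(first == second, len(calls))
```

The second call returns the saved result, so `fake_fetch` only runs once.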

### The module is simplistic
The module `requests_with_caching` is very limited. 

* this code is optimized for conceptual simplicity
* it only saves the cached info to a file
* it would need to be amended to write to memory instead

### Why cache?

Caching is a good idea when using REST APIs because:

* It reduces load on the website that is providing you data (be courteous when using other people’s resources).
* Some websites impose rate limits. For example, after 15 requests in a 15 minute period the site may start sending error responses.
* It will make your program run faster. Connections over the Internet can take a few seconds, or even tens of seconds, if you are requesting a lot of data.
* It is easier to debug code when you know the content that is coming back is a static copy of the data.
* It is easier to run automated tests on code that retrieves data if the data can never change.

### How to import

I have copied their code into `requests_with_caching.py` which can then be imported using `import requests_with_caching`.

### How to use

Invoke the REST API call using `requests_with_caching.get()` rather than `requests.get()`.

You’ll get exactly the same Response object back that you would have gotten. But you’ll also get a printout in the output window with one of the following three diagnostic messages:

* _"found in permanent cache"_
* _"found in page-specific cache"_
* _"new; adding to cache"_

#### optional parameters

* __cache_file__: its value should be a string specifying the name of the file containing the permanent cache. If you don’t specify anything, the default value is “permanent_cache.txt”. For the datamuse API, we’ve provided a cache in a file called datamuse_cache.txt. It just contains the saved response to the query for “https://api.datamuse.com/words?rel_rhy=funny”.
* __private_keys_to_ignore__: its value should be a list of strings. These are keys from the parameters dictionary that should be ignored when deciding whether the current request matches a previous request. The main purpose of this is that it allows us to return a result from the cache for some REST APIs that would otherwise require you to provide an API key in order to make a request. By default, it is set to [“api_key”], which is a query parameter used with the flickr API. You should not need to set this optional parameter.
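
To see why ignoring private keys helps, here is a minimal sketch of how a cache key could be built so that two requests differing only in their API key match the same cached entry. This is an illustration only: the function below and its key format are assumptions, not the module's actual code, and `https://example.com/api` is a placeholder url.

```python
def make_cache_key(base_url, params, private_keys_to_ignore=('api_key',)):
    """Build a string key for a request, skipping private parameters (sketch)."""
    # Sort the kept keys so the same parameters always produce the same key
    kept = sorted(k for k in params if k not in private_keys_to_ignore)
    pairs = ['{}={}'.format(k, params[k]) for k in kept]
    return base_url + '?' + '&'.join(pairs)

key_a = make_cache_key('https://example.com/api', {'q': 'cats', 'api_key': 'SECRET1'})
key_b = make_cache_key('https://example.com/api', {'api_key': 'SECRET2', 'q': 'cats'})
print(key_a)
print(key_a == key_b)
```

Because the API key never becomes part of the cache key, a cached response saved by one person can be found by someone who has a different key, or none at all.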

#### types of cache created
* the __permanent cache__ is stored in a file that can't be added to
* the __page-specific cache__ is created when the url being requested isn't in the __permanent cache__.  Reloading the webpage will wipe out the __page-specific cache__.

More information about this module can be found here: https://fopp.umsi.education/books/published/fopp/InternetAPIs/cachingResponses.html