# APIs with Python
*Special thanks to Ekaterina Levitskaya (NYU) for her contributions to this notebook.*

APIs (application programming interfaces) are hosted on web servers. When you type www.google.com in your browser's address bar, your computer is actually asking the www.google.com server for a webpage, which it then returns to your browser. APIs work much the same way, except instead of your web browser asking for a webpage, your program asks for data. This data is usually returned in JSON format. 

To retrieve data, we make a request to a webserver. The server then replies with our data. In Python, we'll use the `requests` library to do this.



### Python Setup

In [1]:
# interacting with websites and web-APIs
import requests # easy way to interact with web sites and services

# data manipulation
from datascience import *
import numpy as np

## Retrieving Patent Data about University of Maryland

We will use the `requests` package to retrieve information about the patents that have been granted to inventors at the University of Maryland, using the PatentsView API. This notebook covers using the `requests` package to get the data, as well as putting that data into a usable form.

## How does the requests package work?

We first need to understand what information can be accessed from the API. We use an example of the **PatentsView API** (www.patentsview.org) to make the API call and check the information we get. 

### About PatentsView API

The PatentsView platform is built on data derived from the US Patent and Trademark Office (USPTO) bulk data to link inventors, their organizations, locations, and overall patenting activity. The PatentsView API provides programmatic access to longitudinal data and metadata on patents, inventors, companies, and geographic locations since 1976. 

To access the API, we use the `requests.get` function. In order to tell Python what to access, we need to specify the URL of the API endpoint.

PatentsView has several API endpoints. An endpoint is a server route that is used to retrieve different data from the API. You can think of the endpoints as just specifying what types of data you want. Examples of PatentsView API endpoints are shown here: http://www.patentsview.org/api/doc.html

Many times, we need to request a key from the data provider in order to access an API. For example, if you wanted to access the Twitter API, then you would need to get a Twitter developer account and access token (see [https://developer.twitter.com/en/docs/basics/authentication/overview/oauth](https://developer.twitter.com/en/docs/basics/authentication/overview/oauth)). Currently no key is necessary to access the PatentsView API. 

### Making a Request
When you ask a website or portal for information, this is called making a request. That is exactly what the `requests` library has been designed to do. However, we need to provide a query URL according to the format defined by PatentsView. The details on how to do that are explained [at this link.](https://www.patentsview.org/api/query-language.html)

Following the directions detailed in the link above, let's build our first query URL.

**Query String Format**

The query string is always a single JSON object: **{`<field>`:`<value>`}**, where `<field>` is the name of a database field and `<value>` is the value the field will be compared to for equality. Each API endpoint section of the PatentsView documentation contains a list of the data fields that can be selected for inclusion in output datasets.

We use the following base URL for the Patents Endpoint:

**Base URL**: `http://www.patentsview.org/api/patents/query?q={criteria}`



## Task example: Pull patents for University of Maryland

In this example, we will only pull patents from one organization: University of Maryland. Let's go to the Patents Endpoint (http://www.patentsview.org/api/patent.html) and find the appropriate field for the organization's name. Looking at the API documentation, we can see that the variable we need is called `"assignee_organization"` (organization name, if assignee is organization).

> _Note_: **Assignee**: the name of the entity - company, foundation, partnership, holding company or individual - that owns the patent. In this example we are looking at universities (organization-level).

We will pull from the API using a step-by-step process:
- Build the query
- Get the response
- Check the response code
- Get the content
- Convert to table

By the end, we should have data about patents that we can work with using the tools we've already learned.

### Step 1. Build the URL query 

Let's build our first URL query by combining the base url with one criterion (name of the `assignee_organization`). This is based on the directions detailed [at this link.](https://www.patentsview.org/api/query-language.html)

To build it up, we start with the base URL (`http://www.patentsview.org/api/patents/query?q=`) and add the criterion (`{"assignee_organization":"university of maryland"}`).

In [None]:
url = 'http://www.patentsview.org/api/patents/query?q={"assignee_organization":"university of maryland"}'
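
A missing quotation mark is an easy mistake when typing JSON by hand. As a minimal sketch of one way around that, we can describe the criterion as a Python dictionary and let the standard `json` module write the string for us. The names `jsonlib`, `criteria`, `query`, `built_url`, and `hand_typed_bad` are ours, chosen for illustration, and the module is imported under an alias because this notebook later uses the name `json` for the parsed response.

```python
import json as jsonlib  # aliased: this notebook uses `json` as a variable name

# Base URL taken from the text above.
BASE_URL = 'http://www.patentsview.org/api/patents/query?q='

# Describe the criterion as a Python dictionary instead of typing JSON by hand.
criteria = {"assignee_organization": "university of maryland"}

# json.dumps takes care of the quotes and braces. The separators argument
# drops the spaces it would otherwise add after ':' and ','.
query = jsonlib.dumps(criteria, separators=(',', ':'))
built_url = BASE_URL + query
print(built_url)

# A hand-typed criterion with one quotation mark missing is not valid JSON,
# and json.loads reports that before any request is sent.
hand_typed_bad = '{"assignee_organization":university of maryland"}'
try:
    jsonlib.loads(hand_typed_bad)
    bad_is_valid = True
except jsonlib.JSONDecodeError:
    bad_is_valid = False
print(bad_is_valid)
```

The string in `built_url` is the same one assigned to `url` in the cell above, so either can be passed to `requests.get`.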

### Step 2. Get the response

Now let's get the response using the URL defined above, using the `requests` library.

In [None]:
r = requests.get(url)  

### Step 3. Check the Response Code

Before working with the content of a response, it's a good idea to check its status code, which tells you whether the request succeeded.

The following are the response codes for the PatentsView API:

`200` - the query parameters are all valid; the results will be in the body of the response

`400` - the query parameters are not valid, typically either because they are not in valid JSON format, or a specified field or value is not valid; the “status reason” in the header will contain the error message

`500` -  there is an internal error with the processing of the query; the “status reason” in the header will contain the error message
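
If you prefer a readable message over a bare number, the list above can be turned into a small lookup. This is only a sketch: `STATUS_MEANINGS`, `describe_status`, and `is_ok` are hypothetical helpers written for this notebook, not part of `requests` or of PatentsView.

```python
# Meanings paraphrased from the list of PatentsView response codes above.
STATUS_MEANINGS = {
    200: 'query is valid; results are in the body of the response',
    400: 'query is not valid; the "status reason" header has the error message',
    500: 'internal error; the "status reason" header has the error message',
}

def describe_status(code):
    """Return a readable description of a response code (hypothetical helper)."""
    return STATUS_MEANINGS.get(code, 'code not listed in this notebook')

def is_ok(code):
    """True only when the query succeeded and the body holds results."""
    return code == 200

# Try the helpers on the three listed codes and on one that is not listed.
for code in (200, 400, 500, 404):
    print(code, is_ok(code), describe_status(code))
```

With a real response `r`, you would call `describe_status(r.status_code)`.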

Let's check the status of our response:

In [None]:
r.status_code  # Check the status code

We are good to go. Now let's get the content.

### Step 4. Get the Content
After the web server returns a response, you can collect the content you need by parsing the JSON in the response body into Python objects.

JSON is a way to encode data structures like lists and dictionaries to strings that ensures that they are easily readable by machines. JSON is the primary format in which data is passed back and forth to APIs, and most API servers will send their responses in JSON format.
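
To see the "data structure to string and back" idea without touching the network, here is a small sketch using Python's standard `json` module. The text in `sample_text` is made up for illustration; it only imitates the shape of a PatentsView response and contains no real patent data. The module is imported as `jsonlib` because this notebook uses `json` as a variable name.

```python
import json as jsonlib  # aliased: this notebook uses `json` as a variable name

# A made-up response body (not real data), shaped like the responses
# described in this notebook.
sample_text = (
    '{"patents": [{"patent_id": "0000001", "patent_number": "0000001", '
    '"patent_title": "Example title"}], "count": 1, "total_patent_count": 1}'
)

parsed = jsonlib.loads(sample_text)   # JSON text -> Python dictionary
print(type(parsed))
print(parsed['patents'][0]['patent_title'])

text_again = jsonlib.dumps(parsed)    # Python dictionary -> JSON text
print(type(text_again))
```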

In [None]:
json = r.json()  # Parse the JSON response body into a Python dictionary

By default, we get information on `patent_id`, `patent_number`, and `patent_title`. At the end of the JSON you will see how many results are returned (variable `count`) and the total number of patents found (variable `total_patent_count`).

In [None]:
json  # View JSON

There are 1205 patents for University of Maryland, with 25 out of 1205 results returned (we will discuss how to change the number of returned results later in the notebook).
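
As a quick worked example of what those two numbers imply, the sketch below computes how many responses of the default size it would take to cover every patent. It assumes the results come in equally sized batches, and the variable names are ours.

```python
import math

total_patent_count = 1205  # total reported for University of Maryland above
per_response = 25          # number of results returned by default

# Round up, because a partly filled final batch still needs its own response.
responses_needed = math.ceil(total_patent_count / per_response)

# Whatever is left over after the full batches lands in the last one.
last_batch_size = total_patent_count - per_response * (responses_needed - 1)

print(responses_needed)
print(last_batch_size)
```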

We want to be able to use this data, but it's a bit hard to in the current JSON format. We want to essentially take the information that is in the `patents` field within the dictionary and create a Table out of it. To do that, we'll need to take a little detour to introduce a few Python tools that will make our lives much easier.

### Lists and List Comprehension

Python lists are similar to arrays, but have slightly different properties. You can create basic Python lists with square brackets and work with them much as you would with arrays.

In [None]:
# Create an empty list
empty = []
empty

In [None]:
# Create a list with some numbers
nums = [1,2,3,4,5]
nums

In [None]:
nums.append('test')
nums

#### List Comprehension

List comprehension is kind of like a compact `for` loop inside a list. You use it to generate a list of values with certain characteristics. For example, if we wanted to create a list with values from 0 to 9, we could use the following.

In [None]:
[i for i in np.arange(10)]

Now, this isn't super interesting, because we could have just used the array itself. But, we can also do slight variations.

In [None]:
[2*i for i in np.arange(10)]

In [None]:
nums = np.arange(10)
[i > 5 for i in nums]
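
To make the "compact `for` loop" comparison concrete, the sketch below builds the same list both ways. It also shows a trailing `if`, which keeps only the elements that pass a test instead of producing a list of `True`/`False` values. It uses the built-in `range` rather than `np.arange` so that it runs without any imports.

```python
# Build the same list twice: once with a for loop, once with a comprehension.
doubled_loop = []
for i in range(10):
    doubled_loop.append(2 * i)

doubled_comp = [2 * i for i in range(10)]
print(doubled_loop == doubled_comp)

# A trailing `if` filters: only values greater than 5 are kept.
big_values = [i for i in range(10) if i > 5]
print(big_values)
```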

### Step 5. Converting JSON into a Table

Now let's convert the JSON into a `Table`. To do this, let's first examine how the JSON looks.

In [None]:
type(json)

In [None]:
json

It can be a bit hard to tell, but `json` is a Python dictionary, with three keys: `patents`, `count`, and `total_patent_count`. Let's look at what each one contains.

In [None]:
json['patents']

In [None]:
json['count']

In [None]:
json['total_patent_count']

The `patents` key holds a list of dictionaries. Each dictionary has information about an individual patent, so each element in the list is a patent. The other two keys are summary values: the count of patents that we pulled, and the overall number of patents for University of Maryland.

Because the patent information is inside a list, we'll need to access information from the `patents` key by iterating through each element of that list and pulling the relevant information from each patent in that list. To do this, we'll use list comprehension.

In [None]:
patent_id = [a['patent_id'] for a in json['patents']]
patent_number = [a['patent_number'] for a in json['patents']]
patent_title = [a['patent_title'] for a in json['patents']]
patents = Table().with_columns('patent_id', patent_id, 'patent_number', patent_number, 'patent_title', patent_title)

Above, the `a` represents an individual patent dictionary, which is an element of the list in `json['patents']`. We pull out the individual fields for each patent (`patent_id`, `patent_number`, and `patent_title`), then put them all into a Table. Let's take a look at the table.
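
The same pattern can be tried on a tiny hand-made list, which makes it easier to see what each comprehension produces. Everything in `sample_patents` is invented for illustration. The sketch also uses `dict.get`, which returns `None` when a key is missing instead of raising a `KeyError`; that is a choice made here, not something the notebook's data is known to need.

```python
# Made-up records shaped like the elements of the `patents` list.
sample_patents = [
    {'patent_id': 'A1', 'patent_number': 'A1', 'patent_title': 'First example'},
    {'patent_id': 'B2', 'patent_number': 'B2', 'patent_title': 'Second example'},
    {'patent_id': 'C3', 'patent_number': 'C3'},  # title left out on purpose
]

fields = ['patent_id', 'patent_number', 'patent_title']

# One list per field, collected in a dictionary keyed by field name.
columns = {name: [p.get(name) for p in sample_patents] for name in fields}

print(columns['patent_id'])
print(columns['patent_title'])
```

Each list in `columns` plays the same role as `patent_id`, `patent_number`, and `patent_title` in the cell above.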

In [None]:
patents

### <span style="color:red">Checkpoint 1: Pull patent data for another university</span>

Now try pulling patent data for Georgetown University:
- build a query URL;
- make a request;
- get the response in JSON format;
- note the total number of patents;
- convert the JSON to a Table.

## Adding to the query other fields of interest

Above we were able to pull data with the default information on the patents (`patent_id`, `patent_number`, `patent_title`). 

What if we want to know about the patent title and patent year?

Let's look for those variables in the API Endpoint (http://www.patentsview.org/api/patent.html), and add those fields to our query.

To the URL created above, we will add the fields parameter: `&f=["patent_title","patent_year"]`

In [None]:
url = 'http://www.patentsview.org/api/patents/query?q={"assignee_organization":"university of maryland"}&f=["patent_title","patent_year"]'

In [None]:
r = requests.get(url)  # Get response from the URL
r.status_code  # Check the status code

In [None]:
json = r.json()  # Parse the JSON response body into a Python dictionary

In [None]:
json  # View JSON

In [None]:
patent_title = [a['patent_title'] for a in json['patents']]
patent_year = [a['patent_year'] for a in json['patents']]
patents = Table().with_columns('patent_title', patent_title, 'patent_year', patent_year)
patents

### <span style="color:red">Checkpoint 2: Add other fields</span>

Try adding other fields of interest. Go to the Patents Endpoint (http://www.patentsview.org/api/patent.html), pick two other fields of interest, add them to the query, and get the results.

## Customize number of results

As you have noticed, by default, only 25 results are returned. To change the number of results returned (for example, 50 results), add the option parameter to the query URL: `&o={"per_page":50}`


In [None]:
url = 'http://www.patentsview.org/api/patents/query?q={"assignee_organization":"university of maryland"}&f=["patent_title","patent_year"]&o={"per_page":50}'
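
As the URL grows, typing it as one long string gets error-prone. Below is a sketch of a helper that assembles the three parameters (`q`, `f`, `o`) from Python objects using only the standard library. `build_query_url` is a hypothetical function written for this notebook. Note that `urlencode` percent-encodes the special characters, so the result looks different from the hand-typed URL while carrying the same parameters.

```python
import json as jsonlib  # aliased: this notebook uses `json` as a variable name
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = 'http://www.patentsview.org/api/patents/query'

def build_query_url(criteria, fields=None, options=None):
    """Assemble a query URL from Python objects (hypothetical helper)."""
    params = {'q': jsonlib.dumps(criteria)}
    if fields is not None:
        params['f'] = jsonlib.dumps(fields)
    if options is not None:
        params['o'] = jsonlib.dumps(options)
    return ENDPOINT + '?' + urlencode(params)

example_url = build_query_url(
    {"assignee_organization": "university of maryland"},
    fields=["patent_title", "patent_year"],
    options={"per_page": 50},
)
print(example_url)

# Decode the URL again to confirm each parameter survived the encoding.
decoded = parse_qs(urlsplit(example_url).query)
print(jsonlib.loads(decoded['o'][0]))
```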

In [None]:
r = requests.get(url)  # Get response from the URL
r.status_code  # Check the status code

In [None]:
json = r.json()  # Parse the JSON response body into a Python dictionary

Now the JSON shows 50 results (as noted in the variable `count` at the bottom of the JSON).

In [None]:
json

### <span style="color:red">Checkpoint 3: Customize the number of results</span>

Try customizing the number of returned results using the options parameter. 

## Using the Data

Now that we've pulled the data, we need to use it somehow. We'll go over more advanced methods with text analysis, but here's an initial look at what types of patents are awarded to University of Maryland.

In [4]:
base_url = 'http://www.patentsview.org/api/patents/query?q='
organization = '{"assignee_organization":"university of maryland"}'
variables = '&f=["patent_title","patent_year", "patent_type", "patent_abstract"]&o={"per_page":100}'

url = base_url + organization + variables
r = requests.get(url)  # Get response from the URL
r.status_code  # Check the status code

200

In [5]:
json = r.json()
patent_title = [a['patent_title'] for a in json['patents']]
patent_year = [a['patent_year'] for a in json['patents']]
patent_type = [a['patent_type'] for a in json['patents']]
patent_abstract = [a['patent_abstract'] for a in json['patents']]


patents = Table().with_columns('patent_title', patent_title,
                               'patent_year', patent_year,
                              'patent_type', patent_type,
                              'patent_abstract', patent_abstract)
patents.show(5)

| patent_title | patent_year | patent_type | patent_abstract |
| --- | --- | --- | --- |
| Method for binding site identification by molecular dyna ... | 2018 | utility | The invention describes an explicit solvent all-atom mol ... |
| Methods for recovery of leaf proteins | 2018 | utility | A novel method for processing soluble plant leaf protein ... |
| Bacterial live vector vaccines expressing chromosomally- ... | 2018 | utility | Bacterial live vector vaccines represent a vaccine devel ... |
| Systems, methods, and devices for health monitoring of a ... | 2018 | utility | A health monitoring device includes an ultrasound source ... |
| Sparse decomposition of head related impulse responses w ... | 2018 | utility | This application describes methods of signal processing ... |


In [None]:
patents.group('patent_type')
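
Grouping a column counts how many rows share each distinct value. As a rough plain-Python analogue, the sketch below does the same counting with `collections.Counter`. The values in `sample_types` are made up and do not reflect the actual mix of patent types.

```python
from collections import Counter

# Made-up values standing in for the patent_type column.
sample_types = ['utility', 'utility', 'design', 'utility', 'plant']

# Counter tallies how many times each distinct value appears.
type_counts = Counter(sample_types)
print(type_counts['utility'])
print(sorted(type_counts.items()))
```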

In [None]:
# Find which titles have "method" in them
method = [('Method' in a) or ('method' in a) for a in patents.column('patent_title')]

# Add to Table, then count
patents = patents.with_column('is_method', method)
patents.group('is_method')
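
The check above looks for two specific spellings, `Method` and `method`. A possible variation, sketched below, lowercases each title first so that any capitalization matches. Only the first title comes from the table shown earlier; the other two are made up to show the difference between the two checks.

```python
sample_titles = [
    'Methods for recovery of leaf proteins',       # from the table above
    'METHOD AND APPARATUS FOR EXAMPLE PURPOSES',   # made-up title
    'Example device with no matching keyword',     # made-up title
]

# The check used in this notebook: two exact spellings.
two_spellings = [('Method' in t) or ('method' in t) for t in sample_titles]

# Variation: lowercase the title first, then look for 'method'.
any_case = ['method' in t.lower() for t in sample_titles]

print(two_spellings)
print(any_case)
```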

## Optional

Please feel free to explore and practice all available options in the API Query Language section of the PatentsView website (http://www.patentsview.org/api/query-language.html).