## A Brief Guide to the Patentsview API

In [4]:
import requests
import json
import pandas as pd
import os.path

`requests` is a Python library that... well, lets you make HTTP requests. That pretty much means you can talk to any website, API, etc. from code: if it speaks HTTP (roughly, if you can open it in a browser), `requests` can fetch it programmatically for you. And it's a perfect tool for Patentsview.

A lot of websites have publicly accessible APIs with decent documentation. Twitter and Reddit are good examples of *good* API design/documentation.

Unfortunately, the Patentsview documentation is not as thorough, so I'm still figuring it out. Maybe I'm just slow.

Below is an example of how I'm fetching the abstracts for my current set of patent IDs. 

In [13]:
def fetch_selected_abstracts(patent_ids):
    base_url = "http://www.patentsview.org/api/patents/query?"
    query = "q={\"patent_id\":\"%s\"}"  # JSON search criteria; %s is replaced by a patent ID
    return_format = "&f=[\"patent_number\",\"patent_abstract\"]"  # JSON list of fields to return
    selected_abstracts = {}
    for patent_id in patent_ids:
        # Concatenate base_url, query (after inserting patent_id) and return_format
        url = base_url + (query % patent_id) + return_format
        response = requests.get(url)
        response.raise_for_status()  # fail loudly on HTTP errors instead of inside .json()
        # Results come back in a "patents" list; we asked for one ID, so take the first hit
        selected_abstracts[patent_id] = response.json()['patents'][0]['patent_abstract']
    return pd.DataFrame(list(selected_abstracts.items()), columns=['patent_id', 'abstract'])
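A small variation on that function, sketched under one assumption I haven't checked against the docs: that an ID with no match comes back with `"patents"` empty or `null`. If so, the `[0]` lookup above blows up, so this version records `None` instead, and it also sets a timeout. The `get` parameter, `fetch_selected_abstracts_safe`, and the stub at the bottom are hypothetical additions so the whole thing can be tried without touching the network.

```python
import pandas as pd
import requests

BASE_URL = "http://www.patentsview.org/api/patents/query?"
QUERY = 'q={"patent_id":"%s"}'  # same string as in the notebook, written with single quotes
RETURN_FORMAT = '&f=["patent_number","patent_abstract"]'


def fetch_selected_abstracts_safe(patent_ids, get=requests.get, timeout=10):
    """Like fetch_selected_abstracts, but tolerates IDs with no match.

    `get` defaults to requests.get; it is a parameter only so the function
    can be exercised offline with a stub (hypothetical, not part of the API).
    """
    rows = []
    for patent_id in patent_ids:
        url = BASE_URL + (QUERY % patent_id) + RETURN_FORMAT
        response = get(url, timeout=timeout)
        response.raise_for_status()
        # Assumption: an unmatched ID comes back with "patents" empty or null
        patents = response.json().get("patents") or []
        abstract = patents[0]["patent_abstract"] if patents else None
        rows.append({"patent_id": patent_id, "abstract": abstract})
    return pd.DataFrame(rows, columns=["patent_id", "abstract"])


# Offline demo: a stub that "finds" one ID and finds nothing for the other
class StubResponse:
    def __init__(self, payload):
        self._payload = payload

    def raise_for_status(self):
        pass  # pretend every request succeeded

    def json(self):
        return self._payload


def stub_get(url, timeout=None):
    if "7018053" in url:
        return StubResponse({"patents": [{"patent_number": "7018053",
                                          "patent_abstract": "(stub abstract)"}]})
    return StubResponse({"patents": None})


demo = fetch_selected_abstracts_safe(["7018053", "0000000"], get=stub_get)
print(demo)
```

For real requests, drop the `get=` argument and it falls back to `requests.get`.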

The `base_url` is the API's patents query endpoint: `"http://www.patentsview.org/api/patents/query?"`. The trailing `?` marks where the URL's query string begins.

The `query` holds the search criteria, written as a small JSON object: `"q={\"patent_id\":\"%s\"}"`

It's kinda confusing, but we can break it down.

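`{\"patent_id\":\"%s\"}` basically means `{"patent_id": <insert_id_here>}`, which in this context tells the API to return the patent with that particular ID. The `%s` is a Python string-formatting placeholder that gets swapped for the actual ID. A lot of the clutter is backslashes, and those are a Python thing, not an HTTP thing: since the whole string is wrapped in double quotes, every double quote inside the JSON has to be escaped as `\"`. Quotes and braces *are* a problem in URLs too, but the fix there is percent-encoding (`"` becomes `%22`, `{` becomes `%7B`), which `requests` applies automatically when it sends the URL.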
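A quick offline check of where the backslashes go: they vanish as soon as Python reads the string, what's left after `q=` is plain JSON, and percent-encoding (here via the standard library's `urllib.parse.quote`) is what actually makes it URL-safe. Only the template string comes from the notebook; the rest is just inspection.

```python
import json
from urllib.parse import quote

# The template from fetch_selected_abstracts, copied exactly
query = "q={\"patent_id\":\"%s\"}"

# The backslashes exist only in the source code: the string itself holds plain
# quotes, identical to writing it with single quotes on the outside
print(query == 'q={"patent_id":"%s"}')  # True

# %s is Python's old-style formatting placeholder
filled = query % "7018053"
print(filled)  # q={"patent_id":"7018053"}

# Everything after "q=" is ordinary JSON
criteria = json.loads(filled[len("q="):])
print(criteria)  # {'patent_id': '7018053'}

# Percent-encoding is what makes quotes, braces and colons safe inside a URL
encoded = quote(filled[len("q="):])
print(encoded)  # %7B%22patent_id%22%3A%227018053%22%7D
```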

The `return_format` is similar to `query`: `"&f=[\"patent_number\",\"patent_abstract\"]"`

The leading `&` separates it from the `q` parameter, and `f` takes a JSON list of the fields to return. Unescaped, it reads `["patent_number", "patent_abstract"]`, since I want the patent number and the abstract to be returned.
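Since both `q` and `f` are just JSON, one way to skip the hand-escaping entirely is to build them with `json.dumps` and let `requests` do the URL encoding through its `params` argument. This is a sketch, not how the notebook does it: `build_query_url` is a hypothetical helper, and it only *prepares* the request with `requests.Request(...).prepare()`, so nothing is sent.

```python
import json
from urllib.parse import parse_qs, urlparse

import requests

# Same endpoint as base_url, minus the trailing "?" (requests adds it)
BASE_URL = "http://www.patentsview.org/api/patents/query"


def build_query_url(patent_id, fields=("patent_number", "patent_abstract")):
    """Hypothetical helper: build, but don't send, the GET request for one patent ID."""
    params = {
        # Compact separators give {"patent_id":"..."} with no spaces, like the hand-written string
        "q": json.dumps({"patent_id": patent_id}, separators=(",", ":")),
        "f": json.dumps(list(fields), separators=(",", ":")),
    }
    # prepare() applies the same percent-encoding requests.get() would, with no network I/O
    return requests.Request("GET", BASE_URL, params=params).prepare().url


url = build_query_url("7018053")
print(url)

# Decoding the query string gives back the two JSON strings unchanged
decoded = parse_qs(urlparse(url).query)
print(decoded)  # {'q': ['{"patent_id":"7018053"}'], 'f': ['["patent_number","patent_abstract"]']}
```

In a real loop you would pass the same `params` dict straight to `requests.get(BASE_URL, params=params)`, and the JSON never needs a backslash.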


In [12]:
patent_ids = ['7018053', '6790432', '6564038', '6726725', '6716898']
%time selected_abstracts = fetch_selected_abstracts(patent_ids)
selected_abstracts.head()

CPU times: user 16.7 ms, sys: 3.94 ms, total: 20.6 ms
Wall time: 387 ms


|   | patent_id | abstract |
|---|---|---|
| 0 | 7018053 | A projector includes circuitry configured to g... |
| 1 | 6790432 | Provided is a method and apparatus for produci... |
| 2 | 6564038 | A method and apparatus are disclosed for suppr... |
| 3 | 6726725 | Orthopedic implants comprising components of z... |
| 4 | 6716898 | Disclosed are amber polyester compositions sui... |
