In this lesson, we look at how to fetch and use data from the web.
We will use data from Harvard’s [Caselaw Access Project](https://case.law/) ("CAP").
CAP aims to make all published US court decisions freely available in a standard, machine-readable format.
CAP and its data format are [documented here](https://case.law/api/).

To fetch data from the web, we must first import a few libraries:

In [4]:
import requests
import json

We need to specify the URL of the data we want to fetch.
We include some parameters that specify which cases we want to load:
- `page_size` specifies the number of items to return
- `jurisdiction` is Illinois in this example
- `decision_date_min` is the minimum decision date; we only want decisions later than this date
- `full_case` includes the full text of each case

More parameters are listed in the CAP documentation linked above.
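
The query string can also be assembled from a dictionary instead of being typed by hand, which makes it easier to change a single parameter. The following is a minimal sketch using `urlencode` from the standard library; the names `BASE_URL`, `params`, and `built_url` are illustrative and are not used elsewhere in this lesson.

```python
from urllib.parse import urlencode

# Base endpoint of the cases API used in this lesson.
BASE_URL = "https://api.case.law/v1/cases/"

# The same parameters as described in the list above.
params = {
    "jurisdiction": "ill",
    "full_case": "true",
    "decision_date_min": "2009-01-01",
    "page_size": 3,
}

# urlencode joins the key/value pairs with "&" and escapes special characters.
built_url = BASE_URL + "?" + urlencode(params)
print(built_url)
```

Alternatively, `requests.get` accepts such a dictionary directly through its `params` argument and builds the query string itself.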

In [3]:
URL = "https://api.case.law/v1/cases/?jurisdiction=ill&full_case=true&decision_date_min=2009-01-01&page_size=3"

Now, let's fetch the data. The server response also contains metadata, but we want the content:

In [None]:
response = requests.get(URL)
content = response.content

We can look at the first 100 bytes of the raw data. The `b` prefix in the output shows that the content is a sequence of bytes, not yet text. We can see the same data if we open the URL in a web browser.

In [5]:
print(content[:100])

b'{"count":2025,"next":"https://api.case.law/v1/cases/?cursor=eyJwIjogWzAuMCwgMTIzMTI4NjQwMDAwMCwgNDI4'


To use the data, we must first decode the raw bytes into text. This requires specifying the character encoding, which is often UTF-8. Then we parse the JSON text into a Python dictionary.

In [None]:
text = content.decode("utf-8")
data = json.loads(text)
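
To make the two steps easier to follow, here is a small offline sketch that runs them on a hand-written byte string and prints the type at each stage. The value of `sample_content` is invented for illustration and only imitates the start of a real response.

```python
import json

# Hand-written sample bytes that imitate a (very short) server response.
sample_content = b'{"count": 2025, "results": []}'

sample_text = sample_content.decode("utf-8")  # bytes -> str
sample_data = json.loads(sample_text)         # str -> dict

print(type(sample_content).__name__)
print(type(sample_text).__name__)
print(type(sample_data).__name__)
print(sample_data["count"])
```

The `requests` library can also perform both steps in one call with `response.json()`.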

The field `count` contains the total number of hits in the database. This is usually larger than the number of items we requested with `page_size`.

In [6]:
print(data["count"])

2025
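
Since `count` is much larger than the page size, most of the hits are not in this response. The raw data printed earlier starts with a `next` field that holds a URL, which suggests that further pages can be fetched by following it. Below is a sketch under that assumption. The helper `collect_cases` and its `fetch_page` argument are hypothetical, the page data is invented so that the example runs without a network connection, and we assume (without confirmation from the lesson) that `next` is `None` on the last page.

```python
def collect_cases(first_url, fetch_page, max_pages=10):
    """Follow "next" links and gather the "results" of each page."""
    all_cases = []
    url = first_url
    pages_read = 0
    # max_pages is a safety limit so we never loop over thousands of pages.
    while url is not None and pages_read < max_pages:
        page = fetch_page(url)           # expected to return a decoded dict
        all_cases.extend(page["results"])
        url = page.get("next")           # assumed to be None on the last page
        pages_read += 1
    return all_cases


# Fake pages standing in for server responses (invented data).
fake_pages = {
    "page1": {"count": 5, "next": "page2", "results": ["a", "b", "c"]},
    "page2": {"count": 5, "next": None, "results": ["d", "e"]},
}

collected = collect_cases("page1", fake_pages.__getitem__)
print(collected)
```

With a real server, `fetch_page` could be a small function that calls `requests.get(url)` and decodes the response as shown above.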


Let's fetch the list of cases:

In [None]:
cases = data["results"]

Now we can inspect each case. Let's loop over the cases and get some of the information.
The data contains various metadata about each case, such as the case name or the abbreviated case name.

It's often useful to look at the data in a web browser to get an overview.

In [7]:
for case in cases:
    print("Case name:", case["name_abbreviation"])
    print("Court:", case["court"]["name"])
    opinions = case["casebody"]["data"]["opinions"]
    for opinion in opinions:
        print(" opinion type:", opinion["type"])
        print(" opinion author:", opinion["author"])

Case name: People v. Johnson
Court: Illinois Appellate Court
 opinion type: majority
 opinion author: JUSTICE LYTTON
 opinion type: dissent
 opinion author: None
Case name: People v. Angarola
Court: Illinois Appellate Court
 opinion type: majority
 opinion author: JUSTICE O’MALLEY
Case name: Village of Bensenville v. City of Chicago
Court: Illinois Appellate Court
 opinion type: majority
 opinion author: JUSTICE O’MALLEY
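
The output shows that `author` can be `None`, as for the dissent in the first case. The following sketch collects the same fields into a list of rows and substitutes a placeholder for a missing author. `summarize_opinions` is a hypothetical helper, and `sample_cases` is hand-written data that mimics only the fields used in the loop above.

```python
def summarize_opinions(cases, missing="(unknown)"):
    """Return one (case name, opinion type, author) tuple per opinion."""
    rows = []
    for case in cases:
        for opinion in case["casebody"]["data"]["opinions"]:
            # Replace a missing author (None) with a readable placeholder.
            author = opinion["author"] or missing
            rows.append((case["name_abbreviation"], opinion["type"], author))
    return rows


# Hand-written sample with the same structure as the fetched cases.
sample_cases = [
    {
        "name_abbreviation": "People v. Johnson",
        "casebody": {"data": {"opinions": [
            {"type": "majority", "author": "JUSTICE LYTTON"},
            {"type": "dissent", "author": None},
        ]}},
    },
]

for row in summarize_opinions(sample_cases):
    print(row)
```

A list of rows like this is easier to sort, filter, or write to a file than printed text.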
