### Sourcing SCOTUS from Harvard's [Caselaw Access Project (CAP)](https://case.law/)

Goal: retrieve all opinions written by the Supreme Court for a specified year range.

SCOTUS denies cert in thousands of cases every year, and each denial gets its own document, so we can't just grab all SCOTUS documents from CAP for a specified year. We need docket numbers for the cases that were granted cert and argued before the Court. Here, we source those docket numbers from the [Super-SCOTUS dataset](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/POWQIT) [[paper](https://aclanthology.org/2023.nllp-1.20/)].

1. Get docket numbers for the years 1986-2019 from Super-SCOTUS.
2. For each year, request a small sample (~15 cases) from CAP. (Waiting on unmetered API access before pulling the full set.)
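
The two steps above can be sketched as follows. This is a minimal illustration, not the project's actual sampling code: the helper `sample_dockets_by_year`, the fixed seed, and the toy id `1986_85-1` are assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_dockets_by_year(docket_nums_by_year, k=15, seed=0):
    """Return up to k randomly chosen docket numbers for each year."""
    rng = random.Random(seed)  # fixed seed so reruns request the same cases
    sample = {}
    for year, dockets in sorted(docket_nums_by_year.items()):
        # A year with fewer than k cases contributes all of them.
        sample[year] = rng.sample(dockets, min(k, len(dockets)))
    return sample


# Toy ids in the Super-SCOTUS shape "<year>_<docket>"; "1986_85-1" is made up.
ids = ["1986_84-2022", "1986_85-1", "2010_10-10"]
docket_nums_by_year = defaultdict(list)
for case_id in ids:
    year, docket_number = case_id.split("_", 1)
    docket_nums_by_year[int(year)].append(docket_number)

sample = sample_dockets_by_year(docket_nums_by_year, k=1)
```

Seeding the generator keeps the sample stable between runs, which avoids spending metered API calls on a different set of cases each time.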

**Case issues**
- [Board of Education v. Tom F.](https://cite.case.law/us/552/1/): There was a recusal and the court split 4-4, leading to a ~2-sentence per curiam opinion saying the lower court was affirmed by default. For some reason, you can't search for this case by docket number (via web or API).
- [Atlantic Sounding Co. v. Townsend](https://cite.case.law/us/557/404/): Classified as 11th Circuit instead of SCOTUS, so the 9009 (SCOTUS) court filter returns 0 results with this case's docket number.

### Rerunning API pull

TODOs:

- Make `is_authors_known` 0 or 1, never null
- Deal with special cases (Kavanaugh and Kagan not participating, and per curiam opinions)
- Cross-reference `num_cases` with the Oyez API



### Authorship

If we fail to find a case, or find a case without an opinion, there are no authors.

If we find a case, then for each opinion we can find: no author, an unknown author, or special authors.
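
A rough sketch of how these outcomes might be told apart is shown below. Everything in it is an assumption for illustration: `AUTHOR_MAP_SKETCH` stands in for the real `AUTHOR_MAP` (whose contents are not shown in this notebook), `SPECIAL_AUTHORS` is a guessed marker set, and the rule "known only if every opinion has a mapped author" is one possible reading of `is_authors_known`.

```python
# Hypothetical stand-in for scotus_metalang.authors.AUTHOR_MAP:
# raw CAP author string (lowercased) -> normalized author name.
AUTHOR_MAP_SKETCH = {
    "justice kagan": "kagan",
    "justice kavanaugh": "kavanaugh",
}
# Assumed marker for opinions that have no individual author.
SPECIAL_AUTHORS = {"per curiam"}


def classify_author(cap_author):
    """Classify one opinion's author field as none, special, known, or unknown."""
    if not cap_author or not cap_author.strip():
        return "none"
    key = cap_author.strip().lower()
    if key in SPECIAL_AUTHORS:
        return "special"
    if key in AUTHOR_MAP_SKETCH:
        return "known"
    return "unknown"


def is_authors_known(cap_authors):
    """Return 1 if there is at least one opinion and all authors are known, else 0."""
    labels = [classify_author(a) for a in cap_authors]
    return int(bool(labels) and all(label == "known" for label in labels))
```

Returning an `int` in every branch means the flag is always 0 or 1 and never null, including for cases with no opinions at all.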

In [1]:
%cd -q ../..

import asyncio
import os
import sqlite3
from collections import defaultdict
from itertools import chain
from pathlib import Path

import aiohttp
import jsonlines
from dotenv import load_dotenv

from scotus_metalang import cap
from scotus_metalang.authors import AUTHOR_MAP

load_dotenv()
CAP_TOKEN = os.environ["CAP_TOKEN"]

docket_nums_by_year = defaultdict(list)
with jsonlines.open("data/super_scotus/1986_to_2019.jsonl", "r") as f:
    for case in f:
        # Example case id: "1986_84-2022"
        year = case["year"]
        docket_number = case["id"][5:]
        docket_nums_by_year[year].append(docket_number)


In [2]:
with jsonlines.open("data/super_scotus/1986_to_2019.jsonl", "r") as f:
    for case in f:
        # Example case id: "1986_84-2022"
        year = case["year"]
        docket_number = case["id"][5:]
        if docket_number == "10-10":
            print(case)
            break

{'id': '2010_10-10', 'year': 2010, 'citation': '564 US _', 'title': 'Turner v. Rogers', 'petitioner': 'Michael D. Turner', 'respondent': 'Rebecca L. Rogers, et al.', 'docket_no': '10-10', 'court': 'Roberts Court', 'decided_date': 'Jun 20, 2011', 'url': 'https://www.oyez.org/cases/2010/10-10', 'transcripts': [{'name': 'Oral Argument - March 23, 2011', 'url': 'https://apps.oyez.org/player/#/roberts6/oral_argument_audio/23023', 'id': 23023, 'case_id': '2010_10-10'}], 'adv_sides_inferred': False, 'known_respondent_adv': True, 'advocates': {'Seth P. Waxman': {'id': 'seth_p_waxman', 'name': 'Seth P. Waxman', 'role': 'for the petitioner', 'side': 1}, 'Leondra R. Kruger': {'id': 'leondra_r_kruger', 'name': 'Leondra R. Kruger', 'role': 'Acting Principal Deputy Solicitor General, Department of Justice, as amicus curiae, for the United States', 'side': 2}, 'Stephanos Bibas': {'id': 'stephanos_bibas', 'name': 'Stephanos Bibas', 'role': 'for the respondents', 'side': 0}}, 'win_side': 1.0, 'win_side

In [2]:
docket_numbers = list(chain(*docket_nums_by_year.values()))
docket_numbers = [{"docket_number": x} for x in docket_numbers]


In [2]:

connection = sqlite3.connect("api_log.db")
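
The cells that follow assume `api_log.db` already contains `cases` and `opinions` tables. The schema is not shown in this notebook, so here is a sketch of one that would fit the queries: the column names come from those queries, while the column types, the primary key on `docket_number`, and the in-memory `demo` database are assumptions.

```python
import sqlite3

# Column names are taken from the notebook's queries; types and keys are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cases (
    docket_number    TEXT PRIMARY KEY,
    case_status      TEXT,
    is_authors_known INTEGER,
    selected_case_id INTEGER,
    num_opinions     INTEGER,
    authors          TEXT
);
CREATE TABLE IF NOT EXISTS opinions (
    docket_number  TEXT,
    opinion_number INTEGER,
    cap_author     TEXT,
    author         TEXT
);
"""

demo = sqlite3.connect(":memory:")  # throwaway database, not api_log.db
demo.executescript(SCHEMA)

# With a primary key on docket_number, INSERT OR IGNORE skips duplicates,
# so re-running the insert cell does not create extra rows.
with demo:
    demo.executemany(
        "INSERT OR IGNORE INTO cases (docket_number) VALUES (:docket_number)",
        [{"docket_number": "10-10"}, {"docket_number": "10-10"}],
    )
num_cases = demo.execute("SELECT COUNT(*) FROM cases").fetchone()[0]
```

Without a uniqueness constraint on `docket_number`, `INSERT OR IGNORE` would have nothing to conflict with and every rerun would add duplicate rows.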

In [4]:
opinions = [{"docket_number": 454545, "cap_author": "asdf"},
            {"docket_number": 5665656, "cap_author": "dededed"}]

with connection:
    connection.executemany("""--sql
                           INSERT INTO opinions(docket_number, cap_author) VALUES(:docket_number, :cap_author)
                           """, opinions)

In [9]:
# Insert all docket numbers into cases
with connection:
    connection.executemany("""--sql
                           INSERT OR IGNORE INTO cases (docket_number)
                           VALUES(:docket_number)
                           """, docket_numbers)

In [12]:
# Get everything from cases
with connection:
    rows = connection.execute("""--sql
                              SELECT * FROM cases
                              """).fetchall()


In [9]:
# Figure out which docket numbers still need to be processed.
# Select docket_number explicitly so row[0] does not depend on column order.
with connection:
    rows = connection.execute("""--sql
                              SELECT docket_number FROM cases
                              WHERE case_status != 'success' OR case_status IS NULL
                              """).fetchall()
docket_numbers_to_process = [row[0] for row in rows]

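Selecting rows where `case_status` is not `success` (or is still NULL) means a rerun only retries cases that failed or were never attempted.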
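
The `OR case_status IS NULL` clause matters because in SQL a comparison against NULL is neither true nor false, so `!= 'success'` alone would skip rows that were never attempted. A small self-contained check, using an in-memory table with placeholder docket numbers and a hypothetical `error` status:

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE cases (docket_number TEXT PRIMARY KEY, case_status TEXT)")
demo.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [("a", "success"), ("b", "error"), ("c", None)],  # "c" was never attempted
)

# NULL != 'success' evaluates to NULL, so row "c" is filtered out here.
without_null_check = [r[0] for r in demo.execute(
    "SELECT docket_number FROM cases "
    "WHERE case_status != 'success' ORDER BY docket_number")]

# Adding IS NULL brings the never-attempted row back.
with_null_check = [r[0] for r in demo.execute(
    "SELECT docket_number FROM cases "
    "WHERE case_status != 'success' OR case_status IS NULL "
    "ORDER BY docket_number")]
```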

In [10]:
len(docket_numbers_to_process)

2405

In [11]:
async def main():
    connector = aiohttp.TCPConnector(limit_per_host=20)
    headers = {"Authorization": f"Token {CAP_TOKEN}"}
    async with aiohttp.ClientSession(connector=connector, headers=headers) as session:
        # Slice docket_numbers_to_process here to limit API usage while tinkering
        for docket_number in docket_numbers_to_process:
            db_params, opinions_as_params = await cap.process_opinions_by_docket_number(docket_number, session)
            # Write the case result and its opinions in a single transaction
            with connection:
                connection.execute("""--sql
                                   UPDATE cases
                                   SET case_status = :case_status,
                                   is_authors_known = :is_authors_known,
                                   selected_case_id = :selected_case_id,
                                   num_opinions = :num_opinions,
                                   authors = :authors
                                   WHERE docket_number = :docket_number
                                   """, db_params)
                connection.executemany("""--sql
                                       INSERT INTO opinions
                                       VALUES(:docket_number, :opinion_number, :cap_author, :author)
                                       """, opinions_as_params)

In [None]:
await main()
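
Because `main()` awaits each docket number before starting the next, requests go out one at a time (unless `cap.process_opinions_by_docket_number` fans out internally), and the connector's `limit_per_host=20` is never reached. If faster pulls are wanted once unmetered access arrives, one option is bounded concurrency with `asyncio.Semaphore` and `asyncio.gather`. The sketch below is only an illustration: `fetch_stub` is a hypothetical stand-in for the real CAP call, and `fetch_all` is not part of `scotus_metalang`.

```python
import asyncio


async def fetch_stub(docket_number):
    """Hypothetical stand-in for cap.process_opinions_by_docket_number."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"docket_number": docket_number, "case_status": "success"}


async def fetch_all(docket_numbers, max_concurrent=20):
    """Fetch every docket number with at most max_concurrent requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def fetch_one(docket_number):
        async with semaphore:  # blocks while max_concurrent fetches are running
            return await fetch_stub(docket_number)

    # gather returns results in the same order as its inputs
    return await asyncio.gather(*(fetch_one(d) for d in docket_numbers))
```

In a notebook this would be called as `results = await fetch_all(docket_numbers_to_process)`, with the database writes done afterwards, one transaction per result.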