# Tow PACER project

### Phase 1 (preparation; Jeremy, 40 hrs; Justin, 8 hrs; between 3/1 and 4/1): 
 - Compile a close-to-live database of relevant free Federal court filings data, pulling directly from PACER RSS feeds and the CourtListener's APIs.
 - Compile training data in consultation with the courts reporter for several of his heuristics. 
 - Other preliminary infrastructure work, e.g. scraping DoJ press releases
### Phase 2 (Jeremy, 120hrs; Justin 12 hrs; between 4/1 and 6/1):
 - Build custom models (classifiers and named-entity recognizers) to instrumentalize hand-written heuristics used by the courts reporter using docket data.
 - Test those models accuracy on held-out sets and also new data, plus human evaluation/research by Justin. (Revising as necessary.)
 - Alert documents considered accurate by the model to the courts reporter via Slack.
### Phase 3 (Jeremy, 60 hours; between 6/1 and 7/1):
 - For heuristics that aren't performing as well as we'd like, purchase dockets (10cents to a few dollars each) to provide the model with more information. This is a higher stakes test for the model, since it will involve automatically spending money.
 - Fetch and store those dockets (likely in the Free Law Project's RECAP database).
### Phase 4 ("stretch goals"; if time permits):
- Devise an AI-based method, based on consultation with a courts reporter, for guessing which documents we should purchase from a case. This is a greater challenge because we're asking the model to encode newsworthiness directly, rather than a more well-defined category.

## Models:

Search warrants:
- is this maybe a search warrant (based on RSS) that we should buy the docket for?
- is [this a search warrant or not](https://colab.research.google.com/drive/16l_Fr9d9oLrGPz7cQwtD5Z3sJGsHQagD#scrollTo=2QR9UdtmiGbs)? (based on a docket)
- [what's being searched/seized](https://colab.research.google.com/drive/1bK5lsugjEX6X4QdGFmeA4KglUPQaRuPh#scrollTo=79q5hRTtHNRq)?
- what category is the thing being searched/seized? (need to generate categories)

Newsworthy: 
 - DOJ press releases, Justin's articles, whether it was purchased by a courtlistener user

In [1]:
import courtlistener



In [5]:
docs = courtlistener.find_search_warrant_documents(n=100)

In [4]:
from flair.models import SequenceTagger

model = SequenceTagger.load('search_warrants_model/final-model-20210406.pt')

from flair.data import Sentence

2021-04-06 12:23:18,883 loading file search_warrants_model/final-model-20210406.pt
APPLICATION for Search Warrant as to In <B-MISC> Re <I-MISC> : <I-MISC> Five <I-MISC> digital <I-MISC> devices <I-MISC> / <I-MISC> media <I-MISC> seized <I-MISC> from <I-MISC> Raymond <I-MISC> Sherwin <I-MISC> on <I-MISC> January <I-MISC> 25 <I-MISC> , <I-MISC> 2021 <L-MISC> . ( Attachments : # 1 Attachment A , # 2 Attachment B , # 3 Affidavit of Jamie West ) ( eh ) ( Entered : 01 / 28 / 2021 )
Application by United States for Search Warrant re One Account for Investigation of 18 U.S.C . 1343 and Other Offenses ( N.D. Cal. 2021 )


In [7]:
for doc in docs:
    sentence = Sentence(doc["description"])
    model.predict(sentence)
    print(sentence.to_plain_string())
    print(' '.join(entity.text for entity in sentence.get_spans('ner')))
    print()

REDACTED VERSION of 2 Search Warrant Application by USA as to Sealed Residence (dlb) (Entered: 04/05/2021)


REDACTED VERSION of 2 Search Warrant by USA as to Sealed Bank Account (dlb) (Entered: 04/05/2021)


Application and Affidavit for Search and Seizure Warrant entered as to Purple i-phone Seizure No. 2021250600056401-0004 ("Target Device 2"). (lrc) (Entered: 04/02/2021)
Purple i-phone Seizure No. 2021250600056401-0004 (" Target Device 2 ")

APPLICATION and Affidavit for Search Warrant by USA as to Apple iPhone 978 305-2137, LG Telephone 978 876-2190 - APPROVED So Ordered by Magistrate Judge Andrea K. Johnstone. (Attachments: # 1 Affidavit) Original document available in clerks office. (bt) (Entered: 04/01/2021)
Apple iPhone 978 305-2137 , LG Telephone 978 876-2190 -

Application and Affidavit for Search and Seizure Warrant entered as to Grey Apple iPhone Cellular Phone Case Number: SYS-21-03-0049 Seizure No. 2021250400161901. (lrc) (Entered: 03/31/2021)
Grey Apple iPhone Cellular 

MOTION /Application for Search Warrant by USA as to 205 Blue Ridge Lane, Morgantown, WV 26508. (Attachments: # 1 Affidavit Agent, # 2 Attachment A - Property to be Searched, # 3 Attachment B - Description of Property to be Seized)(mh) (Copy Agent, USA) (Entered: 03/14/2021)
205 Blue Ridge Lane , Morgantown , WV 26508 be Seized )( mh

MOTION /Application for Search Warrant by USA as to 205 Blue Ridge Lane, Morgantown, WV 26508. (Attachments: # 1 Affidavit Agent, # 2 Attachment A - Property to be Searched, # 3 Attachment B - Description of Property to be Seized)(mh) (Copy Agent, USA) (Entered: 03/14/2021)
205 Blue Ridge Lane , Morgantown , WV 26508 be Seized )( mh

MOTION /Application for Search Warrant by USA as to 205 Blue Ridge Lane, Morgantown, WV 26508. (Attachments: # 1 Affidavit Agent, # 2 Attachment A - Property to be Searched, # 3 Attachment B - Description of Property to be Seized)(mh) (Copy Agent, USA) (Entered: 03/14/2021)
205 Blue Ridge Lane , Morgantown , WV 26508 be Seized

MEMORANDUM OPINION & ORDER as to David Clyde McKinney. McKinney did not carry his burden in proving the challenged statements are false, agents acted deliberately or recklessly, the search warrant otherwise lacked a basis for probable cause, and officers acted in bad faith. It is therefore ORDERED that Defendant's Motion for a Franks Hearing/Suppression (Dkt. 740 ) is hereby DENIED. Signed by District Judge Amos L. Mazzant, III on 2/26/2021. (baf, )
David Clyde McKinney . McKinney did not carry his burden in proving the challenged statements are false , agents acted deliberately or recklessly , the search warrant otherwise lacked a basis for probable cause , and officers acted in bad faith

Application and Affidavit for Search and Seizure Warrant entered as to Red Samsung Cell Phone Seized as FP&f No. 2021565300016105 Item 001 ("Target Device #2"). (lrc) (Entered: 02/26/2021)
Red Samsung Cell Phone Seized as FP & f No. 2021565300016105 Item 001 (" Target Device # 2 ")

APPLICATION AND 

ORDER denying 11 Motion to Remand to State Court. Plaintiff, taking issue with the lack of detail in the defendants declaration, insists that it would have been easy for the defendant to determine whether the claims at issue involved ERISA. T he plaintiff seeks discovery regarding the contours of the defendant's search. However, the defendant is not obligated to investigate whether a case is removable. Cutrone, 749 F.3d at 143. Furthermore, the Second Circuit has cautioned cour ts against "expending copious time determining what a defendant should have known or have been able to ascertain at the time of the initial pleading or other relevant filing." Cutrone at 145. It is true that Defendant determined ERISA a pplied soon after launching a targeted search, a task not particularly labyrinthine or protracted. But, due to the lack of a clearly stated, univocal connection between the information in the EOB forms and ERISA, the defendants were required t o conduct a separate, targeted searc

## NEW GOAL: get EVERYTHING then assign it a search warrant probability

In [41]:
# how do I get case titles that don't have documents
# e.g. https://www.courtlistener.com/docket/59790602/application-by-the-united-states-for-a-search-warrant-for-four-electronic/
#  and https://www.courtlistener.com/docket/59796410/application-by-the-united-states-for-a-search-warrant-for-one-location-and/
# this gets them ... https://www.courtlistener.com/?type=r&q=search%20warrant&type=r&order_by=dateFiled%20desc

import requests
from urllib.parse import urlencode
from dotenv import load_dotenv
from os import environ, makedirs

load_dotenv()

API_KEY = environ.get("API_KEY")


def search_recap_with_url(url):
    return requests.get(
        url,
        headers={
            "content-type": "application/json",
            "Authorization": f"Token {API_KEY}",
        },
    ).json()


def search_recap(q=None, description=None, available_only=None, suit_nature=None):
    urlparams = {
        "type": "d",  # Document-oriented results from the RECAP Archive
        "order_by": "entry_date_filed desc",
    }
    if available_only:
        urlparams["available_only"] = "on"
    if suit_nature:
        urlparams["suitNature"] = suit_nature
    if description:
        urlparams["description"] = description
    if q:
        urlparams["q"] = q  # wwg1wga
    print(urlparams)
    return search_recap_with_url(
        "https://www.courtlistener.com/api/rest/v3/search/?{}".format(
            urlencode(urlparams)
        )
    )


def find_search_warrant_documents(n=100):
    #     for each case and document, make a record (in memory or in a DB), so we don\'t duplicate
    #     download the documents locally
    #    ?q=&type=r&order_by=entry_date_filed%20desc&available_only=on&description=search%20warrant
    next_url = None
    records = []
    while len(records) <= n:
        if len(records) == 0:
            search_result = search_recap(
                q="search warrant",
                available_only=None
            )
            records += search_result["results"]
            next_url = search_result["next"]
        elif next_url:
            search_result = search_recap_with_url(next_url)
            records += search_result["results"]
            next_url = search_result["next"]
        else:  # next_url is not None (and it's not the first go)
            break
    assert len(set([doc["is_available"] for doc in records])) == 2 or n < 20
    return records

docs = find_search_warrant_documents(100)


{'type': 'd', 'order_by': 'entry_date_filed desc', 'q': 'search warrant'}


{False, True}

In [43]:
docs[-1]

{'absolute_url': '',
 'assignedTo': None,
 'assigned_to_id': None,
 'attachment_number': None,
 'attorney': None,
 'attorney_id': None,
 'caseName': 'Application by the United States for a Search Warrant for One Electronic Device for Investigation of 21 U.S.C. 841 and Other Offenses',
 'cause': '',
 'court': 'District Court, N.D. California',
 'court_citation_string': 'N.D. Cal.',
 'court_exact': 'cand',
 'court_id': 'cand',
 'dateArgued': None,
 'dateFiled': None,
 'dateTerminated': None,
 'description': '',
 'docketNumber': '5:21-mj-70400',
 'docket_absolute_url': '/docket/59708236/application-by-the-united-states-for-a-search-warrant-for-one-electronic/',
 'docket_entry_id': 157991548,
 'docket_id': 59708236,
 'document_number': None,
 'document_type': 'PACER Document',
 'entry_date_filed': '2021-03-04T23:53:00-08:00',
 'entry_number': None,
 'filepath_local': None,
 'firm': None,
 'firm_id': None,
 'id': 163089616,
 'is_available': False,
 'jurisdictionType': '',
 'juryDemand': '',