In [None]:
%load_ext rich

Open a persistent database connection with write-ahead logging (WAL) enabled.

In [None]:
from sqlpyd import Connection
c = Connection(DatabasePath="app/data/x.db", WAL=True)
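
The `WAL=True` flag presumably switches the SQLite file to write-ahead logging, which lets readers keep reading while a single writer appends changes to a separate `-wal` file. sqlpyd's own code is not reproduced here. The sketch below is a standard-library equivalent that assumes the flag amounts to `PRAGMA journal_mode=WAL`; `open_wal_connection` is a hypothetical helper.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_wal_connection(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite file and switch it to write-ahead logging.

    Assumption: this mirrors what `WAL=True` does in sqlpyd; it is not sqlpyd's code.
    """
    db_path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create app/data/
    conn = sqlite3.connect(db_path)
    # WAL mode is persistent: it stays set for later connections to the same file
    mode = conn.execute("PRAGMA journal_mode=WAL;").fetchone()[0]
    if mode.lower() != "wal":
        raise RuntimeError(f"could not enable WAL, got {mode!r}")
    return conn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conn = open_wal_connection(Path(tmp) / "app" / "data" / "x.db")
        print(conn.execute("PRAGMA journal_mode;").fetchone()[0])  # wal
        conn.close()
```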

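Optionally drop tables from the database opened above (its path is relative to the `CWD`). Only tables whose names start with one of the listed `prefixes` are dropped.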

In [None]:
#prefixes = ["pax_tbl", "lex_tbl", "sc_tbl"]
prefixes = []  # because this deletes tables, default to an empty list
for p in prefixes:
    if not p:
        continue  # an empty prefix would match every table
    sql = """--sql
        SELECT name
        FROM sqlite_schema
        WHERE type = 'table' AND name LIKE ?
        ORDER BY name;
        """
    for x in c.db.execute_returning_dicts(sql, [f"{p}%"]):  # bind the pattern instead of interpolating it
        c.db.execute(f'DROP TABLE IF EXISTS "{x["name"]}";')  # quote the table name
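
In `LIKE`, `_` matches any single character, so the pattern `pax_tbl%` would also match a name such as `paxXtbl_other`. A stricter variant escapes the wildcards in the prefix. The standard-library sketch below uses a hypothetical helper, `drop_tables_with_prefix`, and queries `sqlite_master`, which also works on SQLite versions older than 3.33 where the `sqlite_schema` alias is unavailable.

```python
import sqlite3


def drop_tables_with_prefix(conn: sqlite3.Connection, prefix: str) -> list[str]:
    """Drop every table whose name starts with `prefix`; return the dropped names.

    Hypothetical helper: `_` and `%` in the prefix are escaped so they match literally.
    """
    if not prefix:
        return []  # an empty prefix would match every table
    escaped = prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name LIKE ? ESCAPE '\\' ORDER BY name;",
        (escaped + "%",),
    ).fetchall()
    names = [name for (name,) in rows]
    for name in names:
        quoted = name.replace('"', '""')  # escape embedded quotes in the identifier
        conn.execute(f'DROP TABLE IF EXISTS "{quoted}";')
    conn.commit()
    return names


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    for t in ("pax_tbl_persons", "paxXtbl_other", "sc_tbl_decisions"):
        conn.execute(f"CREATE TABLE {t} (id INTEGER PRIMARY KEY)")
    print(drop_tables_with_prefix(conn, "pax_tbl"))  # ['pax_tbl_persons']
```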

## Corpus-Pax

Use the `corpus-pax` library to create the initial tables.

In [None]:
from corpus_pax import init_persons
init_persons(c)

## Corpus-Base

Once the `corpus-pax` tables exist, create the `corpus-base` tables, which include the decision and justice tables.

In [None]:
from corpus_base import build_sc_tables, init_sc_cases
build_sc_tables(c)
init_sc_cases(c)

Because it takes a significant amount of time, pre-process the Opinions (a decision-related table from `corpus-base`) once and store the results as inclusion files in the local repository.

This step only needs to be repeated when the citation or statute detection algorithms change:

1. `citation_utils.extract_citations()`; and
2. `statute_patterns.count_rules()`.

In [None]:
from corpus_x.inclusions import create_inclusion_files_from_db_opinions
create_inclusion_files_from_db_opinions(c)
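
Since the inclusion files only go stale when the detection logic changes, one option is to record which detector versions produced them and regenerate only when those versions differ. A minimal sketch: the stamp file, its location, and the helper names are hypothetical, the versions shown are placeholders, and in practice they could be read with `importlib.metadata.version()` for the two detector packages named above.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint(versions: dict[str, str]) -> str:
    """Stable hash of the detector versions (sorted, so key order does not matter)."""
    payload = json.dumps(versions, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def needs_regeneration(stamp: Path, versions: dict[str, str]) -> bool:
    """True if the inclusion files were built with different (or unknown) versions."""
    if not stamp.exists():
        return True
    return stamp.read_text().strip() != fingerprint(versions)


def mark_generated(stamp: Path, versions: dict[str, str]) -> None:
    """Record the versions used, after the inclusion files are (re)built."""
    stamp.write_text(fingerprint(versions))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        stamp = Path(tmp) / "inclusions.stamp"  # hypothetical location
        current = {"citation-utils": "1.0", "statute-patterns": "2.0"}  # placeholder versions
        print(needs_regeneration(stamp, current))  # True: never built
        mark_generated(stamp, current)
        print(needs_regeneration(stamp, current))  # False: unchanged
        print(needs_regeneration(stamp, {**current, "statute-patterns": "2.1"}))  # True
```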

## Corpus-X

### Preparatory steps from files to db

The pre-processed data can now be used to insert the Statutes and Citations found in each Opinion back into the database.

The statute and inclusion tables need to be created before the pre-processed data can be inserted.

In [None]:
from corpus_x.inclusions import Inclusion
from corpus_x import Statute
Statute.make_tables(c) # inclusion files will reference a statute_id
Inclusion.make_tables(c) # note that statutes need to exist first
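
The cells so far follow a dependency order: `corpus-pax` before `corpus-base`, and statutes before inclusions. If more table groups are added, the standard library's `graphlib.TopologicalSorter` (Python 3.9+) can derive a safe creation order. The dependency map below is a simplified, hypothetical reading of this notebook's prose and comments, not a structure exported by any of these libraries.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical map: table group -> groups whose tables must exist first.
# Read off this notebook's prose and comments; not provided by the libraries.
DEPENDS_ON: dict[str, set[str]] = {
    "pax": set(),
    "base": {"pax"},                       # decisions/justices after corpus-pax
    "statutes": set(),
    "inclusions": {"statutes", "base"},    # rows reference a statute_id and opinions
    "codifications": {"statutes", "base"},
}


def creation_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every group comes after all of its dependencies."""
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the cycle
        raise ValueError(f"circular table dependencies: {err.args[1]}") from err


if __name__ == "__main__":
    print(creation_order(DEPENDS_ON))
```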

### Move content from files to db

Collect the pre-processed data and insert it into the newly created tables.

Estimate as of the end of 2022, which depends on:

1. when the raw files were last scraped; and
2. when separate opinions were last manually included.

Result: ~484k `CitationInOpinion` and ~99k `StatuteInOpinion` records.

In [None]:
from corpus_x.inclusions import populate_db_with_inclusions
populate_db_with_inclusions(c) # takes 5 minutes
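
With hundreds of thousands of rows, inserting in batches inside a transaction is much faster than committing row by row. `populate_db_with_inclusions()` is not reproduced here; the sketch below only illustrates that batching pattern with the standard library, using a hypothetical `citation_in_opinion` table with `(opinion_id, citation)` columns and synthetic rows.

```python
import sqlite3
from itertools import islice
from typing import Iterable, Iterator


def batched(rows: Iterable[tuple], size: int) -> Iterator[list[tuple]]:
    """Yield lists of at most `size` rows (itertools.batched needs Python 3.12+)."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk


def bulk_insert(conn: sqlite3.Connection, rows: Iterable[tuple], size: int = 10_000) -> int:
    """Insert (opinion_id, citation) pairs batch by batch; return the row count."""
    total = 0
    for chunk in batched(rows, size):
        with conn:  # one transaction per batch: commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO citation_in_opinion (opinion_id, citation) VALUES (?, ?)",
                chunk,
            )
        total += len(chunk)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE citation_in_opinion (opinion_id TEXT, citation TEXT)")
    synthetic = ((f"op-{i}", f"cite-{i}") for i in range(25_000))  # fake data
    print(bulk_insert(conn, synthetic))  # 25000
    print(conn.execute("SELECT COUNT(*) FROM citation_in_opinion").fetchone()[0])  # 25000
```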

### Ensure existence of component elements

The database now contains references to statutes, but not the statutes themselves. In other words, the foreign key in each `StatuteInOpinion` record does not yet have a counterpart in the `StatuteRow` table.

By contrast, each `CitationInOpinion` record should already have a counterpart in the `DecisionRow` table, since decisions were processed first.
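
Assuming the foreign key is declared in the schema, SQLite's `PRAGMA foreign_key_check` is one way to see this gap: it lists child rows whose foreign key has no matching parent row, even when foreign-key enforcement is off. The table names below are hypothetical stand-ins for the `StatuteInOpinion` and `StatuteRow` tables; the real schemas come from `corpus-x`.

```python
import sqlite3

# Hypothetical demo schema: one citing row points at a statute that does not exist yet.
DEMO_SCRIPT = """
CREATE TABLE statute_row (id TEXT PRIMARY KEY);
CREATE TABLE statute_in_opinion (
    opinion_id TEXT,
    statute_id TEXT REFERENCES statute_row (id)
);
INSERT INTO statute_row (id) VALUES ('s-1');
INSERT INTO statute_in_opinion VALUES ('op-1', 's-1'), ('op-2', 's-2');
"""


def dangling_foreign_keys(conn: sqlite3.Connection) -> list[tuple[str, int, str]]:
    """(child_table, child_rowid, parent_table) for every foreign key without a parent row."""
    # Each pragma row is (table, rowid, parent, fkid); fkid is dropped here.
    return [
        (table, rowid, parent)
        for table, rowid, parent, _fkid in conn.execute("PRAGMA foreign_key_check")
    ]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(DEMO_SCRIPT)
    print(dangling_foreign_keys(conn))  # [('statute_in_opinion', 2, 'statute_row')]
```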

In [None]:
from corpus_x.inclusions import StatuteInOpinion, CitationInOpinion
StatuteInOpinion.add_statutes(c) # takes 2-3 minutes to store 500 objects
StatuteInOpinion.update_statute_ids(c)
CitationInOpinion.update_decision_ids(c)
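
Judging by their names, `update_statute_ids()` and `update_decision_ids()` fill in foreign keys once the referenced rows exist; their code is not shown here. A common way to do this kind of backfill in SQLite is a correlated `UPDATE` that looks up each row's ID through a natural key. The sketch assumes hypothetical tables in which a statute is identified by a `(cat, num)` pair; it illustrates the pattern, not corpus-x's implementation.

```python
import sqlite3

SCHEMA = """
CREATE TABLE statute_row (id INTEGER PRIMARY KEY, cat TEXT, num TEXT, UNIQUE (cat, num));
CREATE TABLE statute_in_opinion (
    opinion_id TEXT, cat TEXT, num TEXT,
    statute_id INTEGER REFERENCES statute_row (id)  -- NULL until backfilled
);
"""

BACKFILL = """
UPDATE statute_in_opinion
SET statute_id = (
    SELECT s.id FROM statute_row AS s
    WHERE s.cat = statute_in_opinion.cat AND s.num = statute_in_opinion.num
)
WHERE statute_id IS NULL;
"""


def backfill_statute_ids(conn: sqlite3.Connection) -> int:
    """Fill missing statute_id values by (cat, num); return how many remain unmatched."""
    with conn:  # commit the UPDATE as one transaction
        conn.execute(BACKFILL)
    return conn.execute(
        "SELECT COUNT(*) FROM statute_in_opinion WHERE statute_id IS NULL"
    ).fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO statute_in_opinion (opinion_id, cat, num) VALUES (?, ?, ?)",
        [("op-1", "ra", "1"), ("op-2", "ra", "2"), ("op-3", "act", "9")],
    )
    conn.executemany("INSERT INTO statute_row (cat, num) VALUES (?, ?)", [("ra", "1"), ("ra", "2")])
    print(backfill_statute_ids(conn))  # 1 (op-3 has no matching statute yet)
```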

With the `StatuteRow` and `CitationRow` tables in place, the `CodeRow` table can now be added.

### Add Codifications 

In [None]:
from corpus_x.codifications import Codification
Codification.make_tables(c) 
Codification.add_rows(c) # takes about 1-2 minutes


Identify Codifications with missing affector paths, i.e., events whose `item`, `caption`, or `content` values were used improperly, so that the event could not be matched to a unit of the Statute.

In [None]:
from corpus_x.codifications import CodeStatuteEvent
if matches := CodeStatuteEvent.fetch_unmaterialized(c):
    print(f"Violating {len(matches)=}; review violators via SQL.")