### OCI Data Science - Useful Tips
<details>
<summary><font size="2">Check for Public Internet Access</font></summary>

```python
import requests
# A timeout keeps the cell from hanging indefinitely when there is no route out.
response = requests.get("https://oracle.com", timeout=10)
assert response.status_code == 200, "Internet connection failed"
```
</details>
<details>
<summary><font size="2">Helpful Documentation</font></summary>
<ul><li><a href="https://docs.cloud.oracle.com/en-us/iaas/data-science/using/data-science.htm">Data Science Service Documentation</a></li>
<li><a href="https://docs.cloud.oracle.com/iaas/tools/ads-sdk/latest/index.html">ADS documentation</a></li>
</ul>
</details>
<details>
<summary><font size="2">Typical Cell Imports and Settings for ADS</font></summary>

```python
%load_ext autoreload
%autoreload 2
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

import logging
logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.ERROR)

import ads
from ads.dataset.factory import DatasetFactory
from ads.automl.provider import OracleAutoMLProvider
from ads.automl.driver import AutoML
from ads.evaluations.evaluator import ADSEvaluator
from ads.common.data import ADSData
from ads.explanations.explainer import ADSExplainer
from ads.explanations.mlx_global_explainer import MLXGlobalExplainer
from ads.explanations.mlx_local_explainer import MLXLocalExplainer
from ads.catalog.model import ModelCatalog
from ads.common.model_artifact import ModelArtifact
```
</details>
<details>
<summary><font size="2">Useful Environment Variables</font></summary>

```python
import os
print(os.environ["NB_SESSION_COMPARTMENT_OCID"])
print(os.environ["PROJECT_OCID"])
print(os.environ["USER_OCID"])
print(os.environ["TENANCY_OCID"])
print(os.environ["NB_REGION"])
```
</details>
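
Indexing `os.environ` directly raises `KeyError` when a variable is missing, for example when the notebook runs outside an OCI Data Science session. A small sketch of a more forgiving lookup, assuming the same five variable names listed above; `SESSION_VARS` and `read_session_env` are hypothetical names introduced here for illustration:

```python
import os

# Variable names copied from the tip above; they are expected inside a notebook session.
SESSION_VARS = [
    "NB_SESSION_COMPARTMENT_OCID",
    "PROJECT_OCID",
    "USER_OCID",
    "TENANCY_OCID",
    "NB_REGION",
]

def read_session_env(names=SESSION_VARS, environ=os.environ):
    """Return {name: value or None} without raising KeyError for missing variables."""
    return {name: environ.get(name) for name in names}

for name, value in read_session_env().items():
    print(f"{name} = {value if value is not None else '<not set>'}")
```

Passing a plain dictionary as `environ` makes the helper easy to exercise without touching the real environment.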

In [None]:
%pip install oci psycopg2-binary sentence-transformers pandas pyarrow tqdm weasyprint


In [None]:
# --- 1. Imports ---
import os
import oci
import pandas as pd
from tqdm import tqdm
import psycopg2
from sentence_transformers import SentenceTransformer

# --- 2. Configuration ---

# OCI Object Storage
BUCKET_NAME = "aus-legal-corpus"
OBJECT_PREFIX = ""  # Assuming root directory of the bucket
DOWNLOAD_DIR = "./data"
os.makedirs(DOWNLOAD_DIR, exist_ok=True)

# PostgreSQL
DB_CONFIG = {
    "dbname": "postgres",
    "user": "postgres",
    "password": "",
    "host": "10.150.2.103",
    "port": "5432"
}

# OCI Config
oci_config = {
    "user": "ocid1.user.oc1..aaq",
    "key_file": "./data/oci_api_key.pem",
    "fingerprint": "de:d6",
    "tenancy": "ocid1.tenancy.oc1..aua",
    "region": "us-sanjose-1"
}

# --- 3. Connect to OCI and Download Parquet Files ---
object_storage = oci.object_storage.ObjectStorageClient(oci_config)
namespace = object_storage.get_namespace().data

print("🔍 Listing objects in bucket...")
objects = object_storage.list_objects(namespace, BUCKET_NAME, prefix=OBJECT_PREFIX).data.objects
parquet_files = [obj.name for obj in objects if obj.name.endswith(".parquet")]

for obj_name in parquet_files:
    local_file = os.path.join(DOWNLOAD_DIR, os.path.basename(obj_name))
    if not os.path.exists(local_file):
        print(f"⬇️ Downloading {obj_name} ...")
        with open(local_file, 'wb') as f:
            response = object_storage.get_object(namespace, BUCKET_NAME, obj_name)
            for chunk in response.data.raw.stream(1024 * 1024, decode_content=False):
                f.write(chunk)
print("✅ All Parquet files downloaded.")

# --- 4. Load Embedding Model ---
model = SentenceTransformer("BAAI/bge-small-en-v1.5")
print("✅ Embedding model loaded (384 dimensions).")

# --- 5. Connect to PostgreSQL ---
conn = psycopg2.connect(**DB_CONFIG)
conn.autocommit = True
cursor = conn.cursor()

# --- 6. Create Table with Vector Extension ---
cursor.execute("""
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS legal_docs (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(384)
);
""")
print("✅ Table ready.")

# --- 7. Insert Helper with Error Handling ---
def insert_batch(batch):
    """Insert (content, embedding) tuples; return how many rows were written."""
    if not batch:
        print("⚠️  Empty batch, skipping insert.")
        return 0
    try:
        # mogrify escapes each tuple; psycopg2 adapts the Python list to a SQL ARRAY literal.
        args_str = ",".join(cursor.mogrify("(%s, %s)", x).decode("utf-8") for x in batch)
        cursor.execute("INSERT INTO legal_docs (content, embedding) VALUES " + args_str)
        print(f"✅ Inserted {len(batch)} records.")
        return len(batch)
    except Exception as e:
        print(f"❌ Error during batch insert: {e}")
        return 0

# --- 8. Process and Vectorize Each Parquet File ---
BATCH_SIZE = 500
local_files = sorted(f for f in os.listdir(DOWNLOAD_DIR) if f.endswith(".parquet"))
total_inserted = 0

for file in local_files:
    file_path = os.path.join(DOWNLOAD_DIR, file)
    print(f"\n🔄 Processing {file} ...")

    try:
        df = pd.read_parquet(file_path)
        print(f"📄 Columns found: {list(df.columns)}")
        print(f"🔢 Rows in file: {len(df)}")

        if "text" not in df.columns:
            print(f"⚠️  'text' column not found in {file}, skipping.")
            continue

        df = df[df["text"].notna() & (df["text"].str.strip() != "")]
        print(f"🧹 Rows with non-empty 'text': {len(df)}")

        batch = []
        for _, row in tqdm(df.iterrows(), total=len(df), desc="Embedding & Inserting"):
            try:
                text = row["text"]
                embedding = model.encode(text).tolist()
                if len(embedding) != 384:
                    print("❌ Skipping due to unexpected vector length.")
                    continue
                batch.append((text, embedding))
                if len(batch) >= BATCH_SIZE:
                    # Count only the rows that were actually written.
                    total_inserted += insert_batch(batch)
                    batch = []
            except Exception as e:
                print(f"⚠️  Skipping row due to error: {e}")

        if batch:
            total_inserted += insert_batch(batch)

    except Exception as e:
        print(f"❌ Failed to process {file}: {e}")

print(f"\n✅ Total records inserted: {total_inserted}")

# --- 9. Create Vector Index for Cosine Similarity ---
print("⚙️ Creating vector index for cosine similarity...")
cursor.execute("""
CREATE INDEX IF NOT EXISTS legal_docs_cosine_idx
ON legal_docs USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
""")
cursor.execute("ANALYZE legal_docs;")
print("✅ Vector index created.")

# --- 10. Final Table Check ---
cursor.execute("SELECT COUNT(*) FROM legal_docs;")
print("📊 Total rows in legal_docs:", cursor.fetchone()[0])
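
Once the index exists, the table can be queried by cosine distance. A minimal sketch follows, assuming the `cursor` and `model` objects from the cell above; `to_vector_literal`, `SEARCH_SQL`, and `search_legal_docs` are hypothetical names introduced here for illustration and are not part of the original notebook. It relies on pgvector's `<=>` operator, which returns cosine distance, so `1 - distance` gives cosine similarity.

```python
def to_vector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(str(float(x)) for x in embedding) + "]"

# The literal is passed as a bound parameter and cast to vector on the server side.
SEARCH_SQL = """
SELECT id, content, 1 - (embedding <=> %s::vector) AS cosine_similarity
FROM legal_docs
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""

def search_legal_docs(cursor, model, query, top_k=5):
    """Hypothetical helper: embed `query` and return the top_k nearest rows."""
    literal = to_vector_literal(model.encode(query).tolist())
    cursor.execute(SEARCH_SQL, (literal, literal, top_k))
    return cursor.fetchall()
```

A call would look like `rows = search_legal_docs(cursor, model, "unfair dismissal", top_k=5)`. With `lists = 100`, the pgvector README suggests starting with `probes` near `sqrt(lists)`, for example `SET ivfflat.probes = 10;`, and tuning recall against latency from there.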


🔍 Listing objects in bucket...
‚úÖ All Parquet files downloaded.
‚úÖ Embedding model loaded (384 dimensions).
‚úÖ Table ready.

üîÑ Processing 0000.parquet ...
üìÑ Columns found: ['version_id', 'type', 'jurisdiction', 'source', 'mime', 'date', 'citation', 'url', 'when_scraped', 'text']
üî¢ Rows in file: 11024
üßπ Rows with non-empty 'text': 11024


Embedding & Inserting:   5%|‚ñç         | 504/11024 [00:16<19:36,  8.94it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:   9%|‚ñâ         | 1008/11024 [00:33<13:38, 12.23it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  14%|‚ñà‚ñé        | 1510/11024 [00:50<10:30, 15.08it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  18%|‚ñà‚ñä        | 2005/11024 [01:05<12:24, 12.11it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  23%|‚ñà‚ñà‚ñé       | 2502/11024 [01:25<20:23,  6.97it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  27%|‚ñà‚ñà‚ñã       | 3000/11024 [01:40<12:10, 10.98it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  32%|‚ñà‚ñà‚ñà‚ñè      | 3503/11024 [01:58<12:00, 10.43it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  36%|‚ñà‚ñà‚ñà‚ñã      | 4000/11024 [02:14<11:22, 10.28it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  41%|‚ñà‚ñà‚ñà‚ñà      | 4501/11024 [02:32<09:07, 11.91it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  45%|‚ñà‚ñà‚ñà‚ñà‚ñå     | 5005/11024 [02:50<07:52, 12.73it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  50%|‚ñà‚ñà‚ñà‚ñà‚ñâ     | 5505/11024 [03:07<06:43, 13.68it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  54%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç    | 5997/11024 [03:23<04:05, 20.51it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  59%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ    | 6509/11024 [03:42<05:09, 14.57it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  64%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé   | 7005/11024 [04:00<05:16, 12.69it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  68%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä   | 7507/11024 [04:16<04:02, 14.50it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  73%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé  | 8001/11024 [04:34<04:44, 10.64it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  77%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã  | 8506/11024 [04:52<03:18, 12.68it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  82%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè | 9010/11024 [05:09<02:05, 16.05it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  86%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå | 9500/11024 [05:25<02:10, 11.69it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  91%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà | 10005/11024 [05:41<01:28, 11.51it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  95%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå| 10504/11024 [05:59<00:37, 14.05it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ| 11004/11024 [06:17<00:02,  9.77it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 11024/11024 [06:18<00:00, 29.13it/s]


‚úÖ Inserted 24 records.

üîÑ Processing 0001.parquet ...
üìÑ Columns found: ['version_id', 'type', 'jurisdiction', 'source', 'mime', 'date', 'citation', 'url', 'when_scraped', 'text']
üî¢ Rows in file: 11528
üßπ Rows with non-empty 'text': 11528


Embedding & Inserting:   4%|‚ñç         | 505/11528 [00:17<14:22, 12.77it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:   9%|‚ñä         | 1004/11528 [00:35<15:19, 11.44it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  13%|‚ñà‚ñé        | 1504/11528 [00:53<14:24, 11.60it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  17%|‚ñà‚ñã        | 2005/11528 [01:10<12:04, 13.14it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  22%|‚ñà‚ñà‚ñè       | 2506/11528 [01:27<10:43, 14.01it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  26%|‚ñà‚ñà‚ñå       | 3000/11528 [01:44<13:27, 10.56it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  30%|‚ñà‚ñà‚ñà       | 3506/11528 [02:01<12:52, 10.38it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  35%|‚ñà‚ñà‚ñà‚ñç      | 4000/11528 [02:20<13:48,  9.08it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  39%|‚ñà‚ñà‚ñà‚ñâ      | 4502/11528 [02:37<11:40, 10.03it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  43%|‚ñà‚ñà‚ñà‚ñà‚ñé     | 5004/11528 [02:55<10:29, 10.37it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  48%|‚ñà‚ñà‚ñà‚ñà‚ñä     | 5507/11528 [03:12<07:17, 13.77it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  52%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè    | 6007/11528 [03:30<07:09, 12.86it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  56%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã    | 6503/11528 [03:48<07:01, 11.92it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  61%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà    | 7003/11528 [04:04<06:55, 10.90it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  65%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå   | 7502/11528 [04:20<07:21,  9.12it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  69%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ   | 8005/11528 [04:36<04:43, 12.43it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  74%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç  | 8503/11528 [04:52<05:14,  9.60it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  78%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä  | 9003/11528 [05:08<03:56, 10.66it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  82%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè | 9505/11528 [05:24<02:25, 13.92it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  87%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã | 10003/11528 [05:40<02:18, 11.04it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  91%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà | 10504/11528 [05:56<01:16, 13.35it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  95%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå| 11003/11528 [06:11<00:43, 11.97it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ| 11505/11528 [06:27<00:01, 12.51it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 11528/11528 [06:28<00:00, 29.64it/s]


‚úÖ Inserted 28 records.

üîÑ Processing 0002.parquet ...
üìÑ Columns found: ['version_id', 'type', 'jurisdiction', 'source', 'mime', 'date', 'citation', 'url', 'when_scraped', 'text']
üî¢ Rows in file: 12078
üßπ Rows with non-empty 'text': 12078


Embedding & Inserting:   4%|‚ñç         | 503/12078 [00:15<15:28, 12.47it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:   8%|‚ñä         | 1004/12078 [00:31<16:42, 11.05it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  12%|‚ñà‚ñè        | 1503/12078 [00:47<15:17, 11.52it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  17%|‚ñà‚ñã        | 2003/12078 [01:02<13:57, 12.03it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  21%|‚ñà‚ñà        | 2507/12078 [01:18<12:29, 12.77it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  25%|‚ñà‚ñà‚ñç       | 3003/12078 [01:34<15:01, 10.07it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  29%|‚ñà‚ñà‚ñâ       | 3502/12078 [01:49<12:47, 11.17it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  33%|‚ñà‚ñà‚ñà‚ñé      | 4002/12078 [02:05<12:44, 10.57it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  37%|‚ñà‚ñà‚ñà‚ñã      | 4501/12078 [02:21<13:44,  9.19it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  41%|‚ñà‚ñà‚ñà‚ñà‚ñè     | 5005/12078 [02:38<08:37, 13.66it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  46%|‚ñà‚ñà‚ñà‚ñà‚ñå     | 5506/12078 [02:55<09:14, 11.85it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  50%|‚ñà‚ñà‚ñà‚ñà‚ñâ     | 5999/12078 [03:10<03:11, 31.67it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  54%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç    | 6501/12078 [03:27<11:48,  7.87it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  58%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä    | 7005/12078 [03:44<06:16, 13.49it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  62%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè   | 7506/12078 [04:00<05:23, 14.14it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  66%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã   | 8003/12078 [04:18<07:59,  8.50it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  70%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà   | 8508/12078 [04:34<03:51, 15.45it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  75%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç  | 9004/12078 [04:50<04:24, 11.62it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  79%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä  | 9504/12078 [05:06<03:47, 11.33it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  83%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé | 10003/12078 [05:23<03:00, 11.52it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  87%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã | 10502/12078 [05:40<02:57,  8.88it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  91%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà | 11003/12078 [05:58<01:40, 10.74it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  95%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå| 11503/12078 [06:15<00:55, 10.32it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  99%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ| 12007/12078 [06:32<00:06, 11.33it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 12078/12078 [06:34<00:00, 30.58it/s]


‚úÖ Inserted 78 records.

üîÑ Processing 0003.parquet ...
üìÑ Columns found: ['version_id', 'type', 'jurisdiction', 'source', 'mime', 'date', 'citation', 'url', 'when_scraped', 'text']
üî¢ Rows in file: 11957
üßπ Rows with non-empty 'text': 11957


Embedding & Inserting:   4%|‚ñç         | 503/11957 [00:17<17:10, 11.11it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:   8%|‚ñä         | 1004/11957 [00:35<14:39, 12.45it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  13%|‚ñà‚ñé        | 1504/11957 [00:51<17:07, 10.18it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  17%|‚ñà‚ñã        | 2001/11957 [01:08<16:50,  9.86it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  21%|‚ñà‚ñà        | 2503/11957 [01:25<15:42, 10.03it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  25%|‚ñà‚ñà‚ñå       | 3005/11957 [01:43<12:08, 12.29it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  29%|‚ñà‚ñà‚ñâ       | 3508/11957 [01:59<09:52, 14.25it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  33%|‚ñà‚ñà‚ñà‚ñé      | 4003/11957 [02:15<10:42, 12.37it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  38%|‚ñà‚ñà‚ñà‚ñä      | 4501/11957 [02:31<15:45,  7.88it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  42%|‚ñà‚ñà‚ñà‚ñà‚ñè     | 5003/11957 [02:48<11:39,  9.95it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  46%|‚ñà‚ñà‚ñà‚ñà‚ñå     | 5503/11957 [03:04<08:26, 12.75it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  50%|‚ñà‚ñà‚ñà‚ñà‚ñà     | 6003/11957 [03:21<08:39, 11.47it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  54%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç    | 6504/11957 [03:36<07:27, 12.19it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  59%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä    | 7003/11957 [03:53<09:19,  8.85it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  63%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé   | 7503/11957 [04:10<07:33,  9.82it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  67%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã   | 8008/11957 [04:26<04:50, 13.58it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  71%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà   | 8503/11957 [04:45<05:16, 10.91it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  75%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå  | 9006/11957 [05:01<03:48, 12.92it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  80%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ  | 9507/11957 [05:18<02:56, 13.90it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  84%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé | 10006/11957 [05:33<02:22, 13.71it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  88%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä | 10505/11957 [05:48<01:53, 12.81it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  92%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè| 11005/11957 [06:04<01:06, 14.35it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  96%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå| 11500/11957 [06:20<00:41, 10.90it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 11957/11957 [06:34<00:00, 30.30it/s]


‚úÖ Inserted 457 records.

üîÑ Processing 0004.parquet ...
üìÑ Columns found: ['version_id', 'type', 'jurisdiction', 'source', 'mime', 'date', 'citation', 'url', 'when_scraped', 'text']
üî¢ Rows in file: 11993
üßπ Rows with non-empty 'text': 11993


Embedding & Inserting:   4%|‚ñç         | 506/11993 [00:16<14:26, 13.26it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:   8%|‚ñä         | 1006/11993 [00:31<12:47, 14.31it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  13%|‚ñà‚ñé        | 1504/11993 [00:48<14:19, 12.21it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  17%|‚ñà‚ñã        | 2004/11993 [01:05<15:20, 10.85it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  21%|‚ñà‚ñà        | 2506/11993 [01:22<12:27, 12.69it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  25%|‚ñà‚ñà‚ñå       | 3004/11993 [01:38<14:04, 10.65it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  29%|‚ñà‚ñà‚ñâ       | 3504/11993 [01:54<11:40, 12.11it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  33%|‚ñà‚ñà‚ñà‚ñé      | 4002/11993 [02:11<14:05,  9.45it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  38%|‚ñà‚ñà‚ñà‚ñä      | 4509/11993 [02:27<07:53, 15.80it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  42%|‚ñà‚ñà‚ñà‚ñà‚ñè     | 5003/11993 [02:44<11:38, 10.00it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  46%|‚ñà‚ñà‚ñà‚ñà‚ñå     | 5506/11993 [03:00<07:22, 14.66it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  50%|‚ñà‚ñà‚ñà‚ñà‚ñà     | 6003/11993 [03:17<11:34,  8.63it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  54%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç    | 6507/11993 [03:34<06:54, 13.24it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  58%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä    | 7005/11993 [03:50<05:53, 14.11it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  63%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé   | 7505/11993 [04:07<05:08, 14.57it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  67%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã   | 8001/11993 [04:24<06:57,  9.56it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  71%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà   | 8503/11993 [04:41<05:34, 10.43it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  75%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå  | 9003/11993 [04:59<04:33, 10.95it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  79%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ  | 9505/11993 [05:15<03:02, 13.65it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  83%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé | 10006/11993 [05:30<02:16, 14.53it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  88%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä | 10504/11993 [05:46<01:59, 12.42it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  92%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè| 11006/11993 [06:02<01:18, 12.56it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  96%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå| 11501/11993 [06:19<00:51,  9.62it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 11993/11993 [06:34<00:00, 30.41it/s]


‚úÖ Inserted 493 records.

üîÑ Processing 0005.parquet ...
üìÑ Columns found: ['version_id', 'type', 'jurisdiction', 'source', 'mime', 'date', 'citation', 'url', 'when_scraped', 'text']
üî¢ Rows in file: 11863
üßπ Rows with non-empty 'text': 11863


Embedding & Inserting:   4%|‚ñç         | 505/11863 [00:17<16:17, 11.62it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:   8%|‚ñä         | 1004/11863 [00:34<14:46, 12.25it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  13%|‚ñà‚ñé        | 1506/11863 [00:51<13:36, 12.68it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  17%|‚ñà‚ñã        | 2001/11863 [01:08<16:49,  9.77it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  21%|‚ñà‚ñà        | 2503/11863 [01:25<16:40,  9.35it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  25%|‚ñà‚ñà‚ñå       | 3006/11863 [01:41<09:26, 15.64it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  30%|‚ñà‚ñà‚ñâ       | 3500/11863 [01:57<16:49,  8.28it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  34%|‚ñà‚ñà‚ñà‚ñé      | 4003/11863 [02:14<13:40,  9.57it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  38%|‚ñà‚ñà‚ñà‚ñä      | 4508/11863 [02:30<09:15, 13.23it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  42%|‚ñà‚ñà‚ñà‚ñà‚ñè     | 4999/11863 [02:46<03:15, 35.06it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  46%|‚ñà‚ñà‚ñà‚ñà‚ñã     | 5503/11863 [03:04<08:30, 12.46it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  51%|‚ñà‚ñà‚ñà‚ñà‚ñà     | 6005/11863 [03:22<07:41, 12.70it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  55%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç    | 6506/11863 [03:38<09:04,  9.84it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  59%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ    | 7005/11863 [03:54<05:52, 13.80it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  63%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé   | 7504/11863 [04:13<06:12, 11.72it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  67%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã   | 8003/11863 [04:28<05:58, 10.76it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  72%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè  | 8505/11863 [04:44<05:18, 10.56it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  76%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå  | 9006/11863 [05:00<03:11, 14.93it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  80%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà  | 9505/11863 [05:16<02:58, 13.21it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  84%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç | 10006/11863 [05:31<02:14, 13.80it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  89%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä | 10506/11863 [05:49<01:49, 12.41it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  93%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé| 11005/11863 [06:05<01:14, 11.47it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  97%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã| 11506/11863 [06:21<00:22, 15.78it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 11863/11863 [06:33<00:00, 30.18it/s]


‚úÖ Inserted 363 records.

üîÑ Processing 0006.parquet ...
üìÑ Columns found: ['version_id', 'type', 'jurisdiction', 'source', 'mime', 'date', 'citation', 'url', 'when_scraped', 'text']
üî¢ Rows in file: 11922
üßπ Rows with non-empty 'text': 11922


Embedding & Inserting:   4%|‚ñç         | 510/11922 [00:17<12:44, 14.93it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:   8%|‚ñä         | 1005/11922 [00:33<13:24, 13.57it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  13%|‚ñà‚ñé        | 1504/11922 [00:50<14:01, 12.38it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  17%|‚ñà‚ñã        | 2001/11922 [01:07<15:26, 10.70it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  21%|‚ñà‚ñà        | 2506/11922 [01:24<13:29, 11.64it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  25%|‚ñà‚ñà‚ñå       | 3008/11922 [01:40<11:12, 13.26it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  29%|‚ñà‚ñà‚ñâ       | 3506/11922 [01:57<11:06, 12.62it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  34%|‚ñà‚ñà‚ñà‚ñé      | 4005/11922 [02:12<07:51, 16.78it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  38%|‚ñà‚ñà‚ñà‚ñä      | 4501/11922 [02:28<12:38,  9.79it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  42%|‚ñà‚ñà‚ñà‚ñà‚ñè     | 5004/11922 [02:43<09:10, 12.57it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  46%|‚ñà‚ñà‚ñà‚ñà‚ñå     | 5504/11922 [02:59<09:11, 11.65it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  50%|‚ñà‚ñà‚ñà‚ñà‚ñà     | 6007/11922 [03:16<07:53, 12.49it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  55%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç    | 6507/11922 [03:33<07:11, 12.54it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  59%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä    | 7000/11922 [03:51<09:38,  8.51it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  63%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé   | 7505/11922 [04:06<04:44, 15.54it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  67%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã   | 8007/11922 [04:23<04:55, 13.25it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  71%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè  | 8502/11922 [04:40<05:54,  9.66it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  76%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå  | 9006/11922 [04:57<03:50, 12.63it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  80%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ  | 9505/11922 [05:12<02:52, 14.02it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  84%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç | 10006/11922 [05:28<02:34, 12.41it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  88%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä | 10504/11922 [05:44<02:04, 11.35it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  92%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè| 11003/11922 [06:00<01:34,  9.71it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  97%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã| 11506/11922 [06:18<00:33, 12.51it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 11922/11922 [06:33<00:00, 30.33it/s]


‚úÖ Inserted 422 records.

üîÑ Processing 0007.parquet ...
üìÑ Columns found: ['version_id', 'type', 'jurisdiction', 'source', 'mime', 'date', 'citation', 'url', 'when_scraped', 'text']
üî¢ Rows in file: 12006
üßπ Rows with non-empty 'text': 12006


Embedding & Inserting:   4%|‚ñç         | 506/12006 [00:16<12:44, 15.04it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:   8%|‚ñä         | 1007/12006 [00:33<13:59, 13.10it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  13%|‚ñà‚ñé        | 1506/12006 [00:50<12:40, 13.81it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  17%|‚ñà‚ñã        | 2006/12006 [01:08<15:16, 10.91it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  21%|‚ñà‚ñà        | 2503/12006 [01:24<15:25, 10.27it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  25%|‚ñà‚ñà‚ñå       | 3005/12006 [01:40<11:00, 13.63it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  29%|‚ñà‚ñà‚ñâ       | 3503/12006 [01:57<12:53, 10.99it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  33%|‚ñà‚ñà‚ñà‚ñé      | 4004/12006 [02:13<13:46,  9.69it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  38%|‚ñà‚ñà‚ñà‚ñä      | 4505/12006 [02:28<08:00, 15.63it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  42%|‚ñà‚ñà‚ñà‚ñà‚ñè     | 5003/12006 [02:44<09:31, 12.26it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  46%|‚ñà‚ñà‚ñà‚ñà‚ñå     | 5508/12006 [03:00<06:24, 16.89it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  50%|‚ñà‚ñà‚ñà‚ñà‚ñà     | 6004/12006 [03:17<07:16, 13.74it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  54%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç    | 6505/12006 [03:34<06:42, 13.67it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  58%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñä    | 7008/12006 [03:51<07:19, 11.37it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  63%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé   | 7504/12006 [04:07<07:07, 10.53it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  67%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã   | 8006/12006 [04:24<05:59, 11.13it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  71%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà   | 8505/12006 [04:39<04:02, 14.44it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  75%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñç  | 9003/12006 [04:54<03:46, 13.28it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  79%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ  | 9505/12006 [05:10<03:06, 13.40it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  83%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñé | 10009/12006 [05:27<02:23, 13.94it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  87%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñã | 10504/12006 [05:44<02:12, 11.33it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  92%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñè| 11001/12006 [06:01<01:35, 10.51it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting:  96%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñå| 11503/12006 [06:17<00:41, 12.04it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñâ| 12003/12006 [06:34<00:00, 10.22it/s]

‚úÖ Inserted 500 records.


Embedding & Inserting: 100%|‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà‚ñà| 12006/12006 [06:34<00:00, 30.43it/s]


‚úÖ Inserted 6 records.

üîÑ Processing 0008.parquet ...
üìÑ Columns found: ['version_id', 'type', 'jurisdiction', 'source', 'mime', 'date', 'citation', 'url', 'when_scraped', 'text']
üî¢ Rows in file: 12127
üßπ Rows with non-empty 'text': 12127


Embedding & Inserting:   4%|▍         | 506/12127 [00:16<17:06, 11.32it/s]

✅ Inserted 500 records.


Embedding & Inserting:   8%|▊         | 1005/12127 [00:32<14:22, 12.89it/s]

✅ Inserted 500 records.


Embedding & Inserting:  12%|█▏        | 1507/12127 [00:49<13:38, 12.98it/s]

✅ Inserted 500 records.


Embedding & Inserting:  17%|█▋        | 2002/12127 [01:06<22:46,  7.41it/s]

✅ Inserted 500 records.


Embedding & Inserting:  21%|██        | 2508/12127 [01:22<10:39, 15.04it/s]

✅ Inserted 500 records.


Embedding & Inserting:  25%|██▍       | 3008/12127 [01:40<11:20, 13.41it/s]

✅ Inserted 500 records.


Embedding & Inserting:  29%|██▉       | 3504/12127 [01:57<12:50, 11.19it/s]

✅ Inserted 500 records.


Embedding & Inserting:  33%|███▎      | 4003/12127 [02:14<11:06, 12.19it/s]

✅ Inserted 500 records.


Embedding & Inserting:  37%|███▋      | 4504/12127 [02:30<12:42, 10.00it/s]

✅ Inserted 500 records.


Embedding & Inserting:  41%|████▏     | 5004/12127 [02:46<08:56, 13.27it/s]

✅ Inserted 500 records.


Embedding & Inserting:  45%|████▌     | 5504/12127 [03:03<09:37, 11.46it/s]

✅ Inserted 500 records.


Embedding & Inserting:  49%|████▉     | 6002/12127 [03:19<08:25, 12.12it/s]

✅ Inserted 500 records.


Embedding & Inserting:  54%|█████▎    | 6508/12127 [03:35<05:36, 16.70it/s]

✅ Inserted 500 records.


Embedding & Inserting:  58%|█████▊    | 7003/12127 [03:51<08:17, 10.29it/s]

✅ Inserted 500 records.


Embedding & Inserting:  62%|██████▏   | 7507/12127 [04:07<05:41, 13.54it/s]

✅ Inserted 500 records.


Embedding & Inserting:  66%|██████▌   | 8001/12127 [04:23<06:17, 10.93it/s]

✅ Inserted 500 records.


Embedding & Inserting:  70%|███████   | 8503/12127 [04:38<05:11, 11.63it/s]

✅ Inserted 500 records.


Embedding & Inserting:  74%|███████▍  | 9007/12127 [04:53<03:36, 14.43it/s]

✅ Inserted 500 records.


Embedding & Inserting:  78%|███████▊  | 9504/12127 [05:07<02:56, 14.83it/s]

✅ Inserted 500 records.


Embedding & Inserting:  82%|████████▏ | 10004/12127 [05:23<02:22, 14.91it/s]

✅ Inserted 500 records.


Embedding & Inserting:  87%|████████▋ | 10503/12127 [05:38<02:56,  9.19it/s]

✅ Inserted 500 records.


Embedding & Inserting:  91%|█████████ | 11007/12127 [05:54<01:21, 13.80it/s]

✅ Inserted 500 records.


Embedding & Inserting:  95%|█████████▍| 11503/12127 [06:09<00:50, 12.30it/s]

✅ Inserted 500 records.


Embedding & Inserting:  99%|█████████▉| 12002/12127 [06:23<00:09, 12.84it/s]

✅ Inserted 500 records.


Embedding & Inserting: 100%|██████████| 12127/12127 [06:27<00:00, 31.30it/s]


✅ Inserted 127 records.

🔄 Processing 0009.parquet ...
📄 Columns found: ['version_id', 'type', 'jurisdiction', 'source', 'mime', 'date', 'citation', 'url', 'when_scraped', 'text']
🔢 Rows in file: 11848
🧹 Rows with non-empty 'text': 11848


Embedding & Inserting:   4%|▍         | 504/11848 [00:16<14:39, 12.89it/s]

✅ Inserted 500 records.


Embedding & Inserting:   8%|▊         | 1004/11848 [00:31<11:37, 15.55it/s]

✅ Inserted 500 records.


Embedding & Inserting:  13%|█▎        | 1506/11848 [00:47<13:44, 12.55it/s]

✅ Inserted 500 records.


Embedding & Inserting:  17%|█▋        | 2003/11848 [01:03<13:44, 11.94it/s]

✅ Inserted 500 records.


Embedding & Inserting:  21%|██        | 2502/11848 [01:18<13:18, 11.71it/s]

✅ Inserted 500 records.


Embedding & Inserting:  25%|██▌       | 2999/11848 [01:33<03:07, 47.09it/s]

✅ Inserted 500 records.


Embedding & Inserting:  30%|██▉       | 3501/11848 [01:50<13:33, 10.26it/s]

✅ Inserted 500 records.


Embedding & Inserting:  34%|███▍      | 4005/11848 [02:05<11:13, 11.65it/s]

✅ Inserted 500 records.


Embedding & Inserting:  38%|███▊      | 4501/11848 [02:22<10:43, 11.42it/s]

✅ Inserted 500 records.


Embedding & Inserting:  42%|████▏     | 5003/11848 [02:36<08:23, 13.58it/s]

✅ Inserted 500 records.


Embedding & Inserting:  46%|████▋     | 5504/11848 [02:51<09:23, 11.25it/s]

✅ Inserted 500 records.


Embedding & Inserting:  51%|█████     | 6007/11848 [03:05<05:31, 17.61it/s]

✅ Inserted 500 records.


Embedding & Inserting:  55%|█████▍    | 6505/11848 [03:20<06:33, 13.57it/s]

✅ Inserted 500 records.


Embedding & Inserting:  59%|█████▉    | 7004/11848 [03:35<07:03, 11.44it/s]

✅ Inserted 500 records.


Embedding & Inserting:  63%|██████▎   | 7503/11848 [03:51<06:08, 11.78it/s]

✅ Inserted 500 records.


Embedding & Inserting:  68%|██████▊   | 8007/11848 [04:07<05:08, 12.47it/s]

✅ Inserted 500 records.


Embedding & Inserting:  72%|███████▏  | 8505/11848 [04:22<03:15, 17.09it/s]

✅ Inserted 500 records.


Embedding & Inserting:  76%|███████▌  | 9003/11848 [04:38<03:51, 12.29it/s]

✅ Inserted 500 records.


Embedding & Inserting:  80%|████████  | 9506/11848 [04:53<02:41, 14.47it/s]

✅ Inserted 500 records.


Embedding & Inserting:  84%|████████▍ | 10006/11848 [05:09<01:48, 16.90it/s]

✅ Inserted 500 records.


Embedding & Inserting:  89%|████████▊ | 10507/11848 [05:23<01:26, 15.53it/s]

✅ Inserted 500 records.


Embedding & Inserting:  93%|█████████▎| 11002/11848 [05:38<01:16, 11.06it/s]

✅ Inserted 500 records.


Embedding & Inserting:  97%|█████████▋| 11506/11848 [05:53<00:22, 15.27it/s]

✅ Inserted 500 records.


Embedding & Inserting: 100%|██████████| 11848/11848 [06:02<00:00, 32.67it/s]


✅ Inserted 348 records.

✅ Total records inserted: 118346
⚙️ Creating vector index for cosine similarity...
✅ Vector index created.
📊 Total rows in legal_docs: 118346
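
The per-file insert counts in the log above can be sanity-checked with integer division. The following is a minimal sketch, assuming a fixed batch size of 500 (inferred from the "Inserted 500 records" lines, not confirmed against the notebook's code). `batch_sizes` is a hypothetical helper, not part of the original notebook.

```python
def batch_sizes(total_rows: int, batch_size: int = 500) -> list[int]:
    """Return the size of each insert batch for a file with `total_rows` rows.

    Assumption: rows are flushed in fixed-size batches, with one final
    partial batch holding the remainder (if any).
    """
    full_batches, remainder = divmod(total_rows, batch_size)
    return [batch_size] * full_batches + ([remainder] if remainder else [])


# Row counts taken from the log output for the two files shown in full.
for name, rows in [("0008.parquet", 12127), ("0009.parquet", 11848)]:
    sizes = batch_sizes(rows)
    print(f"{name}: {len(sizes)} batches, final batch of {sizes[-1]} rows")
```

Under that assumption, 0008.parquet yields 24 full batches plus a final batch of 127 rows, and 0009.parquet yields 23 full batches plus a final batch of 348 rows, which agrees with the "Inserted 127 records" and "Inserted 348 records" lines in the log.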
