1. Introduction

    Background and Motivation
    The Role of AI in Legal Interpretation
    Challenges of Bias in Legal AI Models
    Distinction Between Interpretability and Explainability
    Objectives and Hypotheses of the Study

2. Literature Review

    LegalBERT and Its Applications
    Bias in Natural Language Processing Models
    Understanding Interpretability vs. Explainability
        Definitions and Differences
        Importance in AI Models
    Attention Mechanisms in Deep Learning
    Interpretability Methods
        Integrated Gradients
        Saliency Scores
    Explainability Methods
        Counterfactual Explanations
        Other Explainability Techniques
    Previous Work on Bias Detection and Mitigation

3. Methodology

    Data Collection and Preparation
        Sources of Legislative Data
        Preprocessing Techniques
    Model Fine-Tuning Process
        Details of Fine-Tuning LegalBERT
        Baseline Models for Comparison
    Experimental Design
        Prompt Perturbation Strategies
        Implementation of Interpretability Methods
        Implementation of Explainability Methods
    Evaluation Metrics
        Measuring Sensitivity and Bias
        Statistical Analysis Techniques

4. Results

    Robustness to Prompt Perturbations
        Impact on Model Predictions
        Statistical Significance Testing
    Analysis of Attention Mechanisms
        Findings from Interpretability Methods
            Integrated Gradients Results
            Saliency Score Patterns
        Insights from Explainability Methods
            Counterfactual Explanation Findings
    Identification of Unintended Biases
        Specific Linguistic Features Influencing the Model
        Quantification of Bias Effects

5. Discussion

    Interpretation of Results
        Comparing Findings with Null Hypothesis
        Implications for Legal Interpretations
    Sources of Bias
        Data-Driven Biases
        Model Architecture Considerations
    Effectiveness of Interpretability and Explainability Methods
        Complementary Insights Provided
    Limitations of the Study
        Data Limitations
        Methodological Constraints

6. Bias Mitigation Techniques

    Alignment Strategies
        Approaches to Reduce Bias
        Implementation Details
    Evaluation of Mitigation Efforts
        Post-Mitigation Analysis
        Effectiveness of Different Techniques
    Recommendations for Model Improvement

7. Conclusion

    Summary of Key Findings
    Contributions to the Field
    Implications for Future Research and Practice
    Final Remarks

8. References

    Citations of All Sources Used

9. Appendices

    Supplementary Data
        Additional Tables and Figures
    Detailed Methodological Procedures
    Code and Model Availability

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


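Data Collection:

Legislative data (bill titles and their policy-area subjects) were collected from Congress.gov's policy-area browse pages, starting from the 93rd Congress:

https://www.congress.gov/browse/policyarea/93rd-congress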

In [None]:
import pandas as pd

# Load your CSV file
file_path = './93-118OCT28MasterCSV.csv'  # Replace with your file path
df = pd.read_csv(file_path)

# Check for duplicate rows
num_duplicates = df.duplicated().sum()

print(f"Number of duplicate rows: {num_duplicates}")


Number of duplicate rows: 0
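`df.duplicated()` flags only rows that match in every column, so a count of 0 does not rule out the same bill title appearing more than once, for example under different bill numbers or with different subjects. The sketch below is a minimal illustration on a toy frame. It assumes the same `Title` and `Subject` column names; the `Bill` column and all values are hypothetical. It checks title-level duplicates and titles that carry conflicting subjects, which would be label noise for the classifier.

```python
import pandas as pd

# Toy data standing in for the real CSV; the values and the 'Bill' column are hypothetical.
toy = pd.DataFrame({
    "Title": ["A bill to fund parks", "A bill to fund parks", "A bill on taxes", "A bill on taxes"],
    "Subject": ["Environmental Protection", "Environmental Protection", "Taxation", "Economics and PublicFinance"],
    "Bill": ["H.R.1", "H.R.2", "S.3", "S.4"],
})

# Exact-row duplicates: every column must match (what the cell above counts).
exact_dupes = int(toy.duplicated().sum())

# Title-level duplicates: same title, regardless of the other columns.
title_dupes = int(toy.duplicated(subset=["Title"]).sum())

# Titles mapped to more than one Subject are potential label noise.
subjects_per_title = toy.groupby("Title")["Subject"].nunique()
conflicting_titles = subjects_per_title[subjects_per_title > 1].index.tolist()

print(exact_dupes, title_dupes, conflicting_titles)
```

On the real data, the same two `duplicated(subset=["Title"])` and `groupby(...).nunique()` calls can be run against `df` directly.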


In [None]:
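# Drop rows missing either the bill title or its subject label
df.dropna(subset=['Title', 'Subject'], inplace=True)

In [None]:
# Lowercase all titles
df['Title'] = df['Title'].str.lower()

In [None]:
# Trim leading/trailing whitespace and collapse internal runs of whitespace to a single space
df['Title'] = df['Title'].str.strip().str.replace(r'\s+', ' ', regex=True)

In [None]:
# Remove every character except word characters (letters, digits, underscore), whitespace, and , ; : .
df['Title'] = df['Title'].str.replace(r'[^\w\s,;:.]', '', regex=True)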
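The four cleaning cells above can be read as a single function. The sketch below chains the same steps in the same order on a toy frame, which makes their combined effect easy to check. The function name `clean_titles` and the toy titles are hypothetical. One side effect is worth noting: symbol removal runs after whitespace collapsing, so a title such as `a - b` ends up with a double space. Hyphenated tokens are also merged, so `covid-19` becomes `covid19`.

```python
import pandas as pd

def clean_titles(frame: pd.DataFrame) -> pd.DataFrame:
    """Apply the notebook's title-cleaning steps in order and return a new frame."""
    out = frame.dropna(subset=["Title", "Subject"]).copy()
    out["Title"] = out["Title"].str.lower()
    # Trim the ends and collapse internal runs of whitespace to one space.
    out["Title"] = out["Title"].str.strip().str.replace(r"\s+", " ", regex=True)
    # Drop every character that is not a word character, whitespace, or , ; : .
    out["Title"] = out["Title"].str.replace(r"[^\w\s,;:.]", "", regex=True)
    return out

# Hypothetical titles; the None row is dropped by the first step.
toy = pd.DataFrame({
    "Title": ["  COVID-19   Relief Act (2020)!  ", "Farm Bill; Title I", None],
    "Subject": ["Health", "Agriculture and Food", "Taxation"],
})
cleaned = clean_titles(toy)
print(cleaned["Title"].tolist())
```

Swapping the last two steps, so that whitespace is collapsed after symbols are removed, would avoid the leftover double spaces.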

In [None]:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Encode the 'Subject' column as integer class ids
le = LabelEncoder()
df['category_encoded'] = le.fit_transform(df['Subject'])

# Create a mapping DataFrame from class name to encoded value
mapping_df = pd.DataFrame({
    'Category': le.classes_,
    'Encoded Value': range(len(le.classes_))
})

# Save the mapping to a CSV file
mapping_path = 'subject_mapping.csv'
mapping_df.to_csv(mapping_path, index=False)

print(f"Mapping file created as '{mapping_path}'")
print(mapping_df)


Mapping file created as 'subject_mapping.csv'
                                       Category  Encoded Value
0                          Agriculture and Food              0
1                                       Animals              1
2            Armed Forces and National Security              2
3                   Arts, Culture, and Religion              3
4   Civil Rights and Liberties, Minority Issues              4
5                                      Commerce              5
6                                      Congress              6
7                     Crime and Law Enforcement              7
8                   Economics and PublicFinance              8
9                                     Education              9
10                         Emergency Management             10
11                                       Energy             11
12                     Environmental Protection             12
13                                     Families             13
14      
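The notebook fits a `LabelEncoder` on `Subject` twice: here, and again in the training cell. That is safe because `LabelEncoder` assigns codes in the sorted order of the unique labels, so both fits produce the same mapping. The stdlib-only sketch below reproduces that equivalent mapping. It also shows a JSON round trip and the inverse lookup needed to turn predicted class ids back into subject names. The toy labels are illustrative only.

```python
import json

labels = ["Taxation", "Agriculture and Food", "Health", "Taxation", "Animals"]

# Same ordering LabelEncoder uses: sorted unique class names -> 0..K-1.
classes = sorted(set(labels))
label_mapping = {name: idx for idx, name in enumerate(classes)}
encoded = [label_mapping[name] for name in labels]

# Round-trip through JSON, as the training cell does with label_mapping.json.
restored = json.loads(json.dumps(label_mapping))

# Invert the mapping to decode predicted class ids back to subject names.
id_to_label = {idx: name for name, idx in restored.items()}
decoded = [id_to_label[i] for i in encoded]
print(label_mapping)
print(encoded, decoded == labels)
```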

In [None]:
import pandas as pd
import torch
import wandb
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from torch.utils.data import Dataset
from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
import json

# Initialize Weights & Biases (wandb) for real-time training logs
wandb.init(project="legal-bert-classification", name="training-run-1")

# Use the GPU when available (this run used a Colab T4 GPU, which required purchased Colab credits)
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")
print(f"Using device: {device}")


texts = df['Title'].astype(str).tolist()
labels = df['Subject'].tolist()

# Encode subject labels as integer class ids
label_encoder = LabelEncoder()
encoded_labels = label_encoder.fit_transform(labels)
num_labels = len(label_encoder.classes_)

# Log the label mapping to wandb
label_mapping = dict(zip(label_encoder.classes_, range(len(label_encoder.classes_))))
wandb.log({"label_mapping": label_mapping})

# Save the label mapping as JSON
with open('label_mapping.json', 'w') as f:
    json.dump(label_mapping, f)

print("Label mapping saved to 'label_mapping.json'")

# Save original and encoded labels to CSV
labels_df = pd.DataFrame({
    'Original Label': labels,
    'Encoded Label': encoded_labels
})
labels_df.to_csv('saved_labels.csv', index=False)
print("Original and encoded labels saved to 'saved_labels.csv'")

# 80/20 train/validation split (fixed seed for reproducibility)
train_texts, val_texts, train_labels, val_labels = train_test_split(texts, encoded_labels, test_size=0.2, random_state=42)

# Tokenizer for LegalBERT
model_name = "nlpaueb/legal-bert-base-uncased"
tokenizer = BertTokenizer.from_pretrained(model_name)
def tokenize_function(texts):
    # padding=True pads every title in the split to the longest one (capped at 512 tokens)
    return tokenizer(texts, padding=True, truncation=True, max_length=512)

train_encodings = tokenize_function(train_texts)
val_encodings = tokenize_function(val_texts)

# Custom PyTorch Dataset that pairs tokenized encodings with their labels
class LegislativeDataset(Dataset):

    # Store the tokenized inputs and labels
    def __init__(self, encodings, labels):
        self.encodings = encodings  # Store the tokenized inputs (encodings) as a class attribute
        self.labels = labels  # Store the labels (categories) as a class attribute

    # method to get a single data sample
    def __getitem__(self, idx):
        # create a dictionary of tensors for each encoding key and the associated value at the index
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        # add the label at index idx to the item dictionary
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    # method to return the total number of samples
    def __len__(self):
        return len(self.labels)


# Create training and validation datasets
train_dataset = LegislativeDataset(train_encodings, train_labels)
val_dataset = LegislativeDataset(val_encodings, val_labels)

# Load LegalBERT with a newly initialized sequence-classification head
model = BertForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
model.to(device)

# training arguments with wandb
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy='epoch',  # named eval_strategy in newer transformers releases
    save_strategy='epoch',
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=10,
    load_best_model_at_end=True,
    learning_rate=1e-6,
    gradient_accumulation_steps=2,
    max_grad_norm=0.5,
    # Add wandb reporting
    report_to="wandb",
    # Add run name for wandb
    run_name="legal-bert-training"
)

# trainer object for training and evaluation
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset
)

# train
trainer.train()

# Save the fine-tuned model and tokenizer
model.save_pretrained('./fine_tuned_legalbert')
tokenizer.save_pretrained('./fine_tuned_legalbert')

# evaluate model on the validation set
eval_results = trainer.evaluate()

# log final evaluation results to wandb
wandb.log({"final_evaluation": eval_results})

print(f"Evaluation results: {eval_results}")

# Close the wandb run
wandb.finish()



[W&B run history sparklines (earlier run): eval/loss, eval/runtime, eval/samples_per_second, eval/steps_per_second, train/epoch, train/global_step, train/grad_norm, train/learning_rate, train/loss]

W&B run summary from an earlier run (its values differ from the training below):
eval/loss                 1.09822
eval/runtime              1423.6263
eval/samples_per_second   45.016
eval/steps_per_second     5.627
train/epoch               2.11341
train/global_step         33860.0
train/grad_norm           11.06134
train/learning_rate       0.0
train/loss                1.1037


Using device: cuda
Label mapping saved to 'label_mapping.json'
Original and encoded labels saved to 'saved_labels.csv'


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at nlpaueb/legal-bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
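The warning above is expected. The LegalBERT checkpoint has no classification layer, so `classifier.weight` and `classifier.bias` start random and are learned during fine-tuning. `BertForSequenceClassification` applies dropout and a single linear layer to BERT's pooled `[CLS]` representation, and for integer class labels it trains with cross-entropy.

The NumPy sketch below models that head with toy sizes, which are hypothetical: hidden size 8 instead of 768, and 5 classes. It omits dropout. It shows why a freshly initialized head starts near a loss of `log(num_labels)`.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size, num_labels, batch = 8, 5, 4  # toy sizes, not LegalBERT's real dimensions

pooled = rng.normal(size=(batch, hidden_size))  # stands in for BERT's pooled [CLS] output
labels = np.array([0, 3, 1, 4])                 # hypothetical class ids

def head_loss(weight, bias):
    """Linear classifier on the pooled output + mean softmax cross-entropy (dropout omitted)."""
    logits = pooled @ weight.T + bias                    # shape (batch, num_labels)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(batch), labels].mean()

# Small random init, similar in spirit to a freshly initialized head.
w0 = rng.normal(scale=0.02, size=(num_labels, hidden_size))
b0 = np.zeros(num_labels)
print(head_loss(w0, b0), np.log(num_labels))
```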




Epoch  Training Loss  Validation Loss
0      1.386          1.330662
1      0.9327         1.210325
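The logged step counts can be cross-checked against the batch settings. With `per_device_train_batch_size=8` and `gradient_accumulation_steps=2` on one GPU, each optimizer step sees 16 titles.

The sketch below needs a training-set size, which is not logged. It uses an estimate derived from the evaluation throughput: 1418.071 s × 45.192 samples/s ≈ 64,085 validation titles, so roughly 256,340 training titles under the 80/20 split. The helper `optimizer_steps` is hypothetical. It counts only whole accumulation cycles per epoch, which matches how `Trainer` has typically counted update steps, although exact rounding may vary across releases.

```python
import math

def optimizer_steps(n_train, per_device_bs=8, grad_accum=2, epochs=2, n_devices=1):
    """Estimate optimizer (global) steps: whole accumulation cycles per epoch,
    with leftover micro-batches at the end of an epoch not counted."""
    micro_batches = math.ceil(n_train / (per_device_bs * n_devices))
    steps_per_epoch = micro_batches // grad_accum
    return steps_per_epoch * epochs

effective_batch = 8 * 2 * 1                # per-device batch x accumulation x devices
est_val = round(1418.071 * 45.192)         # eval_runtime x eval_samples_per_second from the log
est_train = 4 * est_val                    # rough 80/20 split estimate
print(effective_batch, est_val, est_train, optimizer_steps(est_train))
```

The estimate reproduces the logged `train/global_step` of 32042, which suggests the run saw each training title once per epoch, as configured.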


Evaluation results: {'eval_loss': 1.2103253602981567, 'eval_runtime': 1418.071, 'eval_samples_per_second': 45.192, 'eval_steps_per_second': 5.649, 'epoch': 1.9999375838716724}
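`eval_loss` is the mean cross-entropy over the validation titles, so `exp(-eval_loss)` is the geometric-mean probability the model assigns to the correct subject. The short worked check below uses the logged values. The conversion is a standard identity. Reading it as a summary of confidence, rather than of accuracy, is an interpretation.

```python
import math

eval_loss_epoch0 = 1.330662             # validation loss after the first epoch (training table above)
eval_loss_final = 1.2103253602981567    # eval_loss reported by trainer.evaluate()

# Mean cross-entropy -> geometric-mean probability of the true label.
p_epoch0 = math.exp(-eval_loss_epoch0)
p_final = math.exp(-eval_loss_final)
print(round(p_epoch0, 3), round(p_final, 3))
```

On these values, the geometric-mean probability on the true subject rises from about 0.26 to about 0.30. This says nothing directly about accuracy, which this training run did not compute.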



[W&B run history sparklines: eval/loss, eval/runtime, eval/samples_per_second, eval/steps_per_second, train/epoch, train/global_step, train/grad_norm, train/learning_rate, train/loss]

W&B run summary:
eval/loss                 1.21033
eval/runtime              1418.071
eval/samples_per_second   45.192
eval/steps_per_second     5.649
total_flos                9.618710848776e+16
train/epoch               1.99994
train/global_step         32042.0
train/grad_norm           14.8558
train/learning_rate       0.0
train/loss                0.9327
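The `Trainer` in this notebook reports only loss, because no `compute_metrics` function is passed to it. The sketch below is one possible `compute_metrics`. It assumes accuracy and macro-averaged F1 are the metrics of interest; macro-F1 weights rare subjects equally with common ones, which is relevant to a bias analysis. It would be passed as `Trainer(..., compute_metrics=compute_metrics)`. The metric choice and the plain-NumPy implementation are assumptions, not part of the original run.

```python
import numpy as np

def compute_metrics(eval_pred):
    """Accuracy and macro-F1 from (logits, label_ids); Trainer's EvalPrediction unpacks the same way."""
    logits, label_ids = eval_pred
    preds = np.argmax(logits, axis=-1)
    accuracy = float((preds == label_ids).mean())

    # Macro-F1: per-class F1 averaged over classes present in labels or predictions.
    f1_scores = []
    for c in np.union1d(label_ids, preds):
        tp = np.sum((preds == c) & (label_ids == c))
        fp = np.sum((preds == c) & (label_ids != c))
        fn = np.sum((preds != c) & (label_ids == c))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return {"accuracy": accuracy, "macro_f1": float(np.mean(f1_scores))}

# Toy check: 3 classes, 4 examples, one mistake (last row predicted as class 0).
toy_logits = np.array([[2.0, 0.1, 0.0], [0.2, 1.5, 0.1], [0.0, 0.3, 2.2], [1.0, 0.9, 0.0]])
toy_labels = np.array([0, 1, 2, 1])
metrics = compute_metrics((toy_logits, toy_labels))
print(metrics)
```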
