<a href="https://colab.research.google.com/github/mlragland/casehold_transformers/blob/main/initialization.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Initializing the Environment for Zero-shot MCQA Evaluation of Legal Pretrained Transformer Models on the LexGLUE CaseHOLD Benchmark

## Introduction

This notebook is the initialization step for our study of zero-shot multiple-choice question answering (MCQA) with Transformer models pretrained on legal text. The analysis uses the CaseHOLD task from the LexGLUE benchmark, in which each question pairs a citing passage from a court decision with five candidate holdings, only one of which is correct.

This work is inspired by, and builds on, the paper "When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset" by Lucia Zheng et al., which introduced the CaseHOLD dataset and provides the foundation for our experiments.

This notebook sets up the environment for the later experiments. It mounts Google Drive, clones the project repository, copies the base files from Drive into the clone, and commits and pushes them to GitHub. The subsequent notebooks, which evaluate the models and probe the strengths and limitations of Transformers on legal MCQA, assume these files are already in place.

## Environment Setup
- **Platform**: Google Colaboratory (Colab) with a T4 GPU is utilized for code execution, experimentation, and evaluation.
- **Libraries**: Dependencies are managed through a `requirements.txt` file to ensure reproducibility.
- **Storage**: Google Drive is mounted for file storage, with additional files accessible from the [GitHub repository](https://github.com/mlragland/casehold_transformers).
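Reproducibility depends on the installed packages actually matching `requirements.txt`, so a quick check after installation can be useful. The sketch below is a hypothetical helper, `check_pinned_requirements`, not part of the repository; it assumes the file uses exact `name==version` pins and skips every other kind of line.

```python
from importlib import metadata
from pathlib import Path


def check_pinned_requirements(path="requirements.txt"):
    """Return (name, pinned, installed) for each `name==version` pin that is not satisfied.

    `installed` is None when the package is missing. Lines without `==` are ignored.
    """
    mismatches = []
    for raw in Path(path).read_text().splitlines():
        # Drop trailing comments and environment markers, then surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if "==" not in line:
            continue  # this sketch only checks exact pins
        name, pinned = (part.strip() for part in line.split("==", 1))
        name = name.split("[", 1)[0]  # strip extras such as `package[extra]`
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches.append((name, pinned, installed))
    return mismatches
```

After installing the dependencies in Colab, an empty list from `check_pinned_requirements("/content/casehold_transformers/requirements.txt")` indicates that every exact pin is satisfied.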

The steps below only need to be run once. They set up the environment and data, and copy the project files from Google Drive into the cloned GitHub repository.


**Note:** This notebook has been redacted to protect sensitive information. Replace the placeholders (GitHub token, username, and email) with your own credentials.

In [None]:
# Mount Google Drive at /content/drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
#######################################################################

In [None]:
# Clone the casehold_transformers repo.
# Replace `token` with a GitHub personal access token and `username` with your GitHub username.
!git clone https://token@github.com/username/casehold_transformers.git

Cloning into 'casehold_transformers'...
remote: Enumerating objects: 3, done.
remote: Counting objects: 100% (3/3), done.
remote: Compressing objects: 100% (2/2), done.
remote: Total 3 (delta 0), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (3/3), done.
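To avoid typing a token directly into a notebook cell, the clone URL can be assembled from environment variables. This is a sketch under assumptions: `GITHUB_USER` and `GITHUB_TOKEN` are hypothetical variable names you would set yourself, and `authenticated_clone_url` is an illustrative helper. A token embedded in the URL is still saved in the clone's `.git/config`, so treat that file as sensitive.

```python
import os
from urllib.parse import quote


def authenticated_clone_url(repo, user_env="GITHUB_USER", token_env="GITHUB_TOKEN"):
    """Build https://<token>@github.com/<user>/<repo>.git from environment variables.

    The variable names are assumptions; raises KeyError if either one is unset.
    """
    user = os.environ[user_env]
    token = quote(os.environ[token_env], safe="")  # percent-encode anything URL-unsafe
    return f"https://{token}@github.com/{user}/{repo}.git"
```

In a notebook cell, IPython substitutes Python variables into shell commands written with braces, so `url = authenticated_clone_url("casehold_transformers")` followed by `!git clone {url}` performs the same clone without the token appearing in the cell source.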


In [None]:
# Change directory to the cloned repository
%cd /content/casehold_transformers

/content/casehold_transformers


In [None]:
# Step 2: Set the Git identity used for commits in this Colab session.
# --global writes to ~/.gitconfig on the Colab VM, so it applies to every repository in this runtime.
!git config --global user.email "email"
!git config --global user.name "username"

In [None]:
# Step 3: Copy the files from Google Drive to the cloned GitHub repository
!cp -r /content/drive/MyDrive/casehold_transformers/* /content/casehold_transformers/
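Because `cp -r .../* ...` relies on shell globbing, hidden files (names starting with `.`) in the Drive folder are not copied. A Python alternative is sketched below as a hypothetical helper, `copy_tree_contents`. It assumes Python 3.8+ (for `dirs_exist_ok`), copies hidden files, and skips any `.git` entry so that a stray repository folder on Drive cannot overwrite the clone's own Git metadata.

```python
import os
import shutil
from pathlib import Path


def copy_tree_contents(src, dst):
    """Merge everything under `src` into `dst` (overwriting same-named files), skipping `.git`.

    Returns the sorted relative paths of the files that were copied.
    """
    src, dst = Path(src), Path(dst)
    if not src.is_dir():
        raise FileNotFoundError(f"Source folder not found: {src} (is Google Drive mounted?)")
    shutil.copytree(src, dst, dirs_exist_ok=True, ignore=shutil.ignore_patterns(".git"))

    # List what was copied, applying the same `.git` exclusion as ignore_patterns above.
    copied = []
    for root, dirs, files in os.walk(src):
        dirs[:] = [d for d in dirs if d != ".git"]
        copied.extend((Path(root) / f).relative_to(src).as_posix() for f in files if f != ".git")
    return sorted(copied)
```

Called as `copy_tree_contents("/content/drive/MyDrive/casehold_transformers", "/content/casehold_transformers")`, it replaces the `cp` step and also fails early with a clear message if Drive was not mounted.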

In [None]:
# Step 4: Commit the files
!git add .
!git commit -m "Initial commit of all files from Google Drive"

[main 8fc935d] Initial commit of all files from Google Drive
 10 files changed, 1679 insertions(+)
 create mode 100644 case_ids/pretrain_ids.json
 create mode 100644 case_ids/task_ids.json
 create mode 100644 casehold.py
 create mode 100644 classification/run_glue.py
 create mode 100644 demo.ipynb
 create mode 100644 figures/hyperparameters.png
 create mode 100644 figures/results.png
 create mode 100644 multiple_choice/run_multiple_choice.py
 create mode 100644 multiple_choice/utils_multiple_choice.py
 create mode 100644 requirements.txt
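If the commit cell is re-run when nothing has changed, `git commit` reports "nothing to commit" and exits with a non-zero status. A hypothetical helper that commits only when something is staged could look like this. It relies on `git diff --cached --quiet`, which exits with status 1 when the index differs from `HEAD` (and, before the first commit, when anything is staged).

```python
import subprocess


def git(repo, *args):
    """Run a git command inside `repo` and return its stripped stdout; raises on failure."""
    result = subprocess.run(["git", *args], cwd=repo, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def commit_all(repo, message):
    """Stage all changes and commit them. Returns False (without error) if nothing was staged."""
    git(repo, "add", ".")
    # Exit status 1 means the index has staged changes; 0 means it matches HEAD.
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo).returncode == 1
    if staged:
        git(repo, "commit", "-m", message)
    return staged
```

For example, `commit_all("/content/casehold_transformers", "Initial commit of all files from Google Drive")` returns `True` on the first run and `False` on a repeated run with no new files.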


In [None]:
# Step 5: Push the changes to GitHub
# Note: The remote 'origin' should already be set from the clone command
!git push origin main  # Ensure 'main' is the correct branch name

Enumerating objects: 17, done.
Counting objects: 100% (17/17), done.
Delta compression using up to 2 threads
Compressing objects: 100% (15/15), done.
Writing objects: 100% (16/16), 10.76 MiB | 2.89 MiB/s, done.
Total 16 (delta 1), reused 0 (delta 0), pack-reused 0
remote: Resolving deltas: 100% (1/1), done.
To https://github.com/mlragland/casehold_transformers.git
   e8f2656..8fc935d  main -> main
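The push cell asks you to confirm that `main` is the correct branch. Instead of hard-coding it, a sketch can read the checked-out branch with `git symbolic-ref --short HEAD` (which also works before the first commit) and push that branch. `current_branch` and `push_current_branch` are illustrative names, not part of the repository.

```python
import subprocess


def current_branch(repo="."):
    """Return the name of the checked-out branch, e.g. 'main' (fails on a detached HEAD)."""
    result = subprocess.run(
        ["git", "symbolic-ref", "--short", "HEAD"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def push_current_branch(repo=".", remote="origin"):
    """Push whatever branch is checked out to `remote` and return the branch name."""
    branch = current_branch(repo)
    subprocess.run(["git", "push", remote, branch], cwd=repo, check=True)
    return branch
```

Running `push_current_branch("/content/casehold_transformers")` is then equivalent to `!git push origin main` when `main` is checked out, and still pushes the right branch if it has a different name.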


Note:

- If the GitHub repository has commits that are not in your local clone (for example, files added on GitHub after you cloned it), the push is rejected. Run `git pull origin main` first to merge them, then push again.
- If the GitHub repository is brand new and empty, you can push without pulling first.
- Because the repository was cloned, the local `main` branch already tracks `origin/main`, so the push above does not need the `-u` flag. Use `git push -u origin main` only when pushing a branch that has no upstream yet; it sets the upstream so that later pushes can be a plain `git push`.
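Whether a pull is needed before pushing can be checked rather than guessed. The sketch below uses hypothetical helpers: it fetches from the remote and reads the `# branch.ab +<ahead> -<behind>` header that `git status --porcelain=v2 --branch` prints when the branch has an upstream. A non-zero "behind" count means `git pull origin main` should come first.

```python
import subprocess


def parse_ahead_behind(porcelain_v2):
    """Return (ahead, behind) from `git status --porcelain=v2 --branch` output, or None without an upstream."""
    for line in porcelain_v2.splitlines():
        if line.startswith("# branch.ab "):
            ahead, behind = line.split()[2:4]  # e.g. "+2", "-3"
            return int(ahead.lstrip("+")), int(behind.lstrip("-"))
    return None


def ahead_behind(repo=".", remote="origin"):
    """Fetch `remote`, then report how far the current branch is ahead of / behind its upstream."""
    subprocess.run(["git", "fetch", remote], cwd=repo, check=True)
    status = subprocess.run(
        ["git", "status", "--porcelain=v2", "--branch"],
        cwd=repo, check=True, capture_output=True, text=True,
    ).stdout
    return parse_ahead_behind(status)
```

For example, a result of `(1, 0)` means there is one local commit to push and nothing to pull, while `(1, 2)` means the remote has two commits that must be pulled before the push can succeed.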

In [None]:
#######################################################################