<a href="https://colab.research.google.com/github/srehaag/legal_info_tech_w26/blob/main/lesson_4_videos_2026_update.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Notebook for Lesson 4 Videos: Generative AI (2026 Update)

# Web Interfaces

### ChatGPT (OpenAI) Web Interface

https://chatgpt.com/

### Copilot (Microsoft) Web Interface

https://copilot.microsoft.com/


### Gemini (Google) Web Interface

https://gemini.google.com/app

### DeepSeek V3 (Open Source) Web Interface (via Hugging Face)

https://huggingface.co/deepseek-ai/DeepSeek-V3.2



# OpenAI API Platform

### Platform:

https://platform.openai.com

### OpenAI Playground

https://platform.openai.com/chat/edit?models=gpt-5-mini

### OpenAI API Key

https://platform.openai.com/api-keys

### Pricing:

https://openai.com/api/pricing/

### Models:

https://platform.openai.com/docs/models

### Example from Playground

In [None]:
# The API key must first be set as an environment variable (see the Secrets section below)

In [None]:
from openai import OpenAI
client = OpenAI()

response = client.chat.completions.create(
  model="gpt-4o",
  messages=[
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a poetic assistant to a law student. You take legal concepts and you turn them into creative poetry."
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Write song lyrics about Donoghue v Stevenson"
        }
      ]
    },
  ],
  response_format={
    "type": "text"
  },
  temperature=1,
  max_completion_tokens=2048,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)
response
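The cell above displays the whole response object. Below is a minimal sketch of pulling out just the generated text and the token counts. It assumes the response has the shape returned by `client.chat.completions.create`, that is, `choices[0].message.content` plus a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`. The helper name `summarize_response` is illustrative, and `fake_response` is a hypothetical stand-in so the cell runs without an API key.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Return the generated text and token counts from a chat completion."""
    usage = response.usage
    return {
        "text": response.choices[0].message.content,
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }


# Hypothetical stand-in with the same attribute layout as a real response,
# so this cell runs without calling the API.
fake_response = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="A snail in a bottle..."))],
    usage=SimpleNamespace(prompt_tokens=45, completion_tokens=12, total_tokens=57),
)

summary = summarize_response(fake_response)
print(summary["text"])
print(summary["total_tokens"])
```

With a real call, `summarize_response(response)["text"]` would give the lyrics alone, and the token counts show what the call consumed.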



# Secrets (OpenAI API KEY)

### Secrets in Colab

In [None]:
# first set the secret in colab (call it OPENAI_API_KEY)
from google.colab import userdata
import os
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

### Secrets in Codespaces or Local Install (using dotenv)

In [None]:
# first set the secret in .env file (OPENAI_API_KEY = '')
# pip install python-dotenv
from dotenv import load_dotenv
load_dotenv()

# don't forget to include .env in .gitignore if you are pushing the folder to github

### Secrets with Manual Input (using getpass)

In [None]:
# getpass is part of the Python standard library, so no install is needed
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API Key: ")
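The approaches above can be combined so that the key is only requested when it is not already set. The sketch below is one possible way to do that. The helper name `ensure_api_key` and its `env` and `prompt` parameters are assumptions introduced for illustration; they make it possible to try the function with a plain dictionary instead of the real environment.

```python
import getpass
import os


def ensure_api_key(env=os.environ, prompt=getpass.getpass):
    """Make sure OPENAI_API_KEY is set, prompting only if it is missing.

    Returns True if the key was already set, False if the user was prompted.
    """
    if env.get("OPENAI_API_KEY"):
        return True
    env["OPENAI_API_KEY"] = prompt("Enter your OpenAI API Key: ")
    return False


# Demonstration with a plain dict and a fake prompt (both hypothetical),
# so nothing is read from or written to the real environment.
demo_env = {}
was_set = ensure_api_key(env=demo_env, prompt=lambda _: "sk-example")
print(was_set)                        # False: the key had to be entered
print("OPENAI_API_KEY" in demo_env)   # True
```

In a notebook, calling `ensure_api_key()` with no arguments would check `os.environ` and fall back to `getpass` only if needed.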

# Counting tokens (using tiktoken)

In [None]:
# use tiktoken to count the number of tokens in a string of text
# (the default encoding is cl100k_base, which is used by several OpenAI models)

#!pip install tiktoken
#!pip install openai

import tiktoken

def count_tokens(text, model = "cl100k_base"):
    encoding = tiktoken.get_encoding(model)
    num_tokens = len(encoding.encode(text))
    return num_tokens

In [None]:
text = """
Imagine that James (birthday September 18, 2018) living in Toronto was given over the counter cough medication
on Jan 27 2022. Shortly after he became seriously ill and had to be hospitalized for 2 months. No one knew why
he became ill until June 7 2023 when the pharmaceutical company that manufactured the cough medication revealed
that they produced a large batch of improperly manufactured cough medication, which produced symptoms similar
to those experienced by James. When is the last date on which James can sue the pharmaceutical?
"""

count_tokens(text)

In [None]:
# function to read pdf document from url and extract text

#!pip install requests
#!pip install PyPDF2

import requests
import PyPDF2
import io

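def get_pdf_text_from_ulr(url):
    # download the PDF and stop early if the request failed
    response = requests.get(url)
    response.raise_for_status()
    pdf = PyPDF2.PdfReader(io.BytesIO(response.content))

    # concatenate the text of every page (pages with no extractable text add nothing)
    text = ''
    for page in pdf.pages:
        text += page.extract_text() or ''

    return text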

In [None]:
# Example: Download pdf from url and count tokens

# URL is from ON ministry of labour database of collective agreements:
# https://www.lr.labour.gov.on.ca/en-CA/Collective-Agreements/

url = 'https://ws.lr.labour.gov.on.ca/CA/doc/611-11484-23%20(805-0042)?library=Education%20and%20Related%20Services'

collective_agreement = get_pdf_text_from_ulr(url)

print(collective_agreement[:500])
print("[...]")
print("_____________________________")

# count tokens
num_tokens = count_tokens(collective_agreement)

print(f"Number of tokens in the text: {num_tokens}")

In [None]:
#!pip install datasets

from datasets import load_dataset
dataset = load_dataset("refugee-law-lab/canadian-legal-data", split="train", data_dir="SCC")
df = dataset.to_pandas()
df.head()

In [None]:
# filter df for language = "en"
df = df[df.language == "en"]

# filter df for year = 2023
df = df[df.year == 2023]

len(df)

In [None]:
# put df.unofficial_text into a single variable
text = " ".join(df.unofficial_text)
len(text)

In [None]:
# count tokens
num_tokens = count_tokens(text)
print(f"Number of tokens in the text: {num_tokens}")
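A token count is most useful when compared against a model's context length. The sketch below checks whether a prompt of a given size would fit. It assumes the 128k-token context length listed for `gpt-4o` and `gpt-4o-mini` in the pricing note later in this notebook, and reserves 2,048 tokens for the reply (the `max_completion_tokens` value from the Playground example). The helper name `fits_in_context` and the sample counts are hypothetical.

```python
def fits_in_context(num_tokens, context_length=128_000, reserved_for_output=2_048):
    """Return True if a prompt of num_tokens leaves room for the reply."""
    return num_tokens + reserved_for_output <= context_length


# Hypothetical token counts, for illustration only.
print(fits_in_context(25_000))     # True
print(fits_in_context(1_500_000))  # False
```

Calling `fits_in_context(num_tokens)` on the count computed above indicates whether the combined text could be sent in a single request under these assumptions.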

# Accessing the OpenAI API: A simple example

In [None]:
# set up OpenAI API by setting the OPENAI_API_KEY as an environment variable (see above, secrets)

# NOTE: Model pricing as of January 2025:
# Model: 'gpt-4o'       Input: $0.00250 / 1K tokens;  Output: $0.01000 / 1K tokens;  Context length: 128k tokens
# Model: 'gpt-4o-mini'  Input: $0.000150 / 1K tokens; Output: $0.000600 / 1K tokens; Context length: 128k tokens

# !pip install openai

from openai import OpenAI

client = OpenAI()

def get_completion(user_message,
        system_message="You are a helpful assistant to a Canadian law student",
        model = "gpt-4o",
        temperature = 0):

    completion = client.chat.completions.create(
        model=model,
        temperature=temperature,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message}
        ]
    )
    return completion.choices[0].message.content
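The per-token prices in the note above can be turned into a rough cost estimate for a single call. The sketch below does that arithmetic. The prices are copied from the January 2025 note and may be out of date, and the helper name `estimate_cost` and the sample token counts are hypothetical.

```python
# Prices in US dollars per 1K tokens, copied from the January 2025 note above.
# They are assumptions that may be out of date; check the pricing page.
PRICES_PER_1K = {
    "gpt-4o": {"input": 0.00250, "output": 0.01000},
    "gpt-4o-mini": {"input": 0.000150, "output": 0.000600},
}


def estimate_cost(input_tokens, output_tokens, model="gpt-4o"):
    """Estimate the cost in US dollars of one API call."""
    price = PRICES_PER_1K[model]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


# Hypothetical call: 10,000 input tokens and 500 output tokens.
print(f"${estimate_cost(10_000, 500):.4f}")                       # $0.0300
print(f"${estimate_cost(10_000, 500, model='gpt-4o-mini'):.4f}")  # $0.0018
```

Input tokens can be counted in advance with `count_tokens`; output tokens are only known after the call, so the second argument is a guess until then.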

In [None]:
get_completion("Hello. How are you?")

In [None]:
get_completion("Hello. How are you?", system_message = "You only speak French")

In [None]:
new_sys_message = "You only speak Spanish"

get_completion("Hello. How are you?", system_message = new_sys_message)

In [None]:
get_completion("Write a haiku about 1L", temperature = 1.25, model = "gpt-4o-mini")

In [None]:
response = get_completion("Explain the steps to file a patent. Cite the relevant legislation")
print(response)

### OpenAI API: Adding Context

In [None]:
text = """
Courts have repeatedly held that negative inferences on the basis of plausibility are to be
made only in the clearest of cases due to the myriad of factors that can affect plausibility,
especially in contexts involving cross-cultural and cross-linguistic communication, and in
contexts where refugee claimants may be struggling with mental health issues and trauma (see
Ascencio Perez v. Canada (Citizenship and Immigration), 2022 FC 215).
"""

In [None]:
from datasets import load_dataset
dataset = load_dataset("refugee-law-lab/canadian-legal-data", split="train", data_dir="FC")
df = dataset.to_pandas()

In [None]:
# get df.unofficial_text where citation = "2022 FC 215"
case_text = df[df.citation == "2022 FC 215"].unofficial_text.values[0]
print(case_text[:500])

In [None]:
user_message = f"""
CONTEXT: THIS IS A CASE THAT IS CITED: {case_text}
___________
CONTEXT: THIS IS A TEXT CITING THE CASE: {text}
___________
QUESTION: Is the text citing the case in a way that is fair and accurate? If not, why not?
"""

response = get_completion(user_message, model = "gpt-4o", temperature = 0.05)

In [None]:
print(response)
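The prompt above can be wrapped in a function so the same check can be run on other cases and citing passages. The sketch below is one way to do that. The helper name `build_citation_check_prompt` and the optional `max_case_chars` parameter are assumptions introduced here; the truncation cuts by characters, not tokens, so it is only a crude guard for very long decisions.

```python
def build_citation_check_prompt(case_text, citing_text, max_case_chars=None):
    """Build the citation-check prompt used above from its two inputs.

    If max_case_chars is given, the case text is cut to that many
    characters (a crude guard for very long decisions).
    """
    if max_case_chars is not None:
        case_text = case_text[:max_case_chars]
    return (
        f"CONTEXT: THIS IS A CASE THAT IS CITED: {case_text}\n"
        "___________\n"
        f"CONTEXT: THIS IS A TEXT CITING THE CASE: {citing_text}\n"
        "___________\n"
        "QUESTION: Is the text citing the case in a way that is fair and accurate? "
        "If not, why not?"
    )


# Placeholder strings stand in for a real decision and a real citing passage.
demo_prompt = build_citation_check_prompt("Full text of the decision...", "Text citing it...")
print(demo_prompt)
```

The returned string can be passed to `get_completion` in place of `user_message`.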

# OpenAI API: Few Shot Learning

In [None]:
user_message = f"""
INPUT: Ottawa 19-AUG-2010  BEFORE The Honourable Mr. Justice Russell  Language: E  Before the Court:  Motion Doc. No. 3 on behalf of Applicant  Result of Hearing:  Matter dismissed  held by way of Conference Call in chambers  Duration per day:  19-AUG-2010 from 04:00 to 05:20  Courtroom : Judge's Chambers - Ottawa  Court Registrar: Jennifer Jones  Total Duration: 1h20  Appearances:  Lloyd Peter James 416-737-4286 representing Applicant  Suranjana Bhattacharyya 416-973-6716 representing Respondent  Comments: Counsel for the Respondent had two preliminary matters: 1)a law student was  in attendance to confirm that the Respondent's Authorities had been served  upon the Applicant prior to the Hearing; and 2)the proper Respondent should  be the Minister of Public Safety and Emergency Preparedness, not the  Minister of Citizenship and Immigration.  Minutes of Hearing entered in Vol. 236 page(s) 155 - 158  Abstract of Hearing placed on file
OUTPUT: 1h20m
INPUT: Toronto 17-JAN-2019  BEFORE The Honourable Madam Justice Elliott  Language: E  Before the Court:  Motion Doc. No. 3 on behalf of Applicant  Result of Hearing:  Matter reserved  held by way of Conference Call  Duration per day:  17-JAN-2019 from 11:30 to 12:23  Courtroom : Judge's Chambers - Toronto  Court Registrar: Alice Prodan Gil  Total Duration: 53min  Appearances:  Barbara Jackman/Sarah Boyd 416-653-9964 representing Applicant  Alex Kam 647-256-0743 representing Respondent  Comments: DARS back up used.  Minutes of Hearing entered in Vol. 368 page(s) 197 - 198  Abstract of Hearing placed on file
OUTPUT: 53m
INPUT: Toronto 13-OCT-2012  BEFORE The Honourable Mr. Justice O'Keefe  Language: E  Before the Court:  Motion Doc. No. 3 on behalf of Applicant  Result of Hearing:  Matter reserved  held by way of Conference Call  Duration per day:  18-OCT-2012 from 12:00 to 01:30  Courtroom : Judge's Chambers - Toronto  Court Registrar: Charles Skelton  Total Duration: 1h30min  Appearances:  Adela Crossley 416-850-1073 representing Applicant  Tamrat Gebeyehu 416-973-9665 representing Respondent  Minutes of Hearing entered in Vol. 273 page(s) 8 - 9  Abstract of Hearing placed on file
OUTPUT: 1h30m
INPUT: Ottawa 16-JUL-2007  BEFORE The Honourable Madam Justice Snider  Language: E  Before the Court:  Motion Doc. No. 4 on behalf of Applicant  Result of Hearing: Heard with IMM-5405-06. Motion withdrawn.  held by way of Conference Call  Duration per day:  16-JUL-2007 from 09:30 to 11:30  Courtroom : Teleconference Room No. 1 - Toronto  Court Registrar: Lhaden Bhutia  Total Duration: 2h  Appearances:  Ronald Poulton 416-862-0000 representing Applicant  John Provart 416-973-1346 representing Respondent  Minutes of Hearing entered in Vol. 195 page(s) 345 - 350  Abstract of Hearing placed on file
OUTPUT:"""
response = get_completion(user_message, temperature = 0)

In [None]:
response
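Few-shot prompts like the one above work best when every example uses exactly the same labels, so it can help to generate the prompt from a list of pairs. The sketch below does that and also converts a returned duration such as `1h20m` into minutes. Both helper names (`build_few_shot_prompt`, `duration_to_minutes`) and the shortened example strings are hypothetical, and the parser only handles the `XhYm` format used in the example outputs.

```python
import re


def build_few_shot_prompt(examples, new_input):
    """Format (input, output) pairs plus a new input in the INPUT/OUTPUT style."""
    lines = []
    for example_input, example_output in examples:
        lines.append(f"INPUT: {example_input}")
        lines.append(f"OUTPUT: {example_output}")
    lines.append(f"INPUT: {new_input}")
    lines.append("OUTPUT:")
    return "\n".join(lines)


def duration_to_minutes(duration):
    """Convert a string such as '1h20m', '53m' or '2h' to minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?", duration.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"Unrecognized duration: {duration!r}")
    hours, minutes = match.groups()
    return int(hours or 0) * 60 + int(minutes or 0)


# Shortened, hypothetical examples; real docket entries are much longer.
examples = [("... Total Duration: 1h20 ...", "1h20m"), ("... Total Duration: 53min ...", "53m")]
print(build_few_shot_prompt(examples, "... Total Duration: 2h ..."))
print(duration_to_minutes("1h20m"))  # 80
```

Raising an error on unrecognized output is a deliberate choice: a model reply that does not follow the format is flagged instead of being silently converted.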