<a href="https://colab.research.google.com/github/srehaag/legal_info_tech_w26/blob/main/Solutions_Module2_assignment_w26.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 2: Assignment Solutions

### Instructions

There isn't usually just one way to solve a coding problem. This notebook offers some examples of how you might approach the assignment questions -- but these are not the only possible approaches.

Recall that students receive a high pass if they make a meaningful effort to complete the module. A pass is awarded where the assignment is mostly complete but should have received more effort. A fail is awarded where the assignment is not mostly complete and shows insufficient effort.

If you would like further feedback on your completed assignment, including individualized feedback on your coding, please drop in during the "Study with me" sessions, or reach out by email.

### Beginner Question 1:

Create a text file called courses.txt with the names of each of the courses that you are taking this year (each course on one line).

In [None]:
# start the text right after the opening quotes so the file does not begin with a blank line
courses = """Refugee Law
Legal Info Tech
Administrative Law
International Human Rights
"""

with open("courses.txt", "w") as f:
    f.write(courses)
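
One way to check the result is to read the file back and confirm that each course sits on its own line. The following is a minimal sketch using `pathlib` instead of `open`; the course names are simply the ones from the cell above, and the file is written to the current working directory.

```python
from pathlib import Path

courses = [
    "Refugee Law",
    "Legal Info Tech",
    "Administrative Law",
    "International Human Rights",
]

# Write one course per line, ending the file with a newline
path = Path("courses.txt")
path.write_text("\n".join(courses) + "\n", encoding="utf-8")

# Read the file back; splitlines() drops the line endings
lines = path.read_text(encoding="utf-8").splitlines()
print(lines)
```

If the printed list matches the original list, the file contains exactly one course per line with no stray blank lines.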


### Beginner Question 2:

Find a course summary (yours or one that you find online) for an Osgoode Hall Law School course in MS Word docx format.

Extract the text of the document into a variable called: course_summary

Print the first 20 characters of each of the first 20 paragraphs of the course_summary variable.

In [None]:
!pip install python-docx

# simple version using for loops

# load doc
from docx import Document
doc = Document("Summary.docx")

# get text
course_summary = ""
for para in doc.paragraphs:
    course_summary = course_summary + para.text + "\n"

# print first 20 chars in first 20 paras
for para in doc.paragraphs[:20]:
    print(para.text[:20])
    print()

# or, alternatively, using the course_summary variable
for para in course_summary.split("\n")[:20]:
    print(para[:20])
    print()


In [None]:
# more succinctly using list comprehension

from docx import Document
doc = Document("Summary.docx")
course_summary = "\n".join([para.text for para in doc.paragraphs])
print([para.text[:20] for para in doc.paragraphs[:20]])

In [None]:
# alternative approach using counter
# credit to Elad Dekel (with revisions)

from docx import Document

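doc = Document("Summary.docx")

# keep the extracted text (not the Document object) in course_summary
course_summary = "\n".join(para.text for para in doc.paragraphs)

count = 0
for para in doc.paragraphs:
    if count < 20 and para.text != "": # skip the blank paragraphs
        print(para.text[:20])
        count += 1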
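
The counter approach can also be wrapped in a small function that works on plain strings, which makes it easy to try out without a docx file. This is only a sketch: `preview_paragraphs` and the sample paragraphs are hypothetical names and data, not part of the assignment.

```python
def preview_paragraphs(paragraphs, limit=20, width=20):
    """Return the first `width` characters of the first `limit` non-blank paragraphs."""
    previews = []
    for text in paragraphs:
        if len(previews) == limit:
            break  # stop early once enough paragraphs have been collected
        if text.strip():  # skip empty and whitespace-only paragraphs
            previews.append(text[:width])
    return previews


# Hypothetical sample data standing in for [para.text for para in doc.paragraphs]
sample = [
    "Course Summary: Administrative Law",
    "",
    "   ",
    "Week 1: Procedural fairness and the duty to be fair",
]

for line in preview_paragraphs(sample):
    print(line)
```

With a real document, the list passed in would be `[para.text for para in doc.paragraphs]`.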

### Beginner Question 3:

Create a pandas dataframe using data about refugee claims decided in 2019 available at: https://refugeelab.ca/wp-content/uploads/2024/06/2019_RPD_Data.xlsx

Using the dataframe, list the 10 counsel who worked on the largest number of refugee claims in 2019.

In [None]:
import pandas as pd
df = pd.read_excel("https://refugeelab.ca/wp-content/uploads/2024/06/2019_RPD_Data.xlsx")
df["Counsel Fullname"].value_counts()[:10]

Counsel Fullname,count
"CINTOSUN, BRIAN IBRAHIM",285
"Singer, Melissa",269
"Siryuyumusi, Pacifique",240
"Desjardins, Odette",177
"GRICE, JOHN W",170
"VALOIS, Stéphanie",170
"MARKAKI, STYLIANI",164
"IVANYI, PETER",162
"LOEBACH, MICHAEL",160
"HAMILTON, IAN",158

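The names in the output above use different capitalization styles. If the same counsel were recorded under more than one spelling variant, their claims would be split across several rows. Whether the 2019 data actually contains such variants is not checked here. The sketch below uses made-up names and the standard library's `Counter` to show one possible normalization step before counting; `top_counsel` is a hypothetical helper.

```python
from collections import Counter


def top_counsel(names, n=10):
    """Count names after collapsing whitespace and upper-casing them."""
    normalized = [" ".join(name.split()).upper() for name in names if name]
    return Counter(normalized).most_common(n)


# Hypothetical names, not taken from the dataset
sample = ["Doe, Jane", "DOE, JANE", "Doe,  Jane ", "Roe, Richard"]
print(top_counsel(sample))
```

In pandas, a comparable step would be to apply `.str.strip().str.upper()` to the column before calling `value_counts()`.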

### Intermediate Question 1:

Using the same dataframe about refugee claims in 2019 from Beginner Question 3, filter the dataframe so that it only includes positive and negative decisions (excluding claims that were otherwise resolved, such as claims that were abandoned).

Using the filtered dataframe, calculate and print the grant rates for the ten highest volume counsel, listed from highest grant rate to lowest grant rate.

In [None]:
import pandas as pd
df = pd.read_excel("https://refugeelab.ca/wp-content/uploads/2024/06/2019_RPD_Data.xlsx")

# explore values of df.Explanation
print(df["Explanation"].value_counts())

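# google search shows that "Expedited Positive" is a Positive decision,
# and "Neg. No Cred Basis" is a Negative decision
# Filter for: "Positive", "Negative", "Expedited Positive", "Neg. No Cred Basis"
# (.copy() avoids a SettingWithCopyWarning when a column is added below)
df = df[df["Explanation"].isin(["Positive",
                                "Negative",
                                "Expedited Positive",
                                "Neg. No Cred Basis"
                                ])].copy()

# confirm that only the four decision types remain
print(df["Explanation"].value_counts())

# Add boolean col for "Positive"
df["Positive"] = df["Explanation"].isin(["Positive", "Expedited Positive"])

# filter for top 10 counsel (by volume within the filtered dataframe,
# so the list can differ from the one in Beginner Question 3)
df = df[df["Counsel Fullname"].isin(df["Counsel Fullname"].value_counts()[:10].index)]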

# use groupby to get grant rate for each counsel, sort by grant rate
df.groupby("Counsel Fullname")["Positive"].mean().sort_values(ascending=False)




Explanation
Positive              10667
Negative               6204
Expedited Positive     3388
Withdrawn              1372
Abandoned              1093
Allowed - 109           150
Allowed - 108           138
Neg. No Cred Basis      126
Deceased                 42
Administrative           10
Dismissed - 108           6
Dismissed - 109           6
Name: count, dtype: int64
Explanation
Positive              10667
Negative               6204
Expedited Positive     3388
Neg. No Cred Basis      126
Name: count, dtype: int64


Counsel Fullname,Positive
"Siryuyumusi, Pacifique",0.949367
"CINTOSUN, BRIAN IBRAHIM",0.935252
"TATHAM, MARY",0.934307
"MARKAKI, STYLIANI",0.93125
"HAMILTON, IAN",0.86
"KAMINKER, HART",0.852113
"Desjardins, Odette",0.817073
"VALOIS, Stéphanie",0.733766
"LOEBACH, MICHAEL",0.726667
"Singer, Melissa",0.638889


In [None]:
# alternatively, if you assume that Neg. No Cred Basis and Expedited Positive
# are "otherwise resolved"

import pandas as pd
df = pd.read_excel("https://refugeelab.ca/wp-content/uploads/2024/06/2019_RPD_Data.xlsx")

# filter for "Positive" or "Negative" in Explanation
# (.copy() avoids a SettingWithCopyWarning when a column is added below)
df = df[df["Explanation"].isin(["Positive", "Negative"])].copy()

# add boolean df["Positive"]
df["Positive"] = df["Explanation"] == "Positive"

# filter for top 10 counsel
df = df[df["Counsel Fullname"].isin(df["Counsel Fullname"].value_counts()[:10].index)]

# use groupby to get grant rate for each counsel, sort by grant rate
df.groupby("Counsel Fullname")["Positive"].mean().sort_values(ascending=False)


Counsel Fullname,Positive
"TATHAM, MARY",0.923077
"CINTOSUN, BRIAN IBRAHIM",0.903226
"HAMILTON, IAN",0.86
"KORMAN, MICHAEL",0.717557
"LOEBACH, MICHAEL",0.68
"VALOIS, Stéphanie",0.669355
"GRICE, JOHN W",0.633333
"Singer, Melissa",0.513369
"KABATERAINE, NKUNDA",0.467213
"MENGHILE, Claudette",0.405172

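Two details in the cells above are easy to miss. First, the mean of a boolean column is the share of `True` values, which is why `.mean()` gives the grant rate. Second, the list of highest volume counsel depends on whether it is computed before or after filtering. The sketch below illustrates both points on a made-up dataframe with three hypothetical counsel ("A", "B", "C"); it is not drawn from the 2019 data.

```python
import pandas as pd

# Hypothetical toy data: three counsel with mixed outcomes
toy = pd.DataFrame({
    "Counsel Fullname": ["A", "A", "A", "A", "B", "B", "B", "C", "C"],
    "Explanation": ["Positive", "Negative", "Abandoned", "Withdrawn",
                    "Positive", "Positive", "Negative",
                    "Positive", "Abandoned"],
})

# Highest volume counsel across all claims: A has 4 claims
top_all = toy["Counsel Fullname"].value_counts().head(1).index.tolist()

# Keep only positive and negative decisions
decided = toy[toy["Explanation"].isin(["Positive", "Negative"])].copy()
decided["Positive"] = decided["Explanation"] == "Positive"

# Highest volume counsel among decided claims: B has 3 decided claims
top_decided = decided["Counsel Fullname"].value_counts().head(1).index.tolist()

# Mean of a boolean column = share of True values = grant rate
rates = (
    decided.groupby("Counsel Fullname")["Positive"]
    .mean()
    .sort_values(ascending=False)
)

print(top_all, top_decided)
print(rates)
```

Either ranking can be a defensible reading of "highest volume counsel"; the point is to pick one deliberately and say which one was used.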

### Intermediate Question 2:

The lesson videos show you how to engage with the Refugee Law Lab's Bulk Decisions Dataset using [Hugging Face Datasets](https://huggingface.co/datasets/refugee-law-lab/canadian-legal-data).

In 2025, responsibility for this dataset was transferred to [Access to Algorithmic Justice](https://a2aj.ca). The A2AJ dataset works largely the same way, although there are different fields.

Review the documentation for the A2AJ datasets [here](https://huggingface.co/datasets/a2aj/canadian-case-law) and [here](https://github.com/a2aj-ca/canadian-legal-data).

Download the Ontario Court of Appeal dataset and convert it to a pandas dataframe. Then print the citation and name of the case for the most recent decision in that dataframe.


In [None]:
from datasets import load_dataset
cases = load_dataset("a2aj/canadian-case-law", data_dir = "ONCA", split="train")
df = cases.to_pandas()
# sort once and take the first row (the most recent decision)
latest = df.sort_values("document_date_en", ascending=False).iloc[0]
print(latest["citation_en"])
print(latest["name_en"])



2026 ONCA 18
R. v. C.L.


In [None]:
# Alternative approach that takes into consideration en and fr dates
# Credit to Mia Cox (with revisions)

from datasets import load_dataset

ds = load_dataset("a2aj/canadian-case-law", data_dir = "ONCA", split="train")

df = ds.to_pandas()

# since english and french docs are dated separately in the dataset, pick the most recent of the two (or more realistically, whichever one exists) and put it in a unified column
df["unified_date"] = df[["document_date_en", "document_date_fr"]].max(axis=1)

# we can now sort by this unified column
mostRecentCases = df.sort_values("unified_date", ascending = False)

# get the citation. try to get the english citation first, but if there is not an english citation then pull the french citation
citation = mostRecentCases["citation_en"].values[0] if mostRecentCases["citation_en"].values[0] else mostRecentCases["citation_fr"].values[0]

# ditto with the case name. try english first, then french if there is no english
caseName = mostRecentCases["name_en"].values[0] if mostRecentCases["name_en"].values[0] else mostRecentCases["name_fr"].values[0]

print(citation + ", " + caseName)

# answer is different than above, but that is because there were 3 cases decided on the same day
# and all three would be the correct answer

2026 ONCA 20, R. v. Qureshi
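
Because several decisions can share the most recent date, it may be clearer to list all of them rather than whichever one happens to sort first. The following is a minimal sketch on a made-up dataframe. The column names follow the cell above, while the citations and dates are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical rows; the real dataframe would come from ds.to_pandas()
toy = pd.DataFrame({
    "citation_en": ["2099 TEST 1", None, "2099 TEST 3", "2098 TEST 900"],
    "citation_fr": [None, "2099 TEST 2 (FR)", None, None],
    "unified_date": pd.to_datetime(
        ["2099-01-15", "2099-01-15", "2099-01-15", "2098-12-30"]
    ),
})

# Keep every row that shares the most recent date
latest_date = toy["unified_date"].max()
latest = toy[toy["unified_date"] == latest_date]

# Use the English citation, falling back to the French one where it is missing
citations = latest["citation_en"].fillna(latest["citation_fr"]).tolist()
print(citations)
```

Unlike a truthiness check with `if ... else`, `fillna` treats both `None` and `NaN` as missing.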


### Intermediate Question 3:

In addition to bulk downloads, the A2AJ also makes its data available via an Application Programming Interface. Review documentation about using that API [here](https://github.com/a2aj-ca/canadian-legal-data/blob/main/access-via-api.ipynb).

Using the API, programmatically download the text of section 167 of the Immigration and Refugee Protection Act and print it.

In [None]:
import requests

url = "https://api.a2aj.ca/fetch"
params = {
    "citation": "S.C. 2001, c. 27",
    "doc_type": "laws",
    "output_language": "en",
    "section": 167
}
data = requests.get(url, params=params).json()
print(data["results"][0]["unofficial_text_en"])

Right to counsel (1) A person who is the subject of proceedings before any Division of the Board and the Minister may, at their own expense, be represented by legal or other counsel. Representation (2) If a person who is the subject of proceedings is under 18 years of age or unable, in the opinion of the applicable Division, to appreciate the nature of the proceedings, the Division shall designate a person to represent the person. 2001, c. 27, s. 167; 2010, c. 8, s. 23; 2012, c. 17, s. 63


In [None]:
# Alternative approach (Credit to Matt Aydin)

import requests

# 1) Find IRPA in the API (so you don’t have to guess the citation format)
search_url = "https://api.a2aj.ca/search"
search_params = {
    "query": "Immigration and Refugee Protection Act",
    "search_type": "name",   # search by title/name
    "doc_type": "laws",
    "size": 5,
    "search_language": "en",
}
search_data = requests.get(search_url, params=search_params, timeout=30).json()
irpa = search_data["results"][0]              # take the top match
irpa_citation = irpa["citation_en"]           # this is what we'll use to fetch text

# 2) Fetch section 167 of IRPA
fetch_url = "https://api.a2aj.ca/fetch"
fetch_params = {
    "citation": irpa_citation,
    "doc_type": "laws",
    "output_language": "en",
    "section": "167",        # key part: section is only for laws
}
fetch_data = requests.get(fetch_url, params=fetch_params, timeout=30).json()
section_text = fetch_data["results"][0]["unofficial_text_en"]

print(section_text)


Right to counsel (1) A person who is the subject of proceedings before any Division of the Board and the Minister may, at their own expense, be represented by legal or other counsel. Representation (2) If a person who is the subject of proceedings is under 18 years of age or unable, in the opinion of the applicable Division, to appreciate the nature of the proceedings, the Division shall designate a person to represent the person. 2001, c. 27, s. 167; 2010, c. 8, s. 23; 2012, c. 17, s. 63
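
Both cells index `results[0]` directly, which raises an error if the API returns no match (for example, after a typo in the citation). The sketch below shows one defensive way to read the response. It assumes only the response shape used above (a `results` list of dictionaries); how the API actually reports an empty result is an assumption, and `first_result_text` is a hypothetical helper.

```python
def first_result_text(payload, field="unofficial_text_en"):
    """Return `field` from the first result, or None if there are no results."""
    results = payload.get("results") or []
    if not results:
        return None
    return results[0].get(field)


# Hypothetical payloads mimicking the shape of the parsed JSON response
ok = {"results": [{"unofficial_text_en": "Right to counsel (1) ..."}]}
empty = {"results": []}

print(first_result_text(ok))
print(first_result_text(empty))
```

In a real request, calling `raise_for_status()` on the response before `.json()` would also surface HTTP errors early.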


### Advanced Question 1:

Scrape the biographies of all judges on the Ontario Court of Appeal's [website](https://www.ontariocourts.ca/coa/judges-of-the-court/), and put them into a dataframe that includes the name of the judge, the text of the judge's biography, the URL where the biography is found, and the date/time when you scraped the biography.


In [None]:
# Credit to Matt Aydin

!pip -q install requests beautifulsoup4 pandas

import re
from datetime import datetime
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.ontariocourts.ca/coa/judges-of-the-court/"

html = requests.get(url).text
soup = BeautifulSoup(html, "html.parser")

# get all visible-ish text and turn it into clean lines
text = soup.get_text("\n")
lines = [line.strip() for line in text.split("\n") if line.strip()]

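# judge names usually start like this; note that a biography line starting
# with "The Honourable" also matches and is then treated as a name
# (see judge_name in row 0 of the output below)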
name_pattern = re.compile(r"^(The Honourable|Honourable)\b")

rows = []
current_name = None
current_bio = []

for line in lines:
    if name_pattern.match(line):
        # save previous judge
        if current_name and current_bio:
            rows.append({
                "judge_name": current_name,
                "biography_text": " ".join(current_bio),
                "biography_url": url,
                "scraped_at": datetime.now().isoformat()
            })

        # start new judge
        current_name = line
        current_bio = []
    else:
        if current_name:
            current_bio.append(line)

# save last judge
if current_name and current_bio:
    rows.append({
        "judge_name": current_name,
        "biography_text": " ".join(current_bio),
        "biography_url": url,
        "scraped_at": datetime.now().isoformat()
    })

df = pd.DataFrame(rows)
df.head()

Unnamed: 0,judge_name,biography_text,biography_url,scraped_at
0,The Honourable Michael H. Tulloch was appointe...,"In 2016, Chief Justice Tulloch was appointed b...",https://www.ontariocourts.ca/coa/judges-of-the...,2026-01-20T23:12:14.542489
1,The Honourable J. Michal Fairburn,Associate Chief Justice Fairburn obtained her ...,https://www.ontariocourts.ca/coa/judges-of-the...,2026-01-20T23:12:14.542504
2,The Honourable Jill M. Copeland,Justice Jill M. Copeland was appointed to the ...,https://www.ontariocourts.ca/coa/judges-of-the...,2026-01-20T23:12:14.542511
3,The Honourable Steve A. Coroza,Justice Steve A. Coroza was appointed to the C...,https://www.ontariocourts.ca/coa/judges-of-the...,2026-01-20T23:12:14.542517
4,The Honourable Jonathan Dawe,Jonathan Dawe was appointed to the Court of Ap...,https://www.ontariocourts.ca/coa/judges-of-the...,2026-01-20T23:12:14.542524
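
In row 0 of the output, `judge_name` holds a sentence from the biography rather than a name, because that sentence also begins with "The Honourable". One possible refinement, offered only as a sketch, is to accept a line as a name when it starts with the honorific, is short, and does not end with a period. The word limit of 8 and the sample lines with "Jane Doe" are assumptions for illustration, and this heuristic would still miss names that end in an abbreviation such as "Jr.".

```python
import re

NAME_PREFIX = re.compile(r"^(The Honourable|Honourable)\b")


def looks_like_name(line, max_words=8):
    """Heuristic: an honorific followed by a short phrase that is not a sentence."""
    if not NAME_PREFIX.match(line):
        return False
    is_short = len(line.split()) <= max_words
    ends_like_sentence = line.rstrip().endswith(".")
    return is_short and not ends_like_sentence


# Hypothetical lines in the style of the scraped page
lines = [
    "The Honourable Jane Doe",
    "The Honourable Jane Doe was appointed to the court in 2020 and previously practised law.",
    "The Honourable J. Michal Fairburn",
]
print([looks_like_name(line) for line in lines])
```

Swapping `name_pattern.match(line)` for a check like this would be one way to keep such sentences in the biography text.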


In [None]:
# alternative approach
# credit to Mia Cox

import requests
from bs4 import BeautifulSoup
import pandas as pd
import datetime

url = "https://www.ontariocourts.ca/coa/judges-of-the-court/"

response = requests.get(url)
response.raise_for_status()  # ensure we got a valid response

soup = BeautifulSoup(response.text, "html.parser")

names = soup.find_all("div", class_="su-spoiler-title")
bios = soup.find_all("div", class_="su-spoiler-content")

judges = []
for name_div, bio_div in zip(names, bios):
    name = name_div.get_text(strip=True)
    # join paragraphs with spaces, then replace the non-breaking spaces (\xa0) that appear throughout the website
    biography = bio_div.get_text(separator=" ", strip=True).replace("\xa0", " ")
    judges.append({"name": name, "biography": biography, "url": url, "scrape_date": datetime.datetime.now()})

# convert to pandas dataframe
df = pd.DataFrame(judges)

df.head()

Unnamed: 0,name,biography,url,scrape_date
0,The Honourable Michael H. Tulloch,The Honourable Michael H. Tulloch was appointe...,https://www.ontariocourts.ca/coa/judges-of-the...,2026-01-20 23:54:36.768868
1,The Honourable J. Michal Fairburn,Associate Chief Justice Fairburn obtained her ...,https://www.ontariocourts.ca/coa/judges-of-the...,2026-01-20 23:54:36.768886
2,The Honourable Jill M. Copeland,Justice Jill M. Copeland was appointed to the ...,https://www.ontariocourts.ca/coa/judges-of-the...,2026-01-20 23:54:36.768904
3,The Honourable Steve A. Coroza,Justice Steve A. Coroza was appointed to the C...,https://www.ontariocourts.ca/coa/judges-of-the...,2026-01-20 23:54:36.768920
4,The Honourable Jonathan Dawe,Jonathan Dawe was appointed to the Court of Ap...,https://www.ontariocourts.ca/coa/judges-of-the...,2026-01-20 23:54:36.768938
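
Note that `zip` stops at the shorter of its inputs, so if the page ever had a title without a matching biography block, the rows could be misaligned without any error. A minimal sketch of a guard is shown below, using plain lists of strings in place of the scraped elements. `pair_names_and_bios`, the sample judges, and the example URL are all hypothetical. It also records a single timestamp for the whole scrape rather than one per row.

```python
from datetime import datetime, timezone


def pair_names_and_bios(names, bios, url):
    """Pair each name with its biography, refusing to guess if the counts differ."""
    if len(names) != len(bios):
        raise ValueError(f"found {len(names)} names but {len(bios)} biographies")
    # One timestamp (UTC) shared by every row of this scrape
    scraped_at = datetime.now(timezone.utc).isoformat()
    return [
        {"name": name, "biography": bio, "url": url, "scraped_at": scraped_at}
        for name, bio in zip(names, bios)
    ]


# Hypothetical inputs standing in for the text of the scraped divs
rows = pair_names_and_bios(
    ["Judge A", "Judge B"],
    ["Biography of A.", "Biography of B."],
    "https://example.org/judges",
)
print(len(rows), rows[0]["name"])
```

The resulting list of dictionaries can be passed to `pd.DataFrame` in the same way as `judges` in the cell above.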


### Reflection

If you could work with any legal data programmatically, what data would you like to have access to?

Does that data currently exist in a publicly accessible format that can be easily integrated into Python programs?

If not, why not?