<a href="https://colab.research.google.com/github/srehaag/legal_info_tech_w24/blob/main/Module4_Solutions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Module 4 Assignment

### Instructions

Everyone should attempt to complete the beginner section. If you are interested you can continue to intermediate and advanced. Whether you choose to go beyond the beginner section will not affect your participation grade, but some of the skills you learn (or relearn) may be helpful for your final project for the course.

In addition, everyone should complete the final reflective section.

Combined, the lesson and assignment should not take you more than three hours -- so if you get to that point, just move on to the reflective section.

When you have finished, upload a copy of the .ipynb file to the eClass assignment page.

### Everyone (if you are using OpenAI):

Log into the OpenAI [platform](https://platform.openai.com/). View the limits page directly at this [link](https://platform.openai.com/account/limits), or reach it through the menu: Settings, then Limits.

If you see a message saying "You must be on a paid plan to manage usage limits", and another message saying "Current Tier: Free" then that means that you are using free credits. The system will not let you spend more than the $5 in free credit (unless you upgrade to a paid account).

If you see an option to set a monthly budget limit and an email notification threshold, and if another message says "Current Tier: Tier 1" (or higher), then you are on a paid plan. In that case please set a monthly limit (make that limit something low like $5 or $10, as you can always reset it higher and this will stop you from incurring unexpected expenses).

In addition to setting the budget limits, the same page will also show you the rate limits that apply to your account. These will show you: which models you have access to, the number of tokens that you are allowed to send per minute, and the number of requests that you are able to send per minute. The consequence of going beyond the rate limits is just that the system will respond with an error message.
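Since exceeding a rate limit only produces an error, one common way to cope is to catch that error, wait, and try again with a longer pause each time. The block below is a minimal sketch of that idea, not a prescribed method. `RateLimited` and `call_with_backoff` are hypothetical names made up for the illustration; with version 1.x of the `openai` package the exception to retry on should be `openai.RateLimitError`, but treat that as an assumption to check against the version you have installed.

```python
import time

class RateLimited(Exception):
    """Hypothetical stand-in for the rate-limit error raised by an API client."""

def call_with_backoff(fn, max_retries=5, base_delay=1.0,
                      sleep=time.sleep, retry_on=(RateLimited,)):
    """Call fn(); on a rate-limit error wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries - 1:
                raise  # out of retries: let the error surface
            sleep(base_delay * 2 ** attempt)

# Demonstration with a fake request that fails twice before succeeding.
attempts = {"n": 0}

def flaky_request():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimited()
    return "ok"

waits = []  # record the pauses instead of actually sleeping
result = call_with_backoff(flaky_request, sleep=waits.append)
print(result, waits)
```

In real use you would leave `sleep` at its default and pass a function that makes the API request in place of `flaky_request`.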

If you are on the free tier and you find the rate limits constraining, or you would like access to the better models, you can add payment information and purchase an additional $5 in credits, which should bump your account into the first paid tier.

You can find additional information about pricing [here](https://openai.com/pricing) and about rate limits [here](https://platform.openai.com/docs/guides/rate-limits). Note that the API does not have a monthly fee. You only pay for usage. The assignment for this module can be completed for (far) less than the $5 in free credits.
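To make pay-per-usage pricing concrete, here is a rough sketch of the arithmetic. The function names are made up for this illustration, and the price is only an example: it is the gpt-3.5-turbo-0125 input rate used in the Beginner Question 3 solution in this notebook, and it may have changed since. Output tokens are billed as well, usually at a different rate, so this covers only the input side.

```python
def cost_usd(num_tokens, usd_per_1k_tokens):
    """Cost in US dollars of sending num_tokens at a per-1,000-token price."""
    return num_tokens * usd_per_1k_tokens / 1000

def tokens_within_budget(budget_usd, usd_per_1k_tokens):
    """Approximate number of tokens that a budget covers at that price."""
    return round(budget_usd / usd_per_1k_tokens * 1000)

# Illustrative price only: $0.0005 per 1,000 input tokens.
price = 0.0005
print(f"8 tokens cost about ${cost_usd(8, price):.6f}")
print(f"$5 covers roughly {tokens_within_budget(5, price):,} input tokens")
```

Check the pricing page for the current rate of whichever model you use before relying on numbers like these.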

Once you have confirmed that you are either on the free tier or that you are on a paid tier and you have set budget limits, indicate that below.

*NOTE: if you requested alternatives to OpenAI in your submitted assignment for Module 3, you can skip this question.*

### Beginner Question 2

Go to OpenAI's platform (log in and select API) here:

https://platform.openai.com/playground

Select Chat mode.

Try asking the system a legal question that arises from one of your other courses. See how the answer compares using different models. See if anything changes when you change the system message. See if anything changes when you change the temperature.

Based on this experimentation, if you wanted to use the system to help answer questions from your courses, what settings and system message would you use and why?

*NOTE: If you requested alternatives to OpenAI in your submitted assignment for Module 3, try instead working with Microsoft's [Bing AI](https://www.bing.com/search?q=Bing+AI&showconv=1), and try adjusting the conversation style -- or alternatively experiment with another online generative AI system of your choice.*

Response:

### Beginner Question 3

Using the best settings you were able to come up with in the playground, select one example of a legal question and use Python to:

(a) print the number of input tokens in the example

(b) calculate and print the cost of sending those tokens to the OpenAI API

(c) send the example (using the settings that you used in the playground for Beginner Question 2) to the OpenAI API and print the result

*NOTE: If you requested alternatives to OpenAI in your submitted assignment for Module 3, here is an alternative question using a free open source system: Review how to use the Huggingface text generation pipeline [here](https://huggingface.co/learn/nlp-course/chapter1/3). The text generation example in the link uses GPT-2, an open-source locally installable precursor to ChatGPT. Using that pipeline, try asking some legal questions and printing the results. How do the answers compare to the answers you got in question 2? Hint: You will need to pip install transformers and torch (and potentially some other dependencies).*

In [None]:
#!pip install tiktoken
#!pip install openai

import tiktoken

def count_tokens(text, encoding_name = "cl100k_base"):
    # cl100k_base is the tiktoken encoding used by the gpt-3.5-turbo and gpt-4 models
    encoding = tiktoken.get_encoding(encoding_name)
    num_tokens = len(encoding.encode(text))
    return num_tokens

question = "What is the refugee definition in Canada?"
num_tokens = count_tokens(question)
print("Question:", question)
print("Number of tokens in question:", num_tokens)
print('Cost to use GPT-3.5-turbo-0125, at $0.0005/1k tokens: ', 100 * num_tokens * 0.0005 / 1000, "cents")



Question: What is the refugee definition in Canada?
Number of tokens in question: 8
Cost to use GPT-3.5-turbo-0125, at $0.0005/1k tokens:  0.0004 cents


In [None]:
from openai import OpenAI

# load API key from .env file (requires: pip install python-dotenv)
from dotenv import load_dotenv
load_dotenv()

# # Alternatively, if using Colab:
# from google.colab import userdata
# import os
# os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')

client = OpenAI()

def get_completion(user_message,
        system_message="You are a helpful assistant to a Canadian law student",
        model = "gpt-3.5-turbo-0125",
        temperature = 0):

    completion = client.chat.completions.create(
        model=model,
        temperature=temperature,
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message}
        ]
    )
    return completion.choices[0].message.content

print(get_completion(question))

In Canada, a refugee is defined as a person who is outside their home country and is unable or unwilling to return due to a well-founded fear of persecution based on race, religion, nationality, political opinion, or membership in a particular social group. This definition is in line with the United Nations 1951 Refugee Convention and its 1967 Protocol, to which Canada is a signatory. 

Refugees in Canada can apply for protection through the refugee system, which includes the Refugee and Humanitarian Resettlement Program. This program allows individuals to seek asylum and protection in Canada if they meet the criteria outlined in the Immigration and Refugee Protection Act.


### Beginner Question 4

Using the same dataset from Module 3, Beginner Question 4 (code to load it is below), create a few-shot prompt (i.e. provide a few examples of inputs and outputs) that extracts the judge's gender from their bio. Apply it to the 10 most recent judicial appointments, put the results in the dataframe, and print the results for those 10 appointments.
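As a rough illustration of what a few-shot prompt looks like as text, the sketch below assembles labelled examples and then leaves the final output blank for the model to complete. The helper name `build_few_shot_prompt` and the example sentences are invented for this illustration; they are not taken from the dataset, and the INPUT/OUTPUT layout is just one reasonable format.

```python
def build_few_shot_prompt(examples, new_input):
    """Format (input, output) pairs, then the new input with its output left blank."""
    parts = [f"INPUT: {text}\nOUTPUT: {label}" for text, label in examples]
    parts.append(f"INPUT: {new_input}\nOUTPUT:")
    return "\n".join(parts)

# Toy examples, made up for the illustration.
examples = [
    ("He was called to the bar in 1958.", "male"),
    ("She was called to the bar in 1969.", "female"),
]

prompt = build_few_shot_prompt(examples, "She was appointed to the court in 2011.")
print(prompt)
```

The resulting string is what you would send as the user message; the model is expected to continue the pattern with a single label.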

Hint: If you are using the free $5 account, you may need to add a delay using the time library to avoid requests-per-minute rate limits.
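One way to choose the delay is to derive it from the requests-per-minute cap shown on your limits page. The sketch below assumes a cap of 3 requests per minute purely as an example (check your own account), and the helper names are hypothetical. It spaces calls 60 / cap seconds apart, plus a safety margin.

```python
import time

def seconds_between_requests(requests_per_minute, safety_margin=1.25):
    """Spacing, in seconds, that keeps a loop under a requests-per-minute cap."""
    return 60.0 / requests_per_minute * safety_margin

def throttled_map(fn, items, requests_per_minute, sleep=time.sleep):
    """Apply fn to each item, pausing between calls (but not before the first)."""
    delay = seconds_between_requests(requests_per_minute)
    results = []
    for i, item in enumerate(items):
        if i > 0:
            sleep(delay)
        results.append(fn(item))
    return results

# Demonstration with a stand-in function and recorded (not real) pauses.
waits = []
out = throttled_map(str.upper, ["a", "b", "c"], requests_per_minute=3,
                    sleep=waits.append)
print(out, waits)
```

In practice `fn` would be the function that sends one bio to the API, and `sleep` would be left at its default so the loop really waits.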

*NOTE: If you requested alternatives to OpenAI in your submitted assignment for Module 3, using the same link as in the prior question, try using the Huggingface categorization pipeline [here](https://huggingface.co/learn/nlp-course/chapter1/3) to extract the judge's gender from the bios*

In [None]:
import pandas as pd

df = pd.read_json("https://raw.githubusercontent.com/srehaag/legal_info_tech_w24/main/judges.json")

# get bios from 2 judges, one male, one female
maleJ = df[df['bio'].str.contains(" he ", case=False)].iloc[0]['bio']
femaleJ = df[df['bio'].str.contains(" she ", case=False)].iloc[0]['bio']

# create prompt prefix:
prefix = f'''INPUT: {maleJ}
OUTPUT: male
INPUT: {femaleJ}
OUTPUT: female
INPUT: '''

# create prompt suffix:
suffix = '''
OUTPUT: '''

# create system message:
system_message = '''You review biographies (inputs) and return the gender
(male or female) of the person described in the biography (output).'''

# function to get gender (run question 3 above to define get_completion function and to setup openai)
# uncomment the import time and time.sleep lines to avoid rate limits on free account

#import time
def get_gender(bio):
    #time.sleep(25)
    return get_completion(prefix + bio + suffix, system_message=system_message)

# get df with last 10 bios
last10df = df.tail(10).copy().reset_index(drop=True)

# apply get_gender to last10df
last10df['gender'] = last10df.bio.apply(get_gender)
last10df


Unnamed: 0,Names of Judges,Date of Appointment,Date of Departure,bio_link,bio,gender
0,The Hon. Richard Wagner,2012-10-05,2017-12-18 Footnote 1,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Right Honourable Richard Wagner is the 18...,male
1,The Hon. Clément Gascon,2014-06-09,2019-09-14,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Mr. Justice Clément Gascon was born in Montrea...,male
2,The Hon. Suzanne Côté,2014-12-01,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Honourable Suzanne Côté was appointed to ...,female
3,The Hon. Russell Brown,2015-08-31,2023-06-12,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Justice Russell Brown was born in Vancouver on...,male
4,The Hon. Malcolm Rowe,2016-10-28,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,"Justice Rowe was born in 1953 in St. John’s, ...",male
5,The Hon. Sheilah L. Martin,2017-12-18,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,"Born and raised in Montréal, Justice Sheilah L...",female
6,The Hon. Nicholas Kasirer,2019-09-16,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Justice Nicholas Kasirer graduated from McGil...,male
7,The Hon. Mahmud Jamal,2021-07-01,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Justice Mahmud Jamal was appointed to the Sup...,male
8,The Hon. Michelle O'Bonsawin,2022-09-01,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Honourable Michelle O’Bonsawin is a widel...,female
9,The Hon. Mary T. Moreau,2023-11-06,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Honourable Mary T. Moreau was born in Edm...,female


In [None]:
# Another Solution: Credit to Shelby Firth for a similar solution

# load df
import pandas as pd
df = pd.read_json("https://raw.githubusercontent.com/srehaag/legal_info_tech_w24/main/judges.json")

# get the 10 most recent judges
df = df.sort_values('Date of Appointment', ascending=False).head(10)

# Extracting judge's gender from their bio (run question 3 above to define get_completion and set up openai)
# uncomment the import time and time.sleep lines to avoid rate limits on free account
#import time

def extract_gender (x):
  bio_text = f"""
  INPUT: William Johnstone Ritchie was born in Annapolis, Nova Scotia, on October 28, 1813. He was the son of Thomas Ritchie and Elizabeth Wildman Johnstone.
  OUTPUT: He / him (Male)
  INPUT: The Right Honourable Beverley McLachlin spent her formative years in Pincher Creek, Alberta and was educated at the University of Alberta.
  OUTPUT: She / her (Female)
  INPUT: {x}
  OUTPUT: """
  response = get_completion(bio_text, system_message="Return only 'She / her (Female)' or 'He / him (Male)'")
  #time.sleep(25)
  print(response)
  return response

# apply function
df['Gender']=df['bio'].apply(extract_gender)


She / her (Female)
She / her (Female)
He / him (Male)
He / him (Male)
She / her (Female)
He / him (Male)
He / him (Male)
She / her (Female)
He / him (Male)
He / him (Male)


In [None]:
# Another Solution, this one using a loop and GPT-4. Credit to Simon Minich

# Install and import relevant utilities
#!pip install openai
#import pandas as pd

# run question 3 above to define get_completion and set up openai
# requires paid account to use gpt-4

# Put data into DataFrame
url = "https://raw.githubusercontent.com/srehaag/legal_info_tech_w24/main/judges.json"
df = pd.read_json(url)

# Establish starting variables
last = len(df) - 1  # index of the most recent appointment (89 in this dataset)
list_judge = []

# Create few shot prompt that triggers AI to determine judge gender + apply to 10 most recent SCC appointments
for i in range(10):
  # Create few shot prompt
  prompt_gender = f'''
  Judge Bio: William Johnstone Ritchie was born in Annapolis, Nova Scotia, on October 28, 1813. He was the son of Thomas Ritchie and Elizabeth Wildman Johnstone. After graduating from the Pictou Academy, he studied law in Halifax in the office of his brother, John William Ritchie. He was called to the bar of Nova Scotia in 1837 but moved to Saint John, New Brunswick, and was called to the bar of that province the following year. In 1846 he was elected to the Legislative Assembly of New Brunswick. In keeping with his pledge to resign if a fellow Liberal candidate failed to win a by-election, he gave up his seat in 1851, only to be re-elected three years later. In 1855 he left politics to accept an appointment to the Supreme Court of New Brunswick, and 10 years later he was named Chief Justice of New Brunswick. He was appointed to the newly established Supreme Court of Canada on September 30, 1875 and became its chief justice on January 11, 1879. He served on the Supreme Court for 17 years. Chief Justice Ritchie died on September 25, 1892, at the age of 78.
  Judge Gender: Male

  Judge Bio: Ronald Martland was born in Liverpool, England, on February 10, 1907. He was\r\n  the son of John Martland and Ada Wild. When he was four years old, his family\r\n  emigrated to Canada and settled in Edmonton. He graduated from high school\r\n  at the age of 14, but he was too young to attend university, so he worked as\r\n  a page in the Alberta Legislature for two years. He then attended the University\r\n  of Alberta and obtained a B.A. in 1926 and an LL.B. two years later. Awarded\r\n  a Rhodes Scholarship, he pursued his studies at Oxford University, earning\r\n  a B.A. there in 1930 and a B.C.L. in 1931. Upon his return to Edmonton in 1932,\r\n  he was called to the bar of Alberta and joined the law firm of Milner, Carr,\r\n  Dafoe & Poirier , with which he practised for over 25 years. On January\r\n  15, 1958, he was appointed to the Supreme Court of Canada. He served on the\r\n  Court for 24 years and retired on February 10, 1982. Justice Martland died\r\n  on November 20, 1997, at the age of 90.
  Judge Gender: Male

  Judge Bio: The Right Honourable Beverley McLachlin spent her formative years in Pincher Creek, Alberta and was educated at the University of Alberta, where she received a B.A. (Honours) in Philosophy in 1965. She pursued her studies at the University of Alberta and, in 1968, received both an M.A. in Philosophy and an LL.B. She was called to the Alberta Bar in 1969 and to the British Columbia Bar in 1971 and practised law in Alberta and British Columbia.  Commencing in 1974, she taught for seven years in the Faculty of Law at the University of British Columbia as a tenured Associate Professor. Her judicial career began in April 1981 when she was appointed to the Vancouver County Court. In September 1981, she was appointed to the Supreme Court of British Columbia. She was elevated to the British Columbia Court of Appeal in December 1985 and was appointed Chief Justice of the Supreme Court of British Columbia in September 1988.  Seven months later, in April 1989, she was sworn in as a Justice of the Supreme Court of Canada. On January 7, 2000, she was appointed Chief Justice of Canada. She is the first woman in Canada to hold this position. In addition to her judicial duties at the Supreme Court, the Right Honourable Beverley McLachlin   chaired the Canadian Judicial Council, the Advisory Council of the Order of Canada and the Board of Governors of the National Judicial Institute. She  is the author of numerous articles and publications. She retired on December 15, 2017. The Right Honourable Beverley McLachlin bids farewell to the Supreme Court The Right Honourable Beverley McLachlin bids farewell to the Supreme Court .
  Judge Gender: Female

  Judge Bio: Mr. Justice Major, B.Com., LL.B. Born in Mattawa, Ontario, 1931. Son of William \r\n  and Elsie Major. Married to H\u00e9l\u00e8ne Provencher , 1959. Children: \r\n  Suzan, Peter, Paul and Steven. Educated at Loyola College (now Concordia University), \r\n  Montr\u00e9al, and University of Toronto. Hon. LL.D.: Concordia University, \r\n  2003; University of Calgary, 2005; University of Toronto, 2005. Called to the \r\n  Alberta Bar, 1958. Practised law with Bennett, Jones, Verchere at Calgary and \r\n  became senior partner in 1967. Appointed Q.C., 1972. Elected Fellow of the American \r\n  College of Trial Lawyers, 1980. Counsel for the Canadian Medical Protective \r\n  Association (Alberta), 1971-91. Senior Counsel for the City of Calgary Police \r\n  Service, 1975-85; Counsel at the McDonald Commission re RCMP, 1978-82; Counsel, \r\n  Royal Commission into the collapse of the CCB and Northland Bank (Estey Commission); \r\n  Senior Counsel for the Province of Alberta at the Code Inquiry into the collapse \r\n  of the Principal Group of Companies, 1987. Appointed to the Alberta Court of \r\n  Appeal, July 11, 1991. Appointed to the Supreme Court of Canada, November 13, \r\n  1992. Justice Major retired on December 25, 2005.
  Judge Gender: Male

  Judge Bio: Justice  Andromache Karakatsanis was appointed to the Supreme Court of Canada in October  2011. She had been appointed a judge of the Court of Appeal for Ontario in  March 2010 and a judge of the Ontario Superior Court of Justice in December  2002. Justice  Karakatsanis is a graduate of the University of Toronto and Osgoode Hall Law  School. Following  her call to the Bar in 1982, Andromache Karakatsanis served as a law clerk to  the Ontario Court of Appeal. In private practice, she practised criminal, civil  and family litigation in Toronto for several years. She then served in the  Ontario Public Service for 15 years in a number of senior positions. During  her career in public service, Andromache Karakatsanis served as Chair and Chief  Executive Officer of the Liquor Licence Board of Ontario (1988-95); as  Assistant Deputy Attorney General and Secretary for Native Affairs (1995-97);  and as Deputy Attorney General (1997-2000). Andromache  Karakatsanis served as Ontario's Secretary of the Cabinet and Clerk of the  Executive Council from July 2000 to November 2002. As the province's senior  public servant, she provided leadership to the Ontario Public Service and to the  deputy ministers. While  in the public service, Justice Karakatsanis was actively involved in issues  related to education and reform in the field of administrative justice. She was  a recipient of the Society of Ontario Adjudicators and Regulators (SOAR) Medal  in 1996 for outstanding service to Ontario's administrative justice system. Justice  Karakatsanis volunteered extensively with the YMCA of Greater Toronto from 1990  to 2002 and held a number of senior positions, including that of Chair of the  Board of Directors. She also served as a member of the Board of the Public  Policy Forum and of Canadian Policy and Research Networks (CPRN). Justice  Karakatsanis was born in Toronto on October 3, 1955. She is married to Tom  Karvanis and they have two children, Paul and Rhea. 
Webcast of the ceremony in honour of the Honourable Andromache Karakatsanis , held on November 14, 2011.
  Judge Gender: Female

  Judge Bio: {df["bio"][last]}
  Judge Gender: '''
  recent_ten = get_completion(prompt_gender, model = "gpt-4")
  list_judge.append(recent_ten)
  last = last - 1

# Put information into DataFrame and display it for 10 most recent SCC appointments
list_judge.reverse()
df = df.tail(10).copy()  # copy so that adding a column does not trigger SettingWithCopyWarning
df["Gender"] = list_judge
df

Unnamed: 0,Names of Judges,Date of Appointment,Date of Departure,bio_link,bio,Gender
80,The Hon. Richard Wagner,2012-10-05,2017-12-18 Footnote 1,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Right Honourable Richard Wagner is the 18...,Male
81,The Hon. Clément Gascon,2014-06-09,2019-09-14,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Mr. Justice Clément Gascon was born in Montrea...,Male
82,The Hon. Suzanne Côté,2014-12-01,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Honourable Suzanne Côté was appointed to ...,Female
83,The Hon. Russell Brown,2015-08-31,2023-06-12,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Justice Russell Brown was born in Vancouver on...,Male
84,The Hon. Malcolm Rowe,2016-10-28,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,"Justice Rowe was born in 1953 in St. John’s, ...",Male
85,The Hon. Sheilah L. Martin,2017-12-18,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,"Born and raised in Montréal, Justice Sheilah L...",Female
86,The Hon. Nicholas Kasirer,2019-09-16,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Justice Nicholas Kasirer graduated from McGil...,Male
87,The Hon. Mahmud Jamal,2021-07-01,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Justice Mahmud Jamal was appointed to the Sup...,Male
88,The Hon. Michelle O'Bonsawin,2022-09-01,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Honourable Michelle O’Bonsawin is a widel...,Female
89,The Hon. Mary T. Moreau,2023-11-06,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Honourable Mary T. Moreau was born in Edm...,Female


In [None]:
# Another approach: Credit to Aidan Ryan for a version of this solution

import pandas as pd
df = pd.read_json("https://raw.githubusercontent.com/srehaag/legal_info_tech_w24/main/judges.json").tail(10).copy()
df['Gender'] = 'NA'

user_message = """
INPUT:Justice Marie Deschamps received a Licentiate in Laws from the Universit\u00e9 de Montr\u00e9al in 1974  and an LL.M. from McGill University in 1983. The Universit\u00e9 de Montr\u00e9al awarded her an honorary doctorate in 2008. She was called to  the Quebec Bar in 1975 and practised as a trial lawyer at Martineau Walker and Sylvestre et Matte in commercial, family and civil law, then at Rouleau ,  Rumanek and Sirois in criminal law, and finally at Byers Casgrain in commercial  and civil law. She was appointed  to the Quebec Superior Court on March 29, 1990, to the Quebec Court of Appeal  on May 6, 1992 and to the Supreme Court of Canada on August 7, 2002. Justice Deschamps has participated in the Universit\u00e9 de Montr\u00e9al 's advocacy classes for many  years and in the Barreau du Qu\u00e9bec 's advocacy seminars for more than 25  years. She has also been an adjunct professor in the law faculty of the Universit\u00e9 de Sherbrooke since 2006 and in that of McGill University since 2012. She has been a member  of the board of the Universit\u00e9 de Montr\u00e9al and a member of the board of  directors of the Universit\u00e9 de Montr\u00e9al 's alumni association. She also  sat on the advisory committee on reform of the Bankruptcy Act in 1986  and on the Competition Tribunal advisory council from 1986 to 1990. She  was inducted as a member of the American College of Trial Lawyers in 2005. While at the  Supreme Court of Canada, Justice Deschamps took a particular interest in the  Court's Law Clerk Program, and she also sat on a number of committees of the  Canadian Judicial Council and the National Judicial Institute. Justice Deschamps was born in Repentigny, Quebec, on October 2, 1952. Her spouse, Paul  Gobeil , is a businessman. They have two children: Val\u00e9rie and Maxime . Justice Deschamps is a sports enthusiast who skis, swims, hikes and jogs. 
Her interests  also include art, travel and languages (in addition to French, her mother  tongue, she speaks fluent English and has studied Italian and Spanish). She retired from  the judiciary on August 7, 2012.
OUTPUT:F
INPUT:Born on December 25, 1940 in Winnipeg, Manitoba, Marshall Rothstein went  to school in Winnipeg. He then attended the University of Manitoba, where he  earned a B. Com. in 1962 and an LL.B. in 1966. He is married to Montreal native Sheila Dorfman and the couple have four  children, Ronald, Douglas, Tracey and Robert, and six grandchildren. After being called to the Manitoba Bar in 1966, he started his career at  Thorvaldson, Eggertson, Saunders and Mauro before moving to Aikins, MacAulay  & Thorvaldson in 1969, where he was a partner from 1972 to 1992 and a  member and periodic Chairman of the Management Committee\/Executive Board from  1981 to 1992. He was appointed Queen's  Counsel in 1979. He served as an  adjudicator under the Manitoba Human Rights Act from 1978 to 1983 and as  a member of the Canadian Human Rights Tribunal from 1986 to 1992. In his practice, he appeared before federal and Manitoba administrative  tribunals, the Manitoba Court of Queen's  Bench, the Manitoba Court of Appeal, the Federal Court - Trial Division, the  Federal Court of Appeal and the Supreme Court of Canada. Justice Rothstein taught transportation law  as a lecturer in the University of Manitoba's  Faculty of Law from 1970 to 1983 and from 1988 to 1992, and contract law in the  University's  Extension Department from 1970 to 1975. He was a Bar Admission Course lecturer for the Law Society of Manitoba  from 1970 to 1975. 
He also held many other offices: Secretary (Administrator),  Civil Legal Aid Committee, Law Society of Manitoba, 1968-70; Chairman,  Commission on Compulsory Retirement (Manitoba), 1981-82; Chairman, Ministerial  Task Force on International Air Policy (Canada), 1990-91; Member and Chairman,  Manitoba Transportation Industry Development Advisory Committee, 1985-87 and  1987-90 respectively; Member, Airports Task Force, 1985-86; Member, Airports  Transfer Advisory Board, 1988-92; and Member, External Advisory Committee,  University of Manitoba Transport Institute, 1989-92. Justice Rothstein was appointed to the Trial Division of the Federal  Court of Canada on June 24, 1992; while a judge of the Trial Division, he also  served as a member ex officio of the Appeal Division, a judge of the  Court Martial Appeal Court of Canada and a judicial member of the Competition  Tribunal. He was elevated to the Federal  Court of Appeal on January 21, 1999, and, finally, to the Supreme Court of  Canada on March 1, 2006. He retired on August 31, 2015.
OUTPUT:M
INPUT:Justice  Andromache Karakatsanis was appointed to the Supreme Court of Canada in October  2011. She had been appointed a judge of the Court of Appeal for Ontario in  March 2010 and a judge of the Ontario Superior Court of Justice in December  2002. Justice  Karakatsanis is a graduate of the University of Toronto and Osgoode Hall Law  School. Following  her call to the Bar in 1982, Andromache Karakatsanis served as a law clerk to  the Ontario Court of Appeal. In private practice, she practised criminal, civil  and family litigation in Toronto for several years. She then served in the  Ontario Public Service for 15 years in a number of senior positions. During  her career in public service, Andromache Karakatsanis served as Chair and Chief  Executive Officer of the Liquor Licence Board of Ontario (1988-95); as  Assistant Deputy Attorney General and Secretary for Native Affairs (1995-97);  and as Deputy Attorney General (1997-2000). Andromache  Karakatsanis served as Ontario's Secretary of the Cabinet and Clerk of the  Executive Council from July 2000 to November 2002. As the province's senior  public servant, she provided leadership to the Ontario Public Service and to the  deputy ministers. While  in the public service, Justice Karakatsanis was actively involved in issues  related to education and reform in the field of administrative justice. She was  a recipient of the Society of Ontario Adjudicators and Regulators (SOAR) Medal  in 1996 for outstanding service to Ontario's administrative justice system. Justice  Karakatsanis volunteered extensively with the YMCA of Greater Toronto from 1990  to 2002 and held a number of senior positions, including that of Chair of the  Board of Directors. She also served as a member of the Board of the Public  Policy Forum and of Canadian Policy and Research Networks (CPRN). Justice  Karakatsanis was born in Toronto on October 3, 1955. She is married to Tom  Karvanis and they have two children, Paul and Rhea. 
Webcast of the ceremony in honour of the Honourable Andromache Karakatsanis , held on November 14, 2011.
OUTPUT:F
INPUT:William Stevenson was born in Edmonton, Alberta, on May 7, 1934.\r\n          He is the son of Alexander Lindsay Stevenson and Eileen Harriet Burns.\r\n          He studied at the University of Alberta, obtaining a B.A. in 1956 and\r\n          an LL.B. the following year. Called to the bar in 1958, he joined the\r\n          law firm of Morrow, Morrow & Reynolds in Edmonton. In 1959 he was\r\n          counsel on the last case from Canada to be appealed to the Judicial\r\n          Committee of the Privy Council in London, England. He became a lecturer\r\n          at the University of Alberta in 1963 and a full-time professor of law\r\n          five years later. In 1970 he returned to private practice but continued\r\n          to lecture on law. He is the author of many legal texts and a founding\r\n          editor of the Alberta Law Review. He was appointed to the District\r\n          Court of Alberta in 1975 and to the Court of Queen's Bench of Alberta\r\n          in 1979. A year later, he was appointed to the Alberta Court of Appeal,\r\n          and on September 17, 1990, he was elevated to the Supreme Court of\r\n          Canada. Justice Stevenson served on the Supreme Court for nearly two\r\n          years before retiring on June 5, 1992. Justice Stevenson died on July 7, 2021, at the age of 87.
OUTPUT:M
INPUT:"""

for index, row in df.iterrows():
    message = user_message + str(row['bio']) + "\nOUTPUT:"
    df.at[index, 'Gender'] = get_completion(message, model='gpt-3.5-turbo')

df

Unnamed: 0,Names of Judges,Date of Appointment,Date of Departure,bio_link,bio,Gender
80,The Hon. Richard Wagner,2012-10-05,2017-12-18 Footnote 1,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Right Honourable Richard Wagner is the 18...,M
81,The Hon. Clément Gascon,2014-06-09,2019-09-14,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Mr. Justice Clément Gascon was born in Montrea...,M
82,The Hon. Suzanne Côté,2014-12-01,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Honourable Suzanne Côté was appointed to ...,F
83,The Hon. Russell Brown,2015-08-31,2023-06-12,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Justice Russell Brown was born in Vancouver on...,M
84,The Hon. Malcolm Rowe,2016-10-28,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,"Justice Rowe was born in 1953 in St. John’s, ...",M
85,The Hon. Sheilah L. Martin,2017-12-18,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,"Born and raised in Montréal, Justice Sheilah L...",F
86,The Hon. Nicholas Kasirer,2019-09-16,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Justice Nicholas Kasirer graduated from McGil...,M
87,The Hon. Mahmud Jamal,2021-07-01,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,Justice Mahmud Jamal was appointed to the Sup...,M
88,The Hon. Michelle O'Bonsawin,2022-09-01,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Honourable Michelle O’Bonsawin is a widel...,F
89,The Hon. Mary T. Moreau,2023-11-06,,https://www.scc-csc.ca/judges-juges/bio-eng.as...,The Honourable Mary T. Moreau was born in Edm...,F


In [None]:
# Another solution: Credit to Brooke Ash who came up with a similar solution

# Run question 3 to set up openai
import time

# Load the data
import pandas as pd
df = pd.read_json("https://raw.githubusercontent.com/srehaag/legal_info_tech_w24/main/judges.json")

# Set up few shot prompt:
few_shot_prompt = """
Given a judge's bio, extract the judge's gender.

---
Input: Judge bio: "John Doe is a distinguished jurist with over 20 years of experience in the legal field."
Output: Male

Input: Judge bio: "Jane Smith is a highly respected attorney known for her commitment to justice and equality."
Output: Female

Input: Judge bio: "Dylan Smith brings a wealth of legal knowledge and expertise to the bench."
Output: Non-binary"""

results = []
error_count = 0
max_errors = 10
for bio in df['bio'].tail(10):
    gender = None  # stays None if every attempt for this bio fails
    while error_count <= max_errors:
        try:
            response = client.chat.completions.create(
                model="gpt-3.5-turbo",
                temperature=0.7,
                max_tokens=100,
                messages=[
                    {"role": "system", "content": few_shot_prompt},
                    # the bio to classify is sent as the user message
                    {"role": "user", "content": f"Input: Judge bio: {bio}"}
                ]
            )
            gender = response.choices[0].message.content
            break
        except Exception as e:
            error_count += 1
            if error_count > max_errors:
                print("Reached maximum errors")
                break
            print(f"Request failed ({e}). Waiting for 20 seconds.")
            time.sleep(20)
    # always append, so results lines up with the 10 bios
    results.append(gender)

results_df = pd.DataFrame({'Judge Bio': df['bio'].tail(10), 'Judge Gender': results})

results_df


Unnamed: 0,Judge Bio,Judge Gender
80,The Right Honourable Richard Wagner is the 18...,Male
81,Mr. Justice Clément Gascon was born in Montrea...,Male
82,The Honourable Suzanne Côté was appointed to ...,Female
83,Justice Russell Brown was born in Vancouver on...,Male
84,"Justice Rowe was born in 1953 in St. John’s, ...",Male
85,"Born and raised in Montréal, Justice Sheilah L...",Female
86,Justice Nicholas Kasirer graduated from McGil...,Male
87,Justice Mahmud Jamal was appointed to the Sup...,Male
88,The Honourable Michelle O’Bonsawin is a widel...,Female
89,The Honourable Mary T. Moreau was born in Edm...,Female
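
The retry loop in the solution above waits a fixed 20 seconds after each error. A common alternative is exponential backoff, where the wait doubles after each failure. The sketch below is only an illustration under stated assumptions: `call_with_backoff`, `flaky`, and the `sleep` parameter are hypothetical names that do not appear in the assignment, and the "API call" is simulated so that the example runs without an OpenAI key.

```python
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure wait base_delay * 2**attempt seconds and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** attempt)


# Simulated request (hypothetical): fails twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit error")
    return "ok"


delays = []  # record the waits instead of actually sleeping
result = call_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

In a real notebook you would pass a function that wraps the `client.chat.completions.create(...)` call and leave `sleep` at its default.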


### Intermediate Question 1

A guide to fine tuning on OpenAI's developer platform is available [here](https://platform.openai.com/docs/guides/fine-tuning/use-a-fine-tuned-model).

In the code cell below, there are URLs to two datasets: a training dataset (100 rows) and a testing dataset (10 rows).

In both, the input column is a docket entry from the Federal Court's online dockets, and the output column is the name of the judge from the entry (or "other" if the entry names no judge). Note that the data is in both English and French.

Using the training dataset, fine tune a model that extracts the name of the judge. Apply that model to the test dataframe. Print the resulting test dataframe.

How does the accuracy of the fine tuned model compare to one shot or few shot approaches?

Hint: Fine tuning requires a JSONL file rather than a JSON file. You can export in JSONL format (like JSON, but with each line being a separate dictionary) via Pandas. To do so, when exporting set orient='records' and lines=True.
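
As a rough illustration of the hint, the sketch below builds a small dataframe of chat-formatted examples and exports it as JSONL with Pandas. The two docket entries, the `SYSTEM` text, and the variable names are hypothetical placeholders, not rows from the course datasets. The export is kept in memory as a string so that the example runs without writing a file; passing a filename as the first argument to `to_json` would write it to disk instead.

```python
import json

import pandas as pd

# Hypothetical rows standing in for the training spreadsheet
rows = pd.DataFrame({
    "input": [
        "(Final decision) Order rendered by The Honourable Madam Justice Example",
        "Letter from Applicant (hypothetical entry)",
    ],
    "output": ["Example", "other"],
})

SYSTEM = "Return the judge's name, or other if no judge is named."  # placeholder

# One row per training example, each holding a list of chat messages
chat = pd.DataFrame({
    "messages": [
        [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": r.input},
            {"role": "assistant", "content": r.output},
        ]
        for r in rows.itertuples(index=False)
    ]
})

# lines=True only works together with orient="records"
jsonl_text = chat.to_json(orient="records", lines=True, force_ascii=False)

# Each non-empty line is a complete JSON object
records = [json.loads(line) for line in jsonl_text.splitlines() if line]
print(len(records), records[0]["messages"][2])
```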

*NOTE: If you requested alternatives to OpenAI in your submitted assignment for Module 3 you can try to fine tune any other system of your choice, including [spaCy](https://spacy.io/usage/training) or BERT via [Hugging Face](https://huggingface.co/learn/nlp-course/chapter3/1). Unless you already have experience with this, fine tuning on these other systems is probably a more difficult task than the Advanced Question 1 below, so if you are not using OpenAI maybe try that one first.*

In [None]:
import pandas as pd

train_url = "https://refugeelab.ca/wp-content/uploads/2024/02/train_data_mod4.xlsx"
test_url = "https://refugeelab.ca/wp-content/uploads/2024/02/test_data_mod4.xlsx"

df_train = pd.read_excel(train_url)
df_test = pd.read_excel(test_url)

# view df_train
df_train.head()

Unnamed: 0,input,output
0,(Décision finale) Jugement rendu(e) à Ottawa l...,other
1,(Décision finale) Jugement rendu(e) par Madame...,Elizabeth Walker
2,(Décision finale) Jugement rendu(e) par Monsie...,Roy
3,(Décision finale) Motifs de jugement et jugeme...,S. Noël
4,(Décision finale) Motifs de jugement et jugeme...,LeBlanc


In [None]:
# create jsonl file for training

# create system message:

system_message = '''You review court dockets and return a judge name if a judge is
listed by their individual name, and you return other if no judge is named or if
for any reason you are not entirely sure about the name of the judge'''

# create list of dictionaries
train_data = []
for i, row in df_train.iterrows():
    train_dict = {"messages":
                  [{"role": "system", "content": system_message},
                   {"role": "user", "content": row['input']},
                   {"role": "assistant", "content": row['output']}
                ]}
    train_data.append(train_dict)

# write to jsonl file
import json
with open('train_data_judges.jsonl', 'w') as outfile:
    for entry in train_data:
        json.dump(entry, outfile)
        outfile.write('\n')
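
Before uploading, it can be worth checking that every line of the JSONL file has the shape the training code above is meant to produce. The sketch below is one possible check, not an official validator: `find_problems` and `ALLOWED_ROLES` are hypothetical names, and the role list only covers the three roles used in this notebook. One thing it catches is an empty spreadsheet cell, which Pandas reads as NaN and `json.dump` writes as a bare `NaN` rather than a string.

```python
import json

ALLOWED_ROLES = {"system", "user", "assistant"}  # roles used in this notebook


def find_problems(jsonl_text):
    """Return (line_number, description) pairs for lines that look wrong."""
    problems = []
    for n, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((n, "not valid JSON"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((n, "missing messages list"))
            continue
        for m in messages:
            if not isinstance(m, dict) or m.get("role") not in ALLOWED_ROLES:
                problems.append((n, "unexpected role"))
                break
            if not isinstance(m.get("content"), str):
                problems.append((n, "content is not a string"))
                break
        else:
            # Only reached when every message passed the checks above
            if messages[-1]["role"] != "assistant":
                problems.append((n, "last message is not from the assistant"))
    return problems


def example(answer):
    # Hypothetical training example with a placeholder docket entry
    return json.dumps({"messages": [
        {"role": "system", "content": "placeholder system message"},
        {"role": "user", "content": "placeholder docket entry"},
        {"role": "assistant", "content": answer},
    ]})


sample = "\n".join([example("other"), example(float("nan")), '{"messages": ['])
problems = find_problems(sample)
print(problems)
```

To check the real file, read it with `open('train_data_judges.jsonl').read()` and pass the text to `find_problems`.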

In [None]:
# upload to OpenAI and note file ID
# NOTE: Alternatively, upload and submit via OpenAI fine tuning web interface:
# https://platform.openai.com/finetune

from openai import OpenAI

from dotenv import load_dotenv
load_dotenv()
# or use Colab secrets

client = OpenAI()

client.files.create(
  file=open("train_data_judges.jsonl", "rb"),
  purpose="fine-tune",
)

FileObject(id='file-KNMUo4aukLyDdFwGKzkaRgUC', bytes=77600, created_at=1707343909, filename='train_data_judges.jsonl', object='file', purpose='fine-tune', status='processed', status_details=None)

In [None]:
# Send instructions to OpenAI to fine-tune the model
client.fine_tuning.jobs.create(
  training_file="file-KNMUo4aukLyDdFwGKzkaRgUC",
  model="gpt-3.5-turbo",
)

FineTuningJob(id='ftjob-qu0HGpoOAPTK8Geh7xXrl47O', created_at=1707344077, error=None, fine_tuned_model=None, finished_at=None, hyperparameters=Hyperparameters(n_epochs='auto', batch_size='auto', learning_rate_multiplier='auto'), model='gpt-3.5-turbo-0613', object='fine_tuning.job', organization_id='org-OBNXT1YSS6IuFCMYYC5ICu3T', result_files=[], status='validating_files', trained_tokens=None, training_file='file-KNMUo4aukLyDdFwGKzkaRgUC', validation_file=None)

In [None]:
# Checking to see if the fine-tuning job is complete
# Will also receive an email at address associated with OpenAI account
client.fine_tuning.jobs.retrieve('ftjob-qu0HGpoOAPTK8Geh7xXrl47O')

FineTuningJob(id='ftjob-qu0HGpoOAPTK8Geh7xXrl47O', created_at=1707344077, error=None, fine_tuned_model='ft:gpt-3.5-turbo-0613:refugee-law-lab::8pkgSS3i', finished_at=1707344870, hyperparameters=Hyperparameters(n_epochs=3, batch_size=1, learning_rate_multiplier=2), model='gpt-3.5-turbo-0613', object='fine_tuning.job', organization_id='org-OBNXT1YSS6IuFCMYYC5ICu3T', result_files=['file-kbzpzFvJEkqvXRhjhlWNIVB1'], status='succeeded', trained_tokens=50313, training_file='file-KNMUo4aukLyDdFwGKzkaRgUC', validation_file=None)
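
Rather than re-running the retrieve cell by hand, you could poll until the job reaches a final status. This is a minimal sketch under stated assumptions: `wait_for_job`, `fake_retrieve`, and the `sleep` parameter are hypothetical names, and the job is simulated with a stand-in object so that the example runs without calling OpenAI. The statuses `validating_files` and `succeeded` appear in the outputs above; `running`, `failed`, and `cancelled` are other statuses the API reports.

```python
import time
from types import SimpleNamespace

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, poll_seconds=60, max_polls=60, sleep=time.sleep):
    """Call retrieve(job_id) repeatedly until the job reaches a final status."""
    for _ in range(max_polls):
        job = retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError(f"{job_id} did not finish after {max_polls} polls")


# Simulated job (hypothetical) that succeeds on the third check
statuses = iter(["validating_files", "running", "succeeded"])


def fake_retrieve(job_id):
    return SimpleNamespace(id=job_id, status=next(statuses))


naps = []  # record the waits instead of actually sleeping
job = wait_for_job(fake_retrieve, "ftjob-hypothetical", poll_seconds=30, sleep=naps.append)
print(job.status, naps)

# With the real client this would look like:
# job = wait_for_job(client.fine_tuning.jobs.retrieve, 'ftjob-qu0HGpoOAPTK8Geh7xXrl47O')
# print(job.fine_tuned_model)
```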

In [None]:
# Model created = ft:gpt-3.5-turbo-0613:refugee-law-lab::8pkgSS3i
# Try on test data

# function to get completion on docket
system_message = '''You review court dockets and return a judge name if a judge is
listed by their individual name, and you return other if no judge is named or if
for any reason you are not entirely sure about the name of the judge'''
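# import time  # uncomment, along with the sleep below, if you hit rate limits
def get_judge_name(docket_text):
    # time.sleep(25)
    return get_completion(docket_text,
            system_message=system_message,
            model="ft:gpt-3.5-turbo-0613:refugee-law-lab::8pkgSS3i"
            )
# NOTE: this overwrites the expected labels in the 'output' column with the model's answers
df_test['output'] = df_test['input'].apply(get_judge_name)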
df_test

# Seems to be working well!


Unnamed: 0,input,output
0,Letter from Applicant dated 12-SEP-2019 advisi...,other
1,"(Final decision) Order granting the motion, on...",other
2,(Final decision) Order rendered by The Presidi...,other
3,(Décision finale) Ordonnance rejetant la deman...,other
4,(Final decision) Order rendered by The Honoura...,Favel
5,(Décision finale) Ordonnance rendu(e) par Mada...,Roussel
6,(Final decision) Order rendered by Associate C...,Gagné
7,Order rendered by The Honourable Madam Justice...,Heneghan
8,(Final decision) Order rendered by The Honoura...,Annis
9,Order rendered by The Honourable Mr. Justice B...,Brown
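
To answer the accuracy question with a number rather than by eye, you could compare the model's answers against the expected labels. Note that the cell above writes the predictions into the `output` column, so the expected labels would need to be kept in a separate copy first. The sketch below is one simple way to score the results: `normalize` and `accuracy` are hypothetical helper names, the two lists are made-up values rather than real results, and exact matching after normalizing case and whitespace is only one of several reasonable scoring choices.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(str(name).strip().lower().split())


def accuracy(predicted, expected):
    """Share of predictions that match the expected label after normalizing."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        return 0.0
    hits = sum(normalize(p) == normalize(e) for p, e in zip(predicted, expected))
    return hits / len(expected)


# Made-up values for illustration only
expected = ["other", "Favel", "Roussel", "Gagné"]
predicted = ["other", "favel ", "Roussel", "Annis"]

score = accuracy(predicted, expected)
print(f"{score:.0%}")
```

The same function can score a zero shot or few shot column, which gives a like-for-like comparison with the fine tuned model.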


In [None]:
# Alternative Approach by Aidan Ryan: PART I: Fine Tune the Model

import pandas as pd
from openai import OpenAI
import json

train_url = "https://refugeelab.ca/wp-content/uploads/2024/02/train_data_mod4.xlsx"
test_url = "https://refugeelab.ca/wp-content/uploads/2024/02/test_data_mod4.xlsx"

df_train = pd.read_excel(train_url)
df_test = pd.read_excel(test_url)

training_data = []

for index, row in df_train.iterrows():
  training_data.append({'messages': [
      {"role":"system", "content": "Your goal is to extract the last name of the judge from the text."},
      {"role":"user", "content": row['input']},
      {"role":"assistant", "content": row['output']}
  ]})

with open('training_data.jsonl', 'w') as jsonl_file:
    for item in training_data:
        jsonl_file.write(json.dumps(item) + '\n')

#Manually used jsonl training data to fine tune a model through the openai platform, got model:
#ft:gpt-3.5-turbo-1106:personal::8pOlMgFS

In [None]:
# Alternative Approach by Aidan Ryan: PART II - Apply the Model

# log in using either .env or colab secrets
# (load the .env file before creating the client so the API key is available)
from dotenv import load_dotenv
load_dotenv()

client = OpenAI()

df_test['Naive GPT Result'] = 'NA'
df_test['Fine-Tuned Result'] = 'NA'

# Same system message for both models, so only the model differs
extraction_prompt = "Your goal is to extract the last name of the judge from the text, choosing other if no judge is named."

for index, row in df_test.iterrows():
  df_test.at[index, 'Naive GPT Result'] = get_completion(
      row['input'], system_message=extraction_prompt, model='gpt-3.5-turbo')
  df_test.at[index, 'Fine-Tuned Result'] = get_completion(
      row['input'], system_message=extraction_prompt, model='ft:gpt-3.5-turbo-1106:personal::8pOlMgFS')


### Advanced Question 1

The Ontario Ministry of Labour maintains a database of collective agreements.

The most recent York University and CUPE Local 3903 collective agreement (2020-2023) is available here:

https://ws.lr.labour.gov.on.ca/CA/doc/611-32365-23%20(805-0053)?library=Education%20and%20Related%20Services

Create a local Retrieval Augmented Generation (RAG) application that:

* Loads the text of the collective agreement at the URL (it is a PDF)
* Splits the text into manageable text chunks
* Creates embeddings for the chunks (I recommend using OpenAI's embeddings) and stores them in a vector store (I recommend using Chroma locally)
* Allows the user to ask questions about the document using a generative AI model (I recommend using one of the OpenAI models), with the RAG system providing the most relevant chunks of the document for context.

With that set up, have the user ask the system what the Collective Agreement has to say about Bill 124 and print the output.

Hint: I recommend using LangChain. There is a simple guide [here](https://python.langchain.com/docs/use_cases/question_answering/quickstart), but note that you will need to use a different LangChain document loader (i.e. PDF not HTML). Also, if you'd like a nicer user interface than what can be achieved via the "input" function, try using Gradio.
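
To see what the split, embed, and retrieve steps listed above are doing, here is a toy version that uses only the Python standard library. It is a sketch for intuition, not a substitute for LangChain, Chroma, or OpenAI embeddings: the "embedding" is just a word-count vector, the three chunks are invented sentences rather than text from the collective agreement, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=80, chunk_overlap=20):
    """Cut text into fixed-size character windows that overlap their neighbours."""
    step = chunk_size - chunk_overlap
    stop = max(len(text) - chunk_overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, stop, step)]


def embed(text):
    """Toy 'embedding': word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, context_chunks):
    """Put the retrieved chunks in front of the question for the generative model."""
    context = "\n\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Invented chunks for illustration (not text from the agreement)
chunks = [
    "Article 1 (hypothetical): union recognition and membership.",
    "Article 2 (hypothetical): wage increases stay capped while Bill 124 applies.",
    "Article 3 (hypothetical): vacation and sick leave entitlements.",
]

question = "What does the agreement say about Bill 124?"
top_chunks = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

Word counts are easily fooled by common words such as "the", which is one reason a real application uses learned embeddings and a vector store instead.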

In [None]:
#Source: https://python.langchain.com/docs/use_cases/question_answering/quickstart

#!pip install langchain
#!pip install langchain-community
#!pip install langchain-openai
#!pip install pypdf
#!pip install chromadb
#!pip install langchainhub
#!pip install python-dotenv

from langchain import hub
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from dotenv import load_dotenv
load_dotenv()

# load document
loader = PyPDFLoader("https://ws.lr.labour.gov.on.ca/CA/doc/611-32365-23%20(805-0053)?library=Education%20and%20Related%20Services")
docs = loader.load()

# Create vectorstore with embeddings from document
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=100)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())

# Set up RAG chain
retriever = vectorstore.as_retriever()
prompt = hub.pull("rlm/rag-prompt")
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0.5)

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)


In [None]:
# Run RAG chain
rag_chain.invoke("What does the agreement say about bill 124?")

'The agreement states that if Bill 124 is repealed or successfully challenged, the parties agree to re-negotiate the salary and compensation provisions of the collective agreement that were limited by Bill 124. This re-negotiation will be done within the limits of the law and taking into account the years of service accumulated up to that time.'