**Use Case - Using OpenAI to extract the outcome of all available judgements in the Court of Criminal Appeal (Superior)**

Up to July 23 there were 139 appeals available on e-courts. We will use OpenAI chat completion to determine the outcome of each appeal. Only 27 were successful or partially successful; the other 112 were rejected.

The data was extracted previously and stored in a SQL Server database. This is a particularly interesting task because quite a few of the judgements are in Maltese.

In [None]:
import pyodbc
import pandas as pd
from tqdm import tqdm
from easygoogletranslate import EasyGoogleTranslate

Importing the dataset from SQL Server

In [1]:
# Connection details
conn_str = (
    r'DRIVER={ODBC Driver 17 for SQL Server};'
    r'SERVER=RG_LAPTOP\SQLEXPRESS;'
    r'DATABASE=master;'
    r'Trusted_Connection=yes;'
)

# SQL query
query = '''
    SELECT *
    FROM CourtJudgementsText
    WHERE CaseNumber IN (
        SELECT CaseNumber
        FROM CourtJudgements_Staging
        WHERE Court LIKE '%OF CRIMINAL APPEAL (SUPERIOR)%'
    )
'''

# Establishing the connection
conn = pyodbc.connect(conn_str)

# Executing the query and fetching the data into a DataFrame
df = pd.read_sql_query(query, conn)

# Closing the connection
conn.close()
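The cell above closes the connection only if the query succeeds. A minimal sketch of a more defensive pattern is shown here: `contextlib.closing` releases the connection even when the query raises, and the court name is passed as a `?` parameter instead of being embedded in the SQL text. To keep it runnable anywhere it assumes an in-memory SQLite database as a stand-in for SQL Server; `fetch_judgments`, `build_toy_db` and the two toy rows are hypothetical and not part of the original notebook.

```python
import sqlite3
from contextlib import closing

# Same shape as the notebook query, with the LIKE pattern as a parameter
QUERY = """
    SELECT CaseNumber, JudgmentText
    FROM CourtJudgementsText
    WHERE CaseNumber IN (
        SELECT CaseNumber
        FROM CourtJudgements_Staging
        WHERE Court LIKE ?
    )
"""

def fetch_judgments(conn, court_pattern):
    """Run the query and return all rows; the cursor is closed afterwards."""
    with closing(conn.cursor()) as cur:
        cur.execute(QUERY, (court_pattern,))
        return cur.fetchall()

def build_toy_db():
    """ASSUMPTION: a tiny in-memory stand-in with made-up rows, for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE CourtJudgements_Staging (CaseNumber INTEGER, Court TEXT)")
    conn.execute("CREATE TABLE CourtJudgementsText (CaseNumber INTEGER, JudgmentText TEXT)")
    conn.executemany(
        "INSERT INTO CourtJudgements_Staging VALUES (?, ?)",
        [(1, "COURT OF CRIMINAL APPEAL (SUPERIOR)"), (2, "CIVIL COURT")],
    )
    conn.executemany(
        "INSERT INTO CourtJudgementsText VALUES (?, ?)",
        [(1, "toy text A"), (2, "toy text B")],
    )
    return conn

# closing() guarantees conn.close() runs even if fetch_judgments raises
with closing(build_toy_db()) as conn:
    rows = fetch_judgments(conn, "%OF CRIMINAL APPEAL (SUPERIOR)%")

print(rows)
```

The same `closing(...)` wrapper should work with a `pyodbc` connection, since it only relies on the object having a `close()` method.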




That's them. After reading a few, it was evident that the last 300 words should be more than enough to get most of the outcomes right. Some judgements are more complex.

In [2]:
import re

# Function to extract the last 300 words from a text
def extract_last(text, n_words=300):
    # \w+ keeps only word characters, so punctuation is dropped
    words = re.findall(r'\b\w+\b', text)
    return ' '.join(words[-n_words:])

# Apply the function to create the 'last_300' column
df['last_300'] = df['JudgmentText'].apply(extract_last)

# Print the updated DataFrame
print(df)

    CaseNumber                                       JudgmentText  \
0        72770  Kopja Informali ta' Sentenza  \nPagna 1 minn 3...   
1       136696  1 \n  \nThe Court of Criminal Appeal  \n \nHis...   
2       136697  1 \n \n \n \nQORTI TA` L -APPELL KRIMINALI  \n...   
3       136698  1 \n  \nThe Court of Criminal Appeal  \n \nHis...   
4       120005  1 \n QORTI TAL -APPELL KRIMINALI  \nS.T.O. PRI...   
..         ...                                                ...   
134     138528  1 \n \n \n \nQORTI TA` L -APPELL KRIMINALI  \n...   
135     113796  1 \n  \n \n \nQORTI TA'  L-APPELL KRIMINALI  \...   
136     113797  1 \n  \n \n \nQORTI TA'  L-APPELL KRIMINALI  \...   
137     113799  1 \n  \n \n \nQORTI TA'  L-APPELL KRIMINALI  \...   
138      17727  Kopja Informali ta' Sentenza  \nPagna 1 minn 1...   

                                              last_300  
0    dealt with was outside the broad range of pena...  
1    Institute of Bail it is evident that no appeal...  
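
The regex used for the extract keeps only word characters, which is why the `last_300` text has no punctuation and why hyphenated or apostrophised words count as several words. A small sketch of the difference follows, comparing the regex approach with a plain whitespace split. The helper names and the sample sentence are made up for illustration.

```python
import re

def last_words_regex(text, n):
    """Same approach as the notebook: keep word-character tokens, take the last n."""
    return ' '.join(re.findall(r'\b\w+\b', text)[-n:])

def last_words_split(text, n):
    """Alternative: split on whitespace, which keeps punctuation attached to words."""
    return ' '.join(text.split()[-n:])

# ASSUMPTION: an invented sentence, not taken from any judgement
sample = "The Attorney-General's appeal is upheld."

print(last_words_regex(sample, 4))
print(last_words_split(sample, 4))
```

Either variant is probably fine for this task; the whitespace split simply hands the translator and the model text that still has its sentence boundaries.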


Using Google Translate for the last 300 words. Tried ChatGPT directly on the Maltese text, with little success.

In [4]:
def translate_text(df):
    translator = EasyGoogleTranslate(
        source_language='mt',
        target_language='en',
        timeout=10
    )

    translated_texts = []
    total_rows = len(df)
    progress_bar = tqdm(total=total_rows, desc="Translating")

    for index, row in df.iterrows():
        text = row['last_300']
        chunk_size = 5000  # Set the maximum chunk size for translation
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        translated_chunks = [translator.translate(chunk) for chunk in chunks]
        translated_text = ' '.join(translated_chunks)
        translated_texts.append(translated_text)

        progress_bar.update(1)

    progress_bar.close()

    df['last_300_e'] = translated_texts

    return df

In [None]:
df = translate_text(df)
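
`translate_text` slices the text every 5000 characters, which could cut a word in half if an extract ever exceeded that limit. A 300-word extract will usually fit in a single chunk, so this rarely matters here, but a sketch of splitting on whitespace instead is given for longer inputs. `chunk_on_words` is a hypothetical helper, not part of the original notebook.

```python
def chunk_on_words(text, max_chars=5000):
    """Split text into chunks of at most max_chars, breaking only between words.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

# Tiny limit so the splitting is visible
print(chunk_on_words("aa bb cc dd", max_chars=5))
```

Inside `translate_text`, the list comprehension that builds `chunks` could then be replaced by `chunk_on_words(text, chunk_size)`.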

The next step is to use OpenAI (gpt-3.5-turbo) to evaluate the translated extracts.

In [2]:
import openai
import pandas as pd
import time
from tqdm import tqdm
import itertools

# Set API key and model
api_key = 'sk-xxxx'
GPT_MODEL = "gpt-3.5-turbo"
openai.api_key = api_key

In [None]:
def generate_response(article_body):

    retries = 1  # Number of retries in case of a disconnection
    for retry in range(1, retries + 2):
        try:
            response = openai.ChatCompletion.create(
                model=GPT_MODEL,
                messages=[
                    {"role": "system", "content": """Imagine you are a legal journalist. Each text body is the last 300 words of a Maltese Court Judgment from Criminal Appeal. Your task is to 
                     describe the outcome of the appeal in about 100 words. Use short sentences when possible."""},
                    {"role": "user", "content": f"Body: {article_body}"}
                ],
                max_tokens=120,
                temperature=0.0
            )
            return response.choices[0].message['content']
        except Exception as e:
            print(f"An error occurred: {str(e)}")
            if retry <= retries:
                print(f"Retrying in 2 seconds... (Attempt {retry})")
                time.sleep(2)
            else:
                print("Max retries reached. Skipping this article.")
    return "Error: Unable to generate response"
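
The retry loop above waits a fixed time between attempts. A generic sketch with exponential backoff is shown here as one possible alternative. `call_with_retries` and its parameters are hypothetical names, and unlike `generate_response` it re-raises the last error instead of returning an error string.

```python
import time

def call_with_retries(func, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception wait base_delay * 2**attempt, then try again.

    After `retries` failed retries the last exception is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == retries:
                raise
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)

# Demo with a function that fails twice before succeeding.
# A no-op sleep is injected so the demo does not actually wait.
state = {"calls": 0}

def flaky_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

print(call_with_retries(flaky_call, sleep=lambda seconds: None))
```

In the notebook, the API call could be wrapped as `call_with_retries(lambda: openai.ChatCompletion.create(...))`, assuming the same legacy `openai` package version used above.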

In [None]:
# Create a progress bar using tqdm
pbar = tqdm(total=10)  # Set the total to 10 for the first 10 rows

# Generate responses for the first 10 judgments and store them in the "GPT_Summary" column
for index, row in itertools.islice(df.iterrows(), 10):  # islice limits the loop to the first 10 rows
    df.at[index, 'GPT_Summary'] = generate_response(row['last_300_e'])
    pbar.update(1)

pbar.close()

In [None]:
# Create a progress bar using tqdm
pbar = tqdm(total=len(df))

# Generate responses for each judgment text and store them in the "GPT_Summary" column
for index, row in df.iterrows():
    df.at[index, 'GPT_Summary'] = generate_response(row['last_300_e'])
    pbar.update(1)

pbar.close()

This part was done previously. The next step was to manually check the results.

df = pd.read_excel(r"C:\Users\grupp\Python Files\01. E_Courts\Criminal_Appeals.xlsx")

In [7]:
df = df.drop('Unnamed: 8', axis=1)

In [8]:
df.head()

| Unnamed: 0.1 | Unnamed: 0 | CaseNumber | JudgmentText | last_300 | last_300_e | GPT_Summary | From GPT | Manual Checks |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 72770 | Kopja Informali ta' Sentenza \nPagna 1 minn 3... | dealt with was outside the broad range of pena... | dealt with was outside the broad range of pena... | The appeal was rejected and the appealed sente... | Rejected | Rejected |
| 1 | 1 | 136696 | 1 \n \nThe Court of Criminal Appeal \n \nHis... | Institute of Bail it is evident that no appeal... | Institute of Bail it is evident that no appeal... | The appeal is declared null and void. The Cour... | Rejected | Rejected |
| 2 | 2 | 136697 | 1 \n \n \n \nQORTI TA` L -APPELL KRIMINALI \n... | għaliex il liġi stess b deroga għar regoli ġen... | because the law itself with a derogation to ge... | The appeal is rejected and the appealed senten... | Rejected | Rejected |
| 3 | 3 | 136698 | 1 \n \nThe Court of Criminal Appeal \n \nHis... | Frar 2012 App Sup 8 mogħti lil l esperti rispe... | February 2012 App Sup 8 given to the respectiv... | The appeal filed by the Attorney General is up... | Successful | Upheld |
| 4 | 4 | 120005 | 1 \n QORTI TAL -APPELL KRIMINALI \nS.T.O. PRI... | nullita tal operazzjoni u del resto din qatt m... | nullity of the operation and del resto this wa... | The appeal is rejected. The court found no fla... | Rejected | Rejected |
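
The manual check amounts to comparing the `From GPT` column with the `Manual Checks` column. A minimal sketch of that comparison is given here in plain Python, using only the five rows displayed above, so the result says nothing about the full set of 139 appeals. The label mapping is an assumption: it treats "Upheld" in the manual column as equivalent to "Successful". `normalise` and `agreement` are hypothetical helper names.

```python
from collections import Counter

# Labels copied from the five rows displayed above
from_gpt = ["Rejected", "Rejected", "Rejected", "Successful", "Rejected"]
manual = ["Rejected", "Rejected", "Rejected", "Upheld", "Rejected"]

# ASSUMPTION: "Upheld" in the manual column means the appeal succeeded
SYNONYMS = {"upheld": "successful"}

def normalise(label):
    """Lower-case the label and map known synonyms onto one name."""
    key = label.strip().lower()
    return SYNONYMS.get(key, key)

def agreement(predicted, checked):
    """Return the share of matching labels and a count of each (predicted, checked) pair."""
    pairs = [(normalise(p), normalise(c)) for p, c in zip(predicted, checked)]
    matches = sum(p == c for p, c in pairs)
    return matches / len(pairs), Counter(pairs)

rate, confusion = agreement(from_gpt, manual)
print(f"Agreement on the displayed rows: {rate:.0%}")
for (pred, true), count in sorted(confusion.items()):
    print(f"GPT={pred:<12} manual={true:<12} n={count}")
```

With the full spreadsheet loaded, the same two functions could be called on `df['From GPT']` and `df['Manual Checks']`.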
