In [1]:
import os
from dotenv import load_dotenv
import pandas as pd
pd.set_option('display.max_columns', None)
import numpy as np
import textwrap
from tqdm import tqdm 
from IPython.display import Markdown, display
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer
tqdm.pandas()

In [2]:
from nltk.translate.meteor_score import meteor_score
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [3]:
from langchain.embeddings import HuggingFaceInstructEmbeddings

In [4]:
from top_n_tool import get_llm_fact_pattern_summary

In [5]:
df = pd.read_parquet("reddit_legal_cluster_test_results.parquet")
df['timestamp'] = pd.to_datetime(df['created_utc'], unit='s')
df['datestamp'] = df['timestamp'].dt.date
df.reset_index(inplace=True, drop=True)
df.sort_index(inplace=True)
print(f"df shape: {df.shape}")
df.head(2)

df shape: (5000, 16)


Unnamed: 0,index,created_utc,full_link,id,body,title,text_label,flair_label,embeddings,token_count,llm_title,State,kmeans_label,topic_title,timestamp,datestamp
0,1078,1575952538,https://www.reddit.com/r/legaladvice/comments/...,e8lsen,I applied for a job and after two interviews I...,"Failed a drug test due to amphetamines, I have...",employment,5,"[9.475638042064453e-05, 0.0005111666301983955,...",493,"""Validity of Schedule II Drug Prescription in ...",PR,8,Employment Legal Concerns and Issues,2019-12-10 04:35:38,2019-12-10
1,2098,1577442453,https://www.reddit.com/r/legaladvice/comments/...,eg9ll2,"Hi everyone, thanks in advance for any guidanc...","Speeding ticket in Tennessee, Georgia Driver's...",driving,4,"[-0.006706413111028856, 0.020911016696181495, ...",252,"""Speeding ticket consequences for out-of-state...",KY,10,Legal Topics in Traffic Violations,2019-12-27 10:27:33,2019-12-27
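`pd.to_datetime(..., unit='s')` reads `created_utc` as seconds since the Unix epoch and returns naive timestamps in UTC. The following standard-library sketch is only a cross-check and is not part of the pipeline. It converts the two `created_utc` values above and reproduces the `timestamp` and `datestamp` columns.

```python
from datetime import datetime, timezone

def epoch_to_utc(seconds: int) -> datetime:
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)

for created_utc in (1575952538, 1577442453):  # created_utc of the two rows above
    ts = epoch_to_utc(created_utc)
    print(ts.strftime("%Y-%m-%d %H:%M:%S"), ts.date())
```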


In [10]:
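# Sample one row so the query text and its link come from the same post
# (sampling 'body' and 'full_link' with different random_state values returns unrelated posts)
test_row = df.sample(1, random_state=18).iloc[0]
test_query = test_row['body']
test_query_link = str(test_row['full_link'])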
print(test_query)

   I won a small claims case against a used car dealer for $2900 for a used 2002 car I bought.  I bought a extended warranty with car, on our way home after purchase vehicle lost oil pressure and hd to be towed home. I called dealer they said no problem that the extended warranty would cover it.  I had it towed to GM dealership to have it checked out and they said engine and tranny were both bad and would need replaced. they called warranty company to submit claim and found out the warranty I had bought was never available for that car since it had more than125,000 miles on it.  Now used car dealer said they are not liable and have to do nothing. I won the small claims case but they have since appealed and I need to know how to answer my appellee respondents brief but I can not find any similar cases or a format to do it in?   Please help          Thank You

Extra info: Plaintiff failed to bring a cause of action against defendant upon which relief could be granted. Trial courts erred 

In [6]:
model_name = "tuner007/pegasus_summarizer"
torch_device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer_summarizer = PegasusTokenizer.from_pretrained(model_name)
model_summarizer = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [7]:
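def get_response(input_text):
    """Summarize `input_text` with the Pegasus model loaded above."""
    # Tokenize a single document; inputs longer than 1024 tokens are truncated
    batch = tokenizer_summarizer(
        [input_text],
        truncation=True,
        padding="longest",
        max_length=1024,
        return_tensors="pt",
    ).to(torch_device)
    # Deterministic beam search; `temperature` only has an effect with do_sample=True,
    # so it is not passed here
    gen_out = model_summarizer.generate(
        **batch, max_length=512, num_beams=5, num_return_sequences=1
    )
    output_text = tokenizer_summarizer.batch_decode(gen_out, skip_special_tokens=True)
    return output_text[0]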
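Temperature divides the logits before the softmax. This changes how peaked the distribution is, but it never changes which token is most likely. Greedy decoding, which always picks the most likely token, is therefore unaffected by it. In Hugging Face `generate`, temperature is applied only when `do_sample=True`. The pure-Python illustration below uses made-up logits.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of `logits` after dividing them by `temperature`."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

example_logits = [2.0, 1.0, 0.1]  # hypothetical next-token logits
for t in (0.5, 1.0, 1.5):
    probs = softmax(example_logits, t)
    best = max(range(len(probs)), key=probs.__getitem__)
    print(t, [round(p, 3) for p in probs], "argmax:", best)
```

Higher temperatures flatten the probabilities, but the argmax stays at index 0 for every value of `t`.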

In [8]:
sample_df = df.sample(10, random_state=42)

In [9]:
sample_df['pegasus_summary'] = sample_df["body"].progress_apply(get_response)

100%|██████████| 10/10 [01:29<00:00,  8.98s/it]


In [10]:
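def display_summary(df, original_text_col, summary_col):
    """Print each original text followed by its summary, wrapped at 120 characters."""
    # Number rows by position; the index labels of a sampled DataFrame are arbitrary
    for i, (_, row) in enumerate(df.iterrows(), start=1):
        print(f"Original Text {i}:\n")
        print('\n'.join(textwrap.wrap(row[original_text_col], width=120)))
        print("\n")
        print(f"Summarized Text {i}:\n")
        print('\n'.join(textwrap.wrap(row[summary_col], width=120)))
        print("\n\n")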

display_summary(sample_df, 'body', 'pegasus_summary')


Original Text 1502:

So my friend is currently attending a very small alternative high school after goin through tough times  at her normal
high schools. For the sake of privacy I’ll call my friend Cleo. So for her 2020 goal she decided that it she wanted to
focus on her mental health and herself for this year and decided to cut out the toxic people in her life. In this
process, she cut out a former friend in the same high school, I’ll call her Anna. So she texts Anna and tells her that
she thinks they shouldn’t be friends anymore and says something on the lines of “Ive never really felt comfortable with
the people you were hanging out with and I’ve felt sometimes you were being disrespectful to me and I felt bothered by
your racist jokes and I think it would be better for us if we weren’t friends anymore”. To which they responded
something like “you’ve(Cleo) been disrespecting us and you treat us like garbage” and “we do everything for you and
we’ve always been there for you”. When in

In [11]:
sample_df = get_llm_fact_pattern_summary(sample_df, "body")

100%|██████████| 10/10 [00:49<00:00,  4.93s/it]


In [12]:
sample_df.head(2)

Unnamed: 0,level_0,index,created_utc,full_link,id,body,title,text_label,flair_label,embeddings,token_count,llm_title,State,kmeans_label,topic_title,pegasus_summary,summary
0,1501,7204,1578610923,https://www.reddit.com/r/legaladvice/comments/...,emhn7p,So my friend is currently attending a very sma...,2 Gorgons 1 Gal,school,9,"[0.014099272998582758, -0.018092472233581916, ...",612,"""Legal recourse for bullying, harassment, and ...",MP,9,Legal Consequences of False Accusations,"A friend of hers, who is attending an alternat...","Cleo, a student at a small alternative high sc..."
1,2586,2354,1589666939,https://www.reddit.com/r/legaladvice/comments/...,gl3exq,"I rent a house with 3 people, and the landlord...",CALIFORNIA: Landlord wants to show house,housing,7,"[-0.0009711538267174882, -0.011054954024413326...",88,"""Legal implications of refusing to vacate rent...",NC,0,Legal Topics in Rental Properties,"I rent a house with 3 people, and the landlord...",The scenario involves a tenant who rents a hou...


In [13]:
display_summary(sample_df, 'body', 'summary')

Original Text 1:

So my friend is currently attending a very small alternative high school after goin through tough times  at her normal
high schools. For the sake of privacy I’ll call my friend Cleo. So for her 2020 goal she decided that it she wanted to
focus on her mental health and herself for this year and decided to cut out the toxic people in her life. In this
process, she cut out a former friend in the same high school, I’ll call her Anna. So she texts Anna and tells her that
she thinks they shouldn’t be friends anymore and says something on the lines of “Ive never really felt comfortable with
the people you were hanging out with and I’ve felt sometimes you were being disrespectful to me and I felt bothered by
your racist jokes and I think it would be better for us if we weren’t friends anymore”. To which they responded
something like “you’ve(Cleo) been disrespecting us and you treat us like garbage” and “we do everything for you and
we’ve always been there for you”. When in ac

In [14]:
# METEOR score of each summary, using the source text as the single reference.
# Requires NLTK's tokenizer models and WordNet data (fetched with nltk.download).
def calculate_meteor(df, original_text_col, summarized_text_col, metric_name='meteor_score'):
    df[metric_name] = df.apply(
        lambda row: meteor_score(
            [word_tokenize(row[original_text_col])],  # list of tokenized references
            word_tokenize(row[summarized_text_col]),  # tokenized hypothesis
        ),
        axis=1,
    )
    return df

# TF-IDF cosine similarity between each source text and its summary
def calculate_cosine_similarity(df, original_text_col, summarized_text_col, metric_name='cosine_similarity'):
    def cosine_similarity_score(row):
        # Fit a fresh vocabulary on just this (source, summary) pair
        vectorizer = TfidfVectorizer()
        tfidf_matrix = vectorizer.fit_transform([row[original_text_col], row[summarized_text_col]])
        similarity_matrix = cosine_similarity(tfidf_matrix, tfidf_matrix)
        return similarity_matrix[0, 1]
    df[metric_name] = df.apply(cosine_similarity_score, axis=1)
    return df
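METEOR, as used in `calculate_meteor`, is a recall-weighted unigram F-mean with a penalty for fragmented matches. NLTK's `meteor_score` also matches stems and WordNet synonyms. The sketch below is simplified in two ways: it uses exact matches only, and it aligns words greedily from left to right. It does use NLTK's default parameters (alpha=0.9, beta=3, gamma=0.5), but its scores will not equal NLTK's. The source post serves as the reference here, so a short summary can have high precision and still score low because of low recall.

```python
def simple_meteor(reference, hypothesis, alpha=0.9, beta=3.0, gamma=0.5):
    """Exact-match METEOR sketch over pre-tokenized word lists."""
    ref = [w.lower() for w in reference]
    hyp = [w.lower() for w in hypothesis]
    used = set()
    alignment = []  # (hypothesis index, reference index) pairs
    for i, word in enumerate(hyp):
        for j, ref_word in enumerate(ref):
            if j not in used and ref_word == word:
                used.add(j)
                alignment.append((i, j))
                break
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / len(hyp)
    recall = matches / len(ref)
    # alpha=0.9 weights recall more heavily than precision
    fmean = precision * recall / (alpha * precision + (1 - alpha) * recall)
    # A chunk is a run of matches that are adjacent in both hypothesis and reference
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if not (i1 == i0 + 1 and j1 == j0 + 1):
            chunks += 1
    penalty = gamma * (chunks / matches) ** beta
    return fmean * (1 - penalty)

print(simple_meteor("the cat sat on the mat".split(), "the cat sat".split()))
```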


In [15]:
# Score the OpenAI summary
sample_df = calculate_meteor(sample_df, "body", "summary", metric_name='meteor_score_openai')
sample_df = calculate_cosine_similarity(sample_df, "body", "summary", metric_name='cosine_similarity_openai')

# Score the Pegasus summary
sample_df = calculate_meteor(sample_df, "body", "pegasus_summary", metric_name='meteor_score_pegasus')
sample_df = calculate_cosine_similarity(sample_df, "body", "pegasus_summary", metric_name='cosine_similarity_pegasus')

In [16]:
sample_df.head()

Unnamed: 0,level_0,index,created_utc,full_link,id,body,title,text_label,flair_label,embeddings,token_count,llm_title,State,kmeans_label,topic_title,pegasus_summary,summary,meteor_score_openai,cosine_similarity_openai,meteor_score_pegasus,cosine_similarity_pegasus
0,1501,7204,1578610923,https://www.reddit.com/r/legaladvice/comments/...,emhn7p,So my friend is currently attending a very sma...,2 Gorgons 1 Gal,school,9,"[0.014099272998582758, -0.018092472233581916, ...",612,"""Legal recourse for bullying, harassment, and ...",MP,9,Legal Consequences of False Accusations,"A friend of hers, who is attending an alternat...","Cleo, a student at a small alternative high sc...",0.214294,0.679522,0.081275,0.419183
1,2586,2354,1589666939,https://www.reddit.com/r/legaladvice/comments/...,gl3exq,"I rent a house with 3 people, and the landlord...",CALIFORNIA: Landlord wants to show house,housing,7,"[-0.0009711538267174882, -0.011054954024413326...",88,"""Legal implications of refusing to vacate rent...",NC,0,Legal Topics in Rental Properties,"I rent a house with 3 people, and the landlord...",The scenario involves a tenant who rents a hou...,0.353986,0.560747,0.798212,0.917734
2,2653,7420,1575692649,https://www.reddit.com/r/legaladvice/comments/...,e79rv5,Had the utility company at my apartment for pi...,(CA) Utility Company tagged furnace as hazardo...,housing,7,"[0.002910906343142232, -0.010915469854511384, ...",487,"""Landlord's negligence and potential violation...",NM,3,Rental Property and Landlord Matters,My landlord sent a handyman to look at my furn...,"In this scenario, the tenant had utility worke...",0.401421,0.712941,0.132122,0.739993
3,1055,5152,1588983332,https://www.reddit.com/r/legaladvice/comments/...,gg4zm9,So I was selling my motorcycle and went to sho...,[TN] Insurance is dragging their feet on my claim,insurance,8,"[0.0035871945918938717, -0.01260652162372722, ...",181,"""Delays in insurance claim for stolen motorcyc...",UT,2,Car Accident Liability and Insurance,I was selling my motorcycle and went to show i...,The individual in this scenario was selling th...,0.343034,0.538094,0.324989,0.672485
4,705,8104,1589977728,https://www.reddit.com/r/legaladvice/comments/...,gnagdp,My old tenants were supposed to move out on Ap...,Can I get sued for throwing out my ex-roommate...,housing,7,"[0.006972842104453466, -0.02474492316535464, 0...",262,"""Legal implications of allowing ex-tenants to ...",DC,0,Legal Topics in Rental Properties,My old tenants were supposed to move out on Ap...,"In this scenario, the landlord had tenants who...",0.343348,0.626731,0.258311,0.730131


In [17]:
from itertools import combinations

def pairwise_cosine_similarity(texts):
    """Mean TF-IDF cosine similarity over all distinct pairs of `texts`."""
    vectorizer = TfidfVectorizer()
    tfidf_matrix = vectorizer.fit_transform(texts)
    cosine_similarities = cosine_similarity(tfidf_matrix, tfidf_matrix)
    # Average the upper triangle only (i < j): each pair once, self-similarity excluded
    pairs = combinations(range(len(texts)), 2)
    return np.mean([cosine_similarities[i, j] for i, j in pairs])
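The sketch below shows, without dependencies, what `pairwise_cosine_similarity` computes. It follows scikit-learn's `TfidfVectorizer` defaults:

- lowercasing
- the `(?u)\b\w\w+\b` token pattern
- raw term counts
- smoothed IDF `ln((1+n)/(1+df)) + 1`
- L2-normalized rows

It is meant to show the arithmetic, not to replace the scikit-learn version.

```python
import math
import re
from collections import Counter
from itertools import combinations

TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")  # scikit-learn's default token_pattern

def tfidf_vectors(texts):
    """Sparse TF-IDF vectors (dicts) with smoothed IDF and L2 normalization."""
    docs = [Counter(TOKEN_RE.findall(t.lower())) for t in texts]
    n = len(docs)
    doc_freq = Counter(term for d in docs for term in d)
    idf = {term: math.log((1 + n) / (1 + c)) + 1 for term, c in doc_freq.items()}
    vectors = []
    for d in docs:
        v = {term: count * idf[term] for term, count in d.items()}
        norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
        vectors.append({term: x / norm for term, x in v.items()})
    return vectors

def sparse_cosine(u, v):
    """Cosine of two L2-normalized sparse vectors is just their dot product."""
    return sum(x * v.get(term, 0.0) for term, x in u.items())

def mean_pairwise_similarity(texts):
    """Average cosine similarity over all pairs i < j."""
    vecs = tfidf_vectors(texts)
    sims = [sparse_cosine(vecs[i], vecs[j]) for i, j in combinations(range(len(vecs)), 2)]
    return sum(sims) / len(sims)

print(mean_pairwise_similarity(["the landlord kept my deposit",
                                "my landlord wants to show the house",
                                "speeding ticket in another state"]))
```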

In [18]:
pairwise_cosine_similarity(sample_df['body'].tolist()), pairwise_cosine_similarity(sample_df['summary'].tolist()), pairwise_cosine_similarity(sample_df['pegasus_summary'].tolist())

(0.2272324343256208, 0.3133059255223452, 0.11805679818169615)

In [19]:
def embeddings_pairwise_cosine_similarity(texts, embeddings):
    # Generate embeddings for each text
    embedded_texts = [embeddings.embed_query(text) for text in texts]
    # Calculate cosine similarity
    cosine_similarities = cosine_similarity(embedded_texts, embedded_texts)
    # Fill diagonal with 0s as the similarity of the text with itself is not needed
    np.fill_diagonal(cosine_similarities, 0)
    # Get pairs of indices
    pairs = list(combinations(range(len(texts)), 2))
    # Return the mean cosine similarity
    return np.mean([cosine_similarities[i, j] for i, j in pairs])
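The INSTRUCTOR embeddings are dense vectors, so their mean pairwise cosine can be computed without building the n×n matrix. For unit vectors, the squared norm of their sum is `n + 2 * sum_{i<j} u_i·u_j`, so the mean over the `n(n-1)/2` pairs is `(|sum u|^2 - n) / (n(n-1))`. In the sketch below, plain lists stand in for the embeddings. The brute-force version mirrors the helper above and is included to check the identity.

```python
import math
from itertools import combinations

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_pairwise_cosine_bruteforce(vectors):
    """Average cosine over all pairs i < j, O(n^2 * d)."""
    unit = [normalize(v) for v in vectors]
    sims = [sum(a * b for a, b in zip(unit[i], unit[j]))
            for i, j in combinations(range(len(unit)), 2)]
    return sum(sims) / len(sims)

def mean_pairwise_cosine_fast(vectors):
    """Same average in O(n * d) via |sum of unit vectors|^2 = n + 2 * sum_{i<j} u_i.u_j."""
    unit = [normalize(v) for v in vectors]
    n = len(unit)
    total = [sum(col) for col in zip(*unit)]
    squared_norm = sum(x * x for x in total)
    return (squared_norm - n) / (n * (n - 1))

vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, -1.0]]  # hypothetical embeddings
print(mean_pairwise_cosine_bruteforce(vecs), mean_pairwise_cosine_fast(vecs))
```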

In [20]:
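embeddings = HuggingFaceInstructEmbeddings(
    # NOTE: this instruction describes insurance documents, while the corpus is
    # r/legaladvice posts; a legal-domain instruction would match the data better
    query_instruction="Represent the insurance document for retrieval: "
)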

load INSTRUCTOR_Transformer
max_seq_length  512


In [21]:
mean_cosine_similarity = embeddings_pairwise_cosine_similarity(sample_df['body'].tolist(), embeddings)
mean_cosine_similarity

0.7712422795401508

In [54]:
embeddings_pairwise_cosine_similarity(sample_df['body'].tolist(), embeddings), embeddings_pairwise_cosine_similarity(sample_df['summary'].tolist(), embeddings), embeddings_pairwise_cosine_similarity(sample_df['pegasus_summary'].tolist(), embeddings)

(0.7712422795401508, 0.7920908053427657, 0.7590423799157077)

In [23]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def rerank_with_cross_encoder(df: pd.DataFrame,
                              query: str,
                              text_col_name: str = "summary",
                              model_name: str = 'BAAI/bge-reranker-large',
    ) -> pd.DataFrame:
    """
    Rerank search results with a pre-trained cross-encoder.

    The query is first condensed into a fact-pattern summary with
    `get_llm_fact_pattern_summary`, then each (query summary, row text) pair is
    scored; a higher score means a more relevant row.

    On models:
    The default is BAAI/bge-reranker-large. The sentence-transformers MS MARCO
    cross-encoders are alternatives. More info: https://www.sbert.net/docs/pretrained-models/ce-msmarco.html
    cross-encoder/ms-marco-MiniLM-L-6-v2: better performance
    cross-encoder/ms-marco-MiniLM-L-2-v2: faster

    Example: rerank_res_df = rerank_with_cross_encoder(top_n_res_df, query, 'summary')

    Args:
        df (pd.DataFrame): Results from `get_llm_fact_pattern_summary`.
        query (str): Raw query text.
        text_col_name (str): The column of text to compare with the query. Defaults to "summary".
        model_name (str): Hugging Face cross-encoder checkpoint.
    Returns:
        pd.DataFrame: A copy of `df` with a `scores` column, sorted by score in descending order.
    """
    # Load the model and tokenizer (reloaded on every call; cache them if reranking repeatedly)
    model = AutoModelForSequenceClassification.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Condense the query into the same fact-pattern form as the `summary` column
    query_df = pd.DataFrame({"query": [query]})
    query_fact_pattern_df = get_llm_fact_pattern_summary(df=query_df, text_col_name="query")
    query_proc = query_fact_pattern_df["summary"].iloc[0]
    data = [(query_proc, text) for text in df[text_col_name].tolist()]
    # Tokenize (query, text) pairs: zip(*data) splits them into queries and texts
    features = tokenizer(*zip(*data), padding=True, truncation=True, return_tensors="pt")
    # Predict one relevance logit per pair; squeeze (n, 1) logits to shape (n,)
    model.eval()
    with torch.no_grad():
        scores = model(**features).logits.squeeze(-1)
    # Work on a copy so the caller's DataFrame is not modified
    df = df.copy()
    df['scores'] = scores.numpy()
    # Sort by score in descending order
    return df.sort_values(by='scores', ascending=False)
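`tokenizer(*zip(*data), ...)` depends on `zip(*pairs)` transposing a list of `(query, text)` pairs. The result is one tuple of queries and one tuple of texts, which the tokenizer receives as its `text` and `text_pair` arguments. The illustration below is standalone and uses placeholder strings.

```python
example_pairs = [("query", "doc A"), ("query", "doc B"), ("query", "doc C")]
queries, texts = zip(*example_pairs)  # transpose: pairs -> (all firsts, all seconds)
print(queries)
print(texts)
```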

In [24]:
query = sample_df.iloc[0]['body']
print(query)

So my friend is currently attending a very small alternative high school after goin through tough times  at her normal high schools. For the sake of privacy I’ll call my friend Cleo. So for her 2020 goal she decided that it she wanted to focus on her mental health and herself for this year and decided to cut out the toxic people in her life. In this process, she cut out a former friend in the same high school, I’ll call her Anna. So she texts Anna and tells her that she thinks they shouldn’t be friends anymore and says something on the lines of “Ive never really felt comfortable with the people you were hanging out with and I’ve felt sometimes you were being disrespectful to me and I felt bothered by your racist jokes and I think it would be better for us if we weren’t friends anymore”. To which they responded something like “you’ve(Cleo) been disrespecting us and you treat us like garbage” and “we do everything for you and we’ve always been there for you”. When in actuality, they’ve b

In [25]:
test = rerank_with_cross_encoder(sample_df, query, "summary")

100%|██████████| 1/1 [00:04<00:00,  4.04s/it]


In [26]:
test

Unnamed: 0,level_0,index,created_utc,full_link,id,body,title,text_label,flair_label,embeddings,token_count,llm_title,State,kmeans_label,topic_title,pegasus_summary,summary,meteor_score_openai,cosine_similarity_openai,meteor_score_pegasus,cosine_similarity_pegasus,scores
0,1501,7204,1578610923,https://www.reddit.com/r/legaladvice/comments/...,emhn7p,So my friend is currently attending a very sma...,2 Gorgons 1 Gal,school,9,"[0.014099272998582758, -0.018092472233581916, ...",612,"""Legal recourse for bullying, harassment, and ...",MP,9,Legal Consequences of False Accusations,"A friend of hers, who is attending an alternat...","Cleo, a student at a small alternative high sc...",0.214294,0.679522,0.081275,0.419183,8.316047
6,589,952,1475765751,https://www.reddit.com/r/legaladvice/comments/...,565rm8,"Hi,\n\nI live in Alberta, Canada.\nI started u...",Unauthorized use of trademark - Canada/USA,business,0,"[0.005342585541970538, -0.036669563787880624, ...",140,"""Trademark infringement concerns: Can a Canadi...",AL,6,Compilation of Legal Topics,A company from the US has sent me a letter cla...,"The individual in question resides in Alberta,...",0.309878,0.427063,0.460417,0.704744,-7.151401
8,2413,196,1590614822,https://www.reddit.com/r/legaladvice/comments/...,grsqun,This is definitely a minor post but I wasn’t s...,Employer sends email to multiple furloughed em...,employment,5,"[-0.013531485970733639, -0.0018254108100103777...",239,"""Is it legal for an employer to share employee...",ND,8,Employment Legal Concerns and Issues,I work for a university in the athletic depart...,"In this scenario, the individual works for a u...",0.384056,0.626048,0.231399,0.477347,-7.168838
1,2586,2354,1589666939,https://www.reddit.com/r/legaladvice/comments/...,gl3exq,"I rent a house with 3 people, and the landlord...",CALIFORNIA: Landlord wants to show house,housing,7,"[-0.0009711538267174882, -0.011054954024413326...",88,"""Legal implications of refusing to vacate rent...",NC,0,Legal Topics in Rental Properties,"I rent a house with 3 people, and the landlord...",The scenario involves a tenant who rents a hou...,0.353986,0.560747,0.798212,0.917734,-7.206954
7,2468,5918,1416795284,https://www.reddit.com/r/legaladvice/comments/...,2n8116,A common scenario: you're speeding on the high...,"""Do you know how fast you were going?""",driving,4,"[0.0037471115415221956, 0.04198835666309084, 0...",222,"""Best tactics for responding to a cop's questi...",AS,10,Legal Topics in Traffic Violations,If you're pulled over by a police officer for ...,"In this scenario, the individual is pulled ove...",0.223614,0.327331,0.266153,0.697188,-7.211981
9,1600,8907,1517592365,https://www.reddit.com/r/legaladvice/comments/...,7usvw6,In 2011 my uncle passed away. He was to inheri...,Estate not closed out going on 4 years. [Kentu...,wills,10,"[-0.0022509259519703116, -0.002298492555696126...",442,"""Inheritance dispute: Seeking recourse for del...",TX,1,Legal Issues in Estate Administration,My grandfather passed away in 2014 and while i...,"In this scenario, the individual's grandfather...",0.481745,0.717668,0.119552,0.56147,-7.309842
4,705,8104,1589977728,https://www.reddit.com/r/legaladvice/comments/...,gnagdp,My old tenants were supposed to move out on Ap...,Can I get sued for throwing out my ex-roommate...,housing,7,"[0.006972842104453466, -0.02474492316535464, 0...",262,"""Legal implications of allowing ex-tenants to ...",DC,0,Legal Topics in Rental Properties,My old tenants were supposed to move out on Ap...,"In this scenario, the landlord had tenants who...",0.343348,0.626731,0.258311,0.730131,-7.333016
2,2653,7420,1575692649,https://www.reddit.com/r/legaladvice/comments/...,e79rv5,Had the utility company at my apartment for pi...,(CA) Utility Company tagged furnace as hazardo...,housing,7,"[0.002910906343142232, -0.010915469854511384, ...",487,"""Landlord's negligence and potential violation...",NM,3,Rental Property and Landlord Matters,My landlord sent a handyman to look at my furn...,"In this scenario, the tenant had utility worke...",0.401421,0.712941,0.132122,0.739993,-7.418821
3,1055,5152,1588983332,https://www.reddit.com/r/legaladvice/comments/...,gg4zm9,So I was selling my motorcycle and went to sho...,[TN] Insurance is dragging their feet on my claim,insurance,8,"[0.0035871945918938717, -0.01260652162372722, ...",181,"""Delays in insurance claim for stolen motorcyc...",UT,2,Car Accident Liability and Insurance,I was selling my motorcycle and went to show i...,The individual in this scenario was selling th...,0.343034,0.538094,0.324989,0.672485,-7.724254
5,106,6343,1472700007,https://www.reddit.com/r/legaladvice/comments/...,50l7lh,**Update:** I called the lady at the shelter b...,I was denied a volunteer position because I am...,criminal,2,"[-0.004616000582941044, -0.0072016183486047144...",264,"""False Criminal Investigation: Seeking Legal A...",PA,9,Legal Consequences of False Accusations,I applied to volunteer at a local animal shelt...,The individual in this scenario applied to vol...,0.24277,0.44211,0.313679,0.588914,-8.552497
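The `scores` column holds raw logits from the reranker, so the ranking is what matters here. A 0-to-1 relevance value may be easier to read. A sigmoid maps each logit monotonically into (0, 1) without changing the order. The sketch below applies one to three of the scores above, keyed by post `id`.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic function: maps a raw logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

rerank_logits = {"emhn7p": 8.316047, "565rm8": -7.151401, "50l7lh": -8.552497}
probs = {post_id: sigmoid(score) for post_id, score in rerank_logits.items()}
ranked = sorted(probs, key=probs.get, reverse=True)
print(ranked)  # ['emhn7p', '565rm8', '50l7lh']
```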


In [41]:
from typing import List
from llama_index.schema import Document
from llama_index.query_engine import CitationQueryEngine
from llama_index import (
    VectorStoreIndex,
    ServiceContext,
)

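def get_li_docs_from_df(df: pd.DataFrame,
                        text_col: str,
                        **kwargs) -> List[Document]:
    """Convert each row of `df` into a llama_index Document.

    `text_col` supplies the document text; each keyword argument maps a
    metadata key to the column holding its value, e.g. URL='full_link'.
    """
    docs = []
    for i, row in df.iterrows():
        metadata = {key: row[col] for key, col in kwargs.items()}
        doc = Document(
            text=row[text_col],
            id_=str(i),  # document IDs are strings
            metadata=metadata,
        )
        docs.append(doc)
    return docs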
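The sketch below reproduces the row-to-document mapping of `get_li_docs_from_df` without llama_index. `SimpleDoc` is a hypothetical stand-in for `Document`, and row dicts stand in for DataFrame rows. Documents are numbered by position rather than by index label. `metadata_str` imitates the `key: value` layout that `get_metadata_str()` prints further down.

```python
from dataclasses import dataclass, field

@dataclass
class SimpleDoc:
    """Hypothetical stand-in for llama_index's Document."""
    id_: str
    text: str
    metadata: dict = field(default_factory=dict)

    def metadata_str(self) -> str:
        # One "key: value" line per metadata entry
        return "\n".join(f"{k}: {v}" for k, v in self.metadata.items())

def rows_to_docs(rows, text_col, **metadata_cols):
    """Build one SimpleDoc per row dict; metadata_cols maps metadata key -> column name."""
    return [
        SimpleDoc(
            id_=str(i),
            text=row[text_col],
            metadata={key: row[col] for key, col in metadata_cols.items()},
        )
        for i, row in enumerate(rows)
    ]

rows = [{"index": 7204, "State": "MP", "body": "post text",
         "full_link": "https://www.reddit.com/r/legaladvice/comments/emhn7p/2_gorgons_1_gal/"}]
example_docs = rows_to_docs(rows, "body", id="index", State="State", URL="full_link")
print(example_docs[0].metadata_str())
```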

In [58]:
doc_metadata = {
    "id": 'index',
    "State": 'State',
    "Title": "llm_title",
    "URL": 'full_link',
}

docs_test = get_li_docs_from_df(
    sample_df, 
    "body",
    **doc_metadata)

In [37]:
print(docs_test[0].get_metadata_str())

id: 7204
State: MP
Title: "Legal recourse for bullying, harassment, and discrimination in a high school setting"
link: https://www.reddit.com/r/legaladvice/comments/emhn7p/2_gorgons_1_gal/


In [57]:
from llama_index import Prompt

CITATION_TEMPLATE = Prompt(
    "Please provide an answer based solely on the provided sources. "
    "When referencing information from a source, "
    "cite the appropriate source using its corresponding number and URL. "
    "Every answer should include at least one source citation. "
    "Only cite a source when you are explicitly referencing it. "
    "For example:\n"
    "[1]:\n"
    "URL: source_1_address.com\n"
    "Excerpt:\n-----\nThe sky is red in the evening and blue in the morning.\n"
    "[2]:\n"
    "URL: source_2_address.com\n"
    "Excerpt:\n-----\nWater is wet when the sky is red.\n"
    "Query: When is water wet?\n"
    "Answer: Water will be wet when the sky is red [2](source_2_address.com), "
    "which occurs in the evening [1](source_1_address.com).\n"
    "Now it's your turn. Below are several sources of information. Make sure to cite the information using [number](URL) notation:"
    "\n------\n"
    "{context_str}"
    "\n------\n"
    "Query: {query_str}\n"
    "Answer: "
)
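`CitationQueryEngine` fills `{context_str}` with the retrieved chunks and `{query_str}` with the query. The exact chunk formatting is internal to llama_index, so the sketch below is only a hypothetical mock of the placeholder mechanics. It fills an abbreviated template with plain `str.format` and lays out the sources like the worked example inside `CITATION_TEMPLATE`. `build_context` and `TEMPLATE` are illustrative names, not llama_index APIs.

```python
TEMPLATE = (
    "Below are several sources of information.\n"
    "------\n{context_str}\n------\n"
    "Query: {query_str}\nAnswer: "
)

def build_context(sources):
    """Number (url, excerpt) sources as [n] blocks, like the prompt's worked example."""
    blocks = []
    for n, (url, excerpt) in enumerate(sources, start=1):
        blocks.append(f"[{n}]:\nURL: {url}\nExcerpt:\n-----\n{excerpt}")
    return "\n".join(blocks)

example_prompt = TEMPLATE.format(
    context_str=build_context([
        ("source_1_address.com", "The sky is red in the evening and blue in the morning."),
        ("source_2_address.com", "Water is wet when the sky is red."),
    ]),
    query_str="When is water wet?",
)
print(example_prompt)
```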

In [86]:
from llama_index.llms import OpenAI

def get_answer_from_model(df, query_string):
    """Index the `body` column of `df` and answer `query_string` with inline citations."""
    service_context = ServiceContext.from_defaults(
        llm=OpenAI(temperature=0, model="gpt-4")
    )
    # Metadata keys shown to the LLM, mapped to their DataFrame columns
    doc_metadata = {
        "id": 'index',
        "State": 'State',
        "Title": "llm_title",
        "URL": 'full_link',
    }
    documents = get_li_docs_from_df(df, "body", **doc_metadata)
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)

    # Retrieve as many chunks as there are documents; when each post fits in a
    # single chunk, every post reaches the LLM
    query_engine = CitationQueryEngine.from_args(
        index,
        citation_qa_template=CITATION_TEMPLATE,
        similarity_top_k=len(documents),
        citation_chunk_size=1024,
    )
    # Query the citation query engine
    response = query_engine.query(query_string)
    return response

In [88]:
query_string = "explain these legal questions in a markdown table"

In [89]:
response = get_answer_from_model(sample_df, query_string)

In [90]:
Markdown(f"{response}")

| Title | State | Legal Question |
| --- | --- | --- |
| "Is it legal for an employer to share employee furlough information via a mass email?" | ND | The user is questioning the legality of their employer sharing their furlough status via a mass email to other employees [196](https://www.reddit.com/r/legaladvice/comments/grsqun/employer_sends_email_to_multiple_furloughed/). |
| "Legal implications of refusing to vacate rental property for showings during pandemic" | NC | The user is asking about the potential legal consequences of refusing to vacate their rental property for showings during the COVID-19 pandemic [2354](https://www.reddit.com/r/legaladvice/comments/gl3exq/california_landlord_wants_to_show_house/). |
| "Trademark infringement concerns: Can a Canadian business be held liable for violating US laws?" | AL | The user is seeking advice on whether their Canadian business can be held liable for trademark infringement under US law [952](https://www.reddit.com/r/legaladvice/comments/565rm8/unauthorized_use_of_trademark_canadausa/). |
| "Best tactics for responding to a cop's question about speeding: honesty, admission, or pleading the fifth?" | AS | The user is asking for advice on the best way to respond when a police officer asks about their speed during a traffic stop [5918](https://www.reddit.com/r/legaladvice/comments/2n8116/do_you_know_how_fast_you_were_going/). |
| "Legal implications of allowing ex-tenants to store furniture during COVID-19 and potential liability for disposing of it" | DC | The user is asking about their potential liability if they dispose of their ex-tenants' furniture that was left in their property during the COVID-19 pandemic [8104](https://www.reddit.com/r/legaladvice/comments/gnagdp/can_i_get_sued_for_throwing_out_my_exroommates/). |
| "False Criminal Investigation: Seeking Legal Advice for Erroneous Background Check Results" | PA | The user is seeking advice after being falsely flagged as under criminal investigation in a background check [6343](https://www.reddit.com/r/legaladvice/comments/50l7lh/i_was_denied_a_volunteer_position_because_i_am/). |
| "Landlord's negligence and potential violations: Seeking legal remedies for unsafe furnace and habitability issues" | NM | The user is seeking legal remedies for their landlord's negligence and potential violations related to an unsafe furnace and habitability issues [7420](https://www.reddit.com/r/legaladvice/comments/e79rv5/ca_utility_company_tagged_furnace_as_hazardous/). |
| "Legal recourse for bullying, harassment, and discrimination in a high school setting" | MP | The user is asking about potential legal recourse for bullying, harassment, and discrimination in a high school setting [7204](https://www.reddit.com/r/legaladvice/comments/emhn7p/2_gorgons_1_gal/). |
| "Inheritance dispute: Seeking recourse for delayed disbursement and reduction of entitled amount in Kentucky estate" | TX | The user is seeking recourse for a delayed disbursement and reduction of their entitled amount from an inheritance [8907](https://www.reddit.com/r/legaladvice/comments/7usvw6/estate_not_closed_out_going_on_4_years_kentucky/). |
| "Delays in insurance claim for stolen motorcycle - seeking advice on time frame and next steps" | UT | The user is seeking advice on the time frame and next steps for an insurance claim for their stolen motorcycle [5152](https://www.reddit.com/r/legaladvice/comments/gg4zm9/tn_insurance_is_dragging_their_feet_on_my_claim/). |
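The table cites each row as `[number](URL)`. GPT-4 can mislabel or invent citations, so a regex check that every cited URL belongs to one of the indexed posts is cheap insurance. In the sketch below, the helper names are hypothetical and the sample strings are copied from the outputs in this notebook. The pattern also matches the `[[n](URL)]` form.

```python
import re

CITATION_RE = re.compile(r"\[(\d+)\]\((https?://[^)\s]+)\)")

def extract_citations(answer: str):
    """Return (number, URL) pairs for every [n](URL) citation in `answer`."""
    return [(int(n), url) for n, url in CITATION_RE.findall(answer)]

def unknown_citations(answer: str, allowed_urls):
    """Cited URLs that are not among the retrieved source URLs."""
    return sorted({url for _, url in extract_citations(answer)} - set(allowed_urls))

url = "https://www.reddit.com/r/legaladvice/comments/grsqun/employer_sends_email_to_multiple_furloughed/"
row = f"| ... | ND | The user is questioning ... via a mass email to other employees [196]({url}). |"
print(extract_citations(row))
print(unknown_citations(row, allowed_urls=[url]))
```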

In [76]:
from semantic_search import SemanticSearch

In [77]:
search_engine = SemanticSearch(df)

In [78]:
top_n = 10
filter_criteria = None
use_cosine_similarity = False
similarity_threshold = 0.93

In [83]:
test_res = search_engine.query_similar_documents(
    test_query, 
    top_n, 
    filter_criteria, 
    use_cosine_similarity,
    similarity_threshold,
)
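`SemanticSearch` comes from the local `semantic_search` module, whose code is not shown here. The notebook therefore does not reveal how `top_n`, `use_cosine_similarity`, and `similarity_threshold` interact. The sketch below is hypothetical and shows one common pattern: it scores documents by cosine similarity, drops those below an optional threshold, and keeps the `top_n` best. The vectors are made up.

```python
import heapq
import math

def dense_cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def top_n_similar(query_vec, doc_vecs, top_n=10, threshold=None):
    """Hypothetical: (index, score) of the top_n most similar docs, optionally thresholded."""
    scored = ((i, dense_cosine(query_vec, v)) for i, v in enumerate(doc_vecs))
    if threshold is not None:
        scored = ((i, s) for i, s in scored if s >= threshold)
    return heapq.nlargest(top_n, scored, key=lambda pair: pair[1])

doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.1]]
print(top_n_similar([1.0, 0.0], doc_vecs, top_n=10, threshold=0.93))
```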

In [84]:
prompt = f"Summarize how the information is relevant to the question. Summarize similarities, and include a bullet point list of specific examples.\n\nQUESTION: {test_query}"
print(prompt)

Summarize how the information is relevant to the question. Summarize similarities, and include a bullet point list of specific examples.

QUESTION: Hi, I’m a bit new and was told by a friend that I could come here to seek some legal advice on the direction I should go.
My wife (separated) and I have been married for five years and have a four year old daughter who is the light of my life. She and I have always had a special bond and I have taken it upon myself to do most of the child-rearing as her mother has traveled for work and been mostly uninterested in her.
About three weeks ago, my wife and I got in a fight about her drug (opioid prescription that lead to heroin) use, and when I told her that I was taking my daughter she revealed to me that my daughter isn’t biologically mine and I wouldn’t have any legal rights to her. 
I believe her 95% and know who the father is. The timelines check out, I just had never realized it before.
I know that family courts often favor mothers, and I

In [87]:
response = get_answer_from_model(test_res, prompt)
Markdown(f"{response}")

The information provided in the sources is relevant to the question as they all discuss various aspects of custody battles, parental rights, and the legal implications of these situations. Similarities across the sources include concerns about the child's safety, the involvement of biological and non-biological parents, and the impact of drug use on custody decisions. 

Specific examples include:

- Source 2 discusses a father's concerns about his rights to his stepchild and biological child during a divorce, similar to the questioner's situation with his non-biological daughter [2](https://www.reddit.com/r/legaladvice/comments/4ypf3z/custody_before_a_divorce/).
- Source 3 provides an example of a father who has both legal and physical custody of his child due to the mother's drug use and mental health issues, which is relevant to the questioner's concerns about his wife's drug use [3](https://www.reddit.com/r/legaladvice/comments/gqf86t/father_with_legal_physical_custody_moving_out_of/).
- Source 4 discusses the challenges of adopting a stepchild when the biological father is still involved, which could be relevant if the questioner considers adoption as a way to secure his rights to his daughter [4](https://www.reddit.com/r/legaladvice/comments/2kjktb/biological_father_refuses_to_sign_away_rights/).
- Source 9 discusses a father's concerns about his daughter's safety with her mother and his desire for emergency custody, which is similar to the questioner's concerns about his wife's drug use and the safety of his daughter [9](https://www.reddit.com/r/legaladvice/comments/e9333n/who_do_i_talk_to_about_getting_emergency_custody/).
- Source 10 discusses a father's attempt to relocate with his child amidst a custody battle involving a mentally unstable mother, which could be relevant if the questioner needs to consider relocation in the future [10](https://www.reddit.com/r/legaladvice/comments/gsb52f/moving_a_child_out_of_state_oklahoma/).

In [11]:
from custom_tools import ResearchPastQuestions

In [12]:
research_tool = ResearchPastQuestions(df=df)

In [13]:
result = research_tool.run(test_query)

100%|██████████| 10/10 [01:31<00:00,  9.13s/it]
100%|██████████| 1/1 [00:09<00:00,  9.15s/it]
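
The first progress bar runs over 10 items and the second over 1. This pattern is consistent with the tool retrieving ten past posts, processing each one, and then making a single final call. The retrieval step itself is not shown in this notebook. The sketch below assumes it ranks rows of `df` by cosine similarity between a query embedding and the stored `embeddings` column, using the `cosine_similarity` imported earlier. `top_n_similar` and its `exclude_id` parameter are hypothetical. The query must also be embedded with the same model that produced `df['embeddings']`.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def top_n_similar(df, query_embedding, n=10, exclude_id=None):
    """Return the n rows of df whose embeddings are closest to query_embedding.

    Hypothetical helper: assumes df["embeddings"] holds equal-length vectors
    produced by the same embedding model as query_embedding.
    """
    # Drop the query's own post so it is not returned as its own best match.
    candidates = df if exclude_id is None else df[df["id"] != exclude_id]
    matrix = np.vstack(candidates["embeddings"].tolist())
    query = np.asarray(query_embedding, dtype=float).reshape(1, -1)
    scores = cosine_similarity(query, matrix)[0]
    # Positions of the highest scores, best first.
    order = np.argsort(scores)[::-1][:n]
    result = candidates.iloc[order].copy()
    result["cosine"] = scores[order]
    return result
```

Because `test_query` is itself sampled from `df`, passing that post's `id` as `exclude_id` keeps the query from ranking first against itself.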


In [10]:
test_query_link

'https://www.reddit.com/r/legaladvice/comments/7wictb/single_mom_here_exhusband_father_of_my_2_kids_is/'
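
The slug of this link (`single_mom_here_exhusband_father_of_my_2_kids_is`) does not match either query shown in the results below. This follows from the earlier cell: `test_query` is drawn with `random_state=18` and `test_query_link` with `random_state=13`, so the two can come from different rows. Drawing one row and reading both columns from it keeps them paired. `sample_query_row` is a hypothetical helper name.

```python
import pandas as pd


def sample_query_row(df, random_state=18):
    """Draw one post and return its body and link from the same row."""
    row = df.sample(1, random_state=random_state).iloc[0]
    return row["body"], str(row["full_link"])
```

Usage: `test_query, test_query_link = sample_query_row(df, random_state=18)`.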

In [14]:
Markdown(f"{result}")

## New Query:
   I won a small claims case against a used car dealer for $2900 for a used 2002 car I bought.  I bought a extended warranty with car, on our way home after purchase vehicle lost oil pressure and hd to be towed home. I called dealer they said no problem that the extended warranty would cover it.  I had it towed to GM dealership to have it checked out and they said engine and tranny were both bad and would need replaced. they called warranty company to submit claim and found out the warranty I had bought was never available for that car since it had more than125,000 miles on it.  Now used car dealer said they are not liable and have to do nothing. I won the small claims case but they have since appealed and I need to know how to answer my appellee respondents brief but I can not find any similar cases or a format to do it in?   Please help          Thank You

Extra info: Plaintiff failed to bring a cause of action against defendant upon which relief could be granted. Trial courts erred by failing to afford defendant opportunity to cross examine plaintiff. Court erred judgment to plaintiff record clearly contradicts plaintiff's assertions. Court erred by waiting over 30 months from date of trial to render decision. 
## Model Response:
The new query involves a used car purchase, an extended warranty that was not valid, and a small claims court case that the buyer won but is now being appealed by the used car dealer. The buyer is seeking guidance on how to respond to the appeal.

The most relevant case is [[1](https://www.reddit.com/r/legaladvice/comments/51vff6/fl_used_car_scam/)], where the buyer also purchased a used car that turned out to be in poor mechanical condition. The dealer misled the buyer about the car's reliability, and the paperwork provided was inconsistent. However, this case differs from the new query in that it does not involve an extended warranty or a small claims court case.

Case [[5](https://www.reddit.com/r/legaladvice/comments/gn3br8/lemon_2020_wrangler_settlement_of_2000_with_no/)] involves a leased vehicle with multiple mechanical issues. The buyer was offered a settlement before pursuing arbitration. This case is similar to the new query in that it involves a faulty vehicle and a legal dispute, but it differs in that it involves a leased vehicle, not a used car purchase, and the dispute is about a settlement offer, not a small claims court case.

Case [[6](https://www.reddit.com/r/legaladvice/comments/57zb30/car_damage_after_being_towed_could_use_some_advice/)] involves a car that was towed and subsequently experienced mechanical issues. The insurance company denied coverage, citing pre-existing wear and tear. This case is similar to the new query in that it involves a car with mechanical issues and a dispute over who is responsible for the damage, but it differs in that it involves a towed car, not a used car purchase, and the dispute is with an insurance company, not a used car dealer.

In summary, while there are cases that involve elements similar to the new query, such as a used car purchase, mechanical issues, and legal disputes, none of the cases involve the exact combination of a used car purchase, an invalid extended warranty, and a small claims court case that is being appealed.

| Citation Number | Similarity Score | Legal Questions |
| --- | --- | --- |
| [1](https://www.reddit.com/r/legaladvice/comments/51vff6/fl_used_car_scam/) | 7 | Can a buyer seek legal recourse if they were misled by a used car dealer about the reliability of a car and the paperwork provided was inconsistent? |
| [5](https://www.reddit.com/r/legaladvice/comments/gn3br8/lemon_2020_wrangler_settlement_of_2000_with_no/) | 6 | Can a buyer seek legal recourse if they leased a vehicle with multiple mechanical issues and were offered a settlement before pursuing arbitration? |
| [6](https://www.reddit.com/r/legaladvice/comments/57zb30/car_damage_after_being_towed_could_use_some_advice/) | 6 | Can a car owner seek legal recourse if their car was towed and subsequently experienced mechanical issues, and the insurance company denied coverage, citing pre-existing wear and tear? |

___
**[6] ["Legal recourse for car damage after towing: Seeking advice on potential lawsuit against tow company"](https://www.reddit.com/r/legaladvice/comments/57zb30/car_damage_after_being_towed_could_use_some_advice/)** - SD, Oct 2016

**[1] ["Legal recourse for purchasing a misrepresented and unreliable vehicle in Florida"](https://www.reddit.com/r/legaladvice/comments/51vff6/fl_used_car_scam/)** - IN, Sep 2016

**[5] ["Seeking advice on settlement offer for new vehicle with multiple repairs - How much can I realistically expect?"](https://www.reddit.com/r/legaladvice/comments/gn3br8/lemon_2020_wrangler_settlement_of_2000_with_no/)** - FM, May 2020

___
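
Each reference line combines fields that exist as columns of `df`: `llm_title`, `full_link`, `State`, and the month and year of `timestamp`. The `llm_title` values appear to already carry their own surrounding double quotes. The sketch below reproduces that layout from a single row. `format_reference` is a hypothetical helper, and the tool's actual formatting code is not shown in this notebook.

```python
import pandas as pd


def format_reference(n, row):
    """Render one reference line: **[n] [title](link)** - STATE, Mon YYYY.

    Assumes row["llm_title"] already includes its surrounding double quotes,
    as the titles in the listing above do.
    """
    when = pd.Timestamp(row["timestamp"]).strftime("%b %Y")
    return f'**[{n}] [{row["llm_title"]}]({row["full_link"]})** - {row["State"]}, {when}'
```

A row of `df` (for example `df.iloc[0]`) can be passed directly, since it exposes the same column names.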

In [10]:
Markdown(f"{result}")

## New Query:
Hi, I’m a bit new and was told by a friend that I could come here to seek some legal advice on the direction I should go.
My wife (separated) and I have been married for five years and have a four year old daughter who is the light of my life. She and I have always had a special bond and I have taken it upon myself to do most of the child-rearing as her mother has traveled for work and been mostly uninterested in her.
About three weeks ago, my wife and I got in a fight about her drug (opioid prescription that lead to heroin) use, and when I told her that I was taking my daughter she revealed to me that my daughter isn’t biologically mine and I wouldn’t have any legal rights to her. 
I believe her 95% and know who the father is. The timelines check out, I just had never realized it before.
I know that family courts often favor mothers, and I am worried that I might lose my precious little girl. I have a great job (100k a year plus benefits) and plenty of family and company support. I do not want my child to live with her mother while she is in the condition she is in. I have proof that her lifestyle is unsafe and unhealthy for my child.
I’ve spoken to a family lawyer who wouldn’t take my case because he said I didn’t have a chance.
I’m coming here to see what I can do. I have already looked at different lawyers, but would like to see what I can do to build my case.
Thanks.
## Model Response:
The new query presents a situation where a man, who has been the primary caregiver for his daughter, discovers that he may not be the biological father. He is concerned about the mother's drug use and wants to ensure the child's safety. He seeks advice on how to build a case against the mother, despite a family lawyer declining his case due to perceived slim chances of success.

This situation shares similarities with several of the search results. In [[1](https://www.reddit.com/r/legaladvice/comments/7sqqy1/custody_of_a_child_thats_not_biologically_mine/)], the individual is also a man who discovers he may not be the biological father of his daughter. He too is concerned about the mother's drug use and wants to protect his relationship with the child. This case could provide insights into how the courts may view a non-biological father's rights, especially when the mother's lifestyle is potentially harmful to the child.

In [[2](https://www.reddit.com/r/legaladvice/comments/4ypf3z/custody_before_a_divorce/)], the individual is facing a potential divorce and is concerned about their rights regarding their children, one of whom is a stepchild. This case could provide insights into how the courts may view the rights of a non-biological parent in a divorce situation.

In [[5](https://www.reddit.com/r/legaladvice/comments/gsb52f/moving_a_child_out_of_state_oklahoma/)], the individual is the custodial parent of a child and is concerned about the mother's mental instability. This case could provide insights into how the courts may view the rights of a custodial parent when the other parent has mental health issues.

In [[8](https://www.reddit.com/r/legaladvice/comments/e9333n/who_do_i_talk_to_about_getting_emergency_custody/)], the individual is concerned about the father's ability to provide a safe environment for the child due to his history of drug use. This case could provide insights into how the courts may view the rights of a parent when the other parent has a history of drug use.

In summary, the new query shares similarities with several past cases, particularly in terms of the concerns about the mother's drug use and the individual's desire to protect the child. These cases could provide insights into how the courts may view the rights of a non-biological parent and the factors they may consider when determining custody.

Top 10 Most Similar Cases:

| Citation Number | Legal Questions | Similarity Score |
| --- | --- | --- |
| 1 | Custody of a child that's not biologically mine | 9 |
| 2 | Custody before a divorce | 7 |
| 5 | Moving a child out of state (Oklahoma) | 7 |
| 8 | Who do I talk to about getting emergency custody? | 7 |
| 3 | MN split parenting, one parent is always sending child away | 6 |
| 4 | Considering asking daughter's bio father to allow adoption | 6 |
| 6 | Biological father refuses to sign away rights | 6 |
| 7 | Ontario custody and visitation of infant daughter | 6 |
| 9 | NJ 15 year old embroiled in a custody battle, what are her rights? | 6 |
| 10 | Father with legal & physical custody moving out of state | 6 |
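
The two runs above lay out the summary table differently: the score is the second column in one and the third in the other. This suggests the model writes the table itself. If a stable layout matters, the table could instead be rendered in code from the model's per-case scores. The sketch below assumes those scores are available as `(citation, question, score)` tuples, which is a hypothetical intermediate format. It sorts by score, highest first, and then by citation number, which is the same order the top-10 table shows.

```python
def render_similarity_table(rows):
    """Render (citation, question, score) tuples as a markdown table.

    Rows are sorted by score (highest first), then by citation number.
    """
    ordered = sorted(rows, key=lambda r: (-r[2], r[0]))
    lines = [
        "| Citation Number | Legal Questions | Similarity Score |",
        "| --- | --- | --- |",
    ]
    for citation, question, score in ordered:
        # Escape pipes so a question cannot break the table's columns.
        safe_question = question.replace("|", "\\|")
        lines.append(f"| {citation} | {safe_question} | {score} |")
    return "\n".join(lines)
```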
___
**[5] ["Seeking advice on relocating with a child amidst a custody battle involving a mentally unstable parent and recent drug exposure"](https://www.reddit.com/r/legaladvice/comments/gsb52f/moving_a_child_out_of_state_oklahoma/)** - MT, May 2020

**[1] ["Seeking legal advice on custody rights and protecting my child from an unsafe environment"](https://www.reddit.com/r/legaladvice/comments/7sqqy1/custody_of_a_child_thats_not_biologically_mine/)** - KY, Jan 2018

**[8] ["Seeking advice on emergency custody and concerns about child's safety in Tennessee custody dispute"](https://www.reddit.com/r/legaladvice/comments/e9333n/who_do_i_talk_to_about_getting_emergency_custody/)** - MS, Dec 2019

**[2] ["Legal implications of asserting parental rights during divorce proceedings in Tennessee"](https://www.reddit.com/r/legaladvice/comments/4ypf3z/custody_before_a_divorce/)** - AR, Aug 2016

___