<a href="https://colab.research.google.com/github/leohpark/leohpark/blob/main/Gradio_Modular_Brief_Maker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Gradio Modular Brief Maker

This is a proof-of-concept Colab notebook that explores modular prompt templating for generating legal case briefs. The brief-generating prompt can be configured to contain different brief sections and to include "focus topics" drawn from inferred keywords.

Because gpt-3.5 struggles with complex instructions and loses performance over long contexts, this is more conceptual than useful at this point. However, I think it's an interesting technique for either rapidly prototyping different prompting techniques or developing summarization techniques that incorporate some self-prioritization or human-assigned subject matter prioritization.

## Summaries

Click "Upload PDF" to upload a PDF, or simply paste text into the "Summary Outline" textbox. The subsequent prompts are designed to use an outline, but they will work with any text pasted into the box; I'm not entirely sure what the end product will look like in that case, though.

"Doc Tokens" tells you how many tokens are in the document you uploaded, so that you can see how much compression occurs between the source material and the outline.

Subsequent model calls, "Generate Legal Topics" and "Get Final Brief!", both read the current text of the "Summary Outline" textbox, so you can edit this box to test different text combinations. Be aware that if the outline exceeds ~12500 tokens, those API calls may fail because they exceed gpt-3.5-turbo-16k's context length.
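
The ~12500-token ceiling above can be checked before an API call is made. The following is a minimal sketch, not code from this notebook: `fits_context` is a hypothetical helper, and the 16,000-token window and 500-token prompt overhead are conservative assumptions rather than measured values.

```python
def fits_context(outline_tokens, output_tokens, context_window=16_000, prompt_overhead=500):
    """Return True if outline + instructions + requested output fit in the window.

    context_window and prompt_overhead are assumed, conservative defaults.
    """
    return outline_tokens + prompt_overhead + output_tokens <= context_window

# A 12,500-token outline with the recommended 2,000-token brief fits;
# the same outline with a 4,000-token brief does not.
print(fits_context(12_500, 2_000))  # True
print(fits_context(12_500, 4_000))  # False
```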

Configure the "Doc Chunk Size" and "Output Brief Size" sliders (2000 for each works well). Then click "Get Summaries" to generate a Summary Outline of approximately 9000-12500 tokens using gpt-3.5-turbo.
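
How the chunk size interacts with document length can be worked through numerically. This sketch roughly mirrors the arithmetic in the notebook's `upload_file` and `summarize_it` functions for a first upload (a 12500-token target, a floor of 2.5 on the compression ratio, and a per-chunk cap of `chunk_size // ratio`). The helper name `estimate_outline` and the evenly-sized-chunks assumption are mine, and real summaries may come in shorter than their cap.

```python
import math

TARGET_OUTLINE_TOKENS = 12_500  # target the notebook uses for gpt-3.5-turbo-16k
MIN_RATIO = 2.5                 # floor on the compression ratio

def estimate_outline(doc_tokens, chunk_size):
    """Estimate (ratio, per-chunk summary cap, total outline cap) in tokens."""
    ratio = max(round(doc_tokens / TARGET_OUTLINE_TOKENS, 1), MIN_RATIO)
    per_chunk_cap = int(chunk_size // ratio)
    # Assumes evenly sized chunks; the real splitter may produce a few more.
    n_chunks = math.ceil(doc_tokens / chunk_size)
    return ratio, per_chunk_cap, n_chunks * per_chunk_cap

print(estimate_outline(50_000, 2_000))  # (4.0, 500, 12500)
print(estimate_outline(20_000, 2_000))  # (2.5, 800, 8000)
```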

"Outline Tokens" will dynamically keep track of however much text you have in the Summary Outline.

## Make your Brief!
"Legal Brief Maker" is where the Final Brief text will eventually be placed.

"Prompt Preview" will dynamically display the modular components that will be fed into the prompt template for the Final Brief.

"Prompt Builder" allows you to add and subtract brief sections from your final brief. It's a bunch of checkboxes.

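The modular assembly behind "Prompt Builder" and "Prompt Preview" can be sketched without Gradio or LangChain. The section descriptions below are copied from this notebook; the `SECTIONS` dictionary, the `build_prompt` function, the shortened template, and the use of plain `str.format` in place of LangChain's `PromptTemplate` are illustrative assumptions, not the notebook's actual implementation.

```python
import json

# Optional sections, keyed by checkbox label (descriptions taken from the notebook).
SECTIONS = {
    "Facts": "Facts of the Case: the parties, facts and events leading to this court case.",
    "Rules": "Rules: detailed explanation of relevant statutes, interpretations, standards, and tests applicable to this case",
    "Conclusion": "Conclusion: the court's answer to the Questions Presented.",
}
# Lines that are always included.
HEADER = [
    "Case: [name of the case, court, year]",
    "Questions Presented: The issues the court case must resolve.",
]
# Shortened, hypothetical stand-in for the notebook's final template.
TEMPLATE = "Context:{text}\nCompose a case memorandum with this structure:\n\n{formatted_brief}\n"

def build_prompt(outline, selected, topics=()):
    """Assemble the final prompt from selected sections and optional focus topics."""
    # Iterate over SECTIONS so the order is fixed regardless of click order.
    lines = HEADER + [text for name, text in SECTIONS.items() if name in selected]
    formatted_brief = "\n".join(lines)
    if topics:
        focus = json.dumps({"Memo Focus": list(topics)})
        formatted_brief += "\n\n" + focus + " - Memo Focus: discuss these topics thoroughly in the memo."
    return TEMPLATE.format(text=outline, formatted_brief=formatted_brief)

print(build_prompt("- Plaintiff sued in 2019.", ["Conclusion", "Facts"], ["standing"]))
```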
"Generated Topics" is a multi-select dropdown. You need to generate legal topics before this menu is populated. If you spam-click this too aggressively, gradio can encounter bugs.

Click "Generate Legal Topics" to make an API call that infers keywords from the text in "Summary Outline." In an application setting this would normally happen automatically, but because I don't validate the output of the GPT call, it sometimes breaks.

"Get Final Brief!" takes the Prompt Builder and Generated Topics settings and places them into the prompt template used to create the final brief, which is output to "Legal Brief Maker." You can then experiment with the settings and see how they affect your brief.

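The missing output validation for "Generate Legal Topics" mentioned above could look something like the following. This is a hedged sketch using only the standard library: `parse_topics` is a hypothetical helper, not part of this notebook, and the three fallback shapes it tries (a JSON object, a bare `"legal_topics": [...]` pair without braces, and an array embedded in other text) are guesses about how a reply might be malformed. It returns an empty list instead of raising when nothing parses.

```python
import json
import re
from typing import List

def parse_topics(raw: str, key: str = "legal_topics") -> List[str]:
    """Best-effort extraction of a topic list from a model reply."""
    candidates = [raw, "{" + raw + "}"]  # as-is, then with missing braces added
    match = re.search(r"\[.*\]", raw, flags=re.DOTALL)
    if match:
        candidates.append(match.group(0))  # a bare array embedded in other text
    for candidate in candidates:
        try:
            data = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            data = data.get(key)
        if isinstance(data, list):
            # Coerce to strings and drop empty entries.
            return [str(item).strip() for item in data if str(item).strip()]
    return []

print(parse_topics('{"legal_topics": ["due process", " standing "]}'))  # ['due process', 'standing']
print(parse_topics('"legal_topics": ["a", "b"]'))                       # ['a', 'b']
print(parse_topics("no json here"))                                     # []
```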
## Debugging

Mostly for self-checking things while I'm building.

In [None]:
!pip install -q gradio langchain unstructured pdf2image openai tiktoken

In [None]:
import gradio as gr
from langchain.llms import OpenAI
from langchain.document_loaders import UnstructuredPDFLoader, OnlinePDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chat_models import ChatOpenAI
from langchain import PromptTemplate
from langchain.docstore import document
import tiktoken
import openai
import json
tokenizer = tiktoken.get_encoding('cl100k_base')
def tiktoken_len(text):
  tokens = tokenizer.encode(
      text,
      disallowed_special=()
  )
  return len(tokens)

In [None]:
# The key is bound twice because some API calls go through LangChain and others call the openai library directly.
OPENAI_KEY = "..."
openai.api_key = OPENAI_KEY

In [None]:
#@title Customizable Brief Maker 1.0

summary_template_default = """Context:{text}

As an experienced legal analyst, review the Context, which is part of a long document, to create a detailed and comprehensive
bulleted outline of the Context with as many facts, descriptions, explanations, reasonings, previous case citations and their relationship to this case, as you can.
"""

formatted_brief_default = """Case: [name of the case, court, year]
Questions Presented: The issues the court case must resolve.
Facts of the Case: the parties, facts and events leading to this court case.
Procedural History: [district court case summary, appeals court case summary, how this issue reached this court]
Rules: detailed explanation of relevant statutes, interpretations, standards, and tests applicable to this case
Case Law: names of cases and analysis of how they relate to the Questions Presented
Application: detailed analysis of how the Rules and Case Law help reach the conclusions
Conclusion: the court's answer to the Questions Presented."""

inferred_topics_template_default = """Context:{text}
You are an experienced attorney and legal scholar reviewing legal documents. Return a JSON of the ten most repeated legal key phrases from the Context,
Topics=["topic 1", "topic 2"...]. Return only the JSON."""

final_template_default = """Context:{text}
As an experienced legal analyst, use the Context to compose a case memorandum working step-by-step using only information from the Context.
Replace the bullet points with paragraphs in the following memo structure:

{formatted_brief}
"""

topics_choices = []

# Loads the uploaded PDF, extracts its text, counts tokens, and sets the token ratio.

def upload_file(files, token_ratio):
  # token_ratio is received from Gradio state; it is kept in the signature for the UI wiring.
  loader = UnstructuredPDFLoader(files.name)
  docs_raw = loader.load()
  doc_content = docs_raw[0].page_content
  docs_tokens = tiktoken_len(doc_content)

  # Tune the maximum summary size here: 12500 targets gpt-3.5-turbo-16k.
  # If you are using GPT-4, the maximum tokens for summaries is probably around 5500-6000.
  new_token_ratio = round(docs_tokens / 12500, 1)
  # Never compress by less than 2.5x. Comparing against the floor, rather than the
  # previous upload's ratio, keeps a second upload from getting too small a ratio.
  token_ratio = max(new_token_ratio, 2.5)
  return doc_content, docs_tokens, token_ratio

def get_tokens(summary_outline):
  summary_docs_tokens = int(tiktoken_len(summary_outline))
  return summary_docs_tokens

def summarize_it(doc, chunk_s, summary_template, token_ratio):
  text_splitter = RecursiveCharacterTextSplitter(
      chunk_size=chunk_s,  # maximum units per chunk (tokens, given length_function below)
      chunk_overlap=0,  # no overlap between consecutive chunks
      length_function=tiktoken_len,  # measure chunk length in tokens instead of characters
      separators=['\n\n', '\n', ' ', '']  # separators tried in order, coarsest first
      )
  texts = text_splitter.split_text(doc)

  s_prompt = PromptTemplate(
      input_variables=["text"],
      template=summary_template
  )

  # Use this for OpenAI latest: gpt-3.5-turbo, bigger context: gpt-3.5-turbo-16k, functions: gpt-3.5-turbo-0613
  llms = OpenAI(temperature=0, openai_api_key=OPENAI_KEY, model_name="gpt-3.5-turbo", max_tokens=int(chunk_s // token_ratio))
  summarized_texts = ""
  for text in texts:
    summary_prompt = s_prompt.format(text=text)
    summary = llms(summary_prompt)
    summarized_texts += summary + "\n"
  summarized_tokens = int(tiktoken_len(summarized_texts))

  return summarized_texts, summarized_tokens

# function call to rerun the Final Summary

def final_brief(summarized_texts, formatted_brief, out_s):

  f_prompt = PromptTemplate(
      input_variables=["text", "formatted_brief"],
      template=final_template_default
  )
  llmf = OpenAI(temperature=0, openai_api_key=OPENAI_KEY, model_name="gpt-3.5-turbo-16k", max_tokens=out_s)
  final_prompt = f_prompt.format(text=summarized_texts, formatted_brief=formatted_brief)
  final_summary = llmf(final_prompt)
  final_doc_tokens = int(tiktoken_len(final_summary))
  # Return the fully formatted prompt so "Final Prompt for Viewing" shows what was actually sent.
  return final_summary, final_prompt, final_doc_tokens

#this function takes the summary_outline and does a direct API call for a list of topics.
#I was going to use a function call, but they can't return an array as far as I can see.

def get_topics(summary_outline):
  topics_messages = [{"role": "system", "content":
                      """You are an experienced attorney and legal scholar reviewing legal documents.
                      Return a JSON array \"legal_topics\" of the twelve most repeated legal key phrases from the
                      Context. Answer: "legal_topics":["topic 1", "topic 2", "topic 3", "topic 4"..."topic 12"] """},
                      {"role": "user", "content": summary_outline},]

  #gpt-3.5-turbo-0613, gpt-3.5-turbo-16k-0613, gpt-4-0613, gpt-4-32k-0613
  topics_list = openai.ChatCompletion.create(
      model="gpt-3.5-turbo-16k-0613",
      temperature=0,
      messages=topics_messages
      )
  response_message=topics_list["choices"][0]["message"]
  response_json = json.loads(response_message['content'])
  legal_topics = response_json['legal_topics']
  # Coerce each topic to a string before stripping, in case the model returns non-string items.
  topics_choices = [str(topic).strip() for topic in legal_topics]

  return gr.Dropdown.update(choices=topics_choices), topics_choices

#This function assembles the final prompt to be performed by Brief Builder.
def prompt_preview(brief_selection, topics_list):
  case_name = "Case: [name of the case, court, year]\n"
  questions_presented = "Questions Presented: The issues the court case must resolve.\n"
  brief_bits = """"""
  if 'Facts' in brief_selection:
    brief_bits += "Facts of the Case: the parties, facts and events leading to this court case.\n"
  if 'Procedural' in brief_selection:
    brief_bits += "Procedural History: [district court case summary, appeals court case summary, how this issue reached this court]\n"
  if 'Rules' in brief_selection:
    brief_bits += "Rules: detailed explanation of relevant statutes, interpretations, standards, and tests applicable to this case\n"
  if 'Case Law' in brief_selection:
    brief_bits += "Case Law: names of cases and analysis of how they relate to the Questions Presented\n"
  if 'Application' in brief_selection:
    brief_bits += "Application: detailed analysis of how the Rules and Case Law help reach the conclusions\n"
  if 'Conclusion' in brief_selection:
    brief_bits += "Conclusion: the court's answer to the Questions Presented.\n"
  if 'Dissent' in brief_selection:
    brief_bits += "Dissent: (exclude if a dissent was not included) How it disagrees with the holding of this case.\n"

  foci_bits = ""
  if topics_list:
    # foci_bits = "Review the Context carefully to provide as much detail as you can in the memo about " + ", ".join(["'{}'".format(topic) for topic in topics_list])
    # Serialize the selected topics as JSON and append the focus instruction.
    json_string = json.dumps({"Memo Focus": topics_list})
    foci_bits = json_string + " - Memo Focus: Review the Context step by step for these topics and discuss them thoroughly in the memo."

  formatted_brief = case_name + questions_presented + brief_bits + "\n" + foci_bits

  return formatted_brief, formatted_brief, brief_selection, topics_list

#Gradio (Gradio)! All we hear is Gradio Gaga! Gradio Goo Goo!
#themes to try: "default", "huggingface", "grass", "peach", "darkdefault", "darkhuggingface", "darkgrass", "darkpeach"
# gradio themes from gradio gr.themes.Base() gr.themes.Default(), gr.themes.Glass(), gr.themes.Monochrome(), gr.themes.Soft()

with gr.Blocks(theme=gr.themes.Soft()) as briefinator:
  with gr.Row():
    with gr.Column(scale=2.5):
      with gr.Tab("Summaries"):
        summary_doc = gr.Textbox(lines=12, max_lines=15, label="Summary Outline", show_copy_button=True)
        summary_save = gr.State(summary_template_default)

        with gr.Row():
          doc_tokens = gr.Textbox(label="Doc Tokens", scale=1)
          chunk_s = gr.Slider(1000, 2500, step=100, label="Doc Chunk Size", scale=4, value=2000)
          out_s = gr.Slider(1000, 4000, step=250, label="Output Brief Size", scale=4, value=2000)
          outline_tokens = gr.Textbox(label="Outline Tokens", scale=1)
          summary_doc.change(fn=get_tokens, inputs=summary_doc, outputs=outline_tokens)

# Document upload thingy

        with gr.Row():
          uploaded_doc = gr.State([])
          token_ratio = gr.State(2.5)
          upl_btn = gr.UploadButton("Upload PDF", file_types=[".pdf"], file_count="single", size="sm")
          upl_btn.upload(fn=upload_file, inputs=[upl_btn, token_ratio], outputs=[uploaded_doc, doc_tokens, token_ratio])
          summarize_button = gr.Button("Get Summaries", variant="primary", size="sm")

# Legal Brief Maker tab
      with gr.Tab("Make your Brief!"):
        brief_box = gr.Textbox(lines=12,interactive=True, label="Legal Brief Maker", show_copy_button=True)
        with gr.Row():
          with gr.Column():
            brief_settings_state = gr.State(["Facts", "Procedural","Rules", "Case Law", "Application", "Conclusion"])
            pick_topics_state = gr.State([])
            formatted_brief_state = gr.State(formatted_brief_default)
            #brief_preview displays the modular sections that will be inserted into the final prompt template.
            brief_preview = gr.Textbox(lines = 5, label ="Prompt Preview", info="Brief Bits Made to Order")
            brief_settings = gr.CheckboxGroup(choices=["Facts", "Procedural","Rules", "Case Law", "Application", "Conclusion", "Dissent"],
                                              value=["Facts", "Procedural","Rules", "Case Law", "Application", "Conclusion"],
                                              label="Prompt Builder", info="Select which Sections you want in the Final Brief.", interactive=True)

            pick_topics = gr.Dropdown(choices=topics_choices, multiselect=True, max_choices=6, label="Generated Topics", interactive=True,
                                      info = "Inferred Legal Topics, curated by GPT! Multiselect up to six topics to focus the brief")
            brief_settings.change(fn=prompt_preview, inputs=[brief_settings, pick_topics_state], outputs=[ brief_preview, formatted_brief_state, brief_settings_state, pick_topics_state])
            pick_topics.change(fn=prompt_preview, inputs=[brief_settings_state, pick_topics], outputs=[ brief_preview, formatted_brief_state, brief_settings_state, pick_topics_state])

        #translates outline into legal topics for Brief Foci
        with gr.Row():
          topics_button = gr.Button("Generate Legal Topics", variant="primary", size="sm")
          final_button = gr.Button("Get Final Brief!", variant="stop", size="sm")

      #some outputs to keep a check on things
      with gr.Tab("Debugging"):
        with gr.Row():
          summaries_tokens = gr.Textbox(scale=1, label="Final Brief Tokens")
        final_prompt = gr.Textbox(lines=15, label="Final Prompt for Viewing", show_copy_button=True)
        topics_return = gr.Textbox(lines=15, label="What was returned from 0613 for topics JSON", show_copy_button=True)


# Button values

        topics_button.click(fn=get_topics, inputs=[summary_doc], outputs=[pick_topics, topics_return])
        summarize_button.click(fn=summarize_it, inputs=[uploaded_doc, chunk_s, summary_save, token_ratio], outputs=[summary_doc, outline_tokens])
        final_button.click(fn=final_brief, inputs=[summary_doc, formatted_brief_state, out_s], outputs =[brief_box, final_prompt, summaries_tokens])


if __name__ == "__main__":
    briefinator.queue().launch(share=True, debug=True)