# **GPT-based Summarization of Legal Bills**

In this notebook, we explore summarization with a GPT-3 model through the OpenAI API, following the TL;DR summarization example (https://platform.openai.com/examples/default-tldr-summary/), which produces a summary by appending a 'tl;dr:' prompt to the end of the input text.

For experimentation, we use the publicly available BillSum dataset (https://github.com/FiscalNote/BillSum), which consists of 18,949 US training document-summary pairs and 3,269 US test document-summary pairs. The dataset also contains 1,237 California (CA) document-summary pairs.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install openai  #installing openai package

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


**Getting the OPENAI_API_KEY**

To use the OpenAI API, we need to create a secret key (https://platform.openai.com/account/api-keys). It is good practice to keep the key out of the notebook itself, for example in a separate JSON file, as done below.

In [None]:
import json

OPENAI_API_KEY = ''
with open('/content/drive/My Drive/Colab Notebooks/GPTSumm/OpenAIKey.json', 'r') as file_to_read:
    json_data = json.load(file_to_read)
    OPENAI_API_KEY = json_data["OPENAI_API_KEY"]
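
As a variation on the cell above, the sketch below first checks an environment variable and only then falls back to the JSON file. The helper name `load_api_key`, the demo variable name `GPTSUMM_DEMO_KEY`, and the placeholder key are assumptions made for illustration; they are not part of the original notebook or of the `openai` package.

```python
import json
import os
import tempfile


def load_api_key(json_path, field="OPENAI_API_KEY", env_var="OPENAI_API_KEY"):
    """Hypothetical helper: prefer the environment variable, else read the JSON file."""
    key = os.environ.get(env_var)
    if key:
        return key
    with open(json_path, "r") as fh:
        return json.load(fh)[field]


# Demo with a throwaway file and a placeholder key (not a real secret).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "OpenAIKey.json")
    with open(demo_path, "w") as fh:
        json.dump({"OPENAI_API_KEY": "sk-placeholder"}, fh)
    # GPTSUMM_DEMO_KEY is assumed to be unset, so the file is used.
    demo_key = load_api_key(demo_path, env_var="GPTSUMM_DEMO_KEY")

print(demo_key)
```

In the notebook, the returned value would be assigned to `openai.api_key` in the same way as `OPENAI_API_KEY`.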

In [None]:
import os
import openai

openai.api_key =  OPENAI_API_KEY

In [None]:
# Read the US test documents and their reference summaries from Google Drive.
# Each line of a file holds one document (or one summary).
with open("/content/drive/My Drive/Colab Notebooks/us_bill_test/cleantext.txt", "r") as f:
    USTestText = f.readlines()

with open("/content/drive/My Drive/Colab Notebooks/us_bill_test/cleansummary.txt", "r") as f:
    USTestActualSummary = f.readlines()

The cell below summarizes the first two US test bills. Because GPT-3 is not freely available, we do not fine-tune the model; we only run inference using the free credits. For details on the free credits, see the OpenAI usage page (https://platform.openai.com/account/usage).

In [None]:
PredSumm = []
for i in range(2):
    prompt = USTestText[i] + "\n Tl;dr:"  # append the tl;dr cue to the end of the text
    response = openai.Completion.create(
        model="text-davinci-003",  # the model used for summarization
        prompt=prompt,
        temperature=0.7,
        max_tokens=150,  # upper bound on the length of the generated summary
        top_p=0.9,
        frequency_penalty=0.0,
        presence_penalty=1,
    )
    PredSumm.append(response["choices"][0]["text"])
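
Bills can be long, and the prompt and the completion share the model's context window, so it may help to bound the prompt before sending it. The sketch below is one possible way to do that. The helper `build_tldr_prompt` and the `max_chars=12000` budget are assumptions for illustration (a rough character-based proxy, not an exact token count); neither comes from the original notebook.

```python
def build_tldr_prompt(document, max_chars=12000, suffix="\n Tl;dr:"):
    """Hypothetical helper: trim the document and append the tl;dr cue."""
    body = document.strip()  # drop the trailing newline left by readlines()
    if len(body) > max_chars:
        # Cut at the last space inside the budget to avoid splitting a word.
        body = body[:max_chars].rsplit(" ", 1)[0]
    return body + suffix


short_prompt = build_tldr_prompt("Section 1. Short title.\n")
long_prompt = build_tldr_prompt("word " * 5000)
print(repr(short_prompt))
print(len(long_prompt))
```

Truncation discards the end of a long bill, so a summary produced this way only reflects the retained portion.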


In [None]:
print(PredSumm[1])

 This Act allows small businesses to receive a credit for retaining certain newly hired individuals before 2013. The credit is equal to the lesser of $4,000 or 6.2 percent of wages paid by the taxpayer to the retained worker during the 52 consecutive week period. There are limitations on the number of retained workers and the amount of the credit allowed. The credit may be carried forward for up to three taxable years.


**Generated summary for US test document #1**

This Act provides tax credits for businesses that make contributions to support science, technology, engineering, and mathematics (STEM) education at the elementary and secondary school level. The credits are given for contributions of STEM inventory property, STEM service contributions, STEM teacher externship expenses, and STEM teacher training expenses. The credits are applicable to taxable years beginning after the date of enactment of this Act.

**Generated summary for US test document #2**

This Act allows small businesses to receive a credit for retaining certain newly hired individuals before 2013. The credit is equal to the lesser of $4,000 or 6.2 percent of wages paid by the taxpayer to the retained worker during the 52 consecutive week period. There are limitations on the number of retained workers and the amount of the credit allowed. The credit may be carried forward for up to three taxable years.





In [None]:
!pip install -U rouge

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [None]:
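from rouge import Rouge

rouge = Rouge()
# Hypotheses (generated summaries) come first, references second; avg=True averages over documents.
print(rouge.get_scores(PredSumm, USTestActualSummary[:2], avg=True))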

{'rouge-1': {'r': 0.3191219512195122, 'p': 0.44699848024316113, 'f': 0.35308889542516103}, 'rouge-2': {'r': 0.12494885433715221, 'p': 0.2062206572769953, 'f': 0.14068937359166683}, 'rouge-l': {'r': 0.26253658536585367, 'p': 0.37044072948328266, 'f': 0.2913736769419868}}
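
To make the `r`, `p`, and `f` entries concrete, here is a simplified sketch of ROUGE-N computed from n-gram overlap. It is an illustration of the textbook definition (clipped n-gram counts, whitespace tokenization), not the implementation used by the `rouge` package, so its numbers may differ slightly from the package output. The function name `rouge_n` and the two example sentences are made up for this sketch.

```python
from collections import Counter


def rouge_n(hypothesis, reference, n=1):
    """Simplified ROUGE-N: recall, precision and F1 from clipped n-gram overlap."""
    def ngrams(text):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    hyp, ref = ngrams(hypothesis), ngrams(reference)
    overlap = sum((hyp & ref).values())       # n-grams shared by both texts
    p = overlap / max(sum(hyp.values()), 1)   # share of hypothesis n-grams that match
    r = overlap / max(sum(ref.values()), 1)   # share of reference n-grams that are covered
    f = 2 * p * r / (p + r) if p + r else 0.0
    return {"r": r, "p": p, "f": f}


scores = rouge_n(
    "the bill creates a tax credit",
    "the bill provides a tax credit for businesses",
)
print(scores)
```

Here five of the six hypothesis unigrams appear in the eight-word reference, so precision is 5/6 and recall is 5/8.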


**ROUGE scores of the predicted summary for US test document #1**

{'rouge-1': {'r': 0.3902439024390244, 'p': 0.3404255319148936, 'f': 0.3636363586596075}, 'rouge-2': {'r': 0.14893617021276595, 'p': 0.11666666666666667, 'f': 0.13084111656913286}, 'rouge-l': {'r': 0.3170731707317073, 'p': 0.2765957446808511, 'f': 0.2954545404777893}}

**ROUGE scores of the predicted summary for US test document #2**

{'rouge-1': {'r': 0.248, 'p': 0.5535714285714286, 'f': 0.3425414321907146}, 'rouge-2': {'r': 0.10096153846153846, 'p': 0.29577464788732394, 'f': 0.15053763061420083}, 'rouge-l': {'r': 0.208, 'p': 0.4642857142857143, 'f': 0.28729281340618423}}

**Average ROUGE scores for the above generated summaries**

{'rouge-1': {'r': 0.3191219512195122, 'p': 0.44699848024316113, 'f': 0.35308889542516103}, 'rouge-2': {'r': 0.12494885433715221, 'p': 0.2062206572769953, 'f': 0.14068937359166683}, 'rouge-l': {'r': 0.26253658536585367, 'p': 0.37044072948328266, 'f': 0.2913736769419868}}
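
The averaged scores are the arithmetic mean of the two per-document scores. The short check below recomputes them from the per-document values reported in this notebook; the variable names are chosen for this sketch only.

```python
# Per-document ROUGE scores copied from the notebook output.
per_doc = [
    {"rouge-1": {"r": 0.3902439024390244, "p": 0.3404255319148936, "f": 0.3636363586596075},
     "rouge-2": {"r": 0.14893617021276595, "p": 0.11666666666666667, "f": 0.13084111656913286},
     "rouge-l": {"r": 0.3170731707317073, "p": 0.2765957446808511, "f": 0.2954545404777893}},
    {"rouge-1": {"r": 0.248, "p": 0.5535714285714286, "f": 0.3425414321907146},
     "rouge-2": {"r": 0.10096153846153846, "p": 0.29577464788732394, "f": 0.15053763061420083},
     "rouge-l": {"r": 0.208, "p": 0.4642857142857143, "f": 0.28729281340618423}},
]

# Mean of each metric over the documents.
average = {
    metric: {
        key: sum(doc[metric][key] for doc in per_doc) / len(per_doc)
        for key in ("r", "p", "f")
    }
    for metric in per_doc[0]
}
print(average)
```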