<a href="https://colab.research.google.com/github/russell-ai/nlp/blob/main/Question_Answering/CUAD_Streamlit.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Contract Understanding Atticus Dataset (CUAD) Demo**

## **Setup files**

In [20]:
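!git clone https://huggingface.co/spaces/akdeniz27/contract-understanding-atticus-dataset-demo
# app.py imports predict.py and opens test.json by relative path, so work from inside the clone
%cd contract-understanding-atticus-dataset-demo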

Cloning into 'contract-understanding-atticus-dataset-demo'...
remote: Enumerating objects: 64, done.
remote: Counting objects: 100% (64/64), done.
remote: Compressing objects: 100% (61/61), done.
remote: Total 64 (delta 34), reused 0 (delta 0)
Unpacking objects: 100% (64/64), done.


In [21]:
%%capture
! pip install streamlit -q
! pip install pyngrok -q
# The `npx` command used with localtunnel below comes with Node.js (available on Colab), not from pip

In [None]:
%%capture
!pip install transformers sentencepiece

In [22]:
import torch, os, transformers, sentencepiece

## **Streamlit app.py file**

In [23]:
%%writefile app.py
import json

import streamlit as st
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# run_prediction is provided by predict.py from the cloned demo repository
from predict import run_prediction

st.set_page_config(layout="wide")

model_list = ['akdeniz27/roberta-base-cuad',
              'akdeniz27/roberta-large-cuad',
              'akdeniz27/deberta-v2-xlarge-cuad']
st.sidebar.header("Select CUAD Model")
model_checkpoint = st.sidebar.radio("Model", model_list)

# The DeBERTa-v2 tokenizer needs the sentencepiece package
if model_checkpoint == "akdeniz27/deberta-v2-xlarge-cuad":
    import sentencepiece

st.sidebar.write("Project: https://www.atticusprojectai.org/cuad")
st.sidebar.write("GitHub: https://github.com/TheAtticusProject/cuad")
st.sidebar.write("CUAD Dataset: https://huggingface.co/datasets/cuad")
st.sidebar.write("License: CC BY 4.0 https://creativecommons.org/licenses/by/4.0/")

# st.cache is deprecated in recent Streamlit releases: cache_resource suits models,
# cache_data suits plain data (both require Streamlit >= 1.18)
@st.cache_resource
def load_model(checkpoint):
    # Passing the checkpoint as an argument makes the cache follow the sidebar selection
    model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint, use_fast=False)
    return model, tokenizer

@st.cache_data
def load_questions():
    # test.json uses the SQuAD layout; the queries are read from the first contract
    with open('test.json') as json_file:
        data = json.load(json_file)
    return [qa['question'] for qa in data['data'][0]['paragraphs'][0]['qas']]

@st.cache_data
def load_contracts():
    with open('test.json') as json_file:
        data = json.load(json_file)
    # One text per contract, with runs of whitespace collapsed to single spaces
    return [' '.join(doc['paragraphs'][0]['context'].split()) for doc in data['data']]

model, tokenizer = load_model(model_checkpoint)
questions = load_questions()
contracts = load_contracts()

contract = contracts[0]

st.header("Contract Understanding Atticus Dataset (CUAD) Demo")
st.write("Based on https://github.com/marshmellow77/cuad-demo")

selected_question = st.selectbox('Choose one of the 41 queries from the CUAD dataset:', questions)
# The first query is always sent along; only the selected query's answer is displayed
question_set = [questions[0], selected_question]

contract_type = st.radio("Select Contract", ("Sample Contract", "New Contract"))
if contract_type == "Sample Contract":
    # Bound the slider by the number of contracts actually loaded
    sample_contract_num = st.slider("Select Sample Contract #", 0, len(contracts) - 1)
    contract = contracts[sample_contract_num]
    with st.expander(f"Sample Contract #{sample_contract_num}"):
        st.write(contract)
else:
    contract = st.text_area("Input New Contract", "", height=256)

run_button = st.button("Run")
if run_button and contract.strip():
    # Run the checkpoint selected in the sidebar
    predictions = run_prediction(question_set, contract, model_checkpoint)

    # The keys of predictions index into question_set; skip the always-included first query
    for i, p in enumerate(predictions):
        if i != 0:
            st.write(f"Question: {question_set[int(p)]}\n\nAnswer: {predictions[p]}\n\n")


Overwriting app.py
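Both loaders in `app.py` walk CUAD's SQuAD-style JSON: `data` is a list of contracts, and the loaders read only `paragraphs[0]` of each, which holds the contract text (`context`) and its queries (`qas`). The sketch below repeats that traversal offline on a tiny hand-made dictionary, so the parsing can be checked without `test.json`. The contract text and query strings are invented placeholders, and `parse_questions` / `parse_contracts` are hypothetical helpers, not part of the demo.

```python
import json

# A tiny, invented document in the SQuAD-style layout that test.json is assumed to follow
sample = {
    "data": [
        {"paragraphs": [{"context": "This  Agreement\nis made by  A and B.",
                         "qas": [{"question": "Parties"}, {"question": "Governing Law"}]}]},
        {"paragraphs": [{"context": "Second   contract text.",
                         "qas": [{"question": "Parties"}, {"question": "Governing Law"}]}]},
    ]
}

def parse_questions(data):
    # Same traversal as load_questions(): the queries of the first contract
    return [qa["question"] for qa in data["data"][0]["paragraphs"][0]["qas"]]

def parse_contracts(data):
    # Same traversal as load_contracts(): one whitespace-normalized text per contract
    return [" ".join(doc["paragraphs"][0]["context"].split()) for doc in data["data"]]

# Round-trip through JSON text, as json.load would see it when reading from disk
data = json.loads(json.dumps(sample))
print(parse_questions(data))  # ['Parties', 'Governing Law']
print(parse_contracts(data))  # ['This Agreement is made by A and B.', 'Second contract text.']
```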


## **Running the app with localtunnel or pyngrok**

In [25]:
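# Streamlit serves on port 8501 by default; if another Streamlit instance already holds it
# (e.g. one started by the pyngrok cell below), it moves to the next free port, as the 8502 in
# the output shows. Point localtunnel at the port Streamlit actually reports.
!streamlit run app.py & npx localtunnel --port 8501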

2022-03-10 00:23:06.768 INFO    numexpr.utils: NumExpr defaulting to 4 threads.

  You can now view your Streamlit app in your browser.

  Network URL: http://172.28.0.2:8502
  External URL: http://35.202.145.51:8502

npx: installed 22 in 2.432s
your url is: https://pink-duck-26.loca.lt
  Stopping...
^C


In [24]:
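# Start Streamlit in the background on its default port (8501) and discard its output
!streamlit run app.py &>/dev/null &

import time
time.sleep(3)  # give the server a few seconds to start

from pyngrok import ngrok

# Recent ngrok versions require an account authtoken; if so, set it first:
# ngrok.set_auth_token("<YOUR_NGROK_AUTHTOKEN>")
public_url = ngrok.connect(addr='8501')
print(public_url)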

NgrokTunnel: "http://a1dd-35-202-145-51.ngrok.io" -> "http://localhost:8501"
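Both tunnel cells assume the Streamlit server is already listening when the tunnel opens; the pyngrok cell simply sleeps for 3 seconds. A sketch of a sturdier alternative, using only the standard library, polls the port until it accepts TCP connections. The `wait_for_port` helper is hypothetical, not part of Streamlit or pyngrok.

```python
import socket
import time

def wait_for_port(port, host="localhost", timeout=30.0, interval=0.25):
    """Return True once host:port accepts TCP connections, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means some server is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not ready yet
            time.sleep(interval)
    return False
```

In the pyngrok cell, `time.sleep(3)` could then be replaced by `wait_for_port(8501)` before calling `ngrok.connect`, so the tunnel opens as soon as the server is up and a slow start fails visibly instead of silently.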


*Russell C.*