# Analysis of compliance

In [1]:
%pip install langchain
%pip install openai

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [18]:
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI


In [19]:
# Create a chat model and a completion-style LLM wrapper, both backed by OpenAI's gpt-4
chat = ChatOpenAI(model_name='gpt-4', temperature=0.2)

llm = OpenAI(model_name='gpt-4', temperature=0.2)

# One shot

In [1]:
pre_prompt_analysis_only = """ACT as a corporate attorney whose job is to check the compliance of the company's policy with the state's new laws.

Your task is to identify whether the provided policy complies with the new law.

If the policy complies with the new law, then say so and do nothing more.

Otherwise, if the policy does not comply with the law, make a general assessment of how severe the violation is, and make a numbered list of what is missing from the policy or what is wrong with the policy.

For example, if the policy is "The company's address is 123 Main Street, New York, NY 10001. The company's phone number is 212-555-1234." and the new law is "You have to specify the company's address, phone number, and email address.", then the missing part is "company's email".
"""


In [3]:
# read file "recroom notice" into a string
with open('recroom notice', 'r') as f:
    policy = f.read()
print(policy)

FileNotFoundError: [Errno 2] No such file or directory: 'data/recroom notice'
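
The traceback above names `data/recroom notice`, while the cell opens `recroom notice`, so the output was probably left over from an earlier run with a different path. A minimal sketch of a loader that tries several directories and reports every path it attempted is shown below. The helper name `read_text_file` and the default search directories are assumptions made for illustration, not part of the original notebook.

```python
from pathlib import Path


def read_text_file(name, search_dirs=(".", "data")):
    """Return the contents of the first file called `name` found in `search_dirs`.

    `search_dirs` is a hypothetical default; adjust it to the real project layout.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    # Report every location that was tried, so a stale path is easy to spot.
    tried = ", ".join(str(Path(d) / name) for d in search_dirs)
    raise FileNotFoundError(f"Could not find {name!r}; tried: {tried}")


# Usage in the notebook (not executed here):
# policy = read_text_file("recroom notice")
```

The same helper could load `definitions` and the updated notice, which keeps the path handling in one place.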

In [3]:
law = '''
  General Duties of Businesses that Collect Personal Information
(a) A business that controls the collection of a consumer’s personal information shall, at or before the point of collection, inform consumers of the following:
(1) The categories of personal information to be collected and the purposes for which the categories of personal information are collected or used and whether that information is sold or shared. A business shall not collect additional categories of personal information or use personal information collected for additional purposes that are incompatible with the disclosed purpose for which the personal information was collected without providing the consumer with notice consistent with this section.

(2) If the business collects sensitive personal information, the categories of sensitive personal information to be collected and the purposes for which the categories of sensitive personal information are collected or used, and whether that information is sold or shared. A business shall not collect additional categories of sensitive personal information or use sensitive personal information collected for additional purposes that are incompatible with the disclosed purpose for which the sensitive personal information was collected without providing the consumer with notice consistent with this section.

(3) The length of time the business intends to retain each category of personal information, including sensitive personal information, or if that is not possible, the criteria used to determine that period provided that a business shall not retain a consumer’s personal information or sensitive personal information for each disclosed purpose for which the personal information was collected for longer than is reasonably necessary for that disclosed purpose.

'''

In [4]:
# read file definitions
with open('definitions', 'r') as f:
    definitions = f.read()
print(definitions)

FileNotFoundError: [Errno 2] No such file or directory: 'data/definitions'

In [28]:
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["question", "context"],
    template= "{question}\n\n{context}\n\n",
)

chain = LLMChain(llm=llm, prompt=prompt)

In [29]:
question1 = pre_prompt_analysis_only + f"""

*** Policy ***
{policy}
"""

context = f"""
*** Law ***
{law}

{definitions}
"""

answer1 = chain.run({
    'question': question1,
    'context': context
    })

print(answer1)

The company's policy does not fully comply with the new law. The violations are not severe, but there are important aspects that need to be addressed to ensure full compliance. 

1. Missing Information on Retention Period: The policy does not specify the length of time the company intends to retain each category of personal information, including sensitive personal information. This is required under the new law.

2. Missing Information on Sensitive Personal Information: The policy does not specifically mention if the company collects sensitive personal information, what categories of sensitive personal information are collected, and the purposes for which these categories of sensitive personal information are collected or used. This is a requirement under the new law.

3. Lack of Clarity on Third-Party Sharing: The policy mentions that third parties may collect personal information on the company's behalf, but it does not clearly state whether this information is sold or shared, which
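
Because the prompt asks for a general assessment followed by a numbered list, the answer can be split into those two parts for later comparison between runs. The sketch below is one way to do that with a regular expression. The function name `parse_findings` is hypothetical, and it assumes the model follows the `1.`, `2.`, ... layout seen in the output above.

```python
import re

# Matches lines such as "1. Missing Information on Retention Period: ..."
_ITEM = re.compile(r"^\s*(\d+)\.\s+(.*)$")


def parse_findings(answer):
    """Split a model answer into (assessment, list_of_findings).

    Lines before the first numbered item form the assessment. Non-empty
    lines after a numbered item are treated as a continuation of that item,
    so a closing paragraph would be appended to the last finding.
    """
    assessment_lines = []
    findings = []
    for line in answer.splitlines():
        match = _ITEM.match(line)
        if match:
            findings.append(match.group(2).strip())
        elif findings and line.strip():
            findings[-1] += " " + line.strip()
        elif line.strip():
            assessment_lines.append(line.strip())
    return " ".join(assessment_lines), findings


# Usage in the notebook (not executed here):
# assessment, findings = parse_findings(answer1)
```

A compliant answer should yield an empty findings list, which gives a simple check on whether the model followed the "say so and do nothing more" instruction.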

## Re-run with the updated policy

In [11]:
# read the updated policy from "recroom notice upd"
with open('recroom notice upd', 'r') as f:
    policy_upd = f.read()



The company's policy does not fully comply with the new law. Here are the issues identified:

1. The policy does not clearly define what constitutes "sensitive personal information". According to the new law, sensitive personal information includes details like social security number, driver's license number, passport number, account log-in, financial account, debit card, or credit card number in combination with any required security or access code, password, or credentials allowing access to an account, precise geolocation, racial or ethnic origin, religious or philosophical beliefs, or union membership, contents of a consumer’s mail, email, and text messages unless the business is the intended recipient of the communication, a consumer’s genetic data, the processing of biometric information for the purpose of uniquely identifying a consumer, personal information collected and analyzed concerning a consumer’s health, and personal information collected and analyzed concerning a consum

In [42]:
question2 = pre_prompt_analysis_only + f"""

*** Policy ***
{policy_upd}
"""


answer2 = chain.run({
    'question': question2,
    'context': context
    })

print(answer2)

The policy of Rec Room Inc. is in compliance with the new law. The company's policy clearly outlines the categories of personal information it collects, the purposes for which this information is used, and the period for which the information is retained. The policy also states that the company does not sell or share personal information without explicit consent, which is in line with the law's requirements. Furthermore, the policy provides contact information for any questions or requests related to personal information, which is also required by the law.


In [40]:
# read the complete policy from RecRoom
with open('RecRoom_complete.md', 'r') as f:
    policy_rr_complete = f.read()
print(policy_rr_complete)

# Privacy Policy

Effective: July 25, 2023

Rec Room Inc., a Delaware corporation ("Company", "we", "our", and their derivatives) provides the websites, [http://www.recroom.com](http://www.recroom.com/), [http://www.rec.net](http://www.rec.net/), and the subdomains of each of the foregoing (collectively, the "Website") and the Rec Room® video game (the "Game" and, with the Website, the "Services").

This Policy sets forth how we collect, use, protect, store, and otherwise process your Personal Information (defined below). This Policy does NOT apply to information we collect offline or you provide to or is collected by any third party (except as otherwise provided below).

For our practices regarding children, please see the Children's section in Section 2 below.

# 1. What types of Personal Information does the Company collect?

_Generally_

We may collect different types of information from you depending on how you use our Services, including Personal Information. "Personal Informatio

In [43]:

question3 = pre_prompt_analysis_only + f"""

*** Policy ***
{policy_rr_complete}
"""


answer3 = chain.run({
    'question': question3,
    'context': context
    })

print(answer3)



KeyboardInterrupt: 
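
The run on the complete policy was interrupted, and the notebook does not say why. One possible cause is the size of the full document. Assuming that is the problem, a sketch of splitting the Markdown policy at its top-level `# ` headings and packing the sections into size-limited chunks is shown below. The function name `split_policy` and the `max_chars` budget are assumptions, and a character count is only a rough stand-in for a token count.

```python
import re


def split_policy(markdown_text, max_chars=8000):
    """Split a Markdown document at top-level '# ' headings and pack the
    sections into chunks of at most roughly `max_chars` characters.

    A single section longer than `max_chars` is kept whole rather than cut.
    """
    # The zero-width lookahead splits before each heading without dropping it.
    sections = [s for s in re.split(r"(?m)^(?=# )", markdown_text) if s.strip()]
    chunks = []
    current = ""
    for section in sections:
        if current and len(current) + len(section) > max_chars:
            chunks.append(current)
            current = ""
        current += section
    if current:
        chunks.append(current)
    return chunks


# Usage in the notebook (not executed here):
# for chunk in split_policy(policy_rr_complete):
#     ...  # build a question from the chunk and call chain.run as before
```

Checking chunks separately has a cost: a disclosure required by the law may sit in a different chunk than the one being examined, so per-chunk findings would need to be merged before concluding that something is missing.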

# Evaluation

In [23]:
%pip install ragas

Collecting ragas
  Using cached ragas-0.0.16-py3-none-any.whl (38 kB)
Collecting sentence-transformers (from ragas)
  Using cached sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting tiktoken (from ragas)
  Downloading tiktoken-0.5.1-cp311-cp311-macosx_11_0_arm64.whl (924 kB)
Collecting pydantic<2.0 (from ragas)
  Downloading pydantic-1.10.13-cp311-cp311-macosx_11_0_arm64.whl (2.5 MB)
Collecting pysbd>=0.3.4 (from ragas)
  Using cached pysbd-0.3.4-py3-none-any.whl (71 kB)
Collecting torchvision (from sentence-transformers->ragas)
  Downloading torchvision-0.16.0-cp311-cp311-macosx_11_0_arm64.whl (1.6 MB)

In [12]:
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
from ragas.metrics.critique import harmfulness

In [36]:
# Make a dataset for evaluation
from datasets import Dataset

dataset = Dataset.from_dict(
    {
        "question": [question1, question2],
        "contexts":
            # one list of context strings per question: list[list[str]]
            [
                [context], [context]
            ],
        "answer": [answer1, answer2]
    }
)

# Concatenate
# concatenated_dataset = concatenate_datasets([dataset, new_dataset])

# save dataset to file
dataset.save_to_disk("QA_dataset")

Saving the dataset (0/1 shards):   0%|          | 0/2 [00:00<?, ? examples/s]
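
The evaluation dataset pairs each question with a list of context strings and one answer, so the three columns must have the same length and `contexts` must be a list of lists. A small pre-flight check, sketched below in plain Python, can catch shape mistakes before any metric is computed. The function name `check_eval_columns` is hypothetical and is not part of `ragas` or `datasets`.

```python
def check_eval_columns(columns):
    """Return a list of human-readable problems found in the column dict.

    An empty list means the dict has the shape used in this notebook:
    'question' and 'answer' are lists of strings, and 'contexts' is a
    list of lists of strings, all of equal length.
    """
    required = ("question", "contexts", "answer")
    missing = [key for key in required if key not in columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    lengths = {key: len(columns[key]) for key in required}
    if len(set(lengths.values())) != 1:
        problems.append(f"column lengths differ: {lengths}")
    for i, ctx in enumerate(columns["contexts"]):
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            problems.append(f"contexts[{i}] should be a list of strings")
    for i, ans in enumerate(columns["answer"]):
        if not isinstance(ans, str) or not ans.strip():
            problems.append(f"answer[{i}] is empty or not a string")
    return problems


# Usage in the notebook (not executed here):
# problems = check_eval_columns(
#     {"question": [question1, question2],
#      "contexts": [[context], [context]],
#      "answer": [answer1, answer2]}
# )
```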

In [38]:
# load dataset from file
dataset2 = Dataset.load_from_disk("QA_dataset")
dataset2.to_pandas()

|   | question | contexts | answer |
|---|----------|----------|--------|
| 0 | ACT as a corporate attorney whose job is to ch... | `[\n*** Law ***\n\n General Duties of Business...` | The company's policy does not fully comply wit... |
| 1 | ACT as a corporate attorney whose job is to ch... | `[\n*** Law ***\n\n General Duties of Business...` | The company's policy appears to be in complian... |


In [34]:
# Evaluate

result = evaluate(
    dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        # context_recall,
        # harmfulness,
    ],
)

evaluating with [context_relevancy]


100%|██████████| 1/1 [00:01<00:00,  1.91s/it]


evaluating with [faithfulness]


100%|██████████| 1/1 [00:25<00:00, 25.80s/it]


evaluating with [answer_relevancy]


100%|██████████| 1/1 [00:03<00:00,  3.44s/it]
  reciprocal_sum = np.sum(1.0 / np.array(values))  # type: ignore


In [35]:
# Dive deep into results

df = result.to_pandas()
df.head()

|   | question | contexts | answer | context_relevancy | faithfulness | answer_relevancy |
|---|----------|----------|--------|-------------------|--------------|------------------|
| 0 | ACT as a corporate attorney whose job is to ch... | `[\n*** Law ***\n\n General Duties of Business...` | The company's policy does not fully comply wit... | 0.0 | 0.333333 | 0.830735 |
| 1 | ACT as a corporate attorney whose job is to ch... | `[\n*** Law ***\n\n General Duties of Business...` | The company's policy appears to be in complian... | 0.0 | 0.166667 | 0.865428 |
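
The stray `reciprocal_sum = np.sum(1.0 / np.array(values))` line in the evaluation output looks like the tail of a warning raised while combining metrics with a harmonic mean. Assuming that reading is right, a `context_relevancy` of 0.0 would trigger a division by zero there. The sketch below reproduces the arithmetic on the per-metric means from the table above in plain Python. The helper `harmonic_mean` is written for illustration and is not the `ragas` implementation.

```python
def harmonic_mean(values):
    """Harmonic mean of non-negative scores.

    Any zero score drives the harmonic mean to zero, so that case is
    returned directly instead of dividing by zero.
    """
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)


# Per-row scores copied from the results table above.
rows = [
    {"context_relevancy": 0.0, "faithfulness": 0.333333, "answer_relevancy": 0.830735},
    {"context_relevancy": 0.0, "faithfulness": 0.166667, "answer_relevancy": 0.865428},
]

# Average each metric over the rows, then combine the averages.
metric_means = {
    name: sum(row[name] for row in rows) / len(rows) for name in rows[0]
}
combined = harmonic_mean(list(metric_means.values()))  # 0.0 because context_relevancy is 0.0

# Leaving out the zero metric shows how much it dominates the combined score.
nonzero_only = harmonic_mean([v for v in metric_means.values() if v > 0])
```

With only two examples, and a context that is the same law text for both, these numbers are best read as a smoke test of the evaluation pipeline rather than as a measurement of answer quality.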
