In [1]:
%pip install langchain
%pip install openai

Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


In [2]:
from langchain.schema import (
    AIMessage,
    HumanMessage,
    SystemMessage
)
from langchain.chat_models import ChatOpenAI

In [3]:
# Create a chat model backed by OpenAI's gpt-4; the low temperature keeps responses more consistent between runs
chat = ChatOpenAI(model_name='gpt-4', temperature=0.2)

# One shot

In [4]:
pre_prompt = """ACT as a corporate attorney whose job is to check the compliance of the company's policy with the state's new laws.

Your task is to identify whether the provided policy complies with the new law.

If the policy complies with the new law, then say so and do nothing more.

Otherwise, if the policy does not comply with the law, give a general assessment of how severe the violation is, and make a bullet list of what is missing from the policy or what is wrong with the policy.

Then update the policy to comply with the new law. Add placeholders for the missing information, such as [thing].

For example, if the policy is "The company's address is 123 Main Street, New York, NY 10001. The company's phone number is 212-555-1234." and the new law is "You have to specify the company's address, phone number, and email address.", then the missing part is "company's email", and the updated policy is "The company's address is 123 Main Street, New York, NY 10001. The company's phone number is 212-555-1234. The company's email address is [email address]."
 """


In [11]:
# Read the file "recroom notice" into a string.
# The notice contains curly quotes, so the encoding is set explicitly.
with open('recroom notice', 'r', encoding='utf-8') as f:
    policy = f.read()
print(policy)

Rec Room Inc., a Delaware corporation (“Company”, “we”, “us”, “our”), may collect and use the below categories of your Personal Information. Third parties such as Greenhouse also may collect such Personal Information on our behalf. “Personal Information” means information that identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked, directly or indirectly, with you.
* Full name
* Home address
* Telephone number
* Email address
* Resume information
* Interview feedback
* Compensation expectations
* Authorization to work in the United States and/or visa requirements, if applicable
The Company collects the above categories of Personal Information to use or disclose as appropriate to:
* Recruit and evaluate job applicants for employment
* Comply with all applicable laws and regulations
* Perform data analytics
* Exercise or defend the legal rights of the Company
If you have any questions about this Notice or need to access this Notic

In [12]:
law = '''
  General Duties of Businesses that Collect Personal Information
(a) A business that controls the collection of a consumer’s personal information shall, at or before the point of collection, inform consumers of the following:
(1) The categories of personal information to be collected and the purposes for which the categories of personal information are collected or used and whether that information is sold or shared. A business shall not collect additional categories of personal information or use personal information collected for additional purposes that are incompatible with the disclosed purpose for which the personal information was collected without providing the consumer with notice consistent with this section.

(2) If the business collects sensitive personal information, the categories of sensitive personal information to be collected and the purposes for which the categories of sensitive personal information are collected or used, and whether that information is sold or shared. A business shall not collect additional categories of sensitive personal information or use sensitive personal information collected for additional purposes that are incompatible with the disclosed purpose for which the sensitive personal information was collected without providing the consumer with notice consistent with this section.

(3) The length of time the business intends to retain each category of personal information, including sensitive personal information, or if that is not possible, the criteria used to determine that period provided that a business shall not retain a consumer’s personal information or sensitive personal information for each disclosed purpose for which the personal information was collected for longer than is reasonably necessary for that disclosed purpose.

'''

In [13]:
# Read the statute's definitions section from the file "definitions"
with open('definitions', 'r', encoding='utf-8') as f:
    definitions = f.read()
definitions

'  Definitions\nFor purposes of this title:\n(a) “Advertising and marketing” means a communication by a business or a person acting on the business’ behalf in any medium intended to induce a consumer to obtain goods, services, or employment.\n(b) “Aggregate consumer information” means information that relates to a group or category of consumers, from which individual consumer identities have been removed, that is not linked or reasonably linkable to any consumer or household, including via a device. “Aggregate consumer information” does not mean one or more individual consumer records that have been deidentified.\n(c) “Biometric information” means an individual’s physiological, biological, or behavioral characteristics, including information pertaining to an individual’s deoxyribonucleic acid (DNA), that is used or is intended to be used singly or in combination with each other or with other identifying data, to establish individual identity. Biometric information includes, but is not 

In [8]:
messages = [
    SystemMessage(content=pre_prompt),
    HumanMessage(content=f"""

*** Policy ***
{policy}

*** Law ***
{law}
"""
                 )
]
r = chat(messages)
print(r.content)

The provided policy does not fully comply with the new law. The violation is moderate as there are several key elements missing from the policy. 

Here is what is missing or wrong with the policy:

- The policy does not specify whether the collected personal information is sold or shared.
- The policy does not identify if any of the collected information is considered sensitive personal information, and if so, it does not specify the purposes for which this sensitive personal information is collected or used, and whether it is sold or shared.
- The policy does not mention the length of time the company intends to retain each category of personal information, including sensitive personal information, or the criteria used to determine that period.

The updated policy should look like this:

Rec Room Inc., a Delaware corporation (“Company”, “we”, “us”, “our”), may collect and use the below categories of your Personal Information. Third parties such as Greenhouse also may collect such Pers

In [14]:
messages = [
    SystemMessage(content=pre_prompt),
    HumanMessage(content=f"""

*** Policy ***
{policy}

*** Law ***
{law}
{definitions}
"""
                 )
]
r = chat(messages)
print(r.content)

The company's policy does not comply with the new law. The violations are not severe, but there are several areas that need to be addressed. 

Here are the main issues:

- The policy does not specify the length of time the company intends to retain each category of personal information, as required by the new law.
- The policy does not clarify whether the company collects sensitive personal information, and if so, for what purposes.
- The policy does not provide a clear method for consumers to submit requests or directions under the new law.
- The policy does not specify whether the company sells or shares the personal information it collects.

To bring the policy into compliance with the new law, it should be updated as follows:

Rec Room Inc., a Delaware corporation (“Company”, “we”, “us”, “our”), may collect and use the below categories of your Personal Information. Third parties such as Greenhouse also may collect such Personal Information on our behalf. “Personal Information” mean
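
The two cells above build the same human message by hand and differ only in whether the definitions are appended to the law. A minimal sketch of a helper that assembles that text follows; the name `build_user_prompt` and the short sample strings are hypothetical and not part of the original notebook.

```python
def build_user_prompt(policy: str, law: str, definitions: str = "") -> str:
    """Assemble the human-message text in the '*** Policy *** / *** Law ***' layout."""
    law_block = law.strip()
    # Append the definitions directly under the law text when they are provided.
    if definitions.strip():
        law_block = f"{law_block}\n{definitions.strip()}"
    return f"\n\n*** Policy ***\n{policy.strip()}\n\n*** Law ***\n{law_block}\n"


# Hypothetical placeholder inputs, for illustration only.
prompt = build_user_prompt("We collect names.", "Disclose retention periods.")
print(prompt)
```

The returned string could then be passed as `HumanMessage(content=prompt)` alongside the same `SystemMessage(content=pre_prompt)`.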

## Re-run the updated policy

In [15]:
# Read the updated policy from the file "recroom notice upd"
with open('recroom notice upd', 'r', encoding='utf-8') as f:
    policy_upd = f.read()

messages = [
    SystemMessage(content=pre_prompt),
    HumanMessage(content=f"""

*** Policy ***
{policy_upd}

*** Law ***
{law}
{definitions}
"""
                 )
]
r = chat(messages)
print(r.content)

The policy does not comply with the new law. The violations are not severe, but they are significant enough to require changes to the policy. 

Here's what is missing or wrong with the policy:

- The policy does not clearly define what constitutes "sensitive personal information" as per the new law. The policy should include a clear definition and examples of sensitive personal information.
- The policy does not specify the criteria used to determine the period for which the company retains each category of personal information. The new law requires this information to be disclosed.
- The policy does not provide a clear method for consumers to submit requests or directions under the new law. The new law requires businesses to provide designated methods for submitting such requests.
- The policy does not provide a clear explanation of the company's practices regarding the sale or sharing of personal information. The new law requires businesses to disclose whether they sell or share pers
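
Because each run reports its findings as a bullet list, the lists can be pulled out of the response text and compared between runs. This is a rough sketch, assuming every finding sits on a single line that starts with "- " as in the outputs above; `extract_findings` and the abbreviated sample response are hypothetical.

```python
def extract_findings(response_text: str) -> list[str]:
    """Return the text of each '- ' bullet line in a model response."""
    findings = []
    for line in response_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            findings.append(stripped[2:].strip())
    return findings


# Abbreviated, made-up response in the same shape as the outputs above.
sample = """The policy does not comply with the new law.

Here's what is missing or wrong with the policy:

- The policy does not specify retention periods.
- The policy does not say whether data is sold or shared.

The updated policy should look like this:
"""
findings = extract_findings(sample)
for item in findings:
    print(item)
```

In the notebook, `extract_findings(r.content)` could be called after each run to see which findings persist once the policy has been updated.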

# Evaluation

In [23]:
%pip install ragas

Collecting ragas
  Using cached ragas-0.0.16-py3-none-any.whl (38 kB)
Collecting sentence-transformers (from ragas)
  Using cached sentence-transformers-2.2.2.tar.gz (85 kB)
  Preparing metadata (setup.py) ... done
Collecting tiktoken (from ragas)
  Downloading tiktoken-0.5.1-cp311-cp311-macosx_11_0_arm64.whl (924 kB)
Collecting pydantic<2.0 (from ragas)
  Downloading pydantic-1.10.13-cp311-cp311-macosx_11_0_arm64.whl (2.5 MB)
Collecting pysbd>=0.3.4 (from ragas)
  Using cached pysbd-0.3.4-py3-none-any.whl (71 kB)
Collecting torchvision (from sentence-transformers->ragas)
  Downloading torchvision-0.16.0-cp311-cp311-macosx_11_0_arm64.whl (1.6 MB)

In [9]:
from ragas import evaluate
from ragas.metrics import (
    answer_relevancy,
    faithfulness,
    context_recall,
    context_precision,
)
from ragas.metrics.critique import harmfulness

## Sample data

In [27]:
# Load the sample FiQA evaluation set published by explodinggradients
from datasets import load_dataset

fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")
fiqa_eval

Found cached dataset fiqa (/Users/gr703z/.cache/huggingface/datasets/explodinggradients___fiqa/ragas_eval/1.0.0/3dc7b639f5b4b16509a3299a2ceb78bf5fe98ee6b5fee25e7d5e4d290c88efb8)


  0%|          | 0/1 [00:00<?, ?it/s]

DatasetDict({
    baseline: Dataset({
        features: ['question', 'ground_truths', 'answer', 'contexts'],
        num_rows: 30
    })
})

In [34]:
from datasets import Dataset

# Prepare your Hugging Face dataset in this format:
# Dataset({
#     features: ['question', 'contexts', 'answer'],
#     num_rows: 25
# })

# Build a one-row dataset by hand
dataset = Dataset.from_dict(
    {
        "question": ["What is the capital of Germany?"],
        "contexts":
            # list[list[str]]: one list of context strings per question (here, 10 world capitals)
            [
                [
                    "The capital of Germany is Berlin.",
                    "The capital of France is Paris.",
                    "The capital of Spain is Madrid.",
                    "The capital of Italy is Rome.",
                    "The capital of the United Kingdom is London.",
                    "The capital of the United States is Washington, D.C.",
                    "The capital of Canada is Ottawa.",
                    "The capital of Australia is Canberra.",
                    "The capital of China is Beijing.",
                    "The capital of Japan is Tokyo.",
                ]
            ],
        "answer": ["Berlin"],
    }
)
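
The column layout above (one question, one list of context strings, and one answer per row) can be produced from row-shaped records before calling `Dataset.from_dict`. This is a small sketch under that assumption; `to_ragas_columns` and the placeholder row are hypothetical and not part of ragas or the original notebook.

```python
def to_ragas_columns(rows):
    """Turn a list of row dicts into the column dict that Dataset.from_dict accepts."""
    columns = {"question": [], "contexts": [], "answer": []}
    for row in rows:
        contexts = row["contexts"]
        # A bare string is wrapped so that "contexts" stays list[list[str]].
        if isinstance(contexts, str):
            contexts = [contexts]
        columns["question"].append(row["question"])
        columns["contexts"].append(list(contexts))
        columns["answer"].append(row["answer"])
    return columns


# Placeholder row for illustration; real rows would carry the law text and the model's answer.
rows = [
    {
        "question": "Does the policy comply with the new law?",
        "contexts": "General Duties of Businesses that Collect Personal Information",
        "answer": "The policy does not comply with the new law.",
    }
]
columns = to_ragas_columns(rows)
print(columns["contexts"])
```

The result could then be passed to `Dataset.from_dict(columns)` in the same way as the hand-written dictionary.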


In [35]:
result = evaluate(
    dataset,
    metrics=[
        context_precision,
        faithfulness,
        answer_relevancy,
        # context_recall,  # skipped: it needs a 'ground_truths' column, which this dataset lacks
        harmfulness,
    ],
)

evaluating with [context_relevancy]


100%|██████████| 1/1 [00:00<00:00,  2.48it/s]


evaluating with [faithfulness]


100%|██████████| 1/1 [00:01<00:00,  1.36s/it]


evaluating with [answer_relevancy]


100%|██████████| 1/1 [00:01<00:00,  1.76s/it]


evaluating with [harmfulness]


100%|██████████| 1/1 [00:01<00:00,  1.01s/it]


In [36]:
df = result.to_pandas()
df.head()

                          question                                           contexts  answer  context_relevancy  faithfulness  answer_relevancy  harmfulness
0  What is the capital of Germany?  [The capital of Germany is Berlin., The capita...  Berlin                0.1           1.0          0.719271            0
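
The context relevancy score of 0.1 is consistent with only one of the ten context sentences bearing on the question. The toy calculation below illustrates that reading, assuming the score is the fraction of context sentences that are relevant. ragas selects relevant sentences with an LLM; the keyword match here is only a stand-in for illustration, and `toy_context_relevancy` is a hypothetical name.

```python
def toy_context_relevancy(contexts, keyword):
    """Fraction of context sentences that mention the keyword (a stand-in for relevance)."""
    if not contexts:
        return 0.0
    relevant = [s for s in contexts if keyword.lower() in s.lower()]
    return len(relevant) / len(contexts)


contexts = [
    "The capital of Germany is Berlin.",
    "The capital of France is Paris.",
    "The capital of Spain is Madrid.",
    "The capital of Italy is Rome.",
    "The capital of the United Kingdom is London.",
    "The capital of the United States is Washington, D.C.",
    "The capital of Canada is Ottawa.",
    "The capital of Australia is Canberra.",
    "The capital of China is Beijing.",
    "The capital of Japan is Tokyo.",
]
score = toy_context_relevancy(contexts, "Germany")
print(score)  # 1 relevant sentence out of 10 gives 0.1
```

Under this reading, trimming the contexts to the single sentence about Germany would raise the toy score to 1.0.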


## Real data