In [2]:
import os
import google.generativeai as genai
import pandas as pd
import json
import random
from tqdm import tqdm
from langchain.prompts.chat import (ChatPromptTemplate, HumanMessagePromptTemplate)
import time

N_GENERATIONS = 10

In [3]:
with open("gemini_api_key.txt") as f:
    gemini_api_key = f.read().strip()
genai.configure(api_key=gemini_api_key)
model = genai.GenerativeModel(model_name='gemini-1.5-flash')

In [5]:
QA_generation_prompt = """
Your task is to write a factoid question and an answer based on the given Terms and Conditions (T&C) context.
Your factoid question should target specific clauses, rights, obligations, or policies from the context, focusing on key details that a user may seek clarity on.
Make sure your factoid question resembles the type of questions a user might ask when seeking information about T&C, and make sure you incorporate the company name in the question such as "What is the policy on returns in <company name>?" or "How are user data stored in <company name>?"

Avoid references to "the passage" or "context" in your question, and avoid directing the reader to the documents for more information.

Provide your answer as follows:

Factoid question: (your factoid question)
Answer: (your detailed and precise answer to the factoid question)

Now here is the context.

Context: {context}
"""

prompt_template = ChatPromptTemplate.from_messages([HumanMessagePromptTemplate.from_template(QA_generation_prompt)])

In [6]:
def call_llm(model, prompt):
    """Send the prompt to the Gemini model and return the response text."""
    response = model.generate_content(prompt)
    return response.text
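
API calls can fail transiently (rate limits, network errors), and `call_llm` as written lets any such error stop the whole loop. A minimal sketch of a retry wrapper with exponential backoff follows. `call_with_retry`, its defaults, and its catch-all exception policy are hypothetical choices for illustration; they are not part of the notebook or of the Gemini SDK.

```python
import time


def call_with_retry(fn, *args, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args), retrying with exponential backoff on any exception.

    Hypothetical helper: the name, defaults, and catch-all policy are
    illustrative assumptions, not something the notebook defines.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args)
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, then 4x, ... before the next attempt
            sleep(base_delay * 2 ** (attempt - 1))
```

In the generation loop this could be used as `call_with_retry(call_llm, model, prompt)`. The `sleep` parameter exists only so the wait can be swapped out when testing.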

In [7]:
documents = pd.read_csv('documents.csv')

In [8]:
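outputs = []
for _, row in tqdm(documents.sample(n=N_GENERATIONS, random_state=48).iterrows()):
    # Build the context, keeping the company name separate from the document text
    context = "Company: " + row['company_names'] + "\n" + row['documents']
    # Generate a QA pair; skip this document if the API call fails
    try:
        output_QA_couple = call_llm(model, prompt_template.format(context=context))
    except Exception as e:
        print(f"Error: {e}")
        continue
    time.sleep(0.5)  # brief pause between API calls
    print(output_QA_couple)
    # str.split never raises on a missing marker, so check for both explicitly
    if "Factoid question:" not in output_QA_couple or "Answer:" not in output_QA_couple:
        print("Error: unexpected output format")
        continue
    # Strip whitespace so questions do not carry a trailing newline
    question = output_QA_couple.split("Factoid question:")[-1].split("Answer:")[0].strip()
    answer = output_QA_couple.split("Answer:")[-1].strip()
    outputs.append(
        {
            "company": row['company_names'],
            "question": question,
            "right answer": answer,
        }
    )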

1it [00:01,  1.98s/it]

Factoid question: What happens if a restaurant using MyTable Guest Center falls below the required average number of reservations per month?
Answer: If a restaurant using MyTable Guest Center fails to meet the required monthly average of 15 reservations, the application, related services, and the agreement will be terminated. 


2it [00:05,  2.61s/it]

Factoid question: What is Mistral AI’s policy on user data retention when using the Standard API?
Answer: Mistral AI retains a user's prompts and outputs for 30 days to monitor abuse and prevent breaches of the Agreement. However, users can request zero data retention, in which case their prompts and outputs are only processed for the duration needed to generate the output and are not retained for any longer period.  


3it [00:06,  1.94s/it]

Factoid question: What information about users who write reviews can Google Play developers see?
Answer: Google Play developers can see the user's name and image from their Google account, as well as their past review edits. They can also see the user's language, device and country, and device information such as language, model, and OS version. This information is available to developers for both public and private reviews. 


4it [00:08,  1.93s/it]

Factoid question: What is the policy on using Snapchat if I am a business user based in the United Kingdom? 
Answer: If you are using Snapchat on behalf of a business based outside the United States, your business will be bound by the arbitration clause that appears later in the Snap Group Limited Terms of Service. 


5it [00:09,  1.68s/it]

Factoid question: What factors does Amazon consider when choosing "Featured" search results in the UK store? 
Answer: Amazon's "Featured" search results prioritize customer actions like purchase frequency and product information such as title, price, and description. These elements indicate the likelihood of customer interest and are the primary factors considered.  Additional factors, such as customer reviews and the product's age, may also influence the ranking. 


6it [00:11,  1.79s/it]

Factoid question: What is the policy on returning products purchased through AliExpress if I am a user based in a Relevant Jurisdiction?
Answer: As an AliExpress Relevant Jurisdiction User, your contract is with AliExpress Russia Holding Private Limited and any purchases you make are subject to the terms and conditions set out in the Transactional Services Agreement, particularly clause 3.2.  This clause specifies additional obligations regarding payment, returns, warranties, shipping, insurance, fees, taxes, title, licenses, fines, permits, handling, transportation, and storage.  For more specific information about returns, you should consult the Transactional Services Agreement directly. 


7it [00:13,  1.83s/it]

Factoid question: What are Facebook's policies on using content I create and share on the platform?
Answer:  Facebook requires users to grant a non-exclusive, transferable, sub-licensable, royalty-free, and worldwide license to host, use, distribute, modify, run, copy, publicly perform or display, translate, and create derivative works of their content. This means Facebook can store, copy, and share your content with others (consistent with your settings) such as Meta Products or service providers that support those products and services. The license ends when you delete your content from their systems. You can delete individual content you share at any time, and all content posted to your personal account will be deleted if you delete your account. However, it may take up to 90 days to fully delete content from their systems. 


8it [00:14,  1.63s/it]

Factoid question: What is OttoOtto's liability policy if a retailer experiences data loss due to a system malfunction?
Answer: OttoOtto's liability for data loss is limited to the ordinary expenditures related to data recovery, provided that the retailer has regularly made back-up copies of their data on a daily basis. If the retailer has not been regularly making back-ups, OttoOtto is not liable for data loss. 


9it [00:15,  1.57s/it]

Factoid question: What happens if I disagree with Google Maps' decision regarding my content or account in the EU, according to the Digital Services Act?
Answer: If you disagree with Google Maps' decision to restrict your content or account, you may be able to refer your matter to an out-of-court dispute settlement body. This option is available if Google believes you violated the law or their policies, or if they decide not to act on your report regarding potentially illegal or policy-violating content. However, Google is not bound by any decisions made by these bodies. You may also appeal through Google's internal appeals process or seek legal counsel. 


10it [00:17,  1.78s/it]

Factoid question: What is Houzz Pro's policy on disclosing personal data to public authorities? 
Answer: Houzz Pro's data importer (Houzz Inc.) agrees to notify the data exporter (the Professional) and, where possible, the data subject promptly if it receives a legally binding request from a public authority for the disclosure of personal data transferred under the Houzz Pro Terms and Conditions. Houzz Pro's data importer also agrees to review the legality of the request, challenge the request if it is deemed unlawful, and provide the minimum amount of information permissible when responding to a request for disclosure. 
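
The responses above follow the `Factoid question:` / `Answer:` layout requested in the prompt, so the parsing step can also be written as a single regular expression. The sketch below is one possible alternative to chained `split` calls. `parse_qa` and `QA_PATTERN` are hypothetical names, and the function takes the first occurrence of each marker, which is an assumption about how malformed responses should be handled.

```python
import re

# Capture everything between the two markers as the question,
# and everything after "Answer:" as the answer.
QA_PATTERN = re.compile(
    r"Factoid question:\s*(?P<question>.*?)\s*Answer:\s*(?P<answer>.*)",
    re.DOTALL,
)


def parse_qa(text):
    """Return (question, answer) from a model response, or None if malformed."""
    match = QA_PATTERN.search(text)
    if match is None:
        return None
    question = match.group("question").strip()
    answer = match.group("answer").strip()
    if not question or not answer:
        return None  # a marker was present but its content was empty
    return question, answer
```

Returning `None` for malformed responses makes the failure case explicit, so the caller can skip the document instead of storing a garbled pair.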





In [12]:
# new_testset must already exist as a DataFrame from an earlier cell (not shown here)
new_testset = pd.concat([new_testset, pd.DataFrame(outputs)])
new_testset

Unnamed: 0,company,question,right answer
0,Bolt,What is the policy on retaining driver data a...,"After a driver's account is closed, Bolt Head..."
1,Shopify,How long does Shopify keep store information ...,Shopify retains store information for two year...
2,Amazon,What is Amazon's policy regarding the storage...,"Amazon states that they will not retain, use, ..."
3,Uber,What is the policy on returns in Uber?\n,The Uber Terms of Service do not explicitly m...
4,TikTok,What company am I contracting with when I use...,"If you are resident in the United Kingdom, you..."
5,Google Search,What happens to my Google One membership afte...,After the promotional period (Offer Period) en...
6,Instagram,What is the maximum time it can take for Inst...,"According to Instagram's policies, it can take..."
7,AliExpress,What is the policy on the number of product l...,AliExpress reserves the right to place restri...
8,azuremarketplace.microsoft.com,What are the governing terms and conditions f...,If you purchase Azure services through a Micr...
9,Tradera,What kind of data does Tradera collect automa...,Tradera automatically collects information sen...
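
The prompt asks the model not to direct the reader to the documents, yet the AliExpress answer in the generation output ends with "you should consult the Transactional Services Agreement directly." A rough way to flag such answers before exporting is a phrase check, sketched below. `DEFERRAL_PHRASES` and `defers_to_document` are hypothetical, and the phrase list is an assumption that would need tuning against real outputs.

```python
# Hypothetical phrase list; extend it after reviewing real outputs.
DEFERRAL_PHRASES = (
    "you should consult",
    "refer to the",
    "for more information",
    "for more specific information",
)


def defers_to_document(answer):
    """Return True if the answer appears to send the reader elsewhere."""
    lowered = answer.lower()
    return any(phrase in lowered for phrase in DEFERRAL_PHRASES)
```

For example, `[o for o in outputs if defers_to_document(o["right answer"])]` would list the generated pairs worth reviewing by hand. A substring check like this will miss paraphrases and can produce false positives, so it is a triage aid and not a quality gate.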


In [14]:
new_testset.to_excel('new_testset.xlsx', index=False)