# **Company Compliance Assessment Tool**

**Introduction**

This notebook is designed to automate the evaluation of a company's compliance with various industry standards and best practices in data security, intellectual property, confidentiality, and more. By utilizing advanced NLP models and data retrieval systems, this tool queries internal documents and external sources to generate responses to predefined compliance questions. The responses are then automatically scored against a rigorous set of criteria to assess how well the company meets the specified requirements.

The notebook integrates several cutting-edge technologies:

- **LangChain:** Used for leveraging large language models to perform complex retrieval and question-answering tasks.

- **Pandas:** Provides robust data structures and operations for manipulating numerical tables and time series.

- **OpenAI Embeddings and Chroma:** These are utilized for embedding questions and retrieving the most relevant information from a vectorized database.

Each question is categorized under topics such as Access Control, Data Security, Subcontractors, and more. The tool evaluates answers based on the documentation provided or retrieved, scoring each on a scale from 0 to 10. A score of 0 indicates no evidence of compliance, while a score of 10 signifies comprehensive and consistent adherence to the requirements.
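
The scoring scale can be made concrete with a small lookup. The sketch below is illustrative only: the function name `score_band` and the band labels are hypothetical, paraphrased from the scoring criteria in the prompt used later in this notebook.

```python
def score_band(score: int) -> str:
    """Map a 0-10 compliance score to a descriptive band (hypothetical labels)."""
    if not 0 <= score <= 10:
        raise ValueError(f"score must be between 0 and 10, got {score}")
    if score == 0:
        return "no evidence"
    if score <= 2:
        return "minimally met"
    if score <= 4:
        return "inconsistently met"
    if score <= 6:
        return "met about half the time"
    if score <= 8:
        return "mostly met"
    return "comprehensively met"
```

A helper like this could be used to add a readable label next to each numeric score in the final report.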

This automated assessment aims to streamline the compliance review process, making it more efficient and consistent. It's particularly useful for compliance officers, auditors, and management teams seeking to ensure that their company practices align with legal and ethical standards.

In [None]:
!pip install langchain
!pip install -U langchain-openai
!pip install chromadb


In [None]:
import os
from IPython.display import display, Markdown

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.langchain.plus/info"

In [None]:
OPEN_AI_API_KEY="API_KEY_HERE"
LANGCHAIN_API_KEY="API_KEY_HERE"

In [None]:
# The OpenAI client libraries read the key from OPENAI_API_KEY
os.environ['OPENAI_API_KEY'] = OPEN_AI_API_KEY
os.environ['LANGCHAIN_API_KEY'] = LANGCHAIN_API_KEY
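
Hard-coding keys in a notebook cell makes it easy to commit them by accident. One alternative, sketched here under the assumption that the keys are exported in the shell before the notebook starts, is to read them from the environment and fail early if they are missing. The helper name `require_env` is hypothetical.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

With this in place, a cell could call `require_env("OPENAI_API_KEY")` instead of assigning a literal string.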

In [None]:
# Account for deprecation of the LLM model
import datetime

# Get the current date
current_date = datetime.datetime.now().date()

# Date after which the model should be set to "gpt-3.5-turbo"
target_date = datetime.date(2025, 6, 12)

# Set the model variable based on the current date
if current_date > target_date:
    llm_model = "gpt-3.5-turbo"
else:
    llm_model = "gpt-3.5-turbo-16k"

from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Directory holding the previously built Chroma index
persist_directory = 'docs/chroma/'

embedding = OpenAIEmbeddings(api_key=OPEN_AI_API_KEY)
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding
)

In [None]:
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA

llm = ChatOpenAI(api_key=OPEN_AI_API_KEY, model_name=llm_model, temperature=0)
qa_chain = RetrievalQA.from_chain_type(
    llm,
    # MMR is Maximal Marginal Relevance: fetch_k candidates are retrieved first,
    # then k of them are selected, so fetch_k must be at least k.
    retriever=vectordb.as_retriever(
        search_type="mmr",
        search_kwargs={'k': 7, 'fetch_k': 20},
    ),
    # chain_type="refine",
    # chain_type="map_reduce",
    # return_source_documents is an option of the chain, not of the retriever
    return_source_documents=True,
)
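
To make the retriever setting above less opaque, here is a minimal pure-Python sketch of the general Maximal Marginal Relevance idea. It is not LangChain's implementation; the function names, the toy 2-D vectors, and the default `lambda_mult` are assumptions chosen for illustration. Each step picks the candidate that best balances similarity to the query against similarity to what has already been selected.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Return indices of up to k documents chosen by Maximal Marginal Relevance."""
    candidates = list(range(len(doc_vecs)))
    selected = []
    while candidates and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Highest similarity to anything already picked (0 if nothing picked yet)
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        candidates.remove(best_idx)
    return selected


# Toy "embeddings": documents 0 and 1 are near-duplicates, document 2 differs.
query = [1.0, 0.0]
docs = [[0.98, 0.17], [0.98, 0.21], [0.77, -0.64]]
diverse = mmr_select(query, docs, k=2, lambda_mult=0.5)
relevance_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
```

With `lambda_mult=0.5` the near-duplicate is skipped in favor of the more distinct document, while `lambda_mult=1.0` reduces to plain relevance ranking. The sketch also shows why the number of results can never exceed the size of the candidate pool, which is the role `fetch_k` plays in the retriever settings.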

In [None]:
import pandas as pd


# Compliance questions grouped by category
data = [
    {
        "category": "Access Control",
        "question": "What are the access policies at the company for repositories, task management systems, other software used",
    },
    {
        "category": "Data Security",
        "question": "How does the company securely store and deploy results and software per industry best practices?",
    },
    {
        "category": "Data Security",
        "question": "What is the process at the company for returning or destroying a Customer's confidential information",
    },
    {
        "category": "Subcontractors",
        "question": "Is the company willing to remain fully responsible for the security of any subcontractors it uses?",
    },
    {
        "category": "Subcontractors",
        "question": "Will the company ensure any subcontractors it engages acknowledge the Customer's right to modify the result produced?",
    },
    {
        "category": "Intellectual Property",
        "question": "Does the company agree not to transfer, assign, license, pledge or encumber the results or related IP without",
    },
    {
        "category": "Confidentiality",
        "question": "Does the company agree to hold a Customer's confidential information upon termination of an agreement?",
    },
    {
        "category": "Confidentiality",
        "question": "Is the company willing to limit disclosures of a Customer's confidential information to legal process",
    },
    {
        "category": "Warranties",
        "question": "Does the company warrant it will not violate rights of third parties?",
    },
    {
        "category": "Warranties",
        "question": "Does the company warrant it will not disclose or use a Customer's property",
    },
    {
        "category": "Liability",
        "question": "Is the company willing to limit the liability to the amount paid under the relevant St.",
    },
]
df = pd.DataFrame(data)

In [None]:
df['answer'] = ""
# df['source_document'] = ""

# Ask the retrieval chain each question and store its answer
for index, row in df.iterrows():
    result = qa_chain.invoke({"query": row['question']})
    df.at[index, 'answer'] = result['result']



In [None]:
df

Unnamed: 0,category,question,answer
0,Access Control,What are the access policies at the company fo...,"I'm sorry, but I don't have access to specific..."
1,Data Security,How does the company securely store and deploy...,The company securely stores and deploys result...
2,Data Security,What is the process at the company for return ...,"At our company, we have a strict process in pl..."
3,Subcontractors,Is the company willing to remain fully respons...,"Based on the given context, it is not clear wh..."
4,Subcontractors,Will the company ensure any subcontractors it ...,"Based on the given context, it is not clear wh..."
5,Intellectual Property,"Does the company agree not to transfer, assign...","Yes, the company agrees not to transfer, assig..."
6,Confidentiality,Does the company agree to hold a Customer's co...,"I'm sorry, but I don't have access to the spec..."
7,Confidentiality,Is the company willing to limit disclosures of...,"Yes, the company is willing to limit disclosur..."
8,Warranties,Does the company warrant it will not violate r...,I don't have enough information to answer that...
9,Warranties,Does the company warrant it will not disclose ...,"Yes, the company warrants that it will not dis..."
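
Several of the answers above are refusals or statements that the context was insufficient. A simple keyword check can flag these before scoring. This is only a sketch: the marker phrases are taken from the answers shown in the table, and the names `NON_ANSWER_MARKERS` and `is_non_answer` are hypothetical.

```python
NON_ANSWER_MARKERS = (
    "i'm sorry",
    "i don't have access",
    "i don't have enough information",
    "it is not clear",
)


def is_non_answer(answer: str) -> bool:
    """Return True if the answer is empty or contains a known refusal phrase."""
    # Normalize case and curly apostrophes before matching
    text = answer.strip().lower().replace("\u2019", "'")
    return not text or any(marker in text for marker in NON_ANSWER_MARKERS)
```

Applied to the DataFrame, `df['answer'].apply(is_non_answer)` would give a boolean column marking the rows that need better source documents rather than a score.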


**Score the company**


In [None]:
from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

import langchain
langchain.debug=False

df['score'] = ""

for index, row in df.iterrows():
    # Prompt to evaluate how well the company fits the criteria in the question
    prompt = f"""Please thoroughly review the question and answer.
    Assume all answers are built upon relevant company documentation.
    Score each answer on a scale of 0-10 based on how well the cited documentation indicates the company meets the requirements.

    Scoring criteria:
    0 - Answer shows no evidence that the requirement is met.
    1-2 - Answer shows the requirement is minimally met in isolated cases.
    3-4 - Answer shows the requirement is met inconsistently or in limited situations.
    5-6 - Answer shows the requirement is met about half the time on average.
    7-8 - Answer shows the requirement is met the majority of the time with occasional exceptions.
    9-10 - Answer provides strong evidence the requirement is comprehensively and consistently met.

    If there is no data available, score the answer 0.

    #Question:
    {row['question']}

    #Answer:
    {row['answer']}

    #Score:
    """
    result = qa_chain.invoke({"query": prompt})
    try:
        df.at[index, 'score'] = int(result['result'])
    except (ValueError, TypeError):
        # The reply was not a bare integer, so fall back to 0
        df.at[index, 'score'] = 0
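
The `int(...)` conversion above fails whenever the model replies with anything other than a bare integer (for example `Score: 7`), and the fallback then records 0, which cannot be told apart from a genuine zero. A more tolerant parser is sketched below; the function name `parse_score` and the choice to take the first standalone number from 0 to 10 are assumptions, not part of the original notebook.

```python
import re


def parse_score(text: str, default: int = 0) -> int:
    """Extract the first standalone integer from 0 to 10, else return default."""
    match = re.search(r"\b(10|[0-9])\b", text)
    return int(match.group(1)) if match else default
```

Inside the loop, `parse_score(result['result'])` could replace the `try`/`except` block. Passing a sentinel such as `default=-1` would make unparseable replies visible in the report.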



In [None]:
df.style.to_excel('report.xlsx')

In [None]:
df.style

Unnamed: 0,category,question,answer,score
0,Access Control,"What are the access policies at the company for repositories, task management systems, other software used","I'm sorry, but I don't have access to specific information about the access policies at a company. It would be best to reach out to the company directly or consult their documentation for more information on their access policies for repositories, task management systems, and other software used.",0
1,Data Security,How does the company securely store and deploy results and software per industry best practices?,"The company securely stores and deploys results and software per industry best practices by implementing various security measures. These may include encryption techniques to protect data at rest and in transit, access controls to ensure only authorized personnel can access the information, regular backups to prevent data loss, and secure deployment processes to minimize the risk of unauthorized access or tampering. Additionally, the company may follow industry standards and guidelines such as ISO 27001 or NIST cybersecurity framework to ensure a robust security posture. However, without specific information about the company's practices, it is difficult to provide a more detailed answer.",7
2,Data Security,What is the process at the company for returning or destroying a Customer's confidential information,"At our company, we have a strict process in place for handling customer's confidential information. When a customer requests a return or destruction of their confidential information, they can reach out to our customer service department. Our customer service representatives will guide them through the necessary steps and provide them with the required forms or documentation. Once the request is received, our team follows a secure and confidential process to either return the information to the customer or destroy it in a manner that ensures complete data protection.",7
3,Subcontractors,Is the company willing to remain fully responsible for the security of any subcontractors it uses?,"Based on the given context, it is not clear whether the company is willing to remain fully responsible for the security of any subcontractors it uses.",0
4,Subcontractors,Will the company ensure any subcontractors it engages acknowledge the Customer's right to modify the result produced?,"Based on the given context, it is not clear whether the company will ensure that subcontractors acknowledge the customer's right to modify the result produced.",0
5,Intellectual Property,"Does the company agree not to transfer, assign, license, pledge or encumber the results or related IP without","Yes, the company agrees not to transfer, assign, license, pledge, or encumber the results or related IP without prior consent or approval.",9
6,Confidentiality,Does the company agree to hold a Customer's confidential information upon termination of an agreement?,"I'm sorry, but I don't have access to the specific agreements or policies of any company. It would be best to refer to the terms and conditions or privacy policy of the company in question to determine their stance on holding customer's confidential information upon termination of an agreement.",0
7,Confidentiality,Is the company willing to limit disclosures of a Customer's confidential information to legal process,"Yes, the company is willing to limit disclosures of a Customer's confidential information to legal process.",0
8,Warranties,Does the company warrant it will not violate rights of third parties?,I don't have enough information to answer that question.,0
9,Warranties,Does the company warrant it will not disclose or use a Customer's property,"Yes, the company warrants that it will not disclose or use a Customer's property.",0
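
To see which categories are strongest and weakest, the scores can be averaged per category. The sketch below uses only the standard library and the (category, score) pairs copied from the table above; the names `scored_rows` and `average_by_category` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (category, score) pairs copied from the results table above
scored_rows = [
    ("Access Control", 0),
    ("Data Security", 7),
    ("Data Security", 7),
    ("Subcontractors", 0),
    ("Subcontractors", 0),
    ("Intellectual Property", 9),
    ("Confidentiality", 0),
    ("Confidentiality", 0),
    ("Warranties", 0),
    ("Warranties", 0),
]


def average_by_category(rows):
    """Group scores by category and return the mean score of each group."""
    grouped = defaultdict(list)
    for category, score in rows:
        grouped[category].append(score)
    return {category: mean(scores) for category, scores in grouped.items()}


category_means = average_by_category(scored_rows)
```

On the DataFrame itself, the same summary could be produced with `df.groupby('category')['score'].mean()` after converting the `score` column with `pd.to_numeric`.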


**Conclusion**

This notebook is a step toward automating the evaluation of corporate compliance. By combining large language models through LangChain with vector-based retrieval over company documents, it simplifies compliance assessments and aims to make them more consistent.

The results presented in this notebook offer a detailed look at how the company adheres to various compliance requirements across different categories. The automated scoring system provides clear, quantifiable metrics that help in identifying areas of strength and those needing improvement. This not only aids in maintaining regulatory compliance but also supports continuous improvement in corporate governance practices.

As we move forward, continuous refinement of the question sets, enhancement of the retrieval mechanisms, and integration of more dynamic scoring criteria will be key to addressing the evolving compliance landscape. This tool is not just a means for assessment but also a framework for sustained compliance and risk management.