# Contract Analysis using IBM Granite LLM from watsonx
Author: [@Aisha Mohammed Farooq Darga](https://www.linkedin.com/in/aisha-mohammed-farooq-darga-778280135/)

## **Description**

Contract analysis involves reviewing, interpreting, and extracting key information from contract documents to identify risks, obligations, and critical aspects. This ensures clarity on terms and conditions, helps avoid ambiguities, and mitigates potential legal or financial complications. 

Effective contract analysis is crucial for businesses, legal professionals, and stakeholders, as it safeguards against unintentional obligations, disputes, and risks.

---

## **What Does This Notebook Do?**

This notebook provides an **automated solution** for contract analysis using **IBM Granite LLM** from watsonx. By leveraging advanced language models and **ChromaDB**, the notebook accomplishes the following:

- Offers a **general overview** of the contract.
- Extracts and highlights **key terms**, such as payment terms, intellectual property rights, termination clauses, and dispute resolution methods.
- Conducts **detailed risk analysis** to uncover and assess potential issues.
- Provides **recommendations and actionable insights** for risk mitigation.
- Compiles a **compliance checklist** and a **risk summary table** categorized by severity.

---

## **Approach Followed**

The notebook uses a systematic multi-step approach:

1. **Text Extraction and Chunking**: 
   - Reads the contract and divides it into manageable chunks for efficient processing.
2. **Embedding the Text**:
   - Converts text chunks into embeddings using a **sentence-transformers** model (`all-MiniLM-L6-v2`) for semantic representation.
3. **Storing in ChromaDB**:
   - Stores the embeddings in **ChromaDB**, enabling efficient retrieval of context-specific sections.
4. **Proximity Search**:
   - Retrieves relevant sections based on queries using embedding-based proximity search.
5. **AI-Driven Risk Assessment**:
   - Analyzes retrieved content to generate a structured evaluation of key terms, risks, and recommendations.

---

## **Output Details**

The notebook produces the following outputs:

1. **General Overview**:
   - Summarizes effective date, involved parties, and scope of work.

2. **Key Highlights**:
   - Outlines major clauses: payment terms, intellectual property rights, termination provisions, and dispute resolution methods.

3. **Detailed Risk Analysis**:
   - Identifies potential risks, categorizes their severity (Low, Medium, High), and suggests mitigation strategies.

4. **Recommendations and Insights**:
   - Provides actionable recommendations for contract improvement.

5. **Compliance Checklist**:
   - Lists unaddressed risks and compliance issues.

6. **Summary of Risks by Severity**:
   - Creates a structured table summarizing risks, severity, and mitigation strategies.

---

## **Prerequisites**

- **IBM Cloud Account**: [Sign up here](https://cloud.ibm.com/registration).
- **Python Version**: Ensure Python 3.11.9 is installed.

---

## **Environment Setup**

### 1. **IBM Cloud Account Setup**
- Log in to [watsonx.ai](https://dataplatform.cloud.ibm.com/registration/stepone?context=wx&apps=all).
- Create a [watsonx.ai Project](https://www.ibm.com/docs/en/watsonx/saas?topic=projects-creating-project).
- Create a [Jupyter Notebook](https://www.ibm.com/docs/en/watsonx/saas?topic=editor-creating-managing-notebooks).
  This step opens a notebook environment where you can paste the code from this tutorial. Alternatively, download this notebook to your local system and upload it to your watsonx.ai project as an asset.

### 2. **Watson Machine Learning (WML) Service**
- Create a [WML Service Instance](https://cloud.ibm.com/catalog/services/watson-machine-learning) (Lite Plan recommended).
- Generate an [API Key](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-authentication.html).
- Associate the WML service with the project that you created in [watsonx.ai](https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/assoc-services.html).


## **Installation and Imports**
Install necessary libraries:

In [None]:
# installations
!pip install -q git+https://github.com/ibm-granite-community/utils \
    chromadb==0.3.26 \
    sentence-transformers \
    ibm-watsonx-ai \
    ibm_watson_machine_learning \
    PyPDF2    

In [None]:
import logging
import chromadb
from ibm_watsonx_ai.foundation_models import Model
from ibm_watsonx_ai.foundation_models.embeddings.sentence_transformer_embeddings import (
    SentenceTransformerEmbeddings,
)
from ibm_watsonx_ai.client import APIClient
from PyPDF2 import PdfReader

## Core Functions
### Logging Setup
Configure logging for monitoring and debugging.

In [None]:
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

### Credential Setup
Store your `WATSONX_URL` and `WATSONX_APIKEY` in a separate `.env` file in the same directory as this notebook. The folder contains an `env_template` file for reference.

In [None]:
from ibm_granite_community.notebook_utils import get_env_var


def get_credentials():
    return {
        "url": get_env_var("WATSONX_URL"),
        "apikey": get_env_var("WATSONX_APIKEY"),
    }
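
If a variable is missing, the failure may only surface later when the model is called. The sketch below shows one way to fail early. It assumes the two values are available as plain environment variables; `load_credentials` and the placeholder values are hypothetical and are not part of `ibm_granite_community`.

```python
import os


def load_credentials(env=os.environ):
    """Hypothetical helper: return watsonx credentials or raise if any is missing."""
    required = ("WATSONX_URL", "WATSONX_APIKEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"url": env["WATSONX_URL"], "apikey": env["WATSONX_APIKEY"]}


# Placeholder values for illustration only; real values belong in the .env file.
example_env = {
    "WATSONX_URL": "https://example.invalid",
    "WATSONX_APIKEY": "placeholder-key",
}
credentials = load_credentials(example_env)
print(sorted(credentials))
```

Passing the environment as an argument keeps the check easy to try without touching real secrets.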

### PDF Reading
Extract text from the given PDF file.

In [None]:
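def read_pdf(pdf_path):
    pdf_reader = PdfReader(pdf_path)
    page_texts = []
    for page in pdf_reader.pages:
        # Guard against pages with no extractable text (e.g. scanned images)
        page_texts.append(page.extract_text() or "")
    # Join pages with a newline so words at page boundaries do not fuse together
    return "\n".join(page_texts)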

### Text Chunking
Split text into manageable 512-word chunks for efficient processing.

In [None]:
def chunk_text(text, chunk_size=512):
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size]) for i in range(0, len(words), chunk_size)
    ]
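
To see how the word-based split behaves, the following sketch repeats `chunk_text` and runs it on a short made-up sentence with a small `chunk_size`. The sample text is hypothetical and chosen only to keep the output readable.

```python
def chunk_text(text, chunk_size=512):
    """Split text into chunks of at most `chunk_size` whitespace-separated words."""
    words = text.split()
    return [
        " ".join(words[i : i + chunk_size]) for i in range(0, len(words), chunk_size)
    ]


# Toy input (hypothetical sentence, not taken from a real contract).
sample = "The tenant shall pay rent on the first day of each month"
chunks = chunk_text(sample, chunk_size=5)

for index, chunk in enumerate(chunks):
    print(index, repr(chunk))
# 0 'The tenant shall pay rent'
# 1 'on the first day of'
# 2 'each month'
```

The last chunk holds the leftover words, so it can be shorter than `chunk_size`. Chunks do not overlap, which means a clause that straddles a boundary is split across two chunks.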

### Embedding Text

Generate embeddings for text chunks in batches.

In [None]:
def batch_embeddings(emb, texts, batch_size=10):
    embeddings = []
    for i in range(0, len(texts), batch_size):
        embeddings.extend(emb.embed_documents(texts[i : i + batch_size]))
    return embeddings
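
The batching logic can be checked without loading a real model. The sketch below uses a hypothetical `FakeEmbedder` stand-in that exposes only an `embed_documents` method and records how many texts arrive per call. The toy vectors it returns are not meaningful embeddings.

```python
class FakeEmbedder:
    """Hypothetical stand-in that mimics the embed_documents interface."""

    def __init__(self):
        self.batch_sizes = []

    def embed_documents(self, texts):
        self.batch_sizes.append(len(texts))
        # One toy 2-dimensional vector per text: [character count, word count].
        return [[float(len(t)), float(len(t.split()))] for t in texts]


def batch_embeddings(emb, texts, batch_size=10):
    embeddings = []
    for i in range(0, len(texts), batch_size):
        embeddings.extend(emb.embed_documents(texts[i : i + batch_size]))
    return embeddings


texts = [f"clause {n}" for n in range(25)]
fake = FakeEmbedder()
vectors = batch_embeddings(fake, texts, batch_size=10)

print(fake.batch_sizes)  # [10, 10, 5]
print(len(vectors))      # 25
```

The output order matches the input order, which is what lets `hydrate_chromadb` pair each embedding with its chunk by position.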

### ChromaDB Hydration
Populate ChromaDB with text embeddings for efficient querying.


In [None]:
def hydrate_chromadb(pdf_path, emb, collection_name="pdf_collection"):
    # Read and chunk the PDF content
    text = read_pdf(pdf_path)
    chunks = chunk_text(text)

    # Embed the chunks
    try:
        embeddings = batch_embeddings(emb, chunks)
    except Exception as e:
        logger.error(f"Error embedding text: {e}")
        return None

    # Initialize ChromaDB client
    chroma_client = chromadb.Client()

    # Clean up existing collection
    try:
        chroma_client.delete_collection(name=collection_name)
        logger.info(f"Existing collection '{collection_name}' deleted.")
    except Exception as e:
        logger.warning(f"No existing collection to delete or error: {e}")

    # Create new collection
    collection = chroma_client.create_collection(name=collection_name)

    # Add the embeddings to the collection
    try:
        collection.add(
            embeddings=embeddings,
            documents=chunks,
            metadatas=[{"chunk_index": i} for i in range(len(chunks))],
            ids=[f"chunk_{i}" for i in range(len(chunks))],
        )
        logger.info("ChromaDB collection populated successfully.")
    except Exception as e:
        logger.error(f"Error populating ChromaDB: {e}")
        return None

    return collection

### Proximity Search 
Retrieve relevant sections of the contract.

In [None]:
def proximity_search(question, collection, emb):
    try:
        query_vectors = emb.embed_query(question)
        query_result = collection.query(
            query_embeddings=query_vectors,
            n_results=5,
            include=["documents", "metadatas", "distances"],
        )
        documents = list(reversed(query_result["documents"][0]))
        return "\n".join(documents)
    except Exception as e:
        logger.error(f"Error during proximity search: {e}")
        return ""
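
Conceptually, the query returns the stored chunks whose vectors lie closest to the query vector, and `proximity_search` then reverses them so the best match comes last in the context. The brute-force sketch below illustrates that idea with hand-written 2-dimensional vectors and a squared Euclidean distance. It is an assumption-laden illustration, not how ChromaDB is implemented, and `squared_l2` and `top_k` are hypothetical helpers.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def top_k(query_vector, stored, k=2):
    """Return the k (distance, document) pairs closest to the query vector."""
    scored = [(squared_l2(query_vector, vec), doc) for doc, vec in stored]
    return sorted(scored)[:k]


# Toy documents with made-up vectors (real embeddings have hundreds of dimensions).
stored = [
    ("Rent is due monthly.", [1.0, 0.0]),
    ("Either party may terminate with notice.", [0.0, 1.0]),
    ("Late rent incurs a fee.", [0.8, 0.3]),
]
query = [1.0, 0.1]

nearest = top_k(query, stored, k=2)
# Mirror the notebook: reverse so the closest chunk ends up last in the context.
context = "\n".join(doc for _, doc in reversed(nearest))
print(context)
```

Placing the closest chunk last keeps it adjacent to the instructions that follow the context in the prompt.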

### Model Setup
This cell sets up the **Granite-3-8b-instruct** model to generate detailed responses for the contract analysis. It configures generation parameters (decoding method, token limits, and repetition penalty) and authenticates with the previously defined credentials. The `SentenceTransformerEmbeddings` model is also initialized here; it embeds both the contract text and the search query.

In [None]:
model_id = "ibm/granite-3-8b-instruct"
parameters = {
    "decoding_method": "greedy",
    "max_new_tokens": 5000,
    "min_new_tokens": 0,
    "repetition_penalty": 1,
}
# Replace this value with the ID of your own watsonx.ai project
project_id = "c8018aee-3437-4f94-a493-7513583350f3"

model = Model(
    model_id=model_id,
    params=parameters,
    credentials=get_credentials(),
    project_id=project_id,
)

# Initialize Sentence Transformer
emb = SentenceTransformerEmbeddings("sentence-transformers/all-MiniLM-L6-v2")

### Populate ChromaDB
In this step, we specify the contract file using the pdf_path variable. The hydrate_chromadb function is then used to extract the text from the contract, split it into smaller chunks, generate embeddings, and store those embeddings in the ChromaDB database.

To analyze a contract of your choice, follow these steps:
1. **Add your contract file**: Place your contract PDF file in a folder named `Contracts`.
2. **Update the file path**: Change the `pdf_path` variable to point to your contract file inside the `Contracts` folder. For example, if your file is named `my_contract.pdf`, update the path like this:
    ```python
    pdf_path = "./Contracts/my_contract.pdf"
    ```

In [None]:
pdf_path = "./Contracts/Landlord_Filed_Contract.pdf"

chroma_collection = hydrate_chromadb(pdf_path, emb)

### Generate AI Response

Analyze the contract for legal and financial risks using Granite LLM.

In [None]:
def generate_analysis():
    question = "Analyze the contract for legal and financial risks"
    context = proximity_search(question, chroma_collection, emb)

    prompt_input = """<|start_of_role|>system<|end_of_role|>You are Granite, an AI language model developed by IBM in 2024. You are a cautious assistant. You carefully follow instructions. You are helpful and harmless and you follow ethical guidelines and promote positive behavior. You are an AI language model designed to function as a specialized Retrieval Augmented Generation (RAG) assistant. When generating responses, prioritize correctness, i.e., ensure that your response is correct given the context and user query, and that it is grounded in the context. Furthermore, make sure that the response is supported by the given document or context. Always make sure that your response is relevant to the question. If an explanation is needed, first provide the explanation or reasoning, and then give the final answer. Avoid repeating information unless asked.<|end_of_text|>
        <|start_of_role|>user<|end_of_role|>Analyze the following contract for legal and financial risks. Present the analysis in a structured format with detailed sections, as described below:

        Expected Output Format
        1. General Overview
        Summarize key contract details, including:

        Dates mentioned in the contract
        Names of the Parties.
        Scope of Work or Agreement.
        2. Key Highlights
        Highlight important contract terms such as:

        Payment Terms.
        Intellectual Property Rights.
        Termination Provisions.
        Dispute Resolution Methods.
        3. Detailed Risk Analysis
        For each major contract section, identify:

        Potential Risks (e.g., vague terms, unfavorable clauses).
        Severity of the risks (e.g., Low, Medium, High, Critical).
        Mitigation Strategies to address the risks.
        4. Recommendations and Actionable Insights
        Provide practical recommendations to address the identified risks and improve the contract.

        5. Compliance Checklist
        List all compliance issues, unaddressed risks, or sections that require revision.

        6. Summary of Risks by Severity
           Create a **table summarizing risks**, with clear severity classification and corresponding mitigation strategies. The table should follow this format:
            | Risk                               | Severity     | Mitigation Strategy                              |
            |------------------------------------|--------------|--------------------------------------------------|
            | Example: Rent Increases            | High         | Negotiate caps on annual increases.              |

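        """

    # Assemble the final prompt: prompt_input supplies the system turn and the
    # user instructions, the retrieved context is appended to the same user turn,
    # and the trailing assistant tag cues the model to start its answer.
    formatted_prompt = f"""{prompt_input}
        Use the following pieces of context to carry out the analysis requested above.

        {context}<|end_of_text|>
        <|start_of_role|>assistant<|end_of_role|>"""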

    try:
        generated_response = model.generate_text(
            prompt=formatted_prompt, guardrails=False
        )
        # The response is already plain text, so print it as-is
        print(generated_response)
    except Exception as e:
        logger.error(f"Error generating response: {e}")


generate_analysis()
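
The role-tagged prompt layout used in `generate_analysis` can also be factored into a small helper, which makes the turn boundaries easier to verify. The sketch below assumes the same tags that already appear in the notebook's prompt (`<|start_of_role|>`, `<|end_of_role|>`, `<|end_of_text|>`) and an assistant tag as the generation cue. `build_granite_prompt` is a hypothetical name, and the model card remains the authoritative source for the template.

```python
def build_granite_prompt(system, user, context):
    """Assemble a single-turn prompt: system turn, user turn, assistant cue."""
    user_turn = (
        "Use the following pieces of context to answer the question.\n\n"
        f"{context}\n\n{user}"
    )
    return (
        f"<|start_of_role|>system<|end_of_role|>{system}<|end_of_text|>\n"
        f"<|start_of_role|>user<|end_of_role|>{user_turn}<|end_of_text|>\n"
        "<|start_of_role|>assistant<|end_of_role|>"
    )


# Placeholder strings for illustration; the notebook uses much longer instructions.
prompt = build_granite_prompt(
    system="You are a cautious contract-analysis assistant.",
    user="Analyze the contract for legal and financial risks.",
    context="Rent is due on the first day of each month.",
)
print(prompt)
```

Each completed turn is closed with `<|end_of_text|>`, and only the final assistant turn is left open for the model to fill in.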