# Contract Analysis using IBM Granite LLM from watsonx
Author: [@Aisha Mohammed Farooq Darga](https://www.linkedin.com/in/aisha-mohammed-farooq-darga-778280135/)

### **Description**

Contract analysis involves reviewing, interpreting, and extracting key information from contract documents to identify risks, obligations, and critical aspects. This ensures clarity on terms and conditions, helps avoid ambiguities, and mitigates potential legal or financial complications. 

Effective contract analysis is crucial for businesses, legal professionals, and stakeholders, as it safeguards against unintentional obligations, disputes, and risks.

---

### **What Does This Notebook Do?**

This notebook provides an **automated solution for contract analysis** by leveraging advanced AI technologies like **IBM Granite LLM (`ibm/granite-3-8b-instruct`)** and **[Docling's DocumentConverter](https://ds4sd.github.io/docling/)**. It simplifies the process of extracting, analyzing, and understanding key details from contract documents, enabling users to:  
- Identify important clauses and terms.  
- Assess potential risks and obligations.  
- Generate actionable insights for better decision-making.  

The notebook is particularly useful for legal professionals, business analysts, and organizations handling contracts regularly, as it ensures consistency and scalability in the analysis process.  

---

### **Approach Followed**

1. **Document Conversion**  
   - Contracts in various file formats (e.g., PDFs) are converted into text using **Docling's DocumentConverter**, producing markdown-formatted text for seamless processing.  

2. **Text Splitting and Preparation**  
   - Contracts are split into manageable text chunks to ensure efficient processing by the AI model and adherence to token limits.  

3. **AI-Powered Analysis**  
   - The **IBM Granite LLM (`ibm/granite-3-8b-instruct`)** processes the contract chunks using a carefully crafted prompt. It performs the following tasks:
     - **Clause Identification**: Extracts and summarizes critical clauses, such as payment terms, intellectual property rights, and termination conditions.  
     - **Risk Analysis**: Identifies potential risks, categorizes them by severity, and offers strategies to mitigate them.  
     - **Actionable Recommendations**: Suggests improvements to contract terms and provides compliance checklists.  

4. **Output Consolidation**  
   - The results from each chunk are merged into a single comprehensive analysis document, offering both a high-level summary and detailed insights.  

---

### **Prerequisites**

- **IBM Cloud Account**: [Sign up here](https://cloud.ibm.com/registration).
- **Python Version**: Ensure Python 3.11.9 is installed.

---

### **Environment Setup**

#### 1. **IBM Cloud Account Setup**
- Log in to [watsonx.ai](https://dataplatform.cloud.ibm.com/registration/stepone?context=wx&apps=all).
- Create a [watsonx.ai Project](https://www.ibm.com/docs/en/watsonx/saas?topic=projects-creating-project).
- Create a [Jupyter Notebook](https://www.ibm.com/docs/en/watsonx/saas?topic=editor-creating-managing-notebooks).
This step opens a notebook environment where you can copy the code from this tutorial. Alternatively, download this notebook to your local system and upload it to your watsonx.ai project as an asset.

#### 2. **Watson Machine Learning (WML) Service**
- Create a [WML Service Instance](https://cloud.ibm.com/catalog/services/watson-machine-learning) (Lite Plan recommended).
- Generate an [API Key](https://dataplatform.cloud.ibm.com/docs/content/wsj/analyze-data/ml-authentication.html).
- Associate the WML service with the project that you created in [watsonx.ai](https://dataplatform.cloud.ibm.com/docs/content/wsj/getting-started/assoc-services.html).

---


## **Code Implementation**

### **1. Install Libraries**
We start by installing the required dependencies. This includes libraries for document conversion, IBM watsonx LLM integration, and data handling.

In [None]:
!pip install -q git+https://github.com/ibm-granite-community/utils \
    docling==2.14.0 \
    langchain==0.2.12 \
    langchain-ibm==0.1.11 \
    langchain-community==0.2.11 \
    langchain-core==0.2.28 \
    ibm-watsonx-ai==1.1.2 \
    transformers==4.47.1

### **2. Import Libraries**

In this step, we import the necessary libraries that will help us process the contract data and analyze it using watsonx.

In [None]:
import os
import requests
import logging
from docling.document_converter import DocumentConverter
from langchain_ibm import WatsonxLLM
from langchain_core.prompts import PromptTemplate
from ibm_granite_community.notebook_utils import get_env_var

### **3. Configure Logging**

In this step, we configure the logging settings. Logging is essential for debugging and tracking the notebook's execution. The logging level is set to `INFO`, which allows us to capture useful information during the execution.

In [None]:
# Set up logging configuration for debugging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

### **4. Initialize Utilities**

Here, we initialize the **DocumentConverter** instance. This utility is responsible for converting contract documents into a structured format that can be processed and analyzed in the next steps.


In [None]:
# Initialize DocumentConverter for converting documents
document_converter = DocumentConverter()

### **5. Define Functions**

#### 5.1 Read Contract Text from a File
This function reads the contract from a specified file path, converts it using the `DocumentConverter`, and exports the contract text in markdown format for easier analysis.


In [None]:
def read_contract(file_path):
    result = document_converter.convert(file_path)
    text_content = result.document.export_to_markdown()
    return text_content

#### 5.2 Split Text into Manageable Chunks

In [None]:
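def split_text(text, max_tokens=100000):
    """Split text into chunks of whole words.

    Note: despite its name, max_tokens is a budget in characters
    (each word counts len(word) + 1), not in model tokens.
    """
    words = text.split()
    chunks = []
    current_chunk = []
    current_length = 0

    for word in words:
        word_length = len(word) + 1  # +1 for the space
        # Start a new chunk once the budget would be exceeded; the current_chunk
        # check avoids emitting an empty chunk for an oversized first word.
        if current_chunk and current_length + word_length > max_tokens:
            chunks.append(" ".join(current_chunk))
            current_chunk = []
            current_length = 0
        current_chunk.append(word)
        current_length += word_length

    if current_chunk:
        chunks.append(" ".join(current_chunk))
    return chunks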
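
The splitter above cuts at word boundaries and rejoins words with single spaces, so markdown line breaks are lost and a clause can end up divided between two chunks. A minimal sketch of an alternative, assuming the markdown text separates paragraphs with blank lines; `split_by_paragraph` and `max_chars` are hypothetical names and not part of this notebook:

```python
def split_by_paragraph(text, max_chars=100000):
    """Group blank-line-separated paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks = []
    current = []
    current_length = 0

    for paragraph in paragraphs:
        # +2 accounts for the blank line that rejoins paragraphs
        added_length = len(paragraph) + (2 if current else 0)
        if current and current_length + added_length > max_chars:
            chunks.append("\n\n".join(current))
            current = []
            current_length = 0
            added_length = len(paragraph)
        current.append(paragraph)
        current_length += added_length

    if current:
        chunks.append("\n\n".join(current))
    return chunks


# Hypothetical sample text, not taken from the example contracts
sample = "## Payment\n\nFees are due in 30 days.\n\n## Termination\n\nEither party may terminate."
for chunk in split_by_paragraph(sample, max_chars=45):
    print(repr(chunk))
```

With this sample and budget, each heading stays in the same chunk as the paragraph that follows it. The budget is still measured in characters, so it only approximates the model's token limit.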

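#### 5.3 Generate Contract Analysis

This function splits the contract text into chunks, formats the prompt with each chunk, sends it to the model, and joins the per-chunk results with blank lines.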

In [None]:
def generate_analysis_with_watsonx(text, watson_llm, prompt):
    chunks = split_text(text, max_tokens=100000)
    logger.info(f"Split text into {len(chunks)} chunks for processing.")
    results = []
    for i, chunk in enumerate(chunks):
        logger.info(f"Processing chunk {i + 1} of {len(chunks)}...")
        formatted_context = prompt.format(context=chunk)
        result = watson_llm.invoke(formatted_context)
        results.append(result)
    combined_result = "\n\n".join(results)
    return combined_result
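
Because the function only calls `format` on the prompt and `invoke` on the model, the per-chunk loop can be exercised without watsonx credentials. A minimal sketch, assuming a stand-in model; `EchoLLM` and `analyze_chunks` are hypothetical names that exist only for this illustration:

```python
class EchoLLM:
    """Stand-in for WatsonxLLM that records prompts instead of calling a model."""

    def __init__(self):
        self.prompts = []

    def invoke(self, prompt_text):
        self.prompts.append(prompt_text)
        return f"[analysis of {len(prompt_text)} characters]"


def analyze_chunks(chunks, llm, template):
    # Same per-chunk flow as the function above: format, invoke, then join.
    results = [llm.invoke(template.format(context=chunk)) for chunk in chunks]
    return "\n\n".join(results)


template = "Contract Data:\n{context}\n"  # plain str.format template for the dry run
fake_llm = EchoLLM()
combined = analyze_chunks(["Clause A", "Clause B"], fake_llm, template)
print(combined)
```

Each chunk is analyzed independently, so when a contract spans several chunks the joined output contains one full set of sections per chunk rather than a single merged report.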

#### 5.4 Directory and File Management

These helper functions assist in managing files and directories. We use them to:
- **setup_directory**: Ensure the necessary directory exists.
- **download_file**: Download contract files from a specified URL.
- **cleanup_directory**: Remove files from the directory after processing.

In [None]:
def setup_directory(directory):
    os.makedirs(directory, exist_ok=True)
    logger.info(f"Directory '{directory}' is ready.")

def download_file(file_url, destination):
    # A timeout keeps the notebook from hanging on an unresponsive server
    response = requests.get(file_url, timeout=30)
    if response.status_code == 200:
        with open(destination, "wb") as file:
            file.write(response.content)
        logger.info(f"Downloaded: {destination}")
    else:
        logger.error(f"Failed to download {file_url}. Status code: {response.status_code}")

def cleanup_directory(directory):
    for file_name in os.listdir(directory):
        file_path = os.path.join(directory, file_name)
        if os.path.isfile(file_path):
            os.remove(file_path)
    os.rmdir(directory)
    logger.info(f"Cleaned up directory '{directory}'.")
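
If the downloaded files are only needed while the analysis runs, a temporary directory removes the need for a separate cleanup step. A minimal sketch using the standard library's `tempfile.TemporaryDirectory`; the file name and contents are placeholders, not real contract data:

```python
import os
import tempfile

with tempfile.TemporaryDirectory(prefix="contracts_") as tmp_dir:
    # Placeholder file standing in for a downloaded contract
    file_path = os.path.join(tmp_dir, "example_contract.txt")
    with open(file_path, "w", encoding="utf-8") as handle:
        handle.write("Placeholder contract text")

    print(os.listdir(tmp_dir))

# The directory and everything in it are deleted when the block exits
print(os.path.exists(tmp_dir))
```

The trade-off is that the files disappear as soon as the `with` block ends, so this suits a single-cell run better than a notebook where later cells reread the contracts.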

### 6. Main Execution Block

#### 6.1 Setup Environment Variables and Parameters

In [None]:
if __name__ == "__main__":
    # Step 1: Setup environment variables
    ibm_cloud_api_key = get_env_var('WATSONX_APIKEY')
    project_id = get_env_var('WATSONX_PROJECT_ID')
    watson_url = get_env_var('WATSONX_URL')

    watson_llm = WatsonxLLM(
        model_id="ibm/granite-3-8b-instruct",
        apikey=ibm_cloud_api_key,
        project_id=project_id,
        params={
            "decoding_method": "greedy",
            "max_new_tokens": 8000,
            "min_new_tokens": 1,
            "repetition_penalty": 1.01,
        },
        url=watson_url,
    )

    prompt_template = PromptTemplate(
        input_variables=['context'],
        # Define a prompt template for contract analysis
        template = '''<|start_of_role|>System<|end_of_role|> You are a highly skilled contract analyst. Your task is to provide a comprehensive analysis of the given contract, extracting key details and structuring the output into the following sections:
        ---

        1. General Overview
        - Contract Dates: Specify the start and end dates of the agreement.
        - Parties Involved: List the full names of the parties, including their principal offices or headquarters addresses.
        - Scope of Work: Summarize the primary services or obligations each party will fulfill.
        ---

        2. Key Highlights
        - Payment Terms:
        - Total fee and payment schedule, including deadlines and penalties for late payments.
        - Intellectual Property Rights:
        - Ownership details of pre-existing IP versus project-specific deliverables.
        - Termination Provisions:
        - Notice period and conditions for termination.
        - Dispute Resolution:
        - Process for handling disputes, including negotiation, mediation, and arbitration details.

        ---

        3. Detailed Risk Analysis
        For each key clause (Payment Terms, Intellectual Property Rights, Termination Provisions, and Dispute Resolution):
        - Risk: Identify potential risks or ambiguities.
        - Severity: Classify the risk as High, Medium, or Low.
        - Mitigation Strategy: Propose actionable solutions to reduce or eliminate the risk.

        ---

        4. Recommendations and Actionable Insights
        Provide specific recommendations to improve the agreement, address identified risks, or strengthen the contract terms.

        ---

        5. Compliance Checklist
        Highlight compliance requirements and ensure all clauses adhere to applicable legal standards and regulations.

        ---

        6. Summary of Risks by Severity
        Present the identified risks in a table format with the following columns:
        - Risk: A brief description of the issue.
        - Severity: High, Medium, or Low.
        - Mitigation Strategy: The proposed solution to address the risk.

        ---

        Contract Data:
        {context}

        Strictly adhere to the provided format and ensure the analysis is accurate, detailed, and well-organized.

        <|end_of_text|>
        '''
    )
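
The three `WATSONX_*` values must be available before the model is created. A minimal sketch of a pre-flight check, assuming the values are supplied as environment variables; `missing_env_vars` is a hypothetical helper, and the notebook's `get_env_var` may obtain values by other means as well:

```python
import os

REQUIRED_VARS = ("WATSONX_APIKEY", "WATSONX_PROJECT_ID", "WATSONX_URL")


def missing_env_vars(names=REQUIRED_VARS, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these variables before creating the model: " + ", ".join(missing))
else:
    print("All watsonx settings are present.")
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real environment.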

#### 6.2 Setup Directory and Download Files

In [None]:
# Step 2: Set up directory and download files
data_dir = "Contracts"
setup_directory(data_dir)

# Example local paths and base URL
local_paths = [
    "IT_Consultancy_Agreement.pdf",
    "Construction_Contract.pdf",
    "Employment_Agreement.pdf",
    "Software_Development_Contract.pdf"
]
base_url = "https://raw.githubusercontent.com/AishaDarga/granite-snack-cookbook/refs/heads/contract-analysis/recipes/Contract-Analysis/Contracts/"

for file_name in local_paths:
    file_url = base_url + file_name
    file_path = os.path.join(data_dir, file_name)
    download_file(file_url, file_path)

#### 6.3 Read, Process, and Analyze Contracts

**Note:** You can analyze any other contract in the `Contracts` folder. Simply change the value of `selected_contract` in the code cell below to the desired contract file name.

The generated output contains the following sections:

1. General Overview:
   - Key contract details such as the effective dates, involved parties, and scope of work.

2. Key Highlights:
   - Summarizes the important contract clauses including payment terms, intellectual property rights, termination provisions, and dispute resolution methods.

3. Detailed Risk Analysis:
   - Identifies and assesses potential risks within each key section of the contract, categorizing them as High, Medium, or Low.

4. Recommendations and Actionable Insights:
   - Provides practical advice for mitigating risks and improving contract terms.

5. Compliance Checklist:
   - Lists compliance requirements and any unaddressed risks.

6. Summary of Risks by Severity:
   - A table summarizing identified risks, their severity, and proposed mitigation strategies.
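
Since the model is asked to follow a fixed layout, a quick check can flag outputs that drop a section. A minimal sketch, assuming the section titles appear verbatim in the output; `find_missing_sections` is a hypothetical helper, and real model output may word or format headings differently:

```python
EXPECTED_SECTIONS = [
    "General Overview",
    "Key Highlights",
    "Detailed Risk Analysis",
    "Recommendations and Actionable Insights",
    "Compliance Checklist",
    "Summary of Risks by Severity",
]


def find_missing_sections(analysis_text, expected=EXPECTED_SECTIONS):
    """Return the expected section titles that do not occur in the text (case-insensitive)."""
    lowered = analysis_text.lower()
    return [title for title in expected if title.lower() not in lowered]


# Hypothetical, abbreviated output used only to demonstrate the check
sample_output = "1. General Overview\n(details)\n2. Key Highlights\n(details)\n5. Compliance Checklist\n(details)"
print(find_missing_sections(sample_output))
```

An empty list means every title was found; it does not say anything about the quality of the content under each heading.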




In [None]:
# Step 3: Read, process, and analyze the selected contract
selected_contract = "Employment_Agreement.pdf"  # Replace with the desired file name

if selected_contract in local_paths:
    file_path = os.path.join(data_dir, selected_contract)
    contract_text = read_contract(file_path)
    contract_analysis = generate_analysis_with_watsonx(contract_text, watson_llm, prompt_template)
    print(f"Analysis for {selected_contract}:\n{contract_analysis}")
else:
    print(f"Error: {selected_contract} not found in the available contracts.")
    


#### 6.4 Cleanup

Once the contract analysis is complete, we use the cleanup function to remove the downloaded contract files and clean up the directory, ensuring no unnecessary files remain on the system.


In [None]:
cleanup_directory(data_dir)