# Legal Document Analysis

The "Legal Document Analysis" project aims to build a system that can parse and **analyze legal documents, including contracts, court rulings, legal agreements, and other similar texts**. Its primary goal is to use natural language processing (NLP) and machine learning techniques to extract useful information and insights from legal texts, making it easier for legal professionals to review and work with such documents.

Key components and tasks involved in this project may include:

Text Extraction: The system should extract the text of legal documents stored in different formats, such as PDF, Word, or plain text.
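
Word files in the .docx format are ZIP archives whose main body is stored as XML in `word/document.xml`, so their plain text can be pulled out with the Python standard library alone. The sketch below assumes a well-formed .docx; it keeps only body paragraphs (headers, footers, footnotes, and table layout are ignored) and may duplicate text that sits in text boxes. The function name `extract_docx_text` is hypothetical. PDF extraction usually relies on a third-party library and is not shown here.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespace of WordprocessingML, the markup used inside .docx files
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path: str) -> str:
    """Return the body text of a .docx file, one line per paragraph (sketch)."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")  # main document part
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> element is a paragraph; its visible text is split across <w:t> runs
    for para in root.iter(W_NS + "p"):
        paragraphs.append("".join(node.text or "" for node in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```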

Text Preprocessing: Cleaning the text, removing irrelevant information, and standardizing its format so that it can be analyzed consistently.
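
A minimal cleaning pass might look like the following sketch. The specific rules (Unicode NFKC normalization, straight quotes, dropping lines that contain only a page number, re-joining words hyphenated across line breaks, collapsing whitespace) are illustrative assumptions; real pipelines tune them per source. Note that the de-hyphenation step also merges genuinely hyphenated compounds that happen to break at a line end.

```python
import re
import unicodedata

# Map typographic quotes to plain ASCII quotes
QUOTES = str.maketrans({"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"})
# A line holding only a page marker such as "Page 3", "Page 3 of 10", "- 3 -" or "3"
PAGE_LINE = re.compile(r"\s*(page\s+\d+(\s+of\s+\d+)?|-\s*\d+\s*-|\d+)\s*", re.IGNORECASE)


def preprocess(text: str) -> str:
    # NFKC folds compatibility characters, e.g. the "fi" ligature (U+FB01) becomes "fi"
    text = unicodedata.normalize("NFKC", text).translate(QUOTES)
    # Drop lines that are only page numbers
    text = "\n".join(line for line in text.splitlines() if not PAGE_LINE.fullmatch(line))
    # Re-join words hyphenated across a line break: "par-\nties" -> "parties"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse all runs of whitespace (including newlines) into single spaces
    return re.sub(r"\s+", " ", text).strip()
```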

Named Entity Recognition (NER): Identifying and categorizing entities such as names of parties, dates, contract terms, legal citations, and more using NLP techniques.
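
Some legal entities, citations in particular, follow conventions regular enough that a rule-based pass is a reasonable complement to a statistical NER model. The sketch below recognizes only the "volume reporter page" shape (e.g. `347 U.S. 483`) for a handful of U.S. reporter abbreviations; the reporter list and the function name `find_case_citations` are illustrative assumptions, and real citation formats are far more varied.

```python
import re

# Illustrative subset of U.S. reporter abbreviations (an assumption, not a complete list)
REPORTERS = [r"U\.S\.", r"S\. ?Ct\.", r"F\.[234]d", r"F\. Supp\.(?: [23]d)?"]

# "volume reporter page", e.g. "347 U.S. 483" or "123 F. Supp. 2d 456"
CITATION_RE = re.compile(
    r"\b(?P<volume>\d{1,4})\s+(?P<reporter>" + "|".join(REPORTERS) + r")\s+(?P<page>\d{1,5})\b"
)


def find_case_citations(text: str) -> list[tuple[str, str, str]]:
    """Return (volume, reporter, page) triples found in the text."""
    return [(m["volume"], m["reporter"], m["page"]) for m in CITATION_RE.finditer(text)]
```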

Contract Clause Extraction: Extracting specific clauses and terms from contracts, such as obligations, responsibilities, payment terms, and termination clauses.
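
For contracts that use numbered, capitalized section headings (as in the purchase agreement example later in this document), clauses can be split out with a regular expression before any finer analysis. The heading pattern is an assumption about drafting style, and `split_sections` is a hypothetical helper; documents with other numbering schemes need different rules.

```python
import re

# A section heading on its own line, e.g. "2. PURCHASE PRICE:"
HEADING_RE = re.compile(r"^[ \t]*(\d+)\.[ \t]+([A-Z][A-Z ]+):[ \t]*$", re.MULTILINE)


def split_sections(text: str) -> dict[str, str]:
    """Map each heading title to the whitespace-normalized text that follows it."""
    headings = list(HEADING_RE.finditer(text))
    sections = {}
    for i, match in enumerate(headings):
        # A section runs until the next heading (or the end of the text)
        end = headings[i + 1].start() if i + 1 < len(headings) else len(text)
        sections[match.group(2).strip()] = " ".join(text[match.end():end].split())
    return sections
```

The last section also absorbs any trailing text, such as a signature block, so in practice it may need trimming.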

Sentiment Analysis: Determining the sentiment or tone of legal documents, which can be valuable in assessing the implications of contracts or court rulings.

Legal Research: Automating legal research by retrieving relevant case law, statutes, or regulations based on the content of the analyzed documents.
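
Retrieval of relevant authorities boils down to ranking documents by their similarity to a query. As a toy stand-in, the sketch below ranks a small in-memory corpus with TF-IDF weights and cosine similarity using only the standard library; the corpus, tokenizer, and IDF smoothing formula are assumptions, and a real system would more likely use a search index or a learned embedding model.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


def rank_documents(query: str, docs: list[str]) -> list[int]:
    """Return document indices ordered from most to least similar to the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: number of documents containing each term
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

    def vectorize(tokens: list[str]) -> dict[str, float]:
        # Term frequency times IDF; terms absent from the corpus are ignored
        return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    query_vec = vectorize(tokenize(query))
    scores = [cosine(query_vec, vectorize(toks)) for toks in tokenized]
    return sorted(range(n), key=lambda i: scores[i], reverse=True)
```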

Document Summarization: Generating concise summaries of legal documents to provide a quick overview of their key points.
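
One simple extractive approach scores each sentence by how frequent its content words are across the whole document and keeps the top-scoring sentences in their original order. The frequency heuristic and the tiny stopword list are assumptions chosen for illustration (abstractive summarization with a trained model is a different technique), and the naive sentence splitter will also break at abbreviations such as "U.S.".

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer)
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for",
             "by", "is", "be", "shall", "this", "that", "with", "as", "at"}


def content_words(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, n_sentences: int = 2) -> str:
    # Naive sentence split on terminal punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average frequency, so long sentences are not automatically preferred
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() is stable, so tied sentences keep their original order
    best = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(best[:n_sentences]))
```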

Compliance Checking: Checking whether contracts and agreements comply with relevant legal standards and regulations.
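
A rule-based sketch of compliance checking pairs each required clause with a regular expression that signals its presence and reports which ones are missing. The checklist below is hypothetical: actual requirements depend on the jurisdiction and contract type, and keyword rules can only flag candidates for a lawyer to review.

```python
import re

# Hypothetical checklist: clause name -> regex suggesting the clause is present
REQUIRED_CLAUSES = {
    "governing law": r"\bgoverned by\b.*?\blaws? of\b",
    "termination": r"\bterminat(?:e|es|ed|ion)\b",
    "signature block": r"\bIN WITNESS WHEREOF\b",
}


def missing_clauses(text: str, rules: dict[str, str] = REQUIRED_CLAUSES) -> list[str]:
    """Return the names of required clauses for which no pattern matched."""
    return [name for name, pattern in rules.items()
            if not re.search(pattern, text, flags=re.IGNORECASE | re.DOTALL)]
```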

Document Comparison: Comparing multiple versions of a document to identify changes, additions, or omissions.
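
Python's standard `difflib` module already implements line-level comparison. A minimal sketch that reports which lines were added and removed between two versions of a contract, assuming roughly one clause per line, could look like this; `changed_lines` is a hypothetical helper name.

```python
import difflib


def changed_lines(old: str, new: str) -> tuple[list[str], list[str]]:
    """Return (added, removed) lines between two versions of a document."""
    added, removed = [], []
    # ndiff prefixes lines with "+ " (added), "- " (removed), "  " (unchanged), "? " (hints)
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        if line.startswith("+ "):
            added.append(line[2:])
        elif line.startswith("- "):
            removed.append(line[2:])
    return added, removed
```

For a human-readable report, `difflib.unified_diff` produces the familiar `---`/`+++` patch format from the same inputs.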

User Interface: Creating a user-friendly interface that allows legal professionals to upload, analyze, and interact with legal documents efficiently.

The benefits of such a system include time savings, improved accuracy, and reduced risk in legal document analysis. It can be a valuable tool for lawyers, legal researchers, and organizations that deal with a large volume of legal documents regularly.


Most pretrained language models, such as **BERT, GPT-3**, and others, are trained mainly on general-domain text such as web pages, books, and Wikipedia, so they are not specialized for the language and conventions of legal documents.

Here are a few examples of legal NLP projects and resources:

- LegalBERT: LegalBERT is a family of BERT models pre-trained on English legal text (legislation, court cases, and contracts). It can serve as a starting point for fine-tuning on various legal NLP tasks.

- Caselaw Access Project: This Harvard Law School Library project provides access to a large dataset of U.S. court cases. Researchers have used this dataset to train models for legal text analysis.

- Stanford NLP Group: Researchers at Stanford University have worked on various legal NLP projects, including parsing and extracting information from legal texts.

The **en_core_web_sm model is a pretrained pipeline for the spaCy library**, a popular natural language processing (NLP) library for Python. However, en_core_web_sm is a **general-purpose English language model and is not specifically tailored for legal text or legal-specific tasks**. It is designed for common NLP tasks such as tokenization, part-of-speech tagging, named entity recognition, and dependency parsing, but it has no specialized knowledge of the legal domain.




```python
import spacy
from spacy.matcher import Matcher

# Load a language model (you can use a legal-specific model if available).
# The model must be installed first: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Candidate party names: two consecutive proper nouns. This crude heuristic
# also matches street names and capitalized headings.
party_pattern = [{"POS": "PROPN"}, {"POS": "PROPN"}]

# Dates written as "14th day of January, 2024" (simplified for illustration)
months = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]
date_pattern = [
    {"TEXT": {"REGEX": r"^\d{1,2}(st|nd|rd|th)$"}},  # ordinal day, e.g. "14th"
    {"LOWER": "day"},
    {"LOWER": "of"},
    {"LOWER": {"IN": months}},                       # month name
    {"TEXT": ",", "OP": "?"},                        # optional comma
    {"TEXT": {"REGEX": r"^\d{4}$"}},                 # four-digit year
]

# Create a Matcher object for party and date patterns
matcher = Matcher(nlp.vocab)
matcher.add("PARTY", [party_pattern])
matcher.add("DATE", [date_pattern])

# Example legal document text
legal_document = """
This Agreement ("Agreement") is made and entered into this 14th day of January, 2024, by and between
Party A, a corporation organized and existing under the laws of the State of California, and Party B,
an individual residing at 123 Main Street, Anytown, USA.
"""

# Process the legal document using the NLP model
doc = nlp(legal_document)

# Find matches for parties and dates using the Matcher
matches = matcher(doc)

# Sort the matches into parties and dates by their rule label
parties_involved = []
contract_dates = []

for match_id, start, end in matches:
    match_text = doc[start:end].text
    if match_id == nlp.vocab.strings["PARTY"]:
        parties_involved.append(match_text)
    elif match_id == nlp.vocab.strings["DATE"]:
        contract_dates.append(match_text)

# Print extracted parties and contract dates
print("Parties Involved:")
for party in parties_involved:
    print(party)

print("\nContract Dates:")
for date in contract_dates:
    print(date)
```

```
Parties Involved:
Party A
Party B
Main Street

Contract Dates:
14th day of January, 2024
```

```python
# A second example contract
legal_document_2 = """
This Contract ("Contract") is made and entered into on the 5th day of March, 2023,
by and between XYZ Corporation, a Delaware corporation with its principal place of business
at 456 Oak Street, Cityville, USA, and John Doe, an individual residing at 789 Elm Avenue,
Townsville, USA.
"""

# Process the second legal document using the NLP model
doc_2 = nlp(legal_document_2)

# Find matches for parties and dates using the Matcher
matches_2 = matcher(doc_2)

# Sort the matches into parties and dates by their rule label
parties_involved_2 = []
contract_dates_2 = []

for match_id, start, end in matches_2:
    match_text = doc_2[start:end].text
    if match_id == nlp.vocab.strings["PARTY"]:
        parties_involved_2.append(match_text)
    elif match_id == nlp.vocab.strings["DATE"]:
        contract_dates_2.append(match_text)

# Print extracted parties and contract dates for the second document
print("\nParties Involved (Document 2):")
for party in parties_involved_2:
    print(party)

print("\nContract Dates (Document 2):")
for date in contract_dates_2:
    print(date)
```

```
Parties Involved (Document 2):
XYZ Corporation
Oak Street
John Doe
Elm Avenue

Contract Dates (Document 2):
5th day of March, 2023
```

```python
# A longer, more realistic contract
legal_document_complex = """
                                AGREEMENT OF PURCHASE AND SALE
                                 (Real Estate - Residential)

THIS AGREEMENT made as of the 14th day of January, 2024.

BETWEEN:

1. JOHN DOE, an individual residing at 123 Oak Street, Cityville, USA,
   (hereinafter referred to as the "Seller");

2. JANE SMITH, an individual residing at 456 Elm Avenue, Townsville, USA,
   (hereinafter referred to as the "Buyer");

3. ABC REALTY INC., a corporation organized and existing under the laws of
   the State of California, with its principal place of business at 789 Main
   Road, Metropolis, USA,
   (hereinafter referred to as the "Agent").

WHEREAS, the Seller desires to sell and the Buyer desires to purchase the real property
located at 789 Maple Lane, Suburbia, USA, including all improvements thereon (the "Property").

NOW, THEREFORE, in consideration of the mutual covenants contained herein and for other good
and valuable consideration, the receipt and sufficiency of which is hereby acknowledged,
the parties hereto agree as follows:

1. SALE OF PROPERTY:
   1.1 The Seller agrees to sell and the Buyer agrees to purchase the Property on the terms
       and conditions set forth in this Agreement.

2. PURCHASE PRICE:
   2.1 The total purchase price for the Property is $500,000 (the "Purchase Price").
   2.2 The Purchase Price shall be paid as follows:
       - $50,000 as an initial deposit payable upon execution of this Agreement.
       - The balance of $450,000 shall be paid on the closing date.

3. CLOSING DATE:
   3.1 The closing of this transaction shall take place on the 1st day of March, 2024.

4. REPRESENTATIONS AND WARRANTIES:
   4.1 The Seller represents and warrants that they have good and marketable title to the
       Property and have the full right and authority to sell it.

5. GOVERNING LAW:
   5.1 This Agreement shall be governed by and construed in accordance with the laws of
       the State of California.

IN WITNESS WHEREOF, the parties hereto have executed this Agreement as of the date first above written.

___________________________     ___________________________     ___________________________
John Doe (Seller)              Jane Smith (Buyer)              ABC Realty Inc. (Agent)
"""

# Process the complex legal document using the NLP model
doc_complex = nlp(legal_document_complex)

# Find matches for parties and dates using the Matcher
matches_complex = matcher(doc_complex)

# Sort the matches into parties and dates by their rule label
parties_involved_complex = []
contract_dates_complex = []

for match_id, start, end in matches_complex:
    match_text = doc_complex[start:end].text
    if match_id == nlp.vocab.strings["PARTY"]:
        parties_involved_complex.append(match_text)
    elif match_id == nlp.vocab.strings["DATE"]:
        contract_dates_complex.append(match_text)

# Print extracted parties and contract dates for the complex document
print("\nParties Involved (Complex Document):")
for party in parties_involved_complex:
    print(party)

print("\nContract Dates (Complex Document):")
for date in contract_dates_complex:
    print(date)
```

```
Parties Involved (Complex Document):
Real Estate
JOHN DOE
Oak Street
JANE SMITH
Elm Avenue
ABC REALTY
REALTY INC
INC.
Maple Lane
Purchase Price
WITNESS WHEREOF
John Doe
Jane Smith
ABC Realty
Realty Inc.

Contract Dates (Complex Document):
14th day of January, 2024
1st day of March, 2024
```
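
Because the proper-noun pattern above also picks up street names and headings, a contract that defines its parties explicitly can often be parsed more reliably with a rule keyed to that drafting convention. The sketch below assumes each party entry starts with a number at the beginning of a line and contains a `(hereinafter referred to as the "Role")` phrase, as in the purchase agreement above; `extract_defined_parties` is a hypothetical helper, not part of spaCy.

```python
import re

# The defining phrase used in the party block, e.g. (hereinafter referred to as the "Seller")
ROLE_RE = re.compile(r'\(hereinafter referred to as the "([^"]+)"\)')


def extract_defined_parties(text: str) -> dict[str, str]:
    """Map each defined role (e.g. "Seller") to the party name that precedes it."""
    parties = {}
    # Split the text into chunks that each start after "N. " at the beginning of a line
    for entry in re.split(r"^\s*\d+\.\s+", text, flags=re.MULTILINE)[1:]:
        role = ROLE_RE.search(entry)
        if role:
            # The party name is everything before the first comma, e.g. "JOHN DOE"
            parties[role.group(1)] = entry.split(",", 1)[0].strip()
    return parties
```

Applied to `legal_document_complex`, this returns `{"Seller": "JOHN DOE", "Buyer": "JANE SMITH", "Agent": "ABC REALTY INC."}`. For contracts drafted differently it silently returns nothing, so it complements rather than replaces statistical NER.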
