## Importing Necessary Modules

In [1]:
# =============================================================================
# Program Title: Court Case Summarizer
# Programmers: Jewell Anne Diamante
# Date Written: October 3, 2024
# Date Revised: November 4, 2024
#
# Purpose:
#     This program processes a raw court case text file and summarizes it by
#     segmenting the text into facts, issues, and rulings using a fine-tuned
#     model, then summarizing the segmented text with Latent Semantic Analysis
#     (LSA). The output is a structured summary organized into facts, issues,
#     and rulings.
#
#     The program is designed to assist in automating the summarization of legal
#     documents, which is particularly useful for legal professionals and
#     researchers.
#
# Where the program fits in the general system design:
#     The program is a component in a larger system for automating the extraction
#     and summarization of legal documents. It can be integrated with other
#     components like document retrieval, legal document classification, and
#     knowledge base creation for further legal analytics.
#
# Data Structures, Algorithms, and Control:
#     - Data Structures:
#         - String (raw_text, cleaned_text, summary)
#         - List (segmented_paragraph, predicted_labels, segmentation_output)
#     - Algorithms:
#         - Preprocessing for text cleaning and segmentation
#         - Topic Segmentation using a fine-tuned model
#         - Latent Semantic Analysis (LSA) for summarizing segmented text
#     - Control:
#         - Conditional checks for empty files or failed segmentation
#         - Try-except blocks for error handling
# =============================================================================
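
The header names Latent Semantic Analysis as the summarization step without describing it. One common formulation of LSA-based extractive summarization is sketched below. It is an assumption about the general technique, not a description of what `Custom_Modules.LSA` does. A term-by-sentence matrix $A$ (with $m$ terms, $n$ sentences, and $a_{ij}$ the weight of term $i$ in sentence $j$) is decomposed, the top $r$ singular values are kept, and each sentence $j$ receives a score $s_j$:

$$
A = U \Sigma V^{\top}, \qquad A \in \mathbb{R}^{m \times n}, \qquad
s_j = \sqrt{\sum_{i=1}^{r} \sigma_i^{2} \, v_{j,i}^{2}}
$$

Here $\sigma_i$ is the $i$-th singular value and $v_{j,i}$ is the entry of $V$ for sentence $j$ and latent topic $i$. Under this formulation, the highest-scoring sentences of each segment would be selected for the summary.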

In [2]:
import os

from Custom_Modules.LSA import LSA
from Custom_Modules.Preprocess import preprocess
from Custom_Modules.TopicSegmentation import TopicSegmentation


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


## Court Case Summarizer

In [3]:
def sumarize_court_case(input_file: str) -> str:
    """
    Description:
        Processes the input text to perform topic segmentation and returns a 
        structured summary. This function reads a text file, cleans and tokenizes 
        the content into paragraphs, segments the paragraphs into facts, issues, 
        and rulings using a fine-tuned model, and applies Latent Semantic Analysis 
        (LSA) to generate a summary of the segmented text.

    Parameters:
        input_file (str): The path to the input text file containing the raw text 
        for segmentation.

    Returns:
        summary (str): A summary string that includes segmented facts, issues, and 
        rulings, formatted as specified in the output structure.
    """

    # Read input text
    with open(input_file, "r", encoding="utf-8") as file:
        raw_text = file.read()

    # Preprocessing: clean the raw text, then split it into paragraphs
    preprocessor = preprocess(is_training=False)
    cleaned_text = preprocessor.remove_unnecesary_char(raw_text)
    segmented_paragraph = preprocessor.segment_paragraph(cleaned_text, raw_text)

    # Topic segmentation: label each paragraph as facts, issues, or rulings
    # using the fine-tuned model
    segmentation = TopicSegmentation(model_path="my_awesome_model/77")
    predicted_labels = segmentation.sequence_classification(
        segmented_paragraph, threshold=0.8
    )
    segmentation_output = segmentation.label_mapping(predicted_labels)

    # Latent Semantic Analysis: summarize the segmented text
    lsa = LSA(segmentation_output)
    summary = lsa.create_summary()

    return summary
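
The program header lists conditional checks for empty files or failed segmentation and try-except blocks under Control, but the function above does not show them. Below is a minimal sketch of how such guards could wrap the call. `summarize_safely` is a hypothetical helper, not part of the original program. It takes the summarizer as a parameter so it can be tried without loading the model.

```python
import os
from typing import Callable, Optional


def summarize_safely(
    input_file: str, summarizer: Callable[[str], str]
) -> Optional[str]:
    """Hypothetical guard around a summarizer such as sumarize_court_case.

    Returns the summary, or None if the file is missing or empty, the
    summarizer raises, or the summarizer returns an empty result.
    """
    # Conditional check: skip files that do not exist or contain no bytes
    if not os.path.isfile(input_file) or os.path.getsize(input_file) == 0:
        print(f"Skipping {input_file}: file is missing or empty")
        return None

    # Try-except: one failing case should not abort a whole batch run.
    # The broad except is an assumption; narrow it once failure modes are known.
    try:
        summary = summarizer(input_file)
    except Exception as error:
        print(f"Failed to summarize {input_file}: {error}")
        return None

    # Conditional check: treat an empty result as a failed segmentation
    if not summary:
        print(f"No summary produced for {input_file}")
        return None
    return summary
```

In a batch run, `summarize_safely(case_file_path, sumarize_court_case)` could stand in for the direct call, with a `None` result meaning that no summary file should be written.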

### Example Usage:

In [4]:
if __name__ == "__main__":
    # Path to the main folder containing all court cases
    main_folder_path = "Evaluation/Court_Cases/Unstructured"
    for folder_name in os.listdir(main_folder_path):
        folder_path = os.path.join(main_folder_path, folder_name)
        
        # Check if the path is a directory
        if os.path.isdir(folder_path):
            # Path to the 'court case.txt' file in the current folder
            case_file_path = os.path.join(folder_path, 'court case.txt')

            # Proceed if the 'court case.txt' file exists
            if os.path.isfile(case_file_path):
                result = sumarize_court_case(case_file_path)
                
                # Path to save the summary file
                summary_file_path = os.path.join(folder_path, 'LSATP_summary.txt')
                
                with open(summary_file_path, "w", encoding="utf-8") as file:
                    file.write(result)
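
The folder walk above can also be written with `pathlib`, which separates finding the case files from summarizing them. This is a sketch assuming the same layout (one subfolder per case, each holding `court case.txt`). `find_case_files` is a hypothetical helper, not part of the original program. Unlike `os.listdir`, it visits folders in sorted order.

```python
from pathlib import Path
from typing import Iterator, Tuple


def find_case_files(
    main_folder: str,
    case_name: str = "court case.txt",
    summary_name: str = "LSATP_summary.txt",
) -> Iterator[Tuple[Path, Path]]:
    """Yield (case_file, summary_file) pairs, one per case subfolder."""
    for folder in sorted(Path(main_folder).iterdir()):
        case_file = folder / case_name
        # Skip stray files and subfolders that lack a case file
        if folder.is_dir() and case_file.is_file():
            yield case_file, folder / summary_name


# Possible usage with the function defined earlier (not executed here):
# for case_file, summary_file in find_case_files("Evaluation/Court_Cases/Unstructured"):
#     summary_file.write_text(sumarize_court_case(str(case_file)), encoding="utf-8")
```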

Some weights of BartForSequenceClassification were not initialized from the model checkpoint at my_awesome_model/77 and are newly initialized because the shapes did not match:
- model.decoder.embed_positions.weight: found shape torch.Size([1026, 768]) in the checkpoint and torch.Size([130, 768]) in the model instantiated
- model.encoder.embed_positions.weight: found shape torch.Size([1026, 768]) in the checkpoint and torch.Size([130, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
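
The warning above reports a shape mismatch in the positional embedding tables. In the Hugging Face BART implementation, the learned positional embedding table has `max_position_embeddings + 2` rows. On that basis, the reported shapes are consistent with a checkpoint saved with 1024 positions being loaded into a model configured for 128. Weights whose shapes do not match are newly initialized, as the warning states, so the predictions that follow may be affected. The arithmetic is sketched below. The helper names are hypothetical, and the offset of 2 is an assumption about the library's internals.

```python
# Assumption: BART's learned positional embeddings reserve 2 extra rows
BART_POSITION_OFFSET = 2


def position_table_rows(max_position_embeddings: int) -> int:
    """Rows expected in the embedding table for a given position limit."""
    return max_position_embeddings + BART_POSITION_OFFSET


def implied_max_positions(table_rows: int) -> int:
    """Position limit implied by an embedding table with this many rows."""
    return table_rows - BART_POSITION_OFFSET


# Row counts taken from the warning message
checkpoint_rows, model_rows = 1026, 130
print(implied_max_positions(checkpoint_rows))  # 1024
print(implied_max_positions(model_rows))  # 128
```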


Model id2label: {0: 'rulings', 1: 'facts', 2: 'issues'}
Model label2id: {'rulings': 0, 'facts': 1, 'issues': 2}
Text: Before the Court is a Complaint dated June 23, 2017 filed by Celia D. Mendoza (complainant) before the Integrated Bar of the Philippines (IBP)-Commission on Bar Discipline (CBD) against Atty. Cesar R. Santiago, Jr. (respondent) for violation of the Code of Professional Responsibility and the 2004 Rules on Notarial Practice.
Label: rulings
Probability: 0.5564050674438477


Text: The Facts
Label: facts
Probability: 0.9813336682478881


Text: Complainant claims that she is one of the heirs of Adela Espiritu-Barlaan, who died intestate on September 4, 2010, leaving no descendant or ascendant, but with brothers and sisters. Adela Espiritu-Barlaan also left a parcel of land with an area of 247 square meters, registered under Original Certificate of Title (OCT) No. 2133 with Free Patent No. MT-007-602-94-2003 located in Pembo, Makati City (subject property).
Label: facts
Prob
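
Each record in this output pairs a predicted label with a probability. Below is a minimal sketch of how a label and its probability can be derived from three class logits, and of one way a confidence threshold such as `threshold=0.8` might flag low-confidence paragraphs. The source does not show how `sequence_classification` applies its threshold, so `classify` and its `confident` flag are assumptions. Only the `id2label` mapping comes from the printed output.

```python
import math
from typing import List, Tuple

# Mapping copied from the printed model configuration
ID2LABEL = {0: "rulings", 1: "facts", 2: "issues"}


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits: List[float], threshold: float = 0.8) -> Tuple[str, float, bool]:
    """Return (label, probability, confident) for one paragraph.

    `confident` is a hypothetical flag: True when the top probability
    reaches the threshold.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best], probs[best] >= threshold
```

Under this reading, a record such as the first one above (probability about 0.56) would still receive its top label but would be marked as not confident.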

Model id2label: {0: 'rulings', 1: 'facts', 2: 'issues'}
Model label2id: {'rulings': 0, 'facts': 1, 'issues': 2}
Text: Before us is a Petition for Certiorari under Rule 65 of the Rules of Court assailing the Court of Appeals Resolutions dated 9 October 2007 and 26 February 2008, in CA-G.R. SP No. 00985-MIN, for having been issued with grave abuse of discretion amounting to lack or excess of jurisdiction.
Label: rulings
Probability: 0.4422913193702698


Text: The facts as culled from the records are as follows:
Label: rulings
Probability: 0.7391182780265808


Text: Petitioner Hadja Rawiya Suib’s (Suib) husband, Saab Hadji Suib (deceased), was the owner of a parcel of land with a total area of 12.6220 hectares, located in Sapu Masla, Malapatan, Sarangani Province, covered by OCT No. P-19714, which he acquired through a duly notarized Deed of Absolute Sale from Sagap Hadji Taib on 14 December 1981.
Label: facts
Probability: 0.9228524565696716


Text: Due to alleged illegal harvesting of co

Model id2label: {0: 'rulings', 1: 'facts', 2: 'issues'}
Model label2id: {'rulings': 0, 'facts': 1, 'issues': 2}
Text: Challenged in this Petition for Review on Certiorari are the Decision and the Resolution of the Court of Appeals (CA) Cagayan de Oro City Station in CA-G.R. CV No. 04749-MIN. The impugned Decision affirmed the Order of Branch 17 of the Regional Trial Court of Davao City, dismissing the Complaint for reformation of mortgage, nullity of foreclosure, damages, and attorney’s fees with temporary restraining order and preliminary injunction filed by Lucille Odilao (petitioner) against Union Bank of the Philippines (respondent bank) and Atty. Natasha M. Go-De Mesa, the Register of Deeds of Davao City. Upon the other hand, the assailed Resolution denied petitioner’s motion for reconsideration thereof.
Label: rulings
Probability: 0.7395480871200562


Text: The facts of the case are uncomplicated.
Label: rulings
Probability: 0.6116542220115662


Text: Petitioner, represented by h

Model id2label: {0: 'rulings', 1: 'facts', 2: 'issues'}
Model label2id: {'rulings': 0, 'facts': 1, 'issues': 2}
Text: At the core of these consolidated Petitions is the propriety of the suspension of proclamation of the winning candidate and the cancellation of Certificate of Candidacy (CoC) on the grounds of lack of bona fide intention to run for public office and voter confusion because of similarity in surnames.
Label: rulings
Probability: 0.46571585536003113


Text: ANTECEDENTS
Label: facts
Probability: 0.9813336682478881


Text: In the 2022 elections, four candidates, namely, Roberto “Pinpin” T. Uy, Jr. (Roberto), Romeo “Kuya Jonjon” M. Jalosjos, Jr. (Romeo), Frederico “Kuya Jan” P. Jalosjos (Frederico), and Richard Amazon, vied for the position of Zamboanga del Norte’s first district representative.
Label: facts
Probability: 0.4819050431251526


Text: On November 16, 2021, Romeo filed a Verified Petition to declare Frederico a nuisance candidate and to cancel his CoC before the C