## Importing Necessary Modules

In [1]:
# =============================================================================
# Program Title: Court Case Summarizer
# Programmers: Jewell Anne Diamante
# Date Written: October 3, 2024
# Date Revised: November 4, 2024
#
# Purpose:
#     This program processes a raw court case text file and summarizes it by
#     segmenting the text into facts, issues, and rulings with a fine-tuned
#     model, then condensing the segmented text with Latent Semantic Analysis
#     (LSA). The output is a structured summary of facts, issues, and rulings.
#
#     The program is designed to assist in automating the summarization of legal
#     documents, which is particularly useful for legal professionals and
#     researchers.
#
# Where the program fits in the general system design:
#     The program is a component in a larger system for automating the extraction
#     and summarization of legal documents. It can be integrated with other
#     components like document retrieval, legal document classification, and
#     knowledge base creation for further legal analytics.
#
# Data Structures, Algorithms, and Control:
#     - Data Structures:
#         - String (raw_text, cleaned_text, summary)
#         - List (segmented_paragraph, predicted_labels, segmentation_output)
#     - Algorithms:
#         - Preprocessing for text cleaning and segmentation
#         - Topic Segmentation using a fine-tuned model
#         - Latent Semantic Analysis (LSA) for summarizing segmented text
#     - Control:
#         - Conditional checks for empty files or failed segmentation
#         - Try-except blocks for error handling
# =============================================================================
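
The control notes in the header mention conditional checks for empty files and try-except handling, but the cells in this notebook do not show them. The following is a minimal sketch of what such a guard could look like. `safe_summarize` is a hypothetical wrapper, not part of `Custom_Modules`, and it accepts any summarizer callable so it can be tried without the model.

```python
from typing import Callable, Optional


def safe_summarize(
    input_file: str, summarizer: Callable[[str], str]
) -> Optional[str]:
    """Hypothetical guard: return None instead of raising on bad input."""
    try:
        with open(input_file, "r", encoding="utf-8") as file:
            raw_text = file.read()
    except (OSError, UnicodeDecodeError) as error:
        # Missing file, permission problem, or a file that is not valid UTF-8
        print(f"Could not read {input_file}: {error}")
        return None

    # Conditional check for an empty (or whitespace-only) file
    if not raw_text.strip():
        print(f"{input_file} is empty; skipping.")
        return None

    try:
        summary = summarizer(input_file)
    except Exception as error:  # broad on purpose: model code can fail in many ways
        print(f"Summarization failed for {input_file}: {error}")
        return None

    # Treat an empty summary as a failed segmentation
    return summary or None
```

A caller would pass the notebook's own summarizer function as `summarizer` and skip any case for which `None` comes back.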

In [2]:
from Custom_Modules.TopicSegmentation import TopicSegmentation
from Custom_Modules.Preprocess import preprocess
from Custom_Modules.LSA import LSA
import os


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


## Court Case Summarizer

In [3]:
def sumarize_court_case(input_file: str) -> str:
    """
    Description:
        Processes the input text to perform topic segmentation and returns a 
        structured summary. This function reads a text file, cleans and tokenizes 
        the content into paragraphs, segments the paragraphs into facts, issues, 
        and rulings using a fine-tuned model, and applies Latent Semantic Analysis 
        (LSA) to generate a summary of the segmented text.

    Parameters:
        input_file (str): The path to the input text file containing the raw text 
        for segmentation.

    Returns:
        summary (str): A summary string that includes segmented facts, issues, and 
        rulings, formatted as specified in the output structure.
    """

    # Read input text
    with open(input_file, "r", encoding="utf-8") as file:
        raw_text = file.read()

    # Preprocessing
    preprocessor = preprocess(is_training=False)
    cleaned_text = preprocessor.remove_unnecesary_char(raw_text)
    segmented_paragraph = preprocessor.segment_paragraph(cleaned_text, raw_text)

    # Topic Segmentation
    segmentation = TopicSegmentation(model_path="my_awesome_model/77")
    predicted_labels = segmentation.sequence_classification(
        segmented_paragraph
    )
    segmentation_output = segmentation.label_mapping(predicted_labels)

    # Write the segmented paragraphs next to the input file, so each case
    # folder receives the segments of its own court case
    segmentation_file_path = os.path.join(
        os.path.dirname(input_file), "output_segments.txt"
    )
    segmentation.write_output_segments(
        predicted_labels, output_file=segmentation_file_path
    )

    # Latent Semantic Analysis
    lsa = LSA(segmentation_output)
    summary = lsa.create_summary()

    return summary
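
The `LSA` class comes from `Custom_Modules` and its code is not shown here, so the following is only a sketch of the general idea behind LSA-based extractive summarization, not the module's implementation. It builds a term-by-sentence count matrix, finds the dominant latent topic by power iteration, and keeps the sentences that load most heavily on it. The function name `lsa_rank1_summary`, the regex-based sentence splitting, and the use of a single topic are all simplifying assumptions; a fuller version would typically apply TF-IDF weighting and a library SVD with several topics.

```python
import math
import re


def lsa_rank1_summary(text: str, num_sentences: int = 2) -> str:
    """Keep the sentences that load most heavily on the dominant latent topic."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    if len(sentences) <= num_sentences:
        return " ".join(sentences)

    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocabulary = sorted({word for words in tokenized for word in words})

    # Term-by-sentence count matrix A (rows: terms, columns: sentences)
    matrix = [[words.count(term) for words in tokenized] for term in vocabulary]

    # Gram matrix G = A^T A; its top eigenvector equals the top right
    # singular vector of A, i.e. the sentence loadings on the first topic
    n = len(sentences)
    gram = [
        [sum(row[i] * row[j] for row in matrix) for j in range(n)]
        for i in range(n)
    ]

    # Power iteration to approximate that eigenvector
    vector = [1.0] * n
    for _ in range(100):
        product = [sum(gram[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vector = [x / norm for x in product]

    # Highest loadings first, then restore document order for readability
    ranked = sorted(range(n), key=lambda i: abs(vector[i]), reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return " ".join(sentences[i] for i in chosen)
```

Sentences that share vocabulary with the rest of the text score high, while an off-topic sentence gets a loading near zero and is dropped.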

### Example Usage:

In [4]:
if __name__ == "__main__":
    # Path to the main folder containing all court cases
    main_folder_path = "Evaluation/Court_Cases"
    for folder_name in os.listdir(main_folder_path):
        folder_path = os.path.join(main_folder_path, folder_name)
        
        # Check if the path is a directory
        if os.path.isdir(folder_path):
            # Path to the 'court case.txt' file in the current folder
            case_file_path = os.path.join(folder_path, 'court case.txt')
            
            # Proceed if the 'court case.txt' file exists
            if os.path.isfile(case_file_path):
                result = sumarize_court_case(case_file_path)
                
                # Path to save the summary file
                summary_file_path = os.path.join(folder_path, 'LSATP_summary.txt')
                
                with open(summary_file_path, "w", encoding="utf-8") as file:
                    file.write(result)
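
The folder walk above can also be factored into a small generator, which keeps the path logic separate from the summarization call and makes it testable without the model. This is a sketch using `pathlib`; `iter_case_files` is a hypothetical helper, and the file names are the ones used in the loop above.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_case_files(
    main_folder: str, case_name: str = "court case.txt"
) -> Iterator[Tuple[Path, Path]]:
    """Yield (case_file, summary_file) pairs for each case folder with input."""
    for folder in sorted(Path(main_folder).iterdir()):
        case_file = folder / case_name
        # Skip stray files and folders that lack the input file
        if folder.is_dir() and case_file.is_file():
            yield case_file, folder / "LSATP_summary.txt"


# Possible use with the function defined in this notebook:
# for case_file, summary_file in iter_case_files("Evaluation/Court_Cases"):
#     summary = sumarize_court_case(str(case_file))
#     summary_file.write_text(summary, encoding="utf-8")
```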

Some weights of BartForSequenceClassification were not initialized from the model checkpoint at my_awesome_model/77 and are newly initialized because the shapes did not match:
- model.decoder.embed_positions.weight: found shape torch.Size([1026, 768]) in the checkpoint and torch.Size([130, 768]) in the model instantiated
- model.encoder.embed_positions.weight: found shape torch.Size([1026, 768]) in the checkpoint and torch.Size([130, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
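
The warning above reports that the positional embedding tables in the checkpoint and in the instantiated model have different row counts. The sketch below does the arithmetic on those shapes, under the assumption that the Hugging Face BART implementation allocates its learned positional embeddings with two rows more than `max_position_embeddings`. The helper name is hypothetical.

```python
# Assumption: BART's learned positional embedding table has
# max_position_embeddings + 2 rows (a fixed offset of 2).
BART_POSITION_OFFSET = 2


def implied_max_positions(embedding_rows: int) -> int:
    """Recover max_position_embeddings from a row count shown in the warning."""
    return embedding_rows - BART_POSITION_OFFSET


checkpoint_rows = 1026   # "found shape torch.Size([1026, 768]) in the checkpoint"
instantiated_rows = 130  # "torch.Size([130, 768]) in the model instantiated"

print(implied_max_positions(checkpoint_rows))    # 1024
print(implied_max_positions(instantiated_rows))  # 128
```

Under that assumption, the checkpoint was saved with 1024 positions while the model is being built for 128, so the positional embeddings are newly initialized on every load instead of being restored. Whether this is intended cannot be determined from the notebook, but it is worth checking before relying on the predicted probabilities.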


Text: This case is about the need for the prosecution and all law enforcement agencies involved in illegal drugs operations to ensure proper observance of the rules governing entrapment of peddlers of prohibited substances.
Label: rulings
Probability: 0.8104531168937683


Text: The Facts and the Case
Label: rulings
Probability: 0.5121452212333679


Text: The City Prosecutor of Manila charged the accused Luis Pajarin and Efren Pallaya before the Regional Trial Court (RTC) of Manila in Criminal Cases 05-237756 and 05-237757 with violation of Section 5 in relation to Sections 26 and 11 (3) in relation to Section 13, respectively, of Article II of Republic Act (R.A.) 9165 or the Comprehensive Dangerous Drugs Act of 2002.
Label: facts
Probability: 0.7354584336280823


Text: The prosecution presented PO2 Nestor Lehetemas, member of the buy-bust team and PO2 James Nolan Ibañez, the poseur-buyer. They testified that on June 1, 2005 at around 10:00 p.m., an informant arrived at their Station An
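
The printed records follow a fixed `Text` / `Label` / `Probability` layout, and section headings such as "The Facts and the Case" receive a label with a probability close to 0.5. The sketch below parses output of this shape into records and flags low-confidence predictions for review. The function names and the 0.6 threshold are assumptions chosen for illustration, and records that were cut off before their `Label` line are skipped.

```python
import re
from typing import Dict, List

# One record = three consecutive lines; the text is assumed to fit on one line
RECORD_PATTERN = re.compile(
    r"^Text: (?P<text>.*)\n"
    r"Label: (?P<label>\w+)\n"
    r"Probability: (?P<prob>\d+\.\d+(?:[eE][+-]?\d+)?)$",
    re.MULTILINE,
)


def parse_predictions(log: str) -> List[Dict[str, object]]:
    """Turn the printed output into a list of prediction records."""
    return [
        {"text": m["text"], "label": m["label"], "probability": float(m["prob"])}
        for m in RECORD_PATTERN.finditer(log)
    ]


def low_confidence(
    records: List[Dict[str, object]], threshold: float = 0.6
) -> List[Dict[str, object]]:
    """Return the records whose probability falls below the threshold."""
    return [r for r in records if r["probability"] < threshold]


sample_log = (
    "Text: This case is about the need for the prosecution and all law "
    "enforcement agencies involved in illegal drugs operations to ensure "
    "proper observance of the rules governing entrapment of peddlers of "
    "prohibited substances.\n"
    "Label: rulings\n"
    "Probability: 0.8104531168937683\n"
    "\n\n"
    "Text: The Facts and the Case\n"
    "Label: rulings\n"
    "Probability: 0.5121452212333679\n"
    "\n\n"
    # A record cut off before its Label line, shortened here for brevity
    "Text: The prosecution presented PO2 Nestor Lehetemas, member of the\n"
)

records = parse_predictions(sample_log)
for record in low_confidence(records):
    print(record["label"], record["probability"], record["text"])
```

On the sample, only the heading "The Facts and the Case" is flagged, which shows how such a filter could keep bare section headings from being assigned to a segment on a weak prediction.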


Text: This case is about the proof required to establish the domicile of a reinstated Filipino citizen who seeks election as governor of a province.
Label: rulings
Probability: 0.7019478678703308


Text: The Facts and the Case
Label: rulings
Probability: 0.40469300746917725


Text: Petitioner Rommel Jalosjos was born in Quezon City on October 26, 1973. He migrated to Australia in 1981 when he was eight years old and there acquired Australian citizenship. On November 22, 2008, at age 35, he decided to return to the Philippines and lived with his brother, Romeo, Jr., in Barangay Veteran’s Village, Ipil, Zamboanga Sibugay. Four days upon his return, he took an oath of allegiance to the Republic of the Philippines, resulting in his being issued a Certificate of Reacquisition of Philippine Citizenship by the Bureau of Immigration. On September 1, 2009 he renounced his Australian citizenship, executing a sworn renunciation of the same in compliance with Republic Act (R.A.) 9225.
Label: facts


Text: Assailed in these consolidated petitions for review on certiorari are the Decision dated October 29, 2010 and the Resolution dated March 11, 2011 of the Court of Appeals (CA) in CA-G.R. CV No. 92765 which affirmed the Decision dated November 18, 2008 of the Regional Trial Court (RTC) of Legazpi City, Branch 7 in Civil Case No. 9033 for the annulment of the Deed of Sale dated August 23, 1962 executed in favor of petitioner Caridad Rodrigueza, and of Transfer Certificates of Title (TCT) Nos. 40467, 40468 and 40469 issued by the Registry of Deeds of Legazpi City.
Label: rulings
Probability: 0.4245217442512512


Text: The Facts
Label: rulings
Probability: 0.5637705326080322


Text: Respondent Domingo Alibin (Domingo) owned an undivided one-half  portion of Lot No. 1680 (subject lot) containing an aggregate area of 9,188 square meters, situated at Tahao, Legazpi City, Albay, and registered in his name and that of Mariano Rodrigueza (Mariano) under Original Certificate of Title (OCT) N


Text: Before the Court is a petition for review on certiorari assailing the January 11, 2011 Decision and April 14, 2011 Resolution of the Court of Appeals, Cebu City (CA) in CA-G.R. SP No. 03888 which declared respondent Marcosa A. Sabandal-Herzenstiel (Sabandal-Herzenstiel) as the lawful possessor of Lot No. 2574, situated in Brgy. Basdiot, Moalboal, Cebu (subject property).
Label: rulings
Probability: 0.4270990192890167


Text: The Facts
Label: rulings
Probability: 0.5390087962150574


Text: Petitioner Philippine Tourism Authority (now Tourism Infrastructure and Enterprise Zone Authority) (petitioner) is the owner of the subject property and other parcels of land located in Brgy. Basdiot, Moalboal, Cebu since February 12, 1981 when it bought the same from Tri-Island Corporate Holdings, Inc. (Tri-Island). It had then been in actual, physical, continuous, and uninterrupted possession of the subject property and had declared the same for taxation purposes.  Sometime in 1997, however, r


Text: Before this Court are consolidated petitions for review on certiorari assailing the Decision dated December 19, 2012 and the Resolution dated August 8, 2013 of the Court of Appeals (CA) in CA-G.R. CEB-CV No. 03791, which affirmed the Order dated September 22, 2009 of the Regional Trial Court of Cebu City, Branch 6 (RTC) in Civil Case No. CEB-34012 finding the Province of Cebu liable to pay WT Construction, Inc. (WTCI) the amount of P257,413,911.73, but reduced the legal interest rate imposable thereon from 12% to 6% per annum.
Label: rulings
Probability: 0.39035654067993164


Text: The Facts
Label: rulings
Probability: 0.5481042265892029


Text: Sometime in 2005, the Province of Cebu was chosen by former President Gloria Macapagal-Arroyo to host the 12th Association of Southeast Asian Nations (ASEAN) Summit scheduled on December 10, 2006. To cater to the event, it decided to construct the Cebu International Convention Center (CICC or the project) at the New Mandaue Reclamation Ar


Text: Before the Court are consolidated petitions for review on certiorari assailing the Decision dated March 21, 2013 and the Resolution dated September 12, 2013 of the Court of Appeals in CA-G.R. CV No. 94337, which affirmed the Decision dated November 5, 2008 of the Regional Trial Court (RTC) of Quezon City, Branch 225 (RTC Branch 225) in Civil Case No. Q-98-34627 declaring the marriage of Reghis M. Romero II (Reghis) and Olivia Lagman Romero (Olivia) null and void ab initio on the ground of psychological incapacity pursuant to Article 36 of the Family Code of the Philippines (Family Code), as amended.
Label: facts
Probability: 0.4398100674152374


Text: The Facts
Label: rulings
Probability: 0.5772156715393066


Text: Reghis and Olivia were married on May 11, 1972 at the Mary the Queen Parish in San Juan City and were blessed with two (2) children, namely, Michael and Nathaniel, born in 1973 and 1975, respectively. The couple first met in Baguio City in 1971 when Reghis helped Olivi


Text: Assailed in these consolidated petitions for review on certiorari are the Decision dated July 18, 2013 and the Resolution dated March 10, 2014 of the Court of Appeals (CA) in CA-G.R. SP No. 127219, which set aside the Orders dated July 16, 2012 and September 25, 2012 issued by the Regional Trial Court of Pasig City, Branch 160 (RTC) in LRC Case No. R-7509, excluding the petitioners in these cases from the implementation of the writ of possession in favor of respondent Planters Development Bank (Plantersbank).
Label: rulings
Probability: 0.40743494033813477


Text: The Facts
Label: rulings
Probability: 0.5812227129936218


Text: Plantersbank was the mortgagee of nineteen (19) parcels of land situated in San Juan, Metro Manila (subject properties), covered by Transfer Certificates of Title (TCT) Nos. 11057-R to 11075-R, under a Mortgage dated February 28, 2003 executed by the borrower-mortgagor, Kwong-on Trading Corporation (KTC), to secure a P14,000,000.00 loan. KTC defaulted in t


Text: Assailed in this petition for review on certiorari are the Decision dated November 24, 2014 and the Resolution dated May 29, 2015 of the Court of Appeals (CA) in CA-G.R. CR No. 35293, which upheld the Decision dated September 6, 2012 of the Regional Trial Court of Tayug, Pangasinan, Branch 52 (RTC) in Criminal Case Nos. T-5144 and T-5145, finding petitioner Christopher Fianza a.k.a. “Topel” (Fianza) guilty beyond reasonable doubt of two (2) counts of violation of Section 5 (b), Article III of Republic Act No. (RA) 7610, otherwise known as the “Special Protection of Children Against Abuse, Exploitation and Discrimination Act.”
Label: rulings
Probability: 0.4721413552761078


Text: The Facts
Label: rulings
Probability: 0.5525526404380798


Text: Fianza was charged with two (2) counts of violation of Section 5 (b), Article III of RA 7610 under two (2) Informations dated April 6, 2011 filed before the RTC. The prosecution’s version of the incidents are as follows:
Label: rulings
Prob


Text: Accused-appellant Marlon Belmonte y Sumagit assails the Decision dated April 22, 2014 of the Court of Appeals (CA) in CA-G.R. CR-HC No. 05774, affirming his conviction for Robbery with Rape in Criminal Case No. 135982-H.
Label: facts
Probability: 0.5235986113548279


Text: The Facts
Label: rulings
Probability: 0.582389235496521


Text: Accused-appellant and his co-accused, namely, Marvin Belmonte (Marvin), Enrile Gabay (Enrile), and Noel Baac (Noel) were charged with Robbery with Rape in an Information dated September 3, 2007 that reads:
Label: facts
Probability: 0.779998242855072


Text: The Prosecution, through the undersigned Public Prosecutor, charges Marlon Belmonte y Sumagit, Marvin Belmonte y Sumagit and Enrile Gabay y Dela Torre @ “Puno” with the crime of robbery with rape, committed as follows:
Label: facts
Probability: 0.6529921889305115


Text: On or about September 1, 2007, in Pasig City and within the jurisdiction of this Honorable Court, the above accused, armed wit


Text: Assailed in this petition for review on certiorari are the Resolutions dated February 7, 2019 and July 8, 2019 of the Court of Appeals (CA) in CA-G.R. CR-HC No. 09062 which denied the motion of petitioner Fredierose Tamboa y Laday (petitioner) to recall entry of judgment and to reinstate her appeal seeking review of the Judgment dated January 24, 2017 of the Regional Trial Court of Sanchez Mira, Cagayan, Branch 12 (RTC) in Criminal Case No. 3712-S-5, finding her guilty beyond reasonable doubt of violation of Section 5, Article II of Republic Act No. (RA) 9165, otherwise known as the “Comprehensive Dangerous Drugs Act of 2002.”
Label: issues
Probability: 0.4078315198421478


Text: The Facts
Label: rulings
Probability: 0.5608649253845215


Text: The instant case stemmed from an Information filed before the RTC charging petitioner with the crime of Illegal Sale of Dangerous Drugs, defined and penalized under Section 5, Article II of RA 9165. The prosecution alleged that in the morni