## Importing Necessary Modules

In [1]:
# =============================================================================
# Program Title: Court Case Summarizer
# Programmers: Jewell Anne Diamante
# Date Written: October 3, 2024
# Date Revised: November 4, 2024
#
# Purpose:
#     This program processes a raw court case text file and summarizes it by
#     segmenting the text into facts, issues, and rulings using a fine-tuned
#     model and Latent Semantic Analysis (LSA). The output is a structured
#     summary organized into facts, issues, and rulings.
#
#     The program is designed to assist in automating the summarization of legal
#     documents, which is particularly useful for legal professionals and
#     researchers.
#
# Where the program fits in the general system design:
#     The program is a component in a larger system for automating the extraction
#     and summarization of legal documents. It can be integrated with other
#     components like document retrieval, legal document classification, and
#     knowledge base creation for further legal analytics.
#
# Data Structures, Algorithms, and Control:
#     - Data Structures:
#         - String (raw_text, cleaned_text, segmented_paragraph, summary)
#         - List (segmented_paragraph, predicted_labels, segmentation_output)
#     - Algorithms:
#         - Preprocessing for text cleaning and segmentation
#         - Topic Segmentation using a fine-tuned model
#         - Latent Semantic Analysis (LSA) for summarizing segmented text
#     - Control:
#         - Conditional checks for empty files or failed segmentation
#         - Try-except blocks for error handling
# =============================================================================
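The header names Latent Semantic Analysis as the summarization step, but the internals of `Custom_Modules.LSA` are not shown in this notebook. As a rough illustration of the idea only, the sketch below builds a term-by-sentence count matrix and ranks sentences by their weight on the dominant latent topic, approximated with power iteration. The function name `top_sentences` and the rank-1 simplification are assumptions made for this example, not the project's implementation.

```python
import math
import re
from collections import Counter


def top_sentences(sentences, k=1, iterations=50):
    """Return the k sentences with the largest weight on the dominant topic."""
    if not sentences:
        return []

    # Term-by-sentence count matrix A (rows: terms, columns: sentences).
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({token for tokens in tokenized for token in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    A = [[c[term] for c in counts] for term in vocab]

    n = len(sentences)
    # Power iteration on A^T A approximates the first right singular vector,
    # whose components measure how strongly each sentence expresses the topic.
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        Av = [sum(row[j] * v[j] for j in range(n)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(len(A))) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]

    ranked = sorted(range(n), key=lambda j: abs(v[j]), reverse=True)[:k]
    # Keep the selected sentences in their original document order.
    return [sentences[j] for j in sorted(ranked)]
```

A full LSA summarizer would normally keep several singular vectors (for example via a truncated SVD) rather than only the first one; the single-topic version is used here to keep the sketch dependency-free.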

In [2]:
from Custom_Modules.TopicSegmentation import TopicSegmentation
from Custom_Modules.Preprocess import preprocess
from Custom_Modules.LSA import LSA
import os


[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


## Court Case Summarizer

In [None]:
def sumarize_court_case(input_file: str) -> str:
    """
    Description:
        Processes the input text to perform topic segmentation and returns a 
        structured summary. This function reads a text file, cleans and tokenizes 
        the content into paragraphs, segments the paragraphs into facts, issues, 
        and rulings using a fine-tuned model, and applies Latent Semantic Analysis 
        (LSA) to generate a summary of the segmented text.

    Parameters:
        input_file (str): The path to the input text file containing the raw text 
        for segmentation.

    Returns:
        summary (str): A summary string that includes segmented facts, issues, and 
        rulings, formatted as specified in the output structure.
    """
    preprocessor = preprocess(is_training=False)
    
    # Read input text
    with open(input_file, "r", encoding="utf-8") as file:
        raw_text = file.read()
    raw_text = preprocessor.merge_numbered_lines(raw_text)
    
    # Preprocessing
    cleaned_text = preprocessor.remove_unnecesary_char(raw_text)
    segmented_paragraph = preprocessor.segment_paragraph(cleaned_text, raw_text)

    # Topic Segmentation
    segmentation = TopicSegmentation(model_path="my_awesome_model/77")
    predicted_labels = segmentation.sequence_classification(
        segmented_paragraph, threshold=0.8
    )
    segmentation_output = segmentation.label_mapping(predicted_labels)
    
    # Latent Semantic Analysis
    lsa = LSA(segmentation_output)
    summary = lsa.create_summary()

    return summary
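The program header lists conditional checks for empty files and try-except error handling, but `sumarize_court_case` as shown does not include them. One way such guards could be layered on top is sketched below. The wrapper name `summarize_safely` and its `summarize` parameter are hypothetical; the parameter stands in for `sumarize_court_case` so that the sketch runs without the custom modules.

```python
from typing import Callable, Optional


def summarize_safely(
    input_file: str, summarize: Callable[[str], str]
) -> Optional[str]:
    """Run `summarize` on `input_file`; return None when the case is skipped."""
    try:
        # Skip files that contain nothing but whitespace.
        with open(input_file, "r", encoding="utf-8") as file:
            if not file.read().strip():
                print(f"Skipping empty file: {input_file}")
                return None
        summary = summarize(input_file)
    except (OSError, UnicodeDecodeError) as error:
        print(f"Could not read {input_file}: {error}")
        return None
    except Exception as error:  # e.g. a model or segmentation failure
        print(f"Summarization failed for {input_file}: {error}")
        return None

    # Treat an empty result as a failed segmentation.
    if not summary:
        print(f"No summary produced for {input_file}")
        return None
    return summary
```

In the batch loop, a call such as `summarize_safely(case_file_path, sumarize_court_case)` would then let one unreadable or empty case be skipped without stopping the remaining cases.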

### Example Usage:

In [4]:
if __name__ == "__main__":
    # Path to the main folder containing all court cases
    main_folder_path = "Evaluation/Court_Cases/Unstructured"
    for folder_name in os.listdir(main_folder_path):
        folder_path = os.path.join(main_folder_path, folder_name)
        
        # Check if the path is a directory
        if os.path.isdir(folder_path):
            # Path to the 'court case.txt' file in the current folder
            case_file_path = os.path.join(folder_path, 'court case.txt')

            # Proceed if the 'court case.txt' file exists
            if os.path.isfile(case_file_path):
                result = sumarize_court_case(case_file_path)
                
                # Path to save the summary file
                summary_file_path = os.path.join(folder_path, 'LSATP_summary.txt')
                
                with open(summary_file_path, "w", encoding="utf-8") as file:
                    file.write(result)
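The folder walk above can also be factored into a small generator, which makes the discovery step testable on its own and gives a deterministic processing order. This is a sketch using `pathlib`; the helper name `iter_case_files` is hypothetical, and the default file name simply mirrors the `'court case.txt'` literal used in the loop.

```python
from pathlib import Path
from typing import Iterator


def iter_case_files(
    main_folder: str, file_name: str = "court case.txt"
) -> Iterator[Path]:
    """Yield the case file of each immediate subfolder that contains one."""
    # sorted() fixes the order, which os.listdir() does not guarantee.
    for folder in sorted(Path(main_folder).iterdir()):
        case_file = folder / file_name
        if folder.is_dir() and case_file.is_file():
            yield case_file
```

Each yielded path exposes its folder as `case_file.parent`, so the summary could be written to `case_file.parent / "LSATP_summary.txt"` in the same way as the loop above.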

An evening intended to be a relaxing night out between two friends, herein accused-appellant Mario Rivera and his erstwhile co-accused Venancio Mercado, Jr., provided a tragic tableau for the senseless killing of an unwitting victim and the conviction of appellant for murder.  The case did not even have the saving grace of the inscrutability of fate; it was but another mundane episode involving the admixture of bravado and alcohol.
Appellant Mario Rivera and Venancio Mercado, Jr. were charged before the Regional Trial Court, Branch VIII at Aparri, Cagayan, with the crime of murder in an information alleging the commission thereof as follows: “That on or about October 19, 1989 in the municipality of Aparri, province of Cagayan, and within the jurisdiction of this Honorable Court, the said accused, Mario Rivera and Venancio Mercado, Jr., armed with a sharp pointed instrument, conspiring together and helping each other, with intent to kill, with evident premeditation and with treachery, d

Some weights of BartForSequenceClassification were not initialized from the model checkpoint at my_awesome_model/77 and are newly initialized because the shapes did not match:
- model.decoder.embed_positions.weight: found shape torch.Size([1026, 768]) in the checkpoint and torch.Size([130, 768]) in the model instantiated
- model.encoder.embed_positions.weight: found shape torch.Size([1026, 768]) in the checkpoint and torch.Size([130, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Model id2label: {0: 'rulings', 1: 'facts', 2: 'issues'}
Model label2id: {'rulings': 0, 'facts': 1, 'issues': 2}
Text: An evening intended to be a relaxing night out between two friends, herein accused-appellant Mario Rivera and his erstwhile co-accused Venancio Mercado, Jr., provided a tragic tableau for the senseless killing of an unwitting victim and the conviction of appellant for murder.  The case did not even have the saving grace of the inscrutability of fate; it was but another mundane episode involving the admixture of bravado and alcohol.
Label: rulings
Probability: 0.5158836245536804


Text: Appellant Mario Rivera and Venancio Mercado, Jr. were charged before the Regional Trial Court, Branch VIII at Aparri, Cagayan, with the crime of murder in an information alleging the commission thereof as follows: “That on or about October 19, 1989 in the municipality of Aparri, province of Cagayan, and within the jurisdiction of this Honorable Court, the said accused, Mario Rivera and Ve

Model id2label: {0: 'rulings', 1: 'facts', 2: 'issues'}
Model label2id: {'rulings': 0, 'facts': 1, 'issues': 2}
Text: Facts
Label: facts
Probability: 0.9813336682478881


Text: This is a Petition for Review on Certiorari (Petition) filed under Rule 45 of the Rules of Court by petitioners National Commission on Indigenous Peoples (NCIP), Zenaida Brigida Hamada-Pawid, Dionesia O. Banua, Conchita C. Calzado, Percy Brawner, Cosme Lambayon, Santos Unsad, and Basilio Wandag (collectively, petitioners) against respondent Macroasia Corporation (Macroasia), seeking to reverse and set aside the Amended Decision dated March 14, 2016 and the Resolution dated August 9, 2016, and to reinstate the Decision dated April 22, 2015, all promulgated by the Court of Appeals (CA) in the case docketed as CA-G.R. SP No. 124632.
Label: facts
Probability: 0.5987400412559509


Text: Macroasia filed a Joint Motion to Render Judgment Based on Compromise Agreement dated February 21, 2023, manifesting that the partie

Model id2label: {0: 'rulings', 1: 'facts', 2: 'issues'}
Model label2id: {'rulings': 0, 'facts': 1, 'issues': 2}
Text: At the core of these consolidated Petitions is the propriety of the suspension of proclamation of the winning candidate and the cancellation of Certificate of Candidacy (CoC) on the grounds of lack of bona fide intention to run for public office and voter confusion because of similarity in surnames.
Label: rulings
Probability: 0.432056725025177


Text: ANTECEDENTS
Label: facts
Probability: 0.9813336682478881


Text: In the 2022 elections, four candidates, namely, Roberto “Pinpin” T. Uy, Jr. (Roberto), Romeo “Kuya Jonjon” M. Jalosjos, Jr. (Romeo), Frederico “Kuya Jan” P. Jalosjos (Frederico), and Richard Amazon, vied for the position of Zamboanga del Norte’s first district representative.
Label: facts
Probability: 0.4701055586338043


Text: On November 16, 2021, Romeo filed a Verified Petition to declare Frederico a nuisance candidate and to cancel his CoC before the Com

Model id2label: {0: 'rulings', 1: 'facts', 2: 'issues'}
Model label2id: {'rulings': 0, 'facts': 1, 'issues': 2}
Text: Petition for certiorari against the order of the Court of First Instance of Manila dated 13 May 1966 (in Criminal Case No. 82116) granting the petition for bail of therein accused Jose Simborio y Salonga on a P20,000.00 bond.
Label: rulings
Probability: 0.4675990343093872


Text: In an information filed on 19 April 1966, Jose Simborio y Salonga was charged with the crime of murder for the fatal shooting of Avelino Concepcion, Jr. in the evening of 11 March 1966.  It was there alleged that the accused, in conspiracy with Marmolito Catelo y Rivera (alias Sonny Catelo) and others whose identity and whereabouts were unknown, shot Avelino Concepcion, Jr. with a gun, wounding the latter in the abdomen which directly and immediately caused his death (on 14 March 1966).  The offense was said to have been attended by the circumstances of premeditation, treachery, abuse of superi

Model id2label: {0: 'rulings', 1: 'facts', 2: 'issues'}
Model label2id: {'rulings': 0, 'facts': 1, 'issues': 2}
Text: This is a petition for review on certiorari under Rule 45 of the Rules of Court seeking the reversal of the respondent Court of Appeals’ decision of 2 November 1978 in C.A.-G.R. No. SP-07822-R, the dispositive portion of which provides:
Label: rulings
Probability: 0.574847400188446


Text: “PREMISES CONSIDERED, the order of respondent Judge dated January 25, 1978 is SET ASIDE and respondent sheriff Jaime de Leon is hereby ordered to restore petitioner Corazon Babao Gonzales to the possession of the questioned premises No. 1265 Calle Sande, Tondo, Manila upon the filing by the latter of a bond in the amount of P10,000.00 for any and all damages which the Heirs of Eugenio Sevilla, Inc. may suffer. SO ORDERED.”
Label: rulings
Probability: 0.5235801339149475


Text: The facts of the case are not disputed by the parties.
Label: rulings
Probability: 0.6827983856201172


Text: