## Importing Necessary Modules

In [1]:
from Custom_Modules.TopicSegmentation import TopicSegmentation
from Custom_Modules.Preprocess import preprocess
from Custom_Modules.LSA import LSA

[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\PC\AppData\Roaming\nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


## Court Case Summarizer

In [2]:
def sumarize_court_case(input_file: str) -> str:
    """
    Description:
        Summarizes a court case stored in a text file. The function reads the file,
        cleans the text and splits it into paragraphs, classifies each paragraph as
        facts, issues, or rulings using a fine-tuned model, and applies Latent
        Semantic Analysis (LSA) to summarize the segmented text.

    Parameters:
        input_file (str): Path to the text file containing the raw court case text.

    Returns:
        summary (str): A summary covering the segmented facts, issues, and rulings.
    """
    
    # Read input text
    with open(input_file, 'r', encoding='utf-8') as file:
        raw_text = file.read()

    # Preprocessing
    preprocessor = preprocess(is_training=False)
    cleaned_text = preprocessor.remove_unnecesary_char(raw_text)
    segmented_paragraph = preprocessor.segment_paragraph(cleaned_text)

    # Topic Segmentation
    segmentation = TopicSegmentation(model_path='my_awesome_model/70')
    predicted_labels = segmentation.sequence_classification(segmented_paragraph, threshold=0.45)
    segmentation_output = segmentation.label_mapping(predicted_labels)

    # Write the labeled paragraphs to output_segments.txt (optional step)
    segmentation.write_output_segments(predicted_labels, output_file="output_segments.txt")

    # Latent Semantic Analysis
    lsa = LSA(segmentation_output)
    summary = lsa.create_summary()
    
    return summary
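
The `LSA` class comes from `Custom_Modules` and its internals are not shown in this notebook. As a rough illustration of the idea only, the sketch below ranks sentences by their weight in the single strongest latent topic of a term-by-sentence count matrix, using power iteration instead of a full SVD. All function names here (`split_sentences`, `term_vectors`, `dominant_topic_weights`, `lsa_summary`) are hypothetical and are not part of the project's modules.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def term_vectors(sentences: list[str]) -> list[list[float]]:
    """Build one term-frequency vector per sentence over a shared vocabulary."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({w for words in tokenized for w in words})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for words in tokenized:
        vec = [0.0] * len(vocab)
        for word, count in Counter(words).items():
            vec[index[word]] = float(count)
        vectors.append(vec)
    return vectors


def dominant_topic_weights(vectors: list[list[float]], iterations: int = 100) -> list[float]:
    """Power iteration on the sentence-by-sentence Gram matrix.

    The result approximates the first right singular vector of the
    term-by-sentence matrix: each sentence's weight in the strongest topic.
    """
    n = len(vectors)
    gram = [[sum(a * b for a, b in zip(vectors[i], vectors[j])) for j in range(n)]
            for i in range(n)]
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no shared terms at all; keep the current vector
            break
        v = [x / norm for x in w]
    return [abs(x) for x in v]


def lsa_summary(text: str, num_sentences: int = 2) -> str:
    """Return the highest-weighted sentences, kept in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= num_sentences:
        return " ".join(sentences)
    weights = dominant_topic_weights(term_vectors(sentences))
    top = sorted(range(len(sentences)), key=lambda i: weights[i], reverse=True)
    return " ".join(sentences[i] for i in sorted(top[:num_sentences]))
```

A fuller LSA summarizer would typically keep several singular vectors and weight terms (for example with TF-IDF) rather than raw counts; the single-topic version is used here only to keep the example dependency-free.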

### Example Usage

In [3]:
if __name__ == "__main__":
    result = sumarize_court_case('input.txt')
    print(result)

Some weights of BartForSequenceClassification were not initialized from the model checkpoint at my_awesome_model/70 and are newly initialized because the shapes did not match:
- model.decoder.embed_positions.weight: found shape torch.Size([1026, 768]) in the checkpoint and torch.Size([130, 768]) in the model instantiated
- model.encoder.embed_positions.weight: found shape torch.Size([1026, 768]) in the checkpoint and torch.Size([130, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
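
The warning above reports that the checkpoint stores position-embedding tables with 1026 rows while the instantiated model expects 130, so those weights are freshly initialized and the predictions that follow may be less reliable. In the Hugging Face BART implementation, the learned positional embedding table has `max_position_embeddings + 2` rows. The small sketch below applies that arithmetic to the two shapes in the warning; the helper name `implied_max_positions` is hypothetical.

```python
# BART's learned positional embeddings reserve 2 extra rows (a fixed offset),
# so the table holds max_position_embeddings + 2 rows.
BART_POSITION_OFFSET = 2


def implied_max_positions(embedding_rows: int) -> int:
    """Return the max_position_embeddings value implied by a table's row count."""
    return embedding_rows - BART_POSITION_OFFSET


checkpoint_rows = 1026    # shape reported for the saved checkpoint
instantiated_rows = 130   # shape reported for the model built at load time

print(implied_max_positions(checkpoint_rows))    # 1024
print(implied_max_positions(instantiated_rows))  # 128
```

If this reading is right, the configuration used at load time probably sets `max_position_embeddings` to 128 while the weights were saved with 1024. Aligning the two values would be the first thing to check.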


Text: This case is about the proof required to establish the domicile of a reinstated Filipino citizen who seeks election as governor of a province.
Label: rulings
Probability: 0.4551340639591217


Text: The Facts and the Case
Label: rulings
Probability: 0.407451331615448


Text: Petitioner Rommel Jalosjos was born in Quezon City on October 26 1973. He migrated to Australia in 1981 when he was eight years old and there acquired Australian citizenship. On November 22 2008 at age 35 he decided to return to the Philippines and lived with his brother Romeo Jr. in Barangay Veteran s Village Ipil Zamboanga Sibugay. Four days upon his return he took an oath of allegiance to the Republic of the Philippines resulting in his being issued a Certificate of Reacquisition of Philippine Citizenship by the Bureau of Immigration. On September 1 2009 he renounced his Australian citizenship executing a sworn renunciation of the same in compliance with Republic Act 9225.
Label: rulings
Probability: 0.7759