# LawStory AI — Judgment PDF to Multi-Scene Explainer Video (Text-Based Prototype)

This notebook documents the end-to-end AI + Automation system that converts a **court judgment PDF** into a **multi-scene explainer video** with a minimal, readable design (white text on black background). The system is designed to generate **one final video link per judgment** using a structured pipeline, without requiring manual script writing or manual video editing.

The current submitted version successfully generates a **multi-scene text-based explainer video without audio**. A voiceover-enabled version is under development using **ElevenLabs** for text-to-speech and **Cloudinary** for media asset management.

---


## 1) Problem Definition & Objective

### a. Selected project track

AI + Automation (LLM + Video Rendering Pipeline)  
This project combines two capabilities in one automated system:
1. **LLM-based legal understanding**, where an AI model interprets long, unstructured legal judgments and converts them into structured explainable content.
2. **Automated video rendering**, where the structured content is programmatically converted into a short explainer video without human editing.

The core objective is to build a system where the **only input required from the user is a judgment PDF**, and the output is a structured multi-scene video that explains the case in a clear and accessible format. This directly supports the larger product vision of **LawStory AI**, where legal information is transformed into content that is faster to consume and easier to understand.

### b. Clear problem statement

Court judgments are among the most important sources of legal learning and legal awareness, but they are also among the hardest formats to consume. A judgment can be dozens or even hundreds of pages long, full of procedural details, legal reasoning, citations, and dense language. For most users, the real challenge is not reading the PDF itself, but identifying and connecting its most critical components:
- the **facts of the case**
- the **legal issues involved**
- the **arguments from both sides**
- the **final decision**
- the **reasoning and ratio**

The problem addressed by this project is to automatically convert a judgment PDF into a short, structured, multi-scene explainer video, without manual intervention. Without such a system, users must spend hours reading the full document, depend on scattered and inconsistent summaries, or avoid judgments entirely due to complexity and time constraints.

### c. Real-world relevance and motivation

This problem has strong real-world relevance because judgments shape learning, awareness, and decision-making, yet they are not easily accessible in their raw form. The system addresses a practical gap that manual reading and ad hoc summaries do not fill at scale:
- **Law students** need fast and structured revision of case law, especially for exams and internships.
- **Common people** want simplified understanding of judgments that affect rights, family disputes, property matters, criminal matters, and constitutional issues.
- **Lawyers and legal educators** benefit from structured explainers for teaching and awareness.
- **Legal content creators** need consistent, scalable production of explainers without manually reading and rewriting every case.

The motivation behind this pipeline is that legal information should not remain inaccessible due to complexity. By converting a judgment into a structured video format, the system improves comprehension speed and increases the reach of legal knowledge. This makes it a strong foundation for a product feature in LawStory AI.


## 2) Data Understanding & Preparation

### a. Dataset source (public / synthetic / collected / API)

This project does not use a fixed traditional dataset such as CSV files, labeled training datasets, or structured records. Instead, it uses **real-world live input** in the form of **court judgment PDFs**, which are provided by users as the primary input. The extracted text from these PDFs becomes the raw data used for the AI generation step.

The dataset source for this project is therefore best described as:
- **Collected / user-provided judgment PDF documents**
- Extracted unstructured text obtained through the Make.com scenario pipeline

This approach reflects real-world usage because judgments vary widely across courts, formatting, length, and complexity.

### b. Data loading and exploration

The data enters the system through Make.com using an automation pipeline. The key steps in data loading and exploration are:
- The user provides a **judgment PDF** (or a judgment PDF URL).
- The pipeline receives it through a **Webhook trigger**.
- An **HTTP module** fetches and downloads the judgment file.
- A **Custom JavaScript module** extracts the PDF into plain text.

This stage acts as the practical “exploration and validation” stage. It ensures:
- the PDF was fetched correctly
- the extracted text is readable and complete enough for summarization
- the extracted content contains key legal context required for video generation

At this stage, the extracted judgment text must contain enough detail for the LLM to identify and generate structured information including:
- facts
- issues
- arguments from both sides
- decision and ratio/reasoning
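
The checks above can be approximated with a small heuristic before the text is sent to Gemini. This is a minimal sketch, not part of the submitted Make.com scenario: the function name `looks_usable`, the length threshold, and the keyword list are all hypothetical values chosen for illustration.

```python
import re

# Hypothetical threshold and keyword list, chosen only for illustration.
MIN_CHARS = 200
LEGAL_HINTS = ("court", "appeal", "petition", "judgment", "order", "v.")


def looks_usable(text: str, min_chars: int = MIN_CHARS) -> bool:
    """Return True if the extracted text is long enough and resembles a judgment."""
    # Collapse all whitespace so stray line breaks do not inflate the length.
    collapsed = re.sub(r"\s+", " ", text).strip()
    if len(collapsed) < min_chars:
        return False
    lowered = collapsed.lower()
    # Require at least two legal-sounding keywords as a weak content signal.
    hits = sum(1 for hint in LEGAL_HINTS if hint in lowered)
    return hits >= 2
```

A check like this only catches empty or clearly wrong extractions (for example, a scanned PDF that yields almost no text). It cannot confirm that facts, issues, and reasoning are all present.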

### c. Cleaning, preprocessing, feature engineering

This project does not involve numeric feature engineering because it is not a model-training project. Instead, the preprocessing is designed to make unstructured legal text usable for an LLM and automation pipeline.

The key preprocessing steps are:
- ensuring extracted text is passed as one continuous input to Gemini
- reducing failure points caused by formatting artifacts such as line breaks and spacing
- structuring the output into a predictable JSON schema that Make can parse reliably

The most important transformation performed is:
**raw judgment text → structured multi-scene video script JSON**

This JSON contains multiple scenes, and each scene includes:
- narration
- duration_seconds
- visual instructions
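
A per-scene check along the following lines could confirm that each scene carries these fields before it reaches the renderer. It is a sketch under assumptions: the key name `visual` follows the sample output used elsewhere in this notebook, and the helper name `scene_problems` is hypothetical.

```python
from typing import List

# Keys every scene is expected to carry, matching the list above.
REQUIRED_SCENE_KEYS = {"narration", "duration_seconds", "visual"}


def scene_problems(scene: dict) -> List[str]:
    """Return a list of human-readable problems; an empty list means the scene is usable."""
    problems = []
    missing = REQUIRED_SCENE_KEYS - scene.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if "narration" in scene:
        narration = scene["narration"]
        if not isinstance(narration, str) or not narration.strip():
            problems.append("narration must be a non-empty string")
    if "duration_seconds" in scene:
        duration = scene["duration_seconds"]
        # bool is a subclass of int in Python, so it is excluded explicitly.
        is_number = isinstance(duration, (int, float)) and not isinstance(duration, bool)
        if not is_number or duration <= 0:
            problems.append("duration_seconds must be a positive number")
    return problems
```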

### d. Handling missing values or noise (if applicable)

Legal PDF text extraction can contain noise and inconsistencies, including:
- irregular line breaks
- repeated headers/footers
- inconsistent spacing and formatting
- scanned PDFs with weaker extraction quality
- missing explicit section headings

In practice, this was handled by having Gemini produce clean, structured JSON output instead of relying on the raw extracted formatting. The system also depends on strict JSON formatting so that the output stays parseable and usable downstream in Make.
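
For illustration only, some of the noise listed above could also be reduced before the LLM step. The sketch below is not part of the submitted pipeline, which relies on Gemini for cleanup. The function name `normalize_extracted_text` and the `repeat_threshold` default are assumptions.

```python
import re
from collections import Counter


def normalize_extracted_text(text: str, repeat_threshold: int = 3) -> str:
    """Collapse spacing, drop lines that repeat often, and join into one continuous string."""
    # Collapse runs of spaces/tabs inside each line and trim the ends.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    # Lines that recur many times are treated as page headers or footers.
    counts = Counter(line for line in lines if line)
    kept = [line for line in lines if line and counts[line] < repeat_threshold]
    return " ".join(kept)
```

The repeat heuristic is deliberately crude: a short line that legitimately recurs three or more times would also be dropped, so the threshold would need tuning against real judgments.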


In [None]:
# Sample extracted text snippet (demo placeholder)
# In the actual pipeline, this text comes from the PDF -> text extraction step in Make.com.

judgment_extracted_text = """
IN THE HIGH COURT OF DELHI AT NEW DELHI
Party A v. Party B

The dispute concerns residence rights in a property claimed as a shared household.
The court examined ownership documents, contributions, and the effect of divorce on residence rights.
The appeal was dismissed.
"""

print(judgment_extracted_text)


## 3) Model / System Design

### a. AI technique used (ML / DL / NLP / LLM / Recommendation / Hybrid)

This project uses an LLM-based NLP generation approach combined with a video rendering system. The AI technique is primarily:
- **LLM-based NLP generation**, where the judgment text is interpreted and transformed into structured scenes.
- **Automation + rendering**, where the structured output is converted into a video timeline and rendered automatically.

The system uses:
- **Google Gemini** for script generation in a structured multi-scene JSON format.
- **Shotstack API** for rendering the final video and returning a hosted output link.

This makes the overall system a Hybrid AI + Media Rendering Automation pipeline.

### b. Architecture or pipeline explanation

The final working pipeline generates a multi-scene video with a single final output URL, without audio. The steps are as follows:

1. **Webhook Trigger**  
   Receives the judgment PDF URL/request from Bubble.

2. **HTTP Module**  
   Downloads and fetches the judgment PDF document.

3. **Custom JavaScript Module**  
   Extracts the PDF and converts it into plain text.

4. **Gemini Module**  
   Converts the extracted judgment text into a structured JSON video script.  
   The output format includes:
   - title_frame metadata (court name, case title, year, citation, coram)
   - scenes array with multiple scenes

5. **Parse JSON Module**  
   Parses the Gemini output so Make can map individual fields such as narration, duration, and visuals.

6. **Iterator**  
   Iterates over the scenes array and produces one bundle per scene.

7. **Text Aggregator**  
   Combines all iterated bundles into one unified clips array for Shotstack, ensuring all scenes are included in a single render timeline.

8. **Shotstack HTTP POST**  
   Sends one render request containing all clips and receives a render ID.

9. **Shotstack HTTP GET**  
   Checks the render status and, once the status is done, returns the final video URL. In the submitted version this check is rerun manually until the render completes.

This design ensures the output is not fragmented into multiple videos. Instead, the system generates one complete explainer video containing all scenes sequentially.
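
The iterate-then-aggregate steps (6 to 8) can be sketched in Python as a single function that turns the scenes array into one render request body. This is an illustration, not the Make.com configuration itself. The payload shape (timeline, tracks, clips, and a `title` asset) is modelled on Shotstack's edit JSON and should be checked against the current API reference. Showing the narration as the on-screen text, the `minimal` style, and the `sd` resolution are assumptions.

```python
def build_shotstack_edit(scenes):
    """Build one render request body containing every scene as a sequential clip."""
    clips = []
    start = 0
    # Sort by scene_number so the timeline order does not depend on input order.
    for scene in sorted(scenes, key=lambda s: s["scene_number"]):
        length = scene["duration_seconds"]
        clips.append({
            "asset": {
                "type": "title",             # assumed asset type for plain text
                "text": scene["narration"],  # assumption: narration is the on-screen text
                "style": "minimal",
                "color": "#ffffff",
            },
            "start": start,
            "length": length,
        })
        # The next clip begins exactly where this one ends.
        start += length
    return {
        "timeline": {"background": "#000000", "tracks": [{"clips": clips}]},
        "output": {"format": "mp4", "resolution": "sd"},
    }
```

Putting all clips on a single track in one request is what yields one render ID and one output URL, rather than one video per scene.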

### c. Justification of design choices

This design was chosen because it gives a stable, repeatable automation workflow:
- Legal judgments are unstructured, so Gemini is essential to interpret and restructure them into usable scenes.
- JSON output is required because Make needs predictable keys for mapping and automation reliability.
- Iterator and aggregator are necessary because the system produces multiple scenes, but Shotstack requires a single combined timeline to render one final video.
- Shotstack is suitable because it supports programmatic video rendering and returns a hosted output link automatically, making the pipeline fully automated.


## 4) Core Implementation

### a. Model training / inference logic

No model training was performed in this project. The system is inference-only, meaning:
- Gemini generates a structured script directly from extracted judgment text.
- Shotstack renders the video directly from the structured clip timeline.

This makes the pipeline lightweight, fast to iterate, and suitable for real product workflows.

### b. Prompt engineering (for LLM-based projects)

Prompt engineering is a key component of making the system stable. Gemini must produce output that is machine-readable and consistent, because the entire automation pipeline depends on it.

The prompt was engineered to enforce:
- output must be valid JSON only
- no markdown or formatting wrappers
- scenes must be an array
- each scene must contain required keys: scene_number, duration_seconds, narration, visual

This was critical because earlier failures occurred when Gemini returned:
- extra text before/after JSON
- markdown code fences
- null values or unsupported keys

By enforcing strict output rules, the system became stable and parseable within Make.
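
Even with a strict prompt, a defensive parsing step can absorb the failure modes listed above. The following is a minimal sketch, assuming the model output contains exactly one top-level JSON object. The helper name `extract_json_object` is hypothetical and is not a module in the Make scenario.

```python
import json


def extract_json_object(raw: str) -> dict:
    """Pull the JSON object out of model output that may include fences or extra text."""
    # Take everything from the first "{" to the last "}", which skips
    # markdown code fences and any commentary before or after the JSON.
    start = raw.find("{")
    end = raw.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])  # raises ValueError on invalid JSON
    if not isinstance(data.get("scenes"), list):
        raise ValueError("'scenes' must be an array")
    return data
```

This approach fails if the surrounding commentary itself contains braces, so it complements the prompt rules and does not replace them.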

### c. Recommendation or prediction pipeline

Not applicable. This project is not a recommendation or prediction system. It is a generation pipeline that converts:
judgment PDF text → structured scenes → video timeline → final video URL.

### d. Code must run top-to-bottom without errors

In the working submitted version, the pipeline runs end-to-end without errors:
- Gemini output is parsed correctly
- scenes are iterated properly
- the aggregator produces a combined clips array
- Shotstack POST succeeds
- Shotstack GET returns status done and provides the final video link


In [None]:
import json

# Sample Gemini output structure (demo only)
sample_gemini_output = {
  "title_frame": {
    "court": "High Court of Delhi at New Delhi",
    "case_title": "Party A v. Party B",
    "year": "2025",
    "citation": "Sample Citation",
    "coram": "Sample Coram"
  },
  "scenes": [
    {
      "scene_number": 1,
      "duration_seconds": 15,
      "narration": "This case explains whether Party A can continue living in a property claimed as a shared household.",
      "visual": "White text on black background: Scene 1 overview"
    },
    {
      "scene_number": 2,
      "duration_seconds": 20,
      "narration": "Party A argued residence rights, while Party B relied on ownership documents and sought possession.",
      "visual": "White text on black background: Scene 2 arguments"
    },
    {
      "scene_number": 3,
      "duration_seconds": 25,
      "narration": "The court focused on ownership proof and whether a domestic relationship existed after divorce.",
      "visual": "White text on black background: Scene 3 reasoning"
    }
  ]
}

print(json.dumps(sample_gemini_output, indent=2))


## 5) Evaluation & Analysis

### a. Metrics used (quantitative or qualitative)

This system was evaluated using functional and qualitative metrics that directly reflect whether the automation achieves its goal:

Functional success metrics:
- Shotstack returns status done
- a final output video URL is generated successfully

Scene correctness metrics:
- the final video contains multiple scenes
- scenes appear in correct order
- the output is one combined video link rather than multiple fragmented links

Timing correctness metrics:
- each scene duration matches duration_seconds
- start times accumulate properly (for example, durations of 5, 10, and 12 seconds give start times of 0, 5, and 15, with the next scene starting at 27) and scenes do not overlap
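
The timing check can be expressed as a running sum: each clip should start at the total length of all clips before it. For example, durations of 5, 10, and 12 seconds give starts of 0, 5, and 15, and a fourth scene would start at 27. The sketch below assumes each clip is a dict with `start` and `length` keys. The helper names are hypothetical.

```python
from itertools import accumulate


def expected_starts(durations):
    """Start time of each clip: the running total of all earlier durations."""
    return [0, *accumulate(durations)][:-1]


def timing_problems(clips):
    """Compare each clip's start with the running total; return any mismatches."""
    problems = []
    starts = expected_starts([clip["length"] for clip in clips])
    for i, (clip, want) in enumerate(zip(clips, starts), start=1):
        if clip["start"] != want:
            problems.append(f"clip {i}: start {clip['start']}, expected {want}")
    return problems
```

An empty result means the clips are contiguous, with no gaps and no overlaps.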

Content quality metrics:
- narration remains neutral and educational
- narration covers facts, issues, arguments, and reasoning
- language is simplified for accessibility

### b. Sample outputs / predictions

The pipeline produces:
- one Shotstack render ID
- one final MP4 URL as output
- a multi-scene text-based explainer video where each scene appears sequentially
- consistent visual format (white text on black background)

### c. Performance analysis and limitations

Strengths:
- fully automated end-to-end pipeline
- multi-scene output works correctly
- one final video link per judgment
- scalable workflow for repeated use

Limitations in the submitted version:
- audio is not added yet
- visuals are minimal and text-based
- output quality depends on Gemini summarization consistency
- complex judgments may require additional prompt refinement for best scene structuring


In [None]:
import json

# Example Shotstack result (demo only).
# The field names are simplified for illustration and may not match the raw API response.
sample_shotstack_response = {
  "render_id": "render_1234567890",
  "status": "done",
  "final_video_url": "https://example.com/final_video.mp4"
}

print(json.dumps(sample_shotstack_response, indent=2))


## 6) Ethical Considerations & Responsible AI

### a. Bias and fairness considerations

Legal summarization can unintentionally introduce bias if the summary:
- emphasizes one party’s narrative more than the other
- omits key reasoning that affects interpretation
- oversimplifies disputes in a misleading way

This risk is reduced by:
- keeping narration neutral and factual
- using balanced explanations of both sides
- avoiding emotional or accusatory language
- using neutral labels like Party A and Party B instead of assuming relationships

### b. Dataset limitations

This project uses live judgment PDFs rather than a fixed dataset. Therefore, input quality varies widely:
- scanned PDFs may produce weaker extraction quality
- formatting differences across courts reduce consistency
- some judgments may omit structured headings and require deeper inference

### c. Responsible use of AI tools

The system is designed for educational understanding and simplified explanation. It should not be treated as a replacement for reading the original judgment or for professional legal interpretation. The output should be used as a learning aid and a quick explainer rather than as legal advice.


## 7) Conclusion & Future Scope

### a. Summary of results

We successfully built a working Make.com automation pipeline that:
- takes a judgment PDF as input
- extracts text automatically
- uses Gemini to generate a structured multi-scene script
- iterates over the scenes and aggregates them into one Shotstack timeline
- generates one final video URL containing multiple scenes in sequence

The current submitted version works end-to-end without audio, and successfully produces a multi-scene explainer video output with a single final link.

### b. Possible improvements and extensions

Future improvements planned after this stage include:
- adding voiceover using ElevenLabs (text-to-speech)
- enabling multilingual audio generation for different languages
- adding translation features so judgments in any language can be converted into explainer videos in the same language
- adding background visuals or images per scene for a more animated feel
- adding subtitles and improved typography
- adding background music at low volume
- implementing automatic polling for Shotstack status instead of manual reruns
- adding a fallback JSON repair step if Gemini output fails formatting
- storing the final video URL back into Bubble database for the LawStory AI interface
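
As one illustration of the automatic polling item above, the status check could be wrapped in a bounded retry loop. This is a sketch, not the planned implementation. `fetch_status` is a hypothetical callable that performs the Shotstack GET and returns a dict with a `status` key. The `failed` status value, the interval, and the attempt limit are assumptions.

```python
import time


def wait_for_render(fetch_status, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status until the render is done, failed, or the attempt limit is hit."""
    for attempt in range(max_attempts):
        result = fetch_status()
        status = result.get("status")
        if status == "done":
            return result
        if status == "failed":  # assumed terminal error status
            raise RuntimeError("render failed")
        # Wait before the next check, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("render did not finish in time")
```

Passing the fetch function and `sleep` in as parameters keeps the loop testable without network calls. Inside Make.com, a repeater or a sleep module would play the same role.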
