# LawStory AI — Judgment PDF to Multi-Scene Explainer Video (Text-Based Prototype)

**Primary Goal:** Automatically convert a **court judgment PDF** into a short **multi-scene explainer video** (white text on black background) without manual editing.  
**Submitted version:** Works end-to-end **without audio** (multi-scene video generation).  
**Audio version (in progress):** Planned voiceover using **ElevenLabs** and asset handling via **Cloudinary**.

---


## 1. Problem Definition & Objective

### a. Selected project track
**AI + Automation (LLM + Video Rendering Pipeline)**  
This project is built under the AI + Automation track, combining LLM-based legal understanding with an automated video rendering pipeline. The system takes a **judgment PDF as the only input**, extracts key legal information, and converts it into a structured multi-scene explainer video. The process is designed to work without manual editing, manual scriptwriting, or manual video creation.

### b. Clear problem statement
Legal judgments are long, technical, and difficult to consume quickly. Even with access to the judgment PDF, it takes significant time and effort to identify the **facts, legal issues, arguments from both sides, and the final reasoning of the court**. The problem solved here is: **how to automatically convert a judgment PDF into an easy-to-consume, structured multi-scene video without manual intervention.** Without such a system, users must spend hours reading or depend on inconsistent external summaries.

### c. Real-world relevance and motivation
- Law students and the general public struggle to understand judgments quickly because of complex language, length, and formatting.  
- Lawyers, educators, and legal creators need short explainers for education, awareness, and simplified case-law learning.  
- Manual summarization + video editing is slow and not scalable.  
- This pipeline makes legal content more accessible by converting a judgment into a structured explainer format consumable in minutes.  

This project addresses a gap that traditional reading, short notes, and generic summaries do not close effectively, and it can power **LawStory AI** as a product feature.


## 2. Data Understanding & Preparation

### a. Dataset source (public / synthetic / collected / API)
This project does not use a fixed traditional dataset (like CSVs). It uses **live real-world input** in the form of **judgment PDFs**, including:
- Collected / user-provided judgment PDF documents  
- Extracted text generated from those PDFs through Make.com scenario processing  

So the “data” is unstructured legal text extracted from actual court judgments.

### b. Data loading and exploration
The data enters the system through Make.com as:
- Webhook input containing the judgment PDF URL/file  
- HTTP module to fetch/download the PDF  
- Custom JS module to extract PDF → text  
- Extracted text passed into the AI step  

This stage validates:
- PDF download success  
- Extracted text readability  
- Presence of key information (court details, case context)  
- Presence of core legal components needed for explanation: facts, issues, arguments, decision, reasoning/ratio  
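The checks above can be approximated in a few lines. The sketch below is a hedged, stdlib-only illustration, not the Custom JS module used in the Make.com scenario. The cue words in `COMPONENT_CUES`, the `min_chars` threshold, and the name `validate_extracted_text` are all hypothetical. Keyword matching only signals that a component is *likely* present, not that it was extracted correctly.

```python
# Hypothetical cue words per legal component (illustrative only; real judgments
# vary widely in wording, so a miss means "check manually", not "absent").
COMPONENT_CUES = {
    "court_details": ["court", "bench", "coram"],
    "facts": ["facts", "dispute", "background"],
    "issues": ["issue", "question"],
    "arguments": ["argued", "contended", "submitted", "counsel"],
    "decision": ["dismissed", "allowed", "disposed", "set aside"],
}


def validate_extracted_text(text: str, min_chars: int = 200) -> dict:
    """Return simple readability and coverage signals for extracted judgment text."""
    stripped = text.strip()
    # Count printable or whitespace characters; a low share suggests garbled extraction.
    readable = sum(ch.isprintable() or ch.isspace() for ch in stripped)
    lowered = stripped.lower()
    found = {
        component: any(cue in lowered for cue in cues)
        for component, cues in COMPONENT_CUES.items()
    }
    return {
        "non_empty": bool(stripped),
        "long_enough": len(stripped) >= min_chars,
        "readable_ratio": readable / len(stripped) if stripped else 0.0,
        "missing_components": [name for name, ok in found.items() if not ok],
    }
```

An empty or near-empty result is also a quick signal that a PDF may be scanned and needs OCR before it reaches Gemini.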

### c. Cleaning, preprocessing, feature engineering
Instead of numeric feature engineering, the system uses **LLM-ready preprocessing**:
- Ensure extracted text is passed as a clean single input to Gemini  
- Enforce predictable JSON output format for automation  
- Convert raw judgment text → structured multi-scene JSON video script containing narration, duration_seconds, and visual instructions  

### d. Handling missing values or noise (if applicable)
Legal PDFs often contain noise:
- irregular spacing and line breaks  
- repeated headers/footers  
- inconsistent formatting  
- unclear headings  

This is handled by:
- relying on Gemini to generate clean structured JSON  
- ensuring JSON remains valid and parseable in Make  
- designing prompts that still produce complete explainer output even if headings are not explicit
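In the submitted pipeline this cleanup is left to Gemini. If a deterministic pre-step is ever wanted, a light normalization like the hypothetical `normalize_judgment_text` below could remove the most common noise before the text reaches the prompt. The repeat threshold (`min_repeats`) and the 80-character limit for header/footer candidates are assumptions. They may also drop short lines that legitimately repeat.

```python
import re
from collections import Counter


def normalize_judgment_text(text: str, min_repeats: int = 3) -> str:
    """Optional cleanup before prompting (hypothetical step, not in the Make scenario).

    - drops short lines that repeat many times (likely running headers/footers)
    - drops lines that are only page numbers, e.g. "12" or "Page 3 of 40"
    - collapses runs of spaces/tabs and more than one consecutive blank line
    """
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    counts = Counter(line for line in lines if line)
    kept = []
    for line in lines:
        if line and counts[line] >= min_repeats and len(line) < 80:
            continue  # repeated header/footer candidate
        if re.fullmatch(r"(page\s*)?\d+(\s*of\s*\d+)?", line, flags=re.IGNORECASE):
            continue  # bare page number
        kept.append(line)
    cleaned = "\n".join(kept)
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()
```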


```python
# Sample extracted text snippet (demo placeholder)
judgment_extracted_text = """
IN THE HIGH COURT OF DELHI AT NEW DELHI
Party A v. Party B
The dispute concerns residence rights in a property claimed as a shared household.
The court examined ownership documents, contributions, and the effect of divorce on residence rights.
The appeal was dismissed.
"""

print(judgment_extracted_text)
```


## 3. Model / System Design

### a. AI technique used (ML / DL / NLP / LLM / Recommendation / Hybrid)
This project uses an **LLM-based NLP pipeline** combined with a video rendering system:
- Google Gemini for structured multi-scene script generation  
- Shotstack API for rendering the final video  

It is a **Hybrid AI + Media Rendering Automation system**.

### b. Architecture or pipeline explanation
Final working pipeline (multi-scene, one final video URL, no audio):

1. Webhook Trigger  
   - Receives judgment PDF URL/request from Bubble  

2. HTTP Module  
   - Downloads/fetches the judgment PDF  

3. Custom JS  
   - Converts PDF → extracted text  

4. Gemini Module  
   - Converts extracted judgment text into structured JSON video script  
   - Output includes title_frame metadata + scenes[] array  

5. Parse JSON module  
   - Parses Gemini output so Make can access fields such as narration, duration_seconds, and visual

6. Iterator  
   - Iterates over scenes[] and creates one bundle per scene  

7. Text Aggregator (critical)  
   - Combines all scenes into one “clips array” for Shotstack  

8. Shotstack HTTP POST  
   - Sends one render request containing all clips in one timeline  
   - Returns render ID  

9. Shotstack HTTP GET  
   - Checks the render status until it is done (not yet an automatic polling loop)  
   - Returns the final video URL  

This produces one final MP4 link containing multiple scenes in sequence.
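Steps 6 and 7 (Iterator + Text Aggregator) can be expressed as one function that turns the parsed `scenes[]` array into a single Shotstack timeline. This is a sketch, not the Make.com configuration. It assumes Shotstack's `title` text asset with white text on a `#000000` timeline background, a single track, and SD MP4 output. The asset type, style, and output settings in the real scenario may differ. The `visual` field is not rendered here because the current video is text-only, and `build_shotstack_edit` is a hypothetical name.

```python
import json


def build_shotstack_edit(script: dict) -> dict:
    """Build one Shotstack edit (single timeline) from a parsed multi-scene script.

    Mirrors Iterator + Text Aggregator: one clip per scene, start times accumulated.
    Assumes Shotstack's "title" asset; the real scenario's asset settings may differ.
    """
    clips = []
    start = 0
    for scene in sorted(script["scenes"], key=lambda s: s["scene_number"]):
        length = scene["duration_seconds"]
        clips.append({
            "asset": {
                "type": "title",
                "text": scene["narration"],
                "style": "minimal",
                "color": "#ffffff",
                "size": "small",
            },
            "start": start,  # each scene begins where the previous one ends
            "length": length,
        })
        start += length
    return {
        "timeline": {"background": "#000000", "tracks": [{"clips": clips}]},
        "output": {"format": "mp4", "resolution": "sd"},
    }


script = {"scenes": [
    {"scene_number": 1, "duration_seconds": 15, "narration": "Overview.", "visual": "Scene 1"},
    {"scene_number": 2, "duration_seconds": 20, "narration": "Arguments.", "visual": "Scene 2"},
]}
print(json.dumps(build_shotstack_edit(script), indent=2))
```

Building the clips as data and serializing once sidesteps one risk of aggregating clip JSON as text: a missing or trailing comma between clip objects, which would make the whole request body invalid.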

### c. Justification of design choices
- Gemini is used because legal text is unstructured and requires summarization + structuring.  
- JSON output is required for reliable automation mapping in Make.  
- Iterator + Aggregator is necessary to combine multiple scenes into one Shotstack timeline.  
- Shotstack is used because it supports programmatic video rendering and returns a hosted output URL.


## 4. Core Implementation

### a. Model training / inference logic
No training was performed. The pipeline is **inference-only**:
- Gemini generates structured script from extracted judgment text  
- Shotstack renders the video from structured clips  

### b. Prompt engineering (for LLM-based projects)
Prompt engineering ensures Gemini output is stable and machine-readable:
- Output must be a JSON object only  
- Must include scenes as an array  
- Each scene includes scene_number, duration_seconds, narration, visual  

This was critical because earlier failures occurred when Gemini returned:
- extra text before/after JSON  
- markdown formatting  
- unexpected null values  
- unsupported keys  

Prompts were refined to enforce plain JSON output with consistent structure.
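A defensive parser covering the failure modes listed above (extra text around the JSON, markdown fences, null values, unsupported keys) might look like the following. It is a hedged sketch. `PROMPT_TEMPLATE` is illustrative wording, not the prompt actually sent to Gemini. `parse_llm_script` is a hypothetical helper that would sit where the Make Parse JSON module sits, and it also serves as a simple form of the JSON repair fallback.

```python
import json
import re

REQUIRED_SCENE_KEYS = ("scene_number", "duration_seconds", "narration", "visual")

# Illustrative prompt wording (an assumption, not the exact prompt used with Gemini).
PROMPT_TEMPLATE = (
    "Return ONLY a JSON object, with no markdown and no text before or after it. "
    'Schema: {{"title_frame": {{...}}, "scenes": [{{"scene_number": int, '
    '"duration_seconds": int, "narration": str, "visual": str}}]}}. '
    "Use a neutral tone and refer to the parties as Party A and Party B.\n\n"
    "Judgment text:\n{judgment_text}"
)


def parse_llm_script(raw: str) -> dict:
    """Parse model output into a script dict, repairing or rejecting known failure modes."""
    text = raw.strip()
    # Unwrap a markdown fence such as ```json ... ``` if the model added one.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", text, flags=re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Ignore any prose before the first "{" or after the last "}".
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    data = json.loads(text[start:end + 1])  # JSONDecodeError is a ValueError

    scenes = data.get("scenes")
    if not isinstance(scenes, list) or not scenes:
        raise ValueError("'scenes' must be a non-empty array")
    cleaned = []
    for i, scene in enumerate(scenes, start=1):
        if not isinstance(scene, dict):
            raise ValueError(f"scene {i} is not an object")
        missing = [k for k in REQUIRED_SCENE_KEYS if scene.get(k) is None]
        if missing:
            raise ValueError(f"scene {i} has missing or null keys: {missing}")
        duration = scene["duration_seconds"]
        if not isinstance(duration, (int, float)) or duration <= 0:
            raise ValueError(f"scene {i} has invalid duration_seconds: {duration!r}")
        # Keep only the expected keys so unsupported ones never reach later modules.
        cleaned.append({k: scene[k] for k in REQUIRED_SCENE_KEYS})
    data["scenes"] = cleaned
    return data
```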

### c. Recommendation or prediction pipeline
Not applicable. This is a generation pipeline:
Judgment PDF → extracted text → structured scenes → video timeline → final video URL

### d. Code must run top-to-bottom without errors
In the working version:
- Gemini output parses correctly  
- Iterator generates multiple scene bundles  
- Aggregator combines them into one clips array  
- Shotstack POST returns success (201 Created)  
- Shotstack GET returns status done + final URL
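The Shotstack POST step can be sketched with the standard library. The endpoint URL (the `stage` sandbox environment), the `x-api-key` header, and the response shape reflect Shotstack's Edit API as we understand it. Treat all three as assumptions and check them against the current Shotstack documentation. `make_render_request` and `extract_render_id` are hypothetical names. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumed Shotstack Edit API endpoint for the sandbox ("stage") environment;
# verify the base URL and version against the current Shotstack docs.
SHOTSTACK_RENDER_URL = "https://api.shotstack.io/edit/stage/render"


def make_render_request(edit: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that submits one timeline."""
    return urllib.request.Request(
        SHOTSTACK_RENDER_URL,
        data=json.dumps(edit).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def extract_render_id(status_code: int, body: str) -> str:
    """Return the render ID from a 201 Created response.

    Assumes the response shape {"success": ..., "response": {"id": ...}}.
    """
    if status_code != 201:
        raise RuntimeError(f"render submission failed (HTTP {status_code}): {body}")
    return json.loads(body)["response"]["id"]


# Sending it (network call, so not run here):
# with urllib.request.urlopen(make_render_request(edit, API_KEY)) as resp:
#     render_id = extract_render_id(resp.status, resp.read().decode("utf-8"))
```

`urlopen` raises `urllib.error.HTTPError` for error responses, so the non-201 branch mainly guards against unexpected success codes.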


```python
import json

sample_gemini_output = {
  "title_frame": {
    "court": "High Court of Delhi at New Delhi",
    "case_title": "Party A v. Party B",
    "year": "2025",
    "citation": "Sample Citation",
    "coram": "Sample Coram"
  },
  "scenes": [
    {
      "scene_number": 1,
      "duration_seconds": 15,
      "narration": "This case explains whether Party A can continue living in a property claimed as a shared household.",
      "visual": "White text on black background: Scene 1 overview"
    },
    {
      "scene_number": 2,
      "duration_seconds": 20,
      "narration": "Party A argued residence rights, while Party B relied on ownership documents and sought possession.",
      "visual": "White text on black background: Scene 2 arguments"
    }
  ]
}

print(json.dumps(sample_gemini_output, indent=2))
```


## 5. Evaluation & Analysis

### a. Metrics used (quantitative or qualitative)
This system is evaluated using qualitative + functional metrics:
- Functional success: Shotstack returns status done  
- Final output URL generated successfully  
- Scene correctness: multiple scenes appear in the final video  
- Timing correctness: each scene duration matches duration_seconds  
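- Start times accumulate correctly: each scene starts where the previous one ends (e.g. durations of 5, 10, and 12 seconds give start times 0, 5, 15, and 27 for the next scene)  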
- Content quality: narration explains facts, issues, arguments, and reasoning clearly  
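The timing metrics above can be checked automatically instead of by watching the rendered video. The hypothetical `check_timing` helper below compares each clip's `start` and `length` against the scene's `duration_seconds` and the running total of earlier durations. It assumes the clips are in the same order as the scenes.

```python
def check_timing(scenes: list[dict], clips: list[dict]) -> list[str]:
    """Return a list of timing problems; an empty list means all checks passed."""
    problems = []
    if len(scenes) != len(clips):
        problems.append(f"expected {len(scenes)} clips, found {len(clips)}")
    expected_start = 0
    for scene, clip in zip(scenes, clips):
        n = scene["scene_number"]
        # Duration check: clip length must equal the scene's duration_seconds.
        if clip["length"] != scene["duration_seconds"]:
            problems.append(
                f"scene {n}: length {clip['length']} != duration_seconds {scene['duration_seconds']}"
            )
        # Start check: each clip must begin at the sum of all earlier durations.
        if clip["start"] != expected_start:
            problems.append(f"scene {n}: start {clip['start']} != expected {expected_start}")
        expected_start += scene["duration_seconds"]
    return problems
```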

### b. Sample outputs / predictions
The pipeline produces:
- One final Shotstack render ID  
- One final MP4 video URL (single link)  
- Multi-scene text-based video output in correct order  
- Consistent readable style (white text on black background)  

### c. Performance analysis and limitations
**Strengths**
- Fully automated end-to-end system  
- Works reliably without manual editing  
- Multi-scene output is structured correctly  
- Produces one combined final video link per judgment  

**Limitations (current stage)**
- Audio not added yet  
- Visuals are minimal and text-based  
- Output depends on Gemini summarization consistency  
- Video style is simple (text scenes)


## 6. Ethical Considerations & Responsible AI

### a. Bias and fairness considerations
Legal summarization can introduce bias by:
- emphasizing one party’s narrative  
- omitting critical legal reasoning  
- oversimplifying disputes  

Bias is reduced by:
- neutral and factual tone  
- avoiding emotional/accusatory language  
- using “Party A” and “Party B” instead of assuming relationships  

### b. Dataset limitations
Inputs vary heavily across judgments:
- scanned PDFs may reduce extraction quality  
- formatting issues can reduce clarity  
- incomplete or non-uniform court orders  

### c. Responsible use of AI tools
The system is designed for educational understanding and simplified explanation. It should not be treated as a replacement for the original judgment or professional legal interpretation.


## 7. Conclusion & Future Scope

### a. Summary of results
We successfully built a working Make.com automation that:
✅ Takes a legal judgment PDF and extracts its text  
✅ Uses Gemini to create a multi-scene script  
✅ Iterates over scenes  
✅ Aggregates them into one Shotstack timeline  
✅ Generates one final video URL with multiple scenes  
❌ Audio is not added yet in this version  

### b. Possible improvements and extensions
Future improvements include:
- Add voiceover using ElevenLabs (TTS)  
- Add multilingual audio generation  
- Add translation features so judgments in any language can be converted into videos in the same language  
- Add background visuals/images per scene  
- Add subtitles and improved typography  
- Add background music at low volume  
- Add automatic polling loop for Shotstack GET  
- Add JSON repair fallback if Gemini output fails  
- Store final URL back to Bubble database for LawStory AI UI
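One way to implement the automatic polling loop listed above is sketched below. `get_status` is a hypothetical callable standing in for the Shotstack GET request. It is assumed to return the `response` object containing `status` and `url`. The loop treats `done` and `failed` as terminal states. The 5-second interval and 300-second timeout are arbitrary defaults. Injecting `get_status` and `sleep` keeps the loop testable without network calls.

```python
import time
from typing import Callable


def wait_for_render(
    get_status: Callable[[str], dict],
    render_id: str,
    interval_s: float = 5.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the render is done and return its URL (hypothetical helper).

    `get_status(render_id)` is assumed to return a dict with "status" and, once
    the render is done, "url".
    """
    waited = 0.0
    while waited <= timeout_s:
        info = get_status(render_id)
        status = info.get("status")
        if status == "done":
            return info["url"]
        if status == "failed":
            raise RuntimeError(f"render {render_id} failed: {info.get('error')}")
        sleep(interval_s)  # still queued or rendering; wait before checking again
        waited += interval_s
    raise TimeoutError(f"render {render_id} not done after {timeout_s} s")
```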
