# LawStory AI — Judgment PDF to Multi-Scene Explainer Video (Text-Based Prototype)

This notebook is the **main source of truth** for evaluating the project.  
It documents the end-to-end AI + Automation pipeline that converts a **court judgment PDF** into a **multi-scene explainer video** (white text on black background), without manual editing.


In [None]:
# Basic imports (no external API keys required for this notebook)
import json
print("Notebook initialized successfully.")

1) Problem Definition & Objective  
a. Selected project track  

AI + Automation (LLM + Video Rendering Pipeline)  
This project is built under the **AI + Automation** track, combining **LLM-based legal understanding** with an automated **video rendering pipeline**. The goal is to build a system that can take a **court judgment PDF as the only input**, extract the most meaningful legal information, and automatically convert it into a short, structured **multi-scene explainer video**. The entire process is designed to work without manual editing, manual scriptwriting, or manual video creation.

b. Clear problem statement  

Legal judgments are long, technical, and written in a format that is difficult to consume quickly. Even when someone has access to the judgment PDF, understanding it requires significant time and effort to identify the key parts like **facts, legal issues, arguments from both sides, and the final reasoning of the court**. The problem this project solves is: **how to automatically convert a judgment PDF into an easy-to-consume, structured multi-scene video without manual intervention.** Without this kind of system, users are forced to either spend hours reading, depend on scattered online summaries, or miss the core learning from the judgment entirely.

c. Real-world relevance and motivation  

This problem is real and pressing: legal awareness and legal education depend on access to understandable information, but judgments are not written for quick comprehension.  
- **Law students and common people** struggle to understand judgments quickly because of complex language, length, and legal formatting.  
- **Lawyers, educators, and legal creators** need short explainers to teach, spread awareness, and simplify complex case law.  
- Existing options usually require manual summarization or manual video editing, which is slow and not scalable.  
- This pipeline makes legal content far more accessible by converting it into an explainer that can be watched in minutes.  

This project addresses a gap that traditional reading, short notes, and generic summaries do not fill effectively. It can directly power **LawStory AI** as a product feature: a user uploads a judgment PDF and automatically receives a structured explainer video.

2) Data Understanding & Preparation  
a. Dataset source (public / synthetic / collected / API)  

This project does not use a fixed “traditional dataset” like CSVs or labeled training data. Instead, it uses **live real-world input** in the form of **judgment PDFs**. The data source includes:  
- **Collected / user-provided judgment PDF documents**  
- **Extracted text generated from those PDFs** through processing inside the Make.com scenario  

So the “data” here is **unstructured legal text extracted from actual court judgments**, making it realistic, variable, and challenging in a way that reflects real product usage.

b. Data loading and exploration  

Data enters the system through the following Make.com modules:  
- A **Webhook input** containing the judgment PDF URL/file  
- An **HTTP module** that fetches/downloads the judgment PDF  
- A **Custom JS module** that extracts PDF → text  
- The extracted text is passed forward into the AI step  

This stage also serves as the exploration and validation step, where we confirm that:  
- The PDF is successfully downloaded and readable  
- The extracted text is complete enough for understanding  
- The text contains essential legal information such as court details and case context  
- The text includes the core components required for video explanation such as:  
  - **facts**  
  - **legal issues**  
  - **arguments from both sides**  
  - **decision and reasoning / ratio**  
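These checks can be sketched in Python as below, assuming the extracted text is available as a single string. The function name `check_extracted_text`, the 2,000-character minimum, and the keyword patterns are hypothetical choices for illustration, not values taken from the Make.com scenario. Missing keywords are reported as warnings rather than errors, because many judgments do not label their sections explicitly.

```python
import re

# Hypothetical patterns hinting that the text contains the parts the
# explainer needs. Real judgments vary, so a miss is only a warning.
SECTION_MARKERS = {
    "facts": r"\bfacts?\b",
    "issues": r"\bissues?\b",
    "arguments": r"\b(argued|submitted|contended|submissions?)\b",
    "decision": r"\b(held|dismissed|allowed|disposed|ratio)\b",
}


def check_extracted_text(text: str, min_chars: int = 2000) -> list[str]:
    """Return a list of warnings; an empty list means the text looks usable."""
    stripped = text.strip()
    if not stripped:
        # Typical of scanned PDFs that have no text layer.
        return ["no text extracted (possibly a scanned PDF)"]

    warnings = []
    if len(stripped) < min_chars:
        warnings.append(
            f"text is short ({len(stripped)} chars); extraction may be incomplete"
        )

    lowered = stripped.lower()
    for name, pattern in SECTION_MARKERS.items():
        if not re.search(pattern, lowered):
            warnings.append(f"no obvious '{name}' markers found")
    return warnings
```

An empty string returns the single "no text extracted" warning, which is a cue to stop before spending a Gemini call on an unreadable PDF.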

c. Cleaning, preprocessing, feature engineering  

Instead of numeric feature engineering, this project uses **LLM-ready preprocessing** to ensure the extracted legal text becomes usable for structured generation. The key preparation work included:  
- Ensuring the extracted judgment text is passed as a clean, single input to Gemini  
- Structuring the output into a predictable JSON format required for automation  
- Converting raw legal content into a multi-scene video script that contains:  
  - narration  
  - duration_seconds  
  - visual instructions  

The most important transformation is:  
**Raw judgment PDF text → structured multi-scene JSON video script**
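To make the target structure concrete, here is a sketch of the multi-scene script as Python dataclasses. The field names mirror the JSON keys used in this notebook (`title_frame`, `scenes`, `scene_number`, `duration_seconds`, `narration`, `visual`); the classes and the `from_dict` / `total_duration` helpers are hypothetical and are not part of the Make.com scenario.

```python
from dataclasses import dataclass


@dataclass
class Scene:
    scene_number: int
    duration_seconds: float
    narration: str
    visual: str


@dataclass
class VideoScript:
    title_frame: dict
    scenes: list[Scene]

    @classmethod
    def from_dict(cls, data: dict) -> "VideoScript":
        # Scene(**s) raises TypeError on missing or unexpected keys,
        # so schema drift in the model output surfaces immediately.
        scenes = [Scene(**s) for s in data["scenes"]]
        return cls(title_frame=data.get("title_frame", {}), scenes=scenes)

    def total_duration(self) -> float:
        # The final video length is the sum of the per-scene durations.
        return sum(s.duration_seconds for s in self.scenes)
```

Applied to the `example_output` dictionary at the end of this notebook, `total_duration()` would give 25 (10 + 15 seconds).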

d. Handling missing values or noise (if applicable)  

Applicable: legal PDFs often contain noise such as:  
- line breaks and spacing issues  
- headers/footers repeated on every page  
- inconsistent formatting across courts  
- missing or unclear headings  

This was handled practically by:  
- relying on Gemini to generate clean structured JSON output instead of depending on raw formatting  
- ensuring JSON is valid and parseable in Make  
- designing prompts that still produce a complete explainer even if some parts are not explicitly labeled in the judgment
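The pipeline described here leaves noise handling to Gemini. If a cheap deterministic pass before the Gemini call were ever wanted, it could look like the sketch below. This is an assumed, optional step that is not part of the current scenario: `clean_judgment_text` and its 50% repetition threshold are hypothetical. It drops short lines that repeat across many pages (typical of running headers and footers) and collapses layout whitespace.

```python
import re
from collections import Counter


def clean_judgment_text(pages: list[str], repeat_ratio: float = 0.5) -> str:
    """Join per-page text into one string, dropping likely headers/footers."""
    # Count each distinct non-empty line once per page it appears on.
    line_counts = Counter()
    for page in pages:
        line_counts.update({ln.strip() for ln in page.splitlines() if ln.strip()})

    # A short line seen on at least `threshold` pages is treated as boilerplate.
    threshold = max(2, int(len(pages) * repeat_ratio))
    repeated = {
        ln for ln, n in line_counts.items() if n >= threshold and len(ln) < 80
    }

    kept = []
    for page in pages:
        for ln in page.splitlines():
            ln = ln.strip()
            if ln and ln not in repeated:
                kept.append(ln)

    # Collapse runs of spaces/tabs left over from the PDF layout.
    return re.sub(r"[ \t]+", " ", " ".join(kept)).strip()
```

Short headings that legitimately repeat (for example "ORDER") could also be removed, so the threshold would need tuning against real judgments.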

3) Model / System Design  
a. AI technique used (ML / DL / NLP / LLM / Recommendation / Hybrid)  

This project is an **LLM-based NLP pipeline** combined with a video rendering system. It uses:  
- **Google Gemini** for script generation in a structured multi-scene format  
- **Shotstack API** for rendering the final video from those scenes  

Therefore, it is best described as a **Hybrid AI + Media Rendering Automation system**, where the AI generates structured understanding and the renderer converts it into an output video.

b. Architecture or pipeline explanation  

Final working pipeline (multi-scene, one final video URL, no audio):  

Webhook Trigger  
- Receives the judgment PDF URL/request from Bubble  

HTTP Module  
- Downloads/fetches the judgment PDF  

Custom JS  
- Converts PDF → extracted text  

Gemini Module  
- Converts extracted judgment text into structured JSON video script  
- Output includes:  
  - title_frame metadata (court name, case title, year, citation, coram)  
  - scenes[] array  

Parse JSON module  
- Reads Gemini output as JSON so Make can access scene fields like:  
  - scenes[1].narration  
  - scenes[1].duration_seconds  
  - scenes[1].visual  

Iterator  
- Iterates over scenes[]  
- Creates 1 bundle per scene  

Text Aggregator (critical step)  
- Combines all iterated scenes into one “clips array” for Shotstack  
- Produces multiple clip objects such as:  
  - text clip 1 (start=0)  
  - text clip 2 (start=5)  
  - text clip 3 (start=15)  
  - and so on  

Shotstack HTTP POST  
- Sends one render request containing all clips in one timeline  
- Returns render ID  

Shotstack HTTP GET  
- Checks the render status (currently re-run manually until it reports done)  
- Returns final video URL  

This creates **one final MP4 link** that contains **multiple scenes in sequence**, matching the goal of one complete explainer video per judgment.
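The Iterator + Text Aggregator step can be mirrored in Python to show what the combined timeline looks like. This is a sketch only: the real scenario builds this JSON inside Make, `build_render_payload` is a hypothetical name, and the payload shape (a `title` asset with `text`, `style`, `color`; `timeline.background`; `output.format` / `output.resolution`) is assumed from Shotstack's Edit API and should be checked against the current Shotstack documentation.

```python
def build_render_payload(scenes: list[dict]) -> dict:
    """Turn Gemini's scenes into one Shotstack-style render request (assumed shape)."""
    clips = []
    start = 0.0
    # Iterate in array order, as Make's Iterator does.
    for scene in scenes:
        length = float(scene["duration_seconds"])
        clips.append({
            "asset": {
                "type": "title",            # assumed Shotstack title asset
                "text": scene["narration"],
                "style": "minimal",
                "color": "#ffffff",         # white text
            },
            "start": start,  # each clip begins where the previous one ended
            "length": length,
        })
        start += length
    return {
        "timeline": {"background": "#000000", "tracks": [{"clips": clips}]},
        "output": {"format": "mp4", "resolution": "sd"},
    }
```

With scene durations of 5, 10 and 12 seconds, the clip starts come out as 0, 5 and 15, matching the aggregator description above. Keeping all clips on one track with running start times is what lets a single POST produce one continuous video.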

c. Justification of design choices  

This design was chosen because it is scalable and automation-friendly:  
- Gemini is used because legal text is unstructured and requires intelligent summarization and restructuring.  
- JSON output is essential because Make automation depends on reliable fields to map into downstream modules.  
- Iterator + Aggregator is necessary because:  
  - the system generates multiple scenes  
  - Shotstack requires a single combined timeline for one final render  
- Shotstack is used because it renders videos programmatically and returns a hosted output URL without manual editing.  

This combination makes the system practical, repeatable, and suitable for real product deployment.

4) Core Implementation  
a. Model training / inference logic  

No training was needed, because the project does not require a custom ML model. The pipeline is **inference-only**, meaning:  
- Gemini generates the structured script from extracted judgment text  
- Shotstack renders the final video from structured clips  

b. Prompt engineering (for LLM-based projects)  

Prompt engineering was a core part of making the system stable. Gemini was instructed to output a strict JSON structure:  
- must output a JSON object only  
- must include scenes as an array  
- each scene must contain:  
  - scene_number  
  - duration_seconds  
  - narration  
  - visual  

This was necessary because earlier versions of the pipeline failed when Gemini returned:  
- extra text before/after JSON  
- markdown formatting  
- unexpected null values  
- extra unsupported keys  

To prevent failures, prompts were improved to ensure:  
- plain JSON output only  
- no markdown or backticks  
- consistent scene structure for Make parsing and Shotstack rendering  
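The same rules can also be enforced after the response arrives, so a malformed reply fails with a clear message before it reaches the Iterator. A minimal sketch, assuming exactly the four scene keys listed above are required and no others are allowed; `validate_script` is a hypothetical helper, not a Make.com module.

```python
import json

REQUIRED_SCENE_KEYS = {"scene_number", "duration_seconds", "narration", "visual"}


def validate_script(raw: str) -> dict:
    """Parse Gemini's reply and check it matches the expected scene structure."""
    # json.loads raises JSONDecodeError (a ValueError) on markdown or extra text.
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")

    scenes = data.get("scenes")
    if not isinstance(scenes, list) or not scenes:
        raise ValueError("'scenes' must be a non-empty array")

    for i, scene in enumerate(scenes, start=1):
        if not isinstance(scene, dict):
            raise ValueError(f"scene {i} is not an object")
        keys = set(scene)
        if keys != REQUIRED_SCENE_KEYS:
            missing = sorted(REQUIRED_SCENE_KEYS - keys)
            extra = sorted(keys - REQUIRED_SCENE_KEYS)
            raise ValueError(f"scene {i}: missing {missing}, extra {extra}")
        if any(scene[k] is None for k in REQUIRED_SCENE_KEYS):
            raise ValueError(f"scene {i} has null values")
        duration = scene["duration_seconds"]
        if not isinstance(duration, (int, float)) or duration <= 0:
            raise ValueError(f"scene {i}: duration_seconds must be a positive number")
    return data
```

Each failure mode listed above (extra text, markdown, nulls, unsupported keys) maps to a specific error here, which makes prompt regressions easier to spot.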

c. Recommendation or prediction pipeline  

Not applicable. This is not a recommendation system. It is a generation pipeline that converts:  
**Judgment PDF → extracted text → structured scenes → video timeline → final video URL**

d. Code must run top-to-bottom without errors  

In the working version (multi-scene video generation), the pipeline ran end-to-end successfully:  
- Gemini output → parsed correctly  
- Iterator created multiple bundles  
- Aggregator produced a complete clips array  
- Shotstack POST request returned success (201 Created)  
- Shotstack GET returned status done + final URL
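An automatic version of the GET step could be a bounded polling loop. In this sketch the HTTP call is injected as a function (`fetch_status`, hypothetical) so the loop logic can be shown without network access. It assumes the status response exposes a `status` field that eventually reads `done` (or `failed`) and a `url` field once done; in a real deployment `fetch_status` would call Shotstack's render-status endpoint with the render ID.

```python
import time
from typing import Callable


def wait_for_render(render_id: str,
                    fetch_status: Callable[[str], dict],
                    interval_s: float = 5.0,
                    max_attempts: int = 60) -> str:
    """Poll until the render is done, then return the final video URL."""
    for _ in range(max_attempts):
        info = fetch_status(render_id)
        status = info.get("status")
        if status == "done":
            return info["url"]
        if status == "failed":
            raise RuntimeError(f"render {render_id} failed: {info.get('error')}")
        # Still queued or rendering: wait before asking again.
        time.sleep(interval_s)
    raise TimeoutError(f"render {render_id} not done after {max_attempts} checks")
```

Capping the number of attempts keeps a stuck render from blocking the scenario indefinitely.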

5) Evaluation & Analysis  
a. Metrics used (quantitative or qualitative)  

This system was evaluated using qualitative and functional metrics that directly reflect real user outcomes, such as:  
- Functional success: Shotstack returns status done  
- Final output URL generated successfully  
- Scene correctness: video contains multiple scenes (not just one clip)  
- Timing correctness: each scene’s duration matches duration_seconds  
- Start times accumulate properly (0, 5, 15, 27…)  
- Content quality: narration explains the judgment clearly in simple terms and captures facts, issues, arguments, and reasoning  

b. Sample outputs / predictions  

The pipeline produces:  
- One final Shotstack render ID  
- One final MP4 video URL (single link)  
- A multi-scene text-based video where scenes appear in correct order  
- A readable and consistent output style (white text on black background)  

c. Performance analysis and limitations  

Strengths  
- Fully automated end-to-end system  
- Works reliably without manual editing  
- Multi-scene output is correct and structured  
- Produces one combined final video link per judgment  

Limitations (in this completed stage)  
- Audio is not added yet  
- Visuals are simple text-based (not animated graphics)  
- Output quality depends on Gemini’s summarization consistency  
- Video style is minimal (text scenes)

6) Ethical Considerations & Responsible AI  
a. Bias and fairness considerations  

Legal summarization can unintentionally introduce bias by:  
- emphasizing one party’s narrative more than the other  
- omitting critical legal reasoning  
- oversimplifying complex disputes  

We reduce this risk by:  
- keeping summaries neutral, factual, and educational  
- avoiding emotional or accusatory language  
- using neutral terms like “Party A” and “Party B” instead of assuming relationships  

b. Dataset limitations  

This project does not use a fixed dataset, and input judgments vary widely. Limitations include:  
- scanned PDFs may result in poor extraction quality  
- formatting issues can reduce context clarity  
- court orders may be incomplete or not structured uniformly  

c. Responsible use of AI tools  

The system is designed for educational understanding and simplified explanation. Users should treat the generated output as a learning tool and not as a replacement for reading the original judgment or professional legal interpretation.

7) Conclusion & Future Scope  
a. Summary of results  

We successfully built a working Make.com automation that:  
✅ Takes legal judgment PDF text  
✅ Uses Gemini to create a multi-scene script  
✅ Iterates over scenes  
✅ Aggregates them into one Shotstack timeline  
✅ Generates one final video URL with multiple scenes  
❌ Audio is not added yet in this version  

b. Possible improvements and extensions  

Future improvements after this stage include:  
- Add voiceover tracks using ElevenLabs (TTS)  
- Add multilingual audio generation (different languages)  
- Add translation features so judgments in any language can be converted into explainer videos in the same language  
- Add background visuals or images per scene  
- Add subtitles and improved typography  
- Add background music at low volume  
- Add automatic polling loop for Shotstack GET (instead of manual rerun)  
- Add fallback if Gemini JSON fails (repair step)  
- Store final URL back to Bubble database for LawStory AI UI
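For the JSON repair step, one common approach is to strip markdown fences and keep only the text between the first `{` and the last `}` before parsing again. This is a hypothetical sketch (`repair_json` is not part of the current scenario) and it only handles wrapping noise; truncated or structurally broken JSON would still need a retry call to Gemini.

```python
import json
import re


def repair_json(raw: str) -> dict:
    """Best-effort recovery of a JSON object wrapped in markdown or chatter."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Remove triple-backtick fences, with or without a "json" language tag.
    text = re.sub(r"`{3}(?:json)?", "", raw)
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    return json.loads(text[start:end + 1])
```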

## Example: Structured JSON output used in the pipeline

Below is a sample JSON format representing the multi-scene output that the Gemini module generates and Make.com parses.


In [None]:
example_output = {
  "title_frame": {
    "court": "Example Court",
    "case_title": "Party A v. Party B",
    "year": "2025",
    "citation": "Example Citation",
    "coram": "Example Coram"
  },
  "scenes": [
    {
      "scene_number": 1,
      "duration_seconds": 10,
      "narration": "Scene 1 narration text...",
      "visual": "White text on black background..."
    },
    {
      "scene_number": 2,
      "duration_seconds": 15,
      "narration": "Scene 2 narration text...",
      "visual": "White text on black background..."
    }
  ]
}

print(json.dumps(example_output, indent=2))