# LawStory AI: Judgment PDF to Multi-Scene Explainer Video (Text-Based Prototype)

**Primary Goal:** Automatically convert a **court judgment PDF** into a short, structured **multi-scene explainer video** (white text on black background) with **one final output link**, without manual editing.  

**Submitted version (current):**  
- ✅ Multi-scene video generation works end-to-end (no audio)
- ✅ Output is a single Shotstack-rendered video URL containing all scenes in sequence

**Audio version (in progress):**  
- 🔄 Voiceover planned using **ElevenLabs** (TTS)
- 🔄 Media hosting/asset management planned using **Cloudinary**
- 🔄 Multilingual audio + translation features planned

---


## 1. Problem Definition & Objective

### a. Selected project track
**AI + Automation (LLM + Video Rendering Pipeline)**  

This project is built under the **AI + Automation** track. It combines:
- **LLM-based legal understanding** (to interpret and structure unstructured judgments)
- **Automation workflow orchestration** (to run the pipeline end-to-end)
- **Programmatic video rendering** (to generate an explainer video without editing tools)

The system takes a **court judgment PDF as the only input**, extracts key legal information, and automatically converts it into a **structured multi-scene explainer video**.  
The entire workflow is designed to work:
- without manual summarization  
- without manual script writing  
- without manual video editing  
- without requiring technical knowledge from the end user  

### b. Clear problem statement
Legal judgments are long, technical, and written in a style meant for legal precision, not for quick understanding. Even if a user has the judgment PDF, understanding it requires time and skill to identify and connect important legal components such as:
- **facts**  
- **legal issues**  
- **arguments from both sides**  
- **decision and reasoning (ratio)**  

The problem solved by this project is:  
**How can we automatically convert a judgment PDF into an easy-to-consume, structured multi-scene explainer video, without manual intervention?**

Without this system, users are forced to:
- spend hours reading the judgment end-to-end  
- rely on scattered online summaries that may be incomplete or inaccurate  
- miss the core legal learning due to complexity and time constraints  

### c. Real-world relevance and motivation
The problem is a real one: judgments are a primary source of legal learning, yet most audiences find them hard to consume.

**Who benefits and why this matters:**
- **Law students** need fast revision and clarity on case law outcomes and reasoning.  
- **Members of the public** want to understand judgments that affect rights, property, family matters, criminal law, etc.  
- **Lawyers and educators** need short explainers for teaching and awareness.  
- **Legal creators and platforms** need scalable ways to produce accurate and structured content quickly.  

**Why this solution is uniquely valuable:**
- Manual legal summarization + video creation is slow and not scalable.  
- Traditional notes or summaries do not provide a consistent “story-like” structured output.  
- This pipeline converts complex legal text into an explainer format that can be consumed in minutes, making legal content far more accessible.

This pipeline can directly power **LawStory AI** as a product feature:  
**Upload a judgment PDF → receive a multi-scene explainer video output automatically.**


## 2. Data Understanding & Preparation

### a. Dataset source (public / synthetic / collected / API)
This project does not use a fixed dataset like CSV files, labeled training sets, or tabular datasets.  
Instead, it uses **live real-world input** in the form of **court judgment PDFs**.

**Source of data:**
- Collected / user-provided **judgment PDF documents**
- Extracted text generated from those PDFs through Make.com scenario processing

So, the “data” here is:
- **unstructured legal text**
- extracted from real court documents
- variable in formatting and complexity

### b. Data loading and exploration
The data enters the system through **Make.com** as:

1. **Webhook input**  
   - receives the judgment PDF URL or file reference  
2. **HTTP module**  
   - fetches/downloads the judgment PDF  
3. **Custom JS module**  
   - extracts PDF → plain text  
4. The extracted text is passed forward into the AI step (Gemini)

This stage acts as a practical “exploration + validation” step where we confirm:
- the PDF was successfully fetched  
- extracted text is readable and complete  
- extracted content includes enough context to explain the case  
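The checks above are done inside the Make.com scenario, but they can be approximated in a few lines of Python. The sketch below is illustrative only: `check_extracted_text`, the `MIN_CHARS` threshold, and the "readable character" heuristic are assumptions introduced here, not part of the submitted pipeline.

```python
import re

MIN_CHARS = 200  # hypothetical threshold; tune it for the judgments you process

def check_extracted_text(text: str, min_chars: int = MIN_CHARS) -> dict:
    """Return simple health indicators for text extracted from a judgment PDF."""
    stripped = text.strip()
    # Count letters, digits, whitespace, and common punctuation as "readable".
    readable = sum(
        1 for ch in stripped
        if ch.isalnum() or ch.isspace() or ch in ".,;:()'\"-/&"
    )
    ratio = readable / len(stripped) if stripped else 0.0
    return {
        "non_empty": bool(stripped),
        "long_enough": len(stripped) >= min_chars,
        "readable_ratio": round(ratio, 3),
        # Rough proxy for "enough context": the text names a court somewhere.
        "mentions_court": bool(re.search(r"\bcourt\b", stripped, re.IGNORECASE)),
    }
```

A low `readable_ratio` or a failed `long_enough` check is a hint that the PDF may be scanned and that extraction returned little usable text.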

**Key information we specifically require from extracted text:**
- court name / bench details (if present)  
- case title / parties  
- facts of the dispute  
- legal issues framed or implied  
- arguments of both sides  
- final decision and reasoning / ratio  

### c. Cleaning, preprocessing, feature engineering
This project does not involve numeric feature engineering because it is not a classic ML training pipeline.  
Instead, it uses **LLM-ready preprocessing**.

Key preprocessing steps:
- ensure extracted text is passed as one clean input to Gemini  
- reduce formatting noise (line breaks, repeated headers/footers) indirectly via LLM structuring  
- enforce structured output in a predictable JSON schema for automation  

The most important transformation is:  
**Judgment PDF → extracted text → structured multi-scene JSON script**

The structured script contains for each scene:
- narration (voice text / spoken explanation)  
- duration_seconds  
- visual instructions (what appears on screen)
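Because every later step depends on these per-scene fields, it can help to check them before rendering. The following is a minimal sketch, assuming the key names `narration`, `duration_seconds`, and `visual` and a top-level `scenes` list; `validate_script` is a hypothetical helper, not a module in the Make scenario.

```python
REQUIRED_SCENE_KEYS = {"narration", "duration_seconds", "visual"}

def validate_script(script: dict) -> list[str]:
    """Return a list of problems found in a parsed scene script (empty if none)."""
    scenes = script.get("scenes")
    if not isinstance(scenes, list) or not scenes:
        return ["'scenes' must be a non-empty list"]
    problems = []
    for i, scene in enumerate(scenes, start=1):
        if not isinstance(scene, dict):
            problems.append(f"scene {i}: must be a JSON object")
            continue
        missing = REQUIRED_SCENE_KEYS - scene.keys()
        if missing:
            problems.append(f"scene {i}: missing {sorted(missing)}")
        duration = scene.get("duration_seconds")
        if not isinstance(duration, (int, float)) or duration <= 0:
            problems.append(f"scene {i}: duration_seconds must be a positive number")
    return problems
```

An empty list means the script has the shape the rest of the pipeline expects; any other result can be logged or used to re-prompt the model.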

### d. Handling missing values or noise (if applicable)
Yes, legal PDFs frequently contain noise or incomplete information, such as:
- irregular spacing and line breaks  
- repeated headers/footers  
- inconsistent formatting across courts  
- scanned PDFs where text extraction is weak  
- missing headings (facts/issues not clearly labeled)

This was handled practically by:
- relying on Gemini to generate clean structured JSON output  
- ensuring JSON remains valid and parseable in Make  
- prompt design that encourages complete explanation even when structure is not explicit
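One common reason model output fails to parse is that the JSON arrives wrapped in a Markdown code fence or a sentence of prose. The sketch below shows one tolerant way to parse it, assuming the model returns a single JSON object; `parse_llm_json` is a hypothetical helper and does not describe how the Make scenario handles this.

```python
import json

def parse_llm_json(raw: str) -> dict:
    """Parse model output as JSON, tolerating a code fence or surrounding prose."""
    text = raw.strip()
    # Keep only the outermost {...} span, which drops fences and stray prose.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found in model output")
    # json.loads still raises json.JSONDecodeError if the span is not valid JSON.
    return json.loads(text[start:end + 1])
```

This only strips wrapping text. It does not repair structurally broken JSON, such as a missing bracket or a trailing comma.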


In [None]:
# Sample extracted text snippet (demo placeholder)
judgment_extracted_text = """
IN THE HIGH COURT OF DELHI AT NEW DELHI
Party A v. Party B

The dispute concerns residence rights in a property claimed as a shared household.
The court examined ownership documents, contributions, and the effect of divorce on residence rights.
The appeal was dismissed.
"""

print(judgment_extracted_text)


## 3. Model / System Design

### a. AI technique used (ML / DL / NLP / LLM / Recommendation / Hybrid)
This project is an **LLM-based NLP pipeline** combined with a media rendering system.

**Tools:**
- **Google Gemini** → generates structured multi-scene video script JSON  
- **Shotstack API** → renders final video from clips and returns hosted output URL  

So, this is a:
**Hybrid AI + Media Rendering Automation system**

### b. Architecture or pipeline explanation
Final working pipeline (multi-scene, one final video URL, no audio):

1. **Webhook Trigger (Make.com)**  
2. **HTTP Module** (fetch judgment PDF)  
3. **Custom JS Module** (extract PDF → text)  
4. **Gemini Module** (generate JSON script)  
5. **Parse JSON Module**  
6. **Iterator** (1 bundle per scene)  
7. **Text Aggregator** (combine scenes into one clips array)  
8. **Shotstack HTTP POST** (render request)  
9. **Shotstack HTTP GET** (poll until done → final URL)
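The final step, polling until the render is done, can be sketched outside Make as a small loop. This is an illustration under assumptions: `fetch_status` is a hypothetical callable that performs the GET request and returns a flat dict with `status` and `url` keys, and the `"failed"` status value and the interval and attempt limits are assumed rather than taken from the scenario.

```python
import time
from typing import Callable

def wait_for_render(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    max_attempts: int = 60,
) -> str:
    """Poll a render until it reports 'done', then return the video URL."""
    for _ in range(max_attempts):
        result = fetch_status()
        status = result.get("status")
        if status == "done":
            return result["url"]
        if status == "failed":  # assumed failure status
            raise RuntimeError("render failed")
        time.sleep(interval_s)  # wait before asking again
    raise TimeoutError("render did not finish within the polling budget")
```

Passing the request in as a callable keeps the loop independent of any HTTP library, so it can be tested with a fake status sequence.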

### c. Justification of design choices
- Gemini turns unstructured legal text into a scene-by-scene script.  
- Strict JSON output lets Make parse the script reliably and pass it to later modules.  
- The Iterator and Aggregator together combine all scenes into a single Shotstack timeline.  
- Shotstack enables programmatic video rendering with a hosted output URL.
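The Iterator and Aggregator step amounts to giving each scene a start time equal to the total length of the scenes before it. The sketch below shows that idea in Python. The payload shape (a `title` asset, `timeline`, `tracks`, `clips`, `output`) is modeled on Shotstack's Edit API but should be checked against its documentation, and using `narration` as the on-screen text is an assumption; `build_timeline` is a hypothetical helper.

```python
def build_timeline(scenes: list[dict]) -> dict:
    """Turn scene dicts into one render payload with back-to-back clips."""
    clips, start = [], 0
    for scene in scenes:
        length = scene["duration_seconds"]
        clips.append({
            # Assumed asset shape: plain title text on the timeline background.
            "asset": {"type": "title", "text": scene["narration"], "style": "minimal"},
            "start": start,
            "length": length,
        })
        start += length  # the next scene begins where this one ends
    return {
        "timeline": {"background": "#000000", "tracks": [{"clips": clips}]},
        "output": {"format": "mp4", "resolution": "sd"},
    }
```

Putting all clips on one track with cumulative start times is what produces a single video rather than one render per scene.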


## 4. Core Implementation

### a. Model training / inference logic
No training was performed. This is an inference-only pipeline.

### b. Prompt engineering (for LLM-based projects)
Prompts were engineered to ensure Gemini outputs strict JSON with consistent keys and no extra formatting.
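The actual prompt is not reproduced in this notebook. As a rough illustration of the approach, a strict-JSON prompt could be assembled as below; the wording of `PROMPT_TEMPLATE` and the helper `build_prompt` are hypothetical and only mirror the keys used in the sample output.

```python
PROMPT_TEMPLATE = """You are a neutral legal explainer.
Read the judgment text below and return ONLY a JSON object, with no Markdown and no commentary.
The object must have the keys "title_frame" and "scenes".
Each item in "scenes" must have: "scene_number", "duration_seconds", "narration", "visual".
Cover the facts, the legal issues, the arguments of both sides, and the decision with its reasoning.

JUDGMENT TEXT:
{judgment_text}
"""

def build_prompt(judgment_text: str) -> str:
    """Insert the extracted judgment text into the fixed instruction template."""
    return PROMPT_TEMPLATE.format(judgment_text=judgment_text.strip())
```

Naming every required key in the prompt is what makes the downstream Parse JSON and Iterator modules predictable.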

### c. Recommendation or prediction pipeline
Not applicable (generation pipeline).

### d. Code must run top-to-bottom without errors
The submitted multi-scene pipeline runs end-to-end successfully and returns one final video URL.


In [None]:
import json

sample_gemini_output = {
  "title_frame": {
    "court": "High Court of Delhi at New Delhi",
    "case_title": "Party A v. Party B",
    "year": "2025",
    "citation": "Sample Citation",
    "coram": "Sample Coram"
  },
  "scenes": [
    {
      "scene_number": 1,
      "duration_seconds": 15,
      "narration": "This case explains whether Party A can continue living in a property claimed as a shared household.",
      "visual": "White text on black background: Scene 1 overview"
    }
  ]
}

print(json.dumps(sample_gemini_output, indent=2))


## 5. Evaluation & Analysis

### a. Metrics used (quantitative or qualitative)
The pipeline was evaluated with functional and qualitative checks: render success, multi-scene correctness, timing correctness, and content clarity.
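Multi-scene and timing correctness can be checked mechanically by comparing the scene script with the clips sent for rendering. A minimal sketch, assuming each clip is a dict with `start` and `length` in seconds; `check_timing` is a hypothetical helper, not something the submitted pipeline runs.

```python
def check_timing(scenes: list[dict], clips: list[dict]) -> dict:
    """Compare clip timings against the scene script."""
    expected_total = sum(s["duration_seconds"] for s in scenes)
    # Each clip should end exactly where the next one starts.
    contiguous = all(
        clips[i]["start"] + clips[i]["length"] == clips[i + 1]["start"]
        for i in range(len(clips) - 1)
    )
    actual_total = clips[-1]["start"] + clips[-1]["length"] if clips else 0
    return {
        "scene_count_matches": len(scenes) == len(clips),
        "clips_contiguous": contiguous,
        "total_matches": expected_total == actual_total,
    }
```

Content clarity remains a qualitative judgment and is not covered by a check like this.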

### b. Sample outputs / predictions
Produces one render ID and one final MP4 URL containing all scenes.

### c. Performance analysis and limitations
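Strength: the pipeline is automated end-to-end, from PDF input to final video URL.  
Limitations: audio is not included yet, and the visuals are minimal (white text on a black background).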


In [None]:
import json

# Sample render result (demo placeholder values)
sample_shotstack_response = {
  "render_id": "render_1234567890",
  "status": "done",
  "final_video_url": "https://example.com/final_video.mp4"
}

print(json.dumps(sample_shotstack_response, indent=2))


## 6. Ethical Considerations & Responsible AI

### a. Bias and fairness considerations
Summaries may introduce bias; this is mitigated by keeping a neutral tone and explaining both sides in a balanced way.

### b. Dataset limitations
Judgment PDFs vary in quality and structure; scanned PDFs reduce extraction quality.

### c. Responsible use of AI tools
Educational use only; not a substitute for original judgments or legal advice.


## 7. Conclusion & Future Scope

### a. Summary of results
Built a working pipeline that converts judgment PDFs into a multi-scene Shotstack video URL (no audio yet).

### b. Possible improvements and extensions
Planned improvements include ElevenLabs voiceover, multilingual translation, subtitles, richer visuals, background music, auto-polling, JSON repair, and Bubble DB storage.
