# Multimodal Analysis of Spacetech Designs for SSLV Using Florence-2

This Jupyter Notebook demonstrates multimodal analysis of Small Satellite Launch Vehicle (SSLV) designs using **Florence-2**, a compact vision-language model from Microsoft (described in a November 2023 paper, with open weights released on Hugging Face in June 2024). The demo is tailored for spacetech applications, focusing on Tamil Nadu's growing spacetech ecosystem (e.g., startups like Agnikul Cosmos, ToSpace). We perform tasks such as:
- **Image Captioning**: Describe SSLV components.
- **Object Detection**: Identify parts like nozzles or fairings.
- **Segmentation**: Isolate components for quality control.
- **Visual Question Answering (VQA)**: Answer design-related questions.

## Objectives
- Showcase Florence-2's capabilities in spacetech design analysis.
- Integrate OpenCV for image processing and PyTorch for model inference.
- Use datasets like SPEED+ and synthetic SSLV images.
- Address practical problems: quality control, mission planning, design validation.

## Prerequisites
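- Python 3.9+
- Libraries: `torch`, `torchvision`, `transformers`, `diffusers`, `datasets`, `opencv-python`, `pillow`, `matplotlib`, plus `timm` and `einops` (required by Florence-2's remote code), `accelerate` (for CPU offloading) and `sentencepiece`/`protobuf` (often needed for the T5 tokenizer in Stable Diffusion 3.5)
- GPU strongly recommended (e.g., Google Colab Pro or a local NVIDIA GPU); Stable Diffusion 3.5 Large has roughly 8B parameters and needs a high-memory GPU or CPU offloading
- Hugging Face account and access token; Stable Diffusion 3.5 Large is a gated model, so accept its license on the model page before downloading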

## Datasets
- **SPEED+**: Stanford's Spacecraft Pose Estimation Dataset (synthetic and hardware-in-the-loop images of a satellite, built for pose estimation). It contains no SSLV imagery, so here it serves as a proxy for real space hardware. Available from the Stanford Digital Repository.
- **Synthetic SSLV Images**: Generated using Stable Diffusion 3.5 with prompts like "SSLV rocket nozzle in space".
- Optional: Public satellite imagery (e.g., NASA's Earth Observatory) for context.

## Setup
Install dependencies and authenticate with Hugging Face.

In [None]:
!pip install -U torch torchvision opencv-python transformers diffusers datasets pillow matplotlib timm einops accelerate sentencepiece protobuf
from huggingface_hub import login
login("your_huggingface_token")  # Replace with your token
import torch
import cv2
import numpy as np
from PIL import Image
import requests
from transformers import AutoProcessor, AutoModelForCausalLM
from diffusers import StableDiffusionPipeline
from datasets import load_dataset
import matplotlib.pyplot as plt
%matplotlib inline

## 1. Load Florence-2 Model
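Florence-2 is a compact vision-language model (about 0.23B parameters for Florence-2-base and 0.77B for Florence-2-large) trained on the FLD-5B dataset (126M images, 5.4B annotations). It is prompt-based: a special task token in the text prompt, such as `<CAPTION>` or `<OD>`, selects the task, covering captioning, object detection, phrase grounding, segmentation, OCR, and more.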
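Some task tokens stand alone, while others expect text appended directly after them. The hypothetical validator below (the helper name is ours, and the token sets are a subset transcribed from the Florence-2 model card and sample notebook) makes that distinction explicit and catches unsupported tokens before any model call:

```python
from typing import Optional

# Subset of Florence-2 task tokens (assumption: transcribed from the model card, not exhaustive)
TASKS_WITHOUT_INPUT = {
    "<CAPTION>", "<DETAILED_CAPTION>", "<MORE_DETAILED_CAPTION>",
    "<OD>", "<DENSE_REGION_CAPTION>", "<REGION_PROPOSAL>", "<OCR>", "<OCR_WITH_REGION>",
}
TASKS_WITH_INPUT = {
    "<CAPTION_TO_PHRASE_GROUNDING>", "<REFERRING_EXPRESSION_SEGMENTATION>",
    "<REGION_TO_SEGMENTATION>", "<OPEN_VOCABULARY_DETECTION>",
    "<REGION_TO_CATEGORY>", "<REGION_TO_DESCRIPTION>",
}


def build_florence_prompt(task_token: str, text_input: Optional[str] = None) -> str:
    """Hypothetical helper: return a Florence-2 prompt, rejecting invalid token/input combinations."""
    if task_token in TASKS_WITHOUT_INPUT:
        if text_input:
            raise ValueError(f"{task_token} takes no text input")
        return task_token
    if task_token in TASKS_WITH_INPUT:
        if not text_input:
            raise ValueError(f"{task_token} requires a text input")
        # Florence-2 expects the extra text appended directly after the token
        return task_token + text_input
    raise ValueError(f"Unknown Florence-2 task token: {task_token}")
```

For example, `build_florence_prompt("<CAPTION_TO_PHRASE_GROUNDING>", "a rocket nozzle")` yields `"<CAPTION_TO_PHRASE_GROUNDING>a rocket nozzle"`, while passing text with `<OD>` raises a `ValueError`.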

In [None]:
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Florence-2-large",
    torch_dtype=torch_dtype,
    trust_remote_code=True
).to(device)
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-large",
    trust_remote_code=True
)

## 2. Generate Synthetic SSLV Images
Because publicly available SSLV imagery is scarce, we use **Stable Diffusion 3.5 Large** to generate synthetic images of SSLV components (e.g., rocket nozzle, fairing). These are illustrative renders for exercising the analysis pipeline, not engineering-accurate designs.
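To build a small synthetic set covering several components reproducibly, one option is to pair each prompt with a fixed seed. This is a sketch; the helper name, the component list, and the prompt template are illustrative assumptions:

```python
from typing import List, Tuple

# Illustrative component list (assumption, not an official SSLV breakdown)
COMPONENTS = ["rocket nozzle", "payload fairing", "interstage section"]


def build_sslv_prompts(components: List[str], base_seed: int = 0) -> List[Tuple[str, str, int]]:
    """Hypothetical helper: one (component, prompt, seed) job per component, with consecutive seeds."""
    jobs = []
    for i, component in enumerate(components):
        prompt = f"A detailed SSLV {component} in space, high-resolution, realistic, with metallic texture"
        jobs.append((component, prompt, base_seed + i))
    return jobs


# Usage with the diffusers pipeline (needs a GPU, so left as comments):
# for component, prompt, seed in build_sslv_prompts(COMPONENTS):
#     generator = torch.Generator("cpu").manual_seed(seed)  # fixed seed -> repeatable image
#     img = pipe(prompt, generator=generator).images[0]
#     img.save(f"sslv_{component.replace(' ', '_')}.png")
```

Recording the seed alongside each prompt means any render used in a later analysis can be regenerated exactly.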

In [None]:
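from diffusers import StableDiffusion3Pipeline  # SD 3.x checkpoints need this class, not StableDiffusionPipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32  # bfloat16 as on the model card
)
if torch.cuda.is_available():
    pipe.enable_model_cpu_offload()  # keeps idle sub-models on CPU to lower peak GPU memory
else:
    pipe = pipe.to("cpu")  # works, but very slow for an ~8B-parameter model

prompt = "A detailed SSLV rocket nozzle in space, high-resolution, realistic, with metallic texture"
# 28 steps / guidance 3.5 follow the SD 3.5 Large model card example
image = pipe(prompt, num_inference_steps=28, guidance_scale=3.5).images[0]
image.save("sslv_nozzle.png")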

# Display synthetic image
plt.imshow(image)
plt.axis('off')
plt.title("Synthetic SSLV Nozzle")
plt.show()

## 3. Load SPEED+ Dataset
The SPEED+ dataset contains images of a spacecraft (synthetic renders plus hardware-in-the-loop photos of a physical mockup). It has no launch-vehicle imagery, so it serves here as a proxy for real space hardware. We load a sample image for demonstration.
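Once SPEED+ has been downloaded and extracted locally, a sample can be picked from disk instead of the placeholder. A minimal sketch, assuming the images are `.jpg`/`.png` files somewhere under a directory you choose; the directory name in the usage note is hypothetical, not the dataset's documented layout:

```python
from pathlib import Path
from typing import Iterable, Optional


def find_first_image(root: str, patterns: Iterable[str] = ("*.jpg", "*.png")) -> Optional[Path]:
    """Return the first image file (in sorted path order) found recursively under `root`, or None."""
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    candidates = sorted(p for pattern in patterns for p in root_path.rglob(pattern))
    return candidates[0] if candidates else None
```

Usage: `path = find_first_image("speedplus/synthetic")`, then `image = Image.open(path).convert("RGB")` if `path` is not `None`, otherwise fall back to the placeholder URL.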

In [None]:
# SPEED+ must be downloaded separately from the Stanford Digital Repository.
# For this demo we load a generic placeholder image (a car) from the Hugging Face documentation assets.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Placeholder: Replace with SPEED+ image of a satellite component
plt.imshow(image)
plt.axis('off')
plt.title("SPEED+ Sample Image (Placeholder)")
plt.show()

## 4. Multimodal Analysis with Florence-2
We perform four tasks to analyze SSLV designs:
- **Image Captioning**: Describe the component.
- **Object Detection**: Identify parts like nozzles or panels.
- **Segmentation**: Isolate components for quality control.
- **VQA**: Answer design-related questions.

### Helper Function
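Define a helper that builds the prompt, runs generation, and parses the output with the processor's task-specific post-processing, plus a converter from PIL to OpenCV images.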

In [None]:
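def run_florence_task(image, task_prompt, text_input=None):
    """Run one Florence-2 task; returns the parsed result as {task_prompt: result}."""
    image = image.convert("RGB")  # the processor expects 3-channel RGB input
    # Tasks such as <CAPTION_TO_PHRASE_GROUNDING> take extra text appended after the task token
    prompt = task_prompt if text_input is None else task_prompt + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, torch_dtype)
    with torch.inference_mode():  # no gradients needed for inference
        generated_ids = model.generate(
            input_ids=inputs["input_ids"],
            pixel_values=inputs["pixel_values"],
            max_new_tokens=1024,
            num_beams=3
        )
    # Keep special tokens: post-processing needs the <loc_*> tokens to recover boxes and polygons
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed_answer = processor.post_process_generation(
        generated_text,
        task=task_prompt,
        image_size=(image.width, image.height)
    )
    return parsed_answer

# Convert PIL Image (RGB) to OpenCV format (BGR) for drawing
def pil_to_cv2(image):
    return cv2.cvtColor(np.array(image.convert("RGB")), cv2.COLOR_RGB2BGR)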

### 4.1 Image Captioning
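Generate a short description of the SSLV nozzle. Florence-2 also offers `<DETAILED_CAPTION>` and `<MORE_DETAILED_CAPTION>` for longer descriptions; the parsed result is a dict keyed by the task token.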
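`cv2.putText` draws a single line and simply clips text that runs past the image edge, so longer captions get cut off. A minimal workaround sketch (hypothetical helper; wrapping by character count is a rough proxy, and `cv2.getTextSize` could be used to measure pixel width exactly):

```python
import textwrap
from typing import List, Tuple


def caption_lines(text: str, width_chars: int = 40, start_y: int = 30,
                  line_height: int = 28) -> List[Tuple[str, int]]:
    """Split `text` into lines of at most `width_chars` characters, each paired with a y-coordinate."""
    lines = textwrap.wrap(text, width=width_chars)
    return [(line, start_y + i * line_height) for i, line in enumerate(lines)]


# Usage with OpenCV (left as comments):
# for line, y in caption_lines(caption):
#     cv2.putText(cv_image, line, (10, y), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
```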

In [None]:
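sslv_image = Image.open("sslv_nozzle.png").convert("RGB")
result = run_florence_task(sslv_image, "<CAPTION>")
caption = result["<CAPTION>"]  # parsed output is a dict keyed by the task token
print("Caption:", caption)

# Overlay the caption with OpenCV; display with matplotlib because cv2.imshow does not work in Colab/Jupyter
cv_image = pil_to_cv2(sslv_image)
cv2.putText(cv_image, caption, (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
plt.imshow(cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.title("Captioned Image")
plt.show()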

### 4.2 Object Detection
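Detect components like the nozzle or structural elements. Florence-2's `<OD>` task takes no text input (the processor raises an error if text follows it), so to look for specific, named components we use phrase grounding (`<CAPTION_TO_PHRASE_GROUNDING>`), which returns a box and a label for each phrase it can locate.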
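Detection-style outputs such as `<OD>` and `<CAPTION_TO_PHRASE_GROUNDING>` are dicts with parallel `bboxes` (`[x1, y1, x2, y2]` floats in pixels) and `labels` lists. Before drawing or cropping, it can help to round, order, and clamp the boxes to the image; a minimal sketch with hypothetical helper names:

```python
from typing import Dict, List, Tuple


def _clamp(value: float, low: int, high: int) -> int:
    """Round to the nearest pixel and clamp into [low, high]."""
    return int(min(max(round(value), low), high))


def to_pixel_boxes(result: Dict, width: int, height: int) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Pair each box with its label, with coordinates rounded, ordered, and clamped to the image."""
    boxes = []
    for (x1, y1, x2, y2), label in zip(result["bboxes"], result["labels"]):
        xa, xb = sorted((_clamp(x1, 0, width - 1), _clamp(x2, 0, width - 1)))
        ya, yb = sorted((_clamp(y1, 0, height - 1), _clamp(y2, 0, height - 1)))
        boxes.append((label, (xa, ya, xb, yb)))
    return boxes
```

Usage: `to_pixel_boxes(detection["<CAPTION_TO_PHRASE_GROUNDING>"], sslv_image.width, sslv_image.height)`.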

In [None]:
task = "<CAPTION_TO_PHRASE_GROUNDING>"
detection = run_florence_task(sslv_image, task, "a rocket nozzle and a structural frame")
print("Detected Objects:", detection)

# Draw each grounded box with the label Florence-2 returned for it
cv_image = pil_to_cv2(sslv_image)
result = detection[task]
for box, label in zip(result['bboxes'], result['labels']):
    x1, y1, x2, y2 = map(int, box)
    cv2.rectangle(cv_image, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(cv_image, label, (x1, max(y1 - 10, 20)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

plt.imshow(cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.title("Object Detection")
plt.show()

### 4.3 Segmentation
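Segment the nozzle for quality control. Florence-2 has no generic `<SEGMENTATION>` task; `<REFERRING_EXPRESSION_SEGMENTATION>` segments the region described by a text phrase and returns polygons (not raster masks), which we rasterize with OpenCV. The resulting mask isolates the nozzle for downstream checks such as defect inspection; Florence-2 itself does not detect defects.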
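A simple quality-control signal is how much of the frame the segmented part covers: a mask far outside the expected range suggests the segmentation (or the part) needs review. This sketch computes the area of Florence-2's polygon output (a list per instance of flat `[x1, y1, x2, y2, ...]` lists) with the shoelace formula; the helper names and the idea of thresholding the fraction are illustrative assumptions, and overlapping polygons would be double-counted:

```python
from typing import List


def polygon_area(flat_xy: List[float]) -> float:
    """Shoelace formula for a simple polygon given as a flat [x1, y1, x2, y2, ...] list."""
    xs, ys = flat_xy[0::2], flat_xy[1::2]
    n = len(xs)
    if n < 3:  # fewer than three vertices enclose no area
        return 0.0
    s = 0.0
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the polygon
        s += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(s) / 2.0


def mask_area_fraction(polygons: List[List[List[float]]], width: int, height: int) -> float:
    """Fraction of the image covered by all polygons (assumes they do not overlap)."""
    total = sum(polygon_area(p) for instance in polygons for p in instance)
    return total / float(width * height)
```

Usage: `mask_area_fraction(segmentation["<REFERRING_EXPRESSION_SEGMENTATION>"]["polygons"], sslv_image.width, sslv_image.height)`. Counting nonzero pixels of the rasterized mask with `np.count_nonzero` gives a similar figure that also handles overlaps.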

In [None]:
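task = "<REFERRING_EXPRESSION_SEGMENTATION>"
segmentation = run_florence_task(sslv_image, task, "rocket nozzle")
print("Segmentation polygons:", segmentation)

# Rasterize the returned polygons (flat [x1, y1, x2, y2, ...] lists in pixels) into a binary mask
cv_image = pil_to_cv2(sslv_image)
mask = np.zeros(cv_image.shape[:2], dtype=np.uint8)
for instance_polygons in segmentation[task]['polygons']:
    for polygon in instance_polygons:
        pts = np.array(polygon, dtype=np.float32).reshape(-1, 2).round().astype(np.int32)
        if len(pts) >= 3:  # skip degenerate polygons
            cv2.fillPoly(mask, [pts], 255)
masked_image = cv2.bitwise_and(cv_image, cv_image, mask=mask)

plt.imshow(cv2.cvtColor(masked_image, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.title("Segmented Nozzle")
plt.show()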

### 4.4 Visual Question Answering (VQA)
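Ask a design-related question about the SSLV component. Caveat: the released Florence-2 checkpoints do not define a `<VQA>` task token, so the processor passes this prompt through as plain text and the answer is unreliable unless the model is fine-tuned for VQA. A material question is also hard to answer from pixels alone, so treat the output as a visual guess, not a specification.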
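For design reviews it can be convenient to run a fixed checklist of questions and collect the answers in one place. A minimal sketch; the helper name and the example questions are hypothetical, the task runner is passed in so any model can be plugged in, and the reliability caveat for `<VQA>` still applies:

```python
from typing import Callable, Dict, List


def ask_checklist(image, questions: List[str], run_task: Callable, task: str = "<VQA>") -> Dict[str, str]:
    """Ask each question in turn; returns {question: answer_text}.

    `run_task(image, task, question)` must return a dict keyed by `task`,
    as run_florence_task does.
    """
    answers = {}
    for question in questions:
        result = run_task(image, task, question)
        answers[question] = str(result.get(task, "")).strip()
    return answers
```

Usage: `ask_checklist(sslv_image, ["Is the nozzle fully visible?", "Is the nozzle surface damaged?"], run_florence_task)`. Questions about visible attributes suit an image model better than questions about material or internal properties.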

In [None]:
question = "What material is the rocket nozzle made of?"
result = run_florence_task(sslv_image, "<VQA>", question)
vqa_answer = result["<VQA>"]  # post-processing returns {task_prompt: answer_text}
print("VQA Answer:", vqa_answer)

# Overlay question and answer, then display with matplotlib
cv_image = pil_to_cv2(sslv_image)
cv2.putText(cv_image, f"Q: {question}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
cv2.putText(cv_image, f"A: {vqa_answer}", (10, 60), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)

plt.imshow(cv2.cvtColor(cv_image, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.title("VQA Result")
plt.show()

## 5. Practical Applications in Spacetech
- **Quality Control**: Object detection and segmentation localize SSLV components so that inspection (by engineers or a dedicated defect-detection model) can focus on them, e.g., checking nozzles for cracks, supporting Tamil Nadu's Space Bays manufacturing.
- **Mission Planning**: Captioning and VQA can summarize the visual content of design images for reviews; physical properties such as thermal behavior cannot be read from images and still require simulation or testing.
- **Design Validation**: Synthetic images let teams build and tune the analysis pipeline before real prototype imagery exists, which could reduce costs for startups like Agnikul Cosmos.
- **Automation**: Multimodal analysis could automate parts of visual inspection in Kulasekarapattinam spaceport operations.
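As one concrete quality-control heuristic, the labels returned by phrase grounding can be compared against a list of components that should appear in an image, flagging any that were not found for human review. This is a hypothetical sketch, not a validated QC procedure; matching is a simple case-insensitive substring test:

```python
from typing import Dict, Iterable, List


def missing_components(result: Dict, expected: Iterable[str]) -> List[str]:
    """Return expected component names that no returned label mentions (case-insensitive)."""
    labels = [label.lower() for label in result.get("labels", [])]
    return [name for name in expected if not any(name.lower() in label for label in labels)]
```

Usage: `missing_components(detection["<CAPTION_TO_PHRASE_GROUNDING>"], ["nozzle", "frame"])` returns an empty list when both were grounded.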

## 6. Challenges and Future Work
- **Data Scarcity**: Real SSLV images are limited; synthetic datasets need validation.
- **Model Fine-Tuning**: Fine-tune Florence-2 on spacetech-specific datasets for better accuracy.
- **Real-Time Inference**: Optimize for real-time analysis in manufacturing.
- **Integration with CFD**: Combine with Computational Fluid Dynamics (CFD) for aerodynamic analysis (future scope).

## 7. Conclusion
This notebook demonstrates Florence-2's multimodal capabilities for SSLV design analysis, integrating OpenCV, PyTorch, and Hugging Face tools. It sketches how such tools could address spacetech challenges in Tamil Nadu's ecosystem and support startups and ISRO's initiatives. Extend this demo by:
- Using real SPEED+ images.
- Fine-tuning Florence-2 on custom SSLV datasets.
- Adding CFD simulations for aerodynamic validation.

## References
- Florence-2: https://huggingface.co/microsoft/Florence-2-large
- SPEED+: Stanford Digital Repository
- Stable Diffusion 3.5: https://huggingface.co/stabilityai/stable-diffusion-3.5-large
- Tamil Nadu Space Policy: https://startuptn.in/