# What Is Modal?
- Modal is a cloud-native, serverless compute platform designed specifically for AI and data teams. 
- It lets you "bring your own code" and run CPU-, GPU-, and memory-intensive workloads at scale without managing infrastructure.

# Core Features

- Sub-second container starts
  - Rust-based container stack for lightning-fast cold boots.
- Seamless autoscaling
  - Scales from zero to thousands of GPUs (or CPU nodes) to handle unpredictable loads.
- Fast model loading
  - Optimized file system loads gigabytes of model weights in seconds.
- Bring-your-own code & frameworks
  - Deploy custom models, Hugging Face pipelines, or any Python/Rust/C++ code without changing your codebase.
- Flexible environments
  - Use prebuilt Python containers or supply your own Docker images; provision A100/H100 GPUs on demand.
- Integrated data volumes
  - Mount S3, R2, or other object storage as persistent volumes for datasets, model weights, or experiment outputs. 
- Observability & integrations
  - Export logs/traces via OpenTelemetry to Datadog, New Relic, etc., for real-time monitoring.

# Architecture & Workflow


- Define a Modal Function
  - Create a `modal.App` and decorate Python functions (or methods) with `@app.function()` to specify compute resources (CPU/GPU, memory, timeout).
- Local Testing
  - Invoke the same functions locally for rapid iteration.
- Deployment
  - Push to Modal with a single CLI command; the platform packages your code, dependencies, and runtime.
- Execution
  - Modal schedules workloads in its serverless fleet, spins up containers on demand, runs your code, then scales to zero.
- Scaling & Load Balancing
  - Autoscaling across containers abstracts away horizontal scaling complexities.

# Typical Use Cases

- LLM Inference
  - Deploy chatbots or embedding services using custom or open-source language models.
- Fine-tuning / Training
  - Run hyperparameter sweeps on A100/H100 GPUs without queue times; pay per second.
- Batch Data Processing
  - Fan-out parallel jobs for dataset preprocessing, feature extraction, or vector indexing.
- Research & Prototyping
  - Spin up experiments quickly, test new architectures without worrying about infra setup.

# Pricing & Credits

- Pay-as-you-go: billed by the second for CPU, GPU, and memory usage.
- Free Tier & Credits: startups can apply for up to $50K in free credits; the personal trial includes free hours of CPU/GPU.
- Cost Efficiency: no idle-resource charges, because containers scale to zero when idle.
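
To make per-second billing concrete, here is a small arithmetic sketch. The rate is a made-up placeholder and not an actual Modal price, and `estimate_cost` is an illustrative helper rather than anything from the Modal SDK.

```python
def estimate_cost(seconds: float, rate_per_second: float, containers: int = 1) -> float:
    """Cost of a run billed per second: duration x rate x number of containers."""
    if seconds < 0 or rate_per_second < 0 or containers < 0:
        raise ValueError("inputs must be non-negative")
    return seconds * rate_per_second * containers


# Hypothetical rate in dollars per GPU-second (NOT a real Modal price).
HYPOTHETICAL_GPU_RATE = 0.001

# A 90-second job on 4 containers; nothing accrues once they scale to zero.
cost = estimate_cost(seconds=90, rate_per_second=HYPOTHETICAL_GPU_RATE, containers=4)
print(f"${cost:.2f}")  # prints $0.36
```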

# Example Usage

In [1]:
import modal

Set up your Modal tokens. The cell below runs `modal setup`, exactly as you would from the command line; it connects to Modal and stores an API token on your machine.

In [None]:
!modal setup

In [None]:
!modal token new

Copy the token ID and token secret from the Modal config file (typically `~/.modal.toml`) into your `.env` file as `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET`.

In [2]:
from dotenv import load_dotenv
load_dotenv()

True
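
A quick sanity check after loading `.env` can confirm that the credentials are visible to the process. This is a sketch that assumes the tokens are stored under the names `MODAL_TOKEN_ID` and `MODAL_TOKEN_SECRET`; `missing_modal_vars` is a hypothetical helper, not part of Modal.

```python
import os

# Environment variable names assumed to hold the Modal credentials.
REQUIRED_VARS = ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def missing_modal_vars(env=os.environ) -> list[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_modal_vars()
if missing:
    print(f"Missing from environment: {', '.join(missing)}")
else:
    print("Modal credentials found in environment.")
```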

In [3]:
from hello import app, hello

In [None]:
with app.run():
    reply = hello.local() # This will run the hello function locally
reply

In [None]:
with app.run():
    reply = hello.remote()  # This will run the hello function in the Modal cloud
reply

Register your Hugging Face token as a secret on `modal.com`.

In [4]:
# First check if you can access the Hugging Face API
import requests
import os

token = os.getenv("HF_TOKEN")
headers = {"Authorization": f"Bearer {token}"}
response = requests.get(
    "https://huggingface.co/meta-llama/Meta-Llama-3.1-8B/resolve/main/config.json",
    headers=headers
)
print(f"Status: {response.status_code}")
print(f"Response: {response.text[:200]}")

Status: 200
Response: {
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128001,
  "hidden_act": "silu",
  "hidden_size": 4


In [5]:
from llama import app, generate

Troubleshooting, in case you get errors:
- Ensure your Hugging Face account has been granted access to the model repository (gated models require approval).
- The Hugging Face token must have `Read` permission.
- Restart the notebook kernel.
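
When the access check fails, the status code usually points at which of these items to look at. The mapping below is a sketch based on ordinary HTTP semantics; the exact codes the Hugging Face Hub returns for gated or private repositories may differ, and `explain_hf_status` is a hypothetical helper.

```python
def explain_hf_status(status_code: int) -> str:
    """Translate a status code from the access check into a likely cause."""
    # Assumed meanings, following general HTTP conventions.
    hints = {
        200: "OK: the token can read this repository.",
        401: "Unauthorized: HF_TOKEN is missing, malformed, or expired.",
        403: "Forbidden: the token is valid, but access to the repository has not been granted.",
        404: "Not found: check the repository name and file path.",
    }
    return hints.get(
        status_code,
        f"Unexpected status {status_code}; inspect the response body.",
    )


print(explain_hf_status(403))
```

In the notebook, this could be called as `explain_hf_status(response.status_code)` right after the request.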

In [6]:
with modal.enable_output():
    with app.run():
        result = generate.remote("Life is a mystery, everyone must stand alone, I hear")
result

'<|begin_of_text|>Life is a mystery, everyone must stand alone, I hear you call my name,'