diff --git a/docs/llms.txt b/docs/llms.txt
index f7f8a1b5c4..fdc8883e54 100644
--- a/docs/llms.txt
+++ b/docs/llms.txt
@@ -57,6 +57,8 @@ class Predictor(BasePredictor):
         return postprocess(output)
 ```
 
+In the above we accept a path to the image as an input, and return a path to our transformed image after running it through our model.
+
 Now, you can run predictions on this model:
 
 ```console
@@ -206,6 +208,422 @@ Otherwise, you can upgrade to the latest version by running the same commands yo
 
 
 
+# CLI
+
+Cog provides a command-line interface for building, running, and deploying machine learning models.
+
+## Overview
+
+The Cog CLI follows this general pattern:
+
+```
+cog [global-options] <command> [command-options] [arguments]
+```
+
+For help with any command, use the `--help` flag:
+
+```bash
+cog --help
+cog build --help
+```
+
+## Global Options
+
+These options are available for all commands:
+
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `--debug` | bool | false | Show debugging output |
+| `--version` | bool | false | Show version of Cog |
+
+## Commands
+
+### cog init
+
+Initialize a new Cog project in the current directory.
+
+```
+cog init
+```
+
+This command creates:
+- `cog.yaml` - Configuration file defining the environment
+- `predict.py` - Python file with a basic prediction model template
+- `requirements.txt` - Python dependencies file
+
+**Examples:**
+
+```bash
+# Initialize a new project
+cog init
+
+# The created files provide a starting template
+ls
+# cog.yaml predict.py requirements.txt
+```
+
+### cog build
+
+Build a Docker image from a `cog.yaml` configuration file.
+
+```
+cog build [options]
+```
+
+**Flags:**
+
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `-t, --tag` | string | | A name for the built image in the form 'repository:tag' |
+| `--progress` | string | auto | Set type of build progress output: 'auto', 'tty', or 'plain' |
+| `--secret` | string[] | | Secrets to pass to the build environment in the form 'id=foo,src=/path/to/file' |
+| `--no-cache` | bool | false | Do not use cache when building the image |
+| `--separate-weights` | bool | false | Separate model weights from code in image layers |
+| `--openapi-schema` | string | | Load OpenAPI schema from a file |
+| `--use-cuda-base-image` | string | auto | Use Nvidia CUDA base image: 'true', 'false', or 'auto' |
+| `--use-cog-base-image` | bool | true | Use pre-built Cog base image for faster cold boots |
+| `-f` | string | cog.yaml | The name of the config file |
+
+**Examples:**
+
+```bash
+# Build with default settings
+cog build
+
+# Build with a custom tag
+cog build -t my-model:latest
+
+# Build without cache
+cog build --no-cache
+
+# Build with separated weights for faster deploys
+cog build --separate-weights -t my-model:v1
+
+# Build without CUDA for smaller images (non-GPU models)
+cog build --use-cuda-base-image=false
+```
+
+### cog predict
+
+Run a prediction on a model.
+
+```
+cog predict [image] [options]
+```
+
+If an image is specified, it runs predictions on that Docker image. Otherwise, it builds the model in the current directory and runs predictions on it.
+
+**Flags:**
+
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `-i, --input` | string[] | | Inputs in the form name=value. Use @filename to read from a file |
+| `-o, --output` | string | | Output path |
+| `-e, --env` | string[] | | Environment variables in the form name=value |
+| `--json` | string | | Pass inputs as JSON object from file (@inputs.json) or stdin (@-) |
+| `--use-replicate-token` | bool | false | Pass REPLICATE_API_TOKEN from local environment |
+| `--setup-timeout` | uint32 | 300 | Timeout for container setup in seconds |
+| `--gpus` | string | | GPU devices to add to the container |
+| `--use-cuda-base-image` | string | auto | Use Nvidia CUDA base image |
+| `--use-cog-base-image` | bool | true | Use pre-built Cog base image |
+| `--progress` | string | auto | Set type of build progress output |
+| `-f` | string | cog.yaml | The name of the config file |
+
+**Examples:**
+
+```bash
+# Run prediction with inputs
+cog predict -i image=@input.jpg -i scale=2
+
+# Run prediction with output path
+cog predict -i image=@photo.png -o output.png
+
+# Run prediction with JSON input from file
+echo '{"image": "@input.jpg", "scale": 2}' > inputs.json
+cog predict --json @inputs.json
+
+# Run prediction with JSON input from stdin
+echo '{"image": "@input.jpg", "scale": 2}' | cog predict --json @-
+
+# Run prediction on specific image
+cog predict my-model:latest -i text="Hello world"
+
+# Run with environment variables
+cog predict -e API_KEY=secret -i prompt="Generate text"
+
+# Run with specific GPU
+cog predict --gpus 0 -i image=@input.jpg
+```
+
+### cog run
+
+Run a command inside a Docker environment defined by Cog.
+
+```
+cog run [options] <command> [args...]
+```
+
+**Flags:**
+
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `-p, --publish` | string[] | | Publish a container's port to the host (e.g., -p 8000) |
+| `-e, --env` | string[] | | Environment variables in the form name=value |
+| `--gpus` | string | | GPU devices to add to the container |
+| `--progress` | string | auto | Set type of build progress output |
+| `--use-cuda-base-image` | string | auto | Use Nvidia CUDA base image |
+| `--use-cog-base-image` | bool | true | Use pre-built Cog base image |
+| `-f` | string | cog.yaml | The name of the config file |
+
+**Examples:**
+
+```bash
+# Run Python interpreter
+cog run python
+
+# Run a Python script
+cog run python train.py
+
+# Run with environment variables
+cog run -e API_KEY=secret python script.py
+
+# Run with published ports
+cog run -p 8888 jupyter notebook
+
+# Run with GPU access
+cog run --gpus all python gpu_test.py
+
+# Run bash commands
+cog run ls -la
+cog run bash -c "echo Hello && python --version"
+```
+
+### cog serve
+
+Run the cog HTTP server locally.
+
+```
+cog serve [options]
+```
+
+Generates and runs an HTTP server based on the model's declared inputs and outputs.
+
+**Flags:**
+
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `-p, --port` | int | 8393 | Port on which to listen |
+| `--gpus` | string | | GPU devices to add to the container |
+| `--progress` | string | auto | Set type of build progress output |
+| `--use-cuda-base-image` | string | auto | Use Nvidia CUDA base image |
+| `--use-cog-base-image` | bool | true | Use pre-built Cog base image |
+| `-f` | string | cog.yaml | The name of the config file |
+
+**Examples:**
+
+```bash
+# Start server on default port
+cog serve
+
+# Start server on custom port
+cog serve -p 5000
+
+# Start server with GPU
+cog serve --gpus all
+
+# Test the server
+curl http://localhost:8393/predictions -X POST \
+  -H 'Content-Type: application/json' \
+  -d '{"input": {"text": "Hello"}}'
+```
+
+### cog push
+
+Build and push a model to a Docker registry.
+
+```
+cog push [IMAGE]
+```
+
+**Flags:**
+
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `--secret` | string[] | | Secrets to pass to the build environment |
+| `--no-cache` | bool | false | Do not use cache when building |
+| `--separate-weights` | bool | false | Separate model weights from code |
+| `--openapi-schema` | string | | Load OpenAPI schema from a file |
+| `--use-cuda-base-image` | string | auto | Use Nvidia CUDA base image |
+| `--use-cog-base-image` | bool | true | Use pre-built Cog base image |
+| `--progress` | string | auto | Set type of build progress output |
+| `-f` | string | cog.yaml | The name of the config file |
+
+**Examples:**
+
+```bash
+# Push to Replicate
+cog push r8.im/username/model-name
+
+# Push with separated weights
+cog push r8.im/username/model-name --separate-weights
+
+# Push without cache
+cog push r8.im/username/model-name --no-cache
+```
+
+### cog login
+
+Log in to the Replicate Docker registry.
+
+```
+cog login [options]
+```
+
+**Flags:**
+
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `--token-stdin` | bool | false | Pass login token on stdin instead of opening browser |
+
+**Examples:**
+
+```bash
+# Interactive login (opens browser)
+cog login
+
+# Login with token
+echo $REPLICATE_API_TOKEN | cog login --token-stdin
+```
+
+### cog migrate
+
+Run a migration to update a project to a newer Cog version.
+
+```
+cog migrate [options]
+```
+
+**Flags:**
+
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `-y` | bool | false | Disable interaction and automatically accept changes |
+| `-f` | string | cog.yaml | The name of the config file |
+
+**Examples:**
+
+```bash
+# Run migration interactively
+cog migrate
+
+# Run migration automatically accepting all changes
+cog migrate -y
+```
+
+### cog debug
+
+Generate a Dockerfile from cog configuration.
+
+```
+cog debug [options]
+```
+
+**Flags:**
+
+| Flag | Type | Default | Description |
+|------|------|---------|-------------|
+| `--image-name` | string | | The image name for the generated Dockerfile |
+| `--separate-weights` | bool | false | Separate model weights from code |
+| `--use-cuda-base-image` | string | auto | Use Nvidia CUDA base image |
+| `--use-cog-base-image` | bool | true | Use pre-built Cog base image |
+| `-f` | string | cog.yaml | The name of the config file |
+
+**Examples:**
+
+```bash
+# Generate Dockerfile to stdout
+cog debug
+
+# Generate Dockerfile with custom image name
+cog debug --image-name my-model:debug
+```
+
+## Common Workflows
+
+### Basic Model Development
+
+```bash
+# 1. Initialize a new project
+cog init
+
+# 2. Edit cog.yaml and predict.py to define your model
+
+# 3. Test predictions locally
+cog predict -i input_image=@photo.jpg
+
+# 4. Build and push to registry
+cog push r8.im/username/my-model
+```
+
+### Using JSON Inputs
+
+The `--json` flag for `cog predict` allows passing complex inputs as JSON:
+
+```bash
+# From file
+cat > inputs.json << EOF
+{
+  "prompt": "A beautiful sunset",
+  "num_outputs": 4,
+  "guidance_scale": 7.5
+}
+EOF
+cog predict --json @inputs.json
+
+# From stdin
+echo '{"prompt": "A cat", "seed": 42}' | cog predict --json @-
+
+# With local file paths (automatically converted to base64)
+echo '{"image": "@input.jpg", "scale": 2}' | cog predict --json @-
+```
+
+### Working with GPUs
+
+```bash
+# Use all available GPUs
+cog run --gpus all python train.py
+
+# Use specific GPU
+cog predict --gpus 0 -i image=@input.jpg
+
+# Use multiple specific GPUs
+cog run --gpus '"device=0,1"' python multi_gpu_train.py
+```
+
+### Environment Variables
+
+```bash
+# Pass environment variables to predict
+cog predict -e API_KEY=$MY_API_KEY -i prompt="Hello"
+
+# Pass Replicate API token
+export REPLICATE_API_TOKEN=your_token
+cog predict --use-replicate-token -i prompt="Hello"
+
+# Multiple environment variables
+cog run -e CUDA_VISIBLE_DEVICES=0 -e BATCH_SIZE=32 python train.py
+```
+
+
+
+
+
+---
+
+
+
+
+
 # Deploy models with Cog
 
 Cog containers are Docker containers that serve an HTTP server
@@ -1282,16 +1700,19 @@ This document defines the API of the `cog` Python module, which is used to defin
 - [`BasePredictor`](#basepredictor)
   - [`Predictor.setup()`](#predictorsetup)
   - [`Predictor.predict(**kwargs)`](#predictorpredictkwargs)
-    - [Streaming output](#streaming-output)
+- [`async` predictors and concurrency](#async-predictors-and-concurrency)
 - [`Input(**kwargs)`](#inputkwargs)
+  - [Deprecating inputs](#deprecating-inputs)
 - [Output](#output)
   - [Returning an object](#returning-an-object)
   - [Returning a list](#returning-a-list)
   - [Optional properties](#optional-properties)
+  - [Streaming output](#streaming-output)
 - [Input and output types](#input-and-output-types)
 - [`File()`](#file)
 - [`Path()`](#path)
 - [`Secret`](#secret)
+- [`Optional`](#optional)
 - [`List`](#list)
 
 ## `BasePredictor`
@@ -1354,36 +1775,26 @@ The `predict()` method takes an arbitrary list of named arguments, where each ar
 
 `predict()` can return strings, numbers, [`cog.Path`](#path) objects representing files on disk, or lists or dicts of those types. You can also define a custom [`Output()`](#outputbasemodel) for more complex return types.
 
-#### Streaming output
+## `async` predictors and concurrency
 
-Cog models can stream output as the `predict()` method is running. For example, a language model can output tokens as they're being generated and an image generation model can output a images they are being generated.
+> Added in cog 0.14.0.
 
-To support streaming output in your Cog model, add `from typing import Iterator` to your predict.py file. The `typing` package is a part of Python's standard library so it doesn't need to be installed. Then add a return type annotation to the `predict()` method in the form `-> Iterator[<type>]` where `<type>` can be one of `str`, `int`, `float`, `bool`, `cog.File`, or `cog.Path`.
+You may specify your `predict()` method as `async def predict(...)`. In
+addition, if you have an async `predict()` function you may also have an async
+`setup()` function:
 
 ```py
-from cog import BasePredictor, Path
-from typing import Iterator
-
 class Predictor(BasePredictor):
-    def predict(self) -> Iterator[Path]:
-        done = False
-        while not done:
-            output_path, done = do_stuff()
-            yield Path(output_path)
-```
-
-If you're streaming text output, you can use `ConcatenateIterator` to hint that the output should be concatenated together into a single string. This is useful on Replicate to display the output as a string instead of a list of strings.
-
-```py
-from cog import BasePredictor, Path, ConcatenateIterator
+    async def setup(self) -> None:
+        print("async setup is also supported...")

-class Predictor(BasePredictor):
-    def predict(self) -> ConcatenateIterator[str]:
-        tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
-        for token in tokens:
-            yield token + " "
+    async def predict(self) -> str:
+        print("async predict")
+        return "hello world"
 ```

+Models that have an async `predict()` function can run predictions concurrently, up to the limit specified by [`concurrency.max`](yaml.md#max) in cog.yaml. Attempting to exceed this limit will return a 409 Conflict response.
+
 ## `Input(**kwargs)`

 Use cog's `Input()` function to define each of the parameters in your `predict()` method:
@@ -1406,6 +1817,7 @@ The `Input()` function takes these keyword arguments:
 - `max_length`: For `str` types, the maximum length of the string.
 - `regex`: For `str` types, the string must match this regular expression.
 - `choices`: For `str` or `int` types, a list of possible values for this input.
+- `deprecated`: (optional) If set to `True`, marks this input as deprecated. Deprecated inputs will still be accepted, but tools and UIs may warn users that the input is deprecated and may be removed in the future. See [Deprecating inputs](#deprecating-inputs).

 Each parameter of the `predict()` method must be annotated with a type like `str`, `int`, `float`, `bool`, etc. See [Input and output types](#input-and-output-types) for the full list of supported types.

@@ -1420,6 +1832,24 @@ class Predictor(BasePredictor):
         # ...
 ```

+## Deprecating inputs
+
+You can mark an input as deprecated by passing `deprecated=True` to the `Input()` function. Deprecated inputs will still be accepted, but tools and UIs may warn users that the input is deprecated and may be removed in the future.
+
+This is useful when you want to phase out an input without breaking existing clients immediately:
+
+```py
+from cog import BasePredictor, Input
+
+class Predictor(BasePredictor):
+    def predict(self,
+            text: str = Input(description="Some deprecated text", deprecated=True),
+            prompt: str = Input(description="Prompt for the model")
+    ) -> str:
+        # ...
+        return prompt
+```
+
 ## Output

 Cog predictors can return a simple data type like a string, number, float, or boolean. Use Python's `-> <type>` syntax to annotate the return type.
@@ -1493,6 +1923,61 @@ class Predictor(BasePredictor):
         return Output(file=io.StringIO("hello"))
 ```

+### Streaming output
+
+Cog models can stream output as the `predict()` method is running. For example, a language model can output tokens as they're being generated and an image generation model can output images as they are being generated.
+
+To support streaming output in your Cog model, add `from typing import Iterator` to your predict.py file. The `typing` package is a part of Python's standard library so it doesn't need to be installed. Then add a return type annotation to the `predict()` method in the form `-> Iterator[<type>]` where `<type>` can be one of `str`, `int`, `float`, `bool`, or `cog.Path`.
+
+```py
+from cog import BasePredictor, Path
+from typing import Iterator
+
+class Predictor(BasePredictor):
+    def predict(self) -> Iterator[Path]:
+        done = False
+        while not done:
+            output_path, done = do_stuff()
+            yield Path(output_path)
+```
+
+If you have an [async `predict()` method](#async-predictors-and-concurrency), you must use `cog.AsyncIterator` instead:
+
+```py
+from cog import AsyncIterator, BasePredictor, Path
+
+class Predictor(BasePredictor):
+    async def predict(self) -> AsyncIterator[Path]:
+        done = False
+        while not done:
+            output_path, done = do_stuff()
+            yield Path(output_path)
+```
+
+If you're streaming text output, you can use `ConcatenateIterator` to hint that the output should be concatenated together into a single string. This is useful on Replicate to display the output as a string instead of a list of strings.
+
+```py
+from cog import BasePredictor, Path, ConcatenateIterator
+
+class Predictor(BasePredictor):
+    def predict(self) -> ConcatenateIterator[str]:
+        tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
+        for token in tokens:
+            yield token + " "
+```
+
+Or for async `predict()` methods, use `AsyncConcatenateIterator`:
+
+```py
+from cog import BasePredictor, Path, AsyncConcatenateIterator
+
+class Predictor(BasePredictor):
+    async def predict(self) -> AsyncConcatenateIterator[str]:
+        tokens = ["The", "quick", "brown", "fox", "jumps", "over", "the", "lazy", "dog"]
+        for token in tokens:
+            yield token + " "
+```
+
 ## Input and output types

 Each parameter of the `predict()` method must be annotated with a type. The method's return type must also be annotated. The supported types are:
@@ -1556,7 +2041,7 @@ The `cog.Secret` type is used to signify that an input holds sensitive informati
 like a password or API token.

 `cog.Secret` is a subclass of Pydantic's [`SecretStr`](https://docs.pydantic.dev/latest/api/types/#pydantic.types.SecretStr).
-Its default string representation redacts its contents to prevent accidental disclure.
+Its default string representation redacts its contents to prevent accidental disclosure.
 You can access its contents with the `get_secret_value()` method.

 ```python
@@ -1590,6 +2075,31 @@ any value passed to a `Secret` input is redacted after being sent to the model.
 > Passing secret values to untrusted models can result in
 > unintended disclosure, exfiltration, or misuse of sensitive data.

+## `Optional`
+
+Optional inputs should be explicitly defined as `Optional[T]` so that a type checker can warn us about error-prone `None` values.
+
+For example, the following code might fail if `prompt` is not specified in the inputs:
+
+```python
+class Predictor(BasePredictor):
+    def predict(self, prompt: str=Input(description="prompt", default=None)) -> str:
+        return "hello" + prompt  # TypeError: can only concatenate str (not "NoneType") to str
+```
+
+We can improve it by making `prompt` an `Optional[str]`. Note that `default=None` is now redundant as `Optional` implies it.
+
+```python
+class Predictor(BasePredictor):
+    def predict(self, prompt: Optional[str]=Input(description="prompt")) -> str:
+        if prompt is None:  # a type checker can warn us if we forget this
+            return "hello"
+        else:
+            return "hello" + prompt
+```
+
+Note that the error-prone usage of `prompt: str=Input(default=None)` might throw an error in a future release of Cog.
+
 ## `List`

 The List type is also supported in inputs. It can hold any supported type.
@@ -1645,7 +2155,7 @@ test2
 > [!NOTE]
 > The training API is still experimental, and is subject to change.

-Cog's training API allows you to define a fine-tuning interface for an existing Cog model, so users of the model can bring their own training data to create derivative fune-tuned models. Real-world examples of this API in use include [fine-tuning SDXL with images](https://replicate.com/blog/fine-tune-sdxl) or [fine-tuning Llama 2 with structured text](https://replicate.com/blog/fine-tune-llama-2).
+Cog's training API allows you to define a fine-tuning interface for an existing Cog model, so users of the model can bring their own training data to create derivative fine-tuned models. Real-world examples of this API in use include [fine-tuning SDXL with images](https://replicate.com/blog/fine-tune-sdxl) or [fine-tuning Llama 2 with structured text](https://replicate.com/blog/fine-tune-llama-2).

 ## How it works

@@ -1679,6 +2189,30 @@ $ cat weights
 hello train
 ```

+You can also use classes if you want to run many model trainings and save on setup time. This works the same way as the [Predictor](./python.md) class with the only difference being the `train` method.
+
+`cog.yaml`:
+
+```yaml
+build:
+  python_version: "3.10"
+train: "train.py:Trainer"
+```
+
+`train.py`:
+
+```python
+from cog import BasePredictor, File
+import io
+
+class Trainer:
+    def setup(self) -> None:
+        self.base_model = ...  # Load a big base model
+
+    def train(self, param: str) -> File:
+        return self.base_model.train(param)  # Train on top of a base model
+```
+
 ## `Input(**kwargs)`

 Use Cog's `Input()` function to define each of the parameters in your `train()` function:
@@ -1763,8 +2297,7 @@ It has three keys: [`build`](#build), [`image`](#image), and [`predict`](#predic
 ```yaml
 build:
   python_version: "3.11"
-  python_packages:
-    - pytorch==2.0.1
+  python_requirements: requirements.txt
   system_packages:
     - "ffmpeg"
     - "git"
@@ -1803,49 +2336,65 @@ build:

 When you use `cog run` or `cog predict`, Cog will automatically pass the `--gpus=all` flag to Docker. When you run a Docker image built with Cog, you'll need to pass this option to `docker run`.

-### `python_packages`
+### `python_requirements`

-A list of Python packages to install from the PyPi package index, in the format `package==version`. For example:
+A pip requirements file specifying the Python packages to install. For example:

 ```yaml
 build:
-  python_packages:
-    - pillow==8.3.1
-    - tensorflow==2.5.0
+  python_requirements: requirements.txt
 ```

+Your `cog.yaml` file can set either `python_packages` or `python_requirements`, but not both. Use `python_requirements` when you need to configure options like `--extra-index-url` or `--trusted-host` to fetch Python package dependencies.
+
+This follows the standard [requirements.txt](https://pip.pypa.io/en/stable/reference/requirements-file-format/) format.
+
 To install Git-hosted Python packages, add `git` to the `system_packages` list, then use the `git+https://` syntax to specify the package name. For example:

+`cog.yaml`:
 ```yaml
 build:
   system_packages:
     - "git"
-  python_packages:
-    - "git+https://github.com/huggingface/transformers"
+  python_requirements: requirements.txt
+```
+
+`requirements.txt`:
+```
+git+https://github.com/huggingface/transformers
 ```

 You can also pin Python package installations to a specific git commit:

+`cog.yaml`:
 ```yaml
 build:
   system_packages:
     - "git"
-  python_packages:
-    - "git+https://github.com/huggingface/transformers@2d1602a"
+  python_requirements: requirements.txt
+```
+
+`requirements.txt`:
+```
+git+https://github.com/huggingface/transformers@2d1602a
 ```

 Note that you can use a shortened prefix of the 40-character git commit SHA, but you must use at least six characters, like `2d1602a` above.

-### `python_requirements`
+### `python_packages`

-A pip requirements file specifying the Python packages to install. For example:
+**DEPRECATED**: This will be removed in a future version; please use [python_requirements](#python_requirements) instead.
+
+A list of Python packages to install from the PyPI package index, in the format `package==version`. For example:

 ```yaml
 build:
-  python_requirements: requirements.txt
+  python_packages:
+    - pillow==8.3.1
+    - tensorflow==2.5.0
 ```

-Your `cog.yaml` file can set either `python_packages` or `python_requirements`, but not both. Use `python_requirements` when you need to configure options like `--extra-index-url` or `--trusted-host` to fetch Python package dependencies.
+Your `cog.yaml` file can set either `python_packages` or `python_requirements`, but not both.

 ### `python_version`

@@ -1900,6 +2449,23 @@ build:
     - "libavcodec-dev"
 ```

+## `concurrency`
+
+> Added in cog 0.14.0.
+
+This stanza describes the concurrency capabilities of the model. It has one option:
+
+### `max`
+
+The maximum number of concurrent predictions the model can process. If this is set, the model must specify an [async `predict()` method](python.md#async-predictors-and-concurrency).
+
+For example:
+
+```yaml
+concurrency:
+  max: 10
+```
+
 ## `image`

 The name given to built Docker images. If you want to push to a registry, this should also include the registry name.