Replicate doesn't seem to cache Docker images, so a smaller image dramatically reduces cold-boot times. Other providers often cache images (for example on a network volume), so their cold-boot times are fairly independent of image size.
As a result, reducing image size is very valuable when deploying to Replicate. use-cuda-base-image is sometimes pointed to as a solution, but imo it isn't well documented.
This is a request for better documentation on use-cuda-base-image (as described here: #1401) and other methods to reduce image size.
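
To make the request concrete, here is a rough sketch of how the size difference could be measured locally. It assumes the option is passed to `cog build` as `--use-cuda-base-image=true|false` (my reading of the CLI help, not something I found in the docs), that `docker` and `cog` are on the PATH, and that it is run from a directory containing a `cog.yaml`. The function names and the `my-model` tag are made up for illustration.

```python
import subprocess


def cog_build_command(tag: str, use_cuda_base_image: bool) -> list[str]:
    """Build the argv for `cog build`.

    Assumption: the flag is spelled --use-cuda-base-image and accepts
    "true" / "false". Check `cog build --help` for your cog version.
    """
    value = "true" if use_cuda_base_image else "false"
    return ["cog", "build", "-t", tag, f"--use-cuda-base-image={value}"]


def image_size_bytes(tag: str) -> int:
    """Return the size of a local Docker image in bytes."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", tag],
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())


def compare_sizes(name: str) -> dict[str, int]:
    """Build the image with and without the CUDA base image and report sizes."""
    sizes: dict[str, int] = {}
    for use_cuda in (True, False):
        # Hypothetical tag scheme, just to keep the two builds apart.
        tag = f"{name}:cuda-base-{str(use_cuda).lower()}"
        subprocess.run(cog_build_command(tag, use_cuda), check=True)
        sizes[tag] = image_size_bytes(tag)
    return sizes
```

Calling `compare_sizes("my-model")` and printing each value divided by `1e9` gives the two sizes in GB. This only measures image size; whether the smaller image still runs the model correctly is exactly the part that needs documenting.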