Using Apple's Sharp to generate 3D representation more accurately and faster #584

BrawnyAi · 2025-12-27T17:46:49Z

BrawnyAi
Dec 27, 2025

I was reading about Apple's new SHARP model, which can create a 3D representation from a 2D image. It might actually be better than the current depth mapping technology. What do you guys think?

https://apple.github.io/ml-sharp/

KolaKater · 2025-12-27T21:30:56Z

KolaKater
Dec 27, 2025

I saw the sample, it looks very good, hope it's possible to use this model.

0 replies

nagadomi · 2025-12-28T00:23:09Z

nagadomi
Dec 28, 2025
Maintainer

Gaussian Splatting seems more promising than SBS, as creating a client would allow users to change the viewpoint to some extent.
However, regarding ml-sharp, there are no models compatible with open-source licenses, its workflow is completely different, and running gsplat on Windows is cumbersome, so I don't plan to support it in iw3 for now.
I'm personally experimenting with stereo rendering using ml-sharp, and if I get useful results, I might implement it as a ComfyUI custom node.

14 replies

gituser123456789000 Dec 29, 2025

Gaussian Splatting seems more promising than SBS, as creating a client would allow users to change the viewpoint to some extent. However, regarding ml-sharp, there are no models compatible with open-source licenses, its workflow is completely different, and running gsplat on Windows is cumbersome, so I don't plan to support it in iw3 for now. I'm personally experimenting with stereo rendering using ml-sharp, and if I get useful results, I might implement it as a ComfyUI custom node.

So Gaussian splatting is getting close to being able to convert 2D to VR?

gituser123456789000 Dec 29, 2025

The previews look crazy good 2D to 3D world / VR. Definitely looks like you're inside the image, inside a 3D world. This is what we need, endgame. I want to be inside Jurassic Park, etc.

nagadomi Dec 29, 2025
Maintainer

I don't know if it already exists, but it's obvious that Apple intends this to be used with a VR Gaussian Splatting viewer capable of stereo rendering.

QuickscopingFTW Dec 29, 2025

the LLMs are being very useless with helping too, nothing they suggested has worked haha

maybe this is helpful: ZeroVa sharp-synth

seems this is only available on mac sadly

gituser123456789000 Dec 29, 2025

I don't know if it already exists, but it's obvious that Apple intends this to be used with a VR Gaussian Splatting viewer capable of stereo rendering.

It doesn't exist as far as I'm aware, but I haven't been paying attention for months. I know it was a stated goal of Owl3D (2D to VR180), but from the most recent comments I can find about it, it doesn't look like they've been able to do it.

Their site seems to have dropped the claim that 2D to VR180 is "Coming Soon", which was there for years.

But it says:
"Convert Video to VR
Our software makes 2D to VR easy. Owl3D is a VR video converter that supports various 3D formats such as MV-HEVC, side-by-side, top-bottom, anaglyph, RGBD and more. Compatible with AR/VR devices like Apple Vision Pro, Meta Quest, holographic displays, and 3D TVs."

I'm not sure if MV-HEVC could be it or not, or part of the way toward the goal.

gituser123456789000 · 2025-12-29T07:54:49Z

gituser123456789000
Dec 29, 2025

Trying to get it running locally. These projects need installation instructions for dummies.

**"Getting started

We recommend to first create a python environment:
conda create -n sharp python=3.13

Afterwards, you can install the project using
pip install -r requirements.txt

To test the installation, run
sharp --help"**

You need Anaconda / miniconda...
Seems they're trying to force everyone to register an account now to use Anaconda..
Here's a link to bypass that though: https://www.anaconda.com/download/success

Installed miniconda..
Did step 1... ran conda create -n sharp python=3.13 in Anaconda Prompt

..Step 2... "Afterwards, you can install the project using
pip install -r requirements.txt"

Well, that doesn't work, because steps are missing and requirements.txt doesn't exist. Why not list all of the steps? I always say that for some of these projects, the people/teams are so smart to create the projects, but such dummies for not providing simple instructions so people can use them. They assume people know what they know, so they skip steps.

So I'm guessing I have to 'git pull' the repo? Where? How? I don't know. It doesn't work anyway.

I'm in Anaconda prompt.. did the first step... ran conda activate sharp to be in the sharp environment..
ran git pull https://github.com/apple/ml-sharp.git

Error message:
fatal: not a git repository (or any of the parent directories): .git
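
For anyone hitting the same wall: that `fatal: not a git repository` message is what `git pull` prints when it is run outside an existing clone. A first-time download uses `git clone` instead. Here is a minimal sketch of the missing steps, written as a small Python helper that only builds the commands. It assumes the activated `sharp` conda environment, and `ml-sharp` is simply the folder name `git clone` creates by default.

```python
from pathlib import Path

REPO_URL = "https://github.com/apple/ml-sharp.git"


def setup_commands(workdir: Path) -> list[tuple[Path, list[str]]]:
    """Return (directory, command) pairs for a first-time ml-sharp install."""
    repo_dir = workdir / "ml-sharp"  # default folder name created by `git clone`
    return [
        # `git clone` downloads the repository; `git pull` only updates an existing clone.
        (workdir, ["git", "clone", REPO_URL]),
        # requirements.txt lives inside the clone, so pip has to run from there.
        (repo_dir, ["python", "-m", "pip", "install", "-r", "requirements.txt"]),
        # Smoke test quoted from the project's getting-started steps.
        (repo_dir, ["sharp", "--help"]),
    ]


if __name__ == "__main__":
    for cwd, cmd in setup_commands(Path.cwd()):
        print(f"(in {cwd}) {' '.join(cmd)}")
```

To actually execute the steps rather than print them, each pair could be passed to `subprocess.run(cmd, cwd=cwd, check=True)`.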

37 replies

gituser123456789000 Dec 30, 2025

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu128

But I'm still on Python 3.13, because that's step 1 of the ml-sharp installation.
Should I wait for 3.13 cu128 wheels?
Or can I change the Python version in my environment?
Or should I delete everything and restart the ml-sharp install with a different Python version in the install steps?

nagadomi Dec 30, 2025
Maintainer

Python 3.13 support is already built. (cp313 version)
For cu130, the fix probably worked and the build should finish in about 30 minutes.
The built files will be listed here.
https://github.com/nagadomi/gsplat-windows-builds/releases

gituser123456789000 Dec 30, 2025

ok, going back to cu130 then

pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu130

Successfully installed torch-2.9.1+cu130 torchvision-0.24.1+cu130

go to:
https://github.com/nagadomi/gsplat-windows-builds/releases
Make sure to "Show all 14 assets"

see
gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl

refer to: #584 (reply in thread)

"Download the appropriate wheel"
Downloaded gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl
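
Picking "the appropriate wheel" comes down to matching the tags in the file name against the installed torch build and Python version. Below is a rough sketch of that check. It assumes `pt29` stands for PyTorch 2.9, `cu130` for CUDA 13.0 and `cp313` for CPython 3.13, which is how the names in this thread appear to be built. The function names are made up for illustration.

```python
import re


def parse_gsplat_wheel(filename: str) -> dict[str, str]:
    """Split a gsplat-windows-builds style wheel name into its version tags."""
    pattern = (
        r"gsplat-(?P<gsplat>[\d.]+)\+pt(?P<torch>\d+)cu(?P<cuda>\d+)"
        r"-cp(?P<python>\d+)-cp\d+-(?P<platform>\w+)\.whl"
    )
    match = re.fullmatch(pattern, filename)
    if match is None:
        raise ValueError(f"unrecognised wheel name: {filename}")
    return match.groupdict()


def environment_tags(torch_version: str, python_version: tuple[int, int]) -> dict[str, str]:
    """Derive the same tags from a torch version string such as '2.9.1+cu130'."""
    release, _, local = torch_version.partition("+")
    major, minor = release.split(".")[:2]
    return {
        "torch": f"{major}{minor}",
        "cuda": local.removeprefix("cu"),
        "python": f"{python_version[0]}{python_version[1]}",
    }


def wheel_matches(filename: str, torch_version: str, python_version: tuple[int, int]) -> bool:
    """True when the wheel's torch, CUDA and Python tags all match the environment."""
    wheel = parse_gsplat_wheel(filename)
    env = environment_tags(torch_version, python_version)
    return all(wheel[key] == value for key, value in env.items())


if __name__ == "__main__":
    name = "gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl"
    print(wheel_matches(name, "2.9.1+cu130", (3, 13)))  # True
    print(wheel_matches(name, "2.9.1+cu128", (3, 13)))  # False
```

In a real environment the two inputs would come from `torch.__version__` and `sys.version_info[:2]`.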

"Then clean first and install it with:" :
python -m pip uninstall gsplat -y
python -m pip install gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl

Run:
python -m pip uninstall gsplat -y

Run:
python -m pip install gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl

"WARNING: Requirement 'gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl' looks like a filename, but the file does not exist
Processing c:\users\YOURUSERNAME\gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\Users\YOURUSERNAME\gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl'"

Move downloaded gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl to ml-sharp folder

Run:
python -m pip install gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl

"WARNING: Requirement 'gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl' looks like a filename, but the file does not exist
Processing c:\users\YOURUSERNAME\gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl
ERROR: Could not install packages due to an OSError: [Errno 2] No such file or directory: 'C:\Users\YOURUSERNAME\gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl'"

Double check downloaded file name:
gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl
Is that the same as:
gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl

Yes

Wonder to self.. Where else to put gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl file

Error seems to show it's not looking in ml-sharp folder

I forgot to cd ml-sharp because I was doing all of the torch uninstalls and reinstalls

Run
cd ml-sharp

Run:
python -m pip install gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl

Bunch of stuff installed
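
The "looks like a filename, but the file does not exist" warning above comes from pip resolving a relative wheel name against the current working directory, which is why `cd ml-sharp` fixed it. A small sketch of a way to sidestep that: look for the wheel in a few candidate folders and hand pip an absolute path. The folder list and function names are hypothetical.

```python
import sys
from pathlib import Path


def locate_wheel(filename: str, search_dirs: list[Path]) -> Path:
    """Return the absolute path of the first matching wheel file."""
    for directory in search_dirs:
        candidate = directory / filename
        if candidate.is_file():
            return candidate.resolve()
    searched = ", ".join(str(d) for d in search_dirs)
    raise FileNotFoundError(f"{filename} not found in: {searched}")


def pip_install_command(wheel_path: Path) -> list[str]:
    """Build the pip call using the running interpreter and an absolute path."""
    return [sys.executable, "-m", "pip", "install", str(wheel_path)]


if __name__ == "__main__":
    wheel_name = "gsplat-1.5.3+pt29cu130-cp313-cp313-win_amd64.whl"
    # Hypothetical search locations: current folder, an ml-sharp subfolder, Downloads.
    folders = [Path.cwd(), Path.cwd() / "ml-sharp", Path.home() / "Downloads"]
    try:
        print(" ".join(pip_install_command(locate_wheel(wheel_name, folders))))
    except FileNotFoundError as error:
        print(error)
```

With an absolute path, the install no longer depends on which folder the prompt happens to be in.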

Now attempting to run the rendering
Have 1 .PLY file in output folder already, so for now, running just the render command instead of creating .PLY and rendering command

Run:
sharp render -i C:/Users/YOURUSERNAME/ml-sharp/output -o C:/Users/YOURUSERNAME/ml-sharp/output/renderings

it worked. video output moving in a horizontal circular motion

DepthAuteur Dec 31, 2025

Hi there, folks,

Many thanks to all of you for testing and posting your results, code, etc. My curiosity drove me to recreate this project. I was able to create my first *.ply file, with CUDA support on my RTX 5090, thanks to @nagadomi's gsplat-1.5.3+pt29cu128-cp313-cp313-win_amd64.whl.

Although I get a finished *.ply file that I can also display, no *.mp4 file is created. The last log line before the problem is:
2025-12-31 15:08:10,062 | INFO | Rendering trajectory to F:\Tools\ml-sharp\output\Garten.mp4.

The following additional messages are then displayed:
W1231 15:08:10.809000 15996 site-packages\torch\utils\cpp_extension.py:466] Error checking compiler version for cl: [WinError 2] The system cannot find the specified file.
W1231 15:08:10.810000 15996 site-packages\torch\utils\cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
W1231 15:08:10.810000 15996 site-packages\torch\utils\cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.

Are there any specific Visual Studio Build Tools 2022 I must install?

Happy 3Ding you all and thanks for helping!

Best wishes... TeeJay-NLD
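
A note on the log above: the first warning comes from `torch.utils.cpp_extension` failing to run `cl`, the MSVC compiler. That suggests something is trying to compile an extension at render time, though this is an interpretation and is not confirmed in the thread. A quick, hedged way to check the two things the warnings mention (the function name is made up):

```python
from __future__ import annotations

import os
import shutil


def build_environment_report(environ=None) -> dict[str, str | None]:
    """Report the two settings the warnings above complain about."""
    environ = os.environ if environ is None else environ
    return {
        # Full path of the MSVC compiler, or None when `cl` is not on PATH.
        "cl": shutil.which("cl", path=environ.get("PATH")),
        # Optional list of GPU architectures to compile for; None means it is unset.
        "TORCH_CUDA_ARCH_LIST": environ.get("TORCH_CUDA_ARCH_LIST"),
    }


if __name__ == "__main__":
    for name, value in build_environment_report().items():
        print(f"{name}: {value if value is not None else 'not set / not found'}")
```

If `cl` is reported as missing, that matches the first warning. Whether installing the compiler or switching to a matching precompiled wheel is the better fix is not settled here.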

nagadomi Jan 1, 2026
Maintainer

@TeeJay-NLD
I added an automatic installer to https://github.com/nagadomi/gsplat-windows-builds
I cannot address issues in this messy discussion thread, so if you encounter any problems related to gsplat, please post them on the gsplat-windows-builds issue tracker.

DepthAuteur · 2025-12-29T22:16:01Z

DepthAuteur
Dec 29, 2025

Hi there @QuickscopingFTW and @nagadomi,
Since I do not know how this all relates specifically to IW3, please allow me to ask if or how IW3 could benefit from this so-called SHARP model in some future release. Will this ultimately outperform the other already implemented models for video conversion? Is there any implementation path for IW3?
Best wishes... TeeJay-NLD

3 replies

QuickscopingFTW Dec 30, 2025

I don't know if it will be too useful to iw3, as it seems to use a very different system and is very slow compared to the modes currently in iw3

QuickscopingFTW Dec 30, 2025

who knows though

nagadomi Dec 30, 2025
Maintainer

@TeeJay-NLD
I don't plan to support this in iw3 for now, but it's generally a good thing when I gain the ability to do more.
For example, I used to say that gsplat’s Windows support was too cumbersome, but since I've now verified how to build a precompiled Windows wheel via GitHub Actions, I can't use that as an excuse anymore.

Also, I probably shouldn't say this since I know you're supporting me on Patreon, but I've made very little profit from iw3, so spending time on it is basically a loss for me. The only reason that doesn't entirely discourage me is that I personally use iw3 myself. So, there's no difference in priority for the things I use.
That said, I agree that this might not be the right place for such discussions.

nagadomi · 2025-12-30T02:46:36Z

nagadomi
Dec 30, 2025
Maintainer

The following is a simple way to perform stereo rendering with ml-sharp. For output examples, see #584 (reply in thread).
Replace the render_gaussians() function in src/sharp/cli/render.py with the following.

def render_gaussians(
    gaussians: Gaussians3D,
    metadata: SceneMetaData,
    output_path: Path,
    params: camera.TrajectoryParams | None = None,
) -> None:
    """Render a single gaussian checkpoint file."""
    (width, height) = metadata.resolution_px
    f_px = metadata.focal_length_px

    if params is None:
        params = camera.TrajectoryParams()

    if not torch.cuda.is_available():
        raise RuntimeError("Rendering a checkpoint requires CUDA.")

    device = torch.device("cuda")

    intrinsics = torch.tensor(
        [
            [f_px, 0, (width - 1) / 2.0, 0],
            [0, f_px, (height - 1) / 2.0, 0],
            [0, 0, 1, 0],
            [0, 0, 0, 1],
        ],
        device=device,
        dtype=torch.float32,
    )
    camera_model = camera.create_camera_model(
        gaussians, intrinsics, resolution_px=metadata.resolution_px
    )

    # Right camera offset: Increasing this value will increase the 3D strength, making objects appear smaller.
    # TODO: I haven't looked into the units or scene normalization in detail yet.
    baseline = 0.065
    # Number of camera animation loops
    params.num_repeats = 3

    trajectory = camera.create_eye_trajectory(
        gaussians, params, resolution_px=metadata.resolution_px, f_px=f_px
    )
    renderer = gsplat.GSplatRenderer(color_space=metadata.color_space)
    video_writer = io.VideoWriter(output_path)

    for _, eye_position in enumerate(trajectory):
        camera_info = camera_model.compute(eye_position)
        rendering_output = renderer(
            gaussians.to(device),
            extrinsics=camera_info.extrinsics[None].to(device),
            intrinsics=camera_info.intrinsics[None].to(device),
            image_width=camera_info.width,
            image_height=camera_info.height,
        )
        # Left view
        color_l = (rendering_output.color[0].permute(1, 2, 0) * 255.0).to(dtype=torch.uint8)
        depth_l = rendering_output.depth[0]

        # Set the right camera and render the right camera view
        eye_position_r = eye_position.clone()
        eye_position_r[0] += baseline
        camera_info = camera_model.compute(eye_position_r)
        rendering_output = renderer(
            gaussians.to(device),
            extrinsics=camera_info.extrinsics[None].to(device),
            intrinsics=camera_info.intrinsics[None].to(device),
            image_width=camera_info.width,
            image_height=camera_info.height,
        )
        color_r = (rendering_output.color[0].permute(1, 2, 0) * 255.0).to(dtype=torch.uint8)
        depth_r = rendering_output.depth[0]

        # Pack the left and right views into SBS format.
        color = torch.cat((color_l, color_r), dim=1)
        depth = torch.cat((depth_l, depth_r), dim=0)
        video_writer.add_frame(color, depth)
    video_writer.close()

diff

diff --git a/src/sharp/cli/render.py b/src/sharp/cli/render.py
index 22c0bf8..953ed52 100644
--- a/src/sharp/cli/render.py
+++ b/src/sharp/cli/render.py
@@ -99,6 +99,12 @@ def render_gaussians(
         gaussians, intrinsics, resolution_px=metadata.resolution_px
     )
 
+    # Right camera offset: Increasing this value will increase the 3D strength, making objects appear smaller.
+    # TODO: I haven't looked into the units or scene normalization in detail yet.
+    baseline = 0.065
+    # Number of camera animation loops
+    params.num_repeats = 3
+
     trajectory = camera.create_eye_trajectory(
         gaussians, params, resolution_px=metadata.resolution_px, f_px=f_px
     )
@@ -114,7 +120,26 @@ def render_gaussians(
             image_width=camera_info.width,
             image_height=camera_info.height,
         )
-        color = (rendering_output.color[0].permute(1, 2, 0) * 255.0).to(dtype=torch.uint8)
-        depth = rendering_output.depth[0]
+        # Left view
+        color_l = (rendering_output.color[0].permute(1, 2, 0) * 255.0).to(dtype=torch.uint8)
+        depth_l = rendering_output.depth[0]
+
+        # Set the right camera and render the right camera view
+        eye_position_r = eye_position.clone()
+        eye_position_r[0] += baseline
+        camera_info = camera_model.compute(eye_position_r)
+        rendering_output = renderer(
+            gaussians.to(device),
+            extrinsics=camera_info.extrinsics[None].to(device),
+            intrinsics=camera_info.intrinsics[None].to(device),
+            image_width=camera_info.width,
+            image_height=camera_info.height,
+        )
+        color_r = (rendering_output.color[0].permute(1, 2, 0) * 255.0).to(dtype=torch.uint8)
+        depth_r = rendering_output.depth[0]
+
+        # Pack the left and right views into SBS format.
+        color = torch.cat((color_l, color_r), dim=1)
+        depth = torch.cat((depth_l, depth_r), dim=0)
         video_writer.add_frame(color, depth)
     video_writer.close()
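
The packing step at the end of the loop can be illustrated without torch. This is a minimal sketch that uses nested lists in place of the image tensors. The function name and the `cross_eyed` flag are made up for illustration, and cross-eyed output is assumed to be the same packing with the two views swapped.

```python
Image = list[list[int]]  # rows of pixel values; stands in for an image tensor


def pack_side_by_side(left: Image, right: Image, cross_eyed: bool = False) -> Image:
    """Concatenate two equally sized views along the width axis."""
    if len(left) != len(right) or any(len(l) != len(r) for l, r in zip(left, right)):
        raise ValueError("left and right views must have the same size")
    if cross_eyed:
        # Cross-eyed viewing puts the right-eye image on the left half.
        left, right = right, left
    # Mirrors torch.cat((color_l, color_r), dim=1) on (H, W, C) tensors.
    return [row_l + row_r for row_l, row_r in zip(left, right)]


if __name__ == "__main__":
    left_view = [[1, 1], [1, 1]]
    right_view = [[2, 2], [2, 2]]
    print(pack_side_by_side(left_view, right_view))
    # [[1, 1, 2, 2], [1, 1, 2, 2]]
    print(pack_side_by_side(left_view, right_view, cross_eyed=True))
    # [[2, 2, 1, 1], [2, 2, 1, 1]]
```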

The video encoding options can be changed as follows.

diff --git a/src/sharp/utils/io.py b/src/sharp/utils/io.py
index 07a98be..b3e2e96 100644
--- a/src/sharp/utils/io.py
+++ b/src/sharp/utils/io.py
@@ -186,7 +186,7 @@ class VideoWriter(OutputWriter):
         """Initialize VideoWriter."""
         output_path.parent.mkdir(exist_ok=True, parents=True)
         self.output_path = output_path
-        self.image_writer = iio.get_writer(output_path, fps=fps)
+        self.image_writer = iio.get_writer(output_path, fps=fps, codec="libx264", ffmpeg_params=["-crf", "16"])
 
         self.max_depth_estimate = None
         if render_depth:

EDIT:
If you want to change the zero parallax point (convergence, popup), modify the following lines.

diff --git a/src/sharp/utils/camera.py b/src/sharp/utils/camera.py
index cad9a0a..0638bba 100644
--- a/src/sharp/utils/camera.py
+++ b/src/sharp/utils/camera.py
@@ -223,8 +223,8 @@ def create_camera_model(
         screen_extrinsics=screen_extrinsics,
         screen_intrinsics=screen_intrinsics,
         screen_resolution_px=screen_resolution_px,
-        focus_depth_quantile=0.1,
-        min_depth_focus=2.0,
+        focus_depth_quantile=0.5,
+        min_depth_focus=0.0,
         lookat_mode=lookat_mode,
     )
     return camera_model

The camera in sharp is already converged, as the left and right cameras are oriented toward the lookat_point. However, that point is currently set outside the 3DGS scene due to a large min_depth_focus value. Setting min_depth_focus=0.0 allows the Z position within the 3DGS scene to be specified using focus_depth_quantile.
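
To get a feel for how `baseline` and the focus depth interact, the textbook small-angle approximation for converged stereo cameras can be used: parallax ≈ f_px * baseline * (1 / focus_depth - 1 / depth). The sketch below is only that approximation, not ml-sharp's camera code. The units of `baseline` in the sharp scene are unverified (see the TODO in the render code), so the numbers are purely illustrative.

```python
def screen_parallax_px(f_px: float, baseline: float, depth: float, focus_depth: float) -> float:
    """Approximate horizontal parallax in pixels for a point at `depth`.

    Positive values appear behind the screen, negative values pop out,
    and a point exactly at `focus_depth` has zero parallax.
    """
    if depth <= 0 or focus_depth <= 0:
        raise ValueError("depths must be positive")
    return f_px * baseline * (1.0 / focus_depth - 1.0 / depth)


if __name__ == "__main__":
    f_px = 1000.0  # illustrative focal length in pixels
    for baseline in (0.035, 0.065):
        for focus_depth in (2.0, 4.0):
            near = screen_parallax_px(f_px, baseline, depth=1.0, focus_depth=focus_depth)
            far = screen_parallax_px(f_px, baseline, depth=10.0, focus_depth=focus_depth)
            print(f"baseline={baseline} focus={focus_depth} near={near:+.1f}px far={far:+.1f}px")
```

Under this approximation, parallax scales linearly with `baseline`, while moving the focus depth shifts which part of the scene sits at the screen plane.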

NOTE:
For instructions on how to run gsplat on Windows, see comment #584 (reply in thread). For other PyTorch or CUDA version support, please post in the issues of the gsplat-windows-builds repository.

24 replies

gituser123456789000 Dec 30, 2025

Non-cherry-picked results... picked a decent, non-slop image to start with, but posted results either way

Cross-eyed:
https://github.com/user-attachments/assets/9d7d0b01-0b55-4dcb-8f45-5bf2db5b86f9

SBS:
https://github.com/user-attachments/assets/d411883d-ea5d-4283-9a85-106ae9b0a50d

DepthAuteur Dec 30, 2025

@gituser123456789000 ... thanks so much for sharing! After watching it on my stereoscopic passive 3D monitor in full 3D glory, I really have to admit it made my mouth water a little :-). Even when I tried hard, I could not recognize any of the common conversion artifacts I encounter using VDA or ADv3, such as blurry inpainting areas, distracting misplacement of objects, etc. At the outermost edges, all I can see are some fuzzy objects which, in my opinion, are due to the source material.

I keep my fingers crossed that this very promising conversion method will soon be implemented in IW3, or at least in a separate application. @gituser123456789000 and @nagadomi, I am very much looking forward to more information about your plans, and certainly to some kind of compiled manual on how to rebuild this, so that I can understand and verify the results with my own video material.

Current conclusion: I am on fire... at least quite a bit :-).

Take care you all. TeeJay-NLD

KolaKater Dec 30, 2025

Non-cherry-picked results... picked a decent, non-slop image to start with, but posted results either way

Cross-eyed: https://github.com/user-attachments/assets/9d7d0b01-0b55-4dcb-8f45-5bf2db5b86f9

SBS: https://github.com/user-attachments/assets/d411883d-ea5d-4283-9a85-106ae9b0a50d

Thank you for sharing. I could use Apple's sharp yesterday, but today it's not working anymore.

gituser123456789000 Dec 30, 2025

Again, a non-cherry-picked result. I picked the photo, which is not the greatest quality, to fit within the upload limitations, and uploaded the result either way.

This shows facial accuracy: the nose looks like a nose, the eyebrow ridge and cheekbones come out while the eye sockets go in at natural depths, lips, etc. Again, this is not the highest quality image due to upload limitations; it looks better with a better quality input image.

This is only 417 x 626 originally, upscaled by the website to 740 x 1110 at 80% jpeg quality

Cross-eyed
https://github.com/user-attachments/assets/f99dc557-3366-441a-aba4-197e1b52f132

SBS:
https://github.com/user-attachments/assets/b5b60d7e-5c28-4bc1-a1ca-4de58ce4a191

gituser123456789000 Dec 30, 2025

@TeeJay-NLD
Yeah, the not so great source material from Google images and having to lower resolution hurts it as far as clarity and some details. Using better images and higher res, it looks even better. And your points are spot on.. none of the regular artifacts some of us are used to, and seemingly much better at object placement in the proper 3d planes

BrawnyAi · 2025-12-30T05:20:15Z

BrawnyAi
Dec 30, 2025
Author

What's your patreon nagadomi? Or Ko-fi?

On Tue, Dec 30, 2025 at 2:14 AM nagadomi wrote:
Since it may not work if the versions don't match, you'll need to include instructions for installing the specified version of torch. You can check the install commands at https://pytorch.org/get-started/locally/
To uninstall torch and torchvision, use python -m pip uninstall torch torchvision -y.
I'm not using conda, so I don't know the conda commands.

1 reply

nagadomi Dec 30, 2025
Maintainer

On the right side of this repository, there is a "Sponsor this project" section.
https://www.patreon.com/nagadomi

KolaKater · 2025-12-30T12:17:55Z

KolaKater
Dec 30, 2025

0 replies

KolaKater · 2025-12-30T21:36:16Z

KolaKater
Dec 30, 2025

Now it's working again. Apple's sharp is really amazing :-D
It's possible to convert just the image into a 3D SBS JPG, which is much quicker than rendering a video.
I will try to use it to convert a video and upload it.

0 replies

KolaKater · 2025-12-31T00:45:08Z

KolaKater
Dec 31, 2025

Here is the 3D SBS video conversion with Apple's sharp.
I had to compress it to a smaller size due to the platform’s upload restrictions.

breakdance-Apples-sharp.10bit.1.mp4

10 replies

Billynom8 Dec 31, 2025

The resulting video looks good except for the floating convergence. if this was depthmap then i would say it is the normalization being the cause. Since this is GS then i dont know and can only point at the similarity of the problem.

KolaKater Dec 31, 2025

The resulting video looks good except for the floating convergence. if this was depthmap then i would say it is the normalization being the cause. Since this is GS then i dont know and can only point at the similarity of the problem.

I think it's because of the modified render.py and predict.py. At the moment I don't know how to improve them

KolaKater Dec 31, 2025

The resulting video looks good except for the floating convergence. if this was depthmap then i would say it is the normalization being the cause. Since this is GS then i dont know and can only point at the similarity of the problem.
I just found out it's a problem with predict.py. I also noticed that if the value of baseline (which controls 3D intensity, in render.py) is reduced, it works really well for converting 2D videos into 3D SBS with stable movement, but the depth is also reduced. Here is the improved predict.py:

"""Contains `sharp predict` CLI implementation.

For licensing see accompanying LICENSE file.
Copyright (C) 2025 Apple Inc. All Rights Reserved.
"""

from __future__ import annotations

import logging
from pathlib import Path

import click
import numpy as np
import torch
import torch.nn.functional as F
import torch.utils.data

from sharp.models import (
    PredictorParams,
    RGBGaussianPredictor,
    create_predictor,
)
from sharp.utils import io
from sharp.utils import logging as logging_utils
from sharp.utils.gaussians import (
    Gaussians3D,
    SceneMetaData,
    save_ply,
    unproject_gaussians,
)

from .render import render_gaussians

LOGGER = logging.getLogger(__name__)

DEFAULT_MODEL_URL = "https://ml-site.cdn-apple.com/models/sharp/sharp_2572gikvuh.pt"


@click.command()
@click.option(
    "--image-only",
    is_flag=True,
    help="Only save the first rendered SBS frame as a JPG image instead of generating a video.",
)
@click.option(
    "-i",
    "--input-path",
    type=click.Path(path_type=Path, exists=True),
    help="Path to an image or containing a list of images.",
    required=True,
)
@click.option(
    "-o",
    "--output-path",
    type=click.Path(path_type=Path, file_okay=False),
    help="Path to save the predicted Gaussians and renderings.",
    required=True,
)
@click.option(
    "-c",
    "--checkpoint-path",
    type=click.Path(path_type=Path, dir_okay=False),
    default=None,
    help="Path to the .pt checkpoint. If not provided, downloads the default model automatically.",
    required=False,
)
@click.option(
    "--render/--no-render",
    "with_rendering",
    is_flag=True,
    default=False,
    help="Whether to render trajectory for checkpoint.",
)
@click.option(
    "--device",
    type=str,
    default="default",
    help="Device to run on. ['cpu', 'mps', 'cuda']",
)
@click.option("-v", "--verbose", is_flag=True, help="Activate debug logs.")
def predict_cli(
    input_path: Path,
    output_path: Path,
    checkpoint_path: Path,
    with_rendering: bool,
    device: str,
    verbose: bool,
    image_only: bool,
):
    """Predict Gaussians from input images."""
    logging_utils.configure(logging.DEBUG if verbose else logging.INFO)

    extensions = io.get_supported_image_extensions()

    image_paths = []
    if input_path.is_file():
        if input_path.suffix.lower() in extensions:
            image_paths = [input_path]
    else:
        for ext in extensions:
            image_paths.extend(input_path.rglob(f"*{ext}"))

    # De-duplicate and sort, to avoid "1 becoming 2, 11 becoming 22"
    image_paths = sorted(set(image_paths))

    if len(image_paths) == 0:
        LOGGER.info("No valid images found. Input was %s.", input_path)
        return

    LOGGER.info("Processing %d valid image files.", len(image_paths))

    if device == "default":
        if torch.cuda.is_available():
            device = "cuda"
        elif torch.mps.is_available():
            device = "mps"
        else:
            device = "cpu"
    LOGGER.info("Using device %s", device)

    if with_rendering and device != "cuda":
        LOGGER.warning("Can only run rendering with gsplat on CUDA. Rendering is disabled.")
        with_rendering = False

    # Load or download checkpoint
    if checkpoint_path is None:
        LOGGER.info("No checkpoint provided. Downloading default model from %s", DEFAULT_MODEL_URL)
        state_dict = torch.hub.load_state_dict_from_url(DEFAULT_MODEL_URL, progress=True)
    else:
        LOGGER.info("Loading checkpoint from %s", checkpoint_path)
        state_dict = torch.load(checkpoint_path, weights_only=True)

    gaussian_predictor = create_predictor(PredictorParams())
    gaussian_predictor.load_state_dict(state_dict)
    gaussian_predictor.eval()
    gaussian_predictor.to(device)

    output_path.mkdir(exist_ok=True, parents=True)

    for image_path in image_paths:
        LOGGER.info("Processing %s", image_path)
        image, _, f_px = io.load_rgb(image_path)
        height, width = image.shape[:2]
        intrinsics = torch.tensor(
            [
                [f_px, 0, (width - 1) / 2.0, 0],
                [0, f_px, (height - 1) / 2.0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
            ],
            device=device,
            dtype=torch.float32,
        )
        gaussians = predict_image(gaussian_predictor, image, f_px, torch.device(device))

        LOGGER.info("Saving 3DGS to %s", output_path)
        # save_ply(gaussians, f_px, (height, width), output_path / f"{image_path.stem}.ply")

        if with_rendering:
            output_render_path = (output_path / image_path.stem).with_suffix(".mp4")
            if image_only:
                output_render_path = output_render_path.with_suffix(".jpg")
            LOGGER.info("Rendering to %s", output_render_path)

            metadata = SceneMetaData(intrinsics[0, 0].item(), (width, height), "linearRGB")
            render_gaussians(
                gaussians,
                metadata,
                output_render_path,
                image_only=image_only,
            )



@torch.no_grad()
def predict_image(
    predictor: RGBGaussianPredictor,
    image: np.ndarray,
    f_px: float,
    device: torch.device,
) -> Gaussians3D:
    """Predict Gaussians from an image."""
    internal_shape = (1536, 1536)

    LOGGER.info("Running preprocessing.")
    image_pt = torch.from_numpy(image.copy()).float().to(device).permute(2, 0, 1) / 255.0
    _, height, width = image_pt.shape
    disparity_factor = torch.tensor([f_px / width]).float().to(device)

    image_resized_pt = F.interpolate(
        image_pt[None],
        size=(internal_shape[1], internal_shape[0]),
        mode="bilinear",
        align_corners=True,
    )

    # Predict Gaussians in the NDC space.
    LOGGER.info("Running inference.")
    gaussians_ndc = predictor(image_resized_pt, disparity_factor)

    LOGGER.info("Running postprocessing.")
    intrinsics = (
        torch.tensor(
            [
                [f_px, 0, width / 2, 0],
                [0, f_px, height / 2, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
            ]
        )
        .float()
        .to(device)
    )
    intrinsics_resized = intrinsics.clone()
    intrinsics_resized[0] *= internal_shape[0] / width
    intrinsics_resized[1] *= internal_shape[1] / height

    # Convert Gaussians to metrics space.
    gaussians = unproject_gaussians(
        gaussians_ndc, torch.eye(4).to(device), intrinsics_resized, internal_shape
    )

    return gaussians
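
A guess at why the `sorted(set(image_paths))` line in the script above is needed: if the extension list contains both lower- and upper-case variants and the file system matches names case-insensitively (as on Windows), each file is found once per variant, so 1 image becomes 2 and 11 become 22. The sketch below simulates that with plain file names. The helper name is hypothetical and the case-insensitive matching is an assumption about the environment.

```python
from pathlib import PurePath


def collect_images(names: list[str], extensions: list[str]) -> tuple[int, list[PurePath]]:
    """Mimic the rglob loop on a case-insensitive file system, then de-duplicate."""
    found: list[PurePath] = []
    for ext in extensions:
        # A case-insensitive match returns the same file for ".jpg" and ".JPG".
        found.extend(PurePath(n) for n in names if n.lower().endswith(ext.lower()))
    # Return the raw hit count alongside the cleaned, sorted list.
    return len(found), sorted(set(found))


if __name__ == "__main__":
    raw_count, unique = collect_images(["b.jpg", "a.JPG"], [".jpg", ".JPG"])
    print(raw_count, [str(p) for p in unique])
    # 4 ['a.JPG', 'b.jpg']
```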

Billynom8 Dec 31, 2025

The resulting video looks good except for the floating convergence. if this was depthmap then i would say it is the normalization being the cause. Since this is GS then i dont know and can only point at the similarity of the problem.

I just thought of a possible solution. A floating convergence is just shifting right eye image left and right, so a video horizontal stabilizer can be employed.

nagadomi Dec 31, 2025
Maintainer

@Billynom8
I've added a note in #584 (comment) about how to change the zero parallax point (convergence, popup).

When applying ml-sharp to a video, depth/disparity, camera settings, and the 3DGS scene are all regenerated for each frame, so achieving stability seems difficult. That said, it is more stable than I expected.
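
One way to picture a fix for the per-frame jitter described above is temporal smoothing of whatever per-frame value drives the convergence. The sketch below applies an exponential moving average to a list of made-up per-frame focus depths. ml-sharp does not do this, and both the function name and `alpha` are hypothetical.

```python
def smooth_focus_depths(depths: list[float], alpha: float = 0.2) -> list[float]:
    """Exponential moving average; a smaller alpha follows new frames more slowly."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: list[float] = []
    for depth in depths:
        # The first frame has no history, so it is used as its own starting value.
        previous = smoothed[-1] if smoothed else depth
        smoothed.append(alpha * depth + (1.0 - alpha) * previous)
    return smoothed


if __name__ == "__main__":
    per_frame = [2.0, 2.6, 1.9, 2.5, 2.0, 2.4]  # made-up, jittery focus depths
    print([round(d, 3) for d in smooth_focus_depths(per_frame)])
```

The trade-off is lag: heavy smoothing reacts slowly to real scene changes such as cuts, so a reset at scene boundaries would probably be needed.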

KolaKater · 2025-12-31T13:15:42Z

KolaKater
Dec 31, 2025

For comparison, I only converted the right sides.
Here, for Apple's sharp, I set baseline = 0.035:

I will make a comparison between baseline = 0.035 and baseline = 0.065 later

0 replies

KolaKater · 2025-12-31T14:36:55Z

KolaKater
Dec 31, 2025

Here is a comparison between baseline = 0.035 and baseline = 0.065. For video, baseline = 0.065 can be a big problem in some scenes:

If iw3 could use Apple's sharp and work together with the iw3 inpaint models, even just the light inpaint model, the result would be perfect.

0 replies

KolaKater · 2025-12-31T17:45:32Z

KolaKater
Dec 31, 2025

Here I made a comparison between Apple's Sharp and StereoSpace, the difference is quite big, especially the fog and the floor:

8 replies

gituser123456789000 Dec 31, 2025

Floor you mentioned, i think is a tie.. 1-1 as far as 2 areas I notice being different

To the far right, the reflections of the window and the brown box go straight down 'into' the floor. That seems more natural than the reflections stretching toward the viewer in the Apple image. I'd have to compare to real-life reflections, but I think straight down seems to be more natural.

But there's also a white line in the floor that goes from the right side of the plate to the front right bedpost, and this looks correctly straight in the Apple image, but incorrectly broken up in the bottom image.

Which is more significant.. I'd say the window and brown box reflection is more significant.. 1-1 in 2 areas on comparison with the floor, but the win going to the bottom image.. StereoSpace because that win is by a wider margin of difference and importance

KolaKater Jan 1, 2026

Yes, I think StereoSpace does better with light, glass, transparent things, water bubbles, mirror surfaces, etc. You can check:
StereoSpace

John5897 Jan 2, 2026

Hi KolaKater, could you help me install "StereoSpace" from prs-eth (GitHub)? In Anaconda (miniconda3), when I type: "source ~/venv_stereospace/bin/activate", it tells me that the "source" command doesn't exist, and I'm stuck there... I've gotten past that step and managed to install the "requirements.txt" file. I created two folders: input and output.

So I type: python inference.py, but I get error messages; I must be missing something?

For the new Apple Sharp, I found a webpage that explains how to install everything on Windows, as well as the command line to enter to get the rendering in "output".

Actually, I'd like to find someone who would be willing to explain how to install "StereoSpace" on Windows, and what command line to type to do 2D rendering to SBS. Then, to watch it, I have what I need (PSVR2 + Virtual Desktop). Thanks in advance, guys.

KolaKater Jan 2, 2026

I still haven't installed it; I only use the web app: https://huggingface.co/spaces/toshas/stereospace

John5897 Jan 2, 2026

Ah yes, okay, thank you.

KolaKater · 2026-01-02T01:35:45Z

KolaKater
Jan 2, 2026

I have tried many times, but I still have a problem converting this beetle with Apple's Sharp. Maybe you have the same problem?
original 2D image:

the conversion:

0 replies

KolaKater · 2026-01-05T20:09:35Z

KolaKater
Jan 5, 2026

If the depth of sharp isn't 2x stronger than the IW3 Inpaint version's max, the 3D videos still turn out good. Just need to work on the speed, I think it's fixable 😊.

5 replies

gituser123456789000 Jan 5, 2026

Hasn't StereoSpace proven to be better, through the examples posted? This was fun to learn about and try, but focus should be there or elsewhere.

gituser123456789000 Jan 5, 2026

StereoSpace.. or VeloDepth supposedly better than Video Depth Anything... or the newly posted InfiniDepth

KolaKater Jan 5, 2026

I still haven't tested StereoSpace on my computer.
I am still testing Sharp.
Yesterday, after rewriting render.py and predict.py, it was at least 10-20 times faster. But I didn't make a backup of this turbo version, and after changing something it suddenly stopped working. Now I'm starting again from the old backup.

KolaKater Jan 10, 2026

Hasn't StereoSpace proven to be better, through the examples posted? This was fun to learn about and try, but focus should be there or elsewhere.

Compared with ml-sharp, the 3D effect from StereoSpace is generally not as good, and sometimes even strange; the more I test, the more I see this.

nagadomi Jan 11, 2026
Maintainer

It changes not only high-frequency textures but also clothing patterns and even faces, only in the right side view. As a result, it causes binocular rivalry artifacts. As mentioned before, most diffusion models that render only the right view have this issue.
With the method using ml-sharp, both left and right cameras render the same 3DGS scene, so if there's an issue in scene generation, it would consistently appear in both eyes.

KolaKater · 2026-01-06T16:58:27Z

KolaKater
Jan 6, 2026

Today I tested with video again: a 4-minute video, both sides, full SBS 1080p. It took about 8 hours.
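
As a rough back-of-the-envelope check on that timing, here is the per-frame cost under an assumed frame rate. The frame rate isn't stated above, so 24 and 30 fps are guesses, and the function name is just for illustration:

```python
def seconds_per_frame(video_minutes, processing_hours, fps):
    """Average processing time per source frame."""
    frames = video_minutes * 60 * fps
    return processing_hours * 3600 / frames


for fps in (24, 30):  # assumed frame rates, not stated in the post
    print(f"{fps} fps: {seconds_per_frame(4, 8, fps):.1f} s per frame")
```

Either way, 8 hours for 4 minutes is about 120x slower than real time, independent of the frame rate.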

0 replies

KolaKater · 2026-01-09T14:02:51Z

KolaKater
Jan 9, 2026

A 3D converter using ml-sharp that may interest you:
KARABA 3D SBS

0 replies

Replies: 16 comments · 102 replies
