SMERF missing URL for teachers download #2095

Open
samhodge-aiml opened this issue May 11, 2024 · 32 comments

@samhodge-aiml

The documents say

  1. Download teacher checkpoints. Unzip their contents like so,

    teachers/
      bicycle/
        checkpoint_50000/   # Model checkpoint
        config.gin          # Gin config
      ...
    

but it is unclear where this model checkpoint comes from.

@samhodge-aiml
Author

Evidently that is where you put the camp_zipnerf checkpoint data.

@DuVogel87

Hello,

Would you please help me? I am trying to install SMERF, but unfortunately, the instructions from GitHub are not working for me. When I try to clone the repository, I get this error:

C:\SMERF>git clone https://github.com/smerf-3d/smerf.git
Cloning into 'smerf'...
info: please complete authentication in your browser...
remote: Repository not found.
fatal: repository 'https://github.com/smerf-3d/smerf.git/' not found

Unfortunately, logging into my GitHub account in my browser seems to have no effect.

Could you please share how you installed it on your PC?

Thanks.

@samhodge-aiml
Author

The smerf-3d git checkout is not needed. You already have the code from the google-research repo, in the smerf folder.

@DuVogel87

Thanks for your fast answer.

I don't get it.

If I try to git clone the repo like this:

git clone https://github.com/google-research/google-research/tree/master/smerf/smerf

it is not working!

I really don't understand how to clone SMERF to my local drive. Can you give further instructions on this issue, please?

smerf git clone

@samhodge

Try this

Clone a Specific Folder from a GitHub Repository https://medium.com/@gabrielcruz_68416/clone-a-specific-folder-from-a-github-repository-f8949e7a02b4
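
The `/tree/master/...` address above is a web page for a folder, not a repository, which is why `git clone` rejects it. A minimal sketch, assuming the usual `https://github.com/<owner>/<repo>/tree/<branch>/<path>` layout and a branch name without slashes: the hypothetical helper `sparse_clone_commands` (not part of SMERF) splits such a URL and returns git commands that fetch only that folder. Sparse checkout needs a reasonably recent git (2.25 or later).

```python
from urllib.parse import urlparse


def sparse_clone_commands(tree_url):
    """Turn a GitHub folder URL into git commands that fetch only that folder.

    Assumes the form https://github.com/<owner>/<repo>/tree/<branch>/<path>
    and a branch name without slashes (hypothetical helper for illustration).
    """
    parts = urlparse(tree_url).path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "tree":
        raise ValueError(f"not a GitHub folder URL: {tree_url}")
    owner, repo, _, branch = parts[:4]
    subdir = "/".join(parts[4:])
    clone_url = f"https://github.com/{owner}/{repo}.git"
    return [
        # Shallow, partial, sparse clone: file contents are fetched on demand.
        f"git clone --depth 1 --filter=blob:none --sparse --branch {branch} {clone_url}",
        f"cd {repo}",
        # Materialise only the requested folder in the working tree.
        f"git sparse-checkout set {subdir}",
    ]


if __name__ == "__main__":
    url = "https://github.com/google-research/google-research/tree/master/smerf/smerf"
    print("\n".join(sparse_clone_commands(url)))
```

Running the printed commands in a shell should leave a `google-research` checkout that contains only the requested folder plus the top-level files.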

@DuVogel87

Thanks. But the very first step is not possible, because there is no clone URL to copy.

@DuVogel87

Anyway, with your help I figured out what to look for. I found a nice workaround and used this site to download all the files:

https://download-directory.github.io/

@samhodge

Good luck. Let me know how you get on with the computational resources required to run this project.

@DuVogel87

> The documents say
>
>   1. Download teacher checkpoints. Unzip their contents like so,
>
>     teachers/
>       bicycle/
>         checkpoint_50000/   # Model checkpoint
>         config.gin          # Gin config
>       ...
>
> but it is unclear where this model checkpoint comes from.

Where did you download the teacher checkpoints?

@samhodge-aiml
Author

I copied them from the CamP ZipNeRF model where I trained the radiance field using that project's code base and runtime environment with my own data.

@duckworthd
Contributor

SMERF author here. Teacher checkpoints are still awaiting legal approval for release. In the meantime, I recommend training models using the camp_zipnerf codebase.

@samhodge

Thanks @duckworthd, I'm one step ahead of you there. I'm awaiting the outcome of the SMERF training run.

@samhodge-aiml
Author

samhodge-aiml commented May 31, 2024

@duckworthd I am not sure who is responsible for the training code, but there is a silly JAX thing where map.tree becomes map_tree. If you are super keen I can do it as a PR.

It is worth you knowing about it, though, because the current code doesn't work with the JAX version frozen in requirements.txt.

Similarly, there is a package versioning issue with the CUDA versions: JAX says it can use cuDNN 8.9+ for cuda_12. Because cuDNN 9+ is also a thing, you end up without GPU acceleration unless you are on the ball and clamp cuDNN to the version built for cuda_12.

This may be invisible to you if you are all 100% on accelerators of a different class, but for the CUDA users over here in the rest of the world it is a bit of a stumbling block.

@duckworthd
Contributor

Thanks for the heads-up, @samhodge-aiml. To be clear, you're saying that jax.map.tree must be replaced with jax.tree_util.tree_map to function with the pinned version of JAX? And that the CUDNN version needs to be pinned to 8.9?

@samhodge-aiml
Author

> Thanks for the heads-up, @samhodge-aiml. To be clear, you're saying that jax.map.tree must be replaced with jax.tree_util.tree_map to function with the pinned version of JAX? And that the CUDNN version needs to be pinned to 8.9?

I think just changing jax.tree.map to jax.tree_map worked for me, I hope, but if your syntax is more correct, then by all means do that.

I think it is sort of two layers of indirection: the cuDNN for CUDA 12.3 needs to be pinned to cuDNN 8.9.

I think the constraint is only for CUDA 12, not CUDA 12.4, so it grabs the cuDNN dependency built for 12.4, which is cuDNN 9.1. That version of cuDNN is linked against CUDA 12.4, so the symbols cannot load. As a result JAX still works but falls back to no GPU acceleration. Unless you have your eyes on the prize, your CPUs will be hammered and your GPU will be idle. If you want things to be set-and-forget, you really need a test that confirms some sort of acceleration is in place before you start training.
It was basically a future incompatibility that could not have been predicted at the time.
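
The "test before you start training" idea above can be sketched as a small preflight check. This is a minimal sketch, not part of the SMERF code: `require_accelerator` is a hypothetical name, and it relies only on `jax.default_backend()`, which reports the platform JAX actually selected. The `backend` argument is an assumption added so the check can be exercised without JAX installed.

```python
def require_accelerator(backend=None):
    """Raise if JAX would run on CPU only.

    `backend` can be passed explicitly for testing; when omitted it is
    read from JAX itself.
    """
    if backend is None:
        import jax  # imported lazily so the check itself is easy to test

        backend = jax.default_backend()  # e.g. "cpu", "gpu", or "tpu"
    if backend == "cpu":
        raise RuntimeError(
            "JAX fell back to CPU; check that the installed CUDA and cuDNN "
            "versions match what jaxlib was built against."
        )
    return backend
```

Calling `require_accelerator()` at the top of a training script would stop the run immediately when the GPU libraries fail to load, rather than letting it proceed slowly on CPU.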

@duckworthd
Contributor

References to jax.tree.map have been changed to jax.tree_util.tree_map. I also added information in the instructions indicating that this code was found working with CUDA 12.3 and cuDNN 8.9, but do not enforce that in any way.
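
For code that has to run on both older and newer JAX releases, one option is to pick the tree-mapping function at import time. The sketch below is an illustration only, not the change that was made in the repository: `pick_tree_map` is a hypothetical helper that prefers `jax.tree.map` when the installed release provides it and otherwise falls back to `jax.tree_util.tree_map`.

```python
def pick_tree_map(jax_module):
    """Return a tree-mapping function available in the given JAX module.

    Prefers jax.tree.map when present and otherwise falls back to
    jax.tree_util.tree_map (hypothetical helper for illustration).
    """
    tree_ns = getattr(jax_module, "tree", None)
    if tree_ns is not None and hasattr(tree_ns, "map"):
        return tree_ns.map
    return jax_module.tree_util.tree_map
```

Typical use would be `import jax` followed by `tree_map = pick_tree_map(jax)`. Calling `jax.tree_util.tree_map` directly, as the repository now does, is the simpler choice when no newer spelling is needed.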

@samhodge

samhodge commented Jun 4, 2024

Notes are good

@sumanttyagi

@duckworthd @samhodge why am I getting this error when trying to set up the env?
python3 -m pip install --upgrade "jax[cuda12_pip]==0.4.23" \
  -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

INFO: pip is looking at multiple versions of jax[cuda12-pip] to determine which version is compatible with other requirements. This could take a while.
ERROR: Ignored the following versions that require a different python version: 1.21.2 Requires-Python >=3.7,<3.11; 1.21.3 Requires-Python >=3.7,<3.11; 1.21.4 Requires-Python >=3.7,<3.11; 1.21.5 Requires-Python >=3.7,<3.11; 1.21.6 Requires-Python >=3.7,<3.11; 1.6.2 Requires-Python >=3.7,<3.10; 1.6.3 Requires-Python >=3.7,<3.10; 1.7.0 Requires-Python >=3.7,<3.10; 1.7.1 Requires-Python >=3.7,<3.10; 1.7.2 Requires-Python >=3.7,<3.11; 1.7.3 Requires-Python >=3.7,<3.11; 1.8.0 Requires-Python >=3.8,<3.11; 1.8.0rc1 Requires-Python >=3.8,<3.11; 1.8.0rc2 Requires-Python >=3.8,<3.11; 1.8.0rc3 Requires-Python >=3.8,<3.11; 1.8.0rc4 Requires-Python >=3.8,<3.11; 1.8.1 Requires-Python >=3.8,<3.11
ERROR: Could not find a version that satisfies the requirement jaxlib==0.4.23+cuda12.cudnn89; extra == "cuda12_pip" (from jax[cuda12-pip]) (from versions: 0.4.13, 0.4.14, 0.4.16, 0.4.17, 0.4.18, 0.4.19, 0.4.20, 0.4.21, 0.4.22, 0.4.23, 0.4.25, 0.4.26, 0.4.27, 0.4.28, 0.4.29, 0.4.30)
ERROR: No matching distribution found for jaxlib==0.4.23+cuda12.cudnn89; extra == "cuda12_pip"

@samhodge

Because you need to be more explicit

jaxlib[cuda_12]==0.4.23+cuda12.cudnn89

Also refer to

https://pypi.org/project/nvidia-cudnn-cu12/

https://pypi.org/project/nvidia-cudnn-cu12/8.9.7.29/

https://anaconda.org/nvidia/cuda-toolkit

conda install nvidia/label/cuda-12.3.2::cuda-toolkit
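
To confirm that the cuDNN pin took effect in a pip-managed environment, the installed wheel version can be read back with the standard library. A minimal sketch, assuming cuDNN was installed from the `nvidia-cudnn-cu12` distribution linked above; `cudnn_matches` is a hypothetical helper, and the `installed` argument exists only so the comparison can be tried without the package present.

```python
from importlib import metadata


def cudnn_matches(expected_prefix="8.9", installed=None):
    """Report whether the pip-installed cuDNN wheel has the expected version prefix.

    `installed` can be passed explicitly for testing; otherwise it is looked
    up from the `nvidia-cudnn-cu12` distribution. Returns False when that
    distribution is not installed.
    """
    if installed is None:
        try:
            installed = metadata.version("nvidia-cudnn-cu12")
        except metadata.PackageNotFoundError:
            return False
    # Compare on whole version components so "8.90" does not match "8.9".
    return installed == expected_prefix or installed.startswith(expected_prefix + ".")
```

A cuDNN installed through conda or the system package manager would not be visible to this check, so a `False` result only says that the pip wheel is missing or has a different version.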

@samhodge

Which Python version did you use?

I think 3.8 or 3.9 are needed and it indicates you are using 3.7

@sumanttyagi

sumanttyagi commented Jun 27, 2024

> Which Python version did you use?
>
> I think 3.8 or 3.9 are needed and it indicates you are using 3.7

@samhodge I created a new env as described in the README - https://github.com/google-research/google-research/tree/master/smerf

Create a conda environment with Python 3.11.

conda create --name smerf-env python=3.11
conda activate smerf-env

Install JAX with GPU support.

python3 -m pip install --upgrade "jax[cuda12_pip]==0.4.23" \
  -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

@sumanttyagi

sumanttyagi commented Jun 27, 2024

> Because you need to be more explicit
>
> jaxlib[cuda_12]==0.4.23+cuda12.cudnn89
>
> Also refer to
>
> https://pypi.org/project/nvidia-cudnn-cu12/
>
> https://pypi.org/project/nvidia-cudnn-cu12/8.9.7.29/
>
> https://anaconda.org/nvidia/cuda-toolkit
>
> conda install nvidia/label/cuda-12.3.2::cuda-toolkit

@samhodge but that command is not mentioned in the documentation - https://github.com/google-research/google-research/tree/master/smerf

Also, do we really need 8 or 16 GPUs to train the model? Can we do this with 2 GPUs and a lower image size?

@samhodge

The instructions are incorrect; that is why I told you what to do.

@sumanttyagi

sumanttyagi commented Jun 27, 2024

+cuda12.cudnn89

@samhodge with your instructions I am still facing the issue shown below. I have edited the command to - python3 -m pip install --upgrade "jax[cuda12_pip]==0.4.23+cuda12.cudnn89" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Please let me know if I need to make any other changes too.

python -m pip install --upgrade "jax[cuda12]==0.4.23+cuda12.cudnn89" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
Looking in links: https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
ERROR: Ignored the following yanked versions: 0.2.23, 0.3.18, 0.4.0, 0.4.15
ERROR: Could not find a version that satisfies the requirement jax==0.4.23+cuda12.cudnn89 (from versions: 0.0, 0.1, 0.1.1, 0.1.2, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.1.10, 0.1.11, 0.1.12, 0.1.13, 0.1.14, 0.1.15, 0.1.16, 0.1.18, 0.1.19, 0.1.20, 0.1.21, 0.1.22, 0.1.23, 0.1.24, 0.1.25, 0.1.26, 0.1.27, 0.1.28, 0.1.29, 0.1.30, 0.1.31, 0.1.32, 0.1.33, 0.1.34, 0.1.35, 0.1.36, 0.1.37, 0.1.38, 0.1.39, 0.1.40, 0.1.41, 0.1.42, 0.1.43, 0.1.44, 0.1.45, 0.1.46, 0.1.47, 0.1.48, 0.1.49, 0.1.50, 0.1.51, 0.1.52, 0.1.53, 0.1.54, 0.1.55, 0.1.56, 0.1.57, 0.1.58, 0.1.59, 0.1.60, 0.1.61, 0.1.62, 0.1.63, 0.1.64, 0.1.65, 0.1.66, 0.1.67, 0.1.68, 0.1.69, 0.1.70, 0.1.71, 0.1.72, 0.1.73, 0.1.74, 0.1.75, 0.1.76, 0.1.77, 0.2.0, 0.2.1, 0.2.2, 0.2.3, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.2.10, 0.2.11, 0.2.12, 0.2.13, 0.2.14, 0.2.15, 0.2.16, 0.2.17, 0.2.18, 0.2.19, 0.2.20, 0.2.21, 0.2.22, 0.2.24, 0.2.25, 0.2.26, 0.2.27, 0.2.28, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.3.7, 0.3.8, 0.3.9, 0.3.10, 0.3.11, 0.3.12, 0.3.13, 0.3.14, 0.3.15, 0.3.16, 0.3.17, 0.3.19, 0.3.20, 0.3.21, 0.3.22, 0.3.23, 0.3.24, 0.3.25, 0.4.1, 0.4.2, 0.4.3, 0.4.4, 0.4.5, 0.4.6, 0.4.7, 0.4.8, 0.4.9, 0.4.10, 0.4.11, 0.4.12, 0.4.13, 0.4.14, 0.4.16, 0.4.17, 0.4.18, 0.4.19, 0.4.20, 0.4.21, 0.4.22, 0.4.23, 0.4.24, 0.4.25, 0.4.26, 0.4.27, 0.4.28, 0.4.29, 0.4.30)
ERROR: No matching distribution found for jax==0.4.23+cuda12.cudnn89

@samhodge

See

#2095 (comment)

@samhodge

image

@sumanttyagi

@samhodge does the script work only on Linux, or on Windows systems too?
The driver is stated to be available for Linux only.
image

@samhodge

I have run it on Ubuntu 22.04 LTS

You can try whatever you like

@samhodge

I cannot support you any further. You will have to follow the instructions given: CUDA 12.3, cuDNN 8.9 for CUDA 12.3, and then a JAX build that supports those CUDA and cuDNN versions,

installed via conda and the Python Package Index.

@samhodge-aiml
Author

samhodge-aiml commented Jun 27, 2024 via email

@samhodge

samhodge commented Jun 27, 2024

https://jax.readthedocs.io/en/latest/installation.html

There is no CUDA support for JAX on Windows

image

@sumanttyagi

> https://jax.readthedocs.io/en/latest/installation.html
>
> There is no CUDA support for JAX on Windows
>
> image

There is support if we use Windows WSL2 on x86_64, but thanks, let me try it with a Linux setup.
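
The platform question above can be checked before attempting the install. A minimal sketch using only the standard library: `cuda_jax_platform` is a hypothetical helper, and detecting WSL by looking for "microsoft" in the kernel release string is a common heuristic rather than an official API, so treat the result as a hint.

```python
import platform


def cuda_jax_platform(system=None, release=None):
    """Classify the host for the purpose of installing CUDA-enabled JAX.

    Returns "linux", "wsl", or "unsupported". The arguments default to the
    values reported by the `platform` module and exist mainly for testing.
    """
    system = platform.system() if system is None else system
    release = platform.release() if release is None else release
    if system != "Linux":
        # Native Windows and macOS have no CUDA-enabled JAX wheels.
        return "unsupported"
    # WSL kernels typically carry "microsoft" in the release string
    # (heuristic, assumed here for illustration).
    if "microsoft" in release.lower():
        return "wsl"
    return "linux"
```

On a native Windows Python this returns "unsupported", which matches the failure to find a matching `jaxlib` wheel earlier in the thread.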
