
feat: docker improvements #264

Open · wants to merge 3 commits into main
Conversation

@vladlearns vladlearns commented Jun 7, 2024

Features:

  1. Includes CUDA 12.1 and cuDNN 8.9.7, along with the CUDA/cuDNN installation process. Fixes Kernel died #215 and libcudnn error #225.
  2. Optimized Docker layer caching for faster builds.
  3. Added the ability to download only the necessary checkpoints, saving download time and disk space.
  4. Switched base image to python:3.10-slim.

This setup has been thoroughly tested to ensure stability and performance.

Prerequisites:

Join the NVIDIA Developer Program:

  1. Go to the NVIDIA Developer Program.
  2. Sign up for an account if you don't already have one.
  3. Once you have an account, log in to the NVIDIA Developer website.

Download cuDNN:

  1. Navigate to the cuDNN Archive.
  2. Select the version you need (cuDNN 8.9.7 for CUDA 12.1).
  3. Download the appropriate file for Linux (should look like cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz).
  4. Place the file in the root of the repository directory.
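Before building, you can sanity-check that the archive is in place. This is an optional convenience, not part of the PR; the filename is taken from step 3 above:

```shell
# Confirm the cuDNN archive sits in the repository root before `docker build`.
CUDNN_ARCHIVE="cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz"
if [ -f "$CUDNN_ARCHIVE" ]; then
    STATUS="found"
    tar -tJf "$CUDNN_ARCHIVE" | head -n 3   # peek at the first few entries
else
    STATUS="missing"
fi
echo "cuDNN archive: $STATUS"
```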

Run:

docker build -t openvoice .
then
docker run --gpus all -p 8888:8888 openvoice v2

tl;dr

Hey everyone,

I've been working on improving the Docker setup for OpenVoice, and I think these changes will make it much easier to run in a containerized environment.

The main issue I've seen is with CUDA and cuDNN versions not matching up, causing errors. In this Dockerfile, I've included CUDA 12.1 and cuDNN 8.9.7, which work well with the latest PyTorch that supports CUDA 12. This should help eliminate those errors.

Another improvement is the entrypoint shell script: you can now download only the checkpoints you need. It fetches the checkpoints for the specified version only, saving time and bandwidth.
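As a sketch of how that per-version selection could look inside an entrypoint script (the function name and the v1 archive name are my assumptions; the v2 archive name matches the download log later in this thread):

```shell
# Map a requested OpenVoice version to its checkpoint archive name.
# checkpoints_v2_0417.zip appears in the build log below; the v1 name
# is an assumption for illustration.
checkpoint_archive() {
    case "$1" in
        v1) echo "checkpoints_1226.zip" ;;
        v2) echo "checkpoints_v2_0417.zip" ;;
        *)  echo "unknown version: $1" >&2; return 1 ;;
    esac
}

checkpoint_archive v2   # prints checkpoints_v2_0417.zip
```

The entrypoint would then download and extract only the archive this function returns for the version passed to `docker run`.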

I've also optimized the Docker layer cache. I rearranged some commands so that if only the local files change, Docker can reuse the base layers that have all the lengthy installations. This should speed up your builds when you're making changes to your local setup.
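The caching idea, as a minimal illustrative Dockerfile (not the PR's actual file; the package list is a placeholder):

```dockerfile
FROM python:3.10-slim

# Rarely-changing, slow steps first so their layers stay cached.
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential wget unzip \
    && rm -rf /var/lib/apt/lists/*

# Dependency install is re-run only when requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Frequently-changing local files last: editing them rebuilds only this layer.
COPY . /workspace
WORKDIR /workspace
```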

In summary, smoother, faster, and less prone to errors. It's now easier to spin up different versions and notebooks without CUDA issues or long installations.

This setup has been thoroughly tested to ensure stability and performance.

Give it a try and let me know how it goes! I'm always happy to hear feedback and suggestions. I think this will be a big improvement for the OpenVoice experience.

Happy Dockerizing! 🐳
Vlad

Screenshots: running output and results (images omitted).

This was referenced Jun 7, 2024
@vladlearns (Author)

If you want to implement a similar setup on Windows, follow:

  1. Kernel died #215 (comment)
  2. Kernel died #215 (comment)

@vladlearns (Author)

@wl-zhao, @yuxumin, @Zengyi-Qin, could you take a look, please? I wasn't able to assign a reviewer; that option seems to be disabled in this repo. Thank you!

@oldgithubman left a comment

I approved this too soon and I don't know how to (or if I even can) retract it. This does not fix "kernel died" for me and has numerous problems (more than I care to enumerate right now). Needs to go back in the oven. I'll probably just move on. I've wasted far too much time trying to get this project to work. Good luck.

2024-06-11 08:25:45 (17.9 MB/s) - ‘checkpoints_v2_0417.zip’ saved [122086901/122086901]

Archive:  checkpoints_v2_0417.zip
   creating: /tmp/extract_temp/checkpoints_v2/
   creating: /tmp/extract_temp/checkpoints_v2/base_speakers/
   creating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/fr.pth  
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/en-us.pth  
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/en-india.pth  
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/en-br.pth  
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/es.pth  
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/en-newest.pth  
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/jp.pth  
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/en-default.pth  
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/kr.pth  
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/zh.pth  
  inflating: /tmp/extract_temp/checkpoints_v2/base_speakers/ses/en-au.pth  
   creating: /tmp/extract_temp/checkpoints_v2/converter/
  inflating: /tmp/extract_temp/checkpoints_v2/converter/config.json  
  inflating: /tmp/extract_temp/checkpoints_v2/converter/checkpoint.pth  
mv: cannot move '/tmp/extract_temp/checkpoints_v2/base_speakers' to '/workspace/checkpoints_v2/base_speakers': Directory not empty
mv: cannot move '/tmp/extract_temp/checkpoints_v2/converter' to '/workspace/checkpoints_v2/converter': Directory not empty
Starting Jupyter Notebook...
[I 2024-06-11 08:25:46.049 ServerApp] jupyter_lsp | extension was successfully linked.
[I 2024-06-11 08:25:46.051 ServerApp] jupyter_server_terminals | extension was successfully linked.
[W 2024-06-11 08:25:46.052 LabApp] 'notebook_dir' has moved from NotebookApp to ServerApp. This config will be passed to ServerApp. Be sure to update your config before our next release.
[W 2024-06-11 08:25:46.053 ServerApp] notebook_dir is deprecated, use root_dir
[I 2024-06-11 08:25:46.053 ServerApp] jupyterlab | extension was successfully linked.
[I 2024-06-11 08:25:46.055 ServerApp] notebook | extension was successfully linked.
[I 2024-06-11 08:25:46.055 ServerApp] Writing Jupyter server cookie secret to /root/.local/share/jupyter/runtime/jupyter_cookie_secret
[I 2024-06-11 08:25:46.175 ServerApp] notebook_shim | extension was successfully linked.
[I 2024-06-11 08:25:46.181 ServerApp] notebook_shim | extension was successfully loaded.
[I 2024-06-11 08:25:46.182 ServerApp] jupyter_lsp | extension was successfully loaded.
[I 2024-06-11 08:25:46.182 ServerApp] jupyter_server_terminals | extension was successfully loaded.
[I 2024-06-11 08:25:46.183 LabApp] JupyterLab extension loaded from /usr/local/lib/python3.10/site-packages/jupyterlab
[I 2024-06-11 08:25:46.183 LabApp] JupyterLab application directory is /usr/local/share/jupyter/lab
[I 2024-06-11 08:25:46.183 LabApp] Extension Manager is 'pypi'.
[I 2024-06-11 08:25:46.197 ServerApp] jupyterlab | extension was successfully loaded.
[I 2024-06-11 08:25:46.198 ServerApp] notebook | extension was successfully loaded.
[I 2024-06-11 08:25:46.198 ServerApp] Serving notebooks from local directory: /workspace
[I 2024-06-11 08:25:46.198 ServerApp] Jupyter Server 2.14.1 is running at:
[I 2024-06-11 08:25:46.198 ServerApp] http://3b1a70be49a4:8888/tree?token=24d0165bed2a4d5aefc4b79f960fc5f53557b15e02dd5b69
[I 2024-06-11 08:25:46.198 ServerApp]     http://127.0.0.1:8888/tree?token=24d0165bed2a4d5aefc4b79f960fc5f53557b15e02dd5b69
[I 2024-06-11 08:25:46.198 ServerApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 2024-06-11 08:25:46.199 ServerApp] 
    
    To access the server, open this file in a browser:
        file:///root/.local/share/jupyter/runtime/jpserver-15-open.html
    Or copy and paste one of these URLs:
        http://3b1a70be49a4:8888/tree?token=24d0165bed2a4d5aefc4b79f960fc5f53557b15e02dd5b69
        http://127.0.0.1:8888/tree?token=24d0165bed2a4d5aefc4b79f960fc5f53557b15e02dd5b69
[W 2024-06-11 08:25:46.207 ServerApp] Failed to fetch commands from language server spec finder `pyright`:
    The 'nodejs' trait of a LanguageServerManager instance expected a unicode string, not the NoneType None.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 632, in get
    value = obj._trait_values[self.name]
KeyError: 'nodejs'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/jupyter_lsp/manager.py", line 279, in _autodetect_language_servers
    specs = spec_finder(self) or {}
  File "/usr/local/lib/python3.10/site-packages/jupyter_lsp/specs/utils.py", line 148, in __call__
    "argv": ([mgr.nodejs, node_module, *self.args] if is_installed else []),
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 687, in __get__
    return t.cast(G, self.get(obj, cls))  # the G should encode the Optional
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 649, in get
    value = self._validate(obj, default)
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 722, in _validate
    value = self.validate(obj, value)
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 2945, in validate
    self.error(obj, value)
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 831, in error
    raise TraitError(e)
traitlets.traitlets.TraitError: The 'nodejs' trait of a LanguageServerManager instance expected a unicode string, not the NoneType None.
[I 2024-06-11 08:25:46.208 ServerApp] Skipped non-installed server(s): bash-language-server, dockerfile-language-server-nodejs, javascript-typescript-langserver, jedi-language-server, julia-language-server, python-language-server, python-lsp-server, r-languageserver, sql-language-server, texlab, typescript-language-server, unified-language-server, vscode-css-languageserver-bin, vscode-html-languageserver-bin, vscode-json-languageserver-bin, yaml-language-server
[W 2024-06-11 08:25:46.214 ServerApp] Failed to fetch commands from language server spec finder `pyright`:
    The 'nodejs' trait of a LanguageServerManager instance expected a unicode string, not the NoneType None.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 632, in get
    value = obj._trait_values[self.name]
KeyError: 'nodejs'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/jupyter_lsp/manager.py", line 279, in _autodetect_language_servers
    specs = spec_finder(self) or {}
  File "/usr/local/lib/python3.10/site-packages/jupyter_lsp/specs/utils.py", line 148, in __call__
    "argv": ([mgr.nodejs, node_module, *self.args] if is_installed else []),
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 687, in __get__
    return t.cast(G, self.get(obj, cls))  # the G should encode the Optional
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 649, in get
    value = self._validate(obj, default)
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 722, in _validate
    value = self.validate(obj, value)
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 2945, in validate
    self.error(obj, value)
  File "/usr/local/lib/python3.10/site-packages/traitlets/traitlets.py", line 831, in error
    raise TraitError(e)
traitlets.traitlets.TraitError: The 'nodejs' trait of a LanguageServerManager instance expected a unicode string, not the NoneType None.

@vladlearns (Author)

vladlearns commented Jun 11, 2024

@oldmanjk Thank you for reviewing the pull request and bringing this up, but "kernel died" can have various causes; this error message can result from anything from a lack of memory to missing libraries.

First issue you are having

The folders should not exist or be populated prior to checkpoint extraction; this is a Docker container. Based on your logs, the first issue you're facing relates to moving the extracted checkpoint files: the directories /workspace/checkpoints_v2/base_speakers and /workspace/checkpoints_v2/converter are not empty, which prevents the extracted files from being moved.

In my testing environment, I built the image from scratch and it works without any issues.

I don't know what your build context is, but here are some things to check:

  • Are you using any persistent volumes or bind mounts when running the container? If so, try running the container without those mounts to see if the issue persists. If you previously ran the container with volumes or bind mounts attached to those specific directories, the folders can persist even after the container is removed, causing conflicts on the next build; likewise, files may remain if previous container instances or volumes weren't cleaned up. Please make sure any previous container instances and associated volumes are removed before building the image again.
  • Are there any additional files or directories in your build context that might be causing the folders to be populated?
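For reference, the `mv: Directory not empty` failure in the log happens because mv cannot merge a directory into an existing non-empty one; a copy merges the trees instead. A throwaway reproduction with shortened paths:

```shell
# Simulate the conflict: both source and destination contain base_speakers/.
mkdir -p src/base_speakers dst/base_speakers
touch src/base_speakers/en-us.pth dst/base_speakers/old.pth

# mv refuses to replace a non-empty target directory ("Directory not empty").
mv src/base_speakers dst/ 2>&1 || true

# `cp -a src/. dst/` merges the trees instead of renaming over them:
cp -a src/. dst/
ls dst/base_speakers   # now contains both en-us.pth and old.pth
rm -rf src
```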

Second issue

For the second one: it looks like jupyter-lsp tries to autodetect and start language servers, and in doing so looks for the nodejs executable path. I have the container running right now, and the only mention of jupyterlab-lsp, which requires Node.js, is your comment in this pull request, so I assume this is related to your particular setup.

@oldmanjk, could you please check whether you have any user-specific Jupyter configurations, additional Jupyter extensions, or dev-environment settings that might be enabling or interacting with the LSP extension? If so, try disabling or removing them and rebuilding the container to see if the errors persist.

If the issue still persists after considering these, I'd be happy to work with you to investigate further and find a solution. We can explore additional steps.

@oldgithubman
> (quoting @vladlearns' reply above)

Thanks for the fast and thorough response. Unfortunately, I have deleted everything and moved on. Good luck though!

@npjonath
@vladlearns I have been working in parallel on a fix for the Dockerfile that suits CPU setups, particularly Mac M-series and similar systems, and I have finally arrived at a solution. Considering your work on this matter, perhaps we can combine our efforts: we could develop specialized Dockerfiles, one for CUDA and another for CPU, and correspondingly create docker-compose files (docker-compose.cuda.yml and docker-compose.cpu.yml). What do you think?

My work : npjonath#1

note: this PR also includes the fix from @Afnanksalal, as it is a requirement for running this project on CPU-based architectures. (#262)

OpenVoice V1 works correctly on my setup. V2 is still not working because of this issue from MeloTTS:

Issue: myshell-ai/MeloTTS#167
A possible solution is running a specific version of MeloTTS: https://github.com/Meiye-lj/Dockerfiles/blob/76c88309a4bb7b7070441bed3b4b72231f5349b8/MeloTTS/Dockerfile

@oldgithubman
I don't use this project anymore, so I probably shouldn't be a requested reviewer

@vladlearns (Author)

@oldgithubman You added yourself by approving the PR and then dismissing the review because of your environment. Later, you decided to leave without providing any details. Now, when I ask for a review, you are automatically added, and there is no way to remove you.

@vladlearns (Author)

> (quoting @npjonath's comment above)

@npjonath Hey!
So, you just want me to rename the file?

@npjonath
@vladlearns No, it was just to discuss this with you. You can leave the naming as it is; I guess GPU usage is the default. I will add a docker-compose file and a Dockerfile.cpu separately to extend your implementation.

@oldgithubman
> (quoting @vladlearns' comment above)

Ok. I don't really know what I'm doing. I'll just approve it so you can move on

@vladlearns (Author)

@npjonath Sure. So far, I've tested my setup in multiple environments, and it works for multiple people as well. However, it seems the maintainers don't merge pull requests into the main branch; instead, they ask contributors to fork the repository and point to the fork in the documentation.
