
How many training steps are required to achieve the effect in the sample? #15

Closed
arceus-jia opened this issue Feb 7, 2023 · 7 comments


@arceus-jia

I tried training for 100,000 steps, but the results still look strange. Is this normal?
[attached image: sample-100000]
Can you tell me how many steps are needed to get the expected result? Thank you!

@zhangjiewu
Collaborator

The results look weird, as if the model were not trained at all. It usually takes 300–500 steps to train on an 8-frame video. Can you provide more info (e.g., environment, code snippets) so I can look into this issue?

@arceus-jia
Author

Well, I'm not sure if it was an xformers version conflict, but after I reinstalled the environment, upgraded torch to 1.13.1 and torchvision to 0.14.1, and installed the latest xformers version, retraining now gives the expected result.
Anyway, thank you!
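
For reference, the reinstall above boils down to roughly the following commands (a sketch, not my exact history; note that the xformers build I ended up with is a 0.0.17 pre-release, so a plain pip install may give you an older stable build):

pip install --upgrade torch==1.13.1 torchvision==0.14.1   # default PyPI wheels, which bundle the CUDA 11.7 libraries
pip install -U --pre xformers                              # --pre allows a 0.0.17 dev build
python -m xformers.info                                    # sanity-check the resulting install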

@zhangjiewu
Collaborator

Glad to hear that. Let me know if you have any other questions. :)

@liangbingzhao

Can you share your results after running python -m xformers.info? I set up a new virtual environment with torch 1.13 (cu117) and torchvision 0.14, but after installing xformers with pip install -U xformers, the triton module was not installed. I ran pip install triton to install it, but the results from this repo still look like yours. Any idea how to fix this?
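
A few quick checks that cover the pieces mentioned above (a minimal sketch; nothing here is specific to this repo):

python -m xformers.info                                                                              # prints the xformers build and which features/kernels are available
python -c "import triton; print(triton.__version__)"                                                 # confirms triton actually imports
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"    # torch build, CUDA version, GPU visibility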

@arceus-jia
Author

> Can you share your results after running python -m xformers.info? I set up a new virtual environment with torch 1.13 (cu117) and torchvision 0.14, but after installing xformers with pip install -U xformers, the triton module was not installed. I ran pip install triton to install it, but the results from this repo still look like yours. Any idea how to fix this?

Here is my environment; you can refer to it and compare it with yours:

absl-py==1.4.0
accelerate==0.16.0
antlr4-python3-runtime==4.9.3
bitsandbytes==0.35.4
cachetools==5.3.0
certifi @ file:///croot/certifi_1671487769961/work/certifi
cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work
charset-normalizer==3.0.1
decord==0.6.0
diffusers==0.11.1
einops==0.6.0
filelock==3.9.0
flit_core @ file:///opt/conda/conda-bld/flit-core_1644941570762/work/source/flit_core
ftfy==6.1.1
future @ file:///home/builder/ci_310/future_1640790123501/work
google-auth==2.16.0
google-auth-oauthlib==0.4.6
grpcio==1.51.1
huggingface-hub==0.12.0
idna==3.4
imageio==2.25.0
importlib-metadata==6.0.0
Jinja2==3.1.2
Markdown==3.4.1
MarkupSafe==2.1.2
mkl-fft==1.3.1
mkl-random @ file:///home/builder/ci_310/mkl_random_1641843545607/work
mkl-service==2.4.0
modelcards==0.1.6
mypy-extensions==1.0.0
numpy @ file:///croot/numpy_and_numpy_base_1672336185480/work
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauthlib==3.2.2
omegaconf==2.3.0
packaging==23.0
Pillow==9.4.0
protobuf==3.20.3
psutil==5.9.4
pyasn1==0.4.8
pyasn1-modules==0.2.8
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pyre-extensions==0.0.23
PyYAML @ file:///croot/pyyaml_1670514731622/work
regex==2022.10.31
requests==2.28.2
requests-oauthlib==1.3.1
rsa==4.9
six @ file:///tmp/build/80754af9/six_1644875935023/work
tensorboard==2.11.2
tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1
tokenizers==0.13.2
torch==1.13.1
torchvision==0.14.1
tqdm==4.64.1
transformers==4.26.0
typing-inspect==0.8.0
typing_extensions @ file:///croot/typing_extensions_1669924550328/work
urllib3==1.26.14
wcwidth==0.2.6
Werkzeug==2.2.2
xformers==0.0.17.dev444
zipp==3.12.1
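
If it helps, one simple way to compare your own environment against this list (a sketch; the file names are arbitrary):

pip freeze > my_env.txt          # snapshot your current environment
diff my_env.txt working_env.txt  # working_env.txt = the list above saved to a file; pay particular attention to torch, torchvision, xformers, and diffusers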

@liangbingzhao

Thank you for your response. I upgraded xformers from 0.0.16 to 0.0.17. The upgraded model now generates the following:

[attached image: sample-300]

This seems better, but there are still many inconsistencies.

@arceus-jia
Author

arceus-jia commented Feb 20, 2023

> This seems better, but there are still many inconsistencies.

Yep, that means the training was successful. In fact, the sample given by the author is similar to this one. The author mainly provides an idea for AI-generated animation with a diffusion model; if you want to productize it, it still needs a lot of improvement.
