candle-flash-attn infinite compile time #2275

Open
Gadersd opened this issue Jun 19, 2024 · 3 comments

@Gadersd

Gadersd commented Jun 19, 2024

When I added candle-flash-attn to my Cargo.toml file, the build process seems to hang on Building [=======================> ] 114/118: candle-flash-attn(build) and the compilation doesn't proceed.

My Cargo.toml file is:

[package]
name = "occam"
version = "0.1.0"
edition = "2021"

[dependencies]
candle-core = { git = "https://github.com/huggingface/candle.git", version = "0.6", features = ["cuda"] }
candle-flash-attn = { git = "https://github.com/huggingface/candle.git", version = "0.6", features = [] }
candle-nn = { git = "https://github.com/huggingface/candle.git", version = "0.6", features = ["cuda"]}
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
glob = "0.3"
rand = "0.8"
anyhow = "1.0"

@LaurentMazare
Collaborator

Compilation of the flash-attn CUDA kernels can take a very long time (and can easily run out of memory too). More than 10 minutes wouldn't be surprising. Do you see anything in top / ps (nvcc, cicc, ...)?
Also, you probably want to set the CANDLE_FLASH_ATTN_BUILD_DIR environment variable to something like $HOME/.candle so that the kernel compilation isn't triggered too often.
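
For anyone following along, a minimal sketch of the two suggestions above in a POSIX shell. The `$HOME/.candle` path is the example location from the comment; creating the directory up front is a precaution I am assuming is harmless, not something stated in this thread.

```shell
# Reuse one directory for the compiled flash-attn kernels across builds.
# $HOME/.candle is only the example path suggested above; any writable
# directory should work.
export CANDLE_FLASH_ATTN_BUILD_DIR="$HOME/.candle"

# Assumption: make sure the directory exists before the build starts.
mkdir -p "$CANDLE_FLASH_ATTN_BUILD_DIR"

# While the build looks stuck, check from a second terminal whether the
# CUDA compiler processes are actually running.
ps aux | grep -E 'nvcc|cicc' | grep -v grep || echo "no nvcc/cicc processes found"
```

If nvcc or cicc show up in the output, the build is still compiling rather than hung. Run `cargo build` from the same shell so that it inherits the exported variable.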

@lewisthorpe1994

Hey, I'm having the same issue here trying to build it. Not sure if I'm missing a trick, but when I add "flash-attn" to the features for candle-transformers, my editor triggers the build. From there, the 32GB of RAM in my system gets maxed out and my desktop just freezes. Is there a workaround for this at all?

@LaurentMazare
Collaborator

You may want to reduce the parallelism of your build by setting RAYON_NUM_THREADS to an appropriate value, and set CANDLE_FLASH_ATTN_BUILD_DIR as mentioned earlier to avoid recompiling this too often (with this setup you should only need to compile it once).
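
A rough sketch of limiting the build parallelism from a POSIX shell. The value 2 is purely an illustrative assumption; the comment above only says "an appropriate value", so pick whatever your RAM can sustain.

```shell
# Cap the number of worker threads used for the kernel compilation.
# 2 is a hypothetical starting point, not a recommended setting;
# lower it if memory still runs out, raise it if there is headroom.
export RAYON_NUM_THREADS=2

# Then start the build from this same shell so it inherits the variable:
#   cargo build --release
```

Fewer parallel compiler processes means a lower peak memory use at the cost of a longer build. An editor that triggers builds on its own will only see the variable if it was launched from an environment where it is set.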
