Questions about flash_attn during training [Usage]

### When did you clone our code?

I cloned the code base after 5/1/23

### Describe the issue

Issue:

Could you share the PyTorch, CUDA, and flash_attn versions you use for training? With my versions, importing flash_attn fails at startup (see the log).
The versions I use:
torch: 1.13.1+cu116
cuda: 12.1 (18.04)
flash-attn: 1.0.4
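
For anyone comparing environments, here is a minimal sketch for collecting these versions from the active Python environment. The helper names `installed_version` and `collect_versions` are made up for illustration, and the distribution name `flash-attn` is assumed to match what pip installed. `torch.version.cuda` reports the CUDA toolkit PyTorch was built against, which can differ from the system-wide CUDA.

```python
import importlib.metadata as md


def installed_version(dist_name: str) -> str:
    """Return the installed version of a distribution, or 'not installed'."""
    try:
        return md.version(dist_name)
    except md.PackageNotFoundError:
        return "not installed"


def collect_versions() -> dict:
    """Gather the versions that matter for a compiled extension such as flash_attn."""
    versions = {
        "torch": installed_version("torch"),
        # Assumed distribution name; adjust if pip lists it differently.
        "flash-attn": installed_version("flash-attn"),
    }
    try:
        import torch

        # CUDA toolkit PyTorch was compiled with (None for CPU-only builds).
        versions["torch built with CUDA"] = str(torch.version.cuda)
    except Exception:
        versions["torch built with CUDA"] = "unknown (torch not importable)"
    return versions


if __name__ == "__main__":
    for name, value in collect_versions().items():
        print(f"{name}: {value}")
```

An undefined `c10::cuda` symbol when importing a compiled extension often means the extension was built against a different PyTorch than the one currently installed, so comparing this output between the build and the runtime environment may help narrow the problem down.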


Command:

CUDA_VISIBLE_DEVICES="6,7" torchrun --nproc_per_node=2 train_mem.py

Log: 

File "/envs/torch_113/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 5, in <module>
    import flash_attn_cuda
ImportError: /envs/torch_113/lib/python3.10/site-packages/flash_attn_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda20CUDACachingAllocator9allocatorE
