[Bug] mlc-llm compile bug when cutlass and cublas enabled #859

Closed
BBuf opened this issue Sep 2, 2023 · 1 comment
Labels
bug Confirmed bugs

Comments

@BBuf
Contributor

BBuf commented Sep 2, 2023

When I enable cublas and cutlass when building relax (mlc-ai/relax) and then compile the model for GPU with mlc-llm using q0f16, the compilation crashes. Compiling with other quantization configurations, such as q4f16_1, works fine. Additionally, when cublas and cutlass are not enabled, all configurations compile normally in mlc-llm. The error stack from the crash is:

/bbuf> python3 -m mlc_llm.build --hf-path StarRing2022/RWKV-4-World-7B --target cuda --quantization q0f16
Weights exist at dist/models/RWKV-4-World-7B, skipping download.
Using path "dist/models/RWKV-4-World-7B" for model "RWKV-4-World-7B"
Target configured: cuda -keys=cuda,gpu -arch=sm_80 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_80 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Start computing and quantizing weights... This may take a while.
Finish computing and quantizing weights.
Total param size: 14.003204345703125 GB
Start storing to cache dist/RWKV-4-World-7B-q0f16/params
[0582/0582] saving param_581
All finished, 227 total shards committed, record saved to dist/RWKV-4-World-7B-q0f16/params/ndarray-cache.json
Finish exporting chat config to dist/RWKV-4-World-7B-q0f16/params/mlc-chat-config.json
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/build.py", line 13, in <module>
    main()
  File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/build.py", line 10, in main
    core.build_model_from_args(parsed_args)
  File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/core.py", line 584, in build_model_from_args
    mod = mod_transform_before_build(mod, param_manager, args, model_config)
  File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/core.py", line 407, in mod_transform_before_build
    mod = tvm.transform.Sequential(
  File "/bbuf/relax/python/tvm/ir/transform.py", line 238, in __call__
    return _ffi_transform_api.RunPass(self, mod)
  File "/bbuf/relax/python/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
    raise get_last_ffi_error()
tvm.error.InternalError: Traceback (most recent call last):
  22: TVMFuncCall
  21: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::{lambda(tvm::transform::Pass, tvm::IRModule)#7}>(tvm::transform::{lambda(tvm::transform::Pass, tvm::IRModule)#7}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
  20: tvm::transform::Pass::operator()(tvm::IRModule) const
  19: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  18: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  17: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  16: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  15: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_16PackedFuncSubObjIZNS0_15TypedPackedFuncIFNS_8IRModuleES5_NS_9transform11PassContextEEE17AssignTypedLambdaIZNS_5relax9transform16FuseOpsByPatternERKNS0_5ArrayINSC_13FusionPatternEvEEbbEUlS5_S7_E_EEvT_EUlRKNS0_7TVMArgsEPNS0_11TVMRetValueEE_EEE4CallEPKS1_SK_SO_
  14: tvm::relax::FuseOpsByPattern(tvm::runtime::Array<tvm::relax::transform::FusionPattern, void> const&, tvm::IRModule, bool, bool)
  13: tvm::relax::MakeGroupedFunctions(tvm::IRModule, std::unordered_map<tvm::runtime::Object const*, tvm::relay::GraphPartitioner::Group*, std::hash<tvm::runtime::Object const*>, std::equal_to<tvm::runtime::Object const*>, std::allocator<std::pair<tvm::runtime::Object const* const, tvm::relay::GraphPartitioner::Group*> > > const&, bool)
  12: tvm::relax::OperatorFusor::Transform()
  11: tvm::relax::ExprMutator::VisitExpr(tvm::RelayExpr const&)
  10: _ZZN3tvm5relax11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlRKNS_7r
  9: tvm::relax::ExprMutator::VisitExpr_(tvm::relax::FunctionNode const*)
  8: tvm::relax::ExprMutator::VisitWithNewScope(tvm::RelayExpr const&, tvm::runtime::Optional<tvm::runtime::Array<tvm::relax::Var, void> >)
  7: tvm::relax::ExprMutator::VisitExpr(tvm::RelayExpr const&)
  6: _ZZN3tvm5relax11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlRKNS_7r
  5: tvm::relax::ExprMutator::VisitExpr_(tvm::relax::SeqExprNode const*)
  4: tvm::relax::OperatorFusor::VisitBindingBlock(tvm::relax::BindingBlock const&)
  3: tvm::relax::OperatorFusor::VisitBindingBlock_(tvm::relax::DataflowBlockNode const*)
  2: tvm::relax::OperatorFusor::CollectFuncBoundary(tvm::runtime::Array<tvm::relax::Binding, void> const&)
  1: tvm::relax::PostOrderVisit(tvm::RelayExpr const&, std::function<void (tvm::RelayExpr const&)>)
  0: tvm::relax::OperatorFusor::CollectFuncBoundary(tvm::runtime::Array<tvm::relax::Binding, void> const&)::{lambda(tvm::RelayExpr const&)#1}::operator()(tvm::RelayExpr const&) const
  File "/bbuf/relax/src/relax/transform/fuse_ops.cc", line 876
InternalError: Check failed: (depgroup != cur_group) is false: A cyclic dependency detected between the groups lv2757 and lv2756 are in.

The mlc-llm compile command is:

python3 -m mlc_llm.build --hf-path StarRing2022/RWKV-4-World-7B --target cuda --quantization q0f16
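
For reference, enabling cutlass and cublas in the relax build is normally done through config.cmake. The following is a minimal sketch of that setup, assuming TVM's standard USE_CUDA / USE_CUTLASS / USE_CUBLAS build options; the exact paths and flags are not taken from this report:

cd relax && mkdir -p build && cd build
cp ../cmake/config.cmake .
# assumed flags: these are TVM's standard config.cmake switches for the backends in question
echo "set(USE_CUDA ON)"    >> config.cmake
echo "set(USE_CUTLASS ON)" >> config.cmake
echo "set(USE_CUBLAS ON)"  >> config.cmake
cmake .. && make -j$(nproc)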
@BBuf BBuf added the bug Confirmed bugs label Sep 2, 2023
@BBuf BBuf changed the title from "[Bug]" to "[Bug] mlc-llm compile bug when cutlass and cublas enabled" Sep 2, 2023
@tqchen
Contributor

tqchen commented Oct 24, 2023

This should work now.

@tqchen tqchen closed this as completed Oct 24, 2023