When I build Relax (mlc-ai/relax) with cuBLAS and CUTLASS enabled and then compile a GPU model using mlc-llm with q0f16, the compilation crashes. Compiling with other quantization configurations, such as q4f16_1, works fine. Additionally, when cuBLAS and CUTLASS are not enabled, all configurations compile normally in mlc-llm. The error stack for the crash is:
/bbuf> python3 -m mlc_llm.build --hf-path StarRing2022/RWKV-4-World-7B --target cuda --quantization q0f16
Weights exist at dist/models/RWKV-4-World-7B, skipping download.
Using path "dist/models/RWKV-4-World-7B" for model "RWKV-4-World-7B"
Target configured: cuda -keys=cuda,gpu -arch=sm_80 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_80 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Start computing and quantizing weights... This may take a while.
Finish computing and quantizing weights.
Total param size: 14.003204345703125 GB
Start storing to cache dist/RWKV-4-World-7B-q0f16/params
[0582/0582] saving param_581
All finished, 227 total shards committed, record saved to dist/RWKV-4-World-7B-q0f16/params/ndarray-cache.json
Finish exporting chat config to dist/RWKV-4-World-7B-q0f16/params/mlc-chat-config.json
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/build.py", line 13, in <module>
main()
File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/build.py", line 10, in main
core.build_model_from_args(parsed_args)
File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/core.py", line 584, in build_model_from_args
mod = mod_transform_before_build(mod, param_manager, args, model_config)
File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/core.py", line 407, in mod_transform_before_build
mod = tvm.transform.Sequential(
File "/bbuf/relax/python/tvm/ir/transform.py", line 238, in __call__
return _ffi_transform_api.RunPass(self, mod)
File "/bbuf/relax/python/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
raise get_last_ffi_error()
tvm.error.InternalError: Traceback (most recent call last):
22: TVMFuncCall
21: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::{lambda(tvm::transform::Pass, tvm::IRModule)#7}>(tvm::transform::{lambda(tvm::transform::Pass, tvm::IRModule)#7}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
20: tvm::transform::Pass::operator()(tvm::IRModule) const
19: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
18: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
17: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
16: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
15: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_16PackedFuncSubObjIZNS0_15TypedPackedFuncIFNS_8IRModuleES5_NS_9transform11PassContextEEE17AssignTypedLambdaIZNS_5relax9transform16FuseOpsByPatternERKNS0_5ArrayINSC_13FusionPatternEvEEbbEUlS5_S7_E_EEvT_EUlRKNS0_7TVMArgsEPNS0_11TVMRetValueEE_EEE4CallEPKS1_SK_SO_
14: tvm::relax::FuseOpsByPattern(tvm::runtime::Array<tvm::relax::transform::FusionPattern, void> const&, tvm::IRModule, bool, bool)
13: tvm::relax::MakeGroupedFunctions(tvm::IRModule, std::unordered_map<tvm::runtime::Object const*, tvm::relay::GraphPartitioner::Group*, std::hash<tvm::runtime::Object const*>, std::equal_to<tvm::runtime::Object const*>, std::allocator<std::pair<tvm::runtime::Object const* const, tvm::relay::GraphPartitioner::Group*>>> const&, bool)
12: tvm::relax::OperatorFusor::Transform()
11: tvm::relax::ExprMutator::VisitExpr(tvm::RelayExpr const&)
10: _ZZN3tvm5relax11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlRKNS_7r
9: tvm::relax::ExprMutator::VisitExpr_(tvm::relax::FunctionNode const*)
8: tvm::relax::ExprMutator::VisitWithNewScope(tvm::RelayExpr const&, tvm::runtime::Optional<tvm::runtime::Array<tvm::relax::Var, void>>)
7: tvm::relax::ExprMutator::VisitExpr(tvm::RelayExpr const&)
6: _ZZN3tvm5relax11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlRKNS_7r
5: tvm::relax::ExprMutator::VisitExpr_(tvm::relax::SeqExprNode const*)
4: tvm::relax::OperatorFusor::VisitBindingBlock(tvm::relax::BindingBlock const&)
3: tvm::relax::OperatorFusor::VisitBindingBlock_(tvm::relax::DataflowBlockNode const*)
2: tvm::relax::OperatorFusor::CollectFuncBoundary(tvm::runtime::Array<tvm::relax::Binding, void> const&)
1: tvm::relax::PostOrderVisit(tvm::RelayExpr const&, std::function<void (tvm::RelayExpr const&)>)
0: tvm::relax::OperatorFusor::CollectFuncBoundary(tvm::runtime::Array<tvm::relax::Binding, void> const&)::{lambda(tvm::RelayExpr const&)#1}::operator()(tvm::RelayExpr const&) const
File "/bbuf/relax/src/relax/transform/fuse_ops.cc", line 876
InternalError: Check failed: (depgroup != cur_group) is false: A cyclic dependency detected between the groups lv2757 and lv2756 are in.
mlc-llm compile command is:
python3 -m mlc_llm.build --hf-path StarRing2022/RWKV-4-World-7B --target cuda --quantization q0f16
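
Since the crash only appears when cuBLAS and CUTLASS are enabled, a temporary workaround (until the cyclic-dependency check in FuseOpsByPattern is fixed) is to rebuild Relax with those backends switched off. This is a sketch of the relevant lines in the standard TVM-style `config.cmake`; flag names follow the upstream build configuration:

```cmake
# build/config.cmake — disable the BYOC backends whose fusion
# patterns trigger the cyclic-dependency check in FuseOpsByPattern
set(USE_CUBLAS OFF)
set(USE_CUTLASS OFF)
```

After rebuilding with these flags, the q0f16 compilation goes through, at the cost of losing the cuBLAS/CUTLASS-offloaded kernels.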