When I build Relax (mlc-ai/relax) with cuBLAS and CUTLASS enabled and then compile a GPU model using mlc-llm with q0f16, the compilation crashes. Compiling with other quantization configurations, such as q4f16_1, works fine. Additionally, when cuBLAS and CUTLASS are not enabled, all configurations compile normally in mlc-llm. The error stack for the crash is:
/bbuf> python3 -m mlc_llm.build --hf-path StarRing2022/RWKV-4-World-7B --target cuda --quantization q0f16
Weights exist at dist/models/RWKV-4-World-7B, skipping download.
Using path "dist/models/RWKV-4-World-7B" for model "RWKV-4-World-7B"
Target configured: cuda -keys=cuda,gpu -arch=sm_80 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Automatically using target for weight quantization: cuda -keys=cuda,gpu -arch=sm_80 -max_num_threads=1024 -max_shared_memory_per_block=49152 -max_threads_per_block=1024 -registers_per_block=65536 -thread_warp_size=32
Start computing and quantizing weights... This may take a while.
Finish computing and quantizing weights.
Total param size: 14.003204345703125 GB
Start storing to cache dist/RWKV-4-World-7B-q0f16/params
[0582/0582] saving param_581
All finished, 227 total shards committed, record saved to dist/RWKV-4-World-7B-q0f16/params/ndarray-cache.json
Finish exporting chat config to dist/RWKV-4-World-7B-q0f16/params/mlc-chat-config.json
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/build.py", line 13, in <module>
main()
File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/build.py", line 10, in main
core.build_model_from_args(parsed_args)
File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/core.py", line 584, in build_model_from_args
mod = mod_transform_before_build(mod, param_manager, args, model_config)
File "/home/bbuf/.local/lib/python3.8/site-packages/mlc_llm/core.py", line 407, in mod_transform_before_build
mod = tvm.transform.Sequential(
File "/bbuf/relax/python/tvm/ir/transform.py", line 238, in __call__
return _ffi_transform_api.RunPass(self, mod)
File "/bbuf/relax/python/tvm/_ffi/_ctypes/packed_func.py", line 238, in __call__
raise get_last_ffi_error()
tvm.error.InternalError: Traceback (most recent call last):
22: TVMFuncCall
21: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::transform::Pass, tvm::IRModule)>::AssignTypedLambda<tvm::transform::{lambda(tvm::transform::Pass, tvm::IRModule)#7}>(tvm::transform::{lambda(tvm::transform::Pass, tvm::IRModule)#7}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
20: tvm::transform::Pass::operator()(tvm::IRModule) const
19: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
18: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
17: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
16: tvm::transform::ModulePassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
15: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_16PackedFuncSubObjIZNS0_15TypedPackedFuncIFNS_8IRModuleES5_NS_9transform11PassContextEEE17AssignTypedLambdaIZNS_5relax9transform16FuseOpsByPatternERKNS0_5ArrayINSC_13FusionPatternEvEEbbEUlS5_S7_E_EEvT_EUlRKNS0_7TVMArgsEPNS0_11TVMRetValueEE_EEE4CallEPKS1_SK_SO_
14: tvm::relax::FuseOpsByPattern(tvm::runtime::Array<tvm::relax::transform::FusionPattern, void> const&, tvm::IRModule, bool, bool)
13: tvm::relax::MakeGroupedFunctions(tvm::IRModule, std::unordered_map<tvm::runtime::Object const*, tvm::relay::GraphPartitioner::Group*, std::hash<tvm::runtime::Object const*>, std::equal_to<tvm::runtime::Object const*>, std::allocator<std::pair<tvm::runtime::Object const* const, tvm::relay::GraphPartitioner::Group*>>> const&, bool)
12: tvm::relax::OperatorFusor::Transform()
11: tvm::relax::ExprMutator::VisitExpr(tvm::RelayExpr const&)
10: _ZZN3tvm5relax11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlRKNS_7r
9: tvm::relax::ExprMutator::VisitExpr_(tvm::relax::FunctionNode const*)
8: tvm::relax::ExprMutator::VisitWithNewScope(tvm::RelayExpr const&, tvm::runtime::Optional<tvm::runtime::Array<tvm::relax::Var, void>>)
7: tvm::relax::ExprMutator::VisitExpr(tvm::RelayExpr const&)
6: _ZZN3tvm5relax11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlRKNS_7r
5: tvm::relax::ExprMutator::VisitExpr_(tvm::relax::SeqExprNode const*)
4: tvm::relax::OperatorFusor::VisitBindingBlock(tvm::relax::BindingBlock const&)
3: tvm::relax::OperatorFusor::VisitBindingBlock_(tvm::relax::DataflowBlockNode const*)
2: tvm::relax::OperatorFusor::CollectFuncBoundary(tvm::runtime::Array<tvm::relax::Binding, void> const&)
1: tvm::relax::PostOrderVisit(tvm::RelayExpr const&, std::function<void (tvm::RelayExpr const&)>)
0: tvm::relax::OperatorFusor::CollectFuncBoundary(tvm::runtime::Array<tvm::relax::Binding, void> const&)::{lambda(tvm::RelayExpr const&)#1}::operator()(tvm::RelayExpr const&) const
File "/bbuf/relax/src/relax/transform/fuse_ops.cc", line 876
InternalError: Check failed: (depgroup != cur_group) is false: A cyclic dependency detected between the groups lv2757 and lv2756 are in.
mlc-llm compile command is:
python3 -m mlc_llm.build --hf-path StarRing2022/RWKV-4-World-7B --target cuda --quantization q0f16
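
Since the crash only appears when cuBLAS and CUTLASS are enabled, a temporary workaround (until the cyclic-dependency check in FuseOpsByPattern is fixed) is to rebuild Relax with those backends switched off. This is a sketch of the relevant lines in the standard TVM-style `config.cmake`; flag names follow the upstream build configuration:

```cmake
# build/config.cmake — disable the BYOC backends whose fusion
# patterns trigger the cyclic-dependency check in FuseOpsByPattern
set(USE_CUBLAS OFF)
set(USE_CUTLASS OFF)
```

After rebuilding with these flags, the q0f16 compilation goes through, at the cost of losing the cuBLAS/CUTLASS-offloaded kernels.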