add rwkv world tokenizer #14

BBuf · 2023-09-01T10:24:19Z

Hzfengsy

General LGTM

include/tokenizers_cpp.h

Hzfengsy · 2023-09-01T10:59:42Z

CMakeLists.txt

@@ -71,6 +77,13 @@ endif ()
 get_filename_component(TOKENIZERS_CPP_ROOT ${CMAKE_CURRENT_LIST_FILE} DIRECTORY)
 set(TOKENIZERS_CPP_CARGO_SOURCE_PATH ${TOKENIZERS_CPP_ROOT}/rust)

+FetchContent_Declare(


I wonder what the difference is between FetchContent and adding it as a 3rdparty dependency in the repo.

There shouldn't be much difference: whether it's 3rdparty or FetchContent, both pull the repository from the network for the build. Personally, I prefer this approach.
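
For reference, the two options discussed above could be wired up roughly as follows. This is only a sketch, not the code in this PR: the option name `SKETCH_USE_FETCHCONTENT`, the repository URL, the `GIT_TAG` value, and the `3rdparty/msgpack` path are all assumptions made for illustration.

```cmake
cmake_minimum_required(VERSION 3.14)  # FetchContent_MakeAvailable needs 3.14+
project(dep_wiring_sketch LANGUAGES CXX)

# Hypothetical switch, only here so both approaches fit in one file.
option(SKETCH_USE_FETCHCONTENT "Fetch msgpack at configure time" OFF)

if (SKETCH_USE_FETCHCONTENT)
  # Approach 1: FetchContent clones the dependency into <build>/_deps
  # while `cmake ..` runs, so every fresh configure needs network access.
  include(FetchContent)
  FetchContent_Declare(
    msgpack
    GIT_REPOSITORY https://github.com/msgpack/msgpack-c  # assumed URL
    GIT_TAG        cpp_master  # placeholder; pin a commit hash in practice
  )
  FetchContent_MakeAvailable(msgpack)
else ()
  # Approach 2: the dependency is a git submodule checked out beforehand
  # (git submodule update --init --recursive), so configure does no cloning.
  add_subdirectory(3rdparty/msgpack)  # hypothetical submodule path
endif ()
```

Both variants download the sources over the network. The difference is when: FetchContent does it during every fresh configure, while a submodule is fetched once at checkout and its commit is pinned by the parent repository.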

BBuf · 2023-09-01T11:27:43Z

After the latest commit, the unit tests also run successfully.

Hzfengsy · 2023-09-01T11:34:57Z

Thanks @BBuf

junrushao · 2023-09-04T21:10:11Z

@BBuf @Hzfengsy This commit introduces a regression in the Windows build and blocks our nightly package rebuild. Please see the detailed logs below:

D:\a\package\package>cd mlc-llm 

D:\a\package\package\mlc-llm>rd /s /q build 
The system cannot find the file specified.

D:\a\package\package\mlc-llm>mkdir build 

D:\a\package\package\mlc-llm>cd build 

D:\a\package\package\mlc-llm\build>cmake -A x64 -Thost=x64       -G "Visual Studio 17 2022"       -DUSE_VULKAN=ON       .. 
-- The C compiler identification is MSVC 19.36.32538.0
-- The CXX compiler identification is MSVC 19.36.32538.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Enterprise/VC/Tools/MSVC/14.36.32532/bin/HostX64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Enterprise/VC/Tools/MSVC/14.36.32532/bin/HostX64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test SUPPORT_CXX17
-- Performing Test SUPPORT_CXX17 - Success
-- Setting default build type to RelWithDebInfo
-- Hide private symbols
-- TVM_HOME: 3rdparty/tvm
-- Found the path to ccache, enabling ccache
-- VTA build is skipped in Windows..
-- Vulkan_INCLUDE_DIRS=C:/Miniconda/envs/tlcpack-build/Library/includeC:/Miniconda/envs/tlcpack-build/Library/include/spirv-toolsC:/Miniconda/envs/tlcpack-build/Library/include/spirv/unified1C:/Miniconda/envs/tlcpack-build/Library/include/spirv/unified1
-- Vulkan_LIBRARY=C:/Miniconda/envs/tlcpack-build/Library/lib/vulkan-1.lib
-- Vulkan_SPIRV_TOOLS_LIBRARY=C:/Miniconda/envs/tlcpack-build/Library/lib/SPIRV-Tools.lib
-- Build with Vulkan support
-- Build with contrib.random
-- Build with contrib.sort
-- Build with contrib.hybriddump
-- Git found: C:/Miniconda/envs/tlcpack-build/Library/bin/git.exe
-- Found TVM_GIT_COMMIT_HASH=b0d1c21f329e6aed8fd639c530c29acc3b1f9305
-- Found TVM_GIT_COMMIT_TIME=2023-08-31 15:57:11 -0400
-- Building with TVM Map...
-- Build with thread support...
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE  
-- Performing Test FILE_PREFIX_MAP_SUPPORTED
-- Performing Test FILE_PREFIX_MAP_SUPPORTED - Failed
-- system-nameWindows
MSBuild version 17.7.2+d6990bcfa for .NET Framework

  1>Checking Build System
  1>Creating directories for 'msgpack-populate'
  Performing download step (git clone) for 'msgpack-populate'
  Cloning into 'msgpack-src'...
  HEAD is now at 8c602e85 Merge pull request #1083 from redboltz/update_610
  CMake Error at msgpack-subbuild/msgpack-populate-prefix/tmp/msgpack-populate-gitclone.cmake:62 (message):
    Failed to update submodules in:
-- Configuring incomplete, errors occurred!
    'D:/a/package/package/mlc-llm/build/_deps/msgpack-src'
  
  
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(249,5): error MSB8066: Custom build for 'D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-download.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-update.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-patch.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-configure.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-build.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-install.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-test.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\df644a9fff8f1cab49663ee66d0ee69a\msgpack-populate-complete.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\b12dc4ee056b4292e5c2bb36792642c2\msgpack-populate.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeLists.txt' exited with code 1. [D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\msgpack-populate.vcxproj]

CMake Error at C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:1662 (message):
  Build step for msgpack failed: 1
Call Stack (most recent call first):
  C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:1802:EVAL:2 (__FetchContent_directPopulate)
  C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:1802 (cmake_language)
  C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:2016 (FetchContent_Populate)
  3rdparty/tokenizers-cpp/CMakeLists.txt:86 (FetchContent_MakeAvailable)



D:\a\package\package\mlc-llm\build>cmake --build . --parallel 3 --config Release -- /m 
MSBuild version 17.7.2+d6990bcfa for .NET Framework
MSBUILD : error MSB1009: Project file does not exist.
Switch: ALL_BUILD.vcxproj

D:\a\package\package\mlc-llm\build>cd ..\..

Link to the nightly run: https://github.com/mlc-ai/package/actions/runs/6069714174/job/16464517741

I don't have a clear idea of how to fix it, since I don't have a Windows machine, but we usually prefer a submodule to FetchContent because of network stability concerns.

tqchen · 2023-09-04T21:52:07Z

@BBuf it would be great if we could follow up on this. In the meantime, we can pin tokenizers-cpp to an earlier version if needed. Indeed, a submodule would be better.

BBuf · 2023-09-06T01:07:30Z

I will change it to the submodule approach as soon as possible.

BBuf · 2023-09-06T05:53:58Z

#15

BBuf added 2 commits September 1, 2023 10:17

support rwkv world tokenizer

1e72713

refine

fca0d8a

Hzfengsy approved these changes Sep 1, 2023

rename

baa14c6

Hzfengsy merged commit eec72a6 into mlc-ai:main Sep 1, 2023
