add rwkv world tokenizer #14

Merged
merged 3 commits into from Sep 1, 2023

@BBuf BBuf commented Sep 1, 2023

[Image]

prepare for mlc-ai/mlc-llm#848

@Hzfengsy Hzfengsy left a comment

General LGTM

include/tokenizers_cpp.h (outdated, resolved)
@@ -71,6 +77,13 @@ endif ()
get_filename_component(TOKENIZERS_CPP_ROOT ${CMAKE_CURRENT_LIST_FILE} DIRECTORY)
set(TOKENIZERS_CPP_CARGO_SOURCE_PATH ${TOKENIZERS_CPP_ROOT}/rust)

FetchContent_Declare(
Member

I wonder what the differences are between using FetchContent and adding it as a 3rdparty dependency in the repo.

Contributor Author

There shouldn't be much difference: whether it's a 3rdparty dependency or FetchContent, both pull the repository from the network for building. Personally, I prefer the FetchContent approach.
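
For readers unfamiliar with FetchContent, a minimal sketch of the pattern under discussion might look like the following. The dependency name `msgpack` matches the `msgpack-populate` step in the build log later in this thread, but the repository URL and the tag are assumptions for illustration and are not taken from this PR's diff.

```cmake
# Sketch only: the repository URL and tag below are assumptions, not taken from this PR.
cmake_minimum_required(VERSION 3.14)  # FetchContent_MakeAvailable requires CMake 3.14+
project(fetchcontent_sketch LANGUAGES CXX)

include(FetchContent)

# Declare where the dependency lives; nothing is downloaded at this point.
FetchContent_Declare(
  msgpack
  GIT_REPOSITORY https://github.com/msgpack/msgpack-c.git  # assumed URL
  GIT_TAG        cpp-6.1.0                                 # assumed tag; pin to a tag or commit
)

# Clones the repository into <build>/_deps/msgpack-src at configure time
# and then adds it to the build with add_subdirectory().
FetchContent_MakeAvailable(msgpack)
```

The practical difference from a vendored or submodule checkout is when the download happens: here it happens during the CMake configure step, so a network or git failure surfaces as a configure error.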

BBuf commented Sep 1, 2023

After the latest commit, the unit tests also run successfully.

[Image]

@Hzfengsy Hzfengsy merged commit eec72a6 into mlc-ai:main Sep 1, 2023
Hzfengsy commented Sep 1, 2023

Thanks @BBuf

junrushao commented Sep 4, 2023

@BBuf @Hzfengsy This commit introduces a regression in the Windows build and blocks our nightly package rebuild. Please see the detailed logs below:

D:\a\package\package>cd mlc-llm 

D:\a\package\package\mlc-llm>rd /s /q build 
The system cannot find the file specified.

D:\a\package\package\mlc-llm>mkdir build 

D:\a\package\package\mlc-llm>cd build 

D:\a\package\package\mlc-llm\build>cmake -A x64 -Thost=x64       -G "Visual Studio 17 2022"       -DUSE_VULKAN=ON       .. 
-- The C compiler identification is MSVC 19.36.32538.0
-- The CXX compiler identification is MSVC 19.36.32538.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: C:/Program Files/Microsoft Visual Studio/2022/Enterprise/VC/Tools/MSVC/14.36.32532/bin/HostX64/x64/cl.exe - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: C:/Program Files/Microsoft Visual Studio/2022/Enterprise/VC/Tools/MSVC/14.36.32532/bin/HostX64/x64/cl.exe - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test SUPPORT_CXX17
-- Performing Test SUPPORT_CXX17 - Success
-- Setting default build type to RelWithDebInfo
-- Hide private symbols
-- TVM_HOME: 3rdparty/tvm
-- Found the path to ccache, enabling ccache
-- VTA build is skipped in Windows..
-- Vulkan_INCLUDE_DIRS=C:/Miniconda/envs/tlcpack-build/Library/includeC:/Miniconda/envs/tlcpack-build/Library/include/spirv-toolsC:/Miniconda/envs/tlcpack-build/Library/include/spirv/unified1C:/Miniconda/envs/tlcpack-build/Library/include/spirv/unified1
-- Vulkan_LIBRARY=C:/Miniconda/envs/tlcpack-build/Library/lib/vulkan-1.lib
-- Vulkan_SPIRV_TOOLS_LIBRARY=C:/Miniconda/envs/tlcpack-build/Library/lib/SPIRV-Tools.lib
-- Build with Vulkan support
-- Build with contrib.random
-- Build with contrib.sort
-- Build with contrib.hybriddump
-- Git found: C:/Miniconda/envs/tlcpack-build/Library/bin/git.exe
-- Found TVM_GIT_COMMIT_HASH=b0d1c21f329e6aed8fd639c530c29acc3b1f9305
-- Found TVM_GIT_COMMIT_TIME=2023-08-31 15:57:11 -0400
-- Building with TVM Map...
-- Build with thread support...
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Found Threads: TRUE  
-- Performing Test FILE_PREFIX_MAP_SUPPORTED
-- Performing Test FILE_PREFIX_MAP_SUPPORTED - Failed
-- system-nameWindows
MSBuild version 17.7.2+d6990bcfa for .NET Framework

  1>Checking Build System
  1>Creating directories for 'msgpack-populate'
  Performing download step (git clone) for 'msgpack-populate'
  Cloning into 'msgpack-src'...
  HEAD is now at 8c602e85 Merge pull request #1083 from redboltz/update_610
  CMake Error at msgpack-subbuild/msgpack-populate-prefix/tmp/msgpack-populate-gitclone.cmake:62 (message):
    Failed to update submodules in:
-- Configuring incomplete, errors occurred!
    'D:/a/package/package/mlc-llm/build/_deps/msgpack-src'
  
  
C:\Program Files\Microsoft Visual Studio\2022\Enterprise\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(249,5): error MSB8066: Custom build for 'D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-download.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-update.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-patch.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-configure.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-build.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-install.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\dfa5057374fb3dfcddc46b7b964a6e83\msgpack-populate-test.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\df644a9fff8f1cab49663ee66d0ee69a\msgpack-populate-complete.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeFiles\b12dc4ee056b4292e5c2bb36792642c2\msgpack-populate.rule;D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\CMakeLists.txt' exited with code 1. [D:\a\package\package\mlc-llm\build\_deps\msgpack-subbuild\msgpack-populate.vcxproj]

CMake Error at C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:1662 (message):
  Build step for msgpack failed: 1
Call Stack (most recent call first):
  C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:1802:EVAL:2 (__FetchContent_directPopulate)
  C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:1802 (cmake_language)
  C:/Miniconda/envs/tlcpack-build/Library/share/cmake-3.27/Modules/FetchContent.cmake:2016 (FetchContent_Populate)
  3rdparty/tokenizers-cpp/CMakeLists.txt:86 (FetchContent_MakeAvailable)



D:\a\package\package\mlc-llm\build>cmake --build . --parallel 3 --config Release -- /m 
MSBuild version 17.7.2+d6990bcfa for .NET Framework
MSBUILD : error MSB1009: Project file does not exist.
Switch: ALL_BUILD.vcxproj

D:\a\package\package\mlc-llm\build>cd ..\..

Link to the nightly run: https://github.com/mlc-ai/package/actions/runs/6069714174/job/16464517741

I don't have a clear idea of how to fix it, since I don't have a Windows machine, but we usually prefer a submodule to FetchContent for network-stability reasons.

tqchen commented Sep 4, 2023

@BBuf it would be great if we could follow up on this. In the meantime, we can pin tokenizers-cpp to an earlier version if needed. Indeed, a submodule would be better.

BBuf commented Sep 6, 2023

I will change it to the submodule approach as soon as possible.
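
A rough sketch of what the submodule approach could look like on the CMake side is shown below. The `3rdparty/msgpack` path and the `MSGPACK_DIR` variable are hypothetical names chosen for illustration; the actual follow-up change may be organized differently.

```cmake
# Sketch only: 3rdparty/msgpack is a hypothetical location for the submodule.
set(MSGPACK_DIR ${CMAKE_CURRENT_LIST_DIR}/3rdparty/msgpack)

# The submodule is checked out by git before configuring, not by CMake:
#   git submodule update --init --recursive
if(NOT EXISTS ${MSGPACK_DIR}/CMakeLists.txt)
  message(FATAL_ERROR
    "msgpack submodule not found at ${MSGPACK_DIR}. "
    "Run: git submodule update --init --recursive")
endif()

# Build the checked-out sources as part of this project.
# No network access is needed during the configure step.
add_subdirectory(${MSGPACK_DIR} msgpack EXCLUDE_FROM_ALL)
```

With this layout the dependency version is fixed by the commit recorded in the parent repository, and a missing checkout produces an explicit error message instead of a failed clone inside the build tree.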

BBuf commented Sep 6, 2023

#15
