ai.onnx.contrib:SentencepieceTokenizer(-1) is not a registered function/op #937

@biendltb

Hi, I tried to build the shared library with the SentencePiece (SPM) tokenizer flag ON, using the CMake config below. However, when I registered the library as a custom ops library with ONNX Runtime, I got the error: ai.onnx.contrib:SentencepieceTokenizer(-1) is not a registered function/op.

My ONNX Runtime version: 1.21.0

The flags and my CMake config:

    set(ONNXRUNTIME_EXTENSIONS_BUILD_OPTIONS
      -DOCOS_BUILD_SHARED_LIB=ON
      -DOCOS_ENABLE_UNIVERSE_OPS=ON
      -DOCOS_ENABLE_SPM_TOKENIZER=ON
      -DOCOS_ENABLE_CPP_EXCEPTIONS=ON
      -DOCOS_BUILD_STATIC_LIB=OFF
      -DCMAKE_BUILD_TYPE=RelWithDebInfo
    )
    
    # Run the build script with shared library flag
    execute_process(
      COMMAND bash ./build.sh ${ONNXRUNTIME_EXTENSIONS_BUILD_OPTIONS}
      WORKING_DIRECTORY ${onnxruntime_extensions_SOURCE_DIR}
      RESULT_VARIABLE ONNXRUNTIME_EXTENSIONS_BUILD_RESULT
      OUTPUT_VARIABLE ONNXRUNTIME_EXTENSIONS_BUILD_OUTPUT
      ERROR_VARIABLE ONNXRUNTIME_EXTENSIONS_BUILD_ERROR
    )

After getting the .so library, I used it in C++ code:

        try {
          void *handle = nullptr;
          Ort::ThrowOnError(
              Ort::GetApi().RegisterCustomOpsLibrary(session_options, extensions_lib_path.c_str(), &handle));
        } catch (const Ort::Exception &e) {
          g_warning("Failed to register custom ops library: %s", e.what());
          // Continue without extensions
        }

From my inspection, the problem seems to be that SentencePiece was not added to the .so library. I inspected the library with the command below and did not see any SentencePiece symbols:

    $ nm -gD /data/example/build/linux/x64/debug/_deps/onnxruntime_extensions-src/out/Linux/RelWithDebInfo/lib/libortextensions.so.0.15.0 | grep sentence

I also tried building the library directly with build.sh, but SentencePiece still does not appear in the output shared library:

    ./build.sh -DOCOS_ENABLE_SPM_TOKENIZER=ON -DOCOS_BUILD_SHARED_LIB=ON -DOCOS_BUILD_STATIC_LIB=OFF

My question is: which build options for SentencePiece am I missing?

Update:

  • Using nm to inspect the exported symbols of the Linux shared library might not be the right check: the shared library does not need to export the SentencePiece symbols, whereas the Python shared library build does expose them.
  • I also built the library with -DOCOS_ENABLE_C_API=ON, but I still get the same runtime error in ONNX Runtime.
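
Since the SentencePiece symbols need not be exported, a more direct check may be whether the library exports RegisterCustomOps, the entry point that ONNX Runtime looks up when it loads a custom ops library. Below is a minimal sketch of that check, assuming a Linux/POSIX system with dlopen; the helper name library_exports_symbol is hypothetical and not part of ONNX Runtime or onnxruntime-extensions. It only confirms that the entry point is present, not that SentencepieceTokenizer itself was compiled in and registered.

```cpp
#include <dlfcn.h>

#include <cstdio>
#include <string>

// Hypothetical helper: returns true if the shared library at `path` can be
// loaded and exports a symbol named `symbol`.
bool library_exports_symbol(const std::string &path, const char *symbol) {
  // RTLD_NOW resolves all undefined symbols immediately, so a library with
  // missing dependencies fails here rather than later at call time.
  void *handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
  if (handle == nullptr) {
    const char *err = dlerror();
    std::fprintf(stderr, "dlopen failed: %s\n", err != nullptr ? err : "unknown error");
    return false;
  }

  dlerror();  // Clear any stale error state before the lookup.
  void *addr = dlsym(handle, symbol);
  const bool found = (addr != nullptr);
  if (!found) {
    std::fprintf(stderr, "symbol '%s' not found in %s\n", symbol, path.c_str());
  }

  dlclose(handle);
  return found;
}
```

Calling library_exports_symbol(extensions_lib_path, "RegisterCustomOps") before registering the library would show whether the entry point is missing or whether the problem lies in which operators that entry point registers. On older glibc versions the program may need to be linked with -ldl.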
