
Feature/cpp baby llama rework #2903

Merged · 8 commits · Jan 26, 2024
69 changes: 46 additions & 23 deletions cpp/README.md
@@ -12,7 +12,7 @@ python ts_scripts/install_dependencies.py --cpp [--cuda=cu121|cu118]
### Building the backend
```
## Dev Build
cd serve/cpp
cd serve/cpp
./build.sh [-g cu121|cu118]

## Install TorchServe from source
@@ -34,32 +34,60 @@ cd serve
torchserve --ncs --start --model-store model_store
```
## Backend
TorchServe cpp backend can run as a process, which is similar to [TorchServe Python backend](https://github.com/pytorch/serve/tree/master/ts). By default, TorchServe supports torch scripted model in cpp backend. [src/backends/core/backend.hh](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/core/backend.hh) defines the APIs of backend to support multiple different platforms such as MxNet, ONNX and so on.
* [Backend](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/core/backend.hh#L60) defines function `LoadModelInternal` to support model loading on different platforms.
* [ModelInstance](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/core/backend.hh#L25) represents a model copy. The function `Predict` is to support prediction on different platforms.
### TorchScripted Backend
By default, TorchServe cpp provides [TorchScripted backend](https://github.com/pytorch/serve/tree/cpp_backend/cpp/src/backends/torch_scripted). Its [base handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh) defines APIs to customize handler.
* [Initialize](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh#L29)
* [LoadModel](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh#L37)
* [Preprocess](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh#L40)
* [Inference](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh#L46)
* [Postprocess](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/backends/torch_scripted/handler/base_handler.hh#L53)
TorchServe cpp backend can run as a process, similar to the [TorchServe Python backend](https://github.com/pytorch/serve/tree/master/ts). By default, TorchServe supports TorchScripted models in the cpp backend. Other platforms such as MXNet and ONNX can be supported through custom handlers, following the TorchScript example [src/backends/handler/torch_scripted_handler.hh](https://github.com/pytorch/serve/blob/master/src/backends/handler/torch_scripted_handler.hh).
### Custom Handler
By default, TorchServe cpp provides a handler for TorchScript, [src/backends/handler/torch_scripted_handler.hh](https://github.com/pytorch/serve/blob/master/src/backends/handler/torch_scripted_handler.hh). It uses the [BaseHandler](https://github.com/pytorch/serve/blob/master/src/backends/handler/base_handler.hh), which defines the APIs to customize a handler; a simplified skeleton is sketched after the list below.
* [Initialize](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L29)
* [LoadModel](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L37)
* [Preprocess](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L40)
* [Inference](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L46)
* [Postprocess](serve/blob/cpp_backend/cpp/src/backends/handler/base_handler.hh#L53)
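The sketch below shows how these customization points fit together in a custom handler. Only the five method names are taken from the list above; the toy `BaseHandler` interface, the `MyHandler` class and all signatures are simplified, made-up stand-ins for illustration — consult `cpp/src/backends/handler/base_handler.hh` for the real API.
```cpp
// Simplified sketch of a custom handler. The interface below only mirrors the
// method names of BaseHandler; its signatures are illustrative, not the real API.
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include <torch/script.h>
#include <torch/torch.h>

namespace sketch {

class BaseHandler {  // illustrative stand-in only
 public:
  virtual ~BaseHandler() = default;
  virtual void Initialize(const std::string& model_dir) = 0;
  virtual std::pair<std::shared_ptr<torch::jit::Module>, torch::Device> LoadModel(
      const std::string& model_path) = 0;
  virtual torch::Tensor Preprocess(const std::string& request_body) = 0;
  virtual torch::Tensor Inference(torch::jit::Module& model, const torch::Tensor& inputs) = 0;
  virtual std::string Postprocess(const torch::Tensor& outputs) = 0;
};

class MyHandler : public BaseHandler {
 public:
  void Initialize(const std::string& model_dir) override { model_dir_ = model_dir; }

  std::pair<std::shared_ptr<torch::jit::Module>, torch::Device> LoadModel(
      const std::string& model_path) override {
    torch::Device device = torch::cuda::is_available() ? torch::Device(torch::kCUDA)
                                                       : torch::Device(torch::kCPU);
    auto model = std::make_shared<torch::jit::Module>(torch::jit::load(model_path, device));
    return {model, device};
  }

  torch::Tensor Preprocess(const std::string& request_body) override {
    // E.g. decode request bytes that the client wrote with torch.save
    // (cf. the MNIST example further below).
    std::vector<char> bytes(request_body.begin(), request_body.end());
    return torch::pickle_load(bytes).toTensor();
  }

  torch::Tensor Inference(torch::jit::Module& model, const torch::Tensor& inputs) override {
    torch::NoGradGuard no_grad;
    return model.forward({inputs}).toTensor();
  }

  std::string Postprocess(const torch::Tensor& outputs) override {
    std::vector<char> bytes = torch::pickle_save(outputs.argmax(/*dim=*/1));
    return std::string(bytes.begin(), bytes.end());
  }

 private:
  std::string model_dir_;
};

}  // namespace sketch
```
Note that `LoadModel` hands back a (model, device) pair here, mirroring how `Backend::LoadModelInternal` in this PR consumes `result.first` and `result.second`.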
#### Example
##### Using BaseHandler
* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
* set handler as "BaseHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
##### Using TorchScriptHandler
* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
* set handler as "TorchScriptHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
```
torch-model-archiver --model-name mnist_base --version 1.0 --serialized-file mnist_script.pt --handler BaseHandler --runtime LSP
torch-model-archiver --model-name mnist_base --version 1.0 --serialized-file mnist_script.pt --handler TorchScriptHandler --runtime LSP
```
Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/base_handler) of an unzipped model MAR file.
##### Using customized handler
##### Using Custom Handler
* build the custom handler shared library, for example the [Mnist handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/src/examples/image_classifier/mnist).
* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
* set runtime as "LSP" in model archiver option [--runtime](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
* set handler as "libmnist_handler:MnistHandler" in model archiver option [--handler](https://github.com/pytorch/serve/tree/master/model-archiver#arguments)
```
torch-model-archiver --model-name mnist_handler --version 1.0 --serialized-file mnist_script.pt --handler libmnist_handler:MnistHandler --runtime LSP
```
Here is an [example](https://github.com/pytorch/serve/tree/cpp_backend/cpp/test/resources/torchscript_model/mnist/mnist_handler) of an unzipped model MAR file.
##### BabyLlama Example
The babyllama example can be found [here](https://github.com/pytorch/serve/blob/master/cpp/src/examples/babyllama/).
To run the example, we need to download the model weights as well as the tokenizer file:
```bash
wget https://huggingface.co/karpathy/tinyllamas/resolve/main/stories15M.bin
wget https://github.com/karpathy/llama2.c/raw/master/tokenizer.bin
```
Subsequently, we need to adjust the paths in [config.json](https://github.com/pytorch/serve/blob/master/serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler/config.json) to match our local file structure.
```json
{
"checkpoint_path" : "/home/ubuntu/serve/cpp/stories15M.bin",
"tokenizer_path" : "/home/ubuntu/serve/cpp/src/examples/babyllama/tokenizer.bin"
}
```
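For illustration, here is one way a handler could read these two paths at startup. This is a hypothetical sketch using folly's JSON utilities (which the cpp backend already links against, per the CMake changes in this PR), not the actual parsing code of the BabyLlama handler.
```cpp
// Hypothetical config reader for the two paths shown above.
#include <fstream>
#include <sstream>
#include <string>

#include <folly/dynamic.h>
#include <folly/json.h>

struct BabyLlamaConfig {
  std::string checkpoint_path;
  std::string tokenizer_path;
};

BabyLlamaConfig ReadConfig(const std::string& config_file) {
  std::ifstream in(config_file);
  std::stringstream buffer;
  buffer << in.rdbuf();  // slurp the whole file

  folly::dynamic json = folly::parseJson(buffer.str());
  return BabyLlamaConfig{json["checkpoint_path"].asString(),
                         json["tokenizer_path"].asString()};
}
```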
Then we can create the MAR file and deploy it with:
```bash
cd serve/cpp/test/resources/torchscript_model/babyllama/babyllama_handler
torch-model-archiver --model-name llm --version 1.0 --handler libbabyllama_handler:BabyLlamaHandler --runtime LSP --extra-files config.json
mkdir model_store && mv llm.mar model_store/
torchserve --ncs --start --model-store model_store

curl -v -X POST "http://localhost:8081/models?initial_workers=1&url=llm.mar"
```
The handler name `libbabyllama_handler:BabyLlamaHandler` combines the shared library name (as defined in our [CMakeLists.txt](https://github.com/pytorch/serve/blob/master/serve/cpp/src/examples/CMakeLists.txt)) with the class name chosen for our [custom handler class](https://github.com/pytorch/serve/blob/master/serve/cpp/src/examples/babyllama/baby_llama_handler.cc), which derives from BaseHandler.
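How that name is resolved can be seen in `Backend::LoadHandler` in the backend.cc diff further down: the backend dlopens `lib<name>.so` (or `.dylib` on macOS) from the model directory and looks up the symbols `allocator<ClassName>` and `deleter<ClassName>`. The snippet below is a hand-written sketch of what such exports could look like; the header path is an assumption and the real examples may wrap this in a helper or macro.
```cpp
// Hypothetical export block for a custom handler shared library. Backend::LoadHandler
// resolves "allocator" + class name and "deleter" + class name via the DLLoader,
// so a handler registered as "libbabyllama_handler:BabyLlamaHandler" would need
// symbols along these lines.
#include "baby_llama_handler.hh"  // assumed header declaring BabyLlamaHandler

extern "C" {

torchserve::BaseHandler* allocatorBabyLlamaHandler() {
  return new BabyLlamaHandler();  // class name as given after the ':' in --handler
}

void deleterBabyLlamaHandler(torchserve::BaseHandler* handler) {
  delete handler;  // assumes BaseHandler has a virtual destructor
}

}  // extern "C"
```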

To test the model we can run:
```bash
cd serve/cpp/test/resources/torchscript_model/babyllama/
curl http://localhost:8080/predictions/llm -T prompt.txt
```
##### Mnist example
* Transform the data on the client side (a sketch of the matching backend-side decoding follows this list). For example:
```
@@ -75,9 +103,4 @@ image = Image.open("examples/image_classifier/mnist/test_data/0.png")
image = image_processing(image)
torch.save(image, "0_png.pt")
```
* Run model registration and prediction: [Using BaseHandler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L54) or [Using customized handler](https://github.com/pytorch/serve/blob/cpp_backend/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L72).





* Run model registration and prediction: [Using BaseHandler](serve/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L54) or [Using customized handler](serve/cpp/test/backends/torch_scripted/torch_scripted_backend_test.cc#L72).
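As a counterpart to the client-side snippet above, the sketch below shows how a tensor written with `torch.save(...)` (e.g. `0_png.pt`) can be decoded in C++ via `torch::pickle_load`. The file name and the standalone `main` are illustrative only, not the shipped handler code.
```cpp
// Minimal sketch: read a tensor that was written on the client with torch.save(...)
// and decode it with torch::pickle_load. The path refers to the MNIST example above.
#include <fstream>
#include <iostream>
#include <iterator>
#include <vector>

#include <torch/torch.h>

int main() {
  std::ifstream in("0_png.pt", std::ios::binary);
  std::vector<char> bytes((std::istreambuf_iterator<char>(in)),
                          std::istreambuf_iterator<char>());

  // torch.save writes a pickled container; pickle_load returns an IValue.
  torch::Tensor image = torch::pickle_load(bytes).toTensor();
  std::cout << image.sizes() << std::endl;  // expected to be [1, 28, 28] here
  return 0;
}
```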
4 changes: 4 additions & 0 deletions cpp/build.sh
@@ -212,6 +212,10 @@ function build() {
mv $DEPS_DIR/../src/examples/libmnist_handler.so $DEPS_DIR/../../test/resources/torchscript_model/mnist/mnist_handler/libmnist_handler.so
fi

if [ -f "$DEPS_DIR/../src/examples/libbabyllama_handler.so" ]; then
mv $DEPS_DIR/../src/examples/libbabyllama_handler.so $DEPS_DIR/../../test/resources/torchscript_model/babyllama/babyllama_handler/libbabyllama_handler.so
fi

cd $DEPS_DIR/../..
if [ -f "$DEPS_DIR/../test/torchserve_cpp_test" ]; then
$DEPS_DIR/../test/torchserve_cpp_test
41 changes: 14 additions & 27 deletions cpp/src/backends/CMakeLists.txt
@@ -15,40 +15,27 @@ target_link_libraries(ts_backends_protocol PRIVATE ts_utils ${FOLLY_LIBRARIES})
install(TARGETS ts_backends_protocol DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/libs)

# build library ts_backend_core
set(TS_BACKENDS_CORE_SOURCE_FILES "")
list(APPEND TS_BACKENDS_CORE_SOURCE_FILES ${TS_BACKENDS_CORE_SRC_DIR}/backend.cc)
add_library(ts_backends_core SHARED ${TS_BACKENDS_CORE_SOURCE_FILES})
set(BACKEND_SOURCE_FILES "")
list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/core/backend.cc)
list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/core/model_instance.cc)
list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/handler/base_handler.cc)
list(APPEND BACKEND_SOURCE_FILES ${TS_BACKENDS_SRC_DIR}/handler/torch_scripted_handler.cc)
add_library(ts_backends_core SHARED ${BACKEND_SOURCE_FILES})
target_include_directories(ts_backends_core PUBLIC ${TS_BACKENDS_CORE_SRC_DIR})
target_link_libraries(ts_backends_core PRIVATE ts_utils ts_backends_protocol ${FOLLY_LIBRARIES})
target_link_libraries(ts_backends_core PUBLIC ts_utils ts_backends_protocol ${FOLLY_LIBRARIES})
install(TARGETS ts_backends_core DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/libs)

# build library ts_backend_torch_scripted
set(TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES "")
list(APPEND TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/torch_scripted_backend.cc)
list(APPEND TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/handler/base_handler.cc)
add_library(ts_backends_torch_scripted SHARED ${TS_BACKENDS_TORCH_SCRIPTED_SOURCE_FILES})
target_include_directories(ts_backends_torch_scripted PUBLIC
${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR} ${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}/handler ${TORCH_INCLUDE_DIRS})
target_link_libraries(ts_backends_torch_scripted PUBLIC ts_utils ts_backends_core ${TORCH_LIBRARIES})
install(TARGETS ts_backends_torch_scripted DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/libs)

# build library ts_backend_torch_deploy
#set(TS_BACKENDS_TORCH_DEPLOY_SOURCE_FILES "")
#add_library(ts_backends_torch_deploy SHARED ${TS_BACKENDS_TORCH_DEPLOY_SOURCE_FILES})
#target_include_directories(ts_backends_torch_deploy PUBLIC ${TS_BACKENDS_TORCH_DEPLOY_SRC_DIR})
#target_link_libraries(ts_backends_torch_deploy PRIVATE ts_utils ts_backends_core ${TORCH_LIBRARIES})

# build exe model_worker_socket
add_executable(model_worker_socket
add_executable(model_worker_socket
"${TS_BACKENDS_PROCESS_SRC_DIR}/model_worker_socket.cc"
"${TS_BACKENDS_PROCESS_SRC_DIR}/model_worker.cc"
)
target_include_directories(model_worker_socket PRIVATE
target_include_directories(model_worker_socket PRIVATE
${TS_BACKENDS_CORE_SRC_DIR}
${TS_BACKENDS_PROTOCOL_SRC_DIR}
${TS_BACKENDS_PROCESS_SRC_DIR}
${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}
${TS_BACKENDS_PROTOCOL_SRC_DIR}
${TS_BACKENDS_PROCESS_SRC_DIR}
${TS_BACKENDS_TORCH_SCRIPTED_SRC_DIR}
)
target_link_libraries(model_worker_socket
PRIVATE ts_backends_core ts_backends_protocol ts_backends_torch_scripted ${FOLLY_LIBRARIES})
target_link_libraries(model_worker_socket
PRIVATE ts_backends_core ts_backends_protocol ${FOLLY_LIBRARIES} ${TORCH_LIBRARIES})
install(TARGETS model_worker_socket DESTINATION ${torchserve_cpp_SOURCE_DIR}/_build/bin)
96 changes: 92 additions & 4 deletions cpp/src/backends/core/backend.cc
@@ -1,6 +1,63 @@
#include "src/backends/core/backend.hh"

#include <memory>

#include "src/backends/handler/handler_factory.hh"

namespace torchserve {
Backend::Backend() {}

Backend::~Backend() {
handler_.reset();
model_instance_table_.clear();
// Todo: do proper cleanup
// dl_loader_->CloseDL();
}

bool Backend::Initialize(const std::string &model_dir) {
random_generator_.seed(time(0));
manifest_ = std::make_shared<torchserve::Manifest>();
// TODO: windows
if (!manifest_->Initialize(
fmt::format("{}/MAR-INF/MANIFEST.json", model_dir))) {
return false;
}

LoadHandler(model_dir);

if (!handler_) {
return false;
}

handler_->Initialize(model_dir, manifest_);

return true;
}

void Backend::LoadHandler(const std::string &model_dir) {
const std::string &handler_str = manifest_->GetModel().handler;
std::size_t delimiter_pos = handler_str.find(manifest_->kHandler_Delimiter);
if (delimiter_pos != std::string::npos) {
#ifdef __APPLE__
Review discussion on this line:
* Contributor: Will this require separate packaging for TorchServe Mac installables vs the Linux version?
* Author (Collaborator): We're currently not planning to provide precompiled binaries but will rely on the build.sh script for installation. If we change this in the future, these macros will be resolved by the preprocessor during compilation and we would require different packages for the different platforms.
* Contributor: We can handle this as a separate PR, filed issue #2908 for tracking.
std::string lib_path = fmt::format("{}/{}.dylib", model_dir,
handler_str.substr(0, delimiter_pos));
#else
std::string lib_path = fmt::format("{}/{}.so", model_dir,
handler_str.substr(0, delimiter_pos));
#endif
std::string handler_class_name = handler_str.substr(delimiter_pos + 1);
std::string allocator_func = fmt::format("allocator{}", handler_class_name);
std::string deleter_func = fmt::format("deleter{}", handler_class_name);

dl_loader_ = std::make_unique<DLLoader<BaseHandler>>(
lib_path, allocator_func, deleter_func);
dl_loader_->OpenDL();
handler_ = dl_loader_->GetInstance();
} else {
handler_ = HandlerFactory::GetInstance().createHandler(handler_str);
}
}

std::unique_ptr<torchserve::LoadModelResponse> Backend::LoadModel(
std::shared_ptr<torchserve::LoadModelRequest> load_model_request) {
/**
@@ -13,12 +70,43 @@ std::unique_ptr<torchserve::LoadModelResponse> Backend::LoadModel(
* - status_READY: return the model instance if it is already.
*
* Common steps:
* https://github.com/pytorch/serve/blob/master/ts/model_loader.py#L62
* serve/blob/master/ts/model_loader.py#L62
*/

// TODO: support request envelope:
// serve/tree/master/ts/torch_handler/request_envelope

return LoadModelInternal(std::move(load_model_request));
}

std::unique_ptr<LoadModelResponse> Backend::LoadModelInternal(
std::shared_ptr<LoadModelRequest> load_model_request) {
std::string model_instance_id = BuildModelInstanceId(load_model_request);
try {
model_instance_table_[model_instance_id] = {
ModelInstanceStatus::INIT, std::shared_ptr<ModelInstance>(nullptr)};

auto result = handler_->LoadModel(load_model_request);
SetModelInstanceInfo(model_instance_id, ModelInstanceStatus::READY,
std::make_shared<ModelInstance>(
model_instance_id, std::move(result.first),
handler_, std::move(result.second)));

ready_model_instance_ids_.emplace_back(model_instance_id);
std::string message =
fmt::format("loaded model {}", load_model_request->model_name);
return std::make_unique<LoadModelResponse>(
// TODO: check current response msg content
200, message);
} catch (const c10::Error &e) {
SetModelInstanceInfo(model_instance_id, ModelInstanceStatus::FAILED,
std::shared_ptr<ModelInstance>(nullptr));
return std::make_unique<LoadModelResponse>(
// TODO: check existing
500, e.msg());
}
}

std::string Backend::BuildModelInstanceId(
std::shared_ptr<torchserve::LoadModelRequest> load_model_request) {
std::string device_type("cpu");
@@ -30,15 +118,15 @@ std::string Backend::BuildModelInstanceId(
}

void Backend::SetModelInstanceInfo(
const std::string& model_instance_id, ModelInstanceStatus new_status,
const std::string &model_instance_id, ModelInstanceStatus new_status,
std::shared_ptr<torchserve::ModelInstance> new_model_instance) {
model_instance_table_[model_instance_id].status = new_status;
model_instance_table_[model_instance_id].model_instance =
std::move(new_model_instance);
}

torchserve::Backend::ModelInstanceStatus Backend::GetModelInstanceStatus(
const std::string& model_instance_id) {
const std::string &model_instance_id) {
auto model_instance_info = model_instance_table_.find(model_instance_id);
if (model_instance_info == model_instance_table_.end()) {
return torchserve::Backend::ModelInstanceStatus::NOT_INIT;
@@ -47,7 +135,7 @@ torchserve::Backend::ModelInstanceStatus Backend::GetModelInstanceStatus(
}

std::shared_ptr<torchserve::ModelInstance> Backend::GetModelInstance(
const std::string& model_instance_id) {
const std::string &model_instance_id) {
auto model_instance_info = model_instance_table_.find(model_instance_id);
if (model_instance_info == model_instance_table_.end()) {
return std::shared_ptr<torchserve::ModelInstance>(nullptr);