recognition model fails with GPU #11

Open
BBO-repo opened this issue Aug 27, 2023 · 6 comments
BBO-repo commented Aug 27, 2023

Hello,
I'm trying to get tuatara working on GPU. While it works for the detection model craft_traced_torchscript_model.pt, it does not work for the recognition model parseq_torchscript.bin.
These are the snippets intended to make image_to_data work on GPU:

std::vector<OutputItem> image_to_data(cv::Mat image, std::string weights_dir, std::string outputs_dir)
{
    auto device = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
    // ...
    detector_model = torch::jit::load(model_path, device);
    // ...
    image_tensor = image_tensor.div(255.0);
    image_tensor = image_tensor.to(device);
    // ...
    parseq_model = torch::jit::load(parseq_model_path, device);
    // ...
    parseq_tensor = parseq_tensor.div(255.0);
    parseq_tensor = parseq_tensor.to(device);
    // ...
}

Unfortunately this fails for the recognition model with the following error:

terminate called after throwing an instance of 'std::runtime_error'
  what():  The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/strhub/models/parseq/system.py", line 80, in forward
....
....
  File "/home/darwin/.pyenv/versions/up-deps/lib/python3.9/site-packages/torch/nn/functional.py", line 2044, in embedding
        # remove once script supports set_grad_enabled
        _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
           ~~~~~~~~~~~~~~~ <--- HERE
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

Would you have any suggestion to address this?

Also, I don't know whether saving the recognition model as .bin rather than .pth causes any trouble, but just in case, it would be great if the recognition model could also be provided as a .pth file.
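For what it's worth, the traceback points at an embedding lookup. A minimal Python sketch (not tuatara code, just an illustration of the same error class): an `nn.Embedding` whose weight matrix sits on one device while the index tensor sits on another triggers exactly this "Expected all tensors to be on the same device" failure, and moving both to the same device resolves it.

```python
import torch

# Toy reproduction of the device-mismatch error class from the traceback:
# an embedding lookup needs its weight matrix and the index tensor on the
# same device.
emb = torch.nn.Embedding(10, 4)   # weights start on CPU
idx = torch.tensor([1, 2, 3])     # indices start on CPU

device = "cuda" if torch.cuda.is_available() else "cpu"
emb = emb.to(device)  # moves the weights
idx = idx.to(device)  # the indices must follow, or embedding() raises the
                      # "Expected all tensors to be on the same device" error
out = emb(idx)
print(out.shape)  # torch.Size([3, 4])
```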

@jackvial
Owner

Hi @BBO-repo, thanks for trying this on GPU. I haven't tried it myself as the main focus was running on CPU but would be great to have GPU support too.

Pretty sure .bin vs .pth doesn't matter; I think it's only an extension naming convention.

The error "RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)"

sounds like a .to(device) call is missing somewhere, so some of the data or model weights are being sent to the GPU while others are still on the CPU.
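A minimal Python sketch of that checklist (toy `nn.Linear` standing in for the real models, hypothetical file path): pick the device at load time with `map_location`, and, belt-and-braces, move the loaded module again with `.to(device)` before moving the input the same way.

```python
import os
import tempfile
import torch

# Save a toy TorchScript module to a temp file (stand-in for the real models).
path = os.path.join(tempfile.mkdtemp(), "m.pt")
torch.jit.script(torch.nn.Linear(4, 2)).save(path)

device = "cuda" if torch.cuda.is_available() else "cpu"
m = torch.jit.load(path, map_location=device)  # device chosen at load time
m.to(device)                                   # explicit move, just in case
x = torch.randn(1, 4, device=device)           # input follows the module
y = m(x)
print(y.shape)  # torch.Size([1, 2])
```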


BBO-repo commented Sep 1, 2023

Hi,
I'll double-check, even though I've already verified every entry point to make sure every tensor is on the GPU, but let's confirm again.

Since there are two different ways to save a model in PyTorch (state_dict vs. full model), I was wondering whether the detection model and the recognition model were saved in the same manner, cf. https://saturncloud.io/blog/what-are-the-differences-between-bin-and-pt-pytorch-saved-model-types/ and https://pytorch.org/tutorials/beginner/saving_loading_models.html.
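For reference, a minimal sketch (toy `nn.Linear`, hypothetical file names) contrasting the two save styles mentioned:

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
d = tempfile.mkdtemp()

# Style 1: state_dict only -- loading requires the Python class definition.
torch.save(model.state_dict(), os.path.join(d, "weights.pth"))
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(os.path.join(d, "weights.pth")))

# Style 2: TorchScript -- weights and architecture saved together, loadable
# from C++ via torch::jit::load with no Python class needed.
torch.jit.script(model).save(os.path.join(d, "scripted.pt"))
reloaded = torch.jit.load(os.path.join(d, "scripted.pt"))
```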


jackvial commented Sep 4, 2023

@BBO-repo When not using TorchScript you can save only the weights (state_dict) or the weights plus config. But the CRAFT and Parseq models are both saved as TorchScript files, which contain both weights and config.

The TorchScript files don't specify a device (GPU/CPU); the device is specified at load time in C++.

I think the problem is that while you might have set all the data tensors to use the GPU, you also need to set the models to use the GPU so that the model weights are on the GPU alongside the data.

To move both models to GPU I think you'll need to add a line like the following somewhere before the model weights are loaded.

torch::Device device(torch::kCUDA, 0);  // For GPU (device 0)

For a bit more context on how the models were created: the Parseq model is torchscript_model.bin from this Hugging Face repo https://huggingface.co/baudm/parseq-tiny/tree/main . I'm not sure whether it was created with TorchScript tracing or scripting, but that shouldn't affect its ability to run on GPU or CPU.

I created the CRAFT detector model with TorchScript tracing, e.g.:

import torch
import torchvision.models as models

# Load a pretrained model
model = models.resnet18(pretrained=True)
model.eval()

# Create example input
example_input = torch.randn(1, 3, 224, 224)

# Convert the model to TorchScript via tracing
traced_model = torch.jit.trace(model, example_input)

# Save the TorchScript model
traced_model.save("resnet18_traced.pt")

Something else to be aware of when running on GPU: you'll want to bypass the CPU multi-threading code starting around https://github.com/jackvial/tuatara/blob/main/tuatara.cpp#L468


BBO-repo commented Sep 6, 2023

Hi,
Thank you for the additional explanation.
I can confirm that both the tensors and the models were already moved to the GPU:

std::vector<OutputItem> image_to_data(cv::Mat image, std::string weights_dir, std::string outputs_dir)
{
    auto device = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
    // ...
    detector_model = torch::jit::load(model_path, device);
    // ...
    parseq_model = torch::jit::load(parseq_model_path, device);
    // ...
}

But you have a point about the multi-threading part.
I'll give it a try and see if I can figure it out.

@BBO-repo
Author

Hi,
Trying again to run the recognition model on GPU, it unfortunately still fails, despite ensuring the code runs single-threaded and forcing both the models and the tensors to the GPU, cf. the screenshot below.
[screenshot: TorchScript inference error]

Maybe the recognition model was traced on CPU and should also be traced on GPU.

I'm a bit stuck; in case there is any way to get the recognition model traced on GPU, that would be great!
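One way this hypothesis can be illustrated in Python: torch.jit.trace tends to record tensors created inside forward() on whatever device was active at trace time, so a module traced on CPU can bake in CPU tensors that clash with CUDA inputs at run time. A minimal sketch (toy module; an assumption about parseq's internals, not verified against it):

```python
import torch
import torch.nn as nn

class AddsInternalTensor(nn.Module):
    def forward(self, x):
        # Created fresh on each call; tracing tends to freeze its
        # trace-time device into the graph.
        offset = torch.arange(x.shape[-1], dtype=x.dtype)
        return x + offset

# Trace on CPU with a CPU example input.
traced = torch.jit.trace(AddsInternalTensor(), torch.zeros(2, 3))

print(traced(torch.zeros(2, 3)))  # fine: everything on CPU
# traced(torch.zeros(2, 3, device="cuda"))  # device-mismatch risk on GPU,
# because `offset` may stay pinned to the CPU device it was traced on
```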

@jackvial
Owner

@BBO-repo The CRAFT detection model is the same one used in the EasyOCR project. If you configure the model to run on GPU, and trace the model after model.eval() here, that should give you a GPU-compatible version, e.g.:

traced_script_module = torch.jit.trace(model, torch.randn(x.size()))
traced_script_module.save("/craft_traced_torchscript_model.pt")

where x is a batch of input images.
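Putting that together, a minimal sketch of tracing on GPU (placeholder model instead of CRAFT; falls back to CPU so the snippet runs anywhere): move both the model and the example input to the device *before* torch.jit.trace, so any tensors baked in at trace time are on that device.

```python
import os
import tempfile
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder model standing in for CRAFT; .eval() before tracing, as above.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).to(device).eval()
example = torch.randn(1, 3, 32, 32, device=device)  # example input on device

traced = torch.jit.trace(model, example)
path = os.path.join(tempfile.mkdtemp(), "traced_on_device.pt")
traced.save(path)
```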
