
fix: error triton environment when deploying HuggingFace models #150

Merged
Merged 1 commit into `main` on Oct 7, 2022

Conversation


@Phelan164 Phelan164 commented Oct 7, 2022

Because

  • When deploying a HuggingFace model, Triton fails to load the libgomp library used by torch:
    Internal: OSError: /tmp/python_env_pu73bg/0/lib/python3.8/site-packages/torch/lib/libgomp-d22c30c5.so.1: cannot allocate memory in static TLS block
  • Several attempts to load the library dynamically (e.g. from a Triton server start-up script, or from the Python model backend) did not work: the library is only created when the Triton server builds the Python environment for a Python model, and it lives under /tmp, a temporary folder.
  • The workaround for now is to copy the library into the Triton server folder and preload it before any model is deployed.

This commit

  • Copy the library and mount it into the Triton server container
  • Set LD_PRELOAD when starting up the Triton server
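
The two steps above can be sketched as a compose fragment. This is an illustrative example, not the exact contents of this commit: the service name, image tag, and mount paths are assumptions.

```yaml
# Hypothetical docker-compose fragment — names and paths are illustrative.
services:
  triton-server:
    image: nvcr.io/nvidia/tritonserver:22.01-py3   # assumed image tag
    volumes:
      # Mount the libgomp copy taken from the packed Python environment,
      # so it persists outside the temporary /tmp env Triton creates.
      - ./libgomp-d22c30c5.so.1:/opt/tritonserver/lib/libgomp-d22c30c5.so.1
    environment:
      # Preload the library so the dynamic loader reserves its static TLS
      # at startup, before torch (inside the Python backend) dlopens it.
      - LD_PRELOAD=/opt/tritonserver/lib/libgomp-d22c30c5.so.1
```

LD_PRELOAD works here because libraries preloaded at process start get static TLS reserved by the loader, whereas a later dlopen can exhaust the fixed static TLS surplus and fail with the error above.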

@Phelan164 Phelan164 changed the title chore: add a library for triton environment fix: error triton environment when deploying HuggingFace models Oct 7, 2022
@xiaofei-du xiaofei-du merged commit b2fda36 into main Oct 7, 2022
@xiaofei-du xiaofei-du deleted the add-lib-so-triton-env branch October 7, 2022 14:25
xiaofei-du pushed a commit that referenced this pull request Dec 25, 2022
🤖 I have created a release *beep* *boop*
---

## Product Updates

### Announcement 📣

* VDP is officially renamed to `Versatile Data Pipeline`.

We realise that, as a general ETL infrastructure, VDP is capable of processing all kinds of unstructured data, so we should not limit its usage to visual data only. That is why we replaced the word Visual with Versatile. The term Data Preparation was also misleading: users often thought it had something to do with data labelling or cleaning. Data Pipeline captures the core concept of VDP more precisely.

### Features ✨
* support the new Instance Segmentation task. Check out the [Streamlit example](https://github.com/instill-ai/vdp/tree/main/examples/streamlit/instance_segmentation)

## VDP ([0.3.0-alpha](v0.2.6-alpha...v0.3.0-alpha))


### Features

* support Instance segmentation task [0476f59](0476f59) 
* add console e2e test into vdp ([#148](#148)) ([a779a11](a779a11))
* add instance segmentation example ([#167](#167))


### Bug Fixes

* fix wrong triton environment when deploying HuggingFace models ([#150](#150)) ([b2fda36](b2fda36))
* use COCO RLE format for instance segmentation ([4d10e46](4d10e46))
* update model output protocol ([e6ea88d](e6ea88d))

## Pipeline-backend ([0.9.3-alpha](https://github.com/instill-ai/pipeline-backend/releases/tag/v0.9.3-alpha))

### Bug Fixes

* fix pipeline trigger model hanging (instill-ai/pipeline-backend#80) ([7ba58e5](instill-ai/pipeline-backend@7ba58e5))

## Connector-backend ([0.7.2-alpha](https://github.com/instill-ai/connector-backend/releases/tag/v0.7.2-alpha))

### Bug Fixes
* fix connector empty description update ([0bc3086](instill-ai/connector-backend@0bc3086))

## Model-backend ([0.10.0-alpha](https://github.com/instill-ai/model-backend/releases/tag/v0.10.0-alpha))

### Features
* support instance segmentation task (instill-ai/model-backend#183) ([d28cfdc](instill-ai/model-backend@d28cfdc))
* support async deploy and undeploy model instance (instill-ai/model-backend#192) ([ed36dc7](instill-ai/model-backend@ed36dc7))
* support semantic segmentation (instill-ai/model-backend#203) ([f22262c](instill-ai/model-backend@f22262c))

### Bug Fixes

* allow updating empty description for a model (instill-ai/model-backend#177) ([100ec84](instill-ai/model-backend@100ec84))
* HuggingFace batching bug in preprocess model ([b1582e8](instill-ai/model-backend@b1582e8))
* model instance state update to unspecified state (instill-ai/model-backend#206) ([14c87d5](instill-ai/model-backend@14c87d5))
* panic error with nil object (instill-ai/model-backend#208) ([a342113](instill-ai/model-backend@a342113))


## Console

### Features
* extend the time span of our user cookie (instill-ai/console#289) ([76a6f99](instill-ai/console@76a6f99))
* finish integration test and make it stable (instill-ai/console#281) ([3fd8d21](instill-ai/console@3fd8d21))
* replace prism.js with code-hike (instill-ai/console#292) ([cb61708](instill-ai/console@cb61708))
* unify the gap between elements in every table (instill-ai/console#291) ([e743820](instill-ai/console@e743820))
* update console request URL according to new protobuf (instill-ai/console#287) ([fa7ecc3](instill-ai/console@fa7ecc3))
* add hg model id field at model_instance page (instill-ai/console#300) ([31a6eab](instill-ai/console@31a6eab))
* cleanup connector after test (instill-ai/console#295) ([f9c8e4c](instill-ai/console@f9c8e4c))
* disable html report (instill-ai/console#297) ([689f50d](instill-ai/console@689f50d))
* enhance the warning of the resource id field (instill-ai/console#303) ([6c4aa4f](instill-ai/console@6c4aa4f))
* make playwright output dot on CI (instill-ai/console#293) ([e5c2958](instill-ai/console@e5c2958))
* support model-backend async long run operation (instill-ai/console#309) ([f795ce8](instill-ai/console@f795ce8))
* update e2e test (instill-ai/console#313) ([88bf0cd](instill-ai/console@88bf0cd))
* update how we test model detail page (instill-ai/console#310) ([04c83a1](instill-ai/console@04c83a1))
* wipe out all data after test (instill-ai/console#296) ([e4085dd](instill-ai/console@e4085dd))

### Bug Fixes
* fix pipeline e2e not stable (instill-ai/console#285) ([a26e599](instill-ai/console@a26e599))
* fix set-cookie api route issue due to wrong domain name (instill-ai/console#284) ([c3efcdd](instill-ai/console@c3efcdd))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).