BYOM Model support #812
Conversation
else:
    break
os.makedirs(local_dir, exist_ok=True)
snapshot_download(
Could you add some details here on why we need to download the snapshot again?
The first download saves the model to the local HF cache (not local_dir); this is resumable in case of a network failure. The second download copies from the HF cache to local_dir. Downloading directly to local_dir is not resumable, based on how it works, but copying is unlikely to fail.
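The two-step pattern described here could be sketched roughly as follows (a minimal illustration; the helper names and retry count are assumptions, not the PR's actual code):

```python
import os


def with_retries(fn, max_retries=3):
    """Call fn, retrying on any exception for up to max_retries attempts."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise


def download_then_copy(repo_id, local_dir, max_retries=3):
    # Lazy import: assumes huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    # Step 1: resumable download into the shared HF cache.
    with_retries(lambda: snapshot_download(repo_id=repo_id), max_retries)

    # Step 2: materialize files in local_dir; this is mostly a local copy
    # from the cache, so a failure here is unlikely.
    os.makedirs(local_dir, exist_ok=True)
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```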
Got it, I missed this part: local_dir=local_dir.
From HF:
If `local_dir` is provided, the file structure from the repo will be replicated in this location. When using this
option, the `cache_dir` will not be used and a `.huggingface/` folder will be created at the root of `local_dir`
to store some metadata related to the downloaded files. While this mechanism is not as robust as the main
cache-system, it's optimized for regularly pulling the latest version of a repository.
Also, it looks like local_dir_use_symlinks is a deprecated argument?
if local_dir_use_symlinks != "auto":
warnings.warn(
"`local_dir_use_symlinks` parameter is deprecated and will be ignored. "
"The process to download files to a local folder has been updated and do "
"not rely on symlinks anymore. You only need to pass a destination folder "
"as`local_dir`.\n"
"For more details, check out https://huggingface.co/docs/huggingface_hub/main/en/guides/download#download-files-to-local-folder."
)
Looks like this deprecation came only 2 days ago - huggingface/huggingface_hub#2223.
Whether it applies will depend on which version of the hub API carries this code.
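One hedged way to stay compatible across hub versions is to gate the argument on the installed version; the 0.23.0 cutoff below is an assumption pending the release that actually ships huggingface/huggingface_hub#2223:

```python
def parse_version(v):
    """Turn a version string like '0.22.2' into (0, 22, 2); drops non-numeric parts."""
    return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())


def local_dir_kwargs(hub_version):
    # local_dir_use_symlinks is ignored (with a warning) on newer huggingface_hub
    # releases; only pass it on older ones. The 0.23.0 cutoff is an assumption,
    # not a confirmed release number -- pin it once the release is known.
    if parse_version(hub_version) < (0, 23, 0):
        return {"local_dir_use_symlinks": False}
    return {}
```

The returned dict can then be splatted into the `snapshot_download(...)` call, so the deprecated keyword never reaches a hub version that warns about it.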
ads/aqua/utils.py
Outdated
f"Error uploading the object. Exit code: {e.returncode} with error {e.stdout}"
)

print(os_details)
Probably used for debugging purposes?
fixed
@@ -116,6 +116,7 @@ opctl = [
    "rich",
    "fire",
    "cachetools",
    "huggingface_hub"
Will we need to get approval for this?
We need to.
I've been thinking about whether we should allow users to specify the VLLM/TGI version as an alternative to the container name, and have our system automatically select the appropriate container based on that input. If a user specifies just the VLLM/TGI interface without a specific version, we could default to the latest container that supports VLLM/TGI.
Additionally, I think it's important to offer users the ability to specify different containers for inference and fine-tuning when creating deployments and fine-tuning jobs. When users register models and specify containers, these can be set as defaults for deployment and fine-tuning; however, given that containers can become obsolete quickly, particularly with regular security updates, users will still have the option to override them.
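A sketch of what such framework/version-based container resolution might look like (the mapping and container tags below are purely illustrative placeholders, not real SMC container names):

```python
from typing import Optional

# Hypothetical framework/version -> SMC container mapping.
SMC_CONTAINERS = {
    ("vllm", "0.3.0"): "odsc-vllm-serving:0.3.0",
    ("vllm", "0.4.1"): "odsc-vllm-serving:0.4.1",
    ("tgi", "1.4.5"): "odsc-tgi-serving:1.4.5",
}


def resolve_container(framework: str, version: Optional[str] = None) -> str:
    """Pick a container for framework/version; default to the latest when version is omitted."""
    candidates = {v: c for (f, v), c in SMC_CONTAINERS.items() if f == framework}
    if not candidates:
        raise ValueError(f"no containers registered for {framework!r}")
    if version is None:
        # Latest by numeric version ordering.
        version = max(candidates, key=lambda v: tuple(int(p) for p in v.split(".")))
    if version not in candidates:
        raise ValueError(f"no container supports {framework} {version}")
    return candidates[version]
```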
container_type=container_type_key,
)
if not is_custom_container
else container_type_key
Clarifying: for an SMC container, container_type_key is odsc-vllm-serving, whereas for a BYOC container it will be something like <region>.ocir.io/<namespace>/user-provided-container:1.0.0.0?
That is right
ads/aqua/model.py
Outdated
os_path,
model_name: str,
inference_container,
finetuning_contianer,
nit: replace finetuning_contianer
with finetuning_container
done
ads/aqua/model.py
Outdated
str: Display name of th model (This should be model ocid)
"""
api = HfApi()
model_info = api.model_info(model_name)
This will raise RepositoryNotFoundError if model_name isn't available.
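A hedged sketch of wrapping the lookup so that failure surfaces as a friendlier error (the exception classes come from huggingface_hub.utils; the message text is illustrative, not the PR's actual wording):

```python
def fetch_model_info(model_name):
    """Look up a model on the Hub, converting hub errors into clearer messages."""
    # Lazy imports: assumes huggingface_hub is installed.
    from huggingface_hub import HfApi
    from huggingface_hub.utils import GatedRepoError, RepositoryNotFoundError

    try:
        # GatedRepoError subclasses RepositoryNotFoundError, so catch it first.
        return HfApi().model_info(model_name)
    except GatedRepoError as err:
        raise ValueError(
            f"Access to {model_name} is gated; accept the license on the model "
            "page and authenticate via `huggingface-cli login`."
        ) from err
    except RepositoryNotFoundError as err:
        raise ValueError(f"Model {model_name} was not found on Hugging Face.") from err
```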
ads/aqua/model.py
Outdated
filter_tag = Tags.AQUA_FINE_TUNED_MODEL_TAG.value
elif model_type == MODEL_TYPE.BASE.value:
    filter_tag = Tags.BASE_MODEL_CUSTOM.value
print(filter_tag)
use logger.debug instead?
Removed; it was an unintentional commit.
ads/aqua/model.py
Outdated
@@ -59,6 +67,11 @@ class FineTuningMetricCategories(Enum):
    TRAINING = "training"


class MODEL_TYPE(Enum):
nit: use camel case, i.e. ModelType to stick with convention?
os_path=os_path, local_dir=local_dir, model_name=model
)
# Create Model catalog entry with pass by reference
return self._create_model_catalog_entry(
After registering, can we return an AquaModel object instead of just the display name? The user can refer to the info in the returned result to proceed with the next steps (deploy, FT, etc.).
I was thinking of returning the model id, but maybe AquaModel is better.
ads/aqua/model.py
Outdated
break
os.makedirs(local_dir, exist_ok=True)
snapshot_download(
    repo_id=model, local_dir=local_dir, local_dir_use_symlinks=False
If the model size exceeds the space available in local_dir, the download can be interrupted. Can we check the repo metadata first before downloading?
Will add this to the TODO list. It can be done in a separate PR.
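One possible pre-check for that TODO, assuming the per-file sizes come from `HfApi().model_info(repo_id, files_metadata=True).siblings` (the helper names and the 10% safety margin are assumptions, not the PR's code):

```python
import shutil


def required_bytes(siblings):
    """Sum file sizes from repo metadata; entries without a reported size count as 0."""
    return sum((s.size or 0) for s in siblings)


def ensure_disk_space(total_bytes, local_dir, margin=1.1):
    """Raise if local_dir's filesystem lacks room for total_bytes plus a safety margin."""
    free = shutil.disk_usage(local_dir).free
    if total_bytes * margin > free:
        raise OSError(
            f"model needs ~{int(total_bytes * margin)} bytes but only {free} are free"
        )
```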
Approving this PR; we can address the TODOs in subsequent updates.
Externalize container configuration for deployment
Bring your own model Support
This entails support for "any" Hugging Face model in AI Quick Actions. The following scenarios need to be addressed to facilitate this feature.
The PR also introduces the notion of "registering" a model: the user imports the Hugging Face model into the model catalog within their own tenancy.
Assumptions
Huggingface token setup
Run the huggingface-cli login command and follow the on-screen instructions.
Usage
ads aqua model register --model meta-llama/Meta-Llama-3-8B --os_path oci://mayoor-dev-versioned@namespace/cached-models --local_dir `pwd`/cache-models
ads aqua model register --model meta-llama/Meta-Llama-3-8B --os_path oci://mayoor-dev-versioned@namespace/cached-models --local_dir `pwd`/cache-models --odsc-vllm-container --inference_container_type_smc --finetuning_container odsc-finetuning-llm-container --finetuning_container_type_smc
ads aqua model register --model meta-llama/Meta-Llama-3-8B --os_path oci://mayoor-dev-versioned@namespace/cached-models --local_dir `pwd`/cache-models --inference_container iad.ocir.io/my/custom:1.0 --finetuning_container iad.ocir.io/my/custom-ft:1.0
TODO