Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

huggingface: fix model param population #24743

Merged
merged 11 commits into from
Aug 24, 2024

Conversation

keenborder786
Copy link
Contributor

@efriis efriis added the partner label Jul 27, 2024
@efriis efriis self-assigned this Jul 27, 2024
@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Jul 27, 2024
Copy link

vercel bot commented Jul 27, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Aug 24, 2024 0:39am

@keenborder786
Copy link
Contributor Author

@eyurtsev, @ccurme please review

@dosubot dosubot bot added 🔌: huggingface Primarily related to HuggingFace integrations 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Jul 27, 2024
@eyurtsev eyurtsev assigned eyurtsev and unassigned efriis Jul 27, 2024
)

if values["endpoint_url"] is None and "repo_id" not in values:
if "endpoint_url" not in values and "repo_id" not in values:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update the doc-string for the wrapper to explain which variables can be specified?


I haven't used this wrapper, but the new code does not behave correctly as far as I can tell. If repo_id is not strictly necessary for initializaiton, then a user may be relying on endpoint_url being specified via an env variable. This scenario will cause a validation error right now.

I am trying to initialize the HuggingFaceEndpoint, but despite passing the correct repo_id, I am encountering an error. I have identified the bug: even though I provide the repo_id, the HuggingFaceEndpoint validation always checks for the endpoint_url, which is incorrect. If the repo_id is passed, it should not be checking for the endpoint_url. I will create a PR to fix this issue.

Old validation does not seem to be checking endpoint_url, it seems to be loading it from the environment by default.

Possible solutions:

  1. Throw an error if both values can be loaded
  2. Only try to load_endpoint url after repo_id is missing and only then do the check that at least one of them is specified (this is still too implicit -- so bad solution)
  3. Change precedence order if it's obvious that repo_id should be favored over endpoint_url (too implicit -- so bad solution)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I have updated the wrapper's docstring.

  • Previously, the endpoint_url was first checked from environment variables. If the endpoint_url was not set as an environment variable and only repo_id was provided, an error would occur despite the repo_id being passed. I have now revised this behavior so that the code first checks whether the user has provided a repo_id or endpoint_url directly. Only after that, if repo_id is not provided will it then retrieve the values from the environment variables.

  • See below please.

llm = HuggingFaceEndpoint(
                repo_id="microsoft/Phi-3-mini-4k-instruct", # user intend to use repo_id but validation error will be raised
for `endpoint_url` env variable missing which is wrong since user has passed repo_id
                huggingfacehub_api_token = 'xxxxxx',
                task="text-generation",
                max_new_tokens=512,
                do_sample=False,
                repetition_penalty=1.03,
            )

Copy link
Collaborator

@eyurtsev eyurtsev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a PR explaining the fix. The current fix introduces another bug in a situation where a user is relying for the endpoint URL to be loaded from the env

@eyurtsev eyurtsev self-requested a review August 1, 2024 19:55
@keenborder786
Copy link
Contributor Author

@eyurtsev will have a look

@efriis efriis self-assigned this Aug 3, 2024
raise ValueError(
"Please specify either an `endpoint_url` OR a `repo_id`, not both."
)
if "repo_id" not in values:
values["endpoint_url"] = get_from_dict_or_env(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We lost validation on repo_id being unspecified and endpoint_url being unspecified

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, the validation is happening above. See Line 149. @eyurtsev

@efriis efriis removed their assignment Aug 6, 2024
@efriis efriis self-assigned this Aug 11, 2024
@keenborder786
Copy link
Contributor Author

@eyurtsev please see

@keenborder786
Copy link
Contributor Author

@ccurme

@keenborder786
Copy link
Contributor Author

@ccurme @hwchase17

@efriis efriis removed their assignment Aug 24, 2024
@efriis efriis self-assigned this Aug 24, 2024
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Aug 24, 2024
@efriis efriis changed the title [Partners]: HuggingFaceEndpoint endpoint_url validation fix huggingface: fix model param population Aug 24, 2024
@efriis
Copy link
Member

efriis commented Aug 24, 2024

ended up rewriting the implementation to also include the model param. This is a lightly breaking change but only for people that were incorrectly populating the model param as well as others, which was likely done by mistake.

@efriis efriis enabled auto-merge (squash) August 24, 2024 00:42
@efriis efriis merged commit 9a29398 into langchain-ai:master Aug 24, 2024
16 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature 🔌: huggingface Primarily related to HuggingFace integrations partner size:M This PR changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants