AWS Lambda - Read Only #1412
Comments
@3coins may be able to help with this.
That would be great if I could get some help with this, or some documentation! Thank you!
Ah, I have also seen this issue. As I was mentioning on the langchain Discord:
The first one is not usable when going serverless (AWS Lambda or Google Cloud Functions), and it can be a deal breaker for a more than valid, I'd even say desirable, use case: the serverless scenario. So far I have found a workaround, but it's not ideal and is likely to suffer breaking changes as langchain keeps being updated (which happens very frequently). The workaround is to create a custom implementation of
I hope this helps in your scenario @Joepetey, and let's aim for an official solution from @hwchase17
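For reference, here is a minimal sketch of the kind of workaround described above, assuming you subclass the chat LLM and override its token-counting method to use `tiktoken`; the class and method names are illustrative and may differ across langchain versions:

```python
# Hypothetical workaround sketch (not the official fix): override token
# counting so it relies on tiktoken instead of downloading GPT2TokenizerFast
# from the Hugging Face Hub at runtime.
import tiktoken
from langchain.llms import OpenAIChat


class ServerlessOpenAIChat(OpenAIChat):
    def get_num_tokens(self, text: str) -> int:
        # tiktoken does not pull a Hugging Face model, which (per this thread)
        # keeps token counting usable on a read-only Lambda filesystem.
        encoding = tiktoken.get_encoding("gpt2")
        return len(encoding.encode(text))
```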
Alternatively we could continue using the current tokenizer. Now, having to download it every time (serverless is stateless) is going to make things very slow, I believe. I'm talking, by the way, without being an expert.
I have opened a PR that would solve this issue.
Wow thank you @juankysoriano this is so helpful!
thanks @juankysoriano! merging this in now
Solves #1412

Currently `OpenAIChat` inherits the way it calculates the number of tokens, `get_num_token`, from `BaseLLM`. On the other hand, `OpenAI` inherits from `BaseOpenAI`. `BaseOpenAI` and `BaseLLM` use different methodologies for this: the first relies on `tiktoken`, while the second relies on `GPT2TokenizerFast`. The motivation of this PR is to:
1. Bring consistency to the way the number of tokens (`get_num_token`) is calculated across the `OpenAI` family, regardless of `Chat` vs `non-Chat` scenarios.
2. Give preference to the `tiktoken` method, as it's serverless friendly: it doesn't require downloading models, which can be incompatible with `readonly` filesystems.
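As a quick illustration of the approach this PR prefers (a sketch, not the PR's exact code), `tiktoken` can count tokens directly:

```python
# Token counting with tiktoken, the serverless-friendly method the PR adopts;
# no Hugging Face tokenizer download is involved (per the PR description).
import tiktoken

encoding = tiktoken.get_encoding("gpt2")
text = "How many tokens does this sentence have?"
print(len(encoding.encode(text)))
```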
solved by #1457
Quick question, when can we expect this to be available in a release?
I don't know, but the owner is usually very quick to cut releases; they are very frequent. It shouldn't take more than a couple of days in my experience. We'll see.
@Joepetey could you check whether the recent release solves this for you?
With AWS you can mount an EFS volume to a Lambda to cache a pre-trained model. Also, if you're using HuggingFaceEmbeddings (which uses sentence_transformers.SentenceTransformer), you need to set the SENTENCE_TRANSFORMERS_HOME env variable so the model is downloaded to a specific location.
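A minimal sketch of that setup, assuming an EFS mount at `/mnt/efs` (the path and cache directory are illustrative): set the cache location before the embeddings model is first loaded.

```python
# Point sentence-transformers at a writable, persistent location (an EFS mount)
# so the model is downloaded once and reused across Lambda invocations.
# The mount path below is an assumption for illustration.
import os

os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/mnt/efs/sentence_transformers"

from langchain.embeddings import HuggingFaceEmbeddings

# HuggingFaceEmbeddings wraps sentence_transformers.SentenceTransformer,
# which honours SENTENCE_TRANSFORMERS_HOME when resolving its cache.
embeddings = HuggingFaceEmbeddings()
```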
Thank you Juanky, it worked!
LangChain tries to download `GPT2TokenizerFast` when I run a chain. In a Lambda function this doesn't work because the Lambda filesystem is read-only. Has anyone run into this, or know how to fix it?