AWS Lambda - Read Only #1412
Comments
@3coins may be able to help with this.
That would be great if I could get some help with this, or some documentation! Thank you!
Ah, I have also seen this issue. As I was mentioning on the langchain Discord:
The first one is not usable when going serverless (AWS Lambda or Google Cloud Functions), and it can be a deal breaker for a more than valid, I'd even say desirable, use case: the serverless scenario. So far I have found a workaround, but it's not ideal and is likely to suffer breaking changes as langchain keeps being updated (which happens very frequently). The workaround is to create a custom implementation of
I hope this helps in your scenario @Joepetey, and let's aim for an official solution from @hwchase17
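For reference, here is a minimal sketch of the kind of workaround described above, assuming you subclass the chat LLM and override its token-counting method to use `tiktoken`; the class and method names are illustrative and may differ across langchain versions:

```python
# Hypothetical workaround sketch (not the official fix): override token
# counting so it relies on tiktoken instead of downloading GPT2TokenizerFast
# from the Hugging Face Hub at runtime.
import tiktoken
from langchain.llms import OpenAIChat


class ServerlessOpenAIChat(OpenAIChat):
    def get_num_tokens(self, text: str) -> int:
        # tiktoken does not pull a Hugging Face model, which (per this thread)
        # keeps token counting usable on a read-only Lambda filesystem.
        encoding = tiktoken.get_encoding("gpt2")
        return len(encoding.encode(text))
```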
Alternatively we could continue using the current tokenizer. Now, having to download it every time (serverless is stateless) is going to make things very slow, I believe. I'm talking, by the way, without being an expert.
I have opened a PR that would solve this issue.
Wow thank you @juankysoriano this is so helpful!
thanks @juankysoriano! merging this in now
Solves #1412

Currently `OpenAIChat` inherits the way it calculates the number of tokens, `get_num_token`, from `BaseLLM`. On the other hand, `OpenAI` inherits from `BaseOpenAI`. `BaseOpenAI` and `BaseLLM` use different methodologies for this: the first relies on `tiktoken`, while the second relies on `GPT2TokenizerFast`. The motivation of this PR is to:
1. Bring consistency to the way the number of tokens (`get_num_token`) is calculated across the `OpenAI` family, regardless of `Chat` vs `non-Chat` scenarios.
2. Give preference to the `tiktoken` method, as it's serverless friendly: it doesn't require downloading models, which can be incompatible with `readonly` filesystems.
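As a quick illustration of the approach this PR prefers (a sketch, not the PR's exact code), `tiktoken` can count tokens directly:

```python
# Token counting with tiktoken, the serverless-friendly method the PR adopts;
# no Hugging Face tokenizer download is involved (per the PR description).
import tiktoken

encoding = tiktoken.get_encoding("gpt2")
text = "How many tokens does this sentence have?"
print(len(encoding.encode(text)))
```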
solved by #1457
Quick question, when can we expect this to be available in a release?
I don't know, but the owner is usually very quick to cut releases; they are very frequent. It shouldn't take more than a couple of days in my experience. We'll see.
@Joepetey could you check whether the recent release solves this for you?
With AWS you can mount an EFS volume to a Lambda to cache a pre-trained model. Also, if you're using HuggingFaceEmbeddings (which uses sentence_transformers.SentenceTransformer), you need to set the SENTENCE_TRANSFORMERS_HOME env variable so the model is downloaded to a specific location.
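A minimal sketch of that setup, assuming an EFS mount at `/mnt/efs` (the path and cache directory are illustrative): set the cache location before the embeddings model is first loaded.

```python
# Point sentence-transformers at a writable, persistent location (an EFS mount)
# so the model is downloaded once and reused across Lambda invocations.
# The mount path below is an assumption for illustration.
import os

os.environ["SENTENCE_TRANSFORMERS_HOME"] = "/mnt/efs/sentence_transformers"

from langchain.embeddings import HuggingFaceEmbeddings

# HuggingFaceEmbeddings wraps sentence_transformers.SentenceTransformer,
# which honours SENTENCE_TRANSFORMERS_HOME when resolving its cache.
embeddings = HuggingFaceEmbeddings()
```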
Thank you Juanky, it worked!
LangChain tries to download `GPT2TokenizerFast` when I run a chain. In a Lambda function this doesn't work because the Lambda filesystem is read-only. Has anyone run into this, or know how to fix it?