
Missing Azure OpenAI support for "OpenAIEmbeddings" #1577

Closed
tunayokumus opened this issue Mar 10, 2023 · 8 comments

Comments

@tunayokumus
Contributor

It's currently not possible to pass a custom deployment name, because the model/deployment names are hard-coded as "text-embedding-ada-002" in class variables.

In Azure OpenAI, deployment names can be customized, and that doesn't work with the OpenAIEmbeddings class.

There is proper Azure support for the OpenAI LLM wrapper, but it is missing for embeddings.
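For context, a minimal sketch of the failure mode (the resource URL, API version and key are placeholders, and the openai-python < 1.0 module-level Azure settings are assumed):

import openai
from langchain.embeddings import OpenAIEmbeddings

# Azure OpenAI connection settings (placeholder values)
openai.api_type = "azure"
openai.api_base = "https://my-resource.openai.azure.com/"
openai.api_version = "2022-12-01"
openai.api_key = "..."

# There is no parameter to point this at a custom deployment such as
# "my-embeddings"; the class always sends engine="text-embedding-ada-002",
# so the call only succeeds if a deployment with exactly that name exists.
embeddings = OpenAIEmbeddings()
embeddings.embed_query("hello world")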

@lakshyaag
Contributor

Same issue as #1560

@JonAtDocuWare

FYI, I worked around this issue by naming my deployment "text-embedding-ada-002" and using the model of the same name. The next issue I ran into was hitting the rate limiter when using embeddings, as the default is 300(?) requests per minute. I had to add my own retry functionality to the /embeddings/openai.py code, but I am using an older version (0.088?), and it looks like better retrying is being built in in more recent commits.

Just in case it helps you, here is my embed_documents function, with modifications at the end (the last else case):

import tenacity
from tenacity import retry
from typing import List
...

def embed_documents(
        self, texts: List[str], chunk_size: int = 1000
    ) -> List[List[float]]:
        """Call out to OpenAI's embedding endpoint for embedding search docs.

        Args:
            texts: The list of texts to embed.
            chunk_size: The maximum number of texts to send to OpenAI at once
                (max 1000).

        Returns:
            List of embeddings, one for each text.
        """
        # handle large batches of texts
        if self.embedding_ctx_length > 0:
            return self._get_len_safe_embeddings(
                texts, engine=self.document_model_name, chunk_size=chunk_size
            )
        else:
            # Retry each embedding call every 10 seconds, up to 60 attempts,
            # to ride out Azure's requests-per-minute rate limit.
            @retry(wait=tenacity.wait_fixed(10), stop=tenacity.stop_after_attempt(60))
            def addText(text):
                embedding = self._embedding_func(text, engine=self.document_model_name)
                return embedding

            responses = []
            for text in texts:
                responses.append(addText(text))

            return responses
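(As a side note, tenacity's exponential backoff is an alternative to the fixed 10-second wait, e.g. @retry(wait=tenacity.wait_exponential(min=1, max=60), stop=tenacity.stop_after_attempt(6)), which is closer in spirit to the retry logic being built into newer versions.)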

@geg00

geg00 commented Mar 22, 2023

The issue comes when you try to use GPT-3.5-turbo: Azure does not let you name deployments with a '.' in them.

@JonAtDocuWare

The issue comes when you try to use GPT-3.5-turbo: Azure does not let you name deployments with a '.' in them.

Can you create embeddings with GPT-3.5-turbo?

@geg00
Copy link

geg00 commented Mar 22, 2023

I'm using
embeddings = OpenAIEmbeddings(chunk_size=1) for embeddings.
By default, OpenAIEmbeddings uses text-embedding-ada-002, so create a deployment named text-embedding-ada-002 backed by the model of the same name.

Azure will not work unless you set chunk_size=1.
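Put as code (a sketch; the endpoint, API version and key are placeholders, and the environment-variable style of Azure configuration is assumed):

import os

# Azure OpenAI settings picked up from the environment (placeholder values)
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://my-resource.openai.azure.com/"
os.environ["OPENAI_API_VERSION"] = "2022-12-01"
os.environ["OPENAI_API_KEY"] = "..."

from langchain.embeddings import OpenAIEmbeddings

# The Azure deployment must be named "text-embedding-ada-002" (the hard-coded
# default), and chunk_size=1 because, as noted above, Azure will not work with
# larger batches.
embeddings = OpenAIEmbeddings(chunk_size=1)
vectors = embeddings.embed_documents(["first document", "second document"])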

@JonAtDocuWare

The issue comes when you try to use GPT-3.5-turbo: Azure does not let you name deployments with a '.' in them.

So this is not really an issue for OpenAIEmbeddings

@tunayokumus
Contributor Author

Opened a PR to fix this issue by separating deployment name and model name:
#3076
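For reference, a sketch of the intended usage once the deployment name is a separate field (the deployment parameter follows the PR's direction and may not match the merged API exactly):

from langchain.embeddings import OpenAIEmbeddings

# Azure deployment name decoupled from the underlying model name
embeddings = OpenAIEmbeddings(
    deployment="my-custom-embeddings",  # hypothetical Azure deployment name
    model="text-embedding-ada-002",
    chunk_size=1,
)
vectors = embeddings.embed_documents(["first document", "second document"])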

@dosubot

dosubot bot commented Sep 20, 2023

Hi, @tunayokumus! I'm Dosu, and I'm helping the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.

From what I understand, the issue is about the missing Azure OpenAI support for "OpenAIEmbeddings". The model and deployment names are hard-coded and cannot be customized. One user suggested a workaround by naming the deployment "text-embedding-ada-002" using the model of the same name. Another user mentioned hitting the rate limiter when using embeddings and added their own retrying functionality.

However, it seems that the original author has addressed the issue by opening a pull request to fix it.

Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on the issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your contribution to the LangChain repository!

dosubot bot added the stale label on Sep 20, 2023
dosubot bot closed this as not planned on Sep 27, 2023
dosubot bot removed the stale label on Sep 27, 2023