V2 embedding models handle new-lines #419

ravwojdyla · 2023-04-27T16:05:02Z

EliahKagan · 2023-07-12T18:20:55Z

openai/embeddings_utils.py

@@ -17,8 +17,9 @@
 @retry(wait=wait_random_exponential(min=1, max=20), stop=stop_after_attempt(6))
 def get_embedding(text: str, engine="text-similarity-davinci-001", **kwargs) -> List[float]:

-    # replace newlines, which can negatively affect performance.
-    text = text.replace("\n", " ")
+    if engine.endswith("001"):


Newlines should also be left in when using first-generation models that expect code inputs, so I suggest:

-    if engine.endswith("001"):
+    if engine.endswith("-001") and not engine.endswith("-code-001"):

(This applies identically to the other three occurrences of the if check.)
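
Since the same check recurs in several places, one option is to factor it into a small helper. The following is only a sketch of the suggested condition: `should_strip_newlines` and `prepare_text` are hypothetical names that do not exist in `embeddings_utils.py`, and the check relies purely on the engine-name suffix convention discussed above.

```python
def should_strip_newlines(engine: str) -> bool:
    """Hypothetical helper: True for first-generation, non-code engines.

    Assumes first-generation engine names end in "-001" and that those
    expecting code inputs end in "-code-001".
    """
    return engine.endswith("-001") and not engine.endswith("-code-001")


def prepare_text(text: str, engine: str) -> str:
    """Hypothetical helper: replace newlines only when the engine calls for it."""
    if should_strip_newlines(engine):
        # Replace newlines for first-generation text engines only.
        return text.replace("\n", " ")
    # Later-generation engines and code engines receive the text unchanged.
    return text
```

With this, `prepare_text("a\nb", "text-similarity-davinci-001")` replaces the newline with a space, while the same call with `"code-search-babbage-code-001"` or `"text-embedding-ada-002"` leaves the text unchanged.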

RobertCraigie · 2023-11-06T17:16:54Z

Thanks for putting the time into opening a PR! The embeddings_utils.py file has been removed in v1 so this no longer applies unfortunately.

V2 embedding models handle new-lines

985b650

ravwojdyla mentioned this pull request Apr 27, 2023

Why does newline negatively impact embedding performance? #418

Closed

EliahKagan reviewed Jul 12, 2023

RobertCraigie closed this Nov 6, 2023
