
GPTSimpleVectorIndex throwing #617

Closed
epicshardz opened this issue Mar 5, 2023 · 11 comments
Comments

@epicshardz

I'm getting some sort of authentication error during embedding.
Running in a local Jupyter notebook; also tried in Google Colab.
Testing with only "paul_graham_essay.txt".
Did pip install llama-index and gpt-index; both give the same result.
To answer the first question: yes, my API key is correct, and I ran the following with "my actual key" replaced with my key from OpenAI:

import os
os.environ["OPENAI_API_KEY"] ="my actual key"
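Before constructing the index, it may be worth confirming the key is actually visible to the Python process. A minimal stdlib-only check (the value here is a placeholder, and the `sk-` prefix assumption just reflects how OpenAI keys conventionally start):

```python
import os

# Placeholder value; replace with your real key in practice.
os.environ["OPENAI_API_KEY"] = "sk-placeholder"

# Confirm the key is visible to this process before building the index.
key = os.environ.get("OPENAI_API_KEY")
print(key is not None and key.startswith("sk-"))
```

If this prints False in the same kernel that builds the index, the library never had a key to send, which would surface as exactly this kind of AuthenticationError.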

Line of code that threw error:

index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor)

Thanks in advance!

Error: AuthenticationError                       Traceback (most recent call last)
File c:\Python310\lib\site-packages\tenacity\__init__.py:409, in Retrying.__call__(self, fn, *args, **kwargs)
    408 try:
--> 409     result = fn(*args, **kwargs)
    410 except BaseException:  # noqa: B902

File ~\AppData\Roaming\Python\Python310\site-packages\gpt_index\embeddings\openai.py:147, in get_embeddings(list_of_text, engine)
    145 list_of_text = [text.replace("\n", " ") for text in list_of_text]
--> 147 data = openai.Embedding.create(input=list_of_text, engine=engine).data
    148 data = sorted(data, key=lambda x: x["index"])  # maintain the same order as input.

File ~\AppData\Roaming\Python\Python310\site-packages\openai\api_resources\embedding.py:33, in Embedding.create(cls, *args, **kwargs)
     32 try:
---> 33     response = super().create(*args, **kwargs)
     35     # If a user specifies base64, we'll just return the encoded string.
     36     # This is only for the default case.

File ~\AppData\Roaming\Python\Python310\site-packages\openai\api_resources\abstract\engine_api_resource.py:149, in EngineAPIResource.create(cls, api_key, api_base, api_type, request_id, api_version, organization, **params)
    127 @classmethod
    128 def create(
    129     cls,
   (...)
    136     **params,
    137 ):
...
--> 363     raise retry_exc from fut.exception()
    365 if self.wait:
    366     sleep = self.wait(retry_state=retry_state)

RetryError: RetryError[]
@jerryjliu
Collaborator

Could you try generating a new OpenAI API key, running the cell above, and seeing if the issue still persists?

@epicshardz
Author

@jerryjliu That's the first thing that I did.

@jerryjliu
Collaborator

Hmm. Could you try setting export OPENAI_API_KEY= in the terminal, and then running those commands in a Python interpreter?
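For reference, the suggested terminal workflow might look like the following (the key value is a placeholder):

```shell
# Export the key for the current shell session (replace the placeholder):
export OPENAI_API_KEY="sk-your-key-here"
# Sanity-check that a freshly launched Python process can see it:
python3 -c 'import os; print("set" if os.environ.get("OPENAI_API_KEY") else "missing")'
```

Exported variables only persist for that shell session, so the check should run from the same terminal that will launch the notebook or interpreter.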

@jerryjliu
Collaborator

@mrsmirf bump on this

@jerryjliu
Collaborator

@mrsmirf going to close for now, feel free to reopen if you have an update on the issue

@g-benton

I'm also experiencing this issue while trying to use a custom LLM (instead of the OpenAI API).

Below is a minimal working example using the Paul Graham essay as the document.

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader, PromptHelper, LLMPredictor, ServiceContext
import torch
from langchain.llms.base import LLM
from transformers import pipeline
from typing import Optional, List, Mapping, Any

# define prompt helper
# set maximum input size
max_input_size = 512
# set number of output tokens
num_output = 512
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)


class CustomLLM(LLM):
    model_name = "t5-small"
    pipeline = pipeline("text2text-generation", model=model_name, device="cuda:0", model_kwargs={"torch_dtype":torch.bfloat16})

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        response = self.pipeline(prompt)[0]["generated_text"]

        # only return newly generated tokens
        return response

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

llm_predictor = LLMPredictor(llm=CustomLLM())

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)


documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
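A likely explanation for why the error persists here: even with a custom llm_predictor, GPTSimpleVectorIndex still resolves an embedding model separately, and in these versions the default was OpenAI-backed, so embedding calls still hit the OpenAI API. The toy resolver below only illustrates that independent defaulting; the names are hypothetical stand-ins, not llama_index's actual internals.

```python
# Toy illustration of independent defaults: supplying a custom LLM does not
# change the embedding model, which falls back to an OpenAI-backed default.
# All names here are hypothetical, not llama_index's real API.
def resolve_components(llm=None, embed_model=None):
    return {
        "llm": llm or "openai-default-llm",
        "embed_model": embed_model or "openai-default-embedding",
    }

components = resolve_components(llm="custom-t5")
print(components["embed_model"])  # still the OpenAI-backed default
```

If your llama_index version's ServiceContext.from_defaults accepts an embed_model argument, passing a local embedding model there should avoid the OpenAI call entirely.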

@HoustonMuzamhindo

Hi @jerryjliu. I have the same error while following your tutorial on Medium: https://app.pinecone.io/organizations/-NRbDE2qTz4DKjcRqFRF/projects/us-west4-gcp:119edac/indexes/pdfextracts

I have installed packages with the same versions as the Colab file, but when running this part I get an endless API error, even though the API key itself does work:

index_set = {}
for year in years:
    cur_index = GPTSimpleVectorIndex(doc_set[year])
    index_set[year] = cur_index
    cur_index.save_to_disk(f'index_{year}.json')

@HoustonMuzamhindo

Update: I created my own personal API key, not the corporate one, and it worked. Could this have something to do with it? I don't know how the API keys work; my personal one looks completely different from the one we use at the corporation.

@vishalp-simplecrm

How do I update a GPTSimpleVectorIndex when new files are added to the Google Drive folder?

import os
import pickle
from langchain import OpenAI
from flask import Flask, render_template, request
from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from llama_index import LLMPredictor, GPTSimpleVectorIndex, PromptHelper, ServiceContext, download_loader,MockLLMPredictor,MockEmbedding,Document
from langchain.chat_models import ChatOpenAI

os.environ['OPENAI_API_KEY'] = 'my key'

def authorize_gdocs():
    google_oauth2_scopes = [
        "https://www.googleapis.com/auth/drive.readonly",
        "https://www.googleapis.com/auth/documents.readonly"
    ]
    cred = None
    if os.path.exists("token.pickle"):
        with open("token.pickle", 'rb') as token:
            cred = pickle.load(token)
    if not cred or not cred.valid:
        if cred and cred.expired and cred.refresh_token:
            cred.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file("client_secrets.json", google_oauth2_scopes)
            cred = flow.run_local_server(port=0)
        with open("token.pickle", 'wb') as token:
            pickle.dump(cred, token)

authorize_gdocs()

GoogleDriveReader = download_loader('GoogleDriveReader')
folder_id = '1h02mlinlWRZdxjf1WotZ3GERNDSoGCmk'
loader = GoogleDriveReader()
documents = loader.load_data(folder_id=folder_id)

# Define LLM
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo"))

# Define prompt helper
max_input_size = 4096
num_output = 512
max_chunk_overlap = 20
chunk_size_limit = 600
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

# Create index from documents only one time, then comment it out
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

# Save your index to an index.json file
index.save_to_disk('index1.json')

index = GPTSimpleVectorIndex.load_from_disk('index1.json')
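One way to avoid re-embedding everything when new files appear in the Drive folder is to keep a manifest of what has already been indexed and insert only the new documents (older GPTSimpleVectorIndex versions exposed an insert method for this). The helper below is a stdlib-only sketch of the bookkeeping; new_files and the manifest path are hypothetical names, not part of llama_index.

```python
import json
import os

def new_files(folder, manifest_path):
    """Return files in `folder` not yet recorded in the manifest,
    then update the manifest. Hypothetical helper for incremental indexing."""
    try:
        with open(manifest_path) as f:
            seen = set(json.load(f))
    except FileNotFoundError:
        seen = set()  # first run: nothing indexed yet
    current = set(os.listdir(folder))
    with open(manifest_path, "w") as f:
        json.dump(sorted(current), f)
    return sorted(current - seen)
```

Each file returned by new_files could then be loaded, wrapped as a Document, and passed to index.insert(...) before saving the index back to disk, so unchanged files are never re-embedded.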

@astro313

astro313 commented Jun 21, 2023

Update: I created my own personal API key, not the corporate one, and it worked. Could this have something to do with it? I don't know how the API keys work; my personal one looks completely different from the one we use at the corporation.

Same error with a corporate key; I don't have a personal one.

The error was solved by adding: openai.api_key = os.getenv("OPENAI_API_KEY")
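The reported fix reads the environment variable back and assigns it to the client explicitly, rather than relying on the library to pick it up on import. A stdlib-only sketch of that pattern (a stand-in object replaces the real openai module here, so the snippet runs without the package installed):

```python
import os
import types

# Stand-in for the real `openai` module, used only to illustrate the pattern.
openai = types.SimpleNamespace(api_key=None)

os.environ["OPENAI_API_KEY"] = "sk-placeholder"  # replace with your real key

# The reported fix: propagate the env var to the client explicitly.
openai.api_key = os.getenv("OPENAI_API_KEY")
print(openai.api_key is not None)
```

Explicit assignment makes the key's origin obvious and sidesteps cases where the variable was set after the library was imported.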

@gp-birender

Still getting the error with both keys.
