
GPTSimpleVectorIndex throwing #617

Closed
epicshardz opened this issue Mar 5, 2023 · 11 comments
Comments

@epicshardz

I'm getting some sort of authentication error during embedding.
Running in a local Jupyter notebook; also tried in Google Colab.
Testing with only "paul_graham_essay.txt".
Did pip install llama-index and gpt-index; both give the same result.
To answer the first question: yes, my API key is correct, and I ran the following with "my actual key" replaced with my key from OpenAI:

import os
os.environ["OPENAI_API_KEY"] ="my actual key"
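Before constructing the index, it may be worth confirming the key is actually visible to the Python process. A minimal stdlib-only check (the value here is a placeholder, and the `sk-` prefix assumption just reflects how OpenAI keys conventionally start):

```python
import os

# Placeholder value; replace with your real key in practice.
os.environ["OPENAI_API_KEY"] = "sk-placeholder"

# Confirm the key is visible to this process before building the index.
key = os.environ.get("OPENAI_API_KEY")
print(key is not None and key.startswith("sk-"))
```

If this prints False in the same kernel that builds the index, the library never had a key to send, which would surface as exactly this kind of AuthenticationError.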

Line of code that threw error:

index = GPTSimpleVectorIndex(documents, llm_predictor=llm_predictor)

Thanks in advance!

Error: AuthenticationError                       Traceback (most recent call last)
File c:\Python310\lib\site-packages\tenacity\__init__.py:409, in Retrying.__call__(self, fn, *args, **kwargs)
    408 try:
--> 409     result = fn(*args, **kwargs)
    410 except BaseException:  # noqa: B902

File ~\AppData\Roaming\Python\Python310\site-packages\gpt_index\embeddings\openai.py:147, in get_embeddings(list_of_text, engine)
    145 list_of_text = [text.replace("\n", " ") for text in list_of_text]
--> 147 data = openai.Embedding.create(input=list_of_text, engine=engine).data
    148 data = sorted(data, key=lambda x: x["index"])  # maintain the same order as input.

File ~\AppData\Roaming\Python\Python310\site-packages\openai\api_resources\embedding.py:33, in Embedding.create(cls, *args, **kwargs)
     32 try:
---> 33     response = super().create(*args, **kwargs)
     35     # If a user specifies base64, we'll just return the encoded string.
     36     # This is only for the default case.

File ~\AppData\Roaming\Python\Python310\site-packages\openai\api_resources\abstract\engine_api_resource.py:149, in EngineAPIResource.create(cls, api_key, api_base, api_type, request_id, api_version, organization, **params)
    127 @classmethod
    128 def create(
    129     cls,
   (...)
    136     **params,
    137 ):
...
--> 363     raise retry_exc from fut.exception()
    365 if self.wait:
    366     sleep = self.wait(retry_state=retry_state)

RetryError: RetryError[]
@jerryjliu
Collaborator

Could you try generating a new OpenAI API key, running the cell above, and seeing if the issue still persists?

@epicshardz
Author

@jerryjliu That's the first thing that I did.

@jerryjliu
Collaborator

Hmm. Could you try setting export OPENAI_API_KEY= in the terminal, and then running those commands in a Python interpreter?
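For reference, the suggested terminal workflow might look like the following (the key value is a placeholder):

```shell
# Export the key for the current shell session (replace the placeholder):
export OPENAI_API_KEY="sk-your-key-here"
# Sanity-check that a freshly launched Python process can see it:
python3 -c 'import os; print("set" if os.environ.get("OPENAI_API_KEY") else "missing")'
```

Exported variables only persist for that shell session, so the check should run from the same terminal that will launch the notebook or interpreter.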

@jerryjliu
Collaborator

@mrsmirf bump on this

@jerryjliu
Collaborator

@mrsmirf going to close for now, feel free to reopen if you have an update on the issue

@g-benton

I'm also experiencing this issue while trying to use a custom LLM (instead of the OpenAI API).

Below is a minimal working example using the Paul Graham essay as the document.

from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader, PromptHelper, LLMPredictor, ServiceContext
import torch
from langchain.llms.base import LLM
from transformers import pipeline
from typing import Optional, List, Mapping, Any

# define prompt helper
# set maximum input size
max_input_size = 512
# set number of output tokens
num_output = 512
# set maximum chunk overlap
max_chunk_overlap = 20
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)


class CustomLLM(LLM):
    model_name = "t5-small"
    pipeline = pipeline("text2text-generation", model=model_name, device="cuda:0", model_kwargs={"torch_dtype":torch.bfloat16})

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        response = self.pipeline(prompt)[0]["generated_text"]

        # only return newly generated tokens
        return response

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"

llm_predictor = LLMPredictor(llm=CustomLLM())

service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)


documents = SimpleDirectoryReader('data').load_data()
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)
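A likely explanation for why the error persists here: even with a custom llm_predictor, GPTSimpleVectorIndex still resolves an embedding model separately, and in these versions the default was OpenAI-backed, so embedding calls still hit the OpenAI API. The toy resolver below only illustrates that independent defaulting; the names are hypothetical stand-ins, not llama_index's actual internals.

```python
# Toy illustration of independent defaults: supplying a custom LLM does not
# change the embedding model, which falls back to an OpenAI-backed default.
# All names here are hypothetical, not llama_index's real API.
def resolve_components(llm=None, embed_model=None):
    return {
        "llm": llm or "openai-default-llm",
        "embed_model": embed_model or "openai-default-embedding",
    }

components = resolve_components(llm="custom-t5")
print(components["embed_model"])  # still the OpenAI-backed default
```

If your llama_index version's ServiceContext.from_defaults accepts an embed_model argument, passing a local embedding model there should avoid the OpenAI call entirely.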

@HoustonMuzamhindo

Hi @jerryjliu. I have the same error while following your tutorial on Medium: https://app.pinecone.io/organizations/-NRbDE2qTz4DKjcRqFRF/projects/us-west4-gcp:119edac/indexes/pdfextracts

I have installed packages with the same versions as the Colab file, but when running this part I get an endless API error, even though the API key itself does work:

index_set = {}
for year in years:
    cur_index = GPTSimpleVectorIndex(doc_set[year])
    index_set[year] = cur_index
    cur_index.save_to_disk(f'index_{year}.json')

@HoustonMuzamhindo

Update: I created my own personal API key, not the corporate one, and it worked. Could this have something to do with it? I don't know how the API keys work; my personal one looks completely different from the one we use at the corporation.

@vishalp-simplecrm

How do I update a GPTSimpleVectorIndex when new files are added to the Google Drive folder?

import os
import pickle
from langchain import OpenAI
from flask import Flask, render_template, request
from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from llama_index import LLMPredictor, GPTSimpleVectorIndex, PromptHelper, ServiceContext, download_loader,MockLLMPredictor,MockEmbedding,Document
from langchain.chat_models import ChatOpenAI

os.environ['OPENAI_API_KEY'] = 'my key'

def authorize_gdocs():
    google_oauth2_scopes = [
        "https://www.googleapis.com/auth/drive.readonly",
        "https://www.googleapis.com/auth/documents.readonly"
    ]
    cred = None
    if os.path.exists("token.pickle"):
        with open("token.pickle", 'rb') as token:
            cred = pickle.load(token)
    if not cred or not cred.valid:
        if cred and cred.expired and cred.refresh_token:
            cred.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file("client_secrets.json", google_oauth2_scopes)
            cred = flow.run_local_server(port=0)
        with open("token.pickle", 'wb') as token:
            pickle.dump(cred, token)

authorize_gdocs()

GoogleDriveReader = download_loader('GoogleDriveReader')
folder_id = '1h02mlinlWRZdxjf1WotZ3GERNDSoGCmk'
loader = GoogleDriveReader()
documents = loader.load_data(folder_id=folder_id)

# Define LLM
llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo"))

# Define prompt helper
max_input_size = 4096
num_output = 512
max_chunk_overlap = 20
chunk_size_limit = 600
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap, chunk_size_limit=chunk_size_limit)
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

# Create index from documents only one time, then comment it out
index = GPTSimpleVectorIndex.from_documents(documents, service_context=service_context)

# Save your index to an index.json file
index.save_to_disk('index1.json')

index = GPTSimpleVectorIndex.load_from_disk('index1.json')
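One way to avoid re-embedding everything when new files appear in the Drive folder is to keep a manifest of what has already been indexed and insert only the new documents (older GPTSimpleVectorIndex versions exposed an insert method for this). The helper below is a stdlib-only sketch of the bookkeeping; new_files and the manifest path are hypothetical names, not part of llama_index.

```python
import json
import os

def new_files(folder, manifest_path):
    """Return files in `folder` not yet recorded in the manifest,
    then update the manifest. Hypothetical helper for incremental indexing."""
    try:
        with open(manifest_path) as f:
            seen = set(json.load(f))
    except FileNotFoundError:
        seen = set()  # first run: nothing indexed yet
    current = set(os.listdir(folder))
    with open(manifest_path, "w") as f:
        json.dump(sorted(current), f)
    return sorted(current - seen)
```

Each file returned by new_files could then be loaded, wrapped as a Document, and passed to index.insert(...) before saving the index back to disk, so unchanged files are never re-embedded.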

@astro313

astro313 commented Jun 21, 2023

Update: I created my own personal API key, not the corporate one, and it worked. Could this have something to do with it? I don't know how the API keys work; my personal one looks completely different from the one we use at the corporation.

Same error with a corporate key; I don't have a personal one.

The error was solved by adding: openai.api_key = os.getenv("OPENAI_API_KEY")
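The reported fix reads the environment variable back and assigns it to the client explicitly, rather than relying on the library to pick it up on import. A stdlib-only sketch of that pattern (a stand-in object replaces the real openai module here, so the snippet runs without the package installed):

```python
import os
import types

# Stand-in for the real `openai` module, used only to illustrate the pattern.
openai = types.SimpleNamespace(api_key=None)

os.environ["OPENAI_API_KEY"] = "sk-placeholder"  # replace with your real key

# The reported fix: propagate the env var to the client explicitly.
openai.api_key = os.getenv("OPENAI_API_KEY")
print(openai.api_key is not None)
```

Explicit assignment makes the key's origin obvious and sidesteps cases where the variable was set after the library was imported.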

@gp-birender

Still getting the error with both keys.
