Retrieval Enhanced Generative Question Answering with OpenAI - Dictionary problem #175

Avs-safety · 2023-04-12T12:59:41Z

Hi,

I've been following the code here: https://github.com/pinecone-io/examples/blob/master/generation/generative-qa/openai/gen-qa-openai/gen-qa-openai.ipynb with a different dataset, and get an error (TypeError: string indices must be integers) for ids_batch = [x['id'] for x in meta_batch] and further down for cleaning up the meta data.

from tqdm.auto import tqdm
from time import sleep

batch_size = 100  # how many embeddings we create and insert at once

for i in tqdm(range(0, len(new_data), batch_size)):
    # find end of batch
    i_end = min(len(new_data), i+batch_size)
    meta_batch = new_data[i:i_end]
    # get ids
    ids_batch = [x['id'] for x in meta_batch]
    # get texts to encode
    texts = [x['text'] for x in meta_batch]
    # create embeddings (try-except added to avoid RateLimitError)
    try:
        res = openai.Embedding.create(input=texts, engine=embed_model)
    except:
        done = False
        while not done:
            sleep(5)
            try:
                res = openai.Embedding.create(input=texts, engine=embed_model)
                done = True
            except:
                pass
    embeds = [record['embedding'] for record in res['data']]
    # cleanup metadata
    meta_batch = [{
        'start': x['start'],
        'end': x['end'],
        'title': x['title'],
        'text': x['text'],
        'url': x['url'],
        'published': x['published'],
        'channel_id': x['channel_id']
    } for x in meta_batch]
    to_upsert = list(zip(ids_batch, embeds, meta_batch))
    # upsert to Pinecone
    index.upsert(vectors=to_upsert)

I have got around this by amending the code as shown below. However I'm not sure why there is a problem with accessing the dictionary keys (plus I'm not sure if my amendments make the vectors inaccurate) - Appreciate any help you can provide!

from tqdm.auto import tqdm
from time import sleep

batch_size = 100  # how many embeddings we create and insert at once

for i in tqdm(range(0, len(data), batch_size)):
    # find end of batch
    i_end = min(len(data), i+batch_size)
    meta_batch = data[i:i_end]
    # get ids
    ids_batch = meta_batch['acn_num_ACN']#[x['acn_num_ACN'] for x in meta_batch]
    # get texts to encode
    texts = meta_batch['Report 1_Narrative']#[x['Report 1_Narrative'] for x in meta_batch]
    # create embeddings (try-except added to avoid RateLimitError)
    try:
        res = openai.Embedding.create(input=texts, engine=embed_model)
    except:
        done = False
        while not done:
            sleep(5)
            try:
                res = openai.Embedding.create(input=texts, engine=embed_model)
                done = True
            except:
                pass
    embeds = [record['embedding'] for record in res['data']]
    # cleanup metadata
    #meta_batch = [{
  #      'acn_num_ACN': x['acn_num_ACN'],
  #      'Time_Date': x['Time_Date'],
  #     'Report 1_Narrative': x['Report 1_Narrative'],
   #     'Report 1.2_Synopsis': x['Report 1.2_Synopsis']
  #  } for x in meta_batch]
    #meta_batch = [{'acn_num_ACN': ['acn_num_ACN'], 'Time_Date': ['Time_Date'], 'Report 1_Narrative': ['Report 1_Narrative'], 'Report 1.2_Synopsis': ['Report 1.2_Synopsis']}]
    to_upsert = list(zip(ids_batch, embeds, data))
    # upsert to Pinecone
    index.upsert(vectors=to_upsert)

The text was updated successfully, but these errors were encountered:

DosticJelena · 2023-08-22T10:46:45Z

Hello, It appears that you are iterating over strings ([x['id'] for x in meta_batch]) instead of objects. So, when you try to access x['id'], you're encountering an error because x is being treated as a string.

The error message ""string indices must be integers"" is indicating this issue. It's likely that you provided the "new_data" object in an incorrect type or shape.

To resolve this, you need to update the object to have the correct format: a list of dictionaries containing the 'id' field. This adjustment should help you overcome the error you're facing.

DosticJelena closed this as completed Aug 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retrieval Enhanced Generative Question Answering with OpenAI - Dictionary problem #175

Retrieval Enhanced Generative Question Answering with OpenAI - Dictionary problem #175

Avs-safety commented Apr 12, 2023

DosticJelena commented Aug 22, 2023 •

edited

Retrieval Enhanced Generative Question Answering with OpenAI - Dictionary problem #175

Retrieval Enhanced Generative Question Answering with OpenAI - Dictionary problem #175

Comments

Avs-safety commented Apr 12, 2023

DosticJelena commented Aug 22, 2023 • edited

DosticJelena commented Aug 22, 2023 •

edited