[Retrieval] Is the PineconeRM functional? #322

# Overview
I wanted to test out this library for a project, but I have hit so many roadblocks that I do not think this library is even functional. Here is the code I have been testing with, built by following the documentation in this repo (I have replaced sensitive information throughout with `...`):

```python3
import dspy
from dspy.evaluate import Evaluate
from dspy.evaluate.metrics import answer_exact_match
from dspy.retrieve.pinecone_rm import PineconeRM
from dspy.teleprompt import BootstrapFewShot, BootstrapFewShotWithRandomSearch


class RAG(dspy.Module):
    def __init__(self, num_passages: int = 3):
        super().__init__()
        # declare three modules: the retriever, a query generator, and an answer generator
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate_query = dspy.ChainOfThought("question -> search_query")
        self.generate_answer = dspy.ChainOfThought("context, question -> answer")
    
    def forward(self, question: str):
        # generate a search query from the question, and use it to retrieve passages
        search_query = self.generate_query(question=question).search_query
        passages = self.retrieve(query_or_queries=search_query).passages

        # generate an answer from the passages and the question
        return self.generate_answer(context=passages, question=question)


turbo = dspy.OpenAI(
    model="gpt-4-1106-preview",
    api_key="...")


retriever_model = PineconeRM(
    pinecone_index_name="...",
    pinecone_api_key="...",
    pinecone_env="...",
    openai_embed_model="text-embedding-3-small",
    openai_api_key="...",
    k=3)


dspy.settings.configure(
    lm=turbo,
    rm=retriever_model,
)


train = [('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...')]
train = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in train]


dev = [('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...'), ('...', '...')]
dev = [dspy.Example(question=question, answer=answer).with_inputs('question') for question, answer in dev]


teleprompter = BootstrapFewShot(metric=answer_exact_match, max_bootstrapped_demos=2)
teleprompter2 = BootstrapFewShotWithRandomSearch(metric=answer_exact_match, max_bootstrapped_demos=2, num_candidate_programs=8, num_threads=32)


rag_compiled = teleprompter.compile(RAG(), trainset=train)

rag_compiled2 = teleprompter2.compile(RAG(), trainset=train, valset=dev)


evaluate_hotpot = Evaluate(devset=dev, metric=answer_exact_match, num_threads=32, display_progress=True, display_table=15)
evaluate_hotpot(rag_compiled)
evaluate_hotpot(rag_compiled2)
rag_compiled("...")
rag_compiled2("...")
```

# Shortlist of errors encountered
## Improper check
Filename: `dspy/retrieve/pinecone_rm.py`
Function name: `_get_embeddings()`
Description: The function checks whether `torch` is installed before it checks whether the user has set `self.use_local_model`. `torch` is not needed when using OpenAI embeddings, but with the current logic DSPy forces users to install it regardless. Reordering the function as below solves the problem:
```python3
# Only the local-model path needs torch, so handle the OpenAI path first
if not self.use_local_model:
    if OPENAI_LEGACY:
        response = openai.Embedding.create(
            input=queries, model=self._openai_embed_model
        )
    else:
        response = openai.embeddings.create(
            input=queries, model=self._openai_embed_model
        ).model_dump()
    # One embedding vector per query
    return [item["embedding"] for item in response["data"]]

# Reaching this point means a local model is in use, so torch is now required
try:
    import torch
except ImportError as exc:
    raise ModuleNotFoundError(
        "You need to install torch to use a local embedding model with PineconeRM."
    ) from exc
```
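The reordering can be illustrated in isolation. The sketch below is a standalone, hypothetical function, not DSPy's actual code: `remote_embed` and `local_embed` are assumed callables standing in for the OpenAI and local-model paths. What it shows is that `import torch` lives inside the local-model branch, so callers using remote embeddings never trigger it.

```python3
from typing import Callable, List, Optional, Sequence

# Hypothetical type for any function mapping queries to embedding vectors
Embedder = Callable[[Sequence[str]], List[List[float]]]


def get_embeddings(
    queries: Sequence[str],
    use_local_model: bool,
    remote_embed: Embedder,
    local_embed: Optional[Embedder] = None,
) -> List[List[float]]:
    """Hypothetical sketch: import the optional dependency only on the branch that needs it."""
    if not use_local_model:
        # Remote path (e.g. an OpenAI embeddings call): torch is never imported here
        return remote_embed(queries)

    # Local path: torch is only required now, so the availability check lives here
    try:
        import torch  # noqa: F401  (imported only to verify it is installed)
    except ImportError as exc:
        raise ModuleNotFoundError(
            "You need to install torch to use a local embedding model."
        ) from exc

    if local_embed is None:
        raise ValueError("use_local_model=True requires a local_embed callable")
    return local_embed(queries)
```

With a fake remote embedder such as `lambda qs: [[float(len(q))] for q in qs]`, calling `get_embeddings(["hello"], False, fake)` returns `[[5.0]]` even on a machine where torch is not installed.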


## Parameters being passed incorrectly
Filename: `dspy/retrieve/pinecone_rm.py`
Function name: `forward()`
Description: Somewhere up the call chain (I believe in `dsp/primitives/search.py` -> `retrieveEnsemble()`), `k` is being passed to the `forward()` function of `PineconeRM`, which does not accept it. I worked around this by adding an unused `k` parameter to the function definition:
```python3
def forward(self, query_or_queries: Union[str, List[str]], k: int) -> dspy.Prediction:
```
This was only a short-term fix to get the script working for testing.
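A longer-term alternative to an unused parameter is to make `k` optional and actually honor it, falling back to the default set at construction. The sketch below uses a hypothetical in-memory `ToyRetriever` with naive word-overlap scoring (not Pinecone, not DSPy) purely to show the signature pattern; whether DSPy's call chain expects exactly this signature is an assumption.

```python3
from typing import Dict, List, Optional, Union


class ToyRetriever:
    """Hypothetical stand-in for a retriever module; not the real PineconeRM."""

    def __init__(self, corpus: List[str], k: int = 3):
        self.corpus = corpus
        self.k = k  # default number of passages to return

    def forward(
        self, query_or_queries: Union[str, List[str]], k: Optional[int] = None
    ) -> List[str]:
        # Use the k passed by the caller if given, otherwise the instance default
        k = k if k is not None else self.k
        queries = [query_or_queries] if isinstance(query_or_queries, str) else query_or_queries

        # Toy scoring: best word overlap between any query and each passage
        scores: Dict[int, int] = {}
        for q in queries:
            q_words = set(q.lower().split())
            for idx, passage in enumerate(self.corpus):
                overlap = len(q_words & set(passage.lower().split()))
                scores[idx] = max(scores.get(idx, 0), overlap)

        # Highest score first; ties broken by corpus order
        ranked = sorted(scores, key=lambda i: (-scores[i], i))
        return [self.corpus[i] for i in ranked[:k]]
```

For example, with the corpus `["the cat sat", "dogs bark loudly", "a cat and a dog"]`, `forward("cat", k=1)` returns `["the cat sat"]`, while `forward("cat")` falls back to the default `k=3` and returns all three passages.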


## Passages has no value `long_text`
Filename: `dsp/primitives/search.py`
Function name: `retrieve()`
Description: This line fails when accessing the attribute `.long_text`:
```python3
passages = [psg.long_text for psg in passages]
```
If the passages in this list are dictionaries, shouldn't we be using `.get("long_text")`? Either way, at this point each passage in the list was a plain string, and I am no longer sure whether my debugging is making things better or worse.
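One defensive option is to normalize passages before reading their text. The helper below, `passage_text`, is hypothetical (not part of DSPy) and simply handles the three shapes discussed here: plain strings, dictionaries, and objects exposing a `long_text` attribute. It indexes dictionaries with `["long_text"]` rather than `.get()` so that a missing key fails loudly instead of silently yielding `None`.

```python3
from types import SimpleNamespace
from typing import Any


def passage_text(psg: Any) -> str:
    """Hypothetical helper: return a passage's text whatever shape it arrives in."""
    if isinstance(psg, str):
        # Already plain text (what the passages turned out to be here)
        return psg
    if isinstance(psg, dict):
        # Index rather than .get() so a missing key raises KeyError
        return psg["long_text"]
    if hasattr(psg, "long_text"):
        # Attribute-style objects
        return psg.long_text
    raise TypeError(f"Unsupported passage type: {type(psg).__name__}")


passages = ["plain string", {"long_text": "from a dict"}, SimpleNamespace(long_text="from an object")]
texts = [passage_text(psg) for psg in passages]
print(texts)  # ['plain string', 'from a dict', 'from an object']
```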

I'm looking for some guidance on whether I am wildly off base or whether `PineconeRM` is simply not operational. Thanks!

### Additional Info
My environment uses Python 3.10.13, and I have `dspy-ai==2.1.6` installed.
