# Retrievers

This notebook shows how to use the retrievers in the LangChain Europe PMC package.

## EuropePMCRetriever

The `EuropePMCRetriever` retrieves scientific articles from Europe PMC, a repository of biomedical and life sciences literature. It uses the Europe PMC REST API to search for articles matching a query and returns them as LangChain `Document` objects.

```python
from langchain_europe_pmc.retrievers import EuropePMCRetriever

# Initialize the retriever with default parameters (3 results by default)
retriever = EuropePMCRetriever()

# Search for articles about malaria
docs = retriever.invoke("malaria")

# Print the number of results and the content of the first document
print(f"Found {len(docs)} documents")
if docs:
    print("\nFirst document:")
    print(docs[0].page_content)
```

```text
Found 3 documents

First document:
# Markers of neutrophil activation and some immune and haematological indices in malaria infection during pregnancy.

##Abstract

#### Background

Neutrophils are the first responders to pathogen invasion and are important first-line defenders. The defence mechanism of activated neutrophils includes neutrophil extracellular traps (NETs) formation that immobilize pathogens, stop their spread within the tissues, and ultimately kill them. However, their roles in the context of malaria during pregnancy are still elusive. This study was conducted to investigate markers of neutrophil activation as well as immunological and haematological cellular responses during Plasmodium infection in pregnancy.

#### Method

A total of 340 pregnant women aged between 19 and 42 years were recruited for this study carried out in South-east, Nigeria. All the subjects were tested for malaria parasite (MP) status. Those infected with human immunodeficiency virus (HIV) and tho
```
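The document printed above begins with the article title as a heading, followed by the abstract. To illustrate how a raw search hit could become such a document, here is a minimal sketch that converts a Europe PMC search response (JSON) into document-like objects. It is not the package's implementation. It assumes the API's `resultList.result` layout and field names such as `title`, `abstractText`, `authorString`, `pubYear`, `pmid` and `doi`; the metadata key names, the Europe PMC landing-page URL pattern, and the `SimpleDocument` class (a hypothetical stand-in for `langchain_core.documents.Document`, so the sketch runs without LangChain) are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for langchain_core.documents.Document."""

    page_content: str
    metadata: dict[str, Any] = field(default_factory=dict)


def _journal_title(record: dict[str, Any]) -> str | None:
    # "lite" results expose journalTitle; "core" results nest it under journalInfo.
    if "journalTitle" in record:
        return record["journalTitle"]
    return record.get("journalInfo", {}).get("journal", {}).get("title")


def parse_search_response(payload: dict[str, Any]) -> list[SimpleDocument]:
    """Convert a Europe PMC search JSON payload into document-like objects."""
    docs = []
    for record in payload.get("resultList", {}).get("result", []):
        title = record.get("title", "")
        abstract = record.get("abstractText", "")  # typically returned with resultType=core
        pmid = record.get("pmid")
        metadata = {
            "title": title,
            "authors": record.get("authorString"),
            "journal": _journal_title(record),
            "year": record.get("pubYear"),
            "pmid": pmid,
            "doi": record.get("doi"),
            # Assumed Europe PMC landing-page pattern for MEDLINE (PubMed) records.
            "url": f"https://europepmc.org/article/MED/{pmid}" if pmid else None,
        }
        # Title as a Markdown heading, abstract underneath.
        content = f"# {title}\n\n{abstract}".strip()
        docs.append(SimpleDocument(page_content=content, metadata=metadata))
    return docs


# Made-up payload shaped like a search response, for illustration only.
sample = {"resultList": {"result": [
    {"title": "Example title.", "abstractText": "Example abstract.",
     "pmid": "123", "pubYear": "2024", "journalTitle": "J Example"},
]}}
print(parse_search_response(sample)[0].metadata["url"])  # https://europepmc.org/article/MED/123
```

Using `.get` with defaults keeps the parser tolerant of records that lack an abstract or PMID, which is common for preprints and non-MEDLINE sources.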

### Customizing the Retriever

You can customize the retriever with parameters such as the number of results to return (`top_k_results`), the maximum query length, and the result type (`result_type`).

```python
# Initialize the retriever with custom parameters
retriever = EuropePMCRetriever(
    top_k_results=5,     # Return up to 5 results instead of the default 3
    result_type="core",  # "core" requests the full record (including the abstract) from Europe PMC
)

# Search for articles about CRISPR gene editing
docs = retriever.invoke("CRISPR gene editing")

# Print the number of documents and their titles
print(f"Found {len(docs)} documents\n")
for i, doc in enumerate(docs):
    title = doc.metadata.get("title", "No title available")
    print(f"{i+1}. {title}")
```

```text
Found 3 documents

1. Vectors in CRISPR Gene Editing for Neurological Disorders: Challenges and Opportunities.
2. Research advances CRISPR gene editing technology generated models in the study of epithelial ovarian carcinoma.
3. CRISPR Gene-Editing Combat: Targeting AIDS for total eradication.
```
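The `top_k_results` and `result_type` options plausibly map onto query parameters of the Europe PMC REST `search` endpoint (`query`, `pageSize`, `resultType`, `format`), whose `resultType` accepts `idlist`, `lite` or `core`. The sketch below builds such a request URL under that assumption. The helper name `build_search_url`, its defaults (including a `max_query_length` of 300 characters) and the silent-truncation behaviour are hypothetical illustrations, not the package's documented behaviour.

```python
from urllib.parse import urlencode

# Public Europe PMC REST search endpoint.
SEARCH_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
# resultType values accepted by the Europe PMC search API.
VALID_RESULT_TYPES = {"idlist", "lite", "core"}


def build_search_url(query, top_k_results=3, result_type="core", max_query_length=300):
    """Hypothetical helper: map retriever-style options onto API query parameters."""
    if result_type not in VALID_RESULT_TYPES:
        raise ValueError(f"result_type must be one of {sorted(VALID_RESULT_TYPES)}")
    if top_k_results < 1:
        raise ValueError("top_k_results must be at least 1")
    params = {
        "query": query[:max_query_length],  # truncate over-long queries (assumed behaviour)
        "pageSize": top_k_results,          # number of hits per page
        "resultType": result_type,
        "format": "json",
    }
    return f"{SEARCH_URL}?{urlencode(params)}"


print(build_search_url("CRISPR gene editing", top_k_results=5))
# https://www.ebi.ac.uk/europepmc/webservices/rest/search?query=CRISPR+gene+editing&pageSize=5&resultType=core&format=json
```

Validating `result_type` up front turns a typo into an immediate `ValueError` rather than an opaque API error.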


### Using the Retriever in a Chain

Because the retriever is a LangChain Runnable, you can compose it into a chain that answers questions using the retrieved documents as context.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_europe_pmc.retrievers import EuropePMCRetriever
from langchain_openai import ChatOpenAI

# Initialize the retriever
retriever = EuropePMCRetriever(top_k_results=3)

# Create a prompt template
prompt = ChatPromptTemplate.from_template(
    """Answer the question based only on the context provided.

Context: {context}

Question: {question}"""
)

# Initialize the language model (requires the OPENAI_API_KEY environment variable)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125")


# Join the retrieved documents' text into a single context string
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


# Build the chain: retrieve and format the context, pass the question through unchanged,
# fill the prompt, call the model, and return the reply as a plain string
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

# Run the chain
response = chain.invoke("What are the latest advances in CRISPR gene editing for cystic fibrosis?")
print(response)
```
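The `format_docs` helper above joins only `page_content`, so the model never sees which article a passage came from. If you want answers that can cite their sources, one option is to label each passage with its title and URL. The variant below is a sketch: `format_docs_with_sources` is a hypothetical name, and it assumes the `title` and `url` metadata keys used elsewhere in this notebook. It only reads `page_content` and `metadata`, so it accepts LangChain `Document` objects as well as the `SimpleNamespace` stand-ins used in the example.

```python
from types import SimpleNamespace


def format_docs_with_sources(docs):
    """Join documents into one context string, labelling each with its source."""
    blocks = []
    for i, doc in enumerate(docs, start=1):
        title = doc.metadata.get("title", "Untitled")
        url = doc.metadata.get("url", "URL not available")
        # Numbered labels let the prompt ask the model to cite [1], [2], ...
        blocks.append(f"[{i}] {title} ({url})\n{doc.page_content}")
    return "\n\n".join(blocks)


# Stand-in documents with made-up values, for illustration only.
docs = [
    SimpleNamespace(page_content="Abstract A.", metadata={"title": "Paper A", "url": "https://example.org/a"}),
    SimpleNamespace(page_content="Abstract B.", metadata={"title": "Paper B"}),
]
print(format_docs_with_sources(docs))
```

To use it, replace `retriever | format_docs` with `retriever | format_docs_with_sources` in the chain's `context` branch; LangChain wraps a plain function in a `RunnableLambda` when it is piped after a Runnable.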

### Accessing Document Metadata

Each document returned by the retriever includes metadata such as the title, authors, journal, year, PMID, DOI, and URL. You can access this metadata to get more information about the retrieved documents.

```python
# Initialize the retriever to return a single result
retriever = EuropePMCRetriever(top_k_results=1)

# Search for articles about Alzheimer's disease
docs = retriever.invoke("Alzheimer's disease")

# Print the metadata of the first document
if docs:
    doc = docs[0]
    print("Document Metadata:")
    for key, value in doc.metadata.items():
        print(f"{key}: {value}")

    # Print the URL to access the article
    print(f"\nAccess the article at: {doc.metadata.get('url', 'URL not available')}")
```
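Metadata is also useful for post-processing results, for example turning each hit into a one-line citation. The sketch below assumes metadata keys named `authors`, `title`, `journal`, `year` and `doi`; only `title` and `url` are confirmed by the code in this notebook, so check the keys printed by the cell above and adjust. `format_citation` is a hypothetical helper, not part of the package.

```python
def format_citation(metadata):
    """Build a one-line citation from retriever metadata (assumed key names)."""
    year = metadata.get("year")
    parts = [
        metadata.get("authors"),
        metadata.get("title"),
        metadata.get("journal"),
        str(year) if year else None,
    ]
    # Drop missing fields and trailing periods so the ". " separators don't double up.
    present = [str(p).rstrip(".") for p in parts if p]
    citation = ". ".join(present) + "." if present else ""
    if metadata.get("doi"):
        citation += f" doi:{metadata['doi']}"
    return citation.strip()


# Made-up metadata, for illustration only.
example = {
    "authors": "Smith J, Doe A.",
    "title": "An example study.",
    "journal": "J Example",
    "year": 2024,
    "doi": "10.1000/example",
}
print(format_citation(example))
# Smith J, Doe A. An example study. J Example. 2024. doi:10.1000/example
```

With real results, `for doc in docs: print(format_citation(doc.metadata))` prints one citation per retrieved article; missing fields are simply skipped.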