From 43b07cbb2b4baafbb9ecd7b71f782890bac5ee57 Mon Sep 17 00:00:00 2001 From: Islem Maboud Date: Tue, 23 Sep 2025 09:44:22 +0100 Subject: [PATCH 1/4] feat: add langchain-perigon integration documentation --- .../integrations/providers/all_providers.mdx | 8 + .../python/integrations/providers/index.mdx | 1 + .../python/integrations/providers/perigon.mdx | 61 ++++ .../integrations/retrievers/perigon.mdx | 289 ++++++++++++++++++ 4 files changed, 359 insertions(+) create mode 100644 src/oss/python/integrations/providers/perigon.mdx create mode 100644 src/oss/python/integrations/retrievers/perigon.mdx diff --git a/src/oss/python/integrations/providers/all_providers.mdx b/src/oss/python/integrations/providers/all_providers.mdx index b089d4a41..f294d509f 100644 --- a/src/oss/python/integrations/providers/all_providers.mdx +++ b/src/oss/python/integrations/providers/all_providers.mdx @@ -3012,4 +3012,12 @@ title: "All providers" cta="View provider guide" /> + + diff --git a/src/oss/python/integrations/providers/index.mdx b/src/oss/python/integrations/providers/index.mdx index 6155181a7..e7e97ddb2 100644 --- a/src/oss/python/integrations/providers/index.mdx +++ b/src/oss/python/integrations/providers/index.mdx @@ -135,6 +135,7 @@ These providers have standalone `langchain-{provider}` packages for improved ver | [MCP Toolbox](/oss/integrations/providers/toolbox-langchain/) | [toolbox-langchain](https://pypi.org/project/toolbox-langchain/) | ![Downloads](https://static.pepy.tech/badge/toolbox-langchain/month) | ![PyPI - Version](https://img.shields.io/pypi/v/toolbox-langchain?style=flat-square&label=%20&color=orange) | ❌ | | [Scrapeless](/oss/integrations/providers/scrapeless/) | [langchain-scrapeless](https://pypi.org/project/langchain-scrapeless/) | ![Downloads](https://static.pepy.tech/badge/langchain-scrapeless/month) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-scrapeless?style=flat-square&label=%20&color=orange) | ❌ | | 
[ZeusDB](/oss/integrations/providers/zeusdb/) | [langchain-zeusdb](https://pypi.org/project/langchain-zeusdb/) | ![Downloads](https://static.pepy.tech/badge/langchain-zeusdb/month) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-zeusdb?style=flat-square&label=%20&color=orange) | ❌ |
+| [Perigon](/oss/integrations/providers/perigon/) | [langchain-perigon](https://pypi.org/project/langchain-perigon/) | ![Downloads](https://static.pepy.tech/badge/langchain-perigon/month) | ![PyPI - Version](https://img.shields.io/pypi/v/langchain-perigon?style=flat-square&label=%20&color=orange) | ❌ |

## All Providers
diff --git a/src/oss/python/integrations/providers/perigon.mdx b/src/oss/python/integrations/providers/perigon.mdx
new file mode 100644
index 000000000..4186e847d
--- /dev/null
+++ b/src/oss/python/integrations/providers/perigon.mdx
@@ -0,0 +1,61 @@
+---
+title: Perigon
+---
+
+>[Perigon](https://perigon.io/) is a comprehensive news API that provides access to real-time contextual information in news articles, stories, metadata, and Wikipedia pages from thousands of sources worldwide.
+>
+
+## Installation and Setup
+
+The `Perigon` integration exists in its own [partner package](https://pypi.org/project/langchain-perigon/). You can install it with:
+
+```python
+%pip install -qU langchain-perigon
+```
+
+To use the package, you will also need to set the `PERIGON_API_KEY` environment variable to your Perigon API key.
+
+## Retrievers
+
+Perigon provides two retrievers:
+
+### ArticlesRetriever
+
+This retriever returns articles that match a given query and optional filters.
+
+See a [full usage example](/oss/integrations/retrievers/perigon#using-articlesretriever).
+
+```python
+# Make sure PERIGON_API_KEY environment variable is set to your Perigon API key
+from langchain_perigon import ArticlesRetriever, ArticlesFilter
+
+retriever = ArticlesRetriever(k=12)
+options: ArticlesFilter = {
+    "showReprints": False,
+    "filter": {"country": "us"},
+}
+documents = retriever.invoke("Recent big tech layoffs", options=options)
+
+if documents:
+    print(f"First document: {documents[0].page_content[:200]}...")
+```
+
+You can use the `ArticlesRetriever` in a standard retrieval pipeline. You can import it as follows.
+
+### WikipediaRetriever
+
+This retriever returns Wikipedia pages that match a given query and optional filters.
+
+See a [full usage example](/oss/integrations/retrievers/perigon#using-wikipediaretriever).
+
+```python
+# Make sure PERIGON_API_KEY environment variable is set to your Perigon API key
+from langchain_perigon import WikipediaRetriever
+
+retriever = WikipediaRetriever(k=12)
+documents = retriever.invoke("machine learning")
+
+print(f"First document: {documents[0].page_content[:200]}...")
+```
+
+You can use the `WikipediaRetriever` in a standard retrieval pipeline. You can import it as follows.
diff --git a/src/oss/python/integrations/retrievers/perigon.mdx b/src/oss/python/integrations/retrievers/perigon.mdx
new file mode 100644
index 000000000..9f5ed3e2f
--- /dev/null
+++ b/src/oss/python/integrations/retrievers/perigon.mdx
@@ -0,0 +1,289 @@
+---
+title: Perigon
+---
+
+The Perigon API suite provides fast, structured access to global news and events, helping you build real-time, data-driven products. Whether you're tracking emerging risks, surfacing relevant articles, or uncovering key insights, Perigon gives you the tools to do it programmatically.
+
+Unlike traditional keyword-based search, Perigon's semantic search capabilities allow it to understand queries contextually and return relevant documents.
+ +This notebook demonstrates how to use Perigon's retrievers with LangChain for both news articles and Wikipedia content. + +## Setup + +### Installation + +Install the LangChain Perigon integration package: + +```python +%pip install -qU langchain-perigon + +# and some deps for this notebook +%pip install -qU langchain langchain-openai langchain-community +``` + +### Credentials + +You'll need a Perigon API key to use this integration. You can sign up at [Perigon.io](https://perigon.io/) to get your API key. + +```python +import getpass +import os + +if not os.environ.get("PERIGON_API_KEY"): + os.environ["PERIGON_API_KEY"] = getpass.getpass("Perigon API key:\n") +``` + +## Using ArticlesRetriever + +The ArticlesRetriever allows you to search through news articles using semantic search capabilities: + +### Basic Usage + +```python +from langchain_perigon import ArticlesRetriever + +# Create a new instance of the ArticlesRetriever +# PERIGON_API_KEY is set in the environment variables +retriever = ArticlesRetriever() + +# Search for articles and save the results +documents = retriever.invoke("artificial intelligence developments") + +# Print the results +print(f"Found {len(documents)} articles") +for doc in documents[:3]: # Print first 3 results + print(f"Title: {doc.metadata.get('title', 'N/A')}") + print(f"URL: {doc.metadata.get('url', 'N/A')}") + print(f"Published: {doc.metadata.get('publishedAt', 'N/A')}") + print(f"Content: {doc.page_content[:200]}...") + print("-" * 80) +``` + +### Advanced Features with Filtering + +You can use advanced filtering options to narrow down your search results: + +```python +from langchain_perigon import ArticlesRetriever, ArticlesFilter + +# Create retriever with custom parameters +# PERIGON_API_KEY is automatically read from environment variables +retriever = ArticlesRetriever( + k=10 # Number of results to return +) + +# Define advanced filter options +options: ArticlesFilter = { + "size": 10, + "showReprints": False, # 
Exclude reprints + "filter": { + "country": "us", # Only US articles + "category": "tech", # Technology category + "source": ["techcrunch.com", "wired.com"] # Specific sources + } +} + +# Search with filters +documents = retriever.invoke("machine learning breakthroughs", options=options) + +print(f"Found {len(documents)} filtered articles") +for doc in documents[:3]: + print(f"Title: {doc.metadata.get('title', 'N/A')}") + print(f"Source: {doc.metadata.get('source', 'N/A')}") + print(f"Category: {doc.metadata.get('category', 'N/A')}") + print(f"Content: {doc.page_content[:150]}...") + print("-" * 80) +``` + +### Location-Based Filtering + +You can filter articles by geographic relevance: + +```python +from langchain_perigon.types import ArticlesFilter +from langchain_perigon import ArticlesRetriever + +retriever = ArticlesRetriever() + +# Filter by location +location_options: ArticlesFilter = { + "size": 5, + "filter": {"country": "us", "state": "CA", "city": "San Francisco"}, +} + +documents = retriever.invoke("startup funding rounds", options=location_options) + +print(f"Found {len(documents)} San Francisco startup articles") +for doc in documents: + print(f"Title: {doc.metadata.get('title', 'N/A')}") + print("-" * 60) +``` + +## Using WikipediaRetriever + +The WikipediaRetriever provides semantic search capabilities over Wikipedia content with rich metadata: + +### Basic Usage + +```python +from langchain_perigon import WikipediaRetriever + +# Create a new instance of the WikipediaRetriever +# PERIGON_API_KEY is automatically read from environment variables +wiki_retriever = WikipediaRetriever() + +# Search for Wikipedia articles +documents = wiki_retriever.invoke("quantum computing") + +# Print the results +print(f"Found {len(documents)} Wikipedia articles") +for doc in documents[:3]: + print(f"Title: {doc.metadata.get('title', 'N/A')}") + print(f"Pageviews: {doc.metadata.get('pageviews', 'N/A')}") + print(f"Wikidata ID: {doc.metadata.get('wikidataId', 'N/A')}") 
+ print(f"Content: {doc.page_content[:200]}...") + print("-" * 80) +``` + +### Advanced Wikipedia Search + +You can filter Wikipedia results by popularity, categories, and other metadata: + +```python +from langchain_perigon import WikipediaRetriever, WikipediaOptions + +# Create retriever with custom parameters +# PERIGON_API_KEY is automatically read from environment variables +wiki_retriever = WikipediaRetriever(k=5) + +# Define advanced filter options +wiki_options: WikipediaOptions = { + "size": 5, + "pageviewsFrom": 100, # Only popular pages with 100+ daily views + "filter": { + "wikidataInstanceOfLabel": ["academic discipline"], + "category": ["Computer science", "Physics"], + }, +} + +# Search with filters +documents = wiki_retriever.invoke("machine learning", options=wiki_options) + +print(f"Found {len(documents)} academic Wikipedia articles") +for doc in documents: + print(f"Title: {doc.metadata.get('title', 'N/A')}") + print(f"Daily pageviews: {doc.metadata.get('pageviews', 'N/A')}") + print(f"Instance of: {doc.metadata.get('wikidataInstanceOf', 'N/A')}") + print(f"Wiki code: {doc.metadata.get('wikiCode', 'N/A')}") + print("-" * 80) +``` + +### Time-Based Wikipedia Filtering + +Filter Wikipedia articles by revision dates: + +```python +from langchain_perigon import WikipediaRetriever, WikipediaOptions + +wiki_retriever = WikipediaRetriever() + +# Filter by recent revisions +recent_options: WikipediaOptions = { + "size": 10, + "wiki_revision_from": "2025-09-22T00:00:00.000", # Recently updated articles + "filter": {"with_pageviews": True}, # Only articles with pageview data +} + +documents = wiki_retriever.invoke("artificial intelligence", options=recent_options) + +print(f"Found {len(documents)} recently updated AI articles") +for doc in documents: + print(f"Title: {doc.metadata.get('title', 'N/A')}") + print(f"Last revision: {doc.metadata.get('wikiRevisionTs', 'N/A')}") + print(f"Pageviews: {doc.metadata.get('pageviews', 'N/A')}") + print("-" * 60) + 
+``` + +## Async Usage + +Both retrievers support asynchronous operations for better performance: + +```python +import asyncio +from langchain_perigon import ( + ArticlesRetriever, + WikipediaRetriever, + ArticlesFilter, + WikipediaOptions, +) + + +async def search_both(): + # Initialize retrievers + # PERIGON_API_KEY is automatically read from environment variables + articles_retriever = ArticlesRetriever() + wiki_retriever = WikipediaRetriever() + + # Define options + articles_options: ArticlesFilter = { + "size": 3, + "filter": {"country": "us", "category": "tech"}, + } + + wiki_options: WikipediaOptions = {"size": 3, "pageviewsFrom": 50} + + # Perform async searches + articles_task = articles_retriever.ainvoke( + "climate change", options=articles_options + ) + wiki_task = wiki_retriever.ainvoke("climate change", options=wiki_options) + + # Wait for both to complete + articles, wiki_docs = await asyncio.gather(articles_task, wiki_task) + + return articles, wiki_docs + + +# Run async search +articles, wiki_docs = asyncio.run(search_both()) + +print(f"Found {len(articles)} news articles and {len(wiki_docs)} Wikipedia articles") +``` + +## Integration with LangChain + +### Combining Both Retrievers + +You can combine both retrievers for comprehensive search across news and encyclopedic content: + +```python +from langchain.retrievers import EnsembleRetriever + +from langchain_perigon import ArticlesRetriever, WikipediaRetriever + +# Create both retrievers +# PERIGON_API_KEY is automatically read from environment variables +news_retriever = ArticlesRetriever() +wiki_retriever = WikipediaRetriever() + +# Combine them with different weights +ensemble_retriever = EnsembleRetriever( + retrievers=[news_retriever, wiki_retriever], + weights=[0.6, 0.4], # Favor news articles slightly over Wikipedia +) + +# Use combined retriever +documents = ensemble_retriever.get_relevant_documents("artificial intelligence ethics") + +print(f"Found {len(documents)} combined results") 
+for i, doc in enumerate(documents[:5]): + source_type = "News" if "url" in doc.metadata else "Wikipedia" + print(f"{i+1}. [{source_type}] {doc.metadata.get('title', 'N/A')}") + print(f" Content: {doc.page_content[:100]}...") + print() +``` + +## API Reference + +For detailed documentation of all Perigon API features and configurations, visit the [Perigon API documentation](https://dev.perigon.io/docs). From d2ea6857c198bae8051701fb06c7e6f60d6f3c05 Mon Sep 17 00:00:00 2001 From: Islem Maboud Date: Tue, 23 Sep 2025 14:35:27 +0100 Subject: [PATCH 2/4] feat: add proper error handling to the code snippets --- .../python/integrations/providers/perigon.mdx | 37 +++- .../integrations/retrievers/perigon.mdx | 209 ++++++++++++------ 2 files changed, 172 insertions(+), 74 deletions(-) diff --git a/src/oss/python/integrations/providers/perigon.mdx b/src/oss/python/integrations/providers/perigon.mdx index 4186e847d..ea5949b36 100644 --- a/src/oss/python/integrations/providers/perigon.mdx +++ b/src/oss/python/integrations/providers/perigon.mdx @@ -29,18 +29,28 @@ See a [full usage example](/oss/integrations/retrievers/perigon#using-articlesre # Make sure PERIGON_API_KEY environment variable is set to your Perigon API key from langchain_perigon import ArticlesRetriever, ArticlesFilter +# Create retriever with specific number of results retriever = ArticlesRetriever(k=12) + +# Configure filter options to exclude reprints and focus on US articles options: ArticlesFilter = { - "showReprints": False, - "filter": {"country": "us"}, + "showReprints": False, # Exclude duplicate/reprint articles + "filter": {"country": "us"}, # Only US-based news } -documents = retriever.invoke("Recent big tech layoffs", options=options) -if documents: - print(f"First document: {documents[0].page_content[:200]}...") +try: + documents = retriever.invoke("Recent big tech layoffs", options=options) + + # Check if we got results before accessing + if documents: + print(f"First document: 
{documents[0].page_content[:200]}...") + else: + print("No articles found for the given query.") +except Exception as e: + print(f"Error retrieving articles: {e}") ``` -You can use the `ArticlesRetriever` in a standard retrieval pipeline. You can import it as follows. +You can use the `ArticlesRetriever` in a standard retrieval pipeline: ### WikipediaRetriever @@ -52,10 +62,19 @@ See a [full usage example](/oss/integrations/retrievers/perigon#using-wikipediar # Make sure PERIGON_API_KEY environment variable is set to your Perigon API key from langchain_perigon import WikipediaRetriever +# Create retriever with specific number of results retriever = WikipediaRetriever(k=12) -documents = retriever.invoke("machine learning") -print(f"First document: {documents[0].page_content[:200]}...") +try: + documents = retriever.invoke("machine learning") + + # Safely access results with error handling + if documents: + print(f"First document: {documents[0].page_content[:200]}...") + else: + print("No Wikipedia articles found for the given query.") +except Exception as e: + print(f"Error retrieving Wikipedia articles: {e}") ``` -You can use the `WikipediaRetriever` in a standard retrieval pipeline. You can import it as follows. +You can use the `WikipediaRetriever` in a standard retrieval pipeline: diff --git a/src/oss/python/integrations/retrievers/perigon.mdx b/src/oss/python/integrations/retrievers/perigon.mdx index 9f5ed3e2f..b2e094714 100644 --- a/src/oss/python/integrations/retrievers/perigon.mdx +++ b/src/oss/python/integrations/retrievers/perigon.mdx @@ -23,7 +23,7 @@ Install the LangChain Perigon integration package: ### Credentials -You'll need a Perigon API key to use this integration. You can sign up at [Perigon.io](https://perigon.io/) to get your API key. +You'll need a Perigon API key to use this integration. Sign up at [Perigon.io](https://perigon.io/) for your API key. 
```python import getpass @@ -43,20 +43,29 @@ The ArticlesRetriever allows you to search through news articles using semantic from langchain_perigon import ArticlesRetriever # Create a new instance of the ArticlesRetriever -# PERIGON_API_KEY is set in the environment variables +# PERIGON_API_KEY is automatically read from environment variables retriever = ArticlesRetriever() -# Search for articles and save the results -documents = retriever.invoke("artificial intelligence developments") - -# Print the results -print(f"Found {len(documents)} articles") -for doc in documents[:3]: # Print first 3 results - print(f"Title: {doc.metadata.get('title', 'N/A')}") - print(f"URL: {doc.metadata.get('url', 'N/A')}") - print(f"Published: {doc.metadata.get('publishedAt', 'N/A')}") - print(f"Content: {doc.page_content[:200]}...") - print("-" * 80) +try: + # Search for articles using semantic search + documents = retriever.invoke("artificial intelligence developments") + + # Check if we got results + if not documents: + print("No articles found for the given query.") + else: + print(f"Found {len(documents)} articles") + + # Display first 3 results with metadata + for doc in documents[:3]: + # Safely extract metadata with fallbacks + print(f"Title: {doc.metadata.get('title', 'N/A')}") + print(f"URL: {doc.metadata.get('url', 'N/A')}") + print(f"Published: {doc.metadata.get('publishedAt', 'N/A')}") + print(f"Content: {doc.page_content[:200]}...") + print("-" * 80) +except Exception as e: + print(f"Error retrieving articles: {e}") ``` ### Advanced Features with Filtering @@ -83,16 +92,25 @@ options: ArticlesFilter = { } } -# Search with filters -documents = retriever.invoke("machine learning breakthroughs", options=options) - -print(f"Found {len(documents)} filtered articles") -for doc in documents[:3]: - print(f"Title: {doc.metadata.get('title', 'N/A')}") - print(f"Source: {doc.metadata.get('source', 'N/A')}") - print(f"Category: {doc.metadata.get('category', 'N/A')}") - 
print(f"Content: {doc.page_content[:150]}...") - print("-" * 80) +try: + # Search with advanced filters applied + documents = retriever.invoke("machine learning breakthroughs", options=options) + + if not documents: + print("No articles found matching the filter criteria.") + else: + print(f"Found {len(documents)} filtered articles") + + # Display results with relevant metadata + for doc in documents[:3]: + print(f"Title: {doc.metadata.get('title', 'N/A')}") + print(f"Source: {doc.metadata.get('source', 'N/A')}") + print(f"Category: {doc.metadata.get('category', 'N/A')}") + print(f"Content: {doc.page_content[:150]}...") + print("-" * 80) + +except Exception as e: + print(f"Error retrieving filtered articles: {e}") ``` ### Location-Based Filtering @@ -132,17 +150,26 @@ from langchain_perigon import WikipediaRetriever # PERIGON_API_KEY is automatically read from environment variables wiki_retriever = WikipediaRetriever() -# Search for Wikipedia articles -documents = wiki_retriever.invoke("quantum computing") - -# Print the results -print(f"Found {len(documents)} Wikipedia articles") -for doc in documents[:3]: - print(f"Title: {doc.metadata.get('title', 'N/A')}") - print(f"Pageviews: {doc.metadata.get('pageviews', 'N/A')}") - print(f"Wikidata ID: {doc.metadata.get('wikidataId', 'N/A')}") - print(f"Content: {doc.page_content[:200]}...") - print("-" * 80) +try: + # Search for Wikipedia articles using semantic search + documents = wiki_retriever.invoke("quantum computing") + + # Validate results before processing + if not documents: + print("No Wikipedia articles found for the given query.") + else: + print(f"Found {len(documents)} Wikipedia articles") + + # Display first 3 results with rich metadata + for doc in documents[:3]: + # Extract Wikipedia-specific metadata safely + print(f"Title: {doc.metadata.get('title', 'N/A')}") + print(f"Pageviews: {doc.metadata.get('pageviews', 'N/A')}") + print(f"Wikidata ID: {doc.metadata.get('wikidataId', 'N/A')}") + print(f"Content: 
{doc.page_content[:200]}...") + print("-" * 80) +except Exception as e: + print(f"Error retrieving Wikipedia articles: {e}") ``` ### Advanced Wikipedia Search @@ -220,35 +247,77 @@ from langchain_perigon import ( async def search_both(): - # Initialize retrievers - # PERIGON_API_KEY is automatically read from environment variables + """Perform concurrent searches across news articles and Wikipedia. + + Returns: + tuple: (news_articles, wikipedia_docs) - Results from both retrievers + + Raises: + Exception: If either retriever fails or API errors occur + """ + # Initialize retrievers with automatic API key detection articles_retriever = ArticlesRetriever() wiki_retriever = WikipediaRetriever() - # Define options + # Configure search options for targeted results articles_options: ArticlesFilter = { - "size": 3, - "filter": {"country": "us", "category": "tech"}, + "size": 3, # Limit to 3 articles for faster response + "filter": { + "country": "us", # US-based news sources + "category": "tech", # Technology category only + }, } - wiki_options: WikipediaOptions = {"size": 3, "pageviewsFrom": 50} - - # Perform async searches - articles_task = articles_retriever.ainvoke( - "climate change", options=articles_options - ) - wiki_task = wiki_retriever.ainvoke("climate change", options=wiki_options) - - # Wait for both to complete - articles, wiki_docs = await asyncio.gather(articles_task, wiki_task) - - return articles, wiki_docs - - -# Run async search -articles, wiki_docs = asyncio.run(search_both()) + # Filter Wikipedia results by popularity (pageviews) + wiki_options: WikipediaOptions = { + "size": 3, # Limit to 3 articles + "pageviewsFrom": 50 # Only articles with 50+ daily views + } -print(f"Found {len(articles)} news articles and {len(wiki_docs)} Wikipedia articles") + try: + # Perform concurrent async searches for better performance + articles_task = articles_retriever.ainvoke( + "climate change", options=articles_options + ) + wiki_task = wiki_retriever.ainvoke( + 
"climate change", options=wiki_options + ) + + # Wait for both searches to complete simultaneously + articles, wiki_docs = await asyncio.gather( + articles_task, wiki_task, return_exceptions=True + ) + + # Handle potential exceptions from either retriever + if isinstance(articles, Exception): + print(f"Articles retrieval failed: {articles}") + articles = [] + if isinstance(wiki_docs, Exception): + print(f"Wikipedia retrieval failed: {wiki_docs}") + wiki_docs = [] + + return articles, wiki_docs + + except Exception as e: + print(f"Error in concurrent search: {e}") + return [], [] + + +# Run async search with error handling +try: + articles, wiki_docs = asyncio.run(search_both()) + + # Display results summary + print(f"Found {len(articles)} news articles and {len(wiki_docs)} Wikipedia articles") + + # Show sample results if available + if articles: + print(f"Sample article: {articles[0].metadata.get('title', 'N/A')}") + if wiki_docs: + print(f"Sample Wikipedia: {wiki_docs[0].metadata.get('title', 'N/A')}") + +except Exception as e: + print(f"Async search failed: {e}") ``` ## Integration with LangChain @@ -273,15 +342,25 @@ ensemble_retriever = EnsembleRetriever( weights=[0.6, 0.4], # Favor news articles slightly over Wikipedia ) -# Use combined retriever -documents = ensemble_retriever.get_relevant_documents("artificial intelligence ethics") - -print(f"Found {len(documents)} combined results") -for i, doc in enumerate(documents[:5]): - source_type = "News" if "url" in doc.metadata else "Wikipedia" - print(f"{i+1}. 
[{source_type}] {doc.metadata.get('title', 'N/A')}") - print(f" Content: {doc.page_content[:100]}...") - print() +try: + # Use combined retriever for comprehensive search + documents = ensemble_retriever.get_relevant_documents("artificial intelligence ethics") + + if not documents: + print("No results found from either retriever.") + else: + print(f"Found {len(documents)} combined results") + + # Display top 5 results with source identification + for i, doc in enumerate(documents[:5]): + # Determine source type based on metadata + source_type = "News" if "url" in doc.metadata else "Wikipedia" + print(f"{i+1}. [{source_type}] {doc.metadata.get('title', 'N/A')}") + print(f" Content: {doc.page_content[:100]}...") + print() + +except Exception as e: + print(f"Error with ensemble retriever: {e}") ``` ## API Reference From a3100a54bc353ce35dad8d0ed9635680fc670ff9 Mon Sep 17 00:00:00 2001 From: Islem Maboud Date: Wed, 24 Sep 2025 09:23:10 +0100 Subject: [PATCH 3/4] fix: remove integration with langchain section --- .../integrations/retrievers/perigon.mdx | 43 ------------------- 1 file changed, 43 deletions(-) diff --git a/src/oss/python/integrations/retrievers/perigon.mdx b/src/oss/python/integrations/retrievers/perigon.mdx index b2e094714..6bb322ad2 100644 --- a/src/oss/python/integrations/retrievers/perigon.mdx +++ b/src/oss/python/integrations/retrievers/perigon.mdx @@ -320,49 +320,6 @@ except Exception as e: print(f"Async search failed: {e}") ``` -## Integration with LangChain - -### Combining Both Retrievers - -You can combine both retrievers for comprehensive search across news and encyclopedic content: - -```python -from langchain.retrievers import EnsembleRetriever - -from langchain_perigon import ArticlesRetriever, WikipediaRetriever - -# Create both retrievers -# PERIGON_API_KEY is automatically read from environment variables -news_retriever = ArticlesRetriever() -wiki_retriever = WikipediaRetriever() - -# Combine them with different weights 
-ensemble_retriever = EnsembleRetriever( - retrievers=[news_retriever, wiki_retriever], - weights=[0.6, 0.4], # Favor news articles slightly over Wikipedia -) - -try: - # Use combined retriever for comprehensive search - documents = ensemble_retriever.get_relevant_documents("artificial intelligence ethics") - - if not documents: - print("No results found from either retriever.") - else: - print(f"Found {len(documents)} combined results") - - # Display top 5 results with source identification - for i, doc in enumerate(documents[:5]): - # Determine source type based on metadata - source_type = "News" if "url" in doc.metadata else "Wikipedia" - print(f"{i+1}. [{source_type}] {doc.metadata.get('title', 'N/A')}") - print(f" Content: {doc.page_content[:100]}...") - print() - -except Exception as e: - print(f"Error with ensemble retriever: {e}") -``` - ## API Reference For detailed documentation of all Perigon API features and configurations, visit the [Perigon API documentation](https://dev.perigon.io/docs). From 774fc1249e260cb509c67b56d875f98b73f693ae Mon Sep 17 00:00:00 2001 From: Islem Maboud Date: Thu, 25 Sep 2025 15:05:59 +0100 Subject: [PATCH 4/4] fix: reorder perigon card in alphabetical order --- .../integrations/providers/all_providers.mdx | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/oss/python/integrations/providers/all_providers.mdx b/src/oss/python/integrations/providers/all_providers.mdx index f294d509f..2ae2092a4 100644 --- a/src/oss/python/integrations/providers/all_providers.mdx +++ b/src/oss/python/integrations/providers/all_providers.mdx @@ -2067,6 +2067,14 @@ title: "All providers" cta="View provider guide" /> + + - -