# Web Searching Tools

In [128]:
import os
from dotenv import load_dotenv
load_dotenv(override=True)

True

I mostly work with LangChain, so this notebook uses LangChain's loaders and tool wrappers to fetch the data. The output from these wrappers is also typically clean and LLM-ready.


## Firecrawl

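You will need your own API key; see https://firecrawl.dev. The loader call below passes no key, so it relies on one being available in the environment loaded by `load_dotenv` above (as far as I know, LangChain's loader looks for `FIRECRAWL_API_KEY`).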

In [107]:
from langchain_community.document_loaders.firecrawl import FireCrawlLoader

In [108]:
loader = FireCrawlLoader(url="https://www.salesfactory.com/",
                         mode="scrape")

### Mode


 - scrape: Scrape a single URL and return its markdown.
 - crawl: Crawl the URL and all accessible subpages, returning the markdown for each one.
 - map: Map the URL and return a list of semantically related pages.

In [109]:
data = loader.load()

In [110]:
# get page content
print("Page content:")
print(data[0].page_content)

Page content:
 

* * *

Retail marketing solutions built for an unpredictable world.

Sales Factory is a data-driven marketing agency that helps brands win by focusing on one thing: selling in and selling through at retail.

* * *

Sell In. Sell Through.  
Win at Retail.
---------------------------------------

You can't outperform your competitors at retail unless you truly understand how products sell in and sell through. That’s where we stand above the competition – yours and ours. Since 1984, our team has been helping brands get products onto the shelves at brick-and-mortar retail, outsell through e-commerce and optimize products in online stores.

Being brave is easier when you know you’re right. Our insights team continuously monitors consumer behavior including in-depth analysis of Trades Professionals. We walk thousands of miles of store aisles annually and manage hundreds of thousands of research respondents with a data driven approach that's helped our clients thrive through 

In [111]:
print("Metadata:")
print(data[0].metadata)



Metadata:
{'title': 'Sales Factory | Growing Home Retail BrandsFollow us on LinkedInFollow us on YouTubeFollow us on FacebookFollow us on InstagramFollow Us On LinkedInFollow us on YouTubeFollow us on FacebookFollow us on Instagram', 'description': 'You need an agency that’s found a better way to connect insights, strategy and creative. From initial business strategy to marketing plan execution, we convert our experience into a win for you.', 'language': 'en', 'ogTitle': 'Sales Factory | Growing Home Retail Brands', 'ogDescription': 'You need an agency that’s found a better way to connect insights, strategy and creative. From initial business strategy to marketing plan execution, we convert our experience into a win for you.', 'ogUrl': 'https://www.salesfactory.com', 'ogImage': 'https://www.salesfactory.com/hubfs/21-SFA-0119-Scoreboard-Image-V2-1.jpg', 'ogLocaleAlternate': [], 'viewport': 'width=device-width, initial-scale=1', 'og:description': 'You need an agency that’s found a better
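In the metadata above, `title` has picked up repeated social-link text, while `ogTitle` is clean. Below is a minimal sketch of preferring the Open Graph title when it exists. `pick_title` is a hypothetical helper, not part of Firecrawl or LangChain, and the key names are taken only from the output shown here.

```python
def pick_title(metadata: dict) -> str:
    """Return the cleanest available title from a scraped-page metadata dict."""
    # Prefer the Open Graph title; fall back to the raw <title> text.
    for key in ("ogTitle", "title"):
        value = metadata.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    # Neither key is present (or both are empty).
    return ""
```

With the loader output above, this would be called as `pick_title(data[0].metadata)`.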

## DuckDuckGo Search (Free)


Good for general-information searches, but the default results are not very current.

In [112]:
from langchain_community.tools import DuckDuckGoSearchRun

In [113]:
# get basic search information

search = DuckDuckGoSearchRun()

search.run("What is the current weather in Dallas Texas?")


'Dallas Weather Forecasts. Weather Underground provides local & long-range weather forecasts, weatherreports, maps & tropical weather conditions for the Dallas area. Dallas/Fort Worth, TX. Fall-Like Temperatures, Rain and Light Snow From the Upper Midwest to the Northeast; Elevated Fire Weather Conditions in the Southern Plains. ... Current conditions at Dallas/Fort Worth International Airport (KDFW) Lat: 32.9°NLon: 97.02°WElev: 541ft. Dallas TX 32.8°N 96.78°W (Elev. 505 ft) Last Update: 3:01 pm CDT Oct 9, 2024. Forecast Valid: 5pm CDT Oct 9, 2024-6pm CDT Oct 16, 2024 . ... Severe Weather ; Current Outlook Maps ; Drought ; Fire Weather ; Fronts/Precipitation Maps ; Current Graphical Forecast Maps ; Rivers ; Marine ; Offshore and High Seas; Hurricanes ; FOX 4 Weather. Dallas Weather: October 11 morning forecast. Hot temperatures for the next couple of days... but, it will start to feel like Fall next week. Plus, updates on new and old hurricanes ... NWS Fort Worth/Dallas. A few location

In [114]:
# to get additional information

from langchain_community.tools import DuckDuckGoSearchResults

search = DuckDuckGoSearchResults()

search.invoke('Why did Biden drop out of the 2024 presidential race?')


"snippet: U.S. Rep. Adam Schiff, D-Calif., called on President Joe Biden to drop out of the 2024 election last Wednesday, becoming the most prominent Democratic lawmaker so far to publicly push for a ..., title: Biden drops out: A timeline of major moments that led to the decision, link: https://www.usatoday.com/story/news/politics/elections/2024/07/22/biden-drops-out-2024-race-timeline/74495115007/, snippet: The cover at left was published on July 21, the day President Joe Biden dropped out of the presidential race; the one at right was published digitally June 28, the day after the first presidential ..., title: Why Joe Biden Dropped Out | TIME, link: https://time.com/7001028/why-joe-biden-stepped-down/, snippet: Thirty-seven congressional Democrats, including independent Sen. Joe Manchin, who previously was in the Democratic Party, had called on Biden to drop out of the 2024 election before he delivered ..., title: President Joe Biden drops out of 2024 presidential race - NBC News, 
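The result above is one long string of `snippet: ..., title: ..., link: ...` groups, which is awkward to work with downstream. Below is a rough sketch of splitting that string into one dict per result. `parse_ddg_results` is a hypothetical helper, and the field names are assumed from the outputs in this notebook. It will mis-split if a snippet itself contains text like `, title: `, and newer versions of the tool may offer a structured output option, so check the documentation for your installed version first.

```python
import re

# A field starts at the beginning of the string or after ", ".
FIELD_RE = re.compile(r"(?:^|, )(snippet|title|link|date|source): ")

def parse_ddg_results(raw: str) -> list:
    """Split a 'snippet: ..., title: ..., link: ...' string into dicts."""
    # re.split with one capture group yields [prefix, key, value, key, value, ...].
    parts = FIELD_RE.split(raw)
    records = []
    current = {}
    for key, value in zip(parts[1::2], parts[2::2]):
        # Each result begins with a snippet, so a new snippet closes the previous record.
        if key == "snippet" and current:
            records.append(current)
            current = {}
        current[key] = value.strip()
    if current:
        records.append(current)
    return records
```

For example, `parse_ddg_results(search.invoke("..."))` would give a list that can be looped over or turned into `Document` objects.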

In [115]:
# get news information

search = DuckDuckGoSearchResults(backend="news")  # news results are much more up to date

search.invoke("What is the current weather in Dallas Texas?")


"snippet: Hurricanes don't stop at the coast: Why inland mountain towns are on alert, title: Lake Dallas, TX Current Weather, link: https://www.theweathernetwork.com/en/city/us/texas/lake-dallas/current, date: 2023-05-05T16:43:00+00:00, source: The Weather Network, snippet: DALLAS — TONIGHT: Mainly clear and pleasant ... of local radars near you as well as the latest forecast, cameras and current conditions., title: Much warmer Tuesday afternoon, much cooler Wednesday, link: https://www.wfaa.com/article/weather/forecast/weather-forecast-dallas-fort-worth/287-24264997, date: 2024-10-15T03:29:00+00:00, source: WFAA8, snippet: DALLAS — Officially ... The hottest temps of this current wave of heat will be Friday through Saturday. Highs will be well into the 90s, with heat index values of 100° or higher for most ..., title: DFW Weather: When will it cool down? Not this week, link: https://www.wfaa.com/article/weather/dallas-texas-dfw-weather-forecast-heat-temperatures/287-cde55a0c-6638-4f1b
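Even with the news backend, the output above mixes a 2023 article in with the October 2024 ones. Below is a small sketch of dropping stale items by their `date` field. It assumes the results have already been turned into dicts with an ISO 8601 `date` string, as in the output above; `keep_recent` and its parameters are hypothetical names.

```python
from datetime import datetime, timedelta, timezone

def keep_recent(records, max_age_days=7, now=None):
    """Keep only records whose 'date' is within max_age_days of now."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    fresh = []
    for record in records:
        try:
            published = datetime.fromisoformat(record["date"])
        except (KeyError, ValueError):
            # Skip records with a missing or unparseable date.
            continue
        if published.tzinfo is None:
            # Assumption: treat dates without a timezone as UTC.
            published = published.replace(tzinfo=timezone.utc)
        if published >= cutoff:
            fresh.append(record)
    return fresh
```

Passing `now` explicitly keeps the function easy to test; in normal use it can be left out.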

You can also pass a custom `DuckDuckGoSearchAPIWrapper` directly to `DuckDuckGoSearchResults`. This seems like the best way to control the results, since the wrapper sets the region, the time window, and the maximum number of results.

In [116]:
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper

wrapper = DuckDuckGoSearchAPIWrapper(region="us-en", time="d", max_results=2)  # time: "d", "w", "m", or "y" (day, week, month, year); defaults to None

search = DuckDuckGoSearchResults(api_wrapper=wrapper, source="news")

search.invoke("iphone 16")

"snippet: That's the premise behind the iPhone 16 Camera Control button, but as a confident photographer, it doesn't really do that for me. What Camera Control lacks is the ability to customize it., title: I tried the iPhone 16 Camera Control button and I hope Android phones ..., link: https://www.tomsguide.com/phones/iphones/i-tried-the-iphone-16-camera-control-button-and-i-hope-android-phones-dont-copy-it-heres-why, snippet: The iPhone 16 Pro comes with a 6.3-inch LTPO Super Retina XDR OLED display. It supports HDR10, Dolby Vision, and a 120Hz refresh rate, with brightness levels reaching 2000 nits for outdoor ..., title: iPhone 16 Pro vs. Xiaomi 14 Pro: Which is Better and Why?, link: https://www.gizmochina.com/2024/10/17/iphone-16-pro-vs-xiaomi-14-pro-which-is-better-and-why/, snippet: Unter der Haube steckt eine neue Generation der Apple-eigenen Prozessoren: der A18 (im iPhone 16 und iPhone 16 Plus) und der A18 Pro in den Pro-Modellen. Unser Test zeigt, wie die neue tech\xadnische

## Wikipedia

In [117]:
from langchain_community.retrievers import WikipediaRetriever

retriever = WikipediaRetriever(doc_content_chars_max=2000, lang="en", top_k_results=2, load_all_available_meta=False)

In [118]:
data = retriever.invoke("iphone 16")

In [119]:
# metadata
print(data[0].metadata)


{'title': 'IPhone 16', 'summary': 'The iPhone 16  and iPhone 16 Plus are smartphones developed and marketed by Apple Inc. They are the eighteenth-generation iPhones, succeeding the iPhone 15 and iPhone 15 Plus. The devices were unveiled alongside the higher-priced iPhone 16 Pro and 16 Pro Max during the September 9, 2024 Apple Event at Apple Park in Cupertino, California.', 'source': 'https://en.wikipedia.org/wiki/IPhone_16'}


In [120]:
# content
print(data[0].page_content)


The iPhone 16  and iPhone 16 Plus are smartphones developed and marketed by Apple Inc. They are the eighteenth-generation iPhones, succeeding the iPhone 15 and iPhone 15 Plus. The devices were unveiled alongside the higher-priced iPhone 16 Pro and 16 Pro Max during the September 9, 2024 Apple Event at Apple Park in Cupertino, California.


== History ==
The devices were unveiled during an event on September 9, 2024, marking the first time an iPhone release had been announced on a Monday.


== Design ==
Like the iPhone 15, the device features rounded edges, a slightly curved display, and back glass.


=== Display ===
The iPhone 16 and iPhone 16 Plus retain their screen sizes of 6.1 inches and 6.7 inches, respectively. They feature a bezel-less, full-edge screen design with no visible borders.


=== Camera ===

A more refined quad-lens array on the back, possibly with 48 MP ultra-wide camera
Improved computational photography for real-time lighting adjustments, higher resolution videos, 

## Google Serper

In [121]:
from langchain_core.documents import Document

In [122]:
from langchain_community.utilities import GoogleSerperAPIWrapper

google_serper = GoogleSerperAPIWrapper(k=4, type="search")  # type: Literal['news', 'search', 'places', 'images'], defaults to 'search'


In [123]:
google_result = google_serper.results('Where is China?')


In [127]:
# reformat the result
def ggsearch_reformat(result):
    """
    Reformats Google search results into a list of Document objects.

    Args:
        result (dict): The raw search result from Google Serper API.

    Returns:
        list: A list of Document objects containing formatted search results.

    This function processes both Knowledge Graph and organic search results.
    If an error occurs or no results are found, it returns a Document with an error message.
    """
    documents = []
    
    try:
        # Process Knowledge Graph results if present
        if 'knowledgeGraph' in result:
            kg = result['knowledgeGraph']
            doc = Document(
                page_content=kg.get('description', ''),
                metadata={
                    'source': kg.get('descriptionLink', ''),
                    'title': kg.get('title', ''),
                }
            )
            documents.append(doc)
        
        # Process organic search results
        if 'organic' in result:
            for item in result['organic']:
                doc = Document(
                    page_content=item.get('snippet', ''),
                    metadata={
                        'source': item.get('link', ''),
                        'title': item.get('title', ''),
                    }
                )
                documents.append(doc)
        
        # Raise an error if no results were found
        if not documents:
            raise ValueError("No search results found")
        
    except Exception as e:
        # Handle any exceptions and return an error Document
        print(f"An error occurred: {str(e)}")
        documents.append(Document(
            page_content="No search results found or an error occurred.",
            metadata={'source': 'Error', 'title': 'Search Error'}
        ))
    return documents



In [126]:

ggsearch_reformat(google_result)

[Document(metadata={'source': 'https://en.wikipedia.org/wiki/China', 'title': 'China - Wikipedia'}, page_content="China, officially the People's Republic of China (PRC), is a country in East Asia. With a population exceeding 1.4 billion, it is the second-most populous ..."),
 Document(metadata={'source': 'https://www.britannica.com/place/China', 'title': 'China | Events, People, Dates, Flag, Map, & Facts | Britannica'}, page_content='China is a country of East Asia. It is the largest of all Asian countries and has one of the largest populations of any country in the world.'),
 Document(metadata={'source': 'https://kids.nationalgeographic.com/geography/countries/article/china', 'title': 'China Country Profile - National Geographic Kids'}, page_content='Stretching 3100 miles (5000 kilometers) from east to west and 3400 miles (5500 kilometers) from north to south, China is a large country with widely varying ...'),
 Document(metadata={'source': 'https://www.cia.gov/the-world-factbook/coun
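Once the results are `Document` objects, they usually need to be flattened into a text block for an LLM prompt. Below is a minimal sketch of one way to do that, with numbered entries so the model can cite its sources. `format_context` is a hypothetical helper that only relies on the `page_content` and `metadata` attributes, so the demo uses `SimpleNamespace` stand-ins rather than real LangChain documents.

```python
from types import SimpleNamespace

def format_context(documents, max_chars=300):
    """Join documents into a numbered, source-annotated context string."""
    blocks = []
    for index, doc in enumerate(documents, start=1):
        title = doc.metadata.get("title") or "Untitled"
        source = doc.metadata.get("source", "")
        # Trim long snippets so the prompt stays small.
        snippet = doc.page_content.strip()[:max_chars]
        blocks.append(f"[{index}] {title}\n{snippet}\nSource: {source}")
    return "\n\n".join(blocks)

# Stand-in objects for the demo; real code would pass the list of Documents.
demo_docs = [
    SimpleNamespace(
        page_content="China is a country of East Asia.",
        metadata={"source": "https://www.britannica.com/place/China", "title": "China | Britannica"},
    ),
    SimpleNamespace(page_content="  A second snippet.  ", metadata={}),
]
demo_context = format_context(demo_docs)
```

In the notebook this would be called as `format_context(ggsearch_reformat(google_result))`.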

## Reader API by Jina AI

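Jina AI's Reader API makes it easy to scrape a web page: prefix the target URL with `https://r.jina.ai/` and the response comes back as text, with the page content converted to markdown.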


In [129]:
import requests

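def scrape_jina_ai(url: str) -> str:
    """Fetch a page through the Jina Reader endpoint and return the response text."""
    response = requests.get("https://r.jina.ai/" + url, timeout=30)
    response.raise_for_status()  # fail loudly on HTTP errors instead of returning an error page
    return response.text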

In [131]:
content = scrape_jina_ai("https://www.salesfactory.com/")
print(content)


Title: Sales Factory | Growing Home Retail Brands

URL Source: https://www.salesfactory.com/

Markdown Content:
* * *

Retail marketing solutions built for an unpredictable world.
------------------------------------------------------------

Sales Factory is a data-driven marketing agency that helps brands win by focusing on one thing: selling in and selling through at retail.

* * *

Sell In. Sell Through.  
Win at Retail.
---------------------------------------

You can't outperform your competitors at retail unless you truly understand how products sell in and sell through. That’s where we stand above the competition – yours and ours. Since 1984, our team has been helping brands get products onto the shelves at brick-and-mortar retail, outsell through e-commerce and optimize products in online stores.

Being brave is easier when you know you’re right. Our insights team continuously monitors consumer behavior including in-depth analysis of Trades Professionals. We walk thousands of m

It did not extract as much content as Firecrawl, but the formatting is just as good, and it is free.
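The Reader response above has a small header (`Title:`, `URL Source:`) followed by `Markdown Content:` and the page body. Below is a hedged sketch of splitting those pieces apart so the title and URL can go into metadata. `parse_reader_response` is a hypothetical helper, and the layout is assumed from the single output shown here, so it may not hold for every page.

```python
def parse_reader_response(text: str) -> dict:
    """Split a Reader-style response into title, url, and markdown body."""
    header, marker, body = text.partition("Markdown Content:")
    if not marker:
        # Marker not found: treat the whole response as the body.
        return {"title": "", "url": "", "markdown": text.strip()}
    fields = {"title": "", "url": "", "markdown": body.strip()}
    for line in header.splitlines():
        if line.startswith("Title:"):
            fields["title"] = line[len("Title:"):].strip()
        elif line.startswith("URL Source:"):
            fields["url"] = line[len("URL Source:"):].strip()
    return fields
```

For example, `parse_reader_response(content)["markdown"]` would return only the page body from the cell above.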


## Other API sources

Here is a list of other API sources that could be used to extract information:

https://guides.lib.berkeley.edu/c.php?g=4395&p=7995952 
