# NewsGPT Cookbook with LangChain

1) First, load news from several news sites.

In [1]:
from langchain.document_loaders import SeleniumURLLoader
loader = SeleniumURLLoader(urls=["https://www.wsj.com", "https://www.nytimes.com/", "https://www.apnews.com/", "https://www.reuters.com/"])
docs = loader.load()

In [2]:
# print(docs)
print(len(docs))
print(type(docs[0]))

4
<class 'langchain.schema.Document'>


Now set up the OpenAI LLM. To get an OpenAI API key through Azure, follow [this link](https://learn.microsoft.com/en-gb/azure/cognitive-services/openai/quickstart?tabs=command-line&pivots=programming-language-python).

In [20]:
import os
os.environ["OPENAI_API_KEY"] = "<your-azure-openai-api-key>"  # placeholder: never commit a real key
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://newsgpt.openai.azure.com/"
os.environ["OPENAI_API_VERSION"] = "2022-12-01"
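Before creating the LLM, it can help to confirm that all four settings are actually present, for instance when they are exported in the shell instead of being written into the notebook. The following is a minimal sketch; `REQUIRED_VARS` and `missing_settings` are hypothetical names introduced here and are not part of LangChain or the OpenAI SDK.

```python
import os

# Hypothetical helper, not a LangChain or OpenAI API.
# The four names are the environment variables set in the cell above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "OPENAI_API_TYPE",
    "OPENAI_API_BASE",
    "OPENAI_API_VERSION",
)


def missing_settings(env=os.environ):
    """Return the names of required settings that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to try without touching the real environment.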

In [25]:
from langchain.llms import AzureOpenAI

# llm = AzureOpenAI(model_name="text-davinci-003")
llm = AzureOpenAI(
    deployment_name="text-davinci-003", model_name="text-davinci-003")


USING API_BASE: 
https://newsgpt.openai.azure.com/


In [26]:
from langchain import PromptTemplate

template = """
I need your expertise as a marketing consultant for a new product launch.

Here are some examples of successful product names:

wearable fitness tracker, Fitbit
premium headphones, Beats
ride-sharing app, Uber
The name should be unique, memorable and relevant to the product.

What is a good name for a {product_type} that offers {benefit}?
"""

prompt = PromptTemplate(
    input_variables=["product_type", "benefit"],
    template=template,
)

print(llm(
    prompt.format(
        product_type="pair of sunglasses",
        benefit="high altitude protection",
    )
))

InvalidRequestError: Resource not found
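Because the call fails before any text comes back, it can be useful to inspect the filled-in prompt without contacting the API. The sketch below reproduces the substitution step with plain `str.format`; the `fill` helper is hypothetical, and the one-line template is a shortened stand-in for the full template above. On Azure, `Resource not found` often means that the deployment name, endpoint, or API version does not match an existing deployment, but that is an assumption about this failure, not something the notebook confirms.

```python
# Shortened stand-in for the notebook's template (assumption: only the last line is kept).
question_template = "What is a good name for a {product_type} that offers {benefit}?"


def fill(template, **values):
    """Substitute named placeholders; raises KeyError if a value is missing."""
    return template.format(**values)


filled = fill(
    question_template,
    product_type="pair of sunglasses",
    benefit="high altitude protection",
)
print(filled)
```

If the printed prompt looks right, the problem is more likely in the connection settings than in the template.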

The URL loader above converts the scraped news pages into LangChain `Document` objects. To load our own data instead, we can use the following code:

In [5]:
# use template data
data = "Google’s employees were shocked when they learned in March that the South Korean consumer electronics giant Samsung was considering replacing Google with Microsoft’s Bing as the default search engine on its devices. For years, Bing had been a search engine also-ran. But it became a lot more interesting to industry insiders when it recently added new artificial intelligence technology. \
    Google’s reaction to the Samsung threat was “panic,” according to internal messages reviewed by The New York Times. An estimated $3 billion in annual revenue was at stake with the Samsung contract. An additional $20 billion is tied to a similar Apple contract that will be up for renewal this year. \
    A.I. competitors like the new Bing are quickly becoming the most serious threat to Google’s search business in 25 years, and in response, Google is racing to build an all-new search engine powered by the technology. It is also upgrading the existing one with A.I. features, according to internal documents reviewed by The Times. \
    The new features, under the project name Magi, are being created by designers, engineers and executives working in so-called sprint rooms to tweak and test the latest versions. The new search engine would offer users a far more personalized experience than the company’s current service, attempting to anticipate users’ needs. \
    Lara Levin, a Google spokeswoman, said in a statement that “not every brainstorm deck or product idea leads to a launch, but as we’ve said before, we’re excited about bringing new A.I.-powered features to search, and will share more details soon.” \
    Billions of people use Google’s search engine every day for everything from finding restaurants and directions to understanding a medical diagnosis, and that simple white page with the company logo and an empty bar in the middle is one of the most widely used web pages in the world. Changes to it would have a significant impact on the lives of ordinary people, and until recently it was hard to imagine anything challenging it."

from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter()
texts = text_splitter.split_text(data)

from langchain.docstore.document import Document

docs = [Document(page_content=t) for t in texts[:3]]
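To make the splitting step more concrete, here is a simplified sketch of separator-based chunking: the text is cut at a separator and the pieces are packed greedily into chunks up to a size limit. It is an illustration only, not LangChain's implementation; `split_by_chars` and its parameters are hypothetical, and it has no chunk overlap.

```python
def split_by_chars(text, separator=" ", chunk_size=200):
    """Greedily pack separator-delimited pieces into chunks of at most
    chunk_size characters. A single piece longer than chunk_size is kept
    whole instead of being cut."""
    chunks = []
    current = ""
    for piece in text.split(separator):
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "one two three four five"
chunks = split_by_chars(sample, chunk_size=9)
print(chunks)
```

Each chunk could then be wrapped in a `Document`, as the cell above does with `texts`.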

In [10]:
# Refer to: https://python.langchain.com/en/latest/use_cases/summarization.html

from langchain.chains.summarize import load_summarize_chain
from langchain import PromptTemplate

map_prompt_template = """Write a summary of today's news headlines from this site:

    {text}

    A SUMMARY WITH 10 TO 20 SENTENCES:"""

reduce_prompt_template = """Write a summary of today's news headlines from multiple news sources:

    {text}

    A SUMMARY WITH 5 TO 10 SENTENCES:"""

MAP_PROMPT = PromptTemplate(template=map_prompt_template, input_variables=["text"])
REDUCE_PROMPT = PromptTemplate(template=reduce_prompt_template, input_variables=["text"])

chain = load_summarize_chain(llm, chain_type="map_reduce", map_prompt=MAP_PROMPT, combine_prompt=REDUCE_PROMPT)
summary = chain.run(docs)

InvalidRequestError: Resource not found
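The `map_reduce` chain type summarizes each document on its own (map) and then summarizes the concatenated partial summaries (reduce). The sketch below shows that control flow with ordinary functions standing in for the LLM calls, so it runs without an API key. All names here (`map_reduce_summarize`, `first_sentence`, `join_lines`) are hypothetical, and the stand-in "summarizers" are deliberately trivial.

```python
def map_reduce_summarize(docs, summarize_one, combine):
    """Apply summarize_one to every document, then combine the results."""
    partial_summaries = [summarize_one(doc) for doc in docs]
    return combine("\n\n".join(partial_summaries))


def first_sentence(text):
    """Stand-in for the map step: keep only the first sentence."""
    return text.split(". ")[0].rstrip(".") + "."


def join_lines(text):
    """Stand-in for the reduce step: merge partial summaries into one line."""
    return text.replace("\n\n", " ")


sample_docs = [
    "A happened. More detail.",
    "B happened. Other detail.",
]
result = map_reduce_summarize(sample_docs, first_sentence, join_lines)
print(result)
```

In the real chain, the map step would use the map prompt and the reduce step the combine prompt, each sent to the LLM.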

In [13]:
print(summary)


Today's news headlines cover a variety of topics from around the world. In the US, the House voted to restore solar panel tariffs paused by President Biden, and Florida Governor Ron DeSantis was allowed to run for president while in office. Republicans have been pushing for restrictions on transgender medical treatments across the US, with North Dakota passing a range of restrictions on transgender people, including a ban on transition care for minors. In Michigan, a man was reported for continuing a date after committing a fatal shooting over $40. Internationally, 23 people were killed in a Russian missile and drone attack in Ukraine, while in China, foreign companies were facing increased scrutiny and pressure. Queen Elizabeth's coronation in 1953 was said to have lifted Britain's post-war gloom. In the entertainment world, Jazz Fest kicked off in New Orleans, the new Marvel film "Guardians 3" was reviewed, and Steve Austin's new show "Stone Cold Takes on America" was discussed. Las