## Libraries learnt
- os
- requests
- dotenv
- bs4
- IPython.display (Markdown, display)
- openai

Summarization can be applied to almost any business vertical: summarizing the news, summarizing financial performance, or summarizing a resume in a cover letter.

In [1]:
# imports

import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

# If you get an error running this cell, then please head over to the troubleshooting notebook!

## Connect to OpenAI

In [3]:
# Load environment variables in a file called .env

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")


API key found and looks good so far!
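
The checks in this cell can also be folded into a small helper that returns a status instead of printing. This is only a sketch: `check_api_key` and its return strings are hypothetical names, not part of the notebook. It tests for stray whitespace before the prefix, so a key with a leading space is reported as a whitespace problem rather than a prefix problem.

```python
def check_api_key(api_key):
    """Return a short status string describing how the key looks."""
    if not api_key:
        return "missing"
    if api_key.strip() != api_key:
        # Leading or trailing spaces/tabs, often from a badly edited .env file
        return "whitespace"
    if not api_key.startswith("sk-proj-"):
        return "unexpected prefix"
    return "ok"


# Placeholder value for illustration only, not a real key
print(check_api_key("sk-proj-example"))
```

Returning a status makes the check easy to reuse, for example to stop early before any API call is attempted.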


In [4]:
openai = OpenAI()


In [5]:
# A class to represent a Webpage
# If you're not familiar with Classes, check out the "Intermediate Python" notebook

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
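        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            # Drop elements that carry no readable text before extracting it
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            # Some responses have no <body>; fall back to empty text
            self.text = ""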

In [10]:
# Let's try one out. Change the website and add print statements to follow along.

sp = Website("https://linkedin.com/in/schubhm")
print(sp.title)
print(sp.text)

Shubham Panday - Audacy, Inc. | LinkedIn
Skip to main content
LinkedIn
Articles
People
Learning
Jobs
Games
Get the app
Join now
Sign in
Shubham Panday
Welcome back
Email or phone
Password
Show
Forgot password?
Sign in
or
By clicking Continue to join or sign in, you agree to LinkedIn’s
User Agreement
,
Privacy Policy
, and
Cookie Policy
.
New to LinkedIn?
Join now
United States
Contact Info
Welcome back
Email or phone
Password
Show
Forgot password?
Sign in
or
By clicking Continue to join or sign in, you agree to LinkedIn’s
User Agreement
,
Privacy Policy
, and
Cookie Policy
.
New to LinkedIn?
Join now
1K followers
500+ connections
See your mutual connections
Welcome back
Email or phone
Password
Show
Forgot password?
Sign in
or
By clicking Continue to join or sign in, you agree to LinkedIn’s
User Agreement
,
Privacy Policy
, and
Cookie Policy
.
New to LinkedIn?
Join now
Join to view profile
Message
Welcome back
Email or phone
Password
Show
Forgot password?
Sign in
or
By clicking Continue
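
The scraped text above repeats the same sign-in lines several times. One possible way to reduce that noise before building a prompt is to drop repeated lines. The sketch below is an assumption, not part of the notebook: `dedupe_lines` is a hypothetical helper, and it would also remove legitimate repeats, so it suits pages like this one better than pages where repetition carries meaning.

```python
def dedupe_lines(text):
    """Drop repeated lines, keeping the first occurrence in original order."""
    seen = set()
    kept = []
    for line in text.split("\n"):
        if line not in seen:
            seen.add(line)
            kept.append(line)
    return "\n".join(kept)


sample = "Welcome back\nEmail or phone\nPassword\nWelcome back\nEmail or phone\nJoin now"
print(dedupe_lines(sample))
```

With the `Website` class, this could be applied as `dedupe_lines(sp.text)` before the text is passed to the model.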

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [7]:
# Define our system prompt - you can experiment with this later, changing the last sentence to "Respond in markdown in Spanish."

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [8]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
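
Because `user_prompt_for` appends the full page text, a long page produces a long prompt. A minimal sketch for capping the length is shown here. `truncate_text` and the 5000-character default are assumptions chosen for illustration, not values from the notebook.

```python
def truncate_text(text, max_chars=5000):
    """Cut text to at most max_chars, ending on a line boundary when possible."""
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    last_newline = cut.rfind("\n")
    if last_newline > 0:
        # Avoid ending in the middle of a line
        cut = cut[:last_newline]
    return cut


print(truncate_text("line one\nline two\nline three", max_chars=12))
```

It could be used as `user_prompt += truncate_text(website.text)` in place of appending `website.text` directly.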

In [11]:
print(user_prompt_for(sp))

You are looking at a website titled Shubham Panday - Audacy, Inc. | LinkedIn
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Skip to main content
LinkedIn
Articles
People
Learning
Jobs
Games
Get the app
Join now
Sign in
Shubham Panday
Welcome back
Email or phone
Password
Show
Forgot password?
Sign in
or
By clicking Continue to join or sign in, you agree to LinkedIn’s
User Agreement
,
Privacy Policy
, and
Cookie Policy
.
New to LinkedIn?
Join now
United States
Contact Info
Welcome back
Email or phone
Password
Show
Forgot password?
Sign in
or
By clicking Continue to join or sign in, you agree to LinkedIn’s
User Agreement
,
Privacy Policy
, and
Cookie Policy
.
New to LinkedIn?
Join now
1K followers
500+ connections
See your mutual connections
Welcome back
Email or phone
Password
Show
Forgot password?
Sign in
or
By clicking Continue to join or sign in, you agree to Linke

In [12]:
messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]

In [13]:
# To give you a preview -- calling OpenAI with system and user messages:

response = openai.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)

Oh, look at you, tackling the big math problems! The answer is 4. You get a gold star for that one! 🎉


In [14]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [19]:
# Try this out, and then try for a few more websites
messages_for(sp)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': "You are looking at a website titled Shubham Panday - Audacy, Inc. | LinkedIn\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nSkip to main content\nLinkedIn\nArticles\nPeople\nLearning\nJobs\nGames\nGet the app\nJoin now\nSign in\nShubham Panday\nWelcome back\nEmail or phone\nPassword\nShow\nForgot password?\nSign in\nor\nBy clicking Continue to join or sign in, you agree to LinkedIn’s\nUser Agreement\n,\nPrivacy Policy\n, and\nCookie Policy\n.\nNew to LinkedIn?\nJoin now\nUnited States\nContact Info\nWelcome back\nEmail or phone\nPassword\nShow\nForgot password?\nSign in\nor\nBy clicking Continue to join or sign in, you agree to LinkedIn’s

In [20]:
# And now: call the OpenAI API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "gpt-4o-mini",
        messages = messages_for(website)
    )
    return response.choices[0].message.content
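
To try the call pattern without network access or API credit, the client can be passed in as a parameter and replaced with a stand-in. Everything named here (`FakeCompletions`, `make_fake_client`, `summarize_with`) is hypothetical and only mimics the `chat.completions.create(...)` call shape and the `choices[0].message.content` response shape used in this notebook. It is not the real OpenAI client.

```python
from types import SimpleNamespace


class FakeCompletions:
    """Stand-in for client.chat.completions that returns a canned reply."""

    def __init__(self, reply):
        self.reply = reply
        self.last_call = None

    def create(self, model, messages):
        # Record the arguments so they can be inspected afterwards
        self.last_call = {"model": model, "messages": messages}
        message = SimpleNamespace(content=self.reply)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


def make_fake_client(reply):
    """Build an object exposing client.chat.completions.create(...)."""
    completions = FakeCompletions(reply)
    return SimpleNamespace(chat=SimpleNamespace(completions=completions))


def summarize_with(client, messages, model="gpt-4o-mini"):
    """Same call as summarize(), but with the client supplied by the caller."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content


fake = make_fake_client("# Fake summary")
demo_messages = [
    {"role": "system", "content": "You summarize websites."},
    {"role": "user", "content": "Page text goes here."},
]
print(summarize_with(fake, demo_messages))
```

Passing the real `openai` object to `summarize_with` would issue the same request as `summarize`, assuming the client is configured as earlier in the notebook.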

In [21]:
summarize("https://WWW.LINKEDIN.com/in/schubhm")

"# Summary of Shubham Panday's LinkedIn Profile\n\n## Professional Overview\nShubham Panday is a Solutions Architect with over 14 years of experience in system design and implementation, specifically in the realm of web services and technology integrations. He is currently associated with Audacy, Inc. and has a diverse skill set that includes expertise in Oracle EBS, blockchain technologies, and cloud computing.\n\n## Experience & Education\n- Holds various certifications in technologies such as AWS, Oracle, and Snowflake.\n- Recognized with awards for exceptional client service and team performance during his tenure at EY and other roles.\n\n## Skill Highlights\n- Extensive experience in Oracle eBusiness Suite and technical analysis.\n- Proficient in developing interfaces and customizations for different modules.\n- Strong communication skills that facilitate collaboration between technical and functional teams.\n\n## Recommendations\nShubham has received positive testimonials from co

In [22]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))
    

In [23]:
display_summary("https://WWW.LINKEDIN.com/in/schubhm")

# Summary of Shubham Panday's LinkedIn Profile

**Profile Overview:**
- **Name:** Shubham Panday
- **Current Position:** Solutions Architect at Audacy, Inc.
- **Experience:** Over 14 years in designing technical solutions.
- **Connections:** 500+ connections and 1K followers.

**Certifications:**
Shubham holds several certifications including:
- Snowflake Fundamentals Training (T3)
- Blockchain and AI certifications from EY
- AWS Certified Cloud Practitioner
- MuleSoft Certified Developer
- Oracle SOA Suite Certified Implementation Specialist

**Awards and Recognitions:**
- Exceptional Client Service Award from EY
- Limca Book of Records for distributing maximum stationery in a day
- Bravo Award from Infosys for outstanding performance

**Recommendations:**
Shubham has received positive feedback from colleagues highlighting his technical skills in Oracle EBS and SOA, as well as his ability to communicate effectively within teams and take ownership of projects.

This profile provides insight into Shubham’s expertise, considerable professional experience, and his accomplishments in the tech industry.

In [25]:
display_summary("https://www.google.com")

# Summary of Google Website

The Google website serves primarily as a search platform, offering various services and tools to users. The key features include:

- **Search Functionality**: A robust search engine that provides users with the ability to find information efficiently.
- **Gmail and Images**: Integration points for additional Google services such as email and image searches.
- **Privacy and Settings**: A focus on user privacy with options to manage settings, data, and search history.

The site also highlights:

- **Advertising and Business Solutions**: Insights into how Google supports businesses through advertising.
- **AI Applications**: An emphasis on how Google leverages artificial intelligence for scientific and environmental advancements.

Overall, Google's website is structured to facilitate user interaction with its search capabilities and various services while offering information about user privacy and business tools. There are no specific news or announcements detailed in the provided content.