In [1]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [2]:
load_dotenv()
api_key = os.getenv('OPENAI_API_KEY')
# print(api_key)

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [3]:
class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
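To see what `Website` pulls out of a page without making a network request, here is a rough standard-library analogue. It swaps BeautifulSoup for Python's built-in `html.parser.HTMLParser`, so it sketches the same idea rather than reproducing the notebook's implementation. The class name `PageExtractor` is hypothetical. It ignores only `script` and `style` contents, because the `img` and `input` tags that `Website` also removes carry no text.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Hypothetical stdlib analogue of Website: collects title, body text and hrefs."""

    SKIP = {"script", "style"}  # tags whose text content is ignored

    def __init__(self):
        super().__init__()
        self.title = ""
        self.text_parts = []
        self.links = []
        self._in_title = False
        self._in_body = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore <a> tags without a usable href, as Website does
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False
        elif tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_body and not self._skip_depth and data.strip():
            self.text_parts.append(data.strip())

    def get_contents(self):
        title = self.title.strip() or "No title found"
        text = "\n".join(self.text_parts)
        return f"Webpage Title:\n{title}\nWebpage Contents:\n{text}\n\n"


page = PageExtractor()
page.feed("<html><head><title>Demo</title></head><body><h1>Hi</h1>"
          "<script>var x = 1;</script><p>Text</p><a href='/about'>About</a></body></html>")
page.close()
print(page.links)  # ['/about']
```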

In [4]:
ed = Website("https://edwarddonner.com")
ed.links
# print(ed.links)

['https://edwarddonner.com/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/11/13/llm-engineering-resources/',
 'https://edwarddonner.com/2024/10/16/from-software-engineer-to-ai-data-scientist-resources/',
 'https://edwarddonner.com/2024/10/16/from-software-engineer-to-ai-data-scientist-resources/',
 'https://edwarddonner.com/2024/08/06/outsmart/',
 'https://edwarddonner.com/2024/08/06/outsmart/',
 'https://edwarddonner.com/2024/06/26/choosing-the-right-llm-resources/
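Several URLs in this list appear twice, including the homepage and each post link, and every duplicate costs prompt tokens later. Below is a minimal order-preserving dedupe, assuming first-seen order is worth keeping. `dedupe_links` is a hypothetical helper and is not part of the notebook.

```python
def dedupe_links(links):
    """Drop repeated URLs while keeping the order in which they first appeared."""
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(links))


sample = [
    "https://edwarddonner.com/",
    "https://edwarddonner.com/outsmart/",
    "https://edwarddonner.com/",
    "https://edwarddonner.com/2024/08/06/outsmart/",
    "https://edwarddonner.com/2024/08/06/outsmart/",
]
print(dedupe_links(sample))
```

Inside `Website.__init__` this could be applied as `self.links = dedupe_links(link for link in links if link)`.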

In [5]:
# ed.get_contents()
print(ed.get_contents())

Webpage Title:
Home - Edward Donner
Webpage Contents:
Home
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers and tons of pr

In [6]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [7]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [8]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [9]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2024/11/13/llm-engineering-resources/
https://edwarddonner.com/2024/11/13/llm-engineering-resources/
https://edwarddonner.com/2024/10/16/from-software-engineer-to-ai-data-scientist-resources/
https://edwarddonner

In [10]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
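`response_format={"type": "json_object"}` (OpenAI's JSON mode) makes the reply parse as JSON. It does not, however, guarantee the `{"links": [{"type": ..., "url": ...}]}` shape requested in the system prompt, and `get_all_details` indexes `link['type']` and `link['url']` directly. The defensive parser below drops malformed entries instead of failing later. It is a sketch, and the name `parse_links_response` is hypothetical.

```python
import json


def parse_links_response(raw):
    """Parse the model's JSON reply, keeping only well-formed {"type", "url"} entries.

    Always returns a dict shaped like {"links": [...]}, possibly with an empty list.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        return {"links": []}
    clean = []
    for item in links:
        if (
            isinstance(item, dict)
            and isinstance(item.get("type"), str)
            and isinstance(item.get("url"), str)
            and item["url"].startswith(("http://", "https://"))  # full URLs only
        ):
            clean.append({"type": item["type"], "url": item["url"]})
    return {"links": clean}


sample = ('{"links": [{"type": "about page", "url": "https://anthropic.com/company"}, '
          '{"type": "team page"}, {"type": "email", "url": "mailto:press@anthropic.com"}]}')
print(parse_links_response(sample))
# {'links': [{'type': 'about page', 'url': 'https://anthropic.com/company'}]}
```

`get_links` could then end with `return parse_links_response(result)` in place of `json.loads(result)`.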

In [11]:
anthropic = Website("https://anthropic.com")
anthropic.links

['/',
 '/claude',
 '/team',
 '/enterprise',
 '/api',
 '/pricing',
 '/research',
 '/company',
 '/careers',
 '/news',
 'https://www.anthropic.com/research#entry:8@1:url',
 'https://www.anthropic.com/claude',
 'https://claude.ai/',
 '/api',
 '/news/3-5-models-and-computer-use',
 '/claude/sonnet',
 '/claude/haiku',
 '/news/claude-for-enterprise',
 '/research/constitutional-ai-harmlessness-from-ai-feedback',
 '/news/core-views-on-ai-safety',
 '/jobs',
 '/',
 '/claude',
 '/api',
 '/team',
 '/pricing',
 '/research',
 '/company',
 '/customers',
 '/news',
 '/careers',
 'mailto:press@anthropic.com',
 'https://support.anthropic.com/',
 'https://status.anthropic.com/',
 '/supported-countries',
 'https://twitter.com/AnthropicAI',
 'https://www.linkedin.com/company/anthropicresearch',
 'https://www.youtube.com/@anthropic-ai',
 '/legal/consumer-terms',
 '/legal/commercial-terms',
 '/legal/privacy',
 '/legal/aup',
 '/responsible-disclosure-policy',
 'https://trust.anthropic.com/']
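Anthropic's homepage yields a mix of relative paths (`/claude`), absolute URLs, a fragment link, a `mailto:` address and legal pages. As written, the notebook hands all of them to the model to sort out. A deterministic pre-pass could resolve and trim the list first. The sketch below uses the standard library's `urllib.parse.urljoin` and `urldefrag`. The helper `normalize_links` and its skip rules are hypothetical choices that the notebook does not define.

```python
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical filters echoing the prompt's "no Terms of Service, Privacy, email links" rule
SKIP_SCHEMES = {"mailto", "tel", "javascript"}
SKIP_PATH_WORDS = ("/legal/", "privacy", "terms")


def normalize_links(base_url, links):
    """Resolve relative hrefs against base_url, drop unwanted links, strip #fragments, dedupe."""
    result = []
    for href in links:
        absolute = urljoin(base_url, href)  # '/claude' -> 'https://anthropic.com/claude'
        parsed = urlparse(absolute)
        if parsed.scheme in SKIP_SCHEMES:
            continue
        if any(word in parsed.path.lower() for word in SKIP_PATH_WORDS):
            continue
        absolute = urldefrag(absolute).url  # '/research#entry...' -> '/research'
        if absolute not in result:
            result.append(absolute)
    return result


print(normalize_links("https://anthropic.com",
                      ["/", "/claude", "/claude", "mailto:press@anthropic.com", "/legal/privacy", "/careers"]))
# ['https://anthropic.com/', 'https://anthropic.com/claude', 'https://anthropic.com/careers']
```

`https://anthropic.com/...` and `https://www.anthropic.com/...` are still treated as different URLs here. Merging them would require a further assumption about how the site is hosted.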

In [12]:
get_links("https://anthropic.com")

{'links': [{'type': 'about page', 'url': 'https://anthropic.com/company'},
  {'type': 'careers page', 'url': 'https://anthropic.com/careers'},
  {'type': 'team page', 'url': 'https://anthropic.com/team'},
  {'type': 'research page', 'url': 'https://anthropic.com/research'},
  {'type': 'news page', 'url': 'https://anthropic.com/news'}]}

In [13]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
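`get_all_details` builds a fresh `Website` for every link the model returns. Any exception raised while fetching or parsing one page, for example a `requests` connection error, therefore aborts the whole brochure. The sketch below is more forgiving. To keep it testable offline, the page fetcher and the parsed links are passed in as arguments. The name `get_all_details_safe` and its parameters are hypothetical.

```python
def get_all_details_safe(url, fetch_contents, links):
    """Concatenate the landing page and each linked page, skipping pages that fail.

    fetch_contents: callable taking a URL and returning page text (hypothetical hook).
    links: a dict shaped like {"links": [{"type": ..., "url": ...}, ...]}.
    """
    parts = ["Landing page:\n", fetch_contents(url)]
    for link in links.get("links", []):
        try:
            section = f"\n\n{link['type']}\n" + fetch_contents(link["url"])
        except Exception as exc:  # network, parsing or missing-key errors for this one link
            print(f"Skipping {link!r}: {exc!r}")
            continue
        parts.append(section)
    return "".join(parts)


# Offline demo with a fake fetcher: the careers URL is "missing" and raises KeyError
pages = {"https://example.com": "HOME", "https://example.com/about": "ABOUT"}
demo_links = {"links": [
    {"type": "about page", "url": "https://example.com/about"},
    {"type": "careers page", "url": "https://example.com/careers"},
]}
details = get_all_details_safe("https://example.com", pages.__getitem__, demo_links)
```

In the notebook, `fetch_contents` would be `lambda u: Website(u).get_contents()` and `links` would be `get_links(url)`. Passing `timeout=` to `requests.get` inside `Website` would also keep one slow site from stalling the loop, since `requests` does not time out unless a timeout is set.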

In [14]:
# print(get_all_details("https://anthropic.com"))

In [15]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

# Or uncomment the lines below for a more humorous brochure - this demonstrates how easy it is to incorporate 'tone':

# system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
# and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
# Include details of company culture, customers and careers/jobs if you have the information."
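These prompts are split across lines with a trailing backslash. Inside a string literal, Python joins such lines with nothing in between, so `markdown.\` followed by `Include` on the next line becomes `markdown.Include`. The short demonstration below shows this and offers one alternative layout that uses implicit concatenation of adjacent literals. The name `brochure_system_prompt` is hypothetical.

```python
# Inside a string literal, a backslash at the end of a line splices the next line on
# directly, with no space, so a sentence boundary needs an explicit trailing space.
spliced = "Respond in markdown.\
Include details."
print(repr(spliced))  # 'Respond in markdown.Include details.'

# Adjacent string literals inside parentheses are concatenated at compile time,
# which keeps each piece readable and makes the spaces between them visible.
brochure_system_prompt = (
    "You are an assistant that analyzes the contents of several relevant pages "
    "from a company website and creates a short brochure about the company for "
    "prospective customers, investors and recruits. Respond in markdown. "
    "Include details of company culture, customers and careers/jobs if you have "
    "the information."
)
```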


In [16]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:20_000] # Truncate if more than 20,000 characters
    return user_prompt
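`user_prompt[:20_000]` caps the prompt by characters, which is only a rough proxy for the model's token limit, and it can cut the last page off mid-word. The small variation below backs up to the last complete line instead. `truncate_at_line` is a hypothetical helper.

```python
def truncate_at_line(text, limit=20_000):
    """Cut text to at most `limit` characters, backing up to the last newline
    so the prompt does not end partway through a word or sentence."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    newline = cut.rfind("\n")
    # With no newline inside the window, fall back to the plain character cut
    return cut[:newline] if newline > 0 else cut


print(repr(truncate_at_line("abc\ndef\nghi", limit=9)))  # 'abc\ndef'
```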

In [17]:
# get_brochure_user_prompt("Anthropic", "https://anthropic.com")

In [18]:
def create_brochure(company_name, url):
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
    )
    result = response.choices[0].message.content
    display(Markdown(result))

In [19]:
# create_brochure("Anthropic", "https://anthropic.com")

In [20]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )

    # for chunk in stream:
    #     print(chunk.choices[0].delta.content or '', end='')
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
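        response += chunk.choices[0].delta.content or ''
        # Remove a wrapping ```markdown fence from a display copy; replacing "markdown"
        # on its own would also delete the word wherever it appears in the brochure
        cleaned = response.replace("```markdown", "").replace("```", "")
        update_display(Markdown(cleaned), display_id=display_handle.display_id)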
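Models sometimes wrap their whole answer in a fence that opens with three backticks and the word `markdown`, and that fence is what the clean-up inside the loop targets. A plain `str.replace`, however, touches every occurrence of its target anywhere in the text, so any inner code block in the brochure would lose its fences too. The stricter sketch below assumes the fence only ever wraps the entire reply and strips it from the start and end only. `clean_for_display` is a hypothetical helper, and the chunk list simulates a stream in which the fence arrives split across chunks.

```python
def clean_for_display(text):
    """Strip a fence wrapping the whole reply, leaving the text inside untouched."""
    stripped = text.strip()
    if stripped.startswith("```markdown"):
        stripped = stripped[len("```markdown"):]
    elif stripped.startswith("```"):
        stripped = stripped[3:]
    if stripped.endswith("```"):
        stripped = stripped[:-3]
    return stripped.strip()


# Simulated stream: the opening and closing fences are split across chunks
chunks = ["``", "`markdown\n# Acme\n", "We love markdown", " docs.\n``", "`"]
response = ""
for chunk in chunks:
    response += chunk                     # keep the raw text as received
    shown = clean_for_display(response)   # clean a copy on every update
print(shown)
```

Inside `stream_brochure`, the loop would then call `update_display(Markdown(clean_for_display(response)), display_id=display_handle.display_id)` and leave `response` itself unmodified.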

In [21]:
stream_brochure("Anthropic", "https://anthropic.com")

Found links: {'links': [{'type': 'about page', 'url': 'https://anthropic.com/company'}, {'type': 'careers page', 'url': 'https://anthropic.com/careers'}, {'type': 'team page', 'url': 'https://anthropic.com/team'}, {'type': 'enterprise page', 'url': 'https://anthropic.com/enterprise'}, {'type': 'api page', 'url': 'https://anthropic.com/api'}, {'type': 'pricing page', 'url': 'https://anthropic.com/pricing'}, {'type': 'research page', 'url': 'https://anthropic.com/research'}, {'type': 'news page', 'url': 'https://anthropic.com/news'}]}


# Welcome to Anthropic

**Your Partner in Safe and Reliable AI**

---

## About Us

At **Anthropic**, we are committed to advancing the field of artificial intelligence while ensuring safety and reliability. Based in **San Francisco**, we are an interdisciplinary team composed of researchers, engineers, policy experts, and operational leaders, dedicated to building AI systems that prioritize human wellbeing.

### Our Mission

Our core mission is to develop AI systems that are:
- **Reliable**: Designed to perform consistently under various conditions.
- **Interpretable**: Easy to understand and explain.
- **Steerable**: Capable of being directed towards specific tasks or goals.

We believe that AI will significantly impact the world, and we approach this transformative technology with an unwavering focus on safety.

---

## Company Culture

### Values that Drive Us

1. **Here for the Mission**: We exist to ensure that transformative AI helps society flourish, collaborating broadly to achieve this objective.
2. **Unusually High Trust**: Our environment promotes good faith, kindness, and honesty among team members.
3. **One Big Team**: Collaboration is central to our ethos; all teams work together towards shared goals.
4. **Do the Simple Thing that Works**: We embrace practical approaches, preferring empirical evidence over cleverness when addressing challenges.

### Team Collaboration

We champion a culture of teamwork, allowing individuals from various backgrounds to contribute to our common mission efficiently. Our structure encourages cross-team collaboration, ensuring that our collective expertise is leveraged in AI development.

---

## Our Products

### Claude - Your AI Assistant

Meet **Claude 3.5 Sonnet**, our intelligent AI model that assists in various tasks, enhancing productivity across teams. With Claude, organizations can:
- Collaborate better and achieve more efficient outcomes.
- Automate routine tasks and focus on creative and strategic work.
- Secure sensitive data while formulating projects and sharing knowledge.

---

## Our Customers

   We work with diverse customers spanning various sectors, such as:
   - **Businesses**: Enhance internal workflows and customer interactions.
   - **Nonprofits**: Drive impactful societal initiatives.
   - **Governments**: Implement reliable AI systems for public service enhancements.

---

## Join Our Team

### Your Career at Anthropic

At Anthropic, we offer more than just jobs; we provide opportunities to contribute to groundbreaking research and product development in AI safety. Our comprehensive benefits include:
- Health, dental, and vision insurance.
- Generous parental leave and flexible paid time off.
- Competitive salaries and equity packages.

### What We Look For

We welcome applicants from diverse backgrounds and experiences. Our hiring process is designed to be inclusive, assessing candidates based on their skills and how they align with our mission:

1. **Exploratory Chat**: Discuss your career interests with our staff.
2. **Skills Assessment**: Depending on the role, you may complete a take-home assignment or technical screening.
3. **Team Screen**: Talk with potential team members.
4. **Final Interviews**: Engage in deeper discussions, including culture-focused interviews.

---

## Why Anthropic?

We’re not just focused on building AI; we’re dedicated to ensuring it’s safe, reliable, and beneficial for humanity. Join us as we shape the future of AI with your unique perspective and expertise!

---

For more information or to view open positions, visit our website at [Anthropic.com](https://www.anthropic.com).

---

*Together, let's build a future where AI empowers individuals and society.*

In [22]:
stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'company page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'blog page', 'url': 'https://huggingface.co/blog'}, {'type': 'community page', 'url': 'https://discuss.huggingface.co'}, {'type': 'GitHub page', 'url': 'https://github.com/huggingface'}, {'type': 'LinkedIn page', 'url': 'https://www.linkedin.com/company/huggingface/'}]}



# Welcome to Hugging Face

## About Us
Hugging Face is a pioneering AI and machine learning (ML) community dedicated to building the future of intelligent solutions through collaborative efforts. Our mission is to democratize good machine learning one commit at a time. We create tools that simplify the process of developing, sharing, and utilizing machine learning models, data, and applications.

## Our Services
- **Models**: Access over **400,000 models** tailored for various applications including text, image, video, and audio processing.
- **Datasets**: Explore more than **100,000 datasets** that facilitate training and evaluation of machine learning algorithms.
- **Spaces**: Create and host applications that leverage our cutting-edge tools and models using our collaboration platform.
- **Enterprise Solutions**: We offer enterprise-grade features with advanced security, dedicated support, and pricing starting from **$20/user/month**.

## Community
At Hugging Face, we are proud to support a vibrant community of over **50,000 organizations** utilizing our resources, including prominent names like Meta, Amazon Web Services, Google, and Microsoft. Our platform encourages users to contribute, engage, and collaborate on projects that enhance the field of AI.

## Company Culture
We believe in fostering an open, inclusive environment that nurtures innovation and creativity. Our team comprises diverse talents and perspectives, all working towards a common goal of advancing machine learning technologies safely and responsibly. Our culture promotes continuous learning and collaboration, encouraging members to share insights and expertise.

## Careers at Hugging Face
We are always looking for talented individuals eager to contribute to the field of AI and machine learning. If you are passionate about open-source contributions and pushing the boundaries of technology, consider joining our team. Check our current job openings on our careers page and help us in shaping the future of AI!

## Connect With Us
Join our community discussions and stay updated with the latest advancements through our blogs, forums, and social media channels.
- [GitHub](https://github.com/huggingface)
- [Twitter](https://twitter.com/huggingface)
- [LinkedIn](https://www.linkedin.com/company/hugging-face/)
- [Discord](https://discord.gg/huggingface)

### Contact Us
For press inquiries or general information, feel free to reach out to our team. Explore, collaborate, and innovate with us at Hugging Face as we build the future together!

--- 
**Hugging Face**: The AI community building the future.



In [23]:
# Try changing the system prompt to the humorous version when you make the brochure for Hugging Face:

system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

stream_brochure("HuggingFace", "https://huggingface.co")

Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/huggingface'}, {'type': 'careers page', 'url': 'https://apply.workable.com/huggingface/'}, {'type': 'blog', 'url': 'https://huggingface.co/blog'}, {'type': 'enterprise page', 'url': 'https://huggingface.co/enterprise'}, {'type': 'pricing page', 'url': 'https://huggingface.co/pricing'}, {'type': 'documentation page', 'url': 'https://huggingface.co/docs'}]}



# Welcome to Hugging Face: The Fuzzy Frontier of AI!

🎈 **Join the AI Community Building the Future, One Commit at a Time!** 🎈

### Who Are We?

At Hugging Face, we’re not just a tech company; we're the warm, cuddly embrace of AI innovation. Whether it's machine learning models, datasets, or simply spreading joy, we have something for everyone (including your cat—sorry, no actual cat models yet)!

### What We Do

- **Models Galore:** With over **400k models** at your fingertips, you could model just about anything—like the number of times you’ve said "just one more episode" when watching your favorite show. 
- **Dataset Heaven:** Need data? We have a staggering **100k datasets**! Whether it's blue-sky posts or linguistic datasets to optimize your cat meme strategy.
- **Spaces for Everyone:** Create and showcase your models in our lush **Spaces**. We encourage creativity, unless your creativity involves untrained AI plotting world domination.
  
### Join the Fun!

- **Become a Pro:** Join our club for just **$9/month**! You'll unlock advanced features and even get a shiny Pro badge. Because let’s be honest, we all look good in badges. 
- **Enterprise Solutions:** Big teams? No problem! Get enterprise-grade security, priority support, and a touch of pedagogy (yes, vocabulary is our future)! All starting at **$20/user/month**. 

### Meet Our Community 

With over **50,000 organizations** including titans like Google, Amazon Web Services, and even your neighbor's startup, we’re transforming industries. Our community's so cool that even the cool kids are joining in!

### Careers at Hugging Face

Looking for a job where you can hug robots all day? You’re in luck! Join our delightful team of 224 fuzzy innovators. Experience a culture where memes and ML thrive (your chance to convince everyone that your cat’s face is a valid experimental model!). Curious? Check out our current openings on our careers page.

### So Why HUG Us?

1. **Collaboration** feels cozy; we host unlimited models, datasets, and a delightful community.
2. **Learning together** (and maybe stumbling through mistakes) makes us stronger—like yoga for your brain!
3. **Be part of something BIG:** We’re on a mission to democratize good machine learning. It's not just about the tech; it’s about making the world a friendlier place.

### In Closing

As we continue our mission at Hugging Face, remember: in a world full of chaos, become someone’s comforting algorithm. Join us on this journey—where every day is a chance to innovate and every model is a hug waiting to happen! 

**Follow us for updates** because who wouldn’t want more AI puns in their lives? 

💌 **Hugging Face—The Snuggly Space Where AI Dreams Come True!** 💌

Just remember: at Hugging Face, we might not be able to predict the future, but we sure can make it a lot cozier! Enjoy this journey into the world of AI with us!