# AI Brochure Generator
### Business Challenge:
Create a product that builds a brochure about a company for prospective clients, investors, and potential recruits.

We will be provided with the company name and its primary website.

In [38]:
import os 
import requests 
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_core.messages import SystemMessage, HumanMessage

In [10]:
load_dotenv()
google_api_key = os.getenv('GEMINI_API_KEY')

model = "gemini-1.5-flash"
gemini = ChatGoogleGenerativeAI(
    model = model,
    temperature=0.4,
    google_api_key=google_api_key
)


In [21]:
# A class to represent a webpage
class Website:
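    """
    A utility class to represent a website that we have scraped, including its links
    """
    url: str
    title: str
    body: bytes
    links: List[str]
    text: str

    def __init__(self, url):
        self.url = url
        # A browser-like User-Agent helps with sites that reject the default requests one;
        # the timeout keeps a single slow page from hanging the notebook
        response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=15)
        self.body = response.content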
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No Title Found"

        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)

        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title} \nWebpage Contents:\n{self.text}\n\n"
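To see what `Website` extracts without a network call or BeautifulSoup, the same idea can be sketched with the standard library's `html.parser`. This is a swapped-in, dependency-free approximation, not the notebook's code: `PageExtractor` is a hypothetical name, it collects text from the whole document rather than only `<body>`, and it skips `<script>`/`<style>` content much as `Website` decomposes those tags.

```python
from html.parser import HTMLParser


class PageExtractor(HTMLParser):
    """Collect the <title>, visible text, and <a href> values from raw HTML.

    Hypothetical stand-in for the BeautifulSoup logic in Website.
    """

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:  # like Website, drop <a> tags without an href
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # ignore whitespace and script/style contents
        if self._in_title:
            self.title += text
        else:
            self._chunks.append(text)

    @property
    def text(self):
        # Roughly what get_text(separator="\n", strip=True) produces
        return "\n".join(self._chunks)


html = """<html><head><title>Acme Corp</title><style>p { color: red; }</style></head>
<body><p>We build rockets.</p><script>var x = 1;</script>
<a href="/about">About us</a><a name="top"></a></body></html>"""

page = PageExtractor()
page.feed(html)
page.close()
print(page.title)       # Acme Corp
print(repr(page.text))  # 'We build rockets.\nAbout us'
print(page.links)       # ['/about']
```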

In [41]:
web = Website("https://www.geeksforgeeks.org/machine-learning/machine-learning/")
print(web.get_contents())

Webpage Title:
Machine Learning Tutorial - GeeksforGeeks 
Webpage Contents:
Skip to content
Courses
DSA / Placements
GATE 2026 Prep
ML & Data Science
Development
Cloud / DevOps
Programming Languages
All Courses
Tutorials
Python
Java
DSA
ML & Data Science
Interview Corner
Programming Languages
Web Development
GATE
CS Subjects
DevOps
School Learning
Software and Tools
Practice
Practice Coding Problems
Nation Skillup- Free Courses
Problem of the Day
Jobs
Become a Mentor
Apply Now!
Post Jobs
Job-A-Thon: Hiring Challenge
Jobs Updates
Notifications
Mark all as read
All
View All
Notifications
Mark all as read
All
Unread
Read
You're all caught up!!
Python for Machine Learning
Machine Learning with R
Machine Learning Algorithms
EDA
Math for Machine Learning
Machine Learning Interview Questions
ML Projects
Deep Learning
NLP
Computer vision
Data Science
Artificial Intelligence
Sign In
▲
Open In App
Share Your Experiences
Machine Learning Basics
Introduction to Machine Learning
Types of Machine Le

In [39]:
# Getting all the links from webpage
web.links[:10]

['#main',
 'https://www.geeksforgeeks.org/',
 'https://www.geeksforgeeks.org/courses/category/dsa-placements',
 'https://www.geeksforgeeks.org/courses/category/gate/',
 'https://www.geeksforgeeks.org/courses/category/machine-learning-data-science',
 'https://www.geeksforgeeks.org/courses/category/development-testing',
 'https://www.geeksforgeeks.org/courses/category/cloud-devops',
 'https://www.geeksforgeeks.org/courses/category/programming-languages',
 'https://www.geeksforgeeks.org/courses',
 'https://www.geeksforgeeks.org/python/python-programming-language-tutorial/']
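The raw list mixes fragment-only anchors such as `'#main'` with absolute URLs, and on other sites it may also contain relative paths or `mailto:` links. One way to clean it before prompting is sketched below with `urllib.parse`; `normalize_links` is a hypothetical helper that the notebook does not use, and keeping only `http`/`https` links is an assumption.

```python
from urllib.parse import urldefrag, urljoin, urlparse


def normalize_links(base_url, hrefs):
    """Resolve hrefs against base_url and return unique absolute http(s) URLs.

    Hypothetical helper: drops in-page anchors like '#main', non-web schemes
    such as mailto: or javascript:, and #fragments, keeping first-seen order.
    """
    seen = set()
    cleaned = []
    for href in hrefs:
        href = href.strip()
        if not href or href.startswith("#"):
            continue  # fragment-only links point back at the same page
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # mailto:, tel:, javascript:, ...
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned


links = normalize_links(
    "https://www.example.com/company/",
    ["#main", "/about", "team", "mailto:hr@example.com",
     "https://www.example.com/about#history", "javascript:void(0)"],
)
print(links)  # ['https://www.example.com/about', 'https://www.example.com/company/team']
```

Absolute URLs pass through `urljoin` unchanged, so the same call works whether the page uses relative or absolute hrefs.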

### Step 1: Figure out which links are relevant

Now we'll use the Gemini model to read the links on a webpage, pick the ones relevant to a brochure, and respond in structured JSON format.


In [42]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, a Company page, or Careers/Jobs pages.\n"

link_system_prompt += "You should respond in JSON as in this example:"

link_system_prompt += """
{
    "links" : [
        {"url_type" : "about page", "url" : "https://full.url/here/about"},
        {"url_type" : "careers page", "url" : "https://another.full.url/here/careers"}
    ]
}
"""

In [43]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links" : [
        {"url_type" : "about page", "url" : "https://full.url/here/about"},
        {"url_type" : "careers page", "url" : "https://another.full.url/here/careers"}
    ]
}
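The prompt asks for JSON, but nothing in the prompt itself enforces it. Without structured output, a plain-text reply would have to be parsed and checked by hand. A minimal sketch of that check using only the standard library; `parse_links_reply` is a hypothetical helper, and it assumes the `url_type` / `url` field names that `LinksSchema` uses further down.

```python
import json


def parse_links_reply(reply):
    """Parse a JSON reply and return its links as (url_type, url) tuples.

    Hypothetical helper: raises ValueError (json.JSONDecodeError is a
    subclass) when the reply is not valid JSON or lacks the expected fields.
    """
    data = json.loads(reply)
    links = data.get("links") if isinstance(data, dict) else None
    if not isinstance(links, list):
        raise ValueError("expected an object with a 'links' list")
    pairs = []
    for item in links:
        if not isinstance(item, dict) or "url_type" not in item or "url" not in item:
            raise ValueError(f"malformed link entry: {item!r}")
        pairs.append((item["url_type"], item["url"]))
    return pairs


good = '{"links": [{"url_type": "about page", "url": "https://example.com/about"}]}'
print(parse_links_reply(good))  # [('about page', 'https://example.com/about')]

# Two objects with no comma between them are rejected by json.loads
bad = '{"links": [{"url_type": "a", "url": "x"} {"url_type": "b", "url": "y"}]}'
try:
    parse_links_reply(bad)
except ValueError as error:
    print("Rejected:", error)
```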



In [35]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += """please decide which of these are relevant web links for a brochure about the company, and respond with the full URL.
Do not include Terms of Service, Privacy, or email links.\n"""
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)

    return user_prompt

In [36]:
print(get_links_user_prompt(web))

Here is the list of links on the website of https://www.geeksforgeeks.org/machine-learning/machine-learning/ - please decide which of these are relevant web links for a brochure about the company, and respond with the full URL.
Do not include Terms of Service, Privacy, or email links.
Links (some might be relative links):
#main
https://www.geeksforgeeks.org/
https://www.geeksforgeeks.org/courses/category/dsa-placements
https://www.geeksforgeeks.org/courses/category/gate/
https://www.geeksforgeeks.org/courses/category/machine-learning-data-science
https://www.geeksforgeeks.org/courses/category/development-testing
https://www.geeksforgeeks.org/courses/category/cloud-devops
https://www.geeksforgeeks.org/courses/category/programming-languages
https://www.geeksforgeeks.org/courses
https://www.geeksforgeeks.org/python/python-programming-language-tutorial/
https://www.geeksforgeeks.org/java/java/
https://www.geeksforgeeks.org/learn-data-structures-and-algorithms-dsa-tutorial/
https://www.geeksf
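The prompt asks the model to ignore Terms of Service, Privacy, and email links. Some of that filtering can also happen in plain Python before the request, which shortens the prompt. A minimal sketch, assuming a hand-picked keyword list; `prefilter_links` and `EXCLUDED_KEYWORDS` are hypothetical names, not part of the notebook.

```python
# Assumed keyword list; tune it per site
EXCLUDED_KEYWORDS = ("terms", "privacy", "cookie", "legal", "login", "signin")


def prefilter_links(links, excluded=EXCLUDED_KEYWORDS):
    """Drop email links and URLs containing an excluded keyword (case-insensitive).

    Hypothetical helper meant to run before the links go into the user prompt.
    """
    kept = []
    for link in links:
        lowered = link.lower()
        if lowered.startswith("mailto:"):
            continue
        if any(word in lowered for word in excluded):
            continue
        kept.append(link)
    return kept


print(prefilter_links([
    "https://www.example.com/about",
    "https://www.example.com/terms-of-service",
    "mailto:info@example.com",
    "https://www.example.com/Privacy",
    "https://www.example.com/careers",
]))
# ['https://www.example.com/about', 'https://www.example.com/careers']
```

Substring matching is crude (a blog post whose slug contains "privacy" would also be dropped), so the model should remain the final judge of relevance.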

In [75]:
from pydantic import BaseModel

# Schema
class Link(BaseModel):
    url_type: str
    url: str

class LinksSchema(BaseModel):
    links: List[Link]

# Structured LLM
structured_llm = gemini.with_structured_output(LinksSchema)

# # Making the LLM respond in a structured format
# structured_llm = gemini.with_structured_output({
#     "links" : [{
#         "url_type" : "string",
#         "url" : "string"
#     }]
#   })

In [76]:
def get_links(url):
    website = Website(url)

    messages = [
        SystemMessage(content = link_system_prompt),
        HumanMessage(content = get_links_user_prompt(website))
    ]

    result = structured_llm.invoke(messages)
    json_result = json.dumps(result.model_dump(), indent=2)  
    
    return json_result

In [81]:
anthropic = Website("https://www.anthropic.com/")
anthropic.links

['#main',
 '#footer',
 'https://www.anthropic.com/',
 'https://www.anthropic.com/claude',
 'https://www.anthropic.com/claude-code',
 'https://www.anthropic.com/max',
 'https://www.anthropic.com/team',
 'https://www.anthropic.com/enterprise',
 'https://www.anthropic.com/pricing',
 'https://claude.ai/download',
 'https://claude.ai/',
 'https://www.anthropic.com/news/claude-character',
 'https://www.anthropic.com/api',
 'https://docs.anthropic.com/',
 'https://www.anthropic.com/pricing#api',
 'https://console.anthropic.com/',
 'https://docs.anthropic.com/en/docs/welcome',
 'https://www.anthropic.com/solutions/agents',
 'https://www.anthropic.com/solutions/code-modernization',
 'https://www.anthropic.com/solutions/coding',
 'https://www.anthropic.com/solutions/customer-support',
 'https://www.anthropic.com/solutions/education',
 'https://www.anthropic.com/solutions/financial-services',
 'https://www.anthropic.com/solutions/government',
 'https://www.anthropic.com/customers',
 'https://www.

In [77]:
print(get_links("https://www.anthropic.com/"))

{
  "links": [
    {
      "url_type": "company page",
      "url": "https://www.anthropic.com/company"
    },
    {
      "url_type": "team page",
      "url": "https://www.anthropic.com/team"
    },
    {
      "url_type": "careers page",
      "url": "https://www.anthropic.com/careers"
    },
    {
      "url_type": "research page",
      "url": "https://www.anthropic.com/research"
    },
    {
      "url_type": "customers page",
      "url": "https://www.anthropic.com/customers"
    },
    {
      "url_type": "solutions page",
      "url": "https://www.anthropic.com/solutions"
    },
    {
      "url_type": "pricing page",
      "url": "https://www.anthropic.com/pricing"
    },
    {
      "url_type": "claude page",
      "url": "https://www.anthropic.com/claude"
    },
    {
      "url_type": "max page",
      "url": "https://www.anthropic.com/max"
    },
    {
      "url_type": "enterprise page",
      "url": "https://www.anthropic.com/enterprise"
    },
    {
      "url_type": "

### Step 2: Make the brochure

In [89]:
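from urllib.parse import urljoin

def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    # Get the relevant links as a JSON string
    links = get_links(url)
    # Convert the links from a JSON string to a dict
    links = json.loads(links)

    print("Found Links: ", links)

    for link in links["links"]:
        # Resolve relative URLs against the landing page (absolute URLs pass through unchanged)
        page_url = urljoin(url, link['url'])
        try:
            page = Website(page_url)
        except requests.RequestException as error:
            # Skip pages that fail to load instead of aborting the whole brochure
            print(f"Skipping {page_url}: {error}")
            continue
        result += f"\n\n{link['url_type']}:\n"
        result += page.get_contents()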
    return result
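`get_all_details` fetches each page one after another, so the total time grows with the number of links. Because the work is network-bound, threads can overlap the waits. A minimal sketch with `concurrent.futures`; `fetch_all` is a hypothetical helper, and `fetch` is any callable such as `lambda u: Website(u).get_contents()`. It keeps the order of `urls` and skips pages whose fetch raises.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(urls, fetch, max_workers=8):
    """Run fetch(url) for each url on a thread pool.

    Returns (url, result) pairs in the same order as urls. Any exception
    raised by fetch, or a None result, drops that url instead of aborting
    the whole run. Hypothetical helper, not part of the notebook.
    """

    def safe_fetch(url):
        try:
            return url, fetch(url)
        except Exception as error:  # broad on purpose: one bad page should not stop the rest
            print(f"Skipping {url}: {error}")
            return url, None

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(safe_fetch, urls))  # map preserves input order
    return [(url, body) for url, body in results if body is not None]


# Offline demo with a stand-in fetch function
def fake_fetch(url):
    if "missing" in url:
        raise ConnectionError("no such page")
    return f"contents of {url}"


print(fetch_all(["https://example.com/about", "https://example.com/missing"], fake_fetch))
```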

In [88]:
print(get_all_details("https://www.anthropic.com/"))

Found Links:  {'links': [{'url_type': 'company page', 'url': 'https://www.anthropic.com/company'}, {'url_type': 'team page', 'url': 'https://www.anthropic.com/team'}, {'url_type': 'careers page', 'url': 'https://www.anthropic.com/careers'}, {'url_type': 'research page', 'url': 'https://www.anthropic.com/research'}, {'url_type': 'customers page', 'url': 'https://www.anthropic.com/customers'}, {'url_type': 'solutions page', 'url': 'https://www.anthropic.com/solutions'}, {'url_type': 'claude page', 'url': 'https://www.anthropic.com/claude'}, {'url_type': 'max page', 'url': 'https://www.anthropic.com/max'}, {'url_type': 'pricing page', 'url': 'https://www.anthropic.com/pricing'}, {'url_type': 'api page', 'url': 'https://www.anthropic.com/api'}, {'url_type': 'events page', 'url': 'https://www.anthropic.com/events'}, {'url_type': 'news page', 'url': 'https://www.anthropic.com/news'}]}
Landing page:
Webpage Title:
Home \ Anthropic 
Webpage Contents:
Skip to main content
Skip to footer
Claude


At this point we have the scraped content of the landing page and all the relevant linked pages. Next, we combine this data and summarize it with the LLM to produce the brochure.

In [101]:
system_prompt = """You are an assistant that analyzes the contents of several relevant pages from a company website
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown.
Include details of company culture, customers and careers/jobs if you have the information."""


In [102]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    # Cap the prompt at 20,000 characters to limit token usage
    user_prompt = user_prompt[:20000]
    return user_prompt 
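Truncating the combined prompt to 20,000 characters keeps whatever comes first, so a long landing page can push the careers or about pages out entirely. An alternative is to give each page an equal share of the budget. A minimal sketch; `combine_with_budget` is a hypothetical helper that takes `(heading, text)` pairs, and the equal-split policy is an assumption (space left unused by short pages is not redistributed).

```python
def combine_with_budget(sections, max_chars=20_000):
    """Join (heading, text) pairs, giving each section an equal slice of max_chars.

    Hypothetical alternative to truncating the finished prompt: a long first
    page can no longer crowd out the pages after it. Unused space from short
    sections is not redistributed (a deliberate simplification).
    """
    if not sections:
        return ""
    share = max_chars // len(sections)
    parts = []
    for heading, text in sections:
        block = f"{heading}:\n{text}\n\n"
        parts.append(block[:share])  # each block is cut to its own share
    return "".join(parts)


combined = combine_with_budget(
    [("Landing page", "x" * 50_000), ("careers page", "We are hiring engineers.")],
    max_chars=20_000,
)
print("We are hiring engineers." in combined)  # True
```

With a plain `[:20000]` slice on the same input, the result would contain only the landing page's filler text.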

In [None]:
get_brochure_user_prompt("Anthropic", "https://www.anthropic.com/")

In [108]:
def create_brochure(company_name, url):
    messages = [
        SystemMessage(content = system_prompt),
        HumanMessage(content = get_brochure_user_prompt(company_name, url))
    ]

    result = gemini.invoke(messages)
    display(Markdown(result.content))

In [109]:
create_brochure("Anthropic", "https://www.anthropic.com/")

Found Links:  {'links': [{'url_type': 'company page', 'url': 'https://www.anthropic.com/company'}, {'url_type': 'team page', 'url': 'https://www.anthropic.com/team'}, {'url_type': 'careers page', 'url': 'https://www.anthropic.com/careers'}, {'url_type': 'research page', 'url': 'https://www.anthropic.com/research'}, {'url_type': 'customers page', 'url': 'https://www.anthropic.com/customers'}, {'url_type': 'solutions page', 'url': 'https://www.anthropic.com/solutions'}, {'url_type': 'pricing page', 'url': 'https://www.anthropic.com/pricing'}, {'url_type': 'claude page', 'url': 'https://www.anthropic.com/claude'}, {'url_type': 'max page', 'url': 'https://www.anthropic.com/max'}, {'url_type': 'claude-code page', 'url': 'https://www.anthropic.com/claude-code'}, {'url_type': 'news page', 'url': 'https://www.anthropic.com/news'}, {'url_type': 'events page', 'url': 'https://www.anthropic.com/events'}]}


# Anthropic: AI for Good

**A Brochure for Customers, Investors, and Recruits**

##  About Anthropic

Anthropic is a leading AI safety and research company dedicated to building reliable, interpretable, and steerable AI systems.  As a public benefit corporation, our mission is to ensure the long-term benefits of AI while mitigating its risks. We achieve this through cutting-edge research, responsible product development, and collaborative partnerships across industry, government, and academia.

**Our Core Values:**

*   **Global Good:** Maximizing positive outcomes for humanity.
*   **Balanced Perspective:** Acknowledging both the potential benefits and risks of AI.
*   **User-Centricity:** Prioritizing the needs of our customers and all stakeholders.
*   **Safety Leadership:** Driving the industry towards higher safety standards.
*   **Efficiency and Impact:** Focusing on practical solutions with significant impact.
*   **Transparency and Collaboration:** Open communication and collaboration with partners.
*   **Mission-Driven:** Prioritizing our mission above all else.

##  Our Products: Claude

Anthropic's flagship product, Claude, is a powerful and versatile AI assistant available through various plans (Max, Team, Enterprise) to suit different needs.  Claude offers:

*   **Enhanced Productivity:**  Cut project timelines and boost team efficiency.
*   **Expert-Level Capabilities:** Empower every team member with access to institutional knowledge.
*   **Scalable Solutions:** Adapt to growing team needs and complex challenges.
*   **Customizable Experience:** Tailor Claude to your brand voice and specific requirements.
*   **Seamless Integrations:** Connect Claude with existing tools like Jira, Confluence, and Intercom.


##  Our Customers

Anthropic serves a diverse range of customers, including:

*   Businesses seeking to improve productivity and efficiency.
*   Non-profits leveraging AI for social good.
*   Government agencies utilizing AI for public services.
*   Educational institutions integrating AI into learning environments.
*   Financial services companies enhancing customer support and risk management.


##  Careers at Anthropic

We are a collaborative team of researchers, engineers, policy experts, business leaders, and operators from diverse backgrounds.  We value:

*   **Innovation and Creativity:**  We encourage bold ideas and experimentation.
*   **Collaboration and Teamwork:**  We work together to achieve our shared goals.
*   **Impact and Purpose:**  We are driven by our mission to build safe and beneficial AI.
*   **Diversity and Inclusion:** We foster a welcoming and inclusive environment.

**Open Roles:** Visit our careers page to explore current opportunities.  We offer competitive salaries, comprehensive benefits, and a stimulating work environment.


##  Investment Opportunity

Anthropic is committed to responsible AI development and offers a unique investment opportunity in a rapidly growing sector. Our focus on safety and ethical considerations positions us for long-term success and positive societal impact.  Contact us to learn more about investment opportunities.


##  Contact Us

For inquiries, please visit our website or contact us directly.

### Streaming: results appear with a typewriter effect

In [119]:
def stream_brochure(company_name, url):
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=get_brochure_user_prompt(company_name, url))
    ]
    response_text = ""
    display_handle = display(Markdown(""), display_id = True)

    # Stream tokens instead of waiting for the full response
    for chunk in gemini.stream(messages):
        if chunk.content:  # skip empty chunks; each chunk carries the next piece of the reply
            response_text += chunk.content
            # Strip the code fences the model sometimes wraps around its markdown
            clean_text = response_text.replace("```markdown", "").replace("```", "")
            update_display(Markdown(clean_text), display_id=display_handle.display_id)
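The cleanup line in `stream_brochure` removes every triple-backtick sequence in the reply, including any code block the brochure might legitimately contain. A narrower option strips only a fence wrapped around the whole reply, and tolerates the closing fence not having arrived yet mid-stream. A minimal sketch with `re`; `strip_outer_fence` is a hypothetical helper, not part of the notebook.

```python
import re

# An opening fence (```markdown, ```md or a bare ```) at the very start of the reply
_OPEN_FENCE = re.compile(r"\A\s*```(?:markdown|md)?[ \t]*\n")
# A closing fence at the very end of the reply
_CLOSE_FENCE = re.compile(r"\n?```\s*\Z")


def strip_outer_fence(text):
    """Remove a code fence wrapped around the whole reply, if there is one.

    Works on partial streamed text too: the closing fence is only removed
    once it has arrived. Fences inside the body are left alone.
    """
    stripped, count = _OPEN_FENCE.subn("", text, count=1)
    if count == 0:
        return text  # the reply was not wrapped in a fence
    return _CLOSE_FENCE.sub("", stripped, count=1)


print(repr(strip_outer_fence("```markdown\n# Acme\nBody text\n```")))  # '# Acme\nBody text'
print(repr(strip_outer_fence("```markdown\n# Acme\nPartial te")))      # '# Acme\nPartial te'
```

Inside the streaming loop, the cleanup line would then become `clean_text = strip_outer_fence(response_text)`.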

In [120]:
stream_brochure("Anthropic", "https://www.anthropic.com/")

Found Links:  {'links': [{'url_type': 'company page', 'url': 'https://www.anthropic.com/company'}, {'url_type': 'team page', 'url': 'https://www.anthropic.com/team'}, {'url_type': 'careers page', 'url': 'https://www.anthropic.com/careers'}, {'url_type': 'research page', 'url': 'https://www.anthropic.com/research'}, {'url_type': 'customers page', 'url': 'https://www.anthropic.com/customers'}, {'url_type': 'solutions page', 'url': 'https://www.anthropic.com/solutions'}, {'url_type': 'pricing page', 'url': 'https://www.anthropic.com/pricing'}, {'url_type': 'claude page', 'url': 'https://www.anthropic.com/claude'}, {'url_type': 'max page', 'url': 'https://www.anthropic.com/max'}, {'url_type': 'news page', 'url': 'https://www.anthropic.com/news'}]}


# Anthropic: AI for Good

**A Brochure for Customers, Investors, and Recruits**

##  About Anthropic

Anthropic is a public benefit corporation dedicated to responsibly developing and deploying advanced AI systems. We believe AI will have a vast impact on the world, and we are committed to ensuring its benefits are maximized while mitigating its risks.  Our core focus is on building reliable, interpretable, and steerable AI systems, prioritizing safety at every stage of development.  We achieve this through cutting-edge research, rigorous testing, and collaborative partnerships.  Our flagship product, Claude, offers powerful AI capabilities for various applications.

##  For Customers

**Claude:** Our AI assistant, Claude, is available in several plans (Max, Team, Enterprise) to suit different needs and budgets. Claude excels at:

* **Enhanced Productivity:** Cut project timelines and boost team efficiency.
* **Complex Problem Solving:** Tackle challenges beyond typical expertise.
* **Knowledge Scaling:** Access and leverage institutional knowledge across your organization.
* **Customizable Interactions:** Tailor Claude's responses to match your brand voice.
* **Integration Capabilities:** Seamlessly connect with existing tools like Jira, Confluence, and Intercom.

**Solutions:** Claude is applicable across various sectors, including:

* Customer Support
* Education
* Financial Services
* Government
* Code Modernization

Explore our case studies to see how Claude is helping businesses and organizations achieve their goals.


## For Investors

Anthropic is at the forefront of AI safety research and development. We are building a sustainable business model based on the deployment of our safe and reliable AI systems.  Our commitment to transparency and responsible scaling ensures long-term value creation.  Key highlights include:

* **Frontier AI Research:**  We are pushing the boundaries of AI safety and interpretability.
* **Robust Product Portfolio:**  Our Claude suite provides scalable and adaptable solutions for a wide range of clients.
* **Public Benefit Corporation Status:**  Our commitment to societal benefit is at the core of our mission.
* **Strong Leadership Team:**  We are led by experienced researchers and entrepreneurs.
* **ISO 42001 Certification:** Demonstrates our commitment to quality management systems.


## For Recruits

Anthropic offers a unique and rewarding work environment for individuals passionate about building a safer and more beneficial future with AI. We value:

* **Collaboration:** We work as a diverse and interdisciplinary team.
* **Impact:** We are making a tangible difference in the world.
* **Safety First:** Our commitment to safety guides every decision.
* **Innovation:** We encourage bold ideas and experimentation.
* **Transparency:** Open communication and knowledge sharing are key.

We are always looking for talented individuals in various roles, including:
* Researchers
* Engineers
* Policy Experts
* Business Leaders
* Operations


Join us in building the future of safe and reliable AI.  Visit our careers page to explore open positions.


**Contact Us:**

[Website Link]
[Email Address]
[Phone Number]