# **Testing a brochure generator built with llama3.2**

In [1]:
# imports

import os
import requests
from requests.exceptions import RequestException
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
import ollama
import re

In [2]:
# pulling llama3.2 model

!ollama pull llama3.2

pulling manifest
pulling dde5aa3fc5ff: 100% ▕██████████████████▏ 2.0 GB
pulling 966de9

In [3]:
MODEL = 'llama3.2'

In [4]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        try:
            response = requests.get(url, headers=headers, timeout=10)
            self.body = response.content
            soup = BeautifulSoup(self.body, 'html.parser')
            self.title = soup.title.string if soup.title else "No title found"
            if soup.body:
                for irrelevant in soup.body(["script", "style", "img", "input"]):
                    irrelevant.decompose()
                self.text = soup.body.get_text(separator="\n", strip=True)
            else:
                self.text = ""
            links = [link.get('href') for link in soup.find_all('a')]
            self.links = [link for link in links if link]
        except RequestException as e:
            self.body = ""
            self.title = f"Failed to fetch: {url}"
            self.text = f"Error: {e}"
            self.links = []

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"
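
The link list scraped by `Website` can contain repeats (the same post URL shows up twice in the output further down) as well as non-page links. A minimal sketch of a cleanup step that could run before the links are sent to the model; `clean_links` and `SKIPPED_PREFIXES` are hypothetical names, not part of the notebook:

```python
# Hypothetical helper: drops empty entries, non-page schemes, and repeats
# while keeping the order in which links first appeared.
SKIPPED_PREFIXES = ("mailto:", "tel:", "javascript:", "#")

def clean_links(links):
    seen = {}
    for link in links:
        if not link:
            continue  # skip None and empty strings
        link = link.strip()
        if not link or link.lower().startswith(SKIPPED_PREFIXES):
            continue  # skip whitespace-only entries and non-page links
        seen.setdefault(link, None)  # dict keys preserve insertion order
    return list(seen)

sample = [
    "https://edwarddonner.com/",
    "https://edwarddonner.com/posts/",
    "https://edwarddonner.com/",
    "mailto:someone@example.com",
]
print(clean_links(sample))
```

Filtering before the prompt keeps the user message shorter, which matters for a small local model.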

In [5]:
link_system_prompt = """
You are provided with a list of links found on a webpage. 
You are able to decide which of the links would be most relevant to include in a brochure about the company, 
such as links to an About page, or a Company page, or Careers/Jobs pages.
Respond in JSON format like:
{
    "links": [
        {"type": "about page", "url": "https://example.com/about"},
        {"type": "careers page", "url": "https://example.com/careers"}
    ]
}
""".strip()


In [6]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. "
    user_prompt += "Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt


In [7]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',
 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/12/21/llm-

In [8]:
print(ed.get_contents())

Webpage Title:
Home - Edward Donner
Webpage Contents:
Home
Connect Four
Outsmart
An arena that pits LLMs against each other in a battle of diplomacy and deviousness
About
Posts
Well, hi there.
I’m Ed. I like writing code and experimenting with LLMs, and hopefully you’re here because you do too. I also enjoy DJing (but I’m badly out of practice), amateur electronic music production (
very
amateur) and losing myself in
Hacker News
, nodding my head sagely to things I only half understand.
I’m the co-founder and CTO of
Nebula.io
. We’re applying AI to a field where it can make a massive, positive impact: helping people discover their potential and pursue their reason for being. Recruiters use our product today to source, understand, engage and manage talent. I’m previously the founder and CEO of AI startup untapt,
acquired in 2021
.
We work with groundbreaking, proprietary LLMs verticalized for talent, we’ve
patented
our matching model, and our award-winning platform has happy customers a

## **1. Have llama3.2 figure out which links are relevant**

### Use a call to llama3.2 to read the links on a webpage, and respond in structured JSON.  
It should decide which links are relevant, and replace relative links such as "/about" with "https://company.com/about".  
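
Resolving relative links does not have to be left to the model. As a sketch of a deterministic alternative, the standard library's `urljoin` can turn relative links into absolute ones before or after the LLM call; `absolutize` is a hypothetical helper name:

```python
from urllib.parse import urljoin

def absolutize(base_url, links):
    """Resolve each link against base_url; absolute links pass through unchanged."""
    return [urljoin(base_url, link) for link in links]

print(absolutize("https://company.com/", ["/about", "team", "https://other.org/x"]))
```

Doing this in code leaves the model with only the relevance decision, which is the part that actually needs judgment.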

In [10]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. 
You are able to decide which of the links would be most relevant to include in a brochure about the company, 
such as links to an About page, or a Company page, or Careers/Jobs pages.
Respond in JSON format like:
{
    "links": [
        {"type": "about page", "url": "https://example.com/about"},
        {"type": "careers page", "url": "https://example.com/careers"}
    ]
}


In [12]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://edwarddonner.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
https://edwarddonner.com/
https://edwarddonner.com/connect-four/
https://edwarddonner.com/outsmart/
https://edwarddonner.com/about-me-and-about-nebula/
https://edwarddonner.com/posts/
https://edwarddonner.com/
https://news.ycombinator.com
https://nebula.io/?utm_source=ed&utm_medium=referral
https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html
https://patents.google.com/patent/US20210049536A1/
https://www.linkedin.com/in/eddonner/
https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/
https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/
https://edwarddonner.com/2025/01/23/ll

In [13]:
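def chat_with_llama(system_prompt: str, user_prompt: str, model=MODEL):
    """Send one non-streaming chat request to the local Ollama server and return the reply text."""
    url = 'http://localhost:11434/api/chat'
    data = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            # Few-shot examples for link selection. They are sent on every call,
            # including the brochure call later in this notebook.
            {"role": "user", "content": "Links: /about /contact"},
            {"role": "assistant", "content": '{"links": [{"type": "about page", "url": "https://example.com/about"}]}'},
            {"role": "user", "content": "Links: /terms /privacy"},
            {"role": "assistant", "content": '{"links": []}'},
            {"role": "user", "content": user_prompt}
        ],
        "stream": False
    }
    response = requests.post(url, json=data)
    response.raise_for_status()  # surface HTTP errors instead of a confusing KeyError below
    return response.json()["message"]["content"]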
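
The link-selection call expects JSON back, and to my understanding the Ollama chat endpoint accepts an optional `"format": "json"` field that constrains the reply to valid JSON. The sketch below only builds the request body, so it can be inspected without a running server; `build_chat_payload` and `json_mode` are hypothetical names introduced for illustration:

```python
def build_chat_payload(system_prompt, user_prompt, model="llama3.2", json_mode=False):
    """Build a request body for a non-streaming chat call (sketch, not the notebook's function)."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "stream": False,
    }
    if json_mode:
        # Assumption: the server honors "format": "json" for this model.
        payload["format"] = "json"
    return payload

links_payload = build_chat_payload("Pick relevant links.", "Links: /about /terms", json_mode=True)
brochure_payload = build_chat_payload("Write a brochure.", "Landing page: ...")
print(links_payload["format"], "format" in brochure_payload)
```

Keeping JSON mode optional matters here because the same chat function is reused for the brochure, which should come back as markdown rather than JSON.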

In [14]:
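def get_links(url):
    website = Website(url)
    prompt = get_links_user_prompt(website)
    response = chat_with_llama(link_system_prompt, prompt)
    print("LLM response for links:", response)
    try:
        data = json.loads(response)
        # The model does not always follow the requested schema: "links" may hold
        # bare URL strings instead of {"type", "url"} objects, so callers must check.
        if isinstance(data, dict) and "links" in data:
            return data
        else:
            return {"links": []}
    except (json.JSONDecodeError, TypeError) as e:
        # The reply was not valid JSON (for example, wrapped in prose or code fences)
        print("Error parsing LLM response:", e)
        return {"links": []}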
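
As the test runs below show, llama3.2 sometimes returns `"links"` as a list of bare URL strings rather than `{"type", "url"}` objects. A minimal sketch of a more forgiving parser, assuming the reply contains at most one JSON object; `parse_links` and the `"unlabeled page"` label are hypothetical:

```python
import json
import re

def parse_links(raw):
    """Extract {"links": [...]} from an LLM reply and normalize entries to dicts."""
    match = re.search(r"\{.*\}", raw, flags=re.DOTALL)  # tolerate code fences or prose around the JSON
    if not match:
        return {"links": []}
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return {"links": []}
    links = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(links, list):
        links = []
    normalized = []
    for item in links:
        if isinstance(item, str):
            # Bare URL: keep it, with a placeholder type
            normalized.append({"type": "unlabeled page", "url": item})
        elif isinstance(item, dict) and "url" in item:
            normalized.append({"type": item.get("type", "unlabeled page"), "url": item["url"]})
    return {"links": normalized}

reply = '```json\n{"links": ["https://huggingface.co/about", {"type": "careers page", "url": "https://example.com/jobs"}]}\n```'
print(parse_links(reply))
```

With entries normalized this way, a downstream loop that expects `type` and `url` keys would no longer have to skip string-only replies.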

Test with https://huggingface.co

In [15]:
get_links("https://huggingface.co")

LLM response for links: {
  "links": [
    "https://huggingface.co/",
    "https://huggingface.co/models",
    "https://huggingface.co/datasets",
    "https://huggingface.co/spaces",
    "https://huggingface.co/pricing",
    "https://huggingface.co/enterprise",
    "https://huggingface.co/about",
    "https://huggingface.co/press"
  ]
}


{'links': ['https://huggingface.co/',
  'https://huggingface.co/models',
  'https://huggingface.co/datasets',
  'https://huggingface.co/spaces',
  'https://huggingface.co/pricing',
  'https://huggingface.co/enterprise',
  'https://huggingface.co/about',
  'https://huggingface.co/press']}

## **2. Make the brochure**

Assemble all the details into another prompt to llama3.2.

In [16]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        if isinstance(link, dict) and "type" in link and "url" in link:
            result += f"\n\n{link['type']}\n"
            result += Website(link["url"]).get_contents()
        else:
            print("Skipping invalid link:", link)
    return result

In [17]:
print(get_all_details("https://huggingface.co"))

LLM response for links: {
  "links": [
    {
      "type": "home page",
      "url": "https://huggingface.co"
    },
    {
      "type": "models page",
      "url": "https://huggingface.co/models"
    },
    {
      "type": "datasets page",
      "url": "https://huggingface.co/datasets"
    },
    {
      "type": "spaces page",
      "url": "https://huggingface.co/spaces"
    },
    {
      "type": "docs page",
      "url": "https://huggingface.co/docs"
    },
    {
      "type": "pricing page",
      "url": "https://huggingface.co/pricing"
    },
    {
      "type": "enterprise page",
      "url": "https://huggingface.co/enterprise"
    },
    {
      "type": "about company page",
      "url": "https://huggingface.co/about"
    }
  ]
}
Found links: {'links': [{'type': 'home page', 'url': 'https://huggingface.co'}, {'type': 'models page', 'url': 'https://huggingface.co/models'}, {'type': 'datasets page', 'url': 'https://huggingface.co/datasets'}, {'type': 'spaces page', 'url': 'https:/

In [18]:
system_prompt = """
You are an assistant that analyzes the contents of several relevant pages from a company website 
and creates a short humorous, entertaining, jokey brochure about the company for prospective customers, investors and recruits.
Respond in markdown. For every mention of a company, partner, social media, or resource, include the actual clickable markdown link using the real URL if available.
Include details of company culture, customers and careers/jobs if you have the information.
"""

In [19]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += "When mentioning any partner, social media, or resource, always use the actual markdown link with the real URL provided below.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5000]  # Keep only the first 5,000 characters
    return user_prompt
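
Cutting the whole prompt at 5,000 characters means a long landing page can push every other page out of the prompt. One possible alternative, sketched under the assumption that the pages are available as `(label, text)` pairs, is to give each page an equal share of the budget; `assemble_with_budget` is a hypothetical helper:

```python
def assemble_with_budget(pages, total_chars=5000):
    """Join (label, text) pairs, truncating each page to an equal share of total_chars."""
    if not pages:
        return ""
    per_page = total_chars // len(pages)
    parts = []
    for label, text in pages:
        section = f"\n\n{label}\n{text}"
        parts.append(section[:per_page])  # each section, header included, fits its share
    return "".join(parts)

pages = [("Landing page", "a" * 10000), ("about page", "b" * 10000)]
prompt_body = assemble_with_budget(pages, total_chars=100)
print(len(prompt_body))
```

An equal split is the simplest policy; weighting the landing page more heavily would be an easy variation.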

In [20]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co")

LLM response for links: {
    "links": [
        "https://huggingface.co/",
        "https://ui.endpoints.huggingface.co",
        "https://apply.workable.com/huggingface/",
        "https://discuss.huggingface.co",
        "https://status.huggingface.co/",
        "https://github.com/huggingface",
        "https://twitter.com/huggingface",
        "https://www.linkedin.com/company/huggingface/"
    ]
}
Found links: {'links': ['https://huggingface.co/', 'https://ui.endpoints.huggingface.co', 'https://apply.workable.com/huggingface/', 'https://discuss.huggingface.co', 'https://status.huggingface.co/', 'https://github.com/huggingface', 'https://twitter.com/huggingface', 'https://www.linkedin.com/company/huggingface/']}
Skipping invalid link: https://huggingface.co/
Skipping invalid link: https://ui.endpoints.huggingface.co
Skipping invalid link: https://apply.workable.com/huggingface/
Skipping invalid link: https://discuss.huggingface.co
Skipping invalid link: https://status.huggingface.

'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nWhen mentioning any partner, social media, or resource, always use the actual markdown link with the real URL provided below.\nLanding page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nnvidia/parakeet-tdt-0.6b-v2\nUpdated\n9 days ago\n•\n72.7k\n•\n659\nACE-Step/ACE-Step-v1-3.5B\nUpdated\nabout 9 hours ago\n•\n288\nnari-labs/Dia-1.6B\nUpdated\n4 days ago\n•\n148k\n•\n2.01k\nLightricks/LTX-Video\nUpdated\n4 days ago\n•\n214k\n•\n1.35k\ndeepseek-ai/Deep

In [21]:
def create_brochure(company_name, url):
    user_prompt = get_brochure_user_prompt(company_name, url)
    result = chat_with_llama(system_prompt, user_prompt)
    display(Markdown(result))

In [22]:
create_brochure("HuggingFace", "https://huggingface.co")

LLM response for links: {
    "links": [
        {"type": "about page", "url": "https://huggingface.co/"},
        {"type": "home page", "url": "https://ui.endpoints.huggingface.co"}
    ]
}
Found links: {'links': [{'type': 'about page', 'url': 'https://huggingface.co/'}, {'type': 'home page', 'url': 'https://ui.endpoints.huggingface.co'}]}


Hugging Face Brochure
======================

Welcome to Hugging Face, the AI community building the future.

**Our Mission**
----------------

We're on a mission to make AI more accessible and fun for everyone. Our platform is where the machine learning community collaborates on models, datasets, and applications.

**What We Do**
--------------

* Explore 1 million+ models and browse trending ones
* Create, discover, and collaborate on ML projects with our community
* Build your portfolio by sharing your work and building your ML profile
* Accelerate your ML journey with paid Compute and Enterprise solutions

**Meet Our Partners**
--------------------

We're proud to partner with industry leaders like:

[Meta](https://www.meta.com/) - 756 models, 3.23k followers
[Ai2](https://ai2.org/) - non-profit organization using Hugging Face
[Amazon](https://aws.amazon.com/) - 20 models, 3.14k followers
[Google](https://google.com/) - 991 models, 12.9k followers
[Intel](https://intel.com/) - 220 models, 2.52k followers
[Microsoft](https://microsoft.com/) - 374 models, 12.2k followers
[Grammarly](https://grammarly.com/) - Enterprise company with 10 models, 158 followers

**Our Open Source**
-----------------

We're building the foundation of ML tooling with our community. Check out our open source projects:

* [Transformers](https://huggingface.co/transformers) - State-of-the-art ML for PyTorch, TensorFlow, JAX
* [Diffusers](https://huggingface.co/diffusers) - State-of-the-art Diffusion models in PyTorch
* [Safetensors](https://huggingface.co/safetensors) - Safe way to store/distribute neural network weights

**Join Our Community**
---------------------

Sign up for our platform and join the AI community building the future.

[Sign Up](https://example.com/sign-up)

[Learn More](https://example.com/about)

Follow us on social media:

[GitHub](https://github.com/huggingface)
[Twitter](https://twitter.com/huggingface)
[LinkedIn](https://linkedin.com/company/huggingface)
[Discord](https://discord.com/invite/huggingface)

## **3. Improvement: stream the brochure as it is generated**

In [23]:
def stream_brochure(company_name, url):
    stream = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )

    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        content = chunk.get("message", {}).get("content", "")
        if content:
            response += content
            # Strip code fences for display only. Removing the bare word "markdown"
            # would also delete it from ordinary sentences in the brochure.
            cleaned = response.replace("```markdown", "").replace("```", "")
            update_display(Markdown(cleaned), display_id=display_handle.display_id)
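
The accumulation logic can be tried without a running Ollama server by feeding it hand-made chunks. This is a sketch only: the chunk dictionaries imitate the `{"message": {"content": ...}}` shape used above, and `strip_markdown_fences` and `accumulate` are hypothetical names:

```python
def strip_markdown_fences(text):
    """Remove code-fence markers while leaving the ordinary word 'markdown' alone."""
    return text.replace("```markdown", "").replace("```", "")

def accumulate(chunks):
    """Yield the cleaned text after each non-empty chunk, as a display loop would render it."""
    raw = ""
    for chunk in chunks:
        content = chunk.get("message", {}).get("content", "")
        if content:
            raw += content  # keep the raw text so a fence split across chunks is still matched
            yield strip_markdown_fences(raw)

fake_stream = [
    {"message": {"content": "```"}},
    {"message": {"content": "markdown\n# Hi"}},
    {"message": {}},  # a chunk without content is skipped
    {"message": {"content": " markdown fans\n```"}},
]
frames = list(accumulate(fake_stream))
print(repr(frames[-1]))
```

Cleaning a copy of the raw text on every step, rather than overwriting the accumulated string, is what lets a fence that arrives in two pieces be removed once both halves are present.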

In [24]:
stream_brochure("HuggingFace", "https://huggingface.co")

LLM response for links: {
  "links": [
    {
      "type": "Company Overview/About page",
      "url": "https://huggingface.co"
    },
    {
      "type": "Models Page",
      "url": "https://huggingface.co/models"
    },
    {
      "type": "Datasets Page",
      "url": "https://huggingface.co/datasets"
    },
    {
      "type": "Spaces Page",
      "url": "https://huggingface.co/spaces"
    },
    {
      "type": "Blog/News",
      "url": "https://blog.huggingface.co"
    },
    {
      "type": "GitHub Repository",
      "url": "https://github.com/huggingface"
    },
    {
      "type": "Twitter Profile",
      "url": "https://twitter.com/huggingface"
    },
    {
      "type": "LinkedIn Company Page",
      "url": "https://www.linkedin.com/company/huggingface/"
    }
  ]
}
Found links: {'links': [{'type': 'Company Overview/About page', 'url': 'https://huggingface.co'}, {'type': 'Models Page', 'url': 'https://huggingface.co/models'}, {'type': 'Datasets Page', 'url': 'https://hugging

**The Hugging Face Brochure**

Welcome to Hugging Face, the AI community building the future!

[![Hugging Face Logo](https://huggingface.co/huggingface_logo.svg)](https://huggingface.co/)

At Hugging Face, we're passionate about empowering the machine learning community to collaborate, innovate, and push the boundaries of what's possible with AI. Our platform is where you can explore, create, and share models, datasets, and applications that shape the future of AI.

**Explore 1 Million+ Models**

[![Models](https://huggingface.co/images/models.png)](https://huggingface.co/models)

From cutting-edge research models to practical applications, our model repository is vast and ever-growing. Browse trending models, or search for specific ones by name or tags.

**Collaborate on Spaces**

[![Spaces](https://huggingface.co/images/spaces.png)](https://huggingface.co/spaces)

Join thousands of users in our collaborative workspace, where you can host, share, and work on unlimited public models, datasets, and applications. Share your portfolio and build your ML profile!

**Accelerate Your AI**

[![Compute](https://huggingface.co/images/compute.png)](https://huggingface.co/compute)

Get started with our paid Compute solutions, which provide optimized inference endpoints or GPU deployment in just a few clicks. Explore our pricing plans to find the right fit for your needs.

**Join the Community**

[![GitHub](https://img.shields.io/badge/learn%20more-About--HuggingFace--on-GitHub-blue)](https://github.com/huggingface)

Follow us on GitHub and stay up-to-date with our latest releases, features, and community projects. Get involved in the conversation, share your work, and learn from others.

**We're Supported by Top Partners**

* [AI2](https://ai2.github.io/)
* [Meta](https://about.meta.com/)
* [Amazon](https://aws.amazon.com/)
* [Google](https://cloud.google.com/)
* [Intel](https://www.intel.com/)
* [Microsoft](https://www.microsoft.com/)
* [Grammarly](https://grammarly.com/)
* [Writer](https://writer.huggingface.co/)

**Transforming the Future of AI**

At Hugging Face, we're dedicated to building a foundation for machine learning tooling with our community. Explore our open-source projects and technologies, including:

* **Transformers**: State-of-the-art ML for PyTorch, TensorFlow, JAX
* **Diffusers**: State-of-the-art Diffusion models in PyTorch
* **Safetensors**: Safe way to store/distribute neural network weights

**Stay Connected**

[![Twitter](https://img.shields.io/badge/Follow%20HuggingFace--on-Twitter-blue)](https://twitter.com/huggingface)
[![LinkedIn](https://img.shields.io/badge/Connect%20with%20HuggingFace--on-LinkedIn-blue)](https://www.linkedin.com/company/huggingface/)
[![Discord](https://img.shields.io/badge/Join%20our%20Community--on-Discord-blue)](https://discord.gg/huggingface)

Join the Hugging Face community today and start building a future with AI!