# A full business solution

## Now we will take our project from Day 1 to the next level

### BUSINESS CHALLENGE:

Create a product that builds a Brochure for a company to be used for prospective clients, investors and potential recruits.

We will be provided a company name and their primary website.

See the end of this notebook for examples of real-world business applications.

And remember: I'm always available if you have problems or ideas! Please do reach out.

In [1]:
!pip install beautifulsoup4 requests



In [None]:
!huggingface-cli login --token {"YOUR_TOKEN"}

In [3]:
import os
import requests
import json
from typing import List
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display

# We load the model later
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

In [4]:
DEVICE = "cpu"
if torch.backends.mps.is_available():
    DEVICE = "mps"
elif torch.cuda.is_available():
    DEVICE = "cuda"

print(f"Using device: {DEVICE}")

Using device: cuda


In [5]:
# Model name
model_id = "meta-llama/Llama-3.2-1B-Instruct"

# Load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(model_id, token=True)  # `use_auth_token` is deprecated in favor of `token`
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # use float16 on GPU
    device_map="auto"           # placed automatically on GPU or CPU
)
tokenizer.pad_token = tokenizer.eos_token




In [6]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers)
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [7]:
ed = Website("https://edwarddonner.com")
ed.links

['https://edwarddonner.com/',
 'https://edwarddonner.com/connect-four/',
 'https://edwarddonner.com/outsmart/',
 'https://edwarddonner.com/about-me-and-about-nebula/',
 'https://edwarddonner.com/posts/',
 'https://edwarddonner.com/',
 'https://news.ycombinator.com',
 'https://nebula.io/?utm_source=ed&utm_medium=referral',
 'https://www.prnewswire.com/news-releases/wynden-stark-group-acquires-nyc-venture-backed-tech-startup-untapt-301269512.html',
 'https://patents.google.com/patent/US20210049536A1/',
 'https://www.linkedin.com/in/eddonner/',
 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',
 'https://edwarddonner.com/2025/04/21/the-complete-agentic-ai-engineering-course/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2025/01/23/llm-workshop-hands-on-with-agents-resources/',
 'https://edwarddonner.com/2024/12/21/llm-resources-superdatascience/',
 'https://edwarddonner.com/2024/12/21/llm-

## First step: Have the LLaMA 3.2 model identify which links are relevant

### In this step, we will use our locally loaded LLaMA 3.2-1B-Instruct model to analyze the list of links extracted from a webpage and respond with structured JSON.

The goal is for the model to decide which links are relevant to the main content of the site, and convert any relative links (like "/about") into full absolute links (e.g., "https://company.com/about").
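
Resolving relative links does not strictly need a language model. As a minimal sketch, assuming we want a deterministic pre-processing step before the model sees the list, the standard library's `urljoin` can make every link absolute, and a dict can drop duplicates while keeping order. The helper name `normalize_links` is hypothetical and not part of the original notebook.

```python
from urllib.parse import urljoin, urlparse


def normalize_links(base_url, links):
    """Resolve relative links against base_url, drop non-web links, and de-duplicate."""
    seen = {}
    for link in links:
        absolute = urljoin(base_url, link)
        # Skip mailto:, tel:, javascript: and other non-HTTP schemes
        if urlparse(absolute).scheme not in ("http", "https"):
            continue
        seen.setdefault(absolute, None)  # dicts preserve insertion order
    return list(seen)


sample = ["/", "/models", "blog/inference-providers-cohere", "/models", "mailto:hi@example.com"]
print(normalize_links("https://huggingface.co", sample))
```

Passing the normalized list to the model would shorten the prompt and leave the model with only the relevance decision.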

We will use **one-shot prompting**, where we show the model an example input and expected output inside the prompt, to guide it toward producing JSON structured responses.

This task benefits from the reasoning capabilities of language models like LLaMA, as it involves interpreting context and understanding the purpose of hyperlinks — something that would be very complex to implement using traditional rule-based web parsers.

📝 Note: For stricter control of outputs (e.g., full JSON schema validation), more advanced techniques like function calling or structured output enforcement are required. This basic approach is often enough for our use case, although a 1B model will sometimes return malformed or missing JSON, as a few of the runs below show.
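
Short of full schema validation, a few hand-written checks can catch the most common problems. The sketch below assumes the response has already been parsed into a Python object and that we only care about the `links` list with `type` and `url` strings, as in the example prompt. The function name `validate_links_payload` is hypothetical.

```python
def validate_links_payload(payload):
    """Return a cleaned list of {'type', 'url'} dicts, or raise ValueError."""
    if not isinstance(payload, dict) or not isinstance(payload.get("links"), list):
        raise ValueError("expected an object with a 'links' list")

    cleaned = []
    for entry in payload["links"]:
        if not isinstance(entry, dict):
            continue  # ignore anything that is not an object
        link_type, url = entry.get("type"), entry.get("url")
        # Keep only entries with string fields and an absolute http(s) URL
        if (
            isinstance(link_type, str)
            and isinstance(url, str)
            and url.startswith(("http://", "https://"))
        ):
            cleaned.append({"type": link_type, "url": url})
    return cleaned
```

Extra keys in the response are simply ignored, and entries with relative or missing URLs are dropped rather than passed on to the scraper.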


In [8]:
link_system_prompt = (
    "You are provided with a list of links found on a webpage.\n"
    "Your task is to decide which links are most relevant to include in a company brochure — "
    "for example, links to pages like About, Company, Careers, Jobs, Team, etc.\n"
    "Please output your result in structured JSON format like the example below:\n\n"
    "{\n"
    "    \"links\": [\n"
    "        {\"type\": \"about page\", \"url\": \"https://full.url/goes/here/about\"},\n"
    "        {\"type\": \"careers page\", \"url\": \"https://another.full.url/careers\"}\n"
    "    ]\n"
    "}"
)


In [9]:
print(link_system_prompt)

You are provided with a list of links found on a webpage.
Your task is to decide which links are most relevant to include in a company brochure — for example, links to pages like About, Company, Careers, Jobs, Team, etc.
Please output your result in structured JSON format like the example below:

{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}


In [10]:
def get_links_user_prompt(website):
    user_prompt = (
        f"You are reviewing a list of links from the website: {website.url}\n\n"
        "Please identify which of these links would be appropriate for a company brochure.\n"
        "Only include links like About, Company, Careers, Jobs, Team, etc.\n"
        "❌ Do not include: Privacy Policy, Terms of Service, Email (mailto:) links.\n"
        "Output a structured JSON like this:\n\n"
        "{\n"
        "  \"links\": [\n"
        "    {\"type\": \"about page\", \"url\": \"https://example.com/about\"},\n"
        "    {\"type\": \"careers page\", \"url\": \"https://example.com/careers\"}\n"
        "  ]\n"
        "}\n\n"
        "Here is the list of links found on the page:\n"
    )
    user_prompt += "\n".join(f"- {link}" for link in website.links)
    return user_prompt


In [11]:
from transformers import GenerationConfig

def generate_response(prompt, model, tokenizer, **generation_kwargs):
    """
    Generates a response from the model given a prompt string using Hugging Face transformers.
    Supports configurable generation settings like temperature, top_p, repetition_penalty, etc.
    """

    # If the model has no generation_config, fall back to a default one
    gen_config = getattr(model, "generation_config", GenerationConfig())

    # Default generation settings
    default_generation_kwargs = {
        "do_sample": True,
        "top_p": 0.9,
        "temperature": 0.7,
        "max_new_tokens": 1024,
        "repetition_penalty": 1.2
    }

    # Merge the caller's settings over the defaults
    generation_kwargs = {**default_generation_kwargs, **generation_kwargs}

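    # Prepare the inputs. The chat template already adds the begin-of-text token,
    # so don't let the tokenizer add special tokens a second time.
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)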

    # Generate the response with the model
    outputs = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        **generation_kwargs
    )

    input_length = inputs["input_ids"].shape[1]

    return tokenizer.decode(outputs[0][input_length:], skip_special_tokens=True)
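
The defaults above sample with `temperature` and `top_p`, which suits brochure writing but adds randomness to the JSON link-selection step. As a sketch, assuming we want greedy decoding for that step, the helper below merges overrides the same way as `generate_response` does and removes sampling-only settings when `do_sample` is off, since `transformers` may warn when they are combined. The name `merge_generation_kwargs` is hypothetical.

```python
SAMPLING_ONLY_KEYS = ("temperature", "top_p", "top_k")

DEFAULTS = {
    "do_sample": True,
    "top_p": 0.9,
    "temperature": 0.7,
    "max_new_tokens": 1024,
    "repetition_penalty": 1.2,
}


def merge_generation_kwargs(defaults, **overrides):
    """Merge overrides over defaults; drop sampling-only keys for greedy decoding."""
    merged = {**defaults, **overrides}
    if not merged.get("do_sample", False):
        for key in SAMPLING_ONLY_KEYS:
            merged.pop(key, None)
    return merged


# Settings one might use for the link-selection call
greedy = merge_generation_kwargs(DEFAULTS, do_sample=False, max_new_tokens=512)
print(greedy)
```

The result could be passed as `generate_response(prompt, model, tokenizer, **greedy)` only if `generate_response` also skipped its own defaults, so treat this as an illustration of the merging logic, not a drop-in change.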


In [12]:
def format_messages_as_prompt(messages, tokenizer):
    """
    Converts chat-style messages into a prompt using the tokenizer's chat template.
    """
    return tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

In [13]:
import re
import json

def extract_json_from_output(text):
    """
    Extract only the JSON portion of the model output.
    """
    match = re.search(r"{\s*\"links\":\s*\[.*?\]\s*}", text, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            print("⚠️ JSON structure found but not valid.")
            return None
    else:
        print("❌ JSON block not found in model output.")
        return None


def get_links(url, model, tokenizer):
    website = Website(url)

    # Build the messages
    messages = [
        {"role": "system", "content": link_system_prompt},
        {"role": "user", "content": get_links_user_prompt(website)}
    ]

    # Build the prompt
    prompt = format_messages_as_prompt(messages, tokenizer)

    # Generate the response
    output = generate_response(prompt, model, tokenizer)

    # Try to extract valid JSON
    return extract_json_from_output(output)
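
The regular expression in `extract_json_from_output` depends on the exact shape of the object. An alternative sketch, assuming the model may wrap its JSON in extra prose, is to let the standard library's `json.JSONDecoder.raw_decode` try to parse from each opening brace in turn. The function name `find_first_json_object` is hypothetical.

```python
import json


def find_first_json_object(text, required_key="links"):
    """Return the first JSON object in text that contains required_key, else None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start` and ignores what follows
            obj, _ = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict) and required_key in obj:
            return obj
        start = text.find("{", start + 1)
    return None


reply = 'Sure! Here you go:\n{"links": [{"type": "about page", "url": "https://example.com/about"}]}\nHope that helps.'
print(find_first_json_object(reply))
```

Because the JSON parser handles nesting, this copes with extra keys and nested objects without changes to a pattern.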


In [14]:
# Anthropic has made their site harder to scrape, so I'm using HuggingFace..

huggingface = Website("https://huggingface.co")
huggingface.links

['/',
 '/models',
 '/datasets',
 '/spaces',
 '/posts',
 '/docs',
 '/enterprise',
 '/pricing',
 '/login',
 '/join',
 'blog/inference-providers-cohere',
 '/spaces',
 '/models',
 '/microsoft/bitnet-b1.58-2B-4T',
 '/nari-labs/Dia-1.6B',
 '/HiDream-ai/HiDream-I1-Full',
 '/microsoft/MAI-DS-R1',
 '/Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0',
 '/models',
 '/spaces/enzostvs/deepsite',
 '/spaces/bytedance-research/UNO-FLUX',
 '/spaces/InstantX/InstantCharacter',
 '/spaces/jamesliu1217/EasyControl_Ghibli',
 '/spaces/Yuanshi/OminiControl_Art',
 '/spaces',
 '/datasets/zwhe99/DeepMath-103K',
 '/datasets/Anthropic/values-in-the-wild',
 '/datasets/nvidia/OpenCodeReasoning',
 '/datasets/openai/mrcr',
 '/datasets/future-technologies/Universal-Transformers-Dataset',
 '/datasets',
 '/join',
 '/pricing#endpoints',
 '/pricing#spaces',
 '/pricing',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/enterprise',
 '/allenai',
 '/facebook',
 '/amazon',
 '/g

In [15]:
links_data = get_links("https://huggingface.co", model, tokenizer)

if links_data:
    print("✅ Extracted links:")
    print(json.dumps(links_data, indent=2))
else:
    print("⛔ No valid JSON returned.")


Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


✅ Extracted links:
{
  "links": [
    {
      "type": "company",
      "url": "https://huggingface.co"
    },
    {
      "type": "about page",
      "url": "https://about.huggingface.co"
    }
  ],
  "other_links": []
}
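
A small model may return URLs that were never in the list it was given. One inexpensive guard, sketched here under the assumption that we only want to fetch pages the scraper actually saw, is to keep a suggested entry only if its URL matches one of the scraped links. The helper name `keep_links_found_on_page` is hypothetical.

```python
from urllib.parse import urljoin


def keep_links_found_on_page(base_url, page_links, model_links):
    """Keep only model-suggested entries whose URL was scraped from the page."""
    # Normalize scraped links to absolute form; ignore a trailing slash when comparing
    known = {urljoin(base_url, link).rstrip("/") for link in page_links}
    kept = []
    for entry in model_links:
        url = entry.get("url")
        if not url:
            continue  # entries without a URL cannot be fetched later
        if urljoin(base_url, url).rstrip("/") in known:
            kept.append(entry)
    return kept


page_links = ["/", "/models", "/enterprise"]
suggested = [
    {"type": "company", "url": "https://huggingface.co"},
    {"type": "about page", "url": "https://about.huggingface.co"},
]
# Only the first entry is kept, because the second URL is not among page_links
print(keep_links_found_on_page("https://huggingface.co", page_links, suggested))
```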


## Second step: make the brochure!

Assemble all the details into another prompt for our local Llama 3.2 model

In [16]:
def get_all_details(url, model, tokenizer):
    result = "🔹 Landing Page:\n"
    result += Website(url).get_contents()

    links = get_links(url, model, tokenizer)
    if not links or "links" not in links:
        result += "\n\n⚠️ No relevant links found or model failed to respond."
        return result

    numbered_links = []
    link_contents = []

    print("✅ Found links:", links)

    for idx, link in enumerate(links["links"], start=1):
        link_type = link.get("type", "unknown").capitalize()
        link_url = link.get("url", "")

        # Entry for the list of links
        numbered_links.append(f"[{idx}] {link_type}: {link_url}")

        # Try to fetch the page content
        try:
            content = Website(link_url).get_contents()
        except Exception as e:
            content = f"⚠️ Failed to load content from {link_url}\nReason: {str(e)}\n"

        # Add to the link contents
        link_contents.append(f"\n\n🔸 [{idx}] {link_type}:\n{content}")

    # Add the list of links near the start of the result
    result += "\n\n🔗 **List of Links:**\n" + "\n".join(numbered_links)

    # Add the content of each numbered link
    result += "\n" + "\n".join(link_contents)

    return result


In [17]:
print(get_all_details("https://huggingface.co", model, tokenizer))

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


❌ JSON block not found in model output.
🔹 Landing Page:
Webpage Title:
Hugging Face – The AI community building the future.
Webpage Contents:
Hugging Face
Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up
NEW
Welcome Cohere on the Hub 🔥
Welcome Hyperbolic, Nebius AI Studio, and Novita on the Hub 🔥
Welcome Fireworks.ai on the Hub 🎆
The AI community building the future.
The platform where the machine learning community collaborates on models, datasets, and applications.
Explore AI Apps
or
Browse 1M+ models
Trending on
this week
Models
microsoft/bitnet-b1.58-2B-4T
Updated
2 days ago
•
17.4k
•
659
nari-labs/Dia-1.6B
Updated
about 2 hours ago
•
5.67k
•
315
HiDream-ai/HiDream-I1-Full
Updated
about 13 hours ago
•
26.8k
•
685
microsoft/MAI-DS-R1
Updated
6 days ago
•
427
•
200
Shakker-Labs/FLUX.1-dev-ControlNet-Union-Pro-2.0
Updated
about 8 hours ago
•
11.9k
•
186
Browse 1M+ models
Spaces
Running
5.02k
5.02k
DeepSite
🐳
Generate any application with DeepSeek
Running
on
Zero
510

In [29]:
# system_prompt = (
#     "You are an assistant that analyzes the contents of a company's website to create a short brochure "
#     "for prospective customers, investors, and potential employees.\n\n"
#     "The input includes the landing page and a list of relevant internal pages (like About, Careers, etc.). "
#     "Each page is presented in order with a number and a type.\n\n"
#     "Ignore the numbering ([1], [2], ...) and list formatting. Focus only on the actual text content from each section.\n"
#     "Your task is to extract useful information and create a well-structured, markdown-style brochure.\n"
#     "Highlight topics like company overview, mission, values, products, customers, and career opportunities.\n"
# )



# The formal prompt above is commented out. The more humorous version below is active - this demonstrates how easy it is to incorporate 'tone':

system_prompt = (
    "You're a witty, humorous assistant hired to write an entertaining company brochure "
    "for people who might want to invest in, work at, or accidentally stumble upon this company.\n\n"
    "You’ve been given some numbered sections ([1], [2], etc.) representing different parts of the company website. "
    "Please ignore the numbering and use the content smartly.\n\n"
    "Your brochure should be in markdown, fun, informative, and a little cheeky. Feel free to be sarcastic, but also helpful.\n"
    "Touch on things like what the company does, who works there, what kind of vibe it gives off, and whether someone should consider joining or investing."
)


In [30]:
def get_brochure_user_prompt(company_name, url, model, tokenizer):
    """
    Generates a user prompt for the LLM to write a company brochure.
    Includes landing page and relevant internal pages (about, careers, etc.)
    """
    user_prompt = (
        f"You are looking at a company called: {company_name}\n"
        f"Here are the contents of its landing page and other relevant pages; "
        f"use this information to build a short brochure of the company in markdown format.\n\n"
    )

    # Landing page content + list of links + numbered page contents
    user_prompt += get_all_details(url, model, tokenizer)

    # If the prompt is too long, keep only the first 7,000 characters
    user_prompt = user_prompt[:7_000]

    return user_prompt
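
The hard slice above can cut the prompt in the middle of a line or word. A minimal sketch of a gentler cut, assuming the scraped text is newline-separated as produced by `get_contents`, trims back to the last complete line within the limit. The helper name `truncate_at_line_boundary` is hypothetical, and a character limit remains only a rough stand-in for the model's token limit.

```python
def truncate_at_line_boundary(text, max_chars=7_000):
    """Trim text to at most max_chars, ending on a complete line where possible."""
    if len(text) <= max_chars:
        return text
    cut = text.rfind("\n", 0, max_chars)
    # Fall back to a hard cut if there is no usable newline within the limit
    return text[:cut] if cut > 0 else text[:max_chars]


print(truncate_at_line_boundary("Models\nDatasets\nSpaces", max_chars=10))
```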


In [20]:
get_brochure_user_prompt("HuggingFace", "https://huggingface.co", model, tokenizer)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


✅ Found links: {'links': [{'type': 'company', 'url': 'https://huggingface.co'}, {'type': 'team', 'url': 'https://huggingface.co/team'}]}


'You are looking at a company called: HuggingFace\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown format.\n\n🔹 Landing Page:\nWebpage Title:\nHugging Face – The AI community building the future.\nWebpage Contents:\nHugging Face\nModels\nDatasets\nSpaces\nPosts\nDocs\nEnterprise\nPricing\nLog In\nSign Up\nNEW\nWelcome Cohere on the Hub 🔥\nWelcome Hyperbolic, Nebius AI Studio, and Novita on the Hub 🔥\nWelcome Fireworks.ai on the Hub 🎆\nThe AI community building the future.\nThe platform where the machine learning community collaborates on models, datasets, and applications.\nExplore AI Apps\nor\nBrowse 1M+ models\nTrending on\nthis week\nModels\nmicrosoft/bitnet-b1.58-2B-4T\nUpdated\n2 days ago\n•\n17.4k\n•\n659\nnari-labs/Dia-1.6B\nUpdated\nabout 2 hours ago\n•\n5.67k\n•\n315\nHiDream-ai/HiDream-I1-Full\nUpdated\nabout 13 hours ago\n•\n26.8k\n•\n685\nmicrosoft/MAI-DS-R1\nUpdated\n6 days ago\n•

In [31]:
from IPython.display import Markdown, display

def create_brochure(company_name, url, model, tokenizer, system_prompt):
    # Build the user prompt, including the page contents and links
    user_prompt = get_brochure_user_prompt(company_name, url, model, tokenizer)

    # Build the message structure (system + user)
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]

    # Convert the messages into a full prompt for the model
    prompt = format_messages_as_prompt(messages, tokenizer)

    # Get the output from the model
    result = generate_response(prompt, model, tokenizer)

    # Display the output as markdown
    display(Markdown(result))


In [22]:
create_brochure(
    company_name="Hugging Face",
    url="https://huggingface.co",
    model=model,
    tokenizer=tokenizer,
    system_prompt=system_prompt  # formal or humorous, whichever you selected
)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


❌ JSON block not found in model output.


# Welcome to Hugging Face - Building a Community for Artificial Intelligence

**What We Do**
================

At Hugging Face, we're creating a collaborative space for artificial intelligence enthusiasts and professionals alike to come together, experiment, and innovate. Our mission is to make it easier for anyone to harness the power of AI by providing:

* A comprehensive platform for discovering, training, and deploying state-of-the-art models on various natural language processing (NLP) tasks
* Access to millions of pre-trained models, datasets, and tools for accelerating AI development
* An open-source framework built on top of popular frameworks such as PyTorch and TensorFlow

## **Who We Are**

### Company Overview

* Founded in [Year]
* Headquarters: [Location]

### Values

| Value | Description |
| --- | --- |
| Innovation | Embracing cutting-edge technology and staying ahead of the curve |
| Collaboration | Fostering communities around shared interests |
| Accessibility | Making high-quality AI capabilities accessible to everyone |

### Products & Services

#### Models

* [Link] HiDream-ai/HiDream-I1-Full
	+ A conversational AI designed for customer support chatbots
* Microsoft/NVIDIA/OpenCodeReasoning
	+ State-of-the-art reasoning models for computer vision and robotics tasks
* Future-Techs/Universal-Transformers-Dataset
	+ Large-scale transformer-based dataset for NLP tasks
* TextGeneration Inference
	+ Optimized toolkit for serving language models efficiently

#### Spaces

* Running
	+ Zero Framework: Instant customization options for generating images and videos
	+ UNO Flux: Advanced control over generated data flows
* Deep Site
	+ Generate custom visualizations with text inputs and outputs

#### Communities

* Twitter
	+ Join our community discussion about emerging trends and challenges in AI
* LinkedIn
	+ Learn from industry experts and thought leaders
* Discord
	+ Participate in real-time conversations with fellow developers and researchers

### Resources

* GitHub
	+ Explore thousands of repositories related to AI and NLP projects
* Blog
	+ Stay up-to-date with the latest news, tutorials, and research papers on AI advancements

### Get Started

To get involved with us, sign up below:

[Insert Registration Form]

Let's bridge the gap between humans and machines! Join our vibrant ecosystem today!

```
# Getting Started Section

### Contact Us
If you have questions or need assistance, please reach out to us through our contact form or social media channels.
```

## Finally - a minor improvement

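With a small adjustment, we can make the results appear to stream back from the model,
with the familiar typewriter animation. The model here runs locally rather than through the OpenAI API, so the cell below generates the full response first and then reveals it gradually.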
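
The cell below produces the whole response before displaying anything. For token-by-token streaming from a local `transformers` model, one option is `TextIteratorStreamer`, which yields decoded text while `generate` runs in a background thread. This is a sketch, not the notebook's method: the function name `stream_generate` is hypothetical, and it assumes `prompt` came from `apply_chat_template`, which is why special tokens are not added again.

```python
from threading import Thread


def stream_generate(prompt, model, tokenizer, **generation_kwargs):
    """Yield decoded text chunks as the model produces them."""
    # Imported lazily so the sketch can be defined without transformers installed
    from transformers import TextIteratorStreamer

    # Assumption: the chat template already inserted the special tokens
    inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False).to(model.device)
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    kwargs = {**inputs, "streamer": streamer, "max_new_tokens": 1024, **generation_kwargs}

    # generate() blocks until it finishes, so run it in a background thread
    thread = Thread(target=model.generate, kwargs=kwargs)
    thread.start()
    for chunk in streamer:
        yield chunk
    thread.join()
```

In the notebook, each chunk could be appended to a string and passed to `update_display(Markdown(shown), display_id=...)`, in the same way the simulated version updates the display.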

In [34]:
# "Character-by-Character Streaming"
import time
from IPython.display import display, Markdown, update_display

def stream_brochure(company_name, url, model, tokenizer, system_prompt, delay=0.01):
    # Build the user prompt
    user_prompt = get_brochure_user_prompt(company_name, url, model, tokenizer)

    # Build the messages
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt}
    ]

    # Convert to a text prompt
    prompt = format_messages_as_prompt(messages, tokenizer)

    # Get the full response
    response = generate_response(prompt, model, tokenizer)

    # Remove stray markdown code fences
    response = response.replace("```", "").replace("markdown", "")

    # Simulate streaming (gradual display)
    display_handle = display(Markdown(""), display_id=True)
    shown = ""

    for char in response:
        shown += char
        time.sleep(delay)
        update_display(Markdown(shown), display_id=display_handle.display_id)

## "Paragraph-based Simulated Stream"
# import time
# from IPython.display import display, Markdown, update_display

# def stream_brochure(company_name, url, model, tokenizer, system_prompt, delay=0.3):
#     """
#     Generates a brochure and displays it paragraph by paragraph, simulating a live stream.
#     """
#     # Build the user prompt
#     user_prompt = get_brochure_user_prompt(company_name, url, model, tokenizer)

#     # Build the messages
#     messages = [
#         {"role": "system", "content": system_prompt},
#         {"role": "user", "content": user_prompt}
#     ]

#     # Convert to the final prompt
#     prompt = format_messages_as_prompt(messages, tokenizer)

#     # Get the response from the model
#     response = generate_response(prompt, model, tokenizer)

#     # Clean up to avoid rendering stray code fences
#     response = response.replace("```", "").replace("markdown", "")

#     # Split the output into paragraphs (separated by a blank line)
#     chunks = response.split("\n\n")

#     # Initial display
#     display_handle = display(Markdown(""), display_id=True)
#     shown = ""

#     for chunk in chunks:
#         if chunk.strip():  # skip empty paragraphs
#             shown += chunk.strip() + "\n\n"
#             update_display(Markdown(shown), display_id=display_handle.display_id)
#             time.sleep(delay)  # pause to give the feel of a stream

In [28]:
stream_brochure(
    company_name="Hugging Face",
    url="https://huggingface.co",
    model=model,
    tokenizer=tokenizer,
    system_prompt=system_prompt  # formal
)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⚠️ JSON structure found but not valid.


# Hugging Face - Building the Future of Artificial Intelligence
===========================================================

## Introduction

At Hugging Face, we're revolutionizing the field of artificial intelligence by providing a collaborative platform for developers, researchers, and data scientists to build, train, deploy, and showcase their machine learning projects. Our goal is to make it easy to integrate AI technology into various industries, including but not limited to natural language processing, computer vision, and more.

### What We Do

*   **Model Sharing**: Share high-quality, pre-trained models developed by our community members across various platforms such as Hugging Face Models, Hugging Face Spaces, GitHub, and others.
*   **Dataset Collection**: Offer a vast collection of datasets suitable for training machine learning models, covering diverse domains and techniques.
*   **Collaboration Tools**: Provide features like real-time collaboration on models, spaces, and inference endpoints enable seamless integration within teams and projects.
*   **Training and Deployment**: Enable users to easily train, deploy, and manage their models on different hardware configurations, ensuring optimal performance under varying conditions.

## Features

### Model Showcase

#### Microsoft Bitnet-B1.58-2B-4T
-------------------------

| Description | Updated |
| --- | --- |
| • A state-of-the-art language modeling model | about 2 days ago |

#### Nari Labs/Dia-1.6B
------------------------

| Description | Update History |
| --- | --- |
| • Develops conversational AI models | about 2 hours ago |

#### HiDream-Ai/HiDream-i1-Full
-------------------------------

| Description | Update History |
| --- | --- |
| • Offers highly realistic human-like dialogue generation capabilities | about 13 hours ago |

#### Shakker-Labs/FLUX.1Dev-ControlNet-Union-Pro-2.0
-----------------------------------------------------------

| Description | Update History |
| --- | --- |
| • Provides a wide range of control networks for deep learning | about 8 hours ago |

### Space Development

#### Running
--------------

| Feature | Value Proposition |
| --- | --- |
| • Supports deployment on Zero device | zero setup required |

#### InstantCharacter
--------------------

| Feature | Value Proposition |
| --- | --- |
| • Customizable character customization tools | flexible options available |

### Dataset Management

#### zwhe99/DeepMath-103K
---------------------------------

| Data Details | Updated |
| --- | --- |
| • High-quality mathematical formulas dataset | four weeks ago |

#### Anthropic/Values-In-The-Wild
--------------------------------------

| Data Details | Updated |
| --- | --- |
| • Natural Language Processing (NLP) benchmarks | one month ago |

### Collaboration

#### newcuberobot/EasyControlGiblE
---------------------------------------

| Functionality | Value Proposition |
| --- | --- |
| • Transform images into artful styles | instant results available |

### Integration

#### fireworkslab/NebulsaStudio
---------------------------------

| Tool | Value Proposition |
| --- | --- |
| • Integrate seamlessly with existing frameworks | automatic configuration possible |

### Community Engagement

#### hub.io/HF-Hub
---------------

| Activity | Value Proposition |
| --- | --- |
| • Connect with thousands of experts worldwide | open discussion forum accessible |

### Training Resources

#### huggingface/models/transformerjs
-------------------------------------

| Resource Type | Last Updated |
| --- | --- |
| • Official transformer implementation | two months ago |

**Get Started**

Join us today and start exploring how you can leverage our cutting-edge technologies to drive innovation!

[Sign up](https://www.hugging face.com/signup)

Stay updated with the latest developments:

[Twitter](https://twitter.com/@huggingface)
[LinkedIn](https://linkedin.com/company/hugging-face/)
[Discord](https://discord.gg/hf-hub)
[Zhihu](https://zhihu.cn/q_21884481)
[GitHub](https://github.com/huggingface/models/)



In [35]:
# Try changing the system prompt to the humorous version when you make the Brochure for Hugging Face:

stream_brochure(
    company_name="Hugging Face",
    url="https://huggingface.co",
    model=model,
    tokenizer=tokenizer,
    system_prompt=system_prompt  # humorous
)

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


⚠️ JSON structure found but not valid.


# Welcome to Hugging Face - Where Creativity Meets Innovation

**Home**

At Hugging Face, we're not just about AI (Artificial Intelligence). We're about creating tools that bring creativity and innovation together, making life easier for everyone involved in the tech space. Our mission is simple:

"Empowering creators to unleash their imagination"

## About Us

* **Meet our Team**: We've assembled a talented group of individuals from various backgrounds, each bringing unique skills and expertise to the table. From researchers to engineers, designers, and entrepreneurs, you'll find us passionate about pushing boundaries and exploring new frontiers.
* **A Community of Passionate Minds**: With over 50,000 registered users across platforms, our community consists of innovators, scientists, artists, writers, developers, and more. They come together to explore AI, collaborate on projects, and learn from one another.

### Spaces

*   **OpenSpace**: A collaborative environment where creatives can experiment, showcase their work, and engage with others.
*   **Product Space**: Explore products built by our community members, including apps, services, and tools designed to make working with AI easier and more accessible.
*   **Research Lab**: Stay up-to-date with the latest advancements in AI and machine learning through discussions, tutorials, and expert insights.

## Models

*   **Millions of High-Quality Models**: Browse millions of pre-trained models and fine-tune them for specific tasks, such as generating text, classifying objects, or translating languages.
*   **Model Gallery**: Discover cutting-edge techniques and state-of-the-art architectures developed by top researchers worldwide.
*   **Training Data**: Access vast amounts of training data to improve model performance and accuracy.

## Teams

*   **Hyperbolic Nebius AI Studio**: Collaborative workspace dedicated to developing high-quality AI models, experiments, and visualizations.
*   **Novita**: Focus area for innovative ideas, prototypes, and proof-of-concepts related to emerging trends in AI technology.
*   **Fireworks.ai**: Workshops focused on hands-on experimentation, prototyping, and deployment of custom-made models.

## Pricing

*   **Free Plan**: Get instant access to select features and APIs without paying anything upfront.
*   **Pro Plan**: Pay only $100/month to gain full access to premium features, compute power, and priority customer support.

## Jobs

*   **Software Engineer**: Join our engineering team to contribute to the development of cloud-based applications powered by Hugging Face's infrastructure.
*   **Data Scientist**: Unlock your analytical potential with our scalable computing resources and AI-powered tools tailored to complex problems.

## Getting Started

To start exploring how Hugging Face empowers creative minds, sign up today! Register here: <https://www.huggingface.com/get-started>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../business.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#181;">Business applications</h2>
            <span style="color:#181;">In this exercise we extended the Day 1 code to make multiple LLM calls, and generate a document.

This is perhaps the first example of Agentic AI design patterns, as we combined multiple calls to LLMs. This will feature more in Week 2, and then we will return to Agentic AI in a big way in Week 8 when we build a fully autonomous Agent solution.

Generating content in this way is one of the very most common Use Cases. As with summarization, this can be applied to any business vertical. Write marketing content, generate a product tutorial from a spec, create personalized email content, and so much more. Explore how you can apply content generation to your business, and try making yourself a proof-of-concept prototype. See what other students have done in the community-contributions folder -- so many valuable projects -- it's wild!</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../important.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#900;">Before you move to Week 2 (which is tons of fun)</h2>
            <span style="color:#900;">Please see the week1 EXERCISE notebook for your challenge for the end of week 1. This will give you some essential practice working with Frontier APIs, and prepare you well for Week 2.</span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../resources.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#f71;">A reminder on 3 useful resources</h2>
            <span style="color:#f71;">1. The resources for the course are available <a href="https://edwarddonner.com/2024/11/13/llm-engineering-resources/">here.</a><br/>
            2. I'm on LinkedIn <a href="https://www.linkedin.com/in/eddonner/">here</a> and I love connecting with people taking the course!<br/>
            3. I'm trying out X/Twitter and I'm at <a href="https://x.com/edwarddonner">@edwarddonner</a> and hoping people will teach me how it's done..  
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left;">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../thankyou.jpg" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#090;">Finally! I have a special request for you</h2>
            <span style="color:#090;">
                My editor tells me that it makes a MASSIVE difference when students rate this course on Udemy - it's one of the main ways that Udemy decides whether to show it to others. If you're able to take a minute to rate this, I'd be so very grateful! And regardless - always please reach out to me at ed@edwarddonner.com if I can help at any point.
            </span>
        </td>
    </tr>
</table>