Using Ollama to create brochures from a website

In [2]:
# imports
# If these fail, please check you're running from an 'activated' environment with (llms) in the command prompt

import os
import requests
import json
from typing import List
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display

In [3]:
# Initialize and constants
OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2:1b"
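
The constants `OLLAMA_API` and `HEADERS` are defined here, but the later cells call the `ollama` Python package instead. As a minimal sketch, this is one way those constants could be used to call the local chat endpoint directly. The helper names `build_chat_payload` and `chat_via_http` are hypothetical, the sketch uses only the standard library, and it assumes an Ollama server is running locally with the model already pulled.

```python
import json
import urllib.request

OLLAMA_API = "http://localhost:11434/api/chat"
HEADERS = {"Content-Type": "application/json"}
MODEL = "llama3.2:1b"


def build_chat_payload(model, messages):
    """Build the JSON body for a single, non-streaming chat request."""
    return {"model": model, "messages": messages, "stream": False}


def chat_via_http(messages, model=MODEL):
    """POST the messages to the local Ollama server and return the reply text.

    Assumes the server is reachable at OLLAMA_API; raises on HTTP errors.
    """
    body = json.dumps(build_chat_payload(model, messages)).encode("utf-8")
    request = urllib.request.Request(OLLAMA_API, data=body, headers=HEADERS, method="POST")
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read())["message"]["content"]
```

With a server running, `chat_via_http([{"role": "user", "content": "Hello"}])` would return the model's reply as a string.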

In [4]:
# A class to represent a Webpage

class Website:
    """
    A utility class representing a scraped website, including the links found on the page
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, timeout=10)  # fail fast instead of hanging on a slow server
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [5]:
webLinks = Website("https://www.apple.com/in/")
webLinks.links

['/in/',
 '/in/shop/goto/store',
 '/in/mac/',
 '/in/ipad/',
 '/in/iphone/',
 '/in/watch/',
 '/in/airpods/',
 '/in/tv-home/',
 '/in/entertainment/',
 '/in/shop/goto/buy_accessories',
 'https://support.apple.com/en-in/?cid=gn-ols-home-hp-tab',
 '/in/search',
 '/in/shop/goto/bag',
 '#footnote-1',
 '#footnote-2',
 '/in/shop/goto/store',
 '/in/iphone-16-pro/',
 '/in/iphone-16-pro/',
 '/in/shop/goto/buy_iphone/iphone_16_pro',
 '/in/iphone-16/',
 '/in/iphone-16/',
 '/in/shop/goto/buy_iphone/iphone_16',
 '/in/macbook-pro/',
 '/in/macbook-pro/',
 '/in/shop/goto/buy_mac/macbook_pro',
 '/in/apple-watch-series-10/',
 '/in/apple-watch-series-10/',
 '/in/shop/goto/buy_watch/apple_watch_series_10',
 '/in/airpods-4/',
 '/in/airpods-4/',
 '/in/shop/goto/buy_airpods/airpods_4',
 '/in/mac-mini/',
 '/in/mac-mini/',
 '/in/shop/goto/buy_mac/mac_mini',
 '/in/imac/',
 '/in/imac/',
 '/in/shop/goto/buy_mac/imac',
 '/in/ipad-mini/',
 '/in/ipad-mini/',
 '/in/shop/goto/buy_ipad/ipad_mini',
 '/in/shop/goto/trade_in
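
The scraped list mixes site-relative paths, same-page anchors such as `#footnote-1`, and repeated entries. A possible cleanup step before prompting the model is sketched below. `normalize_links` is a hypothetical helper, not part of the notebook, and it assumes only `http`/`https` links are of interest.

```python
from urllib.parse import urljoin, urldefrag


def normalize_links(base_url, links):
    """Resolve links against base_url, dropping anchors, non-web schemes and duplicates."""
    seen = set()
    result = []
    for link in links:
        if link.startswith("#"):
            continue  # same-page anchor, e.g. '#footnote-1'
        # urljoin turns '/in/mac/' into a full URL; urldefrag removes any '#fragment'
        absolute = urldefrag(urljoin(base_url, link)).url
        if absolute.startswith(("http://", "https://")) and absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

In this notebook it could be applied as `normalize_links(webLinks.url, webLinks.links)`, which would also shorten the prompt sent to the model.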

In [6]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in only JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [7]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in only JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [8]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [9]:
print(get_links_user_prompt(webLinks))

Here is the list of links on the website of https://www.apple.com/in/ - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
/in/
/in/shop/goto/store
/in/mac/
/in/ipad/
/in/iphone/
/in/watch/
/in/airpods/
/in/tv-home/
/in/entertainment/
/in/shop/goto/buy_accessories
https://support.apple.com/en-in/?cid=gn-ols-home-hp-tab
/in/search
/in/shop/goto/bag
#footnote-1
#footnote-2
/in/shop/goto/store
/in/iphone-16-pro/
/in/iphone-16-pro/
/in/shop/goto/buy_iphone/iphone_16_pro
/in/iphone-16/
/in/iphone-16/
/in/shop/goto/buy_iphone/iphone_16
/in/macbook-pro/
/in/macbook-pro/
/in/shop/goto/buy_mac/macbook_pro
/in/apple-watch-series-10/
/in/apple-watch-series-10/
/in/shop/goto/buy_watch/apple_watch_series_10
/in/airpods-4/
/in/airpods-4/
/in/shop/goto/buy_airpods/airpods_4
/in/mac-mini/
/in/mac-mini/
/in/shop/goto/buy_mac/mac_

In [10]:
import ollama

def get_links(url):
    website = Website(url)
    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        format="json"  # ask Ollama to constrain the reply to valid JSON
    )
    result = response['message']['content']
    return json.loads(result)
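
Small models do not always return clean JSON, so the `json.loads` call in `get_links` can raise on a reply that includes extra text. A more tolerant parser is sketched below as one possible approach. `parse_links_json` is a hypothetical helper, and falling back to an empty link list is an assumption about the desired behavior.

```python
import json


def parse_links_json(text):
    """Parse a model reply into {'links': [...]}, tolerating text around the JSON object."""
    empty = {"links": []}
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the outermost {...} span, e.g. when the reply is wrapped in prose
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            return empty
        try:
            data = json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            return empty
    if not isinstance(data, dict) or not isinstance(data.get("links"), list):
        return empty
    # Keep only entries that are dicts with a non-empty string URL
    valid = [
        entry for entry in data["links"]
        if isinstance(entry, dict) and isinstance(entry.get("url"), str) and entry["url"]
    ]
    return {"links": valid}
```

Replacing the final line of `get_links` with `return parse_links_json(result)` would let the rest of the notebook continue even when a reply is malformed.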

In [11]:
apple = Website("https://www.apple.com/in/")
apple.links

['/in/',
 '/in/shop/goto/store',
 '/in/mac/',
 '/in/ipad/',
 '/in/iphone/',
 '/in/watch/',
 '/in/airpods/',
 '/in/tv-home/',
 '/in/entertainment/',
 '/in/shop/goto/buy_accessories',
 'https://support.apple.com/en-in/?cid=gn-ols-home-hp-tab',
 '/in/search',
 '/in/shop/goto/bag',
 '#footnote-1',
 '#footnote-2',
 '/in/shop/goto/store',
 '/in/iphone-16-pro/',
 '/in/iphone-16-pro/',
 '/in/shop/goto/buy_iphone/iphone_16_pro',
 '/in/iphone-16/',
 '/in/iphone-16/',
 '/in/shop/goto/buy_iphone/iphone_16',
 '/in/macbook-pro/',
 '/in/macbook-pro/',
 '/in/shop/goto/buy_mac/macbook_pro',
 '/in/apple-watch-series-10/',
 '/in/apple-watch-series-10/',
 '/in/shop/goto/buy_watch/apple_watch_series_10',
 '/in/airpods-4/',
 '/in/airpods-4/',
 '/in/shop/goto/buy_airpods/airpods_4',
 '/in/mac-mini/',
 '/in/mac-mini/',
 '/in/shop/goto/buy_mac/mac_mini',
 '/in/imac/',
 '/in/imac/',
 '/in/shop/goto/buy_mac/imac',
 '/in/ipad-mini/',
 '/in/ipad-mini/',
 '/in/shop/goto/buy_ipad/ipad_mini',
 '/in/shop/goto/trade_in

In [12]:
get_links("https://www.apple.com/in/")

{'links': [{'type': 'about page', 'url': 'https://www.apple.com/in/about'},
  {'type': 'careers page',
   'url': 'https://support.apple.com/en-in/?cid=gn-ols-home-hp-tab'},
  {'type': 'shop page', 'url': 'https://support.apple.com/en-in/'},
  {'type': 'buy accessories page',
   'url': 'https://www.apple.com/in/shop/goto/buy_accessories'}]}
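
The reply above shows that the model's choices need checking: the entry typed 'careers page' points at a support URL. A small model may also return URLs that were never on the page. One possible guard, sketched below, keeps only entries whose URL matches a scraped link. `keep_known_links` is a hypothetical helper, and it checks URLs only, so a mislabeled `type` would still pass through.

```python
from urllib.parse import urljoin


def keep_known_links(model_reply, page_links, base_url):
    """Keep only reply entries whose URL matches a link actually scraped from the page."""
    # Resolve scraped links to absolute form; ignore a trailing slash when comparing
    known = {urljoin(base_url, link).rstrip("/") for link in page_links}
    kept = [
        entry for entry in model_reply.get("links", [])
        if entry.get("url", "").rstrip("/") in known
    ]
    return {"links": kept}
```

In this notebook the call might look like `keep_known_links(get_links(url), Website(url).links, url)`.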

In [15]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
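
`get_all_details` fetches every URL the model returns, so a single unreachable or malformed link stops the whole run with an exception. A sketch of a more forgiving loop follows. `collect_pages` is a hypothetical helper that takes the fetch function as a parameter, and skipping failed links while recording them is an assumption about the desired behavior.

```python
def collect_pages(links, fetch):
    """Call fetch(url) for each link; skip links whose fetch raises and record the failure."""
    sections = []
    failures = []
    for link in links:
        try:
            sections.append(f"\n\n{link['type']}\n{fetch(link['url'])}")
        except Exception as error:  # broad on purpose: connection errors, bad URLs, missing keys
            failures.append((link.get("url"), str(error)))
    return "".join(sections), failures
```

Inside `get_all_details`, the loop could be replaced with `text, failed = collect_pages(links["links"], lambda u: Website(u).get_contents())`, appending `text` to `result`.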

In [16]:
print(get_all_details("https://www.apple.com/in/"))

Found links: {'links': [{'type': 'about page', 'url': 'https://www.apple.com/in/about'}, {'type': 'careers page', 'url': 'https://www.apple.com/in/careers'}, {'type': 'shop page', 'url': 'https://www.apple.com/in/shop'}, {'type': 'buy accessibly', 'url': 'https://www.apple.com/in/accessibility'}]}
Landing page:
Webpage Title:
Apple (India)
Webpage Contents:
Apple
Apple
Store
Mac
iPad
iPhone
Watch
AirPods
TV & Home
Entertainment
Accessories
Support
0
+
Get up to ₹10000.00 instant cashback with eligible cards.
*
Plus up to 12 months of No Cost EMI.
‡
Shop
iPhone 16 Pro
Built for Apple Intelligence.
Learn more
Buy
Apple Intelligence available now in US English
1
iPhone 16
Built for Apple Intelligence.
Learn more
Buy
Apple Intelligence available now in US English
1
MacBook Pro
A work of smart.
Learn more
Buy
Built for Apple Intelligence.
Available now in US English
1
Apple Watch Series 10
Thinstant classic.
Learn more
Buy
AirPods 4
Iconic. Now supersonic. Available with Active Noise Cancel

In [17]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

In [18]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += f"Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:20_000] # Truncate if more than 20,000 characters
    return user_prompt
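
The slice `user_prompt[:20_000]` can cut the prompt in the middle of a line. If that matters, a variant that ends on the last complete line is sketched below. `truncate_at_line` is a hypothetical helper, and the 20,000 character default simply mirrors the limit used above.

```python
def truncate_at_line(text, limit=20_000):
    """Return text cut to at most `limit` characters, ending on a full line when possible."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    last_newline = cut.rfind("\n")
    # If no usable newline exists in the kept part, fall back to a plain character cut
    return cut[:last_newline] if last_newline > 0 else cut
```

It could stand in for the slice as `user_prompt = truncate_at_line(user_prompt)`.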

In [20]:
get_brochure_user_prompt("Apple", "https://www.apple.com/in/")

Found links: {'links': [{'type': 'about page', 'url': 'https://www.apple.com/in/about'}, {'type': 'careers page', 'url': 'https://jobs.apple.com/'}, {'type': 'shop page', 'url': 'https://www.apple.com/in/shop'}, {'type': 'macintosh page', 'url': 'https://www.apple.com/mac'}]}


'You are looking at a company called: Apple\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nApple (India)\nWebpage Contents:\nApple\nApple\nStore\nMac\niPad\niPhone\nWatch\nAirPods\nTV & Home\nEntertainment\nAccessories\nSupport\n0\n+\nGet up to ₹10000.00 instant cashback with eligible cards.\n*\nPlus up to 12\xa0months of No\xa0Cost\xa0EMI.\n‡\nShop\niPhone 16 Pro\nBuilt for Apple\xa0Intelligence.\nLearn more\nBuy\nApple Intelligence available now in US English\n1\niPhone 16\nBuilt for Apple\xa0Intelligence.\nLearn more\nBuy\nApple Intelligence available now in US English\n1\nMacBook Pro\nA work of smart.\nLearn more\nBuy\nBuilt for Apple Intelligence.\nAvailable now in US English\n1\nApple Watch Series 10\nThinstant classic.\nLearn more\nBuy\nAirPods 4\nIconic. Now supersonic. Available\xa0with\xa0Active Noise Cancellation.\n2\nLearn more\nBuy\nMac mini\nSiz

In [21]:
def create_brochure(company_name, url):
    response = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
    )
    result = response['message']['content']
    display(Markdown(result))

In [23]:
create_brochure("Microsoft", "https://careers.microsoft.com/v2/global/en/home.html")

Found links: {'links': [{'type': 'about page', 'url': 'https://careers.microsoft.com/v2/global/en/home'}, {'type': 'careers page', 'url': 'https://careers.microsoft.com/v2/global/en/careers'}]}


# Microsoft: Empowering Innovation and Excellence

### Our Story

Microsoft is a technology company that has been shaping the world since 1975. With a diverse team of over 170,000 people from more than 190 countries, we are committed to delivering innovative solutions that make a meaningful impact on society.

## **Why Choose Microsoft?**

- **Impact Matters**: We believe that our work should have a positive impact on people's lives. That's why we focus on creating products and services that solve real-world problems.
- **Growth Mindset**: Our culture is built around the idea of continuous learning, innovation, and growth. We encourage our employees to think creatively, take risks, and learn from their mistakes.
- **Diversity and Inclusion**: At Microsoft, we celebrate diversity in all its forms. From our products and services to our workplace culture, we strive to create an environment where everyone can thrive.

## **Our Benefits**

We're committed to helping you live well. Take a look at our benefits package:

* Explore Microsoft's world-class benefits designed to help you and your family live well: [https://careers.microsoft.com/v2/global/en/benefits](https://careers.microsoft.com/v2/global/en/benefits)
* Discover how we care about your well-being, from flexible work arrangements to mental health support: [https://careers.microsoft.com/v2/global/en/culture](https://careers.microsoft.com/v2/global/en/culture)
* Learn more about our benefits for you and your family, including parental leave, dependent care assistance, and more.

## **Culture**

At Microsoft, we believe that culture is everything. We're a company of innovators, thinkers, and doers who are passionate about creating solutions that make a difference in the world.

* Discover how we celebrate diversity and inclusion: [https://careers.microsoft.com/v2/global/en/diversityandinclusion](https://careers.microsoft.com/v2/global/en/diversityandinclusion)
* Learn more about our flexible work arrangements, including remote work options and flexible hours.
* Read about the company culture on our "About" page.

### Flexible Work

At Microsoft, we value flexibility as part of our hybrid workplace. That means you can work from anywhere, at any time that's most productive for you.

* Learn more about our flexible work arrangements, including remote work options and flexible hours.
* Discover how we empower our employees to do their best work in a way that suits them best.

## Careers

We're always on the lookout for talented individuals who share our passion for innovation and excellence. If you're interested in joining the Microsoft team, explore our career opportunities:

* Visit our careers page: [https://careers.microsoft.com/v2/global/en](https://careers.microsoft.com/v2/global/en)
* Take a look at our job openings on our "About" page.
* Read more about what we do and who we are on our "Why Microsoft" page.

Streaming response using Ollama

In [24]:
def stream_brochure(company_name, url):
    stream = ollama.chat(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
          ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk['message']['content'] or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
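
The cleanup line in `stream_brochure` deletes every occurrence of the word "markdown", including any that appear in the brochure text itself. A narrower alternative, sketched below, removes only a code fence that wraps the whole reply. `strip_markdown_fence` is a hypothetical helper, and the sketch assumes the fence, if present, is on the first and last lines.

```python
FENCE = "`" * 3  # three backticks, built this way so the example stays easy to read


def strip_markdown_fence(text):
    """Remove a wrapping code fence (with or without a 'markdown' tag); leave the body as is."""
    stripped = text.strip()
    if stripped.startswith(FENCE):
        first_newline = stripped.find("\n")
        # Drop the opening fence line; while streaming it may not be complete yet
        stripped = stripped[first_newline + 1:] if first_newline != -1 else ""
        stripped = stripped.rstrip()
        if stripped.endswith(FENCE):
            stripped = stripped[:-len(FENCE)]
    return stripped.strip()
```

In the streaming loop, the accumulated text could be left unmodified and the display updated with `Markdown(strip_markdown_fence(response))`.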

In [26]:
stream_brochure("Microsoft", "https://careers.microsoft.com/v2/global/en/home.html")

Found links: {'links': [{'type': 'about page', 'url': 'https://careers.microsoft.com/v2/global/en/about'}, {'type': 'career homepage', 'url': 'https://careers.microsoft.com/v2/global/en/career-homepage'}, {'type': 'job openings', 'url': 'https://careers.microsoft.com/v2/global/en/jobs'}]}


# Microsoft: A Leader in Technology and Culture

## About Microsoft
### Our Story

At Microsoft, we're on a mission to empower every person and organization on the planet through the power of technology. With a rich history dating back to 1975, our journey has been shaped by innovators who have dared to dream big.

### Our Culture

We believe that everyone deserves equal opportunities to succeed, which is why we're committed to creating an inclusive culture that values diversity, equity, and inclusion. Here are just a few examples of what life at Microsoft looks like:

* **Growth Mindset**: We encourage you to take risks, learn from your mistakes, and strive for continuous improvement.
* **Diversity and Inclusion**: We celebrate the unique perspectives and experiences that our diverse team brings to the table.
* **Flexible Work**: With a hybrid workplace, you'll have the freedom to work in the way that best suits you.

### Benefits

Take a look at what we're offering:

* **World-class benefits designed to help you live well**:
	+ Explore our benefits package on our careers website: [careers.microsoft.com/v2/global/en/benefits](https://careers.microsoft.com/v2/global/en/benefits)
* **Culture**: Learn more about our values and what it means to work with us: https://careers.microsoft.com/v2/global/en/culture
* **Diversity and Inclusion**: Discover how we're committed to creating a workplace that's inclusive and welcoming: https://careers.microsoft.com/v2/global/en/diversityandinclusion
* **Flexible Work**: Learn about our flexible work arrangements and how you can thrive in our hybrid workplace: https://careers.microsoft.com/v2/global/en/flexible-work

## Diversity and Inclusion at Microsoft

We're passionate about celebrating the diversity that makes us stronger. Here's what we're working on:

* **Celebrating Diverse Perspectives**: Our diverse team brings unique experiences and perspectives to the table, which informs our products, services, and company culture.
* **Inclusive Hiring Process**: We make sure that every candidate has an equal chance to succeed in their application process.

### Flexible Work

At Microsoft, we value flexibility as part of our hybrid workplace. Here's what you can expect:

* **Flexible work arrangements**: Learn more about our flexible work options on our careers website: [careers.microsoft.com/v2/global/en/flexible-work](https://careers.microsoft.com/v2/global/en/flexible-work)
* **Work-Life Balance**: With the flexibility to create your own schedule, you'll have the freedom to balance your work and personal life.

### Not Found? Visit Our Careers Website

We're constantly growing and innovating. If you can't find what we're looking for on our main careers page, check out these alternative routes:

* **Home**: Take a tour of our global headquarters: https://careers.microsoft.com/v2/global/en/home.html
* **Asset Manifest JSON**: Download the latest asset manifest to learn more about our company culture and values: https://careers.microsoft.com/support/asset-manifest.json