Website Brochure 

In [5]:
import os
import requests
import json
from typing import List
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display, update_display
from openai import OpenAI

In [6]:
load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

if api_key and api_key.startswith('sk-proj-') and len(api_key)>10:
    print("API key looks good so far")
else:
    print("There might be a problem with your API key? Please visit the troubleshooting notebook!")
    
MODEL = 'gpt-4o-mini'
openai = OpenAI()

API key looks good so far


In [7]:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    """
    A utility class to represent a Website that we have scraped, now with links
    """

    def __init__(self, url):
        self.url = url
        response = requests.get(url, headers=headers, timeout=10)  # avoid hanging on an unresponsive site
        self.body = response.content
        soup = BeautifulSoup(self.body, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        if soup.body:
            for irrelevant in soup.body(["script", "style", "img", "input"]):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
        links = [link.get('href') for link in soup.find_all('a')]
        self.links = [link for link in links if link]

    def get_contents(self):
        return f"Webpage Title:\n{self.title}\nWebpage Contents:\n{self.text}\n\n"

In [8]:
ed = Website("https://fullhomegardening.com")
ed.links

['#content',
 'https://www.facebook.com/fullhome.gardening',
 'https://www.instagram.com/fullhomegardening/',
 'https://www.pinterest.ca/fullhomegardening/',
 'https://twitter.com/full_gardening',
 'https://fullhomegardening.com/',
 '#',
 'http://fullhomegardening.com/',
 'https://fullhomegardening.com/category/home-gardening/',
 'https://fullhomegardening.com/category/home-gardening-2/',
 'https://fullhomegardening.com/about/',
 'https://fullhomegardening.com/contact/',
 '#',
 'https://fullhomegardening.com/joseph-joseph-compost-bin-review/',
 'https://fullhomegardening.com/author/team-russlobogmail-com/',
 'https://fullhomegardening.com/joseph-joseph-compost-bin-review/',
 'https://fullhomegardening.com/joseph-joseph-compost-bin-review/',
 'https://fullhomegardening.com/joseph-joseph-compost-bin-review/#respond',
 'https://fullhomegardening.com/kohler-compost-bin-review/',
 'https://fullhomegardening.com/author/team-russlobogmail-com/',
 'https://fullhomegardening.com/kohler-compost-
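
The list above contains fragment-only anchors ('#', '#content') and several repeated URLs, and the prompt built later warns that some links might be relative. A minimal sketch of one way to tidy the list before it goes to the model. The helper name clean_links is hypothetical and not part of the notebook; it relies only on urljoin and urldefrag from the standard library.

```python
from urllib.parse import urljoin, urldefrag

def clean_links(base_url, links):
    """Resolve relative links, drop anchors and non-web schemes, and de-duplicate (order preserved)."""
    seen = set()
    cleaned = []
    for link in links:
        if link.startswith("#"):
            continue  # in-page anchor such as '#' or '#content'
        # Make the link absolute, then discard any trailing '#fragment'
        absolute, _fragment = urldefrag(urljoin(base_url, link))
        if not absolute.startswith(("http://", "https://")):
            continue  # skips mailto:, tel:, javascript: and similar
        if absolute not in seen:
            seen.add(absolute)
            cleaned.append(absolute)
    return cleaned
```

It could be applied as clean_links(ed.url, ed.links). Whether trimming the list improves the model's choices is untested here; it mainly shortens the prompt.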

In [9]:
link_system_prompt = "You are provided with a list of links found on a webpage. \
You are able to decide which of the links would be most relevant to include in a brochure about the company, \
such as links to an About page, or a Company page, or Careers/Jobs pages.\n"
link_system_prompt += "You should respond in JSON as in this example:"
link_system_prompt += """
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}
"""

In [10]:
print(link_system_prompt)

You are provided with a list of links found on a webpage. You are able to decide which of the links would be most relevant to include in a brochure about the company, such as links to an About page, or a Company page, or Careers/Jobs pages.
You should respond in JSON as in this example:
{
    "links": [
        {"type": "about page", "url": "https://full.url/goes/here/about"},
        {"type": "careers page", "url": "https://another.full.url/careers"}
    ]
}



In [11]:
def get_links_user_prompt(website):
    user_prompt = f"Here is the list of links on the website of {website.url} - "
    user_prompt += "please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. \
Do not include Terms of Service, Privacy, email links.\n"
    user_prompt += "Links (some might be relative links):\n"
    user_prompt += "\n".join(website.links)
    return user_prompt

In [12]:
print(get_links_user_prompt(ed))

Here is the list of links on the website of https://fullhomegardening.com - please decide which of these are relevant web links for a brochure about the company, respond with the full https URL in JSON format. Do not include Terms of Service, Privacy, email links.
Links (some might be relative links):
#content
https://www.facebook.com/fullhome.gardening
https://www.instagram.com/fullhomegardening/
https://www.pinterest.ca/fullhomegardening/
https://twitter.com/full_gardening
https://fullhomegardening.com/
#
http://fullhomegardening.com/
https://fullhomegardening.com/category/home-gardening/
https://fullhomegardening.com/category/home-gardening-2/
https://fullhomegardening.com/about/
https://fullhomegardening.com/contact/
#
https://fullhomegardening.com/joseph-joseph-compost-bin-review/
https://fullhomegardening.com/author/team-russlobogmail-com/
https://fullhomegardening.com/joseph-joseph-compost-bin-review/
https://fullhomegardening.com/joseph-joseph-compost-bin-review/
https://fullho

In [13]:
def get_links(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": link_system_prompt},
            {"role": "user", "content": get_links_user_prompt(website)}
        ],
        response_format={"type": "json_object"}
    )
    result = response.choices[0].message.content
    return json.loads(result)
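
get_links trusts the model to return exactly the shape shown in the system prompt. The json_object response format should yield valid JSON, but it does not guarantee the "links" key or full URLs. A minimal sketch of a defensive parser follows; parse_links is a hypothetical helper, not part of the notebook, and falling back to an empty list on malformed input is an assumption about the desired behavior.

```python
import json

def parse_links(raw):
    """Parse the model's reply and keep only well-formed {"type", "url"} entries."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return {"links": []}
    entries = data.get("links", []) if isinstance(data, dict) else []
    if not isinstance(entries, list):
        entries = []
    links = [
        {"type": str(entry["type"]), "url": entry["url"]}
        for entry in entries
        if isinstance(entry, dict)
        and "type" in entry
        and isinstance(entry.get("url"), str)
        and entry["url"].startswith(("http://", "https://"))
    ]
    return {"links": links}
```

With this in place, the last line of get_links could become return parse_links(result), so downstream code can always iterate over links["links"].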

In [14]:
site = Website("https://fullhomegardening.com")
site.links

['#content',
 'https://www.facebook.com/fullhome.gardening',
 'https://www.instagram.com/fullhomegardening/',
 'https://www.pinterest.ca/fullhomegardening/',
 'https://twitter.com/full_gardening',
 'https://fullhomegardening.com/',
 '#',
 'http://fullhomegardening.com/',
 'https://fullhomegardening.com/category/home-gardening/',
 'https://fullhomegardening.com/category/home-gardening-2/',
 'https://fullhomegardening.com/about/',
 'https://fullhomegardening.com/contact/',
 '#',
 'https://fullhomegardening.com/joseph-joseph-compost-bin-review/',
 'https://fullhomegardening.com/author/team-russlobogmail-com/',
 'https://fullhomegardening.com/joseph-joseph-compost-bin-review/',
 'https://fullhomegardening.com/joseph-joseph-compost-bin-review/',
 'https://fullhomegardening.com/joseph-joseph-compost-bin-review/#respond',
 'https://fullhomegardening.com/kohler-compost-bin-review/',
 'https://fullhomegardening.com/author/team-russlobogmail-com/',
 'https://fullhomegardening.com/kohler-compost-

In [15]:
get_links("https://fullhomegardening.com")

{'links': [{'type': 'about page',
   'url': 'https://fullhomegardening.com/about/'},
  {'type': 'contact page', 'url': 'https://fullhomegardening.com/contact/'}]}

In [16]:
def get_all_details(url):
    result = "Landing page:\n"
    result += Website(url).get_contents()
    links = get_links(url)
    print("Found links:", links)
    for link in links["links"]:
        result += f"\n\n{link['type']}\n"
        result += Website(link["url"]).get_contents()
    return result
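
get_all_details fetches every link the model returns, so a single unreachable page would raise and abort the whole run. A minimal sketch of a more tolerant loop follows, assuming a hypothetical helper collect_pages that takes the fetch step as a parameter so it can be tried without network access.

```python
def collect_pages(links, fetch):
    """Build the combined text for each link, skipping any link whose fetch raises."""
    sections = []
    for link in links:
        try:
            contents = fetch(link["url"])
        except Exception as exc:  # broad on purpose: any failure just skips the page
            print(f"Skipping {link['url']}: {exc}")
            continue
        sections.append(f"\n\n{link['type']}\n{contents}")
    return "".join(sections)
```

In the notebook this could be called as collect_pages(links["links"], lambda u: Website(u).get_contents()). Note that Website does not check the HTTP status code, so an error page would still be included as ordinary content.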

In [17]:
print(get_all_details("https://fullhomegardening.com"))

Found links: {'links': [{'type': 'about page', 'url': 'https://fullhomegardening.com/about/'}, {'type': 'contact page', 'url': 'https://fullhomegardening.com/contact/'}]}
Landing page:
Webpage Title:
Full Home Gardening - Your Guide to the World of Home Gardening
Webpage Contents:
Skip to content
Full Home Gardening
Menu
Home
Gardening
Guide
About
Contact
Joseph Joseph compost bin review
by
MehrishKK
This compost bin is uniquely designed and looks fantastic to …
Read more
Leave a comment
Kohler compost bin review
by
MehrishKK
This compost bin is crafted with an innovative design and …
Read more
Leave a comment
Simplehuman compost bucket review
by
MehrishKK
This compost bin looks amazing. It is definitely constructed for a modern …
Read more
Leave a comment
THIRD ROCK COMPOST BUCKET REVIEW
by
MehrishKK
This compost bucket has a timeless design and it looks extremely …
Read more
Leave a comment
MODUODUO Composting Bin Review
by
MehrishKK
This is a double-barrel Bin that is designed for c

In [18]:
system_prompt = "You are an assistant that analyzes the contents of several relevant pages from a company website \
and creates a short brochure about the company for prospective customers, investors and recruits. Respond in markdown. \
Include details of company culture, customers and careers/jobs if you have the information."

In [19]:
def get_brochure_user_prompt(company_name, url):
    user_prompt = f"You are looking at a company called: {company_name}\n"
    user_prompt += "Here are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\n"
    user_prompt += get_all_details(url)
    user_prompt = user_prompt[:5_000] # Truncate if more than 5,000 characters
    return user_prompt
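
The slice to 5,000 characters can cut the prompt in the middle of a word or line. A minimal sketch of a variant that backs up to the last complete line follows; truncate_at_line is a hypothetical helper, and the 5,000 default simply mirrors the limit used above.

```python
def truncate_at_line(text, limit=5_000):
    """Return text cut to at most `limit` characters, ending on a line boundary when possible."""
    if len(text) <= limit:
        return text
    cut = text.rfind("\n", 0, limit)  # last newline inside the allowed window
    return text[:cut] if cut > 0 else text[:limit]
```

Since the scraped text is newline-separated, this usually drops only a partial final line.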

In [20]:
get_brochure_user_prompt("Full Home Gardening", "https://fullhomegardening.com")

Found links: {'links': [{'type': 'about page', 'url': 'https://fullhomegardening.com/about/'}, {'type': 'contact page', 'url': 'https://fullhomegardening.com/contact/'}]}


'You are looking at a company called: Full Home Gardening\nHere are the contents of its landing page and other relevant pages; use this information to build a short brochure of the company in markdown.\nLanding page:\nWebpage Title:\nFull Home Gardening - Your Guide to the World of Home Gardening\nWebpage Contents:\nSkip to content\nFull Home Gardening\nMenu\nHome\nGardening\nGuide\nAbout\nContact\nJoseph\xa0Joseph\xa0compost bin review\nby\nMehrishKK\nThis compost bin is uniquely designed and looks fantastic to …\nRead more\nLeave a comment\nKohler compost bin review\nby\nMehrishKK\nThis compost bin is crafted with an innovative design and …\nRead more\nLeave a comment\nSimplehuman\xa0compost bucket review\nby\nMehrishKK\nThis compost bin looks amazing. It is\xa0definitely constructed\xa0for a modern …\nRead more\nLeave a comment\nTHIRD ROCK COMPOST BUCKET REVIEW\nby\nMehrishKK\nThis compost bucket has a timeless\xa0design and it looks extremely …\nRead more\nLeave a comment\nMODUODUO

In [22]:
def stream_brochure(company_name, url):
    stream = openai.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": get_brochure_user_prompt(company_name, url)}
        ],
        stream=True
    )
    
    response = ""
    display_handle = display(Markdown(""), display_id=True)
    for chunk in stream:
        response += chunk.choices[0].delta.content or ''
        response = response.replace("```","").replace("markdown", "")
        update_display(Markdown(response), display_id=display_handle.display_id)
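
The loop above removes every occurrence of the word "markdown" and every triple backtick, so a brochure sentence that mentions markdown, or that contains a code block, would be altered. A minimal sketch of a narrower cleanup that strips only a fence wrapping the whole reply follows. Both helpers are hypothetical and not part of the notebook.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example easy to embed

def strip_markdown_fence(text):
    """Remove a fence wrapping the whole reply, leaving the body untouched."""
    stripped = text.strip()
    for opener in (FENCE + "markdown", FENCE):
        if stripped.startswith(opener):
            stripped = stripped[len(opener):]
            # Only drop a closing fence when an opening one was found
            if stripped.endswith(FENCE):
                stripped = stripped[:-len(FENCE)]
            break
    return stripped.strip()

def accumulate(chunks):
    """Yield the cleaned text after each streamed piece, as the display loop would render it."""
    response = ""
    for piece in chunks:
        response += piece or ""
        yield strip_markdown_fence(response)
```

Inside stream_brochure, the raw text would be accumulated as before and update_display would receive Markdown(strip_markdown_fence(response)). While the opening fence is still arriving, a partial word such as "mark" may show briefly; that seems an acceptable trade-off for leaving the body text intact.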

In [23]:
stream_brochure("SquareYards", "https://squareyards.com")

Found links: {'links': [{'type': 'homepage', 'url': 'https://www.squareyards.com/'}, {'type': 'about page', 'url': 'https://www.squareyards.com/aboutus'}, {'type': 'careers page', 'url': 'https://www.squareyards.com/career'}, {'type': 'real estate services', 'url': 'https://www.squareyards.com/real-estate-services'}, {'type': 'contact page', 'url': 'https://www.squareyards.com/contactus'}]}


# Square Yards Brochure

---

## **Welcome to Square Yards!**

### **Real Estate Made Real Easy**

Square Yards is a leading proptech company revolutionizing the real estate landscape in India. With a robust network across numerous cities and a comprehensive suite of services, we are dedicated to simplifying the property buying, selling, and renting process for everyone.

### **Our Services**
At Square Yards, we offer a wide range of services to meet the needs of our clients:
- **Property Transactions**: Facilitating hassle-free buying, selling, and renting processes.
- **Financing Solutions**: Assistance with securing property financing tailored to your needs.
- **Interior Design Services**: Transform your property with our professional interior design offerings.
- **Property Management**: Comprehensive management services to actively maintain your assets.
- **Site Visits**: Personalized tours of available properties.

### **The Square Yards Advantage**
- **Nationwide Presence**: Access to properties in cities across India—from major metropolitan areas to emerging towns.
- **Diverse Offerings**: Residential and commercial properties, including apartments, plots, villas, and office spaces.
- **User-Friendly Platform**: Easily search, shortlist, and manage property options through our online portal.

---

## **Company Culture**
At Square Yards, our culture is built on innovation, integrity, and customer-centricity. We embrace diversity and foster an inclusive environment where every team member is valued and given the opportunity to grow. Our commitment to transparency and quality ensures that we prioritize the needs and satisfaction of our clients above everything else.

### **Our Team**
Our dedicated professionals leverage cutting-edge data intelligence to understand market trends and empower our customers with informed decisions. Whether you're a seasoned real estate expert or new to the sector, you'll find that Square Yards supports your career growth and aspirations.

---

## **Join Us!**
Looking for an exciting career in the dynamic world of real estate? Square Yards is always on the lookout for talented individuals who share our vision. We provide a platform for creativity, growth, and the chance to work with top industry professionals.

### **Current Openings**
- Real Estate Consultants
- Financial Analysts
- Marketing Professionals
- IT and Data Analysts

*Explore a career where your skills can shine and make a difference!*

---

## **Our Customers**
Square Yards caters to a diverse clientele ranging from first-time home buyers to experienced investors. With personalized services tailored to fit every customer's unique needs, we strive to create a seamless experience, ensuring our clients find their perfect property.

---

### **Get in Touch**
Discover more about Square Yards by visiting our website or contacting us directly. Join the property revolution today!

**Website**: [Square Yards](https://www.squareyards.com)  
**Email**: contact@squareyards.com  
**Phone**: 1800 123 4567

--- 

*Square Yards - Your trusted partner in real estate!*
