In [1]:
!pip show ollama

Name: ollama
Version: 0.4.7
Summary: The official Python client for Ollama.
Home-page: https://ollama.com
Author: Ollama
Author-email: hello@ollama.com
License: MIT
Location: /Users/nguyenthithu/anaconda3/envs/llms/lib/python3.11/site-packages
Requires: httpx, pydantic
Required-by: 


In [2]:
# imports
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
import ollama

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

MODEL = "llama3.2"

## Define a `Website` class to scrape content from a web page at a given URL
- `requests` sends HTTP requests and retrieves webpage content in Python. It lets you fetch HTML pages and call web APIs. This notebook uses `selenium` instead, because it can handle JavaScript-rendered content.
- `BeautifulSoup` parses HTML and XML documents. It helps you navigate a page and extract data from it.
  

In [3]:
!which chromedriver

/opt/homebrew/bin/chromedriver
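
The path printed above is specific to one machine. As a minimal sketch, the same lookup can be done from Python with the standard library's `shutil.which`, so the path does not need to be hard-coded. The helper name `find_chromedriver` and its `fallback` parameter are assumptions made for this illustration, not part of the notebook.

```python
import shutil


def find_chromedriver(fallback=None):
    """Return the chromedriver path found on PATH, or the fallback if none is found."""
    # shutil.which mirrors the shell's `which`: it returns a path string or None
    return shutil.which("chromedriver") or fallback


path = find_chromedriver()
print(path if path else "chromedriver not found on PATH")
```

The result could then be passed to `Service(...)` in place of the literal path, assuming chromedriver is installed and on `PATH`.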


In [4]:
# A class to represent a Webpage
# If you're not familiar with Classes, check out the "Intermediate Python" notebook

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url: Selenium renders the page
        in headless Chrome, and BeautifulSoup parses the resulting HTML
        """
        self.url = url
        options = Options()
        options.add_argument("--headless")
        options.add_argument("--disable-gpu")
        options.add_argument("--no-sandbox")
        options.add_argument(f"user-agent={headers['User-Agent']}")
        
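        service = Service("/opt/homebrew/bin/chromedriver")  # Update with the path printed by `which chromedriver`
        driver = webdriver.Chrome(service=service, options=options)

        try:
            # Load the page inside the try block so the browser always quits, even if loading fails
            driver.get(url)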
            WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.TAG_NAME, "body"))
            )
            soup = BeautifulSoup(driver.page_source, 'html.parser')
            self.title = soup.title.string if soup.title else "No title found"
            
            if soup.body:
                for irrelevant in soup.body(["script", "style", "img", "input"]):
                    irrelevant.decompose()
                self.text = soup.body.get_text(separator="\n", strip=True)
            else:
                self.text = "No content found"
        finally:
            driver.quit()

## Types of prompts

You may know this already - but if not, you will get very familiar with it!

Models like GPT-4o have been trained to receive instructions in a particular way.

They expect to receive:

**A system prompt** that tells them what task they are performing and what tone they should use

**A user prompt** -- the conversation starter that they should reply to

In [5]:
# Define our system prompt 
system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown in Vietnamese."

In [6]:
# A function that writes a User Prompt that asks for summaries of websites:
def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website are as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt
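
To see what this function produces without launching a browser, here is a small sketch that feeds it a stand-in object. `demo_site` is a hypothetical placeholder built with `SimpleNamespace`; it only mimics the `title` and `text` attributes of `Website`. The function is repeated so the snippet runs on its own.

```python
from types import SimpleNamespace


def user_prompt_for(website):
    """Build the user prompt from an object with `title` and `text` attributes."""
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += (
        "\nThe contents of this website are as follows; "
        "please provide a short summary of this website in markdown. "
        "If it includes news or announcements, then summarize these too.\n\n"
    )
    user_prompt += website.text
    return user_prompt


# Hypothetical stand-in for a scraped page: no Selenium or network access needed
demo_site = SimpleNamespace(
    title="Demo Page",
    text="Welcome to the demo.\nNews: version 2 released.",
)

prompt = user_prompt_for(demo_site)
print(prompt)
```

The printed prompt starts with the title line, continues with the instructions, and ends with the page text.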

## Messages

The API from OpenAI expects to receive messages in a particular structure.
Many of the other APIs share this structure, including the Ollama chat API used in this notebook:

```
[
    {"role": "system", "content": "system message goes here"},
    {"role": "user", "content": "user message goes here"}
]
```

To give you a preview, the next cell builds a simple message list in this structure; we won't stretch the model (yet!)

In [7]:
messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]

## And now let's build useful messages for Llama 3.2, using a function

In [8]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]
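
Since the structure is just a list of dictionaries, it is easy to sanity-check before sending it to a model. The helper below is a minimal sketch, not part of any library: `check_messages` is a hypothetical name, and the rule it enforces (each message is a dict with string `role` and `content` values) is an assumption based on the format shown above.

```python
def check_messages(messages):
    """Return a list of problems found in a chat message list (empty if none)."""
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"message {i} is not a dict")
            continue
        # Each message needs both keys, and both values must be strings
        for key in ("role", "content"):
            if not isinstance(message.get(key), str):
                problems.append(f"message {i} has no string '{key}'")
    return problems


good = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"},
]
bad = [{"role": "user"}, "not a dict"]

print(check_messages(good))
print(check_messages(bad))
```

The first call reports no problems; the second reports one problem per malformed entry.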

## Time to bring it together - the Ollama Python API is very simple!

In [9]:
# And now: call the Ollama chat API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    messages = messages_for(website)
    response = ollama.chat(model=MODEL, messages=messages)
    return response['message']['content']
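
The function above needs both a browser and a running Ollama server. As a rough sketch for trying the request and response handling offline, the chat call can be passed in as a parameter. `summarize_with` and `fake_chat` are hypothetical names introduced for this illustration, the shortened prompts are placeholders, and the stub only mimics the `response['message']['content']` shape used above.

```python
def summarize_with(chat_fn, title, text, model="llama3.2"):
    """Build the two-message request and return the reply text from chat_fn."""
    messages = [
        {"role": "system", "content": "Summarize the website in markdown."},
        {"role": "user", "content": f"Website titled {title}:\n\n{text}"},
    ]
    response = chat_fn(model=model, messages=messages)
    # Same access pattern as the summarize function above
    return response["message"]["content"]


def fake_chat(model, messages):
    """Hypothetical offline stand-in that mimics the shape of a chat response."""
    reply = f"stub reply from {model} for {len(messages)} messages"
    return {"message": {"role": "assistant", "content": reply}}


result = summarize_with(fake_chat, "Demo Page", "Welcome to the demo.")
print(result)
```

Passing `ollama.chat` as `chat_fn` should send the same request to a local Ollama server, assuming one is running and the model has been pulled.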

In [10]:
summarize("https://edwarddonner.com")



In [11]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [12]:
display_summary("https://edwarddonner.com")

**Tóm tắt trang web Home - Edward Donner**
=====================================

*   Trang web này thuộc về chủ đề AI (Kỹ thuật trí tuệ nhân tạo) và LLM (Loạt ngôn ngữ nhân tạo).
*   Chủ sở hữu trang web là Ed, người đồng sáng lập và CTO của Nebula.io.
*   Tại trang web, Ed chia sẻ về công việc cũng như các dự án khác nhau liên quan đến AI.

**Tin tức và thông báo**
-------------------------

*   January 23, 2025: LLM Workshop – Hands-on with Agents – resources
*   December 21, 2024: Welcome, SuperDataScientists!
*   November 13, 2024: Mastering AI and LLM Engineering – Resources
*   October 16, 2024: From Software Engineer to AI Data Scientist – resources

In [13]:
display_summary("https://www.tradingview.com/markets/stocks-usa/market-movers-gainers/")


It appears that the provided text is not a traditional financial news article or stock market report, but rather a snapshot of a TradingView platform, which is a social trading and investing platform.

The text shows various metrics and data related to different stocks, ETFs, and other securities, including:

1. Stock prices and charts
2. Volume and trading activity
3. Technical indicators (e.g., RSI, Bollinger Bands)
4. Market data and economic calendars
5. News and earnings announcements

The text also includes various user-generated content, such as:

1. Screenshots of trades and positions
2. Comments and messages from other users
3. Links to TradingView ideas and scripts

Overall, the text provides a snapshot of what's happening in the financial markets, but it's not a traditional news article or report.

If you're looking for stock market news or analysis, I'd be happy to help you with that!

## An extra exercise for those who enjoy web scraping

With a plain `requests`-based scraper, `display_summary("https://openai.com")` doesn't work! That's because OpenAI has a fancy website that uses JavaScript to render its content. There are many ways around this that some of you might be familiar with. For example, Selenium is a hugely popular framework that runs a browser behind the scenes, renders the page, and allows you to query it; the `Website` class above takes that approach. If you have experience with Selenium, Playwright or similar, then feel free to improve the `Website` class further. In the community-contributions folder, you'll find an example Selenium solution from a student (thank you!)