# Webpage Summarizer through AI Data Scraping

You can use an open-source model through Ollama as well as a frontier model API.

You'll be able to use this technique for all subsequent projects if you'd prefer not to use paid APIs.



## Installation of llama3.2

Simply visit [ollama.com](https://ollama.com) and install!

After installing Ollama, make sure it is running.

In your cmd/PowerShell/bash terminal, run: `ollama run llama3.2`

Once complete, the ollama server should already be running locally.  
If you visit:  
[http://localhost:11434/](http://localhost:11434/)

You should see the message `Ollama is running`.  

If not, bring up a new Terminal (Mac) or PowerShell (Windows) and enter `ollama serve`.  
Then, in another Terminal (Mac) or PowerShell (Windows), enter `ollama pull llama3.2`.  
Then try [http://localhost:11434/](http://localhost:11434/) again.
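
The same check can be done from Python. This is a minimal sketch, assuming the default address shown above; the helper name `ollama_is_running` is made up for illustration, and it uses only the standard library so it works before any packages are installed.

```python
import urllib.error
import urllib.request

def ollama_is_running(base_url="http://localhost:11434/", timeout=2.0):
    """Return True if a GET to base_url answers with the 'Ollama is running' banner."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure: all treated as "not running"
        return False
    return "Ollama is running" in body

if ollama_is_running():
    print("Ollama is running")
else:
    print("Ollama is not reachable; try `ollama serve` in another terminal")
```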

If Ollama is slow on your machine, try `llama3.2:1b` as an alternative. Run `ollama pull llama3.2:1b` from a Terminal or PowerShell, and change the code below from `MODEL_LLAMA = 'llama3.2'` to `MODEL_LLAMA = 'llama3.2:1b'`.

# Python Virtual Environment Setup for Running This AI-Based Web Scraper


## Installation Steps

### Step 1: Create Your Virtual Environment

**For Windows:**
Open Command Prompt or PowerShell in your project directory and run:
```bash
python -m venv venv
venv\Scripts\activate
```

**For Mac/Linux:**
Open Terminal in your project directory and run:
```bash
python3 -m venv venv
source venv/bin/activate
```

Once activated, your command prompt should show `(venv)` at the beginning.
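
If the prompt prefix is hard to spot, you can also ask Python itself. A small sketch, assuming the environment was created with `python -m venv` as above; the helper name `in_virtualenv` is only illustrative.

```python
import sys

def in_virtualenv():
    """True when the interpreter runs inside a venv (its prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix

print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If this prints `False`, the environment is probably not activated in the terminal or notebook kernel you are using.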

### Step 2: Install Required Packages

Try installing all packages at once first:
```bash
pip install requests python-dotenv beautifulsoup4 ipython openai
```

**If the combined installation doesn't work**, install them one by one:
```bash
pip install requests
pip install python-dotenv
pip install beautifulsoup4
pip install ipython
pip install openai
```

Once complete, you should be able to run these imports without any errors:
```python
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI
```

### Step 3: Verify Everything Works

Create a test file or run this in your notebook:
```python
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

print("✅ All imports successful!")
print("✅ Virtual environment is ready!")
```

## What Each Package Does

- **requests** - Makes HTTP requests to websites and APIs
- **python-dotenv** - Loads environment variables such as API keys from a `.env` file
- **beautifulsoup4** - Parses HTML content from web pages
- **ipython** - Enhanced Python shell with display utilities
- **openai** - Official OpenAI API client
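
Note that some pip names differ from the names you import. The sketch below maps one to the other and reports anything that is not installed, without actually importing it. The `REQUIRED` table and the `missing_packages` helper are illustrative names, not part of any library.

```python
import importlib.util

# pip name -> top-level module name used in `import` statements
REQUIRED = {
    "requests": "requests",
    "python-dotenv": "dotenv",
    "beautifulsoup4": "bs4",
    "ipython": "IPython",
    "openai": "openai",
}

def missing_packages(required=REQUIRED):
    """Return the pip names whose module cannot be found in this environment."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]

missing = missing_packages()
if missing:
    print("Missing, install with: pip install " + " ".join(missing))
else:
    print("All required packages are importable")
```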


In [None]:
import os
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

In [None]:
# Constants

OLLAMA_API = "http://localhost:11434/api/chat"
MODEL_GPT = 'gpt-4o-mini'
MODEL_LLAMA = 'llama3.2'
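
`OLLAMA_API` points at Ollama's native chat endpoint, while the cells further down talk to Ollama through its OpenAI-compatible `/v1` route instead. For reference, here is a minimal sketch of calling the native endpoint directly. It assumes a running local server with the model pulled, uses only the standard library, and the helper names `build_chat_payload` and `chat_with_ollama` are made up for this example.

```python
import json
import urllib.request

OLLAMA_API = "http://localhost:11434/api/chat"
MODEL_LLAMA = "llama3.2"

def build_chat_payload(model, messages):
    """Build the JSON body for a single, non-streamed chat reply."""
    return {"model": model, "messages": messages, "stream": False}

def chat_with_ollama(messages, model=MODEL_LLAMA, url=OLLAMA_API, timeout=120):
    """POST the messages to Ollama's chat endpoint and return the reply text."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_chat_payload(model, messages)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    return reply["message"]["content"]

# Example call (needs a running Ollama server with the model pulled):
# print(chat_with_ollama([{"role": "user", "content": "Say hello in five words."}]))
```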

In [None]:
# Run this if you have an OpenAI API key; otherwise skip it.
# Environment variables are stored in a file called .env
# If you don't have one yet:
# create a .env file in the project folder and add OPENAI_API_KEY=sk-proj-......

load_dotenv(override=True)
api_key = os.getenv('OPENAI_API_KEY')

# Check the key

if not api_key:
    print("No API key was found - please head over to the troubleshooting notebook in this folder to identify & fix!")
elif not api_key.startswith("sk-proj-"):
    print("An API key was found, but it doesn't start sk-proj-; please check you're using the right key - see troubleshooting notebook")
elif api_key.strip() != api_key:
    print("An API key was found, but it looks like it might have space or tab characters at the start or end - please remove them - see troubleshooting notebook")
else:
    print("API key found and looks good so far!")
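
The same checks can be wrapped in a function, which makes them easy to try on sample values. This is only a sketch: `describe_api_key` is a hypothetical helper, the sample strings are made up and are not real keys, and it tests for whitespace before the prefix so that a key with a leading space is reported as a whitespace problem.

```python
def describe_api_key(api_key):
    """Return a short status string for an API key value (or None)."""
    if not api_key:
        return "missing"
    if api_key.strip() != api_key:
        # Checked before the prefix so a leading space is reported as whitespace
        return "has surrounding whitespace"
    if not api_key.startswith("sk-proj-"):
        return "unexpected prefix"
    return "looks good"

# Made-up sample values for illustration only, not real keys
for sample in (None, " sk-proj-example", "sk-example", "sk-proj-example"):
    print(repr(sample), "->", describe_api_key(sample))
```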

In [None]:
# If you have an OpenAI API key, run this
openai = OpenAI()

In [None]:
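# If you have Llama 3.2 running locally through Ollama, run this
# Ollama ignores the API key, but the OpenAI client requires a non-empty value
ollama_via_openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')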


In [None]:
# A class to represent a webpage
# If you're not familiar with classes, check out the "Intermediate Python" notebook

# Some websites need you to use proper headers when fetching them:
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:
    def __init__(self, url):
        """Fetch the page at `url` and keep its title and visible text."""
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found!"
        # Some pages have no <body>; guard so the calls below don't fail on None
        if soup.body:
            # Drop elements that carry no readable text
            for irrelevant in soup.body(['script', 'style', 'img', 'input']):
                irrelevant.decompose()
            self.text = soup.body.get_text(separator="\n", strip=True)
        else:
            self.text = ""
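
To see what "keep the readable text, drop scripts and styles" means without fetching a real page, here is a dependency-free sketch of the same idea. It deliberately uses the standard library's `html.parser` rather than BeautifulSoup, so it is not the class above. `VisibleTextParser`, `visible_text` and the sample HTML are all made up for illustration, and unlike the class it collects text from the whole document, not just `<body>`.

```python
from html.parser import HTMLParser

SKIPPED_TAGS = {"script", "style"}  # content inside these is never shown to readers

class VisibleTextParser(HTMLParser):
    """Collects text nodes, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)

def visible_text(html):
    """Return the non-empty text nodes of `html`, one per line."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)

sample_html = """
<html><head><title>Demo</title><style>p { color: red; }</style></head>
<body><h1>Hello</h1><script>console.log("hidden");</script><p>Visible text.</p></body></html>
"""
print(visible_text(sample_html))
```

The CSS rule and the JavaScript call do not appear in the output; only the title, heading and paragraph text remain.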


        


    

In [None]:
# This is the system prompt, which tells the model how to respond, e.g. tone, grit, technicality
# You can customize it too
system_prompt = "You need to summarise the website in a witty and humorous way!"


# This is the user prompt; you can customize it too
def user_prompt(website):
    prompt = f"This is the title of this website: {website.title}\n"
    prompt += ("You have to return the summary of the following website. "
               "If there are headings, mention them properly and cleanly display the information "
               "in bullets, numbering or simple paragraphs, as suits the content.\n\n")
    prompt += website.text
    return prompt


    

In [None]:
def message_for(website):
    return [ {"role":"system", "content": system_prompt},
             {"role":"user", "content": user_prompt(website)}]
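
To inspect the message list without fetching a page or calling a model, the sketch below rebuilds it for a stand-in object. `fake_site` is a hypothetical placeholder for a `Website` instance, and the user prompt is abbreviated here; only the overall shape (one `system` message followed by one `user` message) mirrors the cells above.

```python
from types import SimpleNamespace

system_prompt = "You need to summarise the website in a witty and humorous way!"

def user_prompt(website):
    # Abbreviated version of the notebook's prompt, for illustration only
    return (f"This is the title of this website: {website.title}\n"
            "You have to return the summary of the following website.\n\n"
            f"{website.text}")

def message_for(website):
    return [{"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt(website)}]

# Hypothetical stand-in: anything with .title and .text attributes works here
fake_site = SimpleNamespace(title="Example Domain",
                            text="This domain is for use in examples.")

messages = message_for(fake_site)
for message in messages:
    print(message["role"].upper())
    print(message["content"])
    print("-" * 20)
```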

In [None]:
# Run if you have an OpenAI API key

def openai_response(url):
    website = Website(url)
    response = openai.chat.completions.create(model=MODEL_GPT, messages=message_for(website))
    return response.choices[0].message.content
    
    

In [None]:
# Run if you have Llama 3.2 running locally through Ollama (no API key needed)

def ollama_response(url):
    website = Website(url)
    response = ollama_via_openai.chat.completions.create(model=MODEL_LLAMA, messages=message_for(website))
    return response.choices[0].message.content
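
Because both clients expose the same `chat.completions.create` call, the two response functions differ only in the client and model they use. One possible refactor, sketched here with a hypothetical `chat_response` helper, takes both as parameters. So that the example runs offline, it is exercised with a hand-built stub that merely mimics the shape of an OpenAI client; the stub is not part of the `openai` package.

```python
from types import SimpleNamespace

def chat_response(client, model, messages):
    """Send messages through any OpenAI-style client and return the reply text."""
    response = client.chat.completions.create(model=model, messages=messages)
    return response.choices[0].message.content

# Hypothetical stub standing in for OpenAI(...), so the sketch runs without a server or key
def fake_create(model, messages):
    reply = f"[{model}] received {len(messages)} message(s)"
    return SimpleNamespace(choices=[SimpleNamespace(message=SimpleNamespace(content=reply))])

stub_client = SimpleNamespace(
    chat=SimpleNamespace(completions=SimpleNamespace(create=fake_create))
)

messages = [{"role": "user", "content": "Summarise this page."}]
print(chat_response(stub_client, "llama3.2", messages))
```

With the real clients from this notebook, the calls would be `chat_response(openai, MODEL_GPT, message_for(website))` and `chat_response(ollama_via_openai, MODEL_LLAMA, message_for(website))`.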



In [None]:
# Run if you have an OpenAI API key
def open_ai_display_summary(url):
    summary = openai_response(url)
    display(Markdown(summary))
    
    
    
    

In [None]:
# Run if you have llama 3.2
def ollama_display_summary(url):
    summary = ollama_response(url)
    display(Markdown(summary))
    
    
    

In [None]:
# Run if you have an OpenAI API key
open_ai_display_summary("https://www.udemy.com")


In [None]:
# Run if you have llama 3.2
ollama_display_summary("https://www.udemy.com")