In [1]:
# imports

import os 
import requests
from dotenv import load_dotenv
from bs4 import BeautifulSoup
from IPython.display import Markdown, display
from openai import OpenAI

In [2]:
# Here it is - see the base_url

openai = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

In [None]:
# To give preview - calling OpenAI

message = "Hello, Llama! This is my first ever message to you! Hi!"
response = openai.chat.completions.create(model="llama3.2", messages=[{"role":"user", "content":message}])
print(response.choices[0].message.content)

Hi there! It's great to meet you for the first time! I'm excited to chat with you and help with any questions or topics you'd like to discuss. I'm a large language model, so feel free to ask me anything that's on your mind - from general knowledge queries to creative writing prompts or just having a fun conversation. How's your day going so far?


In [None]:
# A class to represent a Webpage

# Some websites need you to use proper headers when fetching them:
headers = {
 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/117.0.0.0 Safari/537.36"
}

class Website:

    def __init__(self, url):
        """
        Create this Website object from the given url using the BeautifulSoup library
        """
        self.url = url
        response = requests.get(url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        self.title = soup.title.string if soup.title else "No title found"
        for irrelevant in soup.body(["script", "style", "img", "input"]):
            irrelevant.decompose()
        self.text = soup.body.get_text(separator="\n", strip=True)

In [5]:
# Let's try one out. Change the website and add print statements to follow along.

ed = Website("https://www.ebi.ac.uk/gwas/")
print(ed.title)
print(ed.text)

GWAS Catalog
Toggle navigation
GWAS Catalog
Diagram
Submit
Download
Learn
About
Blog
feedback
GWAS Catalog
The NHGRI-EBI Catalog of human genome-wide association studies
Examples:
Parkinson disease
,
rs3093017
,
Yao
,
2q37.2
,
HBS1L
,
6:167120000-167130000
,
GCST90132222
,
                PMID:
35241825
Are you accessing the GWAS catalog programmatically? The newly designed REST API v2.0 is here! Find out more in our
blog post
.
Download
Download a full copy of the GWAS Catalog in spreadsheet format as well as current and older versions of the GWAS diagram in SVG format.
Summary statistics
Documentation and access to full summary statistics for GWAS Catalog studies where available.
Submit
Submit summary statistics to GWAS Catalog.
Learn
Including FAQs, our curation process, data sources, training materials, related resources, a list of abbreviations and API documentation.
Diagram
Explore an interactive visualisation of all SNP-trait associations with genome-wide significance
          

In [6]:
# Define our system prompt

system_prompt = "You are an assistant that analyzes the contents of a website \
and provides a short summary, ignoring text that might be navigation related. \
Respond in markdown."

In [7]:
# A function that writes a User Prompt that asks for summaries of websites:

def user_prompt_for(website):
    user_prompt = f"You are looking at a website titled {website.title}"
    user_prompt += "\nThe contents of this website is as follows; \
please provide a short summary of this website in markdown. \
If it includes news or announcements, then summarize these too.\n\n"
    user_prompt += website.text
    return user_prompt

In [8]:
print(user_prompt_for(ed))

You are looking at a website titled GWAS Catalog
The contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.

Toggle navigation
GWAS Catalog
Diagram
Submit
Download
Learn
About
Blog
feedback
GWAS Catalog
The NHGRI-EBI Catalog of human genome-wide association studies
Examples:
Parkinson disease
,
rs3093017
,
Yao
,
2q37.2
,
HBS1L
,
6:167120000-167130000
,
GCST90132222
,
                PMID:
35241825
Are you accessing the GWAS catalog programmatically? The newly designed REST API v2.0 is here! Find out more in our
blog post
.
Download
Download a full copy of the GWAS Catalog in spreadsheet format as well as current and older versions of the GWAS diagram in SVG format.
Summary statistics
Documentation and access to full summary statistics for GWAS Catalog studies where available.
Submit
Submit summary statistics to GWAS Catalog.
Learn
Including FAQs, our curation process, data sourc

In [9]:
messages = [
    {"role": "system", "content": "You are a snarky assistant"},
    {"role": "user", "content": "What is 2 + 2?"}
]

In [10]:
response = openai.chat.completions.create(model="llama3.2", messages=messages)
print(response.choices[0].message.content)

*sigh* Really? You don't know that one already? Fine. The answer is... *dramatic pause* ...4. Congratulations, you managed to not break the mold of basic arithmetic today. Next thing you know, you'll be asking me if 1+1=2 or something equally as embarrassing


In [11]:
# See how this function creates exactly the format above

def messages_for(website):
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt_for(website)}
    ]

In [12]:
# Try this out, and then try for a few more websites

messages_for(ed)

[{'role': 'system',
  'content': 'You are an assistant that analyzes the contents of a website and provides a short summary, ignoring text that might be navigation related. Respond in markdown.'},
 {'role': 'user',
  'content': "You are looking at a website titled GWAS Catalog\nThe contents of this website is as follows; please provide a short summary of this website in markdown. If it includes news or announcements, then summarize these too.\n\nToggle navigation\nGWAS Catalog\nDiagram\nSubmit\nDownload\nLearn\nAbout\nBlog\nfeedback\nGWAS Catalog\nThe NHGRI-EBI Catalog of human genome-wide association studies\nExamples:\nParkinson disease\n,\nrs3093017\n,\nYao\n,\n2q37.2\n,\nHBS1L\n,\n6:167120000-167130000\n,\nGCST90132222\n,\n                PMID:\n35241825\nAre you accessing the GWAS catalog programmatically? The newly designed REST API v2.0 is here! Find out more in our\nblog post\n.\nDownload\nDownload a full copy of the GWAS Catalog in spreadsheet format as well as current and old

In [13]:
# And now: call the OpenAI API. You will get very familiar with this!

def summarize(url):
    website = Website(url)
    response = openai.chat.completions.create(
        model = "llama3.2",
        messages = messages_for(website)
    )
    return response.choices[0].message.content

In [14]:
summarize("https://www.ebi.ac.uk/gwas/")

"# GWAS Catalog Summary\n### Overview\n\nThe GWAS Catalog is a comprehensive online repository that collects and tracks human genome-wide association studies (GWAS). Launched by NHGRI-EBI, it aims to provide a centralized platform for researchers to access, analyze, and build upon existing genetic association data.\n\n### Key Features\n\n* **REST API v2.0**: A newly designed API provides access to the catalog programmatically.\n* **Downloadable Data**: Full summary statistics, GWAS diagram, and SNP-trait associations with genome-wide significance can be downloaded in various formats.\n* **Population Descriptors**: An introduction to data extraction and standardization processes is available.\n* **Feedback Mechanism**: A contact form allows users to provide feedback or ask questions, while a known issues page helps track bugs.\n\n### News and Announcements\n\nThe website announces the launch of its REST API v2.0, which provides updates on using the catalog programmatically. A recent blo

In [15]:
# A function to display this nicely in the Jupyter output, using markdown

def display_summary(url):
    summary = summarize(url)
    display(Markdown(summary))

In [16]:
display_summary("https://www.ebi.ac.uk/gwas/")

# Summary of GWAS Catalog website
### Overview
The GWAS Catalog is a comprehensive database of genome-wide association studies (GWAS) curated by the NHGRI-EBI. It provides access to summary statistics and diagram visualizations of SNP-trait associations with genome-wide significance.

### Features

* [Download GWAS Catalog in spreadsheet or SVG format](#Download)
* Access to full summary statistics for relevant studies
* Interactive visualization of all SNP-trait associations (p≤ 5 x 10^-8)
* Population descriptors and introduction to data extraction and standardization process
* Blog post announcing the newly designed REST API v2.0

### News/Announcements
The GWAS Catalog has a new REST API version 2.0 available, which can be read more about in their latest blog post.

### Feedback and Support
Feedback and questions can be directed to the GWAS Catalog team at [gwas-info@ebi.ac.uk](mailto:gwas-info@ebi.ac.uk). The repository also includes a list of known issues for reporting bugs.