# Download A Website

This notebook downloads a few HTML pages, then indexes and queries them with LlamaIndex.

**Note:** By default, LlamaIndex uses OpenAI embeddings and LLMs. Here we use a HuggingFace embedding model and experiment with Llama hosted on Replicate vs. OpenAI as the LLM.

References:

- A good example: https://docs.llamaindex.ai/en/stable/examples/index_structs/doc_summary/DocSummary/

## Step-1: Configuration

In [1]:
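import os

# If the connection to https://huggingface.co/ fails, use a mirror instead.
# huggingface_hub reads HF_ENDPOINT when it is first imported, so set it
# before the imports below. Comment this line out if huggingface.co is reachable.
os.environ['HF_ENDPOINT'] = 'https://hf-mirror.com'

from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Use a HuggingFace embedding model instead of the OpenAI default
Settings.embed_model = HuggingFaceEmbedding(
    # model_name='sentence-transformers/all-MiniLM-L6-v2'
    model_name='BAAI/bge-small-en-v1.5'
)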



In [2]:
## Load settings (such as API keys) from a local .env file
from dotenv import find_dotenv, load_dotenv

_ = load_dotenv(find_dotenv())  # read local .env file
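
Both LLM options in the next section read their API key from the environment, so it can help to confirm that the `.env` file actually supplied one. A minimal sketch, assuming the usual variable names `REPLICATE_API_TOKEN` and `OPENAI_API_KEY`; the helper `missing_keys` is hypothetical and not part of any library:

```python
import os

def missing_keys(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]

# Only one of the two keys is needed, depending on which LLM you choose.
for name in missing_keys(["REPLICATE_API_TOKEN", "OPENAI_API_KEY"]):
    print(f"{name} is not set")
```

A key reported as missing is only a problem if you intend to use that provider.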


### Setting an LLM

We have two choices:

1. Llama hosted on Replicate
2. OpenAI (the LlamaIndex default)

Try them both and see which one gives you better answers.

In [3]:
## Use Llama hosted on Replicate as the LLM (needs REPLICATE_API_TOKEN in the environment)

from llama_index.llms.replicate import Replicate
from llama_index.core import Settings

llm = Replicate(
    # model= "meta/meta-llama-3-8b-instruct",
    model= "meta/meta-llama-3-70b-instruct",
    # model= "meta/meta-llama-3.1-405b-instruct",
    temperature=0.1
)

Settings.llm = llm

In [4]:
## Use OpenAI as the LLM instead (needs OPENAI_API_KEY in the environment); uncomment to enable

# from llama_index.core import Settings
# from llama_index.llms.openai import OpenAI

# # llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
# llm = OpenAI(model="gpt-4o", temperature=0.1)

# Settings.llm = llm


## Step-2: Load Web Pages

In [5]:
## Uses the llama-index-readers-web package

from llama_index.readers.web import SimpleWebPageReader

urls = [
    'https://internet2.edu/cloud/cloud-learning-and-skills-sessions/developing-intelligent-applications-using-llms/',
    'https://internet2.edu/cloud/cloud-learning-and-skills-sessions/',
    'https://internet2.edu/cloud/cloud-learning-and-skills-sessions/networking-in-the-cloud/',
]

# html_to_text=True converts each page's HTML into plain text
documents = SimpleWebPageReader(html_to_text=True).load_data(urls)

print('Loaded documents : ', len(documents))

Loaded documents :  3


In [6]:
import pprint

pprint.pprint(documents[0])

Document(id_='https://internet2.edu/cloud/cloud-learning-and-skills-sessions/developing-intelligent-applications-using-llms/', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="Skip to content\n\n[ Return Home ![Internet2 logo](https://internet2.edu/wp-\ncontent/uploads/2024/06/I2-clr.svg) ](https://internet2.edu) ![](/wp-\ncontent/themes/fas-base/assets/images/icon-menu.svg)\n\n  * Explore \n\n#### Connecting Research and Education\n\nInternet2 provides:\n\n    * [ a secure high-speed network](/network)\n    * [cloud solutions](/cloud)\n    * [research support](/community/research-engagement/)\n    * [services](/services) for the research and education [community](/community)\n\n[Learn more about Internet2](https://internet2.edu/community/about-us/)\n\n#####  [About Internet2](https://internet2.edu/community/about-us/)\n\nGet to know our history, our commitment, our community.\n\n#####  [Our Members](https://internet2.
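
The reader keeps the downloaded pages in memory only. To also save them locally, something like the following could work. It is a sketch under assumptions: it relies only on the `id_` (the URL) and `text` fields visible in the output above, the helper names `url_to_filename` and `save_pages` are hypothetical, and the `downloaded_pages` output directory is an arbitrary choice.

```python
import re
from pathlib import Path

def url_to_filename(url):
    """Turn a URL into a filesystem-safe file name (hypothetical helper)."""
    stem = re.sub(r"^https?://", "", url).strip("/")
    stem = re.sub(r"[^A-Za-z0-9._-]+", "_", stem)
    return f"{stem}.txt"

def save_pages(pages, out_dir="downloaded_pages"):
    """Write (url, text) pairs into out_dir and return the paths written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for url, text in pages:
        path = out / url_to_filename(url)
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return paths
```

In this notebook it could be called as `save_pages((doc.id_, doc.text) for doc in documents)`.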

## Step-3: Create Index

In [7]:
from llama_index.core import SummaryIndex

index = SummaryIndex.from_documents(documents)

## Step-4: Queries

In [8]:
# Build a query engine over the index
# (set logging to DEBUG beforehand for more detailed output)
query_engine = index.as_query_engine()
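
To get more detailed output while queries run, logging can be switched to DEBUG with Python's standard `logging` module. A minimal sketch; `force=True` (Python 3.8+) is an addition here so the call takes effect even if logging was already configured in the notebook session:

```python
import logging
import sys

# Send log records from all libraries (including llama_index) to stdout at DEBUG level.
# force=True replaces any handlers already attached to the root logger.
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)

# To quiet things down again, raise the level back to WARNING:
# logging.getLogger().setLevel(logging.WARNING)
```

Expect a lot of output at this level, since HTTP client libraries log at DEBUG too.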

In [9]:

response = query_engine.query("Who is teaching the 'Networking in the Cloud' class?")

print(response)



The instructor of the "Networking in the Cloud" class is Scott Taylor, Internet2 Network Services.


In [10]:
response = query_engine.query("When is Sujee Maniyam teaching the class?")

print(response)



The original answer is still accurate, but it's not related to the provided context. The context appears to be about Internet2, a non-profit organization that provides networking and cloud services to the research and education community. It does not mention Sujee Maniyam or the specific workshop "Developing Intelligent Systems with LLMs and RAG".

Therefore, the refined answer remains the same as the original answer:

Sujee Maniyam is teaching the class on October 9, 2024, from 11 a.m. to 5 p.m. ET, specifically the "Developing Intelligent Systems with LLMs and RAG" workshop.


In [11]:
response = query_engine.query("When are the classes scheduled?")
print(response)



The class "Developing Intelligent Systems with LLMs and RAG" is scheduled for October 9, 2024, from 11 a.m. to 5 p.m. ET. Additionally, there are other classes and events available, such as "Networking in the Cloud" on November 7, 2024, from 11 a.m. to 2 p.m. ET, and "Developing Intelligent Applications Using LLMs" with no specific date mentioned. You can find more information about these classes and events on the Internet2 website.
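
To follow the earlier suggestion of trying both LLMs, it can help to collect their answers to the same question side by side. A rough sketch, assuming one query engine has been built per LLM; the `compare_answers` helper is hypothetical, and the lambdas are stand-ins so the snippet runs without API keys:

```python
def compare_answers(engines, question):
    """Ask every engine the same question and return {name: answer text}."""
    return {name: str(ask(question)) for name, ask in engines.items()}

# Stand-in callables. In the notebook these would be the `query` methods of two
# query engines, e.g. {"llama": llama_engine.query, "openai": openai_engine.query}.
engines = {
    "llama": lambda q: f"[llama] answer to: {q}",
    "openai": lambda q: f"[openai] answer to: {q}",
}

for name, answer in compare_answers(engines, "When are the classes scheduled?").items():
    print(f"{name}: {answer}")
```

Recent llama-index versions should accept a per-engine LLM via `index.as_query_engine(llm=...)`, which is one way to build the two engines without changing `Settings.llm` between runs.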
