# Build a Website Chatbot using the OpenAI Assistants API

This notebook builds a website chatbot with the Python SDK for the OpenAI Assistants API, using [Web Transpose Crawl](https://webtranspose.com) to fetch the website data.

In [None]:
!pip install openai webtranspose

In [None]:
import os
import webtranspose as webt
os.environ['WEBTRANSPOSE_API_KEY'] = 'YOUR_API_KEY'

## Get Website Data

We're going to use [Andrew Carnegie's Autobiography](https://www.gutenberg.org/files/17976/17976-h/17976-h.htm).

In [None]:
url = 'https://www.gutenberg.org/files/17976/17976-h/17976-h.htm'

In [None]:
crawl = webt.Crawl(
    url=url,
    max_pages=1,
    verbose=True,
)
await crawl.crawl()

## Upload Data to OpenAI

In [None]:
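import io
from openai import OpenAI

# Create the client before uploading; it reads OPENAI_API_KEY from the environment.
client = OpenAI()

# Upload the crawled page text as a file the assistant can search.
page = crawl.get_page(crawl.base_url)
file = client.files.create(
    file=io.BytesIO(page['text'].encode('utf-8')),
    purpose="assistants",
)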

## Create the OpenAI Assistant

In [None]:
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Talk to Andrew Carnegie's Biography",
    instructions="You are a helpful assistant that takes the user's query, searches its uploaded files for relevant context, and then answers using only information from that context.",
    model="gpt-4-1106-preview",
    tools=[
        {"type": "retrieval"},
    ],
    file_ids=[
        file.id,
    ],
)


## Chat with our OpenAI Assistant

In [None]:
import time

def submit_message(assistant_id, thread, query):
    message = client.beta.threads.messages.create(
        thread_id=thread.id,
        role="user",
        content=query,
    )
    return client.beta.threads.runs.create(
        thread_id=thread.id,
        assistant_id=assistant_id,
    )

def wait_on_run(run, thread):
    # Poll until the run leaves the "queued" / "in_progress" states.
    while run.status in ("queued", "in_progress"):
        time.sleep(0.5)
        run = client.beta.threads.runs.retrieve(
            thread_id=thread.id,
            run_id=run.id,
        )
    return run
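
The `wait_on_run` loop above has no upper bound on how long it polls. A minimal sketch of a bounded variant follows, assuming a hypothetical `fetch_status` callable that returns the current run status as a string; `poll_until_done`, `timeout_s`, `interval_s`, and `sleep` are illustrative names and not part of the OpenAI SDK.

```python
import time


def poll_until_done(fetch_status, timeout_s=60.0, interval_s=0.5, sleep=time.sleep):
    """Call fetch_status() until the status is no longer pending.

    fetch_status is a hypothetical zero-argument callable returning a status
    string. Raises TimeoutError if the status is still pending after timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        # Same pending states that wait_on_run checks for.
        if status not in ("queued", "in_progress"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run still {status!r} after {timeout_s} seconds")
        sleep(interval_s)


# Offline demonstration with a canned sequence of statuses.
statuses = iter(["queued", "in_progress", "completed"])
final_status = poll_until_done(lambda: next(statuses), sleep=lambda s: None)
print(final_status)
```

In this notebook, `fetch_status` could be `lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id).status`, which reuses the same retrieve call as `wait_on_run`.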

In [None]:
query = "Tell me your thoughts on Pittsburgh"
thread = client.beta.threads.create()
run = submit_message(
    assistant.id,
    thread,
    query,
)
run = wait_on_run(run, thread)

In [None]:
messages = client.beta.threads.messages.list(thread_id=thread.id)

In [93]:
for msg in messages.data[::-1]:
    print(msg.role)
    print(msg.content[0].text.value)
    print('----')

user
Tell me your thoughts on Pittsburgh
----
assistant
My thoughts on Pittsburgh, based on the context provided in the document, reflect on its rich industrial history, particularly in the steel industry, as well as its strategic geographical importance. The city played a pivotal role during the industrialization of America, especially with the development of the steel-rail mills. This was largely due to the efforts and innovations of industrialists like Andrew Carnegie, who played a significant role in introducing hard-headed iron rails, the Bessemer process, and eventually the creation and success of steel rails in the United States.

The site for the steel-rail mills, in a historic area related to Braddock's defeat, suggests that Pittsburgh was not only an industrial hub but also a place of historical significance. The relics of past battles found during the mills' construction remind us of the layers of time that contribute to the essence of a place.

Moreover, the city's location

## Verify it retrieved data on the query

Depending on the system prompt, the model may answer without retrieving anything from our uploaded data. We can check whether retrieval happened using the code below.

In [None]:
import json

def show_json(obj):
    # Displays the object as a dict in the notebook; returns None.
    display(json.loads(obj.model_dump_json()))

In [None]:
run_steps = client.beta.threads.runs.steps.list(
    thread_id=thread.id, run_id=run.id, order="asc"
)

In [74]:
for step in run_steps.data:
    # show_json displays the object itself and returns None, so call it directly
    # instead of passing its result to json.dumps (which would print "null").
    show_json(step.step_details)

{'tool_calls': [{'id': 'call_PSrrJAGUDKWbpMEXdwyV9H8A',
   'retrieval': {},
   'type': 'retrieval'}],
 'type': 'tool_calls'}

{'message_creation': {'message_id': 'msg_CZDmVn8WRY72O7IoVDNyKojo'},
 'type': 'message_creation'}

✅ As you can see, it called `retrieval`.
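
The same check can be done programmatically instead of by reading the output. A minimal sketch follows, assuming the step details have already been converted to plain dicts shaped like the output above; `used_tool` is a hypothetical helper, not part of the OpenAI SDK.

```python
def used_tool(step_details_list, tool_type="retrieval"):
    """Return True if any step in the list contains a tool call of tool_type."""
    for details in step_details_list:
        # Only "tool_calls" steps carry tool invocations; skip message steps.
        if details.get("type") != "tool_calls":
            continue
        for call in details.get("tool_calls", []):
            if call.get("type") == tool_type:
                return True
    return False


# Step details copied from the run output shown above.
steps = [
    {
        "tool_calls": [
            {"id": "call_PSrrJAGUDKWbpMEXdwyV9H8A", "retrieval": {}, "type": "retrieval"}
        ],
        "type": "tool_calls",
    },
    {
        "message_creation": {"message_id": "msg_CZDmVn8WRY72O7IoVDNyKojo"},
        "type": "message_creation",
    },
]

print(used_tool(steps))
```

With the live objects from this notebook, the list could be built as `[json.loads(s.step_details.model_dump_json()) for s in run_steps.data]`, mirroring what `show_json` does.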

## Your Turn

Get started with [Web Transpose](https://webtranspose.com) and start building projects using website data.