## Running Async Transformations in Jupyter

In [None]:
!pip install refuel-autolabel

In [1]:
import nest_asyncio
nest_asyncio.apply()
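
The `nest_asyncio.apply()` call matters because a Jupyter kernel already runs an asyncio event loop, and standard asyncio refuses to start a second loop inside a running one. The sketch below reproduces that restriction using only the standard library. It assumes the async transform is driven by a call such as `asyncio.run` (an assumption; Autolabel's internals are not shown in this notebook), and `fetch_stub` and `notebook_like_cell` are hypothetical names used only for illustration. Run it in a plain Python process; once `nest_asyncio.apply()` has patched the loop, the nested call is allowed and the message changes.

```python
import asyncio


async def fetch_stub():
    # Hypothetical stand-in for an async transform step.
    await asyncio.sleep(0)
    return "page text"


async def notebook_like_cell():
    # This coroutine runs inside an active event loop, like code in a Jupyter cell.
    coro = fetch_stub()
    try:
        asyncio.run(coro)  # tries to start a second loop inside the running one
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)
    return "nested run succeeded (the loop has been patched)"


message = asyncio.run(notebook_like_cell())
print(message)
```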

## Finding the State of a National Park using Autolabel

We will use Autolabel to find the state in which a national park is located, given a URL to a webpage about the park (in most cases its National Park Service page on nps.gov). First, a transform extracts the content of the webpage. Then we frame the problem as a question_answering task and ask the LLM to extract the park's state from that content.

Notice the "transforms" section of the config. The transform reads the URL from the "url" column (set in "params") and extracts the text of the webpage. The "output_columns" mapping writes that text to a new column called "content". The "example_template" in the prompt then references {content}, so the website text is sent to the model together with the question about the state of the national park.

In [22]:
config = {
    "task_name": "NationalPark",
    "task_type": "question_answering",
    "dataset": {
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo"
    },
    "transforms": [{
        "name": "webpage_transform",
        "params": {
            "url_column": "url"
        },
        "output_columns": {
            "content_column": "content"
        }
    }],
    "prompt": {
        "task_guidelines": "You are an expert at understanding websites of national parks. You will be given a webpage about a national park. Answer with the US State that the national park is located in.",
        "output_guidelines": "Answer in one word the state that the national park is located in.",
        "example_template": "Content of webpage: {content}\nState:",
    }
}
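
To make the role of the `{content}` placeholder concrete, the sketch below fills a template with one row using plain `str.format`. This only illustrates the substitution; how Autolabel assembles the full prompt (task guidelines, output guidelines, and so on) is not shown here. The template string is a stand-in that uses the same placeholder, and the row's "content" text is a hypothetical example rather than real page text.

```python
# Stand-in template with the same {content} placeholder as the config.
example_template = "Content of webpage: {content}\nState:"

# Hypothetical row, as it might look once the transform has added "content".
row = {
    "url": "https://www.nps.gov/dena/index.htm",
    "name": "Denali National Park",
    "content": "Example page text about Denali National Park & Preserve.",
}

# str.format ignores keys the template does not reference ("url", "name").
prompt_example = example_template.format(**row)
print(prompt_example)
```

Any column named in the template must exist in the dataset by the time labeling runs, which is why the transform has to populate "content" first.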

In [23]:
from autolabel import LabelingAgent, AutolabelDataset, AutolabelConfig
agent = LabelingAgent(config)

The dataset below is a small, manually collected set of national parks and links to webpages that describe them. We use the LLM to find each park's state, which may be buried in different parts of each page.

In [34]:
import pandas as pd
df = pd.DataFrame([
    {
        "url": "https://www.visitmt.com/places-to-go/glacier-national-park",
        "name": "Glacier National Park"
    },
    {
        "url": "https://www.nps.gov/dena/index.htm",
        "name": "Denali National Park"
    },
    {
        "url": "https://www.nps.gov/lavo/index.htm",
        "name": "Lassen Volcanic National Park"
    },
    {
        "url": "https://www.nps.gov/olym/index.htm",
        "name": "Olympic National Park"
    },
    {
        "url": "https://www.nps.gov/pinn/index.htm",
        "name": "Pinnacles National Park"
    }
])

In [25]:
ds = AutolabelDataset(df, config = AutolabelConfig(config))

## Running the transform
First, we call agent.transform to run the webpage transform and populate the content column of the dataset.

In [26]:
ds = agent.transform(ds)
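
As a rough mental model of what turning a webpage into a text column involves, the sketch below strips markup from an HTML string with the standard library's `html.parser`. It is not Autolabel's implementation, which is not shown in this notebook; `TextExtractor`, `html_to_text`, and the sample HTML are hypothetical, and the network fetch is left out so the example runs offline.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical page; a real transform would first download the HTML from the URL.
sample_html = """
<html><head><title>Example Park</title><style>p {color: red}</style></head>
<body><h1>Example Park</h1><p>Located in an example state.</p>
<script>console.log("ignored")</script></body></html>
"""
page_text = html_to_text(sample_html)
print(page_text)
```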

## Running the labeling function
Now we send the content of each website, along with the question, to the LLM, which returns the state of the national park.

In [27]:
ds = agent.run(ds)

In [33]:
ds.df

| | url | name | content | metadata_column | NationalPark_label | NationalPark_error | NationalPark_successfully_labeled | NationalPark_annotation |
|---|---|---|---|---|---|---|---|---|
| 0 | `https://www.visitmt.com/places-to-go/glacier-n...` | Glacier National Park | `\n\n\n\n\n\n\nGlacier National Park\n\n\n\n\n\...` | `{'url': 'https://www.visitmt.com/places-to-go/...` | Montana | | True | `{"successfully_labeled": true, "label": "Monta...` |
| 1 | `https://www.nps.gov/dena/index.htm` | Denali National Park | `\n Denali National Park & Preserve (U.S. N...` | `{'url': 'https://www.nps.gov/dena/index.htm', ...` | Alaska | | True | `{"successfully_labeled": true, "label": "Alask...` |
| 2 | `https://www.nps.gov/lavo/index.htm` | Lassen Volcanic National Park | `\n Lassen Volcanic National Park (U.S. Nat...` | `{'url': 'https://www.nps.gov/lavo/index.htm', ...` | California | | True | `{"successfully_labeled": true, "label": "Calif...` |
| 3 | `https://www.nps.gov/olym/index.htm` | Olympic National Park | `\n Olympic National Park (U.S. National Pa...` | `{'url': 'https://www.nps.gov/olym/index.htm', ...` | Washington | | True | `{"successfully_labeled": true, "label": "Washi...` |
| 4 | `https://www.nps.gov/pinn/index.htm` | Pinnacles National Park | `\n Pinnacles National Park (U.S. National ...` | `{'url': 'https://www.nps.gov/pinn/index.htm', ...` | California | | True | `{"successfully_labeled": true, "label": "Calif...` |
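
With only five rows, the labels can be checked by eye, but a quick programmatic comparison scales better. The sketch below is one possible check, not part of the original notebook: `predicted` copies the NationalPark_label values from the output above, and `expected` is a hand-entered reference added here for illustration.

```python
# Labels copied from the NationalPark_label column of the output above.
# With the live dataset, the same mapping could be built with
# dict(zip(ds.df["name"], ds.df["NationalPark_label"])).
predicted = {
    "Glacier National Park": "Montana",
    "Denali National Park": "Alaska",
    "Lassen Volcanic National Park": "California",
    "Olympic National Park": "Washington",
    "Pinnacles National Park": "California",
}

# Hand-entered reference states (an addition for this check).
expected = {
    "Glacier National Park": "Montana",
    "Denali National Park": "Alaska",
    "Lassen Volcanic National Park": "California",
    "Olympic National Park": "Washington",
    "Pinnacles National Park": "California",
}

# Compare case-insensitively and ignore surrounding whitespace in model output.
mismatches = {
    name: (predicted.get(name), state)
    for name, state in expected.items()
    if (predicted.get(name) or "").strip().lower() != state.lower()
}
accuracy = 1 - len(mismatches) / len(expected)
print(f"accuracy: {accuracy:.0%}, mismatches: {mismatches}")
```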
