This notebook helps you set up the datasets and run a specific task on a specific dataset. At the end, we also run the benchmark so that you can try out our benchmarking script.

### Get the data

In [None]:
# Remove any old CSV files if they already exist
!rm -rf data/*.csv
# Get the datasets
!python data/get_data.py

In [1]:
%env OPENAI_API_KEY=***REMOVED***
%env ANTHROPIC_API_KEY=***REMOVED***

env: OPENAI_API_KEY=***REMOVED***
env: ANTHROPIC_API_KEY=***REMOVED***
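
A quick way to confirm that the keys are visible to Python before creating an agent is to check the environment directly. The sketch below is only an illustration: the helper name `missing_keys` is hypothetical and not part of autolabel, and the variable names are the two set in the cell above.

```python
import os


def missing_keys(required=("OPENAI_API_KEY", "ANTHROPIC_API_KEY"), env=None):
    """Return the names of required variables that are unset or empty.

    `env` defaults to the real process environment; passing a plain dict
    makes the check easy to try without touching real credentials.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Lists whichever of the two keys is not set in the current session.
print(missing_keys())
```

An empty list means both variables are set. Reading the keys from the environment (rather than pasting them into a cell) also keeps them out of the saved notebook.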


In [2]:
from autolabel import LabelingAgent


### Run Classification on the Civil Comments dataset

In [11]:
civil_comments_config = {
    "task_name": "ToxicCommentClassification",
    "task_type": "classification",
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
    },
    "prompt": {
        "task_guidelines": "You are an expert at identifying toxic comments and understanding if a comment is sexually explicit, obscene, toxic, insults a person, demographic or race.\nYour job is to correctly label the provided input example into one of the following categories.\nCategories:\n{labels}",
        "labels": [
            "toxic",
            "not toxic"
        ],
        "output_guidelines": "You will return the answer in a format that contains just the label and nothing else.",
        "few_shot_examples": [
            {
                "example": "It's ridiculous that these guys are being called 'protesters'. Being armed is a threat of violence, which makes them terrorists.",
                "label": "toxic"
            },
            {
                "example": "This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!",
                "label": "not toxic"
            },
            {
                "example": "This bitch is nuts. Who would read a book by a woman",
                "label": "toxic"
            },
            {
                "example": "It was a great show. Not a combo I'd of expected to be good together but it was.",
                "label": "not toxic"
            }
        ],
        "few_shot_selection": "fixed",
        "few_shot_num": 3,
        "example_template": "Input: {example}\nOutput: {label}"
    }
}
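
To make the role of each `prompt` field concrete, here is a rough sketch of how the pieces could be stitched together into the layout that `plan` prints. This is an approximation written for illustration, not autolabel's own code: `build_prompt` and `demo_cfg` are hypothetical, the demo examples are made up, and the connecting sentences are copied from the printed prompt. It assumes `"fixed"` selection simply takes the first `few_shot_num` examples in order.

```python
def build_prompt(prompt_cfg, text):
    """Approximate the prompt layout printed by plan() for a classification task."""
    # {labels} in the guidelines is filled with one label per line.
    labels = "\n".join(prompt_cfg["labels"])
    guidelines = prompt_cfg["task_guidelines"].format(labels=labels)
    template = prompt_cfg["example_template"]
    # Assumed "fixed" behavior: use the first few_shot_num examples as-is.
    shots = prompt_cfg["few_shot_examples"][: prompt_cfg["few_shot_num"]]
    shot_block = "\n\n".join(template.format(**shot) for shot in shots)
    # The item to label is rendered with an empty label slot.
    query = template.format(example=text, label="").rstrip()
    return "\n\n".join(
        [
            guidelines,
            prompt_cfg["output_guidelines"],
            "Some examples with their output answers are provided below:",
            shot_block,
            "Now I want you to label the following example:\n" + query,
        ]
    )


# Hypothetical, shortened config with the same keys as the one above.
demo_cfg = {
    "task_guidelines": "Label the input into one of the following categories.\nCategories:\n{labels}",
    "labels": ["toxic", "not toxic"],
    "output_guidelines": "Return just the label and nothing else.",
    "few_shot_examples": [
        {"example": "You are an idiot.", "label": "toxic"},
        {"example": "It was a great show.", "label": "not toxic"},
    ],
    "few_shot_num": 1,
    "example_template": "Input: {example}\nOutput: {label}",
}

print(build_prompt(demo_cfg, "Thanks for the help!"))
```

With `few_shot_num` smaller than the number of examples supplied, only the leading examples appear in the prompt.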


In [12]:
l = LabelingAgent(config=civil_comments_config, cache=False)

In [13]:
l.plan('../data/civil_comments_test.csv', max_items=10)


Output()

Total Estimated Cost: $0.025
Number of examples to label: 10
Average cost per example: $0.00251


A prompt example:

You are an expert at identifying toxic comments and understanding if a comment is sexually explicit, obscene, toxic, insults a person, demographic or race.
Your job is to correctly label the provided input example into one of the following categories.
Categories:
toxic
not toxic

You will return the answer in a format that contains just the label and nothing else.

Some examples with their output answers are provided below:

Input: It's ridiculous that these guys are being called 'protesters'. Being armed is a threat of violence, which makes them terrorists.
Output: toxic

Input: This is so cool. It's like, 'would you want your mother to read this??' Really great idea, well done!
Output: not toxic

Input: This bitch is nuts. Who would read a book by a woman
Output: toxic

Now I want you to label the following example:
Input: [ Integrity means that you pay your debts.]

D
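
The estimate above covers the 10 items passed to `plan`, while the `run` call below uses `max_items=50`. A back-of-the-envelope extrapolation, assuming cost scales linearly with the number of items (which ignores differences in prompt length between rows), looks like this. The helper `extrapolate_cost` is hypothetical.

```python
def extrapolate_cost(avg_cost_per_example, n_items):
    """Scale a per-example cost estimate to a different number of items."""
    return avg_cost_per_example * n_items


# Average cost per example reported by plan() above, scaled to 50 items.
estimate = extrapolate_cost(0.00251, 50)
print(f"Estimated cost for 50 items: ${estimate:.4f}")
```

For a tighter number, call `plan` again with the same `max_items` you intend to pass to `run`.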

In [14]:
labels, df, metrics_list = l.run('../data/civil_comments_test.csv', max_items=50)


Output()

2023-05-28 17:03:49 openai INFO: error_code=None error_message='That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID c14f6a1e03af3ad7a5758f17ce2dff18 in your message.)' error_param=None error_type=server_error message='OpenAI API error received' stream_error=False


2023-05-28 17:04:23 openai INFO: error_code=None error_message='That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 5fce65eb404092e8139080838a41fec1 in your message.)' error_param=None error_type=server_error message='OpenAI API error received' stream_error=False


2023-05-28 17:04:55 openai INFO: error_code=None error_message='That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 6c9cec7ef5f7f143b3dae6d5d83f4b19 in your message.)' error_param=None error_type=server_error message='OpenAI API error received' stream_error=False


Metric: support: [(50, 'index=0')]
Metric: threshold: [(-inf, 'index=0')]
Metric: accuracy: [(0.76, 'index=0')]
Metric: completion_rate: [(1.0, 'index=0')]
Total number of failures: 0


In [15]:
df.head()

Unnamed: 0,example,label,ToxicCommentClassification_llm_labeled_successfully,ToxicCommentClassification_llm_label
0,[ Integrity means that you pay your debts.]\n\...,not toxic,yes,not toxic
1,This is malfeasance by the Administrator and t...,not toxic,yes,toxic
2,@Rmiller101 - Spoken like a true elitist. But ...,not toxic,yes,toxic
3,"Paul: Thank you for your kind words. I do, in...",not toxic,yes,not toxic
4,Sorry you missed high school. Eisenhower sent ...,not toxic,yes,not toxic
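
The reported accuracy can be cross-checked by comparing the `label` column with `ToxicCommentClassification_llm_label`. Below is a minimal sketch on the five rows shown above, copied by hand into plain Python. The `accuracy` helper is hypothetical, and it assumes a prediction counts as correct only on an exact string match.

```python
# The five (gold, predicted) pairs from the df.head() output above.
rows = [
    {"label": "not toxic", "ToxicCommentClassification_llm_label": "not toxic"},
    {"label": "not toxic", "ToxicCommentClassification_llm_label": "toxic"},
    {"label": "not toxic", "ToxicCommentClassification_llm_label": "toxic"},
    {"label": "not toxic", "ToxicCommentClassification_llm_label": "not toxic"},
    {"label": "not toxic", "ToxicCommentClassification_llm_label": "not toxic"},
]


def accuracy(records, gold_col, pred_col):
    """Fraction of records whose prediction exactly matches the gold label."""
    if not records:
        return 0.0
    hits = sum(1 for r in records if r[gold_col] == r[pred_col])
    return hits / len(records)


print(accuracy(rows, "label", "ToxicCommentClassification_llm_label"))
```

The two columns agree on three of these five rows. On the full DataFrame the equivalent one-liner is `(df["label"] == df["ToxicCommentClassification_llm_label"]).mean()`.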


### Run Classification on the Banking dataset

In [22]:
banking77_config = {
    "task_name": "BankingComplaintsClassification",
    "task_type": "classification",
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
    },
    "prompt": {
        "task_guidelines": "You are an expert at understanding twitter complaints.\nYour job is to correctly label the provided input example into one of the following categories.\nCategories:\n{labels}",
        "labels": [
            "activate_my_card",
            "age_limit",
            "apple_pay_or_google_pay",
            "atm_support",
            "automatic_top_up",
            "balance_not_updated_after_bank_transfer",
            "balance_not_updated_after_cheque_or_cash_deposit",
            "beneficiary_not_allowed",
            "cancel_transfer",
            "card_about_to_expire",
            "card_acceptance",
            "card_arrival",
            "card_delivery_estimate",
            "card_linking",
            "card_not_working",
            "card_payment_fee_charged",
            "card_payment_not_recognised",
            "card_payment_wrong_exchange_rate",
            "card_swallowed",
            "cash_withdrawal_charge",
            "cash_withdrawal_not_recognised",
            "change_pin",
            "compromised_card",
            "contactless_not_working",
            "country_support",
            "declined_card_payment",
            "declined_cash_withdrawal",
            "declined_transfer",
            "direct_debit_payment_not_recognised",
            "disposable_card_limits",
            "edit_personal_details",
            "exchange_charge",
            "exchange_rate",
            "exchange_via_app",
            "extra_charge_on_statement",
            "failed_transfer",
            "fiat_currency_support",
            "get_disposable_virtual_card",
            "get_physical_card",
            "getting_spare_card",
            "getting_virtual_card",
            "lost_or_stolen_card",
            "lost_or_stolen_phone",
            "order_physical_card",
            "passcode_forgotten",
            "pending_card_payment",
            "pending_cash_withdrawal",
            "pending_top_up",
            "pending_transfer",
            "pin_blocked",
            "receiving_money",
            "Refund_not_showing_up",
            "request_refund",
            "reverted_card_payment?",
            "supported_cards_and_currencies",
            "terminate_account",
            "top_up_by_bank_transfer_charge",
            "top_up_by_card_charge",
            "top_up_by_cash_or_cheque",
            "top_up_failed",
            "top_up_limits",
            "top_up_reverted",
            "topping_up_by_card",
            "transaction_charged_twice",
            "transfer_fee_charged",
            "transfer_into_account",
            "transfer_not_received_by_recipient",
            "transfer_timing",
            "unable_to_verify_identity",
            "verify_my_identity",
            "verify_source_of_funds",
            "verify_top_up",
            "virtual_card_not_working",
            "visa_or_mastercard",
            "why_verify_identity",
            "wrong_amount_of_cash_received",
            "wrong_exchange_rate_for_cash_withdrawal"
        ],
        "few_shot_examples": [
            {
                "example": "Is it free to transfer money or is there a fee?",
                "label": "transfer_fee_charged"
            },
            {
                "example": "I need my PIN, where is it?",
                "label": "get_physical_card"
            },
            {
                "example": "Can my salary be received and transferred to my current currency in my country?",
                "label": "receiving_money"
            },
            {
                "example": "Why isn't my purchase exchange rate correct?",
                "label": "card_payment_wrong_exchange_rate"
            },
            {
                "example": "Why was I charged a fee on a cash withdrawal?",
                "label": "cash_withdrawal_charge"
            }
        ],
        "few_shot_selection": "fixed",
        "few_shot_num": 5,
        "example_template": "Input: {example}\nOutput: {label}",
        "output_guidelines": "You will return the answer in a format that contains just the label and nothing else.",
    }
}


In [23]:
l = LabelingAgent(banking77_config, cache=False)

In [24]:
l.plan(dataset='../data/banking_test.csv', max_items=50)

Output()

Total Estimated Cost: $0.157
Number of examples to label: 50
Average cost per example: $0.00314


A prompt example:

You are an expert at understanding twitter complaints.
Your job is to correctly label the provided input example into one of the following categories.
Categories:
activate_my_card
age_limit
apple_pay_or_google_pay
atm_support
automatic_top_up
balance_not_updated_after_bank_transfer
balance_not_updated_after_cheque_or_cash_deposit
beneficiary_not_allowed
cancel_transfer
card_about_to_expire
card_acceptance
card_arrival
card_delivery_estimate
card_linking
card_not_working
card_payment_fee_charged
card_payment_not_recognised
card_payment_wrong_exchange_rate
card_swallowed
cash_withdrawal_charge
cash_withdrawal_not_recognised
change_pin
compromised_card
contactless_not_working
country_support
declined_card_payment
declined_cash_withdrawal
declined_transfer
direct_debit_payment_not_recognised
disposable_card_limits
edit_personal_details
exchange_charge
exchange_rate
exchange_

In [25]:
labels, df, metrics_list = l.run('../data/banking_test.csv', max_items=100)


Output()

2023-05-28 17:19:01 openai INFO: error_code=None error_message='That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 982aaf5cc49be31e0fed14e64031101b in your message.)' error_param=None error_type=server_error message='OpenAI API error received' stream_error=False


2023-05-28 17:20:09 openai INFO: error_code=None error_message='That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID e209523b7bdff382b5d2207ecd17e37b in your message.)' error_param=None error_type=server_error message='OpenAI API error received' stream_error=False


Metric: support: [(100, 'index=0')]
Metric: threshold: [(-inf, 'index=0')]
Metric: accuracy: [(0.71, 'index=0')]
Metric: completion_rate: [(1.0, 'index=0')]
Total number of failures: 0


In [14]:
banking77_config_v2 = {
    "task_name": "BankingComplaintsClassification",
    "task_type": "classification",
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
    },
    "prompt": {
        "task_guidelines": "You are an expert at understanding twitter complaints.\nYour job is to correctly label the provided input example into one of the following categories. Categories:\n{labels}",
        "labels": [
            "activate_my_card",
            "age_limit",
            "apple_pay_or_google_pay",
            "atm_support",
            "automatic_top_up",
            "balance_not_updated_after_bank_transfer",
            "balance_not_updated_after_cheque_or_cash_deposit",
            "beneficiary_not_allowed",
            "cancel_transfer",
            "card_about_to_expire",
            "card_acceptance",
            "card_arrival",
            "card_delivery_estimate",
            "card_linking",
            "card_not_working",
            "card_payment_fee_charged",
            "card_payment_not_recognised",
            "card_payment_wrong_exchange_rate",
            "card_swallowed",
            "cash_withdrawal_charge",
            "cash_withdrawal_not_recognised",
            "change_pin",
            "compromised_card",
            "contactless_not_working",
            "country_support",
            "declined_card_payment",
            "declined_cash_withdrawal",
            "declined_transfer",
            "direct_debit_payment_not_recognised",
            "disposable_card_limits",
            "edit_personal_details",
            "exchange_charge",
            "exchange_rate",
            "exchange_via_app",
            "extra_charge_on_statement",
            "failed_transfer",
            "fiat_currency_support",
            "get_disposable_virtual_card",
            "get_physical_card",
            "getting_spare_card",
            "getting_virtual_card",
            "lost_or_stolen_card",
            "lost_or_stolen_phone",
            "order_physical_card",
            "passcode_forgotten",
            "pending_card_payment",
            "pending_cash_withdrawal",
            "pending_top_up",
            "pending_transfer",
            "pin_blocked",
            "receiving_money",
            "Refund_not_showing_up",
            "request_refund",
            "reverted_card_payment?",
            "supported_cards_and_currencies",
            "terminate_account",
            "top_up_by_bank_transfer_charge",
            "top_up_by_card_charge",
            "top_up_by_cash_or_cheque",
            "top_up_failed",
            "top_up_limits",
            "top_up_reverted",
            "topping_up_by_card",
            "transaction_charged_twice",
            "transfer_fee_charged",
            "transfer_into_account",
            "transfer_not_received_by_recipient",
            "transfer_timing",
            "unable_to_verify_identity",
            "verify_my_identity",
            "verify_source_of_funds",
            "verify_top_up",
            "virtual_card_not_working",
            "visa_or_mastercard",
            "why_verify_identity",
            "wrong_amount_of_cash_received",
            "wrong_exchange_rate_for_cash_withdrawal"
        ],
        "few_shot_examples": "../data/banking_seed.csv",
        "few_shot_selection": "semantic_similarity",
        "few_shot_num": 5,
        "example_template": "Input: {example}\nOutput: {label}"
    }
}
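
The only changes from the previous config are that `few_shot_examples` now points to a seed CSV and `few_shot_selection` is `"semantic_similarity"`, so the examples in each prompt are chosen per input rather than fixed. The sketch below illustrates the idea with a toy bag-of-words cosine similarity. It is a stand-in, not autolabel's implementation (which is not shown in this notebook and may well use embeddings); `select_few_shot` and the query string are hypothetical, and the seed rows are three of the fixed examples from the earlier banking config.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector with punctuation stripped."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two Counter vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_few_shot(query, seed, k):
    """Return the k seed examples most similar to the query."""
    q = bow(query)
    ranked = sorted(seed, key=lambda ex: cosine(q, bow(ex["example"])), reverse=True)
    return ranked[:k]


seed = [
    {"example": "Is it free to transfer money or is there a fee?", "label": "transfer_fee_charged"},
    {"example": "I need my PIN, where is it?", "label": "get_physical_card"},
    {"example": "Why was I charged a fee on a cash withdrawal?", "label": "cash_withdrawal_charge"},
]

for shot in select_few_shot("Is there a fee for a cash withdrawal?", seed, k=2):
    print(shot["label"], "<-", shot["example"])
```

Because selection happens per input, each row to be labeled can end up with a different set of few-shot examples in its prompt.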


In [15]:
l = LabelingAgent(banking77_config_v2, cache=False)

In [12]:
l.plan('../data/banking_test.csv', max_items=100)

Output()

Total Estimated Cost: $3.6
Number of examples to label: 100
Average cost per example: $0.036


A prompt example:

You are an expert at understanding twitter complaints.
Your job is to correctly label the provided input example into one of the following categories. Categories:
activate_my_card
age_limit
apple_pay_or_google_pay
atm_support
automatic_top_up
balance_not_updated_after_bank_transfer
balance_not_updated_after_cheque_or_cash_deposit
beneficiary_not_allowed
cancel_transfer
card_about_to_expire
card_acceptance
card_arrival
card_delivery_estimate
card_linking
card_not_working
card_payment_fee_charged
card_payment_not_recognised
card_payment_wrong_exchange_rate
card_swallowed
cash_withdrawal_charge
cash_withdrawal_not_recognised
change_pin
compromised_card
contactless_not_working
country_support
declined_card_payment
declined_cash_withdrawal
declined_transfer
direct_debit_payment_not_recognised
disposable_card_limits
edit_personal_details
exchange_charge
exchange_rate
exchange_via

In [16]:
labels, df, metrics_list = l.run('../data/banking_test.csv', max_items=50)


Output()

2023-05-28 16:37:42 openai INFO: error_code=None error_message='That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 99c49b94ef84e845437e6ba77d9731b4 in your message.)' error_param=None error_type=server_error message='OpenAI API error received' stream_error=False


Metric: support: [(50, 'index=0')]
Metric: threshold: [(-inf, 'index=0')]
Metric: accuracy: [(0.64, 'index=0')]
Metric: completion_rate: [(1.0, 'index=0')]
Total number of failures: 0


### Run Entity Matching on Walmart-Amazon

In [22]:
walmart_amazon_config = {
    "task_name": "ProductCatalogEntityMatch",
    "task_type": "entity_matching",
    "dataset": {
        "label_column": "label",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
    },
    "prompt": {
        "task_guidelines": "You are an expert at identifying duplicate products from online product catalogs.\nYour job is to tell if the two given entities are duplicates or not. Your answer must be from one of the following options:\n{labels}",
        "labels": ["duplicate", "not duplicate"],
        "few_shot_examples": [
            {
                "entity1": "Title: lexmark extra high yield return pgm print cartridge - magenta; Category: printers; Brand: lexmark; ModelNo: c782u1mg; Price: 214.88;",
                "entity2": "Title: lexmark 18c1428 return program print cartridge black; Category: inkjet printer ink; Brand: lexmark; ModelNo: 18c1428; Price: 19.97;",
                "label": "not duplicate"
            },
            {
                "entity1": "Title: edge tech proshot 4gb sdhc class 6 memory card; Category: usb drives; Brand: edge tech; ModelNo: pe209780; Price: 10.88;",
                "entity2": "Title: 4gb edge proshot sdhc memory card class6; Category: computers accessories; Brand: edge; ModelNo: nan; Price: 17.83;",
                "label": "duplicate"
            },
            {
                "entity1": "Title: tomtom one carry case; Category: gps; Brand: tomtom; ModelNo: 9n00 .181; Price: 19.96;",
                "entity2": "Title: tomtom one carrying case; Category: cases; Brand: tomtom; ModelNo: 9n00 .181; Price: 4.99;",
                "label": "duplicate"
            },
            {
                "entity1": "Title: iosafe rugged 250gb usb 3.0 portable external hard drive; Category: hard drives; Brand: iosafe; ModelNo: pa50250u5yr; Price: 249.99;",
                "entity2": "Title: lacie rugged all-terrain 500 gb firewire 800 firewire 400 usb 2.0 portable external hard drive 301371; Category: external hard drives; Brand: lacie; ModelNo: 301371; Price: nan;",
                "label": "not duplicate"
            }
        ],
        "few_shot_selection": "fixed",
        "few_shot_num": 3,
        "example_template": "Entity1: {entity1}\nEntity2: {entity2}\nOutput: {label}"
    }
}
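
Each entity in the few-shot examples is a flat string of `Field: value;` pairs. If your catalog data lives in separate columns, a small helper can produce the same shape. This is a sketch under assumptions: `serialize_product` is hypothetical, the field order is taken from the examples above, and missing values are written as `nan` to mirror how they appear there.

```python
FIELDS = ("Title", "Category", "Brand", "ModelNo", "Price")


def serialize_product(record):
    """Render a product dict as 'Title: ...; Category: ...; ...;'."""
    # Missing fields fall back to the string 'nan', as seen in the examples.
    return " ".join(f"{name}: {record.get(name, 'nan')};" for name in FIELDS)


product = {
    "Title": "tomtom one carry case",
    "Category": "gps",
    "Brand": "tomtom",
    "ModelNo": "9n00 .181",
    "Price": "19.96",
}
print(serialize_product(product))
```

Keeping the serialization identical for `entity1` and `entity2` means the model compares content rather than formatting.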


In [23]:
l = LabelingAgent(walmart_amazon_config, cache=False)


In [24]:
l.plan('../data/walmart_amazon_test.csv', max_items=100)

Output()

Total Estimated Cost: $0.288
Number of examples to label: 100
Average cost per example: $0.00288


A prompt example:

You are an expert at identifying duplicate products from online product catalogs.
Your job is to tell if the two given entities are duplicates or not. Your answer must be from one of the following options:
duplicate
not duplicate

You will return the answer with one element: "the correct option"


Some examples with their output answers are provided below:

Entity1: Title: lexmark extra high yield return pgm print cartridge - magenta; Category: printers; Brand: lexmark; ModelNo: c782u1mg; Price: 214.88;
Entity2: Title: lexmark 18c1428 return program print cartridge black; Category: inkjet printer ink; Brand: lexmark; ModelNo: 18c1428; Price: 19.97;
Output: not duplicate

Entity1: Title: edge tech proshot 4gb sdhc class 6 memory card; Category: usb drives; Brand: edge tech; ModelNo: pe209780; Price: 10.88;
Entity2: Title: 4gb edge proshot sdhc memory card class6; Categor

In [25]:
labels, df, metrics_list = l.run('../data/walmart_amazon_test.csv', max_items=50)


Output()

2023-05-28 16:43:53 openai INFO: error_code=None error_message='That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID 0a50afa39ebaa8af03d66fcfc551eff3 in your message.)' error_param=None error_type=server_error message='OpenAI API error received' stream_error=False


Metric: support: [(50, 'index=0')]
Metric: threshold: [(-inf, 'index=0')]
Metric: accuracy: [(0.98, 'index=0')]
Metric: completion_rate: [(1.0, 'index=0')]
Total number of failures: 0


### Run Question Answering on SQuAD v2

In [26]:
squad_v2_config = {
    "task_name": "OpenbookQAWikipedia",
    "task_type": "multi_choice_question_answering",
    "dataset": {
        "label_column": "answer",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
        "params": {}
    },
    "prompt": {
        "task_guidelines": "You are an expert at answering questions based on wikipedia articles.\nYour job is to answer the following questions using the context provided with the question. The answer is a continuous span of words from the context. Use the context to answer the question. If the question cannot be answered using the context, answer the question as unanswerable.",
        "example_template": "Context: {context}\nQuestion: {question}\nAnswer: {answer}"
    }
}


In [27]:
l = LabelingAgent(squad_v2_config, cache=False)


In [28]:
l.plan('../data/squad_v2_test.csv', max_items=100)


Output()

Total Estimated Cost: $0.255
Number of examples to label: 100
Average cost per example: $0.00255


A prompt example:

You are an expert at answering questions based on wikipedia articles.
Your job is to answer the following questions using the context provided with the question. The answer is a continuous span of words from the context. Use the context to answer the question. If the question cannot be answered using the context, answer the question as unanswerable.

You will return the answer one element: "the correct label"


Now I want you to label the following example:
Context: The final major evolution of the steam engine design was the use of steam turbines starting in the late part of the 19th century. Steam turbines are generally more efficient than reciprocating piston type steam engines (for outputs above several hundred horsepower), have fewer moving parts, and provide rotary power directly instead of through a connecting rod system or similar means. Steam turbines virtually

In [29]:
labels, df, metrics_list = l.run('../data/squad_v2_test.csv', max_items=50)


Output()

2023-05-28 16:45:46 openai INFO: error_code=None error_message='That model is currently overloaded with other requests. You can retry your request, or contact us through our help center at help.openai.com if the error persists. (Please include the request ID effe68e526c60dae9aa705f741ab1105 in your message.)' error_param=None error_type=server_error message='OpenAI API error received' stream_error=False


Metric: f1: [(0.6692053999878658, 'index=0')]
Metric: support: [(50, 'index=0')]
Metric: threshold: [-inf]
Metric: accuracy: [(0.52, 'index=0')]
Metric: completion_rate: [(1.0, 'index=0')]
Total number of failures: 0
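
For span-style answers, F1 is usually computed over tokens, so a prediction that overlaps the reference only partially still earns partial credit. The sketch below shows the common SQuAD-style token F1 in simplified form (whitespace tokenization, lowercasing, no article or punctuation stripping). Whether autolabel computes its `f1` metric exactly this way is an assumption; `token_f1` is a hypothetical helper.

```python
from collections import Counter


def token_f1(prediction, reference):
    """Token-level F1 between a predicted and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Two empty answers match; one empty answer scores zero.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


print(token_f1("steam turbines", "the steam turbines"))
print(token_f1("unanswerable", "unanswerable"))
```

Under this definition a near-miss span scores above zero on F1 while still counting as wrong for exact-match accuracy, which would be consistent with the F1 above being higher than the accuracy.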


### Run Named Entity Recognition on CoNLL-2003

In [30]:
conll_config = {
    "task_name": "PersonLocationOrgMiscNER",
    "task_type": "named_entity_recognition",
    "dataset": {
        "label_column": "CategorizedLabels",
        "text_column": "example",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
    },
    "prompt": {
        "task_guidelines": "You are an expert at extracting Person, Organization, Location, and Miscellaneous entities from text. Your job is to extract named entities mentioned in text, and classify them into one of the following categories.\nCategories:\n{labels}\n ",
        "labels": [
            "Location",
            "Organization",
            "Person",
            "Miscellaneous"
        ],
        "example_template": "Example: {example}\nOutput: {CategorizedLabels}"
    }
}


In [31]:
l = LabelingAgent(conll_config)


In [32]:
l.plan('../data/conll2003_test.csv', max_items=100)


Output()

Total Estimated Cost: $0.226
Number of examples to label: 100
Average cost per example: $0.00226


A prompt example:

You are an expert at extracting Person, Organization, Location, and Miscellaneous entities from text. Your job is to extract named entities mentioned in text, and classify them into one of the following categories.
Categories:
Location
Organization
Person
Miscellaneous
 

You will return the answer in CSV format, with two columns seperated by the % character. First column is the extracted entity and second column is the category. Rows in the CSV are separated by new line character.

Now I want you to label the following example:
Example: SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRISE DEFEAT .
Output: 


In [33]:
labels, df, metrics_list = l.run('../data/conll2003_test.csv', max_items=20)


Output()

Metric: f1: [(0.8735632183908046, 'index=0')]
Metric: support: [(20, 'index=0')]
Metric: threshold: [(-inf, 'index=0')]
Metric: accuracy: [(0.7804876145152159, 'index=0')]
Metric: completion_rate: [(1.0, 'index=0')]
Total number of failures: 0


In [8]:
df.head()

Unnamed: 0,example,IndividualLabels,CategorizedLabels,PersonLocationOrgMiscNER_llm_labeled_successfully,PersonLocationOrgMiscNER_llm_label
0,"SOCCER - JAPAN GET LUCKY WIN , CHINA IN SURPRI...","[{""Description"": ""Location"", ""Text"": ""JAPAN""},...","{""Location"": [""JAPAN""], ""Organization"": [], ""P...",yes,"[{'type': 'Location', 'text': 'JAPAN', 'start'..."
1,Nadim Ladki,"[{""Description"": ""Person"", ""Text"": ""Nadim Ladk...","{""Location"": [], ""Organization"": [], ""Person"":...",yes,"[{'type': 'Person', 'text': 'Nadim Ladki', 'st..."
2,"AL-AIN , United Arab Emirates 1996-12-06","[{""Description"": ""Location"", ""Text"": ""AL-AIN""}...","{""Location"": [""AL-AIN"", ""United Arab Emirates""...",yes,"[{'type': 'Location', 'text': 'AL-AIN', 'start..."
3,Japan began the defence of their Asian Cup tit...,"[{""Description"": ""Location"", ""Text"": ""Japan""},...","{""Location"": [""Japan"", ""Syria""], ""Organization...",yes,"[{'type': 'Location', 'text': 'Japan', 'start'..."
4,But China saw their luck desert them in the se...,"[{""Description"": ""Location"", ""Text"": ""China""},...","{""Location"": [""China"", ""Uzbekistan""], ""Organiz...",yes,"[{'type': 'Location', 'text': 'China', 'start'..."


In [9]:
df['CategorizedLabels'][0]

'{"Location": ["JAPAN"], "Organization": [], "Person": ["CHINA"], "Miscellaneous": []}'

In [11]:
print(l.task._json_to_llm_format(df['CategorizedLabels'][0]))


JAPAN%Location
CHINA%Person


In [12]:
serialized = l.task._json_to_llm_format(df['CategorizedLabels'][0])
print(l.task._llm_to_json_format(serialized))


{'Location': ['JAPAN'], 'Organization': [], 'Person': ['CHINA'], 'Miscellaneous': []}
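
The two cells above round-trip a label between the JSON stored in `CategorizedLabels` and the `entity%category` lines the model is asked to return. The standalone sketch below reproduces that round trip so the format is easy to inspect. These functions are approximations written for illustration, not the library's private `_json_to_llm_format` and `_llm_to_json_format` methods, and skipping malformed lines is an assumption.

```python
import json

CATEGORIES = ["Location", "Organization", "Person", "Miscellaneous"]


def json_to_llm_format(categorized_json):
    """Turn '{"Location": ["JAPAN"], ...}' into 'JAPAN%Location' lines."""
    categorized = json.loads(categorized_json)
    lines = [
        f"{entity}%{category}"
        for category, entities in categorized.items()
        for entity in entities
    ]
    return "\n".join(lines)


def llm_to_json_format(response, categories=CATEGORIES):
    """Parse 'entity%category' lines back into a dict of lists."""
    result = {category: [] for category in categories}
    for line in response.strip().splitlines():
        # Split on the last '%' so entities containing '%' stay intact.
        entity, sep, category = line.rpartition("%")
        if sep and category.strip() in result:
            result[category.strip()].append(entity.strip())
    return result


gold = '{"Location": ["JAPAN"], "Organization": [], "Person": ["CHINA"], "Miscellaneous": []}'
serialized = json_to_llm_format(gold)
print(serialized)
print(llm_to_json_format(serialized))
```

Lines without a `%` or with an unknown category are dropped here; a stricter parser could instead flag them as labeling failures.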


In [34]:
conll_few_shot_config = {
    "task_name": "PersonLocationOrgMiscNER",
    "task_type": "named_entity_recognition",
    "dataset": {
        "label_column": "CategorizedLabels",
        "text_column": "example",
        "delimiter": ","
    },
    "model": {
        "provider": "openai",
        "name": "gpt-3.5-turbo",
    },
    "prompt": {
        "task_guidelines": "You are an expert at extracting Person, Organization, Location, and Miscellaneous entities from text. Your job is to extract named entities mentioned in text, and classify them into one of the following categories.\nCategories:\n{labels}\n ",
        "labels": [
            "Location",
            "Organization",
            "Person",
            "Miscellaneous"
        ],
        "example_template": "Example: {example}\nOutput: {CategorizedLabels}",
        "few_shot_examples": "../data/conll2003_seed.csv",
        "few_shot_selection": "semantic_similarity",
        "few_shot_num": 10,
    }
}


In [35]:
l = LabelingAgent(conll_few_shot_config)


In [36]:
l.plan('../data/conll2003_test.csv', max_items=10)


Output()

Total Estimated Cost: $0.028
Number of examples to label: 10
Average cost per example: $0.00284


A prompt example:

You are an expert at extracting Person, Organization, Location, and Miscellaneous entities from text. Your job is to extract named entities mentioned in text, and classify them into one of the following categories.
Categories:
Location
Organization
Person
Miscellaneous
 

You will return the answer in CSV format, with two columns seperated by the % character. First column is the extracted entity and second column is the category. Rows in the CSV are separated by new line character.

Some examples with their output answers are provided below:

Example: PARIS 1996-12-06
Output: PARIS%Location

Example: PARIS 1996-12-06
Output: PARIS%Location

Example: LONDON 1996-12-06
Output: LONDON%Location

Example: LONDON 1996-12-06
Output: LONDON%Location

Example: LONDON 1996-12-06
Output: LONDON%Location

Example: MUNICH , Germany 1996-12-06
Output: MUNICH%Location
Germany%Location


In [37]:
labels, df, metrics_list = l.run('../data/conll2003_test.csv', max_items=20)


Output()

Metric: f1: [(0.9382716049382716, 'index=0')]
Metric: support: [(20, 'index=0')]
Metric: threshold: [(-inf, 'index=0')]
Metric: accuracy: [(0.8048778524688164, 'index=0')]
Metric: completion_rate: [(1.0, 'index=0')]
Total number of failures: 0
