In [1]:
import pandas as pd


## Dataset Creation

First, let's load a dataset of speeding violations into a pandas DataFrame. This will help us manage the data as we add more attributes, such as predictions and labels for SVC codes.

In [3]:
df = pd.read_csv('speeding_violations_dataset.csv')

In [4]:
df

|   | text | ground_truth |
|---|---|---|
| 0 | EXCEED POSTED SPEED/BRIDGE BY 13 MPH | SPEED 11-15 OVER LIMIT |
| 1 | SPEEDING IN SCHOOL ZONE 16 MPH IN A 20 MPH ZONE | SPEED IN SCHOOL ZONE |
| 2 | SPEEDING 80/70 | SPEED 6-10 OVER LIMIT |
| 3 | SPEEDING 10% ABOVE THE POSTED SPEED 112MPH IN ... | SPEED 46 PLUS OVER LIMIT |
| 4 | SPEEDING 15 MPH OVER IN A RESIDENTIAL AREA | SPEED 11-15 OVER LIMIT |
| 5 | SPEEDING 21+ MPH COUNTY/STATE | SPEED 21-25 OVER LIMIT |
| 6 | SPEEDING 86 MPH IN A 75 MPH ZONE | SPEED 11-15 OVER LIMIT |
| 7 | SPEEDING (CITY - NOT URBAN) | SPEED - GENERAL |
| 8 | SPEEDING MORE THAN 10 MPH (11-14)76/55 | SPEED 21-25 OVER LIMIT |
| 9 | SPEEDING IN EXCESSS OF LAWFUL MAXIMUM LIMIT(S15) | SPEED - GENERAL |


In [10]:
# Labels the classification skill may output (SVC speeding codes)
speeding_list = ['SPEED IN SCHOOL ZONE',
    'SPEED IN WORK ZONE',
    'SPEED LIMIT FOR TRUCKS AND BUSES',
    'SPEED WHILE TOWING',
    'SPEED 1-5 OVER LIMIT',
    'SPEED 6-10 OVER LIMIT',
    'SPEED 11-15 OVER LIMIT',
    'SPEED 16-20 OVER LIMIT',
    'SPEED 21-25 OVER LIMIT',
    'SPEED 26-30 OVER LIMIT',
    'SPEED 31-35 OVER LIMIT',
    'SPEED 36-40 OVER LIMIT',
    'SPEED 41-45 OVER LIMIT',
    'SPEED 46 PLUS OVER LIMIT',
    'SPEED OVER 29 MPH IN EXCESS',
    'SPEED - GENERAL',
    'SPEED GREATER THAN REASONABLE OR PRUDENT',
    'DRIVE OVER MAXIMUM SPEED LIMIT',
    'EXCESSIVE SPEED',
    'SPECIAL SPEED LIMITATIONS',
    'STATE SPEED ZONES AS NOTED BY SIGNS'
    ]
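Because Python concatenates adjacent string literals, a missing comma in a list like `speeding_list` silently merges two labels instead of raising an error. A quick sanity check is worth running before the list is handed to the skill. The sketch below demonstrates the pitfall and a small checker; `check_labels` is a hypothetical helper, not part of adala or pandas.

```python
# Adjacent string literals are joined at compile time, so a missing comma merges two labels
labels = [
    "SPEED WHILE TOWING"  # <- no comma here
    "SPEED 1-5 OVER LIMIT",
    "SPEED 6-10 OVER LIMIT",
]
print(len(labels))  # 2, not 3
print(labels[0])    # SPEED WHILE TOWINGSPEED 1-5 OVER LIMIT


def check_labels(label_list, ground_truth):
    """Hypothetical helper: return (duplicate labels, ground-truth values absent from label_list)."""
    seen, duplicates = set(), set()
    for label in label_list:
        if label in seen:
            duplicates.add(label)
        else:
            seen.add(label)
    missing = sorted(set(ground_truth) - seen)
    return sorted(duplicates), missing


print(check_labels(labels, ["SPEED 1-5 OVER LIMIT", "SPEED 6-10 OVER LIMIT"]))
# ([], ['SPEED 1-5 OVER LIMIT'])
```

In the notebook, `check_labels(speeding_list, df['ground_truth'])` would list any ground-truth label that the skill can never produce.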

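`speeding_list` above enumerates the labels the classifier may output. Rather than wrapping the dataframe in a separate Dataset object, this notebook passes it directly to the agent's environment (`StaticEnvironment(df=df, ...)`), which supplies the input records and their ground-truth labels.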

## Create Agent
To create an Agent, we need to define two things:

Skills - an Agent's abilities are defined as Skills, and each agent can possess many different skills. In our case, the agent has only one labeling skill: classifying a given piece of text into an SVC violation code. To define this skill, we use an LLM, passing it instructions and the set of labels we expect to receive back.

Environment - this is where the Agent receives the ground truth signal used to improve its skill. Since we have already created a ground truth dataset, we can simply refer to the corresponding dataframe column. In a real-world scenario, you may want a different environment in which the ground truth signal is obtained asynchronously by gathering human feedback during the agent's learning phase.
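For intuition about what comparing against `ground_truth_columns={'prediction': 'ground_truth'}` amounts to, here is a hypothetical pandas sketch that matches a skill's output field against the mapped dataframe column and reports per-field accuracy. It is not adala's `StaticEnvironment` implementation; the function name `compare_to_ground_truth` and the sample rows are illustrative.

```python
import pandas as pd


def compare_to_ground_truth(records, predictions, ground_truth_columns):
    """Hypothetical sketch: per-row match flags and per-field accuracy.

    ground_truth_columns maps a skill output field (e.g. 'prediction')
    to the column of `records` holding the correct label (e.g. 'ground_truth').
    """
    matches = pd.DataFrame(index=records.index)
    for field, gt_column in ground_truth_columns.items():
        matches[field] = predictions[field] == records[gt_column]
    accuracy = {field: float(value) for field, value in matches.mean().items()}
    return matches, accuracy


sample_records = pd.DataFrame({
    "text": ["SPEEDING 80/70", "SPEEDING (CITY - NOT URBAN)"],
    "ground_truth": ["SPEED 6-10 OVER LIMIT", "SPEED - GENERAL"],
})
sample_predictions = pd.DataFrame(
    {"prediction": ["SPEED 6-10 OVER LIMIT", "SPEED 1-5 OVER LIMIT"]},
    index=sample_records.index,
)
matches, accuracy = compare_to_ground_truth(
    sample_records, sample_predictions, {"prediction": "ground_truth"}
)
print(accuracy)  # {'prediction': 0.5}
```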

In [11]:
from adala.agents import Agent
from adala.environments import StaticEnvironment
from adala.skills import ClassificationSkill
from adala.runtimes import OpenAIChatRuntime, GuidanceRuntime
from rich import print

In [12]:
prompt = """

Your task is to classify each violation code into the correct category based on specific criteria. Consider the following instructions:

1. If the violation description includes 'WORK ZONE' OR 'CONSTRUCTION ZONE', assign the label "SPEED IN WORK ZONE." Apply the zone-specific labels regardless of mentions of numerical speeds like 1-10, 10-20, etc.

2. If the violation description includes 'SCHOOL ZONE', assign the label "SPEED IN SCHOOL ZONE." Apply the zone-specific labels regardless of mentions of numerical speeds like 1-10, 10-20, etc.

3. For violations in the format 'SPEEDING X/Y,' deduce the excess amount (|X - Y|) to find the appropriate category range.  If the calculated excess falls within a specific range like Example: SPEEDING 80/70 excess is 10, hence label would be SPEED 6-10 OVER LIMIT because 10 is in the 6-10 range.

4. For explicit speeds, such as 'SPEED X MPH IN A Y MPH ZONE,' calculate the excess as |X - Y| to determine the fitting category range. If the calculated excess is above 45, use the label "SPEED 46 PLUS OVER LIMIT."

5. Use the label 'SPEED - GENERAL' for violations with non-numerical descriptions or qualitative descriptions without specific speed details. Non-numerical descriptions mean that there is no mention of a range of speeds or quantifiable information regarding how much the speed exceeded by.

Remember to provide the correct category label based on the specified criteria for each violation code.
"""

In [13]:
prompt = promptlayer.prompts.get("speeding_violation_catergorize")['template']

In [14]:
prompt

'Your task is to classify each violation code into the correct category based on specific criteria. Consider the following instructions:\n\n1. If the violation description includes \'WORK ZONE\' OR \'CONSTRUCTION ZONE\', assign the label "SPEED IN WORK ZONE." Apply the zone-specific labels regardless of mentions of numerical speeds like 1-10, 10-20, etc.\n\n2. If the violation description includes \'SCHOOL ZONE\', assign the label "SPEED IN SCHOOL ZONE." Apply the zone-specific labels regardless of mentions of numerical speeds like 1-10, 10-20, etc.\n\n3. For violations in the format \'SPEEDING X/Y,\' deduce the excess amount (|X - Y|) to find the appropriate category range.  If the calculated excess falls within a specific range like Example: SPEEDING 80/70 excess is 10, hence label would be SPEED 6-10 OVER LIMIT because 10 is in the 6-10 range.\n\n4. For explicit speeds, such as \'SPEED X MPH IN A Y MPH ZONE,\' calculate the excess as |X - Y| to determine the fitting category range. 

In [15]:
agent = Agent(
    # define the agent's labeling skill, which classifies text into one of the labels in speeding_list
    skills=ClassificationSkill(
        name='speeding_violation_catergorize',
        description='Classify traffic violation codes based on the nature and severity of the speeding incident.',
        instructions=prompt,
        labels={'prediction': speeding_list},
        input_template='Input: {text}',
        output_template='Output: {prediction}'
    ),
    
    # basic environment extracts ground truth signal from the input records
    environment=StaticEnvironment(
        df=df,
        ground_truth_columns={'prediction': 'ground_truth'}
    ),
    
    runtimes = {
        'openai': GuidanceRuntime(),
    },
    default_runtime='openai',
    
    teacher_runtimes = {
      'openai-gpt3': OpenAIChatRuntime(model='gpt-3.5-turbo'),
      'openai-gpt4': OpenAIChatRuntime(model='gpt-4'),
    },
    
    # NOTE: the default teacher is "openai-gpt3"; switch to "openai-gpt4" if you have access to GPT-4
    default_teacher_runtime='openai-gpt3'
)

print(agent)
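The skill's `input_template` and `output_template` use `{field}` placeholders: `{text}` corresponds to the `text` column of the dataframe, and `{prediction}` names the output field that `ground_truth_columns` maps to `ground_truth`. The snippet below only illustrates that substitution with Python's `str.format`, plus a hypothetical `parse_output` helper that reads a label back out of a completion; adala's own prompt assembly and output parsing may work differently.

```python
import re

input_template = "Input: {text}"
output_template = "Output: {prediction}"

# Render one record the way the placeholders suggest
record = {"text": "SPEEDING 80/70"}
print(input_template.format(**record))  # Input: SPEEDING 80/70
print(output_template.format(prediction="SPEED 6-10 OVER LIMIT"))  # Output: SPEED 6-10 OVER LIMIT


def parse_output(completion, template=output_template):
    """Hypothetical helper: invert a single-field template such as 'Output: {prediction}'."""
    prefix, _, suffix = template.partition("{prediction}")
    pattern = re.escape(prefix) + r"(.*?)" + re.escape(suffix) + r"\s*$"
    match = re.match(pattern, completion.strip())
    return match.group(1).strip() if match else None


print(parse_output("Output: SPEED 6-10 OVER LIMIT"))  # SPEED 6-10 OVER LIMIT
```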

## Learning Agent
We will now let the Agent learn from the ground truth. After every action, the Agent returns its Experience, where it stores various observations such as predicted data, errors, and accuracy.
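One way to read `learn(learning_iterations=3, accuracy_threshold=0.90)` is: run the skill, score it against the ground truth, stop if accuracy reaches the threshold, otherwise let a teacher improve the skill and repeat, for at most three rounds. The toy loop below sketches that reading under those assumptions. `learn_loop`, `toy_predict` and `toy_improve` are hypothetical stand-ins that do not call adala; in the real agent the improvement step is presumably performed by the teacher runtime refining the skill's instructions.

```python
from typing import Callable, List


def learn_loop(
    predict: Callable[[List[str]], List[str]],
    improve: Callable[[List[str], List[str], List[str]], None],
    texts: List[str],
    truths: List[str],
    learning_iterations: int = 3,
    accuracy_threshold: float = 0.90,
) -> List[float]:
    """Hypothetical loop: predict, score, then stop early or improve, for at most N rounds."""
    history = []
    for _ in range(learning_iterations):
        preds = predict(texts)
        accuracy = sum(p == t for p, t in zip(preds, truths)) / len(truths)
        history.append(accuracy)
        if accuracy >= accuracy_threshold:
            break  # good enough, stop early
        improve(texts, preds, truths)  # stand-in for the teacher's update step
    return history


# Toy "skill": a lookup table that the improve step patches with corrected labels
memory = {}


def toy_predict(texts):
    return [memory.get(t, "SPEED - GENERAL") for t in texts]


def toy_improve(texts, preds, truths):
    for text, pred, truth in zip(texts, preds, truths):
        if pred != truth:
            memory[text] = truth


history = learn_loop(
    toy_predict,
    toy_improve,
    ["SPEEDING 80/70", "SPEEDING (CITY - NOT URBAN)"],
    ["SPEED 6-10 OVER LIMIT", "SPEED - GENERAL"],
)
print(history)  # [0.5, 1.0]
```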

In [206]:
ground_truth_signal = agent.learn(learning_iterations=3, accuracy_threshold=0.90)

100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 26.82it/s]
100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 28.86it/s]
100%|██████████████████████████████████████████| 13/13 [00:00<00:00, 27.02it/s]


In [193]:
test_df = pd.DataFrame([
    "SPEEDING 45 IN A 25 MPH RESIDENTIAL ZONE",
    "SPEEDING 88 IN A 55 MPH HIGHWAY ZONE",
    "SPEEDING 25 MPH PLUS",
    "SPEEDING 50/35 IN DOWNTOWN TRAFFIC",
    "SPEEDING AT 95 MPH IN A 65 MPH INTERSTATE ZONE"
], columns=['text'])


In [194]:
test_df

|   | text |
|---|---|
| 0 | SPEEDING 45 IN A 25 MPH RESIDENTIAL ZONE |
| 1 | SPEEDING 88 IN A 55 MPH HIGHWAY ZONE |
| 2 | SPEEDING 25 MPH PLUS |
| 3 | SPEEDING 50/35 IN DOWNTOWN TRAFFIC |
| 4 | SPEEDING AT 95 MPH IN A 65 MPH INTERSTATE ZONE |


In [207]:
predictions = agent.run(test_df)


100%|████████████████████████████████████████████| 5/5 [00:00<00:00, 26.06it/s]


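Note: the next cell ([163]) was run before the final `agent.run` call ([207]), so its output is stale; for example, its row 2 text (SPEEDING 25 PLUS) differs from `test_df`.

In [163]: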
predictions

|   | text | prediction |
|---|---|---|
| 0 | SPEEDING 45 IN A 25 MPH RESIDENTIAL ZONE | SPEED 20 PLUS OVER LIMIT |
| 1 | SPEEDING 88 IN A 55 MPH HIGHWAY ZONE | SPEED 31 PLUS OVER LIMIT |
| 2 | SPEEDING 25 PLUS | SPEED 25 PLUS OVER LIMIT |
| 3 | SPEEDING 50/35 IN DOWNTOWN TRAFFIC | SPEED 15 PLUS OVER LIMIT |
| 4 | SPEEDING AT 95 MPH IN A 65 MPH INTERSTATE ZONE | SPEED 31 PLUS OVER LIMIT |
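The output above contains labels such as SPEED 20 PLUS OVER LIMIT that do not appear in `speeding_list`, so they can never match a ground-truth label. A cheap guard is to flag such rows with pandas `Series.isin`. `out_of_vocabulary` is a hypothetical helper, and the sample rows are copied from the table above with a small label subset for illustration.

```python
import pandas as pd


def out_of_vocabulary(frame, allowed_labels, column="prediction"):
    """Return the rows whose predicted label is not in allowed_labels."""
    return frame[~frame[column].isin(allowed_labels)]


# Two rows copied from the output above and a small label subset, for illustration only
stale_rows = pd.DataFrame({
    "text": ["SPEEDING 45 IN A 25 MPH RESIDENTIAL ZONE",
             "SPEEDING 50/35 IN DOWNTOWN TRAFFIC"],
    "prediction": ["SPEED 20 PLUS OVER LIMIT", "SPEED 15 PLUS OVER LIMIT"],
})
allowed_subset = ["SPEED 11-15 OVER LIMIT", "SPEED 16-20 OVER LIMIT", "SPEED - GENERAL"]
print(out_of_vocabulary(stale_rows, allowed_subset))  # both rows are flagged
```

In the notebook, the equivalent call would be `out_of_vocabulary(predictions, speeding_list)`.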


In [208]:
predictions

|   | text | prediction |
|---|---|---|
| 0 | SPEEDING 45 IN A 25 MPH RESIDENTIAL ZONE | SPEED 16-20 OVER LIMIT |
| 1 | SPEEDING 88 IN A 55 MPH HIGHWAY ZONE | SPEED 31-35 OVER LIMIT |
| 2 | SPEEDING 25 MPH PLUS | SPEED - GENERAL |
| 3 | SPEEDING 50/35 IN DOWNTOWN TRAFFIC | SPEED 16-20 OVER LIMIT |
| 4 | SPEEDING AT 95 MPH IN A 65 MPH INTERSTATE ZONE | SPEED 31-35 OVER LIMIT |
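By the prompt's own arithmetic, the first two predictions are consistent (excess 20 and 33 mph), but row 3 (50/35, excess 15) belongs in SPEED 11-15 OVER LIMIT and row 4 (95 in a 65 zone, excess 30) in SPEED 26-30 OVER LIMIT. A small sketch that parses each predicted bucket back into a numeric range makes such off-by-one-bucket errors easy to spot. The speeds are copied by hand from the test rows, row 2 is omitted because it states no posted limit, and `label_range` and `is_consistent` are hypothetical helpers.

```python
import re

import pandas as pd

results = pd.DataFrame({
    "text": [
        "SPEEDING 45 IN A 25 MPH RESIDENTIAL ZONE",
        "SPEEDING 88 IN A 55 MPH HIGHWAY ZONE",
        "SPEEDING 50/35 IN DOWNTOWN TRAFFIC",
        "SPEEDING AT 95 MPH IN A 65 MPH INTERSTATE ZONE",
    ],
    "actual_mph": [45, 88, 50, 95],  # copied by hand from the text
    "limit_mph": [25, 55, 35, 65],
    "prediction": [
        "SPEED 16-20 OVER LIMIT",
        "SPEED 31-35 OVER LIMIT",
        "SPEED 16-20 OVER LIMIT",
        "SPEED 31-35 OVER LIMIT",
    ],
})


def label_range(label):
    """Hypothetical helper: (low, high) mph range encoded in a bucket label, or None."""
    match = re.search(r"(\d+)-(\d+) OVER LIMIT", label)
    if match:
        return int(match.group(1)), int(match.group(2))
    match = re.search(r"(\d+) PLUS OVER LIMIT", label)
    if match:
        return int(match.group(1)), float("inf")
    return None  # e.g. SPEED - GENERAL or a zone label


def is_consistent(label, excess):
    """True if the excess speed falls inside the range the label claims."""
    bounds = label_range(label)
    return bounds is not None and bounds[0] <= excess <= bounds[1]


results["excess"] = results["actual_mph"] - results["limit_mph"]
results["consistent"] = [
    is_consistent(label, excess)
    for label, excess in zip(results["prediction"], results["excess"])
]
print(results[["text", "excess", "prediction", "consistent"]])
```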


## PROMPTS