## Lead Parser Using the DSPy Library

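This notebook implements a lead parser with the DSPy library: given a line of raw text, it extracts a contact's first name, last name, phone number, email address, postal address, and gender.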

DSPy emphasizes programming over prompting. It unifies techniques for prompting and fine-tuning LMs as well as improving them with reasoning and tool/retrieval augmentation, all expressed through a minimalistic set of Pythonic operations that compose and learn.

DSPy provides composable and declarative modules for instructing LMs in a familiar Pythonic syntax. On top of that, DSPy introduces an automatic compiler that teaches LMs how to conduct the declarative steps in your program. The DSPy compiler will internally trace your program and then craft high-quality prompts for large LMs (or train automatic finetunes for small LMs) to teach them the steps of your task.

In [10]:
import os
import dspy

In [11]:
# Set up the LM.
turbo = dspy.OpenAI(
    model="gpt-3.5-turbo-instruct",
    max_tokens=250,
    api_key=os.getenv("OPENAI_API_KEY"),
)
dspy.settings.configure(lm=turbo)
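
If `OPENAI_API_KEY` is unset, `os.getenv` returns `None` and the problem only shows up later, at the first LM call. A minimal sketch of failing early instead; `require_env` is a hypothetical helper for this notebook, not part of DSPy or the OpenAI client:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is missing or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Intended use in the cell above (assumption): api_key=require_env("OPENAI_API_KEY")
```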

In [19]:
# Define the signature: the input and output fields, plus a docstring that describes the task.

class ExtractLeadInfo(dspy.Signature):
    """Extract lead information from raw text."""
    raw_text = dspy.InputField(desc="Raw text containing demographic info")
    first_name = dspy.OutputField(desc="First name")
    last_name = dspy.OutputField(desc="Last name")
    phone = dspy.OutputField(desc="Phone number")
    email = dspy.OutputField(desc="Email address")
    address = dspy.OutputField(desc="Postal address")
    gender = dspy.OutputField(desc="Gender")


In [13]:
# signature_to_template builds the prompt template derived from a signature (import path as in the DSPy version used here).
from dspy.signatures.signature import signature_to_template

In [14]:
lead_parser_template = signature_to_template(ExtractLeadInfo)
print(lead_parser_template)

Template(Extract lead information from raw text., ['Raw Text:', 'First Name:', 'Last Name:', 'Phone:', 'Email:', 'Address:', 'Gender:'])
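
The printed template suggests that each snake_case field name becomes a title-cased prefix ending in a colon. The sketch below reproduces that mapping in plain Python to make the naming rule explicit. It is an illustration inferred from the output above, not DSPy's own implementation, and `field_prefix` is a hypothetical name:

```python
def field_prefix(name: str) -> str:
    """Turn a snake_case field name into a 'Title Case:' prompt prefix."""
    return " ".join(part.capitalize() for part in name.split("_")) + ":"


# Field names in the order they are declared on ExtractLeadInfo.
fields = ["raw_text", "first_name", "last_name", "phone", "email", "address", "gender"]
prefixes = [field_prefix(f) for f in fields]
print(prefixes)
```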


In [20]:
# Wrap the signature in a chain-of-thought module, which asks the LM to reason step by step before filling in the output fields.

class CoT(dspy.Module):
    def __init__(self):
        super().__init__()
        self.prog = dspy.ChainOfThought(ExtractLeadInfo)
        
    def forward(self, raw_text):
        return self.prog(raw_text=raw_text)


In [16]:
c = CoT()

In [17]:
raw_text = "John Doe, phone: 123-456-7890, email: john.doe@example.com, address: 123 Main St, gender: male"

In [18]:
# Run the module on the sample text; calling the module invokes forward().
c(raw_text)

Prediction(
    rationale='produce the gender. We first need to extract the demographic information from the raw text. This includes the first name, last name, phone number, email address, address, and gender. We can use the commas to separate the different pieces of information and then assign them to their respective categories.',
    first_name='John',
    last_name='Doe',
    phone='123-456-7890',
    email='john.doe@example.com',
    address='123 Main St',
    gender='Male'
)
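
Because the fields come from an LM, it can be worth sanity-checking them before storing a lead. The following is a minimal sketch, assuming the prediction's fields have been copied into a plain dict; the `lead` dict is a hand-written stand-in for the output above, and `check_lead` with its deliberately loose rules is hypothetical, not part of DSPy:

```python
import re

# Loose pattern: something@something.something, with no whitespace.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
REQUIRED = ("first_name", "last_name", "phone", "email", "address", "gender")


def check_lead(lead: dict) -> list:
    """Return a list of human-readable problems found in an extracted lead."""
    problems = []
    for key in REQUIRED:
        if not str(lead.get(key, "")).strip():
            problems.append(f"missing {key}")
    email = lead.get("email", "")
    if email and not EMAIL_RE.match(email):
        problems.append("email looks malformed")
    phone = lead.get("phone", "")
    # Count digits only, ignoring dashes, spaces, and parentheses.
    if phone and len(re.sub(r"\D", "", phone)) < 7:
        problems.append("phone has too few digits")
    return problems


lead = {
    "first_name": "John",
    "last_name": "Doe",
    "phone": "123-456-7890",
    "email": "john.doe@example.com",
    "address": "123 Main St",
    "gender": "Male",
}
problems = check_lead(lead)
print(problems or "lead looks complete")
```

An empty list means every field is present and passes the basic checks; anything else can be logged or routed to manual review.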