Skip to content

Experiment with two-step address extraction #4

@nickoneill

Description

@nickoneill

Currently we ask the LLM to identify and extract address data in a single prompt, sometimes when you want to ask for something to be done in more detail (i.e. format suite numbers in a particular way), it doesn't quite get the message.

I wonder if making this two steps would improve the process? It's cheap enough that adding the extra prompts would probably not increase cost much, particularly with token caching for the same prompts.

  1. extract address blocks from website text
  2. extract structured address data from address blocks

Bonus: could we train a model that is particularly good at extracting address structured data from address blocks? No clue if we could do this with hosted tools or if we'd have to train something locally.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions