Currently we ask the LLM to identify and extract address data in a single prompt. When you ask for something to be done in more detail (e.g. format suite numbers in a particular way), it doesn't quite get the message.
I wonder if splitting this into two steps would improve the results? It's cheap enough that the extra prompts probably wouldn't increase cost much, particularly with token caching for the repeated prompts.
- extract address blocks from website text
- extract structured address data from address blocks
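A minimal sketch of what the two-step pipeline might look like. `call_llm` is a hypothetical wrapper for whatever client we end up using (not an existing API), taking a system prompt and user text and returning the model's reply:

```python
from typing import Callable

def extract_addresses(
    page_text: str,
    call_llm: Callable[[str, str], str],
) -> list[str]:
    """Two-step pipeline: isolate address blocks, then structure each one.

    `call_llm` is a placeholder for our LLM client wrapper; it takes
    (system_prompt, user_text) and returns the model's reply as a string.
    """
    # Step 1: pull the address blocks out of the raw website text.
    blocks_reply = call_llm(
        "Extract every postal address block from the text, one per line.",
        page_text,
    )
    blocks = [line.strip() for line in blocks_reply.splitlines() if line.strip()]

    # Step 2: structure each block individually. A prompt focused on a
    # single block is where detailed formatting instructions (e.g. suite
    # numbers) should land more reliably.
    return [
        call_llm(
            "Return this address as JSON with keys: street, suite, city, "
            "state, postal_code. Format suite numbers as 'Suite N'.",
            block,
        )
        for block in blocks
    ]
```

The second prompt is identical for every block, so with prompt caching the marginal cost per block should mostly be the block's own tokens.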
Bonus: could we train a model that is particularly good at extracting structured address data from address blocks? No clue if we could do this with hosted fine-tuning tools or if we'd have to train something locally.