---
title: "Structuring Enums for Flawless LLM results with Instructor"
image: images/blog-20240706/header.png
author: Wayde Gilliam
date: "2024-07-06"
description: Enums enhance code readability and maintainability by replacing hard-coded constants with
  meaningful names and restricting variables to predefined values that can be used across all tiers of your application.
  This reduces bugs and improves code clarity, making it easier to refactor and understand. However, it can be frustrating
  to get LLMs and libraries like Instructor to use them correctly when dealing with structured output.

categories:
  - LLMs
  - pydantic
  - Instructor

toc: true
hide: false
search: true

output-file: 2024-07-06-llms-and-enums.html
---


## Instructor Best Practices and Cautions

<div>

I've been spending some time with <a href="https://x.com/jxnlco" target="_blank">Jason Liu</a>'s <a href="https://useinstructor.com/" target="_blank">Instructor library</a> to build a function-calling solution that returns structured output because, well, Hamel recommends it for proprietary models.

> <img src="images/blog-20240603/hamel-icon.png" alt="Hamel" style="border-radius:30px;width:auto;height:25px;"> For open models you should use outlines. for closed models APIs you should use instructor.

The library is intuitive, fun to use, and has some really nice documentation. When it comes to choosing whether to use enums or literals in your pydantic classes, <a href="https://python.useinstructor.com/tutorials/2-tips/?h=enums" target="_blank">the docs recommend the following</a>:

> For classification we've found theres generally two methods of modeling.
>
> 1. using Enums
> 2. using Literals
>
> Use an enum in Python when you need a set of named constants that are related and you want to ensure type safety, readability, and prevent
> invalid values. Enums are helpful for grouping and iterating over these constants.
>
> Use literals when you have a small, unchanging set of values that you don't need to group or iterate over, and when type safety and
> preventing invalid values is less of a concern. Literals are simpler and more direct for basic, one-off values.

... and they also seem to indicate that <a href="https://python.useinstructor.com/concepts/prompting/?h=enums#tips-for-enumerations" target="_blank">getting them to work as expected might be challenging</a> ...

> If you're having a hard time with Enum an alternative is to use Literal

I found this out first-hand when I was attempting to define an enum for a number of named entities I wanted an LLM to identify in a given document. My initial code worked pretty nicely with GPT-4o but failed miserably time and time again with every Anthropic model I tried (I'll explain why below). If you're looking for the TL;DR, the final version of my code at the end of this post represents a substantially more resilient solution that works across vendors (I also tested this with <a href="https://fireworks.ai/" target="_blank">Fireworks</a>), offering a better guarantee that your LLM calls correctly find the entities you care about.

</div>


## v0: Using `Enum`

These are the initial `Enum` and pydantic classes I started with. They work pretty damn well with OpenAI's GPT-4o but fail spectacularly with any of the Anthropic models.


When using the Anthropic models, I would consistently see it trying to set `entity_group` to a string rather than a proper enum value from the `EntityGroup` enum.
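As a rough sketch of the kind of setup described here (not the original listing), assuming pydantic v2: only the names `EntityGroup` and `entity_group` come from this post, while the enum members, the `Entity` class, the `text` field, and the `is_valid` helper are made up for illustration. It shows one way a "string rather than a proper enum value" can trip validation: pydantic only accepts a string that exactly equals one of the enum's values.

```python
from enum import Enum

from pydantic import BaseModel, Field, ValidationError


class EntityGroup(str, Enum):
    """Hypothetical entity groups; the original members are not shown in this post."""

    PERSON = "PERSON"
    ORGANIZATION = "ORGANIZATION"
    LOCATION = "LOCATION"


class Entity(BaseModel):
    """A single named entity (every name except `entity_group` is assumed)."""

    entity_group: EntityGroup = Field(description="The group the entity belongs to.")
    text: str = Field(description="The entity text as it appears in the document.")


def is_valid(payload: dict) -> bool:
    """Return True if pydantic accepts the payload, False if validation fails."""
    try:
        Entity.model_validate(payload)
        return True
    except ValidationError:
        return False


# A string that exactly matches an enum value is coerced to the enum member.
print(is_valid({"entity_group": "PERSON", "text": "Ada Lovelace"}))
# Any other string (different casing, a synonym, ...) is rejected.
print(is_valid({"entity_group": "Person", "text": "Ada Lovelace"}))
```

Whether this is exactly what the Anthropic responses looked like depends on the model and prompt, so treat it as an illustration of the validation behavior rather than a reproduction of the failure.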

After iterating through a number of prompt and class/field description modifications, I decided to give up and replace my `Enum` with a `Literal`. And guess what, everything worked great across all model vendors.

I also decided to look up the named entity labels used in spaCy and use those names in my `Enum`. It makes sense to me that these libraries might have been included in the training data of these LLMs, so using their labels may help the model do a better job of finding the entities I care about.


## v1: Using `Literal`

Using the `Literal` type fixed everything and works great across all models! Here's what it looks like:
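A minimal sketch of what a `Literal`-based version might look like, assuming pydantic v2. The labels are a subset of spaCy's named entity labels; the class and field names (`NamedEntity`, `DocumentNERTask`, `entity_type`, `entity_mention`) are placeholders for illustration and may differ from the code I actually ran.

```python
from typing import Literal

from pydantic import BaseModel, Field

# A subset of spaCy's entity labels, expressed as a Literal type.
NamedEntityLabel = Literal[
    "PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT", "DATE", "MONEY"
]


class NamedEntity(BaseModel):
    """A named entity result (field names are assumptions)."""

    entity_type: NamedEntityLabel = Field(description="The type of the named entity.")
    entity_mention: str = Field(description="The named entity recognized.")


class DocumentNERTask(BaseModel):
    """Container for all named entities found in a document."""

    named_entities: list[NamedEntity] = Field(default_factory=list)


# Simulate validating the structured output an LLM might return.
result = DocumentNERTask.model_validate(
    {"named_entities": [{"entity_type": "ORG", "entity_mention": "Fireworks"}]}
)
print(result.named_entities[0].entity_type)
```

With Instructor, a model like `DocumentNERTask` is what you would pass as `response_model` to the patched client's create call.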


This works great ... but I really wanted to use an `Enum` for the reasons listed at the top of this post. And as I'm the kinda guy who enjoys fighting with CUDA installs on his local DL rig, I decided to give it a go after taking a few hours off to enjoy the Euros and Copa America tourneys (also Germany should have won; that was a handball but nah, I'm not angry, nope, not bent at all).


## v2: Using `Enum` Revisited

Here's the TL;DR version of the code. This version is working fabulously across all APIs and I have yet to encounter a single exception involving Instructor being unable to assign a valid value from the `Enum`.


Besides the return of the `Enum`, the most noticeable change is the inclusion of a `BeforeValidator` that ensures the value is coerced to a valid member of the `NamedEntityType` enum. In cases where the LLM wants to add an entity to the list of `named_entities` whose type isn't defined in `NamedEntityType` or is named differently (e.g., "ORGANIZATION" vs. "ORG"), the validator assigns it to `OTHER`.
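The pattern described above can be sketched roughly as follows, assuming pydantic v2. `NamedEntityType`, `named_entities`, and `OTHER` are from this post; the enum members shown are a small subset chosen for brevity, and `to_named_entity_type`, `entity_type`, `entity_mention`, and `DocumentNERTask` are illustrative names.

```python
from enum import Enum
from typing import Annotated

from pydantic import BaseModel, BeforeValidator, Field


class NamedEntityType(str, Enum):
    """Valid entity types (a subset of spaCy's labels plus a catch-all)."""

    PERSON = "PERSON"
    ORG = "ORG"
    GPE = "GPE"
    LOC = "LOC"
    DATE = "DATE"
    OTHER = "OTHER"


def to_named_entity_type(value: object) -> NamedEntityType:
    """Coerce raw LLM output to an enum member, falling back to OTHER."""
    if isinstance(value, NamedEntityType):
        return value
    try:
        return NamedEntityType(str(value).strip().upper())
    except ValueError:
        return NamedEntityType.OTHER


class NamedEntity(BaseModel):
    """A named entity result (field names are assumptions)."""

    # The BeforeValidator runs on the raw value before pydantic's own enum check.
    entity_type: Annotated[NamedEntityType, BeforeValidator(to_named_entity_type)]
    entity_mention: str = Field(description="The named entity recognized.")


class DocumentNERTask(BaseModel):
    """Container for all named entities found in a document."""

    named_entities: list[NamedEntity] = Field(default_factory=list)


task = DocumentNERTask.model_validate(
    {
        "named_entities": [
            {"entity_type": "org", "entity_mention": "Acme"},
            {"entity_type": "ORGANIZATION", "entity_mention": "Initech"},
        ]
    }
)
for entity in task.named_entities:
    print(entity.entity_mention, entity.entity_type.name)
```

Because the fallback happens before validation, an unexpected label never raises; it just lands in `OTHER`, which is the trade-off this approach makes.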

With this in place, I now have a solution that is:

1. More resilient

2. Useful for debugging named entity recognition (e.g., I can explore which named entities might be missing from the `Enum` or are being named differently by looking at those that get associated with the `OTHER` value)

3. Reusable, since I can use that same beautiful `Enum` across all parts of my application
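The debugging idea in the second point could be sketched like this, using only the standard library. The module-level `unmatched_labels` counter and the `to_named_entity_type` helper are hypothetical additions for illustration; the idea is simply to record the raw label before it gets mapped to `OTHER`.

```python
from collections import Counter
from enum import Enum


class NamedEntityType(str, Enum):
    """A small subset of entity types plus the catch-all."""

    PERSON = "PERSON"
    ORG = "ORG"
    GPE = "GPE"
    OTHER = "OTHER"


# Raw labels the LLM produced that did not match any enum value.
unmatched_labels = Counter()


def to_named_entity_type(value: object) -> NamedEntityType:
    """Coerce to an enum member, recording anything that falls back to OTHER."""
    if isinstance(value, NamedEntityType):
        return value
    label = str(value).strip().upper()
    try:
        return NamedEntityType(label)
    except ValueError:
        unmatched_labels[label] += 1
        return NamedEntityType.OTHER


# Simulated labels from a few LLM responses.
for raw in ["ORG", "ORGANIZATION", "Organization", "COMPANY"]:
    to_named_entity_type(raw)

print(unmatched_labels.most_common())
# [('ORGANIZATION', 2), ('COMPANY', 1)]
```

A label that shows up often in this counter is a hint that either the `Enum` is missing a member or the prompt needs to spell out the expected names.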


## v2.0.1: Using `Enum` and `fuzzywuzzy`

A suggestion from a Twitter user inspired me to enhance our approach by implementing similarity-based matching rather than relying on exact matches. To make it so, I installed the `fuzzywuzzy` library and made the necessary modifications to increase the likelihood of delivering high-quality results.


This improves those cases where, for example, the LLM wants to define the entity type as "ORGANIZATION" but it is defined in the `Enum` as "ORG".

Another option potentially worth exploring is to use the `llm_validator` function to make a call out to the LLM when exceptions happen and prompt it to coerce the value into something in the `Enum`. This could hike up your costs a bit, but I imagine a cheap model like GPT-3.5-Turbo could do the job just fine and would likely give you some additional robustness in the quality of your results.


## Conclusion

That's it.

If you found this helpful and/or have suggestions on how to improve the use of `Enum`s in Instructor, lmk in the comments below or on <a href="https://x.com/waydegilliam" target="_blank">X</a>. Until then, time to enjoy some football and see if Brazil can make it into the semis.
