Draft Pydantic Intro #271

Closed
wants to merge 15 commits into from

Owner

@jxnl jxnl commented Dec 12, 2023

No description provided.

Contributor

coderabbitai bot commented Dec 12, 2023

Important

Auto Review Skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository.

To trigger a single review, invoke the @coderabbitai review command.

@sydney-runkle sydney-runkle left a comment

Nice first draft 👍


# Steering Large Language Models with Pydantic

In the past year, significant progress has been made in utilizing large language models. Prompt engineering, in particular, has gained attention, and new prompting techniques are being developed to guide language models toward specific tasks. While many are building chatbots, an even more exciting application is the generation of structured outputs, whether it's extracting structured data, augmenting your RAG application, or even generating

I'm guessing you meant to finish the "or even generating..." at the end here, but otherwise I like this as an intro paragraph. As someone who hasn't worked with LLMs a bunch, it could be helpful to add an additional sentence going into more detail about what "prompt engineering" is.

Owner Author

added!


@sydney-runkle sydney-runkle left a comment

Nice, I like the changes you've made!


In the past year, significant progress has been made in utilizing large language models. Prompt engineering, in particular, has gained attention, and new prompting techniques are being developed to guide language models toward specific tasks. While many are building chatbots, an even more exciting application is the generation of structured outputs, whether it's extracting structured data, augmenting your RAG application, or even generating synthetic data.

!!! question "What is Prompt Engineering?"

@dmontagu dmontagu Dec 14, 2023

I feel like most people have heard the term "Prompt Engineering" now so kind of are familiar with the concept, but if the goal with this block is really to clarify for people who aren't familiar, I personally would be inclined to try to phrase it in a less technical way. I read this and it almost sounds like some advanced machine learning technique, maybe even involving math, etc. lol. And the linked article also talks about it in highly technical language that I expect would be confusing for someone who assumes that AI/machine learning is "above their pay grade".

Like, I am assuming the audience of this article might find this intimidating (screenshot from the link below):
[screenshot not included]

Whereas I feel like my intuition for prompt engineering is closer to "Creatively rephrasing how you pose your question to the AI until it consistently gives responses you want". I understand how that might seem reductive, but I feel like it gives a better intuition and is more inviting to newcomers. And I think once you get that "prompt engineering" is something that's ultimately very intuitive (anyone who has used ChatGPT for more than a few questions has probably independently "discovered" the idea of prompt engineering) it's still easy to appreciate more sophisticated techniques, like using Pydantic models to validate outputs/guide inputs.

I guess best to say something like "You can think of it very technically like ..., or as just creatively rephrasing how you pose your question to the AI until it consistently gives responses you want".

jxnl and others added 2 commits December 14, 2023 11:38
Co-authored-by: David Montague <35119617+dmontagu@users.noreply.github.com>

@dmontagu dmontagu left a comment

Overall looks great.

These notes are mostly minor stylistic things; I'll share "bigger-picture" thoughts with you directly.

client = OpenAI()


class Package(BaseModel):

Suggested change
- class Package(BaseModel):
+ class PythonPackage(BaseModel):

and elsewhere below, or change where it says PythonPackage above to Package. But I think PythonPackage is clearer

)
```

Now, by using the `response_model` argument, inspired by `FastAPI`, you can specify the desired output, and `instructor` will take care of the rest!

Suggested change
- Now, by using the `response_model` argument, inspired by `FastAPI`, you can specify the desired output, and `instructor` will take care of the rest!
+ Now, by using the `response_model` argument (inspired by `FastAPI`) you can specify the desired output and `instructor` will take care of the rest!
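
For readers of this thread, here is a minimal sketch of what "specifying the desired output" with a Pydantic model amounts to. It deliberately does not call `instructor` or the OpenAI client: the JSON string is a hand-written stand-in for a model response, and the `name` and `author` fields are assumptions chosen for illustration, since the draft's actual fields are not shown here.

```python
from pydantic import BaseModel, ValidationError


class PythonPackage(BaseModel):
    # Hypothetical fields; the draft's real field list is not visible in this thread.
    name: str
    author: str


# Hand-written stand-in for the JSON a language model might return.
raw_response = '{"name": "pydantic", "author": "Samuel Colvin"}'

# Parse and validate the JSON against the model in one step.
package = PythonPackage.model_validate_json(raw_response)
print(package.name, "by", package.author)

# A response missing a required field is rejected rather than silently accepted.
try:
    PythonPackage.model_validate_json('{"name": "pydantic"}')
    rejected = False
except ValidationError:
    rejected = True
print("incomplete response rejected:", rejected)
```

As the draft describes it, passing `response_model=PythonPackage` is meant to let `instructor` handle this parse-and-validate step on your behalf.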


@samuelcolvin samuelcolvin left a comment

a few tiny things, but overall I think this looks great.


## Pydantic

Unlike libraries like `dataclasses`, `Pydantic` goes a step further and defines a schema for your dataclass. This schema is used to validate data, but also to generate documentation and even to generate a JSON schema, which is perfect for our use case of generating structured data with language models!

Technically, `dataclasses` isn't a separate library; it's part of the standard library.

I would say Pydantic goes further, in that it 1) enforces the types in your class and 2) can generate a JSON schema for your class.
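
The two points above can be sketched as follows. This is an illustrative comparison, not code from the draft: the class names and the `name`/`downloads` fields are made up for the example, and it assumes Pydantic v2 (`model_json_schema`).

```python
from dataclasses import dataclass

from pydantic import BaseModel, ValidationError


@dataclass
class PackageDataclass:
    name: str
    downloads: int


class PackageModel(BaseModel):
    name: str
    downloads: int


# A standard-library dataclass stores whatever it is given;
# the annotations are not checked at runtime.
loose = PackageDataclass(name="pydantic", downloads="not a number")
print(type(loose.downloads).__name__)

# 1) Pydantic enforces the annotated types and raises if validation fails.
try:
    PackageModel(name="pydantic", downloads="not a number")
    enforced = False
except ValidationError:
    enforced = True
print("type enforced:", enforced)

# 2) Pydantic can generate a JSON schema describing the class.
schema = PackageModel.model_json_schema()
print(sorted(schema["properties"]))
```

The generated schema is the piece that matters for the blog post's use case, since it is what can be handed to a language model to describe the expected structure.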

Co-authored-by: Samuel Colvin <s@muelcolvin.com>
@jxnl jxnl closed this Dec 25, 2023

4 participants