Ivan/tutorial cod #193

ivanleomk · 2023-11-19T14:29:20Z

Summary by CodeRabbit

Documentation

- Added guidance on accessing the original response in code documentation.
- Introduced a new tutorial on structured data handling with Pydantic and OpenAI.
- Expanded tutorials with practical examples of schema engineering and prompt crafting.
- Detailed the use of Retrieval Augmented Generation (RAG) models and structured output.
- Updated knowledge graph creation and visualization tutorial with new methods and functions.
- Enhanced validation techniques tutorial with deterministic and probabilistic approaches.
- Updated the README with installation instructions and troubleshooting tips.

New Features

- Integrated the instructor library for improved data structure and validation in tutorials.

Refactor

- Improved existing Jupyter notebooks with additional content and practical examples.

Chores

- Updated project dependencies to include new libraries and packages.

Co-authored-by: Jason Liu <jxnl@users.noreply.github.com>

coderabbitai · 2023-11-19T14:29:26Z

## Walkthrough
The updates across various documents and notebooks introduce a comprehensive guide to using the `instructor` library alongside Pydantic and OpenAI for structured data handling, validation, and knowledge graph creation. The changes include examples of accessing raw responses, defining data schemas, integrating OpenAI's API for data generation, and leveraging language models for content validation. Additionally, the tutorials now cover the creation and visualization of knowledge graphs, and the README provides detailed setup instructions.

## Changes

| File Path | Change Summary |
|-----------|----------------|
| `docs/index.md` | Added note and example for accessing the original response using `_raw_response`. |
| `tutorials/1...ipynb`, `tutorials/2...ipynb`, `tutorials/3...ipynb`, `tutorials/4...ipynb`, `tutorials/5...ipynb` | Introduced `instructor` library usage, Pydantic schema definitions, OpenAI API integration for JSON data, knowledge graph creation and visualization, and validation techniques. |
| `tutorials/README.md` | Updated with an introduction to the Instructor library, installation instructions, and Graphviz troubleshooting. |
| `tutorials/article.txt` | Added a news article about celebrities advocating for body positivity. |
| `tutorials/requirements.txt` | Included new dependencies: `ipykernel`, `jupyter`, `instructor`, `openai>=1.1.0`, `pydantic`, `graphviz`, `spacy`, `nltk`. |

> 🐇 In the season of fall, where leaves may drift,  
> We've woven code and docs with a scholarly lift.  
> With Pydantic and AI, our knowledge graphs bloom,  
> As `instructor` guides us, through data's vast room. 🍂📚

Tips

Chat with CodeRabbit Bot (`@coderabbitai`)

- If you reply to a review comment from CodeRabbit, the bot will automatically respond.
- To engage with the CodeRabbit bot directly around specific lines of code in the PR, mention `@coderabbitai` in your review comment.
- Note: Review comments are made on code diffs or files, not on the PR overview.
- Add `@coderabbitai ignore` anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

- `@coderabbitai pause` to pause the reviews on a PR.
- `@coderabbitai resume` to resume the paused reviews.
- `@coderabbitai review` to trigger a review. This is useful when automatic reviews are disabled for the repository.
- `@coderabbitai resolve` to resolve all the CodeRabbit review comments.
- `@coderabbitai help` to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configuration File (`.coderabbit.yaml`)

- You can programmatically configure CodeRabbit by adding a `.coderabbit.yaml` file to the root of your repository.
- The JSON schema for the configuration file is available here.
- If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: `# yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json`

coderabbitai

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 1144f60 and fd829d2.

Files selected for processing (9)

docs/index.md (1 hunks)
tutorials/1.introduction.ipynb (1 hunks)
tutorials/2.tips.ipynb (1 hunks)
tutorials/3.applications-rag.ipynb (1 hunks)
tutorials/4.knowledge-graphs.ipynb (1 hunks)
tutorials/5.validation.ipynb (1 hunks)
tutorials/README.md (1 hunks)
tutorials/article.txt (1 hunks)
tutorials/requirements.txt (1 hunks)

Files skipped from review due to trivial changes (3)

tutorials/2.tips.ipynb
tutorials/README.md
tutorials/article.txt

Additional comments: 28

docs/index.md (1)

68-86: The addition of the note on accessing the original response using the _raw_response attribute is helpful for users who need to access metadata or other information not captured by the Pydantic model. The example provided is clear and demonstrates how to perform an assertion to check the type of the raw response. This is a valuable piece of documentation for users who may need to debug or extend the functionality of their applications.

tutorials/requirements.txt (1)

1-8: The addition of these dependencies aligns with the pull request's summary, which mentions the use of Pydantic for data validation, OpenAI's API for structured data handling, and Graphviz for visualizations. It's important to ensure that the versions of these libraries are compatible with each other and with the rest of the project's dependencies. Additionally, the inclusion of openai>=1.1.0 suggests that there are features or fixes in version 1.1.0 or later that are necessary for the tutorials, so it's good to verify that this version or newer is indeed available and stable.

tutorials/1.introduction.ipynb (14)

1-13: The introduction provides a good context for the problem that the notebook aims to solve. It's clear and sets the stage for the use of Pydantic and OpenAI's API.

20-22: The explanation of the problem with raw JSON handling is clear and sets up the rationale for using Pydantic well.

26-34: This code block demonstrates the initial problem with raw JSON data. It's a good example that shows potential inconsistencies in data types.

45-75: This code block illustrates the type of errors that can occur when working with raw JSON data. It's a practical demonstration of the issues described earlier.
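For readers without the notebook open, the failure mode described in the two comments above can be sketched with the standard library alone. This is an illustration under assumed payloads (the names and values are hypothetical), not the notebook's actual cell:

```python
import json

# Two hypothetical payloads describing the same kind of record.
# The second encodes `age` as a string, which raw JSON does nothing to prevent.
payloads = [
    '{"name": "Sam", "age": 30}',
    '{"name": "Alex", "age": "30"}',
]


def age_next_year(raw: str) -> int:
    """Parse a payload and add one to the age, with no validation at all."""
    record = json.loads(raw)
    return record["age"] + 1


results = []
for raw in payloads:
    try:
        results.append(age_next_year(raw))
    except TypeError as exc:
        # str + int only fails at the point of use, far from where the data was produced.
        results.append(f"TypeError: {exc}")

print(results)
```

The first payload works and the second fails only when the value is used, which is the inconsistency the notebook uses to motivate a typed model.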

95-119: Here, the Person class is defined using Pydantic, which is a good example of how to create a simple data model with type annotations.

123-142: This code block demonstrates how Pydantic can cast data to the correct types, which is a key feature for data validation.

166-187: This code block demonstrates Pydantic's validation error messages, which are helpful for debugging and understanding validation issues.
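A minimal sketch of the casting and error behavior these comments refer to, assuming a `Person` model with `name` and `age` fields (the exact fields in the notebook may differ) and Pydantic's default, non-strict mode:

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    # Assumed fields, chosen for illustration only.
    name: str
    age: int


# A numeric string is coerced to int under the default (lax) mode.
person = Person(name="Sam", age="30")
print(type(person.age).__name__, person.age)

# A value that cannot be coerced raises ValidationError, which names the failing field.
failed_fields = []
try:
    Person(name="Sam", age="thirty")
except ValidationError as exc:
    failed_fields = [error["loc"] for error in exc.errors()]

print(failed_fields)
```

Each entry in `exc.errors()` carries the field location, an error type, and a message, which is what makes the failure easy to trace.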

204-232: The integration with OpenAI's API is shown here, demonstrating how to extract structured data from a model's response.

236-259: Another example of using OpenAI's API to extract structured data, this time with a slightly different input.

270-313: This code block introduces a more complex schema with an additional birthday field and demonstrates how to handle temporal context in data extraction.

331-370: The function calling feature is explained and demonstrated here. It's a powerful concept that allows for more precise control over the output of language models.

380-402: The use of Pydantic to define and document schemas is shown, which is a valuable feature for maintaining clear and understandable codebases.

413-454: This code block demonstrates how to define nested schemas and add documentation, which is useful for complex data structures.
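As a rough illustration of nested, documented schemas, the sketch below assumes Pydantic v2 and two hypothetical models, `Address` and `PersonWithAddresses`. It shows that docstrings and `Field` descriptions end up in the generated JSON schema:

```python
from typing import List

from pydantic import BaseModel, Field


class Address(BaseModel):
    """A postal address mentioned in the text."""

    street: str = Field(description="Street name and number")
    city: str = Field(description="City name")


class PersonWithAddresses(BaseModel):
    """A person and every address associated with them."""

    name: str = Field(description="Full name of the person")
    addresses: List[Address] = Field(description="All known addresses")


# The nested model is emitted under "$defs" and referenced from the parent.
schema = PersonWithAddresses.model_json_schema()
print(schema["description"])
print(schema["properties"]["name"]["description"])
print(sorted(schema["$defs"]))
```

Because the descriptions travel with the schema, they can double as prompt text when the schema is passed to a language model.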

466-499: The instructor library is introduced here, showing how it can be used to enhance the OpenAI SDK with additional functionality.

Overall, the notebook is well-structured and provides a comprehensive introduction to using Pydantic with OpenAI's API for structured data handling. It also effectively introduces the instructor library and its benefits. The examples are practical and clearly demonstrate the concepts being taught.

tutorials/5.validation.ipynb (12)

1-22: The introduction provides a clear explanation of the purpose and types of validation, setting the stage for the rest of the notebook. It's important to ensure that the links provided (like the one to Pydantic validators) are up-to-date and accessible.

40-43: The example validation function is simple and demonstrates the basic structure of a validator. It's good practice to include such examples for clarity.

51-57: The validation process is well-explained, breaking down the steps involved in validation. This helps in understanding the flow of validation in the upcoming sections.

65-78: The applications of validators are well-chosen, covering both simple and complex scenarios. This showcases the versatility of the validation framework being discussed.

112-115: The code snippet demonstrates how to patch the OpenAI client with the instructor library. It's crucial that the instructor library is well-documented and that its integration with the OpenAI client is seamless.

165-171: The blacklist is a simple set of strings. It's important to note that in a real-world application, this list might need to be more comprehensive and possibly stored externally in a database or a configuration file for easier management.

200-216: The use of Pydantic's field_validator is a good example of implementing rule-based validation. The error handling is done correctly by raising a ValueError when a blacklisted word is found.
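The rule-based check described here could look roughly like the following. The blacklist contents, the `Response` model, and the punctuation handling are assumptions made for illustration, not the notebook's code:

```python
from pydantic import BaseModel, ValidationError, field_validator

# Hypothetical blacklist; a real one would likely live in config or a database.
BLACKLIST = {"rob", "steal"}


class Response(BaseModel):
    message: str

    @field_validator("message")
    @classmethod
    def reject_blacklisted_words(cls, value: str) -> str:
        # Normalize each word so "Steal!" and "steal" are treated the same.
        words = {word.strip(".,!?").lower() for word in value.split()}
        found = words & BLACKLIST
        if found:
            raise ValueError(f"blacklisted words found: {sorted(found)}")
        return value


ok = Response(message="Please return the book.")

try:
    Response(message="I will steal it!")
    rejected = False
except ValidationError:
    rejected = True

print(ok.message, rejected)
```

Raising `ValueError` inside the validator is enough: Pydantic wraps it in a `ValidationError` attached to the `message` field.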

353-361: Setting a maximum word count is a good example of a simple rule-based validator. However, it's important to consider the context in which this validator is used, as some applications might require longer responses.

425-439: The use of context in validation is a sophisticated feature. It's important to ensure that the context is always provided where necessary and that the error messages are clear and actionable.
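One way to address the concern about missing context is to fail loudly when it is absent. The sketch below assumes Pydantic v2's `ValidationInfo.context` and uses hypothetical names (`AnswerWithCitation`, `text_chunk`):

```python
from pydantic import BaseModel, ValidationError, ValidationInfo, field_validator


class AnswerWithCitation(BaseModel):
    answer: str
    citation: str

    @field_validator("citation")
    @classmethod
    def citation_must_appear_in_source(cls, value: str, info: ValidationInfo) -> str:
        context = info.context
        # Fail with a clear message instead of silently skipping the check.
        if not context or "text_chunk" not in context:
            raise ValueError("validation context with 'text_chunk' is required")
        if value not in context["text_chunk"]:
            raise ValueError(f"citation {value!r} not found in the source text")
        return value


def is_rejected(data, context=None) -> bool:
    """Return True when validation fails for the given data and context."""
    try:
        AnswerWithCitation.model_validate(data, context=context)
    except ValidationError:
        return True
    return False


source = {"text_chunk": "Blueberries are rich in antioxidants."}

accepted = AnswerWithCitation.model_validate(
    {"answer": "They contain antioxidants.", "citation": "rich in antioxidants"},
    context=source,
)
print(accepted.citation)
```

With this shape, a fabricated citation and a call that forgets to pass `context` are both rejected with an actionable message.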

601-605: The Validation class is a good example of a custom response model for use with the instructor library. It's important to ensure that the error_message field is always populated with a useful message when is_valid is False.
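The requirement that `error_message` be populated whenever `is_valid` is False can be enforced in the model itself. This is a sketch assuming Pydantic v2; the field descriptions and the validator are illustrative additions rather than the notebook's definition:

```python
from typing import Optional

from pydantic import BaseModel, Field, ValidationError, model_validator


class Validation(BaseModel):
    is_valid: bool = Field(description="Whether the content passed the check")
    error_message: Optional[str] = Field(
        default=None, description="Why the content failed, if it did"
    )

    @model_validator(mode="after")
    def require_message_when_invalid(self) -> "Validation":
        # An invalid verdict without an explanation is not actionable.
        if not self.is_valid and not (self.error_message or "").strip():
            raise ValueError("error_message is required when is_valid is False")
        return self


passed = Validation(is_valid=True)
failed = Validation(is_valid=False, error_message="Contains objectionable content")

try:
    Validation(is_valid=False)
    missing_message_rejected = False
except ValidationError:
    missing_message_rejected = True

print(passed.error_message, failed.error_message, missing_message_rejected)
```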

716-718: The integration of validation with the OpenAI API is well-explained. It's important to ensure that the max_retries feature is tested thoroughly to handle cases where the LLM might not be able to provide a valid response even after several attempts.
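To test the exhaustion case without calling an API, the retry idea can be modeled with a stub. The helper below is a hypothetical sketch of the general pattern, not instructor's implementation of `max_retries`; `generate_validated`, `fake_llm`, and `Answer` are names invented for this example:

```python
from typing import Callable, List, Optional, Type, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)


def generate_validated(
    generate: Callable[[Optional[str]], str],
    model: Type[T],
    max_retries: int = 2,
) -> T:
    """Call `generate` until its JSON output validates or the retries run out."""
    feedback: Optional[str] = None
    last_error: Optional[ValidationError] = None
    # One initial attempt plus up to `max_retries` retries.
    for _ in range(max_retries + 1):
        raw = generate(feedback)
        try:
            return model.model_validate_json(raw)
        except ValidationError as exc:
            last_error = exc
            # Pass the validation error back so the next attempt can correct it.
            feedback = str(exc)
    assert last_error is not None
    raise last_error


class Answer(BaseModel):
    text: str
    word_count: int


calls: List[Optional[str]] = []


def fake_llm(feedback: Optional[str]) -> str:
    """Stub standing in for a model call: wrong first, corrected after feedback."""
    calls.append(feedback)
    if feedback is None:
        return '{"text": "hi"}'
    return '{"text": "hi", "word_count": 1}'


answer = generate_validated(fake_llm, Answer, max_retries=2)
print(answer, len(calls))
```

A stub that never returns valid JSON exercises the other branch: after `max_retries + 1` attempts the last `ValidationError` is re-raised, which is the behavior worth covering in tests.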

776-785: The conclusion provides a good summary and actionable next steps for the reader. It's important to ensure that any external resources or tasks mentioned are well-documented and accessible.

Overall, the notebook is well-structured and provides a comprehensive guide to validation with LLMs. The use of the instructor library appears to simplify the integration and management of validation logic, which could be beneficial for developers working with OpenAI's API. It's important to ensure that all external libraries and APIs used in the notebook are maintained and compatible with the code examples provided.

ivanleomk · 2023-11-19T14:36:54Z

Originally forked off the tutorials branch hence the long chain

tutorials/article.txt

jxnl

let's move to gpt-4-1106-preview so it's a bit faster

coderabbitai

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between fd829d2 and 54536f6.

Files selected for processing (1)

tutorials/assets/article.txt (1 hunks)

Files skipped from review due to trivial changes (1)

tutorials/assets/article.txt

jxnl and others added 17 commits November 7, 2023 18:02

add tutorials

b4e2126

add docs

ea91e99

add rag

144c792

Merge branch 'main' into tutorials

df20cb4

small fixes to tutorials (#168)

1ec9114

Co-authored-by: Jason Liu <jxnl@users.noreply.github.com>

update tutorials

9905497

Merge branch 'main' into tutorials

d1372cf

add graph

075d6b9

update docs

e7b5992

first version validation tutorial (#180)

37d82ae

clean up kg

ea6de65

Tutorials creative acts in documentation (#191)

9774df5

Merge branch 'main' into tutorials

90d8a8c

Fixed up some import issues, README and a requirements.txt for users running the files

0928ce7

Removed extra spacing in the Applications.rag ipynb

8368f65

Tidied up Knowledge Graphs

e101af7

First draft of Chain Of Density

fd829d2

migrated spacy and nltk to requirements.txt

20c50f0

coderabbitai bot reviewed Nov 19, 2023

tutorials/1.introduction.ipynb (outdated, resolved)

tutorials/5.validation.ipynb (outdated, resolved)

tutorials/5.validation.ipynb (outdated, resolved)

Merge branch 'main' into ivan/tutorial-cod

41e258c

Merge branch 'main' into ivan/tutorial-cod

a1ca42c

jxnl reviewed Nov 19, 2023

tutorials/article.txt (outdated, resolved)

jxnl approved these changes Nov 19, 2023

Fixed up the renaming and migrated over to chain-of-density to be consistent with the other notebooks

54536f6

coderabbitai bot reviewed Nov 20, 2023

ivanleomk merged commit 4e4cc52 into main Nov 20, 2023
5 of 7 checks passed

ivanleomk deleted the ivan/tutorial-cod branch November 20, 2023 14:22
