Ivan/tutorial cod #193

Merged
merged 21 commits into from
Nov 20, 2023

ivanleomk
Collaborator

@ivanleomk ivanleomk commented Nov 19, 2023

Summary by CodeRabbit

  • Documentation

    • Added guidance on accessing the original response in code documentation.
    • Introduced a new tutorial on structured data handling with Pydantic and OpenAI.
    • Expanded tutorials with practical examples of schema engineering and prompt crafting.
    • Detailed the use of Retrieval Augmented Generation (RAG) models and structured output.
    • Updated knowledge graph creation and visualization tutorial with new methods and functions.
    • Enhanced validation techniques tutorial with deterministic and probabilistic approaches.
    • Updated the README with installation instructions and troubleshooting tips.
  • New Features

    • Integrated the instructor library for improved data structure and validation in tutorials.
  • Refactor

    • Improved existing Jupyter notebooks with additional content and practical examples.
  • Chores

    • Updated project dependencies to include new libraries and packages.

Contributor

coderabbitai bot commented Nov 19, 2023

## Walkthrough
The updates across various documents and notebooks introduce a comprehensive guide to using the `instructor` library alongside Pydantic and OpenAI for structured data handling, validation, and knowledge graph creation. The changes include examples of accessing raw responses, defining data schemas, integrating OpenAI's API for data generation, and leveraging language models for content validation. Additionally, the tutorials now cover the creation and visualization of knowledge graphs, and the README provides detailed setup instructions.

## Changes

| File Path | Change Summary |
|-----------|----------------|
| `docs/index.md` | Added note and example for accessing the original response using `_raw_response`. |
| `tutorials/1...ipynb`, `tutorials/2...ipynb`, `tutorials/3...ipynb`, `tutorials/4...ipynb`, `tutorials/5...ipynb` | Introduced `instructor` library usage, Pydantic schema definitions, OpenAI API integration for JSON data, knowledge graph creation and visualization, and validation techniques. |
| `tutorials/README.md` | Updated with an introduction to the Instructor library, installation instructions, and Graphviz troubleshooting. |
| `tutorials/article.txt` | Added a news article about celebrities advocating for body positivity. |
| `tutorials/requirements.txt` | Included new dependencies: `ipykernel`, `jupyter`, `instructor`, `openai>=1.1.0`, `pydantic`, `graphviz`, `spacy`, `nltk`. |

> 🐇 In the season of fall, where leaves may drift,  
> We've woven code and docs with a scholarly lift.  
> With Pydantic and AI, our knowledge graphs bloom,  
> As `instructor` guides us, through data's vast room. 🍂📚

Tips

Chat with CodeRabbit Bot (@coderabbitai)

  • If you reply to a review comment from CodeRabbit, the bot will automatically respond.
  • To engage with the CodeRabbit bot directly about specific lines of code in the PR, mention @coderabbitai in your review comment.
  • Note: Review comments are made on code diffs or files, not on the PR overview.
  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger a review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai help to get help.

Note: For conversation with the bot, please use the review comments on code diffs or files.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • The JSON schema for the configuration file is available here.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/coderabbit-overrides.v2.json

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 3

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between 1144f60 and fd829d2.
Files selected for processing (9)
  • docs/index.md (1 hunks)
  • tutorials/1.introduction.ipynb (1 hunks)
  • tutorials/2.tips.ipynb (1 hunks)
  • tutorials/3.applications-rag.ipynb (1 hunks)
  • tutorials/4.knowledge-graphs.ipynb (1 hunks)
  • tutorials/5.validation.ipynb (1 hunks)
  • tutorials/README.md (1 hunks)
  • tutorials/article.txt (1 hunks)
  • tutorials/requirements.txt (1 hunks)
Files skipped from review due to trivial changes (3)
  • tutorials/2.tips.ipynb
  • tutorials/README.md
  • tutorials/article.txt
Additional comments: 28
docs/index.md (1)
  • 68-86: The addition of the note on accessing the original response using the _raw_response attribute is helpful for users who need to access metadata or other information not captured by the Pydantic model. The example provided is clear and demonstrates how to perform an assertion to check the type of the raw response. This is a valuable piece of documentation for users who may need to debug or extend the functionality of their applications.
tutorials/requirements.txt (1)
  • 1-8: The addition of these dependencies aligns with the pull request's summary, which mentions the use of Pydantic for data validation, OpenAI's API for structured data handling, and Graphviz for visualizations. It's important to ensure that the versions of these libraries are compatible with each other and with the rest of the project's dependencies. Additionally, the inclusion of openai>=1.1.0 suggests that there are features or fixes in version 1.1.0 or later that are necessary for the tutorials, so it's good to verify that this version or newer is indeed available and stable.
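
A minimal sketch of how the `openai>=1.1.0` requirement could be checked in a local environment, using only the standard library. The helper names `parse_version`, `meets_minimum`, and `check_installed` are hypothetical and not part of the PR, and the parser only understands plain dotted numeric versions (anything from the first non-numeric component onward is ignored), so treat it as a rough check rather than a full version comparison.

```python
from importlib import metadata


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '1.12.0' into (1, 12, 0); stop at the first non-numeric part."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted versions numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(package: str, minimum: str) -> str:
    """Report whether `package` is installed at `minimum` or newer."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return f"{package} is not installed"
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    return f"{package} {installed}: {status} (need >= {minimum})"


if __name__ == "__main__":
    # The minimum comes from tutorials/requirements.txt as summarized above.
    print(check_installed("openai", "1.1.0"))
```

Comparing integer tuples rather than raw strings matters here: as strings, "1.10.0" sorts before "1.9.0", which would give the wrong answer.
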
tutorials/1.introduction.ipynb (14)
  • 1-13: The introduction provides a good context for the problem that the notebook aims to solve. It's clear and sets the stage for the use of Pydantic and OpenAI's API.

  • 20-22: The explanation of the problem with raw JSON handling is clear and sets up the rationale for using Pydantic well.

  • 26-34: This code block demonstrates the initial problem with raw JSON data. It's a good example that shows potential inconsistencies in data types.

  • 45-75: This code block illustrates the type of errors that can occur when working with raw JSON data. It's a practical demonstration of the issues described earlier.

  • 95-119: Here, the Person class is defined using Pydantic, which is a good example of how to create a simple data model with type annotations.

  • 123-142: This code block demonstrates how Pydantic can cast data to the correct types, which is a key feature for data validation.

  • 166-187: This code block demonstrates Pydantic's validation error messages, which are helpful for debugging and understanding validation issues.

  • 204-232: The integration with OpenAI's API is shown here, demonstrating how to extract structured data from a model's response.

  • 236-259: Another example of using OpenAI's API to extract structured data, this time with a slightly different input.

  • 270-313: This code block introduces a more complex schema with an additional birthday field and demonstrates how to handle temporal context in data extraction.

  • 331-370: The function calling feature is explained and demonstrated here. It's a powerful concept that allows for more precise control over the output of language models.

  • 380-402: The use of Pydantic to define and document schemas is shown, which is a valuable feature for maintaining clear and understandable codebases.

  • 413-454: This code block demonstrates how to define nested schemas and add documentation, which is useful for complex data structures.

  • 466-499: The instructor library is introduced here, showing how it can be used to enhance the OpenAI SDK with additional functionality.
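
The raw JSON problem mentioned in the first few comments can be illustrated with a small standard-library sketch. The payloads and the `next_year_age` helper are made up for illustration and are not taken from the notebook; the point is only that the same field can arrive with different types and fail late.

```python
import json

# Two hypothetical payloads describing the same kind of record.
payloads = [
    '{"name": "Sam", "age": 30}',
    '{"name": "Alex", "age": "25"}',  # age arrives as a string
]

people = [json.loads(p) for p in payloads]


def next_year_age(person: dict) -> int:
    """Assumes `age` is an int, which raw JSON does not guarantee."""
    return person["age"] + 1


results = []
for person in people:
    try:
        results.append(next_year_age(person))
    except TypeError as exc:
        # The string-typed age only fails here, far from where it was parsed.
        results.append(f"TypeError: {exc}")

if __name__ == "__main__":
    for person, result in zip(people, results):
        print(person["name"], "->", result)
```
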

Overall, the notebook is well-structured and provides a comprehensive introduction to using Pydantic with OpenAI's API for structured data handling. It also effectively introduces the instructor library and its benefits. The examples are practical and clearly demonstrate the concepts being taught.
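
As a rough illustration of the casting and validation behavior the review describes, the sketch below defines a small `Person` model. It assumes Pydantic v2 is installed; the field names and sample values are illustrative and may differ from the notebook's own cells.

```python
from pydantic import BaseModel, ValidationError


class Person(BaseModel):
    name: str
    age: int


# A numeric string is coerced to int in Pydantic's default (lax) mode.
person = Person.model_validate({"name": "Sam", "age": "30"})

# A value that cannot be interpreted as an int is rejected, and the
# error records which field failed.
try:
    Person.model_validate({"name": "Alex", "age": "thirty"})
    error_fields = []
except ValidationError as exc:
    error_fields = [err["loc"] for err in exc.errors()]

if __name__ == "__main__":
    print(person)
    print("fields that failed validation:", error_fields)
```

Unlike raw dictionary access, the failure surfaces at the point where the data enters the program, with the offending field named in the error.
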

tutorials/5.validation.ipynb (12)
  • 1-22: The introduction provides a clear explanation of the purpose and types of validation, setting the stage for the rest of the notebook. It's important to ensure that the links provided (like the one to Pydantic validators) are up-to-date and accessible.

  • 40-43: The example validation function is simple and demonstrates the basic structure of a validator. It's good practice to include such examples for clarity.

  • 51-57: The validation process is well-explained, breaking down the steps involved in validation. This helps in understanding the flow of validation in the upcoming sections.

  • 65-78: The applications of validators are well-chosen, covering both simple and complex scenarios. This showcases the versatility of the validation framework being discussed.

  • 112-115: The code snippet demonstrates how to patch the OpenAI client with the instructor library. It's crucial that the instructor library is well-documented and that its integration with the OpenAI client is seamless.

  • 165-171: The blacklist is a simple set of strings. It's important to note that in a real-world application, this list might need to be more comprehensive and possibly stored externally in a database or a configuration file for easier management.

  • 200-216: The use of Pydantic's field_validator is a good example of implementing rule-based validation. The error handling is done correctly by raising a ValueError when a blacklisted word is found.

  • 353-361: Setting a maximum word count is a good example of a simple rule-based validator. However, it's important to consider the context in which this validator is used, as some applications might require longer responses.

  • 425-439: The use of context in validation is a sophisticated feature. It's important to ensure that the context is always provided where necessary and that the error messages are clear and actionable.

  • 601-605: The Validation class is a good example of a custom response model for use with the instructor library. It's important to ensure that the error_message field is always populated with a useful message when is_valid is False.

  • 716-718: The integration of validation with the OpenAI API is well-explained. It's important to ensure that the max_retries feature is tested thoroughly to handle cases where the LLM might not be able to provide a valid response even after several attempts.

  • 776-785: The conclusion provides a good summary and actionable next steps for the reader. It's important to ensure that any external resources or tasks mentioned are well-documented and accessible.
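
The rule-based checks discussed above (a blacklist and a maximum word count) could be combined in one `field_validator`, roughly as follows. This is a sketch assuming Pydantic v2; the `Response` model, the blacklist entries, the word limit, and the `validation_messages` helper are hypothetical and not copied from the notebook.

```python
from pydantic import BaseModel, ValidationError, field_validator

BLACKLIST = {"rob", "steal"}  # hypothetical entries
MAX_WORDS = 10  # hypothetical limit


class Response(BaseModel):
    message: str

    @field_validator("message")
    @classmethod
    def check_message(cls, value: str) -> str:
        words = value.lower().split()
        # Rule 1: reject any blacklisted word.
        for word in words:
            if word in BLACKLIST:
                raise ValueError(f"`{word}` is not allowed in the message")
        # Rule 2: enforce the word limit.
        if len(words) > MAX_WORDS:
            raise ValueError(
                f"message has {len(words)} words, limit is {MAX_WORDS}"
            )
        return value


def validation_messages(text: str) -> list[str]:
    """Return the validation error messages for `text`, or [] if it is valid."""
    try:
        Response(message=text)
    except ValidationError as exc:
        return [err["msg"] for err in exc.errors()]
    return []


if __name__ == "__main__":
    print(validation_messages("hello there"))
    print(validation_messages("how to rob a bank"))
```

Raising `ValueError` inside the validator is what lets Pydantic collect the message into a `ValidationError`, which is the behavior the review comment on `field_validator` refers to.
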

Overall, the notebook is well-structured and provides a comprehensive guide to validation with LLMs. The use of the instructor library appears to simplify the integration and management of validation logic, which could be beneficial for developers working with OpenAI's API. It's important to ensure that all external libraries and APIs used in the notebook are maintained and compatible with the code examples provided.
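
The concern about `max_retries` can be made concrete with a plain-Python sketch of a validate-and-retry loop. This is not the `instructor` implementation; it only illustrates the general idea of re-asking with the validation error attached, and of failing loudly once the retry budget is spent. `make_fake_llm`, `validate`, and `ask_with_retries` are hypothetical names, and the canned replies stand in for real model calls.

```python
from typing import Callable, Iterable


def validate(answer: str) -> None:
    """Hypothetical rule: answers must be at most five words long."""
    if len(answer.split()) > 5:
        raise ValueError("answer must be at most 5 words")


def ask_with_retries(llm: Callable[[str], str], prompt: str, max_retries: int) -> str:
    """Call `llm`, re-asking with the error message until the answer validates."""
    last_error = None
    for _ in range(max_retries + 1):  # first attempt plus `max_retries` retries
        answer = llm(prompt)
        try:
            validate(answer)
            return answer
        except ValueError as exc:
            last_error = exc
            # Feed the failure back so the next attempt can correct it.
            prompt = f"{prompt}\nPrevious answer was rejected: {exc}. Try again."
    raise RuntimeError(f"no valid answer after {max_retries} retries") from last_error


def make_fake_llm(replies: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a model call: returns canned replies in order."""
    it = iter(replies)
    return lambda prompt: next(it)


if __name__ == "__main__":
    llm = make_fake_llm(["this answer is far too long to pass", "short answer"])
    print(ask_with_retries(llm, "Summarize the article.", max_retries=1))
```

A test for the exhausted-retries path would pass a stub that never produces a valid answer and assert that the final exception is raised, which is the case the review asks to be covered.
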

tutorials/1.introduction.ipynb (review thread: outdated, resolved)
tutorials/5.validation.ipynb (review thread: outdated, resolved)
tutorials/5.validation.ipynb (review thread: outdated, resolved)
@ivanleomk
Collaborator Author

Originally forked off the tutorials branch, hence the long chain of commits.

tutorials/article.txt (review thread: outdated, resolved)
Collaborator

@jxnl jxnl left a comment

  1. Let's move to gpt-4-1106-preview so it's a bit faster.

Contributor

@coderabbitai coderabbitai bot left a comment

Review Status

Actionable comments generated: 0

Configuration used: CodeRabbit UI

Commits: files that changed from the base of the PR and between fd829d2 and 54536f6.
Files selected for processing (1)
  • tutorials/assets/article.txt (1 hunks)
Files skipped from review due to trivial changes (1)
  • tutorials/assets/article.txt

@ivanleomk ivanleomk merged commit 4e4cc52 into main Nov 20, 2023
5 of 7 checks passed
@ivanleomk ivanleomk deleted the ivan/tutorial-cod branch November 20, 2023 14:22