Blog: Mastering Caching #219
Conversation
Walkthrough

The recent updates introduce a series of blog posts and code examples focused on caching strategies in Python, particularly for functions returning Pydantic models. The posts cover in-memory caching with `functools`, on-disk caching with `diskcache`, and distributed caching with Redis.
Review Status
Actionable comments generated: 11
Configuration used: CodeRabbit UI
Files selected for processing (7)
- docs/blog/index.md (1 hunks)
- docs/blog/posts/caching.md (1 hunks)
- docs/blog/posts/citations.md (1 hunks)
- docs/blog/posts/rag-and-beyond.md (10 hunks)
- examples/caching/diskcache.py (1 hunks)
- examples/caching/lru.py (1 hunks)
- examples/caching/redis.py (1 hunks)
Files skipped from review due to trivial changes (3)
- docs/blog/index.md
- docs/blog/posts/citations.md
- docs/blog/posts/rag-and-beyond.md
Additional comments: 8
examples/caching/diskcache.py (7)
- 1-6: The imports are correctly added as per the summary.
- 8-8: The `client` variable is correctly initialized.
- 10-12: The `UserDetail` model is correctly defined with the appropriate fields.
- 14-14: The disk cache is correctly initialized at the specified directory.
- 16-38: The `instructor_cache` decorator is correctly implemented to cache the results of functions returning Pydantic models.
- 40-48: The `extract` function is correctly decorated with `instructor_cache` and uses the `client` as expected.
- 51-70: The `test_extract` function is correctly implemented to test the caching behavior and performance.

examples/caching/lru.py (1)
- 12-20: The `functools.lru_cache` decorator requires the function's *arguments* to be hashable (the return value itself is never hashed), and it keeps results only in process memory. Verify that every argument passed to the decorated function is hashable.
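The constraint above can be illustrated with a small stdlib example (hypothetical `lookup` function, not part of the PR): `lru_cache` hashes only the arguments when building its key, so unhashable return values are fine but unhashable arguments raise `TypeError`.

```python
import functools

@functools.lru_cache(maxsize=None)
def lookup(name: str) -> dict:
    # The returned dict is unhashable, which is fine: lru_cache only
    # hashes the arguments, never the return value.
    return {"name": name}

print(lookup("jason") is lookup("jason"))  # cache hit returns the same object

try:
    lookup(["jason"])  # unhashable argument -> TypeError
except TypeError as exc:
    print("TypeError:", exc)
```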
docs/blog/posts/caching.md
> Instructor make working with language models easy, but they are still computationally expensive.
>
> Today, we're diving into optimizing instructor code while maintaining the excellent DX offered by Pydantic models. We'll tackle the challenges of caching Pydantic models, typically incompatible with `pickle`, and explore solutions that use `decorators` like using `functools.cache`. Then, we'll craft custom decorators with `diskcache` and `redis`.
The word "Instructor" should be lowercase when not used as a proper noun, and "DX" should be expanded to "Developer Experience" for clarity.
!!! warning "Changing the Model does not Invalidate the Cache"

    Note that changing the model does not invalidate the cache. This is because the cache key is based on the function's name and arguments, not the model. This means that if we change the model, the cache will still return the old result.
The warning about changing the model not invalidating the cache is important, but it should be made clear that this is a potential pitfall and developers should manually invalidate the cache if the model changes.
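One hedged way to make invalidation automatic (a sketch, not the PR's code): fold a fingerprint of the model's declared fields into the cache key, so editing the model schema simply misses every entry written for the old version. The `cached_with_model_version` decorator and the plain-dict store below are hypothetical stand-ins.

```python
import functools
import hashlib

def cached_with_model_version(model_cls):
    """Hypothetical sketch (not from the PR): bake a fingerprint of the
    model's declared fields into the cache key, so editing the model
    automatically invalidates entries written for the old schema."""
    fingerprint = hashlib.sha256(
        repr(sorted(model_cls.__annotations__.items(), key=str)).encode()
    ).hexdigest()[:12]
    store = {}  # stand-in for the diskcache/redis backend

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = f"{func.__name__}-{fingerprint}-{args!r}-{sorted(kwargs.items())!r}"
            if key not in store:
                store[key] = func(*args, **kwargs)
            return store[key]
        return wrapper

    return decorator
```

With this, adding or renaming a field changes `fingerprint`, so stale entries are never read again (they can still be evicted by a TTL).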
docs/blog/posts/caching.md
```python
def instructor_cache(func):
    """Cache a function that returns a Pydantic model"""
    return_type = inspect.signature(func).return_annotation
    if not issubclass(return_type, BaseModel):
        raise ValueError("The return type must be a Pydantic model")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = f"{func.__name__}-{str(args)}-{str(kwargs)}"
        # Check if the result is already cached
        if (cached := cache.get(key)) is not None:
            # Deserialize from JSON based on the return type
            if issubclass(return_type, BaseModel):
                return return_type.model_validate_json(cached)

        # Call the function and cache its result
        result = func(*args, **kwargs)
        serialized_result = result.model_dump_json()
        cache.set(key, serialized_result)

        return result

    return wrapper
```
The `instructor_cache` decorator is using string concatenation to create the cache key, which could lead to collisions if the string representations of different arguments result in the same string. Consider using a more robust method for generating unique cache keys.
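One hedged option (a sketch, not the PR's code): hash a deterministic JSON serialization of the call, so keys have a fixed shape and cannot run together or depend on keyword order.

```python
import hashlib
import json

def make_key(func_name: str, args: tuple, kwargs: dict) -> str:
    # Deterministic serialization (sorted kwargs, default=str for odd types),
    # then a digest, so keys cannot collide via string concatenation.
    payload = json.dumps([func_name, args, sorted(kwargs.items())], default=str)
    return hashlib.sha256(payload.encode()).hexdigest()

# Keyword order no longer produces two different keys for the same call:
assert make_key("extract", (), {"a": 1, "b": 2}) == make_key("extract", (), {"b": 2, "a": 1})
# Distinct calls cannot run together into the same key:
assert make_key("f", ("a", "bc"), {}) != make_key("f", ("ab", "c"), {})
```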
docs/blog/posts/caching.md
!!! note "Looking carefully"

    If you look carefully at the code above you'll notice that we're using the same `instructor_cache` decorator as before. The implemntations is the same, but we're using a different caching backend!
There is a typo in the word "implementations". It should be corrected to "implementation".
```python
start = time.perf_counter()
model = extract("Extract jason is 25 years old")
assert model.name.lower() == "jason"
assert model.age == 25
print(f"Time taken: {time.perf_counter() - start}")
```
The test assertions are based on the assumption that the external service will always return the same output for a given input, which may not be reliable. Consider mocking the external service to ensure consistent test results.
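A hedged sketch of what such a mock can look like, using plain stdlib pieces: a dataclass stands in for the Pydantic model, and the hypothetical `call_llm` helper stands in for the `client.chat.completions.create` call so the test never touches the network.

```python
import sys
from dataclasses import dataclass
from unittest import mock

@dataclass
class UserDetail:  # plain stand-in for the Pydantic model in the example
    name: str
    age: int

def call_llm(data: str) -> UserDetail:
    # Placeholder for client.chat.completions.create(...): hits the network,
    # so tests that call it for real are slow and nondeterministic.
    raise RuntimeError("network call attempted in a test")

def extract(data: str) -> UserDetail:
    return call_llm(data)

def test_extract():
    fake = UserDetail(name="Jason", age=25)
    # Replace the network-bound call with a deterministic fake response
    with mock.patch.object(sys.modules[__name__], "call_llm", return_value=fake):
        model = extract("Extract jason is 25 years old")
        assert model.name.lower() == "jason"
        assert model.age == 25

test_extract()
```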
examples/caching/redis.py
```python
import redis
import functools
import inspect
import json
import instructor

from pydantic import BaseModel
from openai import OpenAI

client = instructor.patch(OpenAI())
cache = redis.Redis("localhost")


def instructor_cache(func):
    """Cache a function that returns a Pydantic model"""
    return_type = inspect.signature(func).return_annotation
    if not issubclass(return_type, BaseModel):
        raise ValueError("The return type must be a Pydantic model")

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = f"{func.__name__}-{str(args)}-{str(kwargs)}"
        # Check if the result is already cached
        if (cached := cache.get(key)) is not None:
            # Deserialize from JSON based on the return type
            if issubclass(return_type, BaseModel):
                return return_type.model_validate_json(cached)

        # Call the function and cache its result
        result = func(*args, **kwargs)
        serialized_result = result.model_dump_json()
        cache.set(key, serialized_result)

        return result

    return wrapper


class UserDetail(BaseModel):
    name: str
    age: int


@instructor_cache
def extract(data) -> UserDetail:
    # Assuming client.chat.completions.create returns a UserDetail instance
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        response_model=UserDetail,
        messages=[
            {"role": "user", "content": data},
        ]
    )


def test_extract():
    import time

    start = time.perf_counter()
    model = extract("Extract jason is 25 years old")
    assert model.name.lower() == "jason"
    assert model.age == 25
    print(f"Time taken: {time.perf_counter() - start}")

    start = time.perf_counter()
    model = extract("Extract jason is 25 years old")
    assert model.name.lower() == "jason"
    assert model.age == 25
    print(f"Time taken: {time.perf_counter() - start}")


if __name__ == "__main__":
    test_extract()
    # Time taken: 0.798335583996959
    # Time taken: 0.00017016706988215446
```
- The `instructor_cache` decorator assumes that the function it decorates will return a Pydantic model, which is a strong assumption that may not always hold true. This could be made more flexible or clearly documented.
- The cache key generation in line 21 does not account for mutable types or non-hashable types in `args` and `kwargs`, which could lead to issues.
- There is no expiration time set for the cache entries, which could lead to stale data issues.
- The `test_extract` function uses hardcoded assertions that may not be reliable for testing if the AI service's response changes over time.
- The `test_extract` function prints the time taken but does not assert on the expected caching behavior, such as a significantly reduced time for the second call.
examples/caching/redis.py
```python
@functools.wraps(func)
def wrapper(*args, **kwargs):
    key = f"{func.__name__}-{str(args)}-{str(kwargs)}"
```
The cache key generation strategy may lead to collisions or may not handle complex data types well. Consider using a more robust serialization method for cache keys.
examples/caching/redis.py
```python
if issubclass(return_type, BaseModel):
    return return_type.model_validate_json(cached)
```
Redundant check for `return_type` being a subclass of `BaseModel`. This check is already performed before defining the wrapper function.
examples/caching/redis.py
```python
serialized_result = result.model_dump_json()
cache.set(key, serialized_result)
```
Consider adding an expiration time (TTL) for the cache entries to prevent stale data from persisting indefinitely.
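With redis-py the TTL is one argument away: `cache.set(key, value, ex=3600)` expires the entry after an hour. As a backend-agnostic illustration of the same idea, here is a minimal in-memory sketch (the `TTLCache` class is hypothetical, not code from this PR):

```python
import time

class TTLCache:
    """Minimal in-memory sketch of a TTL; with redis-py the same effect
    is a single argument: cache.set(key, value, ex=3600)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl_seconds):
        # Record an absolute expiry alongside the value.
        self._store[key] = (time.monotonic() + ttl_seconds, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # stale entry: evict and report a miss
            return None
        return value

cache = TTLCache()
cache.set("user-1", '{"name": "jason", "age": 25}', ttl_seconds=0.05)
assert cache.get("user-1") is not None
time.sleep(0.06)
assert cache.get("user-1") is None  # expired; next call recomputes
```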
examples/caching/redis.py
```python
def test_extract():
    import time

    start = time.perf_counter()
    model = extract("Extract jason is 25 years old")
    assert model.name.lower() == "jason"
    assert model.age == 25
    print(f"Time taken: {time.perf_counter() - start}")

    start = time.perf_counter()
    model = extract("Extract jason is 25 years old")
    assert model.name.lower() == "jason"
    assert model.age == 25
    print(f"Time taken: {time.perf_counter() - start}")
```
The `test_extract` function should include assertions or checks to confirm that the caching behavior is working as expected, such as verifying that the second call hits the cache and does not perform the full computation.
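A hedged sketch of such an assertion, using a call counter and `functools.cache` as a stand-in for the real decorator and the LLM call: counting real invocations is more robust than comparing wall-clock times.

```python
import functools

calls = {"n": 0}  # counts real invocations of the expensive function

@functools.cache
def extract(data: str) -> str:
    # Stand-in for the LLM call; any caching decorator works the same way.
    calls["n"] += 1
    return data.upper()

def test_extract_caches():
    first = extract("jason is 25")
    second = extract("jason is 25")  # must be served from the cache
    assert first == second
    assert calls["n"] == 1, "second call should not re-run the function"

test_extract_caches()
```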
Review Status
Actionable comments generated: 3
Configuration used: CodeRabbit UI
Files selected for processing (3)
- docs/blog/posts/caching.md (1 hunks)
- examples/caching/example_diskcache.py (1 hunks)
- examples/caching/example_redis.py (1 hunks)
Files skipped from review due to trivial changes (1)
- docs/blog/posts/caching.md
Additional comments: 9
examples/caching/example_diskcache.py (4)
- 6-7: The imports are appropriate for the functionality described in the summary.
- 17-39: The `instructor_cache` decorator is correctly implemented to cache the output of functions returning a Pydantic model.
- 41-49: The `extract` function is correctly decorated with `instructor_cache` and is set to return a `UserDetail` instance.
- 52-65: The `test_extract` function correctly tests the caching functionality by asserting the results of two calls to the `extract` function with the same input.

examples/caching/example_redis.py (5)
- 1-10: The imports and global variable initialization for `client` and `cache` are correctly implemented as per the summary.
- 12-34: The `instructor_cache` decorator is correctly implemented to cache the result of a function that returns a Pydantic model. However, there is a potential issue with the cache key generation and the deserialization process.
- 37-50: The `UserDetail` model and the `extract` function are correctly implemented and decorated with `@instructor_cache` as per the summary.
- 52-65: The `test_extract` function is correctly implemented to test the caching behavior of the `extract` function.
- 68-71: The main block is correctly implemented to run the `test_extract` function and print the time taken for execution.
```python
# Time taken: 0.7285366660216823
# Time taken: 9.841693099588156e-05
```
The commented time taken values should be removed or updated to avoid confusion, as they may not reflect the current performance after changes.
```python
if (cached := cache.get(key)) is not None:
    # Deserialize from JSON based on the return type
    if issubclass(return_type, BaseModel):
        return return_type.model_validate_json(cached)
```
The deserialization from JSON assumes that the result is stored as JSON, but the serialization process is not explicitly converting the result to JSON before caching it.
yes it does on line 29
@jxnl My apologies for the oversight. You are correct, the serialization to JSON is indeed handled on line 29 with `serialized_result = result.model_dump_json()`. Thank you for pointing that out.
🐇
Review Status
Actionable comments generated: 0
Configuration used: CodeRabbit UI
Files ignored due to filter (1)
- mkdocs.yml
Files selected for processing (2)
- .gitignore (1 hunks)
- docs/concepts/caching.md (1 hunks)
Files skipped from review due to trivial changes (1)
- .gitignore
Additional comments: 8
docs/concepts/caching.md (8)
- 1-5: The introduction to the `functools.cache` section is well-written and provides a clear context for when this caching method is appropriate.
- 19-27: The use of `functools.cache` to decorate the `extract` function is correctly implemented for in-memory caching.
- 30-32: The warning about the cache not being invalidated when the model changes is a crucial piece of information for users to understand the limitations of the caching mechanism.
- 47-48: The output comments in the example show the performance improvement due to caching, but the second time taken appears to be an unrealistic example. Please verify the accuracy of these times.
- 87-123: The note about reusing the `instructor_cache` decorator for both `diskcache` and `redis` caching promotes code reuse and consistency.
- 96-117: The `instructor_cache` decorator for `diskcache` is well-defined, with checks for the return type being a Pydantic model and appropriate serialization/deserialization.
- 178-181: The explanation of the `instructor_cache` decorator's functionality, including key generation and result serialization, is clear and informative.
- 198-220: The `instructor_cache` decorator for Redis caching is correctly implemented, mirroring the `diskcache` version's checks and serialization logic.
Summary by CodeRabbit
New Features
Documentation
Bug Fixes
Refactor
- `functools`, `diskcache`, and Redis caching strategies for optimizing function performance in Python.