Python: feat: add chroma memory store #426

Closed
wants to merge 41 commits into from

Conversation

@joowon-dm-snu (Contributor) commented Apr 12, 2023

Motivation and Context

#403

Description

I've added one quick E2E test example without adding any unit tests.
I tried to make it as close to the original classes (VolatileDataStore & VolatileMemoryStore) as possible.

This PR has three open issues:

  1. Collection names do not allow upper-case characters (link). I've added a snake-case conversion inside the current class to work around this (see the sketch after this list).
  2. The get_nearest_matches_async function may need to become more involved if your team wants to compute a relevance score the way LangChain does.
  3. I didn't add chromadb to the requirements (users need to `pip install chromadb`).
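For context, a minimal sketch of the underlying chromadb calls a store like this wraps (after `pip install chromadb`; the collection name and data below are illustrative, not the ChromaMemoryStore API):

```python
import chromadb

client = chromadb.Client()

# Chroma rejects upper-case collection names, hence the snake_case conversion in the store.
collection = client.create_collection(name="generic_memory_store")
collection.add(
    ids=["info1"],
    embeddings=[[0.1, 0.2, 0.3]],
    metadatas=[{"source": "example"}],
    documents=["Sample text stored alongside its embedding"],
)

# query returns ids, documents, metadatas, and distances for the nearest matches.
results = collection.query(query_embeddings=[[0.1, 0.2, 0.3]], n_results=1)
print(results["ids"], results["distances"])
```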

Contribution Checklist

dluc and others added 30 commits March 17, 2023 01:50
### Motivation and Context
This PR simplifies `@sk_*`  decorators while porting the core TextSkill

### Description
This PR is a first step at adapting the Python codebase to be more
*pythonic*; it contains the following modifications:

1. Merged the decorators `@sk_function_context_parameter`,
`@sk_function_input`, and `@sk_function_name` into `@sk_function`.
The decorators were replaced with new kwargs on `sk_function`: `name`,
`input_description`, `input_default_value`
2. The `name` kwarg is optional; the name of the method will be used if
none is provided.
3. Ported core skill - TextSkill
4. Added some pytest unit tests for SK decorators and TextSkill
5. Changed how skills are imported in the kernel by using an instance of
the class, no longer relying on static methods for discovery.
e.g.
```
kernel.import_skill(TextSkill())
```
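A minimal sketch of the intended decorator usage (the import path and exact parameter names are assumed from this commit's description, not verified against the final code):

```python
# Assumed import path for this era of the codebase.
from semantic_kernel.skill_definition import sk_function

class TextSkill:
    @sk_function(
        name="uppercase",                     # optional; defaults to the method name
        input_description="Text to convert",  # replaces @sk_function_input
        input_default_value="",               # replaces the old input-default decorator
    )
    def uppercase(self, text: str) -> str:
        return text.upper()
```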
### Motivation and Context
This PR adds a Lint GitHub workflow so that code style rules can be
enforced for PRs.

1. This forms a baseline workflow that can be used as a template for
future workflows that will be added.
2. It helps enforce code style rules. Currently it only checks
PyCodeStyle and PyFlakes. Others will be added in the future.
3. Contributes to automated testing
4. Does not fix any open issue.

### Description
Added Ruff as a dev dependency and a GitHub workflow that runs Ruff.

---------

Co-authored-by: Aditya Gudimella <adgudime@microsoft.com>
Co-authored-by: Devis Lucato <dluc@users.noreply.github.com>
### Motivation and Context
Fixes formatting issues so that the Lint GitHub workflow passes.

### Description
Contains only reformatting changes.
### Motivation and Context
1. Why is this change required? Compatibility with Python 3.9.
2. What problem does it solve? If a user uses PromptTemplateEngine with
skills inside, it does not work.
3. What scenario does it contribute to? PromptTemplateEngine will work.
4. If it fixes an open issue, please link to the issue here.
#182 



### Description
Detailed in #182;
similar concept to #169.

### Contribution Checklist

<!-- Before submitting this PR, please make sure: -->

- [x] The code builds clean without any errors or warnings
- [x] The PR follows SK Contribution Guidelines
(https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
- [x] The code follows the .NET coding conventions
(https://learn.microsoft.com/dotnet/csharp/fundamentals/coding-style/coding-conventions)
verified with `dotnet format`
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

<!-- Thank you for your contribution to the semantic-kernel repo! -->
### Motivation and Context
1. Why is this change required? `pytest .` raises an error
2. What problem does it solve? All tests pass
3. What scenario does it contribute to? `infer_delegate_type` will not
raise an error
4. If it fixes an open issue, please link to the issue here.
#168

### Description
A static method has no `__wrapped__` attribute, and this raises an error;
`__func__` can be used as a fallback for this case.
I think `wrapped = getattr(value, "__wrapped__", getattr(value,
"__func__", None))` is a reasonable fix for this issue.

```
@staticmethod
def infer_delegate_type(function) -> DelegateTypes:
    # Get the function signature
    function_signature = signature(function)
    awaitable = iscoroutinefunction(function)

    for name, value in DelegateInference.__dict__.items():
        if name.startswith("infer_") and hasattr(
>               value.__wrapped__, "_delegate_type"
        ):
E           AttributeError: 'staticmethod' object has no attribute '__wrapped__'

../semantic_kernel/orchestration/delegate_inference.py:240: AttributeError
======================================================================================= short test summary info =======================================================================================
FAILED test_text_skill.py::test_can_be_imported - AttributeError: 'staticmethod' object has no attribute '__wrapped__'
FAILED test_text_skill.py::test_can_be_imported_with_name - AttributeError: 'staticmethod' object has no attribute '__wrapped__'
===================================================================================== 2 failed, 9 passed in 0.23s =====================================================================================
```
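A minimal sketch of the proposed fallback (the helper name `_unwrap` is illustrative, not the actual code in `delegate_inference.py`):

```python
def _unwrap(value):
    # Prefer __wrapped__ (set by functools.wraps); fall back to __func__
    # so staticmethod objects are handled; otherwise return None.
    return getattr(value, "__wrapped__", getattr(value, "__func__", None))

# Inside infer_delegate_type, the guarded lookup would then become:
# for name, value in DelegateInference.__dict__.items():
#     wrapped = _unwrap(value)
#     if name.startswith("infer_") and wrapped is not None and hasattr(wrapped, "_delegate_type"):
#         ...
```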
### Motivation and Context
Porting the FileIOSkill to python.

### Description
This is the port of the FileIOSkill to python with unit tests.  
```
kernel = sk.create_kernel()
kernel.import_skill(FileIOSkill(), "file")
context = kernel.create_new_context()
context["path"] = "test_file_io_skill.txt"
context["content"] = "Hello, world!"
```

Using the same function names as the C# version

```
{{file.readAsync $path}}

{{file.writeAsync}}
```
Modifications to the dependencies:
- Adding the package: aiofiles
- Adding the dev package: pytest-asyncio
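For reference, a minimal sketch of the kind of non-blocking read aiofiles enables (illustrative, not the actual FileIOSkill implementation):

```python
import aiofiles

async def read_async(path: str) -> str:
    # Read a text file without blocking the event loop.
    async with aiofiles.open(path, "r") as f:
        return await f.read()
```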
…nb` (#166)

### Motivation and Context
Please help reviewers and future users, providing the following
information:

1. Why is this change required?
`SemanticTextMemory.save_reference_async` does nothing with its storage.
2. What problem does it solve? `save_reference_async` will work
appropriately
3. What scenario does it contribute to? embeddings
4. If it fixes an open issue, please link to the issue here.
#165


### Description

I added the missing code, using `SemanticTextMemory.save_information_async`
and the C# version of `SemanticTextMemory.save_reference_async` as
references.
### Motivation and Context

This PR provides a path to using the ChatGPT API in the Python Preview
of Semantic Kernel. In addition, this provides `Azure*` versions of many
existing models (so Python users can leverage Azure OpenAI).

I think that, in general, it may be worth considering how best to work
with models that have different modalities: now we have text
completions, embeddings, chat completions (and I'd imagine images/etc.
may be nice to support someday too).

Regardless, this PR provides a fast path to using the exciting new Chat
APIs from OpenAI with SK!

See the new `python/tests/chat_gpt_api.py` for a usage example.
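For context, a minimal sketch of the raw OpenAI chat call this connector builds on (the pre-1.0 `openai` SDK current at the time; not the SK wrapper itself):

```python
import openai

openai.api_key = "sk-..."  # assumption: the key normally comes from the usual .env plumbing

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello to Semantic Kernel"}],
)
print(response["choices"][0]["message"]["content"])
```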
### Motivation and Context
Port of the core TimeSkill

### Description
This PR adds the core TimeSkill with unit tests.

```
kernel = sk.create_kernel()
kernel.import_skill(TimeSkill(), "time")
```

```
sk_prompt = """
{{time.now}}
"""
```
…rsion) (#200)

### Motivation and Context

The C# Semantic Kernel has recently undergone an upgrade to its
`PromptTemplateEngine`. This brings Python back in line with the
semantics of Prompt Templates in C#.

### Description

Here, unlike the original port, I've tried to make things more
pythonic/idiomatic (instead of trying to directly mirror the C#
codebase). I've also brought over the corresponding unit tests (and
added some of my own as I was building/validating).
### Motivation and Context

We recently merged an upgrade to the `PromptTemplateEngine`, let's make
the rest of the tests consistent with the directory structure used in
that PR.


### Description

This PR does three (small) things:

1. Makes the `./tests` directory have consistent structure
2. Re-names some of the "tests" that were at the top-level of the
`./tests` dir to `./tests/end-to-end` (which better describes their
purpose: things like `basics.py` and `memory.py` are end-to-end examples
of using SK and ways to verify nothing is horribly broken)
3. Applies `isort` (which we plan to add to our linting workflow here on
GitHub soon)
### Motivation and Context
This PR fixes #235 and adds a test for the functionality. This is ahead
of the planning skill.

### Description
1. Fixes the typo in the `from_dict()` method in
`prompt_template_config.py`
2. Add assignment back to skill config so results from JSON aren't
discarded in `import_semantic_skill_from_directory.py`
3. Add tests
#203)

Added a logger warning and error for cosine similarity computation on
zero vectors.
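A minimal sketch of the kind of guard this adds (names are illustrative, not the actual embeddings code):

```python
import logging

import numpy as np

logger = logging.getLogger(__name__)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity is undefined for zero vectors, so warn and raise.
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    if norm_a == 0 or norm_b == 0:
        logger.warning("Cosine similarity is undefined for zero vectors")
        raise ValueError("Cannot compute cosine similarity for a zero vector")
    return float(np.dot(a, b) / (norm_a * norm_b))
```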
### Motivation and Context

Right now, there are lots of Config classes in SK. This doesn't feel
very "pythonic" and can be a bit confusing at times. To reduce the
amount of indirection, this PR removes several config classes from the
Python port.

NOTE: this PR is also critical preparation for a large change to re-sync
with how C# SK handles multi-modality. I have changes ready for adding a
`./connectors` dir and syncing with the Text/Chat/Image/Embedding
support, but there's a few "prep" PRs I need to get through first.

### Description

This PR makes the following changes:

1. We remove the `(Azure)OpenAIConfig` classes (simplifying the code
related to creating/managing backends)
2. We remove the `./configuration` sub-directory and move the
`KernelConfig` class to the module's root dir (this mirrors the C#
version)
3. We re-tool the `KernelConfig` class to behave similarly to the
current C# version (again, this is prep for later changes RE
multi-modality)
4. We make corresponding updates across the code/tests now that we've
removed some config classes

In future PRs, it'd be great to also simplify:

1. `(Chat|Completion)RequestSettings` 
2. `(ReadOnly)SkillsCollection` and related classes
3. Take a hard look at `ContextVariables` and `SKContext` (maybe
`ContextVariables` is just a `dict` in SK python)
### Motivation and Context

The `python-preview` branch is under active development and many
docs/tests/examples are getting a bit out of sync. There are also a few
small cleanup chores/refactors to make _usage_ easier (e.g., to make
examples cleaner). I've grouped these small fixes (plus a few bug fixes)
into this "cleanup" PR.

### Description

This PR does the following:

1. Updates the out-of-date example in `README.md`
2. Fixes all 5 of the notebooks we support right now (1,2,3,4, and 6;
notebook 5 is planner, I've blanked out most of the code there so we
don't confuse people, especially as the planner API is changing)
3. Fixes a bug w/ `stop_sequences` in `(Azure)OpenAITextCompletion`
4. Fixes a bug introduced in our upgrades to the embeddings cosine-sim
check (we should add more thorough tests here)
5. Cleans up tests
6. Ensures end-to-end tests are all runnable 
7. Fixes up the `kernel_extensions` to work more naturally (so that end
users can use them directly on the `kernel` object and get good
type-hinting)
### Motivation and Context

In an effort to make the Python port more idiomatic, let's remove the
`Verify` class and the `./diagnostics` sub-module. There are many more
follow-on tasks to do here, but this is a good start (and already a
large enough change).

### Description

This PR does the following:

1. Removes the `Verify` class, and re-writes all instances of `Verify.*`
2. Adds a `validation.py` in the `./utils` sub-module to handle some of
the more complex cases from `Verify` (checking that skills, functions, and
function params have valid names)
3. Removes the rest of the `./diagnostics` sub-module (w/ a longer-term
goal of removing all/most of our custom exception classes and, instead,
using appropriate built-in error classes)
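A minimal sketch of the kind of name check such a module provides (the regex and function name are assumptions, not the actual contents of `validation.py`):

```python
import re

# Hypothetical rule: names may contain letters, digits, and underscores only.
_VALID_NAME = re.compile(r"^[0-9A-Za-z_]+$")

def validate_function_name(name: str) -> None:
    # Raise instead of returning a bool, mirroring the old Verify-style checks.
    if not name or not _VALID_NAME.match(name):
        raise ValueError(f"Invalid function name: {name!r}")
```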
### Motivation and Context
#309 

### Description
missing default value in `stop_sequences`
### Motivation and Context

AAD tokens offer greater authentication security and are used by several products.

### Description

Add support for Azure Active Directory auth for the `Azure*` backends.
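A minimal sketch of how an AAD token can be obtained for Azure OpenAI (using azure-identity; how the token is wired into the `Azure*` backends is handled by this PR and not shown here):

```python
from azure.identity import DefaultAzureCredential

# Acquire a bearer token for the Cognitive Services scope used by Azure OpenAI.
credential = DefaultAzureCredential()
token = credential.get_token("https://cognitiveservices.azure.com/.default")

# token.token can then be supplied to the backend in place of an API key.
print(token.token[:16], "...")
```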
### Motivation and Context

READMEs and examples are out of date and show incorrect code. There
are also a few bugs in SKFunction blocking simpler syntax.

Extend SKFunction to allow synchronous calls and have simpler syntax
when async is not required.

### Description

* Update homepage README, moving all Python examples under python/README
* Make SKFunction callable as per v0
* Fix bugs in SKFunction
* Fix examples using realistic code
* Allow functions to be used synchronously
### Motivation and Context
Porting from C#

### Description
The `semantic_text_partitioner` class and
`SKFunctionBase.aggregate_partitioned_results_async(...)` method still
need to be implemented for the skill to be operational, but for the sake
of modularity and PR granularity, I will leave these implementations
outside the scope of this particular PR.

---------

Co-authored-by: Kit (Hong Long Nguyen) <honnguyen@microsoft.com>
### Motivation and Context

The pip package was uploaded without a LICENSE file and without a
license mentioned in `pyproject.toml`. I tried to reupload, but the
filenames are the same. This updates the version number so we can upload
a new version of the package to pip with a LICENSE.

### Description
- Update package version to 0.2.1dev in pyproject.toml.
- Add `license = "MIT"` to pyproject.toml
-  Ran `poetry build` and saw the built packages with the new version: 
```
Building semantic-kernel (0.2.1.dev)
  - Building sdist
  - Built semantic_kernel-0.2.1.dev0.tar.gz
  - Building wheel
  - Built semantic_kernel-0.2.1.dev0-py3-none-any.whl
```
- Uploaded a test to
https://test.pypi.org/project/semantic-kernel/#description. Note that
this version is 0.2.0: I did not include `license = "MIT"` the first
time with 0.2.1 and couldn't reupload 0.2.1 to testpypi, so I went back
down to 0.2.0, which I had not uploaded yet.
### Motivation and Context
Unit tests were failing due to a missing setter. Added setter in SKContext

---------

Co-authored-by: Jerome Tremblay <jerome_tremblay@nuance.com>
@alexchaomander added the "python" label (Pull requests for the Python Semantic Kernel) on Apr 12, 2023
### Motivation and Context
The linter's max-line-length was 88, so I bumped it to 120. 

### Description
- Increased max-line-length to 120 for ruff and flake8 linters
- Ran linters and formatters to fix style issues.
This will run Python unit tests when code is pushed to the branch.

### Description
- Runs `poetry run pytest` on ubuntu-latest, macos-latest, and
windows-latest for python versions 3.8, 3.9, and 3.10
- This action was run when this workflow was pushed to my forked branch:
https://github.com/mkarle/semantic-kernel/actions/runs/4672544786
@shawncal changed the title from "feat: add chroma memory store" to "Python: feat: add chroma memory store" on Apr 12, 2023
@shawncal (Member) commented:

FYI: Python PRs on hold while we stage to merge /python into main. See #423

1 similar comment

return output_list

# TODO: Need to decide whether semantic-kernel will use chroma's similarity score or not
Contributor

By default, we should use chroma's similarity score. Introducing a constructor argument for using custom similarity is something we can consider for the future.
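For context, a minimal sketch of how chroma's query distance could be turned into a relevance score if that is ever needed (assuming the default L2 metric; the mapping is illustrative, not what the store does today):

```python
def to_relevance(distance: float) -> float:
    # Smaller L2 distance means more similar; map it into (0, 1].
    # A LangChain-style relevance calculation would differ.
    return 1.0 / (1.0 + distance)
```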

Contributor Author

I hope d418b68 is what you were talking about.

try:
# TODO: ChromaDB rejects camel-case collection names.
# Need to decide whether ChromaDataStore does the conversion internally or the user takes responsibility.
collection_snake_case = camel_to_snake(collection)
Contributor

I think it's fine to keep the conversion internal to the memory store. This makes it interchangeable, though at small risk of container collision. I'd only really be concerned about this if the collection names were designed to be auto generated.

Contributor Author

If so, I'll leave this as a TODO and we can decide after chroma fixes the issue.
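For reference, a minimal sketch of a camel_to_snake helper (illustrative; the actual helper in the PR may differ):

```python
import re

def camel_to_snake(name: str) -> str:
    # "MyCollectionName" -> "my_collection_name"
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()
```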

@dluc changed the base branch from python-preview-archived-dont-delete to main on April 14, 2023 00:38
@dluc closed this on Apr 14, 2023
@dluc (Collaborator) commented Apr 14, 2023

@joowon-dm-snu we've merged the python branch into main, and GitHub doesn't allow pointing this PR to main because of some rebase steps (I tried, and GitHub automatically closed the PR without an option to reopen it).

Could you send the PR again?

dluc added a commit that referenced this pull request May 17, 2023
Add support for Chroma https://docs.trychroma.com/

> Chroma is the open-source embedding database. Chroma makes it easy to
build LLM apps by making knowledge, facts, and skills pluggable for
LLMs.

### Motivation and Context

* #403 Support for Chroma embedding database
* #426 Python: feat: add chroma memory store
* #449 Python: feat: add chroma memory store

---------

Co-authored-by: Abby Harrison <54643756+awharrison-28@users.noreply.github.com>
Co-authored-by: Abby Harrison <abby.harrison@microsoft.com>
Co-authored-by: Devis Lucato <dluc@users.noreply.github.com>
shawncal pushed a commit to johnoliver/semantic-kernel that referenced this pull request May 19, 2023
Add support for Chroma https://docs.trychroma.com/

> Chroma is the open-source embedding database. Chroma makes it easy to
build LLM apps by making knowledge, facts, and skills pluggable for
LLMs.

### Motivation and Context

* microsoft#403 Support for Chroma embedding database
* microsoft#426 Python: feat: add chroma memory store
* microsoft#449 Python: feat: add chroma memory store

---------

Co-authored-by: Abby Harrison <54643756+awharrison-28@users.noreply.github.com>
Co-authored-by: Abby Harrison <abby.harrison@microsoft.com>
Co-authored-by: Devis Lucato <dluc@users.noreply.github.com>
Labels
python Pull requests for the Python Semantic Kernel