Python: adding external tokenizer to python text_chunker #1388

Merged
merged 4 commits into from
Jul 8, 2023

Conversation

gramhagen
Contributor

Motivation and Context

addresses issue #1387
text chunking should allow the use of an external tokenizer

Description

added pass-through of a token-counting function, defaulting to the existing _token_counter() method
while fixing a type hint bug, I also made a few changes to clean up the code.
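
A minimal sketch of the pass-through idea, not the code from this PR: the chunker takes an optional callable that counts tokens and falls back to a built-in default. The names `chunk_words` and `whitespace_counter` are hypothetical, and the body of `_token_counter` here (about four characters per token) is an assumed placeholder heuristic rather than a confirmed copy of the library's method.

```python
from typing import Callable, List


def _token_counter(text: str) -> int:
    # Assumed placeholder default: roughly four characters per token.
    return len(text) // 4


def chunk_words(
    text: str,
    max_tokens: int,
    token_counter: Callable[[str], int] = _token_counter,
) -> List[str]:
    """Greedily pack whitespace-separated words into chunks.

    A chunk is closed as soon as adding the next word would push its
    token count, as measured by token_counter, above max_tokens.
    (Hypothetical helper for illustration only.)
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and token_counter(candidate) > max_tokens:
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def whitespace_counter(text: str) -> int:
    # Stand-in for an external tokenizer: one token per word.
    return len(text.split())


default_chunks = chunk_words("aaaa bbbb cccc", max_tokens=2)
custom_chunks = chunk_words(
    "one two three four five", max_tokens=2, token_counter=whitespace_counter
)
```

In practice the external counter could wrap a real tokenizer, for example a function that returns the length of an encoder's output; any callable taking a string and returning an integer fits this signature.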

future work: it would be nice to add chunk overlap functionality similar to langchain's TextSplitter

Contribution Checklist

@github-actions github-actions bot added the python Pull requests for the Python Semantic Kernel label Jun 8, 2023
@shawncal shawncal changed the title adding external tokenizer to python text_chunker Python: adding external tokenizer to python text_chunker Jun 29, 2023
@shawncal shawncal requested a review from a team as a code owner July 8, 2023 04:42
@shawncal
Member

shawncal commented Jul 8, 2023

@gramhagen Cool change! Thanks for the contribution.

Welcome to Semantic Kernel!

@shawncal shawncal added this pull request to the merge queue Jul 8, 2023
Merged via the queue into microsoft:main with commit 8527c58 Jul 8, 2023
19 checks passed
Labels
python Pull requests for the Python Semantic Kernel
Projects
Archived in project
Status: Sprint: Done
4 participants