add code splitter #7100
@kevinlu1248 can you take a look at the auto-language logic? I don't think it's 100% necessary, although it's a cool idea if it works.
Taking a look. I'll play with it a bit later tonight.
Talked to Yi, and the plan is to ship this for now and add auto-language detection later, possibly using https://github.com/yoeo/guesslang. I also added tests for some additional common languages. Technical detail: tree-sitter-languages' version of the TypeScript grammar seems to be able to parse TSX. At Sweep, we just use the TSX parser directly; I'm not sure what tree-sitter-languages is using under the hood.
Thanks guys, this is awesome!
Another possibility: Requires tensorflow
Makes sense, a deep-learning-based language detector is overkill.
Could it be that the Seems not to be used in the Code ... |
Description
Add code splitter to text splitters. Thanks to @kevinlu1248 from Sweep AI for the idea and push.
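To illustrate the general idea of syntax-aware splitting, here is a minimal sketch. It is not the implementation in this PR: it swaps in Python's standard-library `ast` module instead of tree-sitter, handles Python source only, and the function name `split_python_source` and the `max_chars` parameter are hypothetical names chosen for the example. It groups consecutive top-level statements into chunks so that no function or class is cut in half.

```python
import ast


def split_python_source(source: str, max_chars: int = 1500) -> list[str]:
    """Group consecutive top-level statements into chunks of roughly max_chars.

    Hypothetical helper for illustration only. A single statement longer than
    max_chars is kept whole; a fuller splitter would recurse into its children.
    """
    tree = ast.parse(source)
    lines = source.splitlines()

    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    prev_end = 0  # index of the first line not yet assigned to a segment

    for node in tree.body:
        # Start at prev_end so blank lines, comments, and decorators that
        # precede the statement travel with it.
        end = node.end_lineno
        segment = "\n".join(lines[prev_end:end])
        prev_end = end

        # Flush the current chunk if adding this segment would overflow it.
        if current and current_len + len(segment) + 1 > max_chars:
            chunks.append("\n".join(current))
            current, current_len = [], 0

        current.append(segment)
        current_len += len(segment) + 1

    # Keep anything after the last statement (e.g. trailing comments).
    if prev_end < len(lines):
        current.append("\n".join(lines[prev_end:]))
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Joining the returned chunks with newlines reproduces the original source (minus a final trailing newline), so no code is lost at chunk boundaries. A tree-sitter-based splitter can follow the same shape while supporting many languages.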