
Breaking down large files into smaller chunks based on context window size #2

Open
joshpxyne opened this issue Jun 28, 2023 · 9 comments

@joshpxyne
Owner

No description provided.

@joshpxyne joshpxyne self-assigned this Jun 29, 2023
This was referenced Jul 3, 2023
@othmanelhoufi

othmanelhoufi commented Jul 12, 2023

@0xpayne This is a highly important fix. When will it be available, please?

@Sineos

Sineos commented Jul 16, 2023

IMO, this is quite a dangerous thing. At least in some experiments using the regular GPT web interface, I found that "carelessly" splitting a larger file can lead to very poor results when some code relies on earlier functions, definitions, or variables.

@othmanelhoufi

othmanelhoufi commented Jul 16, 2023

@Sineos I totally agree. GPT can't create or handle logic, especially if the code is broken down into chunks. The code quality is correlated with the dependencies between variables, functions, libraries, classes, etc.
The only way I see this working (not perfectly) is if we can push the entire codebase as input, and that would probably require a 1-million-token model.
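For reference, this is roughly what the naive context-window chunking under discussion looks like. A minimal, hypothetical sketch: tokens are approximated by whitespace-split words (a real implementation would use the model's tokenizer), and splitting happens only at line boundaries, which avoids cutting statements in half but, as noted above, can still separate code from the definitions it depends on:

```python
def chunk_by_token_budget(source: str, max_tokens: int) -> list[str]:
    """Split source text into chunks that fit a token budget.

    Tokens are approximated by whitespace-separated words. Chunks are
    cut only at line boundaries, so no single line is split, but
    cross-chunk dependencies (imports, earlier definitions) are lost.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for line in source.splitlines(keepends=True):
        line_tokens = max(1, len(line.split()))
        if current and count + line_tokens > max_tokens:
            chunks.append("".join(current))
            current, count = [], 0
        current.append(line)
        count += line_tokens
    if current:
        chunks.append("".join(current))
    return chunks
```

Concatenating the chunks reproduces the original file, but each chunk alone lacks the context of the others, which is exactly the failure mode described in this thread.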

@TheCutestCat

This problem can be partially solved with an AST (abstract syntax tree).
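A minimal sketch of the AST idea for Python sources, using the standard-library `ast` module to split a module into whole top-level units so a function or class body is never cut in half (illustrative only; a real chunker would additionally track which names each unit depends on, as discussed above):

```python
import ast


def split_at_top_level(source: str) -> list[str]:
    """Split a Python module into whole top-level units (imports,
    functions, classes, other statements) so nothing is cut mid-body.

    Uses ast.parse to find the line range of each top-level node.
    Requires Python 3.8+ for node.end_lineno.
    """
    tree = ast.parse(source)
    lines = source.splitlines(keepends=True)
    units = []
    for node in tree.body:
        # Include any decorator lines above a function/class definition.
        first = min(
            [node.lineno]
            + [d.lineno for d in getattr(node, "decorator_list", [])]
        )
        units.append("".join(lines[first - 1:node.end_lineno]))
    return units
```

Units can then be grouped greedily into chunks under the token budget, so a chunk boundary only ever falls between top-level definitions, never inside one.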

@rlippmann

I've been (slowly) working on a solution for this in which I've abstracted away the separately "compilable" parts. My initial aim was to use it in a similar project that I was going to write, but it seems adding it to this project would be more worthwhile.

source splitter

@doyled-it

@rlippmann

@doyled-it

So does mine, but yours looks way more advanced.

It looks like you're trying to do the same thing as this project?

@doyled-it

It looks like you're trying to do the same thing as this project?

It has some differences. We're trying to focus on other aspects of modernization beyond direct translation of source files, although we still have that functionality.

And we don't have the loop where we run code, get an error, and update the code based on output.

@rlippmann

rlippmann commented Apr 3, 2024

@doyled-it
I was looking at doing the translation with distributed inference, e.g. through litellm. That way it could be more useful to open-source developers, since they could run local/free inference endpoints.

Looks like your project could use that too....

6 participants