Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tree-style file selection feature #937

Merged
merged 30 commits into from
Jan 7, 2024

Conversation

similato87
Copy link
Collaborator

Implement Tree-Style File Selection

Introduced a tree-style file selection method, allowing users to easily select files via a .toml configuration file. This update includes:

  • Removal of the graphical file selection dependency.
  • Implementation of a tree representation for file selection.
  • Direct editing of file selection through the user's default text editor.
  • Enhanced feedback by displaying selected files in a tree-style format before proceeding.

This update enhances usability for terminal-based interactions and streamlines the file selection process. Below is the gif for the feature:

feature

@ATheorell
Copy link
Collaborator

ATheorell commented Dec 27, 2023

Great work.

Here is a wishlist from my first testing!

  • I got an error since vim was not accessible in my shell. First of all, thanks for nice error handling in the function open_with_default_editor which made this easy to understand. A potential improvement could be, if EDITOR is not defined, try a list of potential editors (like vim, notepad, gedit, nano) and pick the first one that is accessible. That should avoid errors in most cases.
  • Set the file default to false instead of true in the toml. For a small project where we really want to include everything, it is easy to change to true, for a large project where we only want to change a few files, its cumbersome to set everything to false.
  • I get .pyc files in the default toml. There is actually a function for filtering this out is_in_ignoring_extensions however, I think a much better idea is to go through each file and check if it can be decoded with "utf-8". Ideas for such checks are provided here: https://stackoverflow.com/questions/3269293/how-to-write-a-check-in-python-to-see-if-file-is-valid-utf-8 . This makes sense, since we can broadly ignore all non utf-8 files, since we cannot edit them meaningfully anyway.
  • At the end of the process, the regular file_list.txt is persisted on disk, instead of the .toml file. It would be much nicer if the toml file is persisted and checked for in repeated runs, and that the file_list.txt is removed from gpt-engineer entirely. That would also imply not making a persistent file_list.txt using the command line file selector.

Awesome work! These are only small changes and we are close to the goal! @similato87

@similato87
Copy link
Collaborator Author

Great work.

Here is a wishlist from my first testing!

  • I got an error since vim was not accessible in my shell. First of all, thanks for nice error handling in the function open_with_default_editor which made this easy to understand. A potential improvement could be, if EDITOR is not defined, try a list of potential editors (like vim, notepad, gedit, nano) and pick the first one that is accessible. That should avoid errors in most cases.
  • Set the file default to false instead of true in the toml. For a small project where we really want to include everything, it is easy to change to true, for a large project where we only want to change a few files, its cumbersome to set everything to false.
  • I get .pyc files in the default toml. There is actually a function for filtering this out is_in_ignoring_extensions however, I think a much better idea is to go through each file and check if it can be decoded with "utf-8". Ideas for such checks are provided here: https://stackoverflow.com/questions/3269293/how-to-write-a-check-in-python-to-see-if-file-is-valid-utf-8 . This makes sense, since we can broadly ignore all non utf-8 files, since we cannot edit them meaningfully anyway.
  • At the end of the process, the regular file_list.txt is persisted on disk, instead of the .toml file. It would be much nicer if the toml file is persisted and checked for in repeated runs, and that the file_list.txt is removed from gpt-engineer entirely. That would also imply not making a persistent file_list.txt using the command line file selector.

Awesome work! These are only small changes and we are close to the goal! @similato87

Hi @ATheorell ,

Thank you for your valuable feedback and suggestions! Here's a brief outline addressing each point of your wishlist:

  1. Improved Editor Detection: I'll enhance the open_with_default_editor function to cycle through a list of common editors (vim, notepad, gedit, nano) if EDITOR isn't defined. This should make the feature more robust across different environments.

  2. Default Selection State: Changing the default selection state to false in the .toml file is a sensible idea, especially for large projects. I'll implement this adjustment.

  3. Filtering Non-UTF-8 Files: Incorporating a check for UTF-8 encoding to filter out files like .pyc makes a lot of sense. I'll add this feature to ensure only editable text files are included in the .toml.

  4. Persisting .toml and Removing file_list.txt: Shifting to persisting the .toml file for future selections and eliminating the older file_list.txt approach will streamline the process and maintain consistency. I'll ensure that the .toml file is the primary reference for repeated runs.

I'm on it and will update the PR with these enhancements soon. Thanks for the encouragement and detailed insights.

@ATheorell
Copy link
Collaborator

Hi!
I see that you solved 1-3 above great @similato87!
I tested again and I have some input for 4. The persisted file is called toml now, but the content is not the .toml file, but the content of the old file_list, but you may already be aware of this.

I want to suggest a small change to the workflow: The current workflow is that, after the first run, a .toml is persisted and the next time we run, gpt-engineer immediately assumes we want to use the files in that .toml. I want to suggest that, in case a toml file is present, always open it in the default editor. In case the user doesn't want to change the selection, he/she only needs to close the editor to proceed, which shouldn't be too much work. I think this is much better than forcing the user to delete the file manually to change the selection.

A small improvement to the toml file. Make a one sentence explanation in a comment at the top of the file about how to use the file.

Regarding the command line selection: I see 2 possibilities. Either remove it completely, or make it completely file free. When using it, we don't save any file at all. Let me know what alternative you prefer.

@similato87
Copy link
Collaborator Author

Hi! I see that you solved 1-3 above great @similato87! I tested again and I have some input for 4. The persisted file is called toml now, but the content is not the .toml file, but the content of the old file_list, but you may already be aware of this.

I want to suggest a small change to the workflow: The current workflow is that, after the first run, a .toml is persisted and the next time we run, gpt-engineer immediately assumes we want to use the files in that .toml. I want to suggest that, in case a toml file is present, always open it in the default editor. In case the user doesn't want to change the selection, he/she only needs to close the editor to proceed, which shouldn't be too much work. I think this is much better than forcing the user to delete the file manually to change the selection.

A small improvement to the toml file. Make a one sentence explanation in a comment at the top of the file about how to use the file.

Regarding the command line selection: I see 2 possibilities. Either remove it completely, or make it completely file free. When using it, we don't save any file at all. Let me know what alternative you prefer.

Hi @ATheorell !

I'm currently in the testing phase of this function and not yet fully confident in its performance. Your clear and timely suggestion for the .toml workflow and content is incredibly helpful and encouraging. It points out necessary improvements and provides a user-centered perspective that's crucial for refinement.

The idea to automatically open the .toml file for review and the addition of an explanatory comment will enhance the user experience significantly. I'm also carefully considering the options for the command line selection to best suit user preferences.

Your input is greatly valued as I work towards making this tool more reliable and intuitive. Thank you for your supportive and constructive feedback!

@similato87
Copy link
Collaborator Author

Hi @ATheorell ,

I've completed some significant updates and refinements to the file selection process:

  1. Removed Terminal-Based File Selector: To simplify our code and avoid the need to manage multiple conditions on subsequent runs (like .toml after .toml/Terminal, Terminal after .toml/Terminal, etc.), I've removed the terminal-based file selector.

  2. Persistent .toml File Handling: Fixed a bug where the .toml file was being rewritten by the old file_list. Now, the persisted .toml file accurately remembers users' selections from the first time.

  3. Always Open .toml for Editing: Whether initializing or on subsequent runs, the .toml file will always open for users to edit their selections. I've made sure that notifications in the terminal are clear, and users can quickly recognize their last selections.

Encountered Problem:

During the update, I faced an issue with Poetry install and pytest. It seems to detect errors in a .toml file within the projects/example folder, which should be ignored by git. I didn't upload this file and am not sure why it's causing issues. Could you provide some guidance or assist in resolving this?

Thank you for your support and looking forward to your feedback or any further instructions!

@captivus
Copy link
Collaborator

The .gitignore file wasn't properly ignoring the projects directory. This has been resolved in #939. Please update your branch with this change and let us know if the issue persists.

@captivus captivus reopened this Dec 29, 2023
@captivus
Copy link
Collaborator

Never mind -- that wasn't it. I've checked out your branch and its failing in test execution. Please run poetry run pytest locally and work through the errors.

@ATheorell
Copy link
Collaborator

Regarding the error: To run the tests on github without an api key, we use a caching system. When a change in the code changes a prompt to the LLM in one of the tests, the cache needs to be updated locally. If you run pytest locally, you should see that here is a cache file that gets updated and you need to commit that. If this isn't working for you for some reason, feel free to simple ignore this error and I will update the cache before we merge this @similato87 @captivus .

I believe a ToDo for me is to make a better error message.

@ATheorell
Copy link
Collaborator

ATheorell commented Dec 29, 2023

I only have tiny nitpicks left before merging now:

After choosing files, there is a print to the consol which is formatted a little strangely. I get this for example:

You have selected the following files:

│   └──  controller.py

It would probably be more informative if it prints the relative path.

Thanks for adding the instruction:

"# Select or Deselect files for processing by setting 'selected' to true or false."

What about the slightly more informative:

"# Change 'selected' from false to true to include files in the edit. GPT-engineer can only read and edit the files that set to true. Including irrelevant files will degrade coding performance, cost additional tokens and potentially lead to violations of the token limit, resulting in runtime errors."

@ATheorell
Copy link
Collaborator

One more small UX thingy: If I set all files to false, I get this in the terminal

No files were selected.

and then the program commences, but can't do anything.
If the user selects nothing, it would probably be better to directly raise an informative error and not waste any more tokens.

@similato87
Copy link
Collaborator Author

similato87 commented Dec 30, 2023

The .gitignore file wasn't properly ignoring the projects directory. This has been resolved in #939. Please update your branch with this change and let us know if the issue persists.

Hi @captivus,
Thank you for addressing the .gitignore issue with the projects directory in #939. I've updated my branch with these changes.

However, it appears the root cause was different. The issue stemmed from updates I made to the file_selector workflow, which I hadn't realized affected specific tests. I'll provide more details in the comment after the next one.

@similato87
Copy link
Collaborator Author

I only have tiny nitpicks left before merging now:

After choosing files, there is a print to the consol which is formatted a little strangely. I get this for example:

You have selected the following files:

│   └──  controller.py

It would probably be more informative if it prints the relative path.

Thanks for adding the instruction:

"# Select or Deselect files for processing by setting 'selected' to true or false."

What about the slightly more informative:

"# Change 'selected' from false to true to include files in the edit. GPT-engineer can only read and edit the files that set to true. Including irrelevant files will degrade coding performance, cost additional tokens and potentially lead to violations of the token limit, resulting in runtime errors."

Hi @ATheorell,

Thanks for reminding me of that. I have added the parent folder for better print information and also updated the comment with yours.

One more small UX thingy: If I set all files to false, I get this in the terminal

No files were selected.

and then the program commences, but can't do anything. If the user selects nothing, it would probably be better to directly raise an informative error and not waste any more tokens.

Indeed! Now there is an exception for no file selection situation instead of printing no file selection.

@similato87
Copy link
Collaborator Author

similato87 commented Dec 30, 2023

Regarding the error: To run the tests on github without an api key, we use a caching system. When a change in the code changes a prompt to the LLM in one of the tests, the cache needs to be updated locally. If you run pytest locally, you should see that here is a cache file that gets updated and you need to commit that. If this isn't working for you for some reason, feel free to simple ignore this error and I will update the cache before we merge this @similato87 @captivus .

I believe a ToDo for me is to make a better error message.

Thanks for the update. After refreshing the cache, I encountered the same issue again. Upon further investigation, I realized the root cause is due to original tests, one example located at

# Runs gpt-engineer with improve mode and improves an existing project in the specified path.
def test_improve_existing_project(self, tmp_path, monkeypatch):
def improve_generator():
yield "2"
yield "all"
yield "y" # First response
while True:
yield "n" # Subsequent responses
gen = improve_generator()
monkeypatch.setattr("builtins.input", lambda _: next(gen))
p = tmp_path / "projects/example"
p.mkdir(parents=True)
(p / "prompt").write_text(prompt_text)
simplified_main(str(p), "improve")
ex_env = DiskExecutionEnv(path=p)
ex_env.run(f"bash {ENTRYPOINT_FILE}")
assert (p / "output.txt").exists()
text = (p / "output.txt").read_text().strip()
assert text == "hello"

The test was originally designed for the old file_selector workflow. Given the recent changes, it's now causing failures. Please hold off on merging this branch as it may introduce errors into subsequent commits. I am actively working to update the affected tests and anticipate having them completed by tomorrow.

Thank you for your patience and understanding.

@ATheorell
Copy link
Collaborator

Regarding the error: To run the tests on github without an api key, we use a caching system. When a change in the code changes a prompt to the LLM in one of the tests, the cache needs to be updated locally. If you run pytest locally, you should see that here is a cache file that gets updated and you need to commit that. If this isn't working for you for some reason, feel free to simple ignore this error and I will update the cache before we merge this @similato87 @captivus .
I believe a ToDo for me is to make a better error message.

Thanks for the update. After refreshing the cache, I encountered the same issue again. Upon further investigation, I realized the root cause is due to original tests, one example located at

# Runs gpt-engineer with improve mode and improves an existing project in the specified path.
def test_improve_existing_project(self, tmp_path, monkeypatch):
def improve_generator():
yield "2"
yield "all"
yield "y" # First response
while True:
yield "n" # Subsequent responses
gen = improve_generator()
monkeypatch.setattr("builtins.input", lambda _: next(gen))
p = tmp_path / "projects/example"
p.mkdir(parents=True)
(p / "prompt").write_text(prompt_text)
simplified_main(str(p), "improve")
ex_env = DiskExecutionEnv(path=p)
ex_env.run(f"bash {ENTRYPOINT_FILE}")
assert (p / "output.txt").exists()
text = (p / "output.txt").read_text().strip()
assert text == "hello"

The test was originally designed for the old file_selector workflow. Given the recent changes, it's now causing failures. Please hold off on merging this branch as it may introduce errors into subsequent commits. I am actively working to update the affected tests and anticipate having them completed by tomorrow.

Thank you for your patience and understanding.

Thanks for also looking into the tests. Feel free to ask for support if you feel like the scope of this is getting too wide.

@similato87
Copy link
Collaborator Author

similato87 commented Dec 31, 2023

Hi @ATheorell,

I've been progressing on implementing tests for the improved file selection workflow, but I've encountered a significant challenge that I'd like to discuss. The core of the issue is simulating user interactions within a default text editor during the tests. Specifically, our improve process involves opening a file in the default editor, making changes (such as selecting or deselecting files), and then saving and closing the editor during this line running:

simplified_main(str(p), "improve")
— a sequence that's straightforward for a human user but complex to automate in a testing environment.

I am exploring potential methods to effectively simulate this behavior in our automated tests but haven't yet found a robust solution.

@similato87 similato87 mentioned this pull request Dec 31, 2023
@similato87
Copy link
Collaborator Author

Hi @ATheorell ,

Happy new year:)

I've made progress on the test cases and have successfully fixed them in my local development environment. However, when I push the changes to run in the GitHub workflow, they fail.

Upon reviewing the logs, it seems like the issue might be related to hitting a maximum limit. So, I initiated a new run thinking it might resolve any temporary issues, but unfortunately, the tests are still failing.

I'm currently unsure of the next steps to rectify this issue. It's unclear if the problem is with the environment setup, dependencies, or something else entirely that differs between the local and CI environments.

Any insights or suggestions you might have would be greatly appreciated as I navigate this issue.

Thank you for your support.

@ATheorell
Copy link
Collaborator

@similato87
Just letting you know that I'm on it. Fix will be up soon

@similato87
Copy link
Collaborator Author

Hi @ATheorell ,

I've completed the refactoring and commenting of the file selection module. Detailed comments have been added throughout to improve maintainability and understanding.

Could you please review and merge the PR? And now I'm ready to move on to the next feature development.

Thanks!

@ATheorell
Copy link
Collaborator

Great @similato87!
Since this is a change to the general workflow, I'll request board approval for merging. This should be quite quick though.

@TheoMcCabe
Copy link
Collaborator

Was there an issue with the file selector box or a reason we dont want to support it? It seems to me the easiest option for non technical users to select files so im wodering if replacing it is removing some functionality for users

@TheoMcCabe
Copy link
Collaborator

One enhancement to include only if you have time and want to:

One of the biggest issues with file selecting at the moment IMO is when new files are created by gpteng they arent added to the file selection list. could we make it so that any new files created are automatically added to the improve list?

@similato87
Copy link
Collaborator Author

Was there an issue with the file selector box or a reason we dont want to support it? It seems to me the easiest option for non technical users to select files so im wodering if replacing it is removing some functionality for users

Hi @TheoMcCabe ,

We've updated the file selection in GPT-engineer for these reasons:

  1. Multi-file Selection: The new method inspired by issue #725 allows easier multi-file selection across directories, replacing the limited GUI with a more intuitive tree representation.

  2. Persistent Selections: Users' selections are now saved in a .toml file, allowing for easy re-use and modification in subsequent runs without needing to reselect every time.

  3. Direct Editing: Users can directly edit their file selections in their preferred text editors with the new .toml method, leveraging familiar functionalities and enhancing the user experience.

  4. Advanced Configuration: The .toml file allows for more advanced configurations like excluding patterns, using comments for organization, and managing multiple sets of selections, offering greater flexibility and control.

@similato87
Copy link
Collaborator Author

One enhancement to include only if you have time and want to:

One of the biggest issues with file selecting at the moment IMO is when new files are created by gpteng they arent added to the file selection list. could we make it so that any new files created are automatically added to the improve list?

That's a great suggestion!

Currently, all files are set to selected: false by default in the .toml file for user convenience, especially in large projects. This is akin to an unselected checkbox, allowing users to explicitly choose what they want to include.

I agree that automatically adding newly created files by gpteng to the file selection list would enhance the user experience. We can offer this as an option, toggling between automatic inclusion or the current manual selection. This would require a discussion on recognizing and handling patterns of newly generated files, which I'll bring up with @ATheorell to consider implementation strategies.

@similato87
Copy link
Collaborator Author

Using Improved File Selection in GPT-Engineer

Hello team! While I've been in frequent discussions with @ATheorell about our ongoing improvements, I want to take a moment to share with everyone, including @captivus, @TheoMcCabe, @AntonOsika, and @pbharrin, how to use the newly improved file selection function in our project. This update is for issue(#725).

Initial Setup and Selection

  1. Initiate File Selection:

    • Run the command gpt-engineer -i your_project/, where your_project/ is your target folder.
  2. First Run File Selection:

    • On the first run, all files will be marked as unselected in the interactive .toml file.
    • Manually change the status of the files you want to include for improvement by setting them to selected.
    • Your selections will be saved in a file_selection.toml file within the directory for future reference.
    • The files you select will be displayed in a tree in the terminal, like
      image

Subsequent Runs

  1. Future Runs and New File Detection:
    • On subsequent runs, gpt-engineer -i your_project/ will detect new files and add them to the interactive selection interface.
    • Newly detected files and any changes to file selections will be reflected and saved in the file_selection.toml file.
    • This allows you to maintain a consistent set of files for context improvement across sessions.

Below is a GIF showing this process:

file_selector

@TheoMcCabe
Copy link
Collaborator

Thanks @similato87 great work

Copy link
Collaborator

@TheoMcCabe TheoMcCabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great. Thank you

@ATheorell
Copy link
Collaborator

Amazing, and thanks for great documentation and gif @similato87 !

Tiny thing, looks like the lock file has to be updated:

$ poetry install
Installing dependencies from lock file
Warning: poetry.lock is not consistent with pyproject.toml. You may be getting improper dependencies. Run poetry lock [--no-update] to fix it.

No dependencies to install or update

Installing the current project: gpt-engineer (0.2.6)

To fix, run:
´´´
poetry lock --no-update
´´´
When you commit the new lock file, you have my merge approval!

(side note, potentially this inconsistency comes from my recent PR rather than yours. Anyway, the lock should be consistent in the PR).

@similato87
Copy link
Collaborator Author

Amazing, and thanks for great documentation and gif @similato87 !

Tiny thing, looks like the lock file has to be updated:

$ poetry install Installing dependencies from lock file Warning: poetry.lock is not consistent with pyproject.toml. You may be getting improper dependencies. Run poetry lock [--no-update] to fix it.

No dependencies to install or update

Installing the current project: gpt-engineer (0.2.6)

To fix, run: ´´´ poetry lock --no-update ´´´ When you commit the new lock file, you have my merge approval!

(side note, potentially this inconsistency comes from my recent PR rather than yours. Anyway, the lock should be consistent in the PR).

Hi @ATheorell,

I've run the poetry lock --no-update command as suggested and pushed the updated lock file to the PR. Could you please verify if the changes in this commit-175a715 align with what you were expecting?

@ATheorell ATheorell merged commit 09a6870 into gpt-engineer-org:main Jan 7, 2024
3 checks passed
@captivus
Copy link
Collaborator

captivus commented Jan 7, 2024

Thanks for the hard work @similato87 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants