
Investigate crewAI-training, see how it could apply to our agents #298

Closed
evangriffiths opened this issue Jul 2, 2024 · 3 comments

Comments

@evangriffiths
Contributor

https://docs.crewai.com/core-concepts/Training-Crew

Have only had a glance so far. Looks like it does some form of RL to tune the prompts! This is exactly the kind of feature we want for our PM agents to learn over time!

@evangriffiths
Contributor Author

Here's the commit where the training feature was added: crewAIInc/crewAI@175d5b3.

My understanding of how it works:

  • it builds on top of an existing 'human feedback' feature that crewai has, where, if enabled:
    • when an agent has finished its task, it gives the proposed output to the user, who can give feedback
    • the agent re-runs with this feedback to produce the final output
  • crew.train turns this into an iterative process, where it runs the agent normally, with the 'human feedback' feature enabled, but...
  • each iteration, the proposed output, the human feedback, and the final output are appended to a .pkl file
  • at the start of each iteration, this accumulated data is loaded and appended to each agent's prompt
  • then at the end of the training process, the contents of the .pkl file (the "training data") for each agent are synthesized/summarized into useful learnings, and these are saved to a separate .pkl file
  • now when the crew is run normally, outside of the train process, it loads the contents of the second .pkl file into the agent's prompt (a rough sketch of this whole flow is below)
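
To make the above concrete, here's a minimal, self-contained sketch of what I think the training loop boils down to. None of this is the actual crewAI code or API -- the function names (`run_agent`, `ask_human`, `summarize`), the record keys, and the file names are all placeholders I've made up for illustration:

```python
import pickle
from pathlib import Path

# File names are placeholders, not the ones crewAI actually uses.
TRAINING_FILE = Path("training_data.pkl")         # raw per-iteration records
LEARNINGS_FILE = Path("trained_agents_data.pkl")  # distilled learnings for normal runs


def run_agent(prompt: str) -> str:
    """Stand-in for the real agent/LLM call."""
    return f"<output for: {prompt[:40]}...>"


def ask_human(proposed_output: str) -> str:
    """Stand-in for crewAI's human-in-the-loop feedback step."""
    return input(f"Proposed output:\n{proposed_output}\nYour feedback: ")


def summarize(records: list[dict]) -> str:
    """Stand-in for the LLM call that distils raw feedback into 'learnings'."""
    return "\n".join(r["human_feedback"] for r in records)


def train(base_prompt: str, n_iterations: int) -> None:
    records: list[dict] = []
    for _ in range(n_iterations):
        # Feedback collected so far is folded back into the prompt each iteration.
        prompt = base_prompt
        if records:
            prompt += "\n\nPrevious feedback:\n" + "\n".join(
                r["human_feedback"] for r in records
            )

        proposed = run_agent(prompt)
        feedback = ask_human(proposed)
        final = run_agent(prompt + "\n\nHuman feedback: " + feedback)

        records.append(
            {
                "initial_output": proposed,
                "human_feedback": feedback,
                "improved_output": final,
            }
        )
        TRAINING_FILE.write_bytes(pickle.dumps(records))

    # End of training: distil the raw records into learnings and save them
    # separately; normal runs load this second file into the agent's prompt.
    LEARNINGS_FILE.write_bytes(pickle.dumps(summarize(records)))
```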

All in all, this is quite similar to what we were trying to do with RememberPastActions, but with some important differences:

  • it uses human-in-the-loop feedback to evaluate agent actions -- we don't currently have any explicit functionality for the agent to evaluate its strategy.
  • it is much more structured -- we just let the agent decide when to use RememberPastActions, whereas here the human gives feedback every iteration, and that feedback is appended to the prompt in a fixed way.

I haven't tried crewai.train, so I don't know how well it works in practice, but I think we could definitely take inspiration from it for the general agent.
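
For example, the inference-time half of the pattern (the part we'd borrow for our agents) could look something like the sketch below. Again, this is just an illustration under my assumptions: the file name and prompt wording are placeholders, and how we'd actually hook it into our agent's prompt construction is an open question.

```python
import pickle
from pathlib import Path

# Placeholder path for the distilled learnings produced during training.
LEARNINGS_FILE = Path("trained_agents_data.pkl")


def build_prompt(base_prompt: str) -> str:
    """Prepend any stored learnings to the agent's prompt on a normal run."""
    if not LEARNINGS_FILE.exists():
        return base_prompt
    learnings: str = pickle.loads(LEARNINGS_FILE.read_bytes())
    return (
        base_prompt
        + "\n\nLearnings from past human feedback (apply these where relevant):\n"
        + learnings
    )
```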

@evangriffiths
Contributor Author

evangriffiths commented Jul 18, 2024

Overlaps with #207. We should think about these issues together.

@evangriffiths
Contributor Author

I've written up an idea for how to apply the learnings from this feature in a new issue, #370, so I'm closing this one.

@kongzii removed this from the General Agent - Task Queue milestone on Aug 7, 2024