📖 Related article: Evaluating code generation agents—LangChain and CodeChain
This is a demonstration of how to run HumanEval on GPT-3.5 and GPT-4 while taking advantage of LangSmith's tracing and visibility features. The workflow draws on the following repositories; a minimal sketch of how the pieces fit together follows the list:
- human-eval: A fork of OpenAI's HumanEval framework used in this workflow.
- humaneval-results: A repository of HumanEval solutions generated with this workflow.
- codechain: A simple library for generating code with LLMs.
- agenteval: An early version of a framework for evaluating code generation agents.
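
As a rough illustration, the sketch below generates HumanEval completions with a LangChain `ChatOpenAI` model while LangSmith tracing is enabled through environment variables. It is not the exact code used in this workflow: it assumes the human-eval fork keeps the upstream `read_problems`/`write_jsonl` helpers and a recent `langchain-openai` package, and the project name and prompt wording are placeholders.

```python
# Minimal sketch (not the exact workflow code): generate HumanEval completions
# with an OpenAI chat model while LangSmith tracing is turned on.
import os

from human_eval.data import read_problems, write_jsonl  # upstream HumanEval helpers
from langchain_openai import ChatOpenAI

# LangSmith tracing is enabled via environment variables.
# LANGCHAIN_API_KEY and OPENAI_API_KEY are assumed to be set already.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "humaneval-gpt-4"  # hypothetical project name

llm = ChatOpenAI(model="gpt-4", temperature=0)

problems = read_problems()  # {task_id: {"prompt": ..., "test": ..., ...}}
samples = []
for task_id, problem in problems.items():
    # Ask the model to complete each HumanEval function prompt.
    reply = llm.invoke(
        "Complete the following Python function. Return only the code:\n\n"
        + problem["prompt"]
    )
    samples.append({"task_id": task_id, "completion": reply.content})

# Write completions in the JSONL format expected by human-eval's evaluator,
# e.g. `evaluate_functional_correctness samples.jsonl`.
write_jsonl("samples.jsonl", samples)
```

With tracing enabled, each `invoke` call shows up as a run in the configured LangSmith project, so individual HumanEval generations can be inspected alongside the scored results.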