Mikec/disable_braintrust #76
Conversation
In general, looks good, but there are a decent number of comments about code style inside.
```python
def sanitize_output(text: str) -> str:
    try:
        # Remove ANSI CSI, OSC (BEL/ST terminated), hyperlinks, and 7-bit C1 escapes
        patterns = [
            r"\x1B\[[0-?]*[ -/]*[@-~]",  # CSI sequences
            r"\x1B\][^\x07]*\x07",       # OSC sequences terminated by BEL
            r"\x1B\]8;;.*?\x1B\\",       # OSC 8 hyperlinks (ST-terminated)
            r"\x1B[@-Z\\-_]",            # 7-bit C1 escapes
        ]
        out = text
        for p in patterns:
            out = re.sub(p, "", out)
        return out
    except Exception:
```
what is the reason for doing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Windows thing. I was getting strange character outputs
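For illustration, a quick sketch of the intended behavior (the input string here is made up):

```python
# Hypothetical check: the CSI color codes around "error" are stripped,
# leaving only the plain text behind.
colored = "\x1b[31merror\x1b[0m: build failed"
assert sanitize_output(colored) == "error: build failed"
```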
runner/models/model_codegen.py
Outdated
```python
if model.provider == ModelProvider.OPENAI:
    url = "https://api.openai.com/v1"
elif model.provider == ModelProvider.ANTHROPIC:
    url = "https://api.anthropic.com/v1"
elif model.provider == ModelProvider.TOGETHER:
    url = "https://api.together.xyz/v1"
elif model.provider == ModelProvider.GOOGLE:
    url = "https://generativelanguage.googleapis.com/v1beta"
elif model.provider == ModelProvider.XAI:
    url = "https://api.x.ai/v1"
else:
```
a match statement here would probably be a bit cleaner
Sure, yeah, I unfortunately don't know a lot about Python; I was wondering if this would be better as some sort of pattern match. I'll get that fixed up.
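For reference, a minimal sketch of what the match-based version could look like (Python 3.10+); the `case _` arm is an assumption, since the original `else:` branch is truncated in the diff:

```python
match model.provider:
    case ModelProvider.OPENAI:
        url = "https://api.openai.com/v1"
    case ModelProvider.ANTHROPIC:
        url = "https://api.anthropic.com/v1"
    case ModelProvider.TOGETHER:
        url = "https://api.together.xyz/v1"
    case ModelProvider.GOOGLE:
        url = "https://generativelanguage.googleapis.com/v1beta"
    case ModelProvider.XAI:
        url = "https://api.x.ai/v1"
    case _:
        # Assumed fallback; the original else-branch is not shown above.
        raise ValueError(f"Unsupported provider: {model.provider}")
```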
runner/scorer.py
Outdated
```python
try:
    generate_code(output_project_dir_abs)
    print(f"[{category}/{name}] Running convex codegen", flush=True)
```
Should these log lines be behind an env variable? It seems like they would be pretty noisy if they are on by default, especially if you run these tests locally.
Okay, I'll put them behind an env var, but I really wanted more visibility into what the run is doing; otherwise it would seemingly hang for a long time with no output.
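A minimal sketch of how such gating could work; the env var name here is an assumption, and the actual `log_info` helper in the PR may differ:

```python
import os

# EVAL_VERBOSE is an assumed name for illustration only.
VERBOSE = os.environ.get("EVAL_VERBOSE", "").lower() in ("1", "true", "yes")

def log_info(*args):
    # Emit progress lines only when verbose logging is enabled.
    if VERBOSE:
        print(*args, flush=True)
```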
runner/scorer.py
Outdated
```python
def typecheck_code(project_dir):
    results = []
    convex_dir = os.path.abspath(os.path.join(project_dir, "convex"))
    cmd1 = ["bunx", "tsc", "-noEmit", "-p", convex_dir]
```
Can we add better names here instead of `cmd1` and `cmd2`?
sure!
runner/scorer.py
Outdated
```python
def lint_code(project_dir):
    results = []
    eslint_config = os.path.abspath("eslint.config.mjs")
    cmd1 = ["bunx", "eslint", "-c", eslint_config, "convex"]
```
same comment as above
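For instance, descriptive names for the two commands might look like this (the names are hypothetical; the PR may choose different ones):

```python
# Hypothetical renames for cmd1 in typecheck_code and lint_code.
typecheck_cmd = ["bunx", "tsc", "-noEmit", "-p", convex_dir]
lint_cmd = ["bunx", "eslint", "-c", eslint_config, "convex"]
```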
runner/scorer.py
Outdated
```python
try:
    install_dependencies(output_project_dir_abs)

def run_command_step(log_path, handler, prefix, error_label, *, cmd_prefix=""):
```
Should we define this function in the same place as `log_cmd_results` down below?
sure!
Okay, thanks for the review @jordanhunt22, I have now implemented the things mentioned above :)
LGTM besides a few small comments. Great work!
- Per-eval result with ✅/❌ and a clickable output dir
- `local_results.jsonl` plus a `run.log` in each eval’s output directory

Optional Convex summary posting (still local mode): set both `CONVEX_EVAL_ENDPOINT` and `CONVEX_AUTH_TOKEN`.
we should probably disable this when running locally
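A sketch of what that guard could look like; `local_mode`, `post_summary`, and `summary` are hypothetical stand-ins for whatever the runner actually uses:

```python
import os

# Hypothetical flag; the runner's real local-mode signal may differ.
local_mode = True

endpoint = os.environ.get("CONVEX_EVAL_ENDPOINT")
token = os.environ.get("CONVEX_AUTH_TOKEN")

# Post only when both variables are set, and skip entirely in local mode.
if endpoint and token and not local_mode:
    post_summary(endpoint, token, summary)  # post_summary/summary are hypothetical
```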
```python
url = matching_asset["browser_download_url"]
print("Downloading:", url)
log_info("Downloading:", url)
```
I think we should still always print out the log lines that we had previously with `print`, but for all the new ones, use the `log_info` pattern. The logging before felt like it was useful to see progress.
Whoops, sorry, I did fix those things you mentioned there but did it on the wrong branch; I'll commit those changes directly to main now.
This PR adds the ability to "disable Braintrust", that is, to entirely skip reporting results to Braintrust and to avoid using the Braintrust AI proxy.
The motivation for this is that I wanted to be able to run some experiments locally without polluting Braintrust or using up Braintrust tokens.
I was also regularly getting frustrating errors from Braintrust.
I am also planning a "self-improving" AI loop that would run over an extended period of time (see #31), and thus I don't want to report to Braintrust during that process.
There are more changes in here than I would like, but while I was at it I made some additional changes that help with tracking down the cause of eval failures. Now more high-level log lines are reported to the console so you know what is happening, and when an eval finishes you are given a link to view the eval output directory.
Inside that directory is now a `run.log` file that gives you a log of what happened so you can track down the failures. This will be very important for a self-improving agent. There is also a nice "Eval Summary" reported to the console each run.
To be clear, this PR should not affect normal running and reporting to Braintrust; I have tested that it still works (https://www.braintrust.dev/app/Convex/p/Convex%20Coding). It simply adds a new env var that you can optionally use to disable Braintrust.
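For illustration only, the opt-out check might look something like this; `DISABLE_BRAINTRUST` is a guessed name, and the PR defines the actual variable:

```python
import os

# Guessed env var name; see the PR diff for the real one.
braintrust_disabled = os.environ.get("DISABLE_BRAINTRUST", "").lower() in ("1", "true")

if not braintrust_disabled:
    report_to_braintrust(results)  # hypothetical reporting call
```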