
Conversation

mikecann
Contributor

This PR adds the ability to "disable Braintrust", that is, to entirely skip reporting results to Braintrust and to skip using the Braintrust AI proxy.

The motivation for this is that I wanted to be able to run some experiments locally without polluting Braintrust or using up Braintrust tokens.

I was also regularly getting frustrating errors from Braintrust:

Retrying API request experiment-comparison2 {'experiment_id': 'ef7e6686-1643-4f75-9390-06322be77788', 'base_experiment_id': '51b388e1-383d-4476-886c-ac175c765897'} 504 {"message": "Endpoint request timed out"}

I am also planning a "self-improving" AI loop that would run over an extended period of time, as per #31, and I don't want to report to Braintrust during that process.

There are more changes in here than I would like: while I was at it, I made some additional changes that help with tracking down the cause of eval failures.

[screenshot]

There are now more high-level log lines reported to the console so you know what is happening, and when an eval finishes you are given a link to view the eval output directory.

[screenshot]

Inside that directory there is now a run.log file that records what happened, so you can track down failures. This will be very important for a self-improving agent.

There is also a nice "Eval Summary" reported to the console each run:

[screenshot]

To be clear, this PR should not affect normal running and reporting to Braintrust; I have tested that it still works (https://www.braintrust.dev/app/Convex/p/Convex%20Coding). It simply adds a new env var that you can optionally use to disable Braintrust.
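
The gating boils down to a single environment check along these lines (the variable and helper names here are illustrative, not necessarily what the diff uses):

```python
import os

def braintrust_disabled() -> bool:
    # Hypothetical env var name, shown only to illustrate the idea
    return os.environ.get("DISABLE_BRAINTRUST", "").lower() in ("1", "true", "yes")

# Reporting and the AI proxy are only used when Braintrust is enabled:
# if not braintrust_disabled():
#     report_results_to_braintrust(results)   # hypothetical helper
```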

mikecann requested a review from jordanhunt22 on August 21, 2025 05:18
Collaborator

jordanhunt22 left a comment

In general, looks good, but a decent number of comments about code style inside.

Comment on lines +4 to +17
def sanitize_output(text: str) -> str:
    try:
        # Remove ANSI CSI, OSC (BEL/ST terminated), hyperlinks, and 7-bit C1 escapes
        patterns = [
            r"\x1B\[[0-?]*[ -/]*[@-~]",  # CSI sequences
            r"\x1B\][^\x07]*\x07",       # OSC sequences terminated by BEL
            r"\x1B\]8;;.*?\x1B\\",       # OSC 8 hyperlinks (ST-terminated)
            r"\x1B[@-Z\\-_]",            # 7-bit C1 escapes
        ]
        out = text
        for p in patterns:
            out = re.sub(p, "", out)
        return out
    except Exception:
Collaborator

what is the reason for doing this?

Contributor Author

Windows thing. I was getting strange character outputs.

Comment on lines 20 to 30
if model.provider == ModelProvider.OPENAI:
    url = "https://api.openai.com/v1"
elif model.provider == ModelProvider.ANTHROPIC:
    url = "https://api.anthropic.com/v1"
elif model.provider == ModelProvider.TOGETHER:
    url = "https://api.together.xyz/v1"
elif model.provider == ModelProvider.GOOGLE:
    url = "https://generativelanguage.googleapis.com/v1beta"
elif model.provider == ModelProvider.XAI:
    url = "https://api.x.ai/v1"
else:
Collaborator

a match statement here would probably be a bit cleaner

Contributor Author

Sure, yeah, I unfortunately don't know a lot about Python; I was wondering if this would be better as some sort of pattern match. I'll get that fixed up.
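
Something along these lines, I think (requires Python 3.10+; the else branch isn't visible in the diff above, so the fallback here is just a guess):

```python
match model.provider:
    case ModelProvider.OPENAI:
        url = "https://api.openai.com/v1"
    case ModelProvider.ANTHROPIC:
        url = "https://api.anthropic.com/v1"
    case ModelProvider.TOGETHER:
        url = "https://api.together.xyz/v1"
    case ModelProvider.GOOGLE:
        url = "https://generativelanguage.googleapis.com/v1beta"
    case ModelProvider.XAI:
        url = "https://api.x.ai/v1"
    case _:
        raise ValueError(f"Unsupported provider: {model.provider}")
```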

runner/scorer.py Outdated

try:
    generate_code(output_project_dir_abs)
    print(f"[{category}/{name}] Running convex codegen", flush=True)
Collaborator

Should these log lines be behind an env variable? It seems like they would be pretty noisy if they are on by default, especially if you run these tests locally.

Contributor Author

Okay, I'll put them behind an env var, but I really wanted more visibility into what the run is doing; otherwise it would seemingly hang for a long time with no output.
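
Roughly what I have in mind (the names are placeholders, not necessarily what ends up in the PR):

```python
import os

# Opt-in verbosity for per-step progress lines
VERBOSE = os.environ.get("EVAL_VERBOSE", "").lower() in ("1", "true", "yes")

def log_progress(message: str) -> None:
    # Only echo progress lines to the console when explicitly enabled
    if VERBOSE:
        print(message, flush=True)

# e.g. log_progress(f"[{category}/{name}] Running convex codegen")
```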

runner/scorer.py Outdated
def typecheck_code(project_dir):
    results = []
    convex_dir = os.path.abspath(os.path.join(project_dir, "convex"))
    cmd1 = ["bunx", "tsc", "-noEmit", "-p", convex_dir]
Collaborator

Can we add better names here instead of cmd1 and cmd2?

Contributor Author

sure!
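
For example, the command lists could just be named for what they run (illustrative names only; cmd2 isn't shown in the excerpts above):

```python
# typecheck_code
typecheck_cmd = ["bunx", "tsc", "-noEmit", "-p", convex_dir]

# lint_code
lint_cmd = ["bunx", "eslint", "-c", eslint_config, "convex"]
```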

runner/scorer.py Outdated
def lint_code(project_dir):
    results = []
    eslint_config = os.path.abspath("eslint.config.mjs")
    cmd1 = ["bunx", "eslint", "-c", eslint_config, "convex"]
Collaborator

same comment as above

runner/scorer.py Outdated

try:
install_dependencies(output_project_dir_abs)
def run_command_step(log_path, handler, prefix, error_label, *, cmd_prefix=""):
Collaborator

Should we define this function in the same place as log_cmd_results down below?

Contributor Author

sure!

@mikecann
Contributor Author

Okay, thanks for the review @jordanhunt22, I have now implemented the things mentioned above :)

mikecann mentioned this pull request on Aug 27, 2025
Collaborator

jordanhunt22 left a comment

LGTM besides a few small comments. Great work!

- Per-eval result with ✅/❌ and a clickable output dir
- `local_results.jsonl` plus a `run.log` in each eval’s output directory

Optional Convex summary posting (still local mode): set both `CONVEX_EVAL_ENDPOINT` and `CONVEX_AUTH_TOKEN`.
Collaborator

we should probably disable this when running locally
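
One way to gate it, using the two env vars mentioned in the docs above (the local-run flag and helper name here are illustrative):

```python
import os

def should_post_convex_summary(is_local_run: bool) -> bool:
    # Never post for purely local runs; otherwise require explicit opt-in via both env vars
    if is_local_run:
        return False
    return bool(os.environ.get("CONVEX_EVAL_ENDPOINT")) and bool(os.environ.get("CONVEX_AUTH_TOKEN"))
```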


url = matching_asset["browser_download_url"]
print("Downloading:", url)
log_info("Downloading:", url)
Collaborator

I think we should still always print out the log lines that we had previously with print, but for all the new ones, use the log_info pattern. The logging before felt useful for seeing progress.
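
i.e. roughly this split, taking the line above as the example (the second line here is illustrative, not from the diff):

```python
# This line existed before the PR, so it keeps its plain print for progress visibility
print("Downloading:", url)

# Genuinely new log lines use the log_info pattern instead, presumably captured in run.log
log_info("Download complete:", url)
```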

mikecann merged commit 9d4fb0f into main on Aug 29, 2025
@mikecann
Contributor Author

Whoops, sorry, I did fix the things you mentioned there but did it on the wrong branch; I'll commit those changes directly to main now.
