Skip to content

Conversation

@timmc-edx
Copy link
Contributor

@timmc-edx timmc-edx commented Apr 14, 2025

  • Ensure we always run both local and remote arms of the codejail service darklaunch, even on unexpected errors.
  • Include exceptions in logged info.
  • Fix bug where local exec polluted remote globals.

Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that no test changes make me wonder if we have poor test coverage of our darklaunch features. Could that be enhanced?

@timmc-edx timmc-edx changed the title Restructure codejail darklaunch to better capture errors Restructure codejail darklaunch to better capture errors, fix globals pollution Apr 15, 2025
),
call(
"Local execution in darklaunch mode produces globals={}, "
"emsg='division by zero', exception=ZeroDivisionError('division by zero')"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this ZeroDivisionError?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...huh. That's left over from a previous iteration of the code. I'll have to figure out why that didn't cause an assertion error. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fixed now. I pushed up a commit, but checks aren't running. I'm going to rebase onto master and squash the fixups -- maybe this will also cause actions to re-run?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(github is being weird and not showing the fix commit from before I squash it, so here's an alternative link to see the latest change: fc79748)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to assume that the old assertion for ZeroDivisionError started failing after you made your test fix, or did you change everything together without ensuring that the assertion actually works?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I left the bug in place long enough to check that it failed under the new structure.

It's so weird. Before making the fix, I set a breakpoint to find out what was going on, and I still don't understand. The assertion for the logs was raising an AssertionError, and it propagated up to the pytest.raises... but somehow pytest.raises was saying "yup, that's a BaseException with message matching unexpected".

Now that I'm not using pytest.raises, the assertion error propagates normally.

…exec

During dark launch of remote codejail, we want to ensure we always run both
local and remote execution -- otherwise we're missing data for the remote
side in an important situation.

This will help answer the question of whether the unexpected exception
happens on both sides, even though it may not look exactly the same due to
differences in how unexpected errors are handled.

An example input that provokes this in unsafe execution mode is
`raise BaseException("hi")`; in safe execution mode, printing to
`sys.__stdout__` should also produce an appropriate error.
- Catch all exceptions, not just Exception, to better prevent errors from
  interfering with mainline responses.
- Introduce a separate try block around the monitoring code so that bugs
  there don't cause issues.
- Print exception information as well for both sides (but only if not a
  SafeExecException, which is redundant with emsg).

Some formatting changes to log messages as well.

Example outputs:

For `1/0`:

```
2025-04-14 17:26:34,239 INFO 10232 [xmodule.capa.safe_exec.safe_exec] [user 3] [ip 172.18.0.1] safe_exec.py:240 - Remote execution in darklaunch mode produces globals={'expect': None, 'ans': '1/0'}, emsg=None, exception=None
2025-04-14 17:26:34,239 INFO 10232 [xmodule.capa.safe_exec.safe_exec] [user 3] [ip 172.18.0.1] safe_exec.py:245 - Local execution in darklaunch mode produces globals={'expect': None, 'ans': '1/0'}, emsg='ZeroDivisionError: division by zero', exception=None
```

For `raise BaseException("hi")`:

```
2025-04-14 17:26:13,359 INFO 10232 [xmodule.capa.safe_exec.safe_exec] [user 3] [ip 172.18.0.1] safe_exec.py:240 - Remote execution in darklaunch mode produces globals={'expect': None, 'ans': 'raise BaseException("hi")'}, emsg=None, exception=None
2025-04-14 17:26:13,359 INFO 10232 [xmodule.capa.safe_exec.safe_exec] [user 3] [ip 172.18.0.1] safe_exec.py:245 - Local execution in darklaunch mode produces globals={'expect': None, 'ans': 'raise BaseException("hi")'}, emsg='hi', exception=BaseException('hi')
```

With codejail-service down, and `out = 1 + 2`:

```
2025-04-14 17:30:28,597 INFO 12484 [xmodule.capa.safe_exec.safe_exec] [user 3] [ip 172.18.0.1] safe_exec.py:241 - Remote execution in darklaunch mode produces globals={'expect': None, 'ans': 'out = 1 + 2', 'out': 3, 'cfn_return': {'input_list': [{'ok': True, 'msg': 'Output:\n3', 'grade_decimal': 1}]}}, emsg=None, exception=CodejailServiceUnavailable('Codejail API Service is unavailable. Please try again in a few minutes.')
2025-04-14 17:30:28,597 INFO 12484 [xmodule.capa.safe_exec.safe_exec] [user 3] [ip 172.18.0.1] safe_exec.py:246 - Local execution in darklaunch mode produces globals={'expect': None, 'ans': 'out = 1 + 2', 'out': 3, 'cfn_return': {'input_list': [{'ok': True, 'msg': 'Output:\n3', 'grade_decimal': 1}]}}, emsg=None, exception=None
```
- Separate test for misconfiguration
- Add helper method for generic dark launch testing
- Test two darklaunch scenarios: Globals interference, and error that would
  previously have caused the remote side not to run
- Rename mocks to have our usual `mock_` prefix
We were running local exec before making the copy of globals_dict for
remote_exec, so remote exec has been getting a polluted version of the
globals.
Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question to ensure the test does what we want now. Other than that, looks good. Thanks.

@timmc-edx timmc-edx merged commit 3fac56b into master Apr 16, 2025
49 checks passed
@timmc-edx timmc-edx deleted the timmc/cj-dl-more branch April 16, 2025 13:51
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

1 similar comment
@edx-pipeline-bot
Copy link
Contributor

2U Release Notice: This PR has been deployed to the edX production environment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants