Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create machine logs to support AI retraining #585

Merged
merged 14 commits into from
May 24, 2024

Conversation

jlewi
Copy link
Contributor

@jlewi jlewi commented May 23, 2024

To support collecting implicit feedback and retraining an AI we need to create machine readable logs that contain the actual command executed.

The design is described in #574.

With this change and ai logs enabled here is what the logs look like.

On the console a sample log message looks like the following in dev mode

2024-05-22T18:05:57.469-0700	INFO	cmd/server.go:98	started listening	{"addr": "127.0.0.1:7890"}
{"level":"info","ts":1716508179.861355,"caller":"runner/service.go:231","msg":"received initial request","_id":"01HYKXRFCN5W0NAM6A0ZT1HA36","knownID":"01HYKXRFCM4635SGV5FZ7K519R","req":{"commandMode":"COMMAND_MODE_INLINE_SHELL","commands":["echo 1","sleep 1","echo 2"],"knownId":"01HYKXRFCM4635SGV5FZ7K519R","programName":"bash"}}

In non dev mode the console output is

{"level":"info","ts":1716426636.145299,"caller":"cmd/server.go:98","msg":"started listening","addr":"127.0.0.1:7890"}

In the JSON log we get a log message like the following

{"level":"info","ts":1716426810.6269372,"caller":"cmd/server.go:98","function":"github.com/stateful/runme/v3/internal/cmd.serverCmd.func2","msg":"started listening","addr":"127.0.0.1:7890"}
{"level":"info","ts":1716426817.861532,"caller":"runner/service.go:207","function":"github.com/stateful/runme/v3/internal/runner.(*runnerService).Execute","msg":"running Execute in runnerService","_id":"01HYHG5GA5YQDWGN4RG5DQ0B29"}
{"level":"info","ts":1716508331.223864,"caller":"runner/service.go:231","function":"github.com/stateful/runme/v3/internal/runner.(*runnerService).Execute","msg":"received initial request","_id":"01HYKXX36Q3DDH8HBK18STME3M","req":{"commands":["ls"]}}

Note that "req" is logged as a nested json object; its not string field containing the serialized JSON of the request.

Prior to this change here's what the console logs look like.

In dev mode:

2024-05-22T18:02:20.122-0700	INFO	cmd/server.go:97	started listening	{"addr": "127.0.0.1:7890"}

In non dev mode:

{"level":"info","ts":1716426314.946205,"caller":"cmd/server.go:97","msg":"started listening","addr":"127.0.0.1:7890"}
  • Rename GetDefaultConfigHome -> GetUserConfigHome
    • This makes it clear the directory is for user configuration and not configuration that should be stored in project/repository configuration.

Related to #574

jlewi added 5 commits May 22, 2024 17:58
To support collecting implicit feedback and retraining an AI we need to create
machine readable logs that contain the actual command executed.

The design is described in stateful#574.

With this change and ai logs enabled here is what the logs look like.

On the console a sample log message looks like the following

1.716425780157274e+09	info	runner/service.go:233	received initial request	{"_id": "01HYHF5TXWTPV0ZGSQP9WKMZP9", "req": "{\"commands\":[\"ls\"]}"}

In the JSON log we get a log message like the following

{"severity":"info","time":1716425780.157274,"caller":"runner/service.go:233","function":"github.com/stateful/runme/v3/internal/runner.(*runnerService).Execute","message":"received initial request","_id":"01HYHF5TXWTPV0ZGSQP9WKMZP9","req":"{\"commands\":[\"ls\"]}"}

Related to stateful#574
@jlewi jlewi marked this pull request as ready for review May 23, 2024 01:36
@jlewi
Copy link
Contributor Author

jlewi commented May 23, 2024

@sourishkrout This is ready for you.

@sourishkrout sourishkrout self-requested a review May 23, 2024 15:06
Copy link
Member

@sourishkrout sourishkrout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense 👍; I've added some comments I'd like to address before merging.

cmd.Flags().StringVar(&tlsDir, "tls", defaultTLSDir, "Directory in which to generate TLS certificates & use for all incoming and outgoing messages")

cmd.Flags().StringVar(&configDir, configDirF, GetDefaultConfigHome(), "If ai logs is enabled logs will be written to ${config-dir}/logs")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to reuse fLogFilePath as base? The config home is a bit weird on darwin (and frankly, tentative). Also, as part of the beta commands (incomplete alpha) we are migrating config away from a central location to a runme.yaml that resides inside a project/repository.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think we can break this down into separate questions

  1. What's the default location for logs? e.g. is it ${CONFIGDIR}/logs?
  2. What flags control the location of the logs?

Regarding the first question.

  • I don't think we want the default value to be a temporary directory because the logs are too valuable to be lost when the temporary directory gets cleaned up

we are migrating config away from a central location to a runme.yaml that resides inside a project/repository

Are you suggesting the logs get stored relative to the project/repo specific runme.yaml? Or is this just an indication that the runme user configuration is going away in favor of per project/repository configuration.

Either way; I think of log location as being configured per user not per project/repository. Which implies we need some per user directory.

The config home is a bit weird on darwin

What about ${HOME}/.runme for runme user configuration and ${HOME}/.runme/logs

Two; I think once we agree on where logs should be stored we can decide if it makes sense to change fLogFilePath to use it and if so use it to compute the location of the AI Logs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point here, especially that it's a user vs a server log. That really answers both questions 1. & 2. for me. Here's my suggestion, we roll with your code change as is, however, we rename GetDefaultConfigHome to GetUserConfigHome to make the difference in base directories more apparent and future proof @jlewi?

Ideally GetDefaultConfigHome / GetUserConfigHome would actually be rooted in the home directory (on all platforms) but we can fix that at a later time. I'll create a bug ticket for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

logger.Error("failed to marshal request", zap.Error(err))
}

// We want to always log the request because it is used for AI training.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to add a tests for his in https://github.com/jlewi/runme/blob/jlewi/logs/internal/runner/service_test.go - the reason being is that I'm currently in the middle of charting a migration from runnerv1 (this impl) to runnerv2 (alpha; future) and the tests serve as a spec to not have things fall thru the cracks.

runnerv2 lives over here https://github.com/jlewi/runme/tree/jlewi/logs/internal/runnerv2service but instead of syncing the service impl, I'd rather have a comprehensive test suite in runnerv1 as guard rails for filling in all the gaps.

I'm happy to take a stab at adding a test case if you need a template.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me take a look at this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sourishkrout added a test. The test passes locally when I run it individually and the tests appear to be green in CI. However, as noted in #587 I'm seeing test failures when running locally

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look at the test failures. Some of the newly introduced tests for runnerv2 have flaking issue which is something we're working on separately. They haven't gone through the hardening of runnerv1 tests. I'll take a look to get some clarity here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies for the #587 expedition. After rerunning the C/I job it looks like it's indeed the flaking issue we are working on.

@jlewi
Copy link
Contributor Author

jlewi commented May 24, 2024

@sourishkrout PTAL; I did the rename of the logging directory.

@sourishkrout sourishkrout self-requested a review May 24, 2024 17:06
Copy link
Member

@sourishkrout sourishkrout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✅

@sourishkrout sourishkrout merged commit 7d1c469 into stateful:main May 24, 2024
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants