Create machine logs to support AI retraining #585

jlewi · 2024-05-23T01:03:32Z

To support collecting implicit feedback and retraining an AI we need to create machine readable logs that contain the actual command executed.

The design is described in #574.

With this change and ai logs enabled here is what the logs look like.

On the console a sample log message looks like the following in dev mode

2024-05-22T18:05:57.469-0700	INFO	cmd/server.go:98	started listening	{"addr": "127.0.0.1:7890"}
{"level":"info","ts":1716508179.861355,"caller":"runner/service.go:231","msg":"received initial request","_id":"01HYKXRFCN5W0NAM6A0ZT1HA36","knownID":"01HYKXRFCM4635SGV5FZ7K519R","req":{"commandMode":"COMMAND_MODE_INLINE_SHELL","commands":["echo 1","sleep 1","echo 2"],"knownId":"01HYKXRFCM4635SGV5FZ7K519R","programName":"bash"}}

In non dev mode the console output is

{"level":"info","ts":1716426636.145299,"caller":"cmd/server.go:98","msg":"started listening","addr":"127.0.0.1:7890"}

In the JSON log we get a log message like the following

{"level":"info","ts":1716426810.6269372,"caller":"cmd/server.go:98","function":"github.com/stateful/runme/v3/internal/cmd.serverCmd.func2","msg":"started listening","addr":"127.0.0.1:7890"}
{"level":"info","ts":1716426817.861532,"caller":"runner/service.go:207","function":"github.com/stateful/runme/v3/internal/runner.(*runnerService).Execute","msg":"running Execute in runnerService","_id":"01HYHG5GA5YQDWGN4RG5DQ0B29"}
{"level":"info","ts":1716508331.223864,"caller":"runner/service.go:231","function":"github.com/stateful/runme/v3/internal/runner.(*runnerService).Execute","msg":"received initial request","_id":"01HYKXX36Q3DDH8HBK18STME3M","req":{"commands":["ls"]}}

Note that "req" is logged as a nested json object; its not string field containing the serialized JSON of the request.

Prior to this change here's what the console logs look like.

In dev mode:

2024-05-22T18:02:20.122-0700	INFO	cmd/server.go:97	started listening	{"addr": "127.0.0.1:7890"}

In non dev mode:

{"level":"info","ts":1716426314.946205,"caller":"cmd/server.go:97","msg":"started listening","addr":"127.0.0.1:7890"}

Rename GetDefaultConfigHome -> GetUserConfigHome
- This makes it clear the directory is for user configuration and not configuration that should be stored in project/repository configuration.

Related to #574

To support collecting implicit feedback and retraining an AI we need to create machine readable logs that contain the actual command executed. The design is described in stateful#574. With this change and ai logs enabled here is what the logs look like. On the console a sample log message looks like the following 1.716425780157274e+09 info runner/service.go:233 received initial request {"_id": "01HYHF5TXWTPV0ZGSQP9WKMZP9", "req": "{\"commands\":[\"ls\"]}"} In the JSON log we get a log message like the following {"severity":"info","time":1716425780.157274,"caller":"runner/service.go:233","function":"github.com/stateful/runme/v3/internal/runner.(*runnerService).Execute","message":"received initial request","_id":"01HYHF5TXWTPV0ZGSQP9WKMZP9","req":"{\"commands\":[\"ls\"]}"} Related to stateful#574

jlewi · 2024-05-23T01:36:55Z

@sourishkrout This is ready for you.

sourishkrout

This makes sense 👍; I've added some comments I'd like to address before merging.

sourishkrout · 2024-05-23T15:42:18Z

internal/cmd/server.go

 	cmd.Flags().StringVar(&tlsDir, "tls", defaultTLSDir, "Directory in which to generate TLS certificates & use for all incoming and outgoing messages")
-
+	cmd.Flags().StringVar(&configDir, configDirF, GetDefaultConfigHome(), "If ai logs is enabled logs will be written to ${config-dir}/logs")


Any reason not to reuse fLogFilePath as base? The config home is a bit weird on darwin (and frankly, tentative). Also, as part of the beta commands (incomplete alpha) we are migrating config away from a central location to a runme.yaml that resides inside a project/repository.

So I think we can break this down into separate questions

What's the default location for logs? e.g. is it ${CONFIGDIR}/logs?

What flags control the location of the logs?

Regarding the first question.

I don't think we want the default value to be a temporary directory because the logs are too valuable to be lost when the temporary directory gets cleaned up

we are migrating config away from a central location to a runme.yaml that resides inside a project/repository

Are you suggesting the logs get stored relative to the project/repo specific runme.yaml? Or is this just an indication that the runme user configuration is going away in favor of per project/repository configuration.

Either way; I think of log location as being configured per user not per project/repository. Which implies we need some per user directory.

The config home is a bit weird on darwin

What about ${HOME}/.runme for runme user configuration and ${HOME}/.runme/logs

Two; I think once we agree on where logs should be stored we can decide if it makes sense to change fLogFilePath to use it and if so use it to compute the location of the AI Logs.

I see your point here, especially that it's a user vs a server log. That really answers both questions 1. & 2. for me. Here's my suggestion, we roll with your code change as is, however, we rename GetDefaultConfigHome to GetUserConfigHome to make the difference in base directories more apparent and future proof @jlewi?

Ideally GetDefaultConfigHome / GetUserConfigHome would actually be rooted in the home directory (on all platforms) but we can fix that at a later time. I'll create a bug ticket for it.

sourishkrout · 2024-05-23T15:46:48Z

internal/runner/service.go

+		logger.Error("failed to marshal request", zap.Error(err))
+	}
+
+	// We want to always log the request because it is used for AI training.


We should try to add a tests for his in https://github.com/jlewi/runme/blob/jlewi/logs/internal/runner/service_test.go - the reason being is that I'm currently in the middle of charting a migration from runnerv1 (this impl) to runnerv2 (alpha; future) and the tests serve as a spec to not have things fall thru the cracks.

runnerv2 lives over here https://github.com/jlewi/runme/tree/jlewi/logs/internal/runnerv2service but instead of syncing the service impl, I'd rather have a comprehensive test suite in runnerv1 as guard rails for filling in all the gaps.

I'm happy to take a stab at adding a test case if you need a template.

Let me take a look at this.

@sourishkrout added a test. The test passes locally when I run it individually and the tests appear to be green in CI. However, as noted in #587 I'm seeing test failures when running locally

I'll take a look at the test failures. Some of the newly introduced tests for runnerv2 have flaking issue which is something we're working on separately. They haven't gone through the hardening of runnerv1 tests. I'll take a look to get some clarity here.

Apologies for the #587 expedition. After rerunning the C/I job it looks like it's indeed the flaking issue we are working on.

jlewi · 2024-05-24T15:43:48Z

@sourishkrout PTAL; I did the rename of the logging directory.

sourishkrout

LGTM ✅

jlewi added 5 commits May 22, 2024 17:58

Fix lint and logging field names.

9356ac4

Tidy imports.

958f619

Run gofumpt.

b42342c

Fix permission.

d5df867

jlewi marked this pull request as ready for review May 23, 2024 01:36

sourishkrout self-requested a review May 23, 2024 15:06

sourishkrout requested changes May 23, 2024

View reviewed changes

jlewi added 8 commits May 23, 2024 16:00

Work on a unittest for the logging.

c9dad20

Uncomment the test.

5a8807b

Test is passing again.

52f5c99

Fix test.

83bf525

Fix imports

507a96e

Auto format.

c155f8d

Update test.

d293afd

Rename GetDefaultConfigHome -> GetUserConfigHome

07a1b4c

Merge branch 'main' into jlewi/logs

8c5bc51

sourishkrout self-requested a review May 24, 2024 17:06

sourishkrout approved these changes May 24, 2024

View reviewed changes

sourishkrout merged commit 7d1c469 into stateful:main May 24, 2024
4 of 5 checks passed

jlewi mentioned this pull request May 24, 2024

Support using Foyle's AI capabilities as a plugin #574

Closed

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Create machine logs to support AI retraining #585

Create machine logs to support AI retraining #585

jlewi commented May 23, 2024 •

edited

Loading

jlewi commented May 23, 2024

sourishkrout left a comment

sourishkrout May 23, 2024

jlewi May 23, 2024

sourishkrout May 24, 2024

jlewi May 24, 2024

sourishkrout May 23, 2024

jlewi May 23, 2024

jlewi May 24, 2024

sourishkrout May 24, 2024

sourishkrout May 24, 2024

jlewi commented May 24, 2024

sourishkrout left a comment

		cmd.Flags().StringVar(&tlsDir, "tls", defaultTLSDir, "Directory in which to generate TLS certificates & use for all incoming and outgoing messages")

		cmd.Flags().StringVar(&configDir, configDirF, GetDefaultConfigHome(), "If ai logs is enabled logs will be written to ${config-dir}/logs")

Create machine logs to support AI retraining #585

Create machine logs to support AI retraining #585

Conversation

jlewi commented May 23, 2024 • edited Loading

jlewi commented May 23, 2024

sourishkrout left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jlewi commented May 24, 2024

sourishkrout left a comment

Choose a reason for hiding this comment

jlewi commented May 23, 2024 •

edited

Loading