Conversation

@kate-osborn
Contributor

This is a draft PR for pushing nginx configuration to connected agents.

This is not ready for a full review: some tests are broken, and others are missing altogether. There are still some open questions that are tagged as TODOs, and others that I will call out in the comments. I would appreciate any feedback/opinions on these as you see them.

I'm looking for feedback on the overall design. The agent configuration is implemented using the Observer design pattern, specifically the pull variation. The pull variation forces the ConfigStore to be thread-safe, as there will be many readers (agent connections) and one writer. This variation simplifies the update logic for agent connections because they pull the latest config whenever they finish waiting for a previous config apply to complete.

However, I'm worried about lock contention. If we have many agents, will we create a bottleneck in the ConfigStore, with each agent trying to get the latest config?

We could address this by using the push variation of the Observer pattern. In this variation, the ConfigStore would send the latest config on every Notify() call to all its observers (agent connections). This would move the locking from the ConfigStore to the individual agent connections. Each connection would save the latest config and queue a config update. This would require a lock for the config (could use atomic.Value) to prevent a race condition between a connection reader and writer. This variation could make the connection logic more complex but could be better overall.

@github-actions github-actions bot added dependencies Pull requests that update a dependency file documentation Improvements or additions to documentation enhancement New feature or request labels Apr 4, 2023
@github-advanced-security

You have successfully added a new Trivy configuration .github/workflows/ci.yml:build. As part of the setup process, we have scanned this repository and found no existing alerts. In the future, you will see all code scanning alerts on the repository Security tab.

if err != nil {
	h.cfg.Logger.Error(err, "error storing dataplane configuration")
	// FIXME(kate-osborn): Update status to indicate that the gateway is not accepted or programmed.
	return err
}
Contributor Author

@pleshakov how do you feel about making this task a separate story to reduce the scope of this PR?

Contributor

this makes sense to me!

c.logger.Info("Received agent connect request", "message ID", cmd.GetMeta().GetMessageId())
c.logger.Info("Received agent connect request")

c.logger = c.logger.WithValues("podName", req.GetMeta().DisplayName)
Contributor Author

@pleshakov can you think of any reason adding a value to the logger during runtime would cause a problem? I think this is ok, but I might be missing something.

Contributor

this looks like a data race (multiple goroutines have access to c.logger and at least one of them is writing)

perhaps we can refactor the connection so that when it starts, it always waits for the connect to finish, it then extracts the pod name, updates the logger and starts the connection?

Contributor Author

perhaps we can refactor the connection so that when it starts, it always waits for the connect to finish, it then extracts the pod name, updates the logger and starts the connection?

Intriguing idea. I'll play around with it.

)

const (
// TODO: do we need another file mode for config files?
Contributor Author

@pleshakov thoughts?

Contributor

do we expect manual changes to the files (i.e., not from the agent)? If not, I think we should set all of them (conf and secrets) to 0o400

Contributor Author

No, we don't expect manual changes. I'll change them to 0o400.

// WriteAllRequestedSecrets writes all requested secrets to disk.
WriteAllRequestedSecrets() error
// GetAllRequestedSecrets returns all requested secrets as Files.
GetAllRequestedSecrets() []File
Contributor Author

@pleshakov

I left the WriteAllRequestedSecrets method because I'm toying with the idea of implementing the ConfigStorer, Subject, and Observer interfaces for configuring an nginx instance in the same Pod. That would allow us to benchmark the two approaches. What do you think?

Either way, I still need to rename this interface, or separate the Get functionality from the Write.

Contributor

That would allow us to benchmark the two approaches.

what kind of benchmarks do you have in mind?

Contributor Author

Not sure yet... it would probably be an e2e test where we benchmark the time it takes to update the config in the dataplane.

@kate-osborn kate-osborn requested review from a team and pleshakov and removed request for a team April 4, 2023 22:16
Contributor

@pleshakov pleshakov left a comment

However, I'm worried about lock contention. If we have many agents, will we create a bottleneck in the ConfigStore, with each agent trying to get the latest config?

is there a way to quickly test it?

We could address this by using the push variation of the Observer pattern. In this variation, the ConfigStore would send the latest config on every Notify() call to all its observers (agent connections). This would move the locking from the ConfigStore to the individual agent connections. Each connection would save the latest config and queue a config update. This would require a lock for the config (could use atomic.Value) to prevent a race condition between a connection reader and writer. This variation could make the connection logic more complex but could be better overall.

I believe the current approach has a locking problem (mentioned it in an inline comment), so additional locking logic is required either way.

Perhaps another approach here could be for agents to create a one-time watch. Something like below (note: cancellation is omitted for simplicity):

	var latestCfgID int

	for {
		cfg := configStore.GetConfigWithGreaterID(latestCfgID)

		push(cfg)

		latestCfgID = cfg.ID
	}

That will also remove the need for Register and Remove methods.

c.sendConfigApplyResponse(ctx, car)
}

func (c *connection) sendConfigApplyResponse(ctx context.Context, response configApplyResponse) {
Contributor

"send" is a bit misleading here, because we're not sending anything to the agent

}
}

func (u *NginxConfigBuilder) generateZippedFile(dir directory) (*proto.ZippedFile, error) {
Contributor

we're not using the receiver u at all inside the function, right?

// ConfigStore implements the observer.Subject interface,
// so that it can notify the agent observers when the configuration changes.
// ConfigStore is thread-safe.
type ConfigStore struct {
Contributor

would it make sense for this type to be more generalized, so that it doesn't deal with the NGK-specific configuration? That way, it could potentially be reused in other projects to configure a set of data plane replicas with the same configuration.

Contributor Author

How would that work? Would we just make this generic and have it store and return the any type?

I'm worried about extracting the building piece of this. I only want to build the nginx agent config once because it involves costly operations like checksums.

Contributor Author

I guess I could perform the dataplane.Configuration -> agent.NginxConfiguration before storing the config and notifying the observers.

Contributor

I guess I could perform the dataplane.Configuration -> agent.NginxConfiguration before storing the config and notifying the observers.

yep

}

// Notify notifies all registered observers.
func (a *ConfigStore) Notify() {
Contributor

does this method need to be public?

Contributor Author

nope, my bad

@kate-osborn
Contributor Author

@pleshakov this is ready for another review -- still a draft.

Since it's a significant refactor and it's been a while since your last review, would it be easier to open a fresh PR?

@pleshakov
Contributor

@kate-osborn

Since it's a significant refactor and it's been a while since your last review, would it be easier to open a fresh PR?

that makes sense

@kate-osborn
Contributor Author

Will reopen

@kate-osborn kate-osborn closed this May 3, 2023