Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sds api: implement SDS api #3700

Closed
wants to merge 1 commit into from
Closed

Conversation

JimmyCYJ
Copy link
Member

@JimmyCYJ JimmyCYJ commented Jun 22, 2018

Implement SDS api which fetches dynamic secrets.

Description:
Implement SDS api that fetches secrets from remote SDS server. Secrets are stored in Secret Manager.
Updating secrets at listener and cluster will follow.

From the high level design PR: #3748,
or a smaller PR (istio#4) which has removed the changes from this PR and #3754.

Here is the lifetime and ownership of DynamicSecretProvider.

  1. Same config_source shares same DynamicSecretProvider object.
  2. Each ContextConfigImpl owns a ref_count to the DynamicSecretProvider object.
  3. ContextConfigImpl is owned by SslSocketFactory which is owned by ListenImpl.

When all ListenImpl instances are going away, DynamicSecretProvider object will be deleted.

Here is the relationship between SecretCallback and DynamicSecretProvider.

  1. Multiple ContextConfigImpl instances may share same DynamicSecretProvider If they have same config_source.
  2. SslSocketFactory owns ContextConfigImpl and Context. It implements SecretCallback interface to wait for dynamic secret update. The callback is registered to DynamicSecretProvider. When the secret is updated, the callback will be called. SslSocketFactory will create a new Context, the new connections will start to use the new Context.
  3. Context is shared_ptr. The ssl_socket is holding a ref_count of Context to keep the Context it is using alive.

Risk Level: Low

Fixes #1194

Signed-off-by: Jimmy Chen jimmychen.0102@gmail.com

name: "abc.com"
tls_certificate:
certificate_chain:
filename: "test/config/integration/certs/cacert.pem"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use {{ test_rundir }} and substitute, same for readFileToStringForTest below

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Thanks!

// TODO(jaebong): replace next line with
// "envoy.service.discovery.v2.SecretDiscoveryService.StreamSecrets" to support streaming
// service
"envoy.service.discovery.v2.SecretDiscoveryService.FetchSecrets");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just use StreamSecrets here, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to StreamSecrets and removed TODO.

@lizan lizan self-assigned this Jun 22, 2018
@lizan lizan requested a review from PiotrSikora June 22, 2018 16:56
@lizan
Copy link
Member

lizan commented Jun 26, 2018

@JimmyCYJ can you take a look on test failures?

@PiotrSikora @mattklein123 PTAL once test passes.

@mattklein123
Copy link
Member

Please ping me when ready.

@mattklein123 mattklein123 self-assigned this Jun 26, 2018
@JimmyCYJ
Copy link
Member Author

@lizan both coverage and ipv6_tests show that some directories do not exist. I am not clear how to fix them. Have you seen that before?
"bazel-out/k8-dbg/genfiles/external/com_lyft_protoc_gen_validate: warning: directory does not exist.
bazel-out/k8-dbg/genfiles/external/com_google_protobuf/src: warning: directory does not exist.
bazel-out/k8-dbg/genfiles/external/googleapis: warning: directory does not exist."

@lizan
Copy link
Member

lizan commented Jun 27, 2018

@JimmyCYJ you can ignore those warnings, download full log. The coverage failure is due to the test coverage is 97.9%, which is below envoy threshold 98.0%.

See https://67413-65214191-gh.circle-artifacts.com/0/build/envoy/generated/coverage/coverage.html for detail reports.


class SecretManagerUtil {
public:
virtual ~SecretManagerUtil() {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not need for this destructor. or just change SecretManagerUtil to namespace

* @param config_source envoy::api::v2::core::ConfigSource.
* @return hash code.
*/
static std::string configSourceHash(const envoy::api::v2::core::ConfigSource& config_source) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that it is only used in one place, so not need to have it as a separate file. call in inside the findOrCreateDynamicSecretProvider.

@@ -102,6 +68,8 @@ ContextConfigImpl::ContextConfigImpl(const envoy::api::v2::auth::CommonTlsContex
tlsVersionFromProto(config.tls_params().tls_minimum_protocol_version(), TLS1_VERSION)),
max_protocol_version_(
tlsVersionFromProto(config.tls_params().tls_maximum_protocol_version(), TLS1_2_VERSION)) {
readConfig(config);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename it to readCertChainConfig

private:
class InitManager : public Init::Manager {
public:
void initialize(std::function<void()> callback);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this function defined?


std::string yaml =
R"EOF(
name: "abc.com"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we combine this static secret config with overall config defined in line 307? I think it will be easier to read.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, I am going to keep it this way.


TEST_F(SdsApiTest, BasicTest) {
::testing::InSequence s;
Server::MockInstance server;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrap it with NiceMock<> to eliminate these warnings:

Uninteresting mock function call - taking default action specified at:
test/mocks/server/mocks.cc:124:
Function call: random()
Returns: @0x7ffff7fc3ff8 8-byte object <F8-F5 C6-04 00-00 00-00>
unknown file: Failure

@JimmyCYJ
Copy link
Member Author

JimmyCYJ commented Jul 3, 2018

@mattklein123, would you mind to take a look? Thanks!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. Some comments to get started. I didn't review tests yet. Please make sure you have 100% test coverage for all the code you added. Thank you!

/**
* An interface to fetch dynamic secret.
*/
class DynamicSecretProvider {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the plan here to add other secret types? Otherwise should this be named DynamicTlsCertificateSecretProvider or similar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we plan to support other secret types, https://github.com/envoyproxy/data-plane-api/blob/master/envoy/api/v2/auth/cert.proto#L320. Is it okay to add TODO here and keep the name?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit different from the static secret API, so I would match what we did there and make these typed, or change static to be similar.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Then I will rename it to DynamicTlsCertificateSecretProvider. Thanks!

* @return the dynamic secret provider.
*/
virtual DynamicSecretProviderSharedPtr
findOrCreateDynamicSecretProvider(const envoy::api::v2::core::ConfigSource& config_source,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

per comment above should this be specific to TlsCertificates for now like the static versions?

/**
* @return true of the config is valid.
*/
virtual bool isValid() const PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is going to be used in next PR. Please refer to our high level design PR(https://github.com/envoyproxy/envoy/pull/3748/files). For example, ContextManagerImpl::createSslClientContext() needs this method to decide whether to create ssl client context.
I find it is convenient to use this method in tests when I add tests into context_impl_test.cc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I don't have time to go through the other change. It seems a bit odd to have an isValid() on this. The context should always be valid or it should throw an exception, so I would change the logic and remove this. Is there something I'm missing here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe isValid() is not accurate. It returns true if we have static secrets to create context, or we have created a dynamic secret provider and the provider has downloaded secrets already. We are going to check if secrets are ready or not before creating any context objects. If we have created a secret provider but that provider has not downloaded any secret, then we should not create context. Does that make sense? Is it better to rename it to isSecretAvailable()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will you do if the secrets aren't ready? Can this ever happen with warming? I don't think it should ever happen...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are considering that if the gRPC connection breaks when envoy is waiting for secrets, then maybe we need isValid() before creating context. I am going to remove isValid() from this PR, and come up with a better idea in the next PR.

/**
* @return the DynamicSecretProvider object.
*/
virtual Secret::DynamicSecretProvider* getDynamicSecretProvider() const PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this return a const pointer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to high level design PR(https://github.com/envoyproxy/envoy/pull/3748/files), this method is needed for adding and removing callbacks. Perhaps I can remove it from this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove code that is not used in this PR. It's hard to review without context.


void SecretManagerImpl::addOrUpdateSecret(const envoy::api::v2::auth::Secret& secret) {
std::string configSourceHash(const envoy::api::v2::core::ConfigSource& config_source) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the advice! I will use MessageUtil::hash() to replace this hash function.

return;
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't we need to throw some type of config error here if neither of the above cases are true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is okay to do nothing here. Without this change, the readConfig() returns empty string.
https://github.com/envoyproxy/envoy/blob/master/source/common/ssl/context_config_impl.cc#L19
In that case cert_chain_ and private_key_ are empty. I am trying to make this readCertChainConfig() consistent with existing behavior.

const envoy::api::v2::core::ConfigSource sds_config_;
std::unique_ptr<Config::Subscription<envoy::api::v2::auth::Secret>> subscription_;
std::function<void()> initialize_callback_;
std::string sds_config_name_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const


// Manages pairs of secret name and Ssl::TlsCertificateConfig.
std::unordered_map<std::string, Ssl::TlsCertificateConfigPtr> static_tls_certificate_secrets_;
mutable std::shared_timed_mutex static_tls_certificate_secrets_mutex_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this lock needed? AFAICT all processing in this area should happen on the main thread.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This lock will be removed.


// map hash code of SDS config source and SdsApi object.
std::unordered_map<std::string, std::weak_ptr<DynamicSecretProvider>> dynamic_secret_providers_;
mutable std::shared_timed_mutex dynamic_secret_providers_mutex_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this lock needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not needed. I will remove this lock and the one above.

mutable std::shared_timed_mutex static_tls_certificate_secrets_mutex_;

// map hash code of SDS config source and SdsApi object.
std::unordered_map<std::string, std::weak_ptr<DynamicSecretProvider>> dynamic_secret_providers_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do you handle removal? If there is no removal do we need a weak pointer here? Though it seems like we should handle removal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have updated the PR description, that covers the lifetime and ownership of DynamicTlsCertificateSecretProvider. This DynamicTlsCertificateSecretProvider object will be removed if nobody is using it. The weak_ptr entry will remain in this map.

@JimmyCYJ
Copy link
Member Author

JimmyCYJ commented Jul 4, 2018

@mattklein123 PTAL, thanks!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more comments to work through, then will take another pass.

*/
virtual DynamicTlsCertificateSecretProviderSharedPtr
findOrCreateDynamicTlsCertificateSecretProvider(
const envoy::api::v2::core::ConfigSource& config_source, std::string config_name) PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const std::string& config_name

@@ -5,6 +5,7 @@
#include <vector>

#include "envoy/common/pure.h"
#include "envoy/secret/dynamic_secret_provider.h"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this needed?

@@ -1,22 +1,37 @@
#pragma once

#include <shared_mutex>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

del

SdsApi::SdsApi(Server::Instance& server, const envoy::api::v2::core::ConfigSource& sds_config,
std::string sds_config_name)
: server_(server), sds_config_(sds_config), sds_config_name_(sds_config_name), secret_hash_(0) {
server_.initManager().registerTarget(*this);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe this is correct in all cases. For listener secrets, you are going to need to use the listener init manager, not the server manager. Likewise for cluster secrets you need to use the cluster init manager (which might be the server during startup). You should remove all references to server in these files are pass the init manager directly.

Copy link
Member Author

@JimmyCYJ JimmyCYJ Jul 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out.

I think we can add this method into SdsApi,

void SdsApi::registerInitTarget(Init::Manager& init_manager) {
  init_manager.registerTarget(*this);
}

and add secret_provider_->registerInitTarget(init_manager) into ContextConfigImpl::readCertChainConfig().

We can add accessor virtual Init::Manager&initManager() PURE into Configuration::TransportSocketFactoryContext.

  1. Pass listener init manager to ContextConfigImpl.
    As ListenerImpl inherits TransportSocketFactoryContext and provides initManager(), we can modify ServerContextConfigImpl(), and pass context.initManager() into ServerContextConfigImpl() via DownstreamSslSocketFactory::createTransportSocketFactory(), and then pass initManager to ContextConfigImpl().

  2. Pass cluster init manager to ContextConfigImpl.
    To pass cluster init manager to ClientContextConfigImpl, we can implement ClusterInfoImpl::initManager(), as ClusterInfoImpl inherits Server::Configuration::TransportSocketFactoryContext(). Then pass initManager to ClientContextConfigImpl() via UpstreamSslSocketFactory::createTransportSocketFactory().

Does this make sense? I haven't figured out an convenient way to pass initManager to ClusterInfoImpl. Not sure if it is okay to just let ClusterInfoImpl::initManager() return secret_manager_.Server() (assume that we expose SecretManager::server_). Do you have any suggestions? Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we definitely cannot add any concept of InitManager to ClusterInfo. The init manager stuff must happen on the main thread only and cluster info is a multi-threaded concept. I would need to sit down with the code in detail to figure out the right way to handle this, but it's possible that the overall design will need rethinking. In the mean time I would consult with @lizan and @PiotrSikora and see if they can help you.

Copy link
Member Author

@JimmyCYJ JimmyCYJ Jul 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattklein123 @lizan and @PiotrSikora I have added a new member sds_init_manager_ into ClusterImplBase, and register SdsApi target at sds_init_manager_. sds_init_manager_ initializes registered targets when ClusterImplBase::onPreInitComplete() is called.

Both cluster init manager and listener manager are passed to TransportSocketFactoryContext, so that ContextConfigImpl could use init manager to create SdsApi objects.

: secret_manager_(secret_manager),
init_manager_(init_manager),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If context_config object is holding a reference to init_manager, need to make sure its life time is longer than context_config. For LDS, its initManager is the whole ListenerImpl, which is fine. Not sure about ClusterManagerImpl initManager, what is its lifetime?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per offline discussion, this init_manager_ is now removed from ContextConfigImpl. Only the ClusterInfoImpl owns a reference to ClusterImplBase::sds_init_manager_.

We registered SDS api at this init manager within the constructor of ClusterInfoImpl, and ClusterInfoImpl is created by the constructor of ClusterImplBase.

This ClusterImplBase::sds_init_manager_ is only called to initialize SDS api within ClusterImplBase::onPreInitComplete(). Nobody is using it once ClusterImplBase is destructed. I think it is safe.

@@ -474,7 +481,8 @@ class ClusterImplBase : public Cluster, protected Logger::Loggable<Logger::Id::u
* over and determines if there is an initial health check pass needed, etc.
*/
void onPreInitComplete();


Server::InitManagerImpl sds_init_manager_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add comment to state that this init_manager is only used by SDS api.

@@ -506,6 +508,8 @@ void ClusterImplBase::onPreInitComplete() {
pending_initialize_health_checks_ += host_set->hosts().size();
}

sds_init_manager_.initialize([]() -> void {});
Copy link
Contributor

@qiwzhang qiwzhang Jul 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is wrong.

How about split the ClusterImplBase::onPreInitComplete into following two functions:

ClusterImplBase::onPreInitComplete() {
   sds_init_manager_.initialize( [this]() { onSdsInitDone(); });
}

ClusterImplBase::onSdsInitDone() {
// exact same code as onPreInitComplete
}

@mattklein123
Copy link
Member

I'm going to look at this change in detail later today. Something about all of this does not sound right to me, but until I look in detail I don't have a lot to say about it.

* Called by every concrete cluster when pre-init is complete. At this point, shared init takes
* over and determines if there is an initial health check pass needed, etc.
* Called by every concrete cluster when pre-init is complete. At this point, SDS init manager
* initializes every registered SDS api target.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update the comment as:
At this point, shared init starts sds_init_manager_ initialization and determines if there is an initial health check pass needed, etc.

@mattklein123
Copy link
Member

@JimmyCYJ @lizan @PiotrSikora I went through the code. Largely, I think we are in good shape, but there is something I think we need to change to not confuse things in the future. I don't think it's a super fundamental design change so I think it shouldn't be too hard to do. I'm going to describe my concerns and the solutions below.

Primarily, the cluster init manager has be owned by the cluster manager, and not the cluster info. The cluster manager is actually where init happens and in the future we might add other things that require init at this level. So we we want to add an init manager concept within the cluster manager, and pass that through as needed.

Concretely, there are two changes I would like to make:

  1. Instead of creating an SDS specific init manager within cluster info, I would like you to handle it at the cluster manager level. Fundamentally, you are going to have to plumb an InitManager into the loadCluster calls (e.g., https://github.com/envoyproxy/envoy/blob/master/source/common/upstream/cluster_manager_impl.cc#L383). Take a look at listener_manager_impl in terms of how we sometimes use the server/global init manager and sometimes we use a dynamic/local init manager. You will need to do the same thing here with almost identical logic. Right now in the cluster manager we just use the cluster init callback to move things forward, so I would take a look at the logic and see what makes the most sense in terms of how to manage the flow here (does cluster manager own it all, is it split a bit between CM and the cluster, etc.). There are other additional things to think about here. For example, should we allow SDS configuration for clusters in phase 1 init (static, etc.) since then we can get into a deadlock situation and never initialize? Because of how tricky this is, I would like you to start a new PR that just adds init manager support to cluster manager and plumbs it through, with tests. Then we can sort this out. Once that part is in place I think the rest of this PR is pretty straightforward.

  2. This is pre-existing code, but I would prefer it if we change ClusterInfoImpl to not derive from Server::Configuration::TransportSocketFactoryContext. I think it's a little confusing. AFAICT, there is only needed in the constructor where we call createTransportSocketFactory. Can we instead stack allocate a factory context for the transport socket and just call it with constructor args? I think this will simplify things and avoid storing extra state that we don't need and might be abused later.

Thank you for working on this and please LMK if this makes sense or if you need additional clarification before proceeding. Hopefully, @lizan can also help you more directly with this set of changes.

dynamic_secret_providers_[map_key] = dynamic_secret_provider;
}

for (auto it = dynamic_secret_providers_.begin(); it != dynamic_secret_providers_.end();) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract this part to a function, use weak_ptr::expired and std::remove_if.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have extracted this part into SecretManagerImpl::removeDeletedSecretProvider(), and use weak_ptr::expired() instead of lock(). Thanks!

@@ -434,14 +414,52 @@ ClusterSharedPtr ClusterImplBase::create(const envoy::api::v2::Cluster& cluster,
return std::move(new_cluster);
}

Stats::ScopePtr ClusterImplBase::generateStatsScope(const envoy::api::v2::Cluster& config,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not need to be a member function. It can be a global function in the annoymous namespace.

: std::string(config.alt_stat_name())));
}

Network::TransportSocketFactoryPtr ClusterImplBase::createTransportSocketFactory(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can be a global function inside annoymous namespace

Stats::IsolatedStoreImpl load_report_stats_store_;
mutable ClusterLoadReportStats load_report_stats_;
Network::TransportSocketFactoryPtr transport_socket_factory_;
Stats::ScopePtr stats_scope_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is stats_scope_ needed in this class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need this to own the scope, it is passed by rvalue-ref.

createTransportSocketFactory(cluster, *stats_scope_.get(),
ssl_context_manager, secret_manager,
init_manager_),
stats_scope_.release(), added_via_api)) {
Copy link
Contributor

@qiwzhang qiwzhang Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need following in the ClusterImplBase constuctor

 ClusterImplBase:: ClusterImplBase(...) {
    auto stat_scope = generateStatsScope(cluster, stats);
    auto socket_factory = createTransportSocketFactory(cluster, *stat_scope, ...);
    info_ = std::make_unique<ClusterInfoImpl>(...,  socket_factory,  std::move(stat_scope), ...);
...
}

@qiwzhang
Copy link
Contributor

@lizan @PiotrSikora This PR is ready for review. Could you help to review it? Thanks

* @return true if the config is valid. Only when SDS dynamic secret is needed, but has not been
* downloaded yet, the config is invalid.
*/
virtual bool isValid() const PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would isReady be better name for this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1


private:
std::unordered_map<std::string, Ssl::TlsCertificateConfigPtr> tls_certificate_secrets_;
void removeDeletedSecretProvider();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removeExpiredSecretProvider?

public:
ClusterInfoImpl(const envoy::api::v2::Cluster& config,
const envoy::api::v2::core::BindConfig& bind_config, Runtime::Loader& runtime,
Stats::Store& stats, Ssl::ContextManager& ssl_context_manager,
Secret::SecretManager& secret_manager, bool added_via_api);
Network::TransportSocketFactoryPtr&& socket_factory,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qiwzhang @JimmyCYJ Sorry I might miss some context for this PR during vacation, what is the reason to refactor this and create TransportSocketFactoryContextImpl?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lizan sorry for the missing context. I had them do this. Before we were passing a bunch of stuff into ClusterInfo and also exposing accessors, which IMO was pretty confusing as Info is multi-threaded where is the cluster base is only for the main thread. IMO what is done now is clearer and easier to understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for explanation. I'm OK with this, just was curious why :)

Stats::IsolatedStoreImpl load_report_stats_store_;
mutable ClusterLoadReportStats load_report_stats_;
Network::TransportSocketFactoryPtr transport_socket_factory_;
Stats::ScopePtr stats_scope_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll need this to own the scope, it is passed by rvalue-ref.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JimmyCYJ @qiwzhang for your hard work. This is close. Just some more questions and minor comments.

@@ -54,18 +55,29 @@ class ContextConfigImpl : public virtual Ssl::ContextConfig {
unsigned minProtocolVersion() const override { return min_protocol_version_; };
unsigned maxProtocolVersion() const override { return max_protocol_version_; };

bool isValid() const override {
// either secret_provider_ is nullptr or secret_provider_->secret() is NOT nullptr.
return !secret_provider_ || secret_provider_->secret();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this function actually end up getting called. Is it always on the main thread? Or will it be called from multiple threads? I'm not sure this is thread safe. Can you explain the full flow? This ends up being quite confusing.

Copy link
Contributor

@qiwzhang qiwzhang Jul 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is called from the main thread. It is called in the ContextManagerImpl::createClientContext() and createServerContext(), if pass-in ContextConfig is not valid, return nullptr for SocketFactory.

In the next PR, SocketFactory will register a UpdateCallback to SDS_api, if SDS secret is fetched, it will call the onUpdate(), SocketFactory will call ContextManagerImpl::createClientContext() again to create a valid Context if ContextConfig is valid. This is called in the main thread too since SDS_api onConfigUpdate will be called in the main thread.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks. I think we will need proper locking around the shared_ptr swaps and we might want to use TLS at a later time, but this should work for now and we can figure it out in the follow up PR.

public:
ClusterInfoImpl(const envoy::api::v2::Cluster& config,
const envoy::api::v2::core::BindConfig& bind_config, Runtime::Loader& runtime,
Stats::Store& stats, Ssl::ContextManager& ssl_context_manager,
Secret::SecretManager& secret_manager, bool added_via_api);
Network::TransportSocketFactoryPtr&& socket_factory,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lizan sorry for the missing context. I had them do this. Before we were passing a bunch of stuff into ClusterInfo and also exposing accessors, which IMO was pretty confusing as Info is multi-threaded where is the cluster base is only for the main thread. IMO what is done now is clearer and easier to understand.

*/
void onPreInitComplete();

/**
* Called by every concrete cluster after all Sds api targets registered at SDS init manager are
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove SDS from comments, this is generic for the future.

@@ -22,6 +22,10 @@ void ContextManagerImpl::releaseContext(Context* context) {

ClientContextPtr ContextManagerImpl::createSslClientContext(Stats::Scope& scope,
const ClientContextConfig& config) {
if (!config.isValid()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this going to work on the future? Can you take me through the logic that is coming in the follow up update PRs so I understand? Right now, this will return a null context which is fine, things will fail (and we can have integration tests in this PR just proving things fail). But in the future, how is the context going to get updated inside the transport factory?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from my other comment.

In the next PR, SocketFactory will register a UpdateCallback to SDS_api, if SDS secret is fetched or updated, it will call the SocketFactory::onUpdate(), SocketFactory will call ContextManagerImpl::createClientContext() to create a valid Context if ContextConfig is valid. This is called in the main thread too since SDS_api::onConfigUpdate will be called in the main thread.

@mattklein123
Copy link
Member

@JimmyCYJ @qiwzhang @lizan two other things that we need (can be done as follow up PRs):

  • SDS API specific stats (secrets added, secrets updated, secrets removed, secrets total, etc.)
  • The block we discussed so that if a cluster type other than EDS tries to use SDS it will fail for now.

Can you please add some TODOs to this PR for ^ and we can do both of these in follow ups? Thank you.

@JimmyCYJ
Copy link
Member Author

@mattklein123 Thanks for the advice, will add TODO in this PR.

@stale
Copy link

stale bot commented Jul 20, 2018

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jul 20, 2018
@lizan lizan added no stalebot Disables stalebot from closing an issue and removed stale stalebot believes this issue/PR has not been touched recently labels Jul 20, 2018
@stale
Copy link

stale bot commented Jul 30, 2018

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Jul 30, 2018
Signed-off-by: Wayne Zhang <qiwzhang@google.com>
@stale stale bot removed the stale stalebot believes this issue/PR has not been touched recently label Jul 31, 2018
@qiwzhang
Copy link
Contributor

Just updated this PR with all changes needed to support dynamic SDS.

This PR only serves as a reference PR for complete implementation. Its changes will be split into smaller PR to actual code review.

@stale
Copy link

stale bot commented Aug 8, 2018

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Aug 8, 2018
@mattklein123 mattklein123 removed the stale stalebot believes this issue/PR has not been touched recently label Aug 8, 2018
@mattklein123
Copy link
Member

@JimmyCYJ @lizan can we close this now? I think we have the last PR up and this doesn't serve much purpose anymore?

@JimmyCYJ JimmyCYJ closed this Aug 16, 2018
@JimmyCYJ
Copy link
Member Author

JimmyCYJ commented Aug 16, 2018

@mattklein123 Yes, I am closing it. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
no stalebot Disables stalebot from closing an issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants