Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement nbthreads for haproxy router #19975

Merged
merged 1 commit into from
Jul 10, 2018

Conversation

imcsk8
Copy link
Contributor

@imcsk8 imcsk8 commented Jun 12, 2018

@imcsk8 imcsk8 requested review from knobunc and removed request for ironcladlou and pweil- June 12, 2018 08:09
@openshift-ci-robot openshift-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 12, 2018
@imcsk8 imcsk8 requested a review from rajatchopra June 12, 2018 08:10
@imcsk8
Copy link
Contributor Author

imcsk8 commented Jun 12, 2018

PTAL @openshift/sig-networking

@knobunc
Copy link
Contributor

knobunc commented Jun 12, 2018

/hold
This is a 3.11 feature. Holding until the branch opens.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 12, 2018
@@ -33,6 +33,11 @@

global
maxconn {{env "ROUTER_MAX_CONNECTIONS" "20000"}}
{{- if isTrue (env "ROUTER_ENABLE_TRHEADS") }}
nbthread {{env "ROUTER_THREADS" "4"}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just do it based on whether ROUTER_THREADS is defined and > 1, and then default to 1 thread.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, ROUTER_ENABLE_TRHEADS seems redundant.

@@ -308,6 +311,7 @@ func NewCmdRouter(f *clientcmd.Factory, parentName, name string, out, errout io.
cmd.Flags().StringVar(&cfg.Ciphers, "ciphers", cfg.Ciphers, "Specifies the cipher suites to use. You can choose a predefined cipher set ('modern', 'intermediate', or 'old') or specify exact cipher suites by passing a : separated list. Not supported for F5.")
cmd.Flags().BoolVar(&cfg.StrictSNI, "strict-sni", cfg.StrictSNI, "Use strict-sni bind processing (do not use default cert). Not supported for F5.")
cmd.Flags().BoolVar(&cfg.Local, "local", cfg.Local, "If true, do not contact the apiserver")
cmd.Flags().StringVar(&cfg.Threads, "threads", cfg.Threads, "Specifies the amount of threads for the haproxy router.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifies the number of threads... (number instead of amount)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use cmd.Flags().Int32Var() instead of cmd.Flags().StringVar() then you don't need to do integer validations for this param.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a max thread limit? (very high thread count does not seem practical)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i haven't tested the thread limit, i'll check that

{{- if ne (env "ROUTER_THREADS") "" }}
nbthread {{env "ROUTER_THREADS"}}
{{- end }}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be simplified by setting the default threads to 1 and removing the conditional entirely, assuming the controller code is the only consumer of this template

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not have the nbthreads word in the config file if it is not being used... the feature is beta in haproxy, so while it should be equivalent, who knows for sure?

But why not use 1 as the default for env and say gt 1?
{{- if gt (env "ROUTER_THREADS" 1) 1 }}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree re: the beta status, good point. Regarding the conditional, the way you suggest here makes more sense once the data type is corrected to be an int.

@@ -228,6 +228,9 @@ type RouterConfig struct {
// Strict SNI (do not use default cert)
StrictSNI bool

// Number of threads to start per process
Threads string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be an int

{{- if ne (env "ROUTER_THREADS") "" }}
nbthread {{env "ROUTER_THREADS"}}
{{- end }}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather not have the nbthreads word in the config file if it is not being used... the feature is beta in haproxy, so while it should be equivalent, who knows for sure?

But why not use 1 as the default for env and say gt 1?
{{- if gt (env "ROUTER_THREADS" 1) 1 }}

Copy link

@pravisankar pravisankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also needs corresponding docs pr

@@ -33,6 +33,11 @@

global
maxconn {{env "ROUTER_MAX_CONNECTIONS" "20000"}}
{{- if ne (env "ROUTER_THREADS") "" }}
nbthread {{env "ROUTER_THREADS"}}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want this for ROUTER_THREADS > 1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be configuring cpu-map here to make sure threads are mapped to cpu sets (ensuring the workload is distributed)? If we detect the number of cores and set nbthread to that there is an auto setting.

ref

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpu-map needs nbproc, this could be more complex , but it might be possible since we're already reading processor information.
we could have something like this

nbproc {{env "ROUTER_NUM_CPUS"}}
cpu-map auto:/all 0-{{env "ROUTER_NUM_CPUS" - 1}} this operation is not valid on the template is just as an example
nbthread {{env "ROUTER_THREADS"}}

@@ -5949,6 +5949,8 @@ _oc_adm_router()
local_nonpersistent_flags+=("--disable-namespace-ownership-check")
flags+=("--dry-run")
local_nonpersistent_flags+=("--dry-run")
flags+=("--enable-threads")
local_nonpersistent_flags+=("--enable-threads")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to re-generate bash completions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah i forgot to run the completions again

@@ -308,6 +311,7 @@ func NewCmdRouter(f *clientcmd.Factory, parentName, name string, out, errout io.
cmd.Flags().StringVar(&cfg.Ciphers, "ciphers", cfg.Ciphers, "Specifies the cipher suites to use. You can choose a predefined cipher set ('modern', 'intermediate', or 'old') or specify exact cipher suites by passing a : separated list. Not supported for F5.")
cmd.Flags().BoolVar(&cfg.StrictSNI, "strict-sni", cfg.StrictSNI, "Use strict-sni bind processing (do not use default cert). Not supported for F5.")
cmd.Flags().BoolVar(&cfg.Local, "local", cfg.Local, "If true, do not contact the apiserver")
cmd.Flags().StringVar(&cfg.Threads, "threads", cfg.Threads, "Specifies the amount of threads for the haproxy router.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use cmd.Flags().Int32Var() instead of cmd.Flags().StringVar() then you don't need to do integer validations for this param.

@@ -308,6 +311,7 @@ func NewCmdRouter(f *clientcmd.Factory, parentName, name string, out, errout io.
cmd.Flags().StringVar(&cfg.Ciphers, "ciphers", cfg.Ciphers, "Specifies the cipher suites to use. You can choose a predefined cipher set ('modern', 'intermediate', or 'old') or specify exact cipher suites by passing a : separated list. Not supported for F5.")
cmd.Flags().BoolVar(&cfg.StrictSNI, "strict-sni", cfg.StrictSNI, "Use strict-sni bind processing (do not use default cert). Not supported for F5.")
cmd.Flags().BoolVar(&cfg.Local, "local", cfg.Local, "If true, do not contact the apiserver")
cmd.Flags().StringVar(&cfg.Threads, "threads", cfg.Threads, "Specifies the amount of threads for the haproxy router.")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a max thread limit? (very high thread count does not seem practical)

@@ -228,6 +228,9 @@ type RouterConfig struct {
// Strict SNI (do not use default cert)
StrictSNI bool

// Number of threads to start per process
Threads int

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we wanted to use fixed width integers even for internal types, i.e. int32/64 over int?
Refer: #9366 and 865faaf
@knobunc @ironcladlou

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, although I don't think we're dealing with formal versioned API here... no objection to uint32 regardless

@@ -308,6 +311,7 @@ func NewCmdRouter(f *clientcmd.Factory, parentName, name string, out, errout io.
cmd.Flags().StringVar(&cfg.Ciphers, "ciphers", cfg.Ciphers, "Specifies the cipher suites to use. You can choose a predefined cipher set ('modern', 'intermediate', or 'old') or specify exact cipher suites by passing a : separated list. Not supported for F5.")
cmd.Flags().BoolVar(&cfg.StrictSNI, "strict-sni", cfg.StrictSNI, "Use strict-sni bind processing (do not use default cert). Not supported for F5.")
cmd.Flags().BoolVar(&cfg.Local, "local", cfg.Local, "If true, do not contact the apiserver")
cmd.Flags().IntVar(&cfg.Threads, "threads", cfg.Threads, "Specifies the amount of threads for the haproxy router.")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why we would configure this - I think we want it to be autodetected by default. Probably based on the number of cores configured for this pod.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a generally correct value we should be choosing, then we should have it autodetect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could set as default half or a quarter of the max number of threads per process (/proc/sys/kernel/threads-max) but i think that we should allow the sysadmin to tune this setting to suit his needs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Auto-detecting a default sounds great, but what I wonder is if we need to also provide a way to override the default; if so, then the defaulting logic could be a followup. If not an explicit value configuration, would we at least want an overall feature flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add the --enable-threads flag, if no --threads flag is present a default is used for the nbthreads setting.
If the --threads flag is set just check that it's value is not higher than the one in /proc/sys/kernel/threads-max

Another option could be to only have the --threads flag, if no value is set use the default but i think this could be confusing

@@ -655,6 +659,7 @@ func RunCmdRouter(f *clientcmd.Factory, cmd *cobra.Command, out, errout io.Write
"STATS_PORT": strconv.Itoa(cfg.StatsPort),
"STATS_USERNAME": cfg.StatsUsername,
"STATS_PASSWORD": cfg.StatsPassword,
"ROUTER_THREADS": strconv.Itoa(cfg.Threads),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would prefer we not set this here. I don't see why a user would set this flag, and not just expect us to do the right thing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the right number for the threads? How do we know what else they want to use the machine for? Or are you suggesting we pick it up from the container limits? The other problem there is that the number of processes that are running will vary if they have long-running connections and have router reloads.

I really don't want to enable this by default yet, so we at least need a way to disable it. I'm fine with an "auto" setting for num threads, but I suspect we will need a way to tune it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on auto setting the default num threads based on pod cpu limits and allow this value to be overridden via env var or some other mechanism.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without knowing how effective our auto-detected number is in practice, does it make any sense to add an explicitly unsupported parameter to give an escape hatch and allow us to test tunings without a new build?

When we have higher confidence our auto-detected algo is solid we could talk about removing the override. There is a bit of UX complexity there in explaining the mutual exclusivity or precedence order of the auto/explicit options, but if the override is considered unsupported/debugging-only maybe it isn't as big an issue?

@openshift-ci-robot openshift-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jun 22, 2018
"ROUTER_ENABLE_THREADS": strconv.FormatBool(cfg.EnableThreads),
}

if cfg.EnableThreads {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should an error be returned if a number of threads is specified but threads are not enabled?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, an error or at least a warning that the --threads is ignored when --enable-threads is not set.

"ROUTER_ENABLE_THREADS": strconv.FormatBool(cfg.EnableThreads),
}

if cfg.EnableThreads {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, an error or at least a warning that the --threads is ignored when --enable-threads is not set.

}

if cfg.EnableThreads {
if cfg.Threads >= 1 {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add validations for cfg.Threads (check >= minThreads and <= maxThreads)?

cpuinfo := readProcFile("/proc/cpuinfo")
siblings, _ := strconv.Atoi(cpuinfo["siblings"])
cores, _ := strconv.Atoi(cpuinfo["cpucores"])
threads := siblings * cores

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we want to consider CPU requests/limits specified by the router pod/container?
https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/#specify-a-cpu-request-and-a-cpu-limit

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the router is running as a privileged container, /proc/cpuinfo will return container or host limits?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the router is running as a privileged container, /proc/cpuinfo will return container or host limits?

Worse than that, it will be /proc/cpuinfo on the host where the administrator runs oc adm router, which is probably not the node host. Calculating the default number of threads really needs to be done in pkg/router/template/router.go, not pkg/oc/admin/router/router.go.

@@ -257,6 +265,7 @@ var helperFunctions = template.FuncMap{
"env": env, //tries to get an environment variable, returns the first non-empty default value or "" on failure
"matchPattern": matchPattern, //anchors provided regular expression and evaluates against given string
"isInteger": isInteger, //determines if a given variable is an integer
"toInt": toInt, //Convert string to integer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we call it toInteger in the map for consistency with isInteger?

func toInt(s string) int {
ret, err := strconv.Atoi(s)
if err != nil {
return 0
Copy link
Contributor

@Miciah Miciah Jul 3, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is becoming a part of the informal API for router templates, we should consider how it might be used or expected to work in the general case. Is returning 0 for invalid input a sensible default? Or should we return nil? For example, we could do the following:

func toInt(s string) *int {
	ret, err := strconv.Atoi(s)
	if err != nil {
		return nil
	}
	return &ret
}

Which could be used as follows:

{{- with $threads := toInt (env "ROUTER_THREADS") }}
  {{- if gt $threads 0 }}
    nbthread {{ $threads }}
  {{- end }}
{{-  end }}

The part inside the with block will get skipped entirely if strconv.Atoi returns an error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe document it as private (FWIW)?

I think this is another example of where a new field on an object passed to the template helps. There's no reason to do conversion inside the template itself: why would a user ever want to treat ROUTER_THREADS as anything but an int? If I were making a custom template and had to deal with this conversion manually I'd be annoyed

@knobunc
Copy link
Contributor

knobunc commented Jul 3, 2018

I'm not sure that we want to add all this complexity... I was good with the ENV as long as we changed gt to ne... @ironcladlou your call though.

@ironcladlou
Copy link
Contributor

I'm fine with the ne solution in #19975 (comment) given our reluctance to modify the data struct or add functions

@imcsk8
Copy link
Contributor Author

imcsk8 commented Jul 3, 2018

I find the ne solution more complex than adding a conversion function which simplifies the logic on the template

@ironcladlou
Copy link
Contributor

@imcsk8

I find the ne solution more complex than adding a conversion function which simplifies the logic on the template

I agree, however using the ne solution reduces the amount of new "implicit API surface area" by 1, so that gets my vote (although I would really like to have a separate convo about the API going forward)

@imcsk8
Copy link
Contributor Author

imcsk8 commented Jul 4, 2018

Ok, i'll change the code for the ne solution

@@ -33,6 +33,12 @@

global
maxconn {{env "ROUTER_MAX_CONNECTIONS" "20000"}}
{{- $threads := env "ROUTER_THREADS"
{{- if ne "" (firstMatch "[1-9][0-9]*" $threads) }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did the clearer alternative not work?

if ne "0" (env "ROUTER_THREADS" "0")

And then adding range validation in the Go code

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, ^^ this looks clean to me.

@@ -33,6 +33,12 @@

global
maxconn {{env "ROUTER_MAX_CONNECTIONS" "20000"}}
{{- $threads := env "ROUTER_THREADS"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing }}.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully that was caught by an automated test...

Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Jul 9, 2018
@knobunc
Copy link
Contributor

knobunc commented Jul 9, 2018

@juanvallejo @enj @soltysh please approve the completions changes.

@knobunc
Copy link
Contributor

knobunc commented Jul 9, 2018

/retest

@juanvallejo
Copy link
Contributor

completion changes lgtm

Copy link

@pravisankar pravisankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold
You need to squash your commits.

@openshift-ci-robot openshift-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 9, 2018
@pravisankar
Copy link

@imcsk8 You preferred regex match over #19975 (comment) because once router dc is generated, user could edit the dc and can fiddle with 'ROUTER_THREADS' env which needs revalidation?

@knobunc
Copy link
Contributor

knobunc commented Jul 10, 2018

@juanvallejo Can you say /approve please so the bot is happy.

@juanvallejo
Copy link
Contributor

/approve

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 10, 2018
Add threading support for router.

Trello: https://trello.com/c/0thyZwBw/25-3-implement-nbthreads-for-haproxy-router-routerperformanceoperations

Add completions

Fix typo

Simplify the threading options

Change argument type

Add enable threads flag

Fix template condition

Remove defaut threads setting

Add proper integer condition for threads

Change thread check condition

Fix typo
@openshift-ci-robot openshift-ci-robot removed the lgtm Indicates that a PR is ready to be merged. label Jul 10, 2018
Copy link
Contributor

@knobunc knobunc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold cancel
/lgtm

@openshift-ci-robot openshift-ci-robot added lgtm Indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Jul 10, 2018
@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: imcsk8, juanvallejo, knobunc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link

@pravisankar pravisankar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. component/networking lgtm Indicates that a PR is ready to be merged. sig/networking size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.