
OOM killed when using the nuclei SDK with the standard templates #4756

Closed
stan-threatmate opened this issue Feb 12, 2024 · 24 comments · Fixed by #4833
Labels: Priority: Critical, Type: Bug

Comments

@stan-threatmate

Nuclei version:

3.1.8

Current Behavior:

I run the nuclei SDK as part of a binary that is deployed in a Linux container (Alpine) with memory limits of 8GB and 16GB. I use the standard templates. In both cases it gets OOM killed. Here are the settings I specify (a rough sketch of how I pass them to the SDK follows the list):

Rate Limit:             150 per second
Exclude Severity        []string{"info", "low"}
Template Concurrency    25
Host Concurrency        100
Scan Strategy           "template-spray"
Network Timeout         10
Network Retries         2
Disable Host Errors     true
Max Host Errors         15000
Probe Non Http Targets  0
Enable Code Templates   0
Stats                   true
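For reference, this is roughly how I build the option list that I pass to nuclei.NewNucleiEngine. The option and field names (WithTemplateFilters, WithConcurrency, WithScanStrategy, WithGlobalRateLimit, WithNetworkConfig) are my reading of the v3 lib package and may not match your SDK version exactly:

	import (
		"time"

		nuclei "github.com/projectdiscovery/nuclei/v3/lib"
	)

	func buildOpts() []nuclei.NucleiSDKOptions {
		return []nuclei.NucleiSDKOptions{
			// Field names below are assumptions based on the v3 SDK docs.
			nuclei.WithTemplateFilters(nuclei.TemplateFilters{
				ExcludeSeverities: "info,low", // assumed comma-separated
			}),
			// Other Concurrency fields may also need non-zero values depending on the SDK version.
			nuclei.WithConcurrency(nuclei.Concurrency{
				TemplateConcurrency: 25,
				HostConcurrency:     100,
			}),
			nuclei.WithScanStrategy("template-spray"),
			nuclei.WithGlobalRateLimit(150, time.Second),
			nuclei.WithNetworkConfig(nuclei.NetworkConfig{
				Timeout: 10, // seconds
				Retries: 2,
			}),
		}
	}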

I tried this with 115 and 380 hosts, and both runs hit memory issues. What is causing the high memory utilization? I am saving the results from the nuclei scan in a list. Could the results be so large that they fill up the memory?

I run nuclei like this:

	ne, err := nuclei.NewNucleiEngine(opts...)
	if err != nil {
		return err
	}
	defer ne.Close()

	ne.LoadTargets(liveHosts, n.ProbeNonHttpTargets)
	err = ne.LoadAllTemplates()
	if err != nil {
		return err
	}

	var results []*NucleiResult
	err = ne.ExecuteWithCallback(func(event *output.ResultEvent) {
		// Convert output.ResultEvent into NucleiResult ...
		res := &NucleiResult{...}
		results = append(results, res)
	})
	if err != nil {
		return err
	}

Expected Behavior:

The nuclei SDK should comfortably handle scanning hosts with the above settings. It would be great to have an example of SDK settings that match the default nuclei CLI scan settings.

What would be the equivalent settings for the SDK?

nuclei -u example.com

Additionally, what settings in the SDK control memory utilization? It would be good to document those as well.

Steps To Reproduce:

Use the above settings and set up a scan, then watch it consume more and more memory over time. It is easier to reproduce with 115 (or more) websites.

Anything else:

@stan-threatmate stan-threatmate added the Type: Bug label Feb 12, 2024
@AgoraSecurity

Have you tried reproducing with the latest version: https://github.com/projectdiscovery/nuclei/releases/tag/v3.1.10 ?

@stan-threatmate
Author

stan-threatmate commented Feb 12, 2024

I looked at the changelog but I don't see any memory improvements. I can give it a try. Do the SDK settings look good to you? Am I missing something obvious?

Also this is what it looks like in terms of memory utilization:

[graph: container memory utilization over time]

You can clearly see when it was killed. This is for an 8GB container.

@tarunKoyalwar
Member

@stan-threatmate, there was a minor change related to JS pooling and upgrades to other PD dependencies, so please try the latest version, or even dev if required.

  1. Memory consumption correlates directly with concurrency and other options. Last time I ran on 1.2k targets with default concurrency (i.e. template concurrency 25, host concurrency 25). Can you try running from the SDK with this config?

  2. When there are more than 100 targets I would always recommend the host-spray scan strategy; it is more efficient in many ways.

  3. Can you include pprof (https://pkg.go.dev/net/http/pprof#hdr-Usage_examples) in your code and share heap profiles for the inflection points? In the graph above that would be one profile around 2-3 PM and a second around 3:30 PM; those are the interesting points, but you would have to choose based on resource usage and dump the profiles manually with go tool pprof (a minimal way to wire pprof in is shown right below).
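Wiring pprof into the binary is just the standard net/http/pprof setup (the 6060 port is arbitrary):

	import (
		"log"
		"net/http"
		_ "net/http/pprof" // registers /debug/pprof/* handlers on the default mux
	)

	func init() {
		go func() {
			// Dump a heap profile at an inflection point with:
			//   go tool pprof http://localhost:6060/debug/pprof/heap
			log.Println(http.ListenAndServe("localhost:6060", nil))
		}()
	}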

@stan-threatmate
Author

I've been running it all day today with the latest version, v3.1.10, but I see the same issues. I also added GOMEMLIMIT=500MiB and GOGC=20, but it still ran out of memory even though the GC started working pretty hard to clear it. I am about to instrument memory profiling and see if I can get some meaningful data.

Also, your suggestions in the comment above contradict this document, which I used to set the above options:
https://github.com/projectdiscovery/nuclei-docs/blob/main/docs/nuclei/get-started.md

Users should select a scan strategy based on the number of targets; each strategy has its own pros and cons.

  • When targets < 1000, template-spray should be used. This strategy is slightly faster than host-spray but uses more RAM and does not optimally reuse connections.
  • When targets > 1000, host-spray should be used. This strategy uses less RAM than template-spray and reuses HTTP connections, along with some minor improvements that are crucial when mass scanning.

Concurrency & Bulk-Size

Whatever the scan strategy is, -concurrency and -bulk-size are crucial for tuning any type of scan. While tuning these parameters the following points should be noted.

  • If scan-strategy is template-spray
  • -concurrency < bulk-size (Ex: -concurrency 10 -bulk-size 200)
  • If scan-strategy is host-spray
  • -concurrency > bulk-size (Ex: -concurrency 200 -bulk-size 10)

Can you please provide a recommendation on which settings affect memory consumption the most and which affect execution speed? For example, I've noticed the rate limit option doesn't seem to play much of a role in the SDK, as reported by the stats which print the RPS. I assume the RPS is requests per second, as defined by the rate limit?

I'll do some runs with your suggestion of 25 template and host concurrency. I wish there were a way to estimate system resource utilization from the settings so we could plan for it based on the number of hosts.

@stan-threatmate
Author

Here is a pprof from a successful run on a smaller scale:

Showing nodes accounting for 421.32MB, 91.26% of 461.68MB total
Dropped 861 nodes (cum <= 2.31MB)
Showing top 50 nodes out of 159
      flat  flat%   sum%        cum   cum%
   64.17MB 13.90% 13.90%    64.17MB 13.90%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/common/generators.MergeMaps (inline)
   61.90MB 13.41% 27.31%   113.17MB 24.51%  fmt.Errorf
   50.85MB 11.01% 38.32%    51.71MB 11.20%  github.com/projectdiscovery/utils/errors.(*enrichedError).captureStack (inline)
   29.22MB  6.33% 44.65%    29.22MB  6.33%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).responseToDSLMap
   23.67MB  5.13% 49.78%    23.67MB  5.13%  runtime.malg
   19.24MB  4.17% 53.94%    78.68MB 17.04%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).generateRawRequest
   16.50MB  3.57% 57.52%    16.51MB  3.58%  reflect.New
   15.02MB  3.25% 60.77%    15.02MB  3.25%  github.com/projectdiscovery/utils/maps.(*OrderedMap[go.shape.string,go.shape.[]string]).Set (inline)
   13.11MB  2.84% 63.61%    13.11MB  2.84%  net/http.NewRequestWithContext
   12.07MB  2.61% 66.22%    13.08MB  2.83%  github.com/yl2chen/cidranger.newPrefixTree
      12MB  2.60% 68.83%    12.01MB  2.60%  github.com/syndtr/goleveldb/leveldb/memdb.New
   10.24MB  2.22% 71.04%    10.24MB  2.22%  gopkg.in/yaml%2ev2.(*parser).scalar
    8.12MB  1.76% 72.80%    30.34MB  6.57%  github.com/projectdiscovery/utils/url.ParseURL
       7MB  1.52% 74.32%     8.30MB  1.80%  github.com/projectdiscovery/utils/reader.NewReusableReadCloser
    6.71MB  1.45% 75.77%     6.71MB  1.45%  regexp/syntax.(*compiler).inst (inline)
    6.64MB  1.44% 77.21%     6.64MB  1.44%  strings.(*Builder).grow
    5.93MB  1.28% 78.50%     5.93MB  1.28%  bytes.growSlice
    5.30MB  1.15% 79.64%    29.26MB  6.34%  github.com/projectdiscovery/nuclei/v3/pkg/parsers.ParseTemplate
    5.18MB  1.12% 80.77%    43.54MB  9.43%  github.com/projectdiscovery/nuclei/v3/pkg/templates.parseTemplate
    4.91MB  1.06% 81.83%     4.91MB  1.06%  bytes.(*Buffer).String (inline)
    4.14MB   0.9% 82.73%     4.47MB  0.97%  github.com/ulule/deepcopier.getTagOptions
    3.67MB   0.8% 83.52%     3.67MB   0.8%  reflect.mapassign_faststr0
    3.39MB  0.74% 84.26%    10.92MB  2.37%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http/httpclientpool.wrappedGet
    3.38MB  0.73% 84.99%     7.92MB  1.72%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.generateVariables
    3.25MB   0.7% 85.69%     3.25MB   0.7%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.GenerateDNSVariables
    2.97MB  0.64% 86.34%     2.97MB  0.64%  github.com/projectdiscovery/retryablehttp-go.DefaultReusePooledTransport
    2.82MB  0.61% 86.95%     2.82MB  0.61%  github.com/projectdiscovery/ratelimit.(*Limiter).Take
    2.78MB   0.6% 87.55%     3.74MB  0.81%  github.com/projectdiscovery/nuclei/v3/pkg/model/types/stringslice.(*StringSlice).UnmarshalYAML

Note on the second line how much memory fmt.Errorf takes. I expect a ton of errors, as shown by the stats:

[0:17:35] | Templates: 3891 | Hosts: 8 | RPS: 141 | Matched: 2 | Errors: 144915 | Requests: 149322/159840 (93%)

Also, this stats line is printed after nuclei has finished, but it shows 93% and it never stops printing.

@stan-threatmate
Author

A profile of a more intense run:

top50
Showing nodes accounting for 1118.57MB, 92.07% of 1214.85MB total
Dropped 1016 nodes (cum <= 6.07MB)
Showing top 50 nodes out of 139
      flat  flat%   sum%        cum   cum%
  356.81MB 29.37% 29.37%   356.81MB 29.37%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/common/generators.MergeMaps (inline)
   95.77MB  7.88% 37.25%    95.77MB  7.88%  runtime.malg
   75.52MB  6.22% 43.47%   313.64MB 25.82%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).generateRawRequest
   68.30MB  5.62% 49.09%    68.38MB  5.63%  net/http.NewRequestWithContext
   58.39MB  4.81% 53.90%    58.39MB  4.81%  github.com/projectdiscovery/utils/maps.(*OrderedMap[go.shape.string,go.shape.[]string]).Set (inline)
   57.80MB  4.76% 58.66%    63.10MB  5.19%  github.com/ulule/deepcopier.getTagOptions
   42.65MB  3.51% 62.17%   135.82MB 11.18%  github.com/projectdiscovery/utils/url.ParseURL
   29.94MB  2.46% 64.63%    48.56MB  4.00%  fmt.Errorf
   28.74MB  2.37% 67.00%    28.74MB  2.37%  net/textproto.MIMEHeader.Set (inline)
   27.06MB  2.23% 69.22%    32.54MB  2.68%  github.com/projectdiscovery/utils/reader.NewReusableReadCloser
   19.45MB  1.60% 70.83%    19.45MB  1.60%  bytes.(*Buffer).String (inline)
   18.97MB  1.56% 72.39%    18.99MB  1.56%  strings.(*Builder).grow
   18.63MB  1.53% 73.92%    45.56MB  3.75%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.generateVariables
   17.99MB  1.48% 75.40%    18.06MB  1.49%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.GenerateDNSVariables
   17.13MB  1.41% 76.81%    17.15MB  1.41%  reflect.New
   17.02MB  1.40% 78.21%    21.78MB  1.79%  github.com/projectdiscovery/utils/errors.(*enrichedError).captureStack (inline)
   13.82MB  1.14% 79.35%    13.82MB  1.14%  github.com/projectdiscovery/ratelimit.(*Limiter).Take
   13.58MB  1.12% 80.47%    13.60MB  1.12%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).responseToDSLMap
   12.81MB  1.05% 81.52%    13.82MB  1.14%  github.com/yl2chen/cidranger.newPrefixTree
   12.65MB  1.04% 82.56%   116.63MB  9.60%  github.com/projectdiscovery/retryablehttp-go.NewRequestFromURLWithContext
      12MB  0.99% 83.55%    12.01MB  0.99%  github.com/syndtr/goleveldb/leveldb/memdb.New
   11.89MB  0.98% 84.53%    82.14MB  6.76%  github.com/projectdiscovery/utils/url.absoluteURLParser
   11.31MB  0.93% 85.46%    11.31MB  0.93%  github.com/projectdiscovery/utils/maps.NewOrderedMap[go.shape.string,go.shape.[]string] (inline)
   10.92MB   0.9% 86.36%    11.39MB  0.94%  github.com/projectdiscovery/utils/url.NewOrderedParams (inline)
   10.28MB  0.85% 87.21%    10.28MB  0.85%  gopkg.in/yaml%2ev2.(*parser).scalar
    7.35MB  0.61% 87.81%   731.95MB 60.25%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).Make
    7.33MB   0.6% 88.41%     7.33MB   0.6%  bytes.growSlice
    7.03MB  0.58% 88.99%     7.03MB  0.58%  regexp/syntax.(*compiler).inst (inline)
    5.40MB  0.44% 89.44%    29.48MB  2.43%  github.com/projectdiscovery/nuclei/v3/pkg/parsers.ParseTemplate
    5.31MB  0.44% 89.88%    55.44MB  4.56%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).generateHttpRequest
    5.08MB  0.42% 90.29%    43.94MB  3.62%  github.com/projectdiscovery/nuclei/v3/pkg/templates.parseTemplate
    4.61MB  0.38% 90.67%    30.72MB  2.53%  fmt.Sprintf
    3.31MB  0.27% 90.95%    11.10MB  0.91%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http/httpclientpool.wrappedGet
    2.81MB  0.23% 91.18%    21.36MB  1.76%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http/raw.readRawRequest
    2.77MB  0.23% 91.40%     6.43MB  0.53%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/common/replacer.Replace
    2.56MB  0.21% 91.62%    74.73MB  6.15%  github.com/projectdiscovery/utils/url.(*OrderedParams).Decode
    1.03MB 0.085% 91.70%    85.59MB  7.05%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).executeRequest
    0.87MB 0.072% 91.77%     8.20MB  0.67%  bytes.(*Buffer).grow
    0.67MB 0.055% 91.83%     9.21MB  0.76%  regexp.compile
    0.60MB 0.049% 91.88%   841.32MB 69.25%  github.com/projectdiscovery/nuclei/v3/pkg/tmplexec/generic.(*Generic).ExecuteWithResults
    0.58MB 0.047% 91.92%     7.34MB   0.6%  github.com/projectdiscovery/retryablehttp-go.NewClient
    0.51MB 0.042% 91.97%    72.52MB  5.97%  net/http.(*Transport).dialConn
    0.50MB 0.041% 92.01%    20.95MB  1.72%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).Compile
    0.29MB 0.024% 92.03%   841.64MB 69.28%  github.com/projectdiscovery/nuclei/v3/pkg/tmplexec.(*TemplateExecuter).Execute
    0.12MB  0.01% 92.04%    69.88MB  5.75%  github.com/projectdiscovery/fastdialer/fastdialer.(*Dialer).DialTLS
    0.11MB 0.0092% 92.05%    45.04MB  3.71%  github.com/projectdiscovery/nuclei/v3/pkg/templates.Parse
    0.09MB 0.0075% 92.06%    65.17MB  5.36%  github.com/projectdiscovery/fastdialer/fastdialer.AsZTLSConfig
    0.08MB 0.0068% 92.06%   809.25MB 66.61%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).executeParallelHTTP
    0.07MB 0.0055% 92.07%     6.47MB  0.53%  net/http.(*persistConn).writeLoop
    0.06MB 0.0051% 92.07%    69.73MB  5.74%  github.com/projectdiscovery/nuclei/v3/pkg/catalog/loader.(*Store).LoadTemplatesWithTags

@stan-threatmate
Author

stan-threatmate commented Feb 14, 2024

And this one uses what you suggested, 25 template/host concurrency:

(pprof) top50
Showing nodes accounting for 1191.23MB, 92.63% of 1285.96MB total
Dropped 960 nodes (cum <= 6.43MB)
Showing top 50 nodes out of 136
      flat  flat%   sum%        cum   cum%
  220.54MB 17.15% 17.15%   403.46MB 31.37%  fmt.Errorf
  182.10MB 14.16% 31.31%   187.27MB 14.56%  github.com/projectdiscovery/utils/errors.(*enrichedError).captureStack (inline)
  177.04MB 13.77% 45.08%   177.04MB 13.77%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/common/generators.MergeMaps (inline)
  104.67MB  8.14% 53.22%   104.82MB  8.15%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).responseToDSLMap
   74.08MB  5.76% 58.98%    81.28MB  6.32%  github.com/ulule/deepcopier.getTagOptions
   70.54MB  5.49% 64.46%    70.54MB  5.49%  runtime.malg
   50.89MB  3.96% 68.42%   208.12MB 16.18%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).generateRawRequest
   40.33MB  3.14% 71.56%    40.33MB  3.14%  github.com/projectdiscovery/utils/maps.(*OrderedMap[go.shape.string,go.shape.[]string]).Set (inline)
   34.22MB  2.66% 74.22%    34.22MB  2.66%  net/http.NewRequestWithContext
   22.66MB  1.76% 75.98%    22.66MB  1.76%  bytes.growSlice
   21.43MB  1.67% 77.65%    80.65MB  6.27%  github.com/projectdiscovery/utils/url.ParseURL
   18.23MB  1.42% 79.06%    21.67MB  1.69%  github.com/projectdiscovery/utils/reader.NewReusableReadCloser
   17.32MB  1.35% 80.41%    17.32MB  1.35%  bytes.(*Buffer).String (inline)
   17.03MB  1.32% 81.74%    17.03MB  1.32%  reflect.New
   14.07MB  1.09% 82.83%    14.07MB  1.09%  strings.(*Builder).grow
   12.20MB  0.95% 83.78%    13.44MB  1.05%  github.com/yl2chen/cidranger.newPrefixTree
      12MB  0.93% 84.71%    12.02MB  0.93%  github.com/syndtr/goleveldb/leveldb/memdb.New
   10.28MB   0.8% 85.51%    10.28MB   0.8%  gopkg.in/yaml%2ev2.(*parser).scalar
    9.84MB  0.77% 86.28%   201.71MB 15.69%  fmt.Sprintf
    9.06MB   0.7% 86.98%    21.64MB  1.68%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.generateVariables
    9.02MB   0.7% 87.68%     9.02MB   0.7%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/utils.GenerateDNSVariables
    7.67MB   0.6% 88.28%   558.04MB 43.39%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).executeRequest
    7.21MB  0.56% 88.84%     7.21MB  0.56%  strings.genSplit
    6.75MB  0.52% 89.36%     6.75MB  0.52%  github.com/projectdiscovery/ratelimit.(*Limiter).Take
    6.75MB  0.52% 89.89%     6.75MB  0.52%  regexp/syntax.(*compiler).inst (inline)
    5.39MB  0.42% 90.31%    63.35MB  4.93%  github.com/projectdiscovery/retryablehttp-go.NewRequestFromURLWithContext
    5.25MB  0.41% 90.72%    53.95MB  4.20%  github.com/projectdiscovery/utils/url.absoluteURLParser
    5.20MB   0.4% 91.12%    29.08MB  2.26%  github.com/projectdiscovery/nuclei/v3/pkg/parsers.ParseTemplate
    4.90MB  0.38% 91.50%       43MB  3.34%  github.com/projectdiscovery/nuclei/v3/pkg/templates.parseTemplate
    3.44MB  0.27% 91.77%   370.45MB 28.81%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*requestGenerator).Make
    3.27MB  0.25% 92.02%     7.54MB  0.59%  net/http.(*Client).do.func2
    3.13MB  0.24% 92.27%    10.94MB  0.85%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http/httpclientpool.wrappedGet
    1.68MB  0.13% 92.40%    48.86MB  3.80%  github.com/projectdiscovery/utils/url.(*OrderedParams).Decode
    0.55MB 0.043% 92.44%     8.64MB  0.67%  regexp.compile
    0.54MB 0.042% 92.48%     7.26MB  0.56%  github.com/projectdiscovery/retryablehttp-go.NewClient
    0.52MB 0.041% 92.52%   100.17MB  7.79%  net/http.(*Transport).dialConn
    0.52MB  0.04% 92.56%    20.74MB  1.61%  github.com/projectdiscovery/nuclei/v3/pkg/protocols/http.(*Request).Compile
    0.19MB 0.015% 92.58%    22.85MB  1.78%  bytes.(*Buffer).grow
    0.13MB  0.01% 92.59%    83.98MB  6.53%  github.com/projectdiscovery/fastdialer/fastdialer.AsZTLSConfig
    0.12MB 0.0096% 92.60%    97.36MB  7.57%  github.com/projectdiscovery/fastdialer/fastdialer.(*Dialer).DialTLS
    0.08MB 0.0064% 92.60%   431.91MB 33.59%  github.com/projectdiscovery/nuclei/v3/pkg/tmplexec/generic.(*Generic).ExecuteWithResults
    0.08MB 0.0061% 92.61%    44.34MB  3.45%  github.com/projectdiscovery/nuclei/v3/pkg/templates.Parse
    0.06MB 0.0049% 92.62%    68.69MB  5.34%  github.com/projectdiscovery/nuclei/v3/pkg/catalog/loader.(*Store).LoadTemplatesWithTags
    0.05MB 0.0037% 92.62%   194.20MB 15.10%  github.com/projectdiscovery/utils/errors.(*enrichedError).Error
    0.04MB 0.0034% 92.62%    21.30MB  1.66%  net/http.(*persistConn).writeLoop
    0.04MB 0.0032% 92.63%    17.75MB  1.38%  gopkg.in/yaml%2ev2.(*decoder).sequence
    0.04MB 0.0027% 92.63%   431.95MB 33.59%  github.com/projectdiscovery/nuclei/v3/pkg/tmplexec.(*TemplateExecuter).Execute
    0.03MB 0.0021% 92.63%   196.37MB 15.27%  net/url.(*Error).Error
    0.02MB 0.0015% 92.63%   425.83MB 33.11%  github.com/projectdiscovery/retryablehttp-go.(*Client).Do
    0.01MB 0.00092% 92.63%    97.23MB  7.56%  github.com/projectdiscovery/fastdialer/fastdialer.(*Dialer).dial

@stan-threatmate
Author

Actually, your proposal of 25 concurrent hosts/templates worked on one of my test setups. I set up a memory-constrained container with 2048MB RAM and aggressive GC settings: GOMEMLIMIT=500MiB and GOGC=20. When the scan reached 35%, the RAM increased suddenly and the GC was trying really hard to free the memory. It got right up to 2GB and stayed there for a bit. I thought it would be OOM killed, but it managed to keep pace with the allocs/frees, so it didn't get killed and memory went back down to a sustainable level.

Then around 75% complete it shot up again, this time staying at 2GB for a very long time with the CPU at 1500% trying to free all this memory. It was successful and completed in the end.

My theory is that some templates allocate a ton of memory, and if the concurrency settings are above a certain threshold the allocation rate can surpass the GC's ability to free memory, which ultimately leads to an OOM kill. The only saving grace would be a good amount of free memory and/or fast CPUs that help the GC free memory faster. But we really need guidance on the performance characteristics of nuclei: what are the RAM and CPU requirements for X hosts and Y templates, etc.

Is there a way to know which templates use the most memory? Can we measure the CPU/memory of individual templates? That would be a very good metric to have. If I want to speed things up, I'd like to run efficient templates faster but slow down on the memory-heavy ones in order to not run out of memory. Some of the big allocations are around the raw requests and responses.

@stan-threatmate
Author

Another update: using 25/25 host/template concurrency when scanning 360 targets still resulted in an OOM kill, but it ran for a significantly longer time. I will set the garbage collector to aggressive values and try again: GOGC=20 GOMEMLIMIT=500MiB.
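(For anyone else embedding the SDK: the same limits can also be set from code instead of environment variables; this is just the stdlib equivalent of GOGC=20 GOMEMLIMIT=500MiB.)

	import "runtime/debug"

	func init() {
		// Equivalent of GOGC=20: trigger GC when the heap grows 20% past the live set.
		debug.SetGCPercent(20)
		// Equivalent of GOMEMLIMIT=500MiB: soft memory limit for the Go runtime (Go 1.19+).
		debug.SetMemoryLimit(500 << 20)
	}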

@tarunKoyalwar
Member

tarunKoyalwar commented Feb 14, 2024

@stan-threatmate, although it is helpful in production, tuning the GC while debugging memory leaks might not help, so I would recommend just trying with the normal options. As you already know, Go does not immediately release memory; it releases it gradually in the hope of reusing it instead of allocating again and again.

That is why tuning the GC aggressively would only cause more CPU utilization without any direct benefit (especially in this case).

Based on your suggestion, I have just added more docs on how nuclei consumes resources and all the factors involved: https://docs.projectdiscovery.io/tools/nuclei/mass-scanning-cli

From the profile details you have shared, it looks like these are not the actual inflection points:

Showing nodes accounting for 1191.23MB, 92.63% of 1285.96MB total <- heap memory is 1.2GB

Also, looking at the profile data above, I can only say that the top heap-consuming functions shown there are expected. generators.MergeMaps, generateRawRequest, etc. hold raw response data in maps, and given the concurrency I think this much is expected; since this data is obtained from the targets being scanned, it is difficult to estimate how much data is being held at any moment.

If you think it is related to a particular set of templates, I would suggest splitting the templates and running separate scans:

  • with just HTTP templates
  • with templates of protocols other than HTTP
  • with JavaScript protocol templates
    ^ You can use the protocolType option in the nuclei SDK to filter templates effectively. If the problem is related to a specific set of templates, or to a feature used by a specific template, it will show up in one of the above scan results/observations.

^ This is one of the strategies that worked for me when I fixed memory leaks recently in v3.1.8-10.

An alternative strategy is to continuously capture heap snapshots and nuclei process memory (using memstats, or manually via a bash script and the PID). Subtracting a steady-state profile from a sudden-spike profile using -diff_base will pinpoint the function responsible, for example:
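(The file names below are placeholders for two heap snapshots, one captured during steady state and one during the spike.)

	go tool pprof -diff_base=heap_steady.pb.gz heap_spike.pb.gz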

I will try to reproduce this using CLI with 800 targets.

Finally, if you want to customize any execution behaviour or use your own logic, I would suggest taking a look at the core package, which contains the logic for how targets x templates are executed.

@tarunKoyalwar
Member

tarunKoyalwar commented Feb 14, 2024

BTW, the nuclei-docs repo is deprecated; the latest docs are available at https://docs.projectdiscovery.io/introduction

@tarunKoyalwar tarunKoyalwar added the Investigation label Feb 14, 2024
@tarunKoyalwar tarunKoyalwar self-assigned this Feb 14, 2024
@stan-threatmate
Author

stan-threatmate commented Feb 14, 2024

I'll continue debugging this but here is a run on a 16GB container with 8 CPUs which is scanning 358 hosts with 25/25 host/template concurrency and max host error set to 15000:

[graphs: memory utilization and GC activity during the scan]

You can clearly see that some event causes runaway memory utilization at the end of the scan. At this point the nuclei stats showed 43% completion, but I am not sure how trustworthy that percentage is.

You can see the GC working really hard throughout the scan to keep the memory low.

@stan-threatmate
Author

@tarunKoyalwar thank you for looking into this issue!

I have a question about -max-host-error. I want to use nuclei to try all templates on a host. If I understand this correctly we need to set the mhe to a large number in order to not stop the scan prematurely, right?

Also, about the -response-size-read option: do templates care about this value, and if I set it to 0 to save memory, would that hurt how templates work?

About the -rate-limit option: I haven't seen it make any difference, at least according to the nuclei stats. Is the RPS metric reported by the stats controlled by this option?

@stan-threatmate
Author

Update: I am scanning 47 hosts with the following settings and I still get OOM killed on a 16GB RAM, 8 CPU container:

  • template/host concurrency 15
  • rate limit 10
  • exclude severity: info, low
  • scan strategy: host-spray
  • max host errors: 30
  • GOMEMLIMIT=500MiB GOGC=20

I suspect a template or a group of templates spikes and causes large memory allocations, because memory usage is stable until an inflection point where things shoot up.

[graph: memory utilization with a steep climb before the OOM kill]

The steep climb is what makes me believe it is a very specific template or related templates that cause this.

@lebik

lebik commented Feb 16, 2024

I have the same problem, and after reverting to v2.9.15 everything works well. So I think the problem is with one of the 118 templates that are not supported in v2.9.15.

[memory utilization graph]

@stan-threatmate
Author

I can confirm that it has to be one of the critical templates. Here are two scans. The first is using only the critical severity templates. The second is using everything but the critical severity templates:

[graph: memory utilization for the two scans]

We can see that when we don't run the critical severity templates, memory usage is minimal.

@Mzack9999 Mzack9999 self-assigned this Feb 22, 2024
@tarunKoyalwar tarunKoyalwar added the Priority: Critical label and removed the Investigation label Feb 22, 2024
@tarunKoyalwar
Member

@stan-threatmate FYI, we were able to reproduce this some time ago and are working on locating and fixing the offending code.

@stan-threatmate
Author

@tarunKoyalwar thank you!

@Mzack9999
Member

The issue will be fixed as part of #4800

@tarunKoyalwar
Member

tarunKoyalwar commented Feb 25, 2024

@stan-threatmate, can you try running a scan using the SDK with these 4 templates disabled? You can use -et and its corresponding option in the SDK to exclude them (a CLI example is shown after the list):

http/cves/2019/CVE-2019-17382.yaml
http/cves/2023/CVE-2023-24489.yaml
http/fuzzing/header-command-injection.yaml
http/fuzzing/wordpress-weak-credentials.yaml
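For example, on the CLI (targets.txt is a placeholder for your target list):

	nuclei -l targets.txt -et http/cves/2019/CVE-2019-17382.yaml,http/cves/2023/CVE-2023-24489.yaml,http/fuzzing/header-command-injection.yaml,http/fuzzing/wordpress-weak-credentials.yaml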

@stan-threatmate
Author

@tarunKoyalwar I removed the templates you mentioned and my first test scan finished successfully. Memory looks great. Next I am running a large scan over 400 hosts, but it will take 15h to complete, so I'll report back tomorrow. I also used more aggressive settings:

  • template concurrency: 50
  • host concurrency: 50
  • rate limit: 150
  • excluded severity: info, low
[memory utilization graph]

@stan-threatmate
Author

Removing the 5 templates allowed us to scan about 400 hosts with no problem on a 16GB container with 8 CPUs.

[memory utilization graph]

@Mzack9999
Member

@stan-threatmate The issue is high parallelism in bruteforce templates, which causes a lot of buffer allocations to read HTTP responses (up to the default 10MB). To mitigate the issue, a generic memory monitor mechanism has been implemented in #4833 (when global RAM occupation is above 75%, the parallelism is decreased to 5). I was able to complete multiple runs on an 8GB system without the scan being killed.
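Conceptually the guard behaves like the sketch below (illustration only, not the actual #4833 implementation; systemMemoryUsedPercent is a stand-in for however the monitor reads global RAM occupation):

	import (
		"sync/atomic"
		"time"
	)

	// systemMemoryUsedPercent is a hypothetical helper standing in for the real
	// source of global RAM occupation (e.g. reading /proc/meminfo).
	func systemMemoryUsedPercent() float64 { return 0 }

	// adjustParallelism periodically lowers the worker budget under memory
	// pressure and restores it once the pressure drops.
	func adjustParallelism(workers *atomic.Int64, normal, reduced int64) {
		ticker := time.NewTicker(5 * time.Second)
		defer ticker.Stop()
		for range ticker.C {
			if systemMemoryUsedPercent() > 75 {
				workers.Store(reduced) // e.g. drop bruteforce parallelism to 5
			} else {
				workers.Store(normal)
			}
		}
	}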

@stan-threatmate
Author

stan-threatmate commented Mar 6, 2024

@Mzack9999 thank you! How is the RAM limit determined? Is it based on the free memory or the total memory? Can we configure the limits (75% and 5 threads) in the SDK?

Update:
I looked at your changes and added some comments.

Second update:
Another mechanism you could use is a rate limit on memory allocations per second. If 10MB buffers can be allocated, we could limit buffer allocations to 50 per second, i.e. 500MB of RAM per second. Ideally this would be configurable.
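Something along these lines, using golang.org/x/time/rate (a sketch of the idea only; allocResponseBuffer is a hypothetical helper, not a nuclei function):

	import (
		"context"

		"golang.org/x/time/rate"
	)

	// Allow roughly 50 large-buffer allocations per second, i.e. about 500MB/s
	// of new 10MB response buffers.
	var bufferLimiter = rate.NewLimiter(rate.Limit(50), 50)

	func allocResponseBuffer(ctx context.Context) ([]byte, error) {
		// Block until the limiter grants a token (or the context is cancelled).
		if err := bufferLimiter.Wait(ctx); err != nil {
			return nil, err
		}
		return make([]byte, 10<<20), nil
	}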
