introduce scan context #4373

dogancanbakir · 2023-11-15T10:40:20Z

Proposed changes (initial)

Please describe your feature request:

When I use nuclei to run my test cases using matcher-status option that generates a result event for failed matchers, it doesn't include results when the host got errored out for multiple reasons, which nuclei failed to report.

nuclei have a separate option to track errored hosts with error information using -elog option, and unfortunately, it can not be mapped directly with the result itself.

Describe the use case of this feature:

I'm running nuclei in my CI pipeline, nuclei reported multiple results upon 1st run; as a result of one vulnerability, I've taken down the vulnerable host, upon next nuclei run, nuclei failed to report the same vulnerabilities as it silently failed to run as host is not accessible.

Instead, nuclei can still generate failed match events and populate error information as it does when using with -elog option.

Here is example run with this support:

1. Test Template

id: failed_test

info:
  name: Test HTTP Template
  author: pdteam
  severity: info

http:
  - method: GET
    path:
      - "{{BaseURL}}"

    matchers:
      - type: word
        words:
          - "This is test matcher text"

2. Test Run

echo https://googleaaaaaaaaa.com | nuclei -t test.yaml -matcher-status -jsonl

                     __     _
   ____  __  _______/ /__  (_)
  / __ \/ / / / ___/ / _ \/ /
 / / / / /_/ / /__/ /  __/ /
/_/ /_/\__,_/\___/_/\___/_/   v3.0.2

		projectdiscovery.io

[INF] Current nuclei version: v3.0.2 (latest)
[INF] Current nuclei-templates version: v9.6.7 (latest)
[INF] New templates added in latest release: 1
[INF] Templates loaded for current scan: 1
[WRN] Executing 1 unsigned templates. Use with caution.
[INF] Targets loaded for current scan: 1

3. Test Result

{
  "template-id": "failed_test",
  "template-path": "/Users/geekboy/Github/nuclei-templates/test.yaml",
  "info": {
    "name": "Test HTTP Template",
    "author": [
      "pdteam"
    ],
    "tags": null,
    "severity": "info"
  },
  "type": "http",
  "host": "https://gggggggggggle.com",
  "request": "GET / HTTP/1.1\r\nHost: gggggggggggle.com\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2866.71 Safari/537.36\r\nConnection: close\r\nAccept: */*\r\nAccept-Language: en\r\nAccept-Encoding: gzip\r\n\r\n",
-  "response": "XXXX",
  "timestamp": "2023-10-27T18:08:24.24882+05:30",
+  "error": "context deadline exceeded",
  "matcher-status": false
}

Note:

This support needs to be reflected for SDK uses as well.

Proposed changes

Refactor Nuclei to use `ScanCtx`

Jargon

scan = 1 template x 1 target

when we say stateless in this context it means stateless for a scan

Background / Intro

In Nuclei we have TemplateExecuter for executing a template on one or more than 1 targets (tmplexec package).

func (e *TemplateExecuter) ExecuteWithResults(input *contextargs.Context, callback protocols.OutputEventCallback) error {

This is function signature of method used to execute a scan ( scan == 1 target + 1 template )
This is called N times in several goroutines when there is more than 1 target and Hence it is necessary that TemplateExecuter and all its field are stateless (in context of scan) . simply put a scan should not update/modify any data in TemplateExecuter or it will cause a race condition / panic

Requirement / Issue

we are trying to return any error occured during execution of a scan in json response when -ms flag is used (ref: #4299) .The solution for this is to add a field called error in ResultEvent . But problem is that it doesn’t seem to be possible with current arch/design . since we generate ResultEvent in callbacks, looking at above function signature we can tell that error is returned seperately and we exit function on error ( at multiple places).

generating event in callback may not be good idea and we have so many nested callbacks in nuclei but unfortunately there is no simple / alternate solution to replace callbacks at least for now

How does current error logging work (like elog etc) ??

TemplateExecuter/ Protocols etc have a field called ExecuterOptions which contains global (ideally) read-only resources that are shared across all protocols ex: browser instance , interactsh instance and lot more . we have output.Writer field in ExecuterOptions that is called to log errors to a io.Writer .

// Request logs a request in the trace log
Request(templateID, url, requestType string, err error)

Since this is a global writer it is not possible to hijack error’s and pass it to ResultEvent

`*contextargs.Context`

this is input context . more specifically contains target and any arguments from workflows and is specific to scan
In Nuclei v3 we introduced templateContext (a global map of variables per scan) for this we are hashing contextArgs and using that as key . since input *contextargs.Context is scan specific . ( it is cloned at protocol level etc but more or less remains constant for a scan )
This key is then used to read/modify/update variables by using a synchronized Map in ExecuterOptions . this ideally should not happen since we aim to keep TemplateExecuter stateless but this was only/easiest hack to use at that moment

Proposed Solution

I think solution is not something entirely new but it just extends to existing implementation of contextArgs.Context and following a strict practice/guidelines while coding

The idea remains the same i.e keeping TemplateExecuter and Protocols.Request stateless and move all update/modify related data (like variables/templateContext) to ScanContext . The idea is to treat it the same way as we use context.Context in golang .

ScanContext is wrapper around input *contextargs.Context and will include all stateful data related to a scan

type ScanContext struct {
	// embed context.Context
	context.Context
	// scanID
	scanID string // MD5 (templateID+target+ip)
	// existing input/target related info
	input *contextargs.Context
	// templateInfo
	Info model.Info
	// Globally shared args aka template Context
	TemplateMap map[string]interface{}
  // stats tracker like req count etc
	Stats *Stats 

	// hooks / callbacks
	OnError func(error)

	OnResult func(e *InternalWrappedEvent)

	// other fields to be added here
}

func (s *ScanContext) GenerateResult() *output.ResultEvent

func (s *ScanContext) LogError(err error) error {}
// other required fields

so any function invocations in ExecuteWithResults function should follow the almost same new function signature as shown below

func (x *Type) XyzMethod(ctx *ScanContext,......) ... {

Example

this is example execute method that protocols implement

// Example Execute method of a protocol like ssl , http
func Execute(ScanCtx context.Context) error {
  generatedHttpRequest, err := generator.Make(...)
	if err != nil {
		return ScanCtx.LogError(err)
	}
	return nil
}

After change we will do following in tmplexec package

current implementation

// ExecuteWithResults executes the protocol requests and returns results instead of writing them.
func (e *TemplateExecuter) ExecuteWithResults(input *contextargs.Context, callback protocols.OutputEventCallback) error {
	userCallback := func(event *output.InternalWrappedEvent) {
		if event != nil {
			callback(event)
		}
	}
	return e.engine.ExecuteWithResults(input, userCallback)
}

new Implementation

// ExecuteWithResults executes the protocol requests and returns results instead of writing them.
func (e *TemplateExecuter) ExecuteWithResults(input *contextargs.Context) (*ResultEvent, error) {
	// create scanCtx using input 
  scanCtx := NewScanCtx(input)
	// execute template / engine
	err := e.engine.ExecuteWithResults(scanctx)
	if err != nil && !e.options.MatcherStatus {
			return nil, err
	}
	return scanCtx.GenerateResult(),err
}

when we use LogError method it will save given error in ScanContext struct and will be used when generating result .

With this we can include much more than just a error in ResultEvent since GenerateResult is a method of ScanContext and it has all scan related data . we can include everything that will be required in clean way . Others things that can be included in ResultEvent are

All Template Context Variables
Stats like total request sent / failed etc
Analytics related data like Time taken for a scan
All existing stuff like template Metadata etc
Request / Response Logs ( in all protocols)
Other Future requirements related to scan

Conclusion

we are abstracting callbacks from function invocations . and later on we can slowly refactor / remove callbacks with any better solutions

Checklist

Pull request is created against the dev branch
All checks passed (lint, unit/integration/regression tests etc.) with my changes
I have added tests that prove my fix is effective or that my feature works
I have added necessary documentation (if appropriate)

tarunKoyalwar · 2023-11-15T13:46:39Z

TODO

Introduce new public/exported methods (both in sdk and core package) without any callbacks as arguments (ref: ExecuteWithCallback: Callback does not fire if UseOutputWriter is configured #4329 )
Callback's should exist internally (until refactored) and users should not be able to use/configure this callback either from sdk or runner [since we are implementing errors and other features if we execute callback functions errors will not be reflected in them and it will be difficult to maintain both of them]

tarunKoyalwar

initial implementation seems good . we should probably circle back and divide implementation into small tasks and only implement necessary changes in this PR and implement other changes ( template context , stats etc in follow up issues)

more details in TODO

dogancanbakir · 2023-11-17T10:04:24Z

$ go run . -t test.yaml -u https://gggggggggggle.com -matcher-status -jsonl -silent | jq
{
  "template-id": "failed_test",
  "template-path": "/workspaces/nuclei/cmd/nuclei/test.yaml",
  "template-encoded": "aWQ6IGZhaWxlZF90ZXN0CgppbmZvOgogIG5hbWU6IFRlc3QgSFRUUCBUZW1wbGF0ZQogIGF1dGhvcjogcGR0ZWFtCiAgc2V2ZXJpdHk6IGluZm8KCmh0dHA6CiAgLSBtZXRob2Q6IEdFVAogICAgcGF0aDoKICAgICAgLSAie3tCYXNlVVJMfX0iCgogICAgbWF0Y2hlcnM6CiAgICAgIC0gdHlwZTogd29yZAogICAgICAgIHdvcmRzOgogICAgICAgICAgLSAiVGhpcyBpcyB0ZXN0IG1hdGNoZXIgdGV4dCI=",
  "info": {
    "name": "Test HTTP Template",
    "author": [
      "pdteam"
    ],
    "tags": null,
    "severity": "info"
  },
  "type": "http",
  "host": "https://gggggggggggle.com",
  "request": "GET / HTTP/1.1\r\nHost: gggggggggggle.com\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2656.18 Safari/537.36\r\nConnection: close\r\nAccept: */*\r\nAccept-Language: en\r\nAccept-Encoding: gzip\r\n\r\n",
  "timestamp": "2023-11-17T10:01:37.209519923Z",
  "matcher-status": false,
  "error": "GET https://gggggggggggle.com giving up after 2 attempts: Get \"https://gggggggggggle.com\": no address found for host"
}

tarunKoyalwar

implementation lgtm ! and seems to be working correctly. I think we should complete below necessary changes and then decide changes to be implemented in follow up steps

$ ./nuclei -id tech-detect -u https://scanmescanme.ssh -ms -jsonl -silent | jq .
{
  "template": "http/technologies/tech-detect.yaml",
  "template-url": "https://templates.nuclei.sh/public/tech-detect",
  "template-id": "tech-detect",
  "template-path": "/Users/tarun/nuclei-templates/http/technologies/tech-detect.yaml",
  "info": {
    "name": "Wappalyzer Technology Detection",
    "author": [
      "hakluke"
    ],
    "tags": [
      "tech"
    ],
    "severity": "info",
    "metadata": {
      "max-request": 1
    }
  },
  "type": "http",
  "host": "https://scanmescanme.ssh",
  "request": "GET / HTTP/1.1\r\nHost: scanmescanme.ssh\r\nUser-Agent: Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36\r\nConnection: close\r\nAccept: */*\r\nAccept-Language: en\r\nAccept-Encoding: gzip\r\n\r\n",
  "timestamp": "2023-11-21T21:19:41.615583+05:30",
  "matcher-status": false,
  "error": "GET https://scanmescanme.ssh giving up after 2 attempts: Get \"https://scanmescanme.ssh\": no address found for host"
}

Changes

Update Executer Interface to align with current logic (i.e without callback in input)

// Existing Executer interface
type Executer interface {
	// Compile compiles the execution generators preparing any requests possible.
	Compile() error
	// Requests returns the total number of requests the rule will perform
	Requests() int
	// Execute executes the protocol group and returns true or false if results were found.
	Execute(ctx *scan.ScanContext) (bool, error)
	// ExecuteWithResults executes the protocol requests and returns results instead of writing them.
	ExecuteWithResults(ctx *scan.ScanContext) ([]*output.ResultEvent, error)
}

// New or Proposed Executer interfacoe
type Executer interface {
	// Compile compiles the execution generators preparing any requests possible.
	Compile() error
	// Requests returns the total number of requests the rule will perform
	Requests() int
	// Execute executes the protocol group and returns true or false if results were found.
	ExecuteWithScanCtx(ctx *scan.ScanContext) error
	// ExecuteWithResults executes the protocol requests and returns results instead of writing them.
	// Note: In case of error it will also include a ResultEvent which provides more accurate information
	// about error and other info like total request sent and other necessary info
	ExecuteWithInput(input *contextargs.Context) ([]*output.ResultEvent, error)
}

we still implement logic regarding printing results in tmplExec package instead it should be defined in ScanContext itself providing better seperation between cli and sdk usage of nuclei without needing to use MockOutputWriter in sdk at all . example

// Customizable context args for sdk use cases (i.e no stdout logging implemented here)
func NewScanCtx(ctx *contextargs.Context)

// For CLI use case . in this function we use output writers etc from opts 
// and add that logic while initializing scanCtx using protocols.ExecuterOptions
func NewScanCtxWithOpts(ctx *contextargs.Context, opts *protocols.ExecuterOptions)

Follow-up tickets related to refactor

refactor stats logic to use ScanCtx and fix incorrect stats etc
refactor interactsh output logic in nuclei (i.e implement cooldown in ScanCtx instead of outputWriter) (also add meta-fields to track whether interaction was recieved at all or not)
refactor sdk internals to use ScanCtx after resolving above two tickets to make it more robust

tarunKoyalwar

lgtm !

$ ./nuclei -id tech-detect -u https://scanmescanme.ssh -ms -jsonl -silent | jq .
{
  "template": "http/technologies/tech-detect.yaml",
  "template-url": "https://templates.nuclei.sh/public/tech-detect",
  "template-id": "tech-detect",
  "template-path": "/Users/tarun/nuclei-templates/http/technologies/tech-detect.yaml",
  "info": {
    "name": "Wappalyzer Technology Detection",
    "author": [
      "hakluke"
    ],
    "tags": [
      "tech"
    ],
    "severity": "info",
    "metadata": {
      "max-request": 1
    }
  },
  "type": "http",
  "host": "https://scanmescanme.ssh",
  "request": "GET / HTTP/1.1\r\nHost: scanmescanme.ssh\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36\r\nConnection: close\r\nAccept: */*\r\nAccept-Language: en\r\nAccept-Encoding: gzip\r\n\r\n",
  "timestamp": "2023-11-28T00:16:43.77864+05:30",
  "matcher-status": false,
  "error": "GET https://scanmescanme.ssh giving up after 2 attempts: Get \"https://scanmescanme.ssh\": no address found for host"
}

will create followup tickets shortly

introduce scan context

33679f5

dogancanbakir requested a review from tarunKoyalwar November 15, 2023 10:40

dogancanbakir self-assigned this Nov 15, 2023

dogancanbakir linked an issue Nov 15, 2023 that may be closed by this pull request

-matcher-status option to generate result event for errored cases #4299

Closed

minor

f3447cc

tarunKoyalwar mentioned this pull request Nov 15, 2023

ExecuteWithCallback: Callback does not fire if UseOutputWriter is configured #4329

Closed

tarunKoyalwar reviewed Nov 15, 2023

View reviewed changes

dogancanbakir requested a review from tarunKoyalwar November 16, 2023 12:44

dogancanbakir added 5 commits November 16, 2023 12:57

add joined errors to resultevents

722da28

change executor funcs' signature

ead7ff9

fix tests

0d923ec

join errors in LogError func

be8ad40

change func signature

6554a98

dogancanbakir marked this pull request as ready for review November 17, 2023 10:04

dogancanbakir requested a review from ehsandeep November 17, 2023 10:04

add guard

d96db48

tarunKoyalwar requested changes Nov 21, 2023

View reviewed changes

ehsandeep modified the milestone: nuclei v3.1.0 Nov 23, 2023

ehsandeep approved these changes Nov 27, 2023

View reviewed changes

ehsandeep requested a review from tarunKoyalwar November 27, 2023 11:05

tarunKoyalwar approved these changes Nov 27, 2023

View reviewed changes

ehsandeep merged commit ce5df9c into dev Nov 27, 2023
12 checks passed

ehsandeep deleted the introduce_scan_context branch November 27, 2023 18:54

This was referenced Nov 27, 2023

Add ScanContext in SDK for more control & customization #4432

Open

Refactor Interactsh logic to use ScanContext #4433

Open

Revamp Stats + improvements using scan context #4434

Open

BrewTestBot mentioned this pull request Nov 30, 2023

nuclei 3.1.0 Homebrew/homebrew-core#156031

Merged

tovask mentioned this pull request Apr 2, 2024

Incorrect result output from conditional workflows #4933

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

introduce scan context #4373

introduce scan context #4373

dogancanbakir commented Nov 15, 2023 •

edited

Loading

tarunKoyalwar commented Nov 15, 2023

tarunKoyalwar left a comment

dogancanbakir commented Nov 17, 2023

tarunKoyalwar left a comment

tarunKoyalwar left a comment

introduce scan context #4373

introduce scan context #4373

Conversation

dogancanbakir commented Nov 15, 2023 • edited Loading

Proposed changes (initial)

Please describe your feature request:

Describe the use case of this feature:

Proposed changes

Refactor Nuclei to use ScanCtx

Jargon

Background / Intro

Requirement / Issue

How does current error logging work (like elog etc) ??

*contextargs.Context

Proposed Solution

Example

current implementation

new Implementation

Conclusion

Checklist

tarunKoyalwar commented Nov 15, 2023

TODO

tarunKoyalwar left a comment

Choose a reason for hiding this comment

dogancanbakir commented Nov 17, 2023

tarunKoyalwar left a comment

Choose a reason for hiding this comment

Changes

Follow-up tickets related to refactor

tarunKoyalwar left a comment

Choose a reason for hiding this comment

dogancanbakir commented Nov 15, 2023 •

edited

Loading

Refactor Nuclei to use `ScanCtx`

`*contextargs.Context`