Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

introduce scan context #4373

Merged
merged 8 commits into from
Nov 27, 2023
Merged

introduce scan context #4373

merged 8 commits into from
Nov 27, 2023

Conversation

dogancanbakir
Copy link
Member

@dogancanbakir dogancanbakir commented Nov 15, 2023

Proposed changes (initial)

Please describe your feature request:

When I use nuclei to run my test cases using matcher-status option that generates a result event for failed matchers, it doesn't include results when the host got errored out for multiple reasons, which nuclei failed to report.

nuclei have a separate option to track errored hosts with error information using -elog option, and unfortunately, it can not be mapped directly with the result itself.

Describe the use case of this feature:

I'm running nuclei in my CI pipeline, nuclei reported multiple results upon 1st run; as a result of one vulnerability, I've taken down the vulnerable host, upon next nuclei run, nuclei failed to report the same vulnerabilities as it silently failed to run as host is not accessible.

Instead, nuclei can still generate failed match events and populate error information as it does when using with -elog option.

Here is example run with this support:

1. Test Template
id: failed_test

info:
  name: Test HTTP Template
  author: pdteam
  severity: info

http:
  - method: GET
    path:
      - "{{BaseURL}}"

    matchers:
      - type: word
        words:
          - "This is test matcher text"
2. Test Run
echo https://googleaaaaaaaaa.com | nuclei -t test.yaml -matcher-status -jsonl

                     __     _
   ____  __  _______/ /__  (_)
  / __ \/ / / / ___/ / _ \/ /
 / / / / /_/ / /__/ /  __/ /
/_/ /_/\__,_/\___/_/\___/_/   v3.0.2

		projectdiscovery.io

[INF] Current nuclei version: v3.0.2 (latest)
[INF] Current nuclei-templates version: v9.6.7 (latest)
[INF] New templates added in latest release: 1
[INF] Templates loaded for current scan: 1
[WRN] Executing 1 unsigned templates. Use with caution.
[INF] Targets loaded for current scan: 1
3. Test Result
{
  "template-id": "failed_test",
  "template-path": "/Users/geekboy/Github/nuclei-templates/test.yaml",
  "info": {
    "name": "Test HTTP Template",
    "author": [
      "pdteam"
    ],
    "tags": null,
    "severity": "info"
  },
  "type": "http",
  "host": "https://gggggggggggle.com",
  "request": "GET / HTTP/1.1\r\nHost: gggggggggggle.com\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2866.71 Safari/537.36\r\nConnection: close\r\nAccept: */*\r\nAccept-Language: en\r\nAccept-Encoding: gzip\r\n\r\n",
-  "response": "XXXX",
  "timestamp": "2023-10-27T18:08:24.24882+05:30",
+  "error": "context deadline exceeded",
  "matcher-status": false
}

Note:

  • This support needs to be reflected for SDK uses as well.

Proposed changes

Refactor Nuclei to use ScanCtx

Jargon

  • scan = 1 template x 1 target

when we say stateless in this context it means stateless for a scan

Background / Intro

  • In Nuclei we have TemplateExecuter for executing a template on one or more than 1 targets (tmplexec package).
func (e *TemplateExecuter) ExecuteWithResults(input *contextargs.Context, callback protocols.OutputEventCallback) error {
  • This is function signature of method used to execute a scan ( scan == 1 target + 1 template )
  • This is called N times in several goroutines when there is more than 1 target and Hence it is necessary that TemplateExecuter and all its field are stateless (in context of scan) . simply put a scan should not update/modify any data in TemplateExecuter or it will cause a race condition / panic

Requirement / Issue

we are trying to return any error occured during execution of a scan in json response when -ms flag is used (ref: #4299) .The solution for this is to add a field called error in ResultEvent . But problem is that it doesn’t seem to be possible with current arch/design . since we generate ResultEvent in callbacks, looking at above function signature we can tell that error is returned seperately and we exit function on error ( at multiple places).

generating event in callback may not be good idea and we have so many nested callbacks in nuclei but unfortunately there is no simple / alternate solution to replace callbacks at least for now

How does current error logging work (like elog etc) ??

TemplateExecuter/ Protocols etc have a field called ExecuterOptions which contains global (ideally) read-only resources that are shared across all protocols ex: browser instance , interactsh instance and lot more . we have output.Writer field in ExecuterOptions that is called to log errors to a io.Writer .

// Request logs a request in the trace log
Request(templateID, url, requestType string, err error)

Since this is a global writer it is not possible to hijack error’s and pass it to ResultEvent

*contextargs.Context

  • this is input context . more specifically contains target and any arguments from workflows and is specific to scan
  • In Nuclei v3 we introduced templateContext (a global map of variables per scan) for this we are hashing contextArgs and using that as key . since input *contextargs.Context is scan specific . ( it is cloned at protocol level etc but more or less remains constant for a scan )
  • This key is then used to read/modify/update variables by using a synchronized Map in ExecuterOptions . this ideally should not happen since we aim to keep TemplateExecuter stateless but this was only/easiest hack to use at that moment

Proposed Solution

I think solution is not something entirely new but it just extends to existing implementation of contextArgs.Context and following a strict practice/guidelines while coding

The idea remains the same i.e keeping TemplateExecuter and Protocols.Request stateless and move all update/modify related data (like variables/templateContext) to ScanContext . The idea is to treat it the same way as we use context.Context in golang .

ScanContext is wrapper around input *contextargs.Context and will include all stateful data related to a scan

type ScanContext struct {
	// embed context.Context
	context.Context
	// scanID
	scanID string // MD5 (templateID+target+ip)
	// existing input/target related info
	input *contextargs.Context
	// templateInfo
	Info model.Info
	// Globally shared args aka template Context
	TemplateMap map[string]interface{}
  // stats tracker like req count etc
	Stats *Stats 

	// hooks / callbacks
	OnError func(error)

	OnResult func(e *InternalWrappedEvent)

	// other fields to be added here
}

func (s *ScanContext) GenerateResult() *output.ResultEvent

func (s *ScanContext) LogError(err error) error {}
// other required fields

so any function invocations in ExecuteWithResults function should follow the almost same new function signature as shown below

func (x *Type) XyzMethod(ctx *ScanContext,......) ... {

Example

  • this is example execute method that protocols implement
// Example Execute method of a protocol like ssl , http
func Execute(ScanCtx context.Context) error {
  generatedHttpRequest, err := generator.Make(...)
	if err != nil {
		return ScanCtx.LogError(err)
	}
	return nil
}
  • After change we will do following in tmplexec package

current implementation

// ExecuteWithResults executes the protocol requests and returns results instead of writing them.
func (e *TemplateExecuter) ExecuteWithResults(input *contextargs.Context, callback protocols.OutputEventCallback) error {
	userCallback := func(event *output.InternalWrappedEvent) {
		if event != nil {
			callback(event)
		}
	}
	return e.engine.ExecuteWithResults(input, userCallback)
}

new Implementation

// ExecuteWithResults executes the protocol requests and returns results instead of writing them.
func (e *TemplateExecuter) ExecuteWithResults(input *contextargs.Context) (*ResultEvent, error) {
	// create scanCtx using input 
  scanCtx := NewScanCtx(input)
	// execute template / engine
	err := e.engine.ExecuteWithResults(scanctx)
	if err != nil && !e.options.MatcherStatus {
			return nil, err
	}
	return scanCtx.GenerateResult(),err
}

when we use LogError method it will save given error in ScanContext struct and will be used when generating result .

With this we can include much more than just a error in ResultEvent since GenerateResult is a method of ScanContext and it has all scan related data . we can include everything that will be required in clean way . Others things that can be included in ResultEvent are

  • All Template Context Variables
  • Stats like total request sent / failed etc
  • Analytics related data like Time taken for a scan
  • All existing stuff like template Metadata etc
  • Request / Response Logs ( in all protocols)
  • Other Future requirements related to scan

Conclusion

  • we are abstracting callbacks from function invocations . and later on we can slowly refactor / remove callbacks with any better solutions

Checklist

  • Pull request is created against the dev branch
  • All checks passed (lint, unit/integration/regression tests etc.) with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

@tarunKoyalwar
Copy link
Member

TODO

  • Introduce new public/exported methods (both in sdk and core package) without any callbacks as arguments (ref: ExecuteWithCallback: Callback does not fire if UseOutputWriter is configured #4329 )
  • Callback's should exist internally (until refactored) and users should not be able to use/configure this callback either from sdk or runner [since we are implementing errors and other features if we execute callback functions errors will not be reflected in them and it will be difficult to maintain both of them]

Copy link
Member

@tarunKoyalwar tarunKoyalwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial implementation seems good . we should probably circle back and divide implementation into small tasks and only implement necessary changes in this PR and implement other changes ( template context , stats etc in follow up issues)

more details in TODO

@dogancanbakir dogancanbakir marked this pull request as ready for review November 17, 2023 10:04
@dogancanbakir
Copy link
Member Author

$ go run . -t test.yaml -u https://gggggggggggle.com -matcher-status -jsonl -silent | jq
{
  "template-id": "failed_test",
  "template-path": "/workspaces/nuclei/cmd/nuclei/test.yaml",
  "template-encoded": "aWQ6IGZhaWxlZF90ZXN0CgppbmZvOgogIG5hbWU6IFRlc3QgSFRUUCBUZW1wbGF0ZQogIGF1dGhvcjogcGR0ZWFtCiAgc2V2ZXJpdHk6IGluZm8KCmh0dHA6CiAgLSBtZXRob2Q6IEdFVAogICAgcGF0aDoKICAgICAgLSAie3tCYXNlVVJMfX0iCgogICAgbWF0Y2hlcnM6CiAgICAgIC0gdHlwZTogd29yZAogICAgICAgIHdvcmRzOgogICAgICAgICAgLSAiVGhpcyBpcyB0ZXN0IG1hdGNoZXIgdGV4dCI=",
  "info": {
    "name": "Test HTTP Template",
    "author": [
      "pdteam"
    ],
    "tags": null,
    "severity": "info"
  },
  "type": "http",
  "host": "https://gggggggggggle.com",
  "request": "GET / HTTP/1.1\r\nHost: gggggggggggle.com\r\nUser-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2656.18 Safari/537.36\r\nConnection: close\r\nAccept: */*\r\nAccept-Language: en\r\nAccept-Encoding: gzip\r\n\r\n",
  "timestamp": "2023-11-17T10:01:37.209519923Z",
  "matcher-status": false,
  "error": "GET https://gggggggggggle.com giving up after 2 attempts: Get \"https://gggggggggggle.com\": no address found for host"
}

Copy link
Member

@tarunKoyalwar tarunKoyalwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

implementation lgtm ! and seems to be working correctly. I think we should complete below necessary changes and then decide changes to be implemented in follow up steps

$ ./nuclei -id tech-detect -u https://scanmescanme.ssh -ms -jsonl -silent | jq .
{
  "template": "http/technologies/tech-detect.yaml",
  "template-url": "https://templates.nuclei.sh/public/tech-detect",
  "template-id": "tech-detect",
  "template-path": "/Users/tarun/nuclei-templates/http/technologies/tech-detect.yaml",
  "info": {
    "name": "Wappalyzer Technology Detection",
    "author": [
      "hakluke"
    ],
    "tags": [
      "tech"
    ],
    "severity": "info",
    "metadata": {
      "max-request": 1
    }
  },
  "type": "http",
  "host": "https://scanmescanme.ssh",
  "request": "GET / HTTP/1.1\r\nHost: scanmescanme.ssh\r\nUser-Agent: Mozilla/5.0 (X11; OpenBSD i386) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36\r\nConnection: close\r\nAccept: */*\r\nAccept-Language: en\r\nAccept-Encoding: gzip\r\n\r\n",
  "timestamp": "2023-11-21T21:19:41.615583+05:30",
  "matcher-status": false,
  "error": "GET https://scanmescanme.ssh giving up after 2 attempts: Get \"https://scanmescanme.ssh\": no address found for host"
}

Changes

  • Update Executer Interface to align with current logic (i.e without callback in input)
// Existing Executer interface
type Executer interface {
	// Compile compiles the execution generators preparing any requests possible.
	Compile() error
	// Requests returns the total number of requests the rule will perform
	Requests() int
	// Execute executes the protocol group and returns true or false if results were found.
	Execute(ctx *scan.ScanContext) (bool, error)
	// ExecuteWithResults executes the protocol requests and returns results instead of writing them.
	ExecuteWithResults(ctx *scan.ScanContext) ([]*output.ResultEvent, error)
}

// New or Proposed Executer interfacoe
type Executer interface {
	// Compile compiles the execution generators preparing any requests possible.
	Compile() error
	// Requests returns the total number of requests the rule will perform
	Requests() int
	// Execute executes the protocol group and returns true or false if results were found.
	ExecuteWithScanCtx(ctx *scan.ScanContext) error
	// ExecuteWithResults executes the protocol requests and returns results instead of writing them.
	// Note: In case of error it will also include a ResultEvent which provides more accurate information
	// about error and other info like total request sent and other necessary info
	ExecuteWithInput(input *contextargs.Context) ([]*output.ResultEvent, error)
}
  • we still implement logic regarding printing results in tmplExec package instead it should be defined in ScanContext itself providing better seperation between cli and sdk usage of nuclei without needing to use MockOutputWriter in sdk at all . example
// Customizable context args for sdk use cases (i.e no stdout logging implemented here)
func NewScanCtx(ctx *contextargs.Context)

// For CLI use case . in this function we use output writers etc from opts 
// and add that logic while initializing scanCtx using protocols.ExecuterOptions
func NewScanCtxWithOpts(ctx *contextargs.Context, opts *protocols.ExecuterOptions)

Follow-up tickets related to refactor

  • refactor stats logic to use ScanCtx and fix incorrect stats etc
  • refactor interactsh output logic in nuclei (i.e implement cooldown in ScanCtx instead of outputWriter) (also add meta-fields to track whether interaction was recieved at all or not)
  • refactor sdk internals to use ScanCtx after resolving above two tickets to make it more robust

@ehsandeep ehsandeep modified the milestone: nuclei v3.1.0 Nov 23, 2023
Copy link
Member

@tarunKoyalwar tarunKoyalwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm !

$ ./nuclei -id tech-detect -u https://scanmescanme.ssh -ms -jsonl -silent | jq .
{
  "template": "http/technologies/tech-detect.yaml",
  "template-url": "https://templates.nuclei.sh/public/tech-detect",
  "template-id": "tech-detect",
  "template-path": "/Users/tarun/nuclei-templates/http/technologies/tech-detect.yaml",
  "info": {
    "name": "Wappalyzer Technology Detection",
    "author": [
      "hakluke"
    ],
    "tags": [
      "tech"
    ],
    "severity": "info",
    "metadata": {
      "max-request": 1
    }
  },
  "type": "http",
  "host": "https://scanmescanme.ssh",
  "request": "GET / HTTP/1.1\r\nHost: scanmescanme.ssh\r\nUser-Agent: Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36\r\nConnection: close\r\nAccept: */*\r\nAccept-Language: en\r\nAccept-Encoding: gzip\r\n\r\n",
  "timestamp": "2023-11-28T00:16:43.77864+05:30",
  "matcher-status": false,
  "error": "GET https://scanmescanme.ssh giving up after 2 attempts: Get \"https://scanmescanme.ssh\": no address found for host"
}

will create followup tickets shortly

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

-matcher-status option to generate result event for errored cases
3 participants