Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML #11006

Merged
merged 6 commits into from Mar 1, 2019

Conversation

Projects
None yet
3 participants
@adriansr
Copy link
Member

commented Mar 1, 2019

Golang's xml parser is refusing to process XML documents that contain an ASCII control character, resulting in those events being dropped. This patch pre-processes the XML document in order to replace invalid characters with the unicode escape sequence \uNNNN.

adriansr added some commits Mar 1, 2019

[Winlogbeat] Escape control characters in XML
Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".
[Winlogbeat] Do not drop events after XML rendering error
This updates the wineventlog logic to ingest windows event logs whose
rendering fails.

@adriansr adriansr requested a review from elastic/beats as a code owner Mar 1, 2019

return output(n)
}

func newXmlSafeReader(rawXML []byte) io.Reader {

This comment was marked as resolved.

Copy link
@houndci-bot

houndci-bot Mar 1, 2019

func newXmlSafeReader should be newXMLSafeReader

@andrewkroh

This comment has been minimized.

Copy link
Member

commented Mar 1, 2019

As a side-effect, at batch_size=100 it's 47.2% faster! 🥇

(update: Something else must have been using CPU during the first run b/c I couldn't replicate the results from the first run. Bother master and this branch performed about the same.)

PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -v -benchtime 10s -benchtest -benchmem -run TestB
enchmarkBatchReadSize  .
=== RUN   TestBenchmarkBatchReadSize
--- PASS: TestBenchmarkBatchReadSize (96.50s)
    bench_test.go:98: batch_size=10, total_events=30000, batch_time=4.452175ms, events_per_sec=2246.0932016374018, bytes_alloced_per_event=21 kB, total_allocs=7344626
    bench_test.go:98: batch_size=100, total_events=30000, batch_time=43.886812ms, events_per_sec=2278.5888389432344, bytes_alloced_per_event=19 kB, total_allocs=6592877
    bench_test.go:98: batch_size=500, total_events=25000, batch_time=230.723254ms, events_per_sec=2167.098423464503, bytes_alloced_per_event=14 kB, total_allocs=2437639
    bench_test.go:98: batch_size=1000, total_events=30000, batch_time=444.63076ms, events_per_sec=2249.0571727426145, bytes_alloced_per_event=6.9 kB, total_allocs=235204
PASS
ok      github.com/elastic/beats/winlogbeat/eventlog    96.544s
PS C:\Gopath\src\github.com\elastic\beats\winlogbeat\eventlog> go test -v -benchtime 10s -benchtest -benchmem -run TestB
enchmarkBatchReadSize  .
=== RUN   TestBenchmarkBatchReadSize
--- PASS: TestBenchmarkBatchReadSize (65.73s)
    bench_test.go:98: batch_size=10, total_events=50000, batch_time=3.090045ms, events_per_sec=3236.1988255834463, bytes_alloced_per_event=18 kB, total_allocs=12380608
    bench_test.go:98: batch_size=100, total_events=50000, batch_time=29.812684ms, events_per_sec=3354.276991632152, bytes_alloced_per_event=18 kB, total_allocs=12310026
    bench_test.go:98: batch_size=500, total_events=50000, batch_time=148.310654ms, events_per_sec=3371.301969985245, bytes_alloced_per_event=18 kB, total_allocs=12301886
    bench_test.go:98: batch_size=1000, total_events=50000, batch_time=297.246802ms, events_per_sec=3364.2077669854966, bytes_alloced_per_event=18 kB, total_allocs=12300631
PASS
ok      github.com/elastic/beats/winlogbeat/eventlog    65.767s

@adriansr adriansr changed the title [WIP] [Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML Mar 1, 2019

adriansr added some commits Mar 1, 2019

@adriansr

This comment has been minimized.

Copy link
Member Author

commented Mar 1, 2019

@andrewkroh please have another look to see if you like the changes I did to RenderErr. Now error.message can be a list of strings.

Kill me if I know where that speedup comes from. Buffering? In my tests, decoding the XML was marginally slower than before (from 105 to 125 us per op)

@adriansr

This comment has been minimized.

Copy link
Member Author

commented Mar 1, 2019

jenkins, test this

@andrewkroh
Copy link
Member

left a comment

Can you add a test case passes that causes some invalid XML.

Otherwise it LGTM.

@adriansr adriansr merged commit a6102a8 into elastic:master Mar 1, 2019

4 of 5 checks passed

beats-ci Build finished.
Details
CLA All commits in pull request signed
Details
Hound No violations found. Woof!
continuous-integration/jenkins/pr-merge This commit looks good
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

adriansr added a commit to adriansr/beats that referenced this pull request Mar 1, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…elastic#11006)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

@adriansr adriansr added v6.7.0 and removed needs_backport labels Mar 1, 2019

adriansr added a commit to adriansr/beats that referenced this pull request Mar 1, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…elastic#11006)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

adriansr added a commit to adriansr/beats that referenced this pull request Mar 1, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…elastic#11006)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

adriansr added a commit to adriansr/beats that referenced this pull request Mar 1, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…elastic#11006)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

adriansr added a commit to adriansr/beats that referenced this pull request Mar 1, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…elastic#11006)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

adriansr added a commit to adriansr/beats that referenced this pull request Mar 1, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…elastic#11006)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

@adriansr adriansr added the v6.6.2 label Mar 1, 2019

@adriansr adriansr added the v7.0.0 label Mar 1, 2019

adriansr added a commit to adriansr/beats that referenced this pull request Mar 1, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…elastic#11006)

* [Winlogbeat] Escape control characters in XML

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

@adriansr adriansr added the v7.2.0 label Mar 1, 2019

adriansr added a commit that referenced this pull request Mar 2, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…#11006) (#11039)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

adriansr added a commit that referenced this pull request Mar 2, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…#11006) (#11040)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

adriansr added a commit that referenced this pull request Mar 2, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…#11006) (#11041)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

adriansr added a commit that referenced this pull request Mar 2, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…#11006) (#11038)

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

andrewkroh added a commit to andrewkroh/beats that referenced this pull request Mar 4, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…elastic#11006)

* [Winlogbeat] Escape control characters in XML

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

andrewkroh added a commit that referenced this pull request Mar 4, 2019

[Winlogbeat] Prevent Winlogbeat from dropping events with invalid XML (
…#11006) (#11066)

* [Winlogbeat] Escape control characters in XML

Golang's xml parser is pretty strict about the presence of control
characters in the XML it is fed. This patch replaces those characters
with an unicode escape sequence: "\uNNNN".

(cherry picked from commit a6102a8)

adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019

Fix Winlogbeat escaping CRLF sequences
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

adriansr added a commit that referenced this pull request Mar 21, 2019

Fix Winlogbeat escaping CRLF and TAB characters (#11357)
Previous fix (#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes #11328

adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019

Fix Winlogbeat escaping CRLF and TAB characters (elastic#11357)
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 6865403)

adriansr added a commit that referenced this pull request Mar 21, 2019

Fix Winlogbeat escaping CRLF and TAB characters (#11357) (#11369)
Previous fix (#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes #11328

(cherry picked from commit 6865403)

adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019

Fix Winlogbeat escaping CRLF and TAB characters (elastic#11357)
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 6865403)

adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019

Fix Winlogbeat escaping CRLF and TAB characters (elastic#11357)
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 6865403)

adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019

Fix Winlogbeat escaping CRLF and TAB characters (elastic#11357)
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 6865403)

adriansr added a commit to adriansr/beats that referenced this pull request Mar 21, 2019

Fix Winlogbeat escaping CRLF and TAB characters (elastic#11357)
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328

(cherry picked from commit 6865403)

adriansr added a commit that referenced this pull request Mar 21, 2019

Fix Winlogbeat escaping CRLF and TAB characters (#11357) (#11373)
Previous fix (#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes #11328

(cherry picked from commit 6865403)

adriansr added a commit that referenced this pull request Mar 22, 2019

Cherry-pick #11357 to 5.6: Fix Winlogbeat escaping CRLF sequences (#1…
…1372)

Previous fix (#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes #11328

(cherry picked from commit 6865403)

adriansr added a commit that referenced this pull request Mar 28, 2019

Cherry-pick #11357 to 6.6: Fix Winlogbeat escaping CRLF sequences (#1…
…1370)

Previous fix (#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes #11328

(cherry picked from commit 6865403)

DebashisMondal added a commit to DebashisMondal/beats that referenced this pull request May 10, 2019

Fix Winlogbeat escaping CRLF and TAB characters (elastic#11357)
Previous fix (elastic#11006) made Winlogbeat escape CRLF control characters
which are expected in Windows event logs.

Fixes elastic#11328
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.