Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix content type detection with leading whitespace #32632

Conversation

jasontedor
Copy link
Member

Today content type detection on an input stream works by peeking up to twenty bytes into the stream. If the stream is headed by more whitespace than twenty bytes, we might fail to detect the content type. We should be ignoring this whitespace before attempting to detect the content type. This commit does that by ignoring all leading whitespace in an input stream before attempting to guess the content type.

Relates #32357

Today content type detection on an input stream works by peeking up to
twenty bytes into the stream. If the stream is headed by more whitespace
than twenty bytes, we might fail to detect the content type. We should
be ignoring this whitespace before attempting to detect the content
type. This commit does that by ignoring all leading whitespace in an
input stream before attempting to guess the content type.
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@andrershov
Copy link
Contributor

@jasontedor what is the reason to constantly double readLimit in si.mark() call? Why not just to call
si.mark(Integer.MAX_VALUE-8) and skip all whitespaces during the first pass? I think all sane implementations allocate reading buffer lazily, increasing its capacity when data does not fit.

break;
int iteration = 1;
while (true) {
si.mark(iteration * GUESS_HEADER_LENGTH);
Copy link
Contributor

@bleskes bleskes Aug 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where I was a bit uncomfortable with requiring the underlying stream to have an unbound support for mark. If I read things correctly we end up with a stream that just wraps a byte reference and we therefore don't care. If you're comfortable relying on that behavior, then I'm good but then I also think we don't need to allocate a growing size of firstBytes but rather read byte by byte until we find the first non-white space.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am comfortable with relying on that behavior. I pushed 86227ba.

…pe-detection-with-leading-whitespace

* elastic/master: (34 commits)
  Cross-cluster search: preserve cluster alias in shard failures (elastic#32608)
  Handle AlreadyClosedException when bumping primary term
  [TEST] Allow to run in FIPS JVM (elastic#32607)
  [Test] Add ckb to the list of unsupported languages (elastic#32611)
  SCRIPTING: Move Aggregation Scripts to their own context (elastic#32068)
  Painless: Use LocalMethod Map For Lookup at Runtime (elastic#32599)
  [TEST] Enhance failure message when bulk updates have failures
  [ML] Add ML result classes to protocol library (elastic#32587)
  Suppress LicensingDocumentationIT.testPutLicense in release builds (elastic#32613)
  [Rollup] Update wire version check after backport
  Suppress Wildfly test in FIPS JVMs (elastic#32543)
  [Rollup] Improve ID scheme for rollup documents (elastic#32558)
  ingest: doc: move Dot Expander Processor doc to correct position (elastic#31743)
  [ML] Add some ML config classes to protocol library (elastic#32502)
  [TEST]Split transport verification mode none tests (elastic#32488)
  Core: Move helper date formatters over to java time (elastic#32504)
  [Rollup] Remove builders from DateHistogramGroupConfig (elastic#32555)
  [TEST} unmutes SearchAsyncActionTests and adds debugging info
  [ML] Add Detector config classes to protocol library (elastic#32495)
  [Rollup] Remove builders from MetricConfig (elastic#32536)
  ...
@jasontedor
Copy link
Member Author

@andrershov It was to deal with possibility of insane implementations but it appears that we should always have a sane implementation here.

@jasontedor jasontedor requested a review from bleskes August 6, 2018 12:47
Copy link
Contributor

@bleskes bleskes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jasontedor jasontedor merged commit 3fb0923 into elastic:master Aug 6, 2018
jasontedor added a commit that referenced this pull request Aug 6, 2018
Today content type detection on an input stream works by peeking up to
twenty bytes into the stream. If the stream is headed by more whitespace
than twenty bytes, we might fail to detect the content type. We should
be ignoring this whitespace before attempting to detect the content
type. This commit does that by ignoring all leading whitespace in an
input stream before attempting to guess the content type.
jasontedor added a commit that referenced this pull request Aug 6, 2018
Today content type detection on an input stream works by peeking up to
twenty bytes into the stream. If the stream is headed by more whitespace
than twenty bytes, we might fail to detect the content type. We should
be ignoring this whitespace before attempting to detect the content
type. This commit does that by ignoring all leading whitespace in an
input stream before attempting to guess the content type.
jasontedor added a commit that referenced this pull request Aug 6, 2018
Today content type detection on an input stream works by peeking up to
twenty bytes into the stream. If the stream is headed by more whitespace
than twenty bytes, we might fail to detect the content type. We should
be ignoring this whitespace before attempting to detect the content
type. This commit does that by ignoring all leading whitespace in an
input stream before attempting to guess the content type.
@jasontedor jasontedor deleted the improve-content-type-detection-with-leading-whitespace branch August 6, 2018 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants