New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scala Common Enrich: bump scala-uri to 0.5.0 #2893

Closed
christoph-buente opened this Issue Sep 23, 2016 · 18 comments

Comments

Projects
None yet
6 participants
@christoph-buente
Copy link

christoph-buente commented Sep 23, 2016

We see decent amount of requests ending up in the bad bucket, because of page_url contains more than one # character. Like this one:

Provided URI string [http://www.example.com/path/index.ssf/2016/09/taco_charlton_jourdan_lewis_tr.html#incart_river_index#incart_m-rpt-2] could not be parsed by Netaporter: [Illegal character in fragment at index 104: http://www.example.com/path/index.ssf/2016/09/taco_charlton_jourdan_lewis_tr.html#incart_river_index#incart_m-rpt-2]

Essentially all browsers i tried could handle that URL, even though regarding to the spec it might not be valid. Is there a way to let those type of event through?

@alexanderdean alexanderdean changed the title Scala stream collector: Events with page_url field containing two # characters are classified bad Scala Common Enrich: events with page_url field containing two # characters are classified bad Sep 23, 2016

@christoph-buente

This comment has been minimized.

Copy link
Author

christoph-buente commented Oct 24, 2016

Are we the only ones seeing this behaviour? @alexanderdean?

@alexanderdean

This comment has been minimized.

Copy link
Member

alexanderdean commented Oct 24, 2016

The Netaporter library we use is the most permissive URI parser available for the JVM - it would be worth raising an issue there and seeing if they are up for tolerating URIs with 2 fragments attached? If not, we're really back looking at #351...

@christoph-buente

This comment has been minimized.

Copy link
Author

christoph-buente commented Oct 24, 2016

Thx @alexanderdean,

and you know what, there is an issue already, regarding double fragment separators:
NET-A-PORTER/scala-uri#114

I left a comment and asked for the changes.

@alexanderdean

This comment has been minimized.

Copy link
Member

alexanderdean commented Oct 24, 2016

Cool, sounds like a plan @christoph-buente !

@christoph-buente

This comment has been minimized.

Copy link
Author

christoph-buente commented Oct 24, 2016

@alexanderdean: I've never seen a faster fix!
NET-A-PORTER/scala-uri@22dd767

@alexanderdean

This comment has been minimized.

Copy link
Member

alexanderdean commented Oct 24, 2016

Wow! Thanks so much @theon.

@alexanderdean alexanderdean changed the title Scala Common Enrich: events with page_url field containing two # characters are classified bad Scala Common Enrich: bump Netaporter URI library to 0.4.16 Oct 24, 2016

@alexanderdean alexanderdean self-assigned this Oct 24, 2016

@alexanderdean alexanderdean added this to the R8x [HAD] 4 webhooks milestone Oct 24, 2016

@theon

This comment has been minimized.

Copy link

theon commented Oct 25, 2016

No problem!

@christoph-buente

This comment has been minimized.

Copy link
Author

christoph-buente commented Apr 24, 2017

@alexanderdean Has this dependency bump released ever since?

@alexanderdean

This comment has been minimized.

Copy link
Member

alexanderdean commented Apr 24, 2017

Not yet!

@matogertel

This comment has been minimized.

Copy link

matogertel commented Aug 2, 2017

Any update on this ?

@alexanderdean alexanderdean modified the milestones: R9x [SPK] Batch priority fixes, R9x [HAD] 4 webhooks Aug 2, 2017

@alexanderdean alexanderdean added data-quality and removed data-loss labels Aug 2, 2017

@alexanderdean

This comment has been minimized.

Copy link
Member

alexanderdean commented Aug 2, 2017

Given this issue causes data quality issues, we will prioritise...

@christoph-buente

This comment has been minimized.

Copy link
Author

christoph-buente commented Aug 3, 2017

Thanks @alexanderdean, would be wonderful to see this being released. We're still losing out on 200k events every day.

@DrGomi

This comment has been minimized.

Copy link

DrGomi commented Aug 5, 2017

Cheers, just wanted to tell you guys that christoph is not the only one waiting for this fix.
We are also having this issue in our ionic v1 based hybrid app.

@matogertel

This comment has been minimized.

Copy link

matogertel commented Aug 8, 2017

If you're using the JavaScript tracker, I've added some code to our tracker initialization script to fix the "invalid" urls. Essentially, it url-escapes all # characters after the first one.
This is the general idea:

var url = window.location.href;
var matches = url.match(/#/g);
if (matches && matches.length>1) {
    var fixed = url.split('#').slice(0,1).join('') + '#' + url.split('#').slice(1).join('%23');
    snowplow('setCustomUrl',fixed);
}

@alexanderdean alexanderdean modified the milestones: R92 Virunum (Stream refresh), R9x [BAT] Priority fixes & ZSTD support Aug 19, 2017

@alexanderdean

This comment has been minimized.

Copy link
Member

alexanderdean commented Aug 19, 2017

Adding to R92

@theon

This comment has been minimized.

Copy link

theon commented Aug 19, 2017

FYI: I have moved the scala-uri project over to https://github.com/lemonlabsuk/scala-uri and the latest version is 0.5.0 now.

@BenFradet BenFradet changed the title Scala Common Enrich: bump Netaporter URI library to 0.4.16 Scala Common Enrich: bump Netaporter URI library to 0.5.0 Aug 23, 2017

@BenFradet BenFradet changed the title Scala Common Enrich: bump Netaporter URI library to 0.5.0 Scala Common Enrich: bump scala-uri to 0.5.0 Aug 23, 2017

BenFradet added a commit that referenced this issue Aug 23, 2017

BenFradet added a commit that referenced this issue Aug 24, 2017

@BenFradet BenFradet closed this in 8f4187b Oct 3, 2017

@christoph-buente

This comment has been minimized.

Copy link
Author

christoph-buente commented Jan 31, 2018

@alexanderdean and @BenFradet from what collector version was this Scala Common Enrich version being used? Is it R92?

@alexanderdean

This comment has been minimized.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment