Scala Common Enrich: bump scala-uri to 0.5.0 #2893

Closed
christoph-buente opened this issue Sep 23, 2016 · 18 comments

@christoph-buente
Contributor

We see a decent number of requests ending up in the bad bucket because their page_url contains more than one # character, like this one:

Provided URI string [http://www.example.com/path/index.ssf/2016/09/taco_charlton_jourdan_lewis_tr.html#incart_river_index#incart_m-rpt-2] could not be parsed by Netaporter: [Illegal character in fragment at index 104: http://www.example.com/path/index.ssf/2016/09/taco_charlton_jourdan_lewis_tr.html#incart_river_index#incart_m-rpt-2]
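For context on this error: RFC 3986 (section 3.5) defines `fragment = *( pchar / "/" / "?" )`, and `#` is not among the allowed characters, so a strict parser has to reject a second `#`. The sketch below checks a fragment against that grammar with a regular expression. The helper names `fragmentOf` and `isValidRfc3986Fragment` are hypothetical, and this is not how the Netaporter library itself is implemented.

```javascript
// Sketch of the RFC 3986 (section 3.5) fragment grammar:
//   fragment = *( pchar / "/" / "?" )
//   pchar    = unreserved / pct-encoded / sub-delims / ":" / "@"
// '#' is a gen-delim and is not in this set, so a second '#' is illegal.
const FRAGMENT_RE = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

// Hypothetical helper: text after the first '#', or null if there is no '#'.
function fragmentOf(url) {
  const i = url.indexOf('#');
  return i === -1 ? null : url.slice(i + 1);
}

// Hypothetical helper: true if the fragment matches the RFC 3986 grammar.
function isValidRfc3986Fragment(fragment) {
  return FRAGMENT_RE.test(fragment);
}

const exampleUrl =
  'http://www.example.com/path/index.ssf/2016/09/taco_charlton_jourdan_lewis_tr.html' +
  '#incart_river_index#incart_m-rpt-2';

console.log(isValidRfc3986Fragment(fragmentOf(exampleUrl)));              // false
console.log(isValidRfc3986Fragment('incart_river_index%23incart_m-rpt-2')); // true
```

Escaping the second `#` as `%23` turns it into a `pct-encoded` triplet, which the grammar does allow.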

Essentially all browsers I tried could handle that URL, even though according to the spec it might not be valid. Is there a way to let this type of event through?
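This matches how browsers behave: modern browsers and Node.js implement the WHATWG URL Standard, whose parser treats everything after the first `#` as the fragment and keeps any later `#` literally (it only records a non-fatal validation error). A minimal sketch, assuming Node.js 10+ or a browser console where `URL` is a global:

```javascript
// The WHATWG URL parser (used by browsers and Node.js) is lenient about a
// second '#': it stays inside the fragment instead of causing a parse failure.
const pageUrl =
  'http://www.example.com/path/index.ssf/2016/09/taco_charlton_jourdan_lewis_tr.html' +
  '#incart_river_index#incart_m-rpt-2';

const parsed = new URL(pageUrl);
console.log(parsed.hostname); // 'www.example.com'
console.log(parsed.pathname); // '/path/index.ssf/2016/09/taco_charlton_jourdan_lewis_tr.html'
console.log(parsed.hash);     // '#incart_river_index#incart_m-rpt-2'
```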

@alexanderdean alexanderdean changed the title Scala stream collector: Events with page_url field containing two # characters are classified bad Scala Common Enrich: events with page_url field containing two # characters are classified bad Sep 23, 2016
@christoph-buente
Contributor Author

Are we the only ones seeing this behaviour? @alexanderdean?

@alexanderdean
Member

The Netaporter library we use is the most permissive URI parser available for the JVM. It would be worth raising an issue there to see whether they are willing to tolerate URIs with two fragments attached. If not, we're really back to looking at #351...

@christoph-buente
Contributor Author

Thx @alexanderdean,

and you know what, there is already an issue regarding double fragment separators:
NET-A-PORTER/scala-uri#114

I left a comment and asked for the changes.

@alexanderdean
Member

Cool, sounds like a plan @christoph-buente !

@christoph-buente
Contributor Author

@alexanderdean: I've never seen a faster fix!
NET-A-PORTER/scala-uri@22dd767

@alexanderdean
Member

Wow! Thanks so much @theon.

@alexanderdean alexanderdean changed the title Scala Common Enrich: events with page_url field containing two # characters are classified bad Scala Common Enrich: bump Netaporter URI library to 0.4.16 Oct 24, 2016
@alexanderdean alexanderdean self-assigned this Oct 24, 2016
@alexanderdean alexanderdean added this to the R8x [HAD] 4 webhooks milestone Oct 24, 2016
@theon

theon commented Oct 25, 2016

No problem!

@christoph-buente
Contributor Author

@alexanderdean Has this dependency bump been released yet?

@alexanderdean
Member

Not yet!

@matogertel

Any update on this?

@alexanderdean alexanderdean modified the milestones: R9x [SPK] Batch priority fixes, R9x [HAD] 4 webhooks Aug 2, 2017
@alexanderdean
Member

Given this issue causes data quality issues, we will prioritise...

@christoph-buente
Contributor Author

Thanks @alexanderdean, it would be wonderful to see this released. We're still losing out on 200k events every day.

@DrGomi

DrGomi commented Aug 5, 2017

Cheers, just wanted to tell you guys that Christoph is not the only one waiting for this fix.
We are also having this issue in our Ionic v1 based hybrid app.

@matogertel

If you're using the JavaScript tracker, I've added some code to our tracker initialization script to fix the "invalid" URLs. Essentially, it URL-escapes every # character after the first one as %23.
This is the general idea:

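```javascript
// Percent-encode every '#' after the first one so the page URL has a single fragment.
var url = window.location.href;
var matches = url.match(/#/g);
if (matches && matches.length > 1) {
    var parts = url.split('#');
    // parts[0] is everything before the first '#'; rejoin the rest with '%23'.
    var fixed = parts[0] + '#' + parts.slice(1).join('%23');
    snowplow('setCustomUrl', fixed);
}
```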
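The same workaround can be written as a pure function so it can be unit-tested outside the browser. This is a sketch only: `escapeExtraHashes` is a hypothetical name, not part of the Snowplow JavaScript tracker, and it locates the first `#` with `indexOf`/`slice` instead of counting matches with a regex.

```javascript
// Hypothetical helper (not part of the Snowplow tracker): keeps the first '#'
// as the fragment delimiter and percent-encodes every later '#' as '%23'.
// URLs with zero or one '#' are returned unchanged.
function escapeExtraHashes(url) {
  const first = url.indexOf('#');
  if (first === -1) return url;                  // no fragment at all
  const head = url.slice(0, first + 1);          // up to and including the first '#'
  const fragment = url.slice(first + 1);         // everything after the first '#'
  return head + fragment.split('#').join('%23'); // escape every remaining '#'
}

console.log(escapeExtraHashes('http://example.com/a#x#y#z')); // 'http://example.com/a#x%23y%23z'
console.log(escapeExtraHashes('http://example.com/a#x'));     // 'http://example.com/a#x'
```

Because the output never contains more than one `#`, the function is idempotent, so applying it to an already-fixed URL is harmless. In the tracker initialization script it could be used as `var fixed = escapeExtraHashes(window.location.href); if (fixed !== window.location.href) { snowplow('setCustomUrl', fixed); }`.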

@alexanderdean alexanderdean modified the milestones: R92 Virunum (Stream refresh), R9x [BAT] Priority fixes & ZSTD support Aug 19, 2017
@alexanderdean
Member

Adding to R92

@theon

theon commented Aug 19, 2017

FYI: I have moved the scala-uri project over to https://github.com/lemonlabsuk/scala-uri, and the latest version is now 0.5.0.

@BenFradet BenFradet changed the title Scala Common Enrich: bump Netaporter URI library to 0.4.16 Scala Common Enrich: bump Netaporter URI library to 0.5.0 Aug 23, 2017
@BenFradet BenFradet changed the title Scala Common Enrich: bump Netaporter URI library to 0.5.0 Scala Common Enrich: bump scala-uri to 0.5.0 Aug 23, 2017
@christoph-buente
Contributor Author

@alexanderdean and @BenFradet, from which collector version onwards is this Scala Common Enrich version used? Is it R92?

@alexanderdean
Member

oguzhanunlu pushed a commit to snowplow/common-enrich that referenced this issue May 29, 2020

6 participants