Bad rows, regex fails to parse user-agent. #576

Closed
pkallos opened this issue Mar 20, 2014 · 9 comments
@pkallos
Contributor

pkallos commented Mar 20, 2014

I am seeing about 5% of my records marked as bad because the user agent parsing fails. Example:

{
    "line": {
        "timestamp": 1395299873568,
        "collector": "ssc-0.1.0-kinesis",
        ...
        "headers": [..., "User-Agent: Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko", ...],
        "networkUserId": "7e98e0a0-e316-40ae-94f4-5a723264397c",
        ....
        "setNetworkUserId": true
    },
    "errors": ["Exception parsing useragent [Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko]: [No group 1]"]
}
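
The `[No group 1]` text in the error matches the message that `java.util.regex.Matcher.group(int)` raises when asked for a capture group the pattern does not define. Below is a minimal sketch of that failure mode, assuming (not confirmed in this thread) that the parser reads group 1 from a version pattern. The `UaGroupSketch` object and both patterns are hypothetical and are not taken from the user agent library.

```scala
import java.util.regex.Pattern
import scala.util.Try

object UaGroupSketch {
  // User agent string from the bad row above.
  val ua = "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"

  // Hypothetical pattern with NO capturing group: it matches, but group 1 does not exist.
  val noGroup: Pattern = Pattern.compile("Trident/\\d+\\.\\d+")

  // The same pattern with a capturing group around the version.
  val withGroup: Pattern = Pattern.compile("Trident/(\\d+\\.\\d+)")

  // Returns Right(captured text), or Left(message) if there is no match
  // or if reading group 1 throws.
  def firstGroup(p: Pattern, s: String): Either[String, String] = {
    val m = p.matcher(s)
    if (!m.find()) Left("no match")
    else Try(m.group(1)).toEither.left.map(_.getMessage)
  }
}
```

Here `UaGroupSketch.firstGroup(UaGroupSketch.noGroup, UaGroupSketch.ua)` yields `Left("No group 1")`, while the `withGroup` variant yields `Right("7.0")`.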
@alexanderdean
Member

Super-weird, scheduling... @yalisassoon have you ever seen this before?

@alexanderdean alexanderdean added this to the Version 0.9.2 milestone Mar 20, 2014
@alexanderdean alexanderdean self-assigned this Mar 20, 2014
@alexanderdean
Member

Two things I can think of:

  1. We bumped the version of our useragent library recently
  2. Maybe the Scala Stream Collector is having some kind of issue recording user agents

@yalisassoon
Member

I can't ever remember seeing that error... I have double checked and can't find it in any of our records for either Snowplow or Psychic Bazaar. Does make me wonder if it's a Scala Stream Collector specific issue (maybe to do with the encoding of the user agent string?)


@alexanderdean
Member

Agree

@pkallos
Contributor Author

pkallos commented Apr 18, 2014

OK so it turns out this was browser useragent utils failing to recognize IE 11's useragent.
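
For context, IE 11 dropped the `MSIE` token that earlier versions carried, so the version has to be read from the `rv:` token next to `Trident/`, as in the string from the bad row. Here is a minimal sketch of detection that covers both forms. The `Ie11Sketch` object and its regexes are hypothetical illustrations, not the library's implementation.

```scala
object Ie11Sketch {
  // IE 10 and earlier: "... MSIE 8.0; ..."
  private val LegacyIe = """MSIE (\d+)\.(\d+)""".r.unanchored

  // IE 11: no "MSIE" token; "Trident/7.0; rv:11.0" carries the version instead.
  private val Ie11 = """Trident/\d+\.\d+;.*\brv:(\d+)\.(\d+)""".r.unanchored

  // Returns (major, minor) if the string looks like Internet Explorer, else None.
  def ieVersion(ua: String): Option[(Int, Int)] = ua match {
    case LegacyIe(major, minor) => Some((major.toInt, minor.toInt))
    case Ie11(major, minor)     => Some((major.toInt, minor.toInt))
    case _                      => None
  }
}
```

With the user agent from this issue, `Ie11Sketch.ieVersion` returns `Some((11, 0))`. A parser that only looks for `MSIE` would find nothing to extract.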

I first took a stab at #62 but the deltas between ua_parser and the existing library are pretty significant.

ua_parser doesn't report as many fields, so ClientAttributes would have to be pared down to:

   case class ClientAttributes(
       // Browser
-      browserName: String,
       browserFamily: String,
-      browserVersion: Option[String],
-      browserType: String,
-      browserRenderEngine: String,
+      browserVersion: String,
       // OS the browser is running on
-      osName: String,
       osFamily: String,
-      osManufacturer: String,
       // Hardware the OS is running on
       deviceType: String,
       deviceIsMobile: Boolean)

Not sure if it's worth doing, but in my case losing the convenience of "browserType" -> Computer, Tablet, Mobile, ... is kind of a non-starter.
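
For readability, this is roughly what the case class would look like with the diff above applied. It is only a sketch: the wrapping `ParedDownSketch` object and the field values in `example` are illustrative placeholders, not actual ua_parser output.

```scala
object ParedDownSketch {
  // ClientAttributes after removing the fields marked "-" in the diff above.
  case class ClientAttributes(
    // Browser
    browserFamily: String,
    browserVersion: String,
    // OS the browser is running on
    osFamily: String,
    // Hardware the OS is running on
    deviceType: String,
    deviceIsMobile: Boolean)

  // Placeholder values for illustration only (assumed, not real parser output).
  val example: ClientAttributes = ClientAttributes(
    browserFamily = "IE",
    browserVersion = "11.0",
    osFamily = "Windows",
    deviceType = "Computer",
    deviceIsMobile = false)
}
```

Note that `browserName`, `browserType`, `browserRenderEngine`, `osName` and `osManufacturer` have no counterpart here, which is the loss described above.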

Will open a PR that addresses this ticket.

@alexanderdean
Member

Thanks Phil - we expected there would be a mismatch between the two UA parsers - bit of a shame though that ua-parser loses the convenient browserType field. Will add a comment to #62

@alexanderdean alexanderdean modified the milestones: Useragent phase 2, Version 0.9.6 Jun 2, 2014
@alexanderdean alexanderdean modified the milestones: Aalekh milestone 1, Snowplow Core 2015 refresh Jan 21, 2015
@fblundun
Contributor

fblundun commented Mar 6, 2015

Can I close this? I think it's covered by #62 and #792.

@alexanderdean
Member

Plus the underlying bug was fixed in user-agent utils 1.12, which @pkallos pulled in via #662. So yes let's close.

@alexanderdean
Member

Cleared milestone too
