Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace User Agent Parser libs #24

Merged
merged 1 commit into from
Sep 15, 2013

Conversation

wormyourhonor
Copy link

Replace User Agent Parser libs with UADetector libs.

UADetector relies on the same external resources, but can rely on cached User Agent Strings in the lib, if user-agent-strings.info is not available.

Also, easy to keep cached datastore up-to-date by updating POM to latest/greatest UADetector in Maven Central.

A regression though, didn't maintain support uaCache option, doesn't seem necessary anymore since UADetector should handle keeping it up-to-date.

This is a potential implementation of #9

…or.sourceforge.net/).

UADetector relies on the same external resources, but can rely on cached User Agent Strings in the lib, if `user-agent-strings.info` is not available.

Also, easy to keep cached datastore up-to-date by updating POM to latest/greatest UADetector in Maven Central.

A regression though, didn't maintain support `uaCache` option, doesn't seem necessary anymore sind UADetector should handle keeping it up-to-date.
@lightbody
Copy link
Owner

Nice - I will check out soon.

Also: is that a Pink Floyd reference I see? :)

On Sep 4, 2013, at 1:05 PM, wormyourhonor notifications@github.com wrote:

Replace User Agent Parser libs with UADetector libs.

UADetector relies on the same external resources, but can rely on cached User Agent Strings in the lib, if user-agent-strings.info is not available.

Also, easy to keep cached datastore up-to-date by updating POM to latest/greatest UADetector in Maven Central.

A regression though, didn't maintain support uaCache option, doesn't seem necessary anymore since UADetector should handle keeping it up-to-date.

This is a potential implementation of #9

You can merge this Pull Request by running

git pull https://github.com/wormyourhonor/browsermob-proxy ReplaceUserAgentParser
Or view, comment on, or merge it at:

#24

Commit Summary

Replace User Agent Parser libs with UADetector libs.
File Changes

M pom.xml (6)
D src/main/java/cz/mallat/uasparser/BrowserEntry.java (117)
D src/main/java/cz/mallat/uasparser/CachingOnlineUpdateUASparser.java (136)
D src/main/java/cz/mallat/uasparser/OnlineUpdateUASparser.java (76)
D src/main/java/cz/mallat/uasparser/OsEntry.java (94)
D src/main/java/cz/mallat/uasparser/RobotEntry.java (126)
D src/main/java/cz/mallat/uasparser/UASparser.java (330)
D src/main/java/cz/mallat/uasparser/UserAgentInfo.java (155)
D src/main/java/cz/mallat/uasparser/Utils.java (9)
D src/main/java/cz/mallat/uasparser/fileparser/Entry.java (36)
D src/main/java/cz/mallat/uasparser/fileparser/PHPFileParser.java (92)
D src/main/java/cz/mallat/uasparser/fileparser/Section.java (37)
M src/main/java/net/lightbody/bmp/proxy/guice/ConfigModule.java (15)
M src/main/java/net/lightbody/bmp/proxy/http/BrowserMobHttpClient.java (36)
Patch Links:

https://github.com/lightbody/browsermob-proxy/pull/24.patch
https://github.com/lightbody/browsermob-proxy/pull/24.diff

@rcrowell
Copy link

rcrowell commented Sep 4, 2013

This would be a great fix. Got bit by this bug yesterday!

@wormyourhonor
Copy link
Author

@lightbody - heheh, you've exposed me before my peers!

@rcrowell - yeah, Tuesday's outage was my motivation.

Couple of potential concerns...

Problem 1:
UADetector will attempt to retrieve an updated UAS list once the parser is instantiated as a background process. If, however, a parser is invoked before the update occurs, it will use the stored/library version from the resource module defined in the POM. If using BMP as a standalone service, then probably not an issue, only takes a few seconds from the time browsermob.bat is launched and an updated UAS.xml is downloaded.

I suppose, if users are using BMP as a library & instantiating BrowserMobHttpClient.java manually, this could lead to referencing stale UAS defs, the first time it's called. Subsequent calls should be fine, since UADetector would have been running for a bit and attempted to update the UAS.xml.

Problem 2:
UADetector's documentation mentions the UAS parsing can be expensive and suggest caching the parsed results. Note, the use case of caching stale UAS results (i.e from Problem 1) would need to be considered. To keep the implementation simple, I didn't include any result caching.

However, if BMP performance regressions are seen, this could be added.

lightbody added a commit that referenced this pull request Sep 15, 2013
@lightbody lightbody merged commit e5d53e5 into lightbody:master Sep 15, 2013
@lightbody
Copy link
Owner

@wormyourhonor Sadly, it looks like this pull request broke some stuff that I'm just noticing. Right now when I start up BMP from master the UA parser breaks down. There is definitely a very large XML file in temp, but when the code tries parsing it this exception happens. Any ideas?

WARN 10/27 21:10:00 n.s.u.i.d.XmlDataHa~ - Error while reading UAS data: Element type "devices" must be declared. (line: 21167)
org.xml.sax.SAXParseException; lineNumber: 21167; columnNumber: 12; Element type "devices" must be declared.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:198)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.error(ErrorHandlerWrapper.java:134)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:437)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:368)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:325)
    at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.handleStartElement(XMLDTDValidator.java:1906)
    at com.sun.org.apache.xerces.internal.impl.dtd.XMLDTDValidator.startElement(XMLDTDValidator.java:742)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanStartElement(XMLDocumentFragmentScannerImpl.java:1323)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2717)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:607)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:489)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:835)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:123)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1210)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:568)
    at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:302)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:195)
    at net.sf.uadetector.datareader.XmlDataReader$XmlParser.parse(XmlDataReader.java:61)
    at net.sf.uadetector.datareader.XmlDataReader.readXml(XmlDataReader.java:111)
    at net.sf.uadetector.datareader.XmlDataReader.read(XmlDataReader.java:148)
    at net.sf.uadetector.datastore.UpdateOperationWithCacheFileTask.readAndSave(UpdateOperationWithCacheFileTask.java:143)
    at net.sf.uadetector.datastore.UpdateOperationWithCacheFileTask.readDataIfNewerAvailable(UpdateOperationWithCacheFileTask.java:221)
    at net.sf.uadetector.datastore.UpdateOperationWithCacheFileTask.call(UpdateOperationWithCacheFileTask.java:211)
    at net.sf.uadetector.datastore.AbstractUpdateOperation$1.run(AbstractUpdateOperation.java:197)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)

@wormyourhonor
Copy link
Author

hmm...no idea, I'll take a look.

@wormyourhonor
Copy link
Author

ich habe un idea!

Per UADetector issue 44, schema for UA XML from user-agent.info changed recently.

So, the downloaded update, uas.xml, can't be parsed and UADetector falls back to the library version of the user agent strings.

@wormyourhonor
Copy link
Author

So the last comment on the aforementioned UADetector issue, states that the library will be updated in the next release to handle the new schema/format.

Based on their history of updates to maven central, I'd expect the version to be available by the end of the week (no later than Monday???). In the future though, schema changes/breakages like these would, hopefully, be rare.

I can create a new pull request once the update is available & tested.

Worth noting/reiterating, while the updated uas.xml wasn't used, UADetector continued to parse UAS's. It just relied on an older set of UAS definitions (i.e. from Aug 2013). I'm not familiar enough with UAS parsing to know how much of an impact this would have on real world test cases. Any thoughts/concerns?

@lightbody
Copy link
Owner

Yeah, the backup still works so I think we're fine. Thanks -- looking forward to the updated pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants