Google Publisher Plugin bot crawler isn't excluded from visits #9567

Closed
jwleon opened this issue Jan 19, 2016 · 10 comments

@jwleon jwleon commented Jan 19, 2016

For the first time after using Piwik for several projects, I am seeing heavy Googlebot activity on one of my pages. This project is the only one using AdSense.

Since I couldn't find a good way to exclude this "user" from my stats, I consider this a kind of bug. Maybe you can add it to the bot list in one of the next releases.

All page links are requested with the following URL parameter:

http://.../...&google_publisher_plugin_page_details=1

Member

@tsteur tsteur commented Jan 19, 2016

@sgiehl can you have a look at this one?

Member

@sgiehl sgiehl commented Jan 19, 2016

@TheCodePianist Do you have access to your web server's access logs? Would you mind checking there whether all those requests come with the same user agent?

Author

@jwleon jwleon commented Jan 19, 2016

Piwik recognizes all visits as coming from Mountain View, CA. The browser is always Chrome; the device varies between Mac and Android. The IP address is different for each visit, but the DNS lookup always follows this pattern (where x represents the IP):

crawl-xxx-xxx-xxx-xxx.googlebot.com
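As an illustration of the pattern above, here is a minimal Python sketch that checks whether a reverse-DNS hostname has the `crawl-…googlebot.com` shape and encodes the given IP. The function name `hostname_matches_ip` and the exact regular expression are assumptions made for this example, not part of Piwik. It only compares strings; it performs no DNS lookup and so does not by itself prove that a request really came from Google.

```python
import re

# Assumed shape: "crawl-" plus the four IPv4 octets joined by hyphens,
# directly under googlebot.com (as described in the comment above).
GOOGLEBOT_HOST = re.compile(
    r"crawl-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.googlebot\.com"
)


def hostname_matches_ip(hostname: str, ip: str) -> bool:
    """Return True if hostname follows the crawl-a-b-c-d.googlebot.com
    pattern and the octets embedded in it equal the given IPv4 address."""
    match = GOOGLEBOT_HOST.fullmatch(hostname.strip().lower())
    if match is None:
        return False
    return ".".join(match.groups()) == ip
```

Because the IP differs for each visit, a check like this would have to run per request, which is one reason a user-agent rule is the more practical filter.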

Hope this helps, if not let me know which information you need! :)

Member

@sgiehl sgiehl commented Jan 19, 2016

We can only exclude those visits by IP or by user agent. As the IP may vary, it would be better to use the user agent. The user agent isn't displayed within Piwik; you can only get that information from your web server's access logs. Are you able to get those?

Contributor

@RMastop RMastop commented Jan 19, 2016

Thanks, @TheCodePianist,
What @sgiehl is looking for is the user agent found in the access logfiles for these requests.
Bots and browsers are identified by checking for differences in the user agent string.

Do you have access to the access logs?
If so, could you share the user agent string from the corresponding IP addresses?
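To make the idea concrete, here is a rough Python sketch of user-agent-based bot detection. It is not Piwik's or device-detector's actual code: the rule list `BOT_RULES` and the function `detect_bot` are hypothetical names, and the single rule shown is only an example of a token that a crawler puts into its user agent string.

```python
import re

# Hypothetical rule list for illustration only; a real detector ships a
# much larger, maintained set of patterns.
BOT_RULES = [
    ("Googlebot", re.compile(r"Googlebot/", re.IGNORECASE)),
]


def detect_bot(user_agent: str):
    """Return the name of the first rule whose pattern occurs in the
    user agent string, or None if no rule matches."""
    for name, pattern in BOT_RULES:
        if pattern.search(user_agent):
            return name
    return None
```

A request whose user agent matches no rule is treated as a normal visit, which is why a crawler with an unknown token shows up in the statistics until a rule for it is added.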

Author

@jwleon jwleon commented Jan 19, 2016

Sure, but I am not quite sure I got what you need. I searched the access log for the IP-addresses and picked three examples for you to choose from (the IPs at the beginning of the entry match the ones Piwik shows as the visitor IP):

66.249.65.91 - - [19/Jan/2016:05:41:33 +0100] "GET /piwik.php?action_name=***g&idsite=6&rec=1&r=794831&h=20&m=41&s=19&url=http%3A%2F%2F***%2F%3Fcbp%3D1ri4tg27jjg68%26google_publisher_plugin_page_details%3D1&_id=&_idts=1453178480&_idvc=1&_idn=1&_refts=0&_viewts=1453178480&send_image=0&cookie=0&res=1024x768 HTTP/1.1" 204 - analytics.***.de "http://***/?cbp=1ri4tg27jjg68&google_publisher_plugin_page_details=1" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_3) AppleWebKit/537.36 (KHTML, like Gecko, Google-Publisher-Plugin) Chrome/27.0.1453 Safari/537.36" "-"

66.249.65.88 - - [19/Jan/2016:05:41:25 +0100] "GET /robots.txt HTTP/1.1" 200 24978 analytics.***.de "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-"

66.249.65.94 - - [18/Jan/2016:21:36:15 +0100] "GET /piwik.php?action_name=***&idsite=6&rec=1&r=710635&h=12&m=36&s=11&url=http%3A%2F%2F***%2F%3Fcbp%3D14zu8er8jr4xc%26google_publisher_plugin_page_details%3D1&_id=&_idts=1453149372&_idvc=1&_idn=1&_refts=0&_viewts=1453149372&send_image=0&cookie=0&res=640x1136 HTTP/1.1" 204 - analytics.***.de "http://***/?cbp=14zu8er8jr4xc&google_publisher_plugin_page_details=1" "Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/537.36 (KHTML, like Gecko; Google-Publisher-Plugin) Chrome/27.0.1453 Mobile Safari/537.36" "-"
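For anyone who wants to pull the same information out of their own logs, the following Python sketch extracts the user agent from lines in the format shown above and counts the user agents of requests that contain a given substring. The regular expression is an assumption derived from these three sample lines (IP, two placeholder fields, timestamp, request, status, size, virtual host, referrer, user agent); other log formats will need a different pattern. The names `LOG_LINE` and `user_agents_for` are made up for this example.

```python
import re
from collections import Counter

# Assumed layout, taken from the sample lines above:
# ip - - [time] "request" status size vhost "referrer" "user agent" ...
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) (?P<vhost>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def user_agents_for(lines, needle):
    """Count user agents of all parseable lines whose request contains needle.

    Lines that do not fit the assumed layout are skipped silently.
    """
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and needle in match.group("request"):
            counts[match.group("user_agent")] += 1
    return counts
```

Called with the lines of an access log and the needle `google_publisher_plugin_page_details`, this returns one counter entry per distinct user agent. In the two tracking requests quoted above, both user agents carry the token `Google-Publisher-Plugin`, which is the kind of marker a user-agent rule can key on.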

Member

@sgiehl sgiehl commented Jan 19, 2016

I've created matomo-org/device-detector#5415 which will fix this issue.

Member

@sgiehl sgiehl commented Jan 19, 2016

@TheCodePianist Are you using WordPress and the Google AdSense plugin? Or the Google Publisher Toolbar in Chrome?

@sgiehl sgiehl added the Enhancement label Jan 19, 2016
@sgiehl sgiehl self-assigned this Jan 19, 2016

Member

@sgiehl sgiehl commented Jan 19, 2016

This will be fixed with the next version of piwik/device-detector.

@sgiehl sgiehl closed this Jan 19, 2016
@mattab mattab changed the title from "Googlebot Crawler isn't excluded from visits" to "Google Publisher Plugin bot crawler isn't excluded from visits" Jan 20, 2016

Author

@jwleon jwleon commented Jan 20, 2016

Thank you for the fast response and fix!

I am using the Publisher Toolbar, but the visits tracked by Piwik occurred at times when my PC was turned off...
