Googlebot is generating hits #45

Closed
rubenvanerk opened this issue Jun 11, 2020 · 6 comments · Fixed by #48

Comments

@rubenvanerk
Contributor

image
Is there an option to filter out bots or is something wrong with my configuration?
Anyway, thanks for this project. Looking promising so far!

@milesmcc
Owner

Googlebot hits are recorded, but they don’t affect your overall stats (sessions, pageviews, etc.). Or at least they shouldn’t. If you’re looking for a more drastic approach that even excludes Googlebot from the session list, you could add Google’s crawler IP range to your Ignored IPs.
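To illustrate the idea of ignoring an IP range, here is a minimal sketch using Python's standard `ipaddress` module. It is not the project's actual implementation; the names `IGNORED_NETWORKS` and `is_ignored` are hypothetical, and the range shown is a documentation placeholder, not Google's real crawler range.

```python
import ipaddress

# Placeholder network from the RFC 5737 documentation block.
# This is NOT Google's crawler range; substitute the real one yourself.
IGNORED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_ignored(ip: str) -> bool:
    """Return True if the address falls inside any ignored network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in IGNORED_NETWORKS)
```

A request handler could call `is_ignored(client_ip)` before recording a hit and skip the write when it returns `True`.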

Hope this helps.

@rubenvanerk
Contributor Author

rubenvanerk commented Jun 11, 2020

That's weird because Googlebot hits are definitely affecting my overall stats.

Device types section:
image
Totals:
image
I took a quick look and I expected the hits to be filtered out on the line below, but I can't find anything that filters out the device_type of ROBOT.

service=self, start_time__gt=start_time, start_time__lt=end_time
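For illustration, the kind of filtering I expected looks roughly like the sketch below. It is plain Python rather than the project's ORM code, and the `Session` class and `human_sessions` function are made up for the example. With a Django queryset, the equivalent would presumably be chaining `.exclude(device_type="ROBOT")` onto the filter above, but that is an assumption about how a fix might look.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    # Hypothetical stand-in for the real session model.
    start_time: datetime
    device_type: str  # e.g. "DESKTOP", "PHONE", "ROBOT"


def human_sessions(sessions, start_time, end_time):
    """Keep sessions inside the date window whose device type is not ROBOT."""
    return [
        s
        for s in sessions
        if start_time < s.start_time < end_time and s.device_type != "ROBOT"
    ]
```

With this in place, the totals would be computed from `human_sessions(...)` instead of from every session in the window.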

@milesmcc
Owner

You're right--I'm mixing this up with another change I made. A few weeks ago I made a commit that removed crawlers from showing up as mobile devices (because sometimes Googlebot will try to request the mobile site, sometimes tricking the device type field), and got that mixed up with an overall exclude.

Eventually, I'd like to implement much more advanced filtering (i.e., by far more than just date), and bots will certainly be a filter option.

@rubenvanerk
Contributor Author

rubenvanerk commented Jun 11, 2020

What you're describing is also happening in my case:
image
I have 44 real sessions in the past ~24 hours

@rubenvanerk
Contributor Author

Google themselves discourage ignoring their crawler by IP.

For now, I just regularly delete robot sessions through the admin.

@milesmcc
Owner

Maybe the best thing to do is to add an option to ignore all bots? Then they won’t ever be logged in the database, thus saving you space.
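A rough sketch of what such an option could look like, assuming bots are detected from the user agent string. The `ignore_robots` flag, the `should_record` function, and the regex are all hypothetical; a real implementation would likely rely on a proper user agent parser rather than this crude pattern, which can misfire on legitimate agents that happen to contain "bot".

```python
import re

# Crude heuristic: most well-behaved crawlers name themselves this way.
BOT_PATTERN = re.compile(r"bot|crawler|spider", re.IGNORECASE)


def should_record(user_agent: str, ignore_robots: bool) -> bool:
    """Decide whether a hit should be written to the database at all."""
    if ignore_robots and BOT_PATTERN.search(user_agent or ""):
        return False
    return True
```

Because the check runs before anything is saved, ignored bot hits never take up space, unlike deleting robot sessions afterwards through the admin.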
