Some problems to understand your definition of bad #143

Closed
varocarbas opened this Issue Mar 13, 2018 · 20 comments

varocarbas commented Mar 13, 2018

Your bad-bot list includes the bots (current and old versions) that build my web domain ranking: RankingBot and RankingBot2.

They are clearly identified, pursue a neutral (even positive) goal, store minimal information, fully respect robots.txt directives except under extreme circumstances, etc. All this is clearly explained on their own page.

Can you please explain to me how you came to the conclusion that my clearly identified, inoffensive and respectful bots are bad? Have you seen any kind of malfeasance of which I am not aware (bear in mind that I am their sole author and operator)? Or is this just some kind of because-I-say-so list?

Owner

mitchellkrogza commented Mar 21, 2018

@varocarbas Thanks for your question. They are defined as follows.

This system was built to protect my own web sites from:

  • bad bots first
  • secondly from referrer spam
  • thirdly from a number of SEO companies selling services to people and aggressively scanning other people's web sites to help competitors gain insights on your SEO
  • lastly from backlink scanners and ranking systems which provide similar insights

However, this system was, and still is, designed so that individual users who want to allow those User-Agents can simply bypass the blocks using the include files.

mitchellkrogza self-assigned this Mar 21, 2018

varocarbas commented Mar 21, 2018

@mitchellkrogza
Thanks for taking the time to finally answer me (8 days later! It might have been worse).

The first thing I want to highlight is that this is your list, your designations ("bad bots") and your decisions. Everything is completely up to you and to the people who rely on, trust or value it. You are free to do whatever you wish. I will not insist or try to convince you of anything. This whole issue is about clearly describing why I think that including my bots (or similar ones) in any list along these lines makes little sense and is quite unfair, even arbitrary.

I understand that you consider my bots part of your fourth point ("lastly from backlink scanners and ranking systems which provide similar insights") and, as such, I see various issues:

  • Most web bots focus on text/links: for example, all those of search engines, directories, indexers, etc. I didn't see any mention of bots from Google, Bing, Yandex, etc., which are far more likely to be a bother (bigger companies with many more resources analysing many sites over and over). I personally get a significant number of weekly visits from this kind of bot, but not from mine (a couple of times per year).
  • The other problem with censoring this type of bot is that your site stops being considered by the corresponding rankings/search engines, which usually goes against the intent of most site owners. It does make sense to keep certain pages from being analysed, but rarely the whole site. One of the points of being on the internet is precisely being searchable/listed. For example, all the sites restricting access to my bots cause them to ignore their information, marginally reducing the overall quality of the ranking and seriously affecting that specific site's impact on it (= their problem more than mine).
  • You seem to assume that my bots (or I) have something to do with ranking/SEO, when this is extremely far from reality. I built this whole ranking (the bots, the underlying system, backups, automated updates, etc.) to showcase my programming skills. More specifically, my efficiency-focused side, since the whole system works under very limited hardware resources (e.g., all the bots run on a pretty crappy old computer); also my custom-algorithm building (I developed all this completely from scratch) and scalability (bots running for over 1 year already and finding thousands of new domains on a daily basis) skills. If you go to any page (out of the virtually infinite number of them) of the associated ranking, you will see that there are no links. The whole point is showing positions by applying rules that are as generically objective as possible. I don't earn anything from this, other than extra work, giving a slightly bad impression if the results don't look good enough and, eventually, having to deal with generic arbitrariness negatively affecting me, like what your list represents.

What bothers me the most is the "bad" label, which you seem to issue quite arbitrarily and easily. Your first category is "bad bots", which I understand to be malware or similar. Your second category is spam, which also sounds quite bad. But from that point downwards, things start getting a bit weird. I am not exactly a defender of marketing-whatever, but none of this seems to belong to the same category as the first two. At least, not when being as non-invasive as possible, which is the case of my bots, which only store minimal, virtually anonymous information. None of this seems fair, logical or even sensible. Much less bearing in mind the small impact of my bots (visiting important sites around 20-50 times and smaller ones 1-10 times during a whole year?), which makes me wonder about the exact technique used to populate your list.

In summary, my problem isn't sites allowing/restricting my bots (it would mostly be bad for them), but these bots, the ranking and, ultimately, my (honest, open, not-trying-to-bother-anyone, etc.) efforts being unfairly tagged as what they are not. They might be unknown, small, persistent, a drop in the ocean, etc., but certainly not bad, regardless of how you define that word.

Owner

mitchellkrogza commented Mar 21, 2018

@varocarbas No problem and point taken. Thanks for your valuable input; I will certainly review these again. I was away for a week, but normally I respond within a day or two.

varocarbas commented Mar 21, 2018

@mitchellkrogza OK, thanks. No problem.

corbolais commented Mar 21, 2018

FTR, I chose UBBB for what I want to achieve: exclusion of bots. Therefore, I explicitly welcome the inclusion of any bot in the list. I do not want bots on my sites. I think bots are an evil per se and render the net all the worse. It would be better w/o any bot whatsoever.

I think this position is clear.

@mitchellkrogza keep up the good work.

@varocarbas stop developing software, and your programming skills do not matter when it comes to my sites. Bad reasoning on top of it.

varocarbas commented Mar 21, 2018

@corbolais "FTR, I chose UBBB for what I want to achieve: exclusion of bots."

Extremely naive statement. None of the most active bots, such as the ones from sites like Google, are in that list.

"I think bots are an evil per se and render the net all the worse"

So, I understand that you aren't using search engines, because without bots there would be no search engines. Nor any even slightly complex understanding of anything on the web (or what do you think? That people perform most of these actions manually?). Bots are pieces of software performing automated actions; they are one of the most essential parts of the internet or of any other scenario involving huge amounts of data. The fact that some people build pieces of software performing dishonest actions and call them 'bots' doesn't mean that every other piece of software with that name does the same. Are you aware of the importance of context and of properly understanding things?

"stop developing software, and your programming skills do not matter when it comes to my sites. Bad reasoning on top of it."

Why? Have I asked you to stop having a website? My programming skills aren't promoted by my bots visiting your site or not. In fact, my bots will not visit your site if you don't want them to (no need for this list, just include their name in your robots.txt or tell me your sites directly). BTW, my bots collect in each visit a much smaller amount of information than any person (or most other bots): pretty much just counters of links to other sites, nothing more and nothing less. No text, HTML or anything else is stored. I am not even logging their activity. This is the closest to absolutely anonymous/careless interactions you can imagine, and this is what you are so worried about: a piece of software looking for very specific bits, storing a very small amount of information and not caring about anything else. All that to build an objective domain ranking alternative for anyone interested, exclusively based on objective correctness. Do you prefer to blindly trust what for-profit corporations tell you? OK. But, please, don't insult me by insinuating that my modest, technically-focused, no-direct-profit, etc. actions are worse than what big sites do every day at a much larger scale.
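
The robots.txt opt-out mentioned above can be sketched with Python's standard `urllib.robotparser`. This is only an illustration: the robots.txt content, the example.com URL and the helper name `allowed` are assumptions, and whether the real bots honour such a rule rests on their author's statement, not on this code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut RankingBot out of the whole site and
# leave every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: RankingBot
Disallow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/page.html") -> bool:
    """Return True if this robots.txt lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("RankingBot", "RankingBot2", "Googlebot"):
        print(agent, allowed(agent))
```

With this hypothetical file, the sketch reports `RankingBot` and `RankingBot2` as disallowed and `Googlebot` as allowed. Python's parser matches agent names by substring, which is why the single `RankingBot` entry also covers `RankingBot2`; other parsers may behave differently.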

You are very aggressively defending a position about which you don't seem to have too much knowledge.

ScrewLooseDan commented Mar 21, 2018

"I built this whole ranking to showcase my programming skills"

This is why I'll continue to block your bot regardless of whether the author decides to exclude you from his list.

Best of luck.

varocarbas commented Mar 21, 2018

@ScrewLooseDan Completely up to you, but you don't seem to have got my point. Building a comprehensive software system (formed by the bots, ranking algorithms, communication/backup subsystems, etc., with everything up and running for over a year already, all under very restricted hardware conditions) and having it work fine and as intended is the promotion.

Also, I am not sure why that motivation is worse than having built it for a direct profit. Wouldn't that second scenario be much more prone to bad things? So, someone does something which you might eventually enjoy and his motivation is doing it properly (to showcase his skills, professionalism and even principles), but you would have preferred him to do it for an immediate benefit? Kind of weird, but good luck to you too (I don't need it, BTW).

thezoggy commented Mar 23, 2018

I personally view unwanted requests to my site as a resource drain. I have limited bandwidth and a low-powered VPS, so I'd rather cater to requests that are legit.

varocarbas commented Mar 23, 2018

@thezoggy As said to others before, you are free to do whatever you wish. This isn't about convincing anyone to allow my bots, but about complaining that they have been unfairly tagged as "bad".

In any case, do you mind elaborating on your "being legit"? I have explained in detail the purpose of my bots and their low impact everywhere (actually, I found their inclusion in your list quite surprising! How could you know about something with such modest activity? The chances are very low!). Their main goal is being as objective and absolute-correctness-focused as possible. No nepotism. No paid improvements. They don't even tolerate artificial/dishonest inter-linking. Whatever domain ranks higher does so because it deserves it (and/or I made a mistake). Their impact on any site is negligible, even more so lately because I am speeding them up (the less time they spend on a site, the lower the load on my server and, theoretically, the more comprehensive the conclusions; in this way, they find fewer new domains, but after having already found over 16M this isn't a concern anymore).

If one of them visits your site, it will go to around 20-100 different pages very quickly (just looking for links), will not store any information about you and will move on to the next one. Any user spending 5 minutes on one of your pages would consume far more resources than my bots. Any bot of a major site systematically collecting every single bit of data from everywhere would have a far higher impact than my bots. I don't see any of these bots in your list. Why aren't you applying the same rules to everyone (or even more drastically to the more aggressive alternatives) and banning the bots from Google and company? Or, at least, the ones from secondary, irrelevant indexing/search engines. I get lots of visits from Czech, Chinese and Russian search engines about which I don't care at all; some of them haven't even indexed my site! All of them have a much higher impact on my site than what my bots will have during a whole year (visiting me 1-3 times).

So, do you mind elaborating a bit on your position, or shall I feel offended because you have implied that my bots aren't "legit"? Which is quite ironic when you are the ones who have unilaterally (and, until this point, still without justification IMO) called me and my work "bad" without any warning (I learned about your list by pure accident). I still haven't said what I think of your list "legit"-wise; should I start?

Owner

mitchellkrogza commented Mar 24, 2018

@varocarbas You are taking too much offence at the general and very commonly used term "bad bots". The word "bad" is not intended to defame you or anyone; it is merely a term as old as the internet itself.

"Undesirable" may be a better word, but almost every User-Agent blocking system has adopted the word "bad". Perhaps even the words "trusted" or "untrusted" might be better terms.

I am going to allow your bot to crawl some of my sites for the next few days to help make a decision.

varocarbas commented Mar 24, 2018

@mitchellkrogza
It is not exactly that I am offended, but that I feel my work/effort is being unfairly mistagged. All the words you are using sound equally inapplicable to my bots IMHO. A word like "irrelevant" seems much more correct. Bear in mind that my whole activity is online/remote based and I put a major focus on issues like solid principles, objective correctness, etc. I don't care if you call me poor or too aggressive or even anti-social, but censuring the quality or honesty of my work is a different story.

Anyway, thanks for giving them a shot. But you might need more than just a few days for any of my bots to reach any of your sites. Just to give a rough idea of the amount of information involved and the limited resources: at the moment, it takes them around 3-4 months to go from the first-found domain to the last one (almost 17M now), even though they are currently in high-speed mode. And this is just a very small part of all the available domains; you might even have some websites which they haven't found yet. This is perhaps what puzzles me the most about all this: how could such modest activity have triggered any kind of alarm?!

I look forward to your impressions and final decision.

Owner

mitchellkrogza commented Mar 24, 2018

A search through 13 months of my server logs spanning 36 web sites reveals only 109 hits.

I will remove this from the main list of bots in the next commit and continue to monitor the activities of this crawler.

grep -o 'RankingBot' /var/log/nginx/*-access.log* | wc -l 
109

and

grep -o 'RankingBot2' /var/log/nginx/*-access.log* | wc -l 
109
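
One possible reading of the identical counts: `grep -o 'RankingBot'` also matches inside `RankingBot2`, so the first figure would include every hit from the second bot. A minimal Python sketch of that effect, using made-up log lines (the addresses, timestamps and User-Agent strings are assumptions, since the thread does not show the real ones):

```python
import re

# Hypothetical access-log lines; the real User-Agent strings are not shown here.
SAMPLE_LOG = [
    '203.0.113.5 - - [01/Mar/2018:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "RankingBot2 -- https://example.org/bot"',
    '203.0.113.5 - - [01/Mar/2018:10:00:02 +0000] "GET /about HTTP/1.1" 200 734 "-" "RankingBot2 -- https://example.org/bot"',
    '198.51.100.7 - - [01/Mar/2018:10:05:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]


def count_occurrences(pattern: str, lines: list[str]) -> int:
    """Count every match of `pattern`, roughly like `grep -o PATTERN | wc -l`."""
    regex = re.compile(pattern)
    return sum(len(regex.findall(line)) for line in lines)


# Plain substring: also counts the "RankingBot" inside "RankingBot2".
substring_hits = count_occurrences(r"RankingBot", SAMPLE_LOG)
# Second bot only.
v2_hits = count_occurrences(r"RankingBot2", SAMPLE_LOG)
# First bot only: "RankingBot" not followed by "2".
v1_only_hits = count_occurrences(r"RankingBot(?!2)", SAMPLE_LOG)

if __name__ == "__main__":
    print(substring_hits, v2_hits, v1_only_hits)
```

In this made-up sample the first two counts are equal and the third is zero, which is the pattern one would expect if all logged hits came from RankingBot2. The real logs may of course tell a different story.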

Users who wish to continue blocking RankingBot may continue to do so by adding

"~*\bRankingBot\b" 3;

into the blacklist-user-agents.conf include file.
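
A note on the pattern above: `\b` marks a word boundary, and there is none between `t` and `2`, so this rule on its own would likely not match a User-Agent that says `RankingBot2`. A small Python sketch of that behaviour, assuming Python's `re` treats `\b` like the PCRE engine behind nginx's `~*` for these ASCII strings, and using hypothetical User-Agent strings:

```python
import re

# Pattern copied from the blacklist line; IGNORECASE mimics nginx's "~*" operator.
NARROW = re.compile(r"\bRankingBot\b", re.IGNORECASE)
# Hypothetical broader variant that also accepts an optional trailing "2".
BROAD = re.compile(r"\bRankingBot2?\b", re.IGNORECASE)

# Hypothetical User-Agent strings, not taken from real traffic.
AGENTS = ["RankingBot/1.0", "RankingBot2/1.0", "rankingbot", "Googlebot/2.1"]


def matches(pattern: re.Pattern, user_agent: str) -> bool:
    """Return True if the pattern is found anywhere in the User-Agent."""
    return pattern.search(user_agent) is not None


# Map each agent to (narrow match, broad match).
results = {ua: (matches(NARROW, ua), matches(BROAD, ua)) for ua in AGENTS}

if __name__ == "__main__":
    for ua, (narrow, broad) in results.items():
        print(f"{ua:18} narrow={narrow} broad={broad}")
```

Users who want to block both versions with such a rule might need a pattern along the lines of the broader variant; the exact syntax accepted by the include file should be checked against the blocker's own documentation.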

varocarbas commented Mar 24, 2018

@mitchellkrogza Thanks.

corbolais commented Mar 24, 2018

@mitchellkrogza Hey there, so all it takes is a nag who keeps coming back until you comply?
Look at the sheer word count this guy spews out. That alone would warrant a deny imo. UBBB for me was about blocking unwanted bots, including naggers' bots.
You asked for opinions, you got a couple o' "nay!" and then you go "okay! happy to comply!". Seriously?
Pathetic.

Owner

mitchellkrogza commented Mar 24, 2018

@corbolais I look at various aspects before considering removal. I am monitoring it and will be happy to re-include it if need be. If you feel it should remain here, please give your reasons. For now I have not seen enough hits from it in my logs, but that does not mean that others like you are not seeing higher hit counts, so please contribute any findings you have.

Contributor

StarkRavingZA commented Mar 27, 2018

IMHO it should remain on the list; it falls into the same category as many other SEO companies and user agents listed here already.

varocarbas commented Apr 30, 2018

@mitchellkrogza Sorry if I have misunderstood something, but your list still includes references to my bots.

A short while after my last comment, I did confirm that they had been deleted from your list, but now they are back?! Is this normal within your bot-removal process? Or shall I understand that you changed your mind? Apparently for no reason, or is there anything you want to share with me? You know, their sole author, operator and person responsible, the guy who is absolutely certain that nothing has changed during the last month.

Owner

mitchellkrogza commented Apr 30, 2018

@varocarbas Users of the blocker have requested its re-addition. The only way for users to allow the bot through is to add it to their own custom whitelists, which the blocker provides.

Repository owner locked as resolved and limited conversation to collaborators Apr 30, 2018

Owner

mitchellkrogza commented Apr 30, 2018

varocarbas referenced this issue Apr 30, 2018
