Add blacklisting based on hostname #972
Shouldn't we just use regular expressions instead of wildcards? They would be both more flexible and easier to implement... And if we want to expose this in a user-friendly UI, converting wildcards to regular expressions is also trivial, while the reverse is impossible...
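The point that wildcards convert trivially to regular expressions can be illustrated with a short Go sketch. `wildcardToRegexp` is a hypothetical helper, and it assumes `*` matches any run of characters, dots included; the thread does not pin down that semantics.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wildcardToRegexp (hypothetical helper) turns a hostname pattern into an
// anchored regular expression.
// ASSUMPTION: "*" matches any run of characters, including dots, so
// "*.example.com" also matches "a.b.example.com".
// Hostnames are case-insensitive, so the pattern is lowercased; callers are
// expected to lowercase the hostname they match against.
func wildcardToRegexp(pattern string) (*regexp.Regexp, error) {
	// QuoteMeta escapes every metacharacter (the dots and the "*" itself);
	// each escaped "\*" is then turned back into ".*".
	quoted := regexp.QuoteMeta(strings.ToLower(pattern))
	expr := "^" + strings.ReplaceAll(quoted, `\*`, ".*") + "$"
	return regexp.Compile(expr)
}

func main() {
	re, err := wildcardToRegexp("*.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(re.String())
	fmt.Println(re.MatchString("cdn.example.com"))
	fmt.Println(re.MatchString("example.com"))
}
```

The reverse direction fails because most regexes (character classes, alternations, bounded repeats) have no wildcard equivalent.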
I'm looking into this. @na--, I can see you want to ensure this doesn't impede valuable features. What would you both think about an expanded syntax?
This can be accepted through all option channels. All forms can be translated to regexes internally, so there's a single piece of matching logic.
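"Translated to regex internally so there's a single match logic" can be sketched as compiling every entry, plain or regex, into one anchored alternation. The slash-delimited regex form used here (`/.../`) is a hypothetical stand-in for the expanded syntax, not the syntax that was proposed, and `compileBlocklist` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// compileBlocklist (hypothetical) translates every entry into a regular
// expression and joins them into one anchored alternation, so a single
// MatchString call checks the whole list.
// HYPOTHETICAL syntax: an entry wrapped in slashes ("/.../") is a raw regex;
// any other entry is a wildcard pattern in which "*" matches any characters.
func compileBlocklist(entries []string) (*regexp.Regexp, error) {
	parts := make([]string, 0, len(entries))
	for _, e := range entries {
		if len(e) >= 2 && strings.HasPrefix(e, "/") && strings.HasSuffix(e, "/") {
			raw := e[1 : len(e)-1]
			// Compile each raw regex on its own so a syntax error names its entry.
			if _, err := regexp.Compile(raw); err != nil {
				return nil, fmt.Errorf("invalid regex entry %q: %w", e, err)
			}
			parts = append(parts, "(?:"+raw+")")
			continue
		}
		// Wildcard entry: escape metacharacters, then let "*" match anything.
		quoted := regexp.QuoteMeta(strings.ToLower(e))
		parts = append(parts, "(?:"+strings.ReplaceAll(quoted, `\*`, ".*")+")")
	}
	return regexp.Compile("^(?:" + strings.Join(parts, "|") + ")$")
}

func main() {
	re, err := compileBlocklist([]string{"*.ads.example.com", `/tracker[0-9]+\.example\.net/`})
	if err != nil {
		panic(err)
	}
	for _, host := range []string{"cdn.ads.example.com", "tracker42.example.net", "example.net"} {
		fmt.Println(host, re.MatchString(host))
	}
}
```

This keeps one matching path for every channel, but the compiled pattern still grows with the list, so on its own it does not address the concern about large lists.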
@bookmoons, yeah, this approach seems very good to me! The simple things are kept simple, but you can make complicated blocks as well.
I noticed that this issue has been stagnant for a while. Mind if I try taking it on?
@krashanoff, sure, go ahead!
@na--, getting on this right now. Just a quick question: should we enforce valid hostname patterns, or should that be left to the user? Also, should blacklisted hosts throw the same error code as blacklisted IPs, or use a new one?
@krashanoff, I don't think you can enforce valid hostname patterns if you go with the approach suggested in #972 (comment). It'd be tricky when you have …

That said, I just realized that implementing this with regexes will likely incur serious performance issues for large lists, similar to the issues of the current naive implementation of IP blocking (#1256). If we allow regexes, I can't think of an easy implementation approach other than iterating over the list of blacklisted hostnames and checking against each one... At that point we might as well extend this, call it "blacklisted URLs", and allow blocking by any part of the URL... Whereas if we allow only plain hostnames and leaf …

And we might also want to use a different option name than …

In short, sorry for the delay @krashanoff, but I think this issue needs further evaluation before you can start implementing it. cc @robingustafsson @mstoykov @imiric, what are your thoughts about ⬆️ ?
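Restricting entries to plain hostnames and leading `*.` wildcards means a lookup no longer has to scan the list. A trie comes up later in the thread; this Go sketch swaps in a different technique, two hash sets probed once per parent domain, to show the same property: the cost depends on the hostname, not on the number of entries. `hostBlocklist` is hypothetical, and so is its semantics: `*.example.com` matches subdomains at any depth but not `example.com` itself.

```go
package main

import (
	"fmt"
	"strings"
)

// hostBlocklist (hypothetical) holds plain hostnames and leading "*." entries.
// ASSUMPTION: "*.example.com" matches subdomains of example.com at any depth,
// but not example.com itself.
type hostBlocklist struct {
	exact    map[string]struct{} // plain entries, e.g. "tracker.example.org"
	suffixes map[string]struct{} // domains taken from "*.domain" entries
}

func newHostBlocklist(entries []string) *hostBlocklist {
	b := &hostBlocklist{exact: map[string]struct{}{}, suffixes: map[string]struct{}{}}
	for _, e := range entries {
		e = strings.ToLower(e)
		if strings.HasPrefix(e, "*.") {
			b.suffixes[e[2:]] = struct{}{}
		} else {
			b.exact[e] = struct{}{}
		}
	}
	return b
}

// blocked does one map lookup for the full name plus one per dot in it, so
// its cost depends on the hostname, not on the size of the blocklist.
func (b *hostBlocklist) blocked(host string) bool {
	host = strings.ToLower(host)
	if _, ok := b.exact[host]; ok {
		return true
	}
	// Probe every parent domain: for "a.b.example.com" that is
	// "b.example.com", "example.com" and "com".
	for i := 0; i < len(host); i++ {
		if host[i] == '.' {
			if _, ok := b.suffixes[host[i+1:]]; ok {
				return true
			}
		}
	}
	return false
}

func main() {
	b := newHostBlocklist([]string{"tracker.example.org", "*.ads.example.com"})
	for _, h := range []string{"tracker.example.org", "x.y.ads.example.com", "ads.example.com", "example.org"} {
		fmt.Println(h, b.blocked(h))
	}
}
```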
Agree with most of what you said, @na--.
I like the idea of the trie and reversing it (I remember seeing this done for exactly this purpose somewhere).
You've convinced me about the option name, so 👍 for … Regarding the wildcard, in the initial version of this I think we should only allow wildcards at the beginning of the value, i.e. …
This seems relatively easy to implement if you treat domains as reversed strings, while also being fairly powerful and free of any surprising behavior for users, I think...
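Treating domains as reversed strings turns a leading wildcard into a prefix match, which a trie handles naturally. A minimal Go sketch, assuming a character-level trie and that `*suffix` matches any hostname ending in `suffix` (so `*.example.com` covers `cdn.example.com` but not `example.com`); `reversedTrie` and its methods are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// trieNode is one character of a reversed pattern.
type trieNode struct {
	children map[byte]*trieNode
	exact    bool // a plain hostname ends here
	wildcard bool // a "*suffix" entry ends here: any name with this suffix matches
}

func newNode() *trieNode { return &trieNode{children: map[byte]*trieNode{}} }

// reversedTrie (hypothetical) stores patterns back to front, so a leading
// wildcard becomes a shared prefix of the stored paths.
type reversedTrie struct{ root *trieNode }

func newReversedTrie() *reversedTrie { return &reversedTrie{root: newNode()} }

// insert adds a plain hostname or a "*suffix" pattern. "*.example.com" is
// stored as the character path "moc.elpmaxe." ending in a wildcard node.
func (t *reversedTrie) insert(pattern string) {
	pattern = strings.ToLower(pattern)
	wild := strings.HasPrefix(pattern, "*")
	if wild {
		pattern = pattern[1:]
	}
	n := t.root
	for i := len(pattern) - 1; i >= 0; i-- {
		next, ok := n.children[pattern[i]]
		if !ok {
			next = newNode()
			n.children[pattern[i]] = next
		}
		n = next
	}
	if wild {
		n.wildcard = true
	} else {
		n.exact = true
	}
}

// match walks the hostname from its last character towards its first.
// Reaching a wildcard node means the remaining prefix (possibly empty) is
// covered by "*"; otherwise the whole name must end on an exact node.
func (t *reversedTrie) match(host string) bool {
	host = strings.ToLower(host)
	n := t.root
	for i := len(host) - 1; ; i-- {
		if n.wildcard {
			return true
		}
		if i < 0 {
			return n.exact
		}
		next, ok := n.children[host[i]]
		if !ok {
			return false
		}
		n = next
	}
}

func main() {
	t := newReversedTrie()
	t.insert("*.example.com")
	t.insert("tracker.example.org")
	for _, h := range []string{"cdn.example.com", "example.com", "tracker.example.org"} {
		fmt.Println(h, t.match(h))
	}
}
```

A lookup touches at most one node per character of the hostname, regardless of how many patterns are stored.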
Agree with what's been said. We definitely need wildcards, because otherwise the block lists would potentially be even larger than they'll already be (e.g. k6 Cloud). Only allowing the wildcard at the beginning is fine.
Just throwing something out there... In most of my tests, I have the opposite need: to only allow specific hosts (i.e., my own domain) and deny all the others (which are invariably third parties I don't want to annoy).
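The allow-list idea flips the check: deny unless the hostname belongs to one of a few permitted domains. A minimal Go sketch; `allowedHost` is hypothetical, and it assumes an allowed domain also covers its subdomains. A linear scan is used because such lists (one's own domains) are presumably short.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedHost (hypothetical) permits a request only if its hostname equals one
// of the allowed domains or is a subdomain of one.
// ASSUMPTION: allowing "example.com" also allows "api.example.com".
func allowedHost(host string, allowed []string) bool {
	host = strings.ToLower(host)
	for _, d := range allowed {
		d = strings.ToLower(d)
		// The "." prefix keeps "badexample.com" from matching "example.com".
		if host == d || strings.HasSuffix(host, "."+d) {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"example.com"}
	for _, h := range []string{"example.com", "api.example.com", "badexample.com", "cdn.thirdparty.net"} {
		fmt.Println(h, allowedHost(h, allowed))
	}
}
```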
Thank you all for taking the time to write back! This is my first time contributing to open source, so it's been a good experience. Summing up what I gleaned from the conversation above, this seems to be the consensus on implementation:
…
The only thing really left up for debate at this point is whether regexes should be permitted. I personally think they would be nice to have, but as @na-- mentioned, there are big issues with performance and implementation. I was tooling around with using Go's …

Pertaining to @hynd's idea of …
@krashanoff, @hynd, sorry for the late response... 😞 Regarding …

The biggest use case I can think of for this is excluding requests to external services like analytics, ads, script CDNs, etc. when you've recorded a browser session. So if you record an …

Whereas, to me, the biggest use case for an …

There are other benefits and non-k6-cloud uses of the …

@krashanoff, regarding your other points: 👍 for the option name, the new error code, simple validation (though I'm not sure how these should be handled), wildcards only at the beginning, and a trie-based implementation. 👎 for regex support, at least for the initial version of this feature... We can always change that decision in the future, and #972 (comment) illustrates how we could add regex support in a backwards-compatible way if we decide to do so...
Ah, fair point, …
No worries about the late reply; thank you for taking the time. I think a preliminary implementation will just validate wildcard placement. Then, to account for internationalized domain names, we can use a combined character class of …

Thank you for poring over the details and for the detailed responses. Will get to work on this very soon 👍
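Validating only wildcard placement, as suggested for the preliminary implementation, takes a few lines of Go. `validateHostnamePattern` is a hypothetical helper; it enforces the rule agreed on in the thread (a `*` only at the very beginning) and deliberately leaves other hostname syntax, including internationalized names, unchecked.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateHostnamePattern (hypothetical) checks only wildcard placement: a "*"
// is accepted solely as the first character. Other hostname syntax (IDNs,
// label lengths, allowed characters) is not checked in this sketch.
func validateHostnamePattern(p string) error {
	if p == "" {
		return errors.New("empty hostname pattern")
	}
	// LastIndex > 0 rejects "example.*", "a*.example.com" and "**.example.com".
	if i := strings.LastIndex(p, "*"); i > 0 {
		return fmt.Errorf("invalid pattern %q: wildcard allowed only as the first character (found at index %d)", p, i)
	}
	return nil
}

func main() {
	for _, p := range []string{"*.example.com", "example.*", "a*.example.com", "**.example.com", ""} {
		fmt.Printf("%q -> %v\n", p, validateHostnamePattern(p))
	}
}
```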
Similar to how k6 supports blacklisting of IP ranges, it should allow blacklisting of hostnames, including wildcards.
I propose we add this functionality as follows:
- … `error_code` and `error` tags should be emitted, and a JS `BlacklistedHostname` error will be thrown that, unless caught, will exit the current VU iteration.
- A CLI flag `--blacklist-hostname HOSTNAME` that can be specified multiple times, where `HOSTNAME` can contain `*` wildcards.
- An environment variable `K6_BLACKLIST_HOSTNAMES` with a comma-separated list of hostnames to block, again with wildcards allowed.
- A JS option `blacklistHostnames`: …
Pointers for implementation:
…