Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Wildcard to Hosts #2747

Merged
merged 5 commits into from Dec 6, 2022
Merged

Add Wildcard to Hosts #2747

merged 5 commits into from Dec 6, 2022

Conversation

eugercek
Copy link
Contributor

@eugercek eugercek commented Oct 25, 2022

Implements #2724

The current implementation works and passes all the tests. But I'm not sure about some of the parts, we can discuss them.

  • Should we check Hosts keys and values before execution? Currently, it fails at runtime.
  • matchTrieNode.match is a mess 😞 I didn't want to merge ifs because IMO this is cleaner.
  • I couldn't use HostAddress because of circular dependency, used net.TCPAddr if we move HostAddress to types, we can use that instead of net.TCPAddr
  • I couldn't find a better way than using a Trie. Maybe regex checking on all map elements concurrently can be faster, have no clue though 🤔
  • Since we don't use Go 1.18 couldn't use generics to abstract trieNode to use it as matchTrieNode

@codecov-commenter
Copy link

codecov-commenter commented Oct 26, 2022

Codecov Report

Merging #2747 (4494b47) into master (7f823f0) will increase coverage by 0.57%.
The diff coverage is 87.44%.

@@            Coverage Diff             @@
##           master    #2747      +/-   ##
==========================================
+ Coverage   75.59%   76.16%   +0.57%     
==========================================
  Files         202      210       +8     
  Lines       15992    16496     +504     
==========================================
+ Hits        12089    12565     +476     
- Misses       3151     3157       +6     
- Partials      752      774      +22     
Flag Coverage Δ
ubuntu 76.16% <87.44%> (+0.77%) ⬆️
windows ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
api/common/context.go 100.00% <ø> (ø)
api/server.go 80.95% <ø> (+9.52%) ⬆️
api/v1/client/client.go 0.00% <ø> (ø)
api/v1/client/metrics.go 0.00% <ø> (ø)
api/v1/client/status.go 0.00% <ø> (ø)
api/v1/errors.go 76.92% <ø> (ø)
api/v1/group.go 54.05% <ø> (ø)
api/v1/group_jsonapi.go 95.74% <ø> (ø)
api/v1/group_routes.go 60.86% <ø> (ø)
api/v1/metric.go 54.54% <ø> (-13.64%) ⬇️
... and 262 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@mstoykov mstoykov added this to the v0.42.0 milestone Oct 26, 2022
@codebien
Copy link
Collaborator

Hi @eugercek,
thanks for the effort in doing this 👏 🙇

Should we check Hosts keys and values before execution? Currently, it fails at runtime.

Which check would you suggest? Can you point to me an example, please?

I couldn't use HostAddress because of circular dependency, used net.TCPAddr if we move HostAddress to types, we can use that instead of net.TCPAddr

I think it would be a good idea.

I couldn't find a better way than using a Trie. Maybe regex checking on all map elements concurrently can be faster, have no clue though thinking

As requested in the issue, can you include a benchmark comparison between the old and the current version, please?

Since we don't use Go 1.18 couldn't use generics to abstract trieNode to use it as matchTrieNode

We are building k6 binaries using Go 1.19 so generics are already supported. Anyway, generics aren't the unique way of doing abstraction in Go. Which part of matchTrieNode is missing in trieNode for re-using it or that is not possible to integrate?

Generally speaking, we should share the code between BlockedHostnames and Hosts as much as we can.

@eugercek
Copy link
Contributor Author

thanks for the effort in doing this 👏🏻 🙇🏻

My pleasure 😊

Which check would you suggest? Can you point to me an example, please?

We can check if the key is a valid hostname or hostname:port and if the value is valid IP or IP:port. We do this check in HostnameTrie

var validHostnamePattern *regexp.Regexp = regexp.MustCompile(`^(\*\.?)?((([a-zA-Z0-9]|[a-zA-Z0-9][a-zA-Z0-9\-]*[a-zA-Z0-9])\.)*([A-Za-z0-9]|[A-Za-z0-9][A-Za-z0-9\-]*[A-Za-z0-9]))?$`)
func isValidHostnamePattern(s string) error {
if len(validHostnamePattern.FindString(s)) != len(s) {
return fmt.Errorf("invalid hostname pattern '%s'", s)
}
return nil
}
// insert a hostname pattern into the given HostnameTrie. Returns an error
// if hostname pattern is invalid.
func (t *HostnameTrie) insert(s string) error {
s = strings.ToLower(s)
if err := isValidHostnamePattern(s); err != nil {
return err
}

I think it would be a good idea.

Moving it.

As requested in the issue, can you include a benchmark comparison between the old and the current version, please?

Sorry forgot that. Adding it.

Which part of matchTrieNode is missing in trieNode for re-using it or that is not possible to integrate?

I noticed that I could use use source as a hash map to look up key values, I was comparing iterative and recursive trie implementation and totally forget the original idea,😞. I want to provide iterative vs recursive benchmarks too if it's OK.
(The old problem)
In Hosts we need to store the value (IP or IP:port) for the matching trie node. I choose to put the value inside the trie node. I thought the best way is to create a generic value type [Value T] and use bool in HostnameTrie and using string in AddressTrie. So the problem was isLeaf bool should be value string.

@eugercek
Copy link
Contributor Author

eugercek commented Oct 27, 2022

Hi @codebien, it's ready for your review 🙂

Trie Implementation

I used trieNode and iterative implementation. Iterative's contains() can't be elegant. The recursive one is much simpler. Though insert() is simpler in iterative IMO. Also, I moved the trie implementation to a different file and added unit tests.

Benchmark

I couldn't run benchmarks on the master because of a bug, which I fixed in b97e14e, that's the only difference between the real master and the master I used.

Also, maybe we can add a benchmark test to GitHub actions just to make sure it's correct atm. A simple go test ./... -run=^$ -bench=. -benchtime=1x takes 1 minute on my machine.

Benchmark command

git checkout master && go test -v ./... -run=^$ -bench=. -benchmem  > old.txt
git checkout wildcard-host && go test -v ./... -run=^$ -bench=. -benchmem  > new.txt 
benchstat old.txt new.txt > diff.txt

Benchmark values diff.txt

EDIT: Fixed some minor lint errors.

@codebien
Copy link
Collaborator

Hi @eugercek,
I fixed the benchmark error in #2750. Can you rebase so we can avoid that change here, please?

Regarding the benchmark, I would suggest to not running the entire suite but adding a filter -bench=BenchmarkTheUsefulBench for just running the required bench for this PR.

@codebien
Copy link
Collaborator

codebien commented Oct 31, 2022

It would be also helpful if you could rebase and squash the current commit history and move to something with dedicated commits for:

  • Movement for the current files without changes to the new package
  • Add the new required changes
  • Migration for all the occurrences
  • The new additional logic for Hosts

In this way, reviewing the current code would be easier.

But before doing something like this wait for @mstoykov's review because maybe he would ask for a different general idea.

@mstoykov
Copy link
Collaborator

mstoykov commented Nov 2, 2022

Yeah - it will be nicer if at least the movement of the HostAddress is done separately. I personally even think you can open a PR for that on it's own and base this one on top of that.

On first pass I have a bunch of small questions here and there, but it seems most of them are related to the trie implementation that is in practice the same as before 🤔

Since we don't use Go 1.18 couldn't use generics

We definitely can bump the go version - we do already only build with 1.18+. But I am not certain that it will make that much of a difference.

Can you also elaborate on the envconfig issue you are hitting? THere are a couple of comments in the code about it

@eugercek
Copy link
Contributor Author

eugercek commented Nov 2, 2022

Hi @mstoykov

Trie Implementation

Changes:

  • Added unit tests, I think these can be a great resource to learn the implementation.
  • Removed trieNode.insert()'s error returning.

I didn't change any logic intrieNode implementation only simplified a function, trieNode.insert() never returns an error I changed that. The current implementation is recursive. I implemented an iterative version but removed that later because I don't think match() is readable in the iterative version but insert() is much simpler in the iterative IMO.

Generic

Sorry for the generic noise, at first I thought I need to use generics but it's not a must, the current implementation has no problem.

envconfig

Even though there's no environment variable for the hosts option it creates the NullAddressTrie.Trie AFAIU it should not create the NulAdressTrie.Trie. That's why I used a "constructor" NewAddressTrie() with Valid field set to true like in NullHostnameTrie.

I'll do the following if I understood correctly:

  • Create another PR for the changes related to the previous code (added unit test, moving trie implementation to a different file, removing the error message). But I'm waiting for your comment on the recursive vs iterative implementation topic.
  • Rebase this PR to a more readable set of commits.

Also, I couldn't get which benchmarks I should run, or if should I create new ones. Can you elaborate on this?

@codebien
Copy link
Collaborator

codebien commented Nov 11, 2022

Hi @eugercek,
thanks again for your contribution. 🙇

  • Create another PR for the changes related to the previous code (added unit test, moving trie implementation to a different file, removing the error message).

This would be great, yes.

But I'm waiting for your comment on the recursive vs iterative implementation topic.

If you change the current implementation, I suggest two things:

  • Create benchmarks for insert and contains methods.
  • Apply the changes only after and in a separate commit so we can clearly compare the two versions.

So, ideally, I expect 3 commits:

  • only the movements
  • benchmarks addition
  • changes from recursive to iterative

Also, I couldn't get which benchmarks I should run, or if should I create new ones. Can you elaborate on this?

The original benchmark request is for creating a comparison between the current way (Hosts map[string]*HostAddress) with the new one based on types.NullAddressTrie. In this way, we can identify if we are introducing relevant performance degradation.

@eugercek
Copy link
Contributor Author

Hi @codebien

I listed what I'll do, I'll update this message when I progress

  • I'll create the PR and remove its feature from this PR
    • Since you're OK with the review (with benchmark) I wanna think a little more about the iterative trie implementation on the new PR
    • Will add benchmark for the contains and insert
  • Adding the latter benchmark on this PR and squashing the commits the form you wanted

To unit test future commits we need to move `HostAddress`
to prevent circular dependency
@eugercek
Copy link
Contributor Author

I'm rebasing this on master and squashing commits to a readable way.

Also added Hosts benchmark.

@eugercek
Copy link
Contributor Author

Benchmark result is sad 😔

name           old time/op    new time/op    delta
DialerHosts-8     211ns ± 1%     213ns ± 1%  +0.71%  (p=0.000 n=30+30)

name           old alloc/op   new alloc/op   delta
DialerHosts-8      160B ± 0%      160B ± 0%    ~     (all equal)

name           old allocs/op  new allocs/op  delta
DialerHosts-8      5.00 ± 0%      5.00 ± 0%    ~     (all equal)

Also you can see #2747 (comment) for all benchmarks

Benchmark script

bench(){
  echo "started: $2 benchmark"
  git checkout "$1"  2> /dev/null
  go test -run=^$ -bench=BenchmarkDialerHosts -benchmem -count 30 > "$2"
}

cd lib/netext

bench 51a4d001 "original"
bench cd82d0d9 "addresstrie"

benchstat original addresstrie > "bs_addresstrie"

@codebien codebien requested review from imiric and removed request for oleiade November 29, 2022 08:09
@imiric imiric requested review from oleiade and removed request for imiric November 29, 2022 16:21
@sniku sniku removed the request for review from codebien November 30, 2022 14:27
@sniku sniku requested a review from imiric November 30, 2022 14:27
Copy link
Member

@oleiade oleiade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work 👏🏻 👏🏻 👏🏻 Really well done!

I've added a bunch of comments. Although I wouldn't consider any of them blocking, I still set them in request changes to ensure they're considered 🙇🏻

lib/types/types.go Outdated Show resolved Hide resolved
lib/types/types.go Outdated Show resolved Hide resolved
lib/types/types.go Outdated Show resolved Hide resolved
lib/types/types.go Outdated Show resolved Hide resolved
lib/types/types.go Outdated Show resolved Hide resolved
lib/types/addresstrie.go Outdated Show resolved Hide resolved
lib/types/addresstrie.go Outdated Show resolved Hide resolved
lib/types/addresstrie_test.go Outdated Show resolved Hide resolved
lib/types/addresstrie_test.go Outdated Show resolved Hide resolved
func TestAddressTrie(t *testing.T) {
t.Parallel()

at := NewAddressTrie(map[string]HostAddress{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick uninformed question, do we want to support compound wildcards *.hello.*. Does it even make sense? I'm not sure. If this is nonsensical or has not added value, please disregard this 👍🏻

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's no practical purpose for it, given DNS is hierarchichal. There would be no reason for someone testing one.hello.com to use a pattern of *.hello.* to override resolution of both all subdomains of hello.com and all possible TLDs matching *.hello.*. It would complicate the implementation, maybe not by a lot, but still, I don't think this would be needed or worth it.

Copy link
Contributor

@imiric imiric Dec 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, one use case I think a second asterisk as suffix would be convenient is for ports. You can already specify a port along with the hostname, e.g. 'test.k6.io:443': '1.2.3.4:8443'. So maybe someone would find it useful to override only specific ports, like '*.k6.io:24*': '1.2.3.4:8443'. But I think this would be very niche and complicate the implementation, so let's invoke YAGNI for the second asterisk. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC in #1532, only one wildcard is chosen. that's why I implemented it like that. YAGNI is my best friend 😆

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, thanks a lot for the pointers folks. If everyone's okay with it, let's disregard my question indeed, and reevaluate it a later point if we do end up needing it then 👍🏻 🙇🏻

@oleiade
Copy link
Member

oleiade commented Dec 1, 2022

A question from my side, as I just joined the review. I'm a bit unclear as to what we're benchmarking and what is the underlying objective?

@eugercek
Copy link
Contributor Author

eugercek commented Dec 1, 2022

Hi @oleiade thanks for your detailed review and compliment 🙏🏻

On the benchmark @codebien requested it. I don't have any idea how the information is used 😅 But we're benchmarking previous Hosts implementation and current one. Previous one was O(1) hashmap lookup, current was is O(n) for n length hostname.

@oleiade
Copy link
Member

oleiade commented Dec 1, 2022

Hey @eugercek

@codebien brought me up to speed. From my perspective, and I think @codebien agrees, judging by the numbers the benchmark exhibits, we're good to go with the new trie API that your PR introduces. We judged a 1% performance difference acceptable if we're trading it for convenience and maintainability 👍🏻

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work again @eugercek! 👏 Everything is very comprehensible. 👍

I just left some minor comments, but they're not really blockers.

Apologies for the delay and back and forth with reviewing this. We definitely would like to make it part of the upcoming v0.42.0 release, if you have time to wrap up the minor changes we raised here by early next week. 🙏

lib/types/addresstrie.go Outdated Show resolved Hide resolved
lib/types/addresstrie.go Outdated Show resolved Hide resolved
@eugercek
Copy link
Contributor Author

eugercek commented Dec 3, 2022

I'll push your requested changes today. Thanks for your reviews. 🙏🏻

@eugercek
Copy link
Contributor Author

eugercek commented Dec 4, 2022

I added all resolved and unresolved changes, waiting for your review 🙇🏻

One criticism from me, new names Hosts (previosuly HostAddressTrie) and Host (HostAdress) can be confusing.

@oleiade
Copy link
Member

oleiade commented Dec 5, 2022

@eugercek Thanks for your work refactoring this 👍🏻 Regarding the naming, while we acknowledge that it might turn out confusing, I believe @imiric and I will turn out rather aligned on this and would prefer to proceed with the Hosts and Host naming for the reason we mentioned previously 🙇🏻

imiric
imiric previously approved these changes Dec 5, 2022
Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the recent changes and your work on this PR @eugercek 🙇 Outstanding work! 👏

Just noticed some tiny nitpicks, but no real blockers from me.

lib/types/hosts.go Outdated Show resolved Hide resolved
lib/types/hosts.go Outdated Show resolved Hide resolved
lib/types/hosts_test.go Outdated Show resolved Hide resolved
lib/types/addresstrie.go Outdated Show resolved Hide resolved
@eugercek
Copy link
Contributor Author

eugercek commented Dec 5, 2022

I added the requested changes, sorry for them.

Thanks for all of your efforts on this PR 🙏🏻 , I learned a lot on the journey 😄

Hope to see you soon 👋🏻

Copy link
Member

@oleiade oleiade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your work great work, much appreciated 🙇🏻

Copy link
Contributor

@imiric imiric left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again! 🙇 :shipit:

@imiric imiric merged commit a403c90 into grafana:master Dec 6, 2022
imiric pushed a commit to grafana/k6-docs that referenced this pull request Dec 22, 2022
This documents support for using a wildcard with the `hosts` option,
which was recently added in v0.42.0. See grafana/k6#2747.
MattDodsonEnglish pushed a commit to grafana/k6-docs that referenced this pull request Dec 23, 2022
This documents support for using a wildcard with the `hosts` option,
which was recently added in v0.42.0. See grafana/k6#2747.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants