New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] Add new IPAM subcomponent #2605
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2605 +/- ##
==========================================
+ Coverage 61.74% 61.76% +0.01%
==========================================
Files 134 136 +2
Lines 21763 22098 +335
==========================================
+ Hits 13438 13649 +211
- Misses 6869 6998 +129
+ Partials 1456 1451 -5 |
Adds the ginkgo BDD testing framework. Signed-off-by: Drew Erny <drew.erny@docker.com>
917e589
to
4d3129c
Compare
did some git wrangling to remove the dependency on #2579 |
Part of a series of commits rewriting the Allocator. Adds another subcomponent, the purpose of which is to translate between the libnetwork ipam library and the swarmkit allocator. It is responsibile for filling out IP addresses. Signed-off-by: Drew Erny <drew.erny@docker.com>
4d3129c
to
4582f8c
Compare
} | ||
} | ||
allocate = append(allocate, nwid) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to avoid code with continue LABEL
if possible. These two loops could written to loop over a map[string]struct{}
which acts as a set of the requested network IDs. Likewise, we could have a mapping of existing vip network IDs to the VIP. This would allow us to replace these 2 doubly-nested loops with a single loop:
for netID, currentVIP := range currentVIPsByNetID {
if _, shouldKeep := requestedNetIDs[netID]; shouldKeep {
keep = append(keep, currentVIP)
} else {
deallocate = append(deallocate, currentVIP)
}
delete(requestedNetIDs, netID)
}
// Now `keep` contains all of the VIPs we should keep, `deallocate` contains
// all of the VIPs we should deallocate, and any network IDs which remain in
// the `requestedNetIDs` set are IDs for networks from which we need to
// allocate new VIPs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you are correct and i agree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, rather than wait until I go through the whole thing I'll just do this in chunks. These are my comments of the code up to AllocateNetwork
, which I have not yet looked at.
// reason. | ||
type ErrFailedAddressRequest struct { | ||
address string | ||
err error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking: for errors that are wrapped like this, perhaps we should provide a Cause() error
message to the error, so that the errors.Cause
function will work on it? https://github.com/pkg/errors/blob/master/errors.go#L256.
So here, and in ErrBustedIPAM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We certainly can do that. On a local branch I've restructured the errors so that the same set of errors is shared across all allocator subcomponents. I'll push up to this branch what progress I've made on the IPAM allocator tomorrow and we can take another look.
"github.com/docker/swarmkit/api" | ||
) | ||
|
||
// DrvRegistry is an interface defining |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... defining ... ? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the code was getting to predictable so i figured i'd spice it up with some cliff hangers.
|
||
// pools is used to save the internal poolIDs needed when releasing a pool. | ||
// It maps an ip address to a pool ID. | ||
pools map[string]string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies I am not familiar with IPAM - does this map an IP range to a pool ID, or one specific IP address? The IP seems to come from https://github.com/docker/swarmkit/pull/2605/files#diff-572df9327298b425a55967841029c382R116.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One specific IP address. This is a design carried over from the previous allocator, which makes sense to me. Each IP address is allocated from a specific pool, and each pool has a pool id (returned by libnetwork as part of requesting the pool) which references that pool and is used for releasing. A network can have many pools, and the allocator will try to retrieve an address from each of them. If a pool is full, or you try to request a specific IP address that does not belong to that pool, then the allocator will try the next pool.
The end result of this is that there's no straightforward way to map an IP address to the pool it originated from. So, to circumvent this, you keep a mapping of every IP address to the pool it was allocated from, so that when you release it you can release to the correct pool.
But now that I get to thinking about it, I don't know what the performance implications of saving so many strings (one for each IP address) is. I think theoretically you SHOULD be able to compute which pool an IP address belongs to, and from there get the pool ID. I think that this way is more straightforward for now, though, and can be easily optimized later.
I'll add a comment explaining all this.
if ipamOpts == nil { | ||
ipamOpts = map[string]string{} | ||
} | ||
// if we have an ipam driver name and we have IPAM configs, but for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking - would it make sense to just assign this in the block that specifies what to do if ipamOpts
are nil
? I don't think it makes a difference to reassign back if it's not nil
, then you may not need a comment explaining it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i honestly don't know why i did it like this 🤔 🤔 🤔
// the subnet, which is the key for the pools map. | ||
} | ||
// delete the Gateway option from the IPAM options when we're done | ||
delete(ipamOpts, ipamapi.RequestAddressType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we guaranteed that the original options would never have have a request address type? This seems to be mutating a map on a api.Network
object that is passed to us - is that ok, and if so, could we state in the comment for the Restore
function that this function may have side effects and that anything that calls Restore
must not expect to use the slice of networks again in its previous state? If not, can we make a copy of the map instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking: Since this is called in a loop, we be deleting RequestAddressType
multiple times, which is fine since delete is idempotent, but is a little confusing.
|
||
// now the attachments | ||
for _, attachment := range attachments { | ||
// nil checking for the same reaason as VIPs. why is everything a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling nit: "reaason" -> "reason"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed the tests yet - I am not super familiar with how this all works together, but here are my questions and comments so far.
// AllocateNetwork allocates the IPAM pools for the given network. The network | ||
// must not be nil, and must not use a node-local driver. | ||
// | ||
// We can't use a node-lcal driver, because we don't have access to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling nit: "node-lcal" -> "node-local"
// the subnet, which is the key for the pools map. | ||
} | ||
// delete the Gateway option from the IPAM options when we're done | ||
delete(ipamOpts, ipamapi.RequestAddressType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-blocking: Since this is called in a loop, we be deleting RequestAddressType
multiple times, which is fine since delete is idempotent, but is a little confusing.
// everywhere later | ||
ipamOpts := map[string]string{} | ||
if n.Spec.IPAM != nil && n.Spec.IPAM.Driver != nil && n.Spec.IPAM.Driver.Options != nil { | ||
ipamOpts = n.Spec.IPAM.Driver.Options |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Below where where we set the imapapi.AllocSerialPrefix
setting, and where the options' ipamapi.RequestAddressType
is modified and deleted, we seem to be potentially modifying the options in the spec. I'm not sure if this network ever gets written back to the store, but perhaps we shouldn't be updating the options in the spec - can we make a copy of these options instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i chose not to copy because this is a map and i am lazy. but you're right, this is not the time for laziness.
finalConfigs = append(finalConfigs, config) | ||
} | ||
|
||
// now that everythign has succeeded, add the fields to the network object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spelling nit: "everythign" -> "everything"
|
||
for _, config := range local.nw.IPAM.Configs { | ||
// these things can return errors, but we literally can't do anything | ||
// about it if they do, so just ignore it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we log these errors, maybe at the debug level, or would that not be useful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably would be useful... but I'm honestly not sure what to do about it. I think that I need to modify this code so that errors on deallocation are kicked all the way up to the caller, who can log them. This avoids importing the logging library.
} | ||
// if we get here, that means we've tried every pool on the network, | ||
// and all of them are exhausted | ||
return ErrFailedAddressRequest{"new", ipamapi.ErrNoAvailableIPs} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, not super familiar with what is important to allocate and deallocate, so this may not make sense. Do we need to have rollback logic for any addresses or VIPS that have already been allocated if this error or ErrFailedAddressRequest
is returned partway through the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to add rollback logic, but I have it added in my local branch.
delete(local.endpoints, address) | ||
|
||
// both of these things can error, but we can't do anything about | ||
// it if they do, so we ignore the errors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we log a debug log if they error? Also when we clean up networks, we return a ErrDoubleFault
if the cleanup fails. Should we also do that when cleaning up attachments when rolling back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, the error handling for this kind of sucks. Error on deallocation should never happen, because if they do, there's something wrong, and we don't really have an opportunity for a do-over or a way to sanely handle them. I think that I will change this, like above, to return errors to the caller, and let the caller decide what to do, which will eventually involve this getting kicked up to the top level and logged there.
superceded by #2615 |
This PR is marked WIP because it still needs tests written, but the file itself is mostly finished and ready for review.
Part of a series of commits rewriting the Allocator. Adds another subcomponent, the purpose of which is to translate between the libnetwork ipam library and the swarmkit allocator. It is responsible for filling out IP addresses.
Note that this PR includes the commit from #2579, so in review, you should focus only on changes in the
ipam
package. I will fix this Eventually but it's a pain so I've opted not to yet.