-
Notifications
You must be signed in to change notification settings - Fork 41
Resource Plugin #491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Resource Plugin #491
Conversation
|
针对当前commit的一些说明:
|
|
所以我应该看哪些- - |
现在完成度还没那么高,主要动的是
|
jschwinger233
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
整体方向还行, 就是有些实现有点别扭, 现在先不用在意.
| res[node] = &resourcetypes.NodeResourceInfo{ | ||
| NodeName: node, | ||
| Capacity: utils.Min(info1.Capacity, info2.Capacity), | ||
| Rate: info1.Rate + info2.Rate*info2.Weight, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
加权平均
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里是加权平均,L134的是当前带加权的总和,最后会除以总数来算平均
resources/plugin.go
Outdated
| UnlockNodes(ctx context.Context, nodes []string) error | ||
|
|
||
| // GetAvailableNodes returns available nodes and total capacity | ||
| GetAvailableNodes(ctx context.Context, rawRequest RawParams) (map[string]*types.NodeResourceInfo, int, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
名字不好, 之后换一个.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are only two hard things in Computer Science...
utils/file.go
Outdated
|
|
||
| // ListAllExecutableFiles returns all the executable files in the given path | ||
| func ListAllExecutableFiles(path string) ([]string, error) { | ||
| entries, err := os.ReadDir(path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filepath.Walk 轻松重构
types/resource.go
Outdated
| } | ||
|
|
||
| // ResourcePluginOptions . | ||
| type ResourcePluginOptions map[string][]string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interface map 呢
types/options.go
Outdated
| type DeployOptions struct { | ||
| ResourceOpts ResourceOptions | ||
| //ResourceOpts ResourceOptions | ||
| ResourceOpts ResourcePluginOptions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为啥不叫 ResourceOptions
221192b to
b12f3f8
Compare
cae47ef to
39843ee
Compare
|
纪念一下,终于能build了...下一步是把cpu和mem先写成插件,持续debug |
cluster/calcium/build.go
Outdated
| return node, err | ||
| // node, err := c.scheduler.MaxIdleNode(nodes) | ||
| // todo: node, err := c.resource.MaxIdleNode(nodes) | ||
| node := nodes[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
标记一下,这里还得改改...
计划是让每个plugin都实现max idle node,并且返回一个优先级,选择优先级最大的那个node。
实际操作的时候只有cpu插件返回比较大的优先级。
39843ee to
7e63725
Compare
resources/plugin.go
Outdated
| // Plugin resource plugin | ||
| type Plugin interface { | ||
| // LockNodes locks the given nodes | ||
| LockNodes(ctx context.Context, nodes []string) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里也是有点纠结,感觉插件这边不一定需要有lock的功能,依赖core里那个ETCD的锁也不是不行?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个已经讨论过了, 先依然用 core 的锁, 如果有什么问题再来讨论
core.go
Outdated
|
|
||
| if config.Profile != "" { | ||
| http.Handle("/metrics", metrics.Client.ResourceMiddleware(cluster)(promhttp.Handler())) | ||
| // todo: http.Handle("/metrics", metrics.Client.ResourceMiddleware(cluster)(promhttp.Handler())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这是做啥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
metrics里删了一大堆东西,后续也许还要补上,这里放一个todo...
utils/transaction.go
Outdated
| // `prepare` should be a pure calculation process without side effects. | ||
| // `commit` writes the calculation result of `prepare` into database. | ||
| // if `commit` returns error, `rollback` will be performed. | ||
| func PCR(ctx context.Context, prepare func() error, commit func() error, rollback func() error) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
按照之前的风格也应该叫 Pcr
utils/transaction.go
Outdated
| // `commit` writes the calculation result of `prepare` into database. | ||
| // if `commit` returns error, `rollback` will be performed. | ||
| func PCR(ctx context.Context, prepare func() error, commit func() error, rollback func() error) error { | ||
| err := prepare() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if err := prepare(); err != nil { bla bla } 紧凑点
utils/transaction.go
Outdated
| log.Errorf(ctx, "[PCR] failed to prepare, err: %v", err) | ||
| return err | ||
| } | ||
| err = commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
同样这里也是,风格保持一致
utils/transaction.go
Outdated
| } | ||
| return err | ||
| } | ||
| return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
而且按照这个注释写的逻辑,我感觉其实用 Txn 包一下就可以了啊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
类似于 if prepare then commit else rollback,为什么需要一个新函数
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
其实是完全可以用Txn来替换的,用这个PCR主要是为了体现Prepare过程是无副作用的,这样感觉逻辑清晰一些...
目前的设计里plugin的大部分API都是纯计算无副作用的,例如Alloc / Realloc / Remap,它们就适用于Prepare过程
types/config.go
Outdated
| MaxConcurrency int64 `yaml:"max_concurrency" default:"20"` // concurrently call single runtime in the same time | ||
| Store string `yaml:"store" default:"etcd"` // store type | ||
|
|
||
| ResourcePluginDir string `yaml:"resource_plugin_dir" default:"/etc/eru/resource_plugins"` // resource plugins path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Plugins,注意是复数
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
另外折腾了这里的话,你的 yaml sample 也得改
metrics/metrics.go
Outdated
| log.Errorf(nil, "[SendNodeInfo] Error occurred while sending cpu used data to statsd: %v", err) //nolint | ||
| } | ||
| } | ||
| // |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WTF?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
因为Node里的ResourceMeta也换成map[string]interface{}了,所以不能直接用之前这个metrics,这里等流程跑通了之后要改...
metrics/handler.go
Outdated
| // | ||
| //// ResourceMiddleware to make sure update resource correct | ||
| //// todo: remove resource types | ||
| //func (m *Metrics) ResourceMiddleware(cluster cluster.Cluster) func(http.Handler) http.Handler { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WTF??
lock/mocks/template.go
Outdated
|
|
||
| func FromTemplate() lock.DistributedLock { | ||
| m := &DistributedLock{} | ||
| m.On("Lock", mock.Anything).Return(context.Background(), nil) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这是用在哪里的,既不是测试又不是 mock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个是store的FromTemplate里调用的...
engine/transform.go
Outdated
| if err != nil { | ||
| return res, err | ||
| } | ||
| err = json.Unmarshal(body, &res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
紧凑点
ef0676e to
0f1c0b1
Compare
| if err != nil { | ||
| return nil, err | ||
| } | ||
| return files, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
files := []string{}
if err := filepath.Walk(basedir, func(path string, info fs.FileInfo, err error) error {
if err != nil {
return err
}
if info.IsDir() && path != basedir {
return filepath.SkipDir
}
if isExecutable(info.Mode().Perm()) {
files = append(files, path)
}
return nil
}); err!= nil {
return nil, err
}
可以写紧凑点的
| } | ||
|
|
||
| message StringSlice { | ||
| repeated string slice = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
好像没有必要单独定义这个 message
resources/plugin.go
Outdated
| UpdateNodeResourceUsage(ctx context.Context, node string, resourceArgs []coretypes.RawParams, direction bool) error | ||
|
|
||
| // UpdateNodeResourceCapacity updates node resource capacity | ||
| UpdateNodeResourceCapacity(ctx context.Context, node string, resourceOpts coretypes.RawParams, direction bool) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
node 这个名字不好, 这个仓库里 node 变量都是 *types.Node 类型, string 类型的变量名是 nodename
resources/manager.go
Outdated
| rollbackPluginChan <- plugin | ||
| } | ||
| }) | ||
| close(rollbackPluginChan) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rollbackPluginChan 听起来像是要回滚的 plugin, 但是 len()==all 的时候又不回滚, 这个逻辑虽然正确但是很奇怪, 想办法调整一下
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我终于知道这里哪里最别扭了, 是你把 fan-out fan-in 的逻辑的一部分放到 callPlugins 里, 而这个拆分的结果是, 如你在当前这一层来观察 rollbackPluginChan 这个 chan, 你会发现没有并发场景去使用它, 但是如果改成 []Plugin slice 也不对, 因为并发被 callPlugins 藏起来的, 但是必要的并发同步却又留给了调用方.
我觉得这里可以使用泛型了, callPlugins[T any](xxx, func(Plugin) T) []T, 让 callPlugin 内部的 goroutine 完成所有的同步, 最后返回 slice.
resources/plugin.go
Outdated
| // Plugin resource plugin | ||
| type Plugin interface { | ||
| // LockNodes locks the given nodes | ||
| LockNodes(ctx context.Context, nodes []string) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个已经讨论过了, 先依然用 core 的锁, 如果有什么问题再来讨论
resources/plugin.go
Outdated
| UnlockNodes(ctx context.Context, nodes []string) error | ||
|
|
||
| // SelectAvailableNodes returns available nodes and total capacity | ||
| SelectAvailableNodes(ctx context.Context, nodes []string, resourceOpts coretypes.RawParams) (map[string]*types.NodeResourceInfo, int, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
声明类型的时候, 对那些读上去不知道是啥的最好加上变量名增加可读性, 比如这里的返回值 (map, int, error) 可以写成 (_ map, variableName int, _ error), 可以让人知道这个 int 是啥, 这样注释都可以直接省略
resources/plugin.go
Outdated
|
|
||
| // GetNodeResourceInfo returns total resource info and available resource info of the node, format: {"cpu": 2} | ||
| // also returns diffs, format: ["node.VolumeUsed != sum(workload.VolumeRequest"] | ||
| GetNodeResourceInfo(ctx context.Context, node string, workloads []*coretypes.Workload, fix bool) (coretypes.RawParams, coretypes.RawParams, []string, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个接口拆两个吧, Get 操作的同时在改数据...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
大力支持!之前的Get我就觉得有点奇怪...
resources/plugin.go
Outdated
| // SetNodeResourceInfo sets both total node resource info and allocated resource info | ||
| // used for rollback of RemoveNode | ||
| // notice: here uses absolute values, not delta values | ||
| SetNodeResourceInfo(ctx context.Context, resourceCapacity coretypes.RawParams, resourceUsage coretypes.RawParams) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个接口第三参数是 resourceUsage, 和 UpdateNodeResourceUsage 这个接口是不是语义重复了
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
还有 UpdateNodeResourceCapacity, 功能重复?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里的Set是设置绝对值,Update用的是相对值...
resources/plugin.go
Outdated
| // Alloc allocates resource, returns engine args for each workload, format: [{"cpus": 1.2}, {"cpus": 1.2}] | ||
| // also returns resource args for each workload, format: [{"cpus": 1.2}, {"cpus": 1.2}] | ||
| // pure calculation | ||
| Alloc(ctx context.Context, node string, deployCount int, resourceOpts coretypes.RawParams) ([]coretypes.RawParams, []coretypes.RawParams, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
没看懂前两个 return 值是啥, 不如你把变量名加到返回值的类型声明里; RawParam 这个类型是只和资源有关的吗? 从名字来看像是能承载万物
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
打算把RawParams改成具体的类型,WorkloadResourceOpts / NodeResourceOpts / WorkloadResourceArgs / NodeResourceArgs / EngineArgs
resources/plugin.go
Outdated
|
|
||
| // AddNode adds a node with requested resource, returns resource capacity and (empty) resource usage | ||
| // should return error if the node already exists | ||
| AddNode(ctx context.Context, node string, resourceOpts coretypes.RawParams) (coretypes.RawParams, coretypes.RawParams, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
第二返回值是啥
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resource usage
jschwinger233
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
感觉还是 callPlugins 太恶心了, 改成泛型就好了
resources/plugin.go
Outdated
| Remap(ctx context.Context, node string, workloadMap map[string]*coretypes.Workload) (map[string]coretypes.RawParams, error) | ||
|
|
||
| // UpdateNodeResourceUsage updates node resource usage | ||
| UpdateNodeResourceUsage(ctx context.Context, node string, resourceArgs []coretypes.RawParams, direction bool) error |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
direction 是啥?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Incr / Decr,表示usage的改变方向
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
那把这个类型建立一个 alias 吧, 不要用 bool, 语义模糊
resources/manager.go
Outdated
| // SelectAvailableNodes returns available nodes which meet all the requirements | ||
| // the caller should require locks | ||
| // pure calculation | ||
| func (pm *PluginManager) SelectAvailableNodes(ctx context.Context, nodes []string, resourceOpts types.RawParams) (map[string]*resourcetypes.NodeResourceInfo, int, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doCalculateCapacity 和 SelectAvailableNodes 这两个函数几乎拥有一样的语义吧, 我觉得前者的名字好一点, 然后似乎不需要 doCalculateCapacity 这个函数了
resources/manager.go
Outdated
| } | ||
|
|
||
| // Alloc . | ||
| func (pm *PluginManager) Alloc(ctx context.Context, node string, deployCount int, resourceOpts types.RawParams) ([]types.RawParams, []map[string]types.RawParams, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pm.Alloc 是有 IO 的而 plugin.Alloc 是纯计算的, 最好统一一下 Alloc 这个函数的语义
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
不只是Alloc,为了方便回滚 + PCR,plugin里大多数func都是纯计算的...但是在plugin manager里为了方便外部调用,就成了有IO的。
如果plugin的Alloc也有IO,那么plugin manager回滚的时候就不能统一调用UpdateNodeResourceUsage,而是要反向调用Alloc,也就是说Alloc本身要提供rollback的能力,感觉这样就啰嗦了很多。
*当然不可以在plugin manager里将所有resource args取反向然后直接UpdateNodeResourceUsage,因为默认core这边是不知道如何计算“反向”的,而且不是所有resource args都能通过加个负号就变成“反向”的。
所以这里的解决方案...可能是把plugin里的Alloc改成一个看起来无副作用的名字?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
有个问题, 我们对命令行接口好像有点理解不同, 我心想的其实是那种..以文档 spec 来规定接口的命令行, 比如 https://github.com/containernetworking/cni/blob/master/SPEC.md#cni-operations, spec 规定一个正确的实现应该有哪些命令行调用选项, 最后的实现方只要提供一个二进制满足 spec 就好;
但是你的接口好像是让 plugin 去实现一个 golang interface?
cluster/calcium/create.go
Outdated
| // deploy scheduled workloads on one node | ||
| func (c *Calcium) doDeployWorkloadsOnNode(ctx context.Context, ch chan *types.CreateWorkloadMessage, nodename string, opts *types.DeployOptions, deploy int, plans []resourcetypes.ResourcePlans, seq int) (indices []int, err error) { | ||
| logger := log.WithField("Calcium", "doDeployWorkloadsOnNode").WithField("nodename", nodename).WithField("opts", opts).WithField("deploy", deploy).WithField("plans", plans).WithField("seq", seq) | ||
| func (c *Calcium) doDeployWorkloadsOnNode(ctx context.Context, ch chan *types.CreateWorkloadMessage, nodename string, opts *types.DeployOptions, deploy int, engineArgs []types.RawParams, resourceArgs []map[string]types.RawParams, seq int) (indices []int, err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个参数太多了, 手动格式化一下, 一个参数一行
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
借个楼,最终实现的时候是“提供一个二进制满足spec”的,plugin这个interface一方面是为了mock,一方面是“万一以后用rpc插件呢?”
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
我对这个 go interface 没意见, 我是看到你没有实现具体的命令行调用方式, 如果按照 spec 来调用, 那么命令行的调用方式都是一样的, 只需要换二进制.
| if err != nil { | ||
| return res, err | ||
| } | ||
| if err = json.Unmarshal(body, &res); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
聪明!
resources/plugin.go
Outdated
|
|
||
| command := exec.CommandContext(ctx, cmd, args...) | ||
| if err := command.Run(); err != nil { | ||
| log.Errorf(ctx, "[callBinaryPlugin] failed to run plugin %s, command %s %v, err %s", bp.path, cmd, args, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里 err 的时候输出一下 stderr, 非常有帮助
|
cluster 层的逻辑整体方向没太大问题, plugin 那里需要再讨论一下, 我可以直接看看你实现的一个具体的 plugin 仓库先. |
4fc68d5 to
292d65d
Compare
2e1d867 to
65be1b5
Compare
|
用泛型重写了一下plugin manager,显著减少了编码过程的痛苦。但是现在各种工具对泛型的支持还不全,包括但并不限于:
mockery这个问题我想办法解决了:要pull下来mockery的代码,然后把go版本改成1.17,在go install的时候要这么用:`go install -gcflags=all=-G=3 -tags="typeparams"。但是golangci-lint还没搞... |
| package utils | ||
|
|
||
| // Map like map in Python | ||
| func Map[T1, T2 any](slice []T1, f func(T1) T2) []T2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
太骚了,测试加起来,这部分一看 ut 的覆盖率急剧下降了
| return ramInBytes, nil | ||
| } | ||
|
|
||
| flag := int64(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
测试 too
utils/transaction.go
Outdated
| // `prepare` should be a pure calculation process without side effects. | ||
| // `commit` writes the calculation result of `prepare` into database. | ||
| // if `commit` returns error, `rollback` will be performed. | ||
| func Pcr(ctx context.Context, prepare func(ctx context.Context) error, commit func(ctx context.Context) error, rollback func(ctx context.Context) error, ttl time.Duration) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
缩写用 PCR
| func (p *VolumePlan) UnmarshalJSON(b []byte) (err error) { | ||
| if *p == nil { | ||
| *p = VolumePlan{} | ||
| // StringSlice . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unit test
2f77b72 to
9e12d3b
Compare
|
迁移工具 & 回滚工具做好啦 |
60efde5 to
38de27a
Compare
0c06f3a to
34dacc4
Compare
38de27a to
1f68954
Compare
jschwinger233
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
整体结构还行, 兼容性可能要考虑再改改.
| Storage: resourceMeta.StorageLimit, | ||
| // note here will change the node resource meta (stored in resource plugin) | ||
| // todo: add wal here | ||
| engineArgs, deltaResourceArgs, resourceArgs, err = c.resource.Realloc(ctx, workload.Nodename, workload.ResourceArgs, opts.ResourceOpts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里的 txn 有没有可能做得对称一点, 比如这里用 source.SetNodeResourceUsage + store.UpdateWorkload 来回滚 resource.Realloc, 语义是非常隐晦的, 最好是有对称的 API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
其实应该理解为Realloc的回滚是resource.SetNodeResourceUsage,store.UpdateWorkload回滚的是上面的store.UpdateWorkload,算是一一对应的,就是Realloc回滚这里名字不大对称...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
manager里把realloc和alloc的rollback包了一下,这样看起来就是对称的了。
d32ff7e to
c270f5c
Compare
c270f5c to
dfeb5a2
Compare
|
准备合啦,接下来的工作都要基于插件化来做了。 |
This reverts commit ac037e7.
This reverts commit ac037e7.
关于资源插件化的PR。
设计想法:对于core来说,它不需要知道各种资源的作用&分配方式,只需要把请求的参数转发给资源插件即可。资源插件会进行资源的具体分配工作,并对core返回engine args,用于后续engine层创建workload。
资源插件采用可执行文件的方式,core会加载指定目录下的所有可执行文件作为资源插件。
在创建workload的流程中,eru-cli要把所有关于资源的请求参数转化为map[string][]string的形式,例如{"cpu": ["1.2"]}。由于core不知道哪些参数对应哪个资源插件,所以它会把所有的参数依次转发给所有的资源插件。各个插件自己负责解析自己需要的参数,并做出正确的处理。
由于要实现插件化,所以资源相关的信息并不会再存储在node metadata里了,而是由各个资源插件自己负责存储,存储也不一定用ETCD,处理得当的话MySQL / MongoDB之类的也行(就是部署麻烦点)。
因此在workload metadata中,关于资源的描述换成了map[string]map[string][]string的方式(是不是看着有点绕),例如:
要分别记录各个资源插件分配的资源,否则在rollback / remove要释放资源的时候不知道怎么释放。
主要要做的事情: