Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace k8sapiserver startAPIServer to use Kubernetes createServerChain #121

Merged
merged 50 commits into from
Apr 18, 2022

Conversation

matthewygf
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

This PR tries to extend the simulator apiserver to accept CRDs, this is useful because some scheduler-plugins uses CRDs. #77

Additionally, we can directly use the Kubernetes/api-server/createServerChain method to reduce the burden of maintaining API code.

This is a PR spawned from a discussion in PR #113

Which issue(s) this PR fixes:

Fixes #

Special notes for your reviewer:

After this extension, user can use kubectl to apply CRDs.

/label tide/merge-method-squash

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 28, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @matthewygf. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Feb 28, 2022
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Feb 28, 2022
@matthewygf
Copy link
Contributor Author

/assign @sanposhiho

@matthewygf
Copy link
Contributor Author

@sanposhiho , to build my branch successfully locally, I had to use "go mod vendor", and then generate the openAPI definition in kubernetes/hack/update-generated-openapi.sh to make it work.

I suspect this is why the test is failling, and I am not sure how to fix this test... would you be able to advise ... ?

@sanposhiho
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Mar 3, 2022
@sanposhiho
Copy link
Member

@matthewygf
Thanks for your work!

I think we need to use go mod vendor as you said and replace this line to:

OPENAPIFILE="vendor/k8s.io/kubernetes/pkg/generated/openapi/zz_generated.openapi.go"

Then, after running make openapi on project root, we can resolve the error.

(We are using make openapi to generate openapi in submodules/kubernetes dir and move generated openapi file to our project.)


And btw, I found another bug on /hack/openapi.sh 😓 Please replace this line to:

cp pkg/generated/openapi/zz_generated.openapi.go "../../${OPENAPIFILE}"

@matthewygf
Copy link
Contributor Author

matthewygf commented Mar 3, 2022

@sanposhiho Thanks for the suggestion !

Your method did work ! however I thought the make kube-apiserver took too long. So I have borrowed the hack/open-api sh into my branch, which speeds up the gen open-api by a bit. (See the updated hack/openapi.sh)

The test currently fails because the env $GOPATH seems to be not set for make test. Do you know how to set the env in github action ? :')

Port: 443,
func createK8SAPIChainedServer(etcdURL, frontendURL string) (*apiserver.APIAggregator, error) {
serverOpts, cleanupFunc, err := createK8SAPIServerOpts(etcdURL, frontendURL)
defer cleanupFunc()
Copy link
Member

@sanposhiho sanposhiho Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't register cleanupFunc to defer here since cleanup will be executed when createK8SAPIChainedServer is finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sense, I am going to pass the cleanupFunc all the way up to the shutdown Func instead.

@sanposhiho
Copy link
Member

sanposhiho commented Apr 13, 2022

@matthewygf
It seems that all requests to API server from web client make panic invalid memory address or nil pointer dereference
on api-server. 🤔
I'm investigating that, but no cause identified yet...

Could you check if the panic occurs on your machine as well or not?

@sanposhiho
Copy link
Member

It seems that only requests from Web UI doesn't work well. So this change is related probably.
#115

@sanposhiho
Copy link
Member

Can send a request via kubectl. Hmmm, only requests from web client have the issue 🤔

$ kubectl get pods  --server=http://localhost:3131      
NAME   READY   STATUS    RESTARTS   AGE
hoge   0/1     Pending   0          16s

/IR3qCXyThP/dbCiHrF3v1cuhBOHY8CLVg==
-----END EC PRIVATE KEY-----`

// StartAPIServer starts both the secure k8sAPIServer and proxy server to handle insecure serving, and it make panic when a error happen.
func StartAPIServer(kubeAPIServerURL, etcdURL, frontendURL string) (*restclient.Config, func(), error) {
h := &APIServerHolder{Initialized: make(chan struct{})}
handler := http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
Copy link
Member

@sanposhiho sanposhiho Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, finally found work-around... It seems that user-agent causes the panic. I don't know why 🤷‍♂️🤷‍♂️🤷‍♂️
Let's overwrite this. This work-around works for me. (Other good work-around is welcome)

Suggested change
handler := http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
handler := http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
// work-around: we need to overwrite user-agent from client.
// some user-agents causes a panic on kube-apiserver.
req.Header.Set("User-Agent", "kubectl/v1.21.5 (darwin/amd64) kubernetes/aea7bba")

Copy link
Contributor Author

@matthewygf matthewygf Apr 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I0414 21:43:43.780933 1 trace.go:205] Trace[354409240]: "Delete" url:/api/v1/namespaces/default/pods,user-agent:simulator/v0.0.0 (linux/amd64) kubernetes/$Format,audit-id:7b1c1668-a971-440e-a841-eff8aeee635e,client:127.0.0.1,accept:application/json, */*,protocol:HTTP/1.1 (14-Apr-2022 21:43:42.177) (total time: 1603ms) - I am not sure how or why you have to set the user-agent😲, in the kube-apiserver trace, it seems the user-agent is always set to simulator/v0.0.0. 🤔

My requests are from WebUI, hitting the reset button.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No panic with kubectl either 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about something like this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my investigation, we need to overwrite user-agent. Some user-agent (chrome is one of them) breaks api-server. 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check if I can run simulator successfully on docker container.

matthewygf and others added 2 commits April 14, 2022 22:12
Co-authored-by: Kensei Nakada <handbomusic@gmail.com>
Co-authored-by: Kensei Nakada <handbomusic@gmail.com>
@matthewygf
Copy link
Contributor Author

matthewygf commented Apr 14, 2022

@matthewygf It seems that all requests to API server from web client make panic invalid memory address or nil pointer dereference on api-server. 🤔 I'm investigating that, but no cause identified yet...

Could you check if the panic occurs on your machine as well or not?

Hmmm, @sanposhiho I haven't got any panic when making request via webUI... 🤔 I run the simulator-server using docker compose, how do you run the simulator server ?

newest

@sanposhiho
Copy link
Member

sanposhiho commented Apr 15, 2022

I run the simulator-server using docker compose, how do you run the simulator server ?

Run locally with make start + yarn dev and then sent request from web ui(chrome).

@sanposhiho
Copy link
Member

sanposhiho commented Apr 15, 2022

Okay, I also could run the simulator without panic in docker container. Hmmmmmm. 🤷‍♂️🤷‍♂️
But, in my docker container, all request's user-agent are Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36 (the same user-agent when I run the simulator locally)
So,,, if requests have chrome user-agent (or other non-kubernetes supported user-agent), when we run the simulator locally, kube-apiserver will break and when we run the simulator in docker container, it won't. 🤔

Could you run the simulator locally with make start and yarn dev and check if api-server panic or not? @matthewygf

@matthewygf
Copy link
Contributor Author

matthewygf commented Apr 17, 2022

Okay, I also could run the simulator without panic in docker container. Hmmmmmm. 🤷‍♂️🤷‍♂️ But, in my docker container, all request's user-agent are Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.75 Safari/537.36 (the same user-agent when I run the simulator locally) So,,, if requests have chrome user-agent (or other non-kubernetes supported user-agent), when we run the simulator locally, kube-apiserver will break and when we run the simulator in docker container, it won't. 🤔

Could you run the simulator locally with make start and yarn dev and check if api-server panic or not? @matthewygf

@sanposhiho I wasn't able to build the webui from yarn dev, (so I start frontend using docker container) but I managed to start the server locally with make start. I tested it with chrome after commented out the user agent default code and it seems to be fine 🤷‍♂️... Can you check whether using frontend with docker works ? Or using frontend with production build ? yarn build && yarn start.

Maybe another issue is the apple user agent ? My machine is a windows, and I start the simulator in wsl (windows subsystem for linux).

@sanposhiho
Copy link
Member

Sorry, I didn't update master in my fork. The latest master also has this problem.
I raised an issue for this problem: #147
So, I think we can merge this branch without solving this. 👌 Thanks for investigating with me.

Can you check whether using frontend with docker works? Or using frontend with production build ? yarn build && yarn start.

Yes. When I use docker (make docker_build_and_up), it works fine. I also tried in another PC (M1 Mac Pro 2020) and could reproduce this. So this issue doesn't seem to be due to my machine.

Also, I tried "one in docker, the other locally":

  • run frontend locally, and run backend in docker → works fine.
  • run backend(+ kube-apiserver) locally and run frontend in docker → panic.

So, this problem is only happens when we run backend locally.

Comment on lines 38 to 40
if req.UserAgent() == "" {
req.Header.Set("User-Agent", restclient.DefaultKubernetesUserAgent())
}
Copy link
Member

@sanposhiho sanposhiho Apr 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, could you please revert this? I'll try to fix this issue in another PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted :)

@196Ikuchil
Copy link
Contributor

@matthewygf @sanposhiho
Hi! Finally, #100 has merged. Please check it if you need it.

#121 (comment)

@sanposhiho
Copy link
Member

@matthewygf
Thanks for your great work!! I am very happy to merge this change.
Supporting CRD through this change is a great step forward for us. 🎉

/lgtm
/approve

@sanposhiho
Copy link
Member

/hold cancel

@k8s-ci-robot k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Apr 18, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: matthewygf, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 18, 2022
@k8s-ci-robot k8s-ci-robot merged commit 70ab63b into kubernetes-sigs:master Apr 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tide/merge-method-squash Denotes a PR that should be squashed by tide when it merges.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants