Stopping faasd and faasd-provider should stop all containers #64

Closed
carlosedp opened this issue Mar 9, 2020 · 4 comments

carlosedp commented Mar 9, 2020

Expected Behaviour

Stopping faasd and faasd-provider should stop all containers: when both processes are stopped (with systemctl), all container tasks are expected to be stopped as well.

Current Behaviour

Some container tasks remain in the RUNNING state.

Before the stop, all tasks are running:

❯ sc-status faasd-provider
● faasd-provider.service - faasd-provider
   Loaded: loaded (/lib/systemd/system/faasd-provider.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-03-09 08:55:47 EDT; 52s ago
 Main PID: 15197 (faasd)
    Tasks: 8 (limit: 4700)
   Memory: 11.3M (limit: 500.0M)
   CGroup: /system.slice/faasd-provider.service
           └─15197 /usr/local/bin/faasd provider
❯ sc-status faasd
● faasd.service - faasd
   Loaded: loaded (/lib/systemd/system/faasd.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-03-09 08:55:47 EDT; 54s ago
 Main PID: 15189 (faasd)
    Tasks: 10 (limit: 4700)
   Memory: 14.5M (limit: 500.0M)
   CGroup: /system.slice/faasd.service
           └─15189 /usr/local/bin/faasd up
❯ sudo ctr task ls
TASK                 PID      STATUS
basic-auth-plugin    15234    RUNNING
nats                 15333    RUNNING
prometheus           15450    RUNNING
gateway              15577    RUNNING
queue-worker         15698    RUNNING

❯ sudo ctr -n openfaas-fn task ls
TASK      PID      STATUS
figlet    15957    RUNNING

After stopping the services:

❯ sc-stop faasd-provider
❯ sc-stop faasd
❯ sc-status faasd
● faasd.service - faasd
   Loaded: loaded (/lib/systemd/system/faasd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2020-03-09 08:57:58 EDT; 2min 46s ago
  Process: 15189 ExecStart=/usr/local/bin/faasd up (code=exited, status=0/SUCCESS)
 Main PID: 15189 (code=exited, status=0/SUCCESS)
❯ sc-status faasd-provider
● faasd-provider.service - faasd-provider
   Loaded: loaded (/lib/systemd/system/faasd-provider.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2020-03-09 08:57:23 EDT; 3min 23s ago
  Process: 15197 ExecStart=/usr/local/bin/faasd provider (code=killed, signal=TERM)
 Main PID: 15197 (code=killed, signal=TERM)

❯ sudo ctr -n openfaas-fn task ls
TASK      PID      STATUS
figlet    15957    RUNNING
❯ sudo ctr task ls
TASK            PID      STATUS
prometheus      15450    STOPPED
gateway         15577    RUNNING
queue-worker    15698    RUNNING

As seen above, the deployed function figlet and the core containers gateway and queue-worker still have their tasks running.

Below are the stop logs:

Mar 09 09:16:51 debian10 faasd[16313]: 2020/03/09 09:16:51 Signal received.. shutting down server in 1s
Mar 09 09:16:51 debian10 faasd[16313]: 2020/03/09 09:16:51 [Delete] removing CNI network for: basic-auth-plugin
Mar 09 09:16:51 debian10 faasd[16313]: 2020/03/09 09:16:51 [Delete] removed: basic-auth-plugin from namespace: /proc/16360/ns/net, ID: basic-auth-plugin-16360
Mar 09 09:16:51 debian10 faasd[16313]: Status of basic-auth-plugin is: running
Mar 09 09:16:51 debian10 faasd[16313]: 2020/03/09 09:16:51 Need to kill basic-auth-plugin
Mar 09 09:16:51 debian10 faasd[16313]: 2020/03/09 09:16:51 [Delete] removing CNI network for: nats
Mar 09 09:16:52 debian10 faasd[16313]: 2020/03/09 09:16:52 [Delete] removed: nats from namespace: /proc/16460/ns/net, ID: nats-16460
Mar 09 09:16:52 debian10 faasd[16313]: Status of nats is: running
Mar 09 09:16:52 debian10 faasd[16313]: 2020/03/09 09:16:52 Need to kill nats
Mar 09 09:16:52 debian10 faasd[16313]: [1] 2020/03/09 13:16:52.145386 [INF] STREAM: Shutting down.
Mar 09 09:16:52 debian10 faasd[16313]: [1] 2020/03/09 13:16:52.145588 [INF] Server Exiting..
Mar 09 09:16:52 debian10 faasd[16313]: 2020/03/09 09:16:52 [Delete] removing CNI network for: prometheus
Mar 09 09:16:52 debian10 faasd[16313]: 2020/03/09 09:16:52 [Delete] removed: prometheus from namespace: /proc/16578/ns/net, ID: prometheus-16578
Mar 09 09:16:52 debian10 faasd[16313]: Status of prometheus is: running
Mar 09 09:16:52 debian10 faasd[16313]: 2020/03/09 09:16:52 Need to kill prometheus
Mar 09 09:16:52 debian10 faasd[16313]: level=warn ts=2020-03-09T13:16:52.417Z caller=main.go:501 msg="Received SIGTERM, exiting gracefully..."
Mar 09 09:16:52 debian10 faasd[16313]: level=info ts=2020-03-09T13:16:52.417Z caller=main.go:526 msg="Stopping scrape discovery manager..."
Mar 09 09:16:52 debian10 faasd[16313]: level=info ts=2020-03-09T13:16:52.417Z caller=main.go:540 msg="Stopping notify discovery manager..."
Mar 09 09:16:52 debian10 faasd[16313]: level=info ts=2020-03-09T13:16:52.417Z caller=main.go:562 msg="Stopping scrape manager..."
Mar 09 09:16:52 debian10 faasd[16313]: level=info ts=2020-03-09T13:16:52.417Z caller=main.go:522 msg="Scrape discovery manager stopped"
Mar 09 09:16:52 debian10 faasd[16313]: level=info ts=2020-03-09T13:16:52.417Z caller=main.go:536 msg="Notify discovery manager stopped"
Mar 09 09:16:52 debian10 faasd[16313]: level=info ts=2020-03-09T13:16:52.417Z caller=manager.go:814 component="rule manager" msg="Stopping rule manager..."
Mar 09 09:16:52 debian10 faasd[16313]: level=info ts=2020-03-09T13:16:52.417Z caller=manager.go:820 component="rule manager" msg="Rule manager stopped"
Mar 09 09:16:52 debian10 faasd[16313]: level=info ts=2020-03-09T13:16:52.417Z caller=main.go:556 msg="Scrape manager stopped"
Mar 09 09:16:52 debian10 faasd[16313]: level=info ts=2020-03-09T13:16:52.418Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
Mar 09 09:16:52 debian10 faasd[16313]: level=info ts=2020-03-09T13:16:52.419Z caller=main.go:727 msg="Notifier manager stopped"
Mar 09 09:17:07 debian10 faasd[16313]: Disconnected from nats://nats:4222
Mar 09 09:17:07 debian10 faasd[16313]: Reconnect
Mar 09 09:17:07 debian10 faasd[16313]: Connect: nats://nats:4222
Mar 09 09:17:09 debian10 faasd[16313]: Reconnecting (1/120) to nats://nats:4222 failed
Mar 09 09:17:09 debian10 faasd[16313]: Waiting 2s before next try
Mar 09 09:17:10 debian10 faasd[16313]: 2020/03/09 13:17:10 Disconnected from nats://nats:4222
Mar 09 09:17:10 debian10 faasd[16313]: 2020/03/09 13:17:10 Reconnect
Mar 09 09:17:10 debian10 faasd[16313]: 2020/03/09 13:17:10 Connect: nats://nats:4222
Mar 09 09:17:11 debian10 faasd[16313]: Connect: nats://nats:4222
Mar 09 09:17:12 debian10 faasd[16313]: 2020/03/09 13:17:12 Reconnecting (1/60) to nats://nats:4222 failed
Mar 09 09:17:13 debian10 faasd[16313]: Reconnecting (2/120) to nats://nats:4222 failed
Mar 09 09:17:13 debian10 faasd[16313]: Waiting 4s before next try
Mar 09 09:17:14 debian10 faasd[16313]: 2020/03/09 13:17:14 Connect: nats://nats:4222
Mar 09 09:17:16 debian10 faasd[16313]: 2020/03/09 13:17:16 Reconnecting (2/60) to nats://nats:4222 failed
Mar 09 09:17:18 debian10 faasd[16313]: Connect: nats://nats:4222
Mar 09 09:17:20 debian10 faasd[16313]: Reconnecting (3/120) to nats://nats:4222 failed
Mar 09 09:17:20 debian10 faasd[16313]: Waiting 6s before next try
Mar 09 09:17:20 debian10 faasd[16313]: 2020/03/09 13:17:20 Connect: nats://nats:4222
Mar 09 09:17:22 debian10 faasd[16313]: error deleting container prometheus, prometheus, cannot delete running task prometheus: failed precondition
Mar 09 09:17:22 debian10 faasd[16313]: 2020/03/09 09:17:22 [proxy] Done received
Mar 09 09:17:22 debian10 faasd[16313]: 2020/03/09 13:17:22 Reconnecting (3/60) to nats://nats:4222 failed
Mar 09 09:17:23 debian10 systemd[1]: faasd.service: Succeeded.

Possible Solution

Steps to Reproduce (for bugs)

  1. Start faasd and faasd-provider
  2. Deploy a function
  3. Stop faasd and faasd-provider
  4. Check container tasks with sudo ctr -n openfaas-fn task ls and sudo ctr task ls.

Context

Your Environment

Latest faasd built from master.


carlosedp commented Mar 9, 2020

Apparently Prometheus is preventing the stop from proceeding. I changed:

File up.go (around line 91):

		log.Printf("Signal received.. shutting down server in %s\n", shutdownTimeout.String())
		err := supervisor.Remove(services)
		if err != nil {
			fmt.Printf("Error removing services: %s\n", err)
		}

And got:

Mar 09 09:30:36 debian10 faasd[18159]: Error removing services: error deleting container prometheus, prometheus, cannot delete running task prometheus: failed precondition

What is weird is that Prometheus, when stopped through faasd, never prints its final message, unlike when it is stopped manually with kill -TERM <pid>:

Mar 09 10:05:56 debian10 faasd[21350]: level=info ts=2020-03-09T14:05:56.616Z caller=notifier.go:602 component=notifier msg="Stopping notification manager..."
Mar 09 10:05:56 debian10 faasd[21350]: level=info ts=2020-03-09T14:05:56.616Z caller=main.go:727 msg="Notifier manager stopped"
Mar 09 10:05:56 debian10 faasd[21350]: level=info ts=2020-03-09T14:05:56.616Z caller=main.go:739 msg="See you next time!"
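
For reference, containerd refuses to delete a container whose task is still reported as RUNNING, which matches the "failed precondition" error above. Below is a minimal sketch (not faasd's actual shutdown path) of the ordering that avoids it with the containerd Go client: kill the task, wait for it to exit, delete the task, and only then delete the container. The socket path, namespace, and 30-second timeout are illustrative assumptions.

package main

import (
	"context"
	"log"
	"syscall"
	"time"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/namespaces"
)

// stopAndRemove kills a container's task, waits for it to exit, deletes the
// task, and only then deletes the container. Deleting the container while the
// task is still RUNNING is what produces "failed precondition".
func stopAndRemove(ctx context.Context, client *containerd.Client, id string) error {
	container, err := client.LoadContainer(ctx, id)
	if err != nil {
		return err
	}

	if task, err := container.Task(ctx, nil); err == nil {
		// Subscribe to the exit event before sending the signal so it is not missed.
		exitCh, err := task.Wait(ctx)
		if err != nil {
			return err
		}
		if err := task.Kill(ctx, syscall.SIGTERM); err != nil {
			return err
		}
		select {
		case <-exitCh:
			// Task exited cleanly.
		case <-time.After(30 * time.Second): // illustrative timeout, not faasd's
			// Escalate if the process ignores SIGTERM.
			if err := task.Kill(ctx, syscall.SIGKILL); err != nil {
				return err
			}
			<-exitCh
		}
		if _, err := task.Delete(ctx); err != nil {
			return err
		}
	}

	// The container (and its snapshot) can only be removed once the task is gone.
	return container.Delete(ctx, containerd.WithSnapshotCleanup)
}

func main() {
	client, err := containerd.New("/run/containerd/containerd.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	ctx := namespaces.WithNamespace(context.Background(), "openfaas-fn")
	if err := stopAndRemove(ctx, client, "figlet"); err != nil {
		log.Fatal(err)
	}
}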

@carlosedp (Contributor, Author)

One issue I believe @alexellis might need to weigh in on is how the provider should behave with regard to deployed functions.

Since faasd-provider does not keep state, if it is stopped and the functions are deleted, they will not be recreated when it is started again.

One option would be to just stop the functions (killing their tasks) but keep the containers, so that after a restart faasd-provider could scale them back to 1 on access.
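
A rough sketch of the "scale back to 1 on access" half with the containerd Go client, assuming only the task was deleted at shutdown and the container record plus its snapshot were kept. CNI network re-attachment is left out for brevity, and the package and function names are illustrative, not the provider's actual code.

package provider

import (
	"context"

	"github.com/containerd/containerd"
	"github.com/containerd/containerd/cio"
)

// restartFunction starts a fresh task for a container whose task was deleted
// on shutdown but whose container record and rootfs snapshot were kept.
// ctx is assumed to already carry the openfaas-fn namespace.
func restartFunction(ctx context.Context, client *containerd.Client, name string) error {
	container, err := client.LoadContainer(ctx, name)
	if err != nil {
		return err // container record is gone; a full redeploy would be needed
	}

	// A new task reuses the container's existing spec and snapshot, so the
	// function comes back without re-pulling the image or recreating anything.
	task, err := container.NewTask(ctx, cio.NewCreator(cio.WithStdio))
	if err != nil {
		return err
	}
	return task.Start(ctx)
}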

What do you think?

@alexellis (Member)

Why do you feel that this change is required? (I may not understand the problem well enough; I'm listening.)

@alexellis (Member)

/lock: closed

The derek bot locked and limited the conversation to collaborators on May 31, 2023.