Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-use runtime workers #1733

Merged
merged 21 commits into from
Oct 28, 2019
Merged

Re-use runtime workers #1733

merged 21 commits into from
Oct 28, 2019

Conversation

samtstern
Copy link
Contributor

@samtstern samtstern commented Oct 17, 2019

Description

Fixes #1353

New architecture:

The functions emulator manages a RuntimeWorkerPool full of RuntimeWorkers. These wrap an instance of FunctionsRuntimeInstance which were previously launched as one-time-use.

A RuntimeWorker listens for important changes within the FunctionsRuntimeInstance and summarizes them into a state. This worker is also responsible for sending IPC messages to the child process.

A RuntimeWorker has four possible states:

  • IDLE - ready to receive a request.
  • BUSY - currently executing a request.
  • FINISHING - BUSY but will never return to IDLE. Basically "die when you're done please".
  • FINISHED - the process is dead. Will never be revived, this worker is trash now.

Within the runtime I changed the execution flow. The main() function only does the trigger discovery and stubbing at first. Then it listens for an IPC message telling it to run a function. This message handler can support mutliple invocations as long as the function doesn't experience any unhandled errors.

Scenarios Tested

All scenarios use this test code:

const functions = require('firebase-functions');
const admin = require('firebase-admin');
admin.initializeApp();

console.log("Code Reloaded:", new Date());

exports.simpleFunction = functions.https.onRequest(async (req, res) => {
    console.log('simpleFunction');
    res.json({ name: 'simpleFunction', date: new Date() });
});

exports.slowFunction = functions.https.onRequest(async (req, res) => {
    console.log('slowFunction');
    setTimeout(() => {
        res.json({ name: 'slowFunction', date: new Date() });
    }, 5000);
});

exports.functionThatSometimesThrows = functions.https.onRequest(async (req, res) => {
    console.log('functionThatSometimesThrows');
    const rand = Math.random();
    if (rand <= 0.33) {
        throw new Error('Ooops I did a bad thing!');
    }

    res.json({ name: 'functionThatSometimesThrows', date: new Date() });
});

exports.firestoreWriter = functions.https.onRequest(async (req, res) => {
    console.log('firestoreWriter');
    await admin.firestore().doc('/foo/bar').set({ date: new Date() });
    console.log(`Wrote to /foo/bar`);
    res.json({ name: 'firestoreWriter', date: new Date() });
});

exports.firestoreReader = functions.firestore.document('/foo/bar').onWrite(async (change, ctx) => {
    console.log('firestoreReader');
    console.log(`Detected change at ${change.after.ref.path}`);
    return true;
});

Repeated execution, same code
Notice the code above the function definition is only executed one time.

>  Code Reloaded: 2019-10-18T22:12:27.050Z
i  functions: Beginning execution of "simpleFunction"
>  simpleFunction
i  functions: Finished "simpleFunction" in ~1s
i  functions: Beginning execution of "simpleFunction"
>  simpleFunction
i  functions: Finished "simpleFunction" in ~1s
i  functions: Beginning execution of "simpleFunction"
>  simpleFunction
i  functions: Finished "simpleFunction" in ~1s

Repeated execution with a code change in the middle
Notice that the code reloads and then the console.log is different. Unfortunately we need to do two reloads of the code after a file save: one "diagnostic" to do trigger discovery and then a second one to actually run the function for the first time.

>  Code Reloaded: 2019-10-18T22:12:59.143Z
i  functions: Beginning execution of "simpleFunction"
>  simpleFunction
i  functions: Finished "simpleFunction" in ~1s
>  Code Reloaded: 2019-10-18T22:13:06.256Z
>  Code Reloaded: 2019-10-18T22:13:11.251Z
i  functions: Beginning execution of "simpleFunction"
>  simpleFunction - xxx
i  functions: Finished "simpleFunction" in ~1s
i  functions: Beginning execution of "simpleFunction"
>  simpleFunction - xxx
i  functions: Finished "simpleFunction" in ~1s
i  functions: Beginning execution of "simpleFunction"
>  simpleFunction - xxx
i  functions: Finished "simpleFunction" in ~1s

Function that sometimes throws errors
This test is meant to show how an error thrown in a function will cause the worker to be discarded.

>  Code Reloaded: 2019-10-18T22:15:04.667Z
i  functions: Beginning execution of "functionThatSometimesThrows"
>  functionThatSometimesThrows
i  functions: Finished "functionThatSometimesThrows" in ~1s
i  functions: Beginning execution of "functionThatSometimesThrows"
>  functionThatSometimesThrows
⚠  functions: Error: Ooops I did a bad thing!
    at exports.functionThatSometimesThrows.functions.https.onRequest (/tmp/tmp.e4vfaKijHH/functions/index.js:23:15)
    at Run (/usr/local/google/home/samstern/Projects/firebase/firebase-tools/lib/emulator/functionsEmulatorRuntime.js:612:20)
    at /usr/local/google/home/samstern/Projects/firebase/firebase-tools/lib/emulator/functionsEmulatorRuntime.js:586:19
    at Generator.next (<anonymous>)
    at /usr/local/google/home/samstern/Projects/firebase/firebase-tools/lib/emulator/functionsEmulatorRuntime.js:7:71
    at new Promise (<anonymous>)
    at __awaiter (/usr/local/google/home/samstern/Projects/firebase/firebase-tools/lib/emulator/functionsEmulatorRuntime.js:3:12)
    at Run (/usr/local/google/home/samstern/Projects/firebase/firebase-tools/lib/emulator/functionsEmulatorRuntime.js:579:12)
    at /usr/local/google/home/samstern/Projects/firebase/firebase-tools/lib/emulator/functionsEmulatorRuntime.js:611:15
    at Generator.next (<anonymous>)
⚠  Your function was killed because it raised an unhandled error.
>  Code Reloaded: 2019-10-18T22:15:11.150Z
i  functions: Beginning execution of "functionThatSometimesThrows"
>  functionThatSometimesThrows
i  functions: Finished "functionThatSometimesThrows" in ~1s

Slow Function
Here I call the same function twice before the first one finishes, which means that we need a second worker (scale up) and therefore the globals are loaded twice. However a third invocation that waits for one to finish re-uses an existing idling worker.

>  Code Reloaded: 2019-10-18T22:16:57.435Z
i  functions: Beginning execution of "slowFunction"
>  slowFunction
>  Code Reloaded: 2019-10-18T22:16:58.933Z
i  functions: Beginning execution of "slowFunction"
>  slowFunction
i  functions: Finished "slowFunction" in ~5s
i  functions: Finished "slowFunction" in ~5s
i  functions: Beginning execution of "slowFunction"
>  slowFunction
i  functions: Finished "slowFunction" in ~5s

Background Triggers
Just to show that this all works for non-HTTP triggers. Each separate trigger has its own worker pool so running this chain twice causes two code reloads (would be 4 in the old system).

>  Code Reloaded: 2019-10-18T22:18:10.268Z
i  functions: Beginning execution of "firestoreWriter"
>  firestoreWriter
>  Wrote to /foo/bar
i  functions: Finished "firestoreWriter" in ~1s
>  Code Reloaded: 2019-10-18T22:18:11.060Z
i  functions: Beginning execution of "firestoreReader"
>  firestoreReader
>  Detected change at foo/bar
i  functions: Finished "firestoreReader" in ~1s
i  functions: Beginning execution of "firestoreWriter"
>  firestoreWriter
>  Wrote to /foo/bar
i  functions: Beginning execution of "firestoreReader"
>  firestoreReader
>  Detected change at foo/bar
i  functions: Finished "firestoreReader" in ~1s
i  functions: Finished "firestoreWriter" in ~1s

Sample Commands

N/A

@googlebot googlebot added the cla: yes Manual indication that this has passed CLA. label Oct 17, 2019
@coveralls
Copy link

coveralls commented Oct 17, 2019

Coverage Status

Coverage increased (+1.0%) to 66.197% when pulling 0d27e12 on ss-runtime-workers into c9019ff on master.

@samtstern samtstern mentioned this pull request Oct 18, 2019
3 tasks
@samtstern samtstern changed the title [WIP] Re-use runtime workers Re-use runtime workers Oct 18, 2019
src/emulator/functionsEmulator.ts Outdated Show resolved Hide resolved
src/emulator/functionsEmulatorRuntime.ts Outdated Show resolved Hide resolved
src/emulator/functionsEmulatorRuntime.ts Outdated Show resolved Hide resolved
src/emulator/functionsEmulatorRuntime.ts Outdated Show resolved Hide resolved
});
}

getIdleWorker(triggerId: string | undefined): RuntimeWorker | undefined {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this accept undefined? I think I can see the use of returning undefined, but I don't see any use case in your code of accepting undefined...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FunctionsRuntimeBundle has triggerId?: string. A bundle with no triggerId means "I want to run the runtime but no function in particular. It's used to diagnose the user environment.

src/emulator/functionsRuntimeWorker.ts Outdated Show resolved Hide resolved
src/emulator/functionsRuntimeWorker.ts Outdated Show resolved Hide resolved
src/emulator/functionsEmulator.ts Outdated Show resolved Hide resolved
src/emulator/functionsRuntimeWorker.ts Outdated Show resolved Hide resolved
Copy link
Member

@yuchenshi yuchenshi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM with a few comments.

// this log entry to happen during the readying.
const triggerLogPromise = waitForLog(runtime.events, "SYSTEM", "triggers-parsed");

// TODO(samstern): Is this a race condition? Could 'ready' happen before we're listening for it?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a race condition right now since we attach the listener synchronously but it's super hard to tell until you go and inspect the code. If we use observables, I'd suggest BehaviorSubject for the state so we don't have to do this. Otherwise, please make the worker expose a promise like worker.ready and use the listener in the worker constructor to maintain it, so we keep the concern local and easy to reason about. In fact, I may suggest to drop the waitForSystemLog and just expose promises for anything we want to listen to.

Copy link
Contributor Author

@samtstern samtstern Oct 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok this sounds like a big (but good) change so I need your help unpacking it:

  1. What is BehaviorSubject?
  2. What do you mean "use the listener in the worker constructor to maintain it"?
  3. In fact, I may suggest to drop the waitForSystemLog and just expose promises for anything we want to listen to. --> waitForSystemLog waits for the next log of a certain type so a promise isn't a suitable replacement

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also since I see you approved: do you think these improvements should be made in this PR or should they be future improvements?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First, the improvements can totally be done in a separate PR, and I approve this PR as-is. The log race condition issue is not the focus here and we can address that later.

I was suggesting BehaviorSubject from rxjs, but since we don't actually use Observables, I'd drop that idea since I don't want more things that we need to keep in mind when navigating the code base.

In details, I'd suggest that we keep the logic to wait on logs within FunctionsRuntimeInstance. In addition to .exit, it should also expose .ready and .triggerParsed, both promises. The promises themselves can of course be driven by logs, but we should not call waitForLog outside FunctionsRuntimeInstance. If we need more waits, we should expose each as a promise. In this way, we don't need to reason about the ordering outside.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 that makes sense, thanks for the explanation! Will do those as a follow-up.

src/emulator/functionsRuntimeWorker.ts Outdated Show resolved Hide resolved
@lookfirst
Copy link

Just curious, why not implement the nodejs cluster api?

@samtstern
Copy link
Contributor Author

@lookfirst in our case the "hub" and the runtime process need to be totally separate environments. In the runtime processes we need to be able to have a separate require cache, mock out libraries, etc. As far as I know this is not possible with cluser.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes Manual indication that this has passed CLA.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

code reloads on every request in 6.10.0
8 participants