Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate from callbacks to async/await #3559

Merged
merged 58 commits into from Mar 7, 2019

Conversation

Projects
None yet
2 participants
@muxator
Copy link
Contributor

muxator commented Feb 21, 2019

This PR represents the line of work that refactors Etherpad to use async/await calls instead of callbacks, vastly simplifying the code, as per #3540.

This represents a vast change, and by its nature requires a lot of testing, hence its WIP status. It is currently deployed on https://beta.etherpad.org. It will be merged as soon as deemed sufficiently stable.

It has the priority over other work: present and future PRs, in case of conflicting changes, will need to rebase themselves over this work.

The development is by contributor @raybellis.
The branch will be periodically force-pushed if needed, please update often.

raybellis added some commits Jan 18, 2019

pluginfw/hooks.js: allow returning a Promise in aCallFirst(), aCallAll()
Since this code can end up loaded in browsers when using client side plugins,
avoid use of ES6 syntax features such as arrow functions until MSIE support is
finally dropped.
pluginfw/installer.js: use Promise version of hooks.aCallAll() in ins…
…tall(), uninstall()

We cannot use arrow functions in this file, because code in /src/static can end
up being loaded in browsers, and we still support IE11.
start using "thenify" to support callback and promises
PadManager.sanitizePadId() can't use thenify: single arg callback
plugins download and search: converted to Promises
Also fixed a bug where the system would make a request to the central server for
the plugin list for every search even if the list was already cached.
tests.js: remove use of async.js
Use real `async` instead of async.js where applicable.
The `getPluginTests()` function was never truly async anyway because it only
contains calls to synchronous `fs` modules.
access controls: promisification
`getPadAccess()` (src/node/padaccess.js) is now "promise only", resolving to
`true` or `false` as appropriate, and throwing an exception if there's an
error.

The two call sites (padreadonly.js and importexport.js) updated to match.
db/SessionStore.js: do not migrate to Promises. Make optional all(), …
…clear() and length()

1. This module was not migrated to Promises, because it is only used via
   express-session, which can't actually use promises anyway.

2. all(), clear() and length() depend on the presence of the `db.forEach()`
   function, which in ueberdb2 doesn't even exist.

   Fortunately those three methods are optional, so I made their existence
   conditional on the presence of `db.forEach`.

3. in SessionStore.clear(), replaced a call to db.db.remove() with db.remove()
db/DB.js: add Promise-only API methods
Promisified methods:
  - get()
  - set()
  - findKeys()
  - getSub()
  - setSub()
  - remove()
  - doShutdown()
db/AuthorManager.js: renamed doesAuthorExists() -> doesAuthorExist()
Removed the 's' for consistency with the other `doesFooExist()` manager calls.
Retained an alias for plugins that might be using it.
db/PadManager.js: renamed doesPadExists() -> doesPadExist()
Removed the 's' for consistency with the other `doesFooExist()` manager calls.
Retained an alias for plugins that might be using it.
db/PadManager.js: convert sanitizePadId() to Promises
The function is now iterative rather than recursive.
db/PadMessageHandler.js: partial conversion to Promises
Converted those functions that API.js still depends on, and others that at this
point are never called via the nodeback mechanism.
db/API.js: complete conversion to Promises
This patch also contains significant refactoring relating to error checking of
arguments supplied to the functions (e.g. rev) facilitated by use of `throw`
instead of nodeback errors.
db/SecurityManager.js: converted checkAccess() to pure Promises
Also converted the handler functions that depend on checkAccess() into async
functions too.

NB: this commit needs specific attention to it because it touches a lot of
security related code!
db/AuthorManager.js: further conversion
also fixes a missing await calling `.createAuthor` in db/Pad.js

@muxator muxator force-pushed the raybellis:async-PR branch 4 times, most recently from 950eae5 to b723ef0 Feb 27, 2019

prepare to async: stricter checks
This change is in preparation of the future async refactoring by Ray. It tries
to extract as many changes in boolean conditions as possible, in order to make
more evident identifying eventual logic bugs in the future work.

This proved already useful in at least one case.

BEWARE: this commit exposes an incoherency in the DB API, in which, depending
on the driver used, some functions can return null or undefined. This condition
will be externally fixed by the final commit in this series ("db/DB.js: prevent
DB layer from returning undefined"). Until that commit, the code base may have
some bugs.

@muxator muxator force-pushed the raybellis:async-PR branch 7 times, most recently from 22c9cb9 to 60decc3 Mar 1, 2019

@muxator

This comment has been minimized.

Copy link
Contributor Author

muxator commented Mar 3, 2019

Hi, the PR is almost ready to be integrated. I just need a review from @ray.

While writing async, some boolean expressions where made stricter (e.g. == -> ===, != -> !==). This is generally a very good thing, but on a code base with lots of technical debt like ours, this can resurface latent bugs (this happened at least once).

  1. For this reason, I have tried to extract & centralize as many such changes as possible in a single commit, namely d520e28. These changes were all present in Ray's original work.

  2. While doing this, I noticed that in some files this transformation was not symmetrical (some places were made stricter, others were not). I've done another commit (8ecd0dd) which tentatively makes the situation even. None of these changes were part of Ray's work, yet I think we could integrate them.

I need a review on points 1 and 2. If Ray recognizes 1 as his beloved brainchild, and thinks that 2 may complete it, I'll fold the two changes and merge the PR.

@raybellis

This comment has been minimized.

Copy link
Contributor

raybellis commented Mar 3, 2019

I think your author === null check may result in a similar to the bug that I caused inadvertently with the SessionManager.

If you like, I'm happy to audit all calls to db.get* and ensure that they work with either undefined or null values, given that we can't be sure which will be returned by ueberDB.

I was going to suggest that just testing for if (!value) would suffice, but on reflection that may cause issues if the result value can legally be 0 or "", so an explicit check for val === null || val === undefined may be preferred. The alternative is to explicitly require that loose checks are annotated to indicate that they are intentionally loose.

@muxator

This comment has been minimized.

Copy link
Contributor Author

muxator commented Mar 4, 2019

That's true: using author === null could introduce regressions.

However, in its current status on async (async-PR closely tracks it) AuthorManager.js has only strict checks (X !== null or X === null) except for the one in mapAuthorWithDBKey().

The main pain point is probably ueberDB, but I do not want to digress.

On commenting lax comparison: in async-PR, I agree, indeed I had already put such a comment in SessionManager.js:

/*
   * In some cases, the db layer could return "undefined" as well as "null".
   * Thus, it is not possible to perform strict null checks on group2sessions.
   * In a previous version of this code, a strict check broke session
   * management.
   *
   * See: https://github.com/ether/etherpad-lite/issues/3567#issuecomment-468613960
   */
if (!group2sessions || !group2sessions.sessionIDs) {

I think we have to decide what to do with AuthorManager, as well. All the original comparisons were lax, but if you have a look at d520e28 (which, I remember, just extracts changes made on async), all but one were made strict. Could this introduce latent bugs, as well?

If you like, I'm happy to audit all calls to db.get* and ensure that they work with either undefined or null values, given that we can't be sure which will be returned by ueberDB.

That would be welcome: in that case we could reach a more coherent situation.

If you do not have time, or if we want to postpone this, I can also make a ugly proposal in the name of pragmatism:

  • put lax comparisons wherever the old code base was already lax. This would bring us nearer to bug-for-bug compatibility
  • put comments explaining our reasoning
  • merge the PR so we can start moving forward

If you think it makes sense, I can compare develop with the asynchronous version, and worsen its purity in the name of compatibility.

@raybellis

This comment has been minimized.

Copy link
Contributor

raybellis commented Mar 5, 2019

I've got time to do that audit. How (and where) would you prefer to see any relevant commits made?

@raybellis

This comment has been minimized.

Copy link
Contributor

raybellis commented Mar 5, 2019

For reference, here's a comparison of ueberDB2's behaviour with dirty vs mysql:

DirtyDB

get undefined (random) key
-> undefined
set explicit value and verify
-> pass
set value to undefined
-> undefined
set value to null
-> undefined

MySQL

get undefined (random) key
-> null
set explicit value and verify
-> pass
set value to undefined and get
-> undefined
set value to null and get
-> null
db/DB.js: prevent DB layer from returning undefined
ueberDB2 can return either undefined or null for a missing key, depending on
which DB driver is used. This patch changes the promise version of the API so
that it will always return null.
@raybellis

This comment has been minimized.

Copy link
Contributor

raybellis commented Mar 5, 2019

I found an alternate solution - see 32cebc4

@muxator muxator force-pushed the raybellis:async-PR branch from 60decc3 to ac7663c Mar 7, 2019

@muxator

This comment has been minimized.

Copy link
Contributor Author

muxator commented Mar 7, 2019

Nice. Pulled in with ac7663c.

@muxator muxator marked this pull request as ready for review Mar 7, 2019

@muxator muxator merged commit 4c45ac3 into ether:develop Mar 7, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@muxator

This comment has been minimized.

Copy link
Contributor Author

muxator commented Mar 7, 2019

It's done.

I think it's worth citing from the commit message:

Merge pull request #3559 from raybellis/async-PR

With this commit, that closes #3540, we pay the first big slice of our technical debt. In this line of work we streamlined the code base, reducing its size by 15-20% and making it more understandable at the same time.

The changes were audited and tested collaboratively and are deemed sufficiently stable for being merged.

Known issues:

  • plugin compatibility is still not perfect
  • the error handling path needs to be improved

This is an important day for Etherpad: thanks, Ray!

@muxator muxator deleted the raybellis:async-PR branch Mar 7, 2019

@muxator muxator added this to the 1.8 milestone Mar 29, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.