
Improve start-up time #60

Closed
nokome opened this issue May 15, 2019 · 4 comments · Fixed by #138

Comments

@nokome
Member

nokome commented May 15, 2019

The start-up time of the CLI is curiously high. On Linux, using the binary, it takes about 1s per invocation:

```shell
bash -c "time for i in {1..10}; do bin/stencila-convert --version; done"
0.33.0
0.33.0
0.33.0
0.33.0
0.33.0
0.33.0
0.33.0
0.33.0
0.33.0
0.33.0

real    0m9.936s
user    0m12.604s
sys     0m0.868s
```

In comparison, nixster takes 0.1s and dockter takes 0.25s (both are built using pkg).

I experimented with commenting out `import './boot'` from `cli.ts` and the time (with a binary) was almost exactly the same. However, when I commented out `import { convert } from './index'` it fell to 0.15s. So it may simply be the amount of code that Node has to parse at startup that is causing the slowness. Would using dynamic `require()` calls be a way around this?
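One way to defer that parsing cost is to wrap module loading in a cached, on-demand lookup. A minimal sketch, assuming a simplified `Codec` interface and loader signature (both hypothetical, not encoda's actual API):

```typescript
// Lazy codec loading: the module for a codec is only imported the first
// time that codec is requested, not at CLI startup.
interface Codec {
  decode(content: string): unknown
}

type Loader = (name: string) => Promise<Codec>

function makeGetCodec(load: Loader): (name: string) => Promise<Codec> {
  const cache = new Map<string, Promise<Codec>>()
  return name => {
    let codec = cache.get(name)
    if (codec === undefined) {
      codec = load(name) // first use: pay the import cost now, not at startup
      cache.set(name, codec)
    }
    return codec
  }
}

// In production the loader would be a dynamic import, so Node only parses
// the codec module (and its dependency tree) on first use:
// const getCodec = makeGetCodec(name => import(`./codecs/${name}`))
```

A command like `--version` that touches no codec would then pay none of the import cost.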

Not sure if we should address this now; it could be premature optimisation.

@alex-ketch
Contributor

alex-ketch commented May 15, 2019

Dynamic imports seem like a good first step; from your investigation it seems wise not to import every available format's parser/unparser on load :)
As a further measure we can layer on a build tool to handle tree-shaking, so a minor nitpick would be to avoid CommonJS-style `require`s in favour of ES module imports, to make that easier down the road.


Update from #65 (comment):

I don't think it's necessary to make changes for this PR, but I do wonder how much our use of sync and eager reading of files contributes towards the sluggishness described in #60.
Overall I'd recommend we follow idiomatic node paradigms and use async methods unless we have strong reason not to, especially for file system/IO operations.
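As an illustration of that recommendation, a module-level `fs.readJSONSync` can be replaced with a memoized async read. A sketch (the `aliases.json` usage below mirrors `util.ts` but the helper name is hypothetical):

```typescript
// Memoized async JSON read: the file is read once, on first call, without
// blocking the event loop, and not read at all if the data is never needed.
import { promises as fs } from 'fs'

function lazyJson<T>(path: string): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => {
    if (cached === undefined) {
      cached = fs.readFile(path, 'utf8').then(text => JSON.parse(text) as T)
    }
    return cached
  }
}

// Usage, e.g. for the aliases file currently read eagerly at module load:
// const getAliases = lazyJson<Record<string, string>>(path.join(built, 'aliases.json'))
```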

@nokome
Member Author

nokome commented Jun 4, 2019

I looked into this further because, apart from the bad UX, the observed sluggishness slows down test-driven development. I was hoping for an easy-to-implement speed-up by fixing the synchronous loading of JSON Schemas in `util.ts`. However, some quick tests suggested that was not going to be the case.

I therefore did some experiments to try to isolate which modules, if any, were causing the slowness. In `index.ts` I commented out all the imports of codec modules, as well as `process.ts`. I then added them back in successively, beginning with the simplest ones and, where there were cross-dependencies, adding the base codec first e.g. `xlsx` before `csv`. During this process, some refactoring was done to remove cross-dependencies between codecs and the `util` module.

At each step, the TypeScript was compiled to JavaScript and ten invocations timed (with `cli.ts` modified so it wouldn't error with no args):

```shell
npm run build:dist
bash -c "time for i in {1..10}; do node dist/cli ; done"
```

In the final iterations, I removed some of the slowest-to-load modules just to confirm their cost. Note that `util` isn't a codec, but it is the module that does synchronous reading of files at startup time.

| Codec | Time (s) | Increase (s) |
| --- | --- | --- |
| None | 0.9 | - |
| + json | 1.0 | 0.1 |
| + yaml | 1.2 | 0.2 |
| + json5 | 1.3 | 0.1 |
| + xlsx | 3.7 | 2.4 |
| + csv, ods | 3.7 | 0.0 |
| + tdp | 5.3 | 1.6 |
| + md | 6.3 | 1.0 |
| + demo-magic | 6.3 | 0.0 |
| + html | 10.9 | 4.6 |
| + gdoc | 10.9 | 0.0 |
| + rpng | 11.5 | 0.6 |
| + pdf | 11.5 | 0.0 |
| + pandoc | 11.5 | 0.0 |
| + docx, jats, latex, odt | 11.7 | 0.2 |
| + util | 13.5 | 1.8 |
| - html | 9.7 | -3.8 |
| - xlsx, ods, tdp, csv | 5.8 | -3.9 |
| - util | 3.4 | -2.4 |
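The per-module costs above were measured by timing whole CLI runs; a rough in-process analogue is to time the import itself. A sketch (module names in the example comment are illustrative):

```typescript
// Measure how long Node takes to load a module on first import. Subsequent
// imports are served from the module cache, so only the first call for a
// given module reflects its parse/compile cost.
async function timeImport(name: string): Promise<number> {
  const start = process.hrtime.bigint()
  await import(name)
  return Number(process.hrtime.bigint() - start) / 1e6 // milliseconds
}

// e.g. timeImport('jsdom').then(ms => console.log(`jsdom: ${ms.toFixed(1)} ms`))
```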

Some summary thoughts:

- there is no single module that is causing the slowness
- lazy loading of codec modules will probably give the best start-up time improvement
- removing the sync reads in `util.ts` is not too difficult and will certainly help
- the `html` module may be slow to load due to its use of `jsdom`
- codec modules should import `dump` etc from `index.ts`, instead of using other codec modules directly, to take advantage of lazy loading. For example, currently `tdp` does this:

```ts
import * as csv from './csv'
// ...
await dump(await csv.encode(datatable))
```

but it should do this:

```ts
import { dump } from '.'
// ...
await dump(datatable, 'csv')
```

@nokome
Member Author

nokome commented Jun 24, 2019

#125 makes some substantial improvements to startup times through dynamic loading of codec modules. Profiling suggests that the next candidate for optimisation should be to make the loading and compilation of JSON schemas by Ajv both lazy and asynchronous. The offending lines are

encoda/src/util.ts

Lines 19 to 25 in 484e461

```ts
// Load all schemas for use in by Ajv validator
const schemas = globby
  .sync(path.join(built, '*.schema.json'))
  .map(file => fs.readJSONSync(file))
// Read in aliases for use in coerce function
const aliases = fs.readJSONSync(path.join(built, 'aliases.json'))
```

and

encoda/src/util.ts

Lines 69 to 73 in 484e461

```ts
// Cached JSON Schema validation functions
const validators = new Ajv({
  schemas,
  jsonPointers: true
})
```

(and a similar block for "mutators")

I think the most pragmatic approach to improving performance here would be to make the `validate` and `mutate` functions (and those that rely on them) async. Then, instead of loading all the schemas at once and throwing an error if one cannot be found:

encoda/src/util.ts

Lines 84 to 87 in 484e461

```ts
const validator = validators.getSchema(
  `https://stencila.github.com/schema/${type}.schema.json`
)
if (!validator) throw new Error(`No schema for type "${type}".`)
```

we should lazily and asynchronously read the `.schema.json` file and compile it, e.g.

```ts
if (!validator) {
  const schema = await fs.readJSON(path.join(built, `${type}.schema.json`))
  validator = ajv.addSchema(schema).compile(schema)
}
```
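The same read-and-compile step can be wrapped in a cached lookup so each schema is read and compiled at most once. A sketch; `readSchema` (reading `${type}.schema.json`) and `compile` (standing in for Ajv's compile) are injected and their names are illustrative, not encoda's actual API:

```typescript
// Lazy, cached validator lookup: the schema for a type is read and
// compiled only on the first validation of that type.
type Validator = (node: unknown) => boolean

function makeGetValidator(
  readSchema: (type: string) => Promise<object>,
  compile: (schema: object) => Validator
): (type: string) => Promise<Validator> {
  const cache = new Map<string, Promise<Validator>>()
  return type => {
    let validator = cache.get(type)
    if (validator === undefined) {
      // Read and compile the schema only on first use of this type
      validator = readSchema(type).then(compile)
      cache.set(type, validator)
    }
    return validator
  }
}
```

Caching the promise (rather than the compiled function) also means concurrent first calls for the same type share a single read.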

@stencila-ci
Collaborator

🎉 This issue has been resolved in version 0.53.4 🎉

The release is available on:

Your semantic-release bot 📦🚀
