Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explicitly include a single dialect #452

Closed
rogerfar opened this issue Sep 20, 2022 · 12 comments · Fixed by #511 or #517
Closed

Explicitly include a single dialect #452

rogerfar opened this issue Sep 20, 2022 · 12 comments · Fixed by #511 or #517
Labels

Comments

@rogerfar
Copy link

We are using this library in production in the browser, so bundle sizes matter.

When I analyze my bundle app, the sql-formatter library uses about 443kb, but a large part of this is all the dialetcs:

image

We only need 1 dialect, so there is no need for the others to be in the bundle.

Would it be possible to have an option to explicitly import a language so that the rest can be tree-shaken from the bundle?

I was thinking, maybe something like this:
import { format } from 'sql-formatter/sqlite';

@nene
Copy link
Collaborator

nene commented Sep 21, 2022

Yeah, that's definitely a feature that would be nice to have.

I did a quick comparison of the minified bundle sizes:

  • with all dialects included: 256KiB
  • with only the default "sql" dialect: 77KiB

Possibly a better approach syntax-wise would be to separately export the core engine and language definitions. Something like:

import { format } from 'sql-formatter/core';
import { sqlite } from 'sq;-formatter/sqlite';

format("SELECT *", { language: sqlite });

The language option already accepts passing in a custom dialect implementation, so that would work out more naturally.

However, currently this feature isn't really a priority. And for the time being, it's the first time such code-splitting has been requested.

To put things in perspective, the node-sql-parser library has the same issue and in its case each language definition is way larger than in sql-formatter.

@verhovsky
Copy link
Contributor

verhovsky commented Oct 5, 2022

highlight.js provides this API for this:

import hljs from 'highlight.js/lib/core'
import python from 'highlight.js/lib/languages/python'
hljs.registerLanguage('python', python)
hljs.highlightElement('<some Python in html>', {language: 'python'})

I wouldn't want to pass Formatter objects everywhere for the same reason you didn't make that the API initially. It's tedious to import and pass around an object instead of just a string. If I have a bunch of SQL snippets and I'm storing the SQL dialect with the snippet, if I'm doing anything with those snippets besides passing them to sql-formatter, I'm storing the information "what dialect is this SQL snippet in?" as a string, not as an sql-formatter Formatter object. So that means in my application I'm going to have to have a mapping of languageName -> FormatterObject somewhere, but it makes much more sense for that mapping to live in the library, not everyone's applications.

I don't really know what to suggest here because obviously it's nicer that the library exports a function instead of an object with a .format() property. You could probably have the format function have the registerLanguage function as a property, like

import { format } from 'sql-formatter/core';
import { sqlite } from 'sq;-formatter/sqlite';

format.registerLanguage("sqlite", sqlite);
format("SELECT *", { language: 'sqlite' });

but even if that's possible it's not a good idea.

@nene
Copy link
Collaborator

nene commented Oct 5, 2022

Hey, thanks for the highlight.js example.

I agree with your concern that it could be inconvenient to perform this language mapping in code. However, there is a fundamental problem with the way highlight.js has solved this. What if in one part of code you do hljs.registerLanguage('lisp', commonLisp) and in another part you do hljs.registerLanguage('lisp', scheme). And when you then finally call hljs.highlightElement(el, {language: 'lisp'}) you can't be certain which way it will get highlighted.

That's definitely a contrived example and very unlikely to happen in practice, but it's nevertheless always a possibility with this kind of shared global state solution. Perhaps it's worth the tradeoff, but perhaps not. Not really sure.

Regarding your concern over attaching registerLanguage() to format, there's really no need to do that. We can simply export it separately:

import { format, registerLanguage } from 'sql-formatter/core';
import { sqlite } from 'sql-formatter/sqlite';

registerLanguage("sqlite", sqlite);
format("SELECT *", { language: 'sqlite' });

Then again, the mapping isn't really that hard to do. Just wrap the calling of sql-formatter into your own function:

import { format, sqlite, postgresql, transactsql } from "sql-formatter/core";

const languageMap = { sqlite, postgres, transactsql };

export function formatSql(code: string, lang: keyof typeof languageMap) {
  return format(code, { language: languageMap[lang], tabWidth: 4, keywordCase: 'upper' });
}

Which is probably a good idea anyway, as you likely want to apply the same settings to all languages you're formatting, plus you're shielding the rest of your code base from any changes that might happen in the library.

@verhovsky
Copy link
Contributor

What if in one part of code you do hljs.registerLanguage('lisp', commonLisp) and in another part you do hljs.registerLanguage('lisp', scheme)

This is just setting a key on a dictionary. I would expect the last one I ran (scheme) to happen, and if I'm doing this asynchronously or something (is that possible in JS?) then I can expect that setting the same key on a dictionary asynchronously is going to lead to problems, that's not sql-formatter's fault.

the mapping isn't really that hard to do

I'm only suggesting that having a way to modify the library's internal mapping

export const formatters = {
bigquery: BigQueryFormatter,
db2: Db2Formatter,
hive: HiveFormatter,
mariadb: MariaDbFormatter,
mysql: MySqlFormatter,
n1ql: N1qlFormatter,
plsql: PlSqlFormatter,
postgresql: PostgreSqlFormatter,
redshift: RedshiftFormatter,
singlestoredb: SingleStoreDbFormatter,
snowflake: SnowflakeFormatter,
spark: SparkFormatter,
sql: SqlFormatter,
sqlite: SqliteFormatter,
transactsql: TransactSqlFormatter,
trino: TrinoFormatter,
tsql: TransactSqlFormatter, // alias for transactsql
};

and having a way to load it with an empty mapping would make code just a little simpler for some users, but if you don't think it's the right call then whatever API lets me not have to load every language would be great. I'm also just using one formatter, so this doesn't matter to me.

you likely want to apply the same settings to all languages you're formatting

In my case you're right that's what I'm already doing, but there's probably people with simpler usecases that don't need to do that.

@nene
Copy link
Collaborator

nene commented Oct 8, 2022

Causing this mapping conflict in one's own code is, I'd guess, a fairly unlikely scenario, where one is simply shooting himself in the foot. The more problematic case is when you use sql-formatter and also use a library which internally uses sql-formatter. In that case the library can screw up the mapping you have defined, and you'll be having very hard time figuring out why the formatter is not working as it should (you might not even be aware that there's some library which is also using sql-formatter).

That's a fundamental problem with shared mutable state, and the best practice is to avoid it in the first place.

@verhovsky
Copy link
Contributor

verhovsky commented Oct 8, 2022

If we're talking about code that imports sql-formatter and imports a library that itself imports sql-formatter, wouldn't those be two different instances of the library and have their own, unshared state?

And actually, since formatters is exported, sql-formatter already lets users do this

import { format, formatters } from 'sql-formatter';
formatters.trino = {}
console.log(format('SELECT * FROM tbl', { language: 'trino' })); // errors when it tries to use {} as a Formatter

and get it to use a custom formatter. You can even almost do what I'm suggesting letting people do (through a function in the library, but I guess just through directly editing the dictionary is good too)

import { format, formatters } from 'sql-formatter';
formatters.some_other_language = {}
console.log(format('SELECT * FROM tbl', { language: 'some_other_language' }));

but it doesn't work just because this check

if (typeof cfg.language === 'string' && !supportedDialects.includes(cfg.language)) {
throw new ConfigError(`Unsupported SQL dialect: ${cfg.language}`);
}

doesn't expect formatters to change after initialization.

You could have this API:

import { format, formatters } from 'sql-formatter';
console.log(formatters) // all the formatters
console.log(format('SELECT * FROM tbl', { language: 'redshift' }));
import { format, formatters } from 'sql-formatter/core';
import { TrinoFormatter } from 'sql-formatter/languages/trino';
console.log(formatters) // empty {}
formatters.trino = TrinoFormatter
console.log(format('SELECT * FROM tbl', { language: 'trino' }));

@nene
Copy link
Collaborator

nene commented Oct 8, 2022

If we're talking about code that imports sql-formatter and imports a library that itself imports sql-formatter, wouldn't those be two different instances of the library and have their own, unshared state?

This depends on whether they're using the same or different versions of the library. And also on how exactly the versions are specified, like when one library requires ^2.0.0 and another requires ^2.1.0, then npm can only download 2.1.0 to satisfy both. In that case the code and also the internal state of the library is shared.

And actually, since formatters is exported, sql-formatter already lets users do this

Good catch. That's been exported by mistake. Should remove.

@nene
Copy link
Collaborator

nene commented Nov 4, 2022

This feature is now implemented and included to 12.0.0-beta.4 release.

See the docs for details.

@rogerfar
Copy link
Author

rogerfar commented Nov 7, 2022

@nene I tried the new feature, it works fine, but the output is the same as before. When using the code from the example it doesn't treeshake the other languages away just yet.

@nene
Copy link
Collaborator

nene commented Nov 7, 2022

Thanks. Will investigate...

@nene
Copy link
Collaborator

nene commented Nov 8, 2022

Found the culprit. Did another release that should fix the problem: 12.0.0-beta.6

Let me know if this works better.

@nene nene reopened this Nov 8, 2022
@rogerfar
Copy link
Author

rogerfar commented Nov 8, 2022

Looks good, before:
image
312.63KB

Now:
image
84.36KB

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants