Add way to style identifiers. #148

wkeese · 2023-10-04T12:44:19Z

Replace "default" token with "whitespace" and "identifier" tokens, with fallback to "unknown" token.

Also, change backticked identifiers like `foo` to be classified as "identifier" rather than "string".

This allows for identifiers to be styled independently from strings and whitespace.

It also simplifies getSegments() from 30 lines down to 5, by removing the special-case code for the "default" token.
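To make the idea concrete, here is a minimal sketch of a rule-table tokenizer with an "unknown" fallback. It is not the library's actual source: the `Segment` shape, the `RULES` table, the regexes, and the `tokenize` name are all assumptions chosen for illustration, and only three keywords are recognized.

```typescript
// Hypothetical sketch, NOT sql-highlight's real implementation.
interface Segment {
  name: string;
  content: string;
}

// Ordered rules: the first regex that matches at the start of the
// remaining input wins. Every regex is anchored with ^.
const RULES: ReadonlyArray<{ name: string; regex: RegExp }> = [
  { name: "whitespace", regex: /^\s+/ },
  { name: "string", regex: /^(?:'[^']*'|"[^"]*")/ },
  { name: "keyword", regex: /^(?:select|from|where)\b/i },
  // Backticked names and bare words are both identifiers.
  { name: "identifier", regex: /^(?:`[^`]*`|[A-Za-z_][A-Za-z0-9_]*)/ },
  { name: "number", regex: /^\d+(?:\.\d+)?/ },
  { name: "special", regex: /^[(),.;*=<>!+\-\/]+/ },
];

function tokenize(sql: string): Segment[] {
  const segments: Segment[] = [];
  let rest = sql;
  while (rest.length > 0) {
    let consumed = 0;
    for (const rule of RULES) {
      const m = rule.regex.exec(rest);
      if (m !== null && m[0].length > 0) {
        segments.push({ name: rule.name, content: m[0] });
        consumed = m[0].length;
        break;
      }
    }
    if (consumed === 0) {
      // Fallback: no rule matched, so emit one character as "unknown",
      // merging it into a preceding "unknown" segment if there is one.
      const ch = rest.charAt(0);
      const last = segments[segments.length - 1];
      if (last !== undefined && last.name === "unknown") {
        last.content += ch;
      } else {
        segments.push({ name: "unknown", content: ch });
      }
      consumed = 1;
    }
    rest = rest.slice(consumed);
  }
  return segments;
}
```

With a table like this there is no special-case code for a "default" segment: every character lands in a named segment, and only input that no rule accepts (such as the opening quote of an unclosed string) falls through to "unknown".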

Fixes #147.

scriptcoded · 2023-10-04T13:54:46Z

Thanks, that was fast! All in all I think this is a good change, but I'll have to read through and check the code when I get the time.

Since this changes the public API this is considered a breaking change. You might see that the commit lint failed so please see the contribution guidelines for information on the commit message format used. This also means you need to add something like the following footer to your commit:

BREAKING CHANGE: The `default` segment has been split into `identifier`, `whitespace` and `unknown`.

I'm wondering if there is some way of releasing this in a non-breaking fashion, but I don't think there is. I don't like pushing breaking changes too often so maybe it's time for me to have another look at how I manage releases so there can be a next experimental branch or something like that. I think it'd be good to start bunching these bigger changes together.

wkeese · 2023-10-04T14:14:13Z

Thanks.

> You might see that the commit lint failed so please see the contribution guidelines for information on the commit message format used.

My bad on that one, I'll update the message.

> I'm wondering if there is some way of releasing this in a non-breaking fashion, but I don't think there is.

I've also been wondering about that.

The `highlight()` function works largely the same as before, in both normal mode and HTML mode, except for (what I consider to be) a bug fix to stop classifying identifiers as strings. In other words, SQL like `select * from EMP where NAME="John Smith"` will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME.

But getSegments() does have some new output types (and loses the "default" output type), which could break some code.
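For code that still expects the old "default" segment, one possible migration aid is a small shim that folds the new segment names back together. This is only a sketch under assumptions: the `{ name, content }` segment shape, the `toLegacySegments` helper, and the idea that adjacent default text used to arrive as a single segment are not confirmed by this thread. It also does not restore the old classification of backticked identifiers as "string".

```typescript
// Hypothetical compatibility shim; names and segment shape are assumptions.
interface Segment {
  name: string;
  content: string;
}

// Segment names that (by assumption) used to be reported as "default".
const LEGACY_DEFAULT_NAMES: readonly string[] = [
  "whitespace",
  "identifier",
  "unknown",
];

function toLegacySegments(segments: readonly Segment[]): Segment[] {
  const result: Segment[] = [];
  for (const seg of segments) {
    const name =
      LEGACY_DEFAULT_NAMES.indexOf(seg.name) !== -1 ? "default" : seg.name;
    const last = result[result.length - 1];
    if (name === "default" && last !== undefined && last.name === "default") {
      // Merge runs of whitespace/identifier/unknown into one "default" segment.
      last.content += seg.content;
    } else {
      // Copy so the caller's input segments are never mutated.
      result.push({ name, content: seg.content });
    }
  }
  return result;
}
```

A caller would wrap the new output, for example `toLegacySegments(getSegments(sql))`, while migrating its own styling code to the new names.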

Replace "default" segment with "whitespace" and "identifier" segments, with fallback to "unknown" segment.

Also, classify backticked identifiers like `foo` as "identifier" rather than "string".

This allows for identifiers to be styled independently from strings and whitespace.

It also simplifies getSegments() from 30 lines down to 5, by removing the special-case code for the "default" segment.

BREAKING CHANGE: The `default` segment has been split into `identifier` and `whitespace` segments. There's also a new `unknown` segment that will only show up for malformed SQL such as an unclosed string. However, the highlight() function works largely the same as before, both normal mode and HTML mode, except for the bug fix to stop classifying identifiers as strings. In other words, SQL like select * from EMP where NAME="John Smith" will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME.

Fixes scriptcoded#147.

scriptsbot · 2023-10-19T02:04:50Z

This pull request has been marked as stale because it has been open for 14 days with no activity. Remove the stale label or comment or this will be closed in 5 days.

wkeese · 2023-10-19T05:22:33Z

Anything you need from me on this?

scriptcoded · 2023-10-24T12:48:50Z

Hi again! I'm really sorry for letting you wait so long. Managing life, work and non-paid open source work can be a struggle sometimes. Should've left an update either way.

I've switched to using semantic-release for managing the package now, which makes it possible to distribute using dist-tags on NPM. So I'll merge this to the beta branch.

I also took the liberty of adding identifier to the default colors and types myself, and I'll be merging this using a squash instead. So if you feel good with these changes I'll merge this to the beta branch so you can start using it if you're dependent on these changes.

Thanks so much for your patience!

wkeese · 2023-10-24T13:01:22Z

> Hi again! I'm really sorry for letting you wait so long. Managing life, work and non-paid open source work can be a struggle sometimes. Should've left an update either way.

No worries, it seems like you're putting a lot of pressure on yourself with that scriptbot 2-week timer. :-)

> I also took the liberty of adding identifier to the default colors and types myself, and I'll be merging this using a squash instead. So if you feel good with these changes I'll merge this to the beta branch so you can start using it if you're dependent on these changes.

Sure, looks good, I have no opinions about colors.

scriptcoded · 2023-10-24T14:09:41Z

> it seems like you're putting a lot of pressure on yourself with that scriptbot 2-week timer. :-)

I added it mostly to be able to close some issues that I didn't get responses on previously. But this time it came back to me 😅 Turns out it's a pretty good reminder.

Anyways, let's merge then! I'll come around to your other issues eventually!

github-actions · 2023-10-24T14:13:01Z

🎉 This PR is included in version 5.0.0-beta.2 🎉

The release is available on:

Your semantic-release bot 📦🚀

Rearrange regexps so "special" regexp catches everything not caught by other regexps. Eliminates the "unknown" segment I added in scriptcoded#148. In practice, the only time we would hit the "unknown" segment is for a few weird characters that weren't already caught by the "special" segment, for example, the ? and unclosed ` in "a `?> b". I figured that in those rare cases, we might as well just call those characters "special". Fixes scriptcoded#178, refs scriptcoded#148.
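The rearrangement described in this commit message can be sketched as a rule table whose last entry is a catch-all, so no "unknown" fallback is needed. Everything here is an assumption for illustration (the regexes, the `tokenizeWithCatchAll` name, and the choice to merge adjacent "special" characters); the library's real patterns are likely different.

```typescript
// Hypothetical sketch, NOT the library's real regexps.
interface Segment {
  name: string;
  content: string;
}

const RULES: ReadonlyArray<{ name: string; regex: RegExp }> = [
  { name: "whitespace", regex: /^\s+/ },
  { name: "string", regex: /^(?:'[^']*'|"[^"]*")/ },
  { name: "identifier", regex: /^(?:`[^`]*`|[A-Za-z_][A-Za-z0-9_]*)/ },
  { name: "number", regex: /^\d+(?:\.\d+)?/ },
  // Catch-all, tried last: any single character the rules above rejected.
  { name: "special", regex: /^[\s\S]/ },
];

function tokenizeWithCatchAll(sql: string): Segment[] {
  const segments: Segment[] = [];
  let rest = sql;
  while (rest.length > 0) {
    // The catch-all guarantees that some rule matches on every pass,
    // so this loop always consumes at least one character.
    for (const rule of RULES) {
      const m = rule.regex.exec(rest);
      if (m === null) continue;
      const last = segments[segments.length - 1];
      if (
        rule.name === "special" &&
        last !== undefined &&
        last.name === "special"
      ) {
        // Merge runs of leftover characters into one "special" segment.
        last.content += m[0];
      } else {
        segments.push({ name: rule.name, content: m[0] });
      }
      rest = rest.slice(m[0].length);
      break;
    }
  }
  return segments;
}
```

In this sketch, the input "a `?> b" produces an identifier, whitespace, one "special" segment holding the unclosed backtick together with ? and >, whitespace, and a final identifier. No segment is ever named "unknown".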

# [5.0.0](v4.4.2...v5.0.0) (2024-07-02)

* chore!: add support for Node 22 ([9478bf1](9478bf1))

### Bug Fixes

* improve number detection ([02d459a](02d459a)), closes [#149](#149)
* improve operator detection ([183a4fb](183a4fb)), closes [#150](#150)
* typo in unknown segments ([70af287](70af287)), closes [#148](#148) [#178](#178) [#148](#148)

### Features

* add way to style identifiers ([25677d4](25677d4)), closes [#147](#147)
* release 5.1.0 ([cb0c0f1](cb0c0f1))

### BREAKING CHANGES

* The `default` segment has been split into `identifier` and `whitespace` segments. There's also a new `unknown` segment that will only show up for malformed SQL such as an unclosed string. However, the highlight() function works largely the same as before, both normal mode and HTML mode, except for the bug fix to stop classifying identifiers as strings. In other words, SQL like select * from EMP where NAME="John Smith" will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME.
* drop support for Node 14.

# [5.0.0](v4.4.2...v5.0.0) (2024-07-02)

* chore!: add support for Node 22 ([9478bf1](9478bf1))

### Bug Fixes

* improve number detection ([02d459a](02d459a)), closes [#149](#149)
* improve operator detection ([183a4fb](183a4fb)), closes [#150](#150)
* typo in unknown segments ([70af287](70af287)), closes [#148](#148) [#178](#178) [#148](#148)

### Features

* add way to style identifiers ([25677d4](25677d4)), closes [#147](#147)

### BREAKING CHANGES

* The `default` segment has been split into `identifier` and `whitespace` segments. There's also a new `unknown` segment that will only show up for malformed SQL such as an unclosed string. However, the highlight() function works largely the same as before, both normal mode and HTML mode, except for the bug fix to stop classifying identifiers as strings. In other words, SQL like select * from EMP where NAME="John Smith" will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME.
* drop support for Node 14.

# [6.0.0](v5.0.0...v6.0.0) (2024-07-02)

### Bug Fixes

* improve number detection ([02d459a](02d459a)), closes [#149](#149)
* improve operator detection ([183a4fb](183a4fb)), closes [#150](#150)
* typo in unknown segments ([70af287](70af287)), closes [#148](#148) [#178](#178) [#148](#148)

### Features

* add way to style identifiers ([25677d4](25677d4)), closes [#147](#147)
* release 5.1.0 ([3a58def](3a58def))

### BREAKING CHANGES

* The `default` segment has been split into `identifier` and `whitespace` segments. There's also a new `unknown` segment that will only show up for malformed SQL such as an unclosed string. However, the highlight() function works largely the same as before, both normal mode and HTML mode, except for the bug fix to stop classifying identifiers as strings. In other words, SQL like select * from EMP where NAME="John Smith" will get highlighted the same as before, i.e. no syntax highlighting for EMP or NAME.

scriptcoded mentioned this pull request Oct 4, 2023

Add way to style identifiers (i.e. schema, table, and column names) #147

Closed

wkeese force-pushed the simplify-tokenizer branch from 55de594 to cadb0a9 Compare October 4, 2023 13:57

wkeese force-pushed the simplify-tokenizer branch from cadb0a9 to d6fb836 Compare October 4, 2023 14:19

wkeese mentioned this pull request Oct 5, 2023

improve operator detection #150

Closed

scriptsbot added the Stale label Oct 19, 2023

scriptsbot removed the Stale label Oct 20, 2023

scriptcoded mentioned this pull request Oct 22, 2023

ci: switch to semantic-release #156

Merged

scriptcoded changed the base branch from master to beta October 24, 2023 11:34

scriptcoded added 2 commits October 24, 2023 14:44

fix: add identifier to default colors and types

a5dd7d6

Merge branch 'master' into simplify-tokenizer

2ea8ef1

scriptcoded merged commit 01df1cd into scriptcoded:beta Oct 24, 2023
8 checks passed

github-actions bot added the released on @beta label Oct 24, 2023

wkeese mentioned this pull request Apr 13, 2024

Bug in beta tokenizer #178

Closed

wkeese mentioned this pull request Apr 14, 2024

fix: improve operator detection and fix unknown segment #193

Merged
