[Feature request] Add docs for cspell-lib so that it's easier to use it as a library #1813

Open
wujekbogdan opened this issue Sep 29, 2021 · 10 comments

Comments

@wujekbogdan
Contributor

wujekbogdan commented Sep 29, 2021

I'm working on a node.js spell-checking service.

I was looking for a good, actively maintained JS spell-checking library. It turned out that the most popular one, typo.js, is not actively developed despite having lots of daily downloads. It isn't very powerful either. Other libs I found suffer from the same issues.

I found that cspell is used internally by VSCode, so it seemed to be a perfect candidate, until I found it's not intended to be used as a library. The main purpose, from what I see, is a command line tool.

I started digging into the source code and found that cspell-lib is a pretty decent tool and has everything I need. Thanks to the unit tests I was able to figure out how to put all the pieces together and developed a little proof-of-concept:

import {
  checkText,
  combineTextAndLanguageSettings,
  CompoundWordsMethod,
  createSpellingDictionary,
  finalizeSettings,
  getDefaultSettings,
  getDictionary,
} from 'cspell-lib';
import { SpellingDictionaryCollection } from 'cspell-lib/dist/SpellingDictionary';

/**
 * @param customWords
 * @return {Promise<function(*=): Promise<(null|string)[]>>}
 * @constructor
 */
export const SpellcheckerFactory = async (customWords = []) => {
  const settings = {
    // I'm not sure if I need the entire default settings object
    ...getDefaultSettings(),
    // I want to use the lib just for plain text. I'm not sure if this is the best way to disable programming languages spell-checking
    enabledLanguageIds: [],
  };

  // I'm not sure if passing '' as a second argument is correct
  const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);
  const finalSettings = finalizeSettings(fileSettings);
  const [dictionary, customDictionary] = await Promise.all([
    // Is it OK to get dictionary before I initialize the custom dictionary?
    getDictionary(finalSettings),
    // I'm not sure if `name` and `source` attributes make any difference.
    createSpellingDictionary(
      customWords,
      'customDictionary',
      'customWords',
      undefined
    ),
  ]);
  const dictionariesCollection = new SpellingDictionaryCollection(
    [customDictionary, dictionary],
    'dictionaries'
  );

  const getSuggestion = word => {
    const suggestions = dictionariesCollection.suggest(word, 1, CompoundWordsMethod.SEPARATE_WORDS);
    return suggestions.length ? suggestions[0].word : null;
  };

  return async phrase => {
    const checkedText = await checkText(phrase, fileSettings);
    const errors = checkedText.items.filter(({ isError }) => isError);

    return Promise.all(errors.map(({ text }) => getSuggestion(text)));
  };
};

import { checkSpelling } from './spelchecker';

describe('checkSpelling', () => {
  it('should check spelling', async () => {
    expect(await checkSpelling('zażułć gęśla jaśń')).toEqual(['zażółć', 'gęślą', 'jaźń']);
  });
});

The problem is that I have no idea if what I'm doing is right. It works, but most likely it could be done better/cleaner/more efficiently. I have several doubts - see the comments in the code.

It would be great if cspell-lib was documented. This lib seems to be the best spell-checking lib on the market. It would be nice if we could use it with ease.

@Jason3S
Collaborator

Jason3S commented Sep 30, 2021

@wujekbogdan,

Cool idea to create a service. There are a LOT of questions in this request.

A few tips:

Configuration / Settings are your friend

If your custom word list is static, then store it in a text file with one word per line. You can reference it in the settings:

Store all your custom dictionary / settings in a cspell.json or cspell.config.js file and use readSettings function to load them.
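
The two tips above could be combined roughly as follows. This is only a sketch of the settings shape: `dictionaryDefinitions` and `dictionaries` are cspell configuration fields, while the file name `./custom-words.txt`, the dictionary name `custom-words`, and the simplified local interfaces are assumptions made for illustration.

```typescript
// Simplified local types for illustration only; the real settings type ships with cspell.
interface DictionaryDefinitionSketch {
  name: string;
  path: string;
  addWords?: boolean;
}

interface SettingsSketch {
  version: string;
  language: string;
  dictionaryDefinitions: DictionaryDefinitionSketch[];
  dictionaries: string[];
}

// Hypothetical names: adjust the path and dictionary name to your project.
const settingsSketch: SettingsSketch = {
  version: '0.2',
  language: 'en',
  dictionaryDefinitions: [
    // A text file with one word per line.
    { name: 'custom-words', path: './custom-words.txt', addWords: true },
  ],
  // A defined dictionary only takes effect once it is listed here.
  dictionaries: ['custom-words'],
};

// Printing the object gives content that could be saved as cspell.json.
console.log(JSON.stringify(settingsSketch, null, 2));
```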

If you only want to use your own custom word list, then the following will work:

  const settings = {
    // Needed to load existing dictionaries. Not needed if you only plan to use your own.
    ...getDefaultSettings(),
    // Not needed
    // enabledLanguageIds: [],
    // Optionally your custom words can go here.
    words: customWords // these words will be part of the dictionary returned by getDictionary
  };

I suggest using mergeSettings to build up the settings if you read settings from a file.

const settings = mergeSettings(getDefaultSettings(), readSettings('path to your cspell.config.js'));
// empty '' is fine. The method looks for embedded `cspell` settings in the document. Since you do not
// expect them, no need to send any text.
const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);

Avoid using compound word suggestions.

Avoid using compound word suggestions, they are very slow. Only use them if you expect to be splitting words.

const suggestions = dictionary.suggest(word, 1);

@Jason3S
Collaborator

Jason3S commented Sep 30, 2021

It would be great if cspell-lib was documented. This lib seems to be the best spell-checking lib on the market. It would be nice if we could use it with ease.

I agree.

@wujekbogdan
Contributor Author

Thanks a lot for the quick response!

If you only want to use your own custom word list, then the following will work:

Is it faster than the current solution that relies on SpellingDictionaryCollection? Are there any pros/cons of using one technique over another?

Avoid using compound word suggestions, they are very slow. Only use them if you expect to be splitting words.

Thanks for the tip. I'll have it in mind.

@Jason3S
Collaborator

Jason3S commented Sep 30, 2021

Is it faster than the current solution that relies on SpellingDictionaryCollection? Are there any pros/cons of using one technique over another?

Adding it via words: will:

  1. Create a SpellingDictionary with your words.
  2. Enable your words to be used by checkText.

Experiment with the command line app to get a feel for things. Everything is configuration driven.

@Jason3S
Collaborator

Jason3S commented Oct 1, 2021 via email

@Jason3S
Collaborator

Jason3S commented Oct 1, 2021

Use words works just fine.

import { validateText, combineTextAndLanguageSettings, finalizeSettings, getDefaultSettings } from 'cspell-lib';

const customWords = ['wordz', 'cuztom', 'clockz'];

export const SpellcheckerFactory = async (customWords: string[] = []) => {
    const settings = {
        ...getDefaultSettings(),
        enabledLanguageIds: [],
        words: customWords,
    };

    const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);
    const finalSettings = finalizeSettings(fileSettings);

    return async (phrase: string) => {
        return await validateText(phrase, finalSettings, { generateSuggestions: true });
    };
};

export const checkSpelling = async (phrase: string) => {
    const spellChecker = await SpellcheckerFactory(customWords);

    return spellChecker(phrase);
};

async function run() {
    const r = await checkSpelling('These are my coztom wordz.');
    console.log('%o', r);
}

run();

Result:

[
  {
    text: 'coztom',
    offset: 13,
    line: { text: 'These are my coztom wordz.', offset: 0 },
    isFlagged: false,
    isFound: false,
    suggestions: [
      'cuztom',     'contos',
      'cotton',     'conto',
      'Cotton',     'condom',
      'custom',     'coom',
      'bottom',     'coyote',
      [length]: 10
    ]
  },
  [length]: 1
]
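
One thing to watch in the snippet above: `checkSpelling` calls `SpellcheckerFactory` on every invocation, so the settings are rebuilt for each phrase. For a long-running service it may be worth building the checker once and reusing it. Below is a minimal sketch of that idea; `memoizeAsync` and `buildChecker` are hypothetical helpers, not part of cspell-lib, and `buildChecker` merely stands in for the factory.

```typescript
// Run an async factory at most once per key and reuse the resulting promise.
function memoizeAsync<K, V>(factory: (key: K) => Promise<V>): (key: K) => Promise<V> {
  const cache = new Map<K, Promise<V>>();
  return (key: K) => {
    let pending = cache.get(key);
    if (!pending) {
      pending = factory(key);
      cache.set(key, pending);
    }
    return pending;
  };
}

// Stand-in for SpellcheckerFactory: counts how often the expensive setup runs.
let builds = 0;
const buildChecker = async (name: string) => {
  builds += 1;
  return (phrase: string) => `${name}: ${phrase}`;
};

const getChecker = memoizeAsync(buildChecker);

async function demo(): Promise<void> {
  const first = await getChecker('default');
  const second = await getChecker('default');
  // Both calls resolve to the same checker; the setup ran only once for this key.
  console.log(first === second, builds);
}

void demo();
```

Note that this caches the promise itself, so a failed setup would also be cached; a real service would probably evict the entry on rejection.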

@wujekbogdan
Contributor Author

wujekbogdan commented Oct 2, 2021

Use words works just fine.

Yes, I can confirm that. Sorry for the confusion. After posting my last comment I realized that it does work, just not as I expected. I then deleted my comment, but you were faster and had already responded to it :)

The reason why I thought it didn't work was the fact I was using words with diacritic marks (e.g. zażółć gęślą jaźń) - in this case cspell doesn't work well (or at least - does not work like I would expect it to work).

This is out of scope of this ticket - I'll create a new one where I explain the problem in more detail.

You should be using validateText instead of checkText. checkText

Thank you. I'll experiment with it a bit more.

@Jason3S
Collaborator

Jason3S commented Oct 2, 2021

The reason why I thought it didn't work was the fact I was using words with diacritic marks (e.g. zażółć gęślą jaźń) - in this case cspell doesn't work well (or at least - does not work like I would expect it to work).

This is out of scope of this ticket - I'll create a new one where I explain the problem in more detail.

Are you trying to ignore diacritic marks or flag them?
By default the spell checker is case / accent insensitive. Try:

    const settings = {
        ...getDefaultSettings(),
        caseSensitive: true,
        words: customWords,
    };
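
To illustrate what an accent-insensitive comparison implies for the Polish examples in this thread, here is a small standalone sketch. It is NOT how cspell implements matching (that is an assumption-free statement only about this code); `foldForComparison` and `looselyEqual` are hypothetical helpers built on the standard `String.prototype.normalize`.

```typescript
// Illustration only: a naive case/accent fold, not cspell's implementation.
function foldForComparison(word: string): string {
  return word
    .normalize('NFD')                // split letters from their combining marks
    .replace(/[\u0300-\u036f]/g, '') // drop the combining marks
    .toLowerCase();
}

function looselyEqual(a: string, b: string): boolean {
  return foldForComparison(a) === foldForComparison(b);
}

// 'gęśla' and 'gęślą' differ only by a diacritic, so a loose comparison treats them as equal.
console.log(looselyEqual('gęśla', 'gęślą')); // true
// 'jaśń' and 'jaźń' differ in a base letter (s vs z), so they stay different.
console.log(looselyEqual('jaśń', 'jaźń'));   // false
// A strict comparison distinguishes the first pair as well.
console.log('gęśla' === 'gęślą');            // false
```

Under a fold like this, a typo that only changes a diacritic would not be reported, which may be what was observed above; a stricter setting such as `caseSensitive: true` is the suggested thing to try.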

@Jason3S Jason3S removed the new issue label Oct 9, 2021
@reilnuud

@Jason3S Sorry to dig up this thread, but there's a real need for this on our side, as we are similarly hoping to integrate this into a tool on our team. This is by far the most comprehensive tool I've found.

I have gotten the above code snippets to work, but cannot seem to get any of the default dictionaries to load -- only whatever custom words I supply. I may be missing some context from this original thread -- maybe that was never the intent of these snippets, but could you provide some insight on what might be missing from this snippet to get one of the bundled dictionaries loaded?

Use words works just fine.

import { validateText, combineTextAndLanguageSettings, finalizeSettings, getDefaultSettings } from 'cspell-lib';

const customWords = ['wordz', 'cuztom', 'clockz'];

export const SpellcheckerFactory = async (customWords: string[] = []) => {
    const settings = {
        ...getDefaultSettings(),
        enabledLanguageIds: [],
        words: customWords,
    };

    const fileSettings = combineTextAndLanguageSettings(settings, '', ['plaintext']);
    const finalSettings = finalizeSettings(fileSettings);

    return async (phrase: string) => {
        return await validateText(phrase, finalSettings, { generateSuggestions: true });
    };
};

export const checkSpelling = async (phrase: string) => {
    const spellChecker = await SpellcheckerFactory(customWords);

    return spellChecker(phrase);
};

async function run() {
    const r = await checkSpelling('These are my coztom wordz.');
    console.log('%o', r);
}

run();

Result:

[
  {
    text: 'coztom',
    offset: 13,
    line: { text: 'These are my coztom wordz.', offset: 0 },
    isFlagged: false,
    isFound: false,
    suggestions: [
      'cuztom',     'contos',
      'cotton',     'conto',
      'Cotton',     'condom',
      'custom',     'coom',
      'bottom',     'coyote',
      [length]: 10
    ]
  },
  [length]: 1
]

@Jason3S
Collaborator

Jason3S commented Apr 12, 2024

@reilnuud,

Two things:

  1. The api changed slightly since the example was written. getDefaultSettings now returns a Promise.
        const settings = {
    -       ...getDefaultSettings(),
    +       ...(await getDefaultSettings()),
            enabledLanguageIds: [],
            words: customWords,
        };
  2. There is another endpoint that might be easier to use: spellCheckDocument.

This is a copy of the test file test-packages/cspell-lib/test-cspell-esbuild-cjs/source/src/index.ts. It is used to make sure bundling the library works, and it also serves as an example of how to spell check a file via code.

import assert from 'assert';
import { spellCheckDocument } from 'cspell-lib';
import { resolve } from 'path';
import { pathToFileURL } from 'url';

// cspell:ignore wordz coztom clockz cuztom
const customWords = ['wordz', 'cuztom', 'clockz'];

async function checkSpelling(phrase: string) {
    const result = await spellCheckDocument(
        { uri: 'text.txt', text: phrase, languageId: 'plaintext', locale: 'en' },
        { generateSuggestions: true, noConfigSearch: true },
        { words: customWords, suggestionsTimeout: 2000 },
    );
    return result.issues;
}

async function checkFile(filename: string) {
    const uri = pathToFileURL(resolve(filename)).toString();
    const result = await spellCheckDocument(
        { uri },
        { generateSuggestions: true, noConfigSearch: true },
        { words: customWords, suggestionsTimeout: 2000 },
    );
    return result.issues;
}

export async function run() {
    console.log(`Start: ${new Date().toISOString()}`);
    const r = await checkSpelling('These are my coztom wordz.');
    console.log(`End: ${new Date().toISOString()}`);
    // console.log(r);
    assert(r.length === 1, 'Make sure we got 1 spelling issue back.');
    assert(r[0].text === 'coztom');
    assert(r[0].suggestions?.includes('cuztom'));
    // console.log('%o', r);

    const argv = process.argv;
    if (argv[2]) {
        console.log('Spell check file: %s', argv[2]);
        const issues = await checkFile(argv[2]);
        assert(!issues.length, 'no issues');
    }
}
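
For the use case from the original post (returning one suggested correction per misspelled word), the issues returned above could be post-processed along these lines. This is a sketch under assumptions: `IssueLike` is a minimal local shape based on the `text`, `offset`, and `suggestions` fields visible in the result printed earlier in this thread, and `toCorrections` is a hypothetical helper, not a cspell-lib function.

```typescript
// Minimal local shape; the real issue type from cspell-lib carries more fields.
interface IssueLike {
  text: string;
  offset: number;
  suggestions?: string[];
}

interface Correction {
  word: string;
  offset: number;
  suggestion: string | null;
}

// Keep the flagged word, its position, and the top suggestion (or null if none).
function toCorrections(issues: IssueLike[]): Correction[] {
  return issues.map(({ text, offset, suggestions }) => ({
    word: text,
    offset,
    suggestion: suggestions?.[0] ?? null,
  }));
}

// Sample data abbreviated from the result shown earlier in this thread.
const sampleIssues: IssueLike[] = [
  { text: 'coztom', offset: 13, suggestions: ['cuztom', 'contos', 'cotton'] },
];

console.log(toCorrections(sampleIssues));
```

In real use, `sampleIssues` would be replaced by `result.issues` from `spellCheckDocument`, called with `generateSuggestions: true` so that suggestions are populated.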
