Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider Parsing .gitignore #1319

Closed
Kurt-von-Laven opened this issue Jun 8, 2021 · 12 comments · Fixed by #1823
Closed

Consider Parsing .gitignore #1319

Kurt-von-Laven opened this issue Jun 8, 2021 · 12 comments · Fixed by #1823
Milestone

Comments

@Kurt-von-Laven
Copy link

Kurt-von-Laven commented Jun 8, 2021

This is a convenience feature I have appreciated in a few other actions that cuts down on the number of separate exclude lists downstream projects have to maintain. The idea is to have a boolean parameter that tells the action whether you want to automatically exclude everything in the .gitignore file.

@Jason3S Jason3S transferred this issue from streetsidesoftware/cspell-action Jun 8, 2021
@Jason3S Jason3S added this to the v-next milestone Aug 9, 2021
@capaj
Copy link

capaj commented Sep 7, 2021

yes please, this would be awesome to have

@Kurt-von-Laven
Copy link
Author

Kurt-von-Laven commented Sep 8, 2021

I believe this could be done using something like parse-gitignore@v0.5.1.

@Jason3S
Copy link
Collaborator

Jason3S commented Sep 28, 2021

@Kurt-von-Laven,

I agree that it is very useful.

Parsing the file is easy. The internal cspell-glob library will already correctly parse a .gitignore file.

The issue is the logic around loading the .gitignore files.

project/
 |- .gitignore
 |- src
 |- docs/
     |- .gitignore
     |- site/
         |- index.html

When processing index.html it would be necessary to load project/docs/.gitignore and project/.gitignore, including any other .gitignore files until a .git directory is found.

Also needed:

  • command line option: --no-gitignore
  • config option: useGitignore: boolean = true;

It is not a lot of code, it is more the time to make sure it is correct.

@Kurt-von-Laven
Copy link
Author

Kurt-von-Laven commented Sep 29, 2021

Thank you for the thoughtful guidance! That makes good sense. How would you feel about the notion of running git-check-ignore and only supporting this feature when Git is installed? According to that documentation there is quite a lot more complexity involved in matching Git's behavior than I had realized (e.g., handling the core.excludesFile setting and .git/info/exclude) to say nothing of differing versions of Git.

@Maxim-Mazurok
Copy link
Contributor

Thank you for the thoughtful guidance! That makes good sense. How would you feel about the notion of running git-check-ignore and only supporting this feature when Git is installed? According to that documentation there is quite a lot more complexity involved in matching Git's behavior than I had realized (e.g., handling the core.excludesFile setting and .git/info/exclude) to say nothing of differing versions of Git.

This is how we are handling files ignored by git (including core.excludesFile and .git/info/exclude, etc.):

import spawnAsync from '@expo/spawn-async';
import path, { dirname } from 'path';
import { readdirSync, statSync } from 'fs';
import simpleGit from 'simple-git';
import { fileURLToPath } from 'url';

const __dirname = dirname(fileURLToPath(import.meta.url));

const git = simpleGit();
const files: string[] = [];

async function throughDirectory(directory: string): Promise<void> {
	for (const file of readdirSync(directory)) {
		const absolutePath = path.join(directory, file);
		await new Promise<void>((resolve) => {
			// eslint-disable-next-line @typescript-eslint/no-floating-promises
			git.checkIgnore(absolutePath, async (error: unknown, data?: string[]) => {
				if (error || !data) throw error;
				if (data.length === 0) {
					//not ignored
					if (statSync(absolutePath).isDirectory()) await throughDirectory(absolutePath);
					else files.push(absolutePath);
				}
				resolve();
			});
		});
	}
}

const getNotIgnoredFiles = async (): Promise<string[]> => {
	await throughDirectory(path.resolve(__dirname, '..'));
	return files;
};

const shouldChunkFiles = (): boolean => {
	return process.platform === 'win32';
};

const checkFileSpelling = async (files: string[]): Promise<void> => {
	const resultPromise = spawnAsync('cspell', files);
	try {
		const { stdout, stderr } = await resultPromise;
		// when there is NO issue found
		console.log(stdout, stderr);
	} catch (e) {
		// when there is issue found
		const { stdout, stderr } = e;
		console.log(stdout, stderr);
		if (stdout !== '' && process.env.CI === 'true') {
			// when the issue is reported (e.g. when the word is NOT ignored)
			// use fail-fast approach for CI only
			process.exit(e.status);
		}
	}
};

const notIgnoredFiles = await getNotIgnoredFiles();

if (shouldChunkFiles()) {
	notIgnoredFiles.forEach((file: string): Promise<void> => checkFileSpelling([file]));
} else {
	await checkFileSpelling(notIgnoredFiles);
}

In short:

  1. Recursively get list of all files (do not go in folders that are ignored - avoids going through all node_modules)
  2. Spawn cspell with all not ignored files as arguments.

One downside is terrible performance on Windows, because when we spawn cspell - cmd.exe can't handle an extremely long list of arguments, there's some characters limit. So for now we just spawn hundreds of cspell that only check one file. We have a plan to optimize this in the future by doing smarted chunking. But it'll still be a pretty dirty workaround and it'd be awesome if cspell could take care of this check - then it'd be cleaner.

@Kurt-von-Laven
Copy link
Author

@Maxim-Mazurok, I may have misinterpreted your message, but it sounds like you are asking for a feature that @Jason3S already implemented? If I got that right, cspell's release cycle is typically very short, so I would encourage you to give the new support for .gitignore when it comes out in a few days. In case you are on an even tighter timeline, cspell has long offered an ignorePaths config option, so you might alternatively try dynamically setting that option from .gitignore files and only spawning cspell once on your entire project.

@Maxim-Mazurok
Copy link
Contributor

Maxim-Mazurok commented Oct 8, 2021

@Kurt-von-Laven, #1823 adds support for .gitignore files, but it doesn't take into account core.excludesFile, .git/info/exclude and any other ways you can ignore files in git.

My solution uses an actual git client to determine if the file is ignored or not.

My message had two objectives:

  1. Share my working code with the community
  2. Show that I agree with your point in Consider Parsing .gitignore #1319 (comment)

    there is quite a lot more complexity involved in matching Git's behavior than I had realized (e.g., handling the core.excludesFile setting and .git/info/exclude) to say nothing of differing versions of Git.

@Kurt-von-Laven
Copy link
Author

Oh, ha ha, that seems a lot more obvious in retrospect, but thank you for spelling it out for me.

@Jason3S
Copy link
Collaborator

Jason3S commented Oct 8, 2021

@Kurt-von-Laven and @Maxim-Mazurok,

Thank you for the lively discussion. I implemented a basic .gitignore solution that only takes into account the .gitignore files. I figured that it was a good place to start. It fits the 80/20 rule. I think the simple solution will work for a large majority of the cases. The logic is such that it might check more files than needed.

@Maxim-Mazurok, thank you for sharing your solution. Hitting command-line limits is annoying. Please make a New Feature request on ways that would help you. I.e. cspell --files stdin to get the list of files from stdin.

Other than node, CSpell does not depend upon any external tools. This was and is intentional.

@Kurt-von-Laven
Copy link
Author

Excited to try it out! Yes, speaking only for one project, parsing .gitignore alone will certainly cover our needs for the foreseeable future.

@Maxim-Mazurok
Copy link
Contributor

I definitely agree about 80/20 and appreciate that you added gitignore support! I just usually try to hit 100 in my projects when I have resources :D So, I created feature request #1850 to accept stdio, I think it'll help.

If you need use-case example - imagine someone using Local History plugin for VS Code. It creates .history folder with all local file modifications. We don't necessarily want to add this folder to the project's .gitignore, because it doesn't really belong to the project, it's more of a personal choice to use this plugin or not. But we do have to ignore it somehow. The way I usually do that is by adding .history to .git/info/exclude.

Probably this use-case can be worked around by adding project/../.gitignore and then setting --gitignore-root to project/.., but this looks hacky to me.

@github-actions
Copy link
Contributor

github-actions bot commented Nov 8, 2021

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants