Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@W-9729358@ Make CPD engine production ready #522

Merged
merged 1 commit into from
Aug 16, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
2 changes: 1 addition & 1 deletion messages/CpdEngine.js
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
module.exports = {
"CpdViolationMessage": "%s: %s of %s duplication segments detected. %s line(s), %s tokens.",
"CpdViolationMessage": "%s: %s of %s duplication segments detected. %s line(s), %s tokens."
};
3 changes: 2 additions & 1 deletion messages/EventKeyTemplates.js
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,8 @@ module.exports = {
"usingEngineConfigFile": "Using engine configuration file at %s",
"generalInternalLog": "Log from Java: %s",
"customEslintHeadsUp": "About to run Eslint with custom config in %s. Please make sure your current directory has all the required NPM dependencies.",
"customPmdHeadsUp": "About to run PMD with custom config in %s. Please make sure that any custom rule references have already been added to the plugin through scanner:rule:add command."
"customPmdHeadsUp": "About to run PMD with custom config in %s. Please make sure that any custom rule references have already been added to the plugin through scanner:rule:add command.",
"unmatchedPathExtensionCpd": "Path extensions for the following files will not be processed by CPD: %s"
},
"warning": {
"invalidCategorySkipped": "Cataloger skipped invalid PMD Category file '%s'.",
Expand Down
4 changes: 4 additions & 0 deletions messages/run.js
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,10 @@ module.exports = {
E.g., $ sfdx scanner:run --target "somefile.js" --engine "eslint-lwc,pmd"
Evaluates rules against somefile.js, using eslint-lwc and pmd engines.

Use --engine to invoke engines that are not enabled by default.
E.g, $ sfdx scanner:run --target "/some/dir" --engine cpd
Executes CPD engine against known file extensions in "/some/dir". CPD helps detect blocks of code duplication in selected languages.

To use PMD with your own rule reference file, use --pmdconfig. Note that rule filters are not applied.
E.g, $ sfdx scanner:run --target "src" --pmdconfig "pmd_rule_ref.xml"

Expand Down
16 changes: 0 additions & 16 deletions src/Constants.ts
Original file line number Diff line number Diff line change
Expand Up @@ -89,21 +89,5 @@ export enum Severity {
HIGH = 1
}

// CPD supported languages: [apex, ecmascript, java, vf, xml]
export const FILE_EXTS_TO_LANGUAGE: Map<string, LANGUAGE> = new Map([
// apex
['.cls', LANGUAGE.APEX],
['.trigger', LANGUAGE.APEX],
// java
['.java', LANGUAGE.JAVA],
// ecmascript
['.js', LANGUAGE.ECMASCRIPT],
// vf
['.component', LANGUAGE.VISUALFORCE],
['.page', LANGUAGE.VISUALFORCE],
// xml
['.xml', LANGUAGE.XML],
]);

// Here, current dir __dirname = <base_dir>/sfdx-scanner/src
export const PMD_LIB = path.join(__dirname, '..', 'dist', 'pmd', 'lib');
184 changes: 99 additions & 85 deletions src/lib/cpd/CpdEngine.ts
Original file line number Diff line number Diff line change
Expand Up @@ -7,38 +7,87 @@ import {Config} from '../util/Config';
import {ENGINE, LANGUAGE, Severity} from '../../Constants';
import * as engineUtils from '../util/CommonEngineUtils';
import CpdWrapper from './CpdWrapper';
import {FILE_EXTS_TO_LANGUAGE} from '../../Constants';
import {uxEvents} from '../ScannerEvents';
import crypto = require('crypto');
import * as EnvVariable from '../util/EnvironmentVariable';


Messages.importMessagesDirectory(__dirname);
const messages = Messages.loadMessages('@salesforce/sfdx-scanner', 'CpdEngine');
const eventMessages = Messages.loadMessages("@salesforce/sfdx-scanner", "EventKeyTemplates");

// CPD supported languages: [apex, java, vf, xml]
const FileExtToLanguage: Map<string, LANGUAGE> = new Map([
// apex
['cls', LANGUAGE.APEX],
['trigger', LANGUAGE.APEX],
// java
['java', LANGUAGE.JAVA],
// vf
['component', LANGUAGE.VISUALFORCE],
['page', LANGUAGE.VISUALFORCE],
// xml
['xml', LANGUAGE.XML],
]);


// exported for visibility in tests
export const CpdRuleName = 'copy-paste-detected';
export const CpdRuleDescription = 'Identify duplicate code blocks.';
export const CpdRuleCategory = 'Copy/Paste Detected';
export const CpdInfoUrl = 'https://pmd.github.io/latest/pmd_userdocs_cpd.html#refactoring-duplicates';
export const CpdViolationSeverity = Severity.LOW;
export const CpdLanguagesSupported: LANGUAGE[] = [...new Set (FileExtToLanguage.values())];
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, languages supported will be deduced from the file extensions to languages mapping.


export class CpdEngine extends AbstractRuleEngine {

public readonly ENGINE_ENUM: ENGINE = ENGINE.CPD;
public readonly ENGINE_NAME: string = ENGINE.CPD.valueOf();
private readonly ENGINE_ENUM: ENGINE = ENGINE.CPD;
private readonly ENGINE_NAME: string = ENGINE.CPD.valueOf();


private minimumTokens: number;
private logger: Logger;
private config: Config;
private initialized: boolean;
private cpdCatalog: Catalog;
private validCPDLanguages: LANGUAGE[];


private initialized = false;
jfeingold35 marked this conversation as resolved.
Show resolved Hide resolved

public getName(): string {
return this.ENGINE_NAME;
}

public async init(): Promise<void> {
if (this.initialized) {
return;
}
this.logger = await Logger.child(this.getName());
this.config = await Controller.getConfig();
this.minimumTokens = EnvVariable.getEnvVariableAsNumber(this.ENGINE_ENUM, EnvVariable.CONFIG_NAME.MINIMUM_TOKENS) || ( await this.config.getMinimumTokens(this.ENGINE_ENUM) );
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Environment variable would override config value for MinimumTokens.

this.initialized = true;
}

public getTargetPatterns(): Promise<TargetPattern[]> {
return this.config.getTargetPatterns(this.ENGINE_ENUM);
}

public getCatalog(): Promise<Catalog>{
return Promise.resolve(this.cpdCatalog);
const cpdCatalog = {
rules: [{
engine: this.ENGINE_NAME,
sourcepackage: this.ENGINE_NAME,
name: CpdRuleName,
description: CpdRuleDescription,
categories: [CpdRuleCategory],
rulesets: [],
languages: CpdLanguagesSupported,
defaultEnabled: true
}],
categories: [{
engine: this.ENGINE_NAME,
name: CpdRuleCategory,
paths: []
}],
rulesets: []
};
return Promise.resolve(cpdCatalog);
}

/* eslint-disable-next-line no-unused-vars, @typescript-eslint/no-unused-vars */
Expand Down Expand Up @@ -72,34 +121,47 @@ export class CpdEngine extends AbstractRuleEngine {
return results;
}

private sortPaths(targets: RuleTarget[]): Map<LANGUAGE, string[]> {
const languageToPaths = new Map();

private sortPaths(targets: RuleTarget[]): Map<LANGUAGE, string[]> {
const languageToPaths = new Map<LANGUAGE, string[]>();
const unmatchedPaths: string[] = [];
for (const target of targets) {
for (const path of target.paths) {
const i = path.lastIndexOf(".");
if (i === -1) {
uxEvents.emit('warning-always', `Target: '${path}' was not processed by CPD, no file extension found.`);
continue;
if (!this.matchPathToLanguage(path, languageToPaths)) {
// If language could not be identified, note down the path
unmatchedPaths.push(path);
}
const ext = path.substr(i).toLowerCase();
if (FILE_EXTS_TO_LANGUAGE.has(ext)) {
if (this.validCPDLanguages.includes(FILE_EXTS_TO_LANGUAGE.get(ext))) {
const language = FILE_EXTS_TO_LANGUAGE.get(ext);
if (languageToPaths.has(language)) {
languageToPaths.get(language).push(path);
} else {
languageToPaths.set(language, [path]);
}
} else {
uxEvents.emit('warning-always', `Target: '${path}' was not processed by CPD, language '${FILE_EXTS_TO_LANGUAGE.get(ext)}' not supported.`);
}
}
}

// Let user know about file paths that could not be matched
if (unmatchedPaths.length > 0) {
uxEvents.emit('info-verbose', eventMessages.getMessage('info.unmatchedPathExtensionCpd', [unmatchedPaths.join(",")]));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplified logic in sortPaths to emit information in only one path.

}

return languageToPaths;
}

/**
* Identify the language of a file using the file extension
* @param path to be examined
* @param languageToPaths map with entries of language to paths matched so far
* @returns true if the language was identifed and false if not
*/
private matchPathToLanguage(path: string, languageToPaths: Map<LANGUAGE, string[]>): boolean {
const ext = path.slice(path.lastIndexOf(".") + 1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this lowercase ext in case the user has a file named foo.Java or something else unexpected?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missed this. Will fix it.

if (ext) {
const language = FileExtToLanguage.get(ext.toLowerCase());
if (language && CpdLanguagesSupported.includes(language)) {
if (languageToPaths.has(language)) {
languageToPaths.get(language).push(path);
} else {
uxEvents.emit('warning-always', `Target: '${path}' was not processed by CPD, file extension '${ext}' not supported.`);
languageToPaths.set(language, [path]);
}
return true;
}
}
return languageToPaths;
return false;
}

private runLanguage(language: string, targetPaths: string[]): Promise<string>{
Expand All @@ -116,7 +178,7 @@ export class CpdEngine extends AbstractRuleEngine {


protected processStdOut(stdout: string): RuleResult[] {
let violations: RuleResult[] = [];
let ruleResults: RuleResult[] = [];

this.logger.trace(`Output received from CPD: ${stdout}`);

Expand All @@ -129,15 +191,15 @@ export class CpdEngine extends AbstractRuleEngine {

const duplications = cpdJson.elements[0].elements;
if (duplications) {
violations = this.jsonToRuleResults(duplications);
ruleResults = this.jsonToRuleResults(duplications);
}
}

if (violations.length > 0) {
if (ruleResults.length > 0) {
this.logger.trace('Found rule violations.');
}

return violations;
return ruleResults;
}

// eslint-disable-next-line @typescript-eslint/no-explicit-any
Expand All @@ -160,11 +222,11 @@ export class CpdEngine extends AbstractRuleEngine {
column: occ.attributes.column,
endLine: occ.attributes.endline,
endColumn: occ.attributes.endcolumn,
ruleName: this.cpdCatalog.rules[0].name,
severity: Severity.LOW,
ruleName: CpdRuleName,
severity: CpdViolationSeverity,
message: messages.getMessage("CpdViolationMessage", [codeFragmentID, occCount, occurences.length, duplication.attributes.lines, duplication.attributes.tokens]),
category: this.cpdCatalog.categories[0].name,
url: 'https://pmd.github.io/latest/pmd_userdocs_cpd.html#refactoring-duplicates',
category: CpdRuleCategory,
url: CpdInfoUrl
};
occCount++;

Expand All @@ -186,52 +248,6 @@ export class CpdEngine extends AbstractRuleEngine {
return ruleResults;
}


public async init(): Promise<void> {
if (this.initialized) {
return;
}
this.logger = await Logger.child(this.getName());
this.config = await Controller.getConfig();
this.initialized = true;
this.logger = await Logger.child(this.getName())

this.cpdCatalog = {
rules: [{
engine: this.ENGINE_ENUM.valueOf(),
sourcepackage: this.ENGINE_ENUM.valueOf(),
name: 'copy-paste-detected',
description: 'Identify duplicate code blocks.',
categories: ['Copy/Paste Detected'],
rulesets: [],
languages: await this.getLanguages(),
defaultEnabled: true
}],
categories: [{
engine: this.ENGINE_ENUM.valueOf(),
name: 'Copy/Paste Detected',
paths: []
}],
rulesets: []
};

this.minimumTokens = await this.config.getMinimumTokens(this.ENGINE_ENUM);
this.initialized = true;
this.validCPDLanguages = [LANGUAGE.APEX, LANGUAGE.JAVA, LANGUAGE.ECMASCRIPT, LANGUAGE.VISUALFORCE, LANGUAGE.XML];
}

private async getLanguages(): Promise<string[]> {
const languages: Set<string> = new Set();
for (const pattern of await this.config.getTargetPatterns(this.ENGINE_ENUM)){
const ext = pattern.substr(pattern.lastIndexOf(".")).toLowerCase();
if (FILE_EXTS_TO_LANGUAGE.has(ext)) {
languages.add(FILE_EXTS_TO_LANGUAGE.get(ext));
}
}
return Array.from(languages);
}


public matchPath(path: string): boolean {
this.logger.trace(`Engine CPD does not support custom rules: ${path}`);
return false;
Expand All @@ -251,6 +267,4 @@ export class CpdEngine extends AbstractRuleEngine {
return severity;
}



}
4 changes: 2 additions & 2 deletions src/lib/cpd/CpdWrapper.ts
Original file line number Diff line number Diff line change
Expand Up @@ -32,8 +32,8 @@ export default class CpdWrapper extends CommandLineSupport {
this.initialized = true;
}

protected async buildClasspath(): Promise<string[]> {
return [`${PMD_LIB}/*`];
protected buildClasspath(): Promise<string[]> {
return Promise.resolve([`${PMD_LIB}/*`]);
}

protected async buildCommandArray(): Promise<[string, string[]]> {
Expand Down
3 changes: 2 additions & 1 deletion src/lib/util/Config.ts
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,8 @@ const DEFAULT_CONFIG: ConfigContent = {
{
name: ENGINE.CPD,
targetPatterns: [
"**/*.cls","**/*.page","**/*.component","**/*.trigger"
"**/*.cls","**/*.trigger","**/*.java","**/*.page","**/*.component","**/*.xml",
"!**/node_modules/**","!**/*-meta.xml"
],
disabled: true,
minimumTokens: 100
Expand Down
31 changes: 31 additions & 0 deletions src/lib/util/EnvironmentVariable.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
/**
* Builds expected env variables in this format:
* SFDX_SCANNER.<ENGINE>.VARIABLE_NAME
*/

import { ENGINE } from "../../Constants";

const PREFIX = 'SFDX_SCANNER';
const SEPARATOR = '_';

export enum CONFIG_NAME {
MINIMUM_TOKENS = 'Minimum_Tokens'
}
Comment on lines +11 to +13
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New enum to track config values that can be fetched through environment variables. I've also attempted to standardize the access in this new module.


function getEnvVariableName(engine: ENGINE, configName: CONFIG_NAME): string {
return `${PREFIX}${SEPARATOR}${engine.toUpperCase()}${SEPARATOR}${configName.toUpperCase()}`;
}

export function getEnvVariableAsString(engine: ENGINE, configName: CONFIG_NAME): string {
const envVariableName = getEnvVariableName(engine, configName);
return process.env[envVariableName];
}

export function getEnvVariableAsNumber(engine: ENGINE, configName: CONFIG_NAME): number {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will currently return NaN when the environment variable isn't set or isn't a number.

Do we want it to return NaN if it's set, but not a number, but null if it isn't set?

Whatever we choose, can you add documentation describing the contract when the value isn't set or the value can't be parsed?

Copy link
Contributor Author

@rmohan20 rmohan20 Aug 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. We don't have a way to control what value the user may set. I see a suggestion on online forums to clean up the string before parsing:

parseInt(text.replace(/\D/g, ""),10);

where \D is a non-digit. Will add documentation for this as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbartolotta-sfdc Let me know what you think about the fix.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rmohan20 - this change looks good.

const envVariable = getEnvVariableAsString(engine, configName);
if (envVariable) {
// Clean up variable if it has any non-digit values
return parseInt(envVariable.replace(/\D/g, ""), 10);
}
return undefined;
}