
[ML] Fixes incorrect feature importance visualization for Data Frame Analytics classification #150816

Merged — 7 commits, Feb 14, 2023
@@ -124,7 +124,7 @@ export const DataGrid: FC<Props> = memo(
analysisType === ANALYSIS_CONFIG_TYPE.OUTLIER_DETECTION
) {
if (schema === 'featureImportance') {
-        const row = data[rowIndex];
+        const row = data[rowIndex - pagination.pageIndex * pagination.pageSize];
if (!row) return <div />;
// if resultsField for some reason is not available then use ml
const mlResultsField = resultsField ?? DEFAULT_RESULTS_FIELD;
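The hunk above fixes a page-offset bug: the grid reports an absolute row index across all pages, while `data` only holds the rows of the current page. A minimal sketch of the index arithmetic, with hypothetical names (`Pagination`, `getPageLocalRow` are illustrative, not the PR's actual helpers):

```typescript
interface Pagination {
  pageIndex: number; // zero-based page number
  pageSize: number;  // rows per page
}

// Shift an absolute row index into the coordinates of the current page's
// data array; returns undefined when the index falls outside the page.
function getPageLocalRow<T>(
  data: T[],
  rowIndex: number,
  pagination: Pagination
): T | undefined {
  return data[rowIndex - pagination.pageIndex * pagination.pageSize];
}
```

On page 2 (`pageIndex: 1`) with a page size of 25, absolute row 26 maps to local row 1; without the subtraction, the lookup would index past the 25-element page array and render an empty cell.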
@@ -21,6 +21,7 @@ import type {
} from '../../../../../../../common/types/feature_importance';
import { DecisionPathChart } from './decision_path_chart';
import { MissingDecisionPathCallout } from './missing_decision_path_callout';
+import { TopClass } from '../../../../../../../common/types/feature_importance';

interface ClassificationDecisionPathProps {
predictedValue: string | boolean;
@@ -42,12 +43,20 @@ export const ClassificationDecisionPath: FC<ClassificationDecisionPathProps> = (
const [currentClass, setCurrentClass] = useState<string>(
getStringBasedClassName(topClasses[0].class_name)
);
+  const selectedClass = topClasses.find(
+    (t) => getStringBasedClassName(t.class_name) === getStringBasedClassName(currentClass)
+  ) as TopClass;

Contributor comment:

The description of the fix refers to True and False base classes, but is this change also the cause of the change in behavior here with the bank_classification_1 MLQA bootstrap job, where before I was seeing values greater than 1 for the yes and no classes?

Before: (screenshot)

With this fix: (screenshot)

Contributor reply:

AFAICS, the fix makes total sense here: for the second row, the model predicts the class "no" with a probability of 0.986. This means that the prediction probability of "yes" is 0.014. This is precisely what the screenshot above shows now 🚀

The screenshot before this fix shows the decision graph briefly exceeding 1.0 for prediction probability, which is nonsense.
const predictedProbabilityForCurrentClass = selectedClass
? selectedClass.class_probability
: undefined;

const { decisionPathData } = useDecisionPathData({
baseline,
featureImportance,
predictedValue: currentClass,
-    predictedProbability,
+    predictedProbability: predictedProbabilityForCurrentClass,
});
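The change above looks up the probability of the class currently selected in the UI rather than always passing the top predicted class's probability. A self-contained sketch of that lookup, assuming the `TopClass` shape implied by the diff (`probabilityForClass` and the inlined `getStringBasedClassName` are illustrative stand-ins, not the PR's exports):

```typescript
interface TopClass {
  class_name: string | boolean;
  class_probability: number;
}

// Illustrative stand-in for the getStringBasedClassName helper used in the
// diff: normalizes boolean and string class names to comparable strings.
const getStringBasedClassName = (name: string | boolean): string => String(name);

// Return the model's probability for the class the user selected,
// or undefined when that class is absent from topClasses.
function probabilityForClass(
  topClasses: TopClass[],
  currentClass: string
): number | undefined {
  const selected = topClasses.find(
    (t) => getStringBasedClassName(t.class_name) === getStringBasedClassName(currentClass)
  );
  return selected?.class_probability;
}
```

For the binary job discussed in the comments, a row predicted "no" at 0.986 carries 0.014 for "yes"; feeding the per-class value into the decision path keeps the plotted probability for "yes" below 1.0.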

const options = useMemo(() => {
const predictionValueStr = getStringBasedClassName(predictedValue);

Expand Down