Exporting a decision tree where the feature_names or class_names contain special characters (particularly &, < and >) results in invalid graphviz output, as those characters have specific meanings to graphviz. Escaping them as &amp;, &lt; and &gt; results in correct output. This can of course be done by the user, but it's something I think scikit-learn should handle internally.
Steps/Code to Reproduce
from sklearn.datasets import load_iris
from sklearn import tree

iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)
# The unescaped "&" in the first class name triggers the graphviz error
target_names = ["setosa & 123", "versicolor", "virginica"]
# target_names = ["setosa &amp; 123", "versicolor", "virginica"]  # This one works
tree.export_graphviz(
    clf,
    out_file="tree.dot",
    feature_names=iris.feature_names,
    class_names=target_names,
    filled=True,
    special_characters=True,
)
Then run graphviz
dot tree.dot -Tsvg -o tree.svg
Expected Results
Graphviz successfully converts to SVG without error.
Actual Results
Error: not well-formed (invalid token) in line 1
... <br/>class = setosa & 123 ...
in label of node 0
Error: not well-formed (invalid token) in line 1
... <br/>class = setosa & 123 ...
in label of node 1
Although SVG output is written to disk it is not correct.
So this is a bug, but at the same time I think its priority is rather low:
using tree.plot_tree is recommended instead of tree.export_graphviz. If there are things that you cannot do or don't like with tree.plot_tree, I would say that investing time in improving tree.plot_tree may be a more useful thing to do
if you really need to use graphviz output, a reasonable work-around is to escape special characters in target_names.
I am a bit worried about trying to support complicated things in the dot output. If this is a simple replacement for a few characters (&, < and >), why not. If you need to read the dot format spec for a few days and cover all the edge cases, I don't think this is worth our time.
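For anyone who needs the work-around in the meantime, here is a minimal sketch of the user-side escaping discussed above. It relies only on the standard-library html.escape; the helper name escape_graphviz_labels is hypothetical and not part of scikit-learn. It assumes the names have not already been escaped, since escaping twice would turn &amp; into &amp;amp;.

```python
from html import escape


def escape_graphviz_labels(names):
    """Return copies of the names with &, < and > replaced by HTML entities.

    Hypothetical helper, not a scikit-learn API.
    """
    # quote=False leaves ' and " untouched; only &, < and > are replaced.
    return [escape(str(name), quote=False) for name in names]


target_names = ["setosa & 123", "versicolor", "virginica"]
safe_names = escape_graphviz_labels(target_names)
print(safe_names)
```

The escaped list can then be passed as class_names (and the same helper applied to feature_names) in the tree.export_graphviz call from the report, which uses special_characters=True.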