Large svg files being generated for datasets having more than a million records #48

Closed
vinnsvinay opened this issue Aug 21, 2019 · 8 comments
Labels
question This was a question not a bug


@vinnsvinay

I am trying to visualize a decision tree from one of the random forest regression models I have built, and I have noticed that the SVG file is around 1.3 GB. I reduced the depth of the model and switched to the non-fancy visualization (fancy=False), but it still generates an SVG file of more than 250 MB, which then fails to render. The files are this large because of the histograms being generated at the leaf nodes. Is there any way to get a simple output with the following fields:

  1. Number of records
  2. Prediction

I have gone through the code and it looks like I need to change the regr_leaf_viz function in order to suppress the histogram. Please let me know what the next steps would be.

Thanks,
Vinay
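
The two fields requested above (number of records and prediction per leaf) can also be computed as plain text, without drawing anything. The sketch below is a minimal illustration and not part of the library discussed here: `summarize_leaves` is a hypothetical helper, and it assumes a regression tree whose leaf prediction is the mean of the training targets that reach the leaf. With scikit-learn, the leaf index of each record can be obtained from the fitted tree's `apply(X)` method.

```python
from collections import defaultdict


def summarize_leaves(leaf_ids, targets):
    """Return {leaf_id: (number_of_records, prediction)} for a regression tree.

    leaf_ids[i] is the leaf that record i falls into and targets[i] is its
    target value. The prediction is taken to be the mean target in the leaf,
    which is an assumption about how the tree was trained.
    """
    groups = defaultdict(list)
    for leaf_id, target in zip(leaf_ids, targets):
        groups[leaf_id].append(target)
    # One (count, mean) pair per leaf, ordered by leaf id.
    return {
        leaf_id: (len(values), sum(values) / len(values))
        for leaf_id, values in sorted(groups.items())
    }
```

For example, `summarize_leaves([3, 3, 5], [1.0, 3.0, 10.0])` groups the first two records into leaf 3 and the third into leaf 5. Note that trees inside a random forest are usually fit on bootstrap samples, so means computed from the full training set may differ slightly from the values stored in the tree.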

@parrt
Owner

parrt commented Aug 21, 2019

Hiya. Can you use the "non-fancy" version? There's a parameter for it: "fancy=False  # fancy=False to remove histograms/scatterplots from decision nodes"

parrt added the "question" (This was a question not a bug) label on Aug 21, 2019
parrt closed this as completed on Aug 21, 2019
@vinnsvinay
Author

Hi Terence,

I have used fancy=False, but even then the image is very large. That is because of the scatter plots drawn in the leaf nodes. Is there an option to disable scatter plots for leaf nodes? I am currently changing the code to disable the scatter plots at the leaf nodes, but that just creates empty plots and makes the tree look very ugly.

Thanks,
Vinay

@parrt
Owner

parrt commented Aug 21, 2019

Oh!! Yeah, SVG is clean but huge. Can you generate that image and then convert it to PNG?

@vinnsvinay
Author

I am trying to use online converters, but none of them accept SVG files over 250 MB. I have edited the code in the regr_leaf_viz function to disable drawing the scatter plot, but that makes the tree look very ugly. I will try some other visualizations to improve the leaf nodes.

@parrt
Owner

parrt commented Aug 21, 2019

Can you try saving the SVG file to disk and THEN converting it?

@vinnsvinay
Author

I saved the SVG file using the save_svg function and then used online converters to convert it to JPG/PNG format.

@parrt
Owner

parrt commented Aug 21, 2019

Why not call convert or something similar via os.system() or subprocess? (i.e., standard commands from the command line)
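
Running a converter locally as a command-line call, as suggested above, might look like the following sketch. It assumes that "convert" refers to ImageMagick's `convert` command and that it is installed and on the PATH; the helper names `build_convert_command` and `svg_to_png` are made up for this illustration. Because the file is read from local disk, the upload limits of online converters do not apply.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(svg_path, png_path, executable="convert"):
    """Build the argument list for an ImageMagick-style `convert IN OUT` call."""
    return [executable, str(svg_path), str(png_path)]


def svg_to_png(svg_path, png_path, executable="convert"):
    """Convert an SVG file on disk to PNG by running a local converter."""
    svg_path = Path(svg_path)
    if not svg_path.is_file():
        raise FileNotFoundError(f"no such SVG file: {svg_path}")
    if shutil.which(executable) is None:
        raise RuntimeError(f"{executable!r} was not found on PATH")
    command = build_convert_command(svg_path, png_path, executable)
    # check=True raises CalledProcessError if the converter exits non-zero.
    subprocess.run(command, check=True)
    return Path(png_path)
```

A call such as `svg_to_png("tree.svg", "tree.png")` would then run the converter on a file previously saved to disk (the file names are placeholders). On ImageMagick 7 installations the entry point is `magick`, which can be passed through the `executable` argument.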

@vinnsvinay
Author

I wasn't aware of them. I will try that. Thanks, Terence.
