-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Gerbolyze convert fails silently when an SVG contains a polygon with a large number of points. #46
Comments
@jaseg any chance you've looked into this? I tried working around it and have not found a nice way to do so - it is quite limiting for the |
Thank you for the report. I just looked into it on my fast machine and I'm able to reproduce this. I'll have a look. |
Some prelimirary observations:
Overall, weird. BeautifulSoup IIRC uses lxml, and svg-flatten uses pugixml, so this shouldn't be a parser issue. |
This is definitely a XML parser issue. Using the 36MB test file generated by the reproducer above, the following script clearly demonstrates that BeautifulSoup misparses the test file when used with #!/usr/bin/env python
from pathlib import Path
from xml.etree import ElementTree
from bs4 import BeautifulSoup, diagnose
input_svg = Path('long_attr_test.svg')
print('BeautifulSoup:')
soup = BeautifulSoup(input_svg.read_text(), features='lxml-xml')
layers = {e.get('id'): e.get('inkscape:label') for e in soup.find_all('g', recursive=True)}
print('found layers:', layers)
print('ElementTree:')
root = ElementTree.fromstring(input_svg.read_text())
# This API sucks
ns = {'svg': 'http://www.w3.org/2000/svg',
'inkscape': 'http://www.inkscape.org/namespaces/inkscape'}
layers = {e.attrib['id']: e.get(f'{{{ns["inkscape"]}}}label') for e in root.iterfind('svg:g', ns)}
print('found layers:', layers) Output:
I suppose an easy, albeit annoying fix would be to just raise an issue upstream with BeautifulSoup while also switching out BeautifulSoup for something more competent. I don't want to go ETree because ETree's API sucks really bad when namespaces are involved, but given that there isn't that much code that needs to parse and modify XML here anyway I might just bite the bullet. |
That is weird, I didn’t catch that it was nondeterministic. It seems like there must be some issue with the parser if BeautifulSoup is not returning all elements? That or the call to |
The input to BeautifulSoup is actually the unprocessed input svg, not the usvg output. I'm currently having trouble reproducing the issue again, but I'll keep trying. I'm reasonably confident it's not in usvg because usvg is pretty good code, and also I've had usvg churn through about 10k instances of the test SVG with no evidence of any mis-parsing. I suspect pugixml. "18MB in a single attr" might just be asking too much of it, I'll see if I can fix it of if I'll have to swap it out for another parser. edit: to be clear, I think we have two separate issues here that are both triggered by the deluxe-length attrs:
|
Heh. Yeah, sorry for the unusual bug report - I guess it generally isn't required to parse an attribute this large, though I am admittedly surprised to see it fail... I wonder if there is a particular threshold it fails after? Perhaps it is an overflow or similar? |
Sounds like some sort of overflow to me too. My next step is going to be to do some test runs with svg-flatten inside valgrind to see if it's a memory safetey issue, and if it's not to just look at pugixml's source. Thanks for the issue, this is a nice challenge :) |
BeautifulSoup when using lxml in XML mode would mis-parse XML with very long attributes. Specifically, a <polygon> with about 18MB in its points attr would make lxml not return anything past that point in the file. bs4 uses lxml, which uses libxml2. libxml2 has a config option for parsing "huge" files that increases buffer sizes and avoids this error, and this option is exposed in lxml, but AFAICT you can't tell bs4 to set it, and bs4 just silently swallows the error from lxml. Fixes one half of #46
The test processes an SVG file of ~36MB with about 500k points per layer, so it's a bit slow.
Ok, after the flakyness last night, I thoroughly tested both svg-flatten and usvg, and they both performed without any issue. At this point, I chalk up the non-deterministic behavior I saw to user error on my part. AFAICT this issue is fixed now on main and in v3.1.9, which will hit PyPI within the next hours. |
Awesome, thanks! Does this deserve an upstream report to bs4? |
Also, it's a bit strange that bs4 is happily writing the massive files, but not reading them... |
I've reported this issue with bs4. For now, I've left the issue report and reproducer on their tracker marked private because there is a chance that this is a buffer overflow of some sort. |
Hey @jaseg, any update from bs4's side of things? |
Description
Gerbolyze fails to produce all layers contained in a template SVG generated by the
empty-template
sub-command when one of the layers contains a polygon with many points. The following can be used to generate a test SVG as well as demonstrate that the directory is lacking the expected set of files:This python script will create a directory called
gerbolyze_polygon_size_test
, populate it with a template SVG generated by Gerbolyze with two large polygons (with 5e5 points) added to the top and bottom copper layers, then call Gerbolyze to convert this SVG to Gerber files in a new subdirectory calledgerber
. The result of this on my machine appears as follows:Notably, this is missing the bottom copper layer, which we know in this case has elements in the SVG.
Expected Behavior
First, Gerbolyze should not fail silently in these cases - there should be some sort of warning or failure reported, which is not the case when I run this (from my shell, the python script is not hiding any stdout). Second, this probably shouldn't fail at all - the layers that are written appear to be correct when zipped and viewed with the Gerber viewer tool in KiCad. This level of resolution is useful for me, as I am using geometry from the boundary edges of a face of a high-resolution, manifold mesh to construct polygons in the template SVGs.
The text was updated successfully, but these errors were encountered: