Gerbolyze convert fails silently when an SVG contains a polygon with a large number of points. #46

Closed
Wulfsta opened this issue Mar 20, 2024 · 14 comments

Comments

@Wulfsta

Wulfsta commented Mar 20, 2024

Description

Gerbolyze fails to produce all layers contained in a template SVG generated by the empty-template sub-command when one of the layers contains a polygon with many points. The following can be used to generate a test SVG as well as demonstrate that the directory is lacking the expected set of files:

import numpy as np
import subprocess

from bs4 import BeautifulSoup
from pathlib import Path

def generate_polygon_and_append(soup, layer, radius, point_count=5e5, **kwargs):
    theta_array = np.linspace(0, 2*np.pi, num=int(point_count))
    # don't remember how numpy concat works rn so do it ugly
    point_array = np.array([radius * np.cos(theta_array), radius * np.sin(theta_array)]).transpose() + radius
    points = ' '.join([f'{x}, {y}' for (x, y) in point_array])
    layer.append(soup.new_tag('polygon', points=points, **kwargs))

def recursive_rmdir(directory):
    directory = Path(directory)
    for item in directory.iterdir():
        if item.is_dir():
            recursive_rmdir(item)
        else:
            item.unlink()
    directory.rmdir()

def main(output_dir='./gerbolyze_polygon_size_test', svg_template_name='gerbolyze_polygon_size_test.svg', point_count=512):
    output_dir_path = Path(output_dir).resolve()
    print(output_dir_path.as_posix())
    output_dir_path.mkdir(parents=True, exist_ok=True)
    svg_template_name_path = output_dir_path.joinpath(svg_template_name)
    if svg_template_name_path.exists():
        svg_template_name_path.unlink()

    maximum_radius = 50

    # Call Gerbolyze to generate an empty template SVG.
    command = ['gerbolyze', 'empty-template', f'--size {2*(maximum_radius)}x{2*(maximum_radius)}mm', svg_template_name_path.as_posix()]
    subprocess.run(' '.join(command), shell=True)

    # Open file we just generated.
    with svg_template_name_path.open() as svg_template:
        svg_template_soup = BeautifulSoup(svg_template, 'xml')

    # Construct SVG.
    for layer in svg_template_soup.find_all('g'):
        if layer['id'] == 'g-top-copper':
            generate_polygon_and_append(svg_template_soup, layer, maximum_radius)
        if layer['id'] == 'g-bottom-copper':
            generate_polygon_and_append(svg_template_soup, layer, maximum_radius)

    with svg_template_name_path.open('w') as svg_template:
        svg_template.write(svg_template_soup.prettify())

    # Call Gerbolyze program to convert SVG to Gerber.
    gerber_dir_path = output_dir_path.joinpath('gerber')
    if gerber_dir_path.exists():
        recursive_rmdir(gerber_dir_path)
    command = ['gerbolyze', 'convert', svg_template_name_path.as_posix(), gerber_dir_path.as_posix()]
    subprocess.run(' '.join(command), shell=True)

if __name__=='__main__':
    main()

This Python script creates a directory called gerbolyze_polygon_size_test, populates it with a template SVG generated by Gerbolyze, adds a large polygon (5e5 points) to each of the top and bottom copper layers, then calls Gerbolyze to convert this SVG to Gerber files in a new subdirectory called gerber. The result on my machine is as follows:

$ ls -hall ./gerbolyze_polygon_size_test/gerber_validation 
total 13M
drwxr-xr-x 2 luke users   6 Mar 19 21:52 .
drwxr-xr-x 4 luke users   5 Mar 19 21:52 ..
-rw-r--r-- 1 luke users 13M Mar 19 21:52 gerbolyze_polygon_size_test-F.Cu.gbr
-rw-r--r-- 1 luke users  86 Mar 19 21:52 gerbolyze_polygon_size_test-F.Mask.gbr
-rw-r--r-- 1 luke users  86 Mar 19 21:52 gerbolyze_polygon_size_test-F.Paste.gbr
-rw-r--r-- 1 luke users  86 Mar 19 21:52 gerbolyze_polygon_size_test-F.SilkS.gbr

Notably, this is missing the bottom copper layer, which we know in this case has elements in the SVG.
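
Until the tool reports the failure itself, a caller can at least detect the missing files after the fact. The following is a minimal sketch, not part of Gerbolyze: `missing_layers` and `EXPECTED_SUFFIXES` are hypothetical names, the front-side suffixes are taken from the listing above, and the back-side suffixes are an assumption based on the same naming scheme.

```python
from pathlib import Path

# Front-side suffixes match the listing above; the back-side ones are an
# assumption that the same naming scheme is used for the bottom layers.
EXPECTED_SUFFIXES = ('F.Cu', 'F.Mask', 'F.Paste', 'F.SilkS',
                     'B.Cu', 'B.Mask', 'B.Paste', 'B.SilkS')

def missing_layers(gerber_dir, basename, suffixes=EXPECTED_SUFFIXES):
    """Return the suffixes for which no '<basename>-<suffix>.gbr' file exists."""
    gerber_dir = Path(gerber_dir)
    return [s for s in suffixes
            if not (gerber_dir / f'{basename}-{s}.gbr').is_file()]
```

Calling `missing_layers(gerber_dir_path, 'gerbolyze_polygon_size_test')` at the end of `main()` and raising an error on a non-empty result would turn the silent failure into a loud one.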

Expected Behavior

First, Gerbolyze should not fail silently in these cases - there should be some sort of warning or failure reported, which is not the case when I run this (from my shell; the Python script is not hiding any stdout). Second, this probably shouldn't fail at all - the layers that are written appear to be correct when zipped and viewed with the Gerber viewer tool in KiCad. This level of resolution is useful for me, as I am using geometry from the boundary edges of a face of a high-resolution, manifold mesh to construct polygons in the template SVGs.

@Wulfsta Wulfsta changed the title from "Gerbolyze fails silently when an SVG contains a polygon with a large number of points." to "Gerbolyze convert fails silently when an SVG contains a polygon with a large number of points." on Mar 22, 2024
@Wulfsta
Author

Wulfsta commented Jul 4, 2024

@jaseg any chance you've looked into this? I tried working around it and have not found a nice way to do so - it is quite limiting for the convert command.

@jaseg
Owner

jaseg commented Jul 4, 2024

Thank you for the report. I just looked into it on my fast machine and I'm able to reproduce this. I'll have a look.

@jaseg
Owner

jaseg commented Jul 4, 2024

Some preliminary observations:

  1. This issue is non-deterministic. When running svg-flatten as below on the same file, for both g-top-copper (at the front of the file) and for g-bottom-copper (at the back of the file), sometimes the output Gerber is empty, and sometimes it is not.
  2. When gerbolyze convert reads the SVG with beautifulsoup, no <g> elements past the first, big g-top-copper element are ever returned by soup.find_all('g', recursive=True).

Overall, weird. BeautifulSoup IIRC uses lxml, and svg-flatten uses pugixml, so this shouldn't be a parser issue.
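
One way to catch a parser silently dropping elements is to compare its result against a crude scan of the raw text. This is only a sketch under loud assumptions: `raw_group_ids` and `dropped_group_ids` are hypothetical helpers, the regex assumes double-quoted `id` attributes and no `>` inside the attributes of a `<g>` start tag, and it is a sanity check, not a replacement for a real parser.

```python
import re

def raw_group_ids(svg_text):
    """Collect the id attributes of <g> start tags by scanning the raw text."""
    ids = []
    # The lookahead avoids matching other tags that start with 'g', e.g. <glyph>.
    for tag in re.finditer(r'<g(?=[\s>/])[^>]*>', svg_text):
        match = re.search(r'\sid="([^"]*)"', tag.group(0))
        if match:
            ids.append(match.group(1))
    return ids

def dropped_group_ids(svg_text, parsed_ids):
    """Return ids that appear in the raw text but not in a parser's result."""
    parsed = set(parsed_ids)
    return [i for i in raw_group_ids(svg_text) if i not in parsed]
```

Feeding it the ids returned by `soup.find_all('g', recursive=True)` should yield an empty list for a healthy parse and the missing layer ids for a truncated one.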

@jaseg
Owner

jaseg commented Jul 4, 2024

This is definitely an XML parser issue. Using the 36MB test file generated by the reproducer above, the following script clearly demonstrates that BeautifulSoup misparses the test file when used with features="lxml-xml", which is its XML parsing option. Interestingly, it seems to correctly parse the file when used with features="lxml", which is the LXML HTML parser option.

#!/usr/bin/env python

from pathlib import Path
from xml.etree import ElementTree
from bs4 import BeautifulSoup, diagnose

input_svg = Path('long_attr_test.svg')

print('BeautifulSoup:')
soup = BeautifulSoup(input_svg.read_text(), features='lxml-xml')
layers = {e.get('id'): e.get('inkscape:label') for e in soup.find_all('g', recursive=True)}
print('found layers:', layers)

print('ElementTree:')
root = ElementTree.fromstring(input_svg.read_text())
# This API sucks
ns = {'svg': 'http://www.w3.org/2000/svg',
      'inkscape': 'http://www.inkscape.org/namespaces/inkscape'}
layers = {e.attrib['id']: e.get(f'{{{ns["inkscape"]}}}label') for e in root.iterfind('svg:g', ns)}
print('found layers:', layers)

Output:

BeautifulSoup:
found layers: {'g-top-paste': 'top paste', 'g-top-silk': 'top silk', 'g-top-mask': 'top mask', 'g-top-copper': 'top copper'}
ElementTree:
found layers: {'g-top-paste': 'top paste', 'g-top-silk': 'top silk', 'g-top-mask': 'top mask', 'g-top-copper': 'top copper', 'g-bottom-copper': 'bottom copper', 'g-bottom-mask': 'bottom mask', 'g-bottom-silk': 'bottom silk', 'g-bottom-paste': 'bottom paste', 'g-mechanical-outline': 'mechanical outline', 'g-drill-plated': 'drill plated', 'g-drill-nonplated': 'drill nonplated', 'g-other-comments': 'other comments'}

I suppose an easy, albeit annoying fix would be to just raise an issue upstream with BeautifulSoup while also switching out BeautifulSoup for something more competent. I don't want to go ETree because ETree's API sucks really bad when namespaces are involved, but given that there isn't that much code that needs to parse and modify XML here anyway I might just bite the bullet.

@Wulfsta
Author

Wulfsta commented Jul 4, 2024

That is weird, I didn’t catch that it was nondeterministic. It seems like there must be some issue with the parser if BeautifulSoup is not returning all elements? That or the call to usvg prior to when that search happens is partially failing silently?

@jaseg
Owner

jaseg commented Jul 4, 2024

The input to BeautifulSoup is actually the unprocessed input svg, not the usvg output.

I'm currently having trouble reproducing the issue again, but I'll keep trying. I'm reasonably confident it's not in usvg because usvg is pretty good code, and also I've had usvg churn through about 10k instances of the test SVG with no evidence of any mis-parsing. I suspect pugixml. "18MB in a single attr" might just be asking too much of it. I'll see if I can fix it or if I'll have to swap it out for another parser.

edit: to be clear, I think we have two separate issues here that are both triggered by the deluxe-length attrs:

  1. gerbolyze convert "misses" the bottom layers (and also the drill layers) b/c BeautifulSoup silently eats everything past the first long attr
  2. svg-flatten, when called by gerbolyze, sometimes returns an empty output file when asked to process the chonky svg. This happens for both top and bottom layer.

@Wulfsta
Author

Wulfsta commented Jul 4, 2024

> deluxe-length attrs

Heh.

Yeah, sorry for the unusual bug report - I guess it generally isn't required to parse an attribute this large, though I am admittedly surprised to see it fail... I wonder if there is a particular threshold it fails after? Perhaps it is an overflow or similar?
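
If there is a threshold, a bisection over the attribute length would find it quickly. The sketch below assumes failures are monotonic in attribute length, which has not been verified here. `find_failure_threshold`, `make_test_svg` and `stdlib_sees_marker` are hypothetical names, and the test SVG only needs to be well-formed XML, not a valid polygon. The stdlib probe is just a placeholder showing the shape of a probe function; the parser under suspicion would go in its place.

```python
import xml.etree.ElementTree as ET

SVG_NS = 'http://www.w3.org/2000/svg'

def make_test_svg(attr_len):
    """SVG with one attribute of attr_len characters, followed by a marker group."""
    return (f'<svg xmlns="{SVG_NS}">'
            f'<g id="first"><polygon points="{"0" * attr_len}"/></g>'
            '<g id="marker"/></svg>')

def stdlib_sees_marker(attr_len):
    """Example probe: does the parser still see the group after the long attr?"""
    root = ET.fromstring(make_test_svg(attr_len))
    return any(g.get('id') == 'marker' for g in root.iter(f'{{{SVG_NS}}}g'))

def find_failure_threshold(parses_ok, lo, hi):
    """Smallest n in (lo, hi] for which parses_ok(n) is False.

    Requires parses_ok(lo) to be True and parses_ok(hi) to be False.
    """
    if not parses_ok(lo) or parses_ok(hi):
        raise ValueError('need parses_ok(lo) == True and parses_ok(hi) == False')
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if parses_ok(mid):
            lo = mid  # still fine at mid, threshold is above it
        else:
            hi = mid  # already broken at mid, threshold is at or below it
    return hi
```

A threshold that lands on a round number would point at a configured limit rather than an overflow.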

@jaseg
Owner

jaseg commented Jul 4, 2024

Sounds like some sort of overflow to me too. My next step is going to be to do some test runs with svg-flatten inside valgrind to see if it's a memory safety issue, and if it's not, to just look at pugixml's source.

Thanks for the issue, this is a nice challenge :)

jaseg pushed a commit that referenced this issue Jul 5, 2024
BeautifulSoup when using lxml in XML mode would mis-parse XML with very
long attributes. Specifically, a <polygon> with about 18MB in its points
attr would make lxml not return anything past that point in the file.

bs4 uses lxml, which uses libxml2. libxml2 has a config option for
parsing "huge" files that increases buffer sizes and avoids this error,
and this option is exposed in lxml, but AFAICT you can't tell bs4 to set
it, and bs4 just silently swallows the error from lxml.

Fixes one half of #46
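
For readers who call lxml directly, the "huge" option mentioned in the commit message is the `huge_tree` argument of `lxml.etree.XMLParser`. The following is a sketch of that approach and not necessarily what the commit itself does. `parse_svg_bytes` and `layer_ids` are hypothetical helpers, and the fallback to the standard library (whose parser is built on expat, not libxml2) is an addition made here so the snippet also runs without lxml installed.

```python
import xml.etree.ElementTree as StdlibET

try:
    from lxml import etree as LxmlET
except ImportError:  # lxml not installed; fall back to the stdlib parser
    LxmlET = None

SVG_NS = 'http://www.w3.org/2000/svg'

def parse_svg_bytes(data):
    """Parse SVG bytes, lifting libxml2's size limits when lxml is available."""
    if LxmlET is not None:
        parser = LxmlET.XMLParser(huge_tree=True)
        return LxmlET.fromstring(data, parser)
    return StdlibET.fromstring(data)

def layer_ids(root):
    """Return the ids of all <g> elements in document order."""
    return [g.get('id') for g in root.iter(f'{{{SVG_NS}}}g')]
```

Passing bytes rather than text sidesteps lxml's refusal to parse a Python string that carries an XML encoding declaration.
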
jaseg pushed a commit that referenced this issue Jul 5, 2024
The test processes an SVG file of ~36MB with about 500k points per
layer, so it's a bit slow.
@jaseg
Owner

jaseg commented Jul 5, 2024

Ok, after the flakiness last night, I thoroughly tested both svg-flatten and usvg, and they both performed without any issue. At this point, I chalk up the non-deterministic behavior I saw to user error on my part.

AFAICT this issue is fixed now on main and in v3.1.9, which will hit PyPI within the next few hours.

@jaseg jaseg closed this as completed Jul 5, 2024
@Wulfsta
Author

Wulfsta commented Jul 6, 2024

Awesome, thanks! Does this deserve an upstream report to bs4?

@Wulfsta
Author

Wulfsta commented Jul 6, 2024

Also, it's a bit strange that bs4 is happily writing the massive files, but not reading them...

@jaseg
Owner

jaseg commented Jul 7, 2024

> Awesome, thanks! Does this deserve an upstream report to bs4?

I've reported this issue with bs4. For now, I've left the issue report and reproducer on their tracker marked private because there is a chance that this is a buffer overflow of some sort.

@Wulfsta
Author

Wulfsta commented Sep 4, 2024

Hey @jaseg, any update from bs4's side of things?

@jaseg
Owner

jaseg commented Sep 10, 2024

@Wulfsta The behavior was because bs4 used some LXML API in a weird way, which led to bs4 silently eating an error. I've fixed this issue in gerbolyze in bd2b373 by simply moving all the SVG parsing code to etree.
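
For anyone adapting their own scripts the same way, here is a rough sketch of layer lookup and modification with the standard library's ElementTree. It is not the code from bd2b373. `find_layers` and `add_polygon` are hypothetical helpers, and registering the SVG namespace as the default prefix is a choice made here to keep the serialized output free of `ns0:` prefixes.

```python
import xml.etree.ElementTree as ET

NS = {'svg': 'http://www.w3.org/2000/svg',
      'inkscape': 'http://www.inkscape.org/namespaces/inkscape'}

# Keep the familiar prefixes when the tree is serialized again; SVG becomes
# the default namespace, so elements are written as <g>, not <ns0:g>.
for prefix, uri in NS.items():
    ET.register_namespace('' if prefix == 'svg' else prefix, uri)

def find_layers(root):
    """Map layer id -> inkscape:label for the top-level <g> elements."""
    label_attr = f'{{{NS["inkscape"]}}}label'
    return {g.get('id'): g.get(label_attr) for g in root.iterfind('svg:g', NS)}

def add_polygon(layer, points):
    """Append a <polygon> to a layer; points is an iterable of (x, y) pairs."""
    polygon = ET.SubElement(layer, f'{{{NS["svg"]}}}polygon')
    polygon.set('points', ' '.join(f'{x},{y}' for x, y in points))
    return polygon
```

Writing the tree back with `ET.ElementTree(root).write(path)` and re-reading it with `ET.parse(path)` goes through the same expat-based parser that found all twelve layers in the comparison earlier in this thread.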
