Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Raise ValueError if Symbol does not print to valid C variable name. #26360

Merged
merged 4 commits into from Mar 23, 2024

Conversation

moorepants
Copy link
Member

@moorepants moorepants commented Mar 16, 2024

References to other Issues or PRs

Brought up in opty at csu-hmc/opty#139

This somewhat addresses:

#7767

Brief description of what is fixed or changed

In C code generation processes there can be compile errors because some SymPy symbols do not print to valid C variable names. By the time the C code is compiled the error may not be clearly propagated and explained to the user. One option is to tell people up front at code printing time that they have an invalid symbol name. This will at least give a clear error message instead of a possibly obfuscated or hidden compile error.

Maybe the desired solution is to convert a Symbol.name into a valid C variable name instead of raising an error. I suppose that people could be printing invalid C code and then fixing it post printing before they compile the C code. If so, this "fix" could break their code. I'll throw this out here to see what people think, but maybe it is not a good solution.

Other comments

Release Notes

  • printing
    • Raise ValueError if Symbol does not print to valid C variable name.

@sympy-bot
Copy link

sympy-bot commented Mar 16, 2024

Hi, I am the SymPy bot. I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • printing
    • Raise ValueError if Symbol does not print to valid C variable name. (#26360 by @moorepants)

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.13.

Click here to see the pull request description that was parsed.
<!-- Your title above should be a short description of what
was changed. Do not include the issue number in the title. -->

#### References to other Issues or PRs
<!-- If this pull request fixes an issue, write "Fixes #NNNN" in that exact
format, e.g. "Fixes #1234" (see
https://tinyurl.com/auto-closing for more information). Also, please
write a comment on that issue linking back to this pull request once it is
open. -->

Brought up in opty at https://github.com/csu-hmc/opty/issues/139

This somewhat addresses:

https://github.com/sympy/sympy/issues/7767

#### Brief description of what is fixed or changed

In C code generation processes there can be compile errors because some SymPy symbols do not print to valid C variable names. By the time the C code is compiled the error may not be clearly propagated and explained to the user. One option is to tell people up front at code printing time that they have an invalid symbol name. This will at least give a clear error message instead of a possibly obfuscated or hidden compile error.

Maybe the desired solution is to convert a `Symbol.name` into a valid C variable name instead of raising an error. I suppose that people could be printing invalid C code and then fixing it post printing before they compile the C code. If so, this "fix" could break their code. I'll throw this out here to see what people think, but maybe it is not a good solution.

#### Other comments


#### Release Notes

<!-- Write the release notes for this release below between the BEGIN and END
statements. The basic format is a bulleted list with the name of the subpackage
and the release note for this PR. For example:

* solvers
  * Added a new solver for logarithmic equations.

* functions
  * Fixed a bug with log of integers. Formerly, `log(-x)` incorrectly gave `-log(x)`.

* physics.units
  * Corrected a semantical error in the conversion between volt and statvolt which
    reported the volt as being larger than the statvolt.

or if no release note(s) should be included use:

NO ENTRY

See https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more
information on how to write release notes. The bot will check your release
notes automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->

* printing
  * Raise ValueError if Symbol does not print to valid C variable name.

<!-- END RELEASE NOTES -->

Update

The release notes on the wiki have been updated.

@tjstienstra
Copy link
Contributor

I would prefer compatibility with {, } in the c-code printer, like with lambdify. However, raising an error is a good temporary solution, as it at least informs users why their code does not work.

To have it documented as a reference, here is where the actual replacement of { and } happens when using lambdify:

@moorepants
Copy link
Member Author

Replacing the curly braces is a fix, but we also have to replace all symbols but the valid symbols for the complete fix. The Python printer replace statement is a bandaid for just that invalid symbol. I can do that but it doesn't really solve the full issue.

@smichr
Copy link
Member

smichr commented Mar 16, 2024

Maybe the desired solution is to convert a Symbol.name into a valid C variable name instead of raising an error.

I would leave the burden on the user to either create simple variable names that they hand-tune later or else use names they know will work in C. I think what you have here is fine.

sympy/printing/c.py Outdated Show resolved Hide resolved
@bjodah
Copy link
Member

bjodah commented Mar 22, 2024

I like the sentiment, but I also think there are valid use cases for names such as out[i*N+j] when generating code for use in e.g. templates.
Maybe a subset of characters can be checked for even without an opt-in printing option: e.g. the preprocessor tag '#' cannot be used in an expression in C.

As long as we support out[i*N+j] and similar "hacks" that might be used in codegen I'm all for this PR.

sympy/printing/c.py Outdated Show resolved Hide resolved
Copy link

github-actions bot commented Mar 23, 2024

Benchmark results from GitHub Actions

Lower numbers are good, higher numbers are bad. A ratio less than 1
means a speed up and greater than 1 means a slowdown. Green lines
beginning with + are slowdowns (the PR is slower then master or
master is slower than the previous release). Red lines beginning
with - are speedups.

Significantly changed benchmark results (PR vs master)

Significantly changed benchmark results (master vs previous release)

| Change   | Before [a00718ba]    | After [4fd4517a]    |   Ratio | Benchmark (Parameter)                                                |
|----------|----------------------|---------------------|---------|----------------------------------------------------------------------|
| -        | 69.6±1ms             | 43.8±0.2ms          |    0.63 | integrate.TimeIntegrationRisch02.time_doit(10)                       |
| -        | 68.4±0.5ms           | 43.4±0.2ms          |    0.63 | integrate.TimeIntegrationRisch02.time_doit_risch(10)                 |
| +        | 18.4±0.3μs           | 29.7±0.1μs          |    1.62 | integrate.TimeIntegrationRisch03.time_doit(1)                        |
| -        | 5.35±0.02ms          | 2.84±0.01ms         |    0.53 | logic.LogicSuite.time_load_file                                      |
| -        | 73.1±0.9ms           | 28.4±0.2ms          |    0.39 | polys.TimeGCD_GaussInt.time_op(1, 'dense')                           |
| -        | 25.7±0.1ms           | 16.9±0.05ms         |    0.66 | polys.TimeGCD_GaussInt.time_op(1, 'expr')                            |
| -        | 73.7±0.6ms           | 28.6±0.1ms          |    0.39 | polys.TimeGCD_GaussInt.time_op(1, 'sparse')                          |
| -        | 257±2ms              | 124±0.3ms           |    0.48 | polys.TimeGCD_GaussInt.time_op(2, 'dense')                           |
| -        | 255±0.7ms            | 123±0.2ms           |    0.48 | polys.TimeGCD_GaussInt.time_op(2, 'sparse')                          |
| -        | 658±3ms              | 369±2ms             |    0.56 | polys.TimeGCD_GaussInt.time_op(3, 'dense')                           |
| -        | 657±3ms              | 369±3ms             |    0.56 | polys.TimeGCD_GaussInt.time_op(3, 'sparse')                          |
| -        | 495±2μs              | 283±1μs             |    0.57 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(1, 'dense')            |
| -        | 1.76±0.01ms          | 1.06±0.01ms         |    0.6  | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(2, 'dense')            |
| -        | 5.77±0.03ms          | 3.09±0.01ms         |    0.53 | polys.TimeGCD_LinearDenseQuadraticGCD.time_op(3, 'dense')            |
| -        | 447±1μs              | 227±0.6μs           |    0.51 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(1, 'dense')               |
| -        | 1.49±0.01ms          | 681±2μs             |    0.46 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(2, 'dense')               |
| -        | 4.92±0.05ms          | 1.66±0.02ms         |    0.34 | polys.TimeGCD_QuadraticNonMonicGCD.time_op(3, 'dense')               |
| -        | 377±1μs              | 206±2μs             |    0.55 | polys.TimeGCD_SparseGCDHighDegree.time_op(1, 'dense')                |
| -        | 2.45±0.02ms          | 1.24±0ms            |    0.51 | polys.TimeGCD_SparseGCDHighDegree.time_op(3, 'dense')                |
| -        | 10.1±0.07ms          | 4.30±0.01ms         |    0.42 | polys.TimeGCD_SparseGCDHighDegree.time_op(5, 'dense')                |
| -        | 361±2μs              | 169±2μs             |    0.47 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(1, 'dense')            |
| -        | 2.49±0.02ms          | 912±5μs             |    0.37 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(3, 'dense')            |
| -        | 9.67±0.03ms          | 2.65±0.03ms         |    0.27 | polys.TimeGCD_SparseNonMonicQuadratic.time_op(5, 'dense')            |
| -        | 1.06±0.01ms          | 431±4μs             |    0.41 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'dense')           |
| -        | 1.76±0.01ms          | 506±2μs             |    0.29 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(3, 'sparse')          |
| -        | 5.89±0.1ms           | 1.79±0.01ms         |    0.3  | polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'dense')           |
| -        | 8.40±0.1ms           | 1.53±0.01ms         |    0.18 | polys.TimePREM_LinearDenseQuadraticGCD.time_op(5, 'sparse')          |
| -        | 288±1μs              | 64.5±0.3μs          |    0.22 | polys.TimePREM_QuadraticNonMonicGCD.time_op(1, 'sparse')             |
| -        | 3.49±0.03ms          | 392±3μs             |    0.11 | polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'dense')              |
| -        | 4.01±0.06ms          | 282±2μs             |    0.07 | polys.TimePREM_QuadraticNonMonicGCD.time_op(3, 'sparse')             |
| -        | 7.07±0.09ms          | 1.27±0.01ms         |    0.18 | polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'dense')              |
| -        | 8.75±0.08ms          | 854±3μs             |    0.1  | polys.TimePREM_QuadraticNonMonicGCD.time_op(5, 'sparse')             |
| -        | 5.04±0.07ms          | 2.99±0ms            |    0.59 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(2, 'sparse') |
| -        | 11.9±0.05ms          | 6.60±0.03ms         |    0.55 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'dense')  |
| -        | 22.3±0.2ms           | 9.09±0.02ms         |    0.41 | polys.TimeSUBRESULTANTS_LinearDenseQuadraticGCD.time_op(3, 'sparse') |
| -        | 5.26±0.02ms          | 865±3μs             |    0.16 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(1, 'sparse')    |
| -        | 12.6±0.05ms          | 7.04±0.02ms         |    0.56 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(2, 'sparse')    |
| -        | 103±2ms              | 26.0±0.1ms          |    0.25 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'dense')     |
| -        | 168±0.4ms            | 54.0±0.2ms          |    0.32 | polys.TimeSUBRESULTANTS_QuadraticNonMonicGCD.time_op(3, 'sparse')    |
| -        | 360±3μs              | 214±0.7μs           |    0.6  | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(1, 'sparse')     |
| -        | 4.38±0.04ms          | 838±2μs             |    0.19 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'dense')      |
| -        | 5.35±0.05ms          | 382±2μs             |    0.07 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(3, 'sparse')     |
| -        | 20.4±0.4ms           | 2.78±0.03ms         |    0.14 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'dense')      |
| -        | 22.9±0.3ms           | 635±6μs             |    0.03 | polys.TimeSUBRESULTANTS_SparseGCDHighDegree.time_op(5, 'sparse')     |
| -        | 511±30μs             | 136±0.7μs           |    0.27 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(1, 'sparse') |
| -        | 4.86±0.04ms          | 608±2μs             |    0.13 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'dense')  |
| -        | 5.35±0.02ms          | 140±0.8μs           |    0.03 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(3, 'sparse') |
| -        | 13.7±0.2ms           | 1.26±0.01ms         |    0.09 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'dense')  |
| -        | 14.3±0.2ms           | 141±1μs             |    0.01 | polys.TimeSUBRESULTANTS_SparseNonMonicQuadratic.time_op(5, 'sparse') |
| -        | 134±0.8μs            | 76.9±0.4μs          |    0.57 | solve.TimeMatrixOperations.time_rref(3, 0)                           |
| -        | 248±1μs              | 88.8±0.4μs          |    0.36 | solve.TimeMatrixOperations.time_rref(4, 0)                           |
| -        | 24.2±0.06ms          | 10.2±0.2ms          |    0.42 | solve.TimeSolveLinSys189x49.time_solve_lin_sys                       |
| -        | 28.2±0.1ms           | 15.3±0.1ms          |    0.54 | solve.TimeSparseSystem.time_linsolve_Aaug(20)                        |
| -        | 54.3±0.2ms           | 24.7±0.08ms         |    0.45 | solve.TimeSparseSystem.time_linsolve_Aaug(30)                        |
| -        | 27.9±0.2ms           | 15.1±0.06ms         |    0.54 | solve.TimeSparseSystem.time_linsolve_Ab(20)                          |
| -        | 54.2±0.2ms           | 24.3±0.09ms         |    0.45 | solve.TimeSparseSystem.time_linsolve_Ab(30)                          |

Full benchmark results can be found as artifacts in GitHub Actions
(click on checks at the top of the PR).

@bjodah
Copy link
Member

bjodah commented Mar 23, 2024

Looks great! I'm fine to merge it at this point. If I'm playing the devils advocate: perhaps we should allow for names such as *(ptr+offset) and (*(ptr+offset)) but I think it's unlikely that anyone has been generating code like that.

@moorepants
Copy link
Member Author

Ok, this should be good unless there is more feedback. Thanks folks.

@bjodah bjodah merged commit cba1e7c into sympy:master Mar 23, 2024
68 checks passed
@bjodah
Copy link
Member

bjodah commented Mar 23, 2024

Thanks Jason!

@zachetienne
Copy link
Contributor

I have concerns regarding the current implementation of this patch, particularly due to its reliance on regular expressions for parsing the assign_to strings. Regular expressions can sometimes fail to manage edge cases effectively and introduce unnecessary overhead. A specific example of this limitation is when attempting to use assign_to="commondata->dt" to reference an element of a C struct, which is not supported with the present method. Considering these issues, I suggest that reverting to the previous behavior—where there was no validation on the assign_to strings—might actually be more beneficial than applying this patch. This approach could potentially avoid the complications introduced by regular expressions and maintain the system's flexibility and robustness.

@bjodah
Copy link
Member

bjodah commented Mar 23, 2024

@zachetienne I agree with you. I completely forgot about struct fields. Both -> and . ought to be supported, also in c++ it is quite common (e.g. in eigen) that we use the call opertor to get a reference to write to, e.g. : assign_to='A(i, j)'.

@moorepants how do you want to move forward?

@zachetienne
Copy link
Contributor

@bjodah : I appreciate your understanding and the acknowledgment of the complexities involved with struct fields and the nuances of C++ syntax, such as the use of the call operator in libraries like Eigen. My concern, however, extends beyond just accommodating these syntaxes. Specifically, it revolves around the principle that it should not be within sympy's scope to enforce or validate the correctness of C/C++ syntax. This responsibility ideally lies with the C/C++ compilers, which are in part designed for this purpose.

Implementing a fully robust validation routine within sympy to handle the myriad of syntactical variations and edge cases in C/C++ would not only be a significant undertaking but also introduce considerable overhead. This could potentially impact the performance and maintainability of sympy, diverting resources away from its core functionalities.

Therefore, while I understand the motivation behind ensuring syntactical correctness at the sympy level, I advocate for a more streamlined approach that relies on the compilers to perform these checks. This would allow sympy to focus on its strengths, reducing complexity and maintaining efficiency. I believe exploring alternatives that achieve a balance between usability and performance, without overstepping the responsibilities of sympy, would be in our best interest.

@moorepants
Copy link
Member Author

I am fine if we revert this and look for a better solution, but I disagree that the solution is to rely on solely the compiler with no validation at the SymPy level. Our entire C code printing and code generation tooling already attempts to pass valid C code to the compiler so there are no compiler errors. For example, we throw errors if you use reserved words, which is no different than throwing errors if you use invalid variable names. There is also no issue with using regular expressions as long as we design them properly around any edge cases.

A typical SymPy user will not know how to deal with compiler errors (even if those even properly surface for them to read). There is no machinery in SymPy to catch compiler errors and translate them into something a SymPy user would understand. An advanced user can relax or change any rule we put into place or write their own printer.

@bjodah
Copy link
Member

bjodah commented Mar 23, 2024

I think that a user who is directly generating C-code using ccode(...) can be expected to understand C syntax. If the user uses this functionality indirectly through another library, then I think that the library should pass a printer option which enables strict checking (with helpful error messages).

That would also reduce the risk of breaking back-comp.

Regarding regular expressions: they are quite costly, and I notice that we actually compile it on every invocation in this PR.

I recently added a similar printing option in gh-25913. Would something similar be a way forward here you think?

@moorepants
Copy link
Member Author

I notice that we actually compile it on every invocation in this PR

That should be changed for sure.

@moorepants
Copy link
Member Author

I recently added a similar printing option in #25913. Would something similar be a way forward here you think?

Yes, I'm open for any ways to do this, but would like checks at this this level (optionally or not).

@moorepants
Copy link
Member Author

How about this: moorepants@bd3b15b

@moorepants
Copy link
Member Author

It looks like we have strict as True by default in the code printers.

@zachetienne
Copy link
Contributor

zachetienne commented Mar 24, 2024

@moorepants: Thanks for your work in revising the patch.

[edit: remove duplicated comments abt the patch]

@bjodah, @moorepants: Can we please revert the merged patch, while @moorepants focuses on refining a new one? [edit: remove immaterial comment abt CI] Thanks in advance!

@moorepants
Copy link
Member Author

@zachetienne this is at least the second time that you've reported issues on SymPy asking us to fix things so that your CI runs as desired. It is your choice to have a Github "green" tied to SymPy dev, but not ours to keep your CI green. I recommend either not tying your CI green light to SymPy dev or to accept that you are testing against an unreleased SymPy. We will revert the patch when and if we decide to do so.

If you catch bugs while testing against SymPy development versions, it's easiest if you report the exact errors and examples that are causing failures, which we can then investigate. I personally become less motivated to help you given your current feedback approach.

@zachetienne
Copy link
Contributor

@moorepants: Please accept my sincere apology. I deeply appreciate the hard work and dedication that goes into maintaining and developing SymPy. The tool is invaluable to my work, and I hold the utmost respect for the community that supports its growth and evolution. Indeed, this is not the place to mention my CI failing. Moving forward, I will not mention it, and instead focus on just reporting exact errors/example as you suggest. I agree this would be most productive. :)

@moorepants
Copy link
Member Author

Thanks. We'll find a workable solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants