
Move libsass specific tests to own repository #1484

Closed

Conversation

@mgreter (Contributor) commented Oct 13, 2019

Since the workflow we used to have for libsass issues seems no longer welcome,
I would like to move all libsass spec tests to a separate repository so we can
at least have those tested with libsass CI. Once the move is done, I'll try to
adjust our libsass CI to test and pass both repos, and also to ensure we don't
regress output-mode-specific behavior again.
https://github.com/mgreter/libsass-specs/tree/incubating [WIP]

@nex3 (Contributor) commented Oct 15, 2019

Any behavioral specs that are relevant to LibSass are relevant to all Sass implementations, so they should remain in this repository where every implementation can test against them. The same goes for any new regression specs: they should be added to the shared language spec repository.

At a higher level, it's inappropriate to try to fork the language specs and close all your outstanding PRs because we've instituted a protocol for adding new ones. The plan to refactor the spec repo is well-documented in the issue tracker, and the PR that landed the style guide (#1363) was open for comment for a full month, with an explicit invitation to file issues about it after the fact.

Workflows change, and part of working with a team is being flexible about adapting to workflows that support the broader team even if they aren't ideal for you in particular.

@nex3 (Contributor) commented Oct 23, 2019

@mgreter Ping—please re-open your other pull requests to keep LibSass's specs up-to-date.

@nex3 (Contributor) commented Dec 19, 2019

@mgreter Ping again. This radio silence isn't an effective way to collaborate.

@saper (Member) commented Dec 19, 2019

Can I hijack this for a moment? I am a drive-by contributor to this repository (I think I have contributed 50 or so commits), so maybe my opinion does not weigh much, but it may be relevant to other drive-by contributors.

I am totally new to the HRX format, and I have a few suggestions/questions regarding the format itself and its use by the sass/sass-spec repo (a small example follows the list below):

  • It would be cool to link to HRX at least once in the style guide. I had trouble getting search engines to find me the right pointer, and URLs are good; fortunately it is linked in the README file.
  • What was the original goal of HRX? Is it supposed to cover anything that a human should "unzip" visually, or is this format meant mostly to store test cases?
  • How good/available is the tooling for the HRX format? Can I split files easily in Python/Ruby/C/C++/whatever? The google/hrx repo seems to give only the spec.
  • Can we use HRX now with sassc without unpacking anything into a temporary area? (Is there a C library, for example?)
  • I know I can have multiple output files in the HRX format. The sass-spec README tells me I am allowed to have only one output file (either CSS or an error). Could we also specify names for different CSS generation modes? I don't think they are really that libsass-specific.
  • But given that we want to use HRX and I use an "implementation specific suffix", how can I tell which file is the input and which are the output files to test against? Examples like unbracketed/output.scss in the README are confusing: both the input and output file names have .scss extensions. I feel relying on file extensions to do the right thing might be difficult for humans, given how similar 'css' and 'scss' are.
  • Therefore, could the HRX format be changed to use different banners for the input and the output files, to be a bit easier on the eyes of the reviewer?
  • Could we get metadata such as a reference to the issue at hand (see below for why)? "PREFER referring to issues when marking specs as :todo" refers me to the options.yml file.
  • I think it is a pity that options.yml is yet another format in yet another file. If the goal is to have one file per spec, and we agree on the spec format, we should have that metadata in the spec format itself. Again, the goal is to give reviewers a single context to check, which is a worthy goal. Can issue links survive after the "todo" marking is removed? On the other hand, the spec should not track implementation status; ideally the GitHub issue is there for that...
  • Why are only Sass comments allowed? If I have a general description of the case, an HRX metadata comment that is not part of the name could be useful. For example, the test runner could print test descriptions on failure. Having said that, HRX comments are hard to distinguish from the input/output contents and therefore may not be that easy on humans reading and writing the files.
  • What about partials/includes in HRX? The README tells me I should have only an input file and either an output or an error file, but sometimes we also have files to be included (example: https://github.com/sass/sass-spec/blob/0749da435328e48acf2a54f0299b4d78cefc8d63/spec/libsass/base-level-parent/imported/basic-prefix.hrx). Should the runner rely on the input.scss filename alone to recognize the main input? The "PREFER making imported files partials" rule is only a weak filename convention, and again the difference can be only an underscore away....
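For context, here is roughly what a minimal HRX spec in this repo looks like, as far as I understand the README (the file names follow the sass-spec conventions; the contents are just illustrative):

    <===> input.scss
    a {b: 1 + 2}

    <===> output.css
    a {
      b: 3;
    }

The <===> banners separate the virtual files inside the archive, which is exactly why I am asking above about telling inputs apart from outputs.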

And finally, a pretty important point relevant to this issue in particular. I think we currently store both the spec ("how things really should be" or "machine-readable and testable language specification") and the regression tests ("this thing was broken in the past for implementation Zeta").

Since I am not a core Sass developer and my contributions to libsass itself have been tiny (mostly whining), I believe I was mostly contributing things that came out of bug reports, mostly from node-sass, which I am somewhat familiar with.

My ideal workflow is the following:

  • The spec format is so easy and straightforward that even an end user is able to produce the test case in the format we want (hey, node-sass/sassc could generate the test case automatically if asked).
    Failing that:
  • I usually put a somewhat minimal case in the repo so it is easy to re-check whether the bug is fixed.

Such a test is a regression test case: it is rarely minimal and does not describe the desired language behavior. It may eventually be adjusted, or even reduced to a proper language spec test case. But it is also possible that the specific combination of language features is what caused the problem.

Therefore I will almost never satisfy the "DO test only one thing per spec" rule in those tests, because they are produced before the issue is even understood at all. I'd like to keep them around for the whole product lifecycle, even if they somehow duplicate language spec tests.

I can't speak for @mgreter, but this is what I think from the perspective of a small drive-by contributor who mostly deals with regressions and user-reported bugs. Maybe keeping implementation-specific regression tests in another repo is not a very bad idea overall; the language spec suite could become much cleaner (no todo flags etc.) in that case. Or we should have a common repo, but with regression tests nicely integrated into the test suite.

@nex3 (Contributor) commented Dec 30, 2019

@saper That's a big post, and I don't really have time to respond to each question individually. I think a lot of those questions can be answered by reading through the HRX spec and looking at examples of how various HRX-ified test cases look in this repo. For concrete suggestions, feel free to file separate issues so we can discuss there.

To address your larger point about regression tests, I don't consider them to be a totally separate class from spec tests the same way you do. I think of a bug that needs a regression test as an indication that the original spec tests weren't comprehensive enough, so the regression test itself should be in the same form and the same location as the original spec tests. This is true even if it involves an intersection of multiple language features—and in fact we have many non-regression tests that cover multiple features. There's always a way to represent that intersection without any unrelated behavior making it less clear. The style guide aims to encode one specific, consistent way of doing that.
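As a purely hypothetical illustration (these paths are invented for the example, not real specs): a regression in how, say, unclosed parentheses are parsed would get a minimal spec next to the other parsing specs, something like

    spec/parser/operations/unclosed_paren.hrx

rather than being filed away under an issue-numbered folder such as

    spec/libsass-closed-issues/issue_1234.hrx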

@mgreter (Contributor, Author) commented Jan 17, 2020

@nex3 please tell me which ones you mean; I checked, and the only ones I saw were from WIP branches.

@nex3 (Contributor) commented Feb 24, 2020

@mgreter I mean PRs like #1469

@mgreter (Contributor, Author) commented Feb 25, 2020

@nex3 that PR was merged, so how do you expect me to "please re-open your other pull requests to keep LibSass's specs up-to-date."??

The main reason to pull the libsass spec tests out is, as @saper wrote: "I think we currently store both the spec ("how things really should be" or "machine-readable and testable language specification") and the regression tests ("this thing was broken in the past for implementation Zeta")."

@saper (Member) commented Feb 26, 2020

so the regression test itself should be in the same form and the same location as the original spec tests.

I think my point is that it might be difficult to coerce regression tests into the style guide. Which is fine, because I think this style guide is good for spec tests.

This is true even if it involves an intersection of multiple language features—and in fact we have many non-regression tests that cover multiple features.

The style guide says:

DO test only one thing per spec

So which is it: one feature per test, or multiple features per test?

There's always a way to represent that intersection without any unrelated behavior making it less clear.

Yes, but it takes time, requires significant effort, and can introduce bugs in the process.
The process of getting clean specs written, or improved, out of bug reports is iterative and may span multiple releases of the compiler. The specs may improve over time; the regression tests, remaining frozen, serve as an additional check in this process.

The style guide aims to encode one specific, consistent way of doing that.

The way I now think about regression tests is that they should preferably be the unedited versions of the original bug reports (subject to sanity checks, of course). Overlap between "clean" specs (this is what the feature looks like) and "dirty" regression tests is expected.

User code, nice or ugly, documents the actual usage of language features, sometimes far beyond the imagination of the language creator.

The regression test should just stay there, preferably identified by a bug number (as the style guide suggests in "PREFER referring to issues when marking specs as :todo") and not grouped by language features (as currently required by the "DO organize specs by which part of the language they test" rule). Yes, they can be ugly and present badly-written or complex code; that does not matter much as long as they are executable.

Even if a particular regression issue gets closed as going against the spec, the corresponding regression test could still be salvaged to test hopefully improved error reporting.

While specs grow and get better over time, regression tests are write-once, stay-ugly artifacts. It might be worthwhile to have a clear separation between the two and apply the style guide rules to the spec tests only.

@nex3 (Contributor) commented Feb 27, 2020

@mgreter

@nex3 that PR was merged, so how do you expect me to "please re-open your other pull requests to keep LibSass's specs up-to-date."??

Sorry, you're right. I guess I was thinking about the non-style-guide-ified pull requests you landed.

The main reason to pull the libsass spec tests out is, as @saper wrote: "I think we currently store both the spec ("how things really should be" or "machine-readable and testable language specification") and the regression tests ("this thing was broken in the past for implementation Zeta")."

It's useful to test things that were broken for one implementation in other implementations as well. The point of the style guide is to ensure that those tests are as minimal as possible, rather than just copying whatever code the user originally reported. This helps emphasize exactly what the problem being tested was.


@saper

I think my point is that it might be difficult to coerce regression tests into the style guide. Which is fine, because I think this style guide is good for spec tests.

I don't think that's true. Consider #1486 and #1487 as counter-examples: both of these add regression tests for specific issues that still follow the style guide and minimally target the issues in question.

The style guide says:

DO test only one thing per spec

So which is it: one feature per test, or multiple features per test?

Another way to phrase that requirement is, "make the test as minimal as possible while still exercising the behavior in question". There are plenty of tests, both written before a feature landed and written to guard against regressions for a specific issue, that cover the intersection of multiple features. The point is that a single test shouldn't make a bunch of assertions at once the way, say, this one does.
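To make that concrete with an invented example: a single spec whose input is

    a {
      b: 1 + 2;
      c: if(true, x, y);
      d: str-length("abc");
    }

asserts three unrelated behaviors at once (numeric addition, the if() function, and str-length()) and would be split into three specs under the style guide, even though each of those three could still legitimately exercise an intersection of features.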

There's always a way to represent that intersection without any unrelated behavior making it less clear.

Yes, but it takes time, requires significant effort, and can introduce bugs in the process.
The process of getting clean specs written, or improved, out of bug reports is iterative and may span multiple releases of the compiler. The specs may improve over time; the regression tests, remaining frozen, serve as an additional check in this process.

I agree that it takes work to generate clear and minimal specs—I've personally been shouldering the bulk of that work as I go through and rewrite specs (or write fresh specs) for existing language features that follow the style guide. The style guide is designed to make that as smooth as possible by providing clear guidelines on what an ideal spec should look like. But ultimately, it is work, and that work is valuable: it's an investment in the long-term maintainability and flexibility of the repo.

The way I now think about regression tests is that they should preferably be the unedited versions of the original bug reports (subject to sanity checks, of course). Overlap between "clean" specs (this is what the feature looks like) and "dirty" regression tests is expected.

User code, nice or ugly, documents the actual usage of language features, sometimes far beyond the imagination of the language creator.

The regression test should just stay there, preferably identified by a bug number (as the style guide suggests in "PREFER referring to issues when marking specs as :todo") and not grouped by language features (as currently required by the "DO organize specs by which part of the language they test" rule). Yes, they can be ugly and present badly-written or complex code; that does not matter much as long as they are executable.

Even if a particular regression issue gets closed as going against the spec, the corresponding regression test could still be salvaged to test hopefully improved error reporting.

While specs grow and get better over time, regression tests are write-once, stay-ugly artifacts. It might be worthwhile to have a clear separation between the two and apply the style guide rules to the spec tests only.

What's the value you see in having these specs "stay ugly"? Is it just a question of minimizing the effort of initially adding them? Because I think it's essentially always worth the effort to avoid taking on technical debt for a project like this that's intended to live a long time.

Those "ugly" tests come with significant costs relative to minimal, style-guide-compliant regression tests. When I was first enabling specs for Dart Sass, complex tests that covered multiple features posed a huge hassle for determining which features the new implementation actually supported. When they use deprecated features, they need to be manually migrated forward to new versions of the language even when those features aren't the point of the test. And whenever they do need to be changed, it's next to impossible to figure out if doing so actually preserves the meaning of the test because it's totally unclear what a minimal reproduction of the actual bug would even look like.

Keeping a test case around is not free. Whether we decide to keep it ugly or not, it needs to be maintained over time, and that maintenance requires understanding what it's testing and why. On the other hand, it's legitimately valuable to test regressions across all implementations, because regressions represent a lack of coverage in the original test suite and a real breakage that happened in practice. These factors together point to making regression tests clear, well-organized, and maintainable just like all the other tests in the repo.

@Awjin (Contributor) commented Feb 28, 2020

@saper

The process of getting clean specs written, or improved, out of bug reports is iterative and may span multiple releases of the compiler. The specs may improve over time; the regression tests, remaining frozen, serve as an additional check in this process.

This implies that specs need to be improved, and regression tests need to be added, necessitating twice the work to test one thing. Wouldn't it be more efficient to focus on cleanly improving the spec? I think @nex3 makes a great point about "ugly tests". They are essentially a lose-lose—simultaneously technical debt and missed opportunities to test across Sass implementations.

@mgreter (Contributor, Author) commented Mar 1, 2020

So what is the outcome? Can we have libsass-specific bug-report regression specs in a separate repo or not? I don't know what else to write besides pretty please with a cherry on top! We are already missing quite a lot of regression tests in our unit tests: https://github.com/sass/libsass/issues?q=is%3Aissue+is%3Aclosed+label%3A%22Dev+-+Needs+Test%22. I don't have enough free time (and honestly also not the motivation) to break them down to the acceptable minimal level currently set out in the readme. I won't go into details about why this would be difficult (e.g. null-pointer access under certain nesting conditions), but I'd rather have them "ugly" than not have them at all.

P.S. As a concrete exercise, how would you expect me/us to break down regressions like https://github.com/sass/sass-spec/blob/master/spec/libsass-closed-issues/issue_245443/input.scss or https://github.com/sass/sass-spec/blob/master/spec/libsass-todo-issues/issue_221267/input.scss or https://github.com/sass/sass-spec/blob/master/spec/libsass-todo-issues/issue_221292.hrx?

P.P.S. @nex3, regarding "Sorry, you're right. I guess I was thinking about the non-style-guide-ified pull requests you landed.": feel free to point me to the PRs you mean: https://github.com/sass/sass-spec/commits?author=mgreter&since=2018-01-01&until=2020-03-01 ... I only changed libsass-specific stuff, besides the fix in the spec runner that first broke our CI and then, after my fix, yours :-/

@mgreter requested a review from @Awjin, March 1, 2020
@Awjin (Contributor) commented Mar 3, 2020

[...] this would be difficult (e.g. null-pointer access under certain nesting conditions), but I'd rather have them "ugly" than not have them at all.

Since this isn't language-level behavior but specific to LibSass, it would be a great candidate for testing in libsass. I think our main concern is about regressions that need testing in Dart Sass too; it would be a shame to lose those tests. Breaking down regressions into a minimal reproduction helps us decide whether the behavior is specific to LibSass or applies across implementations.

P.S. As a concrete exercise, how would you expect me/us to break down regressions like https://github.com/sass/sass-spec/blob/master/spec/libsass-closed-issues/issue_245443/input.scss or https://github.com/sass/sass-spec/blob/master/spec/libsass-todo-issues/issue_221267/input.scss or https://github.com/sass/sass-spec/blob/master/spec/libsass-todo-issues/issue_221292.hrx?

Thanks for pointing to concrete examples—let's look at the last one's input: /**/0{i:((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((.

I can't easily tell why Sass is parsing this incorrectly. Is it the empty comment /**/ at the front? Is it the missing } at the end? Is it because ( does not get closed? Or is it the sheer quantity of (? The breakage could be coming from some, or all, of those causes.

If the error is indeed caused by a ( that does not get closed, the minimal input is:

a {b: (}

This minimal test makes it clear what we need to fix in Dart Sass and guards against that brokenness in future Sass implementations. If the error actually has multiple causes, having small tests for each is also good. Say the other causes are:

  • Multiple open parentheses a {b: (((}
  • Unclosed bracket a {b:

When fixing the problem in a different implementation, there are now 3 minimal specs that we can address individually. IMO that gives more certainty that we've implemented the correct behavior, compared to trying to get /**/0{i:(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( to pass.
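As a sketch of how those might be encoded (the expected error files are omitted here; in practice their contents would be generated from the reference implementation rather than written by hand), the three cases could live as sibling specs:

    <===> unclosed-paren/input.scss
    a {b: (}

    <===> multiple-open-parens/input.scss
    a {b: (((}

    <===> unclosed-bracket/input.scss
    a {b:

with a matching error file next to each input.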

@saper (Member) commented Mar 4, 2020

I can't easily tell why Sass is parsing this incorrectly. Is it the empty comment /**/ at the front? Is it the missing } at the end? Is it because ( does not get closed? Or is it the sheer quantity of (? The breakage could be coming from some, or all, of those causes.

One of the reasons we need regression tests is precisely that we do not know the answers to those questions in advance.

Currently we have the following workflow in libsass:

  1. Accept the bug, issue 1234
  2. Write a regression test for issue 1234 (marked as todo)
  3. Some time later, analyze the bug
  4. If we find that some language features are not covered, clarify them and write another language spec test
  5. Fix the bug, close issue 1234
  6. Unmark the regression test for issue 1234 as todo

The analysis in step 4 is difficult and may not always be complete, even during the bug fixing.
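For reference, the todo marking in steps 2 and 6 happens in the spec's options.yml; as far as I understand the style guide's preference for issue references, a marked spec would carry something like this (the issue number here is hypothetical):

    ---
    :todo:
    - sass/libsass#1234

and step 6 removes that entry again once the bug is fixed.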

Now back to the example at hand:

Suppose we fixed the parentheses bug in libsass 1.0 and produced the following test (spec) in the process of fixing it:

a {b: (}

Now, libsass 4.0 comes out and, again,

/**/0{i:((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((

fails, but for a different reason. Some code refactoring made this pass:

a {b: (}

but this to fail:

a {b: (((((}

for at least 5 parentheses.

Without a regression test, the "clean" spec test does not alert us to the problem in the 4.0 release.

Does adding a {b: (((((} or a {b: (((((((((((((((((((((((((((} or maybe a {b: (((((((((((((((((((((((((((((((((((((((((((((((} contribute anything to a better specification of the language? No.

Did we re-introduce an old bug because we didn't store the original problem? Yes.

Guarding against the reintroduction of old bugs is a primary goal of regression testing. The tests @mgreter provided are non-functional ones: they do not teach us about the Sass language and its desired functionality. They expose dark corners of the libsass implementation.

Sure, Dart Sass and other implementations should be able to pass regression tests as well, and those tests might help them find bugs too. But we won't be able to share them if some form of style guide is forced on us.

Implementations that do not have multiple output styles (compact, compressed, etc.) will not be able to run those regression tests, though, as they are implementation-specific.

@mgreter (Contributor, Author) commented Mar 4, 2020

Since this isn't language-level behavior but specific to LibSass, it would be a great candidate for testing in libsass.

I would like to keep them in sass-spec under the libsass folder OR anywhere else (that's the main purpose of this PR), but since I did so, my commit rights have been revoked by @nex3. So what do you expect me to do? I can either follow the readme or do nothing at all! We had a workflow in libsass to add a regression test for every bug that was reported, e.g. #1473 (comment) or #1481 (as @saper also pointed out).

So can we move "libsass specific tests to own repository" or not!? They are not gone; anybody should feel free to make minimal spec tests off of them, but I/we just want to have a one-to-one regression test for each issue!

Btw. @Awjin, the above issue is caused by stack limitations (and people running fuzzers) in C/C++, which dart-sass also has in common with us, but we as libsass get a far worse reputation for it: https://www.cvedetails.com/vulnerability-list/vendor_id-16627/Libsass.html

So what is the outcome? Can we have libsass-specific bug-report regression specs in a separate repo or not? I don't know what else to write besides pretty please with a cherry on top!

PLEASE 🙏

@mgreter (Contributor, Author) commented Mar 7, 2020

@xzyfer @saper I'm also OK with just copying the existing regression tests to a new repo and adding the outstanding tests to it. I feel like this is currently the only way to keep our libsass regression test policy. I don't think we can get acceptance under the sass GitHub "folder", so we probably have to live with it living in my own "mgreter" space. I'll give it a few more days, but then I will start to copy and cover outstanding/existing regression tests there!

@mgreter (Contributor, Author) commented Mar 7, 2020

Will seek another way to keep libsass regressions!

@mgreter closed this Mar 7, 2020
@saper (Member) commented Mar 7, 2020

Yes, I would just rename the repo to make sure it is not the "spec" repo. We just need to hook it into the libsass CI, I think?

@xzyfer (Contributor) commented Mar 7, 2020 via email
