
Move libsass specific tests to own repository #1484

Closed

Conversation

@mgreter (Contributor) commented Oct 13, 2019

Since the workflow we used to have for libsass issues seems no longer welcome,
I would like to move all libsass spec tests to a separate repository so we can
at least have those tested with libsass CI. Once the move is done, I'll try to
adjust our libsass CI to test and pass both repos, and also to ensure we don't
regress output-mode-specific behavior again.
https://github.com/mgreter/libsass-specs/tree/incubating [WIP]

@nex3 (Contributor) commented Oct 15, 2019

Any behavioral specs that are relevant to LibSass are relevant to all Sass implementations, so they should remain in this repository where every implementation can test against them. The same goes for any new regression specs: they should be added to the shared language spec repository.

At a higher level, it's inappropriate to try to fork the language specs and close all your outstanding PRs because we've instituted a protocol for adding new ones. The plan to refactor the spec repo is well-documented in the issue tracker, and the PR that landed the style guide (#1363) was open for comment for a full month, with an explicit invitation to file issues about it after the fact.

Workflows change, and part of working with a team is being flexible about adapting to workflows that support the broader team even if they aren't ideal for you in particular.

@nex3 (Contributor) commented Oct 23, 2019

@mgreter Ping—please re-open your other pull requests to keep LibSass's specs up-to-date.

@nex3 (Contributor) commented Dec 19, 2019

@mgreter Ping again. This radio silence isn't an effective way to collaborate.

@saper (Member) commented Dec 19, 2019

Can I hijack this for a moment? I am a drive-by contributor to this repository (I think I have contributed 50 or so commits), so maybe my opinion does not weigh much, but it may be relevant to other drive-by contributors.

I am totally new to the HRX format, and I have a few suggestions/questions regarding the format itself and its use by the sass/sass-spec repo (a small example follows the list below):

  • It would be cool to link to HRX at least once in the style guide. I had trouble getting search engines to find me the right pointer, and URLs are good; fortunately it is linked in the README file.
  • What was the original goal of HRX? Is it supposed to cover anything that a human should "unzip" visually, or is this format meant mostly to store test cases?
  • How good/available is the tooling for the HRX format? Can I split files easily in Python/Ruby/C/C++/whatever? The google/hrx repo seems to give only the spec.
  • Can we use HRX now with sassc without unpacking anything into a temporary area? (Is there a C library, for example?)
  • I know I can have multiple output files in the HRX format. The sass-spec README tells me I am allowed to have only one output file (either CSS or an error). Could we also specify names for different CSS generation modes? I don't think they are really that libsass-specific.
  • But given that we want to use HRX and I use an "implementation specific suffix", how can I tell which file is the input and which are the output files to test against? Examples like unbracketed/output.scss in the README are confusing: both the input and output file names have .scss extensions. I feel relying on file extensions to do the right thing might be difficult for humans, given how similar 'css' and 'scss' are.
  • Therefore, could the HRX format be changed to use different banners for the input and the output files, to be a bit easier on the eyes of the reviewer?
  • Could we get metadata such as a reference to the issue at hand (see below for why)? "PREFER referring to issues when marking specs as :todo" refers me to the options.yml file.
  • I think it is a pity that options.yml is yet another format in yet another file. If the goal is to have one file per spec, and we agree on the spec format, we should have that metadata in the spec format itself. Again, the goal is to give reviewers a single context to check, which is a worthy goal. Can issue links survive after the "todo" marking is removed? On the other hand, the spec should not track implementation status; ideally the GitHub issue is there for that...
  • Why are only Sass comments allowed? If I have a general description of the case, an HRX metadata comment that is not part of the name could be useful. For example, the test runner could print test descriptions on failure. Having said that, HRX comments are hard to distinguish from the input/output contents and therefore may not be that easy on humans reading and writing the files.
  • What about partials/includes in HRX? The README tells me I should have only an input file and either an output or an error file, but sometimes we also have files to be included (example: https://github.com/sass/sass-spec/blob/0749da435328e48acf2a54f0299b4d78cefc8d63/spec/libsass/base-level-parent/imported/basic-prefix.hrx). Should the runner rely on the input.scss filename alone to recognize the main input? The "PREFER making imported files partials" rule is only a weak filename convention, and again the difference can be only an underscore away....
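For context, here is roughly what a minimal HRX spec in this repo looks like, as far as I understand the README (the file names follow the sass-spec conventions; the contents are just illustrative):

    <===> input.scss
    a {b: 1 + 2}

    <===> output.css
    a {
      b: 3;
    }

The <===> banners separate the virtual files inside the archive, which is exactly why I am asking above about telling inputs apart from outputs.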

And finally, a pretty important point relevant to this issue in particular. I think we currently store both the spec ("how things really should be" or "machine-readable and testable language specification") and the regression tests ("this thing was broken in the past for implementation Zeta").

Since I am not a core Sass developer and my contributions to libsass itself have been tiny (mostly whining), I believe I was mostly contributing things that came out of bug reports, mostly from node-sass, which I am somewhat familiar with.

My ideal workflow is the following:

  • The spec format is so easy and straightforward that even an end user is able to produce the test case in the format we want (hey, node-sass/sassc could generate the test case automatically if asked).
    Failing that:
  • I usually put a somewhat minimal case in the repo so it is easy to re-check whether the bug is fixed.

Such a test is a regression test case: it is rarely minimal and does not describe the desired language behavior. It may eventually be adjusted, or even reduced to a proper language spec test case. But it is also possible that the specific combination of language features is what caused the problem.

Therefore I will almost never satisfy the "DO test only one thing per spec" rule in those tests, because they are produced before the issue is even understood at all. I'd like to keep them around for the whole product lifecycle, even if they somehow duplicate language spec tests.

I can't speak for @mgreter, but this is what I think from the perspective of a small drive-by contributor who mostly deals with regressions and user-reported bugs. Maybe keeping implementation-specific regression tests in another repo is not a very bad idea overall; the language spec suite could become much cleaner (no todo flags etc.) in that case. Or we should have a common repo, but with regression tests nicely integrated into the test suite.

@nex3 (Contributor) commented Dec 30, 2019

@saper That's a big post, and I don't really have time to respond to each question individually. I think a lot of those questions can be answered by reading through the HRX spec and looking at examples of how various HRX-ified test cases look in this repo. For concrete suggestions, feel free to file separate issues so we can discuss there.

To address your larger point about regression tests, I don't consider them to be a totally separate class from spec tests the same way you do. I think of a bug that needs a regression test as an indication that the original spec tests weren't comprehensive enough, so the regression test itself should be in the same form and the same location as the original spec tests. This is true even if it involves an intersection of multiple language features—and in fact we have many non-regression tests that cover multiple features. There's always a way to represent that intersection without any unrelated behavior making it less clear. The style guide aims to encode one specific, consistent way of doing that.
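As a purely hypothetical illustration (these paths are invented for the example, not real specs): a regression in how, say, unclosed parentheses are parsed would get a minimal spec next to the other parsing specs, something like

    spec/parser/operations/unclosed_paren.hrx

rather than being filed away under an issue-numbered folder such as

    spec/libsass-closed-issues/issue_1234.hrx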

@mgreter (Contributor, Author) commented Jan 17, 2020

@nex3 please tell me which ones you mean; I checked, and the only ones I saw were from WIP branches.

@nex3 (Contributor) commented Feb 24, 2020

@mgreter I mean PRs like #1469

@mgreter (Contributor, Author) commented Feb 25, 2020

@nex3 that PR was merged, so how do you expect me to "please re-open your other pull requests to keep LibSass's specs up-to-date."??

The main reason to pull the libsass spec tests out is, as @saper wrote: "I think we currently store both the spec ("how things really should be" or "machine-readable and testable language specification") and the regression tests ("this thing was broken in the past for implementation Zeta")."

@saper (Member) commented Feb 26, 2020

so the regression test itself should be in the same form and the same location as the original spec tests.

I think my point is that it might be difficult to coerce regression tests into the style guide. Which is fine, because I think this style guide is good for spec tests.

This is true even if it involves an intersection of multiple language features—and in fact we have many non-regression tests that cover multiple features.

The style guide says:

DO test only one thing per spec

So which is it: one feature per test, or multiple features per test?

There's always a way to represent that intersection without any unrelated behavior making it less clear.

Yes, but it takes time, requires significant effort, and can introduce bugs in the process.
The process of getting clean specs written, or improved, out of bug reports is iterative and may span multiple releases of the compiler. The specs may improve over time; the regression tests, remaining frozen, serve as an additional check in this process.

The style guide aims to encode one specific, consistent way of doing that.

The way I now think about regression tests is that they should preferably be the unedited versions of the original bug reports (subject to sanity checks, of course). Overlap between "clean" specs (this is what the feature looks like) and "dirty" regression tests is expected.

User code, nice or ugly, documents the actual usage of language features, sometimes far beyond the imagination of the language creator.

The regression test should just stay there, preferably identified by a bug number (as the style guide suggests in "PREFER referring to issues when marking specs as :todo") and not grouped by language features (as currently required by the "DO organize specs by which part of the language they test" rule). Yes, they can be ugly and present badly-written or complex code; that does not matter much as long as they are executable.

Even if a particular regression issue gets closed as going against the spec, the corresponding regression test could still be salvaged to test hopefully improved error reporting.

While specs grow and get better over time, regression tests are write-once, stay-ugly artifacts. It might be worthwhile to have a clear separation between the two and apply the style guide rules to the spec tests only.

@nex3 (Contributor) commented Feb 27, 2020

@mgreter

@nex3 that PR was merged, so how do you expect me to "please re-open your other pull requests to keep LibSass's specs up-to-date."??

Sorry, you're right. I guess I was thinking about the non-style-guide-ified pull requests you landed.

The main reason to pull the libsass spec tests out is, as @saper wrote: "I think we currently store both the spec ("how things really should be" or "machine-readable and testable language specification") and the regression tests ("this thing was broken in the past for implementation Zeta")."

It's useful to test things that were broken for one implementation in other implementations as well. The point of the style guide is to ensure that those tests are as minimal as possible, rather than just copying whatever code the user originally reported. This helps emphasize exactly what the problem being tested was.


@saper

I think my point is that it might be difficult to coerce regression tests into the style guide. Which is fine, because I think this style guide is good for spec tests.

I don't think that's true. Consider #1486 and #1487 as counter-examples: both of these add regression tests for specific issues that still follow the style guide and minimally target the issues in question.

The style guide says:

DO test only one thing per spec

So which is it: one feature per test, or multiple features per test?

Another way to phrase that requirement is, "make the test as minimal as possible while still exercising the behavior in question". There are plenty of tests, both written before a feature landed and written to guard against regressions for a specific issue, that cover the intersection of multiple features. The point is that a single test shouldn't make a bunch of assertions at once the way, say, this one does.
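To make that concrete with an invented example: a single spec whose input is

    a {
      b: 1 + 2;
      c: if(true, x, y);
      d: str-length("abc");
    }

asserts three unrelated behaviors at once (numeric addition, the if() function, and str-length()) and would be split into three specs under the style guide, even though each of those three could still legitimately exercise an intersection of features.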

There's always a way to represent that intersection without any unrelated behavior making it less clear.

Yes, but it takes time, requires significant effort, and can introduce bugs in the process.
The process of getting clean specs written, or improved, out of bug reports is iterative and may span multiple releases of the compiler. The specs may improve over time; the regression tests, remaining frozen, serve as an additional check in this process.

I agree that it takes work to generate clear and minimal specs—I've personally been shouldering the bulk of that work as I go through and rewrite specs (or write fresh specs) for existing language features that follow the style guide. The style guide is designed to make that as smooth as possible by providing clear guidelines on what an ideal spec should look like. But ultimately, it is work, and that work is valuable: it's an investment in the long-term maintainability and flexibility of the repo.

The way I now think about regression tests is that they should preferably be the unedited versions of the original bug reports (subject to sanity checks, of course). Overlap between "clean" specs (this is what the feature looks like) and "dirty" regression tests is expected.

User code, nice or ugly, documents the actual usage of language features, sometimes far beyond the imagination of the language creator.

The regression test should just stay there, preferably identified by a bug number (as the style guide suggests in "PREFER referring to issues when marking specs as :todo") and not grouped by language features (as currently required by the "DO organize specs by which part of the language they test" rule). Yes, they can be ugly and present badly-written or complex code; that does not matter much as long as they are executable.

Even if a particular regression issue gets closed as going against the spec, the corresponding regression test could still be salvaged to test hopefully improved error reporting.

While specs grow and get better over time, regression tests are write-once, stay-ugly artifacts. It might be worthwhile to have a clear separation between the two and apply the style guide rules to the spec tests only.

What's the value you see in having these specs "stay ugly"? Is it just a question of minimizing the effort of initially adding them? Because I think it's essentially always worth the effort to avoid taking on technical debt for a project like this that's intended to live a long time.

Those "ugly" tests come with significant costs relative to minimal, style-guide-compliant regression tests. When I was first enabling specs for Dart Sass, complex tests that covered multiple features posed a huge hassle for determining which features the new implementation actually supported. When they use deprecated features, they need to be manually migrated forward to new versions of the language even when those features aren't the point of the test. And whenever they do need to be changed, it's next to impossible to figure out if doing so actually preserves the meaning of the test because it's totally unclear what a minimal reproduction of the actual bug would even look like.

Keeping a test case around is not free. Whether we decide to keep it ugly or not, it needs to be maintained over time, and that maintenance requires understanding what it's testing and why. On the other hand, it's legitimately valuable to test regressions across all implementations, because regressions represent a lack of coverage in the original test suite and a real breakage that happened in practice. These factors together point to making regression tests clear, well-organized, and maintainable just like all the other tests in the repo.

@Awjin (Contributor) commented Feb 28, 2020

@saper

The process of getting clean specs written, or improved, out of bug reports is iterative and may span multiple releases of the compiler. The specs may improve over time; the regression tests, remaining frozen, serve as an additional check in this process.

This implies that specs need to be improved, and regression tests need to be added, necessitating twice the work to test one thing. Wouldn't it be more efficient to focus on cleanly improving the spec? I think @nex3 makes a great point about "ugly tests". They are essentially a lose-lose—simultaneously technical debt and missed opportunities to test across Sass implementations.

@mgreter (Contributor, Author) commented Mar 1, 2020

So what is the outcome? Can we have libsass-specific bug-report regression specs in a separate repo or not? I don't know what else to write besides pretty please with a cherry on top! We are already missing quite a lot of regression tests in our unit tests: https://github.com/sass/libsass/issues?q=is%3Aissue+is%3Aclosed+label%3A%22Dev+-+Needs+Test%22. I don't have enough free time (and honestly also not the motivation) to break them down to the acceptable minimal level currently set out in the readme. I won't go into details about why this would be difficult (e.g. null-pointer access under certain nesting conditions), but I'd rather have them "ugly" than not have them at all.

P.S. As a concrete exercise, how would you expect me/us to break down regressions like https://github.com/sass/sass-spec/blob/master/spec/libsass-closed-issues/issue_245443/input.scss or https://github.com/sass/sass-spec/blob/master/spec/libsass-todo-issues/issue_221267/input.scss or https://github.com/sass/sass-spec/blob/master/spec/libsass-todo-issues/issue_221292.hrx?

P.P.S. @nex3, regarding "Sorry, you're right. I guess I was thinking about the non-style-guide-ified pull requests you landed.": feel free to point me to the PRs you mean: https://github.com/sass/sass-spec/commits?author=mgreter&since=2018-01-01&until=2020-03-01 ... I only changed libsass-specific stuff, besides the fix in the spec runner that first broke our CI and then, after my fix, yours :-/

@mgreter requested a review from @Awjin, March 1, 2020
@Awjin (Contributor) commented Mar 3, 2020

[...] this would be difficult (e.g. null-pointer access under certain nesting conditions), but I'd rather have them "ugly" than not have them at all.

Since this isn't language-level behavior but specific to LibSass, it would be a great candidate for testing in libsass. I think our main concern is about regressions that need testing in Dart Sass too; it would be a shame to lose those tests. Breaking down regressions into a minimal reproduction helps us decide whether the behavior is specific to LibSass or applies across implementations.

P.S. As a concrete exercise, how would you expect me/us to break down regressions like https://github.com/sass/sass-spec/blob/master/spec/libsass-closed-issues/issue_245443/input.scss or https://github.com/sass/sass-spec/blob/master/spec/libsass-todo-issues/issue_221267/input.scss or https://github.com/sass/sass-spec/blob/master/spec/libsass-todo-issues/issue_221292.hrx?

Thanks for pointing to concrete examples—let's look at the last one's input: /**/0{i:((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((.

I can't easily tell why Sass is parsing this incorrectly. Is it the empty comment /**/ at the front? Is it the missing } at the end? Is it because ( does not get closed? Or is it the sheer quantity of (? The breakage could be coming from some, or all, of those causes.

If the error is indeed caused by a ( that does not get closed, the minimal input is:

a {b: (}

This minimal test makes it clear what we need to fix in Dart Sass and guards against that brokenness in future Sass implementations. If the error actually has multiple causes, having small tests for each is also good. Say the other causes are:

  • Multiple open parentheses a {b: (((}
  • Unclosed bracket a {b:

When fixing the problem in a different implementation, there are now 3 minimal specs that we can address individually. IMO that gives more certainty that we've implemented the correct behavior, compared to trying to get /**/0{i:(((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( to pass.
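As a sketch of how those might be encoded (the expected error files are omitted here; in practice their contents would be generated from the reference implementation rather than written by hand), the three cases could live as sibling specs:

    <===> unclosed-paren/input.scss
    a {b: (}

    <===> multiple-open-parens/input.scss
    a {b: (((}

    <===> unclosed-bracket/input.scss
    a {b:

with a matching error file next to each input.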

@saper (Member) commented Mar 4, 2020

I can't easily tell why Sass is parsing this incorrectly. Is it the empty comment /**/ at the front? Is it the missing } at the end? Is it because ( does not get closed? Or is it the sheer quantity of (? The breakage could be coming from some, or all, of those causes.

One of the reasons we need regression tests is precisely that we do not know the answers to those questions in advance.

Currently we have the following workflow in libsass:

  1. Accept the bug, issue 1234
  2. Write a regression test for issue 1234 (marked as todo)
  3. Some time later, analyze the bug
  4. If we find that some language features are not covered, clarify them and write another language spec test
  5. Fix the bug, close issue 1234
  6. Unmark the regression test for issue 1234 as todo

The analysis in step 4 is difficult and may not always be complete, even during the bug fixing.
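For reference, the todo marking in steps 2 and 6 happens in the spec's options.yml; as far as I understand the style guide's preference for issue references, a marked spec would carry something like this (the issue number here is hypothetical):

    ---
    :todo:
    - sass/libsass#1234

and step 6 removes that entry again once the bug is fixed.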

Now back to the example at hand:

Suppose we fixed the parentheses bug in libsass 1.0 and produced the following test (spec) in the process of fixing it:

a {b: (}

Now, libsass 4.0 comes out and, again,

/**/0{i:((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((

fails, but for a different reason. Some code refactoring made this pass:

a {b: (}

but this to fail:

a {b: (((((}

for at least 5 parentheses.

Without a regression test, the "clean" spec test does not alert us to the problem in the 4.0 release.

Does adding a {b: (((((} or a {b: (((((((((((((((((((((((((((} or maybe a {b: (((((((((((((((((((((((((((((((((((((((((((((((} contribute anything to a better specification of the language? No.

Did we re-introduce an old bug because we didn't store the original problem? Yes.

Guarding against the reintroduction of old bugs is a primary goal of regression testing. The tests @mgreter provided are non-functional ones: they do not teach us about the Sass language and its desired functionality. They expose dark corners of the libsass implementation.

Sure, Dart Sass and other implementations should be able to pass regression tests as well, and those tests might help them find bugs too. But we won't be able to share them if some form of style guide is forced on us.

Implementations that do not have multiple output styles (compact, compressed, etc.) will not be able to run those regression tests, though, as they are implementation-specific.

@mgreter (Contributor, Author) commented Mar 4, 2020

Since this isn't language-level behavior but specific to LibSass, it would be a great candidate for testing in libsass.

I would like to keep them in sass-spec under the libsass folder OR anywhere else (that's the main purpose of this PR), but since I did so, my commit rights have been revoked by @nex3. So what do you expect me to do? I can either follow the readme or do nothing at all! We had a workflow in libsass to add a regression test for every bug that was reported, e.g. #1473 (comment) or #1481 (as @saper also pointed out).

So can we move "libsass specific tests to own repository" or not!? They are not gone; anybody should feel free to make minimal spec tests off of them, but I/we just want to have a one-to-one regression test for each issue!

Btw. @Awjin, the above issue is caused by stack limitations (and people running fuzzers) in C/C++, which dart-sass also has in common with us, but we as libsass get a far worse reputation for it: https://www.cvedetails.com/vulnerability-list/vendor_id-16627/Libsass.html

So what is the outcome? Can we have libsass-specific bug-report regression specs in a separate repo or not? I don't know what else to write besides pretty please with a cherry on top!

PLEASE 🙏

@mgreter (Contributor, Author) commented Mar 7, 2020

@xzyfer @saper I'm also OK with just copying the existing regression tests to a new repo and adding the outstanding tests to it. I feel like this is currently the only way to keep our libsass regression test policy. I don't think we can get acceptance under the sass GitHub "folder", so we probably have to live with it living in my own "mgreter" space. I'll give it a few more days, but then I will start to copy and cover outstanding/existing regression tests there!

@mgreter (Contributor, Author) commented Mar 7, 2020

Will seek another way to keep libsass regressions!

@mgreter closed this Mar 7, 2020
@saper (Member) commented Mar 7, 2020

Yes, I would just rename the repo to make sure it is not the "spec" repo. We just need to hook it into the libsass CI, I think?

@xzyfer (Contributor) commented Mar 7, 2020 via email
