Is there a style guide for Pester tests? #317

Closed
alx9r opened this Issue Apr 21, 2015 · 10 comments

Member

alx9r commented Apr 21, 2015

I've been using Richard Berg's style guide for normal PowerShell code. It has helped a great deal in making my code more consistent, terse, idiomatic, and readable. Pester tests are so different from normal PowerShell code, however, that that style guide doesn't help at all with Pester tests.

I've learned some pretty time-consuming lessons with Pester that probably could have been avoided with an opinionated style guide. The main topics where I think I could have benefited from a strong opinion are as follows:

  • that multiple Assert-MockCalled's in a single It block ought to be avoided
  • how to deal with PowerShell's unreliable module auto-loading
  • scope and repetition of code across Describe blocks
  • scope and variables in -MockWith script blocks
  • scope and variables in -ParameterFilter script blocks
  • when to start a new Describe block vs continuing an existing Describe block
  • how to deal with the material differences in InModuleScope scope behavior when running a .Tests.ps1 using F5 in the ISE vs Invoke-Pester
  • tradeoffs between using plain-but-repetitive clearly-readable code, and terse-but-error-prone code in It blocks
  • when to use -Scope It instead of a Context block when using Assert-MockCalled

I'm sure someone with a greater corpus of tests has many more lessons learned that could be shared in a style guide.

Is there a style guide for Pester tests? If so where is it? If not, is this project the right place to create one?

Member

dlwyatt commented Apr 21, 2015

I'm not sure where the "Single Should per It" guideline comes from. I don't see a problem with multiple assertions, so long as they combine to test a single concept. So, I might say that $variable should not be $null, then call a method on the $variable object, and compare the result to some expected value. I'm fine with that being in the same It block.

That said, I think it's a cool idea, and we could put it on the wiki here.
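The multi-assertion shape described above might look something like the following sketch; Get-Config, the path, and the expected values are hypothetical, but all three assertions verify the single concept "Get-Config returns a usable config object":

```powershell
Describe "Get-Config" {
    It "returns a populated config object" {
        # Hypothetical function and path, for illustration only
        $config = Get-Config -Path 'C:\temp\settings.json'

        # Several Should calls, one concept: the object exists,
        # has the right shape, and carries the expected value.
        $config                 | Should Not Be $null
        $config.GetType().Name  | Should Be 'Hashtable'
        $config['Version']      | Should Be 2
    }
}
```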

Member

alx9r commented Apr 21, 2015

@dlwyatt Oops, that should have read "multiple Assert-MockCalled's per It" in order to avoid what @nohwnd calls "assertion roulette". I've changed it in the list.

Member

alx9r commented Apr 21, 2015

@dlwyatt I expect that some aspects of a style guide would become topics of debate. Do you think those debates should occur here as issues?

Member

dlwyatt commented Apr 21, 2015

Yep, that's fine. We could also do it on the Google group, but I think it's okay to have the conversation here in an Issue log for now.

Member

nohwnd commented Apr 21, 2015

First of all, testing is really subjective. What seems readable to you may seem awfully complicated to me, and the other way around. Setting a one-size-fits-all style is likely not possible, and I am not trying to do that. What I may try, on the other hand, is to set a foundation and explain why I do things the way I do. But don't take me as an authority; I am not one by any means. If you have your own way of doing things, and it makes sense to you for some reason, please argue with me.

Let's talk about why we test. We test because we need a simple set of boundaries that define a more complicated system. Coming up with simple tests and gradually refining them to define more complex systems is easy for us humans. Definitely easier than defining a complex system in a single swoop.

Saying that 1 + 1 = 2 is simple. We want it to be true because that is how we think the system should behave, and we can express that expectation with a simple test:

1 + 1 | Should Be 2

We can run the test and if it succeeds, then the system likely works as defined.

In other tests we could also define that 3 + 5 = 8, 10 + 3 = 13, and many more. We are using those additional test cases to further define the expected behavior of the + operator. But notice one thing: every test is roughly as complex as the previous one. Adding three and five is pretty much as complicated as adding one and one. The tests are not becoming more complex; only the system under test (in this case the + operator) is becoming more complex as it needs to accommodate more use cases.
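As a sketch, those accumulating cases could be written as separate It blocks, each as trivial as the last, so only the coverage grows:

```powershell
# Each It pins down one more point of the expected behavior of '+'.
# Every test stays equally simple; only the specification grows.
Describe "the + operator" {
    It "adds 1 and 1"  { 1 + 1  | Should Be 2 }
    It "adds 3 and 5"  { 3 + 5  | Should Be 8 }
    It "adds 10 and 3" { 10 + 3 | Should Be 13 }
}
```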

Why am I so sure that one test case is not more complex than another? Well, we technicians like to measure things, so of course we can measure complexity as well; in this case we call it cyclomatic complexity. In simple terms, code has a cyclomatic complexity of 1 if there is only a single path through it. Such code must not contain any if, nor any other branching construct like a loop or a switch. A cyclomatic complexity of 1 is what we are aiming for in test code. Such code is easy to understand and easy to reason about, because there are no ifs and whens to distract us.
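To make the contrast concrete, here is a hedged sketch; ConvertTo-LocalPath and the paths are hypothetical names used only for illustration:

```powershell
Describe "ConvertTo-LocalPath" {
    # Cyclomatic complexity 1: a single straight path through the test,
    # nothing to puzzle over when it fails.
    It "converts a UNC path" {
        ConvertTo-LocalPath '\\server\share\file.txt' |
            Should Be 'C:\share\file.txt'
    }

    # Complexity > 1: the 'if' means the test itself has behavior that
    # must be understood (and could itself be wrong). Avoid this style.
    It "converts a UNC path (avoid this style)" {
        $result = ConvertTo-LocalPath '\\server\share\file.txt'
        if ($result -like '\\*') {
            $result | Should Be 'this branch should never be taken'
        } else {
            $result | Should Be 'C:\share\file.txt'
        }
    }
}
```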

Another ingredient of a reliable test suite is making the tests fail: the so-called RED, GREEN, REFACTOR cycle. But failing the test just willy-nilly is not enough. To demonstrate that, let's replace our example with this test of a function that should return 1 no matter what:

Describe "Get-One" {
    It 'Outputs 1' { 
        # -- Arrange
        $expected = 1

        # -- Act
        $actual = Get-One

        # -- Assert
        $actual | Should Be $expected
    }
}

Currently it fails with a CommandNotFoundException. Does that count as failed? No, it does not! The assertion did not fail; a prerequisite of the test failed. To make it fail correctly, add the function, make it return 200, and run the test again. Now it fails in the assertion, and you can proceed to implement the function.

When the test finally succeeds, you might consider trying to alter the SUT code in such a way that makes the test fail again. If it does not fail, you are probably not 100% sure why it worked in the first place. Take some time to investigate, and likely add more tests.

If you do that after writing a test that succeeded on its first run, don't panic; you just wrote a so-called characterization test. Just make sure that you go back to the code and change the one line that makes the test (the assertion in that test!) fail. Then change your code back and run the tests again. When you perform this check, notice whether you knew exactly which line to change, and how. If you did not, and you constantly have to try a few lines before making the test fail, your code or your tests are probably too complicated and might need a bit of improvement.

Lastly, a reliable test must be deterministic. Given the same input, without changing anything else, it should always fail (or always succeed), but never alternate between those two states. Making a test as simple as possible, and testing just a single aspect of the SUT at a time, helps us achieve that. There is nothing more annoying than a test that occasionally fails and nobody knows why.
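One common source of non-determinism is depending on the clock; a sketch of the problem and one possible fix via mocking (Get-Report is a hypothetical SUT that includes a date in its output):

```powershell
Describe "Get-Report" {
    # Non-deterministic: depends on the real clock, so the expected
    # value changes between runs. Avoid this.
    It "stamps today's date (flaky, avoid)" {
        Get-Report | Should Match (Get-Date -Format 'yyyy-MM-dd')
    }

    # Deterministic: the time source is mocked, so the same input
    # always produces the same result, every run.
    It "stamps the date it is given" {
        Mock Get-Date { [datetime]'2015-04-21' }
        Get-Report | Should Match '2015-04-21'
    }
}
```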

So to sum up: we write tests because they are simple pieces of code that we saw fail for a single reason. And we also saw them succeed. At that point we were pretty sure why they failed and why they succeeded. This little piece of trust in the code then makes us trust the whole system, because if every single piece of the system works correctly, the whole system must also work correctly.

So now to your actual questions:

  • that multiple Assert-MockCalled's in a single It block ought to be avoided
    A unit test should test only a single aspect of the problem, for the reasons mentioned earlier, so any number of assertions larger than one should be used with caution, and not just mock assertions. In any case, try to avoid not being sure why your test failed.
    A symptom of this is having 20 tests and knowing that test 3 fails whenever any of tests 2, 7, or 10 fail: with test 2, the first assertion in test 3 fails (that assertion tests the presence of a file); with test 7, the second assertion in test 3 fails (that one tests whether a service is running); and similarly, another assertion in test 3 fails when test 10 fails. That is assertion roulette. You can't simply tell what happened from looking at your failed tests.
  • how to deal with powershell's unreliable module auto-loading
    Those are integration tests, and those will always be difficult, as there are so many unknowns, so you might expect some failed test suite runs. Identifying why the test run failed, and making it easier for yourself to identify the same cause again, is one of the challenges of engineering. I don't have many co-dependent modules, so I don't have any specific recommendation.
  • scope and repetition of code across Describe blocks
    You can run any code in It just by defining a function, so I'd use that. But still keep the cyclomatic complexity of the code at 1. In other words, use the functions as nicely named containers for the code, but do not perform any logic there. What we should do is provide you with a mechanism that guarantees code runs after a failed test. For now, you can do it in a limited fashion like this:
    function SetupDatabase () {}
    function TeardownDatabase () {}

    SetupDatabase
    It "something db related" {}
    TeardownDatabase
  • scope and variables in -MockWith script blocks
    Not sure what your use cases are, but anything more complex than returning a value/object should be avoided.
  • scope and variables in -ParameterFilter script blocks
    This should contain a simple boolean condition using variables named after the parameters of the mocked function (e.g. $Path for Get-ChildItem). If you do anything complex here, you might rethink your strategy.
  • when to start a new Describe block vs continuing an existing Describe block
    I usually use a single Describe block for a single function.
  • how to deal with the material differences in InModuleScope scope behavior when running a .Tests.ps1 using F5 in the ISE vs Invoke-Pester
    Not sure what you mean.
  • tradeoffs between using plain-but-repetitive clearly-readable code, and terse-but-error-prone code in It
    This is too subjective, but we can discuss it over a few examples if you have some.
  • when to use -Scope It instead of a Context block when using Assert-MockCalled
    I do it pretty much all the time; it isolates one test case from another. This is one of the features that was supposed to go into Pester 3, but we decided not to put it there because it seemed too limiting for our complicated test suites.
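Taken together, the advice above on mock assertions (one Assert-MockCalled per It, -Scope It for isolation, a simple -ParameterFilter named after the mocked command's parameters) might look roughly like this; Remove-OldLogs and the mocked file layout are hypothetical:

```powershell
Describe "Remove-OldLogs" {
    # Stub out the collaborators of the hypothetical SUT.
    Mock Get-ChildItem { [pscustomobject]@{ FullName = 'C:\logs\old.log' } }
    Mock Remove-Item   { }

    # One mock assertion per It: when a test fails, its name alone
    # tells you which interaction broke. -Scope It keeps the call
    # counts of one test from leaking into the next.
    It "lists the log directory" {
        Remove-OldLogs -Path 'C:\logs'
        Assert-MockCalled Get-ChildItem -Scope It -Times 1
    }

    It "removes the files it found" {
        Remove-OldLogs -Path 'C:\logs'
        # The filter uses $Path because that is the mocked command's
        # parameter name, and nothing more complex than one comparison.
        Assert-MockCalled Remove-Item -Scope It -Times 1 `
            -ParameterFilter { $Path -eq 'C:\logs\old.log' }
    }
}
```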

@nohwnd nohwnd added the Question label Apr 22, 2015

Member

nohwnd commented Apr 27, 2015

@alx9r Any remarks?

Member

alx9r commented Apr 27, 2015

@nohwnd I've rethought this over the last few days. I think the challenges I had getting Pester to work effectively were caused by, broadly-speaking, two categories of problems:

  1. Unreported/undiscussed/undocumented pitfalls, bugs, or limitations of Pester and/or PowerShell. -- It's probably premature to bake workarounds for these into a style guide. I think I should just open GitHub issues for these problems so we can establish what the best solutions are.
  2. My unawareness of the implications of how Pester behaves for the kinds of testing I'm using it for. -- I'm sure Pester can be used effectively for a multitude of different purposes that result in a multitude of different styles. More examples of clean, readable, maintainable tests for different testing situations would help here, but I'm not that confident a style guide would.

Perhaps somewhere in the Pester tutorials there could be links to real-world projects that have well-written Pester tests that newbies can read to get a sense of what works. Reading through the tests in @dlwyatt's PSGenericMethods helped me a bit. The tests in Pester itself might be an example of this, although the self-referential nature of those tests was a bit confusing for me at first.

Member

nohwnd commented Apr 27, 2015

@alx9r That's a great idea! I've asked people on Twitter to share links to their projects using Pester. I remember doing this a while ago, but at that point only Pester and Chocolatey were probably using Pester :))

You are right about the tests in the framework; they tend to be difficult to understand, and we have to use different kinds of hacks to test what we need to test.

Member

dlwyatt commented Apr 27, 2015

I have quite a few examples of Pester test scripts in my personal repositories (and several more in DSC-related stuff that's mostly on PowerShell.org):

Some of these are older scripts that could use some cleaning up. For example, I'm in the habit now of assigning the result of the command to a variable, then using the variable in the call to Should, but many of my original Pester scripts didn't do that. This is particularly ugly with Should Throw, where the input can be a long script block.
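The capture-then-assert habit described here might look like the following sketch; Get-ModuleVersion and the expected values are hypothetical:

```powershell
Describe "Get-ModuleVersion" {
    # Assign the result to a variable first, then pipe the variable
    # to Should: the assertion line stays short and readable.
    It "parses the version" {
        $result = Get-ModuleVersion -Name 'MyModule'   # hypothetical command
        $result | Should Be '1.2.3'
    }

    # With Should Throw the same habit helps even more: assign the
    # script block to a variable instead of piping a long inline block.
    It "throws on a missing module" {
        $act = { Get-ModuleVersion -Name 'NoSuchModule' -ErrorAction Stop }
        $act | Should Throw
    }
}
```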

Contributor

it-praktyk commented Aug 18, 2017

I assume that this issue can be closed now. I've labeled it 'Documentation' too; maybe in the future someone will move parts of the information from this thread to the Pester wiki.

@adbertram, perhaps something for your book here?

@it-praktyk it-praktyk closed this Aug 18, 2017
