I saw this post in the open issues, and upon closer inspection, realized that the z_score algorithm wasn't calculating z-scores correctly.
I re-implemented the z_score algorithm with the correct math, then refactored it out into its own module. This decouples the z-score calculation from the Alternative class: Alternative now implements its own z_score method, which simply passes its data along to the calculate method in the new module.
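Roughly, the decoupled module might look like the sketch below. The module and method names, the nil-on-insufficient-data convention, and the use of the unpooled two-proportion z-statistic are my assumptions for illustration, not necessarily what the PR implements:

```ruby
# Hypothetical sketch of a standalone z-score module.
module ZScore
  # Two-proportion z-test with unpooled standard error:
  #   z = (p1 - p2) / sqrt(p1*(1 - p1)/n1 + p2*(1 - p2)/n2)
  # p1, p2: conversion rates; n1, n2: participant counts.
  # Returns nil when there is not enough data to compute a score.
  def self.calculate(p1, n1, p2, n2)
    return nil if n1.zero? || n2.zero?

    standard_error = Math.sqrt(p1 * (1 - p1) / n1.to_f +
                               p2 * (1 - p2) / n2.to_f)
    return nil if standard_error.zero?

    (p1 - p2) / standard_error
  end
end

# Example: 30% conversion over 100 trials vs. 25% over 100 trials.
z = ZScore.calculate(0.30, 100, 0.25, 100)
```

An Alternative#z_score method would then just forward its own conversion rate and participant count (plus the control's) to ZScore.calculate, which is what makes the calculation independently testable.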
I also tweaked the z_score specs to work with the new implementation, and changed the UI output for the case where the experiment doesn't yet have sufficient data.
Finally, I updated the documentation with a section about statistical validity. It may make sense in future versions to implement a minimum sample size calculator in Split that hides significance data until the experiment has run its course, which would minimize false positives (and is best practice in A/B testing). Until then, it seems prudent to give users some sort of warning or advice so that they are running statistically valid tests.
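For reference, a minimum sample size calculator could be sketched along these lines. This is not part of this PR; the module name is made up, and the standard two-proportion power formula with hard-coded z-values (95% confidence, 80% power) is one reasonable choice among several:

```ruby
# Hypothetical sketch of a minimum-sample-size helper (not in this PR).
module SampleSize
  Z_ALPHA = 1.96 # two-sided test at alpha = 0.05
  Z_BETA  = 0.84 # statistical power of 0.80

  # Minimum participants per variation needed to detect a change from
  # baseline conversion rate p1 to expected rate p2:
  #   n = (z_alpha + z_beta)^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
  def self.per_variation(p1, p2)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    ((Z_ALPHA + Z_BETA)**2 * variance / (p1 - p2)**2).ceil
  end
end

# Example: detecting a lift from a 10% to a 12% conversion rate
# requires a few thousand participants in each variation.
n = SampleSize.per_variation(0.10, 0.12)
```

Until something like this gates the significance display, the docs section at least tells users why an "almost significant" result on a small sample shouldn't be trusted.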
Happy to tweak things if need be.
Cheers,
Casey