Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added module for ztest and testing as well #13938

Closed
wants to merge 1 commit into from

Conversation

cassielayden
Copy link

Reference issue

Progress made on feature request #13662

What does this implement/fix?

Implementation adds a module for performing the z tests and a test suite for the Z-test module as well.
Z-Test module is at scipy/stats/ztest.py
Test module is at scipy/stats/tests/test_ztest.py

Additional information

The implementation is not complete. The functions in the added module are correct and working when provided the input outlines. Comments are included throughout the code that explain how and where input should be taken in a user interface. Feel free to contact me with more questions about this.
Additionally, I will attempt to implement the interface as I see possible, however I do believe another developer may be better fit for this task.

@tupui
Copy link
Member

tupui commented Apr 26, 2021

Thanks @cassielayden for submitting a PR. I am afraid I will have to close it for a few reasons.

From the issue this PR is addressing, @charlotte12l already expressed some interest in working on this. While there is no formal assignment of issues/PR, we are all volunteers and try not to re-do the same work. We certainly don't want to have contributors competing on PRs. Instead we prefer to share the work load by discussing in issues or on the mail list before.

Thus, I would invite you to contact @charlotte12l and see how you can be of help if this topic interests you. In the present case, a new statistical test, we also would want to design if in a way we could reuse existing components from other existing tests. In terms of maintainability, we would not want to include an implementation which would be totally contained here.

Before working on something else, I would also invite you to read our contributing documentation. It's a long document, but it's really important that you understand it and follow it. There are valuable informations such as how to format your code and tests. In the end we will have to respect this document from start to finish before considering including any contribution. As for other issues you could have a look at, we do have a label good first issue.

Thank you for your understanding and your willingness to contribute to SciPy. While this PR was not successful, I do hope that you would consider contributing again.

@tupui tupui closed this Apr 26, 2021
@ilayn
Copy link
Member

ilayn commented Apr 26, 2021

It's better to first check if @charlotte12l is actually working on it. Looks like it is stalled over there so it doesn't have to be closed in case they don't ping back.

@rgommers
Copy link
Member

Thanks for reopening @ilayn. @tupui this PR needs some work but it's totally valid to work on this. I suggest only closing a PR if you're really certain that's the right thing to do (like, it stalled for a year and someone else took it over in a new PR, or the bug it addressed was already fixed.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few high-level comments to get started.

I suggest first writing the public function, something like

def ztest(...):
    """
    Docstring here
    """
    ...

# caused and on any theory of liability, whether in contract, strict
# liability or tort (including negligence or otherwise) arising in any way
# out of the use of this software, even if advised of the possibility of
# such damage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new file, so it does not need to have this copyright copied over from another file. Please remove this block comment.

siegelslopes)
from ._stats import (_kendall_dis, _toint64, _weightedrankedtau,
_local_correlations)
from dataclasses import make_dataclass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please import only the functions and modules you are actually using to implement the Z test functionality. Running flake8 on this file will show you unnecessary imports.

'tiecorrect', 'ranksums', 'kruskal', 'friedmanchisquare',
'rankdata',
'combine_pvalues', 'wasserstein_distance', 'energy_distance',
'brunnermunzel', 'alexandergovern']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, just add __all__ = ['ztest'] (if that's the function name).

'rankdata',
'combine_pvalues', 'wasserstein_distance', 'energy_distance',
'brunnermunzel', 'alexandergovern']
_one_sample_z_table = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment about where these values come from?

Also, is 4 digits precise enough, and is there not a way to calculate these values instead?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the cdf of a normal. Why not use stats.norm.cdf?

@charlotte12l
Copy link
Contributor

@tupui @ilayn @rgommers Sorry for the late reply. I'm currently not working on it because there is not much interest after I mailed the mailinglist about this, so I added this to the end of my GSoC timeline.
@cassielayden I read your codes and I'd recommend taking a look at gh-12873 for a good example of implementing a new hypothesis test(Thanks @mdhaber for mentioning the example in #13662 ).
Moreover, maybe it is better to return the p-values rather than 0/1 to make it consistent with other tests? You can use _normtest_finish to get the p-value from the Z statistic.
Finally, z-test is quite similar to t-test, I think you can always refer to t-test implementation for naming, code structures, tests, logics, etc. :)

@tupui
Copy link
Member

tupui commented Apr 28, 2021

@tupui @ilayn @rgommers Sorry for the late reply. I'm currently not working on it because there is not much interest after I mailed the mailinglist about this, so I added this to the end of my GSoC timeline.

No worries, and thanks for helping with this specific information 😃

@mdhaber mdhaber closed this Sep 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement A new feature or improvement scipy.stats
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants