Working on documentation for new methods, pushing since I'm taking lunch

researchpy · Mar 10, 2022 · 3ee235b
1 parent 3f616b4

Showing 6 changed files with 187 additions and 17 deletions.
diff --git a/source/anova_documentation.rst b/source/anova_documentation.rst
@@ -32,7 +32,7 @@ anova methods
 
       * **return_type** : The type of data structure the results should be returned as. Supported options are 'Dataframe' which will return a Pandas DataFrame or 'Dictionary' which will return a dictionary.
       * **decimals** : The number of decimal places the data should be rounded to.
-      * **pretty_format ** : If pretty formatting should be applied. This adds extra empty spaces in the returned data structure for visualization of the results.
+      * **pretty_format** : If pretty formatting should be applied. This adds extra empty spaces in the returned data structure for visualization of the results.
 
   * **regression_table(return_type = "Dataframe", decimals = 4, conf_level = 0.95)**
 
@@ -93,7 +93,7 @@ called 'systolic'.
 
 .. code:: python
 
-  import researchpy as rp
+ import researchpy as rp
   import pandas as pd
   # Used to load example data #
   import statsmodels.datasets

diff --git a/source/difference_test_documentation.rst b/source/difference_test_documentation.rst
@@ -201,12 +201,11 @@ that N is the total number of observations.
 Rank-Biserial correlation coefficient r (between or within subjects design)
 """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
 The following formula is used to calculate the Rank-Biserial
-correlation coefficient r using the W-value and N. This formula
-is used to calculate the r coefficient for the Wilcoxon ranked-sign test.
+correlation coefficient r :cite:`Kerby2012` for the Wilcoxon signed-rank test.
 
 .. math::
 
-  \text{Rank-Biserial r} = \frac{W}{\sum{\text{rank}}}
+  \text{Rank-Biserial r = } \frac{\sum{Ranks}_{+} - \sum{Ranks}_{-}}{\sum{Ranks}_{total}}
 
 
 
@@ -274,7 +273,7 @@ Now the data is in the correct structure.
 
     # If you don't store the 2 returned DataFrames, it outputs as a tuple and
     # is displayed
-    difference_test("StressReactivity ~ C(Exercise)",
+    rp.difference_test("StressReactivity ~ C(Exercise)",
                     data = df2,
                     equal_variances = True,
                     independent_samples = True).conduct(effect_size = "all")
@@ -302,7 +301,7 @@ Now the data is in the correct structure.
 .. code:: python
 
     # Otherwise you can store them as objects
-    summary, results = difference_test("StressReactivity ~ C(Exercise)",
+    summary, results = rp.difference_test("StressReactivity ~ C(Exercise)",
                                        data = df2,
                                        equal_variances = True,
                                        independent_samples = True).conduct(effect_size = "all")
@@ -343,7 +342,7 @@ Now the data is in the correct structure.
 .. code:: python
 
     # Paired samples t-test
-    summary, results = difference_test("StressReactivity ~ C(Exercise)",
+    summary, results = rp.difference_test("StressReactivity ~ C(Exercise)",
                                        data = df2,
                                        equal_variances = True,
                                        independent_samples = False).conduct(effect_size = "all")
@@ -383,7 +382,7 @@ Now the data is in the correct structure.
 .. code:: python
 
     # Welch's t-test
-    summary, results = difference_test("StressReactivity ~ C(Exercise)",
+    summary, results = rp.difference_test("StressReactivity ~ C(Exercise)",
                                        data = df2,
                                        equal_variances = False,
                                        independent_samples = True).conduct(effect_size = "all")
@@ -424,7 +423,7 @@ Now the data is in the correct structure.
 .. code:: python
 
     # Wilcoxon signed-rank test
-    summary, results = difference_test("StressReactivity ~ C(Exercise)",
+    summary, results = rp.difference_test("StressReactivity ~ C(Exercise)",
                                        data = df2,
                                        equal_variances = False,
                                        independent_samples = False).conduct(effect_size = "r")

diff --git a/source/refs.bib b/source/refs.bib
@@ -1,5 +1,29 @@
 % Encoding: UTF-8
 
+
+@article{Kerby2012,
+  author  = {Dave S. Kerby},
+  title   = {The Simple Difference Formula: An Approach to Teaching Nonparametric Correlation},
+  journal = {Innovative Teaching},
+  year    = {2014},
+  volume  = {3},
+  number  = {1},
+  note    = {10.2466/11.IT.3.1}
+}
+
+@article{Fritz_Morris_Richler2012,
+  author  = {Catherine O. Fritz and Peter E. Morris and Jennifer J. Richler},
+  title   = {Effect Size Estimates: Current Use, Calculations, and Interpretation},
+  journal = {Journal of Experimental Psychology: General},
+  year    = {2012},
+  volume  = {141},
+  number  = {1},
+  pages   = {2-18},
+  note    = {10.1037/a0024338}
+}
+
+
+
 @book{grissomkim2012,
   title     = {Effect Sizes for Research: Univariate and Multivariate Applications},
   publisher = {Routledge},
@@ -63,7 +87,7 @@ @inbook{hedges1985
   note      = {10.2307/1164953}
 }
 
-@book{ kline2004,
+@book{kline2004,
   title     = {Beyond significance testing: Reforming data analysis methods in behavioral research},
   publisher = {American Psychological Association},
   year      = {2004},

diff --git a/source/signrank_documentation.rst b/source/signrank_documentation.rst
@@ -0,0 +1,147 @@
+*************
+signrank()
+*************
+
+Description
+===========
+Conducts the Wilcoxon signed-rank test for paired-sample data. Data can be entered using the *formula_like* structure
+or by passing two array-like structures; both approaches are demonstrated in the examples.
+The results can be returned as a Pandas DataFrame object (default) or as a Python
+dictionary object.
+
+The data model is passed to *signrank*, and then the *conduct* method needs to be applied. This
+method returns 3 data objects within a tuple.
+
+
+
+Parameters
+==========
+
+Input
+-----
+**signrank(formula_like = None, data = {}, group1 = None, group2 = None, zero_method = "pratt", correction = False, mode = "auto")**
+
+  * **formula_like** : A valid formula which will parse the data into a design matrix.
+  * **data** : The dataframe which contains the data to be analyzed; required if using *formula_like*.
+  * **group1** : The array-like object which contains the first set of observations in the paired sample.
+  * **group2** : The array-like object which contains the second set of observations in the paired sample.
+  * **zero_method** : How to handle the zero-differences in the ranking process. Available options are (from :cite:`scipy_wilcoxon`):
+
+    * *"pratt"* : Includes zero-differences in the ranking process, but drops the ranks of the zeros (default).
+    * *"wilcox"* : Discards all zero-differences.
+
+  * **correction** : Boolean value indicating if the continuity correction should be applied; see :cite:`scipy_wilcoxon` for more information.
+  * **mode** : Method to calculate the p-value; see :cite:`scipy_wilcoxon` for more information. Options are:
+
+    * *"auto"* : Use the exact distribution if there are no more than 25 observations and no ties; otherwise a normal approximation will be used (default).
+    * *"exact"* : Use the exact distribution, which can be used if there are no more than 25 observations and no ties.
+    * *"approx"* : Use a normal approximation.
+
+
+Returns
+-------
+Returns an object with class "signrank"; this object has an accessible method which is described below.
+
+signrank methods
+^^^^^^^^^^^^^^^^
+
+  * **conduct(return_type = "Dataframe", effect_size = [])**
+
+      * **return_type** : The type of data structure the results should be returned as. Supported options are 'Dataframe' which will return a Pandas DataFrame or 'Dictionary' which will return a dictionary.
+      * **effect_size** : A list object which indicates which effect size measures should be calculated. Available options are:
+
+        * *pd* : Calculates the Rank-Biserial r coefficient.
+        * *pearson* : Calculates the Pearson r coefficient.
+
+      After using the conduct method three objects will be returned within a tuple. The first object
+      provides descriptive information regarding the ranks, the second object contains the adjustment information,
+      and the third object contains the test results.
+
+
+
+Effect size measures formulas
+=============================
+By default no effect size measures are calculated.
+
+Rank-Biserial r :cite:`Kerby2012`
+------------------------------------------------
+.. math::
+
+  \text{Rank-Biserial r = } \frac{\sum{Ranks}_{+} - \sum{Ranks}_{-}}{\sum{Ranks}_{total}}
+
+Pearson r :cite:`Fritz_Morris_Richler2012`
+------------------------------------------
+.. math::
+
+  \text{Pearson r = } \frac{Z}{\sqrt{N}}
+
+Where N is the total number of observations included in the model.
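+
+As a hedged worked illustration, take the values shown in the example output in the Examples
+section (positive rank sum 13.5, negative rank sum 63.5, the "all" row's rank sum of 78 as the
+total, and z = -1.9726), and assume that N is the 12 paired observations. These are hand
+calculations from the displayed tables, not output produced by researchpy; under those
+assumptions the formulas above give approximately:
+
+.. math::
+
+  \text{Rank-Biserial r = } \frac{13.5 - 63.5}{78} = \frac{-50}{78} \approx -0.641
+
+.. math::
+
+  \text{Pearson r = } \frac{-1.9726}{\sqrt{12}} \approx -0.569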
+
+
+
+Examples
+========
+
+Loading Packages and Data
+-------------------------
+First, load the required libraries for this example. Below, an example data set is loaded
+using statsmodels.datasets; it is a data set available through Stata
+called 'fuel'.
+
+.. code:: python
+
+  import researchpy as rp
+  import pandas as pd
+  # Used to load example data #
+  import statsmodels.datasets
+
+  fuel = statsmodels.datasets.webuse('fuel')
+  fuel["id"] = range(1, fuel.shape[0] + 1)
+  fuel.head()
+
+.. raw:: html
+
+  <table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>mpg1</th>      <th>mpg2</th>      <th>id</th>    </tr>  </thead>  <tbody>    <tr>      <td>20.0000</td>      <td>24.0000</td>      <td>1</td>    </tr>    <tr>      <td>23.0000</td>      <td>25.0000</td>      <td>2</td>    </tr>    <tr>      <td>21.0000</td>      <td>21.0000</td>      <td>3</td>    </tr>    <tr>      <td>25.0000</td>      <td>22.0000</td>      <td>4</td>    </tr>    <tr>      <td>18.0000</td>      <td>23.0000</td>      <td>5</td>    </tr>  </tbody></table>
+
+
+The data is currently in a wide structure, where the columns mpg1 and mpg2 each hold a value for the same ID. This format
+is supported by signrank. The long format structure is also supported using the *formula_like* approach; to have
+the data ready for that demonstration, the transformation is conducted here.
+
+.. code:: python
+
+  fuel2 = pd.melt(fuel, id_vars = "id",
+                  value_vars = ["mpg1", "mpg2"],
+                  var_name = "mpg")
+
+  fuel2.head()
+
+.. raw:: html
+
+  <table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>id</th>      <th>mpg</th>      <th>value</th>    </tr>  </thead>  <tbody>    <tr>      <td>1</td>      <td>mpg1</td>      <td>20.0000</td>    </tr>    <tr>      <td>2</td>      <td>mpg1</td>      <td>23.0000</td>    </tr>    <tr>      <td>3</td>      <td>mpg1</td>      <td>21.0000</td>    </tr>    <tr>      <td>4</td>      <td>mpg1</td>      <td>25.0000</td>    </tr>    <tr>      <td>5</td>      <td>mpg1</td>      <td>18.0000</td>    </tr>  </tbody></table>
+
+
+
+Signrank using Wide Structure Datasets
+---------------------------------------
+Since the test returns 3 data objects, this demonstration will assign each data object to a variable. This is not required, but
+it makes the output look cleaner.
+
+.. code:: python
+
+  desc, var_adj, res = rp.signrank(group1 = fuel.mpg1, group2 = fuel.mpg2).conduct()
+
+  print(desc, var_adj, res, sep = "\n"*2)
+
+.. raw:: html
+
+  <table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>sign</th>      <th>obs</th>      <th>sum ranks</th>      <th>expected</th>    </tr>  </thead>  <tbody>    <tr>      <td>positive</td>      <td>3</td>      <td>13.5000</td>      <td>38.5000</td>    </tr>    <tr>      <td>negative</td>      <td>8</td>      <td>63.5000</td>      <td>38.5000</td>    </tr>    <tr>      <td>zero</td>      <td>1</td>      <td>1.0000</td>      <td>1.0000</td>    </tr>    <tr>      <td>all</td>      <td>12</td>      <td>78.0000</td>      <td>78.0000</td>    </tr>  </tbody></table>
+
+  <table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>unadjusted variance</th>      <th>adjustment for ties</th>      <th>adjustment for zeros</th>      <th>adjusted variance</th>    </tr>  </thead>  <tbody>    <tr>      <td>162.5000</td>      <td>-1.6250</td>      <td>-0.2500</td>      <td>160.6250</td>    </tr>  </tbody></table>
+
+  <table border="1" class="dataframe">  <thead>    <tr style="text-align: right;">      <th>z</th>      <th>w</th>      <th>pval</th>    </tr>  </thead>  <tbody>    <tr>      <td>-1.9726</td>      <td>13.5000</td>      <td>0.0485</td>    </tr>  </tbody></table>
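+
+As a rough consistency check on the tables above (a hand calculation, assuming the usual
+signed-rank variance formula n(n + 1)(2n + 1) / 24 with n = 12 pairs, and assuming that z is the
+positive rank sum minus its expected value, divided by the square root of the adjusted variance),
+the reported values fit together as follows:
+
+.. math::
+
+  \sigma^{2}_{\text{unadjusted}} = \frac{12 \times 13 \times 25}{24} = 162.5
+
+.. math::
+
+  \sigma^{2}_{\text{adjusted}} = 162.5 - 1.625 - 0.25 = 160.625
+
+.. math::
+
+  z = \frac{13.5 - 38.5}{\sqrt{160.625}} \approx -1.9726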
+
+
+If one does not assign each object to a variable, the output is still readable.
+
+.. code:: python
+
+  rp.signrank(group1 = fuel.mpg1, group2 = fuel.mpg2).conduct()
+
+.. raw:: literal
diff --git a/source/summarize_documentation.rst b/source/summarize_documentation.rst
@@ -94,7 +94,7 @@ First demonstration will show how to get descriptive statistics for a single var
 
 .. code:: python
 
-  summarize(auto.price)
+  rp.summarize(auto.price)
 
 .. raw:: html
 
@@ -108,7 +108,7 @@ Now let's get information from 2 variables at the same time.
 
 .. code:: python
 
-  summarize(auto[["price", "mpg"]])
+  rp.summarize(auto[["price", "mpg"]])
 
 .. raw:: html
 
@@ -126,7 +126,7 @@ Pandas Series Groupby Object
 
 .. code:: python
 
-  summarize(auto.groupby("foreign")["price"])
+  rp.summarize(auto.groupby("foreign")["price"])
 
 
 .. raw:: html
@@ -139,7 +139,7 @@ Pandas Dataframe Groupby Object
 
 .. code:: python
 
-  summarize(auto.groupby(["foreign"])[["price", "mpg"]])
+  rp.summarize(auto.groupby(["foreign"])[["price", "mpg"]])
 
 .. raw:: html
 

diff --git a/source/ttest_documentation.rst b/source/ttest_documentation.rst
@@ -180,12 +180,12 @@ that N is the total number of observations.
 
 Rank-Biserial correlation coefficient r (between or within subjects design)
 ---------------------------------------------------------------------------
-The Rank-Biserial r is also provided for the Wilcoxon signed-rank test as is
+The Rank-Biserial r :cite:`Kerby2012` is also provided for the Wilcoxon signed-rank test and is
 calculated as:
 
 .. math::
 
-  \text{Rank-Biserial r} = \frac{W}{\sum{\text{rank}}}
+  \text{Rank-Biserial r = } \frac{\sum{Ranks}_{+} - \sum{Ranks}_{-}}{\sum{Ranks}_{total}}