### Example 1 - Test for normal distribution

### 1.1 What to do?
First, we need data. For example, load the synthetic height and age data from example 1 (ex1.csv).
The goal is to test whether these data are normally distributed.
We do this by applying statsmed.stdnorm_test() to the input data.
For the height data, neither test indicates a significant difference from a normal distribution, but
for the age data the Shapiro-Wilk test is significant, indicating a deviation from a normal distribution.

For comparison, the height data were generated from an underlying normal distribution and the age data from a beta distribution.
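
As a rough illustration of how such data could be produced, the following sketch draws heights from a normal distribution and ages from a scaled beta distribution. The seed, the distribution parameters, and the age range are assumptions chosen only to give plausible-looking values; they are not the settings used to create ex1.csv.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # arbitrary seed (assumption)
n = 100

# Assumed parameters, for illustration only.
height = rng.normal(loc=1.75, scale=0.08, size=n).round(2)
# A beta sample lies in [0, 1]; scale it to an assumed age range of 18 to 88.
age = (18 + 70 * rng.beta(a=2.0, b=3.0, size=n)).round()

synthetic = pd.DataFrame({"height": height, "age": age})
print(synthetic.head())
```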

If you apply the test to your own data, be careful with possible NaN or None values.
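
One possible way to handle this, sketched here with pandas, is to count and drop missing entries before testing. The values below are made up for illustration, and dropping missing values is only one option; whether it is appropriate depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical height values containing one NaN and one None.
heights = pd.Series([1.69, 1.74, np.nan, 1.78, None, 1.82])

# pandas treats both NaN and None as missing in a numeric Series.
n_missing = int(heights.isna().sum())
clean_heights = heights.dropna()

print(f"{n_missing} missing value(s) removed, {len(clean_heights)} remaining")
```

The cleaned series could then be passed to statsmed.stdnorm_test() in the same way as data['height'] in this example.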

The output of statsmed.stdnorm_test() is an array with:\
&emsp;0 if neither test indicates a significant difference from a normal distribution and 1 if at least one does,\
&emsp;0 if the Shapiro-Wilk test does not indicate a significant difference from a normal distribution and 1 if it does,\
&emsp;0 if the Kolmogorov-Smirnov test does not indicate a significant difference from a normal distribution and 1 if it does,\
&emsp;test statistic of the Shapiro-Wilk test,\
&emsp;p-value of the Shapiro-Wilk test,\
&emsp;test statistic of the Kolmogorov-Smirnov test,\
&emsp;p-value of the Kolmogorov-Smirnov test\
This allows you to run multiple tests and save the results in a new array.
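
To make the seven values listed above easier to work with, they can be paired with descriptive names. The sketch below uses the result printed for the height data in this example; the field names and the helper `label_result` are hypothetical and not part of statsmed.

```python
# Result copied from the printed output for the height data in this example.
height_result = [0, 0, 0, 0.9916924834251404, 0.7985291481018066,
                 0.05967461548666875, 0.8474435848949414]

# Hypothetical field names, in the order described above.
FIELDS = ("any_significant", "shapiro_significant", "ks_significant",
          "shapiro_statistic", "shapiro_p", "ks_statistic", "ks_p")

def label_result(result):
    """Pair each entry of a seven-element result with a descriptive name."""
    if len(result) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(result)}")
    return dict(zip(FIELDS, result))

# Collect the labelled results of several variables in one dictionary.
results = {"height": label_result(height_result)}
print(results["height"]["shapiro_p"])
```

Further variables can be added to `results` under their own keys, which keeps the outcome of many tests in one place.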

In [4]:
import pandas
from statsmed import statsmed

# Load the synthetic data; lines that cannot be parsed are skipped.
data = pandas.read_csv('ex1.csv', delimiter=',', on_bad_lines='skip')

print(data)

# Run both normality tests on each column and print the result array.
print('Testing normal distribution of height data:')
print(statsmed.stdnorm_test(data['height']))
print('\n')
print('Testing normal distribution of age data:')
print(statsmed.stdnorm_test(data['age']))

    Unnamed: 0  height   age
0            0    1.69  82.0
1            1    1.74  68.0
2            2    1.73  72.0
3            3    1.78  52.0
4            4    1.82  27.0
..         ...     ...   ...
95          95    1.84  22.0
96          96    1.83  44.0
97          97    1.65  27.0
98          98    1.88  45.0
99          99    1.71  20.0

[100 rows x 3 columns]
Testing normal distribution of height data:
Shapiro-Wilk: Normal dsitribution (p-value = 0.80 
 	 - p-value >= 0.05 indicates no significant difference from normal distribution)
Kolmogorow-Smirnow: Normal dsitribution (p-value = 0.85 
 	 - p-value >= 0.05 indicates no significant difference from normal distribution)
Both tests do not indicate a significant difference from a normal distribution
[0, 0, 0, 0.9916924834251404, 0.7985291481018066, 0.05967461548666875, 0.8474435848949414]


Testing normal distribution of age data:
Shapiro-Wilk: No normal dsitribution (p-value = 0.0031)
Kolmogorow-Smirnow: Normal dsitribution (

### 1.2 What to write?

In the statistical analysis section of a manuscript you may write one of the following, depending on which tests you used:\
&emsp;"Normality was assessed using the Shapiro-Wilk test."\
&emsp;"Normality was assessed using the Kolmogorov-Smirnov test."\
&emsp;"Normality was assessed using the Shapiro-Wilk test and the Kolmogorov-Smirnov test. If at least one test indicated a significant deviation from a normal distribution, the assumption of normality was rejected."

### 1.3 Explanation
Many tests, such as Student's t-test, assume normally distributed data. We therefore need to check whether the data are normally distributed, and the test described above can be used for this. A normal distribution has a bell-like shape; the figure below shows the probability density of a normal distribution with variance 1:\
<img src="norm_pdf.png" alt="nlp" width="350"/>\
If your data are normally distributed, the histogram of the normalized data (subtract the mean and divide by the standard deviation) should be close to this bell-like shape, as in the following histogram:\
<img src="norm_hist.png" alt="nlp" width="350"/>\
If the data are not normally distributed, the histogram should have a different shape.\
Neither the Shapiro-Wilk test nor the Kolmogorov-Smirnov test can confirm a normal distribution. They can only tell you whether the data differ significantly from a normal distribution, in which case you should reject the assumption of normality.
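
For readers who want to see the two tests outside of statsmed, here is a minimal sketch using SciPy. It is not necessarily how statsmed.stdnorm_test() is implemented: the synthetic sample, the seed, the 0.05 threshold, and the choice to run the Kolmogorov-Smirnov test on the normalized data against a standard normal distribution are all assumptions made for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed (assumption)
# Synthetic stand-in data: 100 values drawn from a normal distribution.
sample = rng.normal(loc=1.75, scale=0.07, size=100)

# Normalize: subtract the mean and divide by the standard deviation.
z = (sample - sample.mean()) / sample.std(ddof=1)

# Shapiro-Wilk on the raw data; Kolmogorov-Smirnov on the normalized
# data, compared with the standard normal distribution.
sw_stat, sw_p = stats.shapiro(sample)
ks_stat, ks_p = stats.kstest(z, "norm")

alpha = 0.05  # assumed significance level
for name, p in (("Shapiro-Wilk", sw_p), ("Kolmogorov-Smirnov", ks_p)):
    verdict = "reject normality" if p < alpha else "no significant difference"
    print(f"{name}: p = {p:.3f} -> {verdict}")
```

A p-value at or above the threshold does not prove normality; it only means that the test found no significant deviation.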

