# Set Up

In [None]:
using DataFrames, Normalize

## Load Data

Get the full file path of the desired dataset.
- Windows
    1. Hold the Shift key.
    2. Right-click the file.
    3. Click "Copy as path".
- Mac
    1. Right-click the file.
    2. Hold the Option key.
    3. Click "Copy <filename\> as Pathname".
- Linux
    1. Open Terminal.
    2. Run `readlink -f <filename>`.
    3. Copy the resulting path. 

Paste it between the first set of quotes in the cell below.

If the dataset is an Excel or OpenDocument Spreadsheet, type the sheet name between the second set of quotes.

In [None]:
df = tabular_to_dataframe("<dataset path>", "<sheet name>")
describe(df)
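
On Windows, "Copy as path" yields a path that contains backslashes and is wrapped in double quotes. Inside an ordinary Julia string literal a backslash starts an escape sequence, so a pasted path can fail to parse. A minimal sketch of one workaround, assuming a hypothetical file location, is to paste the path (without its extra quotes) into a `raw` string:

```julia
# Hypothetical Windows path; the folders and file name are placeholders.
path = raw"C:\Users\me\Documents\data.csv"

# A raw string keeps every backslash literally, so no escaping is needed.
println(path)

# isfile reports whether the path points to an existing file.
println(isfile(path))
```

A `raw` string is still an ordinary `String`, so it can be passed wherever a quoted path is expected.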

Choose the variable(s) to analyze, including any group variables.

List each variable name, prefixed with a colon, between the innermost brackets.

In [None]:
sample = df[!, [:variable1, :variable2,]]
describe(sample)
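
The selection syntax above can be tried on a small table first. This sketch uses made-up column names (`height`, `weight`, `id`) and only DataFrames.jl indexing. Note that `!` selects the columns without copying them, whereas `:` would copy.

```julia
using DataFrames

# Hypothetical data; the column names are placeholders.
df = DataFrame(height = [1.0, 2.0], weight = [3.0, 4.0], id = [1, 2])

# Keep only the listed columns. With `!`, the column vectors are shared with df.
sample = df[!, [:height, :weight]]

println(names(sample))
println(sample.height === df.height)  # the same vector object, not a copy
```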

## Clean Data

Replace `missing` with `new_value` in a variable of `sample`.

This will allow skewness and kurtosis to be calculated for a variable containing `missing` values.

In the parentheses, prefix the variable name with a colon.

In [None]:
new_value = NaN
replace_missing!(sample, :variable; new_value)
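
To see what such a replacement does to a single column, here is a minimal sketch in plain Julia. It uses `coalesce` from Base on a made-up vector and is not necessarily how `replace_missing!` works internally.

```julia
# Hypothetical column with one missing entry.
col = [1.5, missing, 3.0]

# Arithmetic on the original column propagates `missing`.
println(sum(col))

# coalesce returns its first non-missing argument, so each `missing` becomes NaN.
new_value = NaN
filled = coalesce.(col, new_value)

println(filled)
println(eltype(filled))  # the column is now a plain Float64 vector
```

The sketch shows only the replacement itself; how the package treats `NaN` afterward is not shown here.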

Replace blank values (i.e., `" "`) with `new_value` in a variable of `sample`, and convert the variable to floating-point numbers (floats).

This will allow skewness and kurtosis to be calculated for a variable containing blank values.

Prefix the variable name with a colon in the parentheses.

In [None]:
new_value = NaN
sheetcol_to_float!(sample, :variable, blank_to=new_value)
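
The idea behind this conversion can be sketched for a column that was read in as text. The helper `to_float` below is hypothetical and written only for illustration; it is not part of the package and may differ from what `sheetcol_to_float!` does.

```julia
# Hypothetical helper: blank strings become `blank_to`, everything else is parsed.
function to_float(s::AbstractString; blank_to = NaN)
    return isempty(strip(s)) ? blank_to : parse(Float64, s)
end

# Hypothetical spreadsheet column read in as strings, with one blank cell.
raw_col = ["1.5", " ", "3"]

floats = to_float.(raw_col)
println(floats)
```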

## Group Data

Prefix any group variables with a colon between the innermost brackets.

If you run the cell below, replace `sample` with `gd` in all later cells.

In [None]:
gd = groupby(sample, [:group1,]);
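
As a quick illustration of what grouping produces, this sketch applies DataFrames.jl's `groupby` to a made-up table. The column names `group` and `score` are placeholders.

```julia
using DataFrames

# Hypothetical data with one group variable and one measured variable.
sample = DataFrame(group = ["a", "b", "a"], score = [1.0, 2.0, 3.0])

# Split the rows into one sub-table per distinct value of :group.
gd = groupby(sample, [:group])

println(length(gd))          # number of groups
println(gd[("a",)].score)    # scores of the rows where group == "a"
```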

# Main

Display details about skewness and kurtosis of the data.

If the data has only one group and two dependent variables, type `dependent=true` after the trailing comma. The skewness and kurtosis of the difference between the dependent variables will be displayed.

In [None]:
print_skewness_kurtosis(sample,)

Make one attempt to normalize the data so that its skewness and kurtosis ratios both fall within ±`normal_ratio`.

If the data has only one group and two dependent variables, type `dependent=true` after the trailing comma. The difference between the dependent variables will be normalized.

In [None]:
normal_ratio = 2
results = normalize(sample; normal_ratio,)
print_findings(results)
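
This notebook does not spell out how the ratios are computed. A common convention, assumed here, is to divide the sample skewness (or excess kurtosis) by its standard error and to treat the data as approximately normal when both ratios lie within ±`normal_ratio`. The sketch below follows that convention using only the `Statistics` standard library. The function names are hypothetical, and the package may compute the ratios differently.

```julia
using Statistics

# Standard error of sample skewness for n observations.
se_skewness(n) = sqrt(6n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Sample skewness (adjusted Fisher-Pearson) divided by its standard error.
function skewness_ratio(x)
    n = length(x)
    z = (x .- mean(x)) ./ std(x)   # std uses the n - 1 denominator
    stat = n / ((n - 1) * (n - 2)) * sum(z .^ 3)
    return stat / se_skewness(n)
end

# Sample excess kurtosis divided by its standard error.
function kurtosis_ratio(x)
    n = length(x)
    z = (x .- mean(x)) ./ std(x)
    stat = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(z .^ 4) -
           3 * (n - 1)^2 / ((n - 2) * (n - 3))
    se = 2 * se_skewness(n) * sqrt((n^2 - 1) / ((n - 3) * (n + 5)))
    return stat / se
end

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
normal_ratio = 2
within(r) = abs(r) <= normal_ratio

println(skewness_ratio(x))
println(kurtosis_ratio(x))
println(within(skewness_ratio(x)) && within(kurtosis_ratio(x)))
```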

Replace `NaN` with `missing` in the normal data, and export to a CSV file.

Type a new filename before `.csv` in the quotes. If the filename does not include a directory, the file is saved in the same location as this notebook.

If the data has only one group and two dependent variables, type `dependent=true` after the trailing comma. A column of zeros will also be created in the CSV file for dependent testing.

In [None]:
normal_to_csv("newfilename.csv", results,)
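
The `NaN`-to-`missing` step can be pictured on a single column. This is a minimal sketch in plain Julia with made-up values, and it is not necessarily how `normal_to_csv` does it.

```julia
# Hypothetical transformed values, with NaN standing in for an absent entry.
normal = [0.5, NaN, 1.25]

# Turn each NaN back into `missing` and leave the other values unchanged.
exported = [isnan(v) ? missing : v for v in normal]

println(exported)
println(eltype(exported))
```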

## Transformations

_min_ – minimum value in the data <br> 
_max_ – maximum value in the data

### Positive Skew

square root: $\sqrt{x}$


add and square root: $\sqrt{x + 1 - min}$


invert: $\frac{1}{x}$


add and invert: $\frac{1}{x + 1 - min}$


square and invert: $\frac{1}{x^2}$


add, square, and invert: $\frac{1}{x^2 + 1 - min^2}$


square root and invert: $\frac{1}{\sqrt{x}}$


add, square root, and invert: $\frac{1}{\sqrt{x + 1 - min}}$


square root, add, and invert: $\frac{1}{\sqrt{x} + 1 - \sqrt{min}}$


log base 10: $\log_{10}(x)$


add and log base 10: $\log_{10}(x + 1 - min)$


natural log: $\ln(x)$


add and natural log: $\ln(x + 1 - min)$
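
The "add" variants shift the data by $1 - min$, so the smallest value becomes exactly 1 before the root, logarithm, or inversion is applied. This keeps the result defined when the data contains zeros or negative numbers. Here is a minimal sketch on made-up data, using plain Julia broadcasting rather than the package's own code:

```julia
# Hypothetical positively skewed data that includes a zero.
x = [0.0, 3.0, 8.0, 15.0]

xmin = minimum(x)
shifted = x .+ 1 .- xmin        # smallest value is now exactly 1

add_sqrt   = sqrt.(shifted)     # add and square root
add_invert = 1 ./ shifted       # add and invert
add_log10  = log10.(shifted)    # add and log base 10
add_ln     = log.(shifted)      # add and natural log

println(add_sqrt)
println(add_invert)
println(add_log10)
println(add_ln)
```

Without the shift, `1 ./ x` and `log10.(x)` would give infinite values for the zero entry.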

### Negative Skew

square: $x^2$

cube: $x^3$

antilog: $10^x$

reflect and invert: $\frac{1}{max + 1 - x}$

reflect and square root: $\sqrt{max + 1 - x}$

reflect and log base 10: $\log_{10}(max + 1 - x)$
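
The "reflect" variants first compute $max + 1 - x$, which flips the data so that the largest value becomes 1 and a negative skew becomes a positive one. Here is a minimal sketch on made-up data, using plain Julia broadcasting rather than the package's own code:

```julia
# Hypothetical negatively skewed data.
x = [0.0, 7.0, 12.0, 15.0]

xmax = maximum(x)
reflected = xmax .+ 1 .- x            # largest value is now exactly 1

reflect_invert = 1 ./ reflected       # reflect and invert
reflect_sqrt   = sqrt.(reflected)     # reflect and square root
reflect_log10  = log10.(reflected)    # reflect and log base 10

println(reflected)
println(reflect_invert)
println(reflect_sqrt)
println(reflect_log10)
```

Reflecting reverses the ordering of the values, so after "reflect and square root" or "reflect and log base 10" the originally largest observation is the smallest. Inverting reverses the ordering again, so "reflect and invert" preserves the original order.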

### Stretch Skew

logit: $\log_{10}\left|\frac{x}{1 - x}\right|$

add and logit: $\log_{10}\left|\frac{x + 0.25}{1 - (x + 0.25)}\right|$
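
The two logit formulas can be sketched as a small function. The name `logit10` is hypothetical and the data is made up. The absolute value keeps the logarithm defined when the fraction is negative, which happens for values above 1.

```julia
# Hypothetical helper implementing log10|x / (1 - x)|.
logit10(x) = log10(abs(x / (1 - x)))

# Hypothetical proportions.
x = [0.1, 0.5, 0.9]

plain = logit10.(x)             # logit
added = logit10.(x .+ 0.25)     # add and logit; 0.9 + 0.25 exceeds 1

println(plain)
println(added)
```

The transformation is undefined at the edges: an input of exactly 0 or 1 (after any shift) gives an infinite result.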