# See the Power and Function transformer notes for a better understanding of how to handle skewness

- Power Transformer: http://localhost:8888/notebooks/fullstackdatascience/ML_Concepts/Feature_Engineering/Feature_transformation/Transformer/PowerTransformer_BoxCox_YeoJohnson.ipynb


- Function Transformer: http://localhost:8888/notebooks/fullstackdatascience/ML_Concepts/Feature_Engineering/Feature_transformation/Transformer/Function_Transformer.ipynb

___
___
___

__Why do we need skewness and kurtosis?__

> The normal distribution is one of the most common probability distributions, and it is described by a symmetric, bell-shaped curve. Real data, however, often departs from this shape. Skewness and kurtosis quantify that departure: skewness measures the lack of symmetry, and kurtosis measures how heavy the tails are compared with a normal distribution.

## What Is a Normal Distribution?

A normal distribution is a continuous probability distribution of a random variable. A random variable is a variable whose value depends on the outcome of a random event. For example, flipping a coin gives either heads or tails at random, and you cannot determine with certainty whether the next outcome will be heads or tails. A coin flip is a discrete random variable (only two possible values); a continuous random variable, such as a person's height, can take any value within a range.


When you plot the probabilities of the values a random variable can take, you get its probability distribution. A random variable that can take any value within a range has a __continuous probability distribution.__ Because there are infinitely many possible values, the probability of any single exact value is zero, and the distribution is drawn as a continuous curve (the probability density). Hence, instead of assigning probabilities to individual values, you give the probability that the value lies within a range, which equals the area under the curve over that range.
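A small illustration of the "range" idea, assuming SciPy is available: for a continuous variable, probabilities come from areas under the density curve, which `scipy.stats.norm` exposes through its cumulative distribution function `cdf`. The standard normal (mean 0, standard deviation 1) is used here only as an example.

```python
from scipy.stats import norm

# Standard normal distribution: mean 0, standard deviation 1 (example choice)
dist = norm(loc=0, scale=1)

# Probability of a range = area under the density curve over that range,
# computed as the difference of the cumulative distribution function (CDF)
p_within_1_sd = dist.cdf(1) - dist.cdf(-1)
print(f"P(-1 <= X <= 1) = {p_within_1_sd:.4f}")  # about 0.6827

# The probability of hitting one exact value is zero: the range has width 0
p_exact = dist.cdf(0.5) - dist.cdf(0.5)
print(f"P(X = 0.5) = {p_exact}")
```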


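The __Normal Distribution__ is a continuous probability distribution whose curve is bell-shaped, i.e., it looks like a symmetric hill with a single, well-defined peak. (Not every bell-shaped curve is normal: the Student's t distribution, for example, is also bell-shaped but has heavier tails.)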
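For reference, the density of a normal distribution with mean $\mu$ and standard deviation $\sigma$ is the Gaussian formula:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

Because $x$ enters only through $(x-\mu)^2$, the curve is symmetric about $\mu$ and peaks there, which is what the properties below describe.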


- The peak of the curve is at the mean, and the data is symmetrically distributed on either side of it. 


- The mean, median, and mode are all equal. In a sample drawn from a normal distribution, they lie close to each other.
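A quick check of these properties on simulated data, assuming NumPy and SciPy are available. The mean 50 and standard deviation 10 are arbitrary example values. The mode is not computed because a continuous sample has no well-defined mode without binning.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(seed=0)  # fixed seed so the run is reproducible

# Draw a large sample from a normal distribution (example parameters)
sample = rng.normal(loc=50, scale=10, size=100_000)

mean = sample.mean()
median = np.median(sample)
sample_skew = skew(sample)

print(f"mean   = {mean:.3f}")
print(f"median = {median:.3f}")
print(f"skew   = {sample_skew:.3f}")  # close to 0 for a symmetric distribution
```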

![normal_dist1.jpg](attachment:normal_dist1.jpg)

## Skewness

Skewness refers to the degree of symmetry, or more precisely, the degree of lack of symmetry. Distributions, or data sets, are said to be symmetric if they appear the same on both sides of a central point.

### What Is Skewness?

Skewness measures the degree of asymmetry of a distribution about its mean, i.e., how far the shape of the data departs from a symmetric one.


Sometimes a distribution that would otherwise look normal leans to one side: the values on one side of the peak spread out into a long tail, while the values on the other side stay bunched together. The distribution is then asymmetrical, meaning the data is not equally distributed around the center.

___Skewness informs users of the direction of outliers, though it does not tell users the number of outliers.___

#### Skewness can be of two types:

1. __Negatively Skewed (Left-skewed):__ In a negatively skewed distribution, the data points are concentrated on the right-hand side of the distribution and the left tail is long. The long left tail pulls the mean to the left of the median, and the median lies to the left of the mode. The skewness value is negative.

In this distribution, typically $$Mode\;>\;Median\;>\;Mean$$
<br></br>

2. __Positively Skewed (Right-skewed):__ In a positively skewed distribution, the data points are concentrated on the left-hand side of the distribution and the right tail is long. The long right tail pulls the mean to the right of the median, and the median lies to the right of the mode. The skewness value is positive.

In this distribution, typically $$Mean\;>\;Median\;>\;Mode$$
<br></br>



![skewness11.png](attachment:skewness11.png)
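A quick numerical check of the two cases, assuming NumPy and SciPy are available. Exponential data stands in for right-skewed data and its mirror image (multiplied by -1) stands in for left-skewed data; both are illustrative choices, not data from these notes.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(seed=1)

# Right-skewed example: exponential data has a long right tail
right_skewed = rng.exponential(scale=1.0, size=100_000)
# Left-skewed example: mirror the same data so the long tail points left
left_skewed = -right_skewed

for name, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(f"{name:>12}: skew = {skew(data):+.2f}, "
          f"mean = {data.mean():+.3f}, median = {np.median(data):+.3f}")
```

The sign of the skewness follows the direction of the long tail, and the mean sits on the tail side of the median in both cases.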

#### Boxplot of skewed data:

- Left Skewed Boxplot

If the bulk of the observations is at the high end of the scale, the boxplot is left skewed: the box shifts to the right and the left whisker is longer than the right whisker. The mean is usually less than the median.


- Right Skewed Box Plot

If a box plot is skewed to the right, the box shifts to the left and the right whisker is longer than the left whisker. In this case, the mean is usually greater than the median.

![skew7-2.png](attachment:skew7-2.png)

<center><b>Boxplots do not show modes.</b></center>
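The whisker asymmetry can be computed without drawing anything. This sketch assumes the common 1.5 × IQR whisker rule (the default in matplotlib's `boxplot`); `whisker_lengths` is a hypothetical helper written for this example, and exponential data again stands in for a right-skewed feature.

```python
import numpy as np

def whisker_lengths(data):
    """Hypothetical helper: (left, right) whisker lengths under the 1.5 * IQR rule."""
    q1, q3 = np.percentile(data, [25, 75])
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers end at the most extreme data points still inside the fences
    lower_end = data[data >= lower_fence].min()
    upper_end = data[data <= upper_fence].max()
    return q1 - lower_end, upper_end - q3

rng = np.random.default_rng(seed=2)
right_skewed = rng.exponential(scale=1.0, size=10_000)

left, right = whisker_lengths(right_skewed)
print(f"left whisker = {left:.2f}, right whisker = {right:.2f}")
print(f"mean = {right_skewed.mean():.2f}, median = {np.median(right_skewed):.2f}")
```

Passing `-right_skewed` mirrors the data, and the longer whisker moves to the left, matching the left-skewed description.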

### Pearson’s First Coefficient

In a moderately skewed, unimodal distribution, the median usually lies between the mean and the mode, with the mean pulled furthest toward the long tail. The horizontal distance between the mean and the mode is therefore a natural measure of skewness:

![image.png](attachment:image.png)

The above formula gives Pearson's first coefficient of skewness. Dividing the difference between the mean and the mode by the standard deviation makes the coefficient unit-free, so skewness can be compared across variables measured on different scales.

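Pearson's first coefficient needs the mode, which is hard to estimate for continuous data. For moderately skewed distributions, the mode, mean, and median are approximately related as follows: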

![image.png](attachment:image.png)

Substituting this relationship into Pearson’s first coefficient gives Pearson’s second coefficient, a commonly used formula for skewness:

![image.png](attachment:image.png)
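Written out step by step, assuming the empirical relation $\text{Mode} \approx 3\,\text{Median} - 2\,\bar{x}$ (a rule of thumb for moderately skewed distributions), with $\bar{x}$ the mean and $s$ the standard deviation:

$$\text{Sk}_1 = \frac{\bar{x} - \text{Mode}}{s} \approx \frac{\bar{x} - (3\,\text{Median} - 2\,\bar{x})}{s} = \frac{3\,(\bar{x} - \text{Median})}{s} = \text{Sk}_2$$

Since the mean and the median can never be more than one standard deviation apart, $\text{Sk}_2$ always lies between $-3$ and $3$.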

#### Interpreting the skewness value (rule of thumb):


1. Between -0.5 and 0.5: the distribution is approximately symmetrical.


2. Between -1 and -0.5: the data is moderately negatively skewed; between 0.5 and 1: the data is moderately positively skewed.


3. Lower than -1 (negatively skewed) or greater than 1 (positively skewed): the data is highly skewed.
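The thresholds in the list above can be encoded in a small helper. `describe_skewness` is a hypothetical name used only for this sketch, and assigning the boundary values (±0.5, ±1) to the less severe class is an arbitrary choice, since the rule of thumb does not say which side they belong to.

```python
def describe_skewness(value: float) -> str:
    """Hypothetical helper: classify a skewness value with the rule-of-thumb thresholds."""
    if -0.5 <= value <= 0.5:
        return "approximately symmetric"
    if -1 <= value < -0.5:
        return "moderately negatively skewed"
    if 0.5 < value <= 1:
        return "moderately positively skewed"
    if value < -1:
        return "highly negatively skewed"
    return "highly positively skewed"

for v in [0.1, -0.7, 0.8, -1.5, 2.3]:
    print(f"{v:+.1f}: {describe_skewness(v)}")
```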

#### How to check skewness in Python:

![skew.PNG](attachment:skew.PNG)
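A text version of the skewness check, assuming pandas, NumPy and SciPy are available. The `income` column is hypothetical, filled with simulated log-normal values as a stand-in for a right-skewed feature. pandas `Series.skew()` returns the bias-corrected (adjusted Fisher-Pearson) moment skewness; `scipy.stats.skew` is uncorrected by default and matches pandas with `bias=False`. Pearson's second coefficient is computed directly from its formula.

```python
import numpy as np
import pandas as pd
from scipy.stats import skew

rng = np.random.default_rng(seed=3)
# Hypothetical right-skewed column (simulated log-normal values)
df = pd.DataFrame({"income": rng.lognormal(mean=10, sigma=0.5, size=5_000)})

col = df["income"]

# pandas: adjusted Fisher-Pearson (sample-size corrected) moment skewness
pandas_skew = col.skew()
# SciPy: same moment skewness; bias=False applies the same correction as pandas
scipy_skew = skew(col, bias=False)
# Pearson's second coefficient: 3 * (mean - median) / standard deviation
pearson_2 = 3 * (col.mean() - col.median()) / col.std()

print(f"pandas skew : {pandas_skew:.3f}")
print(f"scipy skew  : {scipy_skew:.3f}")
print(f"Pearson's 2nd coefficient: {pearson_2:.3f}")
```

The moment-based value and Pearson's coefficient usually agree in sign but not in magnitude, so the thresholds above should be applied to one measure consistently.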

## How to treat skewness?

## 1. Function Transformers:

### 1.1. Log Transform:


A log transformation is usually the first thing to try when you want to reduce right skewness in a predictor.

__When to use:__

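- Can't be used on zero or negative values, because the log is undefined there (a common workaround for zeros is log(1 + x), e.g. np.log1p).


- Should only be used on right-skewed data.


- Applying it compresses large values much more than small ones, so values far out in the right tail come closer to the rest of the data.

The log converts a multiplicative scale into an additive one: $\log(a \cdot b) = \log a + \log b$.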

It can be done easily with NumPy by calling the log() function on the desired column. You can then check the skew again just as easily:

## <span class="mark">Note: never use a log transformation on left-skewed data. It stretches the left tail even further and makes the skew worse!</span>

![image.png](attachment:image.png)
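A text version of the log transform, assuming pandas and NumPy are available. The `price` series is hypothetical simulated log-normal data, chosen because it is right-skewed and strictly positive; `np.log1p` is shown as the variant that also tolerates zeros.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=4)
# Hypothetical right-skewed, strictly positive feature (simulated log-normal data)
prices = pd.Series(rng.lognormal(mean=3, sigma=0.8, size=10_000), name="price")

log_prices = np.log(prices)      # requires every value > 0
log1p_prices = np.log1p(prices)  # log(1 + x): also handles zeros

print(f"skew before : {prices.skew():.3f}")
print(f"skew after  : {log_prices.skew():.3f}")
print(f"skew log1p  : {log1p_prices.skew():.3f}")
```

Log-normal data is the ideal case for a log transform (the result is normal), so real features will rarely end up this close to zero skew.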

### 1.2. Square Root Transform (mostly not used)

The square root is a milder transformation than the log: it sometimes works great and sometimes isn’t the most suitable option. On strongly right-skewed data, the transformed distribution can still look somewhat exponential; the main effect is that the range of the variable becomes smaller. Like the log, it cannot be applied to negative values.

You can apply a square root transformation with NumPy by calling the sqrt() function. Here’s the code:

![image.png](attachment:image.png)
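A comparison of the square root and the log on the same data, assuming pandas and NumPy are available. The `waiting_times` series is hypothetical simulated exponential data, used as a right-skewed, non-negative stand-in.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=5)
# Hypothetical right-skewed, non-negative feature (simulated exponential data)
waiting_times = pd.Series(rng.exponential(scale=2.0, size=100_000))

sqrt_times = np.sqrt(waiting_times)  # requires values >= 0
log_times = np.log(waiting_times)    # for comparison; requires values > 0

print(f"original skew : {waiting_times.skew():.2f}")
print(f"sqrt skew     : {sqrt_times.skew():.2f}")
print(f"log skew      : {log_times.skew():.2f}")
```

On this data the square root leaves a mild right skew, while the log overshoots into a left skew, which is one reason to compare a few transforms rather than always reaching for the same one.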

### 1.3. Square Transform $(x^2)$


__When to use:__

- Only on left-skewed data.

- Never on right-skewed data: squaring stretches the right tail even further and increases the skew.
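A sketch of the square transform on left-skewed data, assuming pandas and NumPy are available. The `scores` series is hypothetical, simulated from a Beta(5, 1) distribution, whose values pile up near 1. The values are non-negative; with negative values, squaring would fold them onto the positive side and scramble their order.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=6)
# Hypothetical left-skewed feature on [0, 1]: Beta(5, 1) piles values up near 1
scores = pd.Series(rng.beta(a=5, b=1, size=100_000))

squared = scores ** 2  # square transform

print(f"skew before : {scores.skew():.2f}")   # clearly negative
print(f"skew after  : {squared.skew():.2f}")  # still negative, but closer to 0
```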

## 2. Power Transformers:

### 2.1. Box-Cox Transform

Box-Cox is another way of handling skewed data. It applies a power transformation whose exponent λ is chosen (typically by maximum likelihood) to make the result as close to normal as possible.

To use it, __your data must be strictly positive (greater than zero)__, which can be a limitation.

You can import it from the SciPy library (scipy.stats.boxcox). To check the skew with pandas, you’ll need to convert the resulting NumPy array to a pandas Series:

![image.png](attachment:image.png)
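A text version of the Box-Cox step, assuming SciPy, pandas and NumPy are available. Box-Cox maps $x$ to $(x^\lambda - 1)/\lambda$ for $\lambda \neq 0$ and to $\log x$ for $\lambda = 0$. The `values` array is hypothetical simulated log-normal data, chosen so the fitted λ should come out near 0.

```python
import numpy as np
import pandas as pd
from scipy.stats import boxcox

rng = np.random.default_rng(seed=7)
# Hypothetical right-skewed, strictly positive feature (simulated log-normal data)
values = rng.lognormal(mean=1, sigma=0.9, size=10_000)

# With lmbda left as None, boxcox estimates lambda by maximum likelihood and
# returns both the transformed array and the fitted lambda
transformed, fitted_lambda = boxcox(values)

print(f"fitted lambda : {fitted_lambda:.3f}")  # near 0, i.e. close to a log transform
print(f"skew before   : {pd.Series(values).skew():.3f}")
print(f"skew after    : {pd.Series(transformed).skew():.3f}")
```

In a scikit-learn pipeline, `sklearn.preprocessing.PowerTransformer(method="box-cox")` does the same fitting per column and, by default, also standardizes the output.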

### 2.2. Yeo-Johnson Transformer

__Unlike Box-Cox, it can also be applied to zero and negative values.__
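A sketch of Yeo-Johnson on data that Box-Cox would reject, assuming SciPy, pandas and NumPy are available. The `values` array is hypothetical: simulated log-normal data shifted down by 2 so it contains negative values.

```python
import numpy as np
import pandas as pd
from scipy.stats import yeojohnson

rng = np.random.default_rng(seed=8)
# Hypothetical right-skewed feature that also contains negative values
# (log-normal data shifted down by 2), so Box-Cox could not be applied directly
values = rng.lognormal(mean=0, sigma=1, size=10_000) - 2

# lambda is estimated by maximum likelihood when not given
transformed, fitted_lambda = yeojohnson(values)

print(f"min value     : {values.min():.3f}")
print(f"fitted lambda : {fitted_lambda:.3f}")
print(f"skew before   : {pd.Series(values).skew():.3f}")
print(f"skew after    : {pd.Series(transformed).skew():.3f}")
```

`sklearn.preprocessing.PowerTransformer` uses `method="yeo-johnson"` by default for the same reason: it works whatever the sign of the data.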


____
____

## Kurtosis

Kurtosis describes how heavy-tailed or light-tailed a distribution is compared with a normal distribution.

#### What Is Kurtosis?

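__Kurtosis is used to detect the presence of outliers in our data.__ High kurtosis means heavy tails, i.e., extreme values occur more often than in a normal distribution. It measures the overall weight of the tails, not the number or location of individual outliers.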

#### Types of kurtosis : 

- If the distribution is light-tailed and the peak is flatter, almost like punching the distribution down or squishing it, it is called __Negative Kurtosis (Platykurtic).__


- If the distribution is heavy-tailed and the peak is sharper, like pulling the distribution up, it is called __Positive Kurtosis (Leptokurtic).__

<br></br>

![kurtosis.jpg](attachment:kurtosis.jpg)

- __The kurtosis of a normal distribution is 3__ (mesokurtic). Symmetry alone does not imply a kurtosis of 3: the uniform distribution, for example, is symmetric but has a kurtosis of 1.8.


- A __kurtosis greater than three indicates Positive Kurtosis.__ In this case, the value of kurtosis ranges from 3 to infinity.



- Further, a __kurtosis less than three indicates Negative Kurtosis.__ In this case, the value ranges from 1 (the smallest possible kurtosis) to 3. Higher kurtosis usually goes with a sharper peak, but it mainly reflects heavier tails.
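The three cases can be checked on simulated data, assuming NumPy and SciPy are available. The uniform (theoretical kurtosis 1.8), normal (3) and Laplace (6) distributions are example choices for platykurtic, mesokurtic and leptokurtic data. `fisher=False` asks `scipy.stats.kurtosis` for the plain kurtosis used in these notes rather than excess kurtosis.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(seed=9)
n = 200_000
samples = {
    "normal (mesokurtic)": rng.normal(size=n),
    "uniform (platykurtic)": rng.uniform(size=n),
    "laplace (leptokurtic)": rng.laplace(size=n),
}

for name, data in samples.items():
    # fisher=False returns plain (Pearson) kurtosis, where a normal distribution scores 3
    print(f"{name:<22} kurtosis = {kurtosis(data, fisher=False):.2f}")
```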

![image.png](attachment:image.png)

__Excess kurtosis (kurtosis - 3) therefore varies from -2 to infinity.__


__Excess kurtosis for a normal distribution = 3 - 3 = 0__


__The lowest possible excess kurtosis occurs when kurtosis is 1: 1 - 3 = -2__
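The minimum can be reproduced with the coin flip from the start of these notes, assuming NumPy and SciPy are available. A 50/50 two-point distribution (here 500 zeros and 500 ones, an illustrative choice) is the flattest possible case, and `fisher=True` returns excess kurtosis directly.

```python
import numpy as np
from scipy.stats import kurtosis

# A fair coin coded as 0/1: a two-point distribution with equal weights
coin = np.array([0, 1] * 500)

pearson_k = kurtosis(coin, fisher=False)  # kurtosis
excess_k = kurtosis(coin, fisher=True)    # excess kurtosis = kurtosis - 3

print(f"kurtosis        = {pearson_k:.2f}")  # 1.00
print(f"excess kurtosis = {excess_k:.2f}")   # -2.00
```

pandas `Series.kurt()` also reports excess kurtosis (a normal distribution scores about 0), but with a small-sample bias correction, so its value can differ slightly from SciPy's default.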