diff --git a/docs/images/Variable_Transformation.png b/docs/images/Variable_Transformation.png index f3f2824c4..b9904d5ae 100644 Binary files a/docs/images/Variable_Transformation.png and b/docs/images/Variable_Transformation.png differ diff --git a/docs/images/reciprocal_transformer/reciprocal_transfomer_original.png b/docs/images/reciprocal_transformer/reciprocal_transfomer_original.png index 7fd41668f..de92ced54 100644 Binary files a/docs/images/reciprocal_transformer/reciprocal_transfomer_original.png and b/docs/images/reciprocal_transformer/reciprocal_transfomer_original.png differ diff --git a/docs/user_guide/transformation/ArcSinhTransformer.rst b/docs/user_guide/transformation/ArcSinhTransformer.rst index 56acaf3e8..dc016dfdb 100644 --- a/docs/user_guide/transformation/ArcSinhTransformer.rst +++ b/docs/user_guide/transformation/ArcSinhTransformer.rst @@ -5,15 +5,15 @@ ArcSinhTransformer ================== -The inverse hyperbolic sine (or arcsinh) transformation is a variance-stabilizing +The inverse hyperbolic sine (or arcsinh) transformation is a variance stabilising transformation that achieves results similar to the logarithmic transformation, while retaining zero values in a variable, something the logarithm cannot do. It has gained popularity in recent years; therefore, we add support for it in Feature-engine. -Variance stabilizing transformations +Variance stabilising transformations ------------------------------------ -Variance stabilizing transformations are commonly used in regression analysis to make +Variance stabilising transformations are commonly used in regression analysis to make skewed data more evenly distributed, approximate normality, or reduce heteroscedasticity. One of the most commonly used transformations is the logarithm. However, the logarithm transformation has one limitation: it is not defined for the value 0. @@ -23,7 +23,7 @@ the logarithm is undefined, researchers developed a number of alternatives to tr those zeros. 
The simplest alternative consists of adding 1 (or a constant value to the variable). In fact, -the Box-Cox transformation is a generalized version of power transformations that automatically +the Box-Cox transformation is a generalised version of power transformations that automatically introduces a shift in 0 valued observations before applying the logarithm. However, adding 1 (or a constant) before applying a log transformation is arbitrary and can @@ -42,8 +42,10 @@ The inverse hyperbolic sine (IHS) transformation is defined as follows: x' = \operatorname{arcsinh}(x) = \ln\left(x + \sqrt{x^2 + 1}\right) -The IHS transformation works with data defined on the whole real line including -negative values and zeros. For large values of x, the IHS behaves like a log +The IHS transformation works with data defined on the whole real space including +negative values and zeros. + +For large values of x, the IHS behaves like a log transformation. For small values of x, or in other words as x approaches 0, IHS(x) approaches x. @@ -187,7 +189,7 @@ In the bottom panels we see the effect of the inverse hyperbolic sine transforma The fundamental message of this experiment is that: -- Changing the variable scale will affect the variance stabilizing power of the IHS transformation +- Changing the variable scale will affect the variance stabilising power of the IHS transformation - Reducing the scale (multiplying by values <1) increases the separation of larger values from zero values (second panel), which is probably not what we want - Increasing the scale substantially, may also result in suboptimal distributions, as shown on the right panel @@ -285,7 +287,7 @@ negative values after the transformation (middle panel). 
Limitations of the IHS ---------------------- -As with all variance stabilizing transformations, the IHS comes with limitations, being, +As with all variance stabilising transformations, the IHS comes with limitations, being, the result of the transformation largely depends on the variable scale, by the own definition of the transformation. @@ -313,8 +315,8 @@ separation of larger values of the variable from 0. Unlike :class:`LogTransformer()`, :class:`ArcSinhTransformer()` can handle zero and negative values without requiring any preprocessing (or so we wanted to think). -Python demo ------------ +Python implementation +--------------------- In this demo, we'll show how to use the inverse hyperbolic sine transformation with care. @@ -414,7 +416,7 @@ variables: test_t.hist(bins=20, figsize=(8,4)) plt.show() -In the following figure, we see that while the arcsinh transformation seemed to stabilize the +In the following figure, we see that while the arcsinh transformation seemed to stabilise the variance of the variable profit, it does an awful job for the variable net-worth: .. image:: ../../images/arcsinh-ihs.png @@ -426,7 +428,7 @@ Scaling the distribution before arcsinh center and rescale data before transformation. We discussed previously that re-scaling the variables before applying the arcsinh transformation -can help achieve better variance stabilizing results. +can help achieve better variance stabilising results. Let's rescale the variable profit before applying the arcsinh transformation and then display the histogram of the resulting dataframe: @@ -456,7 +458,7 @@ Shifting the distribution before arcsinh ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ We mentioned previously that shifting the variables before applying the arcsinh transformation -can help achieve better variance stabilizing results. +can help achieve better variance stabilising results. Let's shift the variable profit before applying the arcsinh transformation, to make all its values positive. 
After that, we display the histogram of the resulting dataframe: @@ -550,53 +552,15 @@ For more details on the inverse hyperbolic sine transformation, check the follow 3. `Burbidge, J. B., Magee, L., & Robb, A. L. (1988). Alternative transformations to handle extreme values of the dependent variable. Journal of the American Statistical Association. `_ 4. `Aihounton, Henningsen. (2020). Units of measurement and the inverse hyperbolic sine transformation. The Econometrics Journal. `_ -Tutorials, books and courses ----------------------------- - -For tutorials about variance stabilizing transformations, check out our online course: - -.. figure:: ../../images/feml.png - :width: 300 - :figclass: align-center - :align: left - :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning - - Feature Engineering for Machine Learning - -| -| -| -| -| -| -| -| -| -| - -Or read our book: - -.. figure:: ../../images/cookbook.png - :width: 200 - :figclass: align-center - :align: left - :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587 - - Python Feature Engineering Cookbook - -| -| -| -| -| -| -| -| -| -| -| -| -| - -Both our book and course are suitable for beginners and more advanced data scientists -alike. By purchasing them you are supporting Sole, the main developer of Feature-engine. \ No newline at end of file +Additional resources +-------------------- + +For tutorials about this and other feature engineering methods check out these resources: + +- `Feature Engineering for Machine Learning `_, online course. +- `Feature Engineering for Time Series Forecasting `_, online course. +- `Python Feature Engineering Cookbook `_, book. + +Both our book and courses are suitable for beginners and more advanced data scientists +alike. By purchasing them you are supporting `Sole `_, +the main developer of feature-engine. 
\ No newline at end of file diff --git a/docs/user_guide/transformation/ArcsinTransformer.rst b/docs/user_guide/transformation/ArcsinTransformer.rst index 7e6c512df..70a18059d 100644 --- a/docs/user_guide/transformation/ArcsinTransformer.rst +++ b/docs/user_guide/transformation/ArcsinTransformer.rst @@ -5,29 +5,35 @@ ArcsinTransformer ================= -The :class:`ArcsinTransformer()` applies the arcsin transformation to -numerical variables. - The arcsine transformation, also called arcsin square root transformation, or angular transformation, takes the form of arcsin(sqrt(x)) where x is a real number between 0 and 1. -The arcsin square root transformation helps in dealing with probabilities, -percentages, and proportions. +.. tip:: + + The arcsin square root transformation helps in dealing with probabilities, + percentages, and proportions. + +:class:`ArcsinTransformer()` applies the arcsin transformation to +numerical variables. + +.. note:: + + :class:`ArcsinTransformer()` only works with numerical variables with values + between 0 and 1. If the variable contains a value outside of this range, the + transformer will raise an error. -The :class:`ArcsinTransformer()` only works with numerical variables with values -between 0 and 1. If the variable contains a value outside of this range, the -transformer will raise an error. +Python implementation +--------------------- -Example -~~~~~~~ +In this section, we'll show how to apply the arcsin square root transformation with +:class:`ArcsinTransformer()`. Let's load the breast cancer dataset from scikit-learn and separate it into train and test sets. .. code:: python - import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split @@ -43,8 +49,8 @@ test sets. 
# Separate data into train and test sets X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0) -Now we want to apply the arcsin transformation to some of the variables in the -dataframe. These variables values are in the range 0-1, as we will see in coming +We want to apply the arcsin transformation to some of the variables in the +dataframe. These variables' values are in the range 0-1, as we will see in coming histograms. First, let's make a list with the variable names: @@ -65,7 +71,7 @@ First, let's make a list with the variable names: 'worst symmetry', 'worst fractal dimension'] -Now, let's set up the arscin transformer to modify only the previous variables: +Now, let's set up the arcsin transformer to modify the previous variables: .. code:: python @@ -74,9 +80,11 @@ Now, let's set up the arscin transformer to modify only the previous variables: # fit the transformer tf.fit(X_train) - -The transformer does not learn any parameters when applying the fit method. It does -check however that the variables are numericals and with the correct value range. + +.. note:: + + The transformer does not learn any parameters when applying the fit method. It does + check, however, that the variables are numerical and within the correct value range. We can now go ahead and transform the variables: @@ -86,20 +94,21 @@ We can now go ahead and transform the variables: train_t = tf.transform(X_train) test_t = tf.transform(X_test) -And that's it, now the variables have been transformed with the arscin formula. +That's it, now the variables have been transformed with the arcsin formula. -Finally, let's make a histogram for each of the original variables to examine their -distribution: +Let's go ahead and check out the effect of the transformation on the variables' distribution. +We'll start by making a histogram for each of the original variables: ..
code:: python # original variables X_train[vars_].hist(figsize=(20,20)) +You can see in the following image that the variables are skewed. Note +that all variables have values between 0 and 1: + .. image:: ../../images/breast_cancer_raw.png -You can see in the previous image that many of the variables are skewed. Note however, -that all variables had values between 0 and 1. Now, let's examine the distribution after the transformation: @@ -108,60 +117,22 @@ Now, let's examine the distribution after the transformation: # transformed variable train_t[vars_].hist(figsize=(20,20)) +In the following image, we see that many of the variables have a more Gaussian looking +shape after the transformation: .. image:: ../../images/breast_cancer_arcsin.png -You can see in the previous image that many variables have after the transformation a -more Gaussian looking shape. + Additional resources -------------------- -For more details about this and other feature engineering methods check out these resources: - - -.. figure:: ../../images/feml.png - :width: 300 - :figclass: align-center - :align: left - :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning - - Feature Engineering for Machine Learning - -| -| -| -| -| -| -| -| -| -| - -Or read our book: - -.. figure:: ../../images/cookbook.png - :width: 200 - :figclass: align-center - :align: left - :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587 - - Python Feature Engineering Cookbook - -| -| -| -| -| -| -| -| -| -| -| -| -| - -Both our book and course are suitable for beginners and more advanced data scientists -alike. By purchasing them you are supporting Sole, the main developer of Feature-engine. \ No newline at end of file +For tutorials about this and other feature engineering methods check out these resources: + +- `Feature Engineering for Machine Learning `_, online course. +- `Feature Engineering for Time Series Forecasting `_, online course. 
+- `Python Feature Engineering Cookbook `_, book. + +Both our book and courses are suitable for beginners and more advanced data scientists +alike. By purchasing them you are supporting `Sole `_, +the main developer of feature-engine. \ No newline at end of file diff --git a/docs/user_guide/transformation/BoxCoxTransformer.rst b/docs/user_guide/transformation/BoxCoxTransformer.rst index a20e9760f..63b8933ed 100644 --- a/docs/user_guide/transformation/BoxCoxTransformer.rst +++ b/docs/user_guide/transformation/BoxCoxTransformer.rst @@ -5,7 +5,7 @@ BoxCoxTransformer ================= -The Box-Cox transformation is a generalization of the power transformations family and is +The Box-Cox transformation is a generalisation of the power transformations family and is defined as follows: .. code:: python @@ -16,10 +16,10 @@ defined as follows: Here, y is the transformed data, x is the variable to transform and λ is the transformation parameter. -The Box Cox transformation is used to reduce or eliminate variable skewness and obtain +The Box-Cox transformation is used to reduce or eliminate variable skewness and obtain features that better approximate a normal distribution. -The Box Cox transformation evaluates commonly used transformations. When λ = 1 then we +The Box-Cox transformation evaluates commonly used transformations. When λ = 1 then we have the original variable, when λ = 0, we have the logarithm transformation, when λ = - 1 we have the reciprocal transformation, and when λ = 0.5 we have the square root. @@ -28,12 +28,14 @@ and selects the optimal value of the λ parameter, which is the one that returns transformation. The best transformation occurs when the transformed data better approximates a normal distribution. -The Box Cox transformation is defined for strictly positive variables. If your variables -are not strictly positive, you can add a constant or use the Yeo-Johnson transformation -instead. +.. 
note:: + The Box-Cox transformation is defined for strictly positive variables. If your variables + are not strictly positive, you can add a constant or use the Yeo-Johnson transformation + instead. -Uses of the Box Cox Transformation + +Uses of the Box-Cox Transformation ---------------------------------- Many statistical methods that we use for data analysis make assumptions about the data. @@ -45,8 +47,10 @@ When these assumptions are not met, we can't fully trust the results of our regr analyses. To make data meet the assumptions and improve the trust in the models, it is common practice in data science projects to transform the variables before the analysis. -In time series forecasting, we use the Box Cox transformation to make non-stationary time -series stationary. +.. tip:: + + In time series forecasting, we use the Box-Cox transformation to make non-stationary time + series stationary. References ---------- @@ -66,14 +70,14 @@ error. To apply this transformation to non-positive variables, you can add a con value. Alternatively, you can apply the Yeo-Johnson transformation with the :class:`YeoJohnsonTransformer()`. -Python code examples --------------------- +Python implementation +--------------------- -In this section, we will apply this data transformation to 2 variables of the Ames house +In this section, we will apply the Box-Cox transformation to 2 variables of the Ames house prices dataset. Let's start by importing the modules, classes and functions and then loading the house -prices dataset and separating it into train and test sets. +prices dataset and separating it into train and test sets: .. code:: python @@ -123,7 +127,7 @@ In the following output we see the predictor variables of the house prices datas [5 rows x 79 columns] -Let's inspect the distribution of 2 variables in the original data with histograms. +Let's inspect the distribution of 2 variables in the original data with histograms: .. 
code:: python @@ -134,8 +138,8 @@ In the following plots we see that the variables are non-normally distributed: .. image:: ../../images/nonnormalvars2.png -Now we apply the BoxCox transformation to the 2 indicated variables. First, we set up -the transformer and fit it to the train set, so that it finds the optimal lambda value. +Now we apply the Box-Cox transformation to the 2 indicated variables. First, we set up +the transformer and fit it to the train set, so that it finds the optimal lambda value: .. code:: python @@ -188,66 +192,15 @@ In the following plots we see that the variables are non-normally distributed, b .. image:: ../../images/nonnormalvars2.png -Tutorials, books and courses ----------------------------- - -You can find more details about the Box Cox transformation technique with the :class:`BoxCoxTransformer()` here: - -- `Jupyter notebook `_ - -For tutorials about this and other data transformation techniques and feature engineering -methods check out our online courses: - -.. figure:: ../../images/feml.png - :width: 300 - :figclass: align-center - :align: left - :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning - - Feature Engineering for Machine Learning - -.. figure:: ../../images/fetsf.png - :width: 300 - :figclass: align-center - :align: right - :target: https://www.trainindata.com/p/feature-engineering-for-forecasting - - Feature Engineering for Time Series Forecasting - -| -| -| -| -| -| -| -| -| -| - -Or read our book: - -.. figure:: ../../images/cookbook.png - :width: 200 - :figclass: align-center - :align: left - :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587 - - Python Feature Engineering Cookbook - -| -| -| -| -| -| -| -| -| -| -| -| -| - -Our book and courses are suitable for beginners and more advanced data scientists -alike. By purchasing them you are supporting Sole, the main developer of Feature-engine. 
\ No newline at end of file +Additional resources +-------------------- + +For tutorials about this and other feature engineering methods check out these resources: + +- `Feature Engineering for Machine Learning `_, online course. +- `Feature Engineering for Time Series Forecasting `_, online course. +- `Python Feature Engineering Cookbook `_, book. + +Both our book and courses are suitable for beginners and more advanced data scientists +alike. By purchasing them you are supporting `Sole `_, +the main developer of feature-engine. \ No newline at end of file diff --git a/docs/user_guide/transformation/LogCpTransformer.rst b/docs/user_guide/transformation/LogCpTransformer.rst index abd2ac4e4..344a7d8e7 100644 --- a/docs/user_guide/transformation/LogCpTransformer.rst +++ b/docs/user_guide/transformation/LogCpTransformer.rst @@ -20,12 +20,12 @@ You can enter the positive quantity to add to the variable as a dictionary, wher keys are the variable names, and the values are the constant to add to each variable. If you want to add the same value to all variables, you can pass an integer or float, instead. -Alternatively, the :class:`LogCpTransformer()` will find the necessary value to make all +Alternatively, :class:`LogCpTransformer()` will find the necessary value to make all values of the variable positive. For strictly positive variables, C will be 0, and the transformation will be log(x). -Python example --------------- +Python implementation +--------------------- Let's check out the functionality of :class:`LogCpTransformer()`. @@ -37,7 +37,6 @@ into train and test sets. .. 
code:: python - import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from sklearn.datasets import fetch_california_housing @@ -79,8 +78,10 @@ before applying the logarithm transformation: {'MedInc': 0, 'HouseAge': 0} -In this case, the transformation applied by :class:`LogCpTransformer()` is the same as -using :class:`LogTransformer()` because these variables are strictly positive. +.. note:: + + In this example, the transformation applied by :class:`LogCpTransformer()` is the same as + using :class:`LogTransformer()` because these variables are strictly positive. We can now go ahead and transform the variables: @@ -94,7 +95,7 @@ Then we can plot the original variable distribution: .. code:: python - # un-transformed variable + # non-transformed variable X_train["MedInc"].hist(bins=20) plt.title("MedInc - original distribution") plt.ylabel("Number of observations") @@ -120,11 +121,10 @@ Transforming non-strictly positive variables ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let's now show the functionality of :class:`LogCpTransformer()` with variables that contain -values lower or equal to 0. Let's load the diabetes dataset: +values lower than or equal to 0. Let's load the diabetes dataset: .. code:: python - import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from sklearn.datasets import load_diabetes @@ -226,8 +226,9 @@ as follows: tf = LogCpTransformer(C=5) tf.fit(X_train) -In this case, all numerical variables will be transformed. We can find the variables that -will be transformed in the `variables_` attribute: +In this example, as we did not specify any variable, all numerical variables will be +transformed. We can find the variables that will be transformed in the `variables_` +attribute: .. code:: python @@ -280,57 +281,15 @@ And the constant values will be those from the dictionary: You can now apply `transform()` to transform all these variables. 
-Tutorials, books and courses ----------------------------- - -You can find more details about the :class:`LogCpTransformer()` here: - -- `Jupyter notebook `_ - -For tutorials about this and other data transformation methods, like the square root transformation, power transformations, the box cox transformation, check out our online course: - -.. figure:: ../../images/feml.png - :width: 300 - :figclass: align-center - :align: left - :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning - - Feature Engineering for Machine Learning - -| -| -| -| -| -| -| -| -| -| - -Or read our book: - -.. figure:: ../../images/cookbook.png - :width: 200 - :figclass: align-center - :align: left - :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587 +Additional resources +-------------------- - Python Feature Engineering Cookbook +For tutorials about this and other feature engineering methods check out these resources: -| -| -| -| -| -| -| -| -| -| -| -| -| +- `Feature Engineering for Machine Learning `_, online course. +- `Feature Engineering for Time Series Forecasting `_, online course. +- `Python Feature Engineering Cookbook `_, book. -Both our book and course are suitable for beginners and more advanced data scientists -alike. \ No newline at end of file +Both our book and courses are suitable for beginners and more advanced data scientists +alike. By purchasing them you are supporting `Sole `_, +the main developer of feature-engine. \ No newline at end of file diff --git a/docs/user_guide/transformation/LogTransformer.rst b/docs/user_guide/transformation/LogTransformer.rst index 6a8bd2bc2..ed2787e99 100644 --- a/docs/user_guide/transformation/LogTransformer.rst +++ b/docs/user_guide/transformation/LogTransformer.rst @@ -7,24 +7,30 @@ LogTransformer The log transformation is used to transform skewed data so that the values are more evenly distributed across the value range. 
-Some regression models, like linear regression, t-test and ANOVA, make assumptions about the data. When the assumptions are not met, we can't trust the results. Applying data transformations is common practice during regression analysis because it can help make the data meet those assumptions and hence obtain more reliable results. +Some regression models, like linear regression, t-test and ANOVA, make assumptions about the data. When the assumptions are not met, we can't trust the results. + +Applying data transformations is common practice during regression analysis because it can help make the data meet those assumptions and hence obtain more reliable results. The logarithm function is helpful for dealing with positive data with a right-skewed distribution. That is, those variables whose observations accumulate towards lower values. A common example is the variable income, with a heavy accumulation of values toward lower salaries. More generally, when data follows a log-normal distribution, then its log-transformed version approximates a normal distribution. -Other useful transformations are the square root transformation, power transformations and the box cox transformation. +Other useful transformations are the square root transformation, power transformations and the Box-Cox transformation. In statistical analysis, we can apply the logarithmic transformation to both the dependent variable (that is, the target) and the independent variables (that is, the predictors). These can help meet the linear regression model assumptions and unmask a linear relationship between predictors and response variable. -With Feature-engine, we can only log transform input features. You can easily transform the target variable by applying `np.log(y)`. +With feature-engine, we can only log transform input features. You can easily transform the target variable by applying `np.log(y)`. 
+ +LogTransformer +-------------- + +:class:`LogTransformer()` applies the natural logarithm or the logarithm in base 10 to numerical variables. -The LogTransformer ------------------- +.. note:: -The :class:`LogTransformer()` applies the natural logarithm or the logarithm in base 10 to numerical variables. Note that the logarithm can only be applied to positive values. Thus, if the variable contains 0 or negative variables, this transformer will return and error. + Note that the logarithm can only be applied to positive values. Thus, if the variable contains 0 or negative values, this transformer will raise an error. -To transform non-positive variables you can add a constant to shift the data points towards positive values. You can do this from within the transformer by using :class:`LogCpTransformer()`. +To transform non-positive variables you can add a constant to shift the data points towards positive values. You can do this by using :class:`LogCpTransformer()`. Python implementation --------------------- @@ -101,7 +107,9 @@ We want to apply the natural logarithm to these 2 variables in the dataset using With `fit()`, this transformer does not learn any parameters, but it checks that the variables you entered are numerical, or if no variable was entered, it will automatically find all numerical variables. -To apply the logarithm in base 10, pass `'10'` to the `base` parameter when setting up the transformer. +.. note:: + + To apply the logarithm in base 10, pass `'10'` to the `base` parameter when setting up the transformer. Now, we can go ahead and transform the data: @@ -123,7 +131,7 @@ In the following histograms we see that the natural log transformation helped ma Note that the transformed variable has a more Gaussian looking distribution.
-If we want to recover the original data representation, with the method `inverse_transform`, the :class:`LogTransformer()` will apply the exponential function to obtain the variable in its original scale: +If we want to recover the original data representation, with the method `inverse_transform`, :class:`LogTransformer()` will apply the exponential function to obtain the variable in its original scale: .. code:: python @@ -137,60 +145,20 @@ In the following plots we see histograms showing the variables in their original .. image:: ../../images/nonnormalvars2.png -Following the transformations with scatter plots and residual analysis of the regression models helps understand if the transformations are useful in our regression analysis. - - -Tutorials, books and courses ----------------------------- - -You can find more details about the :class:`LogTransformer()` here: - -- `Jupyter notebook `_ - -For tutorials about this and other data transformation methods, like the square root transformation, power transformations, the box cox transformation, check out our online course: - -.. figure:: ../../images/feml.png - :width: 300 - :figclass: align-center - :align: left - :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning - - Feature Engineering for Machine Learning +.. tip:: -| -| -| -| -| -| -| -| -| -| + Following the transformations with scatter plots and residual analysis of the regression models helps understand if the transformations are useful in our regression analysis. -Or read our book: -.. 
figure:: ../../images/cookbook.png - :width: 200 - :figclass: align-center - :align: left - :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587 +Additional resources +-------------------- - Python Feature Engineering Cookbook +For tutorials about this and other feature engineering methods check out these resources: -| -| -| -| -| -| -| -| -| -| -| -| -| +- `Feature Engineering for Machine Learning `_, online course. +- `Feature Engineering for Time Series Forecasting `_, online course. +- `Python Feature Engineering Cookbook `_, book. -Both our book and course are suitable for beginners and more advanced data scientists -alike. By purchasing them you are supporting Sole, the main developer of Feature-engine. \ No newline at end of file +Both our book and courses are suitable for beginners and more advanced data scientists +alike. By purchasing them you are supporting `Sole `_, +the main developer of feature-engine. \ No newline at end of file diff --git a/docs/user_guide/transformation/PowerTransformer.rst b/docs/user_guide/transformation/PowerTransformer.rst index d35e6d1b0..c86881f87 100644 --- a/docs/user_guide/transformation/PowerTransformer.rst +++ b/docs/user_guide/transformation/PowerTransformer.rst @@ -10,7 +10,7 @@ variables into a more suitable shape for modeling. The transformation function i typically represented as :math:`x' = x^{\lambda}`, where :math:`x` is the original variable and :math:`\lambda` (lambda) is the transformation parameter. -These transformations help stabilize the variance, make the data adopt a more normal +These transformations help stabilise the variance, make the data adopt a more normal distribution-like shape, and/or improve the linearity of relationships. 
Use of Power transformations @@ -19,10 +19,10 @@ Use of Power transformations Power transformations are particularly useful for meeting the assumptions of statistical tests, and models that require linear relationships between variables and homoscedasticity (constant variance across values). They can also help in reducing -skewness in the data, i.e., by normalizing distributions. +skewness in the data, i.e., by normalising distributions. Power transformations differ from scalers in that they modify the distribution of the -data, typically to stabilize variance and normalize the distribution, whereas scalers +data, typically to stabilise variance and normalise the distribution, whereas scalers simply adjust the scale of the data without altering its underlying distribution. In short, power functions provide an excellent data analysis toolkit, especially for @@ -44,7 +44,7 @@ Which lambda should I choose? ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The challenge of power transformations resides in finding the right lambda for the -transformation. In general, this consists of trial and error, or using generalization +transformation. In general, this consists of trial and error, or using generalisation functions like the Box-Cox or the Yeo-Johnson transformation. As general guidelines, if the variables are right-skewed we'd use lambda <1, and if the @@ -53,8 +53,8 @@ variables are left-skewed we'd use lambda >1. Box-Cox transformation ~~~~~~~~~~~~~~~~~~~~~~ -The Box-Cox transformation is a generalization of power transformations that finds -an optimal lambda to stabilize variance and make the data more normally distributed. +The Box-Cox transformation is a generalisation of power transformations that finds +an optimal lambda to stabilise variance and make the data more normally distributed. This transformation only accepts positive values. Feature-engine's :class:`BoxCoxTransformer()` applies the Box-Cox transformation. 
@@ -78,16 +78,16 @@ Feature-engine also provides the following power transformers: - :class:`ReciprocalTransformer` - :class:`ArcsinTransformer` -For more details about these variance stabilizing transformations, check the article -`Variance stabilizing transformations in machine learning `_. -Python example --------------- +Python implementation +--------------------- :class:`PowerTransformer()` applies power transformations to numerical independent -variables. We'll use the Ames House Prices' dataset to see it in action. +variables. We'll use the Ames House Prices dataset to see it in action. First, let's load the dataset and split it into train and test sets: .. code:: python @@ -111,7 +111,7 @@ First, let's load the dataset and split it into train and test sets: X, y, test_size=0.3, random_state=42 ) -Now, let's visualize the distribution of the `LotArea` variable: +Now, let's visualise the distribution of the `LotArea` variable: .. code:: python @@ -223,8 +223,8 @@ especially in algorithms that hinge on the assumption of data variability, like linear regression and other regression-based models. -Choosing lambda accordingly to the distribution -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Choosing lambda according to the distribution +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In this section, we'll further explore the impact of the lambda parameter for left- and right-skewed distributions. @@ -431,56 +431,12 @@ and transformed distributions, to ensure that we obtain the results we expect. Additional resources -------------------- -You can find more details about the :class:`PowerTransformer()` here: - -- `Jupyter notebook `_ - -For more details about this and other feature engineering methods -check out these resources: - - -.. 
figure:: ../../images/feml.png - :width: 300 - :figclass: align-center - :align: left - :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning - - Feature Engineering for Machine Learning - -| -| -| -| -| -| -| -| -| -| +For tutorials about this and other feature engineering methods check out these resources: -Or read our book: - -.. figure:: ../../images/cookbook.png - :width: 200 - :figclass: align-center - :align: left - :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587 - - Python Feature Engineering Cookbook - -| -| -| -| -| -| -| -| -| -| -| -| -| +- `Feature Engineering for Machine Learning `_, online course. +- `Feature Engineering for Time Series Forecasting `_, online course. +- `Python Feature Engineering Cookbook `_, book. -Both our book and course are suitable for beginners and more advanced data scientists -alike. By purchasing them you are supporting Sole, the main developer of Feature-engine. \ No newline at end of file +Both our book and courses are suitable for beginners and more advanced data scientists +alike. By purchasing them you are supporting `Sole `_, +the main developer of feature-engine. \ No newline at end of file diff --git a/docs/user_guide/transformation/ReciprocalTransformer.rst b/docs/user_guide/transformation/ReciprocalTransformer.rst index 850d427ca..c8e9c2085 100644 --- a/docs/user_guide/transformation/ReciprocalTransformer.rst +++ b/docs/user_guide/transformation/ReciprocalTransformer.rst @@ -5,18 +5,19 @@ ReciprocalTransformer ===================== -A reciprocal transformation involves replacing each data value x, with its reciprocal, 1/x​. This transformation is -useful for addressing heteroscedasticity, where the variability of errors in a regression model differs across values -of an independent variable, and for transforming skewed distributions into more symmetric ones. 
It can also linearize +A reciprocal transformation involves replacing each data value x, with its reciprocal, 1/x. + +This transformation is useful for addressing heteroscedasticity, where the variability of errors in a regression model differs across values +of an independent variable, and for transforming skewed distributions into more symmetric ones. It can also linearise certain nonlinear relationships, making them easier to model with linear regression, and improve the overall fit of a -linear model by reducing the influence of outliers or normalizing residuals. +linear model by reducing the influence of outliers or normalising residuals. Applications ------------ -The reciprocal transformation is useful for ratios, where the values of a variable result from the division of two v -ariables. Some examples include variables like student-teacher ratio (students per teacher) or crop yield (tons per acre). +The reciprocal transformation is useful for ratios, where the values of a variable result from the division of two +variables. Some examples include variables like student-teacher ratio (students per teacher) or crop yield (tons per acre). By calculating the inverse of these variables, we shift from representing students per teacher to teachers per student, or from tons per acre to acres per ton. This transformation still makes intuitive sense and can result in a better spread @@ -30,22 +31,26 @@ Properties - The inverse of the reciprocal transformation is also the reciprocal transformation - The range of the reciprocal function includes all real numbers except 0 -Although in theory, the reciprocal function is defined for both positive and negative values, in practice, it's mostly -used to transform strictly positive variables. +.. note:: + + Although in theory, the reciprocal function is defined for both positive and negative values, in practice, it's mostly + used to transform strictly positive variables. 
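The properties listed above can be checked with a minimal plain-Python sketch. The `reciprocal` helper and the student-teacher values are hypothetical, chosen only for illustration; this is not how :class:`ReciprocalTransformer` is implemented.

```python
import math


def reciprocal(values):
    """Return 1/x for each value; raises ZeroDivisionError if a value is 0."""
    return [1 / v for v in values]


# Hypothetical ratio variable: students per teacher.
students_per_teacher = [12.0, 18.0, 25.0, 30.0]

# The reciprocal turns it into teachers per student.
teachers_per_student = reciprocal(students_per_teacher)

# Applying the reciprocal again recovers the original values
# (up to floating point rounding).
restored = reciprocal(teachers_per_student)
assert all(math.isclose(a, b) for a, b in zip(students_per_teacher, restored))

# Zero has no reciprocal, so this sketch fails loudly on it.
try:
    reciprocal([4.0, 0.0])
except ZeroDivisionError:
    print("reciprocal is undefined for 0")
```

Because the function is its own inverse, no parameters need to be learned to revert the transformation.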
ReciprocalTransformer --------------------- -The :class:`ReciprocalTransformer` applies the reciprocal transformation to numerical variables. By default, it will +:class:`ReciprocalTransformer` applies the reciprocal transformation to numerical variables. By default, it will find and transform all numerical variables in the dataset. A better practice would be to apply the transformer to a selected group of variables, which you can do by passing a list with the variable names to the `variables` parameter when setting up the transformer. -If any of the variables contains 0 as value, the transformer will raise an error. +.. note:: + + If any of the variables contains 0 as value, the transformer will raise an error. -Python examples ---------------- +Python implementation +--------------------- In the next sections, we'll demonstrate how to apply the reciprocal transformation with :class:`ReciprocalTransformer`. @@ -55,7 +60,6 @@ garage. Next, we'll separate the data into train and test sets: .. code:: python import matplotlib.pyplot as plt - from sklearn.datasets import fetch_openml from sklearn.model_selection import train_test_split from feature_engine.transformation import ReciprocalTransformer @@ -98,9 +102,9 @@ Let's plot the distribution of the variable with the square foot area per car in In the following image we can see the skewness of the variable: -.. image:: ../../images/reciprocal_transformer/reciprocal_transfomer_original.png - -| +.. figure:: ../../images/reciprocal_transformer/reciprocal_transfomer_original.png + :align: center + :width: 350px Let's now apply the reciprocal transformation to this variable: @@ -122,8 +126,9 @@ Finally, let's plot the distribution after the reciprocal transformation: In the following image, we see that the reciprocal transformation made the variable's values follow more closer a symmetric or normal distribution: -.. image:: ../../images/reciprocal_transformer/reciprocal_transfomer_new.png - +.. 
figure:: ../../images/reciprocal_transformer/reciprocal_transfomer_new.png + :align: center + :width: 350px Inverse transformation ~~~~~~~~~~~~~~~~~~~~~~ @@ -147,16 +152,17 @@ Let's check out the reverted transformation: As you can see in the following image, we obtained the original data by re-applying the reciprocal function to the transformed variable: -.. image:: ../../images/reciprocal_transformer/reciprocal_transfomer_inverse.png - +.. figure:: ../../images/reciprocal_transformer/reciprocal_transfomer_inverse.png + :align: center + :width: 350px Pipeline of transformations ~~~~~~~~~~~~~~~~~~~~~~~~~~~ -As we mentioned previously, the reciprocal transformation is suitable, in general for ratio variables, so we need to +As we mentioned previously, the reciprocal transformation is suitable, in general, for ratio variables, so we need to transform other variables in the data set with other type of transformations. -Let's not plot the distribution of the 3 variables in the original data to see which transformations could be suitable +Let's plot the distribution of the 3 variables in the original data to see which transformations could be suitable for them: .. code:: python @@ -172,16 +178,13 @@ distribution), and `GarageArea` is a continuous variable: | Let's then create a pipeline to apply the square root transformation to `GarageCounts` and the Box-Cox transformation -to `GarageArea`, while applying the reciprocal transformation to "sqrfootpercar": +to `GarageArea`, while applying the reciprocal transformation to `sqrfootpercar`: .. 
code:: python from feature_engine.pipeline import Pipeline from feature_engine.transformation import PowerTransformer, BoxCoxTransformer - from feature_engine.pipeline import Pipeline - from feature_engine.transformation import PowerTransformer, BoxCoxTransformer - pipe = Pipeline([ ("reciprocal", ReciprocalTransformer(variables="sqrfootpercar")), ("sqrroot", PowerTransformer(variables="GarageCars", exp=1/2)), @@ -195,7 +198,7 @@ Let's now fit the pipeline and transform the datasets: train_t = pipe.fit_transform(X_train) test_t = pipe.transform(X_test) -And now, we can corroborate how these transformations improved the value spread across all variables by plotting the +We can check out how these transformations changed the value spread across all variables by plotting the histograms for the transformed data: .. code:: python @@ -210,7 +213,7 @@ symmetrically distributed across their value ranges: | -An that's it! We've now applied different mathematical functions to stabilize the variance of the variables in the +That's it! We've now applied different mathematical functions to stabilise the variance of the variables in the dataset. Alternatives to the reciprocal function @@ -227,10 +230,10 @@ If the variable contains counts, then the square root transformation is better s The Box-Cox transformation automates the process of finding the best transformation by exploring several functions automatically. -All these functions are considered variance stabilizing transformations, and have been designed to transform data, to +All these functions are considered variance stabilising transformations, and have been designed to transform data, to meet the assumptions of statistical parametric tests and linear regression models. -You can apply all these functions out-of-the-box with the transformers from Feature-engine's transformation module. +You can apply all these functions out-of-the-box with the transformers from feature-engine's transformation module. 
Remember to follow up the transformations with proper data analysis, to ensure that the transformations returned the desired effect, otherwise, we are adding complexity to the feature engineering pipeline for now added benefit. Alternatives with Feature-engine @@ -247,56 +250,12 @@ You can apply other variance data transformation functions with the following tr Additional resources -------------------- -You can find more details about the :class:`ReciprocalTransformer()` here: +For tutorials about this and other feature engineering methods check out these resources: +- `Feature Engineering for Machine Learning `_, online course. +- `Feature Engineering for Time Series Forecasting `_, online course. +- `Python Feature Engineering Cookbook `_, book. -- `Jupyter notebook `_ - -For more details about this and other feature engineering methods check out these resources: - - -.. figure:: ../../images/feml.png - :width: 300 - :figclass: align-center - :align: left - :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning - - Feature Engineering for Machine Learning - -| -| -| -| -| -| -| -| -| -| - -Or read our book: - -.. figure:: ../../images/cookbook.png - :width: 200 - :figclass: align-center - :align: left - :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587 - - Python Feature Engineering Cookbook - -| -| -| -| -| -| -| -| -| -| -| -| -| - -Both our book and course are suitable for beginners and more advanced data scientists -alike. By purchasing them you are supporting Sole, the main developer of Feature-engine. \ No newline at end of file +Both our book and courses are suitable for beginners and more advanced data scientists +alike. By purchasing them you are supporting `Sole `_, +the main developer of feature-engine. 
\ No newline at end of file diff --git a/docs/user_guide/transformation/YeoJohnsonTransformer.rst b/docs/user_guide/transformation/YeoJohnsonTransformer.rst index ec59ae6ce..16fd02f47 100644 --- a/docs/user_guide/transformation/YeoJohnsonTransformer.rst +++ b/docs/user_guide/transformation/YeoJohnsonTransformer.rst @@ -50,7 +50,7 @@ values. - For variables with both positive and negative values: The Yeo-Johnson transformation combines the two approaches, using different powers for the positive and negative segments of the variable. To apply the Yeo-Johnson transformation in Python, you can use `scipy.stats.yeojohnson`, which can transform one variable -at a time. For transforming multiple variables simultaneously, libraries like scikit-klearn and Feature-engine are more suitable. +at a time. For transforming multiple variables simultaneously, libraries like scikit-klearn and feature-engine are more suitable. The YeoJohnsonTransformer ------------------------- @@ -71,8 +71,6 @@ and testing sets. .. code:: python - import numpy as np - import pandas as pd import matplotlib.pyplot as plt from sklearn.datasets import fetch_openml from sklearn.model_selection import train_test_split @@ -124,7 +122,7 @@ Let's now set up the transformer to apply the Yeo-Johnson transformation to 2 va tf.fit(X_train) -With `fit()`, :class:`YeoJohnsonTransformer()` learns the optimal lambda for the yeo-johnson power transformation. We +With `fit()`, :class:`YeoJohnsonTransformer()` learns the optimal lambda for the Yeo-Johnson power transformation. We can inspect these values as follows: .. code:: python @@ -137,7 +135,7 @@ We see the optimal lambda values below: {'LotArea': 0.02258978732751055, 'GrLivArea': 0.06781061353154169} -We can now go ahead and apply the data transformation to get closer to normal distributions. +We can now go ahead and apply the data transformation to get closer to normal distributions: .. 
code:: python @@ -195,55 +193,12 @@ values, using the `inverse_transform` method. Additional resources -------------------- -You can find more details about the :class:`YeoJohnsonTransformer()` here: +For tutorials about this and other feature engineering methods check out these resources: -- `Jupyter notebook `_ +- `Feature Engineering for Machine Learning `_, online course. +- `Feature Engineering for Time Series Forecasting `_, online course. +- `Python Feature Engineering Cookbook `_, book. -For more details about this and other feature engineering methods check out these resources: - - -.. figure:: ../../images/feml.png - :width: 300 - :figclass: align-center - :align: left - :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning - - Feature Engineering for Machine Learning - -| -| -| -| -| -| -| -| -| -| - -Or read our book: - -.. figure:: ../../images/cookbook.png - :width: 200 - :figclass: align-center - :align: left - :target: https://www.packtpub.com/en-us/product/python-feature-engineering-cookbook-9781835883587 - - Python Feature Engineering Cookbook - -| -| -| -| -| -| -| -| -| -| -| -| -| - -Both our book and course are suitable for beginners and more advanced data scientists -alike. By purchasing them you are supporting Sole, the main developer of Feature-engine. +Both our book and courses are suitable for beginners and more advanced data scientists +alike. By purchasing them you are supporting `Sole `_, +the main developer of feature-engine. diff --git a/docs/user_guide/transformation/index.rst b/docs/user_guide/transformation/index.rst index 00ce20bfb..db49716ed 100644 --- a/docs/user_guide/transformation/index.rst +++ b/docs/user_guide/transformation/index.rst @@ -1,30 +1,49 @@ .. -*- mode: rst -*- -Variance Stabilizing Transformations +Variance Stabilising Transformations ==================================== -Feature-engine's variable transformers transform numerical variables with various -mathematical transformations. 
+Feature-engine's variance stabilising transformers transform numerical variables with various
+mathematical operations, like logarithm, power, reciprocal, and so on.

 Variable transformations are commonly used to spread the values of the original variables
-over a wider value range. See the following illustration:
+over a wider value range and help meet the assumptions of several statistical models.
+See the following illustration:

 .. figure:: ../../images/Variable_Transformation.png
    :align: center

-Article
--------
+.. tip::

-We added a lot of information about **variance stabilizing transformations** in this
-`article `_.
+    To learn more about `**variance stabilising transformations** `_
+    and their role in statistics and machine learning, check out our very detailed
+    `article `_ in our
+    Train in Data Blog.

-**Note**
+Supported transformations
+-------------------------

-Note however, that improving the value spread is not always possible and it depends
-on the nature of the variable.
+================================== ========================= ===================================================== ======================================================================
+ Transformer                       Limitations               Description                                           Suitable for
+================================== ========================= ===================================================== ======================================================================
+:class:`LogTransformer()`          Not valid for x<=0        Applies natural or decimal logarithm.                 Positive continuous variables with right skew.
+:class:`LogCpTransformer()`        None                      Applies logarithm after adding a constant value.      Continuous variables with a right skew.
+:class:`ReciprocalTransformer()`   Not defined for x=0       Applies the reciprocal transformation: 1/x.           Variables representing ratios or proportions, like tons per acre.
+:class:`ArcsinTransformer()`       0<= x <= 1                Applies the arcsin square root transformation.        Probabilities or proportion variables with values between 0 and 1.
+:class:`ArcSinhTransformer()`      None                      Applies the inverse hyperbolic sine function.         Similar to log but retaining zero values in a variable.
+:class:`PowerTransformer()`        None                      Applies any power transformation x = x**n.            Square root is suitable for count variables. Other powers vary.
+:class:`BoxCoxTransformer()`       Not defined for x<=0      Applies the Box-Cox transformation.
+:class:`YeoJohnsonTransformer()`   None                      Applies the Yeo-Johnson transformation.
+================================== ========================= ===================================================== ======================================================================

-**Transformers**
+.. note::
+
+    Improving the value spread is not always possible and it depends on the nature of
+    the variable.
+
+Transformers
+------------

 .. toctree::
    :maxdepth: 1