# How to implement polynomial logistic regression in scikit-learn?

In [36]:
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline

data = load_iris()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [37]:
X.shape

(150, 4)

## Step 1

First you need to convert your data to polynomial features. Originally, our data has 4 columns:

In [38]:
X_train.shape

(112, 4)

You can create the polynomial features with scikit-learn (here for degree 2):

In [39]:
poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
X_poly = poly.fit_transform(X_train)
X_poly.shape

(112, 14)

We now have 14 features: the original 4, their 4 squares, and the 6 pairwise cross terms.
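To see where the count of 14 comes from, the degree-2 terms can be enumerated by hand. The sketch below uses only the standard library; the column names `x0` to `x3` are placeholders chosen for illustration, not names taken from the dataset.

```python
from itertools import combinations_with_replacement

# Hypothetical names for the four iris columns, used only for illustration.
names = ["x0", "x1", "x2", "x3"]

# Degree-1 terms: the original columns.
linear = list(names)

# Degree-2 terms: every unordered pair of columns, repetition allowed.
# A pair like ("x0", "x0") is a square; ("x0", "x1") is a cross term.
pairs = list(combinations_with_replacement(names, 2))
squares = [f"{a}^2" for a, b in pairs if a == b]
crosses = [f"{a} {b}" for a, b in pairs if a != b]

print(len(linear), len(squares), len(crosses))  # 4 4 6
print(len(linear) + len(pairs))                 # 14
```

With `include_bias=False` there is no constant column, so the total is 4 + 4 + 6 = 14.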

## Step 2

You can now fit the logistic regression on `X_poly`:

In [40]:
# C is the inverse regularization strength; a very large C makes the penalty negligible
logistic_model = LogisticRegression(C=1000000, solver='newton-cg', max_iter=250).fit(X_poly, y_train)
print(f'Logistic regression model coefficients:{logistic_model.coef_}\n')

Logistic regression model coefficients:[[  1.29131952   1.20450927  -0.66783301  -0.5269703    4.10355629
    4.72831301  -4.22525882  -2.85482295   4.12411739  -1.16829398
   -1.25933566  -4.59718793  -2.07590722  -0.8178695 ]
 [  2.70666477   1.12624387   2.23891673   0.19757773   3.62196281
   -0.58774185  -0.29592672  -4.30044307   5.93203176   3.03581619
    0.34758996  -5.66623784  -5.3223053   -3.32300639]
 [ -3.99798429  -2.33075315  -1.57108372   0.32939256  -7.7255191
   -4.14057116   4.52118554   7.15526602 -10.05614915  -1.86752221
    0.91174571  10.26342577   7.39821252   4.14087588]]



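Note: to evaluate the model on the test data, you need to apply the same two steps: transform the test set with the already-fitted `poly` (using `transform`, not `fit_transform`), then score the model on the transformed data: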

In [41]:
logistic_model.score(poly.transform(X_test), y_test)

0.9473684210526315
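The evaluation step can also be written out in full. The following is a minimal, self-contained sketch of the same workflow; `random_state=0` is an assumption added here so the split is repeatable, and the accuracy it produces will generally differ from the value shown above, which came from an unseeded split.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

X, y = load_iris(return_X_y=True)
# random_state=0 is an assumption added here so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_poly = poly.fit_transform(X_train)   # fit the expansion on the training set
X_test_poly = poly.transform(X_test)         # reuse the fitted transformer on the test set

model = LogisticRegression(C=1000000, solver='newton-cg', max_iter=250)
model.fit(X_train_poly, y_train)

y_pred = model.predict(X_test_poly)
manual_accuracy = np.mean(y_pred == y_test)  # fraction of correct predictions

print(X_test_poly.shape)  # (38, 14)
print(manual_accuracy, model.score(X_test_poly, y_test))
```

The two printed accuracies should agree, since `score` reports the mean accuracy of the classifier on the data it is given.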

# Putting everything together in a Pipeline (optional)

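You may want to use a Pipeline instead, which chains the two steps in a single object so you do not have to build the intermediate `X_poly` array yourself. Calling `pipe.fit` refits both `poly` and `logistic_model` on the training data: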

In [42]:
pipe = Pipeline([('polynomial_features',poly), ('logistic_regression',logistic_model)])
pipe.fit(X_train, y_train)
pipe.score(X_test, y_test)

0.9473684210526315
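A variant that some may find cleaner is to build the pipeline from fresh, unfitted estimators, so that nothing fitted earlier is reused. This is a sketch under the same settings as above; `random_state=0` is again an assumption for repeatability, and the fitted steps are read back through the step names given to the `Pipeline`.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = load_iris(return_X_y=True)
# random_state=0 is an assumption added here so the split is repeatable.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Both steps are created unfitted; pipe.fit fits them in order.
pipe = Pipeline([
    ('polynomial_features', PolynomialFeatures(degree=2, include_bias=False)),
    ('logistic_regression', LogisticRegression(C=1000000, solver='newton-cg', max_iter=250)),
])
pipe.fit(X_train, y_train)

accuracy = pipe.score(X_test, y_test)  # transforms X_test, then scores
coef = pipe.named_steps['logistic_regression'].coef_
print(accuracy)
print(coef.shape)  # (3, 14)
```

The coefficient matrix has one row per class and one column per polynomial feature.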

# References
- [ ] [How to implement polynomial logistic regression in scikit-learn?](https://stackoverflow.com/questions/55937244/how-to-implement-polynomial-logistic-regression-in-scikit-learn)
- [ ] When working in IPython, install packages with `%pip install scikit-learn`