# Spline features

The modelling tools included in `ISLP` allow for
construction of spline functions of features.


In [1]:
import numpy as np
from ISLP import load_data
from ISLP.models import ModelSpec, ns, bs

In [2]:
Carseats = load_data('Carseats')
Carseats.columns

Index(['Sales', 'CompPrice', 'Income', 'Advertising', 'Population', 'Price',
       'ShelveLoc', 'Age', 'Education', 'Urban', 'US'],
      dtype='object')

Let's make a term representing a cubic spline for `Population`. We'll use knots based on the 
deciles.

In [3]:
knots = np.percentile(Carseats['Population'], np.linspace(10, 90, 9))
knots

array([ 58.9, 110.4, 160. , 218.6, 272. , 317.8, 366. , 412.2, 467. ])

In [4]:
bs_pop = bs('Population', internal_knots=knots, degree=3)

The object `bs_pop` does not refer to any data yet. It must be included in a `ModelSpec` object,
which is then fit to the data using the `fit` (or `fit_transform`) method.

In [5]:
design = ModelSpec([bs_pop], intercept=False)
py_features = np.asarray(design.fit_transform(Carseats))

## Compare to `R`

We can compare our spline features to those produced by the analogous function in `R`.

In [6]:
%load_ext rpy2.ipython

We'll recompute these features using `bs` in `R`. The default knot selections of the
`ISLP` and `R` versions are slightly different, so we fix the set of internal knots in both.

In [7]:
%%R -i Carseats,knots -o R_features
library(splines)
R_features = bs(Carseats$Population, knots=knots, degree=3)

In [8]:
np.linalg.norm(py_features - R_features)

1.1372379284497324e-15

## Underlying model

As for `poly`, the computation of the B-splines is done by a special `sklearn` transformer.

In [9]:
bs_pop

Variable(variables=('Population',), name='bs(Population, internal_knots=[ 58.9 110.4 160.  218.6 272.  317.8 366.  412.2 467. ], degree=3)', encoder=BSpline(internal_knots=array([ 58.9, 110.4, 160. , 218.6, 272. , 317.8, 366. , 412.2, 467. ]),
        lower_bound=10.0, upper_bound=509.0), use_transform=True, pure_columns=False, override_encoder_colnames=True)
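
To make concrete what such a transformer computes, here is a minimal sketch of a cubic B-spline basis built with the Cox-de Boor recursion in plain `numpy`. It is not the `ISLP` implementation: the helper `bspline_basis` is hypothetical, the data are synthetic (an evenly spaced grid on the same range as `Population`), and dropping the first column to remove the intercept is an assumption that follows the convention of `bs` in `R`.

```python
import numpy as np

def bspline_basis(x, internal_knots, lower, upper, degree=3):
    """Evaluate every B-spline basis function at x (Cox-de Boor recursion)."""
    x = np.asarray(x, dtype=float)
    # Full knot vector: each boundary knot is repeated degree + 1 times.
    t = np.concatenate([np.repeat(float(lower), degree + 1),
                        np.asarray(internal_knots, dtype=float),
                        np.repeat(float(upper), degree + 1)])
    # Degree 0: indicators of the half-open intervals [t[j], t[j+1]).
    B = np.zeros((x.size, t.size - 1))
    for j in range(t.size - 1):
        B[:, j] = (t[j] <= x) & (x < t[j + 1])
    # Close the last non-empty interval on the right so x == upper is covered.
    B[x == upper, t.size - degree - 2] = 1.0
    # Raise the degree one step at a time.
    for d in range(1, degree + 1):
        B_new = np.zeros((x.size, t.size - 1 - d))
        for j in range(t.size - 1 - d):
            left_den = t[j + d] - t[j]
            right_den = t[j + d + 1] - t[j + 1]
            # Terms with a zero denominator (repeated knots) are dropped.
            if left_den > 0:
                B_new[:, j] += (x - t[j]) / left_den * B[:, j]
            if right_den > 0:
                B_new[:, j] += (t[j + d + 1] - x) / right_den * B[:, j + 1]
        B = B_new
    return B

# Synthetic stand-in for a feature column (not the Carseats data).
x = np.linspace(10.0, 509.0, 400)
knots = np.percentile(x, np.linspace(10, 90, 9))

full_basis = bspline_basis(x, knots, x.min(), x.max(), degree=3)
no_intercept = full_basis[:, 1:]   # assumed convention: drop the first column

print(full_basis.shape, no_intercept.shape)
print(np.allclose(full_basis.sum(axis=1), 1.0))
```

In this sketch the full basis has `len(knots) + degree + 1` columns, and its rows sum to one, so the constant function lies in its column space. Dropping one column leaves `len(knots) + degree` columns.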

## Natural splines 

Natural cubic splines are also implemented.

In [10]:
ns_pop = ns('Population', internal_knots=knots)
design = ModelSpec([ns_pop], intercept=False)
py_features = np.asarray(design.fit_transform(Carseats))

In [11]:
%%R -o R_features
library(splines)
R_features = ns(Carseats$Population, knots=knots)

In [12]:
np.linalg.norm(py_features - R_features)

1.2473757226554746e-15

## Intercept

By default the spline features in `py_features` do not contain an intercept: the constant (order 0) term is
not in their column space. It can be included with `intercept=True`. This means that the
column space includes an intercept, though there is no specific column labeled as intercept.

In [13]:
ns_int = ns('Population', internal_knots=knots, intercept=True)
design = ModelSpec([ns_int], intercept=False)
py_int_features = np.asarray(design.fit_transform(Carseats))

In [14]:
py_int_features.shape, py_features.shape

((400, 11), (400, 10))
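
One way to check the statement about the column space is to project the all-ones vector onto the columns of a feature matrix and look at the residual. The sketch below uses a hypothetical helper, `spans_constant`, and a toy degree-0 (piecewise-constant) basis rather than the `ISLP` output, so it runs without the Carseats data.

```python
import numpy as np

def spans_constant(X, tol=1e-8):
    """Return True if the all-ones vector lies in the column space of X."""
    X = np.asarray(X, dtype=float)
    ones = np.ones(X.shape[0])
    # Least-squares projection of the constant vector onto the columns of X.
    coef, *_ = np.linalg.lstsq(X, ones, rcond=None)
    return bool(np.linalg.norm(X @ coef - ones) < tol)

# Toy basis: one indicator column per bin, so every row sums to one.
x = np.linspace(0.0, 1.0, 50)
edges = np.array([0.25, 0.5, 0.75])
bins = np.digitize(x, edges)      # bin index 0..3 for each observation
full = np.eye(4)[bins]            # shape (50, 4), no column of ones
reduced = full[:, 1:]             # drop one column, shape (50, 3)

print(spans_constant(full))       # True
print(spans_constant(reduced))    # False
```

Applied to the matrices above, we would expect `spans_constant(py_int_features)` to be true and `spans_constant(py_features)` to be false, assuming the `intercept` option behaves as described.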