# NumPy Example

In this example we'll train a `fastai2` tabular model using the new NumPy-based DataLoaders (`NumpyDataLoaders`).

In [None]:
%cd ..

/media/mldata/fastai2/zach/fastai2_tabular_hybrid


In [None]:
from fastai2.tabular.all import *
from fastai2_tabular_hybrid.numpy import *

In [None]:
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')

We'll be using the `ADULT_SAMPLE` dataset for this example:

In [None]:
df.head()

Unnamed: 0,age,workclass,fnlwgt,education,education-num,marital-status,occupation,relationship,race,sex,capital-gain,capital-loss,hours-per-week,native-country,salary
0,49,Private,101320,Assoc-acdm,12.0,Married-civ-spouse,,Wife,White,Female,0,1902,40,United-States,>=50k
1,44,Private,236746,Masters,14.0,Divorced,Exec-managerial,Not-in-family,White,Male,10520,0,45,United-States,>=50k
2,38,Private,96185,HS-grad,,Divorced,,Unmarried,Black,Female,0,0,32,United-States,<50k
3,38,Self-emp-inc,112847,Prof-school,15.0,Married-civ-spouse,Prof-specialty,Husband,Asian-Pac-Islander,Male,0,0,40,United-States,>=50k
4,42,Self-emp-not-inc,82297,7th-8th,,Married-civ-spouse,Other-service,Wife,Black,Female,0,0,50,United-States,<50k


First, we build a `TabularPandas` object to preprocess the data for us:

In [None]:
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
y_names = 'salary'
splits = RandomSplitter()(range_of(df))
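`RandomSplitter` returns a function that shuffles the row indices and holds out a fraction of them for validation. As we understand it, fastai2's version defaults to `valid_pct=0.2` and draws the permutation with PyTorch. The sketch below is a hypothetical pure-Python stand-in (`random_splitter` is our own name, not a library function) that shows the same idea:

```python
import random

def random_splitter(valid_pct=0.2, seed=None):
    """Hypothetical stand-in for RandomSplitter: returns a function mapping items to (train, valid) index lists."""
    def _inner(items):
        rng = random.Random(seed)          # fresh generator so a fixed seed gives a repeatable split
        idxs = list(range(len(items)))
        rng.shuffle(idxs)
        cut = int(valid_pct * len(idxs))   # number of rows held out for validation
        return idxs[cut:], idxs[:cut]
    return _inner

train_idx, valid_idx = random_splitter(seed=42)(range(10))
print(len(train_idx), len(valid_idx))  # → 8 2
```

Returning a function rather than the split itself lets the same splitter be reused on any collection of rows.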

In [None]:
to = TabularPandas(df, procs=procs, cat_names=cat_names, cont_names=cont_names,
                   y_names=y_names, splits=splits)
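Each of the three procs performs a simple transformation: `Categorify` maps labels to integer codes with `0` reserved for missing or unseen values, `FillMissing` fills gaps with the median and adds a `<column>_na` flag column, and `Normalize` standardizes continuous columns. The flag column is presumably why the batch shown later carries seven categorical codes for six `cat_names`. The following is a simplified, pure-Python approximation for illustration only; the helper names are hypothetical, and fastai2's real procs work on whole DataFrames and compute their statistics on the training split only.

```python
import statistics

def categorify(values):
    """Map each category to an integer code; code 0 ('#na#') covers missing or unseen values."""
    vocab = ['#na#'] + sorted({v for v in values if v is not None})
    index = {v: i for i, v in enumerate(vocab)}
    return [index.get(v, 0) for v in values], vocab

def fill_missing(values):
    """Replace missing numbers with the median and return a was-missing flag per row."""
    median = statistics.median(v for v in values if v is not None)
    filled = [median if v is None else v for v in values]
    flags = [v is None for v in values]
    return filled, flags

def normalize(values, mean, std):
    """Standardize with precomputed statistics (fastai2 takes them from the training rows)."""
    return [(v - mean) / std for v in values]

# Toy columns loosely modeled on the ADULT_SAMPLE rows shown above
workclass = ['Private', 'Private', 'Self-emp-inc', None]
edu_num = [12.0, 14.0, None, 15.0]

codes, vocab = categorify(workclass)
filled, edu_na = fill_missing(edu_num)
# For this toy example the statistics come from the whole column
mean, std = statistics.mean(filled), statistics.pstdev(filled)
normed = normalize(filled, mean, std)
print(codes, vocab)    # → [1, 1, 2, 0] ['#na#', 'Private', 'Self-emp-inc']
print(filled, edu_na)  # → [12.0, 14.0, 14.0, 15.0] [False, False, True, False]
```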

Next, we build our `NumpyDataLoaders` by passing in the `TabularPandas` object along with any other parameters we want to set. The arguments are written out explicitly here, but these values are the defaults:

In [None]:
dls = NumpyDataLoaders(to, bs=64, val_bs=128, shuffle_train=True, device='cuda')
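Array-based loading avoids assembling each batch row by row. The sketch below is a hypothetical `ArrayBatcher`, not the `fastai2_tabular_hybrid` implementation: under that assumption, it shows how a loader can slice whole NumPy arrays with one index array per batch. The toy arrays mirror the shapes of the batch inspected below (seven categorical codes, three continuous values and one target per row); moving batches to the GPU, for example with `torch.from_numpy(...).to('cuda')`, is left out.

```python
import numpy as np

class ArrayBatcher:
    """Hypothetical minimal loader yielding (cats, conts, ys) slices of pre-processed arrays."""
    def __init__(self, cats, conts, ys, bs=64, shuffle=False, seed=None):
        self.cats, self.conts, self.ys = cats, conts, ys
        self.bs, self.shuffle = bs, shuffle
        self.rng = np.random.default_rng(seed)

    def __len__(self):
        return -(-len(self.ys) // self.bs)  # ceiling division: number of batches

    def __iter__(self):
        n = len(self.ys)
        order = self.rng.permutation(n) if self.shuffle else np.arange(n)
        for start in range(0, n, self.bs):
            idx = order[start:start + self.bs]
            # One fancy-indexing call per array instead of collating individual rows
            yield self.cats[idx], self.conts[idx], self.ys[idx]

# Random stand-ins for the processed categorical, continuous and target columns
rng = np.random.default_rng(0)
cats = rng.integers(0, 10, size=(200, 7))
conts = rng.standard_normal((200, 3)).astype(np.float32)
ys = rng.integers(0, 2, size=(200, 1)).astype(np.int8)

train_dl = ArrayBatcher(cats, conts, ys, bs=64, shuffle=True, seed=0)
xb_cat, xb_cont, yb = next(iter(train_dl))
print(len(train_dl), xb_cat.shape, xb_cont.shape, yb.shape)  # → 4 (64, 7) (64, 3) (64, 1)
```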

We'll look at one batch of transformed data:

In [None]:
batch = next(iter(dls[0]))

In [None]:
batch[0][0], batch[1][0], batch[2][0]

(tensor([ 5, 12,  5,  9,  5,  3,  1], device='cuda:0'),
 tensor([ 0.3254, -0.1453, -0.4199], device='cuda:0'),
 tensor([0], device='cuda:0', dtype=torch.int8))

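Now we'll train a model. We reuse the earlier `TabularPandas` object to calculate the embedding sizes and build a `TabularModel` by hand, since `tabular_learner` can't be used directly with these DataLoaders. Here `n_cont=3` matches the three continuous columns and `out_sz=2` the two `salary` classes.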

In [None]:
emb_szs = get_emb_sz(to)  # (cardinality, embedding width) for each categorical column
net = TabularModel(emb_szs, n_cont=3, out_sz=2, layers=[200,100]).cuda()  # 3 continuous inputs, 2 classes
learn = Learner(dls, net, metrics=accuracy, loss_func=CrossEntropyLossFlat())
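`get_emb_sz` picks an embedding width for each categorical column from its cardinality. To our understanding fastai2 uses the rule of thumb `min(600, round(1.6 * n_cat**0.56))`; the sketch re-implements it in plain Python so the numbers can be checked by hand. The cardinalities in the usage line are made up for illustration and are not the actual vocab sizes of `to`.

```python
def emb_sz_rule(n_cat):
    """Rule-of-thumb embedding width for a column with n_cat distinct codes, capped at 600."""
    return min(600, round(1.6 * n_cat ** 0.56))

def emb_sizes(cardinalities):
    """Pair each cardinality with its width, in the (n_cat, width) form TabularModel expects."""
    return [(n, emb_sz_rule(n)) for n in cardinalities]

# Hypothetical cardinalities for seven categorical columns
print(emb_sizes([10, 17, 8, 16, 7, 6, 3]))
```

The sub-linear exponent keeps embeddings small for low-cardinality columns while the cap stops very large vocabularies from dominating the model size.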

Now we can train:

In [None]:
%%time
learn.fit_one_cycle(5, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.385238,0.383343,0.822942,00:02
1,0.350524,0.380662,0.831235,00:02
2,0.356431,0.356876,0.835688,00:02
3,0.354293,0.355097,0.837684,00:02
4,0.343811,0.356239,0.836763,00:02


CPU times: user 11.6 s, sys: 79.1 ms, total: 11.6 s
Wall time: 11.7 s
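`fit_one_cycle(5, 1e-2)` does not train at a constant `1e-2`: the learning rate warms up to that maximum and then anneals towards zero. The sketch below assumes the defaults we believe fastai2 uses (`div=25`, `div_final=1e5`, `pct_start=0.25`, cosine interpolation in both phases) and ignores the matching momentum schedule; `one_cycle_lr` is a hypothetical helper, not a fastai function.

```python
import math

def cos_anneal(start, end, pos):
    """Cosine interpolation from start (pos=0) to end (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pos, lr_max=1e-2, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at training progress pos in [0, 1]: warm up, then anneal."""
    if pos < pct_start:
        # Phase 1: rise from lr_max/div to lr_max over the first pct_start of training
        return cos_anneal(lr_max / div, lr_max, pos / pct_start)
    # Phase 2: decay from lr_max to lr_max/div_final over the remaining steps
    return cos_anneal(lr_max, lr_max / div_final, (pos - pct_start) / (1 - pct_start))

# Learning rate at the start of each of the 5 epochs and at the very end
print([one_cycle_lr(e / 5) for e in range(6)])
```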


For a quick timing comparison, here is native `fastai2`:

In [None]:
dls = to.dataloaders(bs=64)
learn = tabular_learner(dls, metrics=accuracy)

In [None]:
%%time
learn.fit_one_cycle(5, 1e-2)

epoch,train_loss,valid_loss,accuracy,time
0,0.391983,0.370891,0.82801,00:03
1,0.35215,0.36542,0.830467,00:03
2,0.35173,0.356867,0.836763,00:03
3,0.346647,0.353872,0.836456,00:03
4,0.340674,0.353653,0.837224,00:03


CPU times: user 18.8 s, sys: 62.7 ms, total: 18.8 s
Wall time: 18.8 s


As you can see, wall time dropped from 18.8 s to 11.7 s (about 38% less), with essentially the same final accuracy!
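A "speed boost" percentage depends on whether it is measured as time saved or as a throughput ratio. Both follow from the two wall times reported by `%%time`:

```python
numpy_wall, native_wall = 11.7, 18.8  # seconds, from the %%time outputs above

time_saved = (native_wall - numpy_wall) / native_wall  # fraction of wall time removed
speedup = native_wall / numpy_wall                     # how many times faster
print(f"{time_saved:.0%} less wall time, {speedup:.2f}x faster")  # → 38% less wall time, 1.61x faster
```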