# `imdb_fasttext`
This example demonstrates the use of fastText for text classification.

Based on the paper by Joulin et al.:
_Bag of Tricks for Efficient Text Classification_
(https://arxiv.org/abs/1607.01759)

The model can reach an accuracy of around 88% after 5 epochs, in about 70 s on a CPU.

### Import packages

In [1]:
from __future__ import print_function
import numpy as np

from keras.preprocessing import sequence
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.layers import Embedding
from keras.layers import AveragePooling1D
from keras.datasets import imdb

Using Theano backend.


### Set parameters
First, we specify a seed for the random number generator to obtain reproducible results.

In [2]:
np.random.seed(1337)
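
As a small illustration of what seeding buys us, the sketch below seeds NumPy's global generator twice and checks that the same numbers come out both times. It only demonstrates NumPy's own generator; whether every source of randomness in the backend is covered by this seed is not something the sketch shows.

```python
import numpy as np

# Seed, draw, then re-seed and draw again: both draws should be identical.
np.random.seed(1337)
first = np.random.rand(3)

np.random.seed(1337)
second = np.random.rand(3)

same = bool(np.array_equal(first, second))
print(same)
```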

Now we specify the model and training parameters.

In [3]:
max_features = 20000
maxlen = 400
batch_size = 32
embedding_dims = 20
nb_epoch = 5

### Import and prepare data
We use the IMDb dataset in this example. It will be downloaded from the Internet during the first run. Subsequently, a cached version will be used.

In [4]:
(X_train, y_train), (X_test, y_test) = imdb.load_data(nb_words=max_features)

Let's see what we have...

In [5]:
print(len(X_train), 'train sequences')
print(len(X_test), 'test sequences')

25000 train sequences
25000 test sequences


Now we have to pad the sequences to a common length of `maxlen`, so that the data forms a 2D array of shape (samples, time).

In [6]:
print('X_train shape before padding:', X_train.shape)
print('X_test shape before padding:', X_test.shape)

print('Padding sequences (samples x time)...')
X_train = sequence.pad_sequences(X_train, maxlen=maxlen)
X_test = sequence.pad_sequences(X_test, maxlen=maxlen)

print('X_train shape after padding:', X_train.shape)
print('X_test shape after padding:', X_test.shape)

X_train shape before padding: (25000,)
X_test shape before padding: (25000,)
Padding sequences (samples x time)...
X_train shape after padding: (25000, 400)
X_test shape after padding: (25000, 400)
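
To make the padding step concrete, here is a minimal NumPy sketch of what happens to each sequence, assuming the default behaviour of `pad_sequences` (zeros are added at the front, and sequences that are too long lose their first elements). The helper `pad_pre` is a hypothetical stand-in written for illustration, not the Keras implementation.

```python
import numpy as np

def pad_pre(seqs, maxlen, value=0):
    """Hypothetical helper: left-pad / left-truncate lists to length maxlen."""
    out = np.full((len(seqs), maxlen), value, dtype="int32")
    for i, s in enumerate(seqs):
        trunc = list(s)[-maxlen:]          # keep only the last maxlen items
        if len(trunc) > 0:
            out[i, -len(trunc):] = trunc   # right-align, padding stays in front
    return out

toy = [[1, 2, 3], [4, 5, 6, 7, 8, 9]]      # ragged toy "reviews"
padded = pad_pre(toy, maxlen=4)
print(padded.shape)
print(padded)
```

The short sequence is filled up with zeros on the left, while the long one keeps only its last four indices.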


### Model specification
We start by creating a `Sequential` model for our network:

In [7]:
model = Sequential()

The first layer is an `Embedding` layer, which maps each vocabulary index to a dense vector of `embedding_dims` dimensions.

In [8]:
model.add(Embedding(max_features, embedding_dims, input_length=maxlen))
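
Conceptually, an embedding layer is a lookup table with one row per vocabulary index. The NumPy sketch below mimics that lookup with toy sizes; the table here is random and all names (`table`, `batch`, `embedded`) are illustrative assumptions, whereas in the real model the rows are learned during training.

```python
import numpy as np

rng = np.random.RandomState(0)
vocab_size, dims = 10, 4                      # toy stand-ins for max_features, embedding_dims
table = rng.normal(size=(vocab_size, dims))   # one vector per vocabulary index

batch = np.array([[0, 3, 3, 7],
                  [0, 0, 1, 9]])              # (samples, time) word indices
embedded = table[batch]                       # integer indexing selects rows: (samples, time, dims)
print(embedded.shape)
```

Equal indices always map to the same vector, so the two occurrences of index 3 in the first sample yield identical rows.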

The second layer is an `AveragePooling1D` layer, which will average the embeddings of all words in the document.  
Its output is flattened using a `Flatten` layer.

In [9]:
model.add(AveragePooling1D(pool_length=model.output_shape[1]))
model.add(Flatten())
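
The effect of pooling over the full sequence length and then flattening can be sketched in NumPy as a mean over the time axis. This is an illustration with random toy data, not the Keras code path. Note that in this setup the padded positions take part in the average as well.

```python
import numpy as np

rng = np.random.RandomState(0)
samples, time_steps, dims = 2, 5, 3
embedded = rng.normal(size=(samples, time_steps, dims))

# A pooling window that spans the whole time axis leaves a single time step.
pooled = embedded.mean(axis=1, keepdims=True)   # (samples, 1, dims)

# Flattening drops the length-1 time axis: one vector per document.
flat = pooled.reshape(samples, -1)              # (samples, dims)
print(pooled.shape, flat.shape)
```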

The flattened output of the second layer is projected onto a single-unit output layer with a sigmoid activation function.

In [10]:
model.add(Dense(1, activation='sigmoid'))
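
As a rough sketch of what this output layer computes, the snippet below applies a weighted sum plus bias followed by a sigmoid to two toy document vectors. The weights and bias are made-up numbers chosen for illustration; in the model they are learned.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

doc_vectors = np.array([[0.5, -1.0, 2.0],
                        [0.0,  0.0, 0.0]])     # (samples, dims) averaged embeddings
weights = np.array([[1.0], [0.5], [-0.25]])    # hypothetical weights, shape (dims, 1)
bias = np.array([0.0])                         # hypothetical bias

probs = sigmoid(doc_vectors.dot(weights) + bias)   # (samples, 1), values in (0, 1)
print(probs.shape)
```

The all-zero vector gives a pre-activation of 0 and therefore an output of exactly 0.5, the point where the model is undecided between the two classes.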

Now we compile the model. We use binary cross-entropy as the loss function and Adam as the optimizer.

In [11]:
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
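
For intuition about the loss, here is a small NumPy sketch of the mean binary cross-entropy on toy labels and probabilities. The function below is written for illustration only, and the clipping constant `eps` is an assumption to avoid `log(0)`; it is not taken from the Keras source.

```python
import numpy as np

def binary_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to keep the logarithms finite (eps is an illustrative choice).
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

y_true = np.array([1, 0, 1, 0])
confident = np.array([0.9, 0.1, 0.8, 0.2])   # mostly right and fairly sure
unsure = np.array([0.5, 0.5, 0.5, 0.5])      # always undecided

loss_confident = binary_crossentropy(y_true, confident)
loss_unsure = binary_crossentropy(y_true, unsure)
print(loss_confident < loss_unsure)
```

Predicting 0.5 everywhere gives a loss of ln 2, roughly 0.693; confident and correct predictions push the loss below that value.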

### Model fitting
Now we fit the model!

In [12]:
model.fit(X_train, y_train, batch_size=batch_size, nb_epoch=nb_epoch, validation_data=(X_test, y_test));

Train on 25000 samples, validate on 25000 samples
Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


### Prediction
We can easily use the trained model for prediction:

In [13]:
y_predicted = model.predict(X_test)

# Preview the first 10 predictions
for i in range(10):
    print("Prediction: {:.2f}, ground truth: {:d}".format(y_predicted[i, 0], y_test[i]))

Prediction: 0.78, ground truth: 1
Prediction: 0.89, ground truth: 1
Prediction: 1.00, ground truth: 1
Prediction: 0.61, ground truth: 1
Prediction: 0.89, ground truth: 1
Prediction: 0.24, ground truth: 0
Prediction: 0.02, ground truth: 0
Prediction: 0.17, ground truth: 0
Prediction: 0.84, ground truth: 1
Prediction: 0.87, ground truth: 1
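
The predictions are probabilities, so a threshold is needed to turn them into class labels. The sketch below applies a threshold of 0.5 (an assumption, although a common one for a sigmoid output) to the ten values previewed above and compares the result with the ground truth.

```python
import numpy as np

# The ten previewed predictions and their ground-truth labels.
preds = np.array([0.78, 0.89, 1.00, 0.61, 0.89, 0.24, 0.02, 0.17, 0.84, 0.87])
truth = np.array([1, 1, 1, 1, 1, 0, 0, 0, 1, 1])

labels = (preds >= 0.5).astype(int)          # 0.5 threshold is an assumption
accuracy = float(np.mean(labels == truth))   # fraction of matching labels
print(labels, accuracy)
```

All ten previewed samples end up on the correct side of the threshold, but ten samples say little about the accuracy on the full test set.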
