## Hashing trick

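Instead of building a hash table of the features encountered in training, as the vectorizers do, instances of `FeatureHasher` apply a hash function to each feature to determine its column index in the sample matrix directly. The result is increased speed and reduced memory usage, at the expense of inspectability: the hasher does not remember what the input features looked like and has no `inverse_transform` method. Because distinct features can hash to the same column, the hash also supplies a sign for each value by default (`alternate_sign=True`), so collisions tend to cancel rather than accumulate; this is why the matrices below contain negative entries.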

* https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.FeatureHasher.html
* https://scikit-learn.org/stable/modules/feature_extraction.html#feature-hashing
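
The idea can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: `FeatureHasher` uses a signed 32-bit MurmurHash3, whereas the sketch substitutes `hashlib.md5` from the standard library, and the helper names `hash_feature` and `hash_sample` are made up for the example.

```python
import hashlib


def hash_feature(feature, n_features):
    """Map a string feature to (column index, sign) without a vocabulary."""
    # Assumption: md5 is only a deterministic stand-in for MurmurHash3.
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    value = int.from_bytes(digest[:4], "little", signed=True)
    column = abs(value) % n_features
    sign = 1.0 if value >= 0 else -1.0
    return column, sign


def hash_sample(features, n_features):
    """Accumulate signed counts for one sample into a fixed-width row."""
    row = [0.0] * n_features
    for feature in features:
        column, sign = hash_feature(feature, n_features)
        row[column] += sign
    return row


tokens = "the ball is in your court".split()
row = hash_sample(tokens, n_features=8)
print(row)
```

The row always has `n_features` entries regardless of how many distinct tokens exist, and no token-to-column dictionary is stored anywhere, which is where the memory saving comes from.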

In [35]:
from sklearn.feature_extraction import FeatureHasher

text = ["A bird in hand is worth two in the bush.",
        "Good things come to those who wait.",
        "These watches cost $1500! ",
        "These are other fish in the sea.",
        "The ball is in your court.",
        "Mr. Smith Goes to Washington ",
        "Doogie Howser M.D."]

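# With input_type='string', each sample should be an iterable of string features.
# A bare str is iterated character by character, so the columns below hold
# signed character counts, not word counts. Newer scikit-learn releases raise
# a ValueError for bare strings; pass tokens instead, e.g. [s.split() for s in text].
hasher = FeatureHasher(n_features=8, input_type='string')
hashed_features = hasher.fit_transform(text)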

hashed_features.shape

(7, 8)

In [36]:
hashed_features.toarray()

array([[ 10.,   0.,   1.,   1.,  -3., -11.,   1.,  -1.],
       [  6.,  -2.,   0.,   3.,   0.,  -6.,   0.,   0.],
       [  4.,  -5.,   1.,  -1.,   2.,  -3.,   0.,   0.],
       [  6.,   2.,   2.,   0.,  -1.,  -8.,   0.,   3.],
       [  7.,   1.,   1.,   1.,   1.,  -6.,   1.,   0.],
       [  4.,   2.,   1.,   0.,  -2.,  -5.,   0.,  -1.],
       [  2.,   1.,   0.,   3.,   0.,  -3.,   2.,   1.]])

In [42]:
hasher = FeatureHasher(n_features=16, input_type='string')
hashed_features = hasher.fit_transform(text)

hashed_features.shape

(7, 16)

In [43]:
hashed_features.toarray()

array([[ 1.,  0.,  1.,  1., -3., -4.,  0.,  1.,  9.,  0.,  0.,  0.,  0.,
        -7.,  1., -2.],
       [ 0., -3.,  1.,  3., -1., -2.,  0.,  2.,  6.,  1., -1.,  0.,  1.,
        -4.,  0., -2.],
       [ 0., -5.,  1., -1.,  2.,  1.,  0.,  3.,  4.,  0.,  0.,  0.,  0.,
        -4.,  0., -3.],
       [ 0.,  2.,  2., -1., -1., -2.,  0.,  6.,  6.,  0.,  0.,  1.,  0.,
        -6.,  0., -3.],
       [ 2.,  1.,  1.,  1., -1., -2.,  1.,  1.,  5.,  0.,  0.,  0.,  2.,
        -4.,  0., -1.],
       [-1.,  1.,  1.,  0., -2., -2.,  0.,  1.,  5.,  1.,  0.,  0.,  0.,
        -3.,  0., -2.],
       [ 0.,  0.,  0.,  3.,  0., -1.,  2.,  2.,  2.,  1.,  0.,  0.,  0.,
        -2.,  0., -1.]])
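
Widening the matrix from 8 to 16 columns gives the features more room and so reduces collisions. The sketch below counts collisions at several widths. It is an illustration under assumptions: `column_of` and `count_collisions` are hypothetical helpers, md5 stands in for the MurmurHash3 that `FeatureHasher` uses, and the features are simply the distinct characters of the first sentence.

```python
import hashlib


def column_of(feature, n_features):
    """Column index for a feature under a stand-in hash (md5, not MurmurHash3)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % n_features


def count_collisions(features, n_features):
    """Number of distinct features that share a column with another feature."""
    distinct = set(features)
    used_columns = {column_of(f, n_features) for f in distinct}
    return len(distinct) - len(used_columns)


chars = set("A bird in hand is worth two in the bush.")
for width in (8, 16, 1024):
    print(width, count_collisions(chars, width))
```

With more distinct features than columns, some collisions are unavoidable; choosing `n_features` is a trade-off between matrix width and how much information collisions destroy.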

In [44]:
hasher.inverse_transform(hashed_features)

AttributeError: 'FeatureHasher' object has no attribute 'inverse_transform'
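
If some inspectability is needed, one workaround is to record which features land in which column while hashing. The sketch below is a hedged illustration rather than a scikit-learn feature: `column_of` and `build_reverse_index` are hypothetical helpers, and md5 again stands in for the library's MurmurHash3, so the column numbers will not match those produced by `FeatureHasher`.

```python
import hashlib
from collections import defaultdict


def column_of(feature, n_features):
    """Column index for a feature under a stand-in hash (md5, not MurmurHash3)."""
    digest = hashlib.md5(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % n_features


def build_reverse_index(samples, n_features):
    """Map each column index to the set of features that hashed into it."""
    reverse = defaultdict(set)
    for features in samples:
        for feature in features:
            reverse[column_of(feature, n_features)].add(feature)
    return dict(reverse)


sentences = ["The ball is in your court.", "Doogie Howser M.D."]
samples = [s.split() for s in sentences]
reverse = build_reverse_index(samples, n_features=16)
for column in sorted(reverse):
    print(column, sorted(reverse[column]))
```

Keeping such an index gives back part of the memory that hashing saved, and a column shared by several features still cannot be attributed to a single one.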