DOC Remove mentions of milk
This was very old and unmaintained.

closes #101
luispedro committed Jun 11, 2020
1 parent 90e85fe commit 44debfc
Showing 2 changed files with 14 additions and 80 deletions.
27 changes: 14 additions & 13 deletions docs/source/classification.rst
@@ -1,14 +1,13 @@
 ======================================
 Tutorial: Classification Using Mahotas
 ======================================
-.. versionadded:: 0.8
-   Before version 0.8, texture was under mahotas, not under mahotas.features
 
-Here is an example of using mahotas and `milk <http://luispedro.org/software/milk>`_
-for image classification (but most of the code can easily be adapted to use
-another machine learning package). I assume that there are three important
-directories: ``positives/`` and ``negatives/`` contain the manually labeled
-examples, and the rest of the data is in an ``unlabeled/`` directory.
+Here is an example of using mahotas and `scikit-learn
+<https://scikit-learn.org>`__ for image classification (but most of the code
+can easily be adapted to use another machine learning package). I assume that
+there are three important directories: ``positives/`` and ``negatives/``
+contain the manually labeled examples, and the rest of the data is in an
+``unlabeled/`` directory.
 
 Here is the simple algorithm:

@@ -25,7 +24,6 @@ We start with a bunch of imports::
     from glob import glob
     import mahotas
     import mahotas.features
-    import milk
     from jug import TaskGenerator
 
 Now, we define a function which computes features. In general, texture features
@@ -40,16 +38,18 @@ are very fast and give very decent results::
 the mean (sometimes you use the spread ``ptp()`` too).
 
 Now a pair of functions to learn a classifier and apply it. These are just
-``milk`` functions::
+``scikit-learn`` functions::
 
     @TaskGenerator
     def learn_model(features, labels):
-        learner = milk.defaultclassifier()
-        return learner.train(features, labels)
+        from sklearn.ensemble import RandomForestClassifier
+        clf = RandomForestClassifier()
+        clf.fit(features, labels)
+        return clf
 
     @TaskGenerator
     def classify(model, features):
-        return model.apply(features)
+        return model.predict(features)
 
 We assume we have three pre-prepared directories with the images in jpeg
 format. This bit you will have to adapt for your own settings::
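The feature-computation and directory-globbing code itself is collapsed in this view. As a rough sketch of how those steps could fit together with the scikit-learn based ``learn_model`` and ``classify`` above (the name ``features_for`` and the glob patterns are illustrative assumptions, not necessarily the file's actual contents)::

    from glob import glob

    import numpy as np
    import mahotas
    import mahotas.features
    from jug import TaskGenerator

    @TaskGenerator
    def features_for(imname):
        # Haralick texture features, averaged over the four directions.
        img = mahotas.imread(imname, as_grey=True).astype(np.uint8)
        return mahotas.features.haralick(img).mean(axis=0)

    # Manually labeled examples; everything else sits in unlabeled/.
    positives = glob('positives/*.jpg')
    negatives = glob('negatives/*.jpg')
    unlabeled = glob('unlabeled/*.jpg')

    features = [features_for(im) for im in positives + negatives]
    labels = [1] * len(positives) + [0] * len(negatives)

    model = learn_model(features, labels)
    predictions = classify(model, [features_for(im) for im in unlabeled])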
@@ -73,6 +73,7 @@ This uses texture features, which is probably good enough, but you can play
 with other features in ``mahotas.features`` if you'd like (or try
 ``mahotas.surf``, but that gets more complicated).
 
-(This was motivated by `a question on Stackoverflow <http://stackoverflow.com/questions/5426482/using-pil-to-detect-a-scan-of-a-blank-page/5505754>`__).
+(This was motivated by `a question on Stackoverflow
+<http://stackoverflow.com/questions/5426482/using-pil-to-detect-a-scan-of-a-blank-page/5505754>`__).
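If you do want to go beyond texture, other descriptors in ``mahotas.features`` can simply be concatenated into one vector. A minimal sketch, assuming greyscale jpeg inputs (the parameter values below are arbitrary examples, not recommendations from the tutorial)::

    import numpy as np
    import mahotas
    import mahotas.features

    def combined_features(imname):
        img = mahotas.imread(imname, as_grey=True).astype(np.uint8)
        haralick = mahotas.features.haralick(img).mean(axis=0)  # texture
        lbp = mahotas.features.lbp(img, radius=8, points=6)     # local binary patterns
        pftas = mahotas.features.pftas(img)                     # threshold adjacency statistics
        return np.concatenate([haralick, lbp, pftas])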


67 changes: 0 additions & 67 deletions docs/source/surfref.rst
@@ -67,70 +67,3 @@ We now compute all features for all images in widefield dataset::
         features.append(surf_ref(f, ref))
         labels.append(dir)
         origins.append(origin_counter)

-Classification
---------------
-
-With all the precomputed features, we can now run 10-fold cross-validation on
-these features.
-
-We will be using milk for machine learning::
-
-    import milk
-
-Milk's interface is around learner objects. We are going to define a function::
-
-    def train_model(features, labels):
-
-The first step is to find centroids::
-
-        # concatenate all the features:
-        concatenated = np.concatenate(features)
-
-We could use the whole array concatenated for kmeans. However, that would take
-a long time, so we will use just 1/16th of it::
-
-        concatenated = concatenated[::16]
-        _,centroids = milk.kmeans(concatenated, k=len(labels)//4, R=123)
-
-The R argument is the random seed. We set it to a constant to get reproducible
-results, but feel free to vary it.
-
-Based on these centroids, we project the features to histograms. Now, we are
-using all of the features::
-
-        features = np.array([
-                project_centroids(centroids, fs, histogram=True)
-                for fs in features])
-
-Finally, we can use a traditional milk learner (which will perform feature
-selection, normalization, and SVM training)::
-
-        learner = milk.defaultlearner()
-        model = learner.train(features, labels)
-
-We must return both the centroids that were used and the classification model::
-
-        return centroids, model
-
-To classify an instance, we define another function, which uses the centroids
-and the model::
-
-    def apply_many(centroids, model, features):
-        features = np.array([
-                project_centroids(centroids, fs, histogram=True)
-                for fs in features])
-        return model.apply_many(features)
-
-In fact, while the above will work well, milk already provides a learner object
-which will perform all of those tasks!
-
-::
-
-    import milk
-    from milk.supervised.precluster import frac_precluster_learner
-
-    learner = frac_precluster_learner(kfrac=4, sample=16)
-    cmatrix,names = milk.nfoldcrossvalidation(features, labels, origins=origins, learner=learner)
-    acc = cmatrix.astype(float).trace()/cmatrix.sum()
-    print('Accuracy: {:.1f}%'.format(100.*acc))
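For readers who relied on the section removed above, the milk-based workflow it described has rough equivalents in scikit-learn. This is a sketch only, assuming ``features``, ``labels`` and ``origins`` are the lists built earlier in this file (the helper ``project_centroids`` and the choice of classifier are illustrative assumptions, not code from the repository)::

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GroupKFold, cross_val_score

    def project_centroids(kmeans, descriptors):
        # Histogram of which centroid each SURF descriptor falls closest to.
        assignments = kmeans.predict(descriptors)
        return np.bincount(assignments, minlength=kmeans.n_clusters)

    # Cluster a 1/16th sample of all descriptors, as the milk version did.
    concatenated = np.concatenate(features)[::16]
    kmeans = KMeans(n_clusters=len(labels) // 4, random_state=123).fit(concatenated)

    # Project every image to a fixed-length histogram and cross-validate,
    # holding out whole images together (the role `origins` played in
    # milk.nfoldcrossvalidation).
    histograms = np.array([project_centroids(kmeans, fs) for fs in features])
    clf = RandomForestClassifier()
    scores = cross_val_score(clf, histograms, labels, groups=origins,
                             cv=GroupKFold(n_splits=10))
    print('Accuracy: {:.1f}%'.format(100. * scores.mean()))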
