Michigan State University's CSE484 "Information Retrieval" final project for Content-Based Image Retrieval. Implemented in Python.
features
.gitignore
.gitmodules
README.mdown
aio.py
bag.py
buildIndex.py
cbir.py
cluster.py
logging.conf
processFeatures.py
txt2npy.py


Note

This project has been superseded by a C++ variant. For whatever reason, generating the clusters in Python is obscenely slow. I never tested the nearest-neighbor operation, but I imagine it would be slow as well. I figure this is Python's fault, not FLANN's.

You can find the new project here.

Old Documentation, Kept for Legacy's Sake

Using Py-CBIR

MSU Py-CBIR is modularized for easy use.

  • processFeatures.py
    • Processes a feature file into an in-memory array
  • buildIndex.py
    • Builds an index from an in-memory array of points and saves it to file
  • cluster.py
    • Clusters all of the points from an Index and saves those clusters to file
  • bag.py
    • Loads a feature file and a cluster file into memory and determines which cluster each feature belongs to
    • Determines which image each feature belongs to and writes the feature's cluster number to that image's output file
      • This cluster number is taken from the ordering of the clusters in the cluster file
      • This list of cluster numbers can be directly used as 'words' in a bag-of-words approach
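
The word-assignment step that bag.py performs can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses a brute-force NumPy nearest-centroid search rather than FLANN, the function names `assign_words` and `split_by_image` are hypothetical, and it assumes that the size file lists how many features each image contributes, in order.

```python
import numpy as np


def assign_words(features, centroids):
    """Return, for each feature row, the index of its nearest centroid.

    The index is the centroid's position in the cluster array, which
    plays the role of the 'word' for that feature.
    """
    # Squared Euclidean distance, expanded as |a|^2 - 2ab + |b|^2
    d2 = (
        (features ** 2).sum(axis=1)[:, None]
        - 2.0 * features @ centroids.T
        + (centroids ** 2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)


def split_by_image(words, sizes):
    """Split the flat word array into one array per image.

    Assumption: sizes[i] is the number of features that image i owns,
    and features are stored image by image in the same order.
    """
    boundaries = np.cumsum(sizes)[:-1]
    return np.split(words, boundaries)


# Toy data: four 2-D features, three cluster centers, two images
features = np.array([[0.0, 0.0], [0.9, 1.1], [5.0, 5.0], [0.1, -0.1]])
centroids = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
sizes = [3, 1]

words = assign_words(features, centroids)
docs = split_by_image(words, sizes)
for i, doc in enumerate(docs):
    print(i, doc.tolist())
```

Brute force is only practical for small inputs like this one. With the 125000 clusters the project description asks for, an approximate nearest-neighbor index such as FLANN is the more realistic choice.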

Example Usage

# Load the first 10 documents' features into their own file, so that 
# processing goes faster
$ head -n 2158 features/esp.feature > features.first10documents
$ FEATURES='features.first10documents'

# Convert the MSU-supplied feature list into a NumPy array file (.npy) that loads much faster
$ python txt2npy.py -f $FEATURES


# Build the index
$ python buildIndex.py -f $FEATURES.npy -o $FEATURES.index

# Find 100 clusters. Note that the project description requires 125000.
$ python cluster.py -f $FEATURES.npy -i $FEATURES.index -o $FEATURES.clusters -n 100

# Find cluster associations ('words')
$ python bag.py -f $FEATURES.npy -c $FEATURES.clusters -d docs -l features/imglist.txt -s features/esp.size

# Take a look at the first image to see which clusters its points are associated with
$ more docs/00004fc7eb4bf00ba434c167890b99fa.txt
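
Once each image has its list of cluster numbers, a bag-of-words representation is just a count of how often each word occurs. The sketch below shows one way to build that histogram. It assumes the per-image file holds whitespace-separated integer cluster numbers, which the documentation above does not confirm, and `parse_words` and `word_histogram` are hypothetical helper names.

```python
import numpy as np


def parse_words(text):
    """Parse whitespace-separated cluster numbers (assumed file layout)."""
    return [int(token) for token in text.split()]


def word_histogram(words, n_clusters):
    """Count how many times each cluster number occurs in one image."""
    return np.bincount(np.asarray(words, dtype=np.int64), minlength=n_clusters)


# Stand-in for the contents of one docs/<image>.txt file
sample_text = "3 1 3 0\n"
words = parse_words(sample_text)
hist = word_histogram(words, n_clusters=5)
print(hist.tolist())
```

Two images can then be compared by comparing their histograms, for example with cosine similarity.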