This project has been superseded by a C++ variant. Generating the clusters in Python is obscenely slow, for reasons I never pinned down. I never tested the nearest-neighbor operation, but I imagine it would be slow as well. I figure this is Python's fault, not FLANN's.
You can find the new project here.
Old Documentation for Legacy Sake
MSU Py-CBIR is modularized for easy use.
- Processes a feature file into an in-memory array
- Builds an Index from an in-memory array of points, saves it to file
- Clusters all of the points from an Index and saves those clusters to file
- Loads a feature file and cluster file into memory, determines which cluster each feature belongs to.
- Determines which image that feature belongs to, and writes the cluster number to file.
- This cluster number is taken from the ordering of the clusters in the cluster file
- This list of cluster numbers can be directly used as 'words' in a bag-of-words approach
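The last two steps above can be sketched in a few lines of plain Python. This is only an illustration, not the code in bag.py: it swaps the FLANN lookup for a brute-force nearest-center search, and the names `nearest_cluster`, `bag_of_words`, and `features_per_image` are hypothetical. It also assumes that the features of each image are stored contiguously and that a per-image feature count is available, which may not match the actual file layout.

```python
from collections import Counter


def nearest_cluster(feature, clusters):
    """Return the index of the cluster center closest to `feature`.

    Brute-force squared Euclidean distance; the real project uses FLANN.
    The index is the cluster's position in the cluster list, which is
    what serves as the 'word'.
    """
    def sq_dist(center):
        return sum((f - c) ** 2 for f, c in zip(feature, center))

    return min(range(len(clusters)), key=lambda i: sq_dist(clusters[i]))


def bag_of_words(features, clusters, features_per_image):
    """Group features by image and map each one to its cluster number.

    Assumes features are ordered image by image, with
    `features_per_image` giving how many features each image owns.
    """
    bags = []
    start = 0
    for count in features_per_image:
        image_features = features[start:start + count]
        bags.append([nearest_cluster(f, clusters) for f in image_features])
        start += count
    return bags


# Toy data: three cluster centers and four 2-D features from two images.
clusters = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
features = [(1.0, 1.0), (9.0, 9.5), (0.5, 8.0), (11.0, 10.0)]
features_per_image = [3, 1]

bags = bag_of_words(features, clusters, features_per_image)
histograms = [Counter(words) for words in bags]
```

Each entry of `bags` is the list of cluster numbers for one image, and the matching `Counter` is that image's word histogram. A brute-force search like this scales poorly with the number of clusters, which is why an approximate index such as FLANN is used for the real data.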
    # Load the first 10 documents' features into their own file, so that
    # processing goes faster
    $ head -n 2158 features/esp.feature > features.first10documents
    $ FEATURES='features.first10documents'

    # Convert the MSU-supplied feature list into a NumPy list that loads much faster
    $ python txt2npy.py -f $FEATURES

    # Build the index
    $ python buildIndex.py -f $FEATURES.npy -o $FEATURES.index

    # Find 100 clusters. Note that the project description requires 125000
    $ python cluster.py -f $FEATURES.npy -i $FEATURES.index -o $FEATURES.clusters -n 100

    # Find cluster associations ('words')
    $ python bag.py -f $FEATURES.npy -c $FEATURES.clusters -d docs -l features/imglist.txt -s features/esp.size

    # Take a look at the first Image, to see which clusters its points are associated with
    $ more docs/00004fc7eb4bf00ba434c167890b99fa.txt