Some Hash Experiments

Algorithms for Dart


Trie

A trie implementation for Dart. Tries are especially well suited to prefix searches. This implementation is copied from mdakin's TrieDart library.

Usage example:

import 'package:poppy/trie.dart';
var words = ["april", "apron", "apricot", "hello", "goodbye"];
Trie<String> trie = new SimpleTrie();
for (var str in words) {
  trie[str] = str; // the key doubles as the value here; any other value would work
}

A prefix search over these keys returns the values of the three words that share a prefix:

[apricot, april, apron]
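
To illustrate why a trie suits prefix searches, here is a minimal sketch of one. It is not the SimpleTrie code: the names TrieNode, SketchTrie, put and valuesWithPrefix are hypothetical and chosen only for this example. Keys are stored one UTF-16 code unit per node, and children are visited in sorted order so results come back in key order.

```dart
// Illustrative sketch only; these names are NOT part of the poppy API.
class TrieNode<V> {
  final Map<int, TrieNode<V>> children = {};
  V? value;
  bool hasValue = false;
}

class SketchTrie<V> {
  final TrieNode<V> _root = TrieNode<V>();

  // Stores value under key, creating one node per code unit as needed.
  void put(String key, V value) {
    var node = _root;
    for (final unit in key.codeUnits) {
      node = node.children.putIfAbsent(unit, () => TrieNode<V>());
    }
    node.value = value;
    node.hasValue = true;
  }

  // Returns the values of all keys that start with prefix, in key order.
  List<V> valuesWithPrefix(String prefix) {
    var node = _root;
    for (final unit in prefix.codeUnits) {
      final next = node.children[unit];
      if (next == null) return <V>[]; // no stored key has this prefix
      node = next;
    }
    final result = <V>[];
    _collect(node, result);
    return result;
  }

  // Depth-first walk that visits children in ascending code unit order.
  void _collect(TrieNode<V> node, List<V> out) {
    if (node.hasValue) out.add(node.value as V);
    final units = node.children.keys.toList()..sort();
    for (final unit in units) {
      _collect(node.children[unit]!, out);
    }
  }
}
```

With the five words from the usage example stored as both key and value, valuesWithPrefix("apr") on this sketch yields [apricot, april, apron]. The lookup cost depends on the prefix length and the number of matches, not on the total number of stored keys.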

Minimal Perfect Hash Function (Mphf)

The Mphf class is a Minimal Perfect Hash Function (MPHF) implementation.

An Mphf is generated from a fixed set of unique keys. It maps each key to a distinct integer in the range [0..keycount-1].

The generated hash function does not store the key data in its structure, so it is very compact. This particular implementation uses around 3.2 bits per key.

Mphfs may be useful for very large look-up structures such as the ones used in language model compression. Mphf generation is a very slow operation, so it is generally advisable to store the hash data once it has been generated and to load it from storage afterwards. The Dart implementation does not provide this functionality.

Usage example:

import 'package:poppy/mphf.dart';
var fruits = ["apple", "orange", "blueberry", "pomegranate"];
var mphf = new Mphf.fromStrings(fruits);
for (var fruit in fruits) {
  print("$fruit = ${mphf.getValue(fruit.codeUnits)}");
}

apple = 3
orange = 2
blueberry = 1
pomegranate = 0
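
The defining property (every key gets a distinct integer in [0..keycount-1]) can be demonstrated with a brute-force sketch. This is not the algorithm used by Mphf and it does not reach its compactness; it simply tries seeds until a seeded hash, taken modulo the key count, has no collisions. The names seededHash and findPerfectSeed are hypothetical, and the code assumes the native Dart VM, where integer multiplication wraps at 64 bits.

```dart
// Illustrative sketch only; NOT the poppy Mphf algorithm.

// Seeded 32-bit FNV-1a style hash over the key's code units,
// followed by a murmur3-style finalizer to mix the low bits.
int seededHash(String key, int seed) {
  var h = (0x811c9dc5 ^ seed) & 0xffffffff;
  for (final unit in key.codeUnits) {
    h ^= unit;
    h = (h * 0x01000193) & 0xffffffff;
  }
  h ^= h >> 16;
  h = (h * 0x85ebca6b) & 0xffffffff;
  h ^= h >> 13;
  h = (h * 0xc2b2ae35) & 0xffffffff;
  h ^= h >> 16;
  return h;
}

// Searches for a seed under which seededHash(key, seed) % n is a bijection
// from the keys onto [0..n-1]. Only practical for small key sets.
// Returns -1 if no seed below maxSeed works.
int findPerfectSeed(List<String> keys, {int maxSeed = 1000000}) {
  final n = keys.length;
  for (var seed = 0; seed < maxSeed; seed++) {
    final seen = <int>{};
    var collisionFree = true;
    for (final key in keys) {
      if (!seen.add(seededHash(key, seed) % n)) {
        collisionFree = false;
        break;
      }
    }
    if (collisionFree) return seed;
  }
  return -1;
}
```

Once a seed is found, only the seed and the key count need to be kept; the keys themselves are not stored. The expected number of seeds to try grows very quickly with the key count, which is why practical constructions use more elaborate schemes.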

Bloom Filter (BloomFilter)

BloomFilter is a simple Bloom filter implementation. When the filter reports that a key is absent, the key was definitely never added. When it reports that a key is present, the key may or may not have been added, because false positives are possible. A Bloom filter can be constructed from the number of keys to add, the bits per bucket, or the maximum expected false positive ratio. The parameter estimation code is adapted from the commoncrawl project.

Usage example:

import 'package:poppy/bloom_filter.dart';
var fruits = ["apple", "orange", "blueberry", "pomegranate"];
var bloom = new BloomFilter(fruits.length);

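for (var fruit in fruits) {
  // add fruit to the filter (the call itself is missing from this listing)
}
var newFruits = ["apple", "orange", "guava"];
for (var fruit in newFruits) {
  // the membership test that chooses between these two messages is also missing
  print("$fruit may exist in bloom filter.");
  print("$fruit does not exist in bloom filter.");
}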

apple may exist in bloom filter.
orange may exist in bloom filter.
guava does not exist in bloom filter.
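
The one-sided guarantee described above follows from how a Bloom filter stores keys. The sketch below is not the poppy BloomFilter: SketchBloomFilter, forKeys, add and mightContain are hypothetical names. It sizes the bit array with the standard estimates m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions, and derives the k bit positions from two FNV-1a style hashes (double hashing), which is an assumption of this example rather than the commoncrawl-derived code.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

// Illustrative sketch only; NOT the poppy BloomFilter API.
class SketchBloomFilter {
  final int bitCount;
  final int hashCount;
  final Uint8List _bits;

  SketchBloomFilter._(this.bitCount, this.hashCount)
      : _bits = Uint8List((bitCount + 7) >> 3);

  // Sizes the filter for keyCount keys and a target false positive rate.
  factory SketchBloomFilter.forKeys(int keyCount, double falsePositiveRate) {
    final n = math.max(1, keyCount);
    final m = math.max(
        8, (-n * math.log(falsePositiveRate) / (math.ln2 * math.ln2)).ceil());
    final k = math.max(1, (m / n * math.ln2).round());
    return SketchBloomFilter._(m, k);
  }

  // 32-bit FNV-1a style hash starting from the given basis.
  int _hash(String key, int basis) {
    var h = basis;
    for (final unit in key.codeUnits) {
      h ^= unit;
      h = (h * 0x01000193) & 0xffffffff;
    }
    return h;
  }

  // The k bit positions for a key: (h1 + i * h2) mod m.
  Iterable<int> _positions(String key) sync* {
    final h1 = _hash(key, 0x811c9dc5);
    final h2 = _hash(key, 0x01234567) | 1; // odd step size
    for (var i = 0; i < hashCount; i++) {
      yield (h1 + i * h2) % bitCount;
    }
  }

  void add(String key) {
    for (final p in _positions(key)) {
      _bits[p >> 3] |= 1 << (p & 7);
    }
  }

  // false: the key was definitely never added. true: it may have been added.
  bool mightContain(String key) {
    for (final p in _positions(key)) {
      if ((_bits[p >> 3] & (1 << (p & 7))) == 0) return false;
    }
    return true;
  }
}
```

Because add only ever sets bits, a key that was added always finds all of its bits set, so there are no false negatives. A key that was never added can still find all of its bits set by other keys, which is the source of false positives.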

Base64 Codec

A fast, RFC 2045 compliant Base64 codec with a URL-safe option. Throughput is about 40 MB/s for both encoding and decoding. The code is based on Mig Base64 (BSD licensed) with modifications. The implementation was contributed by mdakin.

import 'package:poppy/base64.dart';
// b is a Base64 codec instance; its construction is not shown in this listing
String encoded = b.encode("Hello".codeUnits);
print("Encoded= $encoded");
String decoded = b.decode(encoded);
print("Decoded= $decoded");

Encoded= SGVsbG8=
Decoded= Hello
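
For comparison, the same round trip can be sketched with Dart's standard dart:convert library instead of this package's codec. The helper names encodeText, encodeTextUrlSafe and decodeText are hypothetical; base64, base64Url and utf8 are the standard library codecs.

```dart
import 'dart:convert';

// Illustrative helpers built on dart:convert, not on poppy's Base64 codec.

// Encodes text as UTF-8 bytes, then as standard Base64 with '=' padding.
String encodeText(String text) => base64.encode(utf8.encode(text));

// URL-safe variant: uses '-' and '_' in place of '+' and '/'.
String encodeTextUrlSafe(String text) => base64Url.encode(utf8.encode(text));

// Decodes Base64 back to text; the decoder accepts both alphabets.
String decodeText(String encoded) => utf8.decode(base64.decode(encoded));
```

Here encodeText("Hello") gives SGVsbG8=, matching the output above, and decodeText turns it back into Hello. The URL-safe option only changes the two non-alphanumeric symbols, so ">>>" encodes to Pj4+ in the standard alphabet and Pj4- in the URL-safe one.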


SimHash

SimHash is a special hash function that generates similar hash values for similar items. That is, the hash values of two similar items agree in more bit positions (their Hamming distance is smaller). For example:

import 'package:poppy/simhash.dart';
var simHasher = new SimHash();
int h1 = simHasher.getHashFromString("Small rabbit was very sad");
int h2 = simHasher.getHashFromString("Small cute rabbit was very sad");
int h3 = simHasher.getHashFromString("Because his brother was laughing at him");
print ("h1-h2 Hamming distance: ${hammingDistance(h1,h2)}");
print ("h1-h3 Hamming distance: ${hammingDistance(h1,h3)}");

h1-h2 Hamming distance: 9
h1-h3 Hamming distance: 31

A 64-bit hash is generated for each input. This hash can be used in tasks such as near-duplicate detection and document clustering. The idea is presented in Charikar's "Similarity Estimation Techniques from Rounding Algorithms" paper.
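
A minimal sketch of the idea follows. It is not the poppy SimHash code: the tokenization (lowercased, whitespace-separated words), the 64-bit FNV-1a token hash, and the names fnv1a64, sketchSimHash and sketchHammingDistance are all assumptions made for this example. It also assumes the native Dart VM, where ints are 64-bit and wrap on overflow. Each token votes on every bit position, and the final hash keeps the bits where the votes are positive.

```dart
// Illustrative sketch only; NOT the poppy SimHash implementation.

// 64-bit FNV-1a over the token's code units (relies on wrapping 64-bit ints).
int fnv1a64(String token) {
  var h = 0xcbf29ce484222325;
  for (final unit in token.codeUnits) {
    h ^= unit;
    h *= 0x100000001b3;
  }
  return h;
}

// Each token adds +1 to a bit position if its hash has that bit set and
// -1 otherwise; the result has a 1 wherever the total is positive.
int sketchSimHash(String text) {
  final votes = List<int>.filled(64, 0);
  for (final token in text.toLowerCase().split(RegExp(r'\s+'))) {
    if (token.isEmpty) continue;
    final h = fnv1a64(token);
    for (var bit = 0; bit < 64; bit++) {
      votes[bit] += ((h >> bit) & 1) == 1 ? 1 : -1;
    }
  }
  var result = 0;
  for (var bit = 0; bit < 64; bit++) {
    if (votes[bit] > 0) result |= 1 << bit;
  }
  return result;
}

// Number of bit positions in which a and b differ.
int sketchHammingDistance(int a, int b) {
  var x = a ^ b;
  var count = 0;
  while (x != 0) {
    x &= x - 1; // clears the lowest set bit
    count++;
  }
  return count;
}
```

Texts that share most of their tokens cast mostly the same votes, so few bit positions flip and the Hamming distance tends to stay small. Identical texts always produce a distance of 0.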


Count Set (CountSet)

The CountSet class in count_set.dart is used for counting objects. Similar structures are also known as MultiSet or Bag. This structure is possibly more compact than a plain map. It also provides count-related methods.

import 'package:poppy/count_set.dart';
var fruits = ["apple","apple","orange","apple","pear","orange"];
var set = new CountSet<String>()..addAll(fruits);
for (String fruit in new Set()..addAll(fruits)) {
  print("Count of $fruit is ${set[fruit]}");
}
print("Non existing item papaya's count:${set['papaya']}");

Count of apple is 3
Count of orange is 2
Count of pear is 1
Non existing item papaya's count:0	  
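
The counting behaviour can be sketched on top of an ordinary Map. This is the plain map-based approach, not the CountSet internals, and the name SketchCountSet and its members are hypothetical.

```dart
// Illustrative sketch only; NOT the poppy CountSet implementation.
class SketchCountSet<T> {
  final Map<T, int> _counts = {};

  // Adds one occurrence of item and returns its new count.
  int add(T item) => _counts[item] = (_counts[item] ?? 0) + 1;

  void addAll(Iterable<T> items) {
    for (final item in items) {
      add(item);
    }
  }

  // Count of item; 0 for items that were never added.
  int operator [](Object? item) => _counts[item] ?? 0;

  // Each distinct item once, in first-insertion order.
  Iterable<T> get distinctItems => _counts.keys;

  // Total number of occurrences across all items.
  int get totalCount => _counts.values.fold<int>(0, (sum, c) => sum + c);
}
```

Filled with the fruit list from the example above, this sketch reports 3 for apple, 2 for orange, 1 for pear and 0 for papaya. Returning 0 for unknown items saves callers a null check on every lookup.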

Sparse Vector (SparseVector)

The SparseVector class in sparse_vector.dart can be used for representing large sparse vectors, in which most values are zero. The structure holds only the non-zero elements, so it is compact.

Internally it is a hash table that uses linear probing. It is more efficient than using a Map<int,num> structure. Most vector arithmetic operations have not been added to the code yet.
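
The linear probing layout can be sketched as follows. This is not the SparseVector code: SketchSparseVector and its members are hypothetical, and the choices made here (power-of-two capacity, growth at 50% load, no slot reclamation when an element is overwritten with zero) are simplifying assumptions.

```dart
import 'dart:typed_data';

// Illustrative sketch only; NOT the poppy SparseVector implementation.
class SketchSparseVector {
  List<int> _keys = List<int>.filled(16, 0);
  Float64List _values = Float64List(16);
  Uint8List _used = Uint8List(16); // 1 where a slot holds an entry
  int _count = 0;

  int get storedCount => _count;

  // Finds the slot holding index, or the empty slot where it would go.
  // The load factor stays at or below 0.5, so an empty slot always exists.
  int _locate(int index) {
    final mask = _keys.length - 1; // capacity is a power of two
    var slot = index & mask;
    while (_used[slot] == 1 && _keys[slot] != index) {
      slot = (slot + 1) & mask; // linear probing: try the next slot
    }
    return slot;
  }

  // Elements that were never stored read as zero.
  double operator [](int index) {
    final slot = _locate(index);
    return _used[slot] == 1 ? _values[slot] : 0.0;
  }

  void operator []=(int index, double value) {
    final slot = _locate(index);
    if (_used[slot] == 1) {
      _values[slot] = value; // simplification: a zero keeps its slot
      return;
    }
    if (value == 0.0) return; // zeros are implicit and take no space
    _keys[slot] = index;
    _values[slot] = value;
    _used[slot] = 1;
    _count++;
    if (_count * 2 > _keys.length) _grow();
  }

  // Doubles the capacity and reinserts every stored entry.
  void _grow() {
    final oldKeys = _keys, oldValues = _values, oldUsed = _used;
    final capacity = oldKeys.length * 2;
    _keys = List<int>.filled(capacity, 0);
    _values = Float64List(capacity);
    _used = Uint8List(capacity);
    for (var i = 0; i < oldKeys.length; i++) {
      if (oldUsed[i] == 1) {
        final slot = _locate(oldKeys[i]);
        _keys[slot] = oldKeys[i];
        _values[slot] = oldValues[i];
        _used[slot] = 1;
      }
    }
  }
}
```

Keeping keys and values in flat arrays avoids the per-entry objects and boxing that a general-purpose Map may need, which is the usual motivation for a layout like this.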

Integer Set (Int Set)

A simple implementation of an integer set. It is similar to the SparseVector class. It is supposed to be slightly faster and more memory efficient than a Set<int> structure.

Change List

0.1.8 Introduce Base64 codec. Add String methods to BloomFilter.
0.1.7 Fix an error that slipped into 0.1.6 in the mphf lib definition. Some cleanup.
0.1.6 CountSet is introduced. Dart M3 changes.