Datascript string search

Petter Eriksson edited this page Dec 6, 2017 · 3 revisions

We created a library for fast lookup of strings, originally for search suggestions of products and stores.

We've implemented it using DataScript, and because it's written in .cljc it can be used on both the server and the client, which is neat. I don't know of a similar library.

Library namespace: eponai.common.search

Usage, including an index that is updated incrementally whenever products are added to Datomic: eponai.server.external.product_search

Example usage of the library

;; Aliases assumed in this example: str is clojure.string and datascript is
;; datascript.core; db is the project's own database namespace.
;; search-schema, entities-by-attr-tx and match-string come from eponai.common.search.
(def stuff
  (let [;; Read a fairly large corpus.
        lines (->> (slurp "/usr/share/dict/web2a")
                   (str/split-lines))
        ;; Create a datascript db with the schema needed for indexing.
        conn (datascript/create-conn search-schema)
        ;; Add the data to an attribute.
        _ (db/transact conn (into [] (map (fn [l] {:name l})) lines))
        db (db/db conn)
        ;; Create the index transaction with the entities-by-attr-tx function.
        index-tx (->> (db/datoms (db/db conn) :aevt :name)
                      (into [] (comp (map :e)
                                     (map #(db/entity (db/db conn) %))))
                      (entities-by-attr-tx :name))
        ;; Apply the index transaction to get a searchable db.
        indexed-db (datascript/db-with db index-tx)]
    {:db indexed-db
     ;; 121848 datoms in the transaction
     :tx index-tx
     ;; 76205 lines of words
     :lines lines}))

;; For each search, return its match count and print the time the search takes.
(into []
      (map (juxt identity
                 (fn [search]
                   (time (count (match-string (:db stuff) search))))))
      [;; Finds strings with a word starting with "acid"
       "acid"
       ;; Finds strings with a word starting with "acid" and a word starting with "bath"
       "acid bath"
       "ac bath"
       "a bat"
       "a ba"
       "a b"
       "a b c"
       "a b c d"])

;; Returns the following on my MacBook Pro 13" 2015 (the printed timings are shown as comments)
[["acid" 9]      ;; "Elapsed time: 17.244565 msecs"
 ["acid bath" 1] ;; "Elapsed time: 14.256633 msecs"
 ["ac bath" 1]   ;; "Elapsed time: 12.267515 msecs"
 ["a bat" 6]     ;; "Elapsed time: 12.759851 msecs"
 ["a ba" 80]     ;; "Elapsed time: 81.886891 msecs"
 ["a b" 375]     ;; "Elapsed time: 171.188805 msecs"
 ["a b c" 1]     ;; "Elapsed time: 78.821723 msecs"
 ["a b c d" 0]]  ;; "Elapsed time: 202.673805 msecs"
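
The matching rule described in the comments above (every search term has to be the start of some word in the string) can be sketched without any index. This is only a naive reference for the semantics, not how eponai.common.search works: it scans every string on each search, while the library looks terms up in a DataScript index. The names naive-match? and naive-match-strings are made up for this example, and the lower-casing and whitespace splitting are assumptions rather than confirmed behaviour of match-string.

```clojure
(ns example.naive-search
  (:require [clojure.string :as str]))

(defn words
  "Splits a string into lower-cased words on whitespace (assumed tokenization)."
  [s]
  (remove str/blank? (str/split (str/lower-case s) #"\s+")))

(defn naive-match?
  "True when every term in `search` is a prefix of at least one word in `s`.
  An empty search has no terms, so it matches every string."
  [s search]
  (let [ws (words s)]
    (every? (fn [term] (some #(str/starts-with? % term) ws))
            (words search))))

(defn naive-match-strings
  "Returns the strings that match `search`, scanning all of them linearly."
  [strings search]
  (filterv #(naive-match? % search) strings))

(naive-match-strings ["acid bath" "acid rain" "bath salt" "amino acid"]
                     "ac bath")
;; => ["acid bath"]
```

A linear scan like this is fine for checking expectations on a handful of strings, but it does the full work on every keystroke, which is what the index avoids.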

TODO

Mention match-next-word