# Fuzzy Bayesian concept learning

Suppose we have a population of objects, each represented by a vector of features. Similar objects have similar feature vectors. We get only a few positive examples of a concept, say "dog", that we want to generalize correctly. Which of the other objects are dogs?

The model below is a fuzzy variant of "Bayesian concept learning", as proposed by Josh Tenenbaum in his dissertation (ref).  Rather than assuming the objects are either in or not in the extension of the to-be-learned concept, we assume that each has some degree of membership.  The probability that an object is drawn as an example is proportional to its degree of "in-ness".

## Some data to play with

We populate a hash table whose keys are object name strings and whose values are eleven-dimensional vectors of real numbers, stored as lists. Vectors for 96 objects have been constructed from about 2000 RDF (subject-predicate-object) triples describing them, by means that would take us too far afield to discuss. Suffice it to say that objects that occur in similar contexts have similar vectors.
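
To make the shape of the table concrete, here is a minimal sketch of the name-to-vector mapping, using only `racket/string` rather than the notebook's CSV reader. The line of text, the object name `toy_example`, and the helper `parse-row` are all invented for illustration; real rows carry eleven values.

```racket
#lang racket/base
(require racket/string)

;; Hypothetical raw CSV line; the name and values are invented.
(define example-line "toy_example , 0.5, -1.25 ,2")

;; Split on commas, trim each field, and return (name . list-of-numbers).
(define (parse-row line)
  (let ([fields (map string-trim (string-split line ","))])
    (cons (car fields)
          (map string->number (cdr fields)))))

;; make-hash accepts a list of (key . value) pairs.
(define codes (make-hash (list (parse-row example-line))))

(hash-ref codes "toy_example")  ; => '(0.5 -1.25 2)
```

Each value is a plain list of numbers, which is what lets later code combine vectors with `map` and `apply`.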

TODO: fix the csv utils so I don't have to do all the trimming.

In [19]:
(require gamble
         gamble/util/csv
         racket/string
         racket/vector)

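;; Build a mutable hash from object name to feature list.  Each CSV row is
;; treated as a vector of strings: the name first, then the feature values.
;; Every field is trimmed by hand (see the TODO above).
(define object-codes
  (make-hash
   (map
    (lambda (row)
      (cons (string-trim (vector-ref row 0))
            (map (lambda (n) (string->number (string-trim n)))
                 (vector->list (vector-drop row 1)))))
    (read-csv-file "toy_vectors.csv"))))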

## Having a look at the object vectors

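If we look at the cosine similarity of object vectors, we can see that our intuitions of relative similarity are largely respected: two individual dogs are nearly identical, a class is very close to its super-class, and an individual dog is least similar to a higher-order relation. Note, though, that the vectors encode the kind of thing an object is at least as strongly as its subject matter: the classes Dog and Theft are closer to each other (0.9999) than dog1 is to its own class Dog (0.88).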

In [17]:
;; Cosine similarity between the feature vectors of two named objects.
(define (similarity obj1 obj2)
  (let ([v1 (hash-ref object-codes obj1)]
        [v2 (hash-ref object-codes obj2)])
    (let ([l1 (sqrt (apply + (map * v1 v1)))]
          [l2 (sqrt (apply + (map * v2 v2)))])
     (/ (apply + (map * v1 v2)) (* l1 l2)))))
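
For reference, the cosine similarity of two vectors $u$ and $v$ is $u \cdot v / (\lVert u \rVert \, \lVert v \rVert)$, so it depends only on direction, not length. The standalone sketch below checks the three boundary cases on made-up two-dimensional lists; the helper name `cosine` is introduced here for illustration and takes the vectors directly instead of looking them up in `object-codes`.

```racket
#lang racket/base

;; Cosine of the angle between two equal-length lists of numbers.
(define (cosine v1 v2)
  (define (dot a b) (apply + (map * a b)))
  (/ (dot v1 v2)
     (* (sqrt (dot v1 v1)) (sqrt (dot v2 v2)))))

(cosine '(1.0 0.0) '(2.0 0.0))   ; same direction, different length => 1.0
(cosine '(1.0 0.0) '(0.0 3.0))   ; orthogonal                       => 0.0
(cosine '(1.0 0.0) '(-1.0 0.0))  ; opposite direction               => -1.0
```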

In [18]:
(printf "dog1 (individual) / dog2 (individual): ~a\ndog1 (individual) / dog (class): ~a\ndog (class) / theft (class): ~a\ndog1 (individual) / subPropertyOf (higher-order relation): ~a\ntheft (class) /  transfer (super-class): ~a"
        (similarity "toy_dog1" "toy_dog2")
        (similarity "toy_dog1" "toy_Dog")
        (similarity "toy_Dog" "toy_Theft")
        (similarity "toy_dog1" "rdfs_subPropertyOf")
        (similarity "toy_Theft" "toy_Transfer"))

dog1 (individual) / dog2 (individual): 0.9962239277160961
dog1 (individual) / dog (class): 0.8802545473616828
dog (class) / theft (class): 0.9999341458378703
dog1 (individual) / subPropertyOf (higher-order relation): 0.7880343956261665
theft (class) /  transfer (super-class): 0.9996345287318831

## Utilities

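The logistic function is probably familiar. The second function computes the dot product of a feature list with a weight vector and passes it through the logistic. It is a bit awkward because the weights are supplied as a function from index to value rather than as a list, which simplifies the generative model specification inside the sampler, below.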

In [4]:
(define (logistic x) (/ 1.0 (+ 1.0 (exp (- x)))))

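;; Awkward: y-fn is a function from index to value rather than a list, so that
;; the memoized weight function in the model can be passed in directly.
(define (squashed-dot-prod x-list y-fn)
  (logistic (for/sum ([x (in-list x-list)]
                      [i (in-naturals)])
              (* x (y-fn i)))))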
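
As a quick sanity check, the sketch below restates both utilities so that it runs on its own and applies them to invented numbers. The names `example-weights` and `example-wt` are hypothetical stand-ins for the model's weight function; the weights are chosen so the dot product is exactly zero, where the logistic returns 0.5.

```racket
#lang racket/base

(define (logistic x) (/ 1.0 (+ 1.0 (exp (- x)))))

;; Dot product of a list with an index-to-weight function, then squashed.
(define (squashed-dot-prod x-list y-fn)
  (logistic (for/sum ([x (in-list x-list)]
                      [i (in-naturals)])
              (* x (y-fn i)))))

;; Hypothetical fixed weights, exposed as a function of the index.
(define example-weights (vector 2.0 0.0 -1.0))
(define (example-wt i) (vector-ref example-weights i))

;; Dot product: 1*2 + 5*0 + 2*(-1) = 0, and logistic(0) = 0.5.
(squashed-dot-prod '(1.0 5.0 2.0) example-wt)  ; => 0.5
```

Large positive dot products push the result toward 1 and large negative ones toward 0, which is what lets the result be read as a degree of membership.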

# The model

We see several examples, which are assumed to be drawn thusly: 

1. There is some weight vector that defines "the concept". Each element of the weight vector is independently either zero (with probability 0.5) or a draw from a zero-centered normal distribution, `(normal 0 3.0)` in the code below.

2. The dot product of the weight vector and an object's feature vector is passed through a logistic "squashing" function, producing a number in $(0,1)$ which we take to be the degree to which the object is "in" the concept.

3. The vector of "in-nesses" is normalized (automatically, when the `discrete` distribution is constructed) to produce a discrete distribution over objects, from which examples are then drawn.
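
Written out, the three steps correspond to the equations below, where $x_o$ is the feature vector of object $o$, $w$ is the weight vector, $m_w(o)$ is the in-ness of $o$, and $e_k$ is the $k$-th example. These symbols are introduced here for illustration, and the variance $3^2$ assumes that the second argument of `normal` in the sampler code is a standard deviation.

$$
\begin{aligned}
w_i &\sim \begin{cases} 0 & \text{with probability } 0.5 \\ \mathcal{N}(0,\, 3^2) & \text{with probability } 0.5 \end{cases} \\
m_w(o) &= \sigma\Big(\sum_i w_i\, x_{o,i}\Big), \qquad \sigma(z) = \frac{1}{1 + e^{-z}} \\
P(e_k = o \mid w) &= \frac{m_w(o)}{\sum_{o'} m_w(o')}
\end{aligned}
$$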

Inference yields in-nesses for several query objects, averaged over weights sampled from the posterior. The posterior favors weights that concentrate in-ness on the examples and, within the strong smoothness constraint imposed by the model form, keep it low on non-examples.
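
The pressure away from non-examples comes entirely from normalization: raising the in-ness of an object that was never drawn takes probability away from the objects that were. The sketch below illustrates this without any sampling, by comparing the likelihood of two "dog" examples under two fixed weight lists. The three objects, their two-dimensional features, and both weight lists are invented for illustration and are not taken from `toy_vectors.csv`.

```racket
#lang racket/base

(define (logistic x) (/ 1.0 (+ 1.0 (exp (- x)))))

;; Hypothetical two-feature objects, invented for illustration.
(define toy-objects
  (hash "a_dog"   '(1.0 0.0)
        "b_dog"   '(0.9 0.1)
        "a_theft" '(0.0 1.0)))

;; Degree of membership of one object under a list of weights.
(define (in-ness weights obj)
  (logistic (apply + (map * weights (hash-ref toy-objects obj)))))

;; Probability of drawing obj as an example: its in-ness over the total.
(define (example-prob weights obj)
  (/ (in-ness weights obj)
     (for/sum ([o (in-list (hash-keys toy-objects))])
       (in-ness weights o))))

;; Likelihood of a list of independently drawn examples.
(define (likelihood weights examples)
  (for/product ([e (in-list examples)])
    (example-prob weights e)))

(define dog-examples '("a_dog" "b_dog"))
(define flat-weights '(0.0 0.0))    ; every object gets in-ness 0.5
(define doggy-weights '(3.0 -3.0))  ; high on feature 1, low on feature 2

(likelihood flat-weights dog-examples)   ; (1/3)^2, about 0.111
(likelihood doggy-weights dog-examples)  ; about 0.24
```

With these numbers the second weight list roughly doubles the likelihood, even though no object was ever labeled as a non-dog.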

In [5]:
(define bcl-sampler
  (mh-sampler
   
   ;;;;;;;  Generative model ;;;;;;;;;
   
   ;; The weight vector that defines the concept.
   (defmem (wt i) (if (flip 0.5) 0 (normal 0 3.0)))
   
   ;; The set of squashed dot products that defines the concept's discrete distribution.
   (deflazy weighted-objects
      (map 
         (lambda (obj)
            (cons obj (squashed-dot-prod (hash-ref object-codes obj) wt)))
         (hash-keys object-codes)))

   ;; Drawing examples from that distribution.
   (defmem (examples k) (discrete weighted-objects))
   
   
   ;;;;;;;;; Observations ;;;;;;;;;;;
   
   (observe (examples 1) "toy_dog1")
   (observe (examples 2) "toy_dog2")
   (observe (examples 3) "toy_dog11")
   (observe (examples 4) "toy_dog12")
   (observe (examples 5) "toy_dog4")
   (observe (examples 6) "toy_dog14")
   
   ;(observe (examples 1) "toy_giveEvt1")
   ;(observe (examples 2) "toy_theft1")
   
   
   ;;;;;;;;; Query ;;;;;;;;;;;;;;;
   
   (vector 
    (squashed-dot-prod (hash-ref object-codes "toy_dog4") wt)
    (squashed-dot-prod (hash-ref object-codes "toy_dog3") wt)
    (squashed-dot-prod (hash-ref object-codes "toy_dog13") wt)
    (squashed-dot-prod (hash-ref object-codes "toy_person3") wt)
    (squashed-dot-prod (hash-ref object-codes "toy_person13") wt)
    (squashed-dot-prod (hash-ref object-codes "toy_theft1") wt)
    (squashed-dot-prod (hash-ref object-codes "toy_giveEvt2") wt)
    (squashed-dot-prod (hash-ref object-codes "rdfs_subPropertyOf") wt))))

In [12]:
(define smpls (sampler->mean bcl-sampler 50 #:burn 2000 #:thin 150))

# Results

Yea!! As we hoped, generalization is strongest to dogs (the left-most three bars, each about 0.61, only the first of which is an example), and next-strongest to people (the next two bars to the right, about 0.27 and 0.33). Generalization is very low to a theft event and a gift event (the next two, both about 0.03), and lowest of all, about 0.01, to the abstract property "subPropertyOf".

This is significant, if simple, learning from just a few examples, with no negative examples at all, based on a model no more complex than logistic regression.

In [7]:
(require "c3_helpers.rkt")

In [13]:
(bar-c3-categorical 
 (list 1 2 3 4 5 6 7 8)
 (vector->list smpls)
 (list "dog4" "dog3" "dog13" "person3" "person13" "theft1" "gift2" "subPropertyOf")
 #:xlabel "object"
 #:ylabel "mean in-ness")

[Bar chart: mean in-ness (y axis) by object (x axis)]
dog4: 0.6102800518094778
dog3: 0.6103132619332787
dog13: 0.6103097093279195
person3: 0.26620001981665964
person13: 0.3339283985447949
theft1: 0.034404085447293384
gift2: 0.033901983378010134
subPropertyOf: 0.009646183356972746