# Fuzzy Bayesian concept learning

Suppose we have a population of objects, each represented by a vector of features, where similar objects have similar feature vectors.  We get only a few positive examples of a concept, say "dog", and want to generalize from them correctly.  Which of the other objects are dogs?

The model below is a fuzzy variant of "Bayesian concept learning", as proposed by Josh Tenenbaum in his dissertation (ref).  Rather than assuming that each object either is or is not in the extension of the concept to be learned, we assume that each has some degree of membership, its "in-ness".  The probability that an object is drawn as an example is proportional to its in-ness.

## Some data to play with

We populate a hash table whose keys are object names (strings) and whose values are eleven-dimensional feature vectors, stored as lists of real numbers.  The vectors for 96 objects were constructed from about 2000 RDF (subject-predicate-object) triples describing them, by means that would take us too far afield to discuss.  Suffice it to say that an iterative algorithm ensures that objects occurring in similar contexts have similar vectors.

TODO: fix the csv utils so I don't have to do all the whitespace trimming.
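The trimming the TODO refers to can be isolated in one small function. This is a minimal sketch, not the notebook's code: `parse-vector-line` is a hypothetical helper (it is not part of `gamble/util/csv`), and it assumes plain comma-separated lines with no quoted fields or embedded commas.

```racket
#lang racket/base
;; Hypothetical helper (not part of gamble): parse one line of the form
;; "name, f1, f2, ..." into (name f1 f2 ...), trimming whitespace around
;; every field.  Assumes no quoted fields and no embedded commas.
(require racket/string)

(define (parse-vector-line line)
  ;; Split on every comma, then trim each field individually.
  (define fields (map string-trim (string-split line "," #:trim? #f)))
  (cons (car fields)
        (map string->number (cdr fields))))

;; Example with a made-up two-feature row:
(parse-vector-line " toy_dog1 ,  0.5, -1.25 ")
;; => '("toy_dog1" 0.5 -1.25)
```

Each parsed pair has the same shape as the entries the notebook stores in `object-codes`.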

In [1]:
(require gamble
         gamble/util/csv
         racket/string
         racket/vector
         "c3_helpers.rkt"
         racket/list)

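;; Map each object name (first CSV column) to its list of feature values
;; (remaining columns), trimming stray whitespace from every field.
(define object-codes
  (make-hash
   (map
    (lambda (row)
      (cons (string-trim (vector-ref row 0))
            (map (lambda (n) (string->number (string-trim n)))
                 (vector->list (vector-drop row 1)))))
    (read-csv-file "toy_vectors.csv"))))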

(define n-features 11)

## Having a look at the object vectors

The cosine similarities between object vectors mostly respect our intuitions about relative similarity, although even an individual dog and the abstract relation subPropertyOf come out fairly similar (0.788).
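Cosine similarity depends only on the angle between two vectors, not their lengths. A minimal standalone sketch of the same computation on plain lists (the names `dot`, `norm`, and `cosine` are ours; the notebook's `similarity` looks the vectors up by object name), using made-up vectors:

```racket
#lang racket/base
;; Standalone sketch of cosine similarity on plain lists of numbers.
(define (dot u v) (apply + (map * u v)))          ; inner product
(define (norm u) (sqrt (dot u u)))                 ; Euclidean length
(define (cosine u v) (/ (dot u v) (* (norm u) (norm v))))

(cosine '(1 2 3) '(2 4 6))   ; parallel vectors: about 1.0, length is ignored
(cosine '(1 0) '(1 1))       ; 45 degrees apart: about 0.707
(cosine '(1 0) '(0 1))       ; orthogonal: 0
```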

In [15]:
;; Cosine similarity between the feature vectors of two named objects.
(define (similarity obj1 obj2)
  (let ([v1 (hash-ref object-codes obj1)]
        [v2 (hash-ref object-codes obj2)])
    (let ([l1 (sqrt (apply + (map * v1 v1)))]   ; Euclidean length of v1
          [l2 (sqrt (apply + (map * v2 v2)))])  ; Euclidean length of v2
      (/ (apply + (map * v1 v2)) (* l1 l2)))))

In [16]:
;; Round to four decimal places for display.
(define (round4 x) (/ (round (* 10000 x)) 10000))

(printf (string-append
         "dog1 (individual) / dog2 (individual):\t\t~a\n"
         "dog1 (individual) / dog (its class):\t\t~a\n"
         "dog1 (individual) / subPropertyOf (2nd-order relation):\t~a\n"
         "theft (class) / transfer (superclass):\t\t~a\n")
        (round4 (similarity "toy_dog1" "toy_dog2"))
        (round4 (similarity "toy_dog1" "toy_Dog"))
        (round4 (similarity "toy_dog1" "rdfs_subPropertyOf"))
        (round4 (similarity "toy_Theft" "toy_Transfer")))

dog1 (individual) / dog2 (individual):		0.9962
dog1 (individual) / dog (its class):		0.8803
dog1 (individual) / subPropertyOf (2nd-order relation):	0.788
theft (class) / transfer (superclass):		0.9996

# The model

We observe several examples of each concept, which are assumed to be generated as follows:

1. Each element $\mu^c_j$ of the concept mean vector $\mu^c$ is drawn from an independent Gaussian.

2. For each dimension $j$, a concept-specific precision $\tau^c_j$ (an inverse length-scale, or importance weight) is drawn from a mixture of a spike at zero (the feature is irrelevant) and a vague gamma distribution (the feature is relevant).  The mixing weight itself has a Beta(2, 2) prior.

3. Each object's degree of in-ness decreases exponentially with its precision-weighted city-block distance from the concept mean:

$$g^c_i = e^{-\sum_j \tau^c_j |x_{ij}-\mu^c_j|}$$

4. The vector of in-nesses is normalized (automatically, when the `discrete` distribution is constructed) into a distribution over objects, from which the examples are drawn independently.
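The last two steps, computing in-ness and normalizing it, can be sketched outside gamble with fixed, made-up parameters in place of sampled ones. The helper names `in-ness` (here taking vectors rather than names) and `normalize` are ours. Setting the second precision to zero shows how an irrelevant feature drops out of the distance entirely.

```racket
#lang racket/base
;; Sketch with fixed, made-up parameters (no sampling):
;; g_i = exp(-sum_j tau_j * |x_ij - mu_j|), then normalize over objects.
(define (in-ness x mu tau)
  (exp (- (for/sum ([xj (in-list x)] [mj (in-list mu)] [tj (in-list tau)])
            (* tj (abs (- xj mj)))))))

;; Divide each weight by the total so the weights sum to 1.
(define (normalize weights)
  (define total (apply + weights))
  (map (lambda (w) (/ w total)) weights))

(define mu  '(0 0))
(define tau '(1.0 0.0))                 ; feature 2 is irrelevant: precision 0
(define objects '((0 5) (1 -3) (2 0)))

(define g (map (lambda (x) (in-ness x mu tau)) objects))
g              ; about (1.0 0.368 0.135): feature 2 has no effect
(normalize g)  ; about (0.665 0.245 0.090)
```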

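Inference returns the in-nesses of several query objects under concept parameters (means and precisions) sampled from the posterior.  Because the in-nesses are normalized, any in-ness given to a non-example lowers the probability of the observed examples, so the posterior favors parameters that concentrate in-ness on the examples and, within the strong smoothness constraint imposed by the model's form, keep it off non-examples.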
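This pressure is the "size principle" of Bayesian concept learning. With examples $e_1, \dots, e_n$ drawn independently, their likelihood under given concept parameters is

$$P(e_1, \dots, e_n \mid \mu^c, \tau^c) = \prod_{m=1}^{n} \frac{g^c_{e_m}}{\sum_k g^c_k}.$$

A minimal sketch with made-up weights for six objects, three of which are examples, compares in-ness concentrated on the examples with in-ness spread evenly (the name `likelihood` is ours):

```racket
#lang racket/base
;; Likelihood of the observed examples = product of their normalized weights.
(define (likelihood weights example-indices)
  (define total (apply + weights))
  (for/product ([i (in-list example-indices)])
    (/ (list-ref weights i) total)))

(define examples '(0 1 2))                    ; objects 0-2 were observed
(define tight '(1.0 1.0 1.0 0.01 0.01 0.01))  ; in-ness concentrated on examples
(define broad '(1.0 1.0 1.0 1.0 1.0 1.0))     ; in-ness spread over everything

(likelihood tight examples)  ; (1/3.03)^3, about 0.0359
(likelihood broad examples)  ; (1/6)^3, about 0.00463
```

The concentrated weights make the same three examples almost eight times as likely, and the gap grows with each additional example.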

In [17]:
(define bcl-sampler
  (mh-sampler
   
   ;;;;;;;  Generative model ;;;;;;;;;
   
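   ;; Probability that a given feature is relevant to a concept.
   (deflazy p-relevant (beta 2 2))
   
   ;; The mean and precision vectors that define the concept.  A relevant
   ;; feature gets a gamma-distributed precision; an irrelevant one gets a
   ;; near-zero precision (the "spike"), so it barely affects in-ness.
   (defmem (precision concept feature)
     (if (flip p-relevant) (gamma 0.6 12.0) 0.000000001))
   (defmem (mean concept feature) (normal 0 8))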
   
   ;; In-ness decreases exponentially with the city-block distance from the
   ;; concept mean, each dimension scaled by its precision.  The tiny constant
   ;; keeps every weight strictly positive.
   (define (in-ness object concept)
     (let ([obj-ftrs (hash-ref object-codes object)])
       (+ 0.0000000000001
          (exp
           (for/sum ([x (in-list obj-ftrs)] [i (in-naturals)])
             (* -1 (precision concept i) (abs (- (mean concept i) x))))))))
   ;; Alternative (Gaussian-style distance): use (* d d) in place of (abs d).
   
   
   ;; The set of in-ness-es that defines the concept's discrete distribution.
   (defmem (weighted-objects concept)
      (map 
         (lambda (obj)
            (cons obj (in-ness obj concept)))
         (hash-keys object-codes)))

   ;; The k-th example of a concept is an independent draw from that
   ;; distribution; `discrete` normalizes the weights.
   (defmem (examples concept k) (discrete (weighted-objects concept)))
   
   
   ;;;;;;;;; Observations ;;;;;;;;;;;
   
   (observe (examples "chien" 1) "toy_dog1")
   (observe (examples "chien" 2) "toy_dog2")
   (observe (examples "chien" 3) "toy_dog11")
   (observe (examples "chien" 4) "toy_dog12")
   (observe (examples "chien" 5) "toy_dog4")
   (observe (examples "chien" 6) "toy_dog14")
   
   (observe (examples "personne" 1) "toy_person1")
   (observe (examples "personne" 2) "toy_person2")
   (observe (examples "personne" 3) "toy_person11")
   (observe (examples "personne" 4) "toy_person12")
   (observe (examples "personne" 5) "toy_person4")
   (observe (examples "personne" 6) "toy_person14")
   (observe (examples "personne" 7) "toy_person23")
   (observe (examples "personne" 8) "toy_person33")
   
   (observe (examples "transfert" 1) "toy_giveEvt1")
   (observe (examples "transfert" 2) "toy_theft1")
   
   
   ;;;;;;;;; Query ;;;;;;;;;;;;;;;
   
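   ;; In-ness of eight test objects under each concept: entries 0-7 are for
   ;; "chien", 8-15 for "transfert", and 16-23 for "personne".
   (vector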
    (in-ness "toy_dog4" "chien")
    (in-ness "toy_dog3" "chien")
    (in-ness "toy_dog13" "chien")
    (in-ness "toy_person3" "chien")
    (in-ness "toy_person13" "chien")
    (in-ness "toy_theft1" "chien")
    (in-ness "toy_giveEvt2" "chien")
    (in-ness "rdfs_subPropertyOf" "chien")
    (in-ness "toy_dog4" "transfert")
    (in-ness "toy_dog3" "transfert")
    (in-ness "toy_dog13" "transfert")
    (in-ness "toy_person3" "transfert")
    (in-ness "toy_person13" "transfert")
    (in-ness "toy_theft1" "transfert")
    (in-ness "toy_giveEvt2" "transfert")
    (in-ness "rdfs_subPropertyOf" "transfert")
    (in-ness "toy_dog4" "personne")
    (in-ness "toy_dog3" "personne")
    (in-ness "toy_dog13" "personne")
    (in-ness "toy_person3" "personne")
    (in-ness "toy_person13" "personne")
    (in-ness "toy_theft1" "personne")
    (in-ness "toy_giveEvt2" "personne")
    (in-ness "rdfs_subPropertyOf" "personne"))))

In [22]:
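;; Posterior mean of the 24-entry query vector over 50 retained samples,
;; after 10,000 burn-in steps, thinning by 2,000 steps between samples.
(define smpls (sampler->mean bcl-sampler 50 #:burn 10000 #:thin 2000))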
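The "mean in-ness" values plotted below are Monte Carlo estimates of posterior expected in-ness, $\mathbb{E}[g^c_i \mid \text{examples}] \approx \frac{1}{S}\sum_{s=1}^{S} g^{c,(s)}_i$, taken over the $S = 50$ retained samples. The averaging step alone amounts to an elementwise mean of the sampled query vectors. A sketch, assuming that is what `sampler->mean` computes for vector-valued queries; `vector-mean` is our hypothetical helper and the sample values are made up:

```racket
#lang racket/base
;; Hypothetical helper: elementwise mean of a list of equal-length vectors.
(define (vector-mean vs)
  (define n (length vs))
  (for/vector ([i (in-range (vector-length (car vs)))])
    (/ (for/sum ([v (in-list vs)]) (vector-ref v i)) n)))

;; Two made-up posterior samples of a two-entry query vector:
(vector-mean (list (vector 0.2 0.01) (vector 0.3 0.03)))  ; about #(0.25 0.02)
```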

# Results

## Dogs ("chien")

As we expect (or hope), generalization is strongest to dogs (the three left-most bars, of which only the first, dog4, is an example), and next strongest to people (the next two bars).  Generalization is very low to a theft event and a gift event (the next two), and negligible to the abstract property "subPropertyOf".

This is significant, if simple, learning from just six positive examples and no negative examples at all, using a model no more complex than logistic regression.
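The comparison with logistic regression can be made precise.  Writing $d_{ij} = |x_{ij} - \mu^c_j|$ and ignoring the tiny positive floor, the normalized in-nesses form a softmax over objects:

$$P(e = i \mid \mu^c, \tau^c) = \frac{\exp\left(-\sum_j \tau^c_j \, d_{ij}\right)}{\sum_k \exp\left(-\sum_j \tau^c_j \, d_{kj}\right)}.$$

For a fixed mean $\mu^c$, this is a multinomial (conditional) logit whose scores are linear in the features $d_{ij}$, with weights $-\tau^c_j$.  The mean $\mu^c$ adds one further parameter per feature.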

In [19]:
(bar-c3-categorical 
 (list 1 2 3 4 5 6 7 8)
 (take (vector->list smpls) 8)
 (list "dog4" "dog3" "dog13" "person3" "person13" "theft1" "gift2" "subPropertyOf")
 #:xlabel "object"
 #:ylabel "mean in-ness")

[Bar chart: mean in-ness under "chien" for each test object; x-axis: object, y-axis: mean in-ness]

| object | dog4 | dog3 | dog13 | person3 | person13 | theft1 | gift2 | subPropertyOf |
|---|---|---|---|---|---|---|---|---|
| mean in-ness | 0.2331 | 0.2327 | 0.2328 | 0.0328 | 0.0392 | 0.0033 | 0.0030 | 0.0007 |

## Transfer events ("transfert")

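We give only two examples here, because the data set contains only three transfer events and we hold one out to test generalization (gift2, the second bar from the right).  We don't get the clear separation we got with dogs: all eight values lie between about 0.155 and 0.160.  Still, the two transfer events (theft1, itself an example, and the held-out gift2) are the most "in".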

In [20]:
(bar-c3-categorical 
 (list 1 2 3 4 5 6 7 8)
 (take (drop (vector->list smpls) 8) 8)
 (list "dog4" "dog3" "dog13" "person3" "person13" "theft1" "gift2" "subPropertyOf")
 #:xlabel "object"
 #:ylabel "mean in-ness")

[Bar chart: mean in-ness under "transfert" for each test object; x-axis: object, y-axis: mean in-ness]

| object | dog4 | dog3 | dog13 | person3 | person13 | theft1 | gift2 | subPropertyOf |
|---|---|---|---|---|---|---|---|---|
| mean in-ness | 0.1576 | 0.1576 | 0.1576 | 0.1580 | 0.1580 | 0.1596 | 0.1596 | 0.1554 |

##People ("personne")

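Less convincing than dogs.  The test people (about 0.079 and 0.080) score well above the theft, the gift, and subPropertyOf (about 0.043 to 0.058), but in this run the three dogs (about 0.094) score slightly higher still, so the test people do not take the lead.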

In [21]:
(bar-c3-categorical 
 (list 1 2 3 4 5 6 7 8)
 (drop (vector->list smpls) 16)
 (list "dog4" "dog3" "dog13" "person3" "person13" "theft1" "gift2" "subPropertyOf")
 #:xlabel "object"
 #:ylabel "mean in-ness")

[Bar chart: mean in-ness under "personne" for each test object; x-axis: object, y-axis: mean in-ness]

| object | dog4 | dog3 | dog13 | person3 | person13 | theft1 | gift2 | subPropertyOf |
|---|---|---|---|---|---|---|---|---|
| mean in-ness | 0.0937 | 0.0937 | 0.0937 | 0.0786 | 0.0798 | 0.0576 | 0.0563 | 0.0427 |