# How to marry a star: probabilistic constraints for meaning in context

## Katrin Erk and Aurelie Herbelot, August 2020

## WebPPL experiments for the paper

# Version 1 of the generative story: global constraints only

## Version 1 step 1: Drawing a collection of high-level frames

This experiment illustrates how the probability of drawing on one versus two scenarios within a single situation description depends on the Dirichlet concentration parameter alpha. 

We use the output of 

`webppl version1.wppl > version1out.txt`

This script has two scenarios only:
* Scenario 0: gothic. Concepts: Bat/animal, Vampire
* Scenario 1: baseball. Concepts: Bat/stick, Player


This particular output was produced with the following settings:
* 2000 samples
* concentration parameter alpha = 0.5
* each SD has to have between 1 and 4 referents. 
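
To see why alpha matters, here is a minimal Python sketch of this step alone. It is not the WebPPL script: it assumes a symmetric Dirichlet over the two scenarios, a uniform number of referents between 1 and 4, and one scenario drawn independently per referent. The function names `sample_dirichlet` and `fraction_mixed` are ours, introduced for illustration only.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric Dirichlet(alpha) over k outcomes via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow for tiny alpha
        return [1.0 / k] * k
    return [d / total for d in draws]

def fraction_mixed(alpha, num_samples=5000, seed=0):
    """Fraction of sampled situation descriptions that draw on both scenarios."""
    rng = random.Random(seed)
    mixed = 0
    for _ in range(num_samples):
        theta = sample_dirichlet(alpha, 2, rng)
        num_referents = rng.randint(1, 4)  # uniform over 1..4, both ends included
        scenarios = {rng.choices([0, 1], weights=theta)[0] for _ in range(num_referents)}
        if len(scenarios) == 2:
            mixed += 1
    return mixed / num_samples

for alpha in (0.5, 0.1):
    print("alpha =", alpha, "fraction with both scenarios:", fraction_mixed(alpha))
```

Under these assumptions, a smaller alpha concentrates the scenario distribution on a single scenario, so fewer samples mix Gothic and Baseball.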

In [1]:
# Reading in the data

import wpplout
samples = wpplout.parse_webppl_groupformat("wppl/version1out.txt")



How often did we get 1, 2, 3, or 4 referents? This is about equal, which makes sense, since we drew the number of referents from a uniform distribution:

In [None]:
wpplout.webppl_groupformat_referentspd(samples)

How often did we get only tokens of scenario 0, only tokens of scenario 1, or tokens of both?

In [None]:
wpplout.webppl_groupformat_scenariopd(samples)

Now both together: How often did we get samples with 1, 2, 3, 4 referents, and what scenarios did they have?

In [None]:
wpplout.webppl_groupformat_scenarios_referentspd(samples)

### Now with alpha = 0.1

To re-run this, edit `version1.wppl` to set alpha to 0.1.

`webppl version1.wppl > version1out_alpha01.txt`


In [None]:
samples01 = wpplout.parse_webppl_groupformat("wppl/version1out_alpha01.txt")
wpplout.webppl_groupformat_scenariopd(samples01)

## Version 1 step 2: concepts sampled from the scenarios


We use the exact same sample of 2000 situation descriptions from above. What kinds of sampled concepts are we seeing?

All concepts have been set to be equally likely within a scenario.

In [None]:
import pandas as pd

pd.set_option('display.max_colwidth', 1000)

wpplout.webppl_groupformat_concepts_referentspd(samples).head(20)

## Version 1 step 3: Feature vectors sampled from the concepts


We use the exact same sample of 2000 situation descriptions from above. What kinds of sampled feature vectors are we seeing? We also show the concepts.

Probabilities of truth are as follows. For "bat", we have used the truth probabilities from the Quantified McRae norms of Herbelot and Vecchi wherever possible.

* Bat-animal:
  * "bat" 1.0, "vampire" 0.01, "player" 0.01, "have_wings" 1.0 (McRae), "fly" 1.0 (McRae), "humanlike" 0.0, "athletic" 0.01, "wooden" 0.001
* Vampire:
  * "bat" 0.01, "vampire" 1.0, "player" 0.01, "have_wings" 0.2, "fly" 0.2, "humanlike" 0.99, "athletic" 0.1, "wooden" 0.001
* Bat-stick
  * "bat" 1.0, "vampire" 0.0, "player" 0.0, "have_wings" 0.01, "fly" 0.01, "humanlike" 0.0, "athletic" 0.0, "wooden" 0.75 (QMcRae)
* Player 
  * "bat" 0.0, "vampire" 0.001, "player" 1.0, "have_wings" 0.01, "fly" 0.01, "humanlike" 0.99, "athletic" 0.8, "wooden" 0.001
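
As a rough illustration of what sampling a feature vector from a concept can look like, the sketch below treats every feature as an independent coin flip with its truth probability. The independence assumption and the helper name `sample_feature_vector` are ours; the WebPPL script may do this differently.

```python
import random

# Truth probabilities for Bat-animal, copied from the list above.
BAT_ANIMAL = {
    "bat": 1.0, "vampire": 0.01, "player": 0.01, "have_wings": 1.0,
    "fly": 1.0, "humanlike": 0.0, "athletic": 0.01, "wooden": 0.001,
}

def sample_feature_vector(truth_probs, rng):
    """Assumption: each feature is an independent coin flip with its truth probability."""
    return {feature: rng.random() < p for feature, p in truth_probs.items()}

rng = random.Random(0)
vectors = [sample_feature_vector(BAT_ANIMAL, rng) for _ in range(2000)]

# Empirical share of samples in which each feature came out true.
for feature in BAT_ANIMAL:
    share = sum(v[feature] for v in vectors) / len(vectors)
    print(feature, round(share, 3))
```

Features with truth probability 1.0 or 0.0 come out the same in every sample, while the others vary from sample to sample.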


In [None]:
df = wpplout.webppl_groupformat_fvectors_pd(samples)
df.sort_values('prob',ascending=False).head(20)


## Version 1 DRS conditions

We use the exact same sample of 2000 situation descriptions from above. What conditions are being generated? We also show the concepts.

Salience probabilities are as follows. For "bat", we have used the salience probabilities from the McRae norms wherever possible.

Features: "bat", "vampire", "player", "have_wings", "fly", "humanlike", "athletic", "wooden"
* Bat-animal
  * bat: 1.0, vampire: 0.01, player: 0.0, have_wings: 0.867 (McRae), fly: 0.633 (McRae), humanlike: 0.0, athletic: 0.0, wooden: 0.0
* Vampire
  * bat 0.0, vampire 1.0, player 0.0, have_wings 0.1, fly 0.1, humanlike 0.1, athletic 0.0, wooden 0.0
* Bat-stick
  * bat 1.0, vampire 0.0, player 0.0, have_wings 0.0, fly 0.0, humanlike 0.0, athletic 0.0, wooden 0.733 (McRae)
* Player
  * bat 0.0, vampire 0.0, player 1.0, have_wings 0.0, fly 0.0, humanlike 0.0, athletic 0.6, wooden 0.0
 

In [None]:
wpplout.webppl_groupformat_conditions_pd(samples).sort_values("prob", 
                                                              ascending = False).head(20)

## Version 1, with utterance

We now want to show what happens when reasoning starts from a given utterance. 

Everything up to now was an analysis of the same set of 2000 samples. But for the with-utterance case, we need to get a new set of samples.

* Referents: again between 1 and 4, including the ones in the utterance
* parameters: all the same as above

As version 1 of the generative story does not have semantic roles, our utterances are very simplistic. One is "vampire", the other is the empty utterance.

### "Vampire"

The first utterance is "vampire". What we want to show with that is that when you have an utterance that comes from scenario "Gothic", you are more likely to sample the rest of the utterance from "Gothic" rather than from "Baseball". That is, we want to show that our model tends to stick with the same scenario when it generates additional entities.

Call:

`` webppl version1_utterance.wppl > vampireout.txt``


In [None]:
vampire_samples = wpplout.parse_webppl_groupformat("wppl/vampireout.txt")


What does the scenario distribution look like now?

In [None]:
wpplout.webppl_groupformat_scenariopd(vampire_samples)

Let's visualize this.

In [None]:
df_scenarios = wpplout.webppl_groupformat_scenariopd(samples)
df_scenarios_vampire = wpplout.webppl_groupformat_scenariopd(vampire_samples)

# combine the two data frames for visualization
viz_df = df_scenarios.merge(df_scenarios_vampire, 
                             left_on = "scenarios", 
                             right_on = "scenarios", 
                             how = "left").fillna(0)

viz_df = viz_df.set_index("scenarios")
viz_df.columns = ["--", "vampire"]

# and plot
%matplotlib inline

ax = viz_df.plot.bar(title = "Scenario percentages with no utterance vs. utterance 'vampire'",
               rot = 0)
ax.set_xlabel("");


### Utterance "bat" versus "vampire, bat"

Reminder: In version 1, we can only utter unconnected lists of words. The question with the utterance "vampire, bat" is: How often is this understood as bat-animal, how often as bat-stick? We compare this to an utterance that is simply "bat", where we expect to get each reading of "bat" equally often. 

What this example will hopefully show is that high-level constraints can disambiguate words, like in "the player ran to the ball" versus "the violinist ran to the ball". 

All parameters as with the utterance "vampire".

Making samples for the utterance "bat":

``webppl version1_utterance.wppl > batout.txt``

Making samples for the utterance "vampire, bat":

``webppl version1_utterance.wppl > vampirebatout.txt``



In [None]:
# utterance "bat"
bat_samples = wpplout.parse_webppl_groupformat("wppl/batout.txt")
wpplout.webppl_probbats(bat_samples)

In [None]:
# utterance "vampire, bat"
vampirebat_samples = wpplout.parse_webppl_groupformat("wppl/vampirebatout.txt")
wpplout.webppl_probbats(vampirebat_samples)

In [None]:
# and visualizing the whole thing
df_bat = wpplout.webppl_probbats(bat_samples)
df_vampirebat = wpplout.webppl_probbats(vampirebat_samples)

# combine the two data frames for visualization
viz_df = df_bat.merge(df_vampirebat, 
                             left_on = "Concept", 
                             right_on = "Concept", 
                             how = "left").fillna(0)

viz_df = viz_df.set_index("Concept")
viz_df.columns = ["bat", "vampire, bat"]


# and plot
%matplotlib inline

ax = viz_df.plot.bar(title = "Probability of 'bat' senses for the utterances 'bat', and 'vampire, bat'",
               rot = 0)
ax.set_xlabel("");

So when the utterance is "vampire, bat", the "animal" reading is preferred, but this is only a soft preference, no hard constraint. (If alpha were smaller, we would get a higher count of animals.)
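
The soft preference can be reproduced in a toy Python version of the generative story. This is a sketch under simplifying assumptions, not the WebPPL model: one concept per word of the utterance, a symmetric Dirichlet over the two scenarios, concepts uniform within a scenario, and conditioning done by plain rejection sampling. The function name `prob_bat_animal` is ours.

```python
import random

SCENARIOS = {"gothic": ["Bat/animal", "Vampire"], "baseball": ["Bat/stick", "Player"]}
WORD = {"Bat/animal": "bat", "Bat/stick": "bat", "Vampire": "vampire", "Player": "player"}

def prob_bat_animal(utterance, alpha, num_samples=40000, seed=0):
    """Rejection sampling: keep prior samples whose words match the utterance,
    then report how often the 'bat' referent is Bat/animal."""
    rng = random.Random(seed)
    names = list(SCENARIOS)
    accepted = animal = 0
    for _ in range(num_samples):
        # Symmetric Dirichlet over the scenarios, as unnormalized Gamma draws.
        weights = [rng.gammavariate(alpha, 1.0) for _ in names]
        if sum(weights) == 0.0:  # skip the rare numerical underflow
            continue
        # One concept per word: draw a scenario, then a uniform concept from it.
        concepts = [rng.choice(SCENARIOS[rng.choices(names, weights=weights)[0]])
                    for _ in utterance]
        if [WORD[c] for c in concepts] != utterance:
            continue  # sample does not produce the utterance: reject
        accepted += 1
        animal += concepts[utterance.index("bat")] == "Bat/animal"
    return animal / accepted

print("bat:", round(prob_bat_animal(["bat"], 0.5), 3))
print("vampire, bat:", round(prob_bat_animal(["vampire", "bat"], 0.5), 3))
print("vampire, bat, alpha = 0.1:", round(prob_bat_animal(["vampire", "bat"], 0.1), 3))
```

In this toy model the probability of the animal reading given "vampire, bat" works out to (alpha + 1) / (2 * alpha + 1), so it approaches 1 as alpha shrinks but never becomes a hard constraint. The numbers from the actual WebPPL runs need not match this simplification.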

# Version 2: With selectional constraints

Version 2 of the generative model has semantic roles with selectional constraints. 

## Hard constraints

We can model hard selectional constraints, for example in order to say that ideas cannot sleep (using Chomsky's famous sentence). Here is the result of 2000 samples of situation descriptions each containing a Bat, an Idea, and a Sleeping event. The Sleeping event is set to have Agent as a mandatory role, and insists on the agent being animate. 

In 100% of the samples, we get that the Agent of the Sleeping event is the bat, not the idea. 

Parameters:
* 2000 samples
* scenario: gothic
* concepts: Sleep, Bat-animal, Idea
* features: bat, sleep, idea, animate
* truth probabilities for Bat: animate 1.0, bat 1.0, idea 0.0
* truth probabilities for Idea: animate 0.0, bat 0.0, idea 1.0
* realization probability of Agent for Sleep: 1.0
* selectional probabilities: probability of generating a (single-feature) selectional vector for Sleep/Agent: animate: 1.0 
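
A small Python sketch of why these parameters give a hard constraint. It assumes a simplified reading of the model in which candidate fillers are proposed uniformly and kept only if the required feature ("animate") holds for them; the helper `filler_shares` is ours and not part of the WebPPL code.

```python
# Truth probabilities for the feature "animate", copied from the parameters above.
ANIMATE = {"Bat-animal": 1.0, "Idea": 0.0}

def filler_shares(required_feature_prob):
    """Share of accepted role fillers per concept, assuming candidates are
    proposed uniformly and kept only if the required feature holds."""
    total = sum(required_feature_prob.values())
    if total == 0.0:
        return {}  # no candidate can fill the role
    return {concept: p / total for concept, p in required_feature_prob.items()}

print(filler_shares(ANIMATE))
```

Because Idea has truth probability 0.0 for "animate", it is never accepted as Agent, and Bat-animal takes the whole share.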

Call:

```webppl role_sleep.wppl > rolesleepout.txt```

Parsing the output:

In [2]:
wpplout.parse_webppl_groupformat_eventspd("wppl/rolesleepout.txt")


## Soft constraints

Next we demonstrate soft selectional preferences: The verb *eat* prefers its direct object to be edible, though it can also be used on non-edible objects sometimes. We have a baseball scenario containing a bat-stick, an apple, and an eating event. (We omit to specify the agent here, but presumably that would be a player.)

Parameters:
* 2000 samples
* scenario: baseball
* concepts: Apple, Bat-stick, Eat
* features: apple, bat, eat, edible, object
* truth probabilities for Apple: apple 1.0, bat 0.0, eat 0.0, edible 0.98, object 1.0
* truth probabilities for Bat-stick: apple 0.0, bat 1.0, eat 0.0, edible 0.01, object 1.0
* realization probability of Patient for Eat: 1.0 (mandatory)
* selectional probabilities: probability of generating a (single-feature) selectional vector for Eat/Patient: "edible" 1.0, "object" 0.05. (sometimes non-edible objects get eaten.)

Call:

```webppl role_apple_bat.wppl > roleapplebatout.txt```

Results:

In [None]:
wpplout.parse_webppl_groupformat_eventspd("wppl/roleapplebatout.txt")

# Version 3: concept combination

Version 3 of the generative model has head-modifier combination.

## Fanged bat-animal

The first thing we try is to compare features sampled for "bat" to features sampled for "fanged bat".
In this experiment, we hard-code in the program that we sample from a concept combination of "fanged" with "bat". 

For this experiment, parameters were set as follows:

* no Dirichlet, as we're currently doing a single concept only, so no scenarios
* 2000 samples 
* truth probabilities for "bat": from Quantified McRae. 
* truth probabilities for "fanged": 1.0 for "fanged", 0.999 for "animal", 0.01 for everything else
* salience probabilities for "bat": relative mention frequencies from McRae.
* salience probabilities for "fanged": 1.0 for "fanged", 0.0 for everything else
* voting importance probabilities: set automatically based on truth probabilities. Whenever truth probability was within threshold 0.001 of either 1.0 or 0.0, voting importance was set to 1.0. Otherwise, voting importance was set to be equal to salience probability. 
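
The rule for voting importance can be sketched in a few lines of Python. The function name `voting_importance` is ours, and we assume that "within threshold" is inclusive, so that a truth probability of 0.999 counts as within 0.001 of 1.0; a small tolerance absorbs floating-point rounding at that boundary.

```python
def voting_importance(truth_prob, salience_prob, threshold=0.001):
    """Importance is 1.0 for (near-)certain features, else the salience probability."""
    eps = 1e-12  # tolerance so that values such as 0.999 count as "within 0.001 of 1.0"
    near_zero = truth_prob <= threshold + eps
    near_one = 1.0 - truth_prob <= threshold + eps
    return 1.0 if (near_zero or near_one) else salience_prob

# "fanged": truth 0.999 for "animal", salience 0.0 -> near-certain, importance 1.0
print(voting_importance(0.999, 0.0))
# "fanged": truth 0.01 for another feature, salience 0.0 -> importance equals salience
print(voting_importance(0.01, 0.0))
```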

We cut down the number of features to make it possible to show all the numbers in a graph. These are the ones we used:

    "bat", "has_wings", "nocturnal", "furry", "animal", "screeches", "has_fangs", "associated_with_vampires"

Output with "bat" on its own:

``webppl bat.wppl > ccomb_batout.txt``

Output for the concept combination "fanged bat":

``webppl fangedbat.wppl > ccomb_fangedbatout.txt``


In [None]:
# Empirical feature probabilities 
# in the samples of "bat" on its own
ccbat_samples = wpplout.parse_webppl_groupformat("wppl/ccomb_batout.txt")
wpplout.webppl_groupformat_featureprob(ccbat_samples)

In [None]:
# Empirical feature probabilities 
# in the samples of "fanged bat"
ccfangedbat_samples = wpplout.parse_webppl_groupformat("wppl/ccomb_fangedbatout.txt")
wpplout.webppl_groupformat_featureprob(ccfangedbat_samples)


And graphically:

In [None]:
df_ccbat = wpplout.webppl_groupformat_featureprob(ccbat_samples)
df_ccfanged = wpplout.webppl_groupformat_featureprob(ccfangedbat_samples)


# combine the two data frames for visualization
viz_df = df_ccbat.merge(df_ccfanged, 
                             left_on = "feature", 
                             right_on = "feature", 
                             how = "left").fillna(0)

viz_df = viz_df.set_index("feature")
viz_df.columns = ["bat", "fanged bat"]


# and plot
%matplotlib inline

ax = viz_df.plot.bar(title = "Empirical feature probabilities for Bat-animal, Fanged + Bat-animal", 
                     rot = 0, figsize = (14, 5))
ax.legend(loc='upper left')
ax.set_xlabel("");

## Pet fish

We do the exact same thing with Hampton's "pet fish" example.

Parameters:

* single scenario, "Animals"
* concepts: Pet, Fish
* 2000 samples
* truth probabilities for Pet:
"pet" 1.0, "fish" 0.1, "lives_in_home" 1.0, "eaten" 0.001, "gills_rather_than_lungs" 0.1, "legs_rather_than_fins" 0.9
* truth probabilities for Fish:
"pet" 0.1, "fish" 1.0, "lives_in_home" 0.1, "eaten" 0.9, "gills" 1.0, "legs" 0.0
* salience probabilities for Pet:
"pet" 1.0, "fish" 0.0, "lives_in_home" 0.8, "eaten" 0.2, "gills" 0.1, "legs" 0.1
* salience probabilities for Fish:
"pet" 0.01, "fish" 1.0, "lives_in_home" 0.1, "eaten" 0.5, "gills" 0.5, "legs" 0.5
* importance probability computed automatically as before, threshold 0.001

We are using the same script for all three outputs, just commenting out different options for which concepts are used.

Sampling pets:

``webppl concept_combination_petfish.wppl > ccomb_pet.txt``

Sampling fish:

``webppl concept_combination_petfish.wppl > ccomb_fish.txt``

Sampling pet fish:

``webppl concept_combination_petfish.wppl > ccomb_petfish.txt``

Here is a visualization:

In [3]:
ccpet_samples = wpplout.parse_webppl_groupformat("wppl/ccomb_pet.txt")
ccfish_samples = wpplout.parse_webppl_groupformat("wppl/ccomb_fish.txt")
ccpetfish_samples = wpplout.parse_webppl_groupformat("wppl/ccomb_petfish.txt")

df_ccpet = wpplout.webppl_groupformat_featureprob(ccpet_samples)
df_ccfish = wpplout.webppl_groupformat_featureprob(ccfish_samples)
df_ccpetfish = wpplout.webppl_groupformat_featureprob(ccpetfish_samples)


# combine the data frames for visualization
viz_df = df_ccpet.merge(df_ccfish, 
                             left_on = "feature", 
                             right_on = "feature", 
                             how = "left").fillna(0)
viz_df = viz_df.merge(df_ccpetfish, 
                             left_on = "feature", 
                             right_on = "feature", 
                             how = "left").fillna(0)
viz_df = viz_df.set_index("feature")
viz_df.columns = ["pet", "fish", "pet fish"]


# and plot
ax = viz_df.plot.bar(title = "Empirical feature probabilities for Pet, Fish, Pet+Fish", 
                     rot = 0)
ax.legend(loc='center left')
ax.set_xlabel("");


# Final experiment: The astronomer married the star

Parameters:
* 2000 samples
* We vary the concentration parameter: **alpha = 0.5, 0.1, 0.05**
* scenarios: Stargazing, Theatre
* concepts: Astronomer, Star-sun, Star-person, Marry
* features: "astronomer", "marry", "star", "person", "object"
* truth probabilities for Astronomer: astronomer 1.0, marry 0.0, star 0.0, person 1.0, object 1.0
* truth probabilities for Star-sun: astronomer 0.0, marry 0.0, star 1.0, person 0.0, object 1.0
* truth probabilities for Star-person: astronomer 0.0, marry 0.0, star 1.0, person 1.0, object 1.0
* role realization probabilities for Marry: Agent 1.0, Patient 1.0 (both mandatory, for simplicity)
* selectional probabilities for Marry/Agent:  person 1.0, object 0.1
* selectional probabilities for Marry/Patient: person 1.0, object 0.1
* Utterance: ```exists x, y, e. astronomer(x) & marry(e) & star(y) &  Agent(e, x) &  Patient(e, y)```

Data generation:

alpha = 0.5:

```webppl astronomer_married_star.wppl > astronomer_out_05.txt```

alpha = 0.1:

```webppl astronomer_married_star.wppl > astronomer_out_01.txt```

alpha = 0.05:

```webppl astronomer_married_star.wppl > astronomer_out_005.txt```

Result:

In [None]:
def analyze_stars(samples):
    prob_star = { }
    for sample in samples:
        prob, groups, roles = sample
        for group in groups.values():
            features = group[3]
            if "star" in features:
                conc = "/".join(sorted(group[2]))
                prob_star[ conc] = prob_star.get(conc, 0) + prob
    # build the frame with explicit column names (also works when prob_star is empty)
    df = pd.DataFrame(list(prob_star.items()), columns=["concept", "prob"])
    return df
      
samples_a05 = analyze_stars(wpplout.parse_webppl_groupformat("wppl/astronomer_out_05.txt"))
samples_a01 = analyze_stars(wpplout.parse_webppl_groupformat("wppl/astronomer_out_01.txt"))
samples_a005 = analyze_stars(wpplout.parse_webppl_groupformat("wppl/astronomer_out_005.txt"))

Percentages for alpha = 0.5:

In [None]:
samples_a05

Percentages for alpha = 0.1:

In [None]:
samples_a01

Percentages for alpha = 0.05:

In [None]:
samples_a005

Graphically:

In [None]:
viz_df = samples_a05
viz_df = viz_df.merge(samples_a01, on = "concept")
viz_df = viz_df.merge(samples_a005, on = "concept")
viz_df = viz_df.set_index("concept")
viz_df.columns = ["0.5", "0.1", "0.05"]

ax = viz_df.loc["Star-sun"].plot.bar(rot = 0,
                                title= "Percentage Star-sun objects of 'marry' for different alpha values")
ax.set_xlabel("alpha");

# The vampire eats...

Parameters:

* utterance: ```exists x, e. vampire(x) & eat(e) & Agent(e, x)```
* single scenario, Gothic
* concepts: "Vampire", "Bat-animal", "Blood_orange", "Castle", "Eat"
* features: "vampire", "bat", "blood_orange", "castle", "eat", "animal_ish", "object", "edible", "building"
* maximal number of referents: 6
* realization probabilities for Eat: Agent: 1.0, Patient 0.8, Location 0.4
* selectional probabilities: 
  * Eat/Agent: animal_ish 1.0, object 0.1
  * Eat/Patient: edible 1.0, object 0.1
  * Eat/Location: building 1.0, object 0.05
  
Call: 
``webppl vampire_eats.wppl > vampire_eats_out.txt``


In [None]:
ve_samples = wpplout.parse_webppl_groupformat("wppl/vampire_eats_out.txt")
pd.set_option('display.max_colwidth', 1000)

wpplout.webppl_groupformat_drspd(ve_samples).sort_values("probability", 
                                                        ascending=False).head(20)


In [None]:
# details: 
# how often do we have a patient, how often do we have a location?
# how often does the vampire eat a blood orange, a bat, another vampire

def vampire_eats_showdetails(ve_samples):
    prob_patient = 0
    prob_location = 0
    prob_agentonly = 0
    prob_patient_is = { }
    prob_location_is = { }

    for sample in ve_samples:
        prob, groups, roles = sample
        if any(role[0] == "Patient" for role in roles):
            prob_patient += prob
        if any(role[0] == "Location" for role in roles):
            prob_location += prob
        if len(roles) == 1:
            prob_agentonly += prob
    
        patients = set("/".join(groups[role[2]][2]) for role in roles if role[0] == "Patient")
        for patient in patients:
            prob_patient_is[patient] = prob_patient_is.get(patient, 0) + prob
        locations = set("/".join(groups[role[2]][2]) for role in roles if role[0] == "Location")
        for location in locations:
            prob_location_is[location] = prob_location_is.get(location, 0) + prob
    
     
                

    print("Prob. of having a patient:", round(prob_patient, 3))
    print("Prob. of having a location:", round(prob_location, 3))
    print("Prob. of only having an agent:", round(prob_agentonly, 3))
    print("Summed prob. of situation descriptions with particular patients:")
    for patient, prob in prob_patient_is.items():
        print("\t", patient, ":", round(prob, 3))
    print("Summed prob. of situation descriptions with particular locations:")
    for location, prob in prob_location_is.items():
        print("\t", location, ":", round(prob, 3))
        
vampire_eats_showdetails(ve_samples)

The realization probability for Patient is 0.8, but only 71% of situation descriptions have a patient. Why is this? Finding a role filler can fail: there may be no entity in the situation description that can fill the role.
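
As a back-of-the-envelope check, the two numbers can be related directly. The sketch assumes that a missing filler is the only reason a realized Patient role goes unfilled, and it takes 71% at face value; the variable names are ours.

```python
realization_prob = 0.8   # realization probability of Patient for Eat (from the parameters)
observed_share = 0.71    # share of situation descriptions that have a patient (reported above)

# If failure to find a filler is the only reason a realized role goes missing,
# observed_share = realization_prob * fill_success_rate.
fill_success_rate = observed_share / realization_prob
print(round(fill_success_rate, 4))
```

Under that assumption, a suitable filler was found in roughly 89% of the cases where a Patient role was realized.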