::: {.content-hidden}

Note! This document is partly in `quarto`-flavored markdown and can be used with the `quarto` package to generate a rendered overview. For this reason you will see occasional raw cells, and things like the first line of this cell, that render oddly in a plain Jupyter notebook.

# Lambda notebook demo (for linguists)

Author: **Kyle Rawlins, [kgr@jhu.edu](mailto:kgr@jhu.edu)**

**Current version**: v2.0, 2024-08-26

This notebook provides a demo of the core capabilities of the lambda notebook, aimed at linguists who already have training in semantics (but not necessarily implemented semantics).

Last updated Aug 2024. Version history:

 * 0.5: first version
 * 0.6: updated to work with refactored class hierarchy (Apr 2013)
 * 0.6.1: small fixes to adapt to changes in various places (Sep 2013)
 * 0.7: various fixes to work with alpha release (Jan 2014)
 * 0.9: substantial updates, merge content from LSA poster (Apr 2014)
 * 0.95: substantial updates for a series of demos in Apr-May 2014
 * 1.0: various changes / fixes, more stand-alone text (2017)
 * 1.0.1: small tweaks (Nov/Dec 2018)
 * 1.0.2: bugfixes
 * 1.1: updates towards quarto refactoring
 * 2.0: quarto refactor
 
If you are viewing this demo interactively, you can run through code cells via shift-enter, or by the run button in the toolbar.

:::

In [None]:
#| echo: false
import lamb.auto
from lamb.types import TypeMismatch, type_e, type_t, type_property
from lamb.meta import TypedTerm, TypedExpr, LFun

In [None]:
#| echo: false
# Interactive note: there are several ways of display outputs, this one is the default:
lamb.display.default(style=lamb.display.DerivStyle.BOXES)
# you can also try:
# lamb.display.default(style=lamb.display.DerivStyle.PROOF)

[Download this notebook](https://github.com/rawlins/lambda-notebook/blob/master/docs/demo-for-linguists.ipynb) for interactive use.

# First pitch to a theoretical linguist

Have you ever wanted to type something like this in, and have it actually do something?

In [None]:
%%lamb
||every|| = λ f_<e,t> : λ g_<e,t> : Forall x_e : f(x) >> g(x)
||cat|| = L x_e : Cat_<e,t>(x)
||meowed|| = L x_e : Meowed_<e,t>(x)

In [None]:
r = ((every * cat) * meowed)
r

In [None]:
r.tree()

The lambda notebook project aims to make this happen. It provides:

1. a flexible typed metalanguage: a superset of first-order predicate logic, together with a rich type theory.
2. a framework for doing semantic composition on natural language syntax
3. an api for rendering the structured python objects in layers 1-2 in easy-to-read MathJax (web-oriented LaTeX code) and HTML.

# Background: Two problems in formal semantics

1. Type-driven computation could be a lot easier to visualize and check.

2. Grammar fragments as in Montague Grammar: good idea in principle, hard to use in practice.

  * A **fragment** is a *complete* formalization of a *sublanguage* consisting of the *key relevant phenomena* for the problem at hand.  (Potential problem-points italicized.)

Solution: a system for developing interactive fragments: the "*Lambda Notebook*" project.

* Creator can work interactively with analysis -- accelerate development, limit time spent on tedious details.
* Reader can explore derivations in ways that are not possible in a typical paper format.
* Creator and reader can be certain that derivations work, verified by the system.
* Bring closer together formal semantics and computational modeling.

Inspirations and related projects:

 * @vanEijckUnger10: implementation of compositional semantics in Haskell.  No interface (beyond standard Haskell terminal); great if you like Haskell. There's a rich tradition of programmer linguists writing fragments in Haskell following this book.
 * The [Lambda calculator](http://lambdacalculator.com/) [originally, the UPenn Lambda Calculator, @PennLambda]: teaching-oriented implementation of lambda calculus in Java.
 * [`nltk.sem`](https://www.nltk.org/api/nltk.sem.html): implementation of the lambda calculus with a typed metalanguage, interface with theorem provers.  No interactive interface.

## The role of formalism & fragments in semantics

What does *formal* mean in semantics?  What properties should a theory have?

 1. Mathematically precise (lambda calculus, type theory, logic, model theory(?), ...)
 2. Complete (covers "all" the relevant data).
 3. Predictive (like any scientific theory).
 4. Consistent, or at least compatible (with itself, analyses of other phenomena, some unifying conception of the grammar).
 
The *method of fragments* [@Partee79; @ParteeHendriks97] provides a structure for meeting these criteria.

 * Paper with a fragment provides a working system.  (Probably.)
 * Explicit outer bound for empirical coverage.
 * Integration with a particular theory of grammar.  (To some extent.)
 * Explicit answer to many detailed questions not necessarily dealt with in the text.
 
**Claim**: fragments are a method of replicability, similar to a computational modeller providing their model.

 * To be clear, a fragment is neither necessary nor sufficient for having a good theory / analysis / paper...

Additional benefit: useful internal check for researcher.

> "...But I feel strongly that it is important to try to [work with fully explicit fragments] periodically, because otherwise it is extremely easy to think that you have a solution to a problem when in fact you don't." (Partee 1979, p. 41)

## The challenges of fragments

Part 1 of the above quote:

>"It can be very frustrating to try to specify frameworks and fragments explicitly; this project has not been entirely rewarding.  I would not recommend that one always work with the constraint of full explicitness." (Ibid.)

 * Fragments can be tedious and time-consuming to write (not to mention hard).
 * Fragments as traditionally written are in practice not easy for a reader to use.
 
   - Dense/unapproachable.  With exactness can come a huge chunk of hard-to-digest formalism.  E.g. in @Partee79, the fragment is about 10% of the paper.
   - Monolithic/non-modular.  For the specified sublanguage, everything is specified.  Outside the bounds of the sublanguage, nothing is specified.  How does the theory fit in with others?
   - Exact opposite of the modern method -- researchers typically hold most aspects of the grammar constant (implicitly) while changing a few key points.  [See e.g., the introduction to @PortnerPartee02 for discussion.]

**Summary:** In practice, the typical payoff of a fragment has not exceeded the effort, either for its reader or for its writer.


# A solution: digital fragments

@vanEijckUnger10's solution: we can (and perhaps should) specify a fragment in digital form.

* They use Haskell.  Type system of Haskell extremely well-suited to natural language semantics.
* (Provocative statement) Interface, learning curve of Haskell not well suited to semanticists (or most people)? At a minimum: reading code is not always easy.

**Benefits of digital fragments (in principle)**

* Interactive.
* Easy to distribute, adapt, modify.
* Possibility of modularity.  (E.g. abstract a 'library' for compositional systems away from the analysis of a particular phenomenon.)
* Bring closer together the CogSci idea of a 'computational model' to the project of natural language semantics.

**What sorts of things might we want in a fragment / system for fragments?**

* Typed lambda calculus.
* Logic / logical metalanguage.
* Model theory.
* Framework for semantic composition.

The Lambda Notebook project aims to provide these tools in a usable, interactive format, built on top of Python.

## Part 1: flexible typed metalanguage

The **metalanguage** infrastructure is a set of python classes that implement the building blocks of logical expressions, lambda terms, and various other formal objects, as well as complex formulas built from these pieces. This rests on an implementation of a framework for **type systems** that matches what semanticists tend to assume.

Preface cell with `%%lamb` to enter metalanguage formulas directly.  The following cell defines a variable `x` that has type e, and exports it to the notebook's environment.

In [None]:
%%lamb reset
x = x_e

In [None]:
x.type

This next cell defines some variables whose values are more complex objects -- in fact, functions in the typed lambda calculus.

In [None]:
%%lamb
test1 = L p_t : L x_e : P_<e,t>(x) & p # based on a Partee et al example
test1b = L x_e : P_<e,t>(x) & Q_<e,t>(x)
t2 = Q_<e,t>(x_e)

These are now registered as variables in the python namespace and can be manipulated directly.  A typed lambda calculus is fully implemented with all that that entails -- e.g. the value of `test1` includes the whole syntactic structure of the formula, its type, etc. and can be used in constructing new formulas.  The following cells build a complex function-argument formula and then reduce it.

Notice that beta reduction works properly, i.e. bound $x$ in the function is renamed in order to avoid collision with the free `x` in the argument. [`test1b` is based on an example illustrating alpha conversion from @PMW93.]

In [None]:
test1(t2) # construct a complex function-argument expression

In [None]:
test1(t2).reduce() # do reduction on that expression
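
To make the renaming step concrete, here is a minimal, self-contained sketch of capture-avoiding substitution over toy untyped lambda terms. It is not the lambda notebook's implementation: the classes `Var`, `App`, `Lam` and the helpers `free_vars`, `fresh`, `subst`, and `beta` are hypothetical names made up for this illustration, and conjunction is encoded as an ordinary two-place application.

```python
from dataclasses import dataclass
from itertools import count

# Toy untyped lambda terms; illustrative stand-ins, not lamb's classes.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class App:
    fun: object
    arg: object

@dataclass(frozen=True)
class Lam:
    var: str
    body: object

def free_vars(term):
    """Names that occur free in `term`."""
    if isinstance(term, Var):
        return {term.name}
    if isinstance(term, App):
        return free_vars(term.fun) | free_vars(term.arg)
    return free_vars(term.body) - {term.var}

def fresh(name, avoid):
    """Append a numeric suffix to `name` until it is not in `avoid`."""
    for i in count(1):
        candidate = f"{name}{i}"
        if candidate not in avoid:
            return candidate

def subst(term, name, value):
    """Replace free occurrences of `name` in `term` by `value`."""
    if isinstance(term, Var):
        return value if term.name == name else term
    if isinstance(term, App):
        return App(subst(term.fun, name, value), subst(term.arg, name, value))
    if term.var == name:               # `name` is shadowed under this binder
        return term
    if term.var in free_vars(value):   # would capture: rename the binder first
        new = fresh(term.var, free_vars(value) | free_vars(term.body) | {name})
        body = subst(term.body, term.var, Var(new))
        return Lam(new, subst(body, name, value))
    return Lam(term.var, subst(term.body, name, value))

def beta(term):
    """One beta-reduction step at the root, if `term` is a redex."""
    if isinstance(term, App) and isinstance(term.fun, Lam):
        return subst(term.fun.body, term.fun.var, term.arg)
    return term

# (λp. λx. P(x) & p) applied to Q(x), with the x in Q(x) free
test1 = Lam("p", Lam("x", App(App(Var("&"), App(Var("P"), Var("x"))), Var("p"))))
t2 = App(Var("Q"), Var("x"))
reduced = beta(App(test1, t2))
```

In this sketch the bound `x` comes out renamed (to `x1`), so the `x` inside `Q(x)` stays free after reduction, which is the behavior described above.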

In [None]:
%%lamb
catf = L x_e: Cat_<e,t>(x)
dogf = λx: Dog_<e,t>(x_e)

In [None]:
display(catf(x), (catf(x)).type)

In [None]:
catf.type

Type checking of course is a part of all this.  If the types don't match, the computation will throw a `TypeMismatch` exception.  The following cell uses python syntax to catch and print such errors.

In [None]:
with lamb.errors():
    test1(x) # function is type <t,<e,t>> so will trigger a type mismatch.
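
As a rough picture of what such a check involves, the following sketch represents functional types as `(input, output)` pairs and refuses to apply a function type to the wrong argument type. The names `TypeMismatchError`, `fun_type`, and `apply_type` are invented for this illustration and are not part of the `lamb` API.

```python
class TypeMismatchError(Exception):
    """Hypothetical stand-in for the TypeMismatch exception mentioned above."""

E, T = "e", "t"

def fun_type(inp, out):
    """A functional type <inp,out>, represented as a pair."""
    return (inp, out)

def apply_type(fun, arg):
    """Result type of applying something of type `fun` to something of type `arg`."""
    if not isinstance(fun, tuple):
        raise TypeMismatchError(f"{fun} is not a functional type")
    inp, out = fun
    if inp != arg:
        raise TypeMismatchError(f"expected an argument of type {inp}, got {arg}")
    return out

# <t,<e,t>>, the type of `test1` in the cells above
test1_type = fun_type(T, fun_type(E, T))
```

With these definitions, `apply_type(test1_type, T)` returns the pair for `<e,t>`, while `apply_type(test1_type, E)` raises, mirroring the failed application of `test1` to an expression of type `e`.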

A more complex expression:

In [None]:
%%lamb
p2 = (Cat_<e,t>(x_e) & p_t) >> (Exists y: Dog_<e,t>(y_e))

What is going on behind the scenes?  The objects manipulated are recursively structured python objects (of class `TypedExpr`). Each layer of recursion stores information about the kind of metalanguage expression it is, as well as its parts.

Many straightforward expressions can be parsed.  Most expressions are created using a call to `TypedExpr.factory`, which is abbreviated as `te` in the following examples.  The `%%lamb` magic is calling this behind the scenes.

In [None]:
x = %te x_e
x

Various convenience python operators are overloaded, including function calls.  Here is an example repeated from earlier in two forms:

In [None]:
%%lamb
p2 = (Cat_<e,t>(x_e) & p_t) >> (Exists y: Dog_<e,t>(y_e))

In [None]:
p2 = (te("Cat_<e,t>(x)") & te("p_t")) >> te("(Exists y: Dog_<e,t>(y_e))")
p2

Let's examine in detail what happens when a function and argument combine.

In [None]:
catf = meta.LFun(te("x_e"), te("Cat(x_e)"))
catf

In [None]:
catf(te("y_e"))

Building a function-argument expression builds a complex, unreduced expression.  This can be explicitly reduced (note that the `reduce_all()` function would be used to apply reduction recursively):

In [None]:
catf(te("y_e")).reduce()

In [None]:
(catf(te("y_e")).reduce()).derivation

The metalanguage supports type polymorphism. For example, we can define a function whose input type is a type variable (`X`) and then combine that function with a concrete type, to force type narrowing:

In [None]:
%lamb ttest = L x_X : P_<X,t>(x)
%lamb tvar = y_t
ttest(tvar)
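
One standard way to get this kind of narrowing is unification over type expressions. The sketch below is only an illustration of that general idea, under the assumption (made for this example alone) that upper-case strings are type variables and functional types are pairs; it omits an occurs check, and `is_var`, `resolve`, `unify`, and `apply_poly` are hypothetical helpers rather than `lamb` functions.

```python
def is_var(ty):
    # Convention for this sketch only: upper-case strings are type variables.
    return isinstance(ty, str) and ty.isupper()

def resolve(ty, bindings):
    """Replace bound type variables in `ty` by their current values."""
    if is_var(ty) and ty in bindings:
        return resolve(bindings[ty], bindings)
    if isinstance(ty, tuple):
        return tuple(resolve(part, bindings) for part in ty)
    return ty

def unify(a, b, bindings):
    """Return bindings that make `a` and `b` equal, or None if there are none."""
    a, b = resolve(a, bindings), resolve(b, bindings)
    if a == b:
        return bindings
    if is_var(a):
        return {**bindings, a: b}
    if is_var(b):
        return {**bindings, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for part_a, part_b in zip(a, b):
            bindings = unify(part_a, part_b, bindings)
            if bindings is None:
                return None
        return bindings
    return None

def apply_poly(fun_type, arg_type):
    """Result type of applying a possibly polymorphic function type, or None."""
    inp, out = fun_type
    bindings = unify(inp, arg_type, {})
    return None if bindings is None else resolve(out, bindings)

ttest_type = ("X", "t")                     # <X,t>, as for `ttest` above
narrowed = unify(ttest_type[0], "t", {})    # combine with an argument of type t
```

Here `narrowed` binds `X` to `t`, so the function's type narrows to `<t,t>`, while two distinct concrete types such as `<e,t>` and `<e,e>` fail to unify.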

Other operators not illustrated here include set theoretic expressions, tools for building partial functions and denotations, a rich approach to "meta-meta-language", restricted quantifiers, and more.

### Model theory and evaluation

The lambda notebook supports model theory and evaluation of extensional formulas.

In [None]:
m = meta.Model({'A': '_c1',
                'B': '_c2',
                'C': '_c1',
                'P': {'A', 'B'}},
               domain={'_c1', '_c2', '_c3'})
m

In [None]:
m.evaluate(te("Exists x_e : P_<e,t>(x)")).derivation.trace()

In [None]:
m.evaluate(te("Forall x_e : P_<e,t>(x)")).derivation.trace()
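
For readers who want the underlying idea without the library, quantifier evaluation over a finite domain can be sketched in a few lines of plain Python. This mirrors the model above under one assumption, labeled in the code: that the predicate `P` is given by constant names and is interpreted as the set of their referents. The names `domain`, `constants`, `exists`, and `forall` are illustrative only.

```python
# A finite model: a domain of individuals plus values for constants.
domain = {"c1", "c2", "c3"}
constants = {"A": "c1", "B": "c2", "C": "c1"}

# Assumption for this sketch: P is specified via the constants A and B,
# and denotes the set of individuals those constants refer to.
P = {constants[name] for name in ("A", "B")}

def exists(pred):
    """True iff at least one individual in the domain is in `pred`."""
    return any(x in pred for x in domain)

def forall(pred):
    """True iff every individual in the domain is in `pred`."""
    return all(x in pred for x in domain)

exists_result = exists(P)   # some individual is P
forall_result = forall(P)   # every individual is P
```

On this reading the existential statement holds and the universal one fails, because `c3` is in the domain but not in `P`.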

## Part 2: composition systems for an object language

On top of the metalanguage are '**composition systems**' for modeling (step-by-step) semantic composition in an object language such as English.  This is the part of the lambda notebook that tracks and manipulates mappings between object language elements (words, trees, etc) and denotations in the metalanguage.  

A composition system at its core consists of a set of composition rules; the following cell defines a simple composition system that will be familiar to anyone who has taken a basic course in compositional semantics. [See among others, @HeimKratzer98; @IFS24]

In [None]:
# none of this is strictly necessary, the built-in library already provides effectively this system.
fa = lang.BinaryCompositionOp("FA", lang.fa_fun, reduce=True)
pm = lang.BinaryCompositionOp("PM", lang.pm_fun, commutative=True, reduce=True)
pa = lang.BinaryCompositionOp("PA", lang.pa_fun, allow_none=True)
demo_hk_system = lang.CompositionSystem(name="demo system", rules=[fa, pm, pa])
lang.set_system(demo_hk_system)
demo_hk_system

Expressing denotations is done in a `%%lamb` cell, and almost always begins with lexical items.  The following cell defines several lexical items that will be familiar from introductory exercises in the @HeimKratzer98 textbook.

In [None]:
%%lamb
||cat|| = L x_e: Cat_<e,t>(x)
||gray|| = L x_e: Gray_<e,t>(x)
||john|| = John_e
||julius|| = Julius_e
||inP|| = L x_e : L y_e : In_<(e,e),t>(y, x) # `in` is a reserved word in python
||texas|| = Texas_e
||isV|| = L p_<e,t> : p # `is` is a reserved word in python

All object-language representations implement the interface `lamb.lang.Composable`, including lexical items as well as the complex results shown below. In type-driven mode, composition is triggered by using the '`*`' operator on a `Composable`.  This searches over the available composition operations in the system to see if any results can be had.  Given their types, we expect `inP` and `texas` above to be able to compose using the FA rule:

In [None]:
inP * texas

On the other hand, `isV` is looking for a property, so we shouldn't expect successful composition with a type `e` element.

In [None]:
julius * isV # will fail due to type mismatches
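
The search that `*` performs can be pictured with a small stand-alone sketch: denotations are `(type, value)` pairs, each rule either returns a result or `None`, and composition collects whatever succeeds. This is a simplified illustration and not how `lamb.lang` is implemented; `fa`, `pm`, `compose`, and the toy facts in `IN` are all made up for the example.

```python
ET = ("e", "t")

def fa(fun, arg):
    """Function Application: succeeds if `fun` takes things of `arg`'s type."""
    (fun_type, fun_val), (arg_type, arg_val) = fun, arg
    if isinstance(fun_type, tuple) and fun_type[0] == arg_type:
        return (fun_type[1], fun_val(arg_val))
    return None

def pm(left, right):
    """Predicate Modification: conjoin two properties."""
    (l_type, l_val), (r_type, r_val) = left, right
    if l_type == ET and r_type == ET:
        return (ET, lambda x: l_val(x) and r_val(x))
    return None

def compose(left, right):
    """Try each rule (FA in both orders) and collect every success."""
    attempts = [("FA", fa(left, right)),
                ("FA (reversed)", fa(right, left)),
                ("PM", pm(left, right))]
    return [(name, result) for name, result in attempts if result is not None]

# Toy denotations as (type, value) pairs over a made-up set of facts.
IN = {("Julius", "Texas")}
texas = ("e", "Texas")
julius = ("e", "Julius")
inP = (("e", ET), lambda x: lambda y: (y, x) in IN)
isV = ((ET, ET), lambda p: p)
gray = (ET, lambda x: x in {"Julius"})
cat = (ET, lambda x: x in {"Julius", "Tom"})
```

In this sketch `compose(inP, texas)` succeeds by FA only, `compose(gray, cat)` by PM only, and `compose(julius, isV)` returns an empty list, paralleling the two cells above.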

Composition results are `Composable`s as well, and so can be further composed:

In [None]:
sentence1 = julius * (isV * (inP * texas))
display(sentence1[0].source_tree())
sentence1

In [None]:
sentence1.trace()

Composition will find all possible paths, but in the current example there are no ambiguities. (Note: the metalanguage by default normalizes the order of conjuncts alphabetically, so the order in the output of PM is independent of what composes with what. This is why the operation is marked "commutative" when defined earlier -- so the composition system knows it doesn't need to bother with both orders.)

In [None]:
gray * cat

In [None]:
gray * (cat * (inP * texas))

In [None]:
a = lang.Item("a", isV.content) # identity function for copula as well
isV * (a * (gray * cat * (inP * texas)))

In [None]:
np = ((gray * cat) * (inP * texas))
vp = (isV * (a * np))
sentence2 = julius * vp
sentence2

In [None]:
sentence1.results[0]

In [None]:
#| column: screen-inset-right
sentence1.tree()

In [None]:
#| column: screen-inset-right
sentence2.tree()

Here's a well-known example exercise from @HeimKratzer98 (names different):

    (1) Julius is a gray cat in Texas fond of John.
 
Calculating the denotation of this is relatively straightforward, as long as the lexical items and order of composition are correct.

In [None]:
fond = lang.Item("fond", "L x_e : L y_e : Fond(y)(x)")
ofP = lang.Item("of", "L x_e : x")
sentence3 = julius * (isV * (a * (((gray * cat) * (inP * texas)) * (fond * (ofP * john)))))
display(sentence3[0].source_tree())
sentence3

In [None]:
#| column: screen-inset-right
sentence3.tree()

The `Composite` class supports indexing, so we can pull out subparts of a derivation:

In [None]:
#| column: screen-inset-right
parse_tree3 = sentence3.results[0]
parse_tree3[1][1][1].tree()

There is support for traces and indexed pronouns, using a version of the Predicate Abstraction (PA) rule [based on the version in @IFS24].

In [None]:
binder = lang.Binder(23)
binder2 = lang.Binder(5)
t = lang.Trace(23, types.type_e)
t2 = lang.Trace(5)
display(t, t2, binder)

In [None]:
((t * gray))

In [None]:
b1 = (binder * (binder2 * (t * (inP * t2))))
b2 = (binder2 * (binder * (t * (inP * t2))))
display(b1, b2)

In [None]:
b1.trace()

In [None]:
#| column: screen-inset-right
b1.results[0].tree()
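
The effect of binder order can also be seen in a textbook-style sketch that uses assignment functions, here plain dicts from indices to individuals. This is an assumption about presentation only: the lambda notebook's own treatment of traces and binders may differ, and `trace`, `binder`, `body`, and the facts in `IN` are hypothetical names for this example.

```python
def trace(index):
    """A trace or pronoun: its value is whatever the assignment stores at `index`."""
    return lambda g: g[index]

def binder(index, body):
    """PA: the body's value with `index` re-bound to the function's argument."""
    return lambda g: lambda x: body({**g, index: x})

IN = {("Julius", "Texas")}   # made-up facts: Julius is in Texas

# "t23 in t5": true under g iff the pair (g[23], g[5]) is in IN
body = lambda g: (trace(23)(g), trace(5)(g)) in IN

b1 = binder(23, binder(5, body))   # the outer argument fills index 23
b2 = binder(5, binder(23, body))   # the outer argument fills index 5
```

Starting from the empty assignment, `b1({})("Julius")("Texas")` is true, whereas `b2` needs its arguments in the opposite order to come out true, since its outer binder abstracts over index 5.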

### Composition in tree structures

In many contexts it is natural to just use bottom-up composition, especially given the implementation of the PA rule. However, the lambda notebook supports arbitrarily-sequenced composition in tree structures, with deferred composition when missing bottom-up information. This uses `Tree` objects with an interface designed after those in the [nltk package](https://www.nltk.org/). In these derivations, you can see some of the support for inference over variable types. This composition system more directly matches the classic one presented in @HeimKratzer98.

::: {.content-hidden}

**note** when working with this interactively: running the following cell will change the behavior of many of the prior cells, unless the composition system is reset!
:::

In [None]:
lang.set_system(lang.hk_system)

In [None]:
%%lamb
||gray|| = L x_e : Gray_<e,t>(x)
||cat|| = L x_e : Cat_<e,t>(x)

In [None]:
t2 = Tree("S", ["NP", "VP"])
t2

We *can* do composition in this tree fragment, but the leaf nodes (not being in the lexicon) are treated as placeholders. They are accordingly assigned polymorphic types.

* (The type $\forall X$ is a type that can be anything at all; this gets narrowed by each possible composition rule that could apply to a more specific but still potentially polymorphic type. PM in this system narrows to concrete property types, but each order of FA gives a polymorphic function.)

In [None]:
t2 = Tree("S", ["NP", "VP"])
r2 = lang.compose(t2)
r2.tree()
r2.paths()

We can supply lexical entries to form a complete tree fragment. Composition can still happen in any order, e.g. by default it works top down. If we compose one step the leaf node "gray" is not yet looked up; but composing the whole thing is possible.

In [None]:
Tree = lamb.utils.get_tree_class()
t = Tree("NP", ["gray", Tree("N", ["cat"])])
t

In [None]:
t2 = lang.CompositionTree.tree_factory(t)
r = lang.compose(t2)
r

In [None]:
r.tree()

In [None]:
r2 = lang.get_system().expand_all(t2)
r2

In [None]:
r2.tree()

In [None]:
r2.paths()
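
A much-simplified picture of composition over a tree is a recursive walk that looks leaves up in a lexicon and combines daughters at each branching node. The sketch below is not the lambda notebook's algorithm: it handles only non-branching nodes and Predicate Modification, represents trees as `(label, children)` pairs, and simply returns `None` for material missing from the lexicon, where the system above would instead assign placeholder polymorphic types. The names `lexicon`, `interpret`, and the toy extensions are assumptions for this example.

```python
ET = ("e", "t")

# A toy lexicon of (type, value) pairs with made-up extensions.
lexicon = {
    "gray": (ET, lambda x: x in {"Julius", "Rex"}),
    "cat": (ET, lambda x: x in {"Julius"}),
}

def interpret(tree):
    """Return (type, value) for `tree`, or None if a leaf is not in the lexicon."""
    if isinstance(tree, str):              # leaf: look the word up
        return lexicon.get(tree)
    label, children = tree
    parts = [interpret(child) for child in children]
    if any(part is None for part in parts):
        return None                        # missing information: nothing to compose yet
    if len(parts) == 1:                    # non-branching node: pass the meaning up
        return parts[0]
    (l_type, l_val), (r_type, r_val) = parts
    if l_type == ET and r_type == ET:      # Predicate Modification
        return (ET, lambda x: l_val(x) and r_val(x))
    raise TypeError(f"no rule applies at node {label}")

np_tree = ("NP", ["gray", ("N", ["cat"])])
np_type, np_val = interpret(np_tree)
```

Here `np_val` is the property of being both gray and a cat, while `interpret(("S", ["NP", "VP"]))` returns `None` because neither leaf is a lexical item.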

## Part 3: human readable rendering of structured python objects

At this point in the demo, the output of the lambda notebook needs no introduction: it has been used at every stage of the document. The lambda notebook is designed from the ground up so that nearly every python class supports IPython/Jupyter's rich outputs, rendering in human-readable form to a combination of LaTeX and HTML that can be displayed in Jupyter Notebooks [@Kluyver:2016aa] and on the web. 

* LaTeX math mode output in Jupyter is rendered via [MathJax](https://www.mathjax.org/) (and on Google Colab, via [KaTeX](https://katex.org/)).

For example, here is a lambda expression from earlier:

In [None]:
pmw_test1 = %te L p_t : L x_e : P_<e,t>(x) & p
pmw_test1

This object can be recursively converted into a LaTeX math mode expression, and this automatically happens when displaying such objects in Jupyter.

In [None]:
print(pmw_test1._repr_latex_())
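
The general mechanism is IPython's rich display protocol: any object with a `_repr_latex_` method that returns a `$...$` string is rendered as math in Jupyter. The sketch below shows a recursive conversion of that kind on toy classes; `Expr`, `Atom`, `Conj`, and `Lam` are hypothetical and much simpler than the lambda notebook's own classes, and the exact LaTeX they produce is not meant to match the library's output.

```python
class Expr:
    """Base class: subclasses say how to render themselves as LaTeX."""
    def latex(self):
        raise NotImplementedError

    def _repr_latex_(self):
        # Jupyter calls this hook, if present, to display the object as math.
        return f"${self.latex()}$"

class Atom(Expr):
    def __init__(self, text):
        self.text = text

    def latex(self):
        return self.text

class Conj(Expr):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def latex(self):
        # Recursive step: render both conjuncts, then join them.
        return f"{self.left.latex()} \\wedge {self.right.latex()}"

class Lam(Expr):
    def __init__(self, var, body):
        self.var, self.body = var, body

    def latex(self):
        return f"\\lambda {self.var} . {self.body.latex()}"

expr = Lam("p_{t}", Lam("x_{e}", Conj(Atom("P(x_{e})"), Atom("p_{t}"))))
rendered = expr._repr_latex_()
```

Evaluating `expr` as the last line of a Jupyter cell would display the formula typeset, and `print(rendered)` shows the raw LaTeX string instead.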

More complex outputs, like composition trees, mix LaTeX and HTML output.

# Summary

The lambda notebook provides:

* a flexible api/framework for implementing a logical typed metalanguage
* a framework for implementing compositional systems based on linguistic semantics
* an api for displaying structured objects in a linguist-readable way

It does this in the context of a Jupyter notebook, which further provides facilities for interleaving code (of various kinds) and documentation.

**More info**: please see the many [example and documentation notebooks](https://github.com/rawlins/lambda-notebook/tree/master/notebooks) provided with the project.