pyBPL is a package of tools to implement Bayesian Program Learning (BPL) in Python 3 using a PyTorch backend. The original BPL implementation was written in MATLAB (see Lake et al. (2015), "Human-level concept learning through probabilistic program induction"). I'm a Ph.D. student with Brenden Lake, and I've developed this library for our ongoing modeling work. At the moment, only the forward generative model is complete; inference algorithms are still in the works (contributions welcome!). The library is experimental and under heavy development.
The key components of this repository are:
- A fully-differentiable implementation of BPL character learning tools including symbolic rendering, spline fitting/evaluation, and model scoring (log-likelihoods).
- A generalized framework for representing concepts and conceptual background knowledge as probabilistic programs. Character concepts are one manifestation of the framework, included here as the preliminary use case.
I am thankful to Maxwell Nye, Mark Goldstein and Tuan-Anh Le for their help developing this library.
This code repository requires Python 3 and PyTorch >= 1.0.0. A full list of requirements can be found in requirements.txt.
To install, first run the following command to clone the repository into a folder of your choice:
git clone https://github.com/rfeinman/pyBPL.git
Then, run the following command to install the package:
python setup.py install
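If the installation succeeded, the package should be importable from any directory. A quick sanity check (the Library import mirrors the usage example below):
import pybpl
from pybpl.library import Library
print(pybpl.__file__)  # path of the installed package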
In order to generate the documentation site for the pyBPL library, execute the following commands from the root folder:
cd docs/
make html
HELP WANTED: the documentation build is currently broken and needs to be fixed.
The following code loads the BPL model with pre-defined hyperparameters and samples a character type, token, and image:
from pybpl.library import Library
from pybpl.model import CharacterModel
# load the hyperparameters of the BPL graphical model (i.e. the "library")
lib = Library(use_hist=True)
# create the BPL graphical model
model = CharacterModel(lib)
# sample a character type from the prior P(Type) and score its log-probability
char_type = model.sample_type()
ll_type = model.score_type(char_type)
# sample a character token from P(Token | Type=type) and score its log-probability
char_token = model.sample_token(char_type)
ll_token_given_type = model.score_token(char_type, char_token)
# sample an image from P(Image | Token=token)
image = model.sample_image(char_token)
ll_image_given_token = model.score_image(char_token, image)
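Continuing the snippet above, the sampled image can be visualized with matplotlib. This is a minimal sketch, assuming image is a 2D torch tensor (which follows from the library's PyTorch backend):
import matplotlib.pyplot as plt

# convert the torch tensor to NumPy before plotting
plt.imshow(image.detach().numpy(), cmap='Greys')
plt.axis('off')
plt.show()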
All functions required to sample character types, tokens, and images are now complete.
Currently, independent relations sample their position from a uniform distribution over the entire image window by default.
To use the original spatial histogram from BPL, make sure to load the Library object with use_hist=True.
Note, however, that log-likelihoods for spatial histograms are not differentiable.
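Concretely, the two configurations differ only in the use_hist flag passed to the Library constructor (treating use_hist=False as the default, per the behavior described above):
from pybpl.library import Library

# default behavior: independent relations place strokes uniformly over the
# image window (fully differentiable)
lib_uniform = Library(use_hist=False)

# original BPL behavior: positions are drawn from the learned spatial
# histogram (log-likelihoods not differentiable)
lib_hist = Library(use_hist=True)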
My Python implementations of the bottom-up image parsing algorithms are not yet complete (HELP WANTED! see pybpl/bottomup for current status).
However, I have provided some wrapper functions that call the original matlab code using the MATLAB Engine API for Python.
These functions are located in pybpl/matlab/bottomup. You must have the MATLAB Engine API for Python installed to use this code.
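For reference, here is a rough sketch of how the MATLAB Engine API is started and pointed at the original BPL code from Python; this is not pyBPL's wrapper API itself (check pybpl/matlab/bottomup for the actual functions):
import os
import matlab.engine

# start a MATLAB session from Python (requires the MATLAB Engine API)
eng = matlab.engine.start_matlab()

# make the original BPL repository visible to the engine; BPL_PATH is the
# environment variable described in the prerequisites below
bpl_path = os.environ.get('BPL_PATH', '/path/to/BPL')
eng.addpath(eng.genpath(bpl_path), nargout=0)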
The library contains all of the parameters of the character learning BPL
model. These parameters have been learned from the Omniglot dataset.
The library data is stored as a series of .mat files in the subfolder lib_data/. I've included a MATLAB script, process_library.m, which can be run inside the original BPL repository to obtain this folder of files. For an example of how to load the library, see examples/generate_character.py.
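The .mat files can also be inspected directly with SciPy if you want to see what the library contains. A small sketch, assuming the processed lib_data/ folder sits in the repository root:
import glob
import scipy.io

# list the .mat files produced by process_library.m
mat_files = sorted(glob.glob('lib_data/*.mat'))
print(mat_files)

# peek at the variables stored in one file (keys starting with '__' are MATLAB metadata)
if mat_files:
    data = scipy.io.loadmat(mat_files[0])
    print([k for k in data if not k.startswith('__')])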
Currently there are 3 working demos, all found in the examples subfolder.
You can generate a character type and sample a few tokens of the type by running the following command from the root folder:
python examples/generate_character.py
The script will sample a character type from the prior and then sample 4 tokens of the type, displaying the images.
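The demo is roughly equivalent to the following sketch, assembled from the API shown earlier (the plotting layout is illustrative, not the script's exact output):
import matplotlib.pyplot as plt
from pybpl.library import Library
from pybpl.model import CharacterModel

lib = Library(use_hist=True)
model = CharacterModel(lib)

# sample one character type, then four tokens (and images) of that type
char_type = model.sample_type()
fig, axes = plt.subplots(1, 4)
for ax in axes:
    char_token = model.sample_token(char_type)
    image = model.sample_image(char_token)
    ax.imshow(image.detach().numpy(), cmap='Greys')
    ax.axis('off')
plt.show()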
You can generate a character type and then optimize its parameters to maximize the likelihood of the type under the prior by running the following command from the root folder:
python examples/optimize_type.py
Optionally, you may add the integer parameter --ns=<int> to specify how many strokes you would like the generated character type to have.
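For example, to constrain the sampled character type to 2 strokes:
python examples/optimize_type.py --ns=2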
To use the bottom-up parsing code, you must meet the following prerequisites:
- You must have an active MATLAB installation and must have installed the MATLAB Engine API for Python.
- You must download the BPL MATLAB repository and all of its prerequisites, including the Lightspeed toolbox. The BPL repo must be added to your MATLAB path (alternatively, you may set a BPL_PATH environment variable: export BPL_PATH="/path/to/BPL").
With these prerequisites met, you can produce bottom-up parses using the skeleton extraction + random walks algorithm with the following example script:
python examples/parse_image.py
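Before running the script, you can sanity-check the prerequisites from Python (this snippet is not part of pyBPL's API; it only verifies that the engine bindings are installed and that BPL_PATH is visible):
import os

# the MATLAB Engine API must be importable
try:
    import matlab.engine  # noqa: F401
    print('MATLAB Engine API: OK')
except ImportError:
    print('MATLAB Engine API: not installed')

# BPL_PATH is optional if the BPL repo is already on your MATLAB path
print('BPL_PATH:', os.environ.get('BPL_PATH', 'not set'))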
If you use pyBPL for your research, you are encouraged (though not required) to cite this repository with the following BibTeX reference:
@misc{feinman2020pybpl,
  title={{pyBPL}},
  author={Feinman, Reuben},
  year={2020},
  version={0.1},
  url={https://github.com/rfeinman/pyBPL}
}