CS492 Probabilistic Programming, Spring 2018, KAIST

This is the webpage of the course "CS492 Probabilistic Programming", offered by the KAIST CS department in spring 2018. It will contain links to lecture slides and other course-related materials.

Probabilistic programming refers to the idea of developing a programming language for writing and reasoning about probabilistic models from machine learning and statistics. Such a language comes with implementations of several generic inference algorithms that answer various queries about the models written in the language, such as posterior inference and marginalisation. By providing these algorithms, a probabilistic programming language lets data scientists focus on designing good models based on their domain knowledge, instead of building effective inference engines for those models, a task that typically requires expertise in machine learning, statistics and systems. Even experts in machine learning and statistics may benefit from such a system, because it lets them easily explore highly advanced models.
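To make this division of labour concrete, here is a minimal Python sketch. It is not Anglican, and it is not how Anglican is implemented. The model is written as a generator that only yields `sample` and `observe` requests, and a separate, model-agnostic likelihood-weighting engine (a simple form of importance sampling that uses the prior as the proposal) answers a posterior query. All names here (`coin_model`, `likelihood_weighting`, `Uniform01`, `Bernoulli`, and the request tuples) are illustrative assumptions.

```python
import math
import random

# Hypothetical mini-language: a model is a Python generator that yields
# requests to an inference engine instead of drawing randomness itself.
#   ("sample", dist)         -> the engine sends back a value drawn from dist
#   ("observe", dist, value) -> the engine scores value under dist

class Uniform01:
    def sample(self, rng):
        return rng.random()

class Bernoulli:
    def __init__(self, p):
        self.p = p

    def log_prob(self, x):
        q = self.p if x else 1.0 - self.p
        return math.log(q) if q > 0 else -math.inf

def coin_model(flips):
    # Unknown coin bias with a uniform prior, conditioned on observed flips.
    bias = yield ("sample", Uniform01())
    for f in flips:
        yield ("observe", Bernoulli(bias), f)
    return bias

def likelihood_weighting(model, args, num_samples, seed=0):
    """Generic engine: runs any generator-style model, returns (value, log_weight) pairs."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_samples):
        gen = model(*args)
        log_w, reply = 0.0, None
        try:
            while True:
                request = gen.send(reply)
                if request[0] == "sample":
                    reply = request[1].sample(rng)             # draw from the prior
                else:
                    log_w += request[1].log_prob(request[2])   # weight by the likelihood
                    reply = None
        except StopIteration as stop:
            results.append((stop.value, log_w))
    return results

def weighted_mean(pairs):
    m = max(w for _, w in pairs)  # subtract the max log-weight for numerical stability
    ws = [math.exp(w - m) for _, w in pairs]
    return sum(x * w for (x, _), w in zip(pairs, ws)) / sum(ws)

flips = [True, True, True, False]
est = weighted_mean(likelihood_weighting(coin_model, (flips,), 20000))
print(f"posterior mean of bias ~ {est:.3f}")  # exact answer: Beta(4, 2) mean = 4/6
```

Because the engine only interprets requests, the same `likelihood_weighting` function would run any other model written in this style, and swapping in a different inference algorithm would not require touching the model.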

This course has two goals. The first is to help students become good users of an expressive probabilistic programming language. Throughout the course, we will use a particular language, called Anglican, but we will emphasise general principles that apply to a wide range of existing probabilistic programming systems. The second goal is to expose students to recent exciting results in probabilistic programming, which come from machine learning, statistics, programming languages, and probability theory.

1. Important Announcements

[June 7] Short report on your project by 10 June.

Submit a 1-2 page report on your project by midnight on 10 June (Sunday). You can email it to me and the TA, or put it in the usual homework box in the E3-1 building. Also, if you would like the slides of your project presentation to be posted on the course webpage, email me your slides. Otherwise, you don't need to do anything.

[June 4] Final exam on 14 June.

The final exam of this course will take place in our usual classroom from 9:00am to 11:00am on 14 June (Thursday next week). It is a closed-book exam. Make sure that you know how to solve the homework exercises.

[May 31] Project presentation on 5 and 7 June.

Each project group will present its work on the 5th or the 7th of June. Each group will have 20 minutes for its presentation and questions. On the 5th, the following four groups will present their work:

  1. Dongkyeun Kim, Dongkwan Kim and Changyoung Koh.
  2. Kwonsoo Chae, Seongmin Lee and HyungJin Lim.
  3. Youngkyeong Bae, Kyoungyeon Lee and Sunghoi Lee.
  4. Faycal Baki, Timo Moilanen and Jonas Nikula.

On the 7th, the following three groups will present their results:

  1. Jiseok Kim, Jongwan Lee and Jaeyoung Whang.
  2. Dohyeong Kim, Kangsan Kim and Ohjun Kwon.
  3. Kwanwoo Kim, Youngjo Min and Sungje Moon.

[May 22] Homework4 is out.

The due date is 2:00pm on June 1 2018 (Friday). Submit your solutions by putting them in the homework submission box on the third floor of the E3-1 building.

[April 30] Homework3 is out.

The due date is 2:00pm on May 16 2018 (Wednesday). Submit your solutions by putting them in the homework submission box on the third floor of the E3-1 building.

[April 1] Important information about Homework2.

Changyoung Koh found an issue with one of the problems in Homework 2, and he also found a fix. Please have a look at his message, copied below:

==== Changyoung's message: from here ====

I don't know whether everyone has the same issue, but in my case, I got an error saying that Clojure could not locate the package.

This error can be resolved by modifying project.clj:

  1. Add the line [clojure-csv/clojure-csv "2.0.1"] to :dependencies [...]
  2. Run $ lein deps to install the missing dependencies.
  3. Relaunch the Gorilla REPL to load the updated dependencies.

==== Changyoung's message: to here ====

[April 1] Homework2 is out.

The due date is 2:00pm on April 23 2018 (Monday). Submit your solutions by putting them in the homework submission box on the third floor of the E3-1 building.

[March 24] Reminder of the group project pitch on April 5.

During our lecture time on the 5th of April, we will have a project-pitch session. Each of the 8 project groups in Track A will give a 7-minute presentation on what it plans to do, and then get about 3 minutes of feedback from the audience. So, please pick a topic for your group project and prepare a presentation. Groups will present in the order in which they are listed on the course webpage:

  1. Dongkyeun Kim, Dongkwan Kim and Jungyeun Moon.
  2. Kwonsoo Chae, Seongmin Lee and HyungJin Lim.
  3. Youngkyeong Bae, Kyoungyeon Lee and Sunghui Lee.
  4. Faycal Baki, Timo Moilanen and Jonas Nikula.
  5. Changyoung Koh.
  6. Jiseok Kim, Jongwan Lee and Jaeyoung Whang.
  7. Dohyeong Kim, Kangsan Kim and Ohjun Kwon.
  8. Kwanwoo Kim, Youngjo Min and Sungje Moon.

This is a tentative plan. Some students might drop this course in the next few weeks, so the plan may change.

[March 14] TA office hour

Kwonsoo will hold office hours from 4:00pm to 6:00pm on Tuesdays in room 3431 of the E3-1 building, starting on March 20.

[March 10] Homework1 is out.

The due date is 2:00pm on March 30 2018 (Friday). Submit your solutions by putting them in the homework submission box on the third floor of the E3-1 building.

[February 28] One of the two subjects in Track B, automatic differentiation, has been taken by a group of four students.

[February 27] Homework0 is out.

You don't have to submit your answers, but we strongly recommend that you try it. This homework will teach you how to run Anglican.

[February 27] Project group.

The group project is an important part of this course. Find your project partners by March 13 2018, and inform Hongseok and Kwonsoo by email. Each group should consist of 3-4 students. If your group wants to go for Track B, contact Hongseok by email as early as possible. Both topics in Track B are mathematically heavy.

2. Logistics

Evaluation

  • Final exam (40%). Project (40%). Homework (20%).

Teaching Staff

Place and Time

  • Place: room 111 in the N1 building
  • Time: 10:30am - 11:45am on Tuesdays and Thursdays from February 27 2018 until June 14 2018.
  • Final exam: 9:00am - 11:00am on June 14 2018 (Thursday) in room 111 of the N1 building.

Online Discussion

  • We will use KLMS.

3. Homework

Submit your solutions by putting them in the homework submission box on the third floor of the E3-1 building.

  • Homework0 - Don't submit.
  • Homework1 - Deadline: 2:00pm on March 30 2018 (Friday).
  • Homework2 - Deadline: 2:00pm on April 23 2018 (Monday).
  • Homework3 - Deadline: 2:00pm on May 16 2018 (Wednesday).
  • Homework4 - Deadline: 2:00pm on June 1 2018 (Friday).

4. Tentative Plan

  • 02/27 (Tue) - Introduction. Slides. Homework0.
  • 03/01 (Thu) - NO LECTURE. Independence Movement Day.
  • 03/06 (Tue) - Basics of Clojure and Tiny Bit of Anglican. Slides. Gorilla worksheet. Programs.
  • 03/08 (Thu) - Basics of Clojure and Tiny Bit of Anglican.
  • 03/13 (Tue) - Posterior Inference, Basics of Anglican, and Importance Sampling. Slides. Homework1. Gorilla worksheet.
  • 03/15 (Thu) - Posterior Inference, Basics of Anglican, and Importance Sampling.
  • 03/20 (Tue) - Generative Modelling with Anglican. Slides. Gorilla worksheet for 2D physics. Gorilla worksheet for program induction. (Note: In order to run the 2D physics example, you will have to copy bounce.clj to the anglican-user/src directory and replace anglican-user/project.clj with this file.)
  • 03/22 (Thu) - Generative Modelling with Anglican.
  • 03/27 (Tue) - Markov Chain Monte Carlo. Slides.
  • 03/29 (Thu) - Markov Chain Monte Carlo.
  • 04/03 (Tue) - Markov Chain Monte Carlo. Homework2.
  • 04/05 (Thu) - Group Project Pitch.
  • 04/10 (Tue) - Implementing Inference Algorithms for Probabilistic Programs. Slides. Note. Handwritten Notes: 1, 2, 3, 4, 5, 6
  • 04/12 (Thu) - Implementing Inference Algorithms for Probabilistic Programs.
  • 04/17 (Tue), 04/19 (Thu) - NO LECTURES. Midterm Exam.
  • 04/24 (Tue) - Implementing Inference Algorithms for Probabilistic Programs. Handwritten Notes: 7, 8, 9, 10
  • 04/26 (Thu) - Implementing Inference Algorithms for Probabilistic Programs.
  • 05/01 (Tue) - Stochastic Variational Inference. Slides. Gorilla worksheet for some variational-inference examples. Handwritten Notes: 1, 2, 3. Homework3.
  • 05/03 (Thu) - Stochastic Variational Inference.
  • 05/08 (Tue) - Amortised Inference. Slides.
  • 05/10 (Thu) - Group Presentation 1: Automatic Differentiation
  • 05/15 (Tue) - Normalising Flow
  • 05/17 (Thu) - Denotational Semantics of Probabilistic Programs. Slides. Note. Homework4.
  • 05/22 (Tue) - NO LECTURE. Buddha's Birthday.
  • 05/24 (Thu) - Denotational Semantics of Probabilistic Programs.
  • 05/29 (Tue) - Denotational Semantics of Probabilistic Programs.
  • 05/31 (Thu) - Denotational Semantics of Probabilistic Programs.
  • 06/05 (Tue) - Student Presentation
  • 06/07 (Thu) - Student Presentation
  • 06/12 (Tue), 06/14 (Thu) - NO LECTURES. Final Exam.

5. Studying Materials

Studying the lecture slides, notes and homework exercises of the course is likely to be the most time-efficient way to keep up with this course. Also, at each lecture, we will give students pointers to related papers. If a student does not understand a certain concept, we encourage him or her to look it up on the Internet. This is what we typically do ourselves when we run into a similar problem; in our case, Wikipedia, lecture notes and survey articles have helped the most.

The next best option is to read the following draft book on probabilistic programming:

  1. "An Introduction to Probabilistic Programming" by Jan-Willem van de Meent, Brooks Paige, Hongseok Yang and Frank Wood. If other authors allow, Hongseok will distribute the book to the students.

Reading this book will give a broader view of probabilistic programming and a much deeper understanding of its inference algorithms and their implementations.

If a student feels that she or he lacks background knowledge in machine learning, we recommend having a look at the following online materials.

  1. The online book "Probabilistic Programming and Bayesian Methods for Hackers" describes Bayesian Machine Learning using a probabilistic programming system called PyMC. Hongseok found this book easy to follow and good at explaining basics and intuitions.
  2. A more standard reference on machine learning is Bishop's book "Pattern Recognition and Machine Learning".

Two good ways to understand probabilistic programming are to try a wide range of examples and to understand common implementation techniques for probabilistic programming languages. The following documents provide such examples or explain those techniques.

  1. Anglican website. In particular, students will learn a lot by trying examples in the site.
  2. Forestdb.org is a great source of interesting probabilistic programs.
  3. Edward tutorial website and Pyro example website. Edward and Pyro are so-called deep probabilistic programming languages, which attempt to combine deep learning and probabilistic programming. These web pages contain interesting examples that one can try using these languages.
  4. Goodman and Stuhlmuller's book "The Design and Implementation of Probabilistic Programming Languages". This web-based book describes the implementation of WebPPL, a probabilistic programming language on top of JavaScript. Many techniques in the book are general and apply to other probabilistic programming languages.

6. Group Project

A group project is a crucial part of this course. 3-4 students will form a project group, and they will carry out a project in Track A or in Track B:

  1. Track A: A group develops an interesting application of Anglican or other probabilistic programming languages. The members of the group may attempt to find an efficient encoding of a highly complex probabilistic model (such as the sequence memoizer) in Anglican, or they may develop a new probabilistic model for a complex data set and analyse the data set, or they may try to find a novel use of probabilistic programming for solving well-known existing problems (such as figuring out a secret key in a security protocol).
  2. Track B: At most two groups will be on this track. We recommend this track only for groups that feel comfortable with advanced mathematics. The goal of a group on this track is to study an advanced research topic in probabilistic programming, to gain a deep understanding of it, and to help fellow students acquire the same understanding. Specifically, a group performs an in-depth study of one of two advanced topics, (reverse-mode) automatic differentiation and normalising flow, which are used in or supported by recent deep probabilistic programming languages such as Edward and Pyro. Then, the group has to teach what it learnt to the other students in the course. By teaching, we mean (i) giving a presentation on the studied topic and (ii) preparing reading material and exercise problems. Further information about automatic differentiation and normalising flow is given at the end of this webpage.
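For readers new to the first Track B topic, here is a minimal sketch of scalar reverse-mode automatic differentiation in Python. A forward pass records each operation together with its local derivatives, and a backward sweep in reverse topological order accumulates adjoints from the output back to the inputs. The `Var`, `sin` and `backward` names are our own illustrative choices; they are not the API of PyTorch, TensorFlow, Autograd, DiffSharp or any other system mentioned on this page.

```python
import math

class Var:
    """A scalar node in a recorded computation graph (illustrative only)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent_node, local_derivative)
        self.grad = 0.0         # adjoint d(output)/d(this node), filled by backward()

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))
    __rmul__ = __mul__

def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))

def backward(output):
    # Post-order DFS: every node is listed after all of its parents.
    order, seen = [], set()
    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)
    visit(output)
    # Reverse sweep: each node's adjoint is complete before it is pushed to its parents.
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad

x, y = Var(2.0), Var(3.0)
f = x * y + sin(x)       # f(x, y) = x*y + sin(x)
backward(f)
print(x.grad, y.grad)    # df/dx = y + cos(x), df/dy = x
```

A single backward sweep yields the derivative with respect to every input at once, which is why the reverse mode suits functions with many inputs and a scalar output, such as a log density or a variational objective.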

Group (Track A)

  1. Dongkyeun Kim, Dongkwan Kim and Changyoung Koh.
  2. Kwonsoo Chae, Seongmin Lee and HyungJin Lim.
  3. Youngkyeong Bae, Kyoungyeon Lee and Sunghoi Lee.
  4. Faycal Baki, Timo Moilanen and Jonas Nikula.
  5. Jiseok Kim, Jongwan Lee and Jaeyoung Whang.
  6. Dohyeong Kim, Kangsan Kim and Ohjun Kwon.
  7. Kwanwoo Kim, Youngjo Min and Sungje Moon.

Group (Track B)

  1. Donghoon Ham, Youngkyu Hong, Hangyeol Yu and Sihyeon Yu - automatic differentiation.

Concrete Tasks

  1. [Deadline: midnight on March 13 (Tue)] Form a group and inform us by email (Hongseok and Kwonsoo).
  2. [April 5 (Thu)] Presentation of each group about its topic.
  3. [May 10 (Thu)] Presentation on automatic differentiation by a group on track B.
  4. [May 15 (Tue)] Presentation on normalising flow by a group on track B.
  5. [June 5 (Tue), June 7 (Thu)] Presentation of group projects.
  6. [Deadline: midnight on June 8 (Fri)] Submit a report on the project. We recommend that each report be 2 to 4 pages long, although a longer report is fine if needed.

Two Topics in Track B

  1. (Reverse-mode) Automatic Differentiation: Automatic differentiation is one of the main driving technologies behind neural nets and deep probabilistic programming languages. Supporting it has been one of the main objectives of TensorFlow and PyTorch, two popular platforms for building neural nets and other machine-learning related software systems. Automatic differentiation is based on nontrivial mathematics; it originates from a non-standard interpretation of mathematical analysis, and its modern generalisation has sometimes been formalised using tools from differential geometry. Pyro, ProbTorch, Edward and Stan heavily use automatic differentiation. A group will have to understand reverse-mode automatic differentiation and explain how it has been or can be used to implement inference algorithms for probabilistic programming languages. Here are a few references that will help a group find relevant papers.

    1. Rahul's blog article.
    2. Blog article in Rufflewind's scratchpad.
    3. Baydin et al.'s "Automatic Differentiation in Machine Learning: a Survey".
    4. Conal Elliott's PEPM18 talk slides and video.
    5. Automatic differentiation in PyTorch and TensorFlow. There is a short paper about the implementation in PyTorch.
    6. Autograd aims to implement a very flexible version of automatic differentiation for Python.
    7. DiffSharp is another well-known implementation of automatic differentiation, for the F# language.
  2. Normalising Flow: Normalising flow is a powerful technique for building an approximating distribution in variational inference. It was proposed in the context of general machine-learning problems, but it has become an important construct, called a bijector, in the recent probabilistic programming languages Edward and Pyro. Here are a few references that will help a group start its study of this subject:

    1. Eric Jang's blog articles, first and second.
    2. Dillon et al.'s paper "TensorFlow Distributions" explains the bijector implementation of normalising flow in the probabilistic programming language Edward.
    3. Documentation on bijector and inverse autoregressive flow in Pyro.
    4. Rezende and Mohamed's original paper "Variational Inference with Normalizing Flows".
    5. Kingma et al.'s paper "Improved Variational Inference with Inverse Autoregressive Flow".
    6. Germain et al.'s paper "MADE: Masked Autoencoder for Distribution Estimation"
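As a starting point for the normalising-flow topic, here is a minimal one-dimensional Python sketch of the change-of-variables rule that flows rely on. A sample is drawn from a standard normal and pushed through a chain of invertible maps; its log-density is the base log-density at the inverted point minus the sum of the layers' log-absolute-derivatives, i.e. log p_Y(y) = log p_Z(f^{-1}(y)) - log|f'(f^{-1}(y))|. The `Affine`, `Sinh` and `Flow` classes and their method names are hypothetical; they only loosely mirror the forward / inverse / log-det-Jacobian structure of bijectors and are not the API of TensorFlow Distributions, Edward or Pyro. Practical flows (for example the inverse autoregressive flow referenced above) use learnable, multidimensional layers.

```python
import math
import random

class Affine:
    """y = scale * x + shift, with scale != 0 (hypothetical layer)."""
    def __init__(self, scale, shift):
        self.scale, self.shift = scale, shift
    def forward(self, x):
        return self.scale * x + self.shift
    def inverse(self, y):
        return (y - self.shift) / self.scale
    def log_abs_det_jacobian(self, x):
        return math.log(abs(self.scale))

class Sinh:
    """y = sinh(x), a smooth bijection from R onto R (hypothetical layer)."""
    def forward(self, x):
        return math.sinh(x)
    def inverse(self, y):
        return math.asinh(y)
    def log_abs_det_jacobian(self, x):
        return math.log(math.cosh(x))  # d/dx sinh(x) = cosh(x) > 0

def std_normal_log_prob(z):
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

class Flow:
    """Standard normal base distribution pushed through a list of layers."""
    def __init__(self, layers):
        self.layers = layers
    def sample(self, rng):
        x = rng.gauss(0.0, 1.0)
        for layer in self.layers:
            x = layer.forward(x)
        return x
    def log_prob(self, y):
        # Invert the layers last-to-first; each log|det J| is evaluated at the layer's input.
        log_det = 0.0
        for layer in reversed(self.layers):
            x = layer.inverse(y)
            log_det += layer.log_abs_det_jacobian(x)
            y = x
        return std_normal_log_prob(y) - log_det

flow = Flow([Affine(0.5, 0.2), Sinh(), Affine(1.5, -1.0)])
rng = random.Random(0)
samples = [flow.sample(rng) for _ in range(5)]

# Sanity check: the transformed density should integrate to (about) 1.
step = 0.001
total = sum(math.exp(flow.log_prob(-30 + i * step)) * step for i in range(60000))
print(f"integral of the flow density ~ {total:.4f}")
```

In variational inference the layer parameters (here `scale` and `shift`) would be optimised, typically using gradients obtained by automatic differentiation, so that the flow's density approximates the target posterior.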