Skip to content


Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Latest commit


Git stats


Failed to load latest commit information.
Latest commit message
Commit time


This is the repository for the paper Inducing Semantic Roles Without Syntax, by Julian Michael and Luke Zettlemoyer, published in Findings of ACL 2021.

Abstract: Semantic roles are a key component of linguistic predicate-argument structure, but developing ontologies of these roles requires significant expertise and manual effort. Methods exist for automatically inducing semantic roles using syntactic representations, but syntax can also be difficult to define, annotate, and predict. We show it is possible to automatically induce semantic roles from QA-SRL, a scalable and ontology-free semantic annotation scheme that uses question-answer pairs to represent predicate-argument structure. By associating arguments with distributions over QA-SRL questions and clustering them in a mixture model, our method outperforms all previous models as well as a new state-of-the-art baseline over gold syntax. We show that our method works because QA-SRL acts as surrogate syntax, capturing non-overt arguments and syntactic alternations, which are central motivators for the use of semantic role labeling systems.

This repository contains code to replicate the results in the paper and supporting algorithms for doing similar experiments with other data and features.


The bulk of the code is written in Scala, organlized into three modules under qasrl-roles/:

  • clustering/: Implementation of hybrid flat/agglomerative clustering algorithms.
  • modeling/: Construction of features and experimental/analysis pipelines.
  • browse/: A webapp to browse the completed clusters.

The neural network models of QA-SRL are written with PyTorch/AllenNLP, and they are available in the qasrl-modeling repository, brought in here as a submodule.

Manual analysis data from Section 6 of the paper is recorded in this Google Sheet.


To run the Scala code, you need the Mill build tool.

Then to replicate the results in the paper, run scripts/, or just read it for guidance if you are interested in a subset of the experiments. The main entry point is scripts/, which calls into RoleInductionApp. You'll probably want to tail -f stderr.log in another pane/window to see all details, as stderr is redirected there so as not to interfere with the freelog console output written to stdout.

Roleset Browser

You can browse the induced rolesets in a webapp by running the following command:

mill -i qasrl-roles.browse.serve --data conll08-lemma --mode test --domain localhost --port 8888

This will visualize the models that have results reported in the paper (if you have run them; see scripts/ To automatically reconstruct all missing rolesets, run the browser command with the --all flag added (it will take a while if you need all of them).


There's a lot more functionality in this repository for related experiments, such as clustering on lexical (masked language modeling based) features, inducing predicate senses, and jointly inducing semantic roles and predicate senses. The results of these experiments weren't too great, but they might be an interesting starting or reference point for others interested in similar problems. I haven't done much in the way of documentation, so if you're interested in exploring more, reusing any of the code or algorithms, or building on this work, please get in touch.


Bibtex coming soon via the ACL Anthology.


Repository for the paper "Inducing Semantic Roles without Syntax" in Findings of ACL 2021.







No releases published


No packages published