CALM Task

This library implements the task described in the paper Evaluating Online Continual Learning with CALM.

Installation

Download the required corpora by going into the data directory and running:

pip install -r requirements.txt
python create_datasets.py

Example Usage

# dummy model definition (stand-ins for a real model's loss computation and update step)
import random
def get_loss(x, y):
    return random.random()
def train():
    pass

import calm
import numpy as np
np.random.seed(42)  # different seeds will generate different segmentations
MultiLingual_dev_10k = calm.CALM("data/news_dev", "char", switch_frequency=10000, nswitches=100, window_size=20, batch_size=10, cuda=False)
results_fn = "loss_log.jsonl"
with open(results_fn, "w") as flog:
    for x, y in MultiLingual_dev_10k:
        loss = get_loss(x, y)
        train()
        calm.log_loss(flog, MultiLingual_dev_10k, loss)
print(calm.compute_metrics(results_fn))
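
In the example above, get_loss and train are placeholders. The sketch below shows how one might replace them with a small PyTorch character-level language model. It is only a sketch under assumptions not stated in this README: that the iterator yields LongTensors of token IDs shaped (window_size, batch_size) and that the vocabulary size is known in advance; TinyCharLM, VOCAB_SIZE and the optimizer choice are all illustrative. With these definitions, the loop would call train(loss) after computing the loss and pass loss.item() to calm.log_loss.

# Hypothetical replacement for the dummy get_loss/train stubs above.
# Assumption: x and y are LongTensors of token IDs, shape (window_size, batch_size).
import torch
import torch.nn as nn

VOCAB_SIZE = 256  # illustrative character-vocabulary size, not specified by the library

class TinyCharLM(nn.Module):
    def __init__(self, vocab_size=VOCAB_SIZE, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.LSTM(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))   # (window_size, batch_size, hidden)
        return self.out(h)             # (window_size, batch_size, vocab_size)

model = TinyCharLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

def get_loss(x, y):
    # next-token cross-entropy over all positions and streams
    logits = model(x)
    return criterion(logits.view(-1, VOCAB_SIZE), y.view(-1))

def train(loss):
    # one online update on the batch that was just evaluated
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()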

Important methods and objects

The module calm.py contains the following methods and objects (a word-level usage sketch follows this list):

  • CALM: creates a data iterator for the CALM tasks. It implicitly depends on the numpy random generator to randomly switch between classes. It takes the following arguments:
    • corpus_path: path to the root of a CALM corpus (e.g. data/news_dev).
    • model_level: one of "char" or "word". (str)
    • switch_frequency: mean sequence length in number of tokens. (int)
    • nswitches: number of switches occurring within the same class. (int)
    • window_size: number of consecutive tokens taken per batch. (int)
    • batch_size: number of concurrent text streams. (int)
    • cuda: send data to GPU (bool)
  • log_loss: takes a file handle, a CALM iterator object, and the loss, and logs the corresponding information to that file.
  • compute_metrics: takes a log filename and computes the following metrics, also disaggregated by class:
    • loss: mean loss.
    • total_pp: mean overall perplexity.
    • ppl@sw: mean perplexity restricted to the first 10 batches after a switch.
    • rec: number of batches the model needs to recover to the mean loss it obtained on the last sequence of the same class.
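
Putting these pieces together, here is a minimal sketch (not from the original README) that runs the word-level task under several numpy seeds, since each seed produces a different segmentation, and collects the metrics of every run. It only uses the documented calm.CALM, calm.log_loss and calm.compute_metrics calls; the corpus path and parameter values mirror the example above, and get_loss/train are the placeholders from the Example Usage section (with the PyTorch sketch above, call train(loss) instead).

import calm
import numpy as np

seeds = (0, 1, 2)
per_seed_metrics = []
for seed in seeds:
    np.random.seed(seed)  # each seed yields a different class segmentation
    task = calm.CALM("data/news_dev", "word", switch_frequency=10000,
                     nswitches=100, window_size=20, batch_size=10, cuda=False)
    results_fn = "loss_log_word_seed{}.jsonl".format(seed)
    with open(results_fn, "w") as flog:
        for x, y in task:
            loss = get_loss(x, y)   # evaluate on the incoming batch first...
            train()                 # ...then train on it (online setting)
            calm.log_loss(flog, task, loss)
    per_seed_metrics.append(calm.compute_metrics(results_fn))

for seed, metrics in zip(seeds, per_seed_metrics):
    print(seed, metrics)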

Replication of the paper's results

If you want to replicate our own experiments, you can find the code we used here.

Citation

Should you want to cite this work, you can use the following BibTeX entry:

@article{Kruszewski:Sorodoc:Mikolov:2020,
  title={Evaluating Online Continual Learning with CALM},
  author={Kruszewski, Germ{\'a}n and Sorodoc, Ionut-Teodor and Mikolov, Tomas},
  journal={arXiv preprint arXiv:2004.03340},
  year={2020}
}
