Large corpus of uncompressed and compressed sentences from news articles.
The dataset is provided "AS IS" without any warranty, express or implied. Google disclaims all liability for any damages, direct or indirect, resulting from the use of the dataset.
The algorithm used to collect the data is described in: Katja Filippova and Yasemin Altun, "Overcoming the Lack of Parallel Data in Sentence Compression," Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP '13), pp. 1481-1491.