TabText: A Flexible and Contextual Approach to Tabular Data Representation
This code accompanies the paper TabText (https://arxiv.org/abs/2110.15829) by Kimberly Villalobos Carballo, Liangyuan Na, Yu Ma, Léonard Boussioux, Cynthia Zeng, Luis R. Soenksen, and Dimitris Bertsimas.
The paper presents a systematic framework that leverages language to extract contextual information from tabular structures, resulting in more complete data representations. We investigate how several syntactic parsing schemes affect the performance of TabText representations, and we show that TabText is effective for data preprocessing that is otherwise labor-intensive. Our experiments demonstrate that augmenting tabular data with our TabText representations can improve the AUC score by up to 6% across nine healthcare classification tasks.
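To make the idea of expressing a table row in language concrete, here is a minimal sketch in Python. It is an illustration only and not the code in this repository: the function name `row_to_text`, the `descriptions` mapping, and the "column is value" sentence template are all assumptions chosen for the example. The step that turns the resulting text into a numerical representation is omitted.

```python
def row_to_text(row, descriptions=None, missing="is missing"):
    """Serialize one table row (a dict of column -> value) into a sentence.

    Hypothetical helper for illustration; the template is an assumption,
    not the scheme used in the paper.
    """
    descriptions = descriptions or {}
    parts = []
    for column, value in row.items():
        # Prefer a human-readable description of the column when one is given.
        name = descriptions.get(column, column)
        if value is None:
            # Missing values are stated in words instead of being imputed.
            parts.append(f"{name} {missing}")
        else:
            parts.append(f"{name} is {value}")
    return ", ".join(parts) + "."


# Made-up example row; the column names and values are not from any dataset.
patient = {"age": 54, "sex": "female", "hr": None}
labels = {"hr": "heart rate"}
sentence = row_to_text(patient, labels)
print(sentence)
```

Stating a missing value in words rather than imputing it is one way text can carry context that a purely numerical encoding would lose; whether that matches the paper's treatment should be checked against the paper itself.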