Skip to content

microsoft/Auto-Formula

Repository files navigation

Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations

Benchmark overview

This benchmark dataset contains real Excel spreadsheets from 4 enterprise domains (3 of which are cralwed from the public web):

In our study [1] below to predict formulas that users want to author in a target spreadsheet cell (using the actual formula in the target cell as the ground-truth), we test different algorithms using real formulas extracted from the spreadsheets of the 4 enterprise domains above, to test algorithms to accurately predict formulas that users would type in a spreadsheet cell.

Statistics of this spreadsheet benchmark can be found in the table below. image

We hope the release of this spreadsheet dataset can facilitate futureu research in this and other related directions.

Data link

Data can be downloaded from this link.

Folder structure

The benchmark has 4 folders, each corresponds to an enterprise domain (Cisco, PGE, TI, Enron).

Furthermore, in our experiments we tested two ways of sampling test spreadsheets:

  • (1) Random: randomly sample 10% from spreadsheets in the same enterprise domain as test cases;
  • (2) Timestamp: order all spreadsheets by timestamps (last modified time), and use the most recently modified 10% spreadsheets as test cases.

These two test settings correspond to the two sub-folders named "random" and "timestamp: under Cisco/PGE/TI/Enron, respectively.

benchmark
└───Cisco
│   └───random
│   └───timestamp
│   │   test_formulas.json
└───PGE
│   └───random
│   └───timestamp
│   │   test_formulas.json
└───TI
│   └───random
│   └───timestamp
│   │   test_formulas.json
└───Enron
│   └───random
│   └───timestamp
│   │   test_formulas.json

Each folder contains two sub-folders for reference workbooks and test workbooks. There is also a "test_formulas.json" file, which contains formulas sampled as test cases, where the format is "spreadsheet-file-name"---"sheet-name"---"row-id"---"col-id" (e.g., "244228682981234450530384331754331956360-mls-mqc-qos-converter-v1.xlsx---MQCtoMLSQOS---19---15").

License

This data set is released under the MICROSOFT RESEARCH LICENSE.

Reference

[1] Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations. TOFILL.

About

No description, website, or topics provided.

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published