- Introduction
- Repository Structure
- Usage
- Dataset
- Disclaimer
- Results
- License
- Request the data
- Get Help
- Citing Details
This repository contains the code to measure the g-index for an experiment, as described in the paper. It lets you calculate the g-index for any model experiment and simulate how the g-index responds to factors such as the number of training samples, compute, and performance.
No. | Asset Name | Description |
---|---|---|
1. | domains/ | Contains DAG files for various domains. For more information on domains, see Domain Details |
2. | experiments/ | Contains experiment files for the experiments mentioned in Results |
3. | notebooks/ | Contains tutorial notebooks for reproducing the results mentioned in the paper and for running simulations on the g-index |
4. | results/ | Output directory for the CLI described in CLI Usage |
5. | g_index.py | Python file for calculating the g-index for experiments and for running simulations. For more information, see the Tutorial Notebook |
6. | definitions.md | Contains definitions for the various parameters used in g-index calculations |
Some of the package requirements fail to install on Arm-based M1 Macs. On those machines, please use Binder or Colab to launch the notebooks.
- Create a conda environment from the provided env.yml with the following command:
conda env create -f env.yml
- Alternatively, you can install the required packages in your existing environment with:
pip install -r requirements.txt
The CLI calculates the g-index for one or more experiments.
To simulate g-index calculations, please refer to the Tutorial Notebook.
Parameter | Type | Description |
---|---|---|
-e, --exp_dir | str | Name of the directory where experiment files are stored |
-d, --domain_dir | str | Name of the directory where domain files are stored |
-s, --print_results | bool | Print the results |
-s, --save_results | bool | Save the results to a file |
Name | Description |
---|---|
Intelligent System (IS) | Name of the Model used in the Experiment |
Total Curricula Samples | Total number of samples used across all domains for training the IS |
Total Task Samples | Total number of samples used across all domains for training the IS |
Compute Used (E) | Compute used by the IS in the experiment |
Average Performance (θ) | Average performance of the IS across all task domains |
g-index | g-index value for the IS |
python g_index.py --exp_dir 'experiments' --domain_dir 'domains' --save_results True --print_results True
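When several experiment directories need to be scored, the command above can be assembled from Python. The following is a minimal sketch, not part of the repository: the helper names `build_command` and `run_g_index` are hypothetical, the defaults simply mirror the example invocation, and it assumes the script is run from the repository root. It passes the boolean flags as the literal strings `True`/`False`, as the documented example does; how `g_index.py` interprets `False` is not confirmed here.

```python
import shlex
import subprocess
import sys
from typing import List


def build_command(exp_dir: str = "experiments", domain_dir: str = "domains",
                  save_results: bool = True, print_results: bool = True) -> List[str]:
    """Assemble the argument list for the documented g_index.py invocation.

    Hypothetical helper: only the flag names come from the README.
    """
    return [
        sys.executable, "g_index.py",
        "--exp_dir", exp_dir,
        "--domain_dir", domain_dir,
        # The documented example passes booleans as the strings True/False.
        "--save_results", str(save_results),
        "--print_results", str(print_results),
    ]


def run_g_index(**kwargs) -> None:
    """Run the CLI from the repository root; raises CalledProcessError on failure."""
    subprocess.run(build_command(**kwargs), check=True)


# Show the command without executing it (g_index.py must be present to run it).
print(shlex.join(build_command()[1:]))
```

Calling `run_g_index(exp_dir="experiments")` inside the activated environment would then execute the same command as the one-line example.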
The provided dataset has 16 domains with 2 samples per class. For more information, see Domain Details.
No. | Model Name | # Training Samples | Compute Used | θ | g-index |
---|---|---|---|---|---|
1. | GPT2-345M | 2560 | 127.530 | 0.697 | 7902.972 |
2. | GPT Neo-2.7B | 2560 | 8969.100 | 0.682 | 6421.049 |
3. | GPT2-1.5B | 5120 | 5927.400 | 0.708 | 6390.314 |
4. | GPT2-1.5B | 10240 | 11563.320 | 0.683 | 6006.261 |
5. | GPT2-774M | 2560 | 1516.640 | 0.620 | 4872.334 |
6. | GPT Neo-2.7B | 1280 | 5063.380 | 0.582 | 4476.680 |
7. | GPT2-345M | 1280 | 74.750 | 0.547 | 4399.190 |
8. | GPT2-774M | 5120 | 2941.941 | 0.585 | 4070.117 |
Each model was trained for 30 epochs.
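To compare runs programmatically, the reported rows can be loaded into a small structure and ranked. The sketch below is illustrative only: the `Result` class and the helper functions are hypothetical names, and the numbers are copied verbatim from the table above. It does not recompute the g-index; it only sorts the reported values and picks the best run per model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Result:
    model: str
    training_samples: int
    compute_used: float
    theta: float      # average performance
    g_index: float


# Rows copied from the results table.
RESULTS: List[Result] = [
    Result("GPT2-345M", 2560, 127.530, 0.697, 7902.972),
    Result("GPT Neo-2.7B", 2560, 8969.100, 0.682, 6421.049),
    Result("GPT2-1.5B", 5120, 5927.400, 0.708, 6390.314),
    Result("GPT2-1.5B", 10240, 11563.320, 0.683, 6006.261),
    Result("GPT2-774M", 2560, 1516.640, 0.620, 4872.334),
    Result("GPT Neo-2.7B", 1280, 5063.380, 0.582, 4476.680),
    Result("GPT2-345M", 1280, 74.750, 0.547, 4399.190),
    Result("GPT2-774M", 5120, 2941.941, 0.585, 4070.117),
]


def rank_by_g_index(results: List[Result]) -> List[Result]:
    """Return the runs sorted from highest to lowest reported g-index."""
    return sorted(results, key=lambda r: r.g_index, reverse=True)


def best_per_model(results: List[Result]) -> Dict[str, Result]:
    """Keep, for each model, the run with the highest reported g-index."""
    best: Dict[str, Result] = {}
    for r in results:
        if r.model not in best or r.g_index > best[r.model].g_index:
            best[r.model] = r
    return best


for model, run in best_per_model(RESULTS).items():
    print(f"{model}: g-index {run.g_index} with {run.training_samples} training samples")
```

Note that the highest average performance (θ = 0.708, GPT2-1.5B) does not correspond to the highest g-index in the table, which is consistent with the g-index also depending on training samples and compute.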
This project is licensed under the MIT License. See LICENSE for more details.
Disclaimer: The data samples provided are a small fraction of the data used in the experiments, so your results might differ from the reported results. If you are interested in the larger dataset, please email us at humans@mayahq.com with the following details.
Email Subject: g-index data request
Name:
I/We are an : [Individual/Organisation]
Our use case : [Brief Description]
- Contact us at humans@mayahq.com
- If appropriate, open an issue on GitHub
If this repository, the paper or any of its content is useful for your research, please cite:
@misc{venkatasubramanian2021measure,
title={Towards A Measure Of General Machine Intelligence},
author={Gautham Venkatasubramanian and Sibesh Kar and Abhimanyu Singh and Shubham Mishra and Dushyant Yadav and Shreyansh Chandak},
year={2021},
eprint={2109.12075},
archivePrefix={arXiv},
primaryClass={cs.AI}
}