This repository contains code and data for the brief "The Use of Open Models in Research."
The data directory contains the following:
- Closed Models Papers and Annotations, which contains the basic information and annotations for every closed-model paper annotated for the report.
- Open Models Papers and Annotations, which contains the basic information and annotations for every open-model paper annotated for the report.
The repository also contains four SQL queries:
- find_model_papers: the primary query used to find the papers and their metadata. Its regular expression is changed for each model so that the query matches that model's papers.
- language_model_downloads_one_month: pulls one month of download counts for the top-downloaded text-generation models on Hugging Face.
- language_model_total_downloads: pulls all available download counts (covering multiple months, but not the full history) for the top-downloaded text-generation models on Hugging Face.
- base_models: an alternative way to identify "top" models, by counting how often each base model was used by other models on the leaderboard.
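The counting idea behind base_models can be sketched in Python as follows. This is a minimal illustration, not the repository's SQL query: the record layout (a model name plus the base model it was derived from) and all model names are hypothetical.

```python
from collections import Counter

# Hypothetical leaderboard records: each model lists the base model it was
# derived from (None if it is itself a base model). Names are illustrative only.
leaderboard = [
    {"model": "org-a/chat-v1", "base_model": "base-x"},
    {"model": "org-b/instruct", "base_model": "base-x"},
    {"model": "org-c/tuned", "base_model": "base-y"},
    {"model": "base-x", "base_model": None},
]


def rank_base_models(records):
    """Count how many leaderboard models were derived from each base model,
    most-used first. Records without a base model are skipped."""
    counts = Counter(r["base_model"] for r in records if r["base_model"])
    return counts.most_common()


print(rank_base_models(leaderboard))  # → [('base-x', 2), ('base-y', 1)]
```

Ranking by reuse rather than by downloads rewards base models that many others build on, which is why it can produce a different "top" list than the download-count queries.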