SciNet

A task-level database of scientific research: a comprehensive map of what researchers actually do, broken down by domain, field, subfield, and topic. SciNet supports task-level analysis of scientific work by mapping its activity structure across 5 domains, 30 fields, 302 subfields, and 4,516 topics, with 26,371 released task statements.

Website: anatomyofscience.com · Repository: github.com/lukasalthoff/scinet

Overview

SciNet organizes research work into a hierarchy aligned with OpenAlex (domains, fields, subfields, and topics). For each level, we use large language models to generate O*NET-style task statements describing what researchers in that area regularly do.

The files in data/ are released for replication and downstream research.

Data files

All files are UTF-8. CSVs use comma separators. See data/README.md for a standalone description.

| File | Description |
| --- | --- |
| data/tasks.csv | Every task in the hierarchy (universal, domain, and subfield levels) with category labels |
| data/openalex_topic_subfield_mapping.csv | Maps each OpenAlex topic to its SciNet display domain, field, and subfield |

Data dictionary

tasks.csv

| Column | Description |
| --- | --- |
| task | Task statement text |
| category | Task category (e.g. "Ideation & Hypothesis Generation", "Data Gathering") |
| level | One of universal, domain, or subfield |
| domain | Domain name, e.g. "Social Sciences" (empty for universal tasks) |
| field | Display field name, e.g. "Economics" (empty for universal/domain tasks) |
| subfield | Display subfield name, e.g. "Labor Economics" (empty for universal/domain tasks) |
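
As a minimal sketch of how these columns might be used, the snippet below reads task rows with Python's standard-library `csv` module, filters them by level, and counts statements per category. The helper names `read_tasks`, `tasks_at_level`, and `count_by_category` are illustrative and not part of this repository; the code assumes only the column names documented above and a local copy of `data/tasks.csv`.

```python
import csv
from collections import Counter


def read_tasks(path="data/tasks.csv"):
    """Read tasks.csv into a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def tasks_at_level(rows, level, domain=None):
    """Keep rows at one level ('universal', 'domain', or 'subfield'),
    optionally restricted to a single domain name."""
    return [
        row
        for row in rows
        if row["level"] == level and (domain is None or row["domain"] == domain)
    ]


def count_by_category(rows):
    """Count task statements per category label."""
    return Counter(row["category"] for row in rows)
```

For example, `tasks_at_level(read_tasks(), "subfield", domain="Social Sciences")` would return the subfield-level rows for that domain, and passing the result to `count_by_category` would summarize them by category. Because universal tasks have an empty `domain`, a domain filter excludes them by construction.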

openalex_topic_subfield_mapping.csv

| Column | Description |
| --- | --- |
| topic_id | OpenAlex topic identifier |
| topic_name | Topic display name |
| domain | SciNet display domain |
| field | SciNet display field |
| subfield | SciNet display subfield |
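
The mapping file can plausibly be combined with tasks.csv to look up the subfield-level tasks associated with an OpenAlex topic. The sketch below assumes that the `domain`, `field`, and `subfield` display names match exactly between the two files; this is an assumption inferred from the shared column names, not something the documentation above guarantees. The helper names `read_rows`, `hierarchy_counts`, and `tasks_for_topic` are hypothetical.

```python
import csv


def read_rows(path):
    """Read a UTF-8, comma-separated CSV into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def hierarchy_counts(mapping_rows):
    """Count distinct domains, fields, subfields, and topics in the mapping."""
    return {
        "domains": len({r["domain"] for r in mapping_rows}),
        "fields": len({(r["domain"], r["field"]) for r in mapping_rows}),
        "subfields": len(
            {(r["domain"], r["field"], r["subfield"]) for r in mapping_rows}
        ),
        "topics": len({r["topic_id"] for r in mapping_rows}),
    }


def tasks_for_topic(topic_id, mapping_rows, task_rows):
    """Return subfield-level tasks for the subfield that a topic maps to.

    Assumes display names are identical across both files (unverified).
    Returns an empty list if the topic is not in the mapping.
    """
    match = next((r for r in mapping_rows if r["topic_id"] == topic_id), None)
    if match is None:
        return []
    key = (match["domain"], match["field"], match["subfield"])
    return [
        t
        for t in task_rows
        if t["level"] == "subfield"
        and (t["domain"], t["field"], t["subfield"]) == key
    ]
```

Running `hierarchy_counts` on the full mapping is one way to check a local copy against the hierarchy sizes stated in the introduction. Fields and subfields are counted as tuples with their parents so that identically named entries under different parents, if any exist, are not merged.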

Methodology

[Figure: SciNet pipeline diagram]

  1. Hierarchy: OpenAlex domains, fields, subfields, and topics define the taxonomy.
  2. Task generation: Large language models produce O*NET-style task statements at field, subfield, and topic levels using a top-down hierarchical approach.

For a complete description of every pipeline step — including prompt design, coverage thresholds, O*NET calibration results, and protocols.io validation — see METHODOLOGY.md.

For visual summaries of the data — task distributions, verifiability rankings, AI adoption — see DATA_OVERVIEW.md.

For project updates and the research paper (once available), see the Stanford project page.

Citation

If you use this dataset, please cite the SciNet project and this repository, for example:

@misc{scinet_data,
  title        = {SciNet: The Anatomy of Science},
  author       = {Althoff, Lukas},
  year         = {2026},
  howpublished = {\url{https://github.com/lukasalthoff/scinet}},
}

License

Data and documentation in this repository are licensed under CC BY 4.0 — see LICENSE.

Changelog

2026-03-20

  • Replaced generated_tasks.csv, openalex_topics.csv, and catalog.json with two simpler files:
    • tasks.csv: flat task file with category, level, domain, field, and subfield columns.
    • openalex_topic_subfield_mapping.csv: maps OpenAlex topics to SciNet display domains, fields, and subfields.
  • Added METHODOLOGY.md: full pipeline documentation covering taxonomy construction, hierarchical task generation, O*NET-style rating and filtering, AI exposure scoring, O*NET calibration, and protocols.io validation.

About

SciNet: An O*NET for Science (Task Database for Scientific Research)