suyoumo/Research_Grader

Research Grader

Research Grader is a Codex skill for evaluating research papers with an area-chair-style workflow: collect papers, extract local PDF text, send them through multiple anonymous reviewer passes, then produce a calibrated ranking with concrete rationales.

It is designed for moments when a single-paper summary is not enough. The strongest use case is comparing several papers from the same research area at once, preferably alongside already accepted conference papers or well-regarded baselines. That comparison makes the score more meaningful: novelty, rigor, and transferable value are judged against the field instead of against an isolated abstract.

Why This Skill

Most paper reviews collapse into either a summary or a vibe score. Research Grader tries to do something stricter:

  • Uses a fixed scoring formula: 40% Innovation + 40% Intrinsic Paper Value + 20% Rigor.
  • Keeps writing quality and visual polish as a separate reference score, not part of the final score.
  • Prioritizes method sections, evaluation protocol, ablations, limitations, and appendices.
  • Penalizes benchmark chasing, thin disclosure, weak baselines, and marketing-style claims.
  • Supports anonymous multi-reviewer scoring, where different reviewer styles inspect the same paper independently.
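The fixed formula in the first bullet can be written out as a short calculation. This is a minimal sketch, not the skill's own code: the function name, the key names, and the 0 to 10 scale are assumptions made for illustration. Only the 40/40/20 weights come from this README.

```python
# Weights from the README's fixed formula. Key names are hypothetical.
WEIGHTS = {"innovation": 0.40, "paper_value": 0.40, "rigor": 0.20}


def total_score(innovation: float, paper_value: float, rigor: float) -> float:
    """Weighted total on an assumed 0-10 scale.

    Aesthetics is deliberately not a parameter: the README keeps it as a
    separate reference score outside the final score.
    """
    scores = {"innovation": innovation, "paper_value": paper_value, "rigor": rigor}
    return round(sum(WEIGHTS[axis] * scores[axis] for axis in WEIGHTS), 2)


print(total_score(innovation=8.0, paper_value=7.0, rigor=6.0))  # 7.2
```

Because rigor carries only 20% of the weight, a two-point drop in rigor moves the total by 0.4, while the same drop in innovation or paper value moves it by 0.8.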

In practice, this works best when Codex is run with a strong model and a high reasoning budget. In our usage, Codex with GPT-5.5 at xhigh reasoning produced the most useful results: the anonymous reviewers were better at finding different failure modes, and the final aggregation was more stable when several close papers had to be ranked.

Anonymous Reviewer Protocol

The skill defaults to multiple blind reviewer passes when sub-agents are available. Each reviewer receives the same rubric and local extracted paper text, but uses a different lens:

  • strict-rigor: methodology, ablations, reproducibility, statistical caution.
  • innovation: novelty, technical idea quality, and whether the work changes practice.
  • value: reusable lessons, failure modes, negative results, and field-level insight.
  • skeptical-meta: overclaiming, hidden assumptions, evaluator weakness, benchmark leakage.

The main Codex agent acts as the area chair. It does not expose one reviewer's scores to another reviewer before collection. After the reviews are in, it reconciles disagreements, reports uncertainty, and emits the final ranking.
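One way to picture the reconciliation step is as a per-axis summary computed only after all reviews are collected. The sketch below is an assumption throughout: the README does not say how the area chair combines scores, so the plain mean, the `flag_spread` threshold, and the sample numbers are illustrative only. The reviewer names are the four lenses listed above.

```python
from statistics import mean

AXES = ("innovation", "paper_value", "rigor")


def reconcile(reviews: dict[str, dict[str, float]], flag_spread: float = 2.0) -> dict:
    """Average each axis across reviewers and flag wide disagreement.

    Hypothetical aggregation: a simple mean plus a max-min spread check.
    """
    summary = {}
    for axis in AXES:
        values = [scores[axis] for scores in reviews.values()]
        spread = max(values) - min(values)
        summary[axis] = {
            "mean": round(mean(values), 2),
            "spread": spread,
            "disputed": spread >= flag_spread,
        }
    return summary


# Made-up scores for one paper, one entry per reviewer lens.
reviews = {
    "strict-rigor": {"innovation": 7.0, "paper_value": 6.0, "rigor": 3.0},
    "innovation": {"innovation": 8.0, "paper_value": 7.0, "rigor": 6.0},
    "value": {"innovation": 7.0, "paper_value": 8.0, "rigor": 6.0},
    "skeptical-meta": {"innovation": 7.0, "paper_value": 7.0, "rigor": 5.0},
}

summary = reconcile(reviews)
for axis, stats in summary.items():
    print(axis, stats)
```

In this made-up case the rigor axis has a spread of 3.0 and is flagged as disputed, which is the kind of disagreement the area chair would report rather than average away silently.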

Recommended Workflow

  1. Put target PDFs into one directory.
  2. Add accepted or canonical papers from the same area for calibration when possible.
  3. Ask Codex to use this skill to score the whole directory.
  4. Let the skill extract PDF text into analysis/text/.
  5. Run anonymous multi-reviewer scoring.
  6. Read the final ranking and the disagreement notes, not just the total scores.

Example prompt:

Use the Research Grader skill to evaluate all PDFs in this directory.
Use multiple anonymous sub-agents with different reviewer styles.
Compare these papers against the accepted conference papers in the same folder,
then rank them by innovation, intrinsic paper value, and rigor.

Output

The standard output is a compact ranking table:

Rank | Report | Innovation | Paper Value | Rigor | Aesthetics (ref) | Total

Each paper also gets a short rationale covering:

  • Primary contribution.
  • Why it scored where it did.
  • Main limitation or reason it did not rank higher.
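To make the table layout concrete, here is a rough sketch that sorts paper records by total and prints one row per paper. The record fields, the sample papers, and the pipe-separated layout are hypothetical; the skill's actual output format may differ. Note that the aesthetics column is displayed but plays no role in the ordering.

```python
COLUMNS = ("Rank", "Report", "Innovation", "Paper Value", "Rigor", "Aesthetics (ref)", "Total")

# Made-up records; totals follow the 40/40/20 formula from this README.
papers = [
    {"report": "paper_a", "innovation": 7.0, "paper_value": 6.5, "rigor": 8.0,
     "aesthetics": 9.0, "total": 7.0},
    {"report": "paper_b", "innovation": 8.5, "paper_value": 8.0, "rigor": 6.0,
     "aesthetics": 5.0, "total": 7.8},
]


def ranking_table(papers: list[dict]) -> str:
    """Render a ranking table, highest total first."""
    ordered = sorted(papers, key=lambda p: p["total"], reverse=True)
    lines = [" | ".join(COLUMNS)]
    for rank, p in enumerate(ordered, start=1):
        cells = (rank, p["report"], p["innovation"], p["paper_value"],
                 p["rigor"], p["aesthetics"], p["total"])
        lines.append(" | ".join(str(cell) for cell in cells))
    return "\n".join(lines)


print(ranking_table(papers))
```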

Optional tier labels are available:

  • S (off the charts): paradigm-level paper.
  • A (very strong): strong method route with broad research value.
  • B (solid): robust and valuable engineering or applied research.
  • C (competitive): credible and useful but less foundational.
  • D (weak): thin, mostly capability report, or limited paper insight.

Repository Layout

.
├── SKILL.md
├── README.md
├── README.zh-CN.md
├── agents/
│   └── openai.yaml
├── references/
│   └── rubric.md
└── scripts/
    └── extract_pdf_reports.py

Extraction Script

Run from a directory containing PDFs:

python3 scripts/extract_pdf_reports.py .

The script writes extracted text and summaries under analysis/:

  • analysis/text/<pdf-stem>.txt
  • analysis/pdf_summaries.json
  • optional representative page PNGs with --render-pages
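Before starting the reviewer passes, it can help to confirm that every PDF has a matching text file. The helper below is a hypothetical sketch, not part of the repository. It assumes that `analysis/` is created inside the directory that holds the PDFs, which matches running the script with `.` as shown above, and it relies only on the `analysis/text/<pdf-stem>.txt` naming listed here.

```python
from pathlib import Path


def missing_text_files(pdf_dir: Path) -> list[str]:
    """Return stems of PDFs that have no analysis/text/<pdf-stem>.txt file.

    Assumes analysis/ sits directly under pdf_dir.
    """
    text_dir = pdf_dir / "analysis" / "text"
    return sorted(
        pdf.stem
        for pdf in pdf_dir.glob("*.pdf")
        if not (text_dir / f"{pdf.stem}.txt").is_file()
    )


# Usage: an empty list means every PDF in the current directory was extracted.
# print(missing_text_files(Path(".")))
```

The check does not open `analysis/pdf_summaries.json`, since its schema is not documented in this README.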

Notes

Research Grader scores the paper text and disclosed evidence. It does not verify that benchmark numbers are true, that private datasets exist, or that released code exactly matches the paper unless you explicitly ask Codex to perform that external audit.
