A pure JavaScript tool that extracts, validates, and verifies academic references from uploaded text files or pasted plain text. It features editing of every parsed field, validation of individual references, and a dual-validation system for references that carry a DOI or arXiv ID.
✅ All Fields Editable - Edit authors, title, year, venue, DOI, URL, arXiv ID - everything!
✅ Individual Validation - Validate one reference at a time, no need to run full batch
✅ Dual Validation - Academic references with DOI/arXiv get two-step verification
✅ Fake Reference Detection - Detects arXiv references citing wrong papers (title mismatch)
✅ Smart Text Cleanup - Auto-fixes PDF line-break hyphens and broken URLs
✅ 6 Citation Formats - APA, MLA, IEEE, ACM, arXiv, URL references
✅ Real-time Statistics - See validation progress and results instantly
✅ No Backend Required - Pure JavaScript, runs entirely in your browser
✅ Privacy First - No data storage, no tracking, all processing is local
- Open `index.html` in your web browser
- Upload `sample-references.txt` or paste your own references
- Click "Extract References" to parse the text
- Edit any field directly in the displayed results (all fields are editable)
- Click "✓ Validate This Reference" on individual references OR "Start Validation" to validate all
- Review validation results and use verification links for manual checking
Input Sources:
- Upload .txt files containing references
- Direct text input/paste functionality
Supported Citation Formats:
- APA (American Psychological Association)
- MLA (Modern Language Association)
- IEEE (Institute of Electrical and Electronics Engineers)
- ACM (Association for Computing Machinery)
- arXiv (preprint repository references with arXiv IDs)
- URL/Web References (websites, online resources with URLs)
Output:
- Display extracted references one by one
- Parse each reference into structured components:
- Authors, year, title, venue for academic references
- Source, title, URL, access date for web references
- DOI (Digital Object Identifier) links when available
- arXiv ID and URL for arXiv preprint references
- Automatic fixing of broken URLs and DOI links from line breaks
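As a rough illustration of the arXiv fields listed above, the sketch below pulls a new-style arXiv ID out of a reference string and derives the abstract URL and an arXiv API query URL from it. The function name `parseArxivId` and the returned field names are hypothetical, and old-style IDs such as `hep-th/9901001` are deliberately not handled; the tool's own pattern may differ.

```javascript
// Hypothetical helper, not taken from reference-tool.js.
// Matches new-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN) with an optional version suffix,
// written either as "arXiv:<id>" or as an arxiv.org/abs/<id> URL.
const ARXIV_ID = /arxiv(?::|\.org\/abs\/)\s*(\d{4}\.\d{4,5})(v\d+)?/i;

function parseArxivId(referenceText) {
  const match = ARXIV_ID.exec(referenceText);
  if (!match) return null; // no recognizable arXiv ID in this reference
  const id = match[1];
  return {
    arxivId: id,
    version: match[2] ?? null,
    arxivUrl: `https://arxiv.org/abs/${id}`,
    // Query URL for the public arXiv API (returns an Atom feed for this ID).
    apiUrl: `http://export.arxiv.org/api/query?id_list=${id}`,
  };
}

console.log(parseArxivId("Vaswani et al. (2017). Attention Is All You Need. arXiv:1706.03762"));
```

Returning `null` for references without an ID lets the caller fall back to the non-arXiv validation path.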
Intelligent Validation Strategies:
- User activates validation by clicking "Start Validation" button
- Different strategies based on reference type:
Pure URL References:
- Single check: Verify URL accessibility only
- No academic database checking
Academic References WITHOUT DOI:
- Single check: Verify paper exists in Semantic Scholar
- Search by title, authors, year
Academic References WITH DOI:
- DUAL VALIDATION (two separate checks):
- Paper Check: Verify paper exists in Semantic Scholar
- DOI Check: Verify DOI URL is accessible
- Both checks must pass for "Valid" status
- Displays detailed results for each check
arXiv References:
- DUAL VALIDATION (two separate checks):
- Paper Check: Verify paper exists in Semantic Scholar
- arXiv Check: Verify arXiv ID exists using arXiv API
- Fake Reference Detection:
- Compares reference title with arXiv paper title (70% similarity threshold)
- Detects when reference cites "Paper A" but arXiv link points to "Paper B"
- Shows "🚨 FAKE REFERENCE DETECTED" warning for significant mismatches
- Displays title similarity percentage
- Both checks must pass for "Valid" status
- Displays detailed results for each check
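The title comparison behind fake reference detection can be sketched as follows. The README states only the 70% threshold, not the metric, so this example assumes a simple word-overlap (Jaccard) score as a stand-in; `titleSimilarity` and `looksFake` are hypothetical names, and the tool may well use a different measure.

```javascript
// Assumption: similarity = shared words / all distinct words (Jaccard index).
function tokenize(title) {
  // Lowercase and keep only runs of letters and digits, so punctuation is ignored.
  return new Set(title.toLowerCase().match(/[\p{L}\p{N}]+/gu) ?? []);
}

function titleSimilarity(a, b) {
  const wordsA = tokenize(a);
  const wordsB = tokenize(b);
  let shared = 0;
  for (const word of wordsA) {
    if (wordsB.has(word)) shared += 1;
  }
  const union = wordsA.size + wordsB.size - shared;
  return union === 0 ? 0 : shared / union; // two empty titles cannot be compared
}

// Flags a reference whose title falls below the 70% threshold from the list above.
function looksFake(referenceTitle, arxivTitle, threshold = 0.7) {
  return titleSimilarity(referenceTitle, arxivTitle) < threshold;
}

console.log(titleSimilarity("Attention Is All You Need", "Attention is all you need.")); // 1
console.log(looksFake("Attention Is All You Need", "Deep Residual Learning for Image Recognition")); // true
```

A word-level score tolerates differences in case and punctuation, but it can still flag a genuine reference whose title was heavily abbreviated.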
Validation Process:
- Progressive display (one by one as each completes)
- Flag suspicious or unverifiable references
- Automatic delays between checks to prevent rate limiting/blocking (1.5-2s)
- Longer delays (2s) for dual validation and URL checks
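A minimal sketch of this sequential, delayed loop is shown below. The names `validateAll`, `validateOne`, and `onResult` are hypothetical and the default delay is taken from the range quoted above; the actual control flow in `reference-tool.js` may differ.

```javascript
// Resolves after the given number of milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Validates references one at a time, reporting each result as soon as it is ready.
// validateOne: async function that checks a single reference (assumed, supplied by the caller).
// onResult: callback used to render the result immediately (progressive display).
async function validateAll(references, validateOne, onResult, delayMs = 1500) {
  for (let i = 0; i < references.length; i++) {
    const result = await validateOne(references[i]);
    onResult(references[i], result);
    // Pause between requests, but not after the last one.
    if (i < references.length - 1) await sleep(delayMs);
  }
}
```

Processing the references in a plain `for` loop with `await`, rather than `Promise.all`, is what keeps the requests spaced out and the results arriving in order.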
All Fields Are Editable:
- Authors, Year, Title, Venue, DOI (for academic references)
- Source, Title, URL, Access Date (for URL references)
- arXiv ID, arXiv URL (for arXiv references)
- Edit directly in the displayed results - no need to re-extract
Individual Validation Buttons:
- Each reference has its own "✓ Validate This Reference" button
- Validate individual references without running full validation
- Button changes to "🔄 Re-validate This Reference" after first validation
- Make corrections and re-validate as many times as needed
- Statistics automatically recalculate after each validation
Benefits:
- Fix extraction errors immediately
- Test corrections before full validation
- Save time - only validate what needs checking
- Iterative improvement through multiple re-validations
Real-time Statistics Display:
- Total number of references extracted
- Valid references count (✓ green badge)
- Suspicious references count (⚠ yellow badge)
- Unverified references count (✗ red badge)
- Automatically recalculates after each individual validation
- Updates in real-time during full validation
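Recomputing these counters after every validation can be as simple as the sketch below. It assumes each reference object carries a `status` string of `"valid"`, `"suspicious"`, or `"unverified"`; that data shape and the name `computeStats` are illustrative, not taken from the tool.

```javascript
// Hypothetical data shape: [{ status: "valid" }, { status: "suspicious" }, ...]
function computeStats(references) {
  const stats = { total: references.length, valid: 0, suspicious: 0, unverified: 0 };
  for (const ref of references) {
    if (ref.status === "valid") stats.valid += 1;
    else if (ref.status === "suspicious") stats.suspicious += 1;
    else stats.unverified += 1; // assumption: a missing or unknown status counts as unverified
  }
  return stats;
}

console.log(computeStats([{ status: "valid" }, { status: "suspicious" }, {}]));
```

Because the function recounts from scratch, calling it after each individual validation keeps the badges consistent without tracking increments.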
For Academic References, Provide Links:
- Semantic Scholar search link
- Google Scholar search link
- General Google search link
For URL References, Provide Links:
- Direct link to visit the URL
- Google search for the URL (to check if it's indexed)
- Google search for the title
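The two sets of links above can be generated from the parsed fields alone. The sketch below assumes a reference object with `type`, `title`, and `url` properties and uses the public search pages of each service; `buildVerificationLinks` and the property names are hypothetical.

```javascript
// Builds manual-verification links for one parsed reference.
// Assumed input shape: { type: "academic" | "url", title: string, url?: string }
function buildVerificationLinks(ref) {
  const google = (query) => `https://www.google.com/search?q=${encodeURIComponent(query)}`;

  if (ref.type === "url") {
    return {
      visit: ref.url,                 // direct link to the cited page
      googleUrl: google(ref.url),     // is the URL indexed?
      googleTitle: google(ref.title), // does the title appear elsewhere?
    };
  }

  const query = encodeURIComponent(ref.title);
  return {
    semanticScholar: `https://www.semanticscholar.org/search?q=${query}`,
    googleScholar: `https://scholar.google.com/scholar?q=${query}`,
    google: google(ref.title),
  };
}

console.log(buildVerificationLinks({ type: "academic", title: "Deep Learning" }));
```

These are ordinary hyperlinks opened by the user, so they are not subject to the cross-origin restrictions that affect scripted checks.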
Flagging System:
- Mark references as: Valid, Suspicious, or Unverified
- Highlight potential issues (missing information, format errors, not found online)
- Color-coded status indicators for quick visual assessment
- Pure JavaScript (no backend required)
- Compatible with modern browsers
- No external dependencies for core functionality (can use fetch API for validation)
Semantic Scholar API:
- Endpoint: https://api.semanticscholar.org/
- Free tier available
- Search papers by title, author, DOI
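For illustration, a title search against the Semantic Scholar Academic Graph API can be built like this. The README names only the base endpoint, so the `/graph/v1/paper/search` path, the requested fields, and the function name `buildSemanticScholarSearchUrl` are assumptions about how such a query might look, not a description of the tool's code.

```javascript
// Builds a paper-search URL for the Semantic Scholar Academic Graph API.
function buildSemanticScholarSearchUrl(title, limit = 3) {
  const url = new URL("https://api.semanticscholar.org/graph/v1/paper/search");
  url.searchParams.set("query", title);                            // free-text search on the title
  url.searchParams.set("fields", "title,year,authors,externalIds"); // fields to return per paper
  url.searchParams.set("limit", String(limit));                    // only a few candidates are needed
  return url.toString();
}

console.log(buildSemanticScholarSearchUrl("Attention Is All You Need"));
```

The resulting URL can be passed to `fetch`; handling of the JSON response and of rate-limit errors is left out of this sketch.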
Google Scholar:
- Generate search URLs (no official API)
- Manual verification option
Input Section:
- File upload button for .txt files
- Editable text area for direct input/paste
- Allow users to modify uploaded text to fix errors
Control Buttons:
- "Extract References" / "Re-run" button to process/reprocess text
- "Start Validation" button to activate automatic checking process
Output Section:
- Results display area showing references one by one
- Visual indicators for validation status (Valid/Suspicious/Unverified)
- Clickable links for manual verification
- Progressive display: validation results appear one by one as they complete
Workflow A: Individual Validation (Recommended for corrections)
- Upload/Input: Load .txt file or paste reference text
- Extract: Click "Extract References" button
- Review & Edit: Check extracted references, edit any incorrect fields (all fields are editable)
- Validate Individually: Click "✓ Validate This Reference" on specific references
- Iterate: Edit and re-validate as needed until satisfied
Workflow B: Full Validation (Recommended for initial check)
- Upload/Input: Load .txt file or paste reference text
- Edit Input Text (Optional): Modify text in editable text area to fix errors
- Extract/Re-run: Click "Extract References" button (can re-run after editing)
- Start Full Validation: Click "Start Validation" to check all references automatically
- View Progressive Results: Validation results appear one by one as each completes
- Edit & Re-validate: Fix errors in suspicious references using individual validation
- Manual Verification: Click provided links for remaining suspicious references
Note: The tool is semi-automatic - extraction and validation are triggered by user actions, not automatic on upload.
Before extracting references, the tool automatically fixes common formatting issues:
- Problem: Text copied from PDFs often has broken words from line breaks
- Examples:
- "digitiza- tion" → "digitization"
- "Manufactur- ing" → "Manufacturing"
- "Tool- path" → "Toolpath"
- Solution: Automatically detects and removes line-break hyphens
- Pattern: Removes hyphen + space/newline when followed by word continuation
- Preserved: Legitimate hyphenated words (e.g., "state-of-the-art", "machine-learning")
This preprocessing happens automatically during extraction and requires no user action.
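The hyphen cleanup described above can be approximated with a single regular expression. This is a sketch under the stated pattern (hyphen, then whitespace, then a word continuation); the function name `fixLineBreakHyphens` is hypothetical and the tool's actual expression may be stricter.

```javascript
// Joins words that were split by a line-break hyphen in PDF-copied text.
// Pattern: a letter, a hyphen, one or more spaces/newlines, then a lowercase letter.
// Hyphens with no whitespace after them ("state-of-the-art") are left untouched.
function fixLineBreakHyphens(text) {
  return text.replace(/(\p{L})-\s+(\p{Ll})/gu, "$1$2");
}

console.log(fixLineBreakHyphens("digitiza- tion"));   // digitization
console.log(fixLineBreakHyphens("Manufactur-\ning")); // Manufacturing
console.log(fixLineBreakHyphens("state-of-the-art")); // state-of-the-art
```

One known limitation of this simple pattern is suspended hyphens such as "pre- and post-processing", which it would incorrectly join; those cases need manual editing of the input text.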
Input Phase:
- User uploads .txt file OR pastes text into editable text area
- Text is displayed in the text area and can be edited
Editing Phase (Optional):
- User can modify the text to fix formatting errors, typos, or inconsistencies
- User can re-upload or paste new text at any time
Extraction Phase:
- User clicks "Extract References" button (or "Re-run" for reprocessing)
- Detect citation style for each reference (APA/MLA/IEEE/ACM/arXiv/URL)
- Extract individual references from text
- Parse each reference into components
- Display extracted references one by one
- Update existing reference items if re-running
Validation Phase (User-Initiated):
- User clicks "Start Validation" button to begin automatic checking
- For each reference (processed sequentially):
- Query Semantic Scholar API with reference details (title, authors, year)
- Display validation result immediately as it completes (progressive display)
- If not found, flag as suspicious
- Generate verification links for manual checking
- Results appear one by one, not all at once after completion
Manual Verification Phase:
- User can click on provided links for suspicious references:
- Semantic Scholar search link
- Google Scholar search link
- General Google search link
- User can manually verify and make decisions about flagged references
For each reference, display:
- Original reference text
- Detected citation style
- Parsed components (authors, title, year, venue)
- Validation status (Valid/Suspicious/Unverified)
- Manual verification links:
- [Check on Semantic Scholar]
- [Check on Google Scholar]
- [Search on Google]
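- What is CORS? Cross-Origin Resource Sharing (CORS) is the mechanism by which a website tells browsers which other origins may read its responses. Most sites do not grant that permission, so the browser's same-origin policy stops this tool's scripts from reading their responses.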
- Impact on URL Validation:
- Direct URL checking is blocked by most websites
- Many URL references will show as "Suspicious" even if valid
- DOI URL checks may fail due to CORS blocking
- Workaround: Use the provided manual verification links
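To make the limitation concrete, here is one way a browser-side URL check could be written, and why its answer is weak. With `mode: "no-cors"` the browser sends the request but returns an opaque response whose status cannot be read, so the most the script can learn is that something answered. The function name `checkUrlReachable` and the returned labels are hypothetical, and this is not necessarily how the tool performs its check.

```javascript
// Best-effort reachability probe. fetchImpl is injectable so the logic can be tested offline.
async function checkUrlReachable(url, fetchImpl = fetch) {
  try {
    const response = await fetchImpl(url, { method: "HEAD", mode: "no-cors" });
    if (response.type === "opaque") {
      // The server answered, but CORS hides the status code (it could be 200 or 404).
      return "reachable-unknown-status";
    }
    return response.ok ? "ok" : "http-error";
  } catch {
    // Network failure, DNS error, or the request was blocked outright.
    return "unreachable-or-blocked";
  }
}
```

Since an opaque response cannot distinguish a live page from an error page, a cautious tool has little choice but to mark such references as "Suspicious" and hand the user a link to check manually.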
Semantic Scholar Database:
- Not all papers are indexed (especially older publications)
- Regional journals may have limited coverage
- Valid papers may be flagged as suspicious if not in database
Citation Style Detection:
- Pattern-based detection may not be 100% accurate
- Complex or non-standard formats may be misidentified
- Semantic Scholar API: ~100 requests per 5 minutes (free tier)
- Delays built in to prevent rate limiting (1.5-2s between requests)
- Large batches may take several minutes to complete
- Complex multi-line references may have parsing errors
- Non-standard citation formats may extract incorrectly
- Manual editing available to fix extraction errors
- Requires modern browser with JavaScript enabled
- Best performance in Chrome, Firefox, Safari, Edge
- Internet connection required for validation (extraction works offline)
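The rate-limit figures above (about 100 requests per 5 minutes, with 1.5-2s between requests) lend themselves to a quick estimate. The sketch below assumes one API request per reference and ignores network latency, which understates dual validation; both function names are hypothetical.

```javascript
// Time spent waiting between requests for a batch of the given size.
function estimateBatchSeconds(referenceCount, delaySeconds = 2) {
  return Math.max(0, referenceCount - 1) * delaySeconds;
}

// How many requests fit into one rate-limit window at a given spacing.
function maxRequestsPerWindow(delaySeconds, windowSeconds = 300) {
  return Math.floor(windowSeconds / delaySeconds);
}

console.log(estimateBatchSeconds(100)); // 198
console.log(maxRequestsPerWindow(2));   // 150
console.log(maxRequestsPerWindow(1.5)); // 200
```

Under these assumptions a 100-reference batch spends a little over three minutes in delays alone, and a 2s spacing still allows more than 100 requests in a 5-minute window, which may explain occasional "Rate Limited" errors on large batches. A spacing of about 3s would stay under the quoted figure.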
Problem 1: No references extracted
Solution 1: Check that references follow standard citation formats (APA, MLA, IEEE, ACM, arXiv, URL)
Problem 2: Valid reference marked as suspicious
Solution 2: Use manual verification links - paper may not be in Semantic Scholar database
Problem 3: URL references always show as suspicious
Solution 3: This is expected due to CORS restrictions - click "Visit URL" link to manually verify
Problem 4: "Rate Limited" error during validation
Solution 4: Wait 5-10 minutes before continuing - free API tier has limits
Problem 5: Title or other fields show "Unknown"
Solution 5: Edit the fields directly in the results and click "Re-validate This Reference"
Problem 6: Extraction errors with PDF-copied text
Solution 6: The tool auto-fixes most issues (line-break hyphens), but you can manually edit the input text before extraction
| Feature | Full Validation | Individual Validation |
|---|---|---|
| Button Location | Top of page ("Start Validation") | Each reference ("✓ Validate This Reference") |
| Scope | All references at once | Single reference at a time |
| When to Use | Initial validation of all references | Fix specific suspicious references |
| Time | 2-5 seconds × number of refs | 2-5 seconds per reference |
| Best For | First-time validation | Testing corrections, selective validation |
| Statistics Update | After all complete | After each validation |
| Edits Applied | N/A (validates current data) | Read from edited input fields |
Recommended Approach:
- Use Full Validation for initial check
- Identify suspicious references
- Edit those references
- Use Individual Validation to re-check corrections
- Repeat as needed
Reference-checking/
├── README.md # Main documentation (you are here)
├── index.html # Main application interface
├── reference-tool.js # Core JavaScript application
├── sample-references.txt # Mixed format references for general testing
├── documents/ # User-facing documentation (8 files)
└── testing/ # Test files for feature validation (11 files)
User documentation and test files mentioned in this README:
Test Files:
- `COMPREHENSIVE-TEST-REFERENCES.txt` - All test cases in one file (16 references)
User Documentation:
- `USAGE-GUIDE.txt` - Detailed step-by-step usage instructions
- `QUICK-USAGE-GUIDE.txt` - Quick reference guide for all-fields-editable feature
- `EDIT-AND-REVALIDATE-FEATURE.txt` - Editing and re-validation guide
- `ALL-FIELDS-EDITABLE-FEATURE.txt` - Complete editing feature documentation
Technical Documentation:
- `DUAL-VALIDATION-SYSTEM.txt` - How the dual validation system works
- `ARXIV-FEATURE-DOCUMENTATION.txt` - arXiv reference support details
- `CURRENT-STATUS.txt` - Current implementation status and active patterns
Test files for validating various features:
- `test-references.txt` - Mixed format references with line-break issues
- `test-broken-urls.txt` - URL extraction with space issues
- `test-arxiv-references.txt` - arXiv format detection and validation
- `test-fake-reference.txt` - Fake reference detection examples
- `test-8-apa-references.txt` - APA reference splitting validation
- `test-author-extraction.txt` - Author name parsing edge cases
- `test-year-suffix.txt` - Year suffix handling (2020a, 2020b)
- `test-multiline-references.txt` - Multi-line reference handling
- `test-hyphenated-breaks.txt` - Hyphen removal from line breaks
- `test-ieee-nested-quotes.txt` - Quote character handling
- `test-references-incl-fake.txt` - Combined test scenarios
- Local Processing: All reference extraction and parsing happens in your browser
- No Data Storage: No references or text are stored on any server
- API Calls: Validation requires calls to:
- Semantic Scholar API (for paper verification)
- arXiv API (for arXiv reference verification)
- DOI/CrossRef (for DOI verification - may be blocked by CORS)
- No Tracking: No analytics, cookies, or user tracking
- Open Source: Pure JavaScript - inspect the code yourself
- Export validated references to BibTeX format
- Batch validation mode with selection
- Reference deduplication
- Citation style conversion
- Integration with DOI.org for additional validation
- Auto-complete suggestions for venues and authors
- Undo/redo functionality for edits
- Save and load validation sessions
- initial tool: Janetyc and Claude Code, October 26, 2025