Fix binaryornot to 0.4.4 to prevent misclassification errors #203

Merged
bjk7119 merged 1 commit into main from data on Apr 17, 2026

@bjk7119 bjk7119 commented Apr 16, 2026

Summary by CodeRabbit

  • Chores

    • Pin the binaryornot dependency to version 0.4.4.
  • binaryornot 0.6.0 introduced a scikit-learn-based decision tree algorithm
    that misclassifies C/H source files containing EUC-KR (Korean) encoded
    comments as binary.

  • Root cause:

    • 0.4.4: uses chardet to detect encoding → decodes successfully as EUC-KR
      → correctly classified as text
    • 0.6.0: uses a trained decision tree that checks utf8_valid as a key
      feature. EUC-KR files are not valid UTF-8, so utf8_valid=0.0.
      Combined with high byte entropy from multi-byte Korean characters,
      the tree classifies them as binary — even though try_euc_kr=1.0
      is present, that feature is not evaluated on the path these files take.
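
The encoding behaviour behind this root cause can be reproduced with the standard library alone. The sketch below is illustrative only: `is_valid_utf8`, `decodes_as`, and `byte_entropy` are hypothetical helpers that stand in for the `utf8_valid`, `try_euc_kr`, and byte-entropy features named above. They are not binaryornot's actual feature extraction, and the sample C source is made up.

```python
import math
from collections import Counter


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (stand-in for utf8_valid)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if the bytes decode cleanly with the given codec (stand-in for try_euc_kr)."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Hypothetical C source with a Korean comment, saved as EUC-KR.
source = "/* 한글 주석 */\nint main(void) { return 0; }\n"
euc_kr_bytes = source.encode("euc_kr")

print("valid UTF-8:      ", is_valid_utf8(euc_kr_bytes))
print("decodes as EUC-KR:", decodes_as(euc_kr_bytes, "euc_kr"))
print("byte entropy:     ", round(byte_entropy(euc_kr_bytes), 3))
```

The same bytes fail strict UTF-8 decoding yet decode cleanly as EUC-KR, which is why a classifier that branches on UTF-8 validity before it considers EUC-KR can send such a file down the binary path.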

@bjk7119 bjk7119 requested review from dd-jy and soimkim April 16, 2026 08:18
@bjk7119 bjk7119 self-assigned this Apr 16, 2026
@bjk7119 bjk7119 added the chore [PR/Issue] Refactoring, maintenance the code label Apr 16, 2026

coderabbitai bot commented Apr 16, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6f6e30c2-d39b-4f1f-8fcf-f6e2c13b2141

📥 Commits

Reviewing files that changed from the base of the PR and between 5285628 and ba5716e.

📒 Files selected for processing (1)
  • pyproject.toml

📝 Walkthrough


The pyproject.toml file has been updated to pin the binaryornot dependency to version 0.4.4, replacing the previous unpinned requirement specification.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Dependency pinning (pyproject.toml) | Pinned binaryornot dependency to exact version 0.4.4 instead of unpinned requirement. |
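
An exact pin is conventionally written as `binaryornot==0.4.4` in a requirement string; the exact line changed in pyproject.toml is not shown on this page. As a minimal sketch, assuming you want to confirm at runtime that an environment actually resolved to the pinned release, the hypothetical helpers below compare the installed version with the expected one using `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Version pinned by this PR; the helper names below are hypothetical.
PINNED_VERSION = "0.4.4"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package: str, expected: str = PINNED_VERSION) -> bool:
    """Return True only if the package is installed at exactly the expected version."""
    return installed_version(package) == expected


if __name__ == "__main__":
    found = installed_version("binaryornot")
    print(f"binaryornot installed: {found!r}, matches pin: {matches_pin('binaryornot')}")
```

A check like this could run in CI or at start-up to catch an environment where a looser constraint elsewhere pulled in a newer release.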

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested reviewers

  • dd-jy
🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The pull request title accurately and directly describes the main change: pinning the binaryornot dependency to version 0.4.4 in pyproject.toml. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@bjk7119 bjk7119 merged commit 0016686 into main Apr 17, 2026
8 checks passed
@soimkim soimkim changed the title from "Fix binaryornot to 0.4.4 in pyproject.toml" to "Fix binaryornot to 0.4.4 to prevent misclassification errors in Python 3.14." Apr 17, 2026
@soimkim soimkim changed the title from "Fix binaryornot to 0.4.4 to prevent misclassification errors in Python 3.14." to "Fix binaryornot to 0.4.4 to prevent misclassification errors" Apr 17, 2026
@soimkim soimkim deleted the data branch April 17, 2026 01:48