17 changes: 17 additions & 0 deletions .auxiliary/notes/detextive-bugs.md
@@ -0,0 +1,17 @@
# Detextive Issues

## Binary Data Decoded as UTF-16-LE

**Issue**: Detextive incorrectly decodes certain binary data as UTF-16-LE text.

**Example**: A file containing alternating bytes `0xFF 0x00` repeated (i.e., `bytes([0xFF, 0x00] * 52)`) is successfully detected as having charset `utf-16-le` and decoded as text, producing a string of repeated `ÿ` characters.
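The false positive is easy to reproduce with Python's own codec machinery, independent of Detextive; this sketch shows why the byte pattern looks like valid UTF-16-LE:

```python
# Alternating 0xFF 0x00 bytes form valid UTF-16-LE: each little-endian
# pair decodes to code unit 0x00FF, which is 'ÿ'.
payload = bytes([0xFF, 0x00] * 52)
text = payload.decode('utf-16-le')
assert text == '\xff' * 52
# Because the decode succeeds, any detector that validates a candidate
# charset via trial decode will accept this binary blob as text.
```

This is why a trial-decode heuristic alone cannot distinguish this binary pattern from genuine UTF-16-LE text.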

**Impact**: Binary files that should be rejected are accepted as valid text files. While this is not a security risk in most cases (the "decoded" content is gibberish), it means mimeogram may accept files that are not genuinely textual.

**Workaround**: Tests have been updated to use binary files with more recognizable headers (like PE executables with `MZ` magic bytes) that Detextive properly rejects. These files cause decode failures even when a charset is detected.
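The workaround's reasoning can be sketched with a hypothetical PE-style header stub (illustrative bytes, not a complete executable): the `0x90` byte cannot begin a UTF-8 sequence, so a trial decode fails outright instead of producing gibberish.

```python
# Minimal DOS/PE-style header stub: 'MZ' magic followed by bytes that
# are invalid in UTF-8 (0x90 is a continuation byte and cannot start
# a UTF-8 sequence).
pe_stub = b'MZ\x90\x00\x03\x00\x00\x00'
try:
    pe_stub.decode('utf-8')
except UnicodeDecodeError:
    print('rejected as text')  # this branch is taken
```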

**Status**: This is a general limitation of charset-detection algorithms: alternating binary patterns can appear to match multi-byte encodings such as UTF-16. The issue should be reported to the Detextive project for potential improvement of its validation heuristics.

**Related Tests**:
- `test_410_application_x_security`: Updated to check for truly dangerous files only
- `test_520_nontextual_mime`: Updated to use PE executable header instead of simple binary pattern
78 changes: 78 additions & 0 deletions .auxiliary/notes/issues.md
@@ -0,0 +1,78 @@
# Known Issues

## CLI Parser Failure with tyro

**Discovered**: 2025-11-09 during Detextive 2.0 port verification

**Status**: Pre-existing issue (present before the Detextive 2.0 port)

**Severity**: Critical - CLI is completely non-functional

### Description

The mimeogram CLI fails to start with a tyro parser error:

```
AssertionError: UnsupportedStructTypeMessage(message="Empty hints for <slot wrapper '__init__' of '_io.TextIOWrapper' objects>!")
```

### Reproduction

```bash
hatch run mimeogram --help
# or any other command: version, create, apply, provide-prompt
```

### Analysis

The error originates from `tyro` attempting to parse the CLI structure and encountering a type that lacks proper type hints. The error occurs in:

```
File "/root/.local/share/hatch/env/.../tyro/_parsers.py", line 113, in from_callable_or_type
assert not isinstance(out, UnsupportedStructTypeMessage), out
```

The error mentions `_io.TextIOWrapper`, suggesting that somewhere in the command classes or their dependencies, there's a reference to stdin/stdout/stderr or file handles that tyro cannot introspect.
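The "Empty hints" message matches what Python introspection reports for that initializer. This check is an assumption about the failure mode, not a confirmed root cause: the C-level `__init__` exposes no annotations for a hint-driven parser like tyro to consume.

```python
import io

# _io.TextIOWrapper.__init__ is a C "slot wrapper": it carries no
# __annotations__ attribute, so hint-driven parsers see no type hints.
init = io.TextIOWrapper.__init__
hints = getattr(init, '__annotations__', {})
assert not hints  # no usable type hints for introspection
```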

### Timeline

- **Commit 556db71** (Merge PR #9 - appcore cutover): Error present
- **Commit fac1d9f** (Integrate detextive package): Error present
- **Commit c1401a1** (Port to Detextive 2.0): Error present
- **Commit 32a777f** (Fix linter errors): Error present

This indicates the issue was introduced during the appcore refactor (PR #9), not by the Detextive 2.0 port.

### Investigation Points

1. **appcore type annotations**: The issue likely stems from how `appcore` types are exposed to tyro
2. **CLI command definitions**: Check `cli.py`, `create.py`, `apply.py`, `prompt.py` for problematic type hints
3. **TextIOWrapper references**: Search for uses of `sys.stdin`, `sys.stdout`, `sys.stderr` that may need explicit typing

Confirmed uses in codebase:
- `sources/mimeogram/apply.py:134`: `__.sys.stdin.isatty()`
- `sources/mimeogram/apply.py:144`: `__.sys.stdin.read()`
- `sources/mimeogram/interactions.py:76`: `__.sys.stdout.flush()`
- `sources/mimeogram/display.py:60`: `__.sys.stdin.isatty()`
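For investigation point 3, one way to make the stream usage introspection-friendly is to funnel it through an explicitly annotated parameter. `read_payload` below is a hypothetical helper sketching the pattern, not code from mimeogram:

```python
import sys
import typing

def read_payload(stream: typing.TextIO = sys.stdin) -> str:
    ''' Reads piped input; returns empty string on an interactive TTY. '''
    if stream.isatty():
        return ''
    return stream.read()
```

The explicit `typing.TextIO` annotation keeps the concrete `_io.TextIOWrapper` type out of anything a CLI parser introspects, and the parameter makes the helper testable with an in-memory stream.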

### Suggested Fix

Based on the user's suggestion: switch to `emcd-appcore[cli]`, which likely includes the additional dependencies or type stubs that tyro needs to parse the CLI structure.

### Impact

- **Tests**: All 173 tests pass (tests don't exercise CLI parsing; they import modules directly)
- **Linters**: Pass cleanly (ruff and pyright)
- **Detextive integration**: Working correctly
- **CLI functionality**: Completely broken - cannot run any commands

### Workaround

None currently available. The application can be used programmatically by importing modules directly, but the CLI is unusable.

### Next Steps

1. Try switching dependency from `emcd-appcore~=1.4` to `emcd-appcore[cli]~=1.4`
2. If that doesn't resolve it, investigate the specific type annotation that tyro cannot parse
3. Consider adding explicit type annotations to any stdin/stdout/stderr usage
4. Report the issue to `tyro` if it turns out to be a limitation of its type introspection
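If step 1 pans out, the change would be a one-line edit in `pyproject.toml`. A sketch, assuming the `[cli]` extra exists in emcd-appcore:

```toml
[project]
dependencies = [
    # 'emcd-appcore~=1.4',      # current pin
    'emcd-appcore[cli]~=1.4',   # assumed extra pulling in CLI support
]
```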
4 changes: 1 addition & 3 deletions pyproject.toml
@@ -19,8 +19,7 @@ dependencies = [
'absence~=1.1',
'accretive~=4.1',
'aiofiles',
'chardet',
'detextive~=1.0',
'detextive~=2.0',
'dynadoc~=1.4',
'emcd-appcore~=1.4',
'exceptiongroup',
Expand All @@ -29,7 +28,6 @@ dependencies = [
'httpx',
'icecream-truck~=1.5',
'patiencediff',
'puremagic',
'pyperclip',
'python-dotenv', # TODO: Remove after cutover to appcore.
'readchar',
146 changes: 19 additions & 127 deletions sources/mimeogram/acquirers.py
@@ -78,13 +78,16 @@ async def _acquire_from_file( location: __.Path ) -> _parts.Part:
async with _aiofiles.open( location, 'rb' ) as f: # pyright: ignore
content_bytes = await f.read( )
except Exception as exc: raise ContentAcquireFailure( location ) from exc
mimetype, charset = _detect_mimetype_and_charset( content_bytes, location )
mimetype, charset = __.detextive.infer_mimetype_charset(
content_bytes, location = str( location ) )
if charset is None: raise ContentDecodeFailure( location, '???' )
linesep = _parts.LineSeparators.detect_bytes( content_bytes )
linesep = __.detextive.LineSeparators.detect_bytes( content_bytes )
if linesep is None:
_scribe.warning( f"No line separator detected in '{location}'." )
linesep = _parts.LineSeparators( __.os.linesep )
try: content = content_bytes.decode( charset )
linesep = __.detextive.LineSeparators( __.os.linesep )
try:
content = __.detextive.decode(
content_bytes, location = str( location ) )
except Exception as exc:
raise ContentDecodeFailure( location, charset ) from exc
_scribe.debug( f"Read file: {location}" )
@@ -105,21 +108,22 @@ async def _acquire_via_http(
response = await client.get( url )
response.raise_for_status( )
except Exception as exc: raise ContentAcquireFailure( url ) from exc
mimetype = (
response.headers.get( 'content-type', 'application/octet-stream' )
.split( ';' )[ 0 ].strip( ) )
http_content_type = response.headers.get( 'content-type' )
content_bytes = response.content
charset = response.encoding or _detect_charset( content_bytes )
mimetype, charset = __.detextive.infer_mimetype_charset(
content_bytes,
location = url,
http_content_type = http_content_type or __.absent )
if charset is None: raise ContentDecodeFailure( url, '???' )
if not _is_textual_mimetype( mimetype ):
mimetype, _ = (
_detect_mimetype_and_charset(
content_bytes, url, charset = charset ) )
linesep = _parts.LineSeparators.detect_bytes( content_bytes )
linesep = __.detextive.LineSeparators.detect_bytes( content_bytes )
if linesep is None:
_scribe.warning( f"No line separator detected in '{url}'." )
linesep = _parts.LineSeparators( __.os.linesep )
try: content = content_bytes.decode( charset )
linesep = __.detextive.LineSeparators( __.os.linesep )
try:
content = __.detextive.decode(
content_bytes,
location = url,
http_content_type = http_content_type or __.absent )
except Exception as exc:
raise ContentDecodeFailure( url, charset ) from exc
_scribe.debug( f"Fetched URL: {url}" )
@@ -157,102 +161,6 @@ def _collect_directory_files(
return paths


def _detect_charset( content: bytes ) -> str | None:
from chardet import detect
charset = detect( content )[ 'encoding' ]
if charset is None: return charset
if charset.startswith( 'utf' ): return charset
match charset:
case 'ascii': return 'utf-8' # Assume superset.
case _: pass
# Shake out false positives, like 'MacRoman'.
try: content.decode( 'utf-8' )
except UnicodeDecodeError: return charset
return 'utf-8'


def _detect_mimetype( content: bytes, location: str | __.Path ) -> str | None:
from mimetypes import guess_type
from puremagic import PureError, from_string # pyright: ignore
try: return from_string( content, mime = True )
except ( PureError, ValueError ):
return guess_type( str( location ) )[ 0 ]


def _detect_mimetype_and_charset(
content: bytes,
location: str | __.Path, *,
mimetype: __.Absential[ str ] = __.absent,
charset: __.Absential[ str ] = __.absent,
) -> tuple[ str, str | None ]:
from .exceptions import TextualMimetypeInvalidity
if __.is_absent( mimetype ):
mimetype_ = _detect_mimetype( content, location )
else: mimetype_ = mimetype
if __.is_absent( charset ): # noqa: SIM108
charset_ = _detect_charset( content )
else: charset_ = charset
if not mimetype_:
if charset_:
mimetype_ = 'text/plain'
_validate_mimetype_with_trial_decode(
content, location, mimetype_, charset_ )
return mimetype_, charset_
mimetype_ = 'application/octet-stream'
if _is_textual_mimetype( mimetype_ ):
return mimetype_, charset_
if charset_ is None:
raise TextualMimetypeInvalidity( location, mimetype_ )
_validate_mimetype_with_trial_decode(
content, location, mimetype_, charset_ )
return mimetype_, charset_


def _is_reasonable_text_content( content: str ) -> bool:
''' Checks if decoded content appears to be meaningful text. '''
if not content: return False
# Check for excessive repetition of single characters (likely binary)
if len( set( content ) ) == 1: return False
# Check for excessive control characters (excluding common whitespace)
common_whitespace = '\t\n\r'
ascii_control_limit = 32
control_chars = sum(
1 for c in content
if ord( c ) < ascii_control_limit and c not in common_whitespace )
if control_chars > len( content ) * 0.1: return False # >10% control chars
# Check for reasonable printable character ratio
printable_chars = sum(
1 for c in content if c.isprintable( ) or c in common_whitespace )
return printable_chars >= len( content ) * 0.8 # >=80% printable


# MIME types that are considered textual beyond those starting with 'text/'.
_TEXTUAL_MIME_TYPES = frozenset( (
'application/json',
'application/xml',
'application/xhtml+xml',
'application/x-perl',
'application/x-python',
'application/x-php',
'application/x-ruby',
'application/x-shell',
'application/javascript',
'image/svg+xml',
) )
# MIME type suffixes that indicate textual content.
_TEXTUAL_SUFFIXES = ( '+xml', '+json', '+yaml', '+toml' )
def _is_textual_mimetype( mimetype: str ) -> bool:
''' Checks if MIME type represents textual content. '''
_scribe.debug( f"MIME type: {mimetype}" )
if mimetype.startswith( ( 'text/', 'text/x-' ) ): return True
if mimetype in _TEXTUAL_MIME_TYPES: return True
if mimetype.endswith( _TEXTUAL_SUFFIXES ):
_scribe.debug(
f"MIME type '{mimetype}' accepted due to textual suffix." )
return True
return False


def _produce_fs_tasks(
location: str | __.Path, recursive: bool = False
) -> tuple[ __.cabc.Coroutine[ None, None, _parts.Part ], ...]:
@@ -277,19 +185,3 @@ async def _execute_session( ) -> _parts.Part:
) as client: return await _acquire_via_http( client, url )

return _execute_session( )


def _validate_mimetype_with_trial_decode(
content: bytes, location: str | __.Path, mimetype: str, charset: str
) -> None:
''' Validates charset fallback and returns appropriate MIME type. '''
from .exceptions import TextualMimetypeInvalidity
try: text = content.decode( charset )
except ( UnicodeDecodeError, LookupError ) as exc:
raise TextualMimetypeInvalidity( location, mimetype ) from exc
if _is_reasonable_text_content( text ):
_scribe.debug(
f"MIME type '{mimetype}' accepted after successful "
f"decode test with charset '{charset}' for '{location}'." )
return
raise TextualMimetypeInvalidity( location, mimetype )
2 changes: 1 addition & 1 deletion sources/mimeogram/formatters.py
@@ -45,7 +45,7 @@ def format_mimeogram(
location = 'mimeogram://message',
mimetype = 'text/plain', # TODO? Markdown
charset = 'utf-8',
linesep = _parts.LineSeparators.LF,
linesep = __.detextive.LineSeparators.LF,
content = message )
lines.append( format_part( message_part, boundary ) )
for part in parts:
8 changes: 5 additions & 3 deletions sources/mimeogram/parsers.py
@@ -109,17 +109,19 @@ def _parse_descriptor_and_content(


_QUOTES = '"\''
def _parse_mimetype( header: str ) -> tuple[ str, str, _parts.LineSeparators ]:
def _parse_mimetype(
header: str
) -> tuple[ str, str, __.detextive.LineSeparators ]:
''' Extracts MIME type and charset from Content-Type header. '''
parts = [ p.strip( ) for p in header.split( ';' ) ]
mimetype = parts[ 0 ]
charset = 'utf-8'
linesep = _parts.LineSeparators.LF
linesep = __.detextive.LineSeparators.LF
for part in parts[ 1: ]:
if part.startswith( 'charset=' ):
charset = part[ 8: ].strip( _QUOTES )
if part.startswith( 'linesep=' ):
linesep = _parts.LineSeparators[
linesep = __.detextive.LineSeparators[
part[ 8: ].strip( _QUOTES ).upper( ) ]
return mimetype, charset, linesep

44 changes: 1 addition & 43 deletions sources/mimeogram/parts.py
@@ -25,48 +25,6 @@
from . import fsprotect as _fsprotect


class LineSeparators( __.enum.Enum ):
''' Line separators for various platforms. '''

CR = '\r' # Classic MacOS
CRLF = '\r\n' # DOS/Windows
LF = '\n' # Unix/Linux

@classmethod
def detect_bytes(
selfclass, content: bytes, limit = 1024
) -> "LineSeparators | None":
''' Detects newline characters in bytes array. '''
sample = content[ : limit ]
found_cr = False
for byte in sample:
match byte:
case 0xd:
if found_cr: return selfclass.CR
found_cr = True
case 0xa: # linefeed
if found_cr: return selfclass.CRLF
return selfclass.LF
case _:
if found_cr: return selfclass.CR
return None

@classmethod
def normalize_universal( selfclass, content: str ) -> str:
''' Normalizes all varieties of newline characters in text. '''
return content.replace( '\r\n', '\r' ).replace( '\r', '\n' )

def nativize( self, content: str ) -> str:
''' Nativizes specific variety newline characters in text. '''
if LineSeparators.LF is self: return content
return content.replace( '\n', self.value )

def normalize( self, content: str ) -> str:
''' Normalizes specific variety newline characters in text. '''
if LineSeparators.LF is self: return content
return content.replace( self.value, '\n' )


class Resolutions( __.enum.Enum ):
''' Available resolutions for each part. '''

@@ -79,7 +37,7 @@ class Part( __.immut.DataclassObject ):
location: str # TODO? 'Url' class
mimetype: str
charset: str
linesep: "LineSeparators"
linesep: __.detextive.LineSeparators
content: str

# TODO? 'format' method
2 changes: 1 addition & 1 deletion sources/mimeogram/updaters.py
@@ -182,7 +182,7 @@ async def _update_content_atomic(
location: __.Path,
content: str,
charset: str = 'utf-8',
linesep: _parts.LineSeparators = _parts.LineSeparators.LF
linesep: __.detextive.LineSeparators = __.detextive.LineSeparators.LF
) -> None:
''' Updates file content atomically, if possible. '''
import aiofiles.os as os # noqa: PLR0402