Doublespeak

Hides/reveals secret messages in text. Optimized for instant messaging.

Messages are encoded as zero-width Unicode characters, as a casual form of steganography.

Web app: https://dblspk.io/

Chrome extension: Chrome Web Store

Usage

Web app

Tab / Shift+Tab — cycle through fields

The encoded message is automatically copied when you tab to or click on its field.

Drag and drop files onto the page to encode them.

Features

  • File transmission
  • CRC-32 error checking
  • Multi-message decoding
  • Linkifies URLs, emails, phone numbers, and Twitter hashtags
  • Preview URLs for images, video, and audio
  • Progressive Web App — can be pinned to your Android homescreen

Possible uses

What can be hidden:

  • Text
  • URLs (similar use to QR codes)
  • Watermarks
  • Small files

Possible places for storage:

  • Chat messages
  • Social media posts
  • User profile information
  • Forums
  • HT⁢⁢‌‌︁︀⁢⁠⁣‌‍‍⁡⁣⁤⁣⁠⁠⁤⁣⁠‌‌⁠⁤⁣⁢⁢⁠‌⁤⁤⁤⁤⁢⁠︁ML
  • Emails
  • Digital documents
  • File names (very short messages only)

How it works

Unicode contains several zero-width, non-printing characters. We use 16 of them as hexadecimal digits, so each character carries 4 bits and any data can be encoded, using the following arbitrary mapping:

Decimal  Hex  Binary  Character  Description
0        0    0000    U+200C     zero-width non-joiner
1        1    0001    U+200D     zero-width joiner
2        2    0010    U+2060     word joiner
3        3    0011    U+2061     function application
4        4    0100    U+2062     invisible times
5        5    0101    U+2063     invisible separator
6        6    0110    U+2064     invisible plus
7        7    0111    U+206A     inhibit symmetric swapping
8        8    1000    U+206B     activate symmetric swapping
9        9    1001    U+206C     inhibit Arabic form shaping
10       A    1010    U+206D     activate Arabic form shaping
11       B    1011    U+206E     national digit shapes
12       C    1100    U+206F     nominal digit shapes
13       D    1101    U+FE00     variation selector-1
14       E    1110    U+FE01     variation selector-2
15       F    1111    U+FEFF     zero-width non-breaking space
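
The mapping in the table can be sketched in a few lines of Python. This is an illustration only, not the project's code: the names `ALPHABET`, `encode_bytes`, and `decode_chars` are hypothetical, and emitting the high nibble of each byte first is an assumption.

```python
# The 16 characters from the table, indexed by the nibble value they stand for.
ALPHABET = (
    "\u200c\u200d\u2060\u2061\u2062\u2063\u2064"
    "\u206a\u206b\u206c\u206d\u206e\u206f"
    "\ufe00\ufe01\ufeff"
)
NIBBLE_OF = {ch: value for value, ch in enumerate(ALPHABET)}


def encode_bytes(data: bytes) -> str:
    """Turn each byte into two invisible characters (high nibble first: assumed)."""
    out = []
    for byte in data:
        out.append(ALPHABET[byte >> 4])
        out.append(ALPHABET[byte & 0x0F])
    return "".join(out)


def decode_chars(chars: str) -> bytes:
    """Inverse of encode_bytes; requires an even number of table characters."""
    if len(chars) % 2:
        raise ValueError("odd number of encoding characters")
    nibbles = [NIBBLE_OF[ch] for ch in chars]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


hidden = encode_bytes("hi".encode("utf-8"))
print(len(hidden))           # 4
print(decode_chars(hidden))  # b'hi'
```

Two characters per byte is also why split messages need an even number of encoding characters per chunk.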

A header, encoded in the same way, is prepended:

Size      Field               Description
1 byte    Protocol signature  ASCII letter "D", or 0x44
1 byte    Protocol version    0x00
4 bytes   CRC-32              Calculated on decoded data field
1 byte    Data type           0x00: Encryption wrapper
                              0x01: UTF-8 text
                              0x02: File
1+ bytes  Data length         Variable length quantity, representing length of the data field
Varies    Data                Depends on data type

The resulting string of invisible characters is then inserted at a random location in the cover text. More details in the protocol specification.
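
As a rough sketch of the header layout and the insertion step, the Python below builds a text message and drops it into a cover string. Several details are assumptions rather than statements about the protocol: the CRC-32 is taken to be the common one computed by `zlib.crc32`, stored big-endian; the variable length quantity is taken to be the MIDI-style form (7 bits per byte, most significant group first); and each byte is written high nibble first. All function names are hypothetical. The protocol specification is the authority on these points.

```python
import random
import zlib

ALPHABET = (
    "\u200c\u200d\u2060\u2061\u2062\u2063\u2064"
    "\u206a\u206b\u206c\u206d\u206e\u206f"
    "\ufe00\ufe01\ufeff"
)
SIGNATURE = 0x44   # ASCII "D"
VERSION = 0x00
TYPE_TEXT = 0x01   # UTF-8 text


def vlq(n: int) -> bytes:
    """Variable length quantity: 7 bits per byte, high bit set on all but
    the last byte, most significant group first (assumed layout)."""
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def build_message(text: str) -> bytes:
    """Header fields in table order, followed by the data field."""
    data = text.encode("utf-8")
    crc = zlib.crc32(data).to_bytes(4, "big")  # byte order is an assumption
    return bytes([SIGNATURE, VERSION]) + crc + bytes([TYPE_TEXT]) + vlq(len(data)) + data


def to_invisible(raw: bytes) -> str:
    """Two invisible characters per byte (high nibble first: assumed)."""
    return "".join(ALPHABET[b >> 4] + ALPHABET[b & 0x0F] for b in raw)


def embed(cover: str, hidden: str, rng: random.Random) -> str:
    """Insert the invisible string at a random position in the cover text."""
    pos = rng.randint(0, len(cover))
    return cover[:pos] + hidden + cover[pos:]


message = build_message("hi")
stego = embed("Nothing to see here.", to_invisible(message), random.Random())
print(len(message), len(stego))  # 10 40
```

The 2-byte payload becomes a 10-byte message, then 20 invisible characters, which is why very short messages are dominated by header overhead.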

Efficiency

Each invisible character represents 4 bits but takes 3 bytes (24 bits) to store in UTF-8. The hidden data therefore occupies 6 times as much space as the original data, not including the header and the cover text.
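
The ratio can be checked directly. The snippet below is a small sketch with a hypothetical helper name, `stored_size`; it assumes the text is stored as UTF-8 and ignores the header and cover text.

```python
ALPHABET = (
    "\u200c\u200d\u2060\u2061\u2062\u2063\u2064"
    "\u206a\u206b\u206c\u206d\u206e\u206f"
    "\ufe00\ufe01\ufeff"
)

# Every character in the table occupies exactly 3 bytes in UTF-8.
BYTES_PER_CHAR = {len(ch.encode("utf-8")) for ch in ALPHABET}
print(BYTES_PER_CHAR)  # {3}


def stored_size(payload_len: int) -> int:
    """UTF-8 bytes needed to hide payload_len bytes (header excluded)."""
    chars = payload_len * 2  # two 4-bit characters per byte
    return chars * 3         # three UTF-8 bytes per character


print(stored_size(100))  # 600
```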

Robustness

Multiple/split messages

When decoding, input is treated as a stream containing an arbitrary number of messages. This allows users to paste in any text and decode all the messages in it at once. It also allows messages that have been split into chunks to be decoded, as long as each chunk contains an even number of encoding characters, to maintain byte alignment.

Concatenated messages

Each message header stores the length of the data field, to allow decoding of multiple concatenated messages.

Corrupted and uncorrupted messages mixed together

During parsing, the decoder keeps track of consecutive sequences of encoding characters in the cover text. If some encoding characters have been corrupted or truncated, the CRC fails and the remainder of a sequence must be discarded. However, decoding will resume from the next sequence. This prevents one corrupted message from making all following messages undecodable.

Sequences of insufficient length, such as might occur naturally when encoding characters are used for their original purpose, are discarded.
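
The recovery behaviour described in this section can be sketched as follows. This is a simplified illustration, not the project's decoder: it parses each run of encoding characters independently, so it does not reassemble a message split across several runs. It also assumes the standard CRC-32 from `zlib.crc32` stored big-endian, a MIDI-style variable length quantity, high-nibble-first bytes, and a minimum run length of 16 characters (the smallest possible 8-byte header). All names are hypothetical.

```python
import re
import zlib

ALPHABET = (
    "\u200c\u200d\u2060\u2061\u2062\u2063\u2064"
    "\u206a\u206b\u206c\u206d\u206e\u206f"
    "\ufe00\ufe01\ufeff"
)
NIBBLE_OF = {ch: value for value, ch in enumerate(ALPHABET)}
RUN = re.compile("[" + ALPHABET + "]+")  # one consecutive sequence
MIN_CHARS = 16                           # assumed threshold: 8-byte header


def run_to_bytes(run: str) -> bytes:
    """Pair up nibbles; a trailing unpaired character is ignored."""
    nib = [NIBBLE_OF[ch] for ch in run]
    return bytes((nib[i] << 4) | nib[i + 1] for i in range(0, len(nib) - 1, 2))


def read_vlq(buf: bytes, pos: int):
    """Return (value, next_pos), or None if the buffer ends mid-number."""
    value = 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos
    return None


def decode_all(text: str) -> list:
    """Collect (data_type, data) for every message that passes its CRC."""
    messages = []
    for match in RUN.finditer(text):
        run = match.group()
        if len(run) < MIN_CHARS:
            continue                  # too short to hold a header
        buf = run_to_bytes(run)
        pos = 0
        while pos + 7 <= len(buf):
            if buf[pos] != 0x44 or buf[pos + 1] != 0x00:
                break                 # not a message: drop the rest of this run
            crc = int.from_bytes(buf[pos + 2:pos + 6], "big")
            data_type = buf[pos + 6]
            parsed = read_vlq(buf, pos + 7)
            if parsed is None:
                break
            length, start = parsed
            data = buf[start:start + length]
            if len(data) < length or zlib.crc32(data) != crc:
                break                 # truncated or corrupted: drop the rest
            messages.append((data_type, data))
            pos = start + length      # continue with a concatenated message
    return messages
```

A failed check only breaks out of the inner loop, so the outer loop resumes with the next run, and one corrupted message does not block the messages that follow it.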

Roadmap

The following planned features are defined in the protocol specification:

  • Automatic compression, only when size will be reduced
  • Optional built-in encryption for convenience (users can still apply their own encryption without this feature)

To-do list

To suggest a feature, please create an issue.

Comparison to other steganography techniques

Pros

  • Produces no visible alteration in the text.
  • Can theoretically store a near-unlimited amount of data regardless of length of the cover text.
  • Can be used with applications that do not support file transfers.
  • Reduces suspicion by not requiring the frequent transfer of large files during communication.

Cons

  • Can be filtered or corrupted by applications that do not support Unicode, or that attempt to format user input.
  • Extremely easy to detect. Any digital text can be checked for the possible presence of a message by pasting it into a decoder, or a text editor that displays non-printing characters. Large messages may create line breaks in some applications.
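
To illustrate how little effort detection takes, here is a minimal sketch that scans text for the 16 encoding characters and makes them visible. The helper names `find_hidden_runs` and `reveal` are hypothetical.

```python
import re

ALPHABET = (
    "\u200c\u200d\u2060\u2061\u2062\u2063\u2064"
    "\u206a\u206b\u206c\u206d\u206e\u206f"
    "\ufe00\ufe01\ufeff"
)
RUN = re.compile("[" + ALPHABET + "]+")


def find_hidden_runs(text: str) -> list:
    """Return (offset, length) for every run of encoding characters."""
    return [(m.start(), len(m.group())) for m in RUN.finditer(text)]


def reveal(text: str) -> str:
    """Replace each encoding character with a visible <U+XXXX> tag."""
    return RUN.sub(
        lambda m: "".join(f"<U+{ord(c):04X}>" for c in m.group()), text
    )


sample = "HT\u2062\u2062ML"
print(find_hidden_runs(sample))  # [(2, 2)]
print(reveal(sample))            # HT<U+2062><U+2062>ML
```

A non-empty result does not prove a message is present, since these characters also have legitimate uses, but a long run is a strong hint.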

If you are serious about concealing your payload, you should use another form of steganography.

As with any method of communication, security is only as good as the encryption applied. This only provides a casual level of security through obscurity.

Credits

This project began at Cal Hacks 3.0 under a much less memorable name.

Services used

Cross-browser testing courtesy of BrowserStack.


License

MIT License