A pure Python implementation of the Punycode algorithm (RFC 3492) for encoding and decoding Unicode domain names.
Punycode is an encoding syntax that converts Unicode strings into the limited subset of ASCII allowed in Domain Name System (DNS) host names (letters, digits, and hyphens). This implementation provides a clean, efficient, and RFC-compliant way to encode and decode Punycode strings in Python.
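In practice Punycode is applied to each domain label separately, and the IDNA layer (not Punycode itself) adds the `xn--` prefix that marks an encoded label. Python's standard library ships an `idna` codec (the older IDNA 2003 rules of RFC 3490) that shows the end result. It is used here only for illustration and is independent of this repository's code.

```python
# Python's built-in "idna" codec (IDNA 2003) Punycode-encodes each non-ASCII
# label and prefixes it with "xn--"; ASCII labels such as "de" pass through.
domain = "münchen.de"
ascii_form = domain.encode("idna")
print(ascii_form)                 # b'xn--mnchen-3ya.de'
print(ascii_form.decode("idna"))  # münchen.de
```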
- 🔄 Bidirectional conversion between Unicode and Punycode
- 📜 Full compliance with the RFC 3492 specification
- 🐍 Pure Python implementation with no external dependencies
- 🌐 Support for both basic ASCII and non-ASCII Unicode characters
- 📚 Comprehensive documentation and examples
Clone the repository:
```bash
git clone https://github.com/yourusername/PunyCode.git
cd PunyCode
```
No additional dependencies are required as this is a pure Python implementation.
```bash
python punycode.py
```
Follow the interactive prompts to encode or decode strings.
```python
from punycode import punycode_encode, punycode_decode

# Encoding example
unicode_str = "München"
encoded = punycode_encode(unicode_str)
print(encoded)  # Output: "Mnchen-3ya"

# Decoding example
punycode_str = "Mnchen-3ya"
decoded = punycode_decode(punycode_str)
print(decoded)  # Output: "München"
```
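Python's standard library also includes a `punycode` text codec that implements RFC 3492, which makes a convenient reference when testing. The sketch below assumes that `punycode_encode` and `punycode_decode` take and return `str`, as in the example above. The helper name `compare_with_stdlib` is hypothetical.

```python
from typing import Callable, Iterable


def compare_with_stdlib(
    encode: Callable[[str], str],
    decode: Callable[[str], str],
    samples: Iterable[str],
) -> list[str]:
    """Return the samples where encode/decode disagree with Python's punycode codec."""
    mismatches = []
    for text in samples:
        # Reference result from the standard library codec (it returns bytes).
        expected = text.encode("punycode").decode("ascii")
        if encode(text) != expected or decode(expected) != text:
            mismatches.append(text)
    return mismatches


SAMPLES = ["München", "bücher", "日本語", "plain-ascii"]

# Sanity check: the stdlib codec agrees with itself, so nothing is reported.
print(compare_with_stdlib(
    lambda s: s.encode("punycode").decode("ascii"),
    lambda p: p.encode("ascii").decode("punycode"),
    SAMPLES,
))  # []
```

With this repository's functions the call would be `compare_with_stdlib(punycode_encode, punycode_decode, SAMPLES)`. An empty list means every sample matched the reference.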
The implementation uses several parameters as defined in RFC 3492:
- BASE: 36 (letters a-z represent digit values 0-25, digits 0-9 represent 26-35)
- TMIN: 1
- TMAX: 26
- SKEW: 38
- DAMP: 700
- INITIAL_BIAS: 72
- INITIAL_N: 0x80 (128, the first non-ASCII code point)
- DELIMITER: '-'
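SKEW and DAMP only matter inside the bias adaptation function of RFC 3492 (section 6.1), and TMIN/TMAX only through the digit threshold. The sketch below shows both, written against the parameter values listed above. The names `adapt_bias` and `threshold` are hypothetical and may not match how `punycode.py` structures this logic.

```python
# Parameter values from RFC 3492, section 5.
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS = 72


def adapt_bias(delta: int, numpoints: int, firsttime: bool) -> int:
    """Recompute the bias after each delta is encoded or decoded (RFC 3492, 6.1)."""
    # Scale delta down: heavily (by DAMP) after the first delta, otherwise by half.
    delta = delta // DAMP if firsttime else delta // 2
    # Compensate for the next delta being inserted into a longer string.
    delta += delta // numpoints
    k = 0
    # Divide delta down until it falls within the threshold range, counting steps in k.
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def threshold(k: int, bias: int) -> int:
    """Digit threshold t at position k: k - bias clamped to [TMIN, TMAX]."""
    return max(TMIN, min(TMAX, k - bias))


print(adapt_bias(869, 7, True))       # 0 (the bias after encoding the 'ü' in "München")
print(threshold(BASE, INITIAL_BIAS))  # 1
```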
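- Encoding process:
  - Basic (ASCII) code points are copied to the output unchanged, followed by the delimiter if there are any
  - Non-ASCII code points are handled in ascending code-point order, and each insertion is recorded as a delta that combines the code point and its position in the string
  - Each delta is written as a base-36 generalized variable-length integer whose digit thresholds depend on an adaptive bias
- Decoding process:
  - Splits the input at the last delimiter, if there is one
  - Copies the basic code points before the delimiter to the output and decodes the digits after it back into deltas
  - Reconstructs the original Unicode string by inserting each decoded code point at its recorded position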
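The encoding and decoding processes can be sketched as follows, closely following the pseudocode in RFC 3492 (sections 6.3 and 6.2). This is an illustrative re-implementation, not the code in `punycode.py`. The names `encode`, `decode` and the underscore helpers are hypothetical, and the RFC's overflow checks are omitted because Python integers are unbounded.

```python
# Sketch of RFC 3492 encoding (section 6.3) and decoding (section 6.2).
BASE, TMIN, TMAX, SKEW, DAMP = 36, 1, 26, 38, 700
INITIAL_BIAS, INITIAL_N, DELIMITER = 72, 0x80, "-"


def _adapt(delta: int, numpoints: int, firsttime: bool) -> int:
    delta = delta // DAMP if firsttime else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def _threshold(k: int, bias: int) -> int:
    return max(TMIN, min(TMAX, k - bias))


def _digit_char(d: int) -> str:
    # 0-25 -> 'a'-'z', 26-35 -> '0'-'9'
    return chr(ord("a") + d) if d < 26 else chr(ord("0") + d - 26)


def _digit_value(ch: str) -> int:
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A")
    raise ValueError(f"invalid Punycode digit: {ch!r}")


def encode(text: str) -> str:
    codepoints = [ord(ch) for ch in text]
    # Basic code points are copied first, then a delimiter if any were copied.
    output = [ch for ch in text if ord(ch) < INITIAL_N]
    b = h = len(output)
    if b:
        output.append(DELIMITER)
    n, delta, bias = INITIAL_N, 0, INITIAL_BIAS
    while h < len(codepoints):
        # Smallest code point not yet handled.
        m = min(cp for cp in codepoints if cp >= n)
        delta += (m - n) * (h + 1)
        n = m
        for cp in codepoints:
            if cp < n:
                delta += 1
            elif cp == n:
                # Emit delta as a generalized variable-length integer.
                q, k = delta, BASE
                while (t := _threshold(k, bias)) <= q:
                    output.append(_digit_char(t + (q - t) % (BASE - t)))
                    q = (q - t) // (BASE - t)
                    k += BASE
                output.append(_digit_char(q))
                bias = _adapt(delta, h + 1, h == b)
                delta, h = 0, h + 1
        delta, n = delta + 1, n + 1
    return "".join(output)


def decode(encoded: str) -> str:
    # Everything before the last delimiter is basic; the rest holds the deltas.
    pos = encoded.rfind(DELIMITER)
    basic, extended = (encoded[:pos], encoded[pos + 1:]) if pos > 0 else ("", encoded)
    if any(ord(ch) >= INITIAL_N for ch in basic):
        raise ValueError("non-basic code point before the delimiter")
    output = list(basic)
    n, i, bias, cursor = INITIAL_N, 0, INITIAL_BIAS, 0
    while cursor < len(extended):
        oldi, w, k = i, 1, BASE
        while True:
            if cursor >= len(extended):
                raise ValueError("truncated Punycode input")
            digit = _digit_value(extended[cursor])
            cursor += 1
            i += digit * w
            t = _threshold(k, bias)
            if digit < t:
                break
            w *= BASE - t
            k += BASE
        # The decoded value i encodes both the new code point and its position.
        bias = _adapt(i - oldi, len(output) + 1, oldi == 0)
        n += i // (len(output) + 1)
        i %= len(output) + 1
        output.insert(i, chr(n))
        i += 1
    return "".join(output)


print(encode("München"))     # Mnchen-3ya
print(decode("Mnchen-3ya"))  # München
```

Inserting into a Python list keeps the decoder short but makes it quadratic in the input length; since a DNS label is at most 63 octets, that cost is negligible for domain names.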
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
This project is licensed under the MIT License - see the LICENSE file for details.
Avik Chatterjee
- Thanks to the authors of RFC 3492 for the detailed specification
- The Unicode Consortium for their standards and documentation