The blockchain, beginning with the Bitcoin Genesis Block, serves as a living, immutable document—a piece of human history and the very DNA for future AI. Project Starlight is an open-source initiative dedicated to developing and sharing a protocol for training artificial intelligence models to detect steganography within this critical data, particularly in images. The goal is to create a robust, decentralized, and community-driven resource that safeguards the integrity of these digital records. By making this knowledge accessible, we aim to ensure that by the year 2142, the detection of covert data is a common and automated practice, laying the foundation for AI training on historical Bitcoin data to build genuine AI common sense.
To immediately begin generating datasets, training the model, and running detection, please refer to the comprehensive instructions in the USAGE.md file.
cd starlight
pip install -r requirements.txt
./scripts/run_api_dev.sh # defaults: PORT=8080, BLOCKS_DIR=./blocks
# Docs: http://localhost:8080/docs (OpenAPI), health: http://localhost:8080/health
# Metrics: http://localhost:8080/metrics (Prometheus)

Environment overrides: STARGATE_API_KEY (defaults to demo-api-key), ALLOW_ANONYMOUS_SCAN=true for local testing, BLOCKS_DIR for cached blocks.
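The defaults and overrides listed above can be summarized in a small Python sketch. It is illustrative only: `resolve_settings` and `endpoint_urls` are hypothetical helpers, not part of the repository, and treating ALLOW_ANONYMOUS_SCAN as off unless it is set to `true` is an assumption. The authoritative behavior is whatever scripts/run_api_dev.sh and the API code actually do.

```python
import os
from typing import Mapping, Optional


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Apply the documented defaults to a mapping of environment variables."""
    env = os.environ if env is None else env
    return {
        "port": int(env.get("PORT", "8080")),
        "blocks_dir": env.get("BLOCKS_DIR", "./blocks"),
        "api_key": env.get("STARGATE_API_KEY", "demo-api-key"),
        # Assumption: anonymous scans stay off unless explicitly set to "true".
        "allow_anonymous_scan": env.get("ALLOW_ANONYMOUS_SCAN", "false").lower() == "true",
    }


def endpoint_urls(port: int, host: str = "localhost") -> dict:
    """Build the docs, health, and metrics URLs for a local dev server."""
    base = f"http://{host}:{port}"
    return {
        "docs": f"{base}/docs",
        "health": f"{base}/health",
        "metrics": f"{base}/metrics",
    }


if __name__ == "__main__":
    settings = resolve_settings()
    for name, url in endpoint_urls(settings["port"]).items():
        print(f"{name}: {url}")
```

Passing an explicit mapping, for example `resolve_settings({"PORT": "9000"})`, makes the override behavior easy to check without touching the real environment.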
The growing volume of data stored on public blockchains, especially rich media like images, marks a new era of digital permanence and transparency. It also presents a critical opportunity to proactively ensure the integrity and trustworthiness of decentralized ledgers.
Addressing concealed information (steganography) is about more than just security; it’s about preserving authentic historical context. When high-impact, world-changing events—like an image capturing a leader's definitive moment of bravery and defiance—are recorded on-chain, we must ensure that the original, uncompromised narrative is protected. An altered image, corrupted by hidden data, could compromise the emotional and historical truth of that record.
A system that automatically verifies the content of on-chain assets raises the standard for transparency and trust in every byte stored. It safeguards the fundamental principle of a transparent public ledger and helps future-proof the security of decentralized networks.
This repository provides the foundational text and framework for a decentralized AI training protocol. Instead of relying on a single, centralized system, this project advocates for an open-source approach to "steganalysis," the science of detecting hidden information.
The core of our approach involves training AI models to identify the subtle statistical and pixel-level anomalies that steganography leaves behind. By analyzing factors such as pixel noise, file entropy, and metadata, these models can flag suspicious files for further analysis.
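As a rough illustration of the statistical signals mentioned above, the following Python sketch computes two simple features from raw bytes: Shannon entropy and the share of samples whose least significant bit is set. The function names and the feature dictionary are hypothetical, not the repository's feature set, and a real model would combine many more signals than these two.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum p * log2(1/p) over the observed byte values, with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in Counter(data).values())


def lsb_one_ratio(samples: bytes) -> float:
    """Fraction of samples whose least significant bit is 1."""
    if not samples:
        return 0.0
    return sum(b & 1 for b in samples) / len(samples)


def extract_features(samples: bytes) -> dict:
    """Bundle the two statistics into a feature dictionary (hypothetical layout)."""
    return {
        "entropy_bits_per_byte": shannon_entropy(samples),
        "lsb_one_ratio": lsb_one_ratio(samples),
        "length": len(samples),
    }


if __name__ == "__main__":
    flat = bytes([128] * 1024)    # a constant "image": no variation at all
    ramp = bytes(range(256)) * 4  # every byte value equally often
    print(extract_features(flat))
    print(extract_features(ramp))
```

Neither feature is a detector on its own, since many clean files also have high entropy and balanced low-order bits. They are the kind of input a trained classifier could weigh alongside pixel-level and metadata signals.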
The accompanying resources in this repository extend the protocol. bitcoin_white_paper_2.md outlines proposed mechanisms, such as a consensus structure, that disincentivize the embedding of malicious data. ai_common_sense_on_blockchain.md outlines a proposed protocol for AI to send smart contract messages on the blockchain.
The training protocol focuses on three key areas to ensure a robust and scalable solution:
- Diverse Datasets: Steganalysis requires a vast, labeled dataset of both "clean" images and images with hidden data. This protocol encourages the community to contribute to the creation of such a dataset, ensuring the AI is trained on a wide range of steganographic techniques. See DATASET_GUIDELINES.md for detailed instructions on contributing high-quality datasets.
- AI Models and Architectures: We recommend models tailored for steganalysis, particularly deep learning architectures. While Convolutional Neural Networks (CNNs) are highly effective for image analysis, the protocol is designed to support and evaluate various AI approaches on diverse datasets. The same flexibility applies to the steganography methods used for dataset creation.
- Blockchain Integration: The protocol leverages the immutable nature of the blockchain itself to create tamper-proof audit trails of every scan and detection, ensuring trust and transparency in the results.
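The audit-trail idea in the last item can be illustrated with a hash-linked log, the same linking principle that makes a blockchain tamper-evident. This is a minimal in-memory sketch under stated assumptions: the record fields (`file_sha256`, `verdict`), the all-zero starting hash, and the helper names are hypothetical, and the sketch does not show how the project anchors records on-chain.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first record


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_record(chain: list, file_sha256: str, verdict: str) -> dict:
    """Append a scan record that commits to the hash of the previous record."""
    body = {
        "index": len(chain),
        "prev_hash": chain[-1]["hash"] if chain else GENESIS_HASH,
        "file_sha256": file_sha256,
        "verdict": verdict,
    }
    record = dict(body, hash=_digest(body))
    chain.append(record)
    return record


def verify_chain(chain: list) -> bool:
    """Return True only if every record matches its hash and links to its predecessor."""
    prev = GENESIS_HASH
    for i, record in enumerate(chain):
        body = {k: v for k, v in record.items() if k != "hash"}
        if body.get("index") != i or body.get("prev_hash") != prev:
            return False
        if record.get("hash") != _digest(body):
            return False
        prev = record["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_record(log, hashlib.sha256(b"image-1").hexdigest(), "clean")
    append_record(log, hashlib.sha256(b"image-2").hexdigest(), "stego")
    print(verify_chain(log))     # chain as written
    log[0]["verdict"] = "stego"  # rewrite history
    print(verify_chain(log))     # the edit no longer matches the stored hash
```

Because each record commits to its predecessor's hash, changing any earlier verdict invalidates that record and every link after it, which is what lets third parties audit scan results without trusting the party that stored them.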
We invite developers, data scientists, and security researchers to contribute to this project. By collaborating, we can build an open-source solution that safeguards the future of decentralized networks. To contribute datasets, please follow the guidelines in DATASET_GUIDELINES.md.