Permalink
Branch: master
Find file Copy path
Fetching contributors…
Cannot retrieve contributors at this time
407 lines (233 sloc) 33.9 KB

Chapter 6

A Deep Dive into Monero & Cryptography

Since long before the birth of computers, mathematics and cryptography have been at the center of communication and information exchange. While simple ciphers have been around since Caesar’s time [link to Caesar cipher], modern cryptography was born during the World Wars for encrypting important and confidential messages. Initially, governments and militaries funded classified cryptography research to identify protocols for protecting state secrets.

Now, cryptography is no longer limited to spies and militaries; it forms the backbone of communication and security in the internet era, and is widely studied by academic and industry researchers scattered throughout the globe.

Today, cryptography is a ubiquitous behind-the-scenes tool that enables security, management, communication, and many of the connections that improve our day-to-day lives. For example, consider the invention of Secure Socket Layer (SSL, deprecated in favor of TSL), which is based on cryptographically signing the content. Hospitals, banks, governments, and businesses all protect your data with cryptography.

This chapter discusses how cryptographic tools can be applied to a decentralized financial database to give rise to cryptocurrencies, especially Monero.

Math Fundamentals

Here is a brief introduction/recap of several mathematical principles that are at the core of cryptography.

Euclidean Division (A/B)

Dividing any number “A” by another “B”, written as A/B or A÷B returns an answer that can either be written as a quotient with a remainder, or as a decimal alone.

Generally: A/B = q with remainder r

For example: 12/4 = 3 with remainder 0, which can be written 3.0 in decimal form 13/4 = 3 with remainder 1, which can be written 3.25 in decimal form 27/5 = 5 with remainder 2, which can be written 5.4 in decimal form

Prime Numbers

A prime number is any integer (whole number) that is not divisible by any integer besides ‘1’ and itself. For example, 20 is not prime because it could be divided by 2, 4, 5, or 10, resulting in whole numbers e.g. 20 ÷ 4 = 5 - or - 20 ÷ 10 = 2 7 is prime because any integer that you divide it by will not yield a whole number e.g. 7 ÷ 3 = 2.3333 Some example prime numbers include 3, 5, 7, 11, 13, 97, 223, 997, 3413, 4421, 17837, 145601, 428567, 1171967, and even much larger numbers like 2074722246773485207821695222107608587480996474721117292752992589912196684750549658310084416732550077 or the twin primes 2,996,863,034,895  × 2^1,290,000 ± 1, which have over 350,000 digits each!

Modular arithmetic (A mod B)

Modular arithmetic describes numbers that wrap around a particular integer. An intuitive example is the 12-hour clock. If you stay up for 5 hours past 11:00 PM, you would not encounter 16:00 PM o’clock! Instead, at midnight, the time wraps around to zero (so 5 hours past 11:00 PM is 4:00 AM the next day).

Given any two positive numbers, A (the dividend) and B (the divisor), A modulo B = the remainder r from A/B.

In the context of clocks, staying up 5 hours past 11:00 PM could be represented as:

(11:00 PM + 5 hours) mod 12 = … = 16:00 mod 12 = 4:00 (AM)

Elliptic Curve (Ed25519)

Elliptic curves are defined as the set 2-dimensional (x, y) points that satisfy an equation:

y2=x3+ax+b

For example, with fixed coefficients a = 2 and b = 4, this equation becomes

y2=x3+2x+3,

which is satisfied by many pairs of points such as:

x = 3 and y = 6 x = 3 and y = -6 x = -1 and y = 0

Monero uses a particular Twisted Edwards elliptic curve for cryptographic operations, Ed25519, which is birational equivalent of the Montgomery curve Curve25519.

The ed25519 curve can be expressed algebraically as

-x2 + y2 = 1 − (121665/121666) x2 y2

Thinking back to our general elliptic curve equation, this Twisted Edwards is a special case using the parameters:

a = -1 and b = 121665/121666.

It is well known that NIST4 standard algorithms have had recent issues, [LINK] and the Twisted Edwards curve was selected to address many concerns held by the cryptography community. [LINK]

Recently, it has become clear that a NIST-backed PNRG random number generation algorithm is flawed, and contains a potential backdoor. [LINK] Seen from a broader perspective, curves selected by the NIST are also implicitly supported by the NSA. These endorsements are viewed suspiciously by the cryptography and cryptocurrency communities due to previous incidents where the NSA used their authority over NIST to weaken algorithms suggested by the latter.

Twisted Edwards curve Ed25519 is not subject to any patents, and the team behind it has developed and adapted basic cryptographic algorithms with efficiency in mind. This curve is currently believed to be secure.

Cryptography Basics

Monero is the leading secure, private, and untraceable cryptocurrency because of its unique privacy-oriented cryptographic features, which we’ll explore more thoroughly in this chapter This is one of the more technical chapters of the book, due to the mathematical nature of cryptography. More complex techniques are built upon simple principles known as “cryptographic primitives”

A cryptographic primitive is an algorithm that serves as the building block for cryptographic protocols. Monero employs a wide variety of cryptographic primitives for various uses, some of which we covered conceptually in chapters 3 and 4. Monero’s intentional approaches to privacy and (ASIC-resistant) proof of work necessitate development with more sophisticated cryptographic tools than those used by many other cryptocurrencies, such as Bitcoin.

Hashing

Chapter 4 discussed the concept of “hashing” and how its uses range from confirming data fidelity to distributing rewards in Proof of Work. Example hashes are shown in the cryptography section toward the end of Chapter 4.

Selecting a good hashing algorithm is crucial for creating generating addresses and keys in a secure way. If two different inputs produce the same hash output, this is known as a “collision.” Hashes are commonly used as an identifier in blockchain systems, relying on their effective uniqueness. Furthermore, a collision during address generation would lead to multiple individuals with the same keys and seeds; obviously this would be extremely problematic!

Monero uses the CryptoNight PoW system, which uses a special “CryptoNote” hash algorithm (unlike most cryptocurrencies that are built on standard SHA-256), which is built in the “Keccak” hash function. The Keccak algorithm won a NIST competition to be designated SHA3, and is designed by non-NSA engineers. Monero uses the “Keccak-256” hashing function with 32-byte output for both transaction and block hashing.

Monero PseudoRandom Number Generation (PRNG)

When users and computers are creating new keys, it is crucial that they find new keys that others cannot guess. This is actually a very difficult task, since most software is designed to favor reproducibility. If the computer generates randomness in a predictable way, then the output is be ostensibly random but somewhat easier to guess.

  • For example consider a PRNG that simply shuffles the digits of the current time to make a 4-digit key. So at “10:34” it might output “0413” or “1403” or “0134” … If you wanted to keep the output key secret, this would be a terrible method for a few reasons:

  • An attacker who knows that you made your key when you got to work around 9:30 AM would know that the digits “0” and “9” appear, which narrows the choices down to significantly fewer options.

  • There are no HH:MM times of day with three “9”s. In fact, there are no times with any three digits chosen from {6,7,8,9} since 17:89 h, 18:78 h, etc are impossible times. This rule eliminates many 4-digit pins, leaving the attacker to guess from a much smaller pool

The above clock-based random number generator is awful because using the time of day as an “initial seed” is predictable. The initial seed should be much more difficult for an attacker to guess. Good random number generators introduce lots of “entropy” to make their outputs unpredictable. Simply shuffling 4 digits does not introduce much entropy, another reason that our PRNG above would be insecure.

When generating wallets, the user’s operating system provides the initial seed / entropy source. Monero then repeatedly applies the Keccak hashing function, to lead to an unpredictable and non-reproducible output. Each round of hashing produces an output that is used as the input for the next hash.

Encoding

Base58 is a group of binary-to-text encoding schemes used to represent large integers as alphanumeric text. It is similar to another scheme called Base64, however it has been modified to avoid numbers and letters which might look ambiguous when printed. Monero uses this format, strictly for convenience of human users, who often must manually read or transcribe long addresses.

Monero’s Base58 Alphabet: 123456789ABCDEFGHJKLMNPQRSTUVWXYZ abcdefghijkmnopqrstuvwxyz Note: Zero (0) along with the letters I (uppercase i), O (uppercase o), and l (lowercase L) are not present in this Base58 alphabet due to their ambiguity with each other.

Symmetric and asymmetric cryptography

For encrypting data, algorithms can be characterized as “symmetric” or “asymmetric” depending on what type of keys are used.

Symmetric encryption requires the participants to share a secret, for example you encrypt a message by the password “hunter2” and the recipient uses the password “hunter2” to decrypt it. To communicate in this way, both parties must have agreed on the shared (symmetric) secret ahead of time. This practical issue limits the utility of symmetric encryption for many applications.

Asymmetric encryption allows two parties to interact securely without sharing a particular secret. This type of cryptography is woven into the framework of internet security, end-to-end messengers, and cryptocurrencies.

Basics of Elliptic Curve Cryptography

Elliptic curve point addition and scalar multiplication are used at the very basis of asymmetric elliptic curve cryptography schemes so it's important to have a basic understanding of these concepts to understand the cryptography used in Monero.

Elliptic curve point addition

Elliptic curve point addition is different from the typical addition used in every day arithmatic. To add two points together on an elliptic curve you must find the line between those two points and then find the point at which the curve intersects with that line. That point is then reflected over the x-axis to arrive at the final point. When adding a point to itself, known as point doubling, you must find the tangent line to the starting point to get to the point at which that tangent line intersects with the curve. That point is then reflected over the x-axis to arrive at the final point.

Scalar multiplication

Scalar multiplication utilizes both a point on the curve and an integer. To multiply a point, P, by an integer, S, the point is added to itself S times. In asymmetric crypto schemes a common base point on the elliptic curve is used as a generator point to generate public keys from private keys. When the curve generator point is added to itself many times it becomes very difficult to figure out how many times the generator point was added to itself by only looking at the resulting point. This problem is often referred to as the elliptic curve discrete logarithm problem. Because this problem is so difficult to solve, this kind of scalar multiplication is considered a one way function.

Wallet generation

Seed Generation and Encoding

In chapter 2 we talked about how your wallet generates a secret “seed” that is used to derive all of your keys, and access/spend your funds. In that overview, we simply considered the 25-word “seed mnemonic.”

Behind the scenes, a seed is a unique 256-bit integer from which keys and addresses are derived, for example:

112699108505435943726051051450940377552177626778909564691673845134467691053980

These are often represented as a 64-digit base16 number, for example:

f9296f587419f1cdede67de160fca14d1069ecaa4c52f012af031eeA09ee039c

(For mnemonic-style keys, this representation of the seed is actually just the private spend key itself!)

Writing down either of the above key styles would be quite difficult, and most people would be prone to make at least one mistake. Conversion to a seed mnemonic phrase is another step included only for human interpretability and usability. The mnemonic phrase essentially converts the above 256-bit number into to a “24-digit” (24-word) Base1626 “number” (since there are 1626 words in the seed dictionary). This representation of the long seed strings is much easier to read:

lamb hexagon aces acquire twang bluntly argue when unafraid awning academy nail threaten sailor palace selfish cadets click sickness juggled border thumbs remedy ridges border

When your wallet presents the 24-word seed, it adds a 25th word that functions as a “checksum.” This allows later detection of typos or mistakes. Monero’s mnemonic method encodes with a minimum 4:3 ratio. In other words, four bytes creates three words, plus one checksum word; eight bytes creates six words, plus one checksum word; and so on.

The private view key is derived by hashing the seed with Keccak-256, producing a second 256-bit integer, which is then sent to the function called “sc_reduce32” to ensure that it is compatible with the elliptic curve. The seeds created by this method will always be valid scalars as they are sent to sc_reduce32 first.

Asymmetric cryptography in cryptocurrencies

Bitcoin uses an asymmetric encryption with two keys:

  • private key - for signing transactions and for decrypting data)
  • public key - for signature verification and encrypting data

Monero’s more complex cryptographic framework requires four keys:

  • public view key - used to verify the validity of addresses,
  • private view key - used for viewing data such as the balance, fees and transactions amounts. The view key is never used for creating or signing transactions.
  • public spend key - another public key for transaction verification.
  • private spend key - used for signing transactions, i.e. spending Moneroj.

Your public Monero address is a direct representation of the pair of public keys, whereas Bitcoin (and clones) use a hash of their single public key. EdDSA keys (both private and public) are 256 bits long, or 64 hexadecimal characters. Not every 256-bit integer is a valid EdDSA scalar (private key); it must be less than the “curve order” described with the equation in the Ed25519 function section.

Monero Public Address

Monero standard address is composed of the two public keys (the public spend key + public view key). It also contains a checksum and a “network byte” which identifies both the network and the address type. Their concatenation follows these rules:

Monero standard address is composed of two public keys:

  • public spend key
  • public view key

It also contains a checksum and a "network byte" which actually identifies both the network and the address type.

Index Size in bytes Description
0 1 identifies the network and address type; 18 is for main net and it refers to "4"; 53 is for test chain corresponded to "9".
1 32 public spend key
33 32 public view key
65 4 checksum (hash created with Keccak function of the previous 65 bytes, trimmed to first 4 bytes)

The 69-byte output from this specification is then encoded into the Monero Base58 format. This conversion increases the length to a 95-character string that is easy to read and write. Example standard address:

4AdUndXHHZ6cfufTMvppY6JwXNouMBzSkbLYfpAV5Usx3skxNgYeYTRj5UzqtReoS44 qo9mtmXCqY45DJ852K5Jv2684Rge

The pseudo-code below defines the process of generating an address:

  Checksum = H(Varint(Prefix) || A || B)
  SerializedString = Base58(Prefix || A || B || Checksum)

Keys and Address Derivation

To add to the confusion, there are presently at least three different methods of private key derivation in existence for Monero, though Bitcoin also has many:

  • Original (non-deterministic) Style – The Private Spend Key and Private View Key are both independently and randomly chosen to form an account. There is no good way to back up a nondeterministic account other than keeping copies of the files. For these reasons, it is not recommended to use an account of this type.

  • Mnemonic (Electrum or Deterministic) Style – In this style, the Private View Key is derived from the Private Spend Key, so you only need to remember one thing: the seed, which is actually just a representation of the Private Spend Key itself. The Private View Key is derived by hashing the Private Spend Key with Keccak-256, producing a second 256-bit integer, which is then sent to a function that check if the private key is a valid EdDSA scalar. You can backup accounts of this type by writing down or otherwise saving the 25 word deterministic seed; you can easily restore using both Simplewallet and MyMonero.

  • MyMonero Style – This is similar to 2., but uses a 13 word seed instead of a 25 word seed. The 13 words convert to a 128-bit integer that is used for both spend and view key derivation, in the following form: the 128-bit integer is hashed with Keccak-256 to produce a 256-bit integer, a. a is sent to a function, which returns the Private Spend Key. a is hashed once more with Keccak-256 to produce a second 256-bit integer, b. b is then sent to the same function, which returns the Private View Key based on b.

You may have noticed a critical difference between this style and the Electrum Style: MyMonero’s Private View Key derivation is done by hashing random integer a, while Electrum Style derivation is done by hashing the Private Spend Key. This means that 13 and 25 word seeds are not compatible – it is not possible to create an Electrum Style seed (and account) that matches a MyMonero Style seed (and account) or vice versa; the view keypair will always be different. The rest of the address generation process is the same in all three cases. The Private Spend Key and Private View Key are sent to the ed25519 scalarmult function to create their counterparts, the Public Spend Key and Public View Key.

The Monero Blockchain

Monero has a unique kind of blockchain. We talked about what a blockchain is and why it is important. Basically, a blockchain is a distributed public ledger where each payment is recorded. The blockchain cannot be modified due to its distributed nature. It is based on various cryptography protocols and algorithms in order to avoid any cheating.

Lightning Memory Mapped Database

Monero uses the Lightning Memory Mapped Database (LMDB) system to store its blockchain. LMDB is a software library that provides a high-performance embedded transactional database in the form of a key-value store. This means that it is highly effective, and easy to search.

LMDB is written in C++ with API bindings for several programming languages and is developed by Symas Corporation. Here are a few LMDB features:

  • stores arbitrary key/data pairs as byte arrays, meaning
  • has a range-based search capability, allowing fast search (allowing...)
  • supports multiple data items for a single key, providing (providing....)
  • has a special mode for appending records at the end of the database which
  • gives a dramatic write performance increase over other similar stores.

The block

Block structure

The CryptoNote standards define the way to store and delineate data within blocks and on the blockchain. The block structure contains 3 main components:

  • block header;
  • base transaction body
  • list of transaction identifiers.

The list starts with the number of transaction identifiers that it contains.

Block Header

Each block starts with header containing key metadata. The “major_version” defines the block header parsing rules, so it can be interpreted correctly. The table below describes version 1 of the block header format. The “minor_version” defines the interpretation details that are not related to the main header parsing.

Even if the minor version is unknown, it is always safe to parse the block header of a particular major version. Parsing the block header with an unknown major version is risky, since the content of the block header may be misinterpreted.

Field Type Content
major_version varint Major block header version (always 1)
minor_version varint Minor block header version
timestamp varint Block creation time (UNIX timestamp)
prev_id hash Identifier of the previous block
nonce 4 bytes Any value which is used in the network consensus algorithm

Varint means an integer encoded in a variable-length prefix-free representation, such encoding does not contain null bytes.

Base Transaction

Each valid block contains a single “base transaction” that awards the miner with the coinbase block reward. The base transaction must follow the coin emission rules, and include the block height field.

Field Type Content
version varint Transaction format version
unlock_time varint UNIX timestamp.
input_num varint Number of inputs. Always 1 for base transactions.
input_type byte Always 0xff for base transactions
height varint Height of the block which contains the transaction
output_num varint Number of outputs
outputs array Array of outputs

List of Transaction Identifiers

Base transaction is followed by a list of transaction identifiers. A transaction identifier is a transaction body hashed with the Keccak hash function. The list starts with the number of identifiers and is followed by the identifiers themselves if it is not empty.

Calculation of Block Identifier

The identifier of a block is the result of hashing the following data with Keccak-256:

  • size of block_header, Merkle root hash, and the number of transactions in bytes (varint)
  • block_header,
  • Merkle root hash,
  • number of transactions (varint).

The goal of the Merkle root hash is to “attach” the transactions referred to in the list to the block header: once the Merkle root hash is fixed, the transactions cannot be modified. This is the reason for which you can’t modify or cheat the blockchain.

Calculating Merkle Root Hashing

Fees

As we introduced fees in the third chapter, it’s important to know how they are calculated and how Monero developers might reduce them with “Bulletproofs”.

Fees are like Taxes for the network. They incentive miners that will mine a lot of transactions then they will receive a reward which was composed by all the fees contained in the transaction.

Before going deep on Fees, we have to discuss about the Block Dynamic Size for Monero and the Block Static Size for Bitcoin. With Bitcoin, the block size was setup on 1 mB. In this moment, Bitcoin Blockchain has a scalability issue since each block can contain only a defined number of transactions.

Then, the block size limit has created a bottleneck in bitcoin, resulting in increasing transaction fees and delayed processing of transactions that cannot be fit into a block. Various proposals have come forth on how to scale bitcoin, and a contentious debate has resulted.

Instead of the Bitcoin method, Monero uses dynamic block size mechanism to control the rate at which the block size can grow. This is called the Penalty Function for oversize blocks, originally purposed by CryptoNote Developers. As part of consensus rules, part of the base block reward is withheld should a miner expand the block size above the median size of the last 100 blocks. This is coupled with a do-not-relay minimum fee.

The fees sent to a miner in a transaction are defined as:

Miner Fees = BaseReward - Penalty

Penalty is being calculated as

Penalty = BaseReward * ((BlockSize / MN ) - 1)²

Where :

  • MN is the median of the block size over the last N blocks, with N being 100 in Monero
  • BlockSize is the size of the current block
  • BaseReward is the reward as per the emission curve or where applicable the tail emission

In this way, the maximum allowed block size is 2 MN which it should stop any spamming attacks to the Monero Blockchain.

Note that the formula of the BaseReward is defined as follows:

BaseReward = 2 * ((S - A) * 2-20 * 10-12)

Where:

  • 2 is the adjustment factor for the switch to two minute blocks
  • S is the initial number of atomic units is = 264 - 1
  • A is the current circulation, which can be found here. In addition, the current circulation (emission) displayed on the block explorer has to be multiplied with 1012 (Monero uses 12 decimal places) to convert it to atomic units.

Observe that the minimum block size limit is 300 kB. Thus, miners are able to construct blocks up to 300 kB without incurring a penalty. In other words, aforementioned penalty function only “kicks in” for blocks bigger than 300 kB.

But let’s imagine a different scenario. What will happen if the median block size (retrieved by the last 100 blocks for Monero cryptocurrency) significantly diverges from the minimum block size?

Well, dynamic fee algorithm comes to play.

The dynamic fee algorithm is calculated by the weight in kB (Kilobyte) of transaction, meaning heavier a transaction is , higher the fee will be. Regarding the Monero situation, we have a low fee per kB. However, due to the high transaction size, the absolute default fee (in economical terms) is quite high.

Note that the transaction size is this big due to Monero’s inherent default privacy, i.e., the range proofs, which mask the amount values, make up ~12 kB of a single transaction. As I like telling to anyone, privacy costs.

Unfortunately that is not the only way to increase fees. Transaction priority also increases fees. A higher priority will increase the fee so that it successfully competes with other transactions to become part of the next block on the blockchain.

It is defined as:

Fee per kB = (R/R0) * (M0/M) * F0 * (60/300) * 4

  • R is the base reward
  • R0 is the reference base reward (10 XMR)
  • M is the block size limit
  • M0 is the minimum block size limit (300 kB)
  • F0 is 0.002 XMR
  • 60/300 is the adjustment factor to account for the increase of the minimum block size limit (60 kB -> 300 kB)
  • 4 is the adjustment factor to account for the default fee multiplier. That is, the lowest fee level uses a multiplier of 1, whereas the default fee level uses a multiplier of 4

Basically the inverse of the percentage increase of the median block size (against a base of the minimum block size) translates to the percentage reduction in fees. More specifically, a 600 kB median block size, which is a 100% (or factor 2) increase translates to a 50% (1/2) reduction in fees.

So why did the significant price increase not lead to a significant reduction in absolute fees, i.e., fees in XMR terms? Well, basically, the factor increase in price was significantly higher than the factor increase in usage. Furthermore, the median block size needs to be constantly above 300 kB in order for the dynamic fee algorithm to work properly. Moreover, the algorithm was designed to correlate with price, but, as we can see, price is imperfectly correlated with usage. In sum, whilst usage has grown a lot, it hasn’t grown as much as the price and therefore fees (in XMR terms) have not declined yet.

Bulletproofs

As some users reported, Monero fees are high. This is partially wrong since a comparison between others cryptocurrencies and Monero showed Monero has one of the lowest fee per kB.

Originally the problem is our transactions are heavy for the blockchain. Let’s talk about the reasons.

Monero needs to have some “tests” in order to prevent any abuse. An example of preventing abuse could be proving the transaction fee is correct, proving the amount is right and no one is trying to commit a double spend.

In the development field, there is a situation called Overflow meaning a variable reaches the max values. Anyone knows electronic devices have a limited value for anything since “infinite” is an abstract of our world.

Privacy Transactions

Stealth Addresses

Chapter 3 described a situation where Leo sent George some Monero and in doing so he used George's public keys to produce a one-time public key, also known as a stealth address, that is unlinkable to the George's real keys. This section will go deeper to explain the cryptography behind that one-time public key.

Sending

The highly technical formula described in the CryptoNote whitepaper to produce this public output is P = Hs(rA|i)G + B. This means that when Leo wants to send Monero to George he generates a 256 bit pseudorandom scalar to be used as the transaction private key, r. Leo is the only person that will ever know this key, not even George. Leo then multiplies George's public view key, A, by his pseudorandom scalar and then concatenates the output index, i, to resulting point.

This data is then run through the Hash to Scalar function. This function takes the input data, hashes it using the Keccak-256 algorithm, then takes that resulting hash modulo 2252 + 27742317777372353535851937790883648493, a prime order of the ed25519 basepoint. The ed25519 basepoint, G, is then multiplied by the scalar that is output from that function. Finally, Leo adds this point with George's public spend key, B, to produce the final output, P.

Receiving

Now, as described in chapter 3, George must scan the blockchain for outputs that belong to him. To do this he must calculate P' = Hs(aR|i)G + B. The process is very similar to what Leo had to do to send the Monero. George will get the public transaction key, R, used in the transaction from the blockchain and multiply it by his private view key, a. He then must concatenate the output index to the resulting point and that data through the Hash to Scalar function. He then multiplies the ed25519 basepoint, G, by the resulting scalar and finally adds his own public spend key, B, to the resulting point to produce the final point P'. If the output George generated independently, P', matches the output from the blockchain, P, then George knows that he owns that output and can spend the associated Monero.

Ring Signatures

Subaddresses

One way to create off-chain likability is through address reuse. If George uses a Monero address as a deposit address when he withdraws money from an exchange and then uses the same address to receive a payment from Leo, it is possible for the exchange to collude with Leo to find out that the George who used the exchange and the George that accepted Leo's payment are in fact the same person. To prevent this type of attack, Monero has a feature called subaddresses. A subaddress is an address that is derived from a main address and is cryptographically unlinkable from that main address.

Creating subaddresses

So how are subaddresses actually made? Suppose that George has a pair of private keys, the private view key and the private spend key, (a,b) and a corrosponding pair of public keys (A,B) which is equal to (a,b)G where G is the generator point on the elliptic curve. First we must chose a number, i, called the index. Typically the first subaddress will be at index 1. The subaddress will also have a pair of private keys and a pair of public keys for each index of i, which we can then call (ci,di) and (Ci,Di). The formula to create a subaddress public spend key, Di, is Hs(a|i)G+B. This means that first the index, i, is concatenated to the main address private view key, a, and then passed through the hash_to_scalar function (note: in practice the reference client wallet also concatenates the string "SubAddr" to the data to be hashed as a common salt). Then the curve generator point, G, is multiplied by the resulting scalar and finally the main address public spend key is added to the resulting point through elliptic curve point addition. After this subaddress public spend key is generated, it is recorded for use later. To calculate the subaddress public view key, Ci, Di is multiplied by the main address private spend key (Ci = a*Di). Now that the subaddress public keys have been generated they can be encoded into a valid Monero address. Subaddresses are base58 encoded with a checksum just like normal addresses but the network byte is different. Mainnet subaddresses use the network byte 0x42 which makes every mainnet subaddress start with 8.

Sending to a subaddress

When a wallet sees a Monero address with a subaddress network byte it knows that it must construct the transaction slightly differently. Typically with a non-subaddress transaction the transaction public transaction key is calculated by generating 32 random bytes as the private key and then multiplying that private key by the elliptic curve generator point through elliptic curve scalar multiplication. For a subaddress transaction however, the private transaction key is instead multiplied by the receiving subaddress's public spend key. The sender then has the choice to either send the change to his main address or one of his own subaddresses, though the result is typically the same for most users.

Receiving a transaction to a subaddress

As explain in the stealth address section, Monero users must scan incoming transactions with their private view key to check if they own a specific output on the blockchain. Subaddresses don't use the typical system where the receiver calculates P'= Hs(aR)G+B then if P' is equal to the output, P on the blockchain then he knows that he owns that output. Instead the formula for subaddresses is expressed as Di' = P - Hs(aR)G. This means that the receiver calculates Hs(aR)G as usual (see the stealth addresses section of chapter 6 to learn what this means) and then subtracts the resulting scalar from the output that is found on the blockchain, P. If the result is equal to a subaddress public spend key that was recorded when creating that subaddress, then the receiver is assured that he owns that output.