Decoy Selection Algorithm - Areas to Improve #86
Comments
Shortly I will have a draft of a roadmap to tackle "3) Improve the distribution criteria used to select decoys" for review by relevant parties. However, it will not be public at this time. It is too early and too sensitive for public release.
Those blue+orange plots (output age):
I do not see how you can draw any conclusion (regardless of the answers to the above) from those graphs. MyMonero+Monero-LWS didn't have the integer truncation issue in the decoy selection, and was used for an unknown percentage of outputs. We'd need to see how both algorithms, done in "real-time", compared to the actual data to even start guessing at real spends. And even then we would be assuming that there isn't some other algorithm in the wild, etc.
This should make the flood attack issue worse - an attacker can flood the network with spends AND a custom decoy algorithm to alter how honest wallet users are selecting decoys. OTOH, I don't think it's possible to identify "honest" spend patterns from "attacker" spend patterns to create a correct decoy distribution algorithm.
This question makes me wonder if some of the difference between orange and blue is due to rising tx volume. Perhaps noticeably more recent decoys are selected than otherwise would be if tx volume were uniform.
Statistical methods can reveal a lot.
@vtnerd I agree that this is a tricky issue to handle. We are still in the very early stages of examining the idea of a rolling estimator. One alternative approach would instead be to have expert analysis of the data determine the correct distribution and filter out attacks, and update it periodically. Or there could be some hybrid approach. Enforcement of a particular distribution at the consensus level would bring another set of costs and benefits. There are game-theoretic issues at play, for sure. This needs CCS-funded research, which I intend to carry out.
This response is misleading given the original context for my criticism. The author of the post (@j-berman ) was claiming that two graphs show real spends, but that doesn't appear to be true. Perhaps other statistical methods can reveal this information, but I don't see where this is demonstrated in his posts or yours.
The ability to distinguish between attack transactions and regular transactions is a bold claim, particularly in the generic flood attack where the decoy algorithm is followed. This reminds me of the person claiming to identify "Satoshi transactions", which was horribly flawed.
Might be possible, but always seemed too risky. A sharp corner in a floating point calculation would cause a chain split, etc., although luckily this seems to be usually well behaved.
@vtnerd You are right: it is not. We are working on it. I am the first qualified statistician to review Monero's mixin selection algorithm and I have been able to make quite a bit of headway in a short time, with the support of @j-berman and others.
Stay tuned on this. Something quite concrete will be released within 10 days. I will be sure to ping you once it is released.
Yes
It was calculated at each block height for that exact reason. I iterated over every observed rct input in every ring over a block range, then re-selected an output using the wallet2 implementation from the distribution of rct outputs that would have been used in the algorithm (edit: trying to make it clear I used
Adding the MyMonero+Monero-LWS implementation to the mix: EDIT: updated the above charts to keep consistent with the OP for clarity.

The triangular gap is still present in all cases (though smaller in the more recent interval), which seems to suggest that either another implementation is out in the wild selecting younger outputs with a high degree of frequency, or real outputs are not as well covered by the decoy selection algorithm in that triangular gap as they could be, but are still covered to a solid extent (edit: softened my wording here). More work dedicated to removing rings that are highly unlikely to have been selected by the old wallet2 or monero-lws implementation from the set of all rings could be done to determine if there are other implementations out in the wild. But at this point, I don't think that's a necessary pre-requisite to continuing work on patching integer truncation (#1), or trying to move forward on binning (#2), which are still likely tangible improvements given the evidence presented imo. Certainly my wording could have and should have been less strong in my conclusions, but the fact remains that these are areas where the decoy selection algorithm evidently has room to improve via the solutions suggested. I'm trying to use the information in the best way possible to improve the algorithm, and I feel the information presented reaches the bar of providing enough tangible evidence to say, for example: "we should probably go with a binning approach soon-ish because of this, even before bumping the ring size a significant amount", and I personally want to shift more of my focus and work toward solutions. If it fails to reach that bar, I'm cool with going back to the drawing board, or scrapping my ideas entirely and moving on.

EDIT: Adding analysis on the 80k-230k range discrepancy

I also added openmonero's implementation, which seems to see some use too, for this part (and not the other) because it is more likely to select older outputs (and I zoomed out to make the charts a bit clearer and lend more context). While neither openmonero's nor the MyMonero+Monero-LWS implementation seems to corroborate the apparent blue spikes (why would the spikes be confined to the 80k - 230k age range?), it is making me think there is a higher likelihood of some implementation out in the wild that could be causing the spikes, and I would agree that I am jumping too strongly to the conclusion that those spikes seem to be caused by real outputs. Analysis trying to identify rings not created via the known algorithms (and bucketing rings by algorithm if possible) seems to be a necessary pre-requisite to get a better idea of the causes of the spikes. Perhaps you have other ideas of known algorithms that could feasibly be out in the wild, @vtnerd?

General defense of my decision making in the analysis

I left out the MyMonero implementation's results in the original post so as to make this analysis simpler to understand. I agree the analysis and conclusions needed that piece of evidence to be stronger, but I didn't want to make it more confusing than it already is, and felt I provided enough information to make it clear that what is identified is worth improving via the solutions mentioned.
My point here wasn't to say "You can 100% identify real spends with certainty because of these things"; it was to say, "because of these things, it seems that statistical analysis could potentially be utilized to make better guesses as to real spends, and here are some things we can do today to thwart that analysis." My wording/conclusions could have and should have been less strong, and I should have worded the post more in line with my intent, but the fact remains that these are areas where the decoy selection algorithm evidently has room to improve via the solutions suggested.

For the record, if I fail to defend the logic and reasoning supporting what I've done thus far (and included as a huge component of my CCS proposal), I think it would make sense for me to issue refunds to anyone who wants one (and can show their tx key) for my CCS proposal, and return the remaining funds to the general fund. I have no interest in money I don't deserve.

EDIT: Perhaps I am taking your criticisms and questioning too personally, and this was unnecessary for me to include in my initial reply:

I understand your criticisms are all geared towards writing the strongest code, and maintaining the strongest algorithm possible, with the strongest evidence. And all your comments have been strictly related to the work I've shown, and nothing about me personally.

EDIT 2: removed my claims on likely or unlikely causes that can't necessarily be fully justified with the above information, as per comments
I'd like to separate the claims from the raw research, since I think some people are getting carried away with the claims. A bad selection algorithm brings forward many problems, so the selection should fit reality. However, I caution against pulling specific impacts from these, since they are mostly unfounded and difficult to measure. With that established, I think reasonable discussion on how to improve the selection algorithm, where applicable, can proceed.
Edited my comments as per the above comments, feedback, and criticism.

Github probably isn't the best place for this and sorry to bring my personal thoughts here, but FWIW I think @vtnerd's criticisms were fair, and I was incorrect in leaning too strongly toward conclusions given the evidence presented. The criticism was a bit harsher than I'm used to, and will probably take some getting used to on my part, but I'll develop thicker skin. It is completely understandable to maintain a very high standard for research done on the protocol (especially public research which in the worst case could be misconstrued as useful in targeting an innocent person), and for what ultimately ends up in the code, especially considering how important it is for people's privacy to get this area of the code right. Plus, over in our discussion in monero-lws, I did frame this analysis as a precursor for what could end up in the code over there, and as a precursor for patching/not patching integer truncation. So, basic conclusion: as far as who was more in the right and who was more in the wrong above, I accept that he is more in the right than me (edit: and me more in the wrong), both here and there. Trying to improve. Unfiltered criticism is also welcome toward that end.
@SamsungGalaxyPlayer What do you mean by this?:
On validating constructed rings match the distribution (4), here's a working code sample that shows the intuition behind the idea to assign each ring member a chunk of the PDF, and allows a margin for error for a floating point corner:

```cpp
#include <algorithm>
#include <cmath>
#include <fstream>
#include <iostream>
#include <random>
#include <vector>
#include <boost/math/distributions/gamma.hpp>

const int RING_SIZE = 11;
const int NUM_DECOYS = RING_SIZE - 1;
const int NON_MATCHING_RING_MEMBER_BUFFER = 1;
const double GAMMA_SHAPE = 19.28;
const double GAMMA_SCALE = 1 / 1.61;

boost::math::gamma_distribution<double> boost_dist(GAMMA_SHAPE, GAMMA_SCALE);

std::random_device rd;
std::mt19937 gen(rd());
std::gamma_distribution<double> distribution(GAMMA_SHAPE, GAMMA_SCALE);

int get_index_in_ring(double x)
{
    double cumulative_probability = boost::math::cdf(boost_dist, x);
    // clamp so a cdf that rounds to 1.0 cannot map past the last ring slot
    return std::min(static_cast<int>(std::floor(cumulative_probability * RING_SIZE)), RING_SIZE - 1);
}

std::vector<double> select_decoys(double real_x)
{
    std::vector<double> ring(RING_SIZE, 0.0);
    int real_index_in_ring = get_index_in_ring(real_x);
    ring[real_index_in_ring] = real_x;
    int num_decoys_selected = 0;
    while (num_decoys_selected < NUM_DECOYS)
    {
        double x = distribution(gen);
        int idx = get_index_in_ring(x);
        if (!ring[idx])
        {
            // could over-select outputs like the client does for each index; just need to make sure
            // enough are requested, especially for the youngest index (coinbase outputs unlock after 60 blocks)
            ring[idx] = x;
            ++num_decoys_selected;
        }
    }
    // in actual code, would iterate over all members and get output indexes here
    return ring;
}

bool validate_ring_member_distribution(std::vector<double> ring)
{
    for (int i = 0; i < RING_SIZE; ++i)
        std::cout << "Ring member " << i + 1 << ": " << ring[i] << "\n";
    for (int i = 0; i < RING_SIZE; ++i)
    {
        // in actual code, would work backwards from the output index to derive x.
        // need to carefully handle how the output is randomly selected from a block at the end of the
        // gamma picker -- can get the expected index of the highest and lowest member of the block the output is in
        double x = ring[i];
        int expected_index_in_ring = get_index_in_ring(x);
        // NON_MATCHING_RING_MEMBER_BUFFER provides a margin for error, e.g. for a floating point corner case
        if ((i + NON_MATCHING_RING_MEMBER_BUFFER) < expected_index_in_ring || (i - NON_MATCHING_RING_MEMBER_BUFFER) > expected_index_in_ring)
        {
            std::cout << "Ring is bad... member " << i + 1 << " is " << (expected_index_in_ring < i ? "too young" : "too old") << "\n\n";
            return false;
        }
    }
    std::cout << "Ring is good!\n\n";
    return true;
}

void write_csv_for_analysis()
{
    std::ofstream myfile;
    myfile.open("special_selection_results.csv");
    myfile << "Specially Selected, Pure Gamma\n";
    const int NUM_SIMULATED_RINGS = 100000;
    for (int i = 0; i < NUM_SIMULATED_RINGS; i++)
    {
        std::vector<double> special_ring = select_decoys(distribution(gen));
        for (int j = 0; j < RING_SIZE; j++)
            myfile << special_ring[j] << ", " << distribution(gen) << "\n";
    }
    myfile.close();
}

int main()
{
    // in actual code, would work backwards from the real output index to derive x
    double real_x = distribution(gen);
    std::vector<double> good_ring = select_decoys(real_x);
    validate_ring_member_distribution(good_ring);

    // member 2 young enough to be member 1
    std::vector<double> bad_ring = {1, 5, 10, 10.5, 11.4, 11.8, 12.3, 13.0, 13.8, 14.8, 16.5};
    validate_ring_member_distribution(bad_ring);

    // member 9 old enough to be member 10
    std::vector<double> bad_ring2 = {1, 9.1, 10, 10.5, 11.4, 11.8, 12.3, 13.0, 14.7, 14.8, 16.5};
    validate_ring_member_distribution(bad_ring2);

    // member 3 young enough to be member 1
    std::vector<double> bad_ring3 = {1, 2, 3, 10.5, 11.4, 11.8, 12.3, 13.0, 13.8, 14.8, 16.5};
    validate_ring_member_distribution(bad_ring3);

    // member 8 old enough to be member 10
    std::vector<double> bad_ring4 = {1, 9.1, 10, 10.5, 11.4, 11.8, 12.3, 14.8, 14.9, 15.0, 16.5};
    validate_ring_member_distribution(bad_ring4);

    write_csv_for_analysis();
    return 0;
}
```

Outputs:
Python to sanity check that it produces the same result as pure gamma selecting:

```python
import csv
import matplotlib.pyplot as plt
import math

MAX_X = 25
x_axis = [0] * MAX_X
ax = plt.axes()
special_gamma_counts = [0] * MAX_X
pure_gamma_counts = [0] * MAX_X
special_gamma_total = 0
pure_gamma_total = 0

with open('special_selection_results.csv', 'r') as csvfile:
    rows = csv.reader(csvfile, delimiter=',')
    idx = 0
    for row in rows:
        if idx > 0:
            special_gamma = float(row[0])
            pure_gamma = float(row[1])
            if special_gamma < MAX_X:
                special_gamma_counts[math.floor(special_gamma)] += 1
                special_gamma_total += 1
            if pure_gamma < MAX_X:
                pure_gamma_counts[math.floor(pure_gamma)] += 1
                pure_gamma_total += 1
        idx += 1

print("x | Special gamma | Pure gamma")
print("-----------------------------------")
i = 0
for special_gamma in special_gamma_counts:
    x_axis[i] = i
    special_gamma_counts[i] = special_gamma_counts[i] / special_gamma_total
    pure_gamma_counts[i] = pure_gamma_counts[i] / pure_gamma_total
    print("%-4s | %1.6f | %1.6f" % (i, special_gamma_counts[i], pure_gamma_counts[i]))
    i += 1

ax.plot(x_axis, special_gamma_counts, label='Special gamma', linewidth=4)
ax.plot(x_axis, pure_gamma_counts, label='Pure gamma', linewidth=2)
plt.legend()
plt.show()
```

Outputs:
Plotting PDF & CDF with the help of numpy:

```python
import csv
import matplotlib.pyplot as plt
import numpy as np

ax = plt.axes()
special_gamma = []
pure_gamma = []
BINS = 100

with open('special_selection_results.csv', 'r') as csvfile:
    rows = csv.reader(csvfile, delimiter=',')
    idx = 0
    for row in rows:
        if idx > 0:
            special_gamma.append(float(row[0]))
            pure_gamma.append(float(row[1]))
        idx += 1

special_gamma_count, special_gamma_bins_count = np.histogram(special_gamma, bins=BINS)
special_gamma_pdf = special_gamma_count / sum(special_gamma_count)
special_gamma_cdf = np.cumsum(special_gamma_pdf)

pure_gamma_count, pure_gamma_bins_count = np.histogram(pure_gamma, bins=BINS)
pure_gamma_pdf = pure_gamma_count / sum(pure_gamma_count)
pure_gamma_cdf = np.cumsum(pure_gamma_pdf)

plt.plot(special_gamma_bins_count[1:], special_gamma_pdf, label="Special gamma", linewidth=4)
plt.plot(pure_gamma_bins_count[1:], pure_gamma_pdf, label="Pure gamma", linewidth=2)
plt.title('PDF')
plt.legend()
plt.show()

plt.plot(special_gamma_bins_count[1:], special_gamma_cdf, label="Special gamma", linewidth=4)
plt.plot(pure_gamma_bins_count[1:], pure_gamma_cdf, label="Pure gamma", linewidth=2)
plt.title('CDF')
plt.legend()
plt.show()
```

EDIT: added sanity check against pure gamma selecting
@j-berman This will definitely be an improvement
Rereading the thread, I find the discrepancy exhibited by @j-berman (many thanks to him) between the observed and expected output-age distributions worrying. I'm turning this over in my head and I see only one possible scenario:
I would be very happy to be pointed to what is wrong in the above scenario.
Enforcing the gamma distribution (or any distribution) in consensus code would prevent a future soft fork to a better algorithm (or possibly a soft fork to fix bugs in the existing algorithm). This type of consensus-enforced distribution seems naturally at odds with a continually slowing release cycle. One must also consider the possibility that bugs in the gamma validation code would necessitate bugs in the ring selection, which seems quite a troubling failure mode, versus the current scenario where privacy suffers slightly but can be fixed somewhat rapidly.
@iamamyth that's a valid concern, but I can't foresee a situation where we would change the algorithm solely through a soft fork. You specifically want people to follow the same curve, not spend using arbitrary or old ones. Edit: I suppose you could argue the recent .3 update did this, but I like to hope this is a one-off. |
@HoverHalver You should be worried. I think you miss an even simpler explanation: The mixin selection algorithm of the reference wallet itself poorly fits the real spend distribution. It was designed by top computer scientists who unfortunately did not know what they were doing with the statistics. You need a qualified statistician or similar person to fix the algorithm. As an empirical microeconomist I am working on such a fix. It will take a few months to fully research the problem and develop the solution, however. @iamamyth I agree that the enforcement issue is quite tricky and will need to be studied for a long time to understand all of the implications.
In fact I am kicking around the idea of a continually-updating distribution estimated by consensus rules themselves. Seems far-fetched, but it may be possible. Statistics is a vast discipline.
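To make the rolling-estimator idea a little more concrete, here is a purely hypothetical sketch, not anything actually proposed for consensus: an exponentially weighted histogram of observed ring-member ages from which decoys are then drawn. The decay factor, bucket count, and update source are all illustrative assumptions, and the feedback problem noted in the comments applies.

```python
# Hypothetical sketch of a rolling, exponentially weighted estimate of the
# spend-age distribution. Nothing here reflects actual or proposed consensus
# code; DECAY, NUM_BUCKETS, and the update source are illustrative choices.
import random

DECAY = 0.999        # per-block down-weighting of older observations (assumption)
NUM_BUCKETS = 1000   # age buckets in blocks, up to some horizon (assumption)

weights = [1.0] * NUM_BUCKETS  # start from a flat prior

def update_with_block(observed_ring_member_ages):
    """Fold one block's observed ring-member ages (in blocks) into the estimate."""
    global weights
    weights = [w * DECAY for w in weights]
    for age in observed_ring_member_ages:
        if 0 <= age < NUM_BUCKETS:
            weights[age] += 1.0

def sample_decoy_age():
    """Draw a decoy age (in blocks) from the current rolling estimate."""
    return random.choices(range(NUM_BUCKETS), weights=weights, k=1)[0]

# Caveat: observed ring members are overwhelmingly decoys, so a naive estimator
# largely chases its own output and could be steered by a flooding attacker.
update_with_block([3, 10, 10, 250, 999])
print(sample_decoy_age())
```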
There was a spike in transactions, but the density calculation is averaged over a year. There should be a discrepancy between the "real" and computed density during certain timeframes. I don't recall seeing this mentioned anywhere (publicly), and it might be a major contributor to the differences. This is slightly different from the gamma distribution being poor overall, because a better model with an equally slow reaction should have similar issues.
A person with a math background also looked at the design, and primarily criticized the density calculation (at least to me personally). This person may have been spot-on (about 2.5 years ago). My only rebuttal was/is that it was better than what we were doing previously, and improving upon it probably wasn't going to be easy.
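To illustrate the lag concern with made-up numbers: a one-year average of output density barely moves in response to a recent volume spike, so ages derived from it diverge from the actual recent density during the spike. The block time, output counts, and spike size below are assumptions for illustration only, not chain data.

```python
# Illustrative numbers only: a 1-year average of output density reacts slowly
# to a recent spike in output volume. The real wallet2 calculation differs.
SECONDS_PER_BLOCK = 120
BLOCKS_PER_YEAR = 365 * 720          # ~262800 blocks at 2-minute blocks

baseline_outputs_per_block = 20      # assumption
spike_outputs_per_block = 60         # hypothetical 3x spike
spike_duration_blocks = 10000        # hypothetical recent spike window

outputs_last_year = (baseline_outputs_per_block * (BLOCKS_PER_YEAR - spike_duration_blocks)
                     + spike_outputs_per_block * spike_duration_blocks)

# Average seconds between outputs as seen over a 1-year window vs. the recent rate
yearly_average_output_time = SECONDS_PER_BLOCK * BLOCKS_PER_YEAR / outputs_last_year
recent_output_time = SECONDS_PER_BLOCK / spike_outputs_per_block

print(round(yearly_average_output_time, 2))  # ~5.58s: barely moved by the spike
print(recent_output_time)                    # 2.0s: the actual density during the spike
```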
@vtnerd wrote
Thanks for your answer.
On the PoC to select outputs in chunks of the PDF (or "quantiles"), even if validating at consensus using that approach doesn't move forward, I think it would be sensible to do in the client anyway. It avoids the circumstance where a ring may be constructed of only recent members, revealing that the output is recent and aiding timing analysis. In a PoC for wallet-side "binning" I'm working on, I'm thinking of combining the quantile approach with binning. To see why it would be beneficial, a ring like this is possible (with low probability) with the current approach:
Notice how the oldest member of the ring is from 2021-09-23. That in and of itself is useful information ("this person received Monero at some point in the last 10 days"), and also gives some reason to speculatively eliminate outputs older than 2021-09-23 in other rings in the tx, depending on context. With the quantile method, the oldest output is guaranteed to come from the oldest quantile. The youngest possible "oldest" output right now would be from 2021-07-28 (assuming no bins, or 11 bins / 2 members per bin, i.e. 10 quantiles). EDIT: created a new comment for the downside below.
Potential downside with the quantile approach: assume you have a 2-input (or more) tx with real outputs that are recent or close in age. The quantile approach (with no binning) may land you with 2 rings where a single output in one quantile from each ring is very close in age to its counterpart, while the rest of the quantile outputs are further apart. This could possibly aid analysis as well. Binning (and higher ring sizes) would mitigate it; still, this downside is worth continuing to weigh, also against how well the current algorithm protects against the same thing (its protection here is also known to be weak). Example:
EDIT: I think binning + larger ring sizes are the real mitigation to this, and it should just be assumed the current implementation is prone to the same thing as well.
Abstract
When someone sends you Monero, you receive an output. When you spend your Monero, you spend that output. When constructing a transaction, your wallet hides the output you are spending among a set of decoy outputs to obfuscate which among them is your real output being spent. If you receive an output on May 4th, then go to spend it on May 5th, the decoy selection algorithm will pull in 10 decoy outputs from across the chain's history to include in your transaction on May 5th, so that no one can tell which among the 11 total outputs is yours.
To be clear, the block in which each output is created constitutes metadata that is transparent on-chain, along with metadata that shows which outputs are included in which transactions. It is not transparent which outputs are real outputs being spent in each transaction, since the real outputs spent in transactions are obfuscated among decoy outputs.
Not all outputs on the blockchain have an equal probability of being selected by the decoy selection algorithm, because uniform selection would allow someone analyzing the chain to make strong guesses about which outputs are really being spent in a transaction (for example, by using a "newest spend" heuristic). Instead, the algorithm uses known spending patterns to select decoys, hiding real outputs among the crowd.
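As a rough illustration of why uniform selection would be weak (this is not a simulation of wallet2 itself), the sketch below draws real spend ages from the ln-gamma used in the wallet's fit and decoys uniformly over a hypothetical 3-year-old chain, then checks how often the newest ring member is the real spend. The chain age and trial count are assumptions.

```python
# A rough Monte Carlo showing why uniform decoy selection would be weak: real
# spend ages drawn from the wallet's ln-gamma fit, decoys drawn uniformly over
# a hypothetical 3-year-old chain, then guess "newest ring member is real".
# Chain age and trial count are assumptions; this is not wallet2's picker.
import math
import random

GAMMA_SHAPE, GAMMA_SCALE = 19.28, 1 / 1.61   # the wallet's fit over ln(age in seconds)
CHAIN_AGE_SECONDS = 3 * 365 * 24 * 3600      # hypothetical 3-year-old chain
RING_SIZE = 11
TRIALS = 100000

hits = 0
for _ in range(TRIALS):
    real_age = math.exp(random.gammavariate(GAMMA_SHAPE, GAMMA_SCALE))
    decoy_ages = [random.uniform(0, CHAIN_AGE_SECONDS) for _ in range(RING_SIZE - 1)]
    if real_age < min(decoy_ages):
        hits += 1

# With uniform decoys the naive heuristic wins far more often than the 1/11
# (~9%) a guesser should be limited to.
print(hits / TRIALS)
```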
Overview of this improvement proposal
There appears to be significant room for improvement in the decoy selection algorithm, and I'd like to summarize what I've found over the last month to show what is worth focusing attention on today. I'm planning on submitting a CCS proposal to work on these areas (among other tasks) full-time. This issue will be divided into 2 parts:
On the latter, I'm sure most reading this are already familiar with the "binning" strategy (discussed in #84 and in Möser et al. (2018) section 6.2). I'll be presenting a case here for why binning should be more seriously considered for implementation in the next hard fork, even before a significant ring size bump that will come when one of Triptych/Seraphis/Lelantus is ready. For those who don't know what binning is, I will explain it in more detail later on.
Evidence highlighting areas the decoy selection algorithm could improve
Currently, the decoy selection algorithm selects 10 decoys using a distribution fit to the spending pattern observed in linkable Monero transactions from 2016/17 (as recommended in Möser et al. (2018)). This approach has known weaknesses:
Assume you use Monero throughout the day one day, and have some change outputs collected from that day. Weeks later, you decide to send some Monero somewhere else. If your wallet uses multiple outputs from that same day in distinct rings in your transaction, those outputs stick out in the rings and make it easier to guess the real spends with accuracy. [TO-DO: quantify the chances of 2 distinct rings in a transaction having outputs within 100 blocks of each other, plotting age on the x-axis; a rough simulation sketch follows this list.] As a partial mitigation, some wallets may warn the user that spending multiple outputs older than 30 days in a single transaction may be harmful for privacy, or that spending outputs that are within 5 blocks of each other can break ring signatures. However, there is not always much recourse available to the average user to avoid doing these things.
Assume you have 1 transaction tied to a known user. You can do temporal analysis on the outputs used in the rings of the transaction, and arrive at more likely outputs used as inputs to the transaction. For example, maybe you know around what time the user would spend an output, and therefore can deduce that only the outputs in that time window are likely the user's.
If the gamma distribution is off for whatever reason (for example, general spending patterns change materially as adoption picks up and usage changes), and the gamma fails to select decoys proportional to the real spends, then it becomes easier to guess real spends. Below is evidence showing where the gamma distribution seems to be off.
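As a first rough sketch toward the TO-DO in the first weakness above (not a substitute for running it against real chain data), the following Monte Carlo estimates how often two independently selected rings coincidentally contain members within 100 blocks of each other. The picker is simplified relative to wallet2, so the resulting rate is illustrative only.

```python
# Rough sketch toward the TO-DO above: for a 2-input tx, how often do two
# independently selected rings contain members within 100 blocks of each other
# purely by chance? The picker here is simplified (ages drawn straight from the
# ln-gamma, no per-block re-weighting), so treat the rate as illustrative only.
import math
import random

GAMMA_SHAPE, GAMMA_SCALE = 19.28, 1 / 1.61
RING_SIZE = 11
SECONDS_PER_BLOCK = 120
TRIALS = 20000

def ring_ages_in_blocks():
    return [math.exp(random.gammavariate(GAMMA_SHAPE, GAMMA_SCALE)) / SECONDS_PER_BLOCK
            for _ in range(RING_SIZE)]

close = 0
for _ in range(TRIALS):
    ring_a, ring_b = ring_ages_in_blocks(), ring_ages_in_blocks()
    if any(abs(a - b) <= 100 for a in ring_a for b in ring_b):
        close += 1

# Baseline rate of coincidental <=100-block cross-ring matches; the interesting
# question is how much two real same-day outputs raise the observed rate above this.
print(close / TRIALS)
```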
Inconsistencies between the observed shape of the distribution and the expected shape
CAUTION: drawing conclusions about real outputs from the following analysis is ill-advised, since the analysis is too raw to be useful for that purpose. It isn't clear the analysis can even be used for that purpose, though it does highlight potential issues worth assessing to improve the algorithm. Added this note after @vtnerd and @SamsungGalaxyPlayer's comments in the discussion below.
Outputs between 10 and 100 blocks old
The current decoy selection algorithm (orange) appears to be under-selecting outputs relative to the observed output ages in rings (blue), as is apparent by the large triangular gap above the orange line to the blue. Basically, there are many more outputs observed on-chain than the current decoy selection algorithm would produce over that age range, which is visualized as the triangular gap.
Here are 3 potential causes for this triangular gap:
EDIT: removed my opinionated analysis toward a cause while raw research is ongoing, since it's not as clear yet which is the cause, and doesn't serve much use for purposes of seeking to improve the algorithm at this point.
It should also be noted that there are known implementations in use that do not use the "Current decoy selection algorithm" (and to be clear, the "Current decoy selection algorithm" is wallet2's implementation before the latest round of patches for the earliest spend issue/incorrect curve fit).
Here are the distributions of outputs between 10 and 100 blocks old over various hard fork intervals for consideration:
Outputs around 100,000 to 250,000 blocks old
Notice how many more outputs are observed on chain in the following regions than the decoy selection algorithm would produce:
Outputs between 170,000 - 230,000 blocks old
Outputs between 80,000 - 240,000 blocks old
Easier visual across the chain
Below are additional charts that quantify the above gaps noticed on chain as a ratio of Current / Observed:
EDIT: note that the following is ONLY true assuming the decoy selection algorithm is the only algorithm in use, which we know is not the case. However it is still useful to understand what the charts are saying.
If the blue line is below 1 AND that algorithm were the only one in use, that means the decoy selection algorithm would be under-selecting outputs. If it is above 1 AND that algorithm were the only one in use, that means the decoy selection algorithm may be over-selecting outputs. If it is equal to 1, the decoy selection algorithm is perfectly hiding outputs.
However, considering the algorithm is not the only one in use, it seems going much further with this analysis without taking all other algorithms in use into account wouldn't be very useful.
The areas to focus attention on today
Here are 4 areas to focus attention on in order to improve upon the above aspects of the decoy selection algorithm:
1. Safely fix integer truncation in the wallet (#7798)
As highlighted in #7798, there is a line in wallets that causes a step in the decoy selection algorithm to truncate an integer after division (integer truncation example: 260 / 100 should equal 2.6, but truncated it is 2). Today, this causes average_output_time, which is just the average number of seconds between each output over the prior year, to truncate to 1 from its true value of around 1.9. This ends up causing selected decoys to be about ~1.9x older than they would otherwise be if integer truncation were patched. Unfortunately, it is not completely clear whether it is safe to make this change today, for 2 reasons:

Both points 1 and 2 need further analysis to arrive at a stronger conclusion.
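A minimal sketch of the truncation effect described above, with a hypothetical output count chosen so that the true average_output_time is ~1.9 seconds; the actual wallet2 arithmetic differs in its details.

```python
# Minimal illustration of the truncation described above. The output count is a
# stand-in chosen so the true average is ~1.9 seconds per output; the point is
# only that flooring average_output_time stretches every selected age.
SECONDS_IN_A_YEAR = 365 * 24 * 3600
num_outputs_last_year = 16_600_000              # hypothetical count

true_average_output_time = SECONDS_IN_A_YEAR / num_outputs_last_year        # ~1.9
truncated_average_output_time = SECONDS_IN_A_YEAR // num_outputs_last_year  # 1

# The picker (roughly) turns a sampled "seconds ago" value into an output index
# by dividing by average_output_time, so truncating 1.9 down to 1 maps the same
# sample to an output ~1.9x further back in the chain.
sampled_seconds_ago = 100000.0
index_with_truncation = sampled_seconds_ago / truncated_average_output_time
index_without_truncation = sampled_seconds_ago / true_average_output_time
print(index_with_truncation / index_without_truncation)  # ~1.9
```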
2. Get "binning" to a production-ready implementation
Binning mitigates weaknesses that come from selecting decoys using spend-time patterns as the sole criterion. The basic idea behind binning is that outputs are bucketed into fixed-size bins of similar block height. With binning, when you spend an output, you reference a bin that the output is a member of, and include outputs from that bin as the decoys in your transaction. For example, your real output is in block 100, so you'll construct a bin using your real output and another output in block 100. With binning, an observer would have a much tougher time using temporal analysis to deduce which output is yours, since your output is among a bin of similarly aged decoy outputs.
Binning can be combined with the gamma selection like this: with ring size 14, you could have 7 bins of size 2 (numbers strictly meant for illustration). 6 of the bins your transaction references would be composed of 2 decoys each, and the 7th would include your real output and 1 decoy. The 6 decoy bins would be selected via the gamma distribution.
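A naive wallet-side sketch of the bins-selected-via-gamma idea above (7 bins of 2, ring size 14). The output-rate constant and the "adjacent global output index" stand-in for same/neighboring-block bin partners are assumptions; a production implementation would work in block heights and respect unlock times.

```python
# Naive wallet-side sketch of gamma-selected bins (7 bins of 2, as in the
# illustration above). SECONDS_PER_OUTPUT and the "adjacent global output
# index" stand-in for same/neighboring-block partners are assumptions; real
# code would work with block heights and respect unlock times.
import math
import random

GAMMA_SHAPE, GAMMA_SCALE = 19.28, 1 / 1.61
SECONDS_PER_OUTPUT = 1.9        # assumed average output rate
NUM_BINS, BIN_SIZE = 7, 2       # ring size 14, illustration only

def gamma_offset_from_tip(num_outputs):
    """Convert a gamma-sampled age (seconds) into an output offset back from the tip."""
    age_seconds = math.exp(random.gammavariate(GAMMA_SHAPE, GAMMA_SCALE))
    return min(int(age_seconds / SECONDS_PER_OUTPUT), num_outputs - 1)

def build_binned_ring(real_offset, num_outputs):
    # The real output's bin: the real output plus a decoy at a neighboring index.
    bins = [[real_offset, real_offset + 1]]
    # Remaining bins: gamma-select a bin center, then take adjacent indices.
    while len(bins) < NUM_BINS:
        center = gamma_offset_from_tip(num_outputs)
        bins.append([center + i for i in range(BIN_SIZE)])
    random.shuffle(bins)
    return bins

print(build_binned_ring(real_offset=5000, num_outputs=40_000_000))
```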
Weaknesses that binning protects against
Binning provides protection for users in the circumstance where they spend multiple older outputs in a single transaction, especially when those outputs are close in age. Going off the first example given in the weaknesses section above, assume you use Monero throughout the day one day, and have some change outputs collected from the day. Weeks later, you decide to send some Monero to someone. If your wallet uses multiple change outputs from that same day in distinct rings in your transaction, those outputs would be packed into bins among outputs of similar age. While it may be clearer which bins contain the real output, it would not necessarily be clear which outputs in the bins are the real outputs.
Importantly, binning also provides protection in the event the gamma is off. Assume the gamma does not correctly pick up that people are frequently spending outputs that are 100,000 blocks old. Without bins, the decoy selection algorithm would infrequently select decoy outputs that old, and therefore outputs observed on chain that are 100,000 blocks old would be clearly identifiable as very likely real. Now, if binning were in place as a fallback, then perhaps an observer can more easily recognize which bin is more likely the real bin as mentioned above, but still cannot deduce with certainty which output from the bin is the real output. It provides an additional fallback, and considering the known issues highlighted above with the gamma distribution today, I believe it is an important fallback to implement today.
Current status of binning
@UkoeHB has proposed a way to implement binning (#84) that still requires much deeper research and exploration to ensure it is both feasible and safe. As an alternative, we could implement a naive client-side binning approach, that would be very simple to implement, but would not have the space-saving benefits that @UkoeHB's proposal brings. If we are looking to bump the ring size a small amount soon without any other changes, I figure it would make sense to introduce a client-side binning implementation alongside that ring size bump, since there is no loss in space efficiency compared to the current approach. I plan to introduce a PoC in a separate issue.
3. Improve the distribution criteria used to select decoys (possibly move beyond the gamma)
More to come from @Rucknium on this section separately.
Here is a summary of his ideas in my words: the gamma distribution is too limiting in its application, since its shape is controlled by only 2 parameters. There are 2 general approaches to selecting outputs that would allow a more malleable shape for the expected decoy distribution, to better match spending patterns than the current gamma distribution:
Short- to Medium-term solution: Use a generalized gamma distribution, which essentially takes in a larger number of parameters than the normal gamma. The shape of the generalized gamma could be flexed to better match historically observed spending patterns. Other more flexible parametric distributions could also be explored.
Long-term solution: Use a nonparametric estimator to bridge the divergence of the expected decoy distribution to the observed. As spending patterns change, the algorithm would adjust to match, rather than stay constant.
@Rucknium plans to expand on this in much more detail in a separate doc.
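Ahead of that doc, here is a hedged sketch of the short-/medium-term idea: fit both the current gamma and a generalized gamma (scipy.stats.gengamma) to a stand-in sample of ln(spend ages) and compare their likelihoods. The synthetic data below is an assumption; real work would use actual estimated spend ages.

```python
# Hedged sketch of the short-/medium-term idea: compare the current 2-parameter
# gamma against a generalized gamma on (synthetic, stand-in) ln(spend age) data.
# Real work would fit estimated spend ages, not this simulated sample.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ln_ages = rng.gamma(shape=19.28, scale=1 / 1.61, size=50000)  # stand-in observations

gamma_params = stats.gamma.fit(ln_ages, floc=0)
gengamma_params = stats.gengamma.fit(ln_ages, floc=0)

gamma_ll = np.sum(stats.gamma.logpdf(ln_ages, *gamma_params))
gengamma_ll = np.sum(stats.gengamma.logpdf(ln_ages, *gengamma_params))

# The extra shape parameter can only help in-sample; whether it is worth using
# would be judged on held-out data or an information criterion.
print("gamma log-likelihood:            ", gamma_ll)
print("generalized gamma log-likelihood:", gengamma_ll)
```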
4. Implement validation at the consensus level that ensures constructed rings use the expected distribution
Today, there is next to no validation that a wallet is constructing decoys that follow the expected distribution. This validation could be made more robust. More robust validation guarantees transaction uniformity with respect to the distribution, and provides a stronger guarantee that all wallets conform to the expected specification. It may also make a FloodXMR attack less beneficial to the attacker (or other privacy opponents) by preventing an attacker from placing transactions on-chain that are intentionally distinguishable by their mixin distribution. Further, it can prevent users from accidentally leaving a trail (for example, by consistently delaying the submission of pre-signed transactions).
In the past, it was decided that validation for this would be too complex to implement. There may also be pernicious edge cases that cause transactions to fail unexpectedly, such as the pre-signed delayed transaction submission failure case. However, I figure it is worth revisiting to get a PoC going that shows how it can work simply, but understood if this faces pushback.
One way to do this validation is to divide the PDF of the expected distribution into ring-size chunks, and ensure that the ring contains an output from each chunk of the divided PDF. For example:
When the client constructs rings, it would slot the real output into its position in the PDF, and fill out the rest of the PDF chunks with decoys.
Another way, proposed by @Rucknium, is to perform a formal hypothesis test, with the null hypothesis being that the mixins' ages were drawn from the enforced distribution. The significance threshold could be set to 0.05, for example. This approach is not without challenges.
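For intuition only (this is not @Rucknium's actual proposal), here is a minimal sketch of such a test: a Kolmogorov-Smirnov test of a ring's ln(ages) against the enforced gamma at a 0.05 threshold. With only 11 members per ring the test has little power, and an honest ring would still be rejected roughly 5% of the time at this threshold, which illustrates the unexpected-failure edge cases mentioned above.

```python
# For intuition only (not the actual proposal): a Kolmogorov-Smirnov test of a
# ring's ln(ages) against the enforced gamma, rejected below a 0.05 threshold.
# With only 11 members the test has little power, and honest rings would still
# fail ~5% of the time at this threshold.
import random
from scipy import stats

GAMMA_SHAPE, GAMMA_SCALE = 19.28, 1 / 1.61
RING_SIZE = 11

def ring_passes(ln_ages, alpha=0.05):
    result = stats.kstest(ln_ages, "gamma", args=(GAMMA_SHAPE, 0, GAMMA_SCALE))
    return result.pvalue >= alpha

# Honest ring: ln(ages) drawn from the enforced distribution
honest = [random.gammavariate(GAMMA_SHAPE, GAMMA_SCALE) for _ in range(RING_SIZE)]
# Deviant ring: all members clustered at very young ages
deviant = [random.uniform(7.0, 7.5) for _ in range(RING_SIZE)]

print(ring_passes(honest))   # True ~95% of the time
print(ring_passes(deviant))  # essentially always False
```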
Going Forward
The general idea behind this issue is to establish an understanding for exactly why these 4 areas are worth exploring further in order to improve the decoy selection algorithm. This issue could serve as a task manager of sorts for the above 4 areas to continue exploring. If an area is deemed unworthy of further exploration, it could be removed from this issue. Once all 4 areas are sufficiently explored, accepted or rejected, this issue could be closed.
Edit 1: used correct chart
Edit 2: added one of the ways wallet-cli warns users who spend outputs that are temporally close
Edit 3: removed my claims on likely or unlikely causes and/or calculations that can't necessarily be fully justified with the above information as per comments below