Variable Transaction Fees: Execution Effort #753

Merged
merged 16 commits into janez/transaction-fees from janez/execution-effort
Apr 6, 2022

Conversation

janezpodhostnik
Contributor

@janezpodhostnik janezpodhostnik commented Jan 11, 2022

Description

This FLIP builds on the foundations of the Variable Transaction Fees FLIP and proposes a model for measuring the execution effort of transactions: certain functions/operations that are called during the execution of a transaction are assigned an execution effort cost. This FLIP explores a choice of functions/operations and uses data collected from sample transactions and a linear model fit to determine the cost of each chosen function/operation, so that on average the execution effort of a transaction is proportional to its execution time. This FLIP also explores the FLOW cost of a unit of execution effort.

@janezpodhostnik janezpodhostnik self-assigned this Jan 11, 2022
@vercel

vercel bot commented Jan 11, 2022

This pull request is being automatically deployed with Vercel (learn more).
To see the status of your deployment, click below or on the icon next to each commit.

🔍 Inspect: https://vercel.com/onflow/flow-docs/8hEfcg71yRCZ3sSHrguf3vBqoSnF
✅ Preview: https://flow-docs-git-janez-execution-effort-onflow.vercel.app

[Deployment for c05d847 failed]

Co-authored-by: Jan Bernatik <jan.bernatik@dapperlabs.com>
@pgebheim
Contributor

Some high level comments:

  1. The meat of this document is describing the methodology for data analysis. The effects of this proposal need to be up front, allowing a reader to see: "Before this proposal here are the costs, after this proposal here are the new costs," and this should ideally be in the abstract.

  2. This table supposedly summarizes what the FLIP is proposing: https://github.com/onflow/flow/blob/444c9909b1920b579e428eda51e4df2ef142a8fb/flips/20220111-execution-effort.md#final-model-proposal

However, I am not even sure what the units on this chart are and what they mean. Is this e.g. saying that function_or_loop currently has a cost of 1 effort, but after this proposal it will have a cost of 0.0141165 effort?

  3. The proposal does not discuss how this will be implemented, other than a note that this can be done without changing the actual fees, but instead just allowing txns to measure execution effort. If that is the case, we need a proposal for what API payloads and data structures need to be maintained in order to support this behavior of reporting execution effort.

  4. On the highest level I worry that what we've done here is fragile modeling that points to a conceptually fragile cost framework.

A thousand implementation details could be affecting the various runtime costs of these operations. Future FVM changes could drastically change the actual runtime impact of the various functions, e.g. by changing the method of data structure serialization, certain operations could become significantly more or less costly.

For an engineer targeting the FVM and actually writing Cadence it is necessary that there is a long term stable framework for how they can think of their costs. By attempting to build costs for high level activities I think we run the risk of creating a moving target which is either difficult to adjust or difficult to code toward.

Once contracts are in the wild assuming various cost factors (especially related to the max execution cost) it becomes extremely difficult to change these factors. For evidence of this look at the huge conundrum Ethereum went through when attempting to change operation gas costs during the Constantinople hard fork: https://cryptoticker.io/en/ethereum-hard-fork-delayed/

@janezpodhostnik
Contributor Author

janezpodhostnik commented Feb 16, 2022

@pgebheim Thank you very much for the input!

  1. The meat of this document is describing the methodology for data analysis. The effects of this proposal need to be up front, allowing a reader to see: "Before this proposal here are the costs, after this proposal here are the new costs," and this should ideally be in the abstract.

The emphasis on a clear description of the process was intentional, as this way it makes the final results transparent and makes the process repeatable which also ties into point 4. I agree that the final results (and the consequences of them) should be better illustrated.

  2. ... However, I am not even sure what the units on this chart are and what they mean ...

The unit of effort is defined so that the average time taken to execute a transaction is proportional to its effort. This still has a free parameter so a suggested solution is to define that 1 unit of effort corresponds to 1 ms of execution time on an average execution node. This might be glossed over in the FLIP, I will attempt to make this clearer and mark it on the table as well.

  3. The proposal does not discuss how this will be implemented ...

No, implementation details were left out. It basically just includes keeping a tally of all the weights hit and multiplying that by a few numbers. I will add a short chapter to outline this.
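A minimal sketch of what "keeping a tally of all the weights hit" could look like. The feature names are taken from the FLIP excerpts quoted in this thread; the `EffortMeter` class and the weight values are hypothetical placeholders, not the fitted values from the FLIP and not the actual FVM implementation.

```python
from collections import Counter

# Hypothetical weights in effort units (per hit, or per byte for
# GetValue/SetValue). These are placeholders, NOT the FLIP's fitted values.
WEIGHTS = {
    "function_or_loop_call": 0.5,
    "GetValue": 0.01,  # per byte read
    "SetValue": 0.02,  # per byte written
}


class EffortMeter:
    """Tallies how often each weighted feature is hit during a transaction."""

    def __init__(self, weights):
        self.weights = weights
        self.tally = Counter()

    def hit(self, feature, amount=1):
        # amount is 1 for a plain call, or a byte count for reads/writes
        self.tally[feature] += amount

    def effort(self):
        # Execution effort is the weighted sum of all tallied hits
        return sum(self.weights[f] * n for f, n in self.tally.items())


meter = EffortMeter(WEIGHTS)
meter.hit("function_or_loop_call")
meter.hit("function_or_loop_call")
meter.hit("GetValue", 100)  # 100 bytes read
meter.hit("SetValue", 50)   # 50 bytes written
total = meter.effort()
```

Converting `total` into a fee would then be a further multiplication by the price of one unit of effort, which this sketch leaves out.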

  4. On the highest level I worry that what we've done here is fragile modelling that points to a conceptually fragile cost framework.

Yes this model is fragile. It is not meant as a final model, but as a first model. Two steps were taken to reduce the fragility:

  • only a few weights were chosen; this way it is easier to keep track of how many times each transaction hits these specific weights and of its execution time, and to change the weights if needed.
  • the process of how the weights are chosen is described in full here and can be repeated.

TODO list for me:

  • describe on a concrete transaction, how the fees will change. That example could be a FLOW transfer
  • update abstract to somehow include the gist of the results
  • more clearly describe the unit of effort
  • add a chapter with a high-level overview of the actual implementation

@pgebheim
Contributor

@pgebheim Thank you very much for the input!

Thanks for the quick reply!

No, implementation details were left out. It basically just includes keeping a tally of all the weights hit and multiplying that by a few numbers. I will add a short chapter to outline this.

So I missed the section in #660 which referenced how the fees are emitted via Events. Perhaps just mention that in this doc as well and link to it in the Variable Transaction Fees flip. I think that describes the public interface changes that would come from this FLIP.

Member

@AlexHentschel AlexHentschel left a comment

@janezpodhostnik thanks for the detailed write up. I need to understand a bit better how you fitted the model. I am very cautious about using "Robust Fitting".

My comments below are mainly around consistency of the scientific presentation, highlighting assumptions and notation. Can't say anything about the actual methodology of fitting the model, because there are too many questions for me at the moment.


In the [Variable Transaction fees FLIP](20211007-transaction-fees.md) the transaction execution fees are defined as the part of the transaction fees that account for the resources (bandwidth, computing power) needed to execute the transactions' script, to verify the transaction execution and to handle the propagation of transaction execution results. The execution fees (<img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/A0hSAHHSW2.svg">) are defined as a execution effort cost function (<img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/sCA7cYTn4j.svg">) of the execution effort (<img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/WlYFrmB6Y8.svg">) of the transaction <img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/Uzsd7p4YBJ.svg">.

The aim of this FLIP is to create a model for measuring the execution effort of transactions, that satisfies the following criteria:
Member

Suggested change
The aim of this FLIP is to create a model for measuring the execution effort of transactions, that satisfies the following criteria:
The aim of this FLIP is to create a model for measuring the execution effort of transactions that satisfies the following criteria:


## Current state

As of [v0.23.6 release ](https://github.com/onflow/flow-go/tree/v0.23.6), execution effort is referenced to as computation cost. It is counted as 1 per every cadence function call or cadence loop made during the transaction.
Member

Suggested change
As of [v0.23.6 release ](https://github.com/onflow/flow-go/tree/v0.23.6), execution effort is referenced to as computation cost. It is counted as 1 per every cadence function call or cadence loop made during the transaction.
As of [v0.23.6 release](https://github.com/onflow/flow-go/tree/v0.23.6), execution effort is coarsely approximated, where we charge 1 unit of effort per cadence function call or cadence loop iteration.


As of [v0.23.6 release ](https://github.com/onflow/flow-go/tree/v0.23.6), execution effort is referenced to as computation cost. It is counted as 1 per every cadence function call or cadence loop made during the transaction.

If the execution effort exceeds the execution effort limit (also currently referenced to as gas limit or computation limit) the transaction fails. The state changes of that transaction are reverted, however the fees are still deducted for that transaction.
Member

Suggested change
If the execution effort exceeds the execution effort limit (also currently referenced to as gas limit or computation limit) the transaction fails. The state changes of that transaction are reverted, however the fees are still deducted for that transaction.
If the execution effort exceeds the execution effort limit (also currently referenced to as gas limit or computation limit) the transaction fails. While the state changes of that transaction are discarded, the fees are still deducted for that transaction.


The assumption here was that the processing cost of a running a single function <!-- $N$ --> <img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/4FLaqCTwtI.svg"> times scales linearly with <!-- $N$ --> <img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/4FLaqCTwtI.svg">. This assumption is made only for transactions where the execution effort of the transaction is not above the execution effort limit.

By choosing correct functions for the weights we can find acceptable linear correlation between transaction execution time (<!-- $t$ --> <img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/cbdkfkWybi.svg">) and it's execution effort <!-- $\frac{t}{E} = \textbf{const.}$ --> <img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/ujui9morSm.svg">.
Member

In general, I don't think a linear correlation between execution effort and time holds. The execution effort decomposes into multiple components, most prominently the following:

  1. Running the transaction locally on the execution node (which we generally expect to be roughly similar to the cost of running the transaction locally on the Verification Node). I'll call this the "computation-time effort".
  2. The networking effort of the Execution nodes sharing the necessary data with Verifiers (and potentially access nodes). I'll call this the "networking effort".

While I think that 1. is linearly related to execution time, 2. has no linear time dependency. It is totally fine for you to restrict your attention to 1. in this Flip. Nevertheless, I think it is important to

  • repeatedly emphasize this simplification
  • choose a notation that we can organically extend
    • For example, in this flip you talk about the computation-time effort. You currently denote this as E. Now imagine that you are writing a subsequent Flip in 5 months, where you want to include the networking effort. In that future Flip, you will need symbols for computation-time effort as well as networking effort.
    • If you change your notation in the future Flip, that will be super confusing for people, because the same notion might mean different things in the different flips.
    • My recommendation, based on years of experience with scientific publications, is to think ahead a bit. At the moment, you restrict your attention only to the computation-time effort. But the other efforts are still there, we just choose to not account for them. Ideally, your notation would reflect this. So instead of just talking about "the effort", be more specific and talk about "computation-time effort", and include this in your nomenclature, e.g. by using E_ct instead of E.
      Thereby, you reflect the fact that there are other efforts besides the computation-time effort. Furthermore, you can organically extend your notation in future Flips, without creating a clash of notation.
Suggested change
By choosing correct functions for the weights we can find acceptable linear correlation between transaction execution time (<!-- $t$ --> <img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/cbdkfkWybi.svg">) and it's execution effort <!-- $\frac{t}{E} = \textbf{const.}$ --> <img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/ujui9morSm.svg">.
By choosing suitable weights for the most relevant functions, we can find an acceptable linear correlation between transaction execution time (<!-- $t$ --> <img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/cbdkfkWybi.svg">) and its execution effort <!-- $\frac{t}{E} = \textbf{const.}$ --> <img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/ujui9morSm.svg">.

Member

Regarding the formula: t/E = const
For notation, I would recommend following the broadly used conventions:

  • Symbols in bold font denote vectors. const is a scalar quantity and not a vector. Hence, it should not be bold.
  • You have a variable const in your equation. You have chosen this variable name in reference to the English word "constant", yet it is still a mathematical variable and not a word. The use of a period is discouraged as part of variable names, because it often clashes with mathematical operators that also use . or pseudo-code where . denotes a field or function of an object.

In LaTeX, a common convention is to use \texttt{const} to identify variable names using the typewriter font.
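As an illustration only, the relation could be typeset as follows. Combining the typewriter-font convention with the `E_ct` subscript proposed in the earlier comment is an assumption here; the FLIP itself may settle on different symbols.

```latex
% Non-bold, period-free constant; subscript marks the computation-time effort
\frac{t}{E_{ct}} = \texttt{const}
```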


Having a model like that, the secondary goal of this FLIP is to look at how each unit of execution effort would be priced.

## Motivation
Member

In scientific literature, the term "motivation" is usually used when you explain your reasoning for assumptions or simplifications you are making.
In this section, you describe why this work is important. Generally, this is included in the introduction (see https://www.nature.com/scitable/topicpage/scientific-papers-13815490/) without a specific headline. Degree-focused scientific work (most commonly a PhD thesis or Master's thesis) often goes into much more detail about why the research is important, and then it makes sense to have a sub-section "Motivation" within the "Introduction" section. But you are not writing a thesis here 😉 , so I would suggest removing the headline here.

Suggested change
## Motivation

If you really desire a header for why your work is important, you could use the headline "Impact". The specific reason why I am trying to avoid using "motivation" here is that I think you need a more detailed motivation of your "Proposed Design". And there, the headline "Motivation" would be most suitable in my opinion.


The `function_or_loop_call` weight is an exception as that counts any cadence function (function calls in the cadence script) and any cadence loop. This is also the weight that is already currently in place, as discussed above.

The weights `GetValue` and `SetValue` are also different, as instead of just counting the number of times a transaction calls those functions, they instead count how many bytes were read or written, when they are called.
Member

weights GetValue and SetValue

I think GetValue and SetValue would conventionally be called features

Contributor Author

@janezpodhostnik janezpodhostnik Feb 22, 2022

Features is a good name. I was also considering just calling them functions or operations, but that seemed too broad. They are more precisely: "a place in the code with an associated weight".


#### Outliers and robust linear model fitting

The following is a linear model fit on the data and plotting the data on a graph of execution time taken (in milliseconds) vs execution effort.
Member

In the figure below, you plot transactions with their "execution effort". Can you explain again where you got the model parameters for computing the "execution effort"?


The columns with weights `ProgramChecked`, `GetCode` and `ValueExists` were removed from the weight data matrix <!-- $M$ --> <img style="transform: translateY(0.1em); background: white;" src="./20220111-execution-effort/eq/SmblEwKco6.svg">.

#### Outliers and robust linear model fitting
Member

😨 I am a bit worried here about the use of robust linear model fitting. As Wolfram explained, robust linear model fitting introduces an axiomatic bias into the model. There is a strong risk here of over-fitting to the training data (benchmark transactions), so its predictive performance on the training set is better, but potentially worse for unknown transactions. Meaning, the model-generated "execution effort" would not be a good predictor for real-world effort.

Contributor Author

@janezpodhostnik janezpodhostnik Feb 22, 2022

There is a property of the data I tried to include in the model. The "robust fit" + custom weighting function for outliers is what I used to try to do this.

The property I had in mind (and is also visible in the data) is that for a specific transaction on a specific machine there exists a t_0 such that the execution time for this transaction on this machine (across multiple runs) is t = t_0 + delta t where delta t >= 0. In other words the machine cannot run the transaction faster than t_0 but it can (and will) run it slower. (The error of execution time is more positive than negative).
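To make the one-sided error property concrete, here is a small simulation sketch. The exponential distribution for the delay and all numeric values are assumptions chosen for illustration; the thread does not state the actual error distribution.

```python
import random


def simulate_run_times(t0_ms, mean_delay_ms, runs, seed=0):
    """Simulate repeated runs of one transaction on one machine.

    Each run takes t = t0 + delta_t with delta_t >= 0. The delay is drawn
    from an exponential distribution, which is an assumption for this sketch.
    """
    rng = random.Random(seed)
    return [t0_ms + rng.expovariate(1.0 / mean_delay_ms) for _ in range(runs)]


samples = simulate_run_times(t0_ms=5.0, mean_delay_ms=1.0, runs=1000)
sample_min = min(samples)                  # never below t0
sample_mean = sum(samples) / len(samples)  # biased upward relative to t0
```

Because the noise is never negative, the sample mean sits above t_0 while the minimum stays close to it, which is why a symmetric treatment of residuals would not match this error model.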


The following is a linear model fit on the data and plotting the data on a graph of execution time taken (in milliseconds) vs execution effort.

![DemoFit](./20220111-execution-effort/demo-fit.png)
Member

A couple of requests:

  • Could you please include a legend, to clearly identify the red as the desired output for a perfect model, and the blue as the predictions of our linear model?
  • Can you reflect the density in your graph? The data points around the model completely cover the domain. There could be significant differences in density, which are important to evaluate the performance of our model "by human inspection".

Lets take a look at the model's performance for fast transactions (short
To me, it looks there is a very distinct different


The function that converts the residual of a data point to its weight was chosen to be an asymmetrical function with a different cut-off to the left and to the right. This is because the outliers were mostly to the right of the graph, and the signal was to the left of the graph. The cut-off point to the left was 16 times the mean of the residuals, while the cut-off point to the right was 10 times the mean residual.

![Residual2Weight](./20220111-execution-effort/residual-to-weight.png)
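A rough sketch of an asymmetric residual-to-weight function with the two cut-offs described above. Several things here are assumptions: the hard 0/1 step (the real curve is only shown in the figure), the mapping of "left" to negative residuals and "right" to positive ones, reading the first cut-off (16 times) as the left one, and the use of the mean residual magnitude as the scale. The function name and parameters are hypothetical.

```python
def residual_weight(residual, mean_abs_residual,
                    left_factor=16.0, right_factor=10.0):
    """Return the fitting weight of one data point given its residual.

    Points whose residual lies between the left cut-off
    (-left_factor * mean) and the right cut-off (right_factor * mean)
    keep full weight; points outside are ignored. The hard step is an
    assumption, a smooth taper would serve the same purpose.
    """
    left_cut = -left_factor * mean_abs_residual
    right_cut = right_factor * mean_abs_residual
    return 1.0 if left_cut <= residual <= right_cut else 0.0


# With a mean residual magnitude of 1.0, a residual of -12 is kept
# while a residual of +12 is discarded.
kept_left = residual_weight(-12.0, 1.0)
dropped_right = residual_weight(12.0, 1.0)
```

In an iteratively reweighted fit, these weights would be recomputed from the residuals of the previous iteration and passed to the next weighted least-squares step.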
Member

Where does this graph come from? Sorry, I don't understand. Who determined that this is exactly the shape we wanted to use?

Contributor Author

This relates to my reply here #753 (comment). I tried to capture a special property of the errors of execution time.

@bluesign
Contributor

The original goal described in the Variable Transaction fees FLIP was that for 95% of transactions to be cheaper after the transition to variable transaction fees.

I think if you have a goal like this, it is also important to put some real cost in the calculation. (PS: This goal is not realistic, but that is not important in the context of this FLIP.)

If, say, running an execution node currently costs X, our target should be that at 100% utilization of the execution node, sum(transaction fees) > X at a minimum; otherwise, I think we are allowing abuse.

This means that the 95th percentile execution time times the price of one unit of execution effort should be half of the current fixed fees (the other half will be inclusion effort):

Why is there a 50/50 split here? If there is a calculation behind it, I think it would be nice to share. It does not make sense to me, but maybe I am missing something.

@janezpodhostnik janezpodhostnik marked this pull request as ready for review March 29, 2022 18:04
@pgebheim pgebheim merged commit f030897 into janez/transaction-fees Apr 6, 2022
@pgebheim
Contributor

pgebheim commented Apr 6, 2022

Merging this pull request so that the canonical document can be linked. Any future edits should come in the form of a secondary PR.

pgebheim added a commit that referenced this pull request Apr 6, 2022
* tx fees first version

* minor fixes

* second pass

* updates

* diagram fix

* fix image

* update

* redesign fee breakdown

* update fees flip

* wording changes

* fixed spelling mistakes

* Update 20211007-transaction-fees.md

* The diagram isn't needed

* Wording changes

* some wording changes

* Variable Transaction Fees: Execution Effort (#753)

* execution effort intial commit

* Cleanup

* fix wrong graph

* Update flips/20220111-execution-effort.md

fixing typo

* Update 20220111-execution-effort.md

* Update 20220111-execution-effort.md

* Update flips/20220111-execution-effort.md

Co-authored-by: Jan Bernatik <jan.bernatik@dapperlabs.com>

* Update flips/20220111-execution-effort.md

Co-authored-by: Jan Bernatik <jan.bernatik@dapperlabs.com>

* Update flips/20220111-execution-effort.md

Co-authored-by: Jan Bernatik <jan.bernatik@dapperlabs.com>

* robust fitting graph description fix

* minor fixes + warning under construction

* Update execution effort

* update FLIP

* some cleanup

* some cleanup

* remove unused assets

Co-authored-by: Jan Bernatik <jan.bernatik@dapperlabs.com>

Co-authored-by: Janez Podhostnik <janez.podhostnik@gmail.com>
Co-authored-by: Jan Bernatik <jan.bernatik@dapperlabs.com>
Co-authored-by: Janez Podhostnik <67895329+janezpodhostnik@users.noreply.github.com>
@peterargue peterargue deleted the janez/execution-effort branch January 17, 2023 19:29
Labels
FLIP Flow Improvement Proposal S-Governance