Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gas intrinsic proto #64

Merged
merged 3 commits into from
Nov 12, 2021
Merged

Gas intrinsic proto #64

merged 3 commits into from
Nov 12, 2021

Conversation

olonho
Copy link
Contributor

@olonho olonho commented Oct 12, 2021

As our measurements show that gas counting is significant part of execution cost,
we need to optimize it as much as possible. This PR provides support for shared
fast gas counter structure, second part is implemented in near/nearcore#5121

Fast gas counter has three fields: current counter, limit and op cost. When we see call to host
function called gas with single argument we do not actually emit such a call, but instead
emit inline instruction sequence to increment gas counter per requested amount and if limit is hit -
GasExceed trap happens.

For fast gas counter to work VM embedder shall provide gas counter context, and VM will trap on call
to gas function if such context wasn't provided.

@olonho olonho changed the base branch from master to near-main October 12, 2021 11:50
@@ -63,6 +63,9 @@ pub enum TrapCode {

/// An atomic memory access was attempted with an unaligned pointer.
UnalignedAtomic = 11,

/// Run out of gas.
GasLimitExceeded = 12,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be GasExceeded in NEAR, if you pass prepaid_gas as GasCounter.limit:

            let res = if new_burnt_gas > self.max_gas_burnt {
                Err(HostError::GasLimitExceeded.into())
            } else if new_used_gas > self.prepaid_gas {
                Err(HostError::GasExceeded.into())

So I suggest rename it to GasExceeded to avoid confusion.

I don't suggest to pass max_gas_burnt as GasCounter.limit, because in reality nearcore rpc would prevent a function call has more than max_gas_burnt of prepaid_gas. In other word, prepaid_gas is reached first and check whether prepaid_gas is exceeded happen more frequently and should be fast. (Although, in some vm runner and param estimator unit tests, there's cases where prepaid_gas > max_burnt_gas so both GasExceeded and GasLimitExceeded need to be handled)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Comment on lines 5 to 10
#[derive(Error, Debug)]
#[error("Host env initialization error: {0}")]
#[error("Host env initialization error")]
pub enum HostEnvInitError {
/// An error occurred when accessing an export
Export(ExportError),
/// No gas counter supplied
NoGasCounter,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thiserror is a (ahem) proc-macro for generating error types. It's a good and ecosystem-approved proc-macro though, so we might end up using it a bunch and learn its API: https://docs.rs/thiserror/1.0.30/thiserror/.

I think here we want:

#[derive(Error, Debug)]
#[error("Host env initialization error")]
pub enum HostEnvInitError {
    /// An error occurred when accessing an export
    #[error("Host env initialization error: {0}")]
    Export(ExportError),
    /// No gas counter supplied
    #[error("No gas counter supplied")]
    NoGasCounter,
}

Comment on lines 40 to 63
(export "foo" (func $foo))
(export "bar" (func $bar))
(export "zoo" (func $zoo))
(func $foo
call 0
i32.const 42
call 1
call 0
i32.const 100
call 1
call 0
)
(func $bar
call 0
i32.const 100
call 1
)
(func $zoo
loop
i32.const 100
call 1
br 0
end
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be somewhat shorter

Suggested change
(export "foo" (func $foo))
(export "bar" (func $bar))
(export "zoo" (func $zoo))
(func $foo
call 0
i32.const 42
call 1
call 0
i32.const 100
call 1
call 0
)
(func $bar
call 0
i32.const 100
call 1
)
(func $zoo
loop
i32.const 100
call 1
br 0
end
)
(func (export "foo")
call 0
i32.const 42
call 1
call 0
i32.const 100
call 1
call 0
)
(func (export "bar")
call 0
i32.const 100
call 1
)
(func (export "zoo")
loop
i32.const 100
call 1
br 0
end
)

}
}
}
return None;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return None;
None

Comment on lines 392 to 393
// TODO: maybe speed up this lookup.
for import in self.module.imports.keys() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think we actually must do this. We only check the set of imports on instantiation, not on compilation. So, if one sends a contract with 10_000 imported functions, this will be quite slow.

Comment on lines 396 to 400
if intrinsic.name == *import_name && intrinsic.signature == *signature {
return Some(intrinsic.clone());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, is this logic correct at all? It seems that we replace any calls with matching signature? Ie, if there are two imports, foo and bar, with identical signatures, and we intrisify foo, in the codegen we will replace to both foo and bar.

lib/compiler-singlepass/src/codegen_x64.rs Outdated Show resolved Hide resolved
Comment on lines 23 to 24
#[macro_use]
extern crate memoffset;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's more idiomatic to not write extern crate at all, and just do a normal

use memoffset::offset_of;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's how they recommended it crate docs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it doesn't seem to work in suggested way.

Copy link
Contributor

@matklad matklad Nov 12, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the docs are just old, extern crate is no longer necessary since Rust 2018: https://doc.rust-lang.org/edition-guide/rust-2018/path-changes.html#no-more-extern-crate

The following diff works for me:

diff --git a/lib/compiler-singlepass/src/codegen_x64.rs b/lib/compiler-singlepass/src/codegen_x64.rs
index 41695196a..64500ca5a 100644
--- a/lib/compiler-singlepass/src/codegen_x64.rs
+++ b/lib/compiler-singlepass/src/codegen_x64.rs
@@ -2,6 +2,7 @@ use crate::address_map::get_function_address_map;
 use crate::config::{Intrinsic, IntrinsicKind};
 use crate::{common_decl::*, config::Singlepass, emitter_x64::*, machine::Machine, x64_decl::*};
 use dynasmrt::{x64::Assembler, DynamicLabel};
+use memoffset::offset_of;
 use smallvec::{smallvec, SmallVec};
 use std::collections::HashMap;
 use std::iter;
diff --git a/lib/compiler-singlepass/src/lib.rs b/lib/compiler-singlepass/src/lib.rs
index 2afb629db..202c0b9da 100644
--- a/lib/compiler-singlepass/src/lib.rs
+++ b/lib/compiler-singlepass/src/lib.rs
@@ -19,6 +19,3 @@ mod x64_decl;
 
 pub use crate::compiler::SinglepassCompiler;
 pub use crate::config::Singlepass;
-
-#[macro_use]
-extern crate memoffset;

As our measurements show that gas counting is significant part of execution cost,
we need to optimize it as much as possible. This PR provides support for shared
fast gas counter structure, second part is implemented in near/nearcore#5121

Fast gas counter has three fields: current counter, limit and op cost. When we see call to host
function called `gas` with single argument we do not actually emit such a call, but instead
emit inline instruction sequence to increment gas counter per requested amount and if limit is hit -
GasExceed trap happens.

For fast gas counter to work VM embedder shall provide gas counter context, and VM will trap on call
to `gas` function if such context wasn't provided.
olonho and others added 2 commits November 12, 2021 16:49
Co-authored-by: Aleksey Kladov <aleksey.kladov@gmail.com>
@olonho olonho merged commit 856c52f into near-main Nov 12, 2021
@olonho olonho deleted the intrinsic-gas branch November 12, 2021 13:52
Comment on lines +449 to +453
_ => self.assembler.emit_imul(
Size::S32,
count_location,
Location::GPR(count_reg),
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excuse me if I'm speaking nonsense, but this looks somewhat sketchy.

First, this is truncating one of the 64-bit operands (count_reg) to 32 bits and is using signed multiplication (i.e. implicitly reinterpreting a the truncated unsigned integer into a signed one.) Since the addition again uses the resulting count_reg as a 64-bit register (can't tell if dynasm somehow puts a 64-bit result of the 32bit*32bit multiplication into the same register or if it drops the upper half of the result), the distinction between imul and mul may matter, too.

In case the count_reg happened to have the now-sign-bit set as part of its original representation (are opcode costs between 231 and 232 plausible?), or, costs like 0x1_0000_0000, this code may severely undercount or overcount the actual amount of gas consumed too.

Some potential solutions

At least some of the concern would be resolved if we jo or jcd to overflow from here, but AFAICT this ideally wants a mul instruction with 64-bit operands as well.

Alternatively the opcode_cost could be made a i32, though not sure how useful negative costs are.

If we think this branch is not being hit at all, we could panic!() this branch.

Some test code?

Tests fail with the following diff applied
diff --git a/lib/compiler-singlepass/src/codegen_x64.rs b/lib/compiler-singlepass/src/codegen_x64.rs
index ddad47af1..403dd5f81 100644
--- a/lib/compiler-singlepass/src/codegen_x64.rs
+++ b/lib/compiler-singlepass/src/codegen_x64.rs
@@ -429,13 +429,6 @@ impl<'a> FuncGen<'a> {
                     ),
                     Location::GPR(base_reg),
                 );
-                let current_burnt_reg = self.machine.acquire_temp_gpr().unwrap();
-                // Read current gas counter.
-                self.assembler.emit_mov(
-                    Size::S64,
-                    Location::Memory(base_reg, counter_offset),
-                    Location::GPR(current_burnt_reg),
-                );
                 // Read opcode cost.
                 let count_reg = self.machine.acquire_temp_gpr().unwrap();
                 self.assembler.emit_mov(
@@ -443,15 +436,27 @@ impl<'a> FuncGen<'a> {
                     Location::Memory(base_reg, opcode_cost_offset),
                     Location::GPR(count_reg),
                 );
-                // Multiply instruction count by opcode cost.
-                match count_location {
-                    Location::Imm32(imm) => self.assembler.emit_imul_imm32_gpr64(imm, count_reg),
-                    _ => self.assembler.emit_imul(
-                        Size::S32,
-                        count_location,
-                        Location::GPR(count_reg),
-                    ),
-                }
+
+                let location_reg = self.machine.acquire_temp_gpr().unwrap();
+                self.assembler.emit_mov(
+                    Size::S64,
+                    count_location,
+                    Location::GPR(location_reg),
+                );
+                self.assembler.emit_imul(
+                    Size::S32,
+                    Location::GPR(location_reg),
+                    Location::GPR(count_reg),
+                );
+                self.machine.release_temp_gpr(location_reg);
+
+                // Read current gas counter.
+                let current_burnt_reg = self.machine.acquire_temp_gpr().unwrap();
+                self.assembler.emit_mov(
+                    Size::S64,
+                    Location::Memory(base_reg, counter_offset),
+                    Location::GPR(current_burnt_reg),
+                );
                 // Compute new cost.
                 self.assembler.emit_add(
                     Size::S64,
@@ -460,6 +465,7 @@ impl<'a> FuncGen<'a> {
                 );
                 self.assembler
                     .emit_jmp(Condition::Overflow, self.special_labels.integer_overflow);
+
                 // Compare with the limit.
                 self.assembler.emit_cmp(
                     Size::S64,
diff --git a/tests/compilers/fast_gas_metering.rs b/tests/compilers/fast_gas_metering.rs
index a4c013136..2764fb054 100644
--- a/tests/compilers/fast_gas_metering.rs
+++ b/tests/compilers/fast_gas_metering.rs
@@ -169,6 +169,42 @@ fn test_gas_intrinsic_regular() {
     assert_eq!(gas_counter.burnt(), 10_000_000_000_000_726);
 }
 
+#[test]
+fn test_gas_intrinsic_large_opcode_cost() {
+    let store = get_store();
+    let mut gas_counter = FastGasCounter::new(0x1_0000_0000_0000, 0x1_0000_0000);
+    let module = get_module(&store);
+    let instance = Instance::new_with_config(
+        &module,
+        unsafe { InstanceConfig::new_with_counter(ptr::addr_of_mut!(gas_counter)) },
+        &imports! {
+            "host" => {
+                "func" => Function::new(&store, FunctionType::new(vec![], vec![]), |_values| {
+                    Ok(vec![])
+                }),
+                "has" => Function::new(&store, FunctionType::new(vec![ValType::I32], vec![]), |_| {
+                    Ok(vec![])
+                }),
+                "gas" => Function::new(&store, FunctionType::new(vec![ValType::I32], vec![]), |_| {
+                    // It shall be never called, as call is intrinsified.
+                    assert!(false);
+                    Ok(vec![])
+                }),
+            },
+        },
+    );
+    assert!(instance.is_ok());
+    let instance = instance.unwrap();
+    dbg!(gas_counter.burnt());
+    let zoo_func = instance
+        .exports
+        .get_function("zoo")
+        .expect("expected function zoo");
+    dbg!(gas_counter.burnt());
+    let _e = zoo_func.call(&[]).err().expect("error calling function");
+    assert_eq!(gas_counter.burnt(), 0x100_0000_0000);
+}
+
 #[test]
 fn test_gas_intrinsic_default() {
     let store = get_store();

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to take looking into this onto my plate if you would like me to.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is relatively time critical (we want to have the corresponding code in nearcore this week to get it into the next release), and pretty important overall, so, yeah, it totally makes sense to look into this closely asap.

@olonho
Copy link
Contributor Author

olonho commented Nov 15, 2021

Practically speaking opcode cost is intended to be way below 2G, and presume the best fix would be assert that on instantiation.

@nagisa nagisa mentioned this pull request Nov 16, 2021
near-bulldozer bot pushed a commit to near/nearcore that referenced this pull request Nov 17, 2021
 To speed up execution and lower gas cost we need to implement optimised gas counting algorithm.
It involves intrinsified code generation for `gas()` external function invocations and mechanism to pass
an external gas counter pointer.

 In current form we refer to git branch in Wasmer repo matching near/wasmer#64, as we need to make certain change in Wasmer codebase.

 After this change we can reduce gas cost for single Wasm instruction to 312_544 gas.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants