# New rustls/tokio Library

https://github.com/denoland/rustls-tokio-stream/

 - Replaces `tokio-rustls` + our custom code to split read/write halves
 - Old code is stable, but difficult to modify

# New rustls/tokio Library

 - Designed in layers:
   * A tokio task that takes a TLS connection and drives a handshake in the background
   * A stream for a TLS connection that has completed its handshake
   * A stream for a TLS connection that buffers writes and pauses reads until the handshake is complete
 - More robust: extensive testing at each layer, written in Rust
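
The third layer can be pictured with a small, synchronous sketch. It is not the `rustls-tokio-stream` code: `BufferedStream`, `State`, and `handshake_complete` are hypothetical names, the `wire` vector stands in for the encrypted transport, and the real library is async and driven by a tokio task.

```rust
use std::collections::VecDeque;

/// Hypothetical connection state (illustration only).
#[derive(Debug, PartialEq)]
pub enum State {
    Handshaking,
    Open,
}

/// Toy stream that queues writes until the handshake has finished.
pub struct BufferedStream {
    state: State,
    /// Writes accepted before the handshake completed.
    pending: VecDeque<Vec<u8>>,
    /// Stand-in for the encrypted transport.
    wire: Vec<u8>,
}

impl BufferedStream {
    pub fn new() -> Self {
        Self {
            state: State::Handshaking,
            pending: VecDeque::new(),
            wire: Vec::new(),
        }
    }

    /// Accepts a write immediately; it reaches the wire only once the stream is open.
    pub fn write(&mut self, data: &[u8]) {
        match self.state {
            State::Handshaking => self.pending.push_back(data.to_vec()),
            State::Open => self.wire.extend_from_slice(data),
        }
    }

    /// Reads stay paused until the handshake has finished.
    pub fn read_ready(&self) -> bool {
        self.state == State::Open
    }

    /// Called by the (here imaginary) background handshake task on success:
    /// opens the stream and flushes queued writes in their original order.
    pub fn handshake_complete(&mut self) {
        self.state = State::Open;
        while let Some(chunk) = self.pending.pop_front() {
            self.wire.extend_from_slice(&chunk);
        }
    }

    /// Bytes that have reached the transport so far.
    pub fn wire(&self) -> &[u8] {
        &self.wire
    }
}
```

Callers can start writing before the handshake finishes, and the ordering of early writes is preserved when they are flushed.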

# New rustls/tokio Library

 - Current focus is on reliability; follow-up work will be on performance

# Fast Streams

 - Slowly working on replacing all resource read/write operations with `deno_core` code
 - Will allow for "big bang" optimizations once we have fewer implementations
 - Big project, will take some time
 - Major output from first round: `resourceForReadableStream`
   * Optimized resource layer over a `ReadableStream`
   * Supports backpressure and packet aggregation
   * Replaces custom code in `Deno.serve`
   * Will shortly replace code in `fetch` and `node:http`
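
To make "backpressure and packet aggregation" concrete, here is a minimal synchronous sketch. It is an illustration under assumptions, not the `deno_core` implementation: `AggregatingBuffer`, `try_push`, and the high-water-mark policy are hypothetical, and the real resource layer is async.

```rust
/// Hypothetical bounded buffer sitting between a stream producer and a
/// resource reader (illustration only).
pub struct AggregatingBuffer {
    buf: Vec<u8>,
    high_water_mark: usize,
}

impl AggregatingBuffer {
    pub fn new(high_water_mark: usize) -> Self {
        Self {
            buf: Vec::new(),
            high_water_mark,
        }
    }

    /// Producer side. Returns `false` without accepting the chunk when the
    /// buffer is already at or above the high-water mark; that refusal is the
    /// backpressure signal telling the producer to wait.
    pub fn try_push(&mut self, chunk: &[u8]) -> bool {
        if self.buf.len() >= self.high_water_mark {
            return false;
        }
        self.buf.extend_from_slice(chunk);
        true
    }

    /// Consumer side. Returns up to `limit` bytes as one aggregated read, no
    /// matter how many small chunks were pushed.
    pub fn read(&mut self, limit: usize) -> Vec<u8> {
        let n = limit.min(self.buf.len());
        self.buf.drain(..n).collect()
    }
}
```

Pushing `abc`, `def`, and `gh` into a buffer with a high-water mark of 8 fills it; a fourth push is refused until a read drains the buffer, and that single read returns all eight bytes at once.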

```typescript
// Run the deno_core op benchmarks and capture their output.
console.log("Running benchmark...");
let process = Deno.run({
    cmd: ["cargo", "bench", "--bench", "ops_sync", "--features=unsafe_runtime_options"],
    cwd: "../deno_core/",
    stdout: "piped",
    stderr: "piped",
});
await process.status();
let benchOut = new TextDecoder().decode(await Deno.readAll(process.stdout));
console.log("Finished benchmark...");
console.log(benchOut);
```

Running benchmark...
Finished benchmark...

running 34 tests
test baseline                               ... bench:         696 ns/iter (+/- 35)
test bench_op_arraybuffer                   ... bench:       5,941 ns/iter (+/- 167)
test bench_op_bigint                        ... bench:       3,249 ns/iter (+/- 45)
test bench_op_bigint_return                 ... bench:       3,148 ns/iter (+/- 88)
test bench_op_buffer                        ... bench:       4,428 ns/iter (+/- 90)
test bench_op_buffer_nofast                 ... bench:      42,518 ns/iter (+/- 1,394)
test bench_op_buffer_old                    ... bench:       3,096 ns/iter (+/- 88)
test bench_op_external                      ... bench:       3,451 ns/iter (+/- 99)
test bench_op_external_nofast               ... bench:      11,750 ns/iter (+/- 238)
test bench_op_option_u32                    ... bench:       8,588 ns/iter (+/- 193)
test bench_op_string                        ... bench:      11,347 ns/iter (+/- 321)
test ben

```typescript
import pl from "npm:nodejs-polars";

// Parse `cargo bench` lines of the form:
//   test <name> ... bench: <time> ns/iter (+/- <deviation>)
let names = [], times = [];
for (let line of benchOut.split('\n')) {
    if (line.startsWith('test ') && line.includes('bench:')) {
        let [nameBits, timeBits] = line.split('bench:');
        let [_, name] = nameBits.trim().split(" ");
        let [timeComma] = timeBits.trim().split(" ");
        // Strip thousands separators and store the time as a number (ns/iter).
        let time = Number(timeComma.replace(/,/g, ''));
        names.push(name);
        times.push(time);
    }
}

let df = new pl.DataFrame({
    name: names,
    time: times,
});

//df.filter(pl.col("name").str.contains('bench_op_string_').and(pl.col("name").str.contains('_1000000')));
// Map benchmark name -> time for the string benchmarks only.
let r = df.toRecords()
    .filter((row) => row.name.includes('op_string'))
    .reduce((input, row) => { input[row.name] = row.time; return input }, {});

// [label, old benchmark, new benchmark]
let comparisons = [
    ["small", "bench_op_string_old", "bench_op_string"],
    ["1,000", "bench_op_string_old_large_1000", "bench_op_string_large_1000"],
    ["1,000,000", "bench_op_string_old_large_1000000", "bench_op_string_large_1000000"],
    ["1,000 utf8", "bench_op_string_old_large_utf8_1000", "bench_op_string_large_utf8_1000"],
    ["1,000,000 utf8", "bench_op_string_old_large_utf8_1000000", "bench_op_string_large_utf8_1000000"],
    ["ByteString", "bench_op_string_bytestring", "bench_op_string_onebyte"],
];

let dfrec = { name: [], old: [], new: [], speedup: [] };

for (let row of comparisons) {
    dfrec.name.push(row[0]);
    dfrec.old.push(r[row[1]]);
    dfrec.new.push(r[row[2]]);
    // Relative time reduction: positive means the new op is faster.
    dfrec.speedup.push( ((r[row[1]] - r[row[2]]) / r[row[1]] * 100).toFixed(2) + "%" );
}

const dfString = new pl.DataFrame(dfrec);
dfString
```

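| name | old (ns/iter) | new (ns/iter) | speedup |
|---|---|---|---|
| small | 12906 | 11347 | 12.08% |
| 1000 | 137081 | 106884 | 22.03% |
| 1000000 | 322584 | 401908 | -24.59% |
| 1,000 utf8 | 2940762 | 2116437 | 28.03% |
| 1,000,000 utf8 | 14910987 | 11264374 | 24.46% |
| ByteString | 59943 | 4802 | 91.99% |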
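
The `speedup` column is the fraction of time saved, `(old - new) / old`, so a negative value means the new op is slower. The same numbers can also be read as an `old / new` ratio. A small sketch of both readings; the function names are made up for illustration:

```rust
/// Percentage of time saved relative to the old implementation.
/// Negative when the new implementation is slower.
pub fn time_reduction_percent(old_ns: f64, new_ns: f64) -> f64 {
    (old_ns - new_ns) / old_ns * 100.0
}

/// How many times faster the new implementation is (below 1.0 means slower).
pub fn speedup_ratio(old_ns: f64, new_ns: f64) -> f64 {
    old_ns / new_ns
}
```

For the ByteString row, `time_reduction_percent(59943.0, 4802.0)` is about 91.99 and `speedup_ratio(59943.0, 4802.0)` is about 12.5, so the new op is roughly 12x faster on that benchmark.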


# `#[op2]`

# A brief history of ops
### (incomplete)

 * Note: a high-level reconstruction that skips or misses some details

- early ops: JSON and binary buffers sent from JS to Rust
  * _lots_ of serialization overhead

# A brief history of ops (incomplete)

- `serde_v8` + codegen via traits
  * No more JSON overhead
  * ops dispatched via central table

- `#[op]`: proc macros
  * one function per op
  * ops can now take an arbitrary number of parameters
  * still using `serde_v8` + codegen

# A brief history of ops (incomplete)

- fastcalls and custom per-type dispatch
  * Skip serde_v8 for some basic types
  * Allow V8 to call Rust directly from JIT-compiled code

# `#[op2]`

Evolution from `#[op]`
 
<table><tr style="background-color: white"><td style="text-align:left !important;">

Before:

```rust
 #[op]
 pub fn op_do_something(...) {
   do_something()
 }
```
 
</td><td style="text-align:left !important;">

After:
    
```rust
 #[op2]
 pub fn op_do_something(...) {
   do_something()
 }
```
    
</td></tr></table>

# `#[op2]`

 - Maintainability:
   - Parsing and codegen split into distinct steps
   - Fast-path and slow-path codegen are separate, so each can evolve as we have bandwidth
   - Codegen for each input/output type is separate
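
A rough picture of that structure, as a sketch only. The real macro works on Rust token streams, not strings, and every name here (`ArgKind`, `parse_attr`, `gen_slow`, `gen_fast`, and the generated snippets) is hypothetical. The set of types with a fast path is also made up to show the two generators evolving independently; it does not reflect what `#[op2]` actually supports.

```rust
/// Step 1 output: a parsed description of one argument (hypothetical IR).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ArgKind {
    Smi,
    Str,
    Buffer,
}

/// Step 1: parsing. Turns an attribute name into the IR, or reports an error.
/// No code is generated here.
pub fn parse_attr(attr: &str) -> Result<ArgKind, String> {
    match attr {
        "smi" => Ok(ArgKind::Smi),
        "string" => Ok(ArgKind::Str),
        "buffer" => Ok(ArgKind::Buffer),
        other => Err(format!("unsupported attribute: {}", other)),
    }
}

/// Step 2a: slow-path codegen, one arm per argument type.
pub fn gen_slow(kind: ArgKind, index: usize) -> String {
    match kind {
        ArgKind::Smi => format!("let arg{} = slow_to_u32(args.get({}));", index, index),
        ArgKind::Str => format!("let arg{} = slow_to_string(args.get({}));", index, index),
        ArgKind::Buffer => format!("let arg{} = slow_to_buffer(args.get({}));", index, index),
    }
}

/// Step 2b: fast-path codegen. Returns `None` for types that have no fast
/// path yet (an arbitrary choice in this sketch), without touching the slow path.
pub fn gen_fast(kind: ArgKind, index: usize) -> Option<String> {
    match kind {
        ArgKind::Smi => Some(format!("let arg{} = fast_arg{} as u32;", index, index)),
        ArgKind::Str | ArgKind::Buffer => None,
    }
}
```

Adding a fast path for another type in this layout means changing one arm of `gen_fast`; the parser and the slow path stay as they are.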

# `#[op2]`

 - Designed for performance:
   - Locks removed for almost all sync ops (unless they touch the state)
   - Context and other objects only created as necessary
   - Metrics are pluggable and have near-zero cost when disabled
   - Removed allocations for most strings
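
One common way to get pluggable metrics with near-zero cost when disabled is static dispatch over a no-op implementation. The slides do not say how `deno_core` does it, so treat this as a sketch of the general technique; `Metrics`, `NoMetrics`, `CountingMetrics`, and `Dispatcher` are hypothetical names.

```rust
/// Hypothetical metrics hook invoked on every op dispatch.
pub trait Metrics {
    fn on_dispatch(&mut self, op_name: &str);
}

/// Disabled metrics: an empty method the compiler can inline away.
pub struct NoMetrics;

impl Metrics for NoMetrics {
    #[inline(always)]
    fn on_dispatch(&mut self, _op_name: &str) {}
}

/// A pluggable implementation that counts dispatched ops.
#[derive(Default)]
pub struct CountingMetrics {
    pub calls: usize,
}

impl Metrics for CountingMetrics {
    fn on_dispatch(&mut self, _op_name: &str) {
        self.calls += 1;
    }
}

/// Toy dispatcher, generic over the metrics implementation so the choice is
/// resolved at compile time.
pub struct Dispatcher<M: Metrics> {
    pub metrics: M,
}

impl<M: Metrics> Dispatcher<M> {
    pub fn new(metrics: M) -> Self {
        Self { metrics }
    }

    /// Stand-in for an op: reports to the metrics hook, then does the work.
    pub fn op_add(&mut self, a: u32, b: u32) -> u32 {
        self.metrics.on_dispatch("op_add");
        a + b
    }
}
```

`Dispatcher::new(NoMetrics)` and `Dispatcher::new(CountingMetrics::default())` run the same op code; only the second pays for bookkeeping.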

```typescript
dfString
```

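| name | old (ns/iter) | new (ns/iter) | speedup |
|---|---|---|---|
| small | 12906 | 11347 | 12.08% |
| 1000 | 137081 | 106884 | 22.03% |
| 1000000 | 322584 | 401908 | -24.59% |
| 1,000 utf8 | 2940762 | 2116437 | 28.03% |
| 1,000,000 utf8 | 14910987 | 11264374 | 24.46% |
| ByteString | 59943 | 4802 | 91.99% |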


# `#[op2]`

 - _Explicit_ over _implicit_, clarity for developers
 - Annotations indicate where the developer should pay attention to an argument's type because of performance or other concerns
 
```rust
#[op2]
pub fn op_something(
    #[smi] id: u32,
    #[string] name: &str,
    #[buffer(copy)] buffer_in: JsBuffer,
    #[buffer] buffer_out: &mut [u8],
    #[serde] control: ComplexStruct) {
}
```

# `#[op2]`

 - _Explicit_ over _implicit_, clarity for developers
 - Fast is now very explicit: `#[op2(fast)]` is self-checking
 - Shortcuts for common patterns: `#[state]`, `v8::Global`
 
 (insert error here)
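
One way to read "self-checking" is that the macro rejects any mismatch between the `fast` marker and what the signature allows, in both directions. That reading is an assumption, and the sketch below is not the macro's actual logic or its real error text: `Param`, `check_fast`, the messages, and the choice of `Serde` as the only parameter kind without a fast path are all hypothetical.

```rust
/// Hypothetical parameter kinds (illustration only).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Param {
    Smi,
    Buffer,
    Serde,
}

/// Assumption for this sketch: only `Serde` parameters lack a fast path.
fn has_fast_path(param: Param) -> bool {
    !matches!(param, Param::Serde)
}

/// Rejects ops whose `fast` marker disagrees with their signature.
pub fn check_fast(marked_fast: bool, params: &[Param]) -> Result<(), String> {
    let can_be_fast = params.iter().all(|p| has_fast_path(*p));
    match (marked_fast, can_be_fast) {
        (true, false) => {
            Err("op is marked fast but has a parameter with no fast path".to_string())
        }
        (false, true) => Err("op could be fast but is not marked fast".to_string()),
        _ => Ok(()),
    }
}
```

Under this reading, the marker on an op always matches the code that was generated, so a reader never has to guess whether an op takes the fast path.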
 


# `#[op2]`

Self-documenting: https://docs.rs/deno_ops/latest/deno_ops/attr.op2.html


# `#[op2]`

 - Future plans:
   * Final benchmark of op vs op2, ensure op2 is as fast or faster
   * More helpers: `#[resource]`, `ScopeFunction`