Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build CDHASH properly when loading iseq from binary #4511

Merged
merged 2 commits into from
May 21, 2021

Conversation

XrXr
Copy link
Member

@XrXr XrXr commented May 19, 2021

Before this change, CDHASH operands were built as plain hashes when
loaded from binary. Without setting up the hash with the correct
st_table type, the hash can sometimes be an ar_table. When the hash is
an ar_table, lookups can call the eql? method on keys of the hash,
which makes the opt_case_dispatch instruction not "leaf" as it
implicitly declares.

The following script trips the stack canary for checking the leaf
attribute for opt_case_dispatch on VM_CHECK_MODE > 0 (enabled by
default with RUBY_DEBUG).

rb_vm_iseq = RubyVM::InstructionSequence

iseq = rb_vm_iseq.compile(<<-EOF)
  case Class.new(String).new("foo")
  when "foo"
    42
  end
EOF

puts rb_vm_iseq.load_from_binary(iseq.to_binary).eval

This commit changes the binary loading logic to build CDHASH with the
right st_table type. The dumping logic and the dump format stays the
same.

Before this change, CDHASH operands were built as plain hashes when
loaded from binary. Without setting up the hash with the correct
st_table type, the hash can sometimes be an ar_table. When the hash is
an ar_table, lookups can call the `eql?` method on keys of the hash,
which makes the `opt_case_dispatch` instruction not "leaf" as it
implicitly declares.

The following script trips the stack canary for checking the leaf
attribute for `opt_case_dispatch` on VM_CHECK_MODE > 0 (enabled by
default with RUBY_DEBUG).

    rb_vm_iseq = RubyVM::InstructionSequence

    iseq = rb_vm_iseq.compile(<<-EOF)
      case Class.new(String).new("foo")
      when "foo"
        42
      end
    EOF

    puts rb_vm_iseq.load_from_binary(iseq.to_binary).eval

This commit changes the binary loading logic to build CDHASH with the
right st_table type. The dumping logic and the dump format stays the
same.
@XrXr
Copy link
Member Author

XrXr commented May 19, 2021

It's possible to make the diff smaller at the cost of re-hashing twice by converting to the right st_table type after using the normal hash loading code. Let me know if you would prefer that.

@shyouhei shyouhei requested review from k0kubun and ko1 May 19, 2021 23:21
@k0kubun
Copy link
Member

k0kubun commented May 20, 2021

Thanks for maintaining leafness of opt_case_dispatch after loading it from binary.

The following script trips the stack canary for checking the leaf attribute for opt_case_dispatch on VM_CHECK_MODE > 0

I'm not sure how to test internal data structure matches directly, but a possible approach to mitigate the risk in other places could be to have a CI that always loads iseq from dumped binary and checks the canary ... if it's not too slow.

@XrXr
Copy link
Member Author

XrXr commented May 20, 2021

CI that always loads iseq from dumped binary and checks the canary

👍🏼 there is already http://ci.rvm.jp/results/trunk-iseq_binary@phosphorus-docker and I guess it just needs -DRUBY_DEBUG for the canary checks. GitHub Action has several RUBY_DEBUG runs already, so it looks like the stack canary doesn't slow things down that much.

compile.c Outdated
// Similar to ibf_load_object(), but specifically for CDHASH.
static VALUE
ibf_load_cdhash(const struct ibf_load *load, VALUE object_index)
{
Copy link
Contributor

@ko1 ko1 May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we re-use ibf_load_object() instead of this code?

  ibf_load_cdhash(){ hash = ibf_load_object(); RHASH_TBL_RAW(cdhash)->type = &cdhash_type; }

I didn't try this approach and it may be tried by you before.

Copy link
Member Author

@XrXr XrXr May 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried in c5c7287. Does a bit of extra copying, but smaller diff. If the extra copying is a problem, I guess another option would be to refactor ibf_load_object() to allow an alternative way of loading hashes.

@XrXr XrXr merged commit b2fc592 into ruby:master May 21, 2021
@XrXr XrXr deleted the iseq-binary-cdhash branch May 21, 2021 16:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants