Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Modification of children is not reflected in fragment #2916

Closed
vcc-LG opened this issue Jun 28, 2023 · 3 comments · Fixed by #2918
Closed

[bug] Modification of children is not reflected in fragment #2916

vcc-LG opened this issue Jun 28, 2023 · 3 comments · Fixed by #2918
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.

Comments

@vcc-LG
Copy link

vcc-LG commented Jun 28, 2023

Please describe the bug

If I take a Nokogiri::HTML4::DocumentFragment with a child which has tags (e.g. <div>a</div>) and which has a neighbour which is text only, e.g.:

<div>a</div>b

then on modification of the inner_html of the first child (<div>a</div>) the object_id of its neighbour (b) will change, and therefore if I update the content of the neighbour it will not be reflected in the DocumentFragment. For example:

doc = Nokogiri::HTML.fragment('<div>a</div>b')
doc.children.each do |child|
  child.inner_html = child.inner_html.upcase
  child.content = child.content.upcase
end

pp doc.to_s

should output "<div>A</div>B" but instead outputs "<div>A</div>b"

Help us reproduce what you're seeing

#! /usr/bin/env ruby
# frozen_string_literal: true

require 'nokogiri'
require 'minitest/autorun'
require 'test/unit'
include Test::Unit::Assertions
require 'pry'
class NokogiriTest < Test::Unit::TestCase
  describe 'Nokogiri' do
    describe 'when document children all have tags' do
      let(:fragment) { Nokogiri::HTML.fragment('<div>a</div><div>b</div>') }

      it 'should not change child object_ids on inner_html modification of first child (<div>a</div>)' do
        fragment_child_object_ids = fragment.children.map(&:object_id)
        fragment.children[0].inner_html = fragment.children[0].inner_html.upcase

        assert_equal(fragment_child_object_ids, fragment.children.map(&:object_id))
      end

      it 'should not change child object_ids on inner_html modification of second child (<div>b</div>)' do
        fragment_child_object_ids = fragment.children.map(&:object_id)
        fragment.children[1].inner_html = fragment.children[1].inner_html.upcase

        assert_equal(fragment_child_object_ids, fragment.children.map(&:object_id))
      end

      it 'should update fragment on child inner html or content modification' do
        fragment.children.each do |child|
          child.inner_html = child.inner_html.upcase
          child.content = child.content.upcase
        end

        assert_equal('<div>A</div><div>B</div>', fragment.to_s)
      end
    end

    describe 'when document child text node has tagged sibling' do
      let(:fragment) { Nokogiri::HTML.fragment('a<div>b</div>') }

      it 'should not change child object_ids on inner_html modification of first child (a)' do
        fragment_child_object_ids = fragment.children.map(&:object_id)
        fragment.children[0].inner_html = fragment.children[0].inner_html.upcase

        assert_equal(fragment_child_object_ids, fragment.children.map(&:object_id))
      end

      it 'should not change child object_ids on inner_html modification of second child (<div>b</div>)' do
        fragment_child_object_ids = fragment.children.map(&:object_id)
        fragment.children[1].inner_html = fragment.children[1].inner_html.upcase

        assert_equal(fragment_child_object_ids, fragment.children.map(&:object_id))
      end

      it 'should not change child object ids on content modification of children' do
        fragment_child_object_ids = fragment.children.map(&:object_id)
        fragment.children.each do |child|
          child.content = child.content.upcase
        end

        assert_equal(fragment_child_object_ids, fragment.children.map(&:object_id))
      end

      it 'should update fragment on child inner html or content modification' do
        fragment.children.each do |child|
          child.inner_html = child.inner_html.upcase
          child.content = child.content.upcase
        end

        assert_equal('A<div>B</div>', fragment.to_s)
      end
    end

    describe 'when document child tagged node has text sibling' do
      let(:fragment) { Nokogiri::HTML.fragment('<div>a</div>b') }

      # FAILING TEST #
      it 'should not change child object_ids on inner_html modification of first child (<div>a</div>)' do
        fragment_child_object_ids = fragment.children.map(&:object_id)
        fragment.children[0].inner_html = fragment.children[0].inner_html.upcase

        assert_equal(fragment_child_object_ids, fragment.children.map(&:object_id))
      end

      it 'should not change child object_ids on inner_html modification of second child (b)' do
        fragment_child_object_ids = fragment.children.map(&:object_id)
        fragment.children[1].inner_html = fragment.children[1].inner_html.upcase

        assert_equal(fragment_child_object_ids, fragment.children.map(&:object_id))
      end

      it 'should not change child object ids on content modification of children' do
        fragment_child_object_ids = fragment.children.map(&:object_id)
        fragment.children.each do |child|
          child.content = child.content.upcase
        end

        assert_equal(fragment_child_object_ids, fragment.children.map(&:object_id))
      end

      # FAILING TEST #
      it 'should update fragment on child inner html or content modification' do
        fragment.children.each do |child|
          child.inner_html = child.inner_html.upcase
          child.content = child.content.upcase
        end

        assert_equal('<div>A</div>B', fragment.to_s)
      end
    end
  end
end

Which outputs:

.F

Failure:
Nokogiri::when document child tagged node has text sibling#test_0004_should update fragment on child inner html or content modification:
Expected: "<div>A</div>B"
  Actual: "<div>A</div>b"

.F

Failure:
Nokogiri::when document child tagged node has text sibling#test_0001_should not change child object_ids on inner_html modification of first child (<div>a</div>):
Expected: [1640, 1660]
  Actual: [1640, 1680]
.......

Finished in 0.003051s, 3605.3753 runs/s, 3605.3753 assertions/s.
11 runs, 11 assertions, 2 failures, 0 errors, 0 skips

Environment

# Nokogiri (1.15.2)
    ---
    warnings: []
    nokogiri:
      version: 1.15.2
      cppflags:
      - "-I/Users/leo.garcia/.rbenv/versions/3.1.3/lib/ruby/gems/3.1.0/gems/nokogiri-1.15.2-arm64-darwin/ext/nokogiri"
      - "-I/Users/leo.garcia/.rbenv/versions/3.1.3/lib/ruby/gems/3.1.0/gems/nokogiri-1.15.2-arm64-darwin/ext/nokogiri/include"
      - "-I/Users/leo.garcia/.rbenv/versions/3.1.3/lib/ruby/gems/3.1.0/gems/nokogiri-1.15.2-arm64-darwin/ext/nokogiri/include/libxml2"
      ldflags: []
    ruby:
      version: 3.1.3
      platform: arm64-darwin22
      gem_platform: arm64-darwin-22
      description: ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [arm64-darwin22]
      engine: ruby
    libxml:
      source: packaged
      precompiled: true
      patches:
      - 0001-Remove-script-macro-support.patch
      - 0002-Update-entities-to-remove-handling-of-ssi.patch
      - 0003-libxml2.la-is-in-top_builddir.patch
      - '0009-allow-wildcard-namespaces.patch'
      - 0010-update-config.guess-and-config.sub-for-libxml2.patch
      - 0011-rip-out-libxml2-s-libc_single_threaded-support.patch
      libxml2_path: "/Users/leo.garcia/.rbenv/versions/3.1.3/lib/ruby/gems/3.1.0/gems/nokogiri-1.15.2-arm64-darwin/ext/nokogiri"
      memory_management: ruby
      iconv_enabled: true
      compiled: 2.11.4
      loaded: 2.11.4
    libxslt:
      source: packaged
      precompiled: true
      patches:
      - 0001-update-config.guess-and-config.sub-for-libxslt.patch
      datetime_enabled: true
      compiled: 1.1.38
      loaded: 1.1.38
    other_libraries:
      zlib: 1.2.13
      libiconv: '1.17'
      libgumbo: 1.0.0-nokogiri
@vcc-LG vcc-LG added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Jun 28, 2023
@flavorjones
Copy link
Member

@vcc-LG Thanks for the thorough bug report. I'll take a look!

flavorjones added a commit that referenced this issue Jul 3, 2023
@flavorjones flavorjones added topic/memory Segfaults, memory leaks, valgrind testing, etc. and removed state/needs-triage Inbox for non-installation-related bug reports or help requests labels Jul 3, 2023
@flavorjones
Copy link
Member

flavorjones commented Jul 3, 2023

This was an interesting bug! For historical context, see #283 and #595. This is fixed by #2918.

flavorjones added a commit that referenced this issue Jul 3, 2023
**What problem is this PR intended to solve?**

Closes #2916

Related to #283 and #595

**Have you included adequate test coverage?**

Yes

**Does this change affect the behavior of either the C or the Java
implementations?**

Brings C in line with Java behavior.
@vcc-LG
Copy link
Author

vcc-LG commented Jul 4, 2023

Oh wow thank you so much for getting a fix out so quickly!

flavorjones added a commit that referenced this issue Jul 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic/memory Segfaults, memory leaks, valgrind testing, etc.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants