Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NodeSet.dup does not deep copy #1678

Closed
dazza-codes opened this issue Oct 3, 2017 · 11 comments
Closed

NodeSet.dup does not deep copy #1678

dazza-codes opened this issue Oct 3, 2017 · 11 comments

Comments

@dazza-codes
Copy link

dazza-codes commented Oct 3, 2017

Take note of the Element object IDs in this console exploration of whether or not NodeSet.dup is actually returning new objects (as a deep copy implies). The Document.dup is doing a deep copy, but NodeSet.dup is not. (This console is using a class wrapping Nokogiri instance variables, but it should be obvious anyway; LMK if you need a "pure" Nokogiri" example.)

> record_setB.print
<?xml version='1.0'?>
<records>
  <REC>
    <UID>WOS:A2</UID>
  </REC>
  <REC>
    <UID>WOS:B2</UID>
  </REC>
</records>

> record_setB.doc
=> #(Document:0x2b17e94437d0 {
  name = "document",
  children = [
    #(Element:0x2b17ea12b1c8 {
      name = "records",
      children = [
        #(Element:0x2b17ea12f804 { name = "REC", children = [
            #(Element:0x2b17ea1333dc { name = "UID", children = [ #(Text "WOS:A2")] })] }),
        #(Element:0x2b17ea13b438 { name = "REC", children = [
            #(Element:0x2b17ea13e188 { name = "UID", children = [ #(Text "WOS:B2")] })] })]
      })]
  })

> record_setB.doc.dup
=> #(Document:0x2b17ea3a2708 {
  name = "document",
  children = [
    #(Element:0x2b17ea3a843c {
      name = "records",
      children = [
        #(Element:0x2b17ea3acba4 { name = "REC", children = [ 
            #(Element:0x2b17ea3b0c18 { name = "UID", children = [ #(Text "WOS:A2")] })] }),
        #(Element:0x2b17ea3b8b0c { name = "REC", children = [
            #(Element:0x2b17ea3c18c4 { name = "UID", children = [ #(Text "WOS:B2")] })] })]
      })]
  })

So far, so good. The Document.dup is returning all new object IDs thanks to a deep copy. Now let's see if the NodeSet.dup is doing a deep copy:

> rec_node_setB = record_setB.doc.search('REC')
=> [#<Nokogiri::XML::Element:0x2b17ea12f804 name="REC" 
  children=[#<Nokogiri::XML::Element:0x2b17ea1333dc name="UID" 
    children=[#<Nokogiri::XML::Text:0x2b17ea137324 "WOS:A2">]>]>,
 #<Nokogiri::XML::Element:0x2b17ea13b438 name="REC" 
  children=[#<Nokogiri::XML::Element:0x2b17ea13e188 name="UID" 
    children=[#<Nokogiri::XML::Text:0x2b17ea14297c "WOS:B2">]>]>]

> rec_node_setB.dup
=> [#<Nokogiri::XML::Element:0x2b17ea12f804 name="REC" 
  children=[#<Nokogiri::XML::Element:0x2b17ea1333dc name="UID" 
    children=[#<Nokogiri::XML::Text:0x2b17ea137324 "WOS:A2">]>]>,
 #<Nokogiri::XML::Element:0x2b17ea13b438 name="REC" 
  children=[#<Nokogiri::XML::Element:0x2b17ea13e188 name="UID" 
    children=[#<Nokogiri::XML::Text:0x2b17ea14297c "WOS:B2">]>]>]

Nope, it's got all the same object IDs.

@dazza-codes
Copy link
Author

dazza-codes commented Oct 3, 2017

Also NodeSet.deep_dup does not deep copy (cf. object IDS ^^):

> rec_node_setB.deep_dup
=> [#<Nokogiri::XML::Element:0x2b17ea12f804 name="REC" 
  children=[#<Nokogiri::XML::Element:0x2b17ea1333dc name="UID" 
    children=[#<Nokogiri::XML::Text:0x2b17ea137324 "WOS:A2">]>]>,
 #<Nokogiri::XML::Element:0x2b17ea13b438 name="REC" 
  children=[#<Nokogiri::XML::Element:0x2b17ea13e188 name="UID" 
    children=[#<Nokogiri::XML::Text:0x2b17ea14297c "WOS:B2">]>]>]

@dazza-codes
Copy link
Author

dazza-codes commented Oct 3, 2017

WRT documentation, it doesn’t say anything on NodeSet.dup behavior in this doc:

cf. http://www.rubydoc.info/github/sparklemotion/nokogiri/Nokogiri/XML/Document#dup-instance_method where it states the default behavior is a deep copy.

So, I suppose I’m asking for consistent behavior in the default assumptions for how this method works across Nokogiri objects. I suppose I'm also asking for clarification on how to use the NodeSet.dup method to preserve immutability of the original NodeSet object.

@ar7max
Copy link

ar7max commented Oct 3, 2017

I agree. Duplication behaviour is confusing me too.

@flavorjones
Copy link
Member

Can you help me understand why there is an assumption that NodeSet.dup would do a deep copy?

As a counterexample, Ruby Arrays don't do a deep copy:

$ irb
2.4.1 :001 > obj = Object.new
 => #<Object:0x00000000d52b40> 
2.4.1 :002 > arr = [obj]
 => [#<Object:0x00000000d52b40>] 
2.4.1 :003 > dup = arr.dup
 => [#<Object:0x00000000d52b40>] 

What I mean to say is, this is functioning as intended. I'd love to hear why you think it should do a deep copy, though.

@ar7max
Copy link

ar7max commented Feb 5, 2018

As for me it is all about black box "Document/Document Fragment" abstraction.

Now I need to dig into doc and copy all by hand, so Im using abstraction over ruby abstractions but I still need to use ruby abstractions for things to work.

It's kinda misleading for many people. Not to say I'm right or you are wrong, just my biased opinion.

@flavorjones
Copy link
Member

@ar7max I'm not sure I understand ... I'll ask a different way: why do you expect an Enumerable in this case (NodeSet) to behave differently from the canonical Ruby Enumerable (Array)?

@ar7max
Copy link

ar7max commented Feb 5, 2018

@flavorjones because word "node" comes first I think.

@flavorjones
Copy link
Member

I'm going to go ahead and close this, as I personally feel that NodeSet#dup is doing the right thing; and changing it at this point would be a breaking change to the API that we're not ready to make at this time.

I'd welcome any PRs about the docs that may clarify this issue for future users.

@opoudjis
Copy link
Contributor

Same behaviour appears to obtain for Node.dup.

It's very misleading for many people. Especially if it applies to a single object.

@flavorjones
Copy link
Member

@opoudjis I'm not sure I know how to respond, if you'd care to have a conversation please open a new issue?

@opoudjis
Copy link
Contributor

opoudjis commented May 2, 2018

I'll do a pull request on the documentation, to add a note saying that if you're going to strip elements from the node or nodeset clone, don't clone, create a new document based on the node instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants