Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Elements to update the backing DOM on set(), remove(), et al #2017

Merged
merged 2 commits into from
Oct 24, 2023

Conversation

jhy
Copy link
Owner

@jhy jhy commented Oct 24, 2023

Improvement: in the Elements list, added direct support for:

  • #set(index, element),
  • #remove(index),
  • #remove(object),
  • #clear(),
  • #removeAll(collection),
  • #retainAll(collection),
  • #removeIf(filter),
  • #replaceAll(operator).

These methods update the original DOM, as well as the Elements list.

@jhy jhy self-assigned this Oct 24, 2023
@jhy jhy added this to the 1.17.1 milestone Oct 24, 2023
@jhy jhy merged commit 61ac59b into master Oct 24, 2023
12 checks passed
@jhy jhy deleted the elements branch October 24, 2023 04:51
@jhy jhy removed the improvement label Oct 24, 2023
@jhy jhy removed their assignment Oct 24, 2023
@jhy jhy removed this from the 1.17.1 milestone Oct 24, 2023
@jhy jhy linked an issue Oct 24, 2023 that may be closed by this pull request
@jhy jhy added this to the 1.17.1 milestone Oct 24, 2023
@jhy jhy self-assigned this Oct 24, 2023
jhy added a commit that referenced this pull request Oct 24, 2023
@Azahe
Copy link

Azahe commented Dec 7, 2023

Hey - sorry if this is not the right place for such comments - this change kinda caught us by surprise by braking our code, for example we were already using removeIf relaying on Elements resulting from select being a copy rather than a direct reference to a DOM - and honestly in our use-case we never want to modify the original DOM - is there maybe a way to disable this mutability in jsoup Elements or maybe some kind of approach that we can take rather than making defensive copies everywhere we select something? Its kinda difficult to make sure we fixed everything place by place as we have a lot of html/xml parsers code.

@jhy
Copy link
Owner Author

jhy commented Dec 14, 2023

Hi @Azahe, sorry to hear the change impacted your code. The intention of Elements has been for the list and its contents to remain attached to the DOM. The remove / set methods not being included was a gap. I can't see a clean way for a global setting that would preserve the previous release's behaviour -- with some methods connected and others not.

Could you explain a bit more about your use case about what / how you are wanting to remove items?

Perhaps we could have new methods like "deselect" or some other name that drops the element from the list (but not from the DOM).

@Azahe
Copy link

Azahe commented Dec 15, 2023

Hey, so we are using jsoup for parsing html/xml - essentially these are banking pdf statements (converted to xml) or just bank websites (just in case: its all legal, we are a licensed entity, we use official APIs where possible but some banks just prefer to pay fines to EU than implement official APIs).

Now our use-case is to simply get data from these sources and represent it in some other format - so we have absolutely no need for modifying the DOM. If anything, modification of DOM only introduces risk of mistakes for us. I am not sure about the broader audience of Jsoup, but I imagined that was somewhat of a main usecase - if I wanted to generate html/xml I would probably use some serialization thingy or template language (like thymeleaf). I guess that if one wants to edit html - mutability of DOM would be useful, but, its hard to imagine for me a realistic use-case for this where I would like to use Jsoup (the only thing that comes to my mind would be writing browser tests? or generally testing something that renders html).

So it might seem a little rude proposition but for us simply dropping this feature would be the best 🙃 - but obviously I am missing the intention behind introducing this in the first place.

Having some kind of separate api for mutable DOM would also solve the problem - if there was a way to do something like Jsoup.parseMutable(...) while parsing the DOM or really any way to opt-in rather than have mutability by default, like having Elements#mutateDom() or anything in this direction.
Having separate api for immutability - so, like Jsoup.parseImmutable(...) would also probably make the migration way easier (we have only a few places parsing the DOM that are easy to find).

@jhy
Copy link
Owner Author

jhy commented Dec 22, 2023

Thanks. Mutating HTML is a very common use case for jsoup and I don't intend on changing these mutator methods.

Can you describe how you're selecting elements and what logic is in your removeIf? Would using a :not(xxx) selector, or the Elements.not(selector) be appropriate?

If not -- perhaps a new notIf that acts like the not method and returns a new list without the elements.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

replaceAll on Elements does not work
2 participants