Support for anchors (A tags with internal link syntax) #77

Closed
wants to merge 392 commits into
from

@rorygibson

I've added support for hrefs starting with '#', so that links to internal anchors on a page are no longer stripped out by the Cleaner. I need this for my day job and thought it might be useful to you too.

The W3C syntax for anchors simply requires that they start with a '#' and contain no spaces.
I've added two tests to the CleanerTest class that document this behaviour.
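The rule stated above (starts with '#', no whitespace) is simple enough to express as a single predicate. The following is a minimal sketch of such a check, assuming "no spaces" is read as "no whitespace of any kind"; the class and method names are hypothetical and are not part of jsoup's Cleaner or Whitelist API.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not jsoup's API): decides whether an href value is an
 * in-page anchor under the rule described above, i.e. it starts with '#'
 * and contains no whitespace.
 */
public class AnchorCheck {
    // Matches any whitespace character anywhere in the value.
    private static final Pattern WHITESPACE = Pattern.compile("\\s");

    static boolean isValidAnchor(String href) {
        return href != null
                && href.startsWith("#")
                && !WHITESPACE.matcher(href).find();
    }

    public static void main(String[] args) {
        System.out.println(isValidAnchor("#section-2"));    // true
        System.out.println(isValidAnchor("#two words"));    // false: contains a space
        System.out.println(isValidAnchor("page.html#top")); // false: does not start with '#'
    }
}
```

A cleaner would apply a check like this to the href attribute before its usual protocol test, so that a value such as "#section-2" is kept even though it has no "http:" or similar scheme.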

If you could merge this in, I'd be grateful; we're currently using the forked jar, but it'd be nice to stay on trunk.

jhy added some commits Feb 3, 2010

Implemented Element#wrap and Elements#wrap
Also protected Node.replaceChild, removeChild, addChild.
Added val() and val(string) to Element and Elements.
Treat contents of textarea as text, not data.

Closes #14

kzn and others added some commits Jan 16, 2011

Added javadocs for Evaluators.
Updated tests.
Updated parser
Anton Kazennikov
Merge remote branch 'upstream/master'
Conflicts:
	src/main/java/org/jsoup/parser/TokenQueue.java
Reverted changes that only allow empty tags in pre-defined instances.
Markup like <tag /> needs to be parsed as an empty element.
Removed com.sun.xml.internal.ws.util.StringUtils to fix https://github.com/jhy/jsoup/issues/#issue/69

"jsoup/src/main/java/org/jsoup/select/selectors/AndSelector.java:[8,35] package com.sun.xml.internal.ws.util does not exist"
Ensure that Jsoup.Connect handles relative redirects in cases where the
underlying HTTP stack doesn't automatically follow them.

Fixes #73
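The commit above doesn't show the implementation, but the general technique for handling a relative redirect is to resolve the Location header value against the URL of the request that received the 3xx response. A minimal sketch using the standard java.net.URI.resolve method follows; the class name, method name and example URLs are hypothetical.

```java
import java.net.URI;

public class RedirectResolve {
    // Resolve a Location header value (relative or absolute) against the URI
    // of the request that received the redirect. An absolute Location is
    // returned unchanged; a relative one is resolved against requestUri.
    static URI resolveRedirect(URI requestUri, String location) {
        return requestUri.resolve(location.trim());
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/a/b/page.html");
        System.out.println(resolveRedirect(base, "/login"));              // http://example.com/login
        System.out.println(resolveRedirect(base, "next.html"));           // http://example.com/a/b/next.html
        System.out.println(resolveRedirect(base, "https://other.org/x")); // https://other.org/x
    }
}
```

Resolving against the request URL (rather than the site root) matters for path-relative values like "next.html", which must keep the "/a/b/" directory of the original request.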
Updated Jsoup.Connection so that cookies set on a redirect response will be included on the redirected request and response.
Moved .wrap, .before, and .after from Element to Node for flexibility. Overriding implementations in Element still return Element.
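"Overriding implementations in Element still return Element" relies on Java's covariant return types: a subclass override may declare a narrower return type than the base method. The sketch below shows the pattern with heavily simplified, hypothetical Node and Element classes; they are not jsoup's real classes and the wrap logic just records the call.

```java
/**
 * Hypothetical, simplified classes (not jsoup's Node/Element) illustrating a
 * base-class method overridden with a covariant return type, so chained calls
 * on an Element keep the Element type without casts.
 */
public class CovariantWrap {
    static class Node {
        final StringBuilder log = new StringBuilder();

        Node wrap(String html) {
            // Stand-in for the shared wrapping logic: just record the call.
            log.append("wrap:").append(html).append(';');
            return this;
        }
    }

    static class Element extends Node {
        final String tag;

        Element(String tag) { this.tag = tag; }

        @Override
        Element wrap(String html) {
            super.wrap(html); // reuse the shared Node implementation
            return this;      // but expose the narrower Element type
        }

        String tagName() { return tag; }
    }

    public static void main(String[] args) {
        Element p = new Element("p");
        // No cast needed: wrap() on an Element is declared to return Element.
        String name = p.wrap("<div></div>").tagName();
        System.out.println(name); // p
    }
}
```

The design keeps one implementation on Node (so text nodes and other node types can be wrapped too) while callers holding an Element still get a fluent, correctly typed API.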
Added ability to change an element's tag with Element.tagName(String), and to change many at once with Elements.tagName(String).
Fixed issue with selector parser where some boolean AND + OR combined queries (e.g. "meta[http-equiv], meta[content]") were incorrectly parsed as OR-only queries (e.g. the former as "meta, [http-equiv], meta[content]")

Fixed issue where a content-type specified in a meta tag might not be reliably detected, due to the above issue.
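The intended reading of "meta[http-equiv], meta[content]" is two groups, each an AND of a tag and an attribute test; only commas outside brackets, parentheses and quotes separate OR groups. The sketch below illustrates that grouping rule with a hypothetical splitter. It is not how jsoup's selector parser is implemented, just a compact way to show which commas should count.

```java
import java.util.ArrayList;
import java.util.List;

public class SelectorGroups {
    // Hypothetical helper: split a selector on commas that are not inside
    // [...] or (...) and not inside a single- or double-quoted string.
    static List<String> splitGroups(String query) {
        List<String> groups = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;  // nesting depth of [ ] and ( )
        char quote = 0; // active quote character, or 0 when not quoted
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0; // closing quote
            } else if (c == '"' || c == '\'') {
                quote = c;                 // opening quote
            } else if (c == '[' || c == '(') {
                depth++;
            } else if (c == ']' || c == ')') {
                depth--;
            } else if (c == ',' && depth == 0) {
                groups.add(current.toString().trim()); // top-level comma ends a group
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        groups.add(current.toString().trim());
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(splitGroups("meta[http-equiv], meta[content]"));
        // [meta[http-equiv], meta[content]]
        System.out.println(splitGroups("a[title='x, y'], p"));
        // [a[title='x, y'], p]
    }
}
```

Each resulting group ("meta[http-equiv]") still has to be parsed as a tag AND an attribute condition; splitting "meta" and "[http-equiv]" into separate groups is exactly the misparse described above.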
@analytically

analytically May 1, 2012

This should also support single quotes, e.g. Content-Type: text/html; charset='utf-8'

Site that has this: http://www.roundwoodpark.herts.sch.uk/
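Supporting the single-quoted form mostly comes down to allowing either quote character (or none) before the charset name when extracting it from a Content-Type value. A minimal sketch follows; the regular expression, class and method names are assumptions for illustration, not jsoup's actual detection code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetSniff {
    // Case-insensitive "charset=", optional whitespace, an optional single or
    // double quote, then the charset name up to a quote, whitespace, ',' or ';'.
    private static final Pattern CHARSET =
            Pattern.compile("(?i)\\bcharset=\\s*[\"']?([^\\s,;\"']+)");

    static String charsetFromContentType(String contentType) {
        if (contentType == null) return null;
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(charsetFromContentType("text/html; charset='utf-8'"));    // utf-8
        System.out.println(charsetFromContentType("text/html; charset=\"UTF-8\""));  // UTF-8
        System.out.println(charsetFromContentType("text/html; charset=iso-8859-1")); // iso-8859-1
        System.out.println(charsetFromContentType("text/html"));                     // null
    }
}
```

The returned name would still need validating (for example with java.nio.charset.Charset.isSupported) before use, since pages can declare charsets the JVM doesn't know.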

@ishults

ishults Jul 23, 2014

Contributor

I noticed this was never merged. Was the fix not wanted, or were there issues with the implementation? Would it be worth submitting a new pull request for this issue?

@jhy

jhy Oct 2, 2014

Owner

Merged with #441, thanks

jhy closed this Oct 2, 2014

zazi added a commit to dswarm/jsoup that referenced this pull request Oct 15, 2015