Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change 'tag' to return Nothing, rather than throw an exception. #93

Merged
merged 1 commit into from Feb 3, 2017

Conversation

merijn
Copy link
Contributor

@merijn merijn commented Jan 19, 2017

This allows parsers to conditionally accept a tag, depending on it's attributes. I frequently find myself needing to decide whether to accept a tag based on it's attributes, which isn't possible with the current xml-conduit API, this simple change makes this operation trivial.

See also #60 (which already has a pull request, but I think that solution is needlessly complex).

This allows parsers to conditionally accept a tag, depending on it's
attributes.
@snoyberg
Copy link
Owner

I'm overall 👎. It's a semantically breaking API change without any change in the type of the function. Also, I haven't used this layer of the API in a while, but I think this will leave the stream in an inconsistent state.

@merijn
Copy link
Contributor Author

merijn commented Jan 24, 2017

I'd be more than happy to move this functionality into a different function, leaving tag unchanged and providing a separate one that supports this function. As for the stream state, I've stared at the xml-conduit code (and used it in my own stuff) and it seems to work 100% as expected.

@snoyberg
Copy link
Owner

You're right, I wasn't looking at enough of the code. The fact is that I'm not intimately familiar with these combinators anymore. It does seem like a breaking change to switch to Nothing instead of an exception, but maybe I'm wrong. I'm having trouble picturing a scenario where that would cause a user problems.

@merijn
Copy link
Contributor Author

merijn commented Jan 24, 2017

You are right that it is breaking in the sense that if I have a parser foo and bar where foo demands some attribute exists then foo `orE` bar currently results in a parse error in foo as a result of the exception in the attribute parser. In this branch a failure in the attribute parser of foo results in bar being tried.

As I said, if you oppose this breakage I'm okay with moving the behaviour into a different combinator and leaving 'tag' as-is, but the inability to "fail" and try a new parser based on the attributes of a tag is a real issue when scraping messier XML.

@snoyberg
Copy link
Owner

OK, for now let's do it in a new function. We can consider deprecating the old one in favor of the new one if that makes sense.

@merijn
Copy link
Contributor Author

merijn commented Jan 24, 2017

Since there's so many versions of the combinators using tag, I'm wondering if it's preferable to duplicate all of them in the same module or just provide another module that has the same names as Text.XML.Stream.Parse but with the changed behaviour for attribute parsers (at the very least that'd save me the difficulty of inventing new names for the new version...)

@snoyberg
Copy link
Owner

If we're already discussing a new module, it may make sense to simply move all of the combinators into a separate package which isn't tied to the API compatibility of this package. Then you'd be free to modify the behavior quickly without worrying about breakage, and when stable, we could deprecate the combinators in xml-conduit and point to the new package.

@merijn
Copy link
Contributor Author

merijn commented Jan 25, 2017

My main objection to that would be duplicating all the code currently in Text.XML.Stream.Parse with all the maintenance hassle it implies. My idea was more along the lines of generalise tag (and everything using it?) to an internal:

generalisedTag :: MonadThrow m
    => (SomeException -> ConduitM Event o m (Maybe c))
    -> (Name -> Maybe a)
    -> (a -> AttrParser b)
    -> (b -> ConduitM Event o m c)
    -> ConduitM Event o m (Maybe c)

where the first argument is the "failure" case, which means the current tag basically becomes generalisedTag (lift . monadThrow) and the version in this branch to generalisedTag (const (return Nothing))

So that both current behaviour and my proposed behaviour can be defined as a list of trivial specialisations (as above), which avoids duplicating the code and maintenance. The only real issue/question I had was whether it'd be better to export both versions from the the same module (i.e. Text.XML.Stream.Parse), if so that'd require inventing new names for the new versions that don't fail completely on attribute parse failure. Alternatively, we could just use the exact same names as the current combinators, but exported from a different module.

I don't feel duplicating all the other code in the module to a new package would be beneficial as then any changes/fixes in xml-conduit would have to be manually ported over.

@k0ral
Copy link
Collaborator

k0ral commented Jan 25, 2017

Sorry to chime in, but I'd like to suggest a different strategy:

  • release a new version A.B.C.(D+1) of xml-conduit with the old tag implementation only (i.e. no code change), and a deprecation marker on it to warn users about the upcoming backward-incompatible change
  • release a new version A.(B+1).C.0 of xml-conduit with the new tag implementation only
  • on Hackage, set the "Preferred versions" such that A.B.C.(D+1) is preferred over A.(B+1).C.0
  • after a reasonable time, remove the preferred version so that A.(B+1).C.0 becomes the new default

This way, the API remains clean (no renaming, no duplicated code/module/package), users are warned about the incompatible change, and the new behavior is opt-in.
The drawback is that further changes/improvements to xml-conduit will only be available to users that have crossed the "preferred" line, but given that the library doesn't evolve a lot, that's probably a minor issue.

What do you say ? :)

@snoyberg
Copy link
Owner

I wouldn't trust preferred versions for much to be honest.

Also to be honest: I think this package deserves to have an additional maintainer, as for the past four years my work has required significantly less XML work. You've contributed a lot @k0ral; would you be interested?

@k0ral
Copy link
Collaborator

k0ral commented Jan 26, 2017

@snoyberg Maybe I misunderstood, but I already enlisted as co-maintainer for xml-conduit.

@snoyberg
Copy link
Owner

snoyberg commented Jan 29, 2017 via email

@merijn
Copy link
Contributor Author

merijn commented Feb 2, 2017

Ok, so is there any preferred approach? I don't mind doing the work, but I'm a bit unclear what to actually do to see this get into xml-conduit :)

@snoyberg
Copy link
Owner

snoyberg commented Feb 3, 2017

If @k0ral wants to make the decisions here, I'm happy to defer.

@merijn
Copy link
Contributor Author

merijn commented Feb 3, 2017

Actually, @k0ral, I just realised some other combinators I would really like to see in xml-conduit (see #96), which also rely on a generalised version of tag. I can easily do whatever coding/documenting/example writing that is needed for both of these, as soon as I know what the best way to approach this is.

@k0ral
Copy link
Collaborator

k0ral commented Feb 3, 2017

@snoyberg No problem, I'm happy to help things move on and handle the consequences.
I'll go with the strategy I explained earlier, if only to experiment and validate whether it is viable.

@merijn I'll have a look at #96 and come back to you.

@k0ral k0ral merged commit 9d298ea into snoyberg:master Feb 3, 2017
This was referenced Feb 3, 2017
@eflister
Copy link

eflister commented Oct 23, 2020

This allows parsers to conditionally accept a tag, depending on it's attributes. I frequently find myself needing to decide whether to accept a tag based on it's attributes, which isn't possible with the current xml-conduit API, this simple change makes this operation trivial.

hey guys, can you still see this even tho this is long closed? i'm trying to figure out how to succeed/fail conditional on an attribute value. but so far i can't figure it out -- can you include an example in the docs? and one for requireAttrRaw? :)

i thought i had figured out the "trivial" method using join to merge my own Maybe layer with the one the supplied parsers use, but that resulted in some very strange behavior!
https://stackoverflow.com/questions/64492279/why-doesnt-runconduit-send-all-the-data

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants