Improved Atom.xsl stylesheet #420
base: 3.3
- Replaced the 'yomiko' instance URI with the eprints vocabulary URI for transforming eprints_status.
- Removed the 'ignore' whitelist for more robustness.
- Refactored the transformations for atom:title and atom:summary.
- Added transformations for dcterms:type and dcterms:subject.
See #419
How does #419 resolve the other issues that I've fixed with the stylesheet?
- Removed 'ignore' whitelist for more robustness.
- Refactored transformations for atom:title and atom:summary.
- Added transformations for dcterms:type and dcterms:subject.
The merged PR does not allow one to set custom subjects using the Atom
Publishing Protocol, and it still generates invalid XML whenever anyone
tries to do so.
*Semiodesk GmbH | *Werner-von-Siemens-Str. 6 Geb. 15k, 86159 Augsburg,
Germany | Phone: +49 821 8854401 | Fax: +49 821 8854410 | www.semiodesk.com
This e-mail message may contain confidential or legally privileged
information and is intended only for the use of the intended recipient(s).
Any unauthorized disclosure, dissemination, distribution, copying or the
taking of any action in reliance on the information herein is prohibited.
E-mails are not secure and cannot be guaranteed to be error free as they
can be intercepted, amended, or contain viruses. Anyone who communicates
with us by e-mail is deemed to have accepted these risks. Semiodesk GmbH is
not responsible for errors or omissions in this message and denies any
responsibility for any damage arising from the use of e-mail. Any opinion
and other statement contained in this message and any attachment are solely
those of the author and do not necessarily represent those of the company.
2017-11-23 15:51 GMT+01:00 Jiadi Yao <notifications@github.com>:
… Closed #420 <#420>.
Hi Sebastian, As you are suggesting changes to one of the core EPrints files, we are naturally nervous about accepting them unless we can fully validate the claims in your comments. How can we test the robustness that your changes bring (Removed 'ignore' whitelist for more robustness)? E.g. can we find a case where the existing code fails, while your code passes? I can review your PR again if you could assist us by providing test cases for your changes. Thank you very much!
Hi Jiadi,
thanks for looking into this. It took me a while to reproduce the issue
after all this time, but better late than never.. ;) Please go to:
http://www.utilities-online.info/xsltransformation/
Enter the following valid Atom Publishing Protocol XML, adapted from
section 9.2.1 of [0], which is used to create a deposit:
<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<title>Atom-Powered Robots Run Amok</title>
<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
<updated>2003-12-13T18:30:02Z</updated>
<author><name>John Doe</name></author>
<content>Some text.</content>
<summary>Summary</summary>
</entry>
Now enter the contents of the current Atom.xsl [1] file into the XSL
section of the page and hit 'Transform XML with XSL'. The output is the
following:
<?xml version="1.0" encoding="utf-8"?>
<eprints xmlns="http://eprints.org/ep2/data/2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:ept="http://eprints.org/ep2/xslt/1.0">
<eprint>
<creators>
<item>
<name>
<given>John</given>
<family>Doe</family>
</name>
<id/>
</item>
</creators>
<title>Atom-Powered Robots Run Amok</title>*urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a2003-12-13T18:30:02ZSome text.*<abstract>Summary</abstract></eprint>
</eprints>
Please note the text marked with asterisks (bold in the original mail)
after the title element. The unknown tags in the source XML (id, updated,
content) produce stray text inside the eprint element, which makes the
output invalid. This happens because the stylesheet ends with an 'ignore'
whitelist: the *known* elements listed there are suppressed and do not
appear in the output, whereas *unknown* elements are not suppressed, so
XSLT's built-in template rules copy their text content into the result as
plain text. Since you cannot know every XML element that a 3rd party
software can push into EPrints using APP (like dcterms:title,
dcterms:subject, etc.), the current stylesheet fails very easily with
valid input. Please compare the output with my version of the file [2]:
<?xml version="1.0" encoding="utf-8"?>
<eprints xmlns="http://eprints.org/ep2/data/2.0" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:ept="http://eprints.org/ep2/xslt/1.0">
<eprint>
<title>Atom-Powered Robots Run Amok</title>
<abstract>Summary</abstract>
<creators>
<item>
<name>
<given>John</given>
<family>Doe</family>
</name>
<id/>
</item>
</creators>
</eprint>
</eprints>
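The difference between the two outputs can be turned into a repeatable check by looking for stray text directly inside the eprint element. What follows is a minimal Python sketch using only the standard library. The helper name `stray_text` and the constants `CURRENT_OUTPUT` and `PROPOSED_OUTPUT` are made up for illustration, and the sample strings are abbreviated copies of the two outputs above (creators block omitted, asterisks removed). It is not part of EPrints or of either stylesheet.

```python
import xml.etree.ElementTree as ET

# Namespace of the EPrints data format, as it appears in the outputs above.
EP = "{http://eprints.org/ep2/data/2.0}"


def stray_text(eprints_xml):
    """Return non-whitespace text found directly inside each <eprint>."""
    root = ET.fromstring(eprints_xml)
    found = []
    for eprint in root.iter(EP + "eprint"):
        # Text before the first child, plus the text that follows each child.
        chunks = [eprint.text] + [child.tail for child in eprint]
        found += [c.strip() for c in chunks if c and c.strip()]
    return found


# Abbreviated copy of the output of the current Atom.xsl (assumption: trimmed).
CURRENT_OUTPUT = """<eprints xmlns="http://eprints.org/ep2/data/2.0">
<eprint>
<title>Atom-Powered Robots Run Amok</title>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a2003-12-13T18:30:02ZSome text.<abstract>Summary</abstract></eprint>
</eprints>"""

# Abbreviated copy of the output of the proposed Atom.xsl (assumption: trimmed).
PROPOSED_OUTPUT = """<eprints xmlns="http://eprints.org/ep2/data/2.0">
<eprint>
<title>Atom-Powered Robots Run Amok</title>
<abstract>Summary</abstract>
</eprint>
</eprints>"""

print(stray_text(CURRENT_OUTPUT))   # non-empty list: leaked text was found
print(stray_text(PROPOSED_OUTPUT))  # empty list: no leaked text
```

A check of this kind only detects leaked text. It does not validate the document against the EPrints schema.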
My version delivers the same fields, but this time as a valid XML document
which can be consumed by EPrints. This is because my version of the
stylesheet only contains transformations for *known* elements; all
*unknown* XML elements are simply ignored. So much for the claimed
'improved robustness'; I hope this makes things clear. As you can see, the
title and abstract are still there.
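The two strategies can also be modelled outside XSLT. The Python sketch below is only an illustration of the idea, not a port of either stylesheet: `KNOWN`, `transform`, and the contents of the `ignore` set are hypothetical, and the real ignore list in Atom.xsl may look different. With an ignore list, every element that is neither handled nor listed leaks its text, which mimics XSLT's built-in text rule. Without one, only known elements produce output.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# The Atom entry used in the example above.
ENTRY = """<?xml version="1.0"?>
<entry xmlns="http://www.w3.org/2005/Atom">
<title>Atom-Powered Robots Run Amok</title>
<id>urn:uuid:1225c695-cfb8-4ebb-aaaa-80da344efa6a</id>
<updated>2003-12-13T18:30:02Z</updated>
<author><name>John Doe</name></author>
<content>Some text.</content>
<summary>Summary</summary>
</entry>"""

# Hypothetical mapping from Atom elements to EPrints field names.
KNOWN = {
    ATOM + "title": "title",
    ATOM + "summary": "abstract",
    ATOM + "author": "creators",
}


def transform(xml_text, ignore=None):
    """Return (fields, leaked) for an Atom entry.

    ignore=None  -> only KNOWN elements produce output, the rest is dropped.
    ignore=set   -> elements in neither KNOWN nor `ignore` leak their text.
    """
    fields, leaked = {}, []
    for child in ET.fromstring(xml_text):
        text = "".join(child.itertext()).strip()
        if child.tag in KNOWN:
            fields[KNOWN[child.tag]] = text
        elif ignore is not None and child.tag not in ignore:
            leaked.append(text)
    return fields, leaked


# Ignore-list strategy with a made-up list that does not mention id/updated/content.
fields_a, leaked_a = transform(ENTRY, ignore={ATOM + "link"})
# Known-elements-only strategy.
fields_b, leaked_b = transform(ENTRY)

print(leaked_a)  # text of id, updated and content
print(leaked_b)  # nothing leaks
```

In this model both strategies yield the same fields for known elements. They differ only in what happens to elements nobody anticipated.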
All the best,
~Sebastian
[0] https://bitworking.org/projects/atom/rfc5023.html
[1] https://raw.githubusercontent.com/eprints/eprints/c5486c28e5a0135a1543cf99dcb1dc93c92c464f/perl_lib/EPrints/Plugin/Import/XSLT/Atom.xsl
[2] https://raw.githubusercontent.com/faubulous/eprints/07851ecf431e7b9e7acb655c5d2fab2c83f11094/perl_lib/EPrints/Plugin/Import/XSLT/Atom.xsl
2017-11-23 16:28 GMT+01:00 Jiadi Yao <notifications@github.com>:
… Hi Sebastian,
Thanks very much for your contribution to the EPrints community!
As you are suggesting changes to one of the core EPrints files, naturally
we are nervous to accept changes, unless we can fully validate your claims
in your comments.
For example, what benefit does your "Refactored transformations for
atom:title and atom:summary" offer? How can we be sure that the
refactoring has not broken certain edge cases?
How can we test the robustness that your changes bring (Removed 'ignore'
whitelist for more robustness)? E.g. can we find a case where the existing
code fails, while your code passes?
I can review your PR again if you could assist us by providing test cases
of your changes.
Thank you very much!
Jiadi