Extra characters in the abstract when exported in bibtex and RIS formats #95

Closed
jmnobrega opened this issue Jun 13, 2022 · 4 comments

jmnobrega commented Jun 13, 2022

When exporting to BibTeX or RIS format, the abstract includes <p> and </p> (in the form of &lt;p&gt; and &lt;/p&gt;) at the beginning and end. Those characters are not part of the abstract that was entered for the publication.

bozana (Collaborator) commented Jun 26, 2024

I investigated a little and here is what I found:

The library we use seems to assume that citations are displayed on an HTML page. It returns each citation wrapped within two HTML elements, and it escapes HTML special characters, for example in the title and abstract.

In addition, our abstract itself contains HTML elements.

However, unlike the other formats, BibTeX and RIS are not displayed on the page; we provide them for download. We do remove the wrapper elements returned by the library, but the HTML elements inside our abstract remain.

That means that, in the downloaded file, a title would look like this:
Title &amp; Test
and an abstract would look like this:
&amp;lt;p&amp;gt;The antimicrobial, heavy metal resistance patterns and ... (&amp;amp;gt;56.4 kb) encoding .... &amp;lt;/p&amp;gt;.
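As a rough illustration of how such output arises, here is a minimal Python sketch. It is not the plugin's code: `html.escape` merely stands in for whatever escaping the citation library applies, and the title and abstract values are hypothetical.

```python
import html

# Hypothetical stored values; the abstract carries its own HTML markup.
title = "Title & Test"
abstract = "<p>Plasmids (>56.4 kb) were found.</p>"

# html.escape is an assumed stand-in for the escaping done by the library.
escaped_title = html.escape(title, quote=False)
escaped_abstract = html.escape(abstract, quote=False)

print(escaped_title)     # Title &amp; Test
print(escaped_abstract)  # &lt;p&gt;Plasmids (&gt;56.4 kb) were found.&lt;/p&gt;
```

On an HTML page a browser would decode these entities again, which is why the problem only shows up in the downloaded BibTeX and RIS files.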

I can see the following ways to deal with it:

  1. Do not provide BibTeX and RIS for download, but display them on the page like the other styles, so that users can copy and paste them if needed.
  2. Do nothing, i.e. leave it as it is. In that case users will get those HTML tags and the escaping when they import the BibTeX or RIS file into their citation software.
  3. I do not actually think this is a solution, but I mention it for completeness: somehow remove the HTML elements from our abstract (and title). The HTML escaping would remain, however; for example, an '&' in the text would still be escaped (unless we reverted that too).
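For reference, option 3 could be sketched roughly as follows in Python. This is only an illustration under assumptions: the function name `plain_text_from_escaped` and the helper class are hypothetical, and the sketch assumes the library output has been escaped exactly once on top of the stored HTML.

```python
import html
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes and drops all tags (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text_from_escaped(escaped: str) -> str:
    # Step 1: undo the escaping assumed to be added by the citation library.
    stored_html = html.unescape(escaped)
    # Step 2: drop the tags; the parser also decodes entities kept in the text.
    collector = _TextCollector()
    collector.feed(stored_html)
    collector.close()
    return "".join(collector.parts).strip()


print(plain_text_from_escaped("&lt;p&gt;Title &amp;amp; Test&lt;/p&gt;"))
```

Note that this discards all markup, including any intentional formatting such as emphasis inside the abstract.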

I think I prefer solution 1.

bozana (Collaborator) commented Jun 26, 2024

Hmm... Now I see that we save some parts of the title and abstract HTML-encoded, for example:
<p>Abstract: One more test' &lt; <strong>EN &amp; sign </strong> bla bla... </p>
So the only solution is to somehow disable the encoding by the library we use... :-\

bozana (Collaborator) commented Jun 27, 2024

Hmmm... I think we can use htmlspecialchars_decode for the downloadable BibTeX and RIS citations.
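To show what a single decoding pass would and would not undo, here is a rough Python analogue of PHP's `htmlspecialchars_decode`. The function name `decode_special_chars` is hypothetical, the sketch covers only the five entities listed in the code, and it is not the fix that was eventually applied.

```python
def decode_special_chars(text: str) -> str:
    """Rough analogue of PHP's htmlspecialchars_decode (assumed ENT_QUOTES)."""
    # &amp; is handled last so that "&amp;lt;" becomes "&lt;", not "<".
    replacements = (
        ("&lt;", "<"),
        ("&gt;", ">"),
        ("&quot;", '"'),
        ("&#039;", "'"),
        ("&amp;", "&"),
    )
    for entity, char in replacements:
        text = text.replace(entity, char)
    return text


# One pass removes the escaping layer assumed to come from the library.
print(decode_special_chars("Title &amp; Test"))
print(decode_special_chars("&lt;p&gt;A &amp;amp; B&lt;/p&gt;"))
```

In the second call the result is `<p>A &amp; B</p>`: one pass restores the stored form, so any tags and entities saved with the abstract itself would still be present in the download.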

jonasraoni (Contributor) commented

Fixes will be covered by #118

jonasraoni closed this as not planned Jun 28, 2024