Embedded TeX in BibTeX #2

Open
ghost opened this Issue May 26, 2012 · 28 comments

4 participants

@ghost

Are there plans to get jekyll-scholar to handle TeX commands within BibTeX files? For example, I could have a title that looks like:

title={{$I_{S}A$}},
...

Any suggestions on how to handle this?

@inukshuk
Owner

It is generally recommended (e.g., here) to use markup for sub- and superscript in math mode; therefore we didn't include it in the latex-decode gem, which bibtex-ruby uses to convert LaTeX commands that can be expressed in pure Unicode (like diacritics). Since the output format in jekyll-scholar is clearly defined (HTML), we could write a few bibtex-ruby filters for jekyll-scholar to convert LaTeX to HTML… in fact I think that's a great idea. Including math mode will be a lot of work though, so it might be worth looking around for a LaTeX math to HTML converter first.

@ghost

Perhaps this: ritex?

@inukshuk
Owner

That looks great. I'll take a look as soon as I have the time. Thanks!

@arademaker

I would suggest http://www.mathjax.org/ as an easier way to handle the LaTeX code in the browser.

@inukshuk
Owner

MathJax looks pretty cool, too. Is this easy to add to a jekyll-scholar setup or do you think we should change something to support this solution better?

@arademaker

It would be easy if I found a way to avoid the translation of some LaTeX symbols by jekyll-scholar and/or jekyll-scholar-extras. If you look at:

http://arademaker.github.com/bibliography/publication-55.html

You will see that some symbols (\exists, \forall, \sqcup) are converted while others are not (for instance, \neg). I don't know what piece of code is doing this translation from the source of my BibTeX file. Take a look at the abstract of the publication-55 entry in:

https://github.com/arademaker/arademaker.github.com/blob/source/_bibliography/rademaker.bib

The value of the variable {{ page.entry.abstract }} in my template

https://github.com/arademaker/arademaker.github.com/blob/source/_layouts/bibtex.html

is not the raw value of the field "abstract"!! Why?

@inukshuk
Owner

This is because we're using the latex-decode gem to convert all BibTeX fields by default. This is typically what our users expect (e.g., it will convert stuff like umlauts and diacritics). However, if, like in this case, we want the fields to not be converted we should add a configuration option to do this.

Any suggestions?

I'll try to think of something until tomorrow. Meanwhile, if you want to experiment and just disable the LaTeX filter for now, the default options are set here – simply change the default options to be an empty hash and your values should come through unfiltered.

@arademaker

Hum... Interesting...

I am thinking about a Liquid filter like

https://github.com/mojombo/jekyll/wiki/liquid-extensions

{{ page.entry.abstract | markdownify }}

versus

{{ page.entry.abstract | raw }}

That would be great! I don't know how hard it would be to get that. If I understood correctly, the transformation is being done before that...

@ghost

My 2 cents: I suggested Ritex because it generates MathML, and I believe MathJax uses Javascript? I don't know how MathML is rendered, but non-javascript solutions are attractive in my opinion. But, I don't have a lot of information about these alternatives ...

@inukshuk
Owner

So I just pushed a new version that allows you to configure 'bibliography_class' and 'details_link_class' – those values will be set as CSS class names in the details link and the bibliography list. That should make it easier to apply styles.

@hdpatel that's how I understood it, too. MathJax can parse LaTeX, which makes it an attractive solution while browser support for MathML is still lacking (at least I think that major browser support is still poor?).

Regarding the LaTeX filter, I still think that normally you will expect it to take place. @arademaker, your own bibliography, for example, contains lots of LaTeX diacritics that would not be displayed correctly in the browser if we didn't run the LaTeX filter. Perhaps we could just parse the bibliography twice – once with the conversion and once without – and then add a Liquid filter that gives you access to either of the two? This is perhaps not the best solution, but it would give us more flexibility.

@ghost

Quickly looking at MathJax, it seems that it can also render MathML. If that's true, then perhaps generating MathML is a preferred alternative; assuming that in the near future all popular browsers will support MathML rendering? http://www.mathjax.org/demos/mathml-samples/

@inukshuk
Owner

You're right. Plus, I still like the idea of adding ritex to latex-decode, because that way we wouldn't even have to think about not running the LaTeX filter. This way, latex-decode could convert regular LaTeX to unicode if possible and math mode to MathML. That means that @arademaker's abstract, for example, would be converted normally and he could then add in MathJax.

So yeah, we should open a ticket to add ritex support to latex-decode.

@arademaker

@hdpatel wrote "...non-javascript solutions are attractive in my opinion...". I don't think that JavaScript is still something that should be avoided. Let us face it: all modern web apps use JavaScript. JavaScript used to be a concern about 5 years ago... The problem with the current situation is that (see my abstract) only SOME symbols are converted...

@inukshuk thank you for the CSS classes! That will be very useful for sure! About latex-decode, ritex and MathJax... My vote is for the most flexible solution possible. The idea of having access to both versions of the fields is good, but it might make compilation slower... Anyway, the real problem today is that, since only some symbols are converted, we end up with messed-up HTML that MathJax can't handle. I think that we should have better control over which BibTeX fields' values are converted and how...

Nice conversation and sharing of ideas!

@ghost

@inukshuk: Did a feature like this ever make it in or should I open a ticket for ritex support to latex-decode?

@inukshuk
Owner

Thanks for the reminder. Will look at this today – in addition to adding it to latex-decode we could also put a converter into scholar based on ritex to allow you to convert latex math to mathml on pages and posts.

@inukshuk
Owner

@hdpatel @arademaker I've added a rudimentary implementation to latex-decode. What we need now are useful examples for testing. It would be a big help if you could post a few examples of LaTeX strings and what you would expect to find in the rendered output, which will be MathML. In addition, you would load MathJax to make sure it renders in Internet Explorer (I think the other browsers support MathML by now).

@ghost

Here are some LaTeX strings that I use:

\underline{firstName LastName$^g$}
$I_{S}A$
$\tilda$
$3^{rd}$

@colliand

I was happy to find this thread. It would be great to have jekyll-scholar parse TeX expressions appearing in BibTeX files. If more examples are required, BibTeX files exported from MathSciNet contain lots of TeX. Any updates on the initial request?

@inukshuk
Owner

@colliand Meanwhile, Chrome has moved away from MathML, so I think the best approach now is to use MathJax. Since latex-decode only supports simple pattern matching using regular expressions, it is not feasible to distinguish between regular TeX and math mode. Depending on your requirements, there is therefore an easy and a difficult solution to displaying math.

  • If you have only math mode commands and no other TeX directives then I would suggest to simply turn off the latex-decode filter and use MathJax on the website.

  • If you have both math mode commands and other TeX directives then it is trickier. Short of writing a full TeX parser I think the only thing you can do is write your own filter based on regular expressions tailored to your needs.
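
For the mixed case, one possible shape for such a regex-based approach is to leave `$...$` spans untouched and run the conversion only on the text between them. The following is a minimal sketch in plain Ruby, not part of latex-decode or jekyll-scholar; the helper name `convert_outside_math` is hypothetical, and it assumes math is delimited by single, unescaped dollar signs (it does not handle `\$`, `$$...$$`, or `\(...\)`).

```ruby
# Hypothetical helper: run a conversion block only on the text outside $...$ spans.
MATH_SPAN = /(\$[^$]*\$)/

def convert_outside_math(text)
  # split with a capture group keeps the math spans in the resulting array
  text.split(MATH_SPAN).map { |part|
    # Math spans pass through unchanged; everything else goes to the block
    part.match?(/\A\$[^$]*\$\z/) ? part : yield(part)
  }.join
end

# Example conversion: replace \forall with the Unicode symbol, but only outside math mode.
result = convert_outside_math('\forall-quantified: $\forall x$') { |s|
  s.gsub('\forall', "\u2200")
}
puts result
```

The block passed to the helper could be any conversion, for instance a handful of diacritic replacements, while MathJax takes care of the `$...$` spans in the browser.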

@colliand

Thanks @inukshuk for the fast reply. I got some advice from @mmun and followed his suggestion to add
bibtex_filters: []
just beneath
bibliography_template: "%{reference}"
in _config.yml

That worked.

@inukshuk
Owner

Yes, that effectively turns off the LaTeX to unicode conversion when parsing the bibliography. If all your directives are of such a nature that MathJax (you're using MathJax I assume?) can handle them this is the best solution right now. Glad it works!

@dpo
dpo commented Apr 4, 2016

In my field, the title of virtually every bib entry has LaTeX commands in it, ranging from \textsf{...} to math. Has anybody made progress on this issue? I have MathJax enabled on my site, yet an entry with math only and no other LaTeX commands isn't rendered properly, even if I disable the latex filter.

@inukshuk
Owner

The latex filter converts only LaTeX directives which can be translated to Unicode; that doesn't include directives like textsf -- however, it's quite easy to add such filters yourself for jekyll-scholar. If you have a lot of math in your bibliography, I suggest enabling MathJax and adding custom filters for those directives which it does not cover.

For reference, look at the superscript filter that was added recently -- simply add such a filter to your _plugins folder and add its name to the bibtex_filters array in your configuration.
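
As an illustration of such a custom filter, here is a minimal sketch for `\textsf{...}`. It assumes the same shape as the superscript filter mentioned above (a subclass of `BibTeX::Filter` inside `Jekyll::Scholar` with an `apply` method); the names `TexMarkup`, `textsf_to_html`, and `Textsf`, and the choice of a `<span class="textsf">` wrapper, are hypothetical. It only handles arguments without nested braces.

```ruby
begin
  require 'bibtex' # bibtex-ruby; provides BibTeX::Filter when installed
rescue LoadError
  # Without the gem, only the plain helper below is defined.
end

module TexMarkup
  # Convert \textsf{...} (no nested braces) into an HTML span.
  def self.textsf_to_html(value)
    value.to_s.gsub(/\\textsf\{([^{}]*)\}/) { %(<span class="textsf">#{$1}</span>) }
  end
end

if defined?(BibTeX::Filter)
  module Jekyll
    class Scholar
      # Assumption: the filter is looked up by class name, e.g. "textsf"
      # in the bibtex_filters array of the site configuration.
      class Textsf < BibTeX::Filter
        def apply(value)
          TexMarkup.textsf_to_html(value)
        end
      end
    end
  end
end

puts TexMarkup.textsf_to_html('A note on \textsf{MINRES}')
```

Keeping the conversion in a plain helper makes it possible to try the regular expression without a Jekyll site; the filter class is just a thin wrapper around it.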

@dpo
dpo commented Apr 4, 2016

Thanks for the pointers! Changing simple markup is easy with filters, and that works well. But for math, I'm not sure what the filter should look like. MathJax understands LaTeX markup but after applying the latex filter, the dollar signs are gone, so it doesn't get a chance.

On a side note, there is a small problem with the regexp used in the superscript filter. Currently, it matches the longest string between a pair of braces but I think what you want is the shortest string, or else, applying the filter to E = mc\textsuperscript{2} where E is {C}apitalized puts everything between the 2 and the C in superscript. It's enough to change the regex to /\\textsuperscript(\{((?:.|\g<1>)*?)\})/ with an extra ? after the *.
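
The difference between the two quantifiers can be checked in isolation. The sketch below is plain Ruby rather than the filter itself; the greedy pattern is reconstructed from the description above by dropping the extra `?`, and the `<sup>` replacement and the `superscript` helper are assumptions made for illustration.

```ruby
GREEDY = /\\textsuperscript(\{((?:.|\g<1>)*)\})/
LAZY   = /\\textsuperscript(\{((?:.|\g<1>)*?)\})/

# Wrap the captured argument (group 2) in <sup> tags.
def superscript(text, pattern)
  text.gsub(pattern) { "<sup>#{$2}</sup>" }
end

title = 'E = mc\textsuperscript{2} where E is {C}apitalized'

puts superscript(title, GREEDY)
# → E = mc<sup>2} where E is {C</sup>apitalized
puts superscript(title, LAZY)
# → E = mc<sup>2</sup> where E is {C}apitalized
```

One caveat, as far as I can tell: because `.` is tried before the recursive group, the lazy variant stops at the first closing brace, so an argument that itself contains braces would be cut short. A character class that excludes braces may be a safer alternative.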

@inukshuk
Owner

Yes, so to work with MathJax the idea is to disable the LaTeX filter; in your case you would add additional custom filters to convert everything else that's not covered by MathJax already (including conversions that the default latex filter would normally handle; you can just copy conversions from the latex-decode gem as needed) -- it's a bit sad, but we haven't figured out a better approach to this yet.

PRs are always welcome if you found a bug!

@dpo
dpo commented Apr 4, 2016

So here's what I did and it seems to work:
1. Enable MathJax on my site
2. clone latex-decode locally, and comment out https://github.com/inukshuk/latex-decode/blob/master/lib/latex/decode.rb#L41
3. in my site's _plugins/ext.rb, add

$:.unshift '/path/to/latex-decode/lib'  # substitute your local path
require 'jekyll/scholar'

to use the local version of latex-decode.

Now I have proper math in my bib entries!

In view of this, it would be good to have a way to enable/disable latex-decode filters. Simply disabling the math filter does the trick. And perhaps the math filter should be renamed unicode-math or something in that vein, to avoid confusion with MathJax?!

@inukshuk
Owner

Good idea; we could add a few filters which call the latex decode modules individually; then you could swap out latex filter with any combination of the individual ones.

By the way, instead of overriding the entire gem, you can just override the decode method (or better yet the decode method of the Math module to do nothing) in your ext.rb -- that would probably make your code easier to handle because you don't have to clone into latex-decode.

@dpo
dpo commented Apr 4, 2016

Thanks. That's indeed simpler:

require 'latex/decode'
# Disable the Maths decoder of latex-decode, as it interferes with MathJax
module LaTeX
  module Decode
    class Maths < Decoder
      # Return the string unchanged so that math markup reaches MathJax intact
      def self.decode!(string)
        string
      end
    end
  end
end