TeX for the web #5

davidar · 2015-10-03T06:25:02Z

It has long been possible to convert TeX to HTML (#1). However, I think it's fair to say that the results are often hideous, as web browsers (by default) suck at typesetting compared to TeX. Fortunately, it is now possible to work around some of these deficiencies with JS and CSS, which I've tied together in ~~this demo~~ https://davidar.io/TeX.js/ ( https://github.com/davidar/TeX.js )

The aim of this is to achieve (an approximation to) the professional quality of TeX typesetting, whilst integrating with the web and optimising for on-screen viewing better than a PDF viewer can.

rht · 2015-10-03T10:27:44Z

Past attempt in Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=630181

rht · 2015-10-03T10:31:31Z

@bramstein

davidar · 2015-10-04T05:02:04Z

Yes, @bramstein it would be fantastic to have your input on this :)

@rht Yeah, I saw that issue, and was somewhat amused by this comment:

[...] is a huge issue for web browsers, which sometimes have to deal with giant (think tens of megabytes) paragraphs.

bramstein · 2015-10-04T08:02:59Z

I think the performance argument is not a very good one. It'll get slow with very large paragraphs, but there are ways around that (splitting the paragraph, falling back to the greedy line breaking algorithm, etc.). The bigger issue is that some parts of CSS are incompatible with the TeX model. Even if it were possible to combine CSS and the boxes and glue model, it would require a significant rewrite (which browser vendors are understandably not huge proponents of).

As for doing it as a library: I think that is a reasonable approach if you limit support to a subset of HTML and CSS. All modern browsers now support sub-pixel positioning, so some of the ugly hacks I had to do in Typeset are no longer necessary (and the whole thing becomes much more performant).

davidar · 2015-10-05T02:06:44Z

I think the performance argument is not a very good one.

Me neither.

The bigger issue is that some parts of CSS are incompatible with the TeX model. Even if it were possible to combine CSS and the boxes and glue model, it would require a significant rewrite (which browser vendors are understandably not huge proponents of).

Frankly I'd be happy with anything more intelligent than the greedy algorithm used by browsers (somewhat disturbingly it seems IE is the only one supporting something like this currently)
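To make the greedy-vs-optimal contrast concrete, here is a minimal sketch in Python (chosen for brevity; a browser implementation would be JavaScript). It is not Knuth-Plass itself: it ignores boxes, glue, penalties and hyphenation, and assumes a monospace model where a line's badness is the cube of its leftover space and the last line is free. The greedy routine commits to each line as soon as it is full; the optimal one picks all breaks for the paragraph together with a dynamic program.

```python
def greedy_breaks(words, width):
    """Fill each line with as many words as fit, one line at a time."""
    lines, current = [], []
    for word in words:
        candidate = current + [word]
        if current and len(" ".join(candidate)) > width:
            lines.append(current)
            current = [word]
        else:
            current = candidate
    if current:
        lines.append(current)
    return lines


def optimal_breaks(words, width):
    """Minimise total badness over the whole paragraph (dynamic programming)."""
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i] = minimal cost of setting words[i:]
    nxt = [n] * (n + 1)              # nxt[i] = index just past the line starting at i
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1
        for j in range(i, n):
            length += len(words[j]) + 1  # words[i..j] plus the spaces between them
            if length > width and j > i:
                break
            slack = max(width - length, 0)
            cost = 0 if j == n - 1 else slack ** 3  # last line is free
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                nxt[i] = j + 1
    lines, i = [], 0
    while i < n:
        lines.append(words[i:nxt[i]])
        i = nxt[i]
    return lines


def total_badness(lines, width):
    """Sum of cubed leftover space, ignoring the last line."""
    return sum((width - len(" ".join(line))) ** 3 for line in lines[:-1])


words = "aaa bb cc ddddd".split()
g = greedy_breaks(words, 6)
o = optimal_breaks(words, 6)
print([" ".join(l) for l in g], total_badness(g, 6))  # ['aaa bb', 'cc', 'ddddd'] 64
print([" ".join(l) for l in o], total_badness(o, 6))  # ['aaa', 'bb cc', 'ddddd'] 28
```

Greedy fills the first line perfectly and pays for it with a very loose second line; looking at the whole paragraph spreads the slack more evenly, which is the behaviour the thread is asking browsers for.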

As for doing it as a library: I think that is a reasonable approach if you limit support to a subset of HTML and CSS.

Definitely, I only intend to support the basic subset output by LaTeX-to-HTML conversion tools like tex4ht or LaTeXML

All modern browsers now support sub-pixel positioning, so some of the ugly hacks I had to do in Typeset are no longer necessary (and the whole thing becomes much more performant).

That's good to hear

@bramstein I know you've said that typeset.js is likely to never be production ready, but how much work would it take to make it robust enough to handle the specific use case I'm interested in here? As in, I can drop the script into a basic HTML document, and it Just Works. For context, I'd like to (eventually) be able to produce HTML versions of the articles in the Creative Commons arXiv subset ( #1 ) that look (almost) as good as the PDFs. It would be great if this included Knuth-Plass line breaking (but I'm not a web developer, so I'm somewhat limited in what I'm able to achieve myself).

davidar · 2015-10-05T04:08:33Z

Alright, here's my first approximation (just using greedy justification for now):

Before
After
PDF for comparison
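One plausible reading of "greedy justification" (an assumption; the demo's actual code is not shown in this thread) is: fill each line greedily, then stretch its inter-word gaps so it reaches the full measure, leaving the last line ragged-right. A monospace sketch, where `justify` is a hypothetical name:

```python
def justify(words, width):
    """Greedy line filling, then stretch inter-word gaps to `width` columns.

    Monospace model: every character is one column. The last line stays
    ragged-right, as in TeX and CSS `text-align: justify`.
    """
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)

    out = []
    for k, line in enumerate(lines):
        gaps = len(line) - 1
        if k == len(lines) - 1 or gaps == 0:
            out.append(" ".join(line))  # last line or single word: no stretching
            continue
        extra = width - sum(len(w) for w in line)  # total space to distribute
        base, rem = divmod(extra, gaps)
        text = ""
        for g, word in enumerate(line[:-1]):
            # the first `rem` gaps each get one extra column
            text += word + " " * (base + (1 if g < rem else 0))
        out.append(text + line[-1])
    return out


for row in justify("the quick brown fox jumps over the lazy dog".split(), 16):
    print(repr(row))
# 'the  quick brown'
# 'fox  jumps  over'
# 'the lazy dog'
```

In a browser the same idea would stretch gaps by fractional pixels (for example via CSS `word-spacing`) rather than whole columns, which is where sub-pixel positioning helps.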

rht · 2015-10-05T10:11:52Z

The bigger issue is that some parts of CSS are incompatible with the TeX model.

If you have an example to pinpoint this incompatibility...
(I don't know much of TeX box/glue plumbing)

It's either full TeX typesetting onto a subset of html/css/js, or parts of TeX on full html/css/js (which e.g. for math, is already well supported).
@bramstein Why do you suggest the former?

If the goal is to better format the tex4ht/LaTeXML output of scientific papers, then the former is preferable.
If the goal is to bring TeX-quality typesetting to the web, the latter can be done piecemeal, e.g. https://github.com/w3c/dpub-pagination (why would there be page breaks in a web document?).

Also, mind the format size:

pdf: 352KB
mhtml: 4.1MB
justified mhtml: 4.3MB
justified mhtml.tar.bz2: 3.1MB

(This one needs justification as well: https://github.com/worrydream/EarlyHistoryOfSmalltalk)

rht · 2015-10-05T10:33:21Z

(...what is it like to read originally paged books but without the page breaks helper?)

davidar · 2015-10-06T09:41:37Z

@rht most of that 4MB is poorly compressed images, which can be improved (e.g. using SVG instead of PNG)

Re pagination: I don't think trying to emulate physical books too closely is a good idea, but something definitely needs to be done to improve location memory

Edit: it would be cool if you could leverage something like https://en.m.wikipedia.org/wiki/Method_of_loci for this purpose, e.g. gradually changing the background colour/pattern/image as you scroll down the page

davidar · 2015-10-10T10:26:21Z

Another example: https://ipfs.io/ipfs/Qmav57P5mmwcpUtmgRb2tp9j6YpXZdgobxDv7VBeJtgtCp/

rht · 2015-10-10T14:25:05Z

@davidar sorry for the late reply. I wonder if it would be useful to have more fine-grained hrefs (paragraph, section), like https://github.com/ipfs/go-ipfs/blob/master/core/bootstrap.go#L4.
The paper itself was uploaded in 2011 http://arxiv.org/pdf/1104.2778v1.pdf.

For the images, there is also https://www.npmjs.com/package/gulp-imagemin.

For the qualia of a book, now that it is confined to a flat screen, it has fewer attributes and becomes less of a physical 'thing'. This stuff is more related to #2.

For the experiment with the method of loci, it would be preferable for the author to build this method in from the beginning, because it is more subjective (there is a risk of clouding the author's intention) and more pervasive than just a change in font or layout. If I were to use one, I'd construct it so that the mnemonic is naturally connected to the text, e.g. a book about the innards of a ship (thought) & the ship it describes (extension).

rht · 2015-10-10T14:33:30Z

(papers are often annotated externally, but code isn't; it is instead referred to by line-number range.
Edit: but code review is a form of annotation.)

jbenet · 2015-10-11T14:57:34Z

@davidar that looks really good!! maybe soon we'll have damn clickable references :)

rht · 2015-10-11T15:02:00Z

@davidar that looks really good!! maybe soon we'll have damn clickable references :)

Imported modules in code are not clickable either (except with Sourcegraph).

rht · 2015-10-12T08:46:01Z

The raw PDF at https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-p2p-file-system.pdf has clickable references.

davidar · 2015-10-12T09:16:33Z

@rht yes, section/paragraph linking is definitely something I'd like to do
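For paragraph-level links, one possible scheme (hypothetical; not something TeX.js is known to do) is to derive each paragraph's fragment id from a hash of its whitespace-normalised text, so the id survives reflowing at a different line width or font size. `paragraph_anchor` and `anchors_for` are illustrative names only.

```python
import hashlib


def paragraph_anchor(text, length=8):
    """Hypothetical fragment id: hash of the paragraph with whitespace collapsed."""
    normalised = " ".join(text.split())
    digest = hashlib.sha1(normalised.encode("utf-8")).hexdigest()
    return "p-" + digest[:length]


def anchors_for(paragraphs):
    """Assign ids in document order, suffixing repeats so every id is unique."""
    seen = {}
    ids = []
    for p in paragraphs:
        base = paragraph_anchor(p)
        n = seen.get(base, 0)
        seen[base] = n + 1
        ids.append(base if n == 0 else f"{base}-{n}")
    return ids


# Reflowed text gets the same id; a repeated paragraph gets a "-1" suffix.
assert paragraph_anchor("Hello   world\n") == paragraph_anchor("Hello world")
print(anchors_for(["Intro text.", "Intro text.", "Methods."]))
```

The trade-off: any change to a paragraph's wording changes its id (breaking old links to it), whereas positional ids such as "section 3, paragraph 2" break whenever paragraphs are inserted earlier in the section.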

For the qualia of a book, now that it is confined to a flat screen, it has less attributes and becomes less of a physical 'thing'.

Yeah, I'm not trying to emulate a physical book, but I'd like to remedy some of the deficiencies of on-screen reading in terms of recall, etc.

Because it is more subjective (there is risk of fogging the intention of the author), and more pervasive than just a change in font or layout.

I've experimented with subtly changing the background colour based on scroll position, which seemed to work quite nicely, although it had some technical problems, so I decided to take it out for the moment.

@davidar that looks really good!! maybe soon we'll have damn clickable references :)

@jbenet Yes, that's definitely on my radar, I really hate traditional bibliographies (there's this thing called hyperlinks, people). Of course, you can't hyperlink a dead tree, but who prints stuff these days?

The raw of https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-p2p-file-system.pdf has clickable references.

Cool, although it's not quite as seamless as it could be (e.g. having a citation link directly to the section of the article the author is referencing).

davidar · 2015-10-12T09:57:24Z

I've broken this into a separate project now: https://davidar.io/TeX.js/

Please submit bugs / feature requests to https://github.com/davidar/TeX.js/issues

rht · 2015-10-12T12:38:51Z

Since the html page can't be annotated (/PR-ed),
davidar/TeX.js@2268e11#diff-eacf331f0ffc35d4b482f1d15a887d3bR19 (more citation needed)

I thought Apple had brought typography to the web? The OS in the screenshot is NeXTSTEP.
But indeed, there was no hyphenation in the Retina iOS book reader in 2010: http://www.subtraction.com/2010/06/08/better-screen-same-typography/.

I've experimented with subtly changing the background colour based on scroll position

But again, this is just a mnemonic tool (associating two random, slightly related facts, much like naming star constellations), unless the background colour is calculated from the aggregate sentiment of the text in a page/paragraph or something (and even then there is a risk of clouding the author's intention).

traditional bibliographies (there's this thing called hyperlinks, people)

The recent (on the TeX timescale) biblatex package displays the URL by default if the field exists, but this is the amount of boilerplate code needed for clickable refs: https://github.com/rht/papers/blob/href/ipfs-cap2pfs/ipfs-cap2pfs.tex#L12-L27.

having a citation link directly to the section of the article the author is referencing

The ecosystem doesn't exist yet, but meanwhile, this can be done manually by the author, e.g.

"Git has already influenced distributed filesystem design". The fact is stated in http://sigops.org/sosp/sosp13/papers/p151-mashtizadeh.pdf #section3.1sentence1.
"Even today, BitTorrent maintains a massive deployment where tens of millions of nodes churn daily." The fact referred is in https://www.cl.cam.ac.uk/~lw525/publications/P2P2013_13.pdf #sectionIV.Fsentence2.

Similarly, to cite the definition of merkledag in the paper, https://ipfs.io/ipfs/QmR7GSQM93Cx5eAg6a6yRzNde1FQv7uL6X1o4k7zrJa3LX/ipfs.draft3.pdf #section2.3sentence2footnote.
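The fragments above follow an informal pattern. Purely as an illustration, a tool could split such a citation into the document URL and a structured location; the grammar `section<label>sentence<n>[footnote]` is inferred from these three examples and is not an existing standard. `parse_fragment` and `parse_citation` are hypothetical names.

```python
import re
from urllib.parse import urldefrag

# Inferred (hypothetical) grammar: "section" <label> "sentence" <n> ["footnote"]
FRAGMENT = re.compile(
    r"^#?section(?P<section>[0-9A-Za-z.]+?)sentence(?P<sentence>\d+)(?P<footnote>footnote)?$"
)


def parse_fragment(fragment):
    """Turn e.g. 'section3.1sentence1' into a dict describing the location."""
    m = FRAGMENT.match(fragment)
    if m is None:
        raise ValueError(f"unrecognised fragment: {fragment!r}")
    return {
        "section": m.group("section"),
        "sentence": int(m.group("sentence")),
        "footnote": m.group("footnote") is not None,
    }


def parse_citation(link):
    """Split a citation link into (document URL, parsed location)."""
    url, fragment = urldefrag(link)
    return url, parse_fragment(fragment)


print(parse_fragment("sectionIV.Fsentence2"))
# {'section': 'IV.F', 'sentence': 2, 'footnote': False}
```

Making the section label non-greedy lets it contain dots and letters (`3.1`, `IV.F`) while still stopping at the first `sentence` keyword.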

rht · 2015-10-12T13:03:20Z

hyphenation on the web, 2011, http://blog.fontdeck.com/post/9037028497/hyphens.

davidar · 2015-10-12T13:17:03Z

Since the html page can't be annotated (/PR-ed)

You're welcome to PR the HTML page (is there a difficulty in doing so?)

I thought apple had brought typography to the web? The os in the screenshot is NeXTSTEP.

Yes, I'm having trouble seeing the relevance here though?

But again, this is just a mnemonic tool

Of course. I'm not trying to associate semantically meaningful images to the text, I'm simply trying to improve the ability to recall the position in the text where you read something. The baseline is "I read this phrase at the bottom of the left-hand page when I was roughly two-thirds of the way through the book", so I'm not aiming for anything more meaningful than that.

biblatex package by default displays url if the field exists

I'm not sure if this is what @jbenet meant, but personally I'm talking about removing the bibliography entirely in favour of embedding hyperlinks directly into the in-text citations. (Although someone can generate a bibliography from this information if they so desire.)

The ecosystem doesn't exist yet

That's why I'm trying to bootstrap the ecosystem with the arXiv corpus ;)

rht · 2015-10-12T13:39:33Z

You're welcome to PR the HTML page (is there a difficulty in doing so?)

I mean that the rendered paper (https://davidar.io/TeX.js/) can't be annotated, so I can only comment on the source code.

"I read this phrase at the bottom of the left-hand page when I was roughly two-thirds of the way through the book"

That is still a more precise address than referring to a background color shade.

I'm talking about removing the bibliography entirely in favour of embedding hyperlinks directly into the in-text citations

I had thought of that when parsing what 'clickable references' means. But Wikipedia still displays the references in a separate section: https://en.wikipedia.org/wiki/Bibliography#References.

Edit: s/background color/background color shade/

davidar · 2015-10-13T05:47:52Z

I plan on integrating https://hypothes.is soon, so stay tuned ;)
Yes, we need to balance precision against recall. The essential feature of recalling location in physical books is a combination of a low-frequency component (approximate position in the book) and a high-frequency component (left/right, top/bottom of the page). So, perhaps two colours could work better? Note that I'm not talking about communicating locations, but about subconscious recall.
But Wikipedia also has popups when you hover over citations in the text. You can certainly have both, yes.
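A rough sketch of the two-colour idea, written as a pure function that a scroll handler could call: the low-frequency component (hue) tracks overall progress through the document, and the high-frequency component (lightness) cycles once per viewport height as a stand-in for "top or bottom of the page". The hue range, saturation and lightness values are arbitrary assumptions, not what the demo used, and `location_colours` is a hypothetical name.

```python
def location_colours(scroll_top, viewport_height, document_height):
    """Return (base, accent) CSS colours encoding the reading position.

    Hypothetical scheme:
    - low frequency: hue sweeps 0..300 degrees over the whole document;
    - high frequency: lightness cycles once per viewport height.
    """
    scrollable = max(document_height - viewport_height, 1)
    progress = min(max(scroll_top / scrollable, 0.0), 1.0)  # 0 at top, 1 at bottom
    hue = round(300 * progress)

    page_phase = (scroll_top % viewport_height) / viewport_height  # 0..1 within a "page"
    lightness = round(97 - 4 * page_phase)  # 97% at a "page" top, darker towards its bottom

    base = f"hsl({hue}, 30%, 97%)"
    accent = f"hsl({hue}, 30%, {lightness}%)"
    return base, accent


print(location_colours(4400, 800, 8800))
# ('hsl(165, 30%, 97%)', 'hsl(165, 30%, 95%)')
```

Keeping both components low in saturation and close in lightness is meant to leave them subliminal rather than distracting, in line with the aim of subconscious recall rather than explicit navigation.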

davidar · 2015-10-13T12:36:33Z

@rht You should now be able to directly annotate https://davidar.io/TeX.js/ (and any other page using TeX.js) thanks to @hypothesis (cc @RichardLitt @nickstenning) 😄

Note to self: think about integrating @ipfs and @hypothesis (cc @jbenet)

jbenet · 2015-10-14T08:08:04Z

@davidar yes, we should do that. there's much overlap.

cc @tilgovi -- we should put public annotations on ipfs. -- also, once we get capabilities, private ones too

jbenet · 2015-10-14T08:08:40Z

@davidar this works very well, good stuff!

tilgovi · 2015-10-27T20:41:43Z

Would this be a good repo to open an issue for designing and discussing ipfs comments?

whyrusleeping · 2015-10-27T23:07:09Z

@tilgovi i think so

tilgovi · 2015-10-27T23:22:32Z

Opened #12.

jbenet · 2015-12-04T14:12:20Z

@davidar where was your hypothesis annotated version? not finding it

davidar · 2015-12-05T02:21:51Z

@jbenet the hypothesis enabled version doesn't seem to have made it into ipfs yet, will add it to my to-do list

jbenet · 2016-01-25T00:24:38Z

cc @BigBlueHat

davidar added the help wanted and status/in-progress labels Oct 10, 2015

davidar self-assigned this Oct 10, 2015

davidar mentioned this issue Oct 13, 2015: Sprint Oct 5th ipfs/team-mgmt#36 (Closed, 19 tasks)

davidar mentioned this issue Oct 14, 2015: arXMLiv / CorTeX ipfs-inactive/archives#31 (Open)

davidar mentioned this issue Oct 16, 2015: Better text justification davidar/TeX.js#6 (Open)
