Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow for asynchronous locale loading #8

Closed
adomasven opened this issue Jun 9, 2016 · 19 comments
Closed

Workflow for asynchronous locale loading #8

adomasven opened this issue Jun 9, 2016 · 19 comments

Comments

@adomasven
Copy link

Hello,

I was wondering whether there are any plans for better async locale loading support. This would make a lot of sense in a scenario where citeproc-js is used in a website. Since support for synchronous XHR request support has been dropped in modern browsers, to use citeproc-js one has to either pre-load all locale xml files or parse the style file and figure out which locales will be needed before instantiating the CSL engine.

Ideally, the definition of sys.retrieveLocale() would change to either support a callback function as the second parameter or to handle promise return values. Depending on how citeproc-js works, this may not be the best strategy and require an asyncification of the whole API.

An alternative solution is to expose required locales after instantiating the CSL engine with a style XML, such that the API user can load the relevant locale files asynchronously before making additional calls like makeBibliography().

@fbennett
Copy link
Contributor

fbennett commented Jun 9, 2016

Yes, it does look like it's time to do something about this. I've struggled with the setup dance in the code behind the processor demos, and it's definitely not fun. [1] The alternative solution (allowing instantiation before locales are loaded) will not work with the current processor code, since the instantiation steps assume the presence of required locales. It could be made to work, I think, but I don't relish the deep dig into the code that would be needed. Plus we would probably have a long tail of bug reports from unanticipated breakage.

Off the top of my head, maybe we could combine the approaches in your first and second para? Leaving the current processor synchronous API as-is, we could offer a CSL.initAsync() method (say) that instantiates a framework object, and uses a promise chain to load the style, sniff its locales, load them, and build the processor. Once ready, methods on the framework object would become functional, and a callback set on initAsync() (for page setup etc) would be executed.

I'm up to my ears with other work at the moment (some of it processor-related), so I won't have time to work on it for awhile; but does that line of thought make sense?

[1] As you can see from that code, which relies on web workers and some very ugly callback methods, I'm not quite up to speed on modern ways of handling async, but I'm getting there.

@adomasven
Copy link
Author

The alternative solution (allowing instantiation before locales are loaded) will not work with the current processor code, since the instantiation steps assume the presence of required locales.

Ah, this does make things a bit more complicated.

Off the top of my head, maybe we could combine the approaches in your first and second para? Leaving the current processor synchronous API as-is, we could offer a CSL.initAsync() method (say) that instantiates a framework object, and uses a promise chain to load the style, sniff its locales, load them, and build the processor. Once ready, methods on the framework object would become functional, and a callback set on initAsync() (for page setup etc) would be executed.

Yeah, something like this could work, although it doesn't seem as easy as I thought it may be from implementation side of things. If I understand you right, a function signature like

CSL.Engine.initAsync(style, retrieveItem, retrieveLocale, callback, locale, forceLocale)

where

  • style is the XML/JSON of the csl file (same as now)
  • retrieveItem a function with a signature retrieveItem(itemID) (same as now)
  • retrieveLocale a function with a signature retrieveLocale(locale, localeCallback), where the localeCallback has a signature localeCallback(localeXML)
  • callback with a signature callback(cslEngineInstance)
  • locale optional locale (same as now)
  • forceLocale boolean whether to force the locale (same as now)

So the workflow from the user side of things would be something like this:

function retrieveItem(itemID) {
  return items[itemID]
}

// may be called multiple times with different `locale` param
function retrieveLocale(locale, localeCallback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', 'https://raw.githubusercontent.com/Juris-M/citeproc-js-docs/master/locales-' + lang + '.xml', true);
  xhr.onload = function() {
    localeCallback(xhr.responseText); // pass back the locale
  };
  xhr.send(null);
}

CSL.Engine.initAsync(styleXML, retrieveItem, retrieveLocale, function(cslEngine) {
  window.cslEngine = cslEngine;
  //kick off some other process, using cslEngine
});

And on the implementation side of things

CSL.Engine.initAsync = function(style, retrieveItem, retrieveLocale, callback, locale, forceLocale) {
  var localeArray = parseStyleAndReturnLocales(style);
  var localeXMLs = {};
  function retrieveLocales() {
    var localeToLoad = localeArray.pop(); // remove and return last element or `undefined` if empty
    if (localeToLoad) {
      // call the async user locale function
      return retrieveLocale(localeToLoad, function(localeXML) {
        // store localeXML
        localeXMLs[localeToLoad] = localeXML;
        // recursively retrieve remaining locales
        retrieveLocales();
      });
    }
    // `localeToLoad` is undefined meaning all locales have been loaded asynchronously
    var sys = {
      retrieveItem: retrieveItem,
      retrieveLocale: function(localeName) {
        return localeXMLs[localeName];
      }
    }
    var cslEngine = new CSL.Engine(sys, style, locale, forceLocale);
    // return the csl engine to the user
    callback(cslEngine);
  }
  // kick off the asynchronous load of locales
  retrieveLocales();
}

Essentially, the only missing part which makes asyncification difficult as citeproc-js user is that we don't have the code for parseStyleAndReturnLocales(style), which is part of citeproc-js. In fact, if you exposed a function like CSL.Style.parseAndReturnLocales(style), the whole convoluted initAsync wouldn't be necessary, as the user could pre-load relevant locales and then provide a proper sys object to new CSL.Engine()

And regarding promises, I wouldn't worry too much about it anyway. Any API that uses callbacks can be promisified by its users and those with experience using promises wouldn't have a problem doing it themselves.


So yeah, while writing this example out I arrived at the two solutions:

  1. Probably simpler, expose a function CSL.Style.parseAndReturnLocales(style)
  2. More complicated CSL.Engine.initAsync()

I guess the only issue here is that this way the styles file would have to be parsed twice (once to figure out which locales are needed by the style file and once after instantiating the engine). I don't think there is a way to change without modifying the engine constructor, which would potentially involve digging through a lot of code. Then again, parsing the locale file twice takes comparatively little file in relation to XHR requests, so not a big deal.

@fbennett
Copy link
Contributor

Very good - thanks for putting time in on the examples. I have code for parsing locales out of a style, and it can be tacked onto the raw CSL object to expose it. I'll try to get that in place reasonably soon, and document it in the ReadTheDocs docs.

@dsifford
Copy link
Contributor

I load all resources asynchronously in my academic bloggers toolkit repo. You can look there for reference.

Its basically just a series of stacked promises.

@adomasven
Copy link
Author

I took a look, but it appears you're loading a locale as specified by wordpress installation before CSL engine instantiation.

Correct me if I'm wrong, but I believe different csl styles require different locales specified within the style xml. The purpose of retrieveLocale() is to provide the locale xml, but because the function has to be synchronous and available on CSL engine construction there is no way to load the appropriate locales asynchronously.

@dsifford
Copy link
Contributor

I believe different csl styles require different locales specified within the style xml

I don't think this is correct. But then again, I could be wrong.

The way I understand it is that the locale xml simply defines how each citation is parsed according to the user's locale. I confirmed this with a Thai user.

@dstillman
Copy link
Contributor

dstillman commented Jun 11, 2016

I believe different csl styles require different locales specified within the style xml

I don't think this is correct. But then again, I could be wrong.

@adomasven is correct. Styles can define a default-locale. E.g., the Nature style sets default-locale="en-GB", since nothing submitted to Nature should use a non-English locale, regardless of what is set by the user.

@adomasven
Copy link
Author

@dsifford by the way, I gathered this from old docs citeproc-js, inaccesible via the direct address now, thus a web.archive link

@dsifford
Copy link
Contributor

@adomasven is correct. Styles can define a default-locale. E.g., Nature sets default-locale="en-GB".

Ah, that would explain the "unable to find locale "en-GB" warnings I get in the console.

I concur with your thoughts then. Citeproc should be retrieving these resources asyncronously using HTTP from the github repo. That wouldn't be difficult to shim into the current processor.

@dstillman
Copy link
Contributor

No, that's not what we're suggesting. citeproc-js shouldn't rely on an external HTTP service — getting the locales is up to the consumer (which very well may have them cached locally). The point is just that the code flow here needs to accommodate asynchronous loading, since even a load from a local file might be asynchronous.

@dsifford
Copy link
Contributor

Fair enough.

I'll confess that I really didn't read through this entire conversation, so I can't really speak directly to your suggestions.

I'll sit back and watch you guys hash this out. Too busy with other things to contribute right now.

@dsifford
Copy link
Contributor

dsifford commented Aug 3, 2016

I now understand this issue.

What's the status on this @fbennett ? Need a hand with anything?

@dsifford
Copy link
Contributor

dsifford commented Aug 3, 2016

@adomasven Not sure if you're still needing a way to grab locales on the fly, but I came up with something for the time being using Web Workers (as @fbennett suggested).

Basically, here's the gist of what I have happening:

When the app fires up, I use a promise chain to grab the current user's default locale from the github repo and store it in an ES6 map called localeStore, while simultaneously calling the Worker to grab all the others.

After I have the default locale, I instantiate the CSL.Engine sys object with a function that is bound to the calling class.

Here's that function: (note: I use typescript)

private getRemoteLocale(loc: string): string {
        const normalizedLocale = <string>this.locales[loc] || 'en-US';
        const fallback = <string>this.locales[this.store.locale] || 'en-US';
        return this.localeStore.has(normalizedLocale)
            ? this.localeStore.get(normalizedLocale)
            : this.localeStore.get(fallback);
    }

When citeproc calls this function, it first tries to get whatever locale that it needs. If it can't, it serves back the user's default locale.

Here is the small worker that I use.

function fetchLocales() {
    return new Promise(function(resolve, reject) {
        var req = new XMLHttpRequest();
        req.onreadystatechange = function() {
            if (req.readyState === 4) {
                if (req.status !== 200) reject(new Error(req.responseText));
                resolve(JSON.parse(req.responseText));
            }
        };
        req.open("GET", "https://api.github.com/search/code?q=repo%3Acitation-style-language%2Flocales%20extension%3Axml");
        req.send(null);
    })
    .then(function(data) {
        var promises = [];
        data.items.forEach(function(item) {
            promises.push(this.fetchSingleLocale(item.name));
        });
        Promise.all(promises).then(function() {
            self.close();
        });
    })
    .catch(function(e) { return e; });
}

function fetchSingleLocale(file) {
    return new Promise(function(resolve, reject) {
        var req = new XMLHttpRequest();
        req.onreadystatechange = function() {
            if (req.readyState === 4) {
                if (req.status !== 200) reject(new Error(req.responseText));
                var key = file.match(/locales-(.+).xml/)[1];
                self.postMessage([key, req.responseText]);
                resolve(true);
            }
        };
        req.open("GET", 'https://raw.githubusercontent.com/citation-style-language/locales/master/' + file);
        req.send(null);
    })
}

fetchLocales();

Finally, when the worker posts the data back to the main thread, it gets saved to the localeStore.

constructor(store) {
        this.store = store;
        this.worker = new Worker(ABT_meta.jsURL + 'worker.js');
        this.worker.onmessage = (e) => {
            this.localeStore.set(e.data[0], e.data[1]);
        }
        this.worker.postMessage('');
        this.init();
    }

Feel free to rip me apart if you think this could be made better 👍

Here's the repo if you'd like to dig into this further (still haven't pushed the code to it yet though): https://github.com/dsifford/academic-bloggers-toolkit/tree/mobx

@dsifford
Copy link
Contributor

dsifford commented Aug 3, 2016

FYI: The endpoint in the fetchLocales function above is slightly incorrect. To make this fetch ALL of them (rather than the first 30), you need to add &per_page=100 to the end of the URL.

@fbennett
Copy link
Contributor

To simplify this problem of identifying style-specific locales, I've added a synchronous method to the processor that can be run before the processor is instantiated. Here is its signature:

  CSL.getLocaleNames = function (myxml, preferredLocale)

The first argument myxml is the style code, in whatever form it's available (serialized XML, DOM, or E4X if that's recognized in the installed processor). preferredLocale is the optional locale to be fed to the processor, if any (e.g. "fr-FR"). The method returns an array of unique normalized locale IDs corresponding to standard CSL locales, that can be used to generate filenames etc., for use in loading locale data in advance of processor instantiation.

@dsifford
Copy link
Contributor

@fbennett I'm not sure I'm completely understanding what that function would be used for but, pushing that aside for the moment, what would be the expense of making the call asyncronous? Would that require a gigantic refactor or something?

Reason for me asking: I'm not sure how much longer synchronous anything will be around in modern browsers. Chrome and Firefox are actively trying to kill it.

@fbennett
Copy link
Contributor

fbennett commented Sep 22, 2016

It just does what it says. It's a helper function, if you don't need it, nothing is lost.

It's just a little function that extracts some data from a string or a DOM object already in memory, applies a mapping, and returns the result. Since the XML for input has to arrive via I/O somehow, it's probably going to be running in a callback anyway; but if it needs to be run async, caller could wrap it in a promise.

(Here is a little sample of its use. As you can see, it doesn't solve the larger problem of asynchronous loading, it just gathers the data necessary to do so, following suggestion (1) of @adomasven above.)

@dsifford
Copy link
Contributor

@fbennett Ah, I see. Thanks for the clarification.

@fbennett
Copy link
Contributor

fbennett commented Feb 5, 2017

Closing this - the helper function mentioned above addresses the original issue raised in the thread.

@fbennett fbennett closed this as completed Feb 5, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants