This repository has been archived by the owner on Mar 19, 2024. It is now read-only.

Advice on using in web server #276

Open

atomkirk opened this issue Jul 15, 2017 · 26 comments

@atomkirk

Is there any advice about how to use fasttext on a production web server?

  1. Does the data have to come from a file, or can it be passed to fastText directly (perhaps after being loaded into memory from a database)?
  2. Can the model be returned in memory so it can be stored in a database, or does it have to be written to a file?
  3. Can fastText read from and write to presigned S3 URLs directly, or can it only read from the local disk?
  4. Any advice on training on billions of examples (too many to load into memory all at once)?
@loretoparisi

@atomkirk you can use the compiled binary, load it once, and query it by piping input through stdin and reading the output from stdout, as you normally would in the shell.

See my Node.js implementation here. It works pretty well, provided that you fork the process on demand.
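
For illustration, here is a minimal sketch of that pattern (not fasttext.js itself; the ./fasttext and ./model.bin paths are placeholders). It runs one long-lived fasttext child in interactive mode (the - argument reads sentences from stdin) and answers one prediction line per input line:

// Minimal sketch: one long-lived fasttext child process, queried over stdin/stdout.
const { spawn } = require('child_process');
const readline = require('readline');

const ft = spawn('./fasttext', ['predict-prob', './model.bin', '-', '1']);
const rl = readline.createInterface({ input: ft.stdout });

const pending = []; // one resolver per sentence written to stdin, answered in order

rl.on('line', line => {
    const resolve = pending.shift();
    if (resolve) resolve(line); // e.g. "__label__ham 0.97"
});

function predict(sentence) {
    return new Promise(resolve => {
        pending.push(resolve);
        ft.stdin.write(sentence.replace(/\n/g, ' ') + '\n');
    });
}

predict('hello classify me').then(result => console.log(result));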

@bittlingmayer

There are a few npm packages floating around out there -- fasttext-js, node-fasttext-wrap, node-fasttext -- but none is really ready for production.

@loretoparisi commented Jul 25, 2017

In my experience, a node child_process does its work under heavy load without any problem, combined with Express + node cluster etc. to serve an API in production. The best solution would be to build a node native module and wrap the C++ headers, but I'm not aware of such a solution on npm/GitHub. Of course, a Python wrapper would work the same way. The choice depends on your production environment.
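
As a rough illustration of that setup (route name and port are placeholders, and predict() is assumed to be a helper like the sketch above that writes to a single long-lived fasttext child):

// Hypothetical Express endpoint delegating to the long-lived fasttext child process.
const express = require('express');
const app = express();

app.get('/api/predict', (req, res) => {
    predict(req.query.text || '')
        .then(labels => res.json({ text: req.query.text, labels: labels }))
        .catch(err => res.status(500).json({ error: String(err) }));
});

app.listen(3000, () => console.log('prediction API listening on :3000'));

With node cluster, each worker would spawn its own fasttext child, which is where the memory footprint concern raised below comes in.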

@dcsan commented Aug 13, 2017

gensim has a pretty full-featured wrapper with bindings for all the parameters:
https://radimrehurek.com/gensim/models/wrappers/fasttext.html

@matanox commented Sep 20, 2017

@loretoparisi Just to note that your problem there can be memory. The model may typically comprise several hundred megabytes, possibly approaching one GB, replicated in each worker child process running the fasttext executable. A further, more scalable solution may involve using the original fastText C++ API from a good C/C++ web server (provided that queries are batched so that the HTTP communication costs are amortized), or making a PR to load the model from shared memory.

@loretoparisi commented Sep 21, 2017

@matanster thanks for pointing this out. In my experience I do not have memory leaks in the library; of course, it is better to pre-process the training/test set, or to use the Node.js streaming API. The response time of the model using a node child_process + an Express API is acceptable (on average the inference API takes ~200 msec), and I do not have performance metrics for a C++ server (do you have any suggestions?). The choice depends on your infrastructure as well, etc.
I agree about model loading over shared memory. Consider that as soon as you start up your model like this:

// train a supervised model and serialize it to ./band_model
var FastText = require('fasttext.js');

var fastText = new FastText({
    serializeTo: './band_model',
    trainFile: './band_train.txt'
});

fastText.train()
.then(done => {
    console.log("train done.");
})
.catch(error => {
    console.error(error);
});

the fastText instance keeps the model loaded, so you can run inference (predict) like this:

var sample="Our Twitter run by the band and crew to give you an inside look into our lives on the road. Get #FutureHearts now: http://smarturl.it/futurehearts";
fastText.load()
.then(done => {
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
    sample="LBi Software provides precisely engineered, customer-focused #HRTECH solutions. Our flagship solution, LBi HR HelpDesk, is a SaaS #HR Case Management product.";
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
    fastText.unload();
})
.catch(error => {
    console.error(error);
});

in your running web service instance. See fasttext.js for more details.

@matanox commented Sep 21, 2017

@loretoparisi by the shared memory suggestion, I meant OS shared memory, so that a single copy of the model serves all spun-up fasttext processes, to reduce memory consumption in concurrent implementations. Not because it is better than teasing apart fasttext into a proper library, but because it might be easier to get a PR merged for this small additional feature.

But as to latency, I recall getting a response time of around 1.5 msec per single sentence in my Clojure server implementation, so 200 msec per response sounds a little odd. I mean, it is closer to the cost of starting a fasttext process for every request, which I recall taking about 0.5 msec. These measurements were taken on a 6th-generation i7 (one or both of us surely has their numbers or implementation wrong). 200 msec is really a lot for this, unless you're actually talking about predicting a large set of inputs in one request.

@loretoparisi commented Sep 21, 2017

@matanster thanks for your stats. The 200 msec response time is at the JSON API level, i.e. the Express API response time (let's say the call to app.get('/api/predict')) plus the fasttext.js predict method call response time. It would be interesting to calculate better metrics directly, i.e. the response time of the predict method call only.
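
One way to isolate that number, as a rough sketch (it reuses the fastText instance and sample from the example above; nothing here is part of the fasttext.js API beyond predict itself):

// Hypothetical timing of the predict call alone, excluding the Express/JSON layer.
var start = process.hrtime();
fastText.predict(sample)
.then(labels => {
    var diff = process.hrtime(start);
    console.log('predict only:', (diff[0] * 1e3 + diff[1] / 1e6).toFixed(2), 'ms', labels);
});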

@myoldusername commented Oct 15, 2017

@loretoparisi please, can you tell me how to "use the compiled binary, load once, and query by piping the output to **stdout** from **stdin** as you normally would do in the shell"?
I don't want to use node; I need my web server to call fastText via PHP exec, but the model has to stay loaded in memory.

@dcsan commented Oct 15, 2017

@myoldusername that sounds like a PHP-specific question.

The PHP exec method doesn't seem to have a way to capture the output of a running process and to stream new queries into it. Maybe there is some hack you can do with a backgrounded (&) process to fork it:
http://php.net/manual/en/function.exec.php

By comparison, node allows you to spawn a child process, continuously listen for on('data', ...) update events, and pipe new data into the process. fastText supports this with the - operator for interactive mode:
https://nodejs.org/api/child_process.html#child_process_child_process_spawn_command_args_options

@myoldusername commented Oct 16, 2017

@dcsan thank you for your reply. Well, I cannot use node, since the node fastText module needs CentOS >= 7.

Someone told me this:

create a named pipe with mkfifo, and have that be the input to fasttext. 
then you direct script output to that named pipe.
root@server [/fasttext] mkfifo testpipe
root@server [/fasttext] ./fasttext predict-prob model_gender.bin testpipe

But when I send a string to the "testpipe", fastText prints to the command line and exits.
How can I make fastText write back to the named pipe and keep itself in memory? That way I could use any PHP script to call shell_exec("echo sample > testpipe"). Since I am not that good with Linux, I am kindly asking for your advice.

@dcsan commented Oct 16, 2017

You need to add the - param when you run fastText to tell it to use interactive mode.

@myoldusername

@dcsan it seems very hard for me to do this .... OMG LOL
./fasttext predict-prob model_gender.bin - < testpipe

Now the testpipe redirects any string to fastText, OK.

(echo -e "Michel" && cat) > testpipe

sends the string "Michel" to fastText; the fastText window prints the prediction to stdout, and then the
(echo -e "Michel" && cat) > testpipe
hangs waiting for a new line ....

I want to send
./fasttext predict-prob model_gender.bin - < testpipe
to the background, maybe with &, so that when I send any text to the testpipe the prediction is directly printed to stdout and the sending command exits, but the main ./fasttext predict-prob model_gender.bin - < testpipe has to stay in memory for further requests.

Please, may you kindly help me with this!

@loretoparisi

@myoldusername maybe you want to send a process with input to the background and feed it contents from another window? This is possible using that background fasttext process's pid from the second screen.

@myoldusername

@loretoparisi could you please kindly show me a small demo of how I can do it? Please, honestly I gave up...

@loretoparisi

@myoldusername well, a quick and dirty solution is to use a fifo:

$ mkfifo in
$ tail -f in | ./fasttext predict-prob /root/ft_model.bin - 2

and in another shell you do:

 $ echo "hello classify me" > in

so you will get

$ tail -f in | ./fasttext predict-prob /root/ft_model.bin - 2
__label__spam 1 __label__ham 1.95313e-08

A better approach would be to keep fastText running in the background with its input redirected from the fifo, like this:

$ nohup ./fasttext predict-prob /root/ft_model.bin - 2 < in &

and then something like

$ echo "hello classify me" >> in

but this can be a bit tricky to handle. I strongly suggest a wrapper like my node client or a Python one, by the way.

@dcsan commented Oct 17, 2017

wow, nice tips @loretoparisi !

@myoldusername you could try getting free node hosting; there are lots of options. It does seem PHP doesn't have good support for this type of streaming I/O to system-level tasks, which is what node really excels at. Python generally has better NLP libraries, but its higher-level support for fastText seems to be lagging a generation behind.

@myoldusername

@loretoparisi I would love to use node, but it seems the fastText binary compiled with it only works on CentOS 7; is there a workaround for this? My CentOS is 6.9, and getting a free host is not a solution since traffic is involved. Otherwise the wit service is available, but I prefer local apps.

Regarding your example, thank you very much, but can I replace the fastText binary bundled with the node module with the one I already compiled on my box?

Regards

@loretoparisi

@myoldusername yes, just replace the executable file in the bin folder here. I will update the library soon with a better implementation that handles a system-wide executable installation (so it will also work if you have something like /usr/local/bin/fasttext).

@loretoparisi commented Oct 18, 2017

I have updated fasttext.js with better child process handling, bug fixes, and a full working server prediction API example.

@cpuhrsch (Contributor)

Hello all,

Thank you for working on this together. Can we consider this issue resolved or should we keep it open?

Thanks,
Christian

@bittlingmayer

@cpuhrsch

I suppose it would make sense to update the documentation to point to the unofficial but reasonably popular libs for various languages - wrappers, or anything else tightly integrated with fastText. At least that is the approach I see similar libs taking.

Somebody Ctrl+F-ing https://github.com/facebookresearch/fastText, or searching the whole site or whole repo for python, pip install, nodejs or npm, should find something useful, of course with clear wording about what is unofficial.

@matanox commented Jan 24, 2018

It would be helpful, IMHO, if the library turned into a library in the sense of providing a safe API, rather than having only a main function and wrappers in other languages. We do not all want to use node.js, and there would be further benefit in a flexible API enabling a "proper" embedding. Sure, node.js is fast, but there is a limit to how much throughput you can get out of a single fasttext process. Nor is fasttext necessarily known to be tested as a long-running, daemon-like process.

Right now fasttext is not really a library just yet, apart from having some Python API. It is a command-line tool with a (Python) API, and you cannot use it at ultra-high concurrency until an API has been designed to share one copy of the model in memory for prediction by many threads/processes. Unless you are willing to have the model duplicated in RAM once per process/thread making predictions over it (models typically start at 0.5 GB), it is going to be highly wasteful or just restrictive.

It is pretty fast at prediction, and the C++ code base should be relatively easy to refactor into a library with better concurrency support and testing.

@loretoparisi commented Jan 26, 2018

@matanster that makes sense. In my experience, just consider that if you run inference from the compiled fastText C++ binary you get something like a 10-12 msec average response time. That said, if you use node.js (like mine) or Python wrappers, you have these options:

  • a native binding that wraps the C++ headers;
  • an executable fork (e.g. for node.js using child_process, etc.) for each model, run as an API;
  • a queue in front of that executable model, in order to handle incoming requests and serve the inference with some policy of your choice (see the sketch after this list);
  • in any case, you must also account for the API stack overhead.
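
As an illustration of the queue option (a minimal sketch with placeholder names; predict() is assumed to be a function that resolves with the labels for one sentence, like the one sketched earlier in this thread):

// Hypothetical FIFO queue that serves one request at a time against a single fasttext child.
var queue = [];
var busy = false;

function enqueue(sentence) {
    return new Promise(function (resolve, reject) {
        queue.push({ sentence: sentence, resolve: resolve, reject: reject });
        drain();
    });
}

function drain() {
    if (busy || queue.length === 0) return;
    busy = true;
    var job = queue.shift();
    predict(job.sentence)
        .then(job.resolve, job.reject)
        .then(function () { busy = false; drain(); });
}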

Regarding node.js, if you have an Express environment or a node HTTP server, it's the best choice in my opinion. You can have a look at some simple benchmarks I did for the simple language identification model provided by Facebook, served as an API, here:
https://github.com/loretoparisi/fasttext.js/blob/master/benchmarks.md

I agree that it cannot be used as-is if you want to achieve the best performance as a web service, and if you want to handle concurrency and multi-threading there is a need for a new C++ wrapper for that, but I bet this is not the aim of this library as it is here; maybe that belongs in a different repo.

@matanox commented Jan 27, 2018

I find there is no need for this library to implement a web server/service. It suffices for it to expose a concurrency-aware C++ API, and let others do just that. Right now I doubt it should be called a library; other than providing a Python API it is a tool, and the distance to making the C++ code a library is quite small.
