Load into Memory #1

Closed
myoldusername opened this issue Oct 18, 2017 · 12 comments

@myoldusername

myoldusername commented Oct 18, 2017

Dear @loretoparisi,
I installed your fasttext.js to solve the memory problem that we discussed in facebookresearch/fastText#276 (comment).

Now when I run:
node fasttext_predict.js
it takes about 5 seconds to load the module:

"use strict";

(function() {

var DATA_ROOT='./data';

var FastText = require('./fasttext.js/lib/index');
var fastText = new FastText({
    loadModel: DATA_ROOT + '/model_gender.bin' // must specify filename and extension
});

var sample="Bashar Al Masri";
fastText.load()
.then(done => {
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
    sample="Hisahm al mjude";
    return fastText.predict(sample);
})
.then(labels=> {
    console.log("TEXT:", sample, "\nPREDICT:",labels );
    fastText.unload();
})
.catch(error => {
    console.error("predict error",error);
});

}).call(this);

It prints the prediction to stdout and then exits, because of fastText.unload().
What I need is to call this file as "node fasttext_predict.js UserName" from anywhere, passing the argument [UserName], and get the result on stdout immediately. Since you said the model will be kept loaded in memory, I want to be able to fetch this result from the PHP web server.

It is the same problem as with the loading time of the C++ binary: I need it to keep running in the background!

@loretoparisi
Owner

loretoparisi commented Oct 18, 2017

@myoldusername I have just updated the library with several improvements to how the child process is run. I have also added a server example that should cover your needs. Please check the README.
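The server example itself is not reproduced in this thread. As a rough sketch of the idea (load the model once, keep the process alive, answer predictions over HTTP), something like the following could work. The helper names `parseText` and `createPredictServer` are hypothetical, and the only fasttext.js calls assumed are `load()` and `predict()` as used in the script above.

```javascript
"use strict";

const http = require("http");

// Extract the `text` query parameter from a request URL such as "/?text=hello".
// Hypothetical helper, not part of fasttext.js.
function parseText(requestUrl) {
  const url = new URL(requestUrl, "http://localhost");
  return url.searchParams.get("text") || "";
}

// Wrap any classifier exposing predict(text) -> Promise in an HTTP server.
// The model is loaded once by the caller, so each request only pays for predict().
function createPredictServer(classifier) {
  return http.createServer((req, res) => {
    const started = Date.now();
    Promise.resolve()
      .then(() => classifier.predict(parseText(req.url)))
      .then(labels => {
        res.writeHead(200, { "Content-Type": "application/json" });
        res.end(JSON.stringify({
          response_time: (Date.now() - started) / 1000,
          predict: labels
        }));
      })
      .catch(error => {
        res.writeHead(500, { "Content-Type": "application/json" });
        res.end(JSON.stringify({ error: String(error) }));
      });
  });
}

// Possible wiring with fasttext.js (paths and env vars are assumptions):
// const FastText = require("./fasttext.js/lib/index");
// const fastText = new FastText({ loadModel: process.env.MODEL });
// fastText.load().then(() => createPredictServer(fastText).listen(process.env.PORT || 3000));
```

A PHP page could then fetch `http://localhost:<port>/?text=UserName` instead of spawning a new node process per request.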

@myoldusername
Author

myoldusername commented Oct 18, 2017

Outstanding... I am out of town, but I will test it today and give you feedback.

You are awesome!

@myoldusername
Author

It is working as expected, thank you so much.
You made my day!

Can I send you a donation?

@myoldusername
Author

Sometimes it crashes when I pass a string that contains Unicode text, such as Chinese.

I advise you to add a normalization method that removes all non-word characters, e.g. special characters and emoji.

@loretoparisi
Owner

loretoparisi commented Oct 19, 2017

@myoldusername yes, this is a good point. There are some minor functions in utils, like Util.removeDiacritics, here: https://github.com/loretoparisi/fasttext.js/blob/master/lib/util.js#L238

and the dataset is normalized in FastText.normalize: https://github.com/loretoparisi/fasttext.js/blob/master/lib/index.js#L438

But of course it is different for symbol-based languages such as Chinese, since they must be handled with Unicode, i.e. Unicode conversion and normalization before prediction.
Be aware that this normalization must be done on the training set too, i.e. you have to apply the same normalization to the training/test set and to the sample used for inference.

In my backend I do Unicode normalization in Java, but here I would prefer a Node solution. I will look into it!
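For illustration only, a Node-side normalization could look roughly like the sketch below. `normalizeText` is a hypothetical helper, not something fasttext.js provides. It relies on the built-in `String.prototype.normalize` and on Unicode property escapes in regular expressions, which need a reasonably recent Node version (10 or later, as far as I know).

```javascript
"use strict";

// Hypothetical helper (not part of fasttext.js): apply the same function to
// the training/test data and to every sample passed to predict().
function normalizeText(text) {
  return String(text)
    .normalize("NFKC")                        // unify composed/decomposed and compatibility forms
    .replace(/[^\p{L}\p{M}\p{N}\s]+/gu, " ")  // keep letters, marks, digits; drop punctuation, symbols, emoji, control chars
    .replace(/\s+/g, " ")                     // collapse whitespace, including newlines
    .trim();
}

console.log(normalizeText("Héllo,  wörld! 😀")); // Héllo wörld
console.log(normalizeText("랄랄라\n차차차"));       // 랄랄라 차차차
```

Hangul, Chinese and other non-Latin letters fall under `\p{L}`, so they pass through untouched, while hidden characters such as zero-width joiners are removed.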

@myoldusername
Author

Well, I am working with the language classification training set provided by fastText, with all due respect to them.

I have been passing paragraphs in various languages to the localhost URL and it works, but sometimes it suddenly crashes, even with normalized strings. I am not sure why; I will run further tests to see whether my copy-pasted strings contain hidden characters, since Unicode has some nasty stuff lol.

Regarding the Node solution, I think it would be an awesome idea to implement.

With respect,

Yours

@loretoparisi
Owner

Yes, this can be a very tricky task when dealing with languages that need Unicode. By the way, I'm using the same model too, so I have added the compressed version of the model to the example, along with some environment variables, so that you can run:

cd examples/
export MODEL=./data/lid.176.ftz 
export PORT=9001
node server

and then

http://localhost:9001/?text=%EB%9E%84%EB%9E%84%EB%9D%BC%20%EC%B0%A8%EC%B0%A8%EC%B0%A8%20%EB%9E%84%EB%9E%84%EB%9D%BC\n%EB%9E%84%EB%9E%84%EB%9D%BC%20%EC%B0%A8%EC%B0%A8%EC%B0%A8%20%EC%9E%A5%EC%9C%A4%EC%A0%95%20%ED%8A%B8%EC%9C%84%EC%8A%A4%ED%8A%B8%20%EC%B6%A4%EC%9D%84%20%EC%B6%A5%EC%8B%9C%EB%8B%A4

that will be correctly detected as KO:

{
	"response_time": 0.001,
	"predict": [{
			"label": "KO",
			"score": "1"
		},
		{
			"label": "TR",
			"score": "1.95313E-08"
		}
	]
}

NOTE
My input text was 랄랄라%20차차차%20랄랄라\n랄랄라%20차차차%20장윤정%20트위스트%20춤을%20춥시다, but when you put it in a URL it will be automatically percent-encoded, as with the encodeURIComponent function.
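When the request is built from code rather than pasted into a browser, the encoding has to be done explicitly. A small sketch using the standard `encodeURIComponent` and `decodeURIComponent` functions, with the host and port taken from the example above:

```javascript
"use strict";

const text = "랄랄라 차차차";

// Percent-encode the UTF-8 bytes of the text so it is safe inside a query string.
const encoded = encodeURIComponent(text);
const url = "http://localhost:9001/?text=" + encoded;

console.log(url);
// http://localhost:9001/?text=%EB%9E%84%EB%9E%84%EB%9D%BC%20%EC%B0%A8%EC%B0%A8%EC%B0%A8

// Decoding recovers the original string unchanged.
console.log(decodeURIComponent(encoded) === text); // true
```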

@myoldusername
Author

Well, I would like to bring to your attention that sometimes, when I pass a regular string, the node server freezes for unknown reasons and I have to kill it and restart it.

@loretoparisi
Owner

@myoldusername please paste that text here, together with the URL copied from the browser

@myoldusername
Author

@loretoparisi
Owner

loretoparisi commented Oct 20, 2017

Hmm, I guess you have some issues in your environment:

$ export PORT=3030
$ export MODEL=./data/lid.176.ftz 
$ node server.js 
model loaded
server is listening on 3030

You then call http://localhost:3030/?text=bader and get:

{
	response_time: 0.002,
	predict: [{
			label: "EN",
			score: "0.125931"
		},
		{
			label: "CA",
			score: "0.0847617"
		}
	]
}

This should work without any issues:

$ time curl -s "http://localhost:3030/?text=bader"
{
  "response_time": 0,
  "predict": [
    {
      "label": "EN",
      "score": "0.125931"
    },
    {
      "label": "CA",
      "score": "0.0847617"
    }
  ]
}
real	0m0.027s
user	0m0.005s
sys	0m0.006s

And now some benchmarking as well, calling it 1, 10 and 100 times sequentially:

$ ab -n 1 "http://localhost:3030/?text=bader"
This is ApacheBench, Version 2.3 <$Revision: 1757674 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        
Server Hostname:        localhost
Server Port:            3030

Document Path:          /?text=bader
Document Length:        164 bytes

Concurrency Level:      1
Time taken for tests:   0.001 seconds
Complete requests:      1
Failed requests:        0
Total transferred:      271 bytes
HTML transferred:       164 bytes
Requests per second:    712.76 [#/sec] (mean)
Time per request:       1.403 [ms] (mean)
Time per request:       1.403 [ms] (mean, across all concurrent requests)
Transfer rate:          188.63 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    1   0.0      1       1
Waiting:        1    1   0.0      1       1
Total:          1    1   0.0      1       1

$ ab -n 10 "http://localhost:3030/?text=bader"
This is ApacheBench, Version 2.3 <$Revision: 1757674 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        
Server Hostname:        localhost
Server Port:            3030

Document Path:          /?text=bader
Document Length:        164 bytes

Concurrency Level:      1
Time taken for tests:   0.011 seconds
Complete requests:      10
Failed requests:        4
   (Connect: 0, Receive: 0, Length: 4, Exceptions: 0)
Total transferred:      2726 bytes
HTML transferred:       1656 bytes
Requests per second:    941.00 [#/sec] (mean)
Time per request:       1.063 [ms] (mean)
Time per request:       1.063 [ms] (mean, across all concurrent requests)
Transfer rate:          250.50 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    1   0.5      1       2
Waiting:        0    1   0.3      1       1
Total:          1    1   0.5      1       2

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      2
  98%      2
  99%      2
 100%      2 (longest request)

$ ab -n 100 "http://localhost:3030/?text=bader"
This is ApacheBench, Version 2.3 <$Revision: 1757674 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        
Server Hostname:        localhost
Server Port:            3030

Document Path:          /?text=bader
Document Length:        168 bytes

Concurrency Level:      1
Time taken for tests:   0.095 seconds
Complete requests:      100
Failed requests:        73
   (Connect: 0, Receive: 0, Length: 73, Exceptions: 0)
Total transferred:      27208 bytes
HTML transferred:       16508 bytes
Requests per second:    1054.37 [#/sec] (mean)
Time per request:       0.948 [ms] (mean)
Time per request:       0.948 [ms] (mean, across all concurrent requests)
Transfer rate:          280.15 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     0    1   1.2      0       9
Waiting:        0    1   1.2      0       9
Total:          0    1   1.2      1       9

Percentage of the requests served within a certain time (ms)
  50%      1
  66%      1
  75%      1
  80%      1
  90%      2
  95%      3
  98%      7
  99%      9
 100%      9 (longest request)
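
A note on the "Failed requests" counts above: they are all of type Length, which ab reports when a response body differs in size from the first response it received. That is probably harmless here, because the response_time field changes from request to request. A small sketch of the effect, assuming the server serializes its payload as JSON (the server code is not shown in this thread, so the exact byte counts will differ):

```javascript
"use strict";

// Labels copied from the response shown earlier in this comment.
const labels = [
  { label: "EN", score: "0.125931" },
  { label: "CA", score: "0.0847617" }
];

// Size in bytes of a JSON body for a given response_time value.
function bodyLength(responseTime) {
  return Buffer.byteLength(
    JSON.stringify({ response_time: responseTime, predict: labels })
  );
}

// "0" is one character, "0.001" is five, so the bodies differ by 4 bytes.
console.log(bodyLength(0.001) - bodyLength(0)); // 4
```

A 4-byte gap is consistent with the two Document Length values reported above (164 and 168 bytes).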

@loretoparisi
Owner

I have added some benchmarks here, so I'm closing this issue. Feel free to reopen it if you have any problems.

loretoparisi pushed a commit that referenced this issue Mar 2, 2019

2 participants