save vectors in parallel #843

wants to merge 2 commits into
base: master

@mpgarate mpgarate commented Jul 6, 2019


This PR is a follow-up to issue #755.

It uses threads to parallelize the saveVectors method across multiple cores. In the benchmark below (8 logical CPUs), wall-clock time for a full run dropped from 7m39s to 3m21s, roughly a 2.3x speedup. The order of the lines in the output .vec file is not guaranteed.
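For readers who want the general shape of the idea without opening the diff, here is a minimal sketch. It is not the code in this PR: `Embeddings` and `saveVectorsParallel` are hypothetical names, and the output layout (a `rows dim` header followed by one `word v1 v2 ...` line per row) is an assumption made for illustration. Each worker formats a contiguous slice of rows into its own buffer and appends it to the stream under a mutex, which is why whole lines stay intact while their order can vary.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a trained model: one word and one vector per row.
struct Embeddings {
  std::vector<std::string> words;
  std::vector<std::vector<float>> vectors;
};

// Writes "<rows> <dim>" and then one "word v1 v2 ..." line per row.
// Rows are split into contiguous slices, one per worker thread. A worker
// formats its slice into a private buffer and appends the buffer to `out`
// while holding a mutex, so lines are never interleaved, but the order of
// the slices depends on which worker finishes first.
void saveVectorsParallel(const Embeddings& emb, std::ostream& out,
                         unsigned numThreads) {
  assert(numThreads > 0);
  assert(emb.words.size() == emb.vectors.size());
  const std::size_t rows = emb.words.size();
  const std::size_t dim = rows == 0 ? 0 : emb.vectors[0].size();
  out << rows << ' ' << dim << '\n';

  std::mutex outMutex;
  std::vector<std::thread> workers;
  const std::size_t chunk = (rows + numThreads - 1) / numThreads;  // ceiling
  for (unsigned t = 0; t < numThreads; ++t) {
    const std::size_t begin = std::min(rows, static_cast<std::size_t>(t) * chunk);
    const std::size_t end = std::min(rows, begin + chunk);
    if (begin == end) continue;  // more threads than rows
    workers.emplace_back([&emb, &out, &outMutex, begin, end] {
      std::ostringstream buffer;  // private to this worker, no locking needed
      for (std::size_t i = begin; i < end; ++i) {
        buffer << emb.words[i];
        for (float v : emb.vectors[i]) buffer << ' ' << v;
        buffer << '\n';
      }
      std::lock_guard<std::mutex> lock(outMutex);  // serialize only the write
      out << buffer.str();
    });
  }
  for (auto& w : workers) w.join();
}
```

Buffering a whole slice per worker keeps lock contention low at the cost of extra memory. Flushing the buffer every few thousand rows would be a reasonable variation for very large vocabularies.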


With binary built from master branch:

$ time ./fasttext-master supervised -input data/amazon_review_polarity_csv/train.csv -output out-single                                                                                  
Read 282M words
Number of words:  6735845
Number of labels: 0
Progress: 100.0% words/sec/thread: 2829461 lr:  0.000000 loss:      -nan ETA:   0h 0m

real    7m39.323s
user    11m50.848s
sys     0m25.348s

With binary built using parallel saveVectors:

$ time ./fasttext supervised -input data/amazon_review_polarity_csv/train.csv -output out-parallel                                                                                       
Read 282M words
Number of words:  6735845
Number of labels: 0
Progress: 100.0% words/sec/thread: 2754298 lr:  0.000000 loss:      -nan ETA:   0h 0m

real    3m21.412s
user    17m18.682s
sys     0m18.058s

Validated that both outputs contain the same lines (ignoring order) with:

$ wc -l out-parallel.vec
6735846 out-parallel.vec
$ wc -l out-single.vec
6735846 out-single.vec
$ cat out-single.vec | sort > out-single.vec.sorted
$ cat out-parallel.vec | sort > out-parallel.vec.sorted
$ cmp out-single.vec.sorted out-parallel.vec.sorted 

CPU details

Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
Address sizes:       39 bits physical, 48 bits virtual
CPU(s):              8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           1
NUMA node(s):        1
Vendor ID:           GenuineIntel
CPU family:          6
Model:               142
Model name:          Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
Stepping:            10
CPU MHz:             800.038
CPU max MHz:         4200.0000
CPU min MHz:         400.0000
BogoMIPS:            4224.00
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            8192K
NUMA node0 CPU(s):   0-7
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
* save vectors in parallel

@mpgarate mpgarate commented Sep 1, 2019

I updated the branch to resolve a merge conflict that came up since submitting the original PR.
