File filter super slow with 3500 notes #35

Open
DivineDominion opened this issue Sep 26, 2015 · 26 comments

Comments

@DivineDominion

I like this project's concept very much! Somebody brought it up on our blog and I immediately checked it out, because I'm still searching for a Notational Velocity-like app on platforms other than the Mac.

That being said, indexing my 3500 notes took 28+ seconds. Opening the file locator takes a bit (the lag is noticeable), but that's okay. Filtering the results is impossibly slow though.

It seems nvatom doesn't include the document body in DocQuery's search results. I guess that's the real bottleneck.

I'll have a look at the source myself and see whether the performance can be improved.

@seongjaelee
Owner

Thanks for raising the issue. =)

There are three performance areas I can think of:

  • Initialization: activating nvatom for the first time takes a long time
    • we could probably cache the lunr index to make it a bit faster?
    • we could start activating the package in the background as soon as Atom launches.
  • Memory: nvatom takes a lot of memory - does it? (This is a tradeoff between speed and memory.)
  • Searching: searching takes a long time - this is a lunr problem.

If I understood correctly, your major concern is searching.

It seems nvatom doesn't include the document body in search results of DocQuery.

I don't understand how "not including the document body in search results" is related to the performance. Could you explain this more? When we search, we search the document body, too. The searching is done within lunr.

By the way, if you have a large set of files, try searching with Atom's "Find in Project" feature (Cmd+Shift+F on OS X). I wonder how the search quality differs between lunr and Atom in such a large directory.

@seongjaelee
Owner

We can consider migrating to search-index from lunr. Need a performance comparison first.

@jonmagic
Contributor

We can consider migrating to search-index from lunr. Need a performance comparison first.

I'd love to tackle this. Here are the steps I'm thinking:

  • find a corpus of text large enough to build some meaningful benchmarks against
  • for both lunr and search-index benchmark:
    • time to load documents
    • time to search documents
    • memory usage

I've got a few minutes of hack time this weekend so I'm going to dive into this.
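A minimal ops/sec harness for the plan above, using only Node's high-resolution clock. The in-thread results look like Benchmark.js output, so this stdlib-only stand-in is just illustrative, and the toy workloads are assumptions, not the real lunr/search-index calls.

```javascript
// Sketch of a minimal ops/sec benchmark in the spirit of the plan above.
// Runs fn repeatedly for durationMs and reports operations per second.
function opsPerSec(fn, durationMs = 200) {
  const start = process.hrtime.bigint();
  const limit = BigInt(durationMs) * 1000000n;
  let ops = 0;
  while (process.hrtime.bigint() - start < limit) {
    fn();
    ops++;
  }
  const elapsedNs = Number(process.hrtime.bigint() - start);
  return (ops / elapsedNs) * 1e9;
}

// Toy workloads standing in for "add documents" and "search documents":
const docs = Array.from({ length: 642 }, (_, i) => `note ${i} velocity`);
const addRate = opsPerSec(() => docs.map((d) => d.toLowerCase()));
const searchRate = opsPerSec(() => docs.filter((d) => d.includes('velocity')));
console.log(`add: ${addRate.toFixed(0)} ops/sec, search: ${searchRate.toFixed(0)} ops/sec`);
```

Memory usage, the third metric on the list, would need `process.memoryUsage()` snapshots around the load step rather than a timing loop.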

@seongjaelee
Owner

This is awesome, Jonathan! I think this benchmark is worth a blog post. (I've been searching for benchmarks myself but couldn't find any.)

@ghost

ghost commented Sep 27, 2015

👍 I like this issue very much. Thanks @jonmagic for looking at the benchmarks.

@jonmagic
Contributor

Here are some benchmark results and a repo with code to run the benchmarks.

~/Projects/benchmark-javascript-search-engines (master)!🐑 💨 ✨  npm run benchmark 

> benchmark-javascript-search-engines@ benchmark /Users/jonmagic/Projects/benchmark-javascript-search-engines
> node benchmark.js

Starting up
Loading fixtures
Fixtures loaded
Starting benchmark
fs#readdir x 235 ops/sec ±32.98% (60 runs sampled)
fs#readFile x 663 ops/sec ±1.29% (81 runs sampled)
fs#stat x 709 ops/sec ±1.13% (37 runs sampled)
lunr#add 10 documents x 1,765 ops/sec ±2.02% (85 runs sampled)
search-index#add 10 documents x 7.64 ops/sec ±18.58% (23 runs sampled)
lunr#search x 2.05 ops/sec ±2.77% (10 runs sampled)
search-index#search x 157 ops/sec ±3.29% (76 runs sampled)
Benchmark of 642 documents complete

~/Projects/benchmark-javascript-search-engines (master)!🐑 💨 ✨  npm run benchmark 

> benchmark-javascript-search-engines@ benchmark /Users/jonmagic/Projects/benchmark-javascript-search-engines
> node benchmark.js

Starting up
Loading fixtures
Fixtures loaded
Starting benchmark
fs#readdir x 203 ops/sec ±41.48% (53 runs sampled)
fs#readFile x 662 ops/sec ±1.10% (82 runs sampled)
fs#stat x 711 ops/sec ±1.03% (30 runs sampled)
lunr#add 10 documents x 1,817 ops/sec ±1.65% (86 runs sampled)
search-index#add 10 documents x 8.05 ops/sec ±19.00% (25 runs sampled)
lunr#search x 2.09 ops/sec ±3.01% (10 runs sampled)
search-index#search x 159 ops/sec ±3.25% (77 runs sampled)
Benchmark of 642 documents complete

~/Projects/benchmark-javascript-search-engines (master)!🐑 💨 ✨  npm run benchmark 

> benchmark-javascript-search-engines@ benchmark /Users/jonmagic/Projects/benchmark-javascript-search-engines
> node benchmark.js

Starting up
Loading fixtures
Fixtures loaded
Starting benchmark
fs#readdir x 207 ops/sec ±37.45% (55 runs sampled)
fs#readFile x 660 ops/sec ±1.10% (79 runs sampled)
fs#stat x 712 ops/sec ±1.14% (30 runs sampled)
lunr#add 10 documents x 1,798 ops/sec ±2.06% (84 runs sampled)
search-index#add 10 documents x 7.86 ops/sec ±18.82% (23 runs sampled)
lunr#search x 2.12 ops/sec ±2.51% (10 runs sampled)
search-index#search x 160 ops/sec ±2.97% (77 runs sampled)
Benchmark of 642 documents complete

~/Projects/benchmark-javascript-search-engines (master)!🐑 💨 ✨  npm run benchmark 

> benchmark-javascript-search-engines@ benchmark /Users/jonmagic/Projects/benchmark-javascript-search-engines
> node benchmark.js

Starting up
Loading fixtures
Fixtures loaded
Starting benchmark
fs#readdir x 216 ops/sec ±35.36% (60 runs sampled)
fs#readFile x 659 ops/sec ±1.10% (81 runs sampled)
fs#stat x 715 ops/sec ±1.33% (31 runs sampled)
lunr#add 10 documents x 1,789 ops/sec ±2.11% (86 runs sampled)
search-index#add 10 documents x 7.80 ops/sec ±20.73% (23 runs sampled)
lunr#search x 2.13 ops/sec ±2.49% (10 runs sampled)
search-index#search x 170 ops/sec ±3.21% (75 runs sampled)
Benchmark of 642 documents complete

A few things to note:

  • I benchmarked some fs operations because nvatom also uses them for loading directories, file contents, and metadata.
  • I did not tune either search engine in any way; it may be possible to improve performance through configuration. For example, with search-index, is it possible to search on a field without storing the whole field in the index? We don't need it in the returned search result since we're going to read it from disk again anyway.
  • search-index has a distinct advantage over lunr in that it saves the index to disk, so it can be searched immediately. We've talked about hooking up storage to lunr to do something similar, but we get it for free with search-index. There is a caveat, of course: since we're not loading the files into memory from scratch each time, we'll need an algorithm to delete or update indexed documents when something happens to them outside of the nvatom workflow (where we can apply changes immediately).

Looking forward to hearing what folks think. I think it might be worth dropping the docquery requirement, pulling over the chokidar dependency and some of the watcher code I wrote in docquery, and then hooking up the search-index package directly. Thoughts?

@seongjaelee
Owner

Wow, your work is impressive! Thanks very much!

If I understand correctly, search-index takes more time for the initial load, but it's much faster at searching. Since we would eventually serialize/deserialize a cache anyway (search-index just does it automatically), search-index is the winner.

For reloading files that are modified while Atom isn't running... we can keep last-modified timestamps and do a (possibly async) load when Atom starts.

As for specific implementation details, I like your suggestion of dropping docquery from nvatom, since this last-modified-timestamp spec is too specific to be generalized in DocQuery.

I think I can do the rest, if you don't want to implement it. (However, this is not a small project, and I have limited network access this weekend, so it will take a couple of weeks or more.)

@jonmagic
Contributor

I think I can do the rest, if you don't want to implement it. (However, this is not a small project, and I have limited network access this weekend, so it will take a couple of weeks or more.)

I actually want to take a stab at this if that's ok :) I'm thinking about the following modules (coffeescript classes) right now:

  • file watcher, a wrapper on chokidar or the atom file watcher that just provides a nice interface specifically for our needs
  • document indexer, adds, updates, and deletes records in the search index
  • search, provides the search interface, which takes a query or returns a list of recently modified files; it also scrubs search results of any items that no longer match the file they represent on disk, emitting an event that the document indexer module can listen for to make sure the index gets updated
  • user interface, everything you've already built

I'll use CoffeeScript since that is what this project is implemented in, but I may introduce some new things like Promises to make async code cleaner. I'll probably start a PR that outlines my strategy and then make small PRs into that overview PR implementing each module with tests. Ideally we discuss and do some thorough code review before merging anything new. Thoughts?

@ghost ghost mentioned this issue Sep 29, 2015
@seongjaelee
Owner

It will be an honor to review your code. I'm not familiar with CoffeeScript or JavaScript - I've never used them at work - so it will be a great opportunity to work with you and learn from you. 😄 Thanks!

@ghost

ghost commented Sep 29, 2015

Thank you @jonmagic, I'm ready to help with Ubuntu testing. 👍 It's very exciting to see this progress toward a solid NValt alternative, and a cross-platform one as well! There hasn't really been a good alternative on Linux since the increasingly defunct nvpy, and there's really no good NValt solution for Windows. So nvatom fills a real gap for many users, myself included.

I'm not familiar with CoffeeScript or JavaScript

@seongjaelee You fooled me. I've been impressed with your work so far. 😉

@jonmagic
Contributor

jonmagic commented Oct 6, 2015

So I did a lot of hacking on this over the past 8 days, but along the way I was learning a bunch of new stuff, so I've mostly thrown out my code and I'm starting from scratch today. Just wanted to leave a note here that I haven't forgotten and work is happening :) Hopefully I'll have something to show in the next few days.

@ghost

ghost commented Oct 7, 2015

👍 Good!

@jonmagic jonmagic mentioned this issue Oct 26, 2015
8 tasks
@DivineDominion
Author

So since #41 is effectively abandoned, did you have luck with your setup @xhn35rq?

@ghost

ghost commented May 29, 2016

@DivineDominion yes, I've replaced most of the functionality of nvatom with several different plugins. By the way, you might check out this new Atom plugin: https://atom.io/packages/textual-velocity

Edit: I should mention I don't use textual-velocity. I prefer a less structured note-taking approach, something more akin to Plan 9 acme's linking approach than Notational Velocity's search/create approach.

@DivineDominion
Author

textual-velocity seems to do a fair job of maintaining notes once it has started up. Thanks for the link!

Can you point me to details on what you're referring to ("Plan 9's acme linking approach")?

To me, the linking doesn't matter as long as I get results instantly :)

@ghost

ghost commented May 29, 2016

Can you point me to details on what you're referring to ("Plan 9's acme linking approach")?

Sure, I'll email you some details so we don't take this issue too far off-topic.

@benoitdepaire

@xhn35rq
Would you mind pointing out which plugins you used to replace the functionality of nvatom? I took a look at textual-velocity, but there appears to be an issue when installing it on Windows 10.

Thanks

@ghost

ghost commented Jun 20, 2016

@benoitdepaire here are some thoughts in rough form for you. Hope this helps: http://faq.surge.sh/nvalt-notational-velocity-atom/

@seongjaelee
Owner

seongjaelee commented Jul 3, 2016

Started working on it.
https://github.com/seongjaelee/nvatom/tree/topic-35

It still uses lunr, but it seems to work, and the search index save/load feature works.
I tested with 1,190 Bible section files (6.3 MB of text), and the initial loading takes less than 5 seconds... though the cache is 4 MB. I need to benchmark more, and test more. Note that I am using an SSD.

@seongjaelee
Owner

It seems search-index cannot be used; it returns a "Failed to require the main module of 'nvatom' because it requires an incompatible native module." error.

@DivineDominion
Author

Meh, so the module is outdated or something? :/

@seongjaelee
Owner

I am actually trying to find the exact reason but couldn't dig further... When atom/atom#6771 is applied, I'll track down the cause. Currently, I'm thinking some module is not compatible with Atom - maybe apm's version is too low?

The topic-35 branch still uses lunr. I have been refactoring it. I tried another dataset, https://github.com/xHN35RQ/10000-markdown-files, but it failed to load all the items. I isolated just 1,000 items, but it still failed to load. So far, the refactoring is not helping performance much.

@seongjaelee
Owner

I tested search-index and lunr in JavaScript (not CoffeeScript), and it seems search-index takes more space, and building the index takes more time, especially when dealing with truly large amounts of data. I need to build a repository, something like Jonathan made, to show/prove it to other people here... but I've lost my motivation at this point.

With lunr, I don't think saving/loading a cache would bring a huge speed improvement with large amounts of text... which might imply that nvatom is not scalable.

@DivineDominion
Author

Too bad the outlook is negative. I'm going to work on an Electron-based app that does indexing later this year. When the indexing works, I'll try to turn it into an Atom plugin.


@ghost

ghost commented Jul 22, 2016

I'm going to work on an Electron-based app that does indexing later this year.

I've begun to wonder if indexing and other intensive manipulations should be handled outside the Electron ecosystem. I'm starting to think Electron should be used for the user interface, while the indexing & searching should be handled by OS-native applications & scripts. We have excellent languages like Go and Rust that can be used to build the indexing/search database, while Electron can be used for its strength: the user interface.

Some examples and relevant links:

  1. Raph Levien's design decisions for his new Xi editor are pertinent, specifically to splitting the UI from the back-end data manipulation.
  2. VSCode by Microsoft is built on Electron, but feels (and is) much faster than Atom. It might be worth investigating their codebase to determine how MS has optimized their Electron app.
  3. I've noticed that adding packages to Atom affects the overall speed, so I've recently begun to use Atom just for the user interface, and move some of the package functionality out into OS native tools. For example, I'm working on a plugin right now that links Atom with the Plan9 Plumber to give Atom enhanced contextual file opening and linking functionality.

In summary: a modular approach similar to the Xi editor, where you split the UI from the back-end, would allow you to support multiple front-ends - not only Electron, but also Atom plugins, native OS UIs, and perhaps even plugins for other editors like Sublime, Emacs, Vim, etc.

@DivineDominion
Author

Xi's approach of using Rust as a backend is cool. This could work: by default, the Atom plugin uses some JavaScript library, which will be slow without an index. If a Rust service runs as a separate background process, though, the plugin could query that service for results instead.
