File filter super slow with 3500 notes #35
Comments
Thanks for raising the issue. =) There are three topics I can think of regarding performance:
If I understood correctly, your major concern is searching.
I don't understand how "not including the document body in search results" is related to performance. Could you explain this more? When we search, we search the document body, too. The searching is done within lunr. By the way, if you have a large file, try searching with the "Find in project" feature in Atom. It is Cmd+Shift+F on OS X. I wonder how the search quality differs between lunr and Atom in such a large-directory case. |
We can consider migrating to search-index from lunr. Need a performance comparison first. |
I'd love to tackle this. Here are the steps I'm thinking:
I've got a few minutes of hack time this weekend so I'm going to dive into this. |
This is awesome, Jonathan! I think this benchmark is worth a blog post. (I've been searching for benchmarks myself but couldn't find any.) |
👍 I like this issue very much. Thanks @jonmagic for looking at the benchmarks. |
Here are some benchmark results and a repo with code to run the benchmarks.
A few things to note:
Looking forward to hearing what folks think. I think it might be worth dropping the docquery requirement, pulling over the chokidar dependency and some of the watcher code I wrote in docquery, and then hooking up the search-index package directly. Thoughts? |
Wow, your work is impressive! Thanks very much! If I understand correctly, search-index takes more time for the initial loading, but it's much faster for searching. Since we will eventually serialize/deserialize the cache anyway (even though search-index does it automatically), search-index is the winner. For reloading files that were modified while Atom was not running, we can keep last-modified timestamps and do the (possibly async) load when Atom starts. As for specific implementation details, I like your suggestion of dropping docquery from nvatom, since this last-modified-timestamp spec is too specific to be generalized in docQuery. I think I can do the rest if you don't want to implement it. (However, this is not a small project, and I have limited network access this weekend, so it will take a couple of weeks or more.) |
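To make the last-modified-timestamp idea above concrete, here is a minimal sketch of the startup check. It is only an illustration under assumptions: `diffByMtime` and both of its arguments are hypothetical names, not part of nvatom or docQuery, and reading the timestamps from disk is left out.

```javascript
// Hypothetical helper: decide which notes need re-indexing at startup.
// `cached`  maps file path -> mtime (ms) saved alongside the index.
// `current` maps file path -> mtime (ms) read from disk just now.
function diffByMtime(cached, current) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  const changed = [];
  const removed = [];
  for (const [path, mtime] of Object.entries(current)) {
    // New file, or modified after the index was written.
    if (!has(cached, path) || cached[path] < mtime) changed.push(path);
  }
  for (const path of Object.keys(cached)) {
    // File was deleted while the editor was not running.
    if (!has(current, path)) removed.push(path);
  }
  return { changed: changed.sort(), removed: removed.sort() };
}
```

Only the paths in `changed` would need to be re-read and re-indexed, and the ones in `removed` dropped from the index, so an unchanged directory costs one stat per file rather than a full rebuild.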
I actually want to take a stab at this if that's ok :) I'm thinking about the following modules (coffeescript classes) right now:
I'll use CoffeeScript since that is what this project is implemented in, but I may introduce some new things like Promises to make async code cleaner. I'll probably start a PR that outlines my strategy and then do small PRs into the overview PR that implement each module with tests. Ideally we discuss and do some thorough code review before merging any new things. Thoughts? |
It will be an honor to review your code. I'm not familiar with CoffeeScript or JavaScript (I've never used them at work), so it will be a great opportunity to work with you and learn from you. 😄 Thanks! |
Thank you @jonmagic, I'm ready to help with Ubuntu testing. 👍 It is very exciting to see this progress towards a solid NValt alternative, and a cross-platform one as well! There hasn't really been a good alternative on Linux since the increasingly defunct
@seongjaelee You fooled me. I've been impressed with your work so far. 😉 |
So I did a lot of hacking on this in the past 8 days, but along the way I was learning a bunch of new stuff so I've mostly thrown out my code and I'm starting from scratch today. Just wanted to leave a note here that I haven't forgotten and work is happening :) Hopefully I'll have something to show in the next few days. |
👍 Good! |
@DivineDominion yes, I've replaced most of the functionality of Edit: I should mention I don't use textual-velocity. I prefer a less structured note-taking approach something more akin to plan9's acme linking approach than notational velocity search/create approach. |
Can you point me to details for what you're referring to ("plan9's acme linking approach")? To me, the linking doesn't matter as long as I get results instantly :) |
Sure, I'll email you with some details so we don't off-topic this issue too much. |
@xhn35rq Thanks |
@benoitdepaire here are some thoughts in a rough form for you. Hope this helps: http://faq.surge.sh/nvalt-notational-velocity-atom/ |
Started working on it. It still uses lunr, but it seems to be working. The search index save/load feature works. |
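For readers wondering what an index save/load round trip looks like, here is a self-contained sketch. To be clear, this is not lunr's API and not the code on the branch: `TinyIndex` is a hypothetical stand-in that only shows the idea of building an inverted index once, serializing it, and restoring it without re-reading the notes.

```javascript
// Minimal stand-in for a full-text index (NOT lunr's API).
class TinyIndex {
  constructor(postings = new Map()) {
    this.postings = postings; // term -> array of document ids
  }

  static tokenize(text) {
    return text.toLowerCase().match(/[a-z0-9]+/g) || [];
  }

  add(id, text) {
    for (const term of new Set(TinyIndex.tokenize(text))) {
      if (!this.postings.has(term)) this.postings.set(term, []);
      const ids = this.postings.get(term);
      if (!ids.includes(id)) ids.push(id);
    }
  }

  // Returns ids of documents containing every query term (AND semantics).
  search(query) {
    const terms = TinyIndex.tokenize(query);
    if (terms.length === 0) return [];
    const lists = terms.map((t) => this.postings.get(t) || []);
    return lists[0].filter((id) => lists.every((l) => l.includes(id)));
  }

  serialize() {
    return JSON.stringify([...this.postings]);
  }

  static deserialize(json) {
    return new TinyIndex(new Map(JSON.parse(json)));
  }
}
```

The string from `serialize()` could be written to a cache file on shutdown and passed to `TinyIndex.deserialize()` on startup. Whether that beats rebuilding depends on how large the serialized index gets, which is the concern raised later in this thread.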
It seems like search-index cannot be used; it returns "Failed to require the main module of 'nvatom' because it requires an incompatible native module." error. |
Meh, so the module is outdated or so? :/ |
I am actually trying to find the exact reason but couldn't dig any further... Once atom/atom#6771 is applied, I will track down the cause. Currently, I think some module is not compatible with Atom. Maybe apm's version is too low? The topic-35 branch still uses lunr. I have been refactoring it. I tried another dataset, https://github.com/xHN35RQ/10000-markdown-files, but it failed to load all items. I isolated only 1,000 items, but it still failed to load. So refactoring is not helping much with performance so far. |
I tested search-index and lunr in JavaScript (not CoffeeScript), and it seems that search-index takes up more space and takes more time to build, especially when dealing with a truly large amount of data. I need to build a repository, something like the one Jonathan made, to show/prove it to other people here... but I lost my motivation at this point. With lunr, I don't think saving/loading the cache would bring a huge speed improvement with a large amount of text... which might imply that nvatom is not scalable. |
Too bad the expectations are negative. I'm going to work on an Electron-based app that does indexing later this year. When the indexing works, I'll try to make an Atom plugin from this. |
I've begun to wonder if indexing and other intensive manipulations should be handled outside the Electron ecosystem. I'm starting to think Electron should be used for the user interface, while the indexing and searching should be handled by OS-native applications and scripts. We have excellent languages like Go and Rust which can be used to build the indexing/search database, while Electron can be used to its strength of user interface. Some examples and relevant links:
In summary: It seems that a modular approach similar to the Xi editor where you split the UI from the back-end would allow you to support multiple front-ends, not only Electron, but also Atom plugins, native OS UI's and perhaps even plugins for other editors like Sublime, Emacs, Vim... etc. |
xi's approach to use Rust as a backend is cool. This could work: the Atom plugin by default uses some JavaScript library, which will be slow without an index. If a Rust service runs as another process in the background, though, it could query that service for results instead. |
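The fallback described above could be sketched roughly like this. Everything here is an assumption for illustration: `searchNotes` is a hypothetical name, `queryBackend` stands for any async function that asks an external service (for example a Rust process) for matching note ids, and the transport (socket, HTTP, stdio) is deliberately left out.

```javascript
// Hypothetical: prefer an external search service, fall back to an
// in-process linear scan when the service is missing or fails.
// `notes` is an array of { id, body } objects already loaded in memory.
async function searchNotes(query, notes, queryBackend) {
  if (queryBackend) {
    try {
      return await queryBackend(query);
    } catch (err) {
      // Service not running or errored: fall through to the slow path.
    }
  }
  const needle = query.toLowerCase();
  return notes
    .filter((note) => note.body.toLowerCase().includes(needle))
    .map((note) => note.id);
}
```

With this shape the plugin keeps working without the background service, just slowly, and gets faster as soon as one is available.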
I like this project's concept very much! Somebody brought it up on our blog and I immediately checked it out because I'm still searching for a Notational Velocity-like app on platforms other than Mac.
That being said, indexing my 3500 notes took 28+ seconds. Opening the file locator takes a bit (the lag is noticeable), but that's okay. Filtering the results is impossibly slow, though.
It seems nvatom doesn't include the document body in the search results of DocQuery. I guess that's the real bottleneck.
I'll have a look at the source myself and see whether the performance can be increased.