Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimal / simple implementation of SQLite-backed history #376

Closed
wants to merge 5 commits into from

Conversation

phiresky
Copy link
Contributor

@phiresky phiresky commented Apr 1, 2022

This PR implements a SQLite-based history. I kept it minimal for easy merging and future expansion.

It should reproduce all features of the txt-backed history (arrow navigation, prefix filtering, history menu etc) and passes all the tests from file_backed. The History trait stayed almost the same so far.

It doesn't really offer any additional features currently except storing an arbitrary context together with each history entry. In the future there should be functionality to filter by session, PWD, hostname, etc, but those will require deeper changes.

when run with --features sqlite, the demo binary will store the history into history.sqlite3 and also add the run timestamp as metadata.

some notes:

  • most of the history methods I think should probably be fallible. i didn't change this since i don't know what kind of error propagation you want - so right now there's more unwrap than should be.
  • the tests for file and sqlite backend should probably be merged (same for every "History" impl). right now i copy-pasted all the tests to the sqlite backed history, but barely modified anything.
  • i think the "cursor" functionality (back, forward, etc) should be separated out into a separate trait, but it works as is
  • i (intentionally) didn't implement the capacity limiting feature
  • the iterator currently loads everything in memory (like the file-based one does), to fix that some of the ownership / lifetimes would need changing
  • right now i json-serialize the context so the library user can decide what to store. this is inefficient in regards to storage, but it's going to be hard to store structured data in a generic manner otherwise. One solution might be to use a binary serialization method that's more efficient, but due to the sqlite json functions it's easy to create efficient indices on the schema as it is right now (e.g. create index on history(context->>'cwd', context->>'timestamp') for directory-based history

@fdncred
Copy link
Collaborator

fdncred commented Apr 2, 2022

@phiresky, First of all, let me say, wow! wow! wow! I was excited to see this PR. Thank you very much.

As you may have heard, if you participate on Discord, that we've been planning this feature for over 7 months now. Your PR kind of justifies our ideas that there would be a demand for such a feature. We have many plans on what I've been calling historydb for some time now. In fact, you'll laugh at this, I had a dream about it last night.

Having thought about this so much, I personally have written two demos/prototypes for such a feature. The most recent, prototype, is here. Without going into a ton of detail, my gut reaction after the wows was to look at my early prototype and your code here and compare and contrast. To be clear, I'm confident that what you have done here is more elegant and coded better than what I wrote - there is no doubt in my mind. I was just hacking things together. But there are a few things we may want to consider updating.

One thing that I'd like to work with you on is storing more information. I don't want to get into a position where we have to start a db with 2 columns, update to 3, update to 4, etc. I'd really like to bypass db migration as much as possible, even though I'm confident we'll have to do it at some point. I'd rather just start out with kind of what we're planning on storing and run with that. If you look at the HistoryItem struct in rsq you can see what we're planning. Some of the columns are debatable, others probably won't be, and still others may be used for secret plans - haha.

Another thing we want to ensure is that we are able to have a Database trait where other datasources can be used like sqlite, redis, others. We've also discussed a connection pool using something like r2d2 although we don't have final decisions.

There are also minor things like sqlite feature flags, initialization things like WAL, transactions, naming the file, etc that are so in the weeds right now that it doesn't really matter, but 'the devil is in the details' so we'll need to get there eventually.

Sorry for the wall of text but like I hope you hear my appreciation for what you've done, and I'm also hoping that you'll be willing to work with us. But I'm not the only voice either. I'm tagging @jntrnr @sholderbach @elferherrera @stormasm @kubouch to jump into the discussion.

@elferherrera
Copy link
Contributor

Thank a lot for helping with this one. As @fdncred pointed out it is a much desired feature.

To complement the previous comment, would you mind moving your implementation to nushell?
The idea for reedline repo is to stay slim, meaning that it has to contain enough for people to use it to build their own projects with it. The DB history doesn't fit the description. The plan for the history is to be in a new create inside nushell crates and to be an implementation for nushell.

If you are willing to continue working with this feature (which I really hope you do), do you mind moving this to nushell? We can work together to make it work like we want it there.

@stormasm
Copy link
Contributor

stormasm commented Apr 2, 2022

@phiresky we really look forward to working with you on getting this concept going... And based on your experience clearly you have some expertise in this area that will complement our skills as a team...

My 2 cents following up to @fdncred and @elferherrera is to possibly (prior to moving the code to nushell / crate) to work with us on a nice design....

Moving the code to a crate is important but we don't want you to do more work than is needed...

One of the reasons for doing this is we are not clear yet on exactly where all of the functionality will lie in terms of the crate / nushell / reedline facets...

As a team we have been hashing out / thinking about / discussing this idea for awhile, with no solid design / or code in place yet except for a nice demo that @fdncred wrote which you should review just to get a sense of some of the concepts we are thinking about....

https://github.com/fdncred/rsq
@fdncred mentions this above in his comments...

I could go on with some more things / but for now I will close...

I have your branch installed locally on my machine and will probably have some more questions about your code. I am in the process of reviewing your code now, and understanding more of what you have written...

Again, many thanks ! for all of your hard work so far... As a team, we clearly look forward to working with you...

@stormasm
Copy link
Contributor

stormasm commented Apr 2, 2022

@phiresky I have your code installed and I am trying to see the history land in the sqlite database...

Here are the steps I am running:

  1. cargo build --features=sqlite
  2. cargo run

Then I type in some commands / however they are landing in the history.txt file...

What other steps do I need to run to see the history land in the db ?

@phiresky
Copy link
Contributor Author

phiresky commented Apr 2, 2022

I was just hacking things together

Me too ;)

storing more information

The way I wrote it is there's a single "context" column that can store whatever information wanted - which in the case of nushell would be e.g. timestamp, hostname, pwd, exit status, runtime.
The advantage here is that migrations shouldn't be needed, since you can just add more (optional) items to your HistoryContext struct. The disadvantage is mainly the increased storage.

The DB history doesn't fit the description

The reason I did it like this was that I imagined that reedline could be used as a library for other REPLs, where you still want a history but probably want a different kind of historical context (e.g. for a PostgreSQL history it might be the database the command ran in instead of the PWD).

If you think it makes more sense to integrate it more tightly with nushell (i.e. move the code @elferherrera) then the considerations change and it might make more sense to use separate DB columns instead (since it doesn't need to be as generic) - but it'll probably need some capability of migrations.

@stormasm you need to use cargo run --features=sqlite, run ignores the feature flags given to build. I can recommend SqliteBrowser and sqlite3 history.sqlite3 .dump to look at the db

I did look at the rsq repo a bit before writing my prototype, but then I kind of ignored it - I have a few comments about that as well, I'll join the discord.

@sholderbach
Copy link
Member

sholderbach commented Apr 2, 2022

I disagree with @elferherrera on the point about where the development of this should take place. And given that there will be changes necessary to the history interface, I favour this to take place in reedline (with the dependency behind the feature flag)

@fdncred
Copy link
Collaborator

fdncred commented Apr 6, 2022

@phiresky would you be willing to update this based on the conversations we've had here and in discord? The biggest changes are just to go away from the json context and to individual columns. Your other proposals sounds reasonable.

@phiresky
Copy link
Contributor Author

phiresky commented Apr 6, 2022

sure, i'll update it / propose a new version

@sholderbach
Copy link
Member

What is your idea for the API to provide the Session/Execution info before or after executing the command?

@phiresky
Copy link
Contributor Author

phiresky commented Apr 6, 2022

In this PR I implemented it with a update function that takes the historical context and returns a new one:

           history
               .update_last_command_context(|mut c| {
                    c.timestamp = SystemTime::now().duration_since(SystemTime::UNIX_EPOCH).unwrap().as_millis() as u64;
                    c
                })

That function is called once immediately before running the command, and then should be run again after the command is done (with exit status etc)

@fdncred
Copy link
Collaborator

fdncred commented Apr 15, 2022

@phiresky how's it going on the new implementation based on our discussions?

@phiresky
Copy link
Contributor Author

Hey! It's going okay. I'll try to get it to a PRable state this weekend. If I can't manage, I'll the current state into a draft pr. I'm not really sure what to do with the existing History trait yet. Maybe I'll remove it and add a HistoryCursor thing that handles the arrow navigation stuff (seems like it could be the same thing for all history databases)

@fdncred
Copy link
Collaborator

fdncred commented Apr 16, 2022

Thanks for the update. Not rushing you, just hadn't heard from you in a while. Good luck!

@sholderbach sholderbach marked this pull request as draft May 1, 2022 22:10
@sholderbach
Copy link
Member

Closing as #401 successfully landed

@sholderbach sholderbach closed this Jun 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants