New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Visual optimisation for the CSV import process #1072

Open
justinclift opened this Issue Jul 31, 2017 · 24 comments

Comments

Projects
None yet
3 participants
@justinclift
Copy link
Member

justinclift commented Jul 31, 2017

[a follow up to #1002] 馃槃

With @iKlsR's recent improvement for the CSV process, I was wondering how easy would it be to have each entry be removed from the list as it's import completes.

That way people could see the list shrink visibly onscreen, and the left over list would just still be there in the window when all the selected ones are done, ready for the user to choose new options and go again.

@iKlsR mentioned it seems doable, so creating this issue as a placeholder. 馃槃

I'm opening this issue because:

  • DB4S needs a feature

I'm using DB4S version:

  • 3.10.0-beta1

MKleusberg added a commit that referenced this issue Sep 19, 2017

Visual optimisation for the CSV import process
When importing multiple CSV files at once, remove each entry from the
list of CSV files as its import completes. This way people can see the
list shrink visibly onscreen.

Also don't close the window if there are still files left to be
imported. This allows the user to import unchecked files, too, probably
using different settings.

See issue #1072.
@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 19, 2017

Done 馃槃 @justinclift Can you give it a quick test?

Good idea by the way 馃槃

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 19, 2017

@MKleusberg Thanks. Yep, will do shortly. Looking into another issue first though. 馃槃

Related this to, any interest in #1121? That one's really annoying (as an end user), with the only workaround being to use the command line sqlite3 util instead. Guessing it'll be a simple fix too. 馃槃

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 19, 2017

Got around to trying this before anything else after all. 馃槈

In testing so far, the visual list shrinking concept does seem like a good approach.

Some important problems cropped up in testing:

  • If "Separate tables" isn't enabled, the prompt for There is already a table of that name. Do you want to import the data into it? appears every time. Oops. 馃槈
    • Interestingly, even when answering No to that, the entry for the csv file still disappears from the list. It probably shouldn't, instead being left there (maybe getting unticked?) so the user can come back to it.
  • Massive slowdown as the imports progressed. The first few CSV files import quickly (subsecond), with each subsequent import getting slower and slower. After about 30 or so it's noticeably slower, and keeps on getting slower. Sometimes DB4S would crash after 30-40 imports too. Not consistently though.
  • Cancelling out of importing crashes DB4S (every time)
  • Clicking on checkboxes for the csv entries changes the "Table name" field even if the "Separate tables" checkbox is disabled and I've already set a Table name.
    • This one caught me out a few times, as the "Table name" field is the first thing in the UI so I set that first... then wonder why everything is importing into a wrong table name when doing the import. Only to realise it's because I subsequently deselected a few entries from the list (which auto changed the table name to save into)

As a side thing, it'd be a useful UI tweak in the dialog to have that divider between the csv file list and the preview be resizable by the user. For example, when the window is fairly short the preview only takes up a few lines:

db4s_import_csv_short

However when the window is taller, the preview takes up quite a few lines. Being able to drag a divider to show more of the CSV list and less of the preview would be nifty:

db4s_import_csv_long

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 19, 2017

Ran through this several more times, and found some info.

  • It's not the CSV import itself which slows down. It's that the time between subsequent files being imported gets longer and longer.
    • eg When DB4S is doing the post-import stuff such as adding the entry to the DB structure tab (etc). this gets a lot slower, pretty quickly. Maybe don't run that at all until the csv importing is done?
  • The crash during imports is easily (and totally consistently) trigger-able by pressing a key while the import dialog is showing up. Doesn't seem to trigger for the first 10-15 progress dialogs, but pressing a key any time after that will 100% kill it. (event storm problem?)
    • This might be the crash that I was seeing as "when cancelling the import" too, as I was doing that by pressing the Esc key (many times) to try and break out of the progress dialog loop.

On the plus side, memory use is keeping consistent. So that recent change is definitely working well. 馃槃

@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 19, 2017

Cool, thanks for testing 馃槃 I'll look into these issues over the next couple of days.

@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 27, 2017

  • If "Separate tables" isn't enabled, the prompt for There is already a table of that name. Do you want to import the data into it? appears every time. Oops. 馃槈
    • Interestingly, even when answering No to that, the entry for the csv file still disappears from the list. It probably shouldn't, instead being left there (maybe getting unticked?) so the user can come back to it.

This should be fixed now. See issue #1121. Clicking 'No' still removes the entry from the list which I think is okay now that we have a Cancel button which doesn't do that. So 'No' is like answering the question in the message box and then treating the file as done, and 'Cancel' is like not even answering the question in the message box and stopping the process, if that makes sense 馃槈

@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 27, 2017

  • Massive slowdown as the imports progressed. The first few CSV files import quickly (subsecond), with each subsequent import getting slower and slower. After about 30 or so it's noticeably slower, and keeps on getting slower. Sometimes DB4S would crash after 30-40 imports too. Not consistently though.

I can't seem to reproduce this. No matter whether I import into separate tables or not it just works, no noticeable slowdown or crash. I guess this might have been fixed by 28446a4 but I'm not sure, especially because that commit is two days older than your comment. But it might be worth checking again, with and without that commit, just to be sure 馃槃

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 27, 2017

Cool. I'll try it again now with separate tables, and see what happens.

@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 27, 2017

Can you try to reproduce that crash which happens when pressing buttons during the import when a debugger is attached? It's not happening for me either 馃槮

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 27, 2017

Ahhh, good thinking. Yeah, I'll try that too.

With the slow down, it's definitely happening still. Did you try with this?

Maybe there's something special/weird with the Backblaze data set which DB4S doesn't like? Not seeing how though.

Hmmm, I can try on my OSX desktop later on tonight, and if the problem happens there too I'll make a screen recording of the capture process so you can see the slow down in action, in case that helps with figuring it out. 馃槃 Useful to try?

@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 27, 2017

Yes, already tried these files. I haven't imported them all yet, only 100 or so but from what you write, that should have been enough. The screen capture might turn out useful, so if it's no bother to you, please do so 馃槃 You can also try running in a debugger and breaking the execution while it's in between parsing two files and taking too much time. Would be interesting to see in what part of the code we're in at this time.

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 27, 2017

Will do. 馃槃

MKleusberg added a commit that referenced this issue Sep 27, 2017

Add splitter to the Import CSV dialog
This adds a splitter between the CSV file list and the file preview in
the Import CSV dialog.

See issue #1072.
@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 27, 2017

As a side thing, it'd be a useful UI tweak in the dialog to have that divider between the csv file list and the preview be resizable by the user. For example, when the window is fairly short the preview only takes up a few lines:

Just pushed a commit for that 馃槂

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 27, 2017

Yep, that's perfect now too. Excellent. 馃榾

import csv splitter working

MKleusberg added a commit that referenced this issue Sep 29, 2017

@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 29, 2017

  • Clicking on checkboxes for the csv entries changes the "Table name" field even if the "Separate tables" checkbox is disabled and I've already set a Table name.
    • This one caught me out a few times, as the "Table name" field is the first thing in the UI so I set that first... then wonder why everything is importing into a wrong table name when doing the import. Only to realise it's because I subsequently deselected a few entries from the list (which auto changed the table name to save into)

I've just pushed a commit that slightly improves the situation here. Clicking the checkboxes doesn't reset the table name anymore now. However, clicking on the row and thus selecting it, still does. I might change that later if needed 馃槃

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 29, 2017

Cool. 馃槃

Just tried it... and it's better. Still seems really easy to click on the row instead of the checkbox though, even when being careful (just had that happen 3 times in a row, and yep, was trying not to click on the wrong bit each time).

Either way though, this is probably good enough for now unless you're feeling motivated to keep at it. 馃槃

@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 30, 2017

Meh, I think we could leave it as it is and change it if somebody complains 馃槈

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 30, 2017

Works for me. 馃槃

@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 30, 2017

Cool, closing this issue then 馃槃

If anybody sees this and gets really mad at us for closing this issue, they're always welcome to reopen it 馃槈

@MKleusberg MKleusberg closed this Sep 30, 2017

@MKleusberg MKleusberg reopened this Sep 30, 2017

@MKleusberg

This comment has been minimized.

Copy link
Member

MKleusberg commented Sep 30, 2017

Oops, forgot about all the other problem 馃槅

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 30, 2017

Interestingly, although on OSX the slowdown as subsequent CSV tables are imported definitely still happens... visually it's showing up differently.

On OSX, the point where the delay seems to be increasing is when the import CSV progress dialog is still showing, but at or near 100% completion (for each individual dialog).

With the Linux (Fedora 25, Xfce), the increasing delay was in between each progress dialog appearing.

I'll attempt to get a screen recording of this happening next... 馃槃

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 30, 2017

Oh... the CSV import completely locked up on the 2013-08-23 CSV table. (I selected the complete 2013 data set, ordered by date from earliest to latest)

Looking at the memory use in top, it's nothing strange. Hovering at ~165MB, which seems right now we're doing streaming imports, so it's not that.

Will get a recording happening shortly.

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 30, 2017

These recordings are a bit large in file size (~100MB), but anything smaller and the recording software (ancient) just made them blurry.

This one shows an import of the 2013 Backblaze CSV's with "separate tables" enabled. It slows down over time then completely locks up on the 2013-08-23 table:

This one is the same files being imported, but all going into one big table. It completes ok, and doesn't seem to have any increasing slowdown over time. There are a few pauses occasionally (disk flushing maybe?), but that's about it:

Any ideas? 馃槃

@justinclift

This comment has been minimized.

Copy link
Member Author

justinclift commented Sep 30, 2017

As a data point, the crash-when-pressing-a-key-during-multi-file-CSV-import doesn't seem to happen on OSX. I'll try it again later back on Linux, with a debugger attached.

@chrisjlocke chrisjlocke added the import label Feb 12, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment