CSV Import #418
Conversation
Are you on OSX? It might have something to do with database credentials. Here's my
Make sure you have
Posted a review. I will also try to run this branch locally some time this weekend.
pkg/api/api.go (outdated):

    req.ParseMultipartForm(0)
    table := req.FormValue("table")

    field_delimiter, err := ParseFieldDelimiter(req.FormValue("field_delimiter"))
Use camelCase for vars: fieldDelimiter
Fixed
pkg/api/api.go (outdated):

    field_delimiter, err := ParseFieldDelimiter(req.FormValue("field_delimiter"))
    if err != nil {
        log.Print(err)
        c.JSON(400, Error{err.Error()})
Use badRequest(c, err) instead of c.JSON(400, Error{err.Error()}). The same applies to the rest of the error checking.
Fixed
pkg/api/api.go (outdated):

    createQuery := statements.CreateNewTableQuery(table, header)
    insertQuery := statements.GenerateBulkInsertQuery(table, header, len(data))

    _, err = db.NewTable(createQuery)
You could add db.CreateTable / db.DropTable functions instead of using db.NewTable, which does not do any specific work; you can use db.query instead.
Sorry, I didn't understand this.
Yep, I didn't realize which package I was looking at at the time; db.query is not available in this case.
I suggest you rename the NewTable func into something else, like QueryWithoutHistory, QueryInternal, or Exec, so that we can use it for any queries that don't need to be recorded.
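The split being proposed here is essentially "recorded user queries" vs "silent internal queries". A toy sketch of that distinction, with made-up types that only illustrate the idea (pgweb's actual client and signatures differ):

```go
package main

import "fmt"

// client is a toy stand-in for pgweb's database client: Query records
// the statement in history, while Exec (the proposed rename of
// NewTable) runs a statement without recording it.
type client struct {
	history []string
}

// Query is for user-issued statements, which should show up in history.
func (c *client) Query(q string) {
	c.history = append(c.history, q)
	// ... run q against the database ...
}

// Exec is for internal statements (e.g. CREATE TABLE during a CSV
// import) that should not pollute the user's query history.
func (c *client) Exec(q string) {
	// ... run q against the database, no history entry ...
}

func main() {
	c := &client{}
	c.Query("SELECT * FROM users")
	c.Exec("CREATE TABLE import_target (id int)")
	fmt.Println(len(c.history)) // only the user query was recorded
}
```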
    return
    }

    result, err := db.BulkInsert(insertQuery, statements.Flatten(data))
We should clean up the newly created table (and only if it was created during the import) if the insert fails. Alternatively, you can perform the import in a single transaction, so that if something fails the whole import is rolled back and there are no unwanted artifacts.
> Alternatively, you can perform the import in a single transaction

There is a contradiction: huge transactions are bad, so if a table is really large it is impractical to populate the whole thing in a single transaction. But if we use multiple transactions, then to clear the table we must issue an explicit TRUNCATE TABLE or DROP TABLE statement. And what if the import fails because the SQL server connection was lost? In that case we can't enforce cleanup, so the state of the table after a failure depends on the nature of the failure.
So if it were up to me, I would not do that. If something fails, I would leave cleaning the table up to the user, and just document that. Failures then become simpler to understand: whenever a failure happens, the user knows they must clean up the table themselves.
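For reference, the single-transaction alternative under debate amounts to rollback-on-failure. A toy sketch with a fake transaction (function and type names are illustrative, not pgweb's API; a real version would use *sql.Tx from database/sql):

```go
package main

import (
	"errors"
	"fmt"
)

// txLike abstracts the subset of a SQL transaction used here, so the
// rollback-on-failure logic can be shown without a live database.
type txLike interface {
	Exec(query string) error
	Commit() error
	Rollback() error
}

// importInTx runs the create and insert queries in one transaction:
// any failure triggers a rollback, so no half-imported table remains.
func importInTx(tx txLike, createQuery, insertQuery string) error {
	if err := tx.Exec(createQuery); err != nil {
		tx.Rollback()
		return err
	}
	if err := tx.Exec(insertQuery); err != nil {
		tx.Rollback() // the CREATE TABLE is undone too
		return err
	}
	return tx.Commit()
}

// fakeTx records what happened; Exec fails on the "INSERT" query.
type fakeTx struct{ rolledBack, committed bool }

func (t *fakeTx) Exec(q string) error {
	if q == "INSERT" {
		return errors.New("insert failed")
	}
	return nil
}
func (t *fakeTx) Commit() error   { t.committed = true; return nil }
func (t *fakeTx) Rollback() error { t.rolledBack = true; return nil }

func main() {
	tx := &fakeTx{}
	err := importInTx(tx, "CREATE", "INSERT")
	fmt.Println(err != nil, tx.rolledBack, tx.committed)
}
```

As the comment above notes, this only helps when the failure happens while the connection is alive; a dropped connection mid-import still leaves cleanup to the server's own transaction handling.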
I'm fine with your proposal; that approach would be similar to the SQL import (using psql), and in case of an error it's up to the user to decide what to do next.
I'm on Debian Stretch, but I did some manual setup when installing Postgres (I've forgotten the details already). I think first of all I'll reinstall Postgres afresh in the official way, and then retry.
Thanks, now the tests are working. I also had issues with my non-US locale; I tried to work around that with changes like SET lc_messages='C', which helps in some cases. If you like, I can add that commit to this PR, or open a separate PR. But I was unable to change psql's (client-side) messages language. So I'll try to fix the rest of your notes. As for cleaning the table, I still suggest not doing it; just say in the import that we are adding records, instead of calling it an "import". But if you confirm your decision, I'll implement it.
Haha, I found this one: https://www.postgresql.org/docs/9.6/sql-copy.html
So something as simple as COPY FROM STDIN could work. One way to go is to simply run psql reading from stdin: we check that the fields match, generate a field order so that the CSV and the Postgres table agree, direct the file contents to psql's stdin, and spawn psql from pgweb. We could fail, however, on awkward platforms like MS Windows, where stdin can sometimes be broken (e.g. failing to support UTF-8), or if there are bugs/limitations related to spawning, so further experiments are required. Can you please give feedback on this approach?
Fixed all the issues that I was able to understand and that I believe are really issues. The only thing still missing is tests.
pkg/api/api.go (outdated):

    if utf8.RuneCountInString(fieldDelimiter) != 1 {
        return '?', errors.New("field delimiter must be a single character (comma is a standard one, CR and LF and 0xFFFD are not allowed)")
    }
    delimiter_char, delimiter_char_size := utf8.DecodeRuneInString(fieldDelimiter)
Needs to be camelCased.
Done
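For reference, the fixed-up single-rune validation might read roughly like this. It is a sketch reconstructed from the diff above with the requested camelCase naming; the actual function in the PR may differ:

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// parseFieldDelimiter validates that the user-supplied delimiter is
// exactly one rune, and rejects CR, LF, and the replacement rune
// 0xFFFD (which utf8 decoding yields for invalid byte sequences).
func parseFieldDelimiter(s string) (rune, error) {
	if utf8.RuneCountInString(s) != 1 {
		return '?', errors.New("field delimiter must be a single character (comma is a standard one)")
	}
	delimiterChar, _ := utf8.DecodeRuneInString(s)
	if delimiterChar == '\r' || delimiterChar == '\n' || delimiterChar == utf8.RuneError {
		return '?', errors.New("CR, LF and 0xFFFD are not allowed as field delimiters")
	}
	return delimiterChar, nil
}

func main() {
	d, err := parseFieldDelimiter(",")
	fmt.Println(string(d), err)
	_, err = parseFieldDelimiter("ab")
	fmt.Println(err != nil)
}
```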
pkg/api/routes.go (outdated):

    @@ -52,4 +52,5 @@ func SetupRoutes(router *gin.Engine) {
        api.GET("/history", GetHistory)
        api.GET("/bookmarks", GetBookmarks)
        api.GET("/export", DataExport)
        api.POST("/importCSV", DataImportCSV)
Could you reformat it to /import/csv?
Done
I posted a few more notes. As for
Seems I fixed everything. I also added refreshing of the table list and some simple input-data validation. The most questionable part is the change in the CSS files: I removed a fixed alignment for the output pane. Everything seems OK at a glance, but maybe I overlooked something. Without that change the import button didn't work.
Added an integration test. Maybe it is horrible due to my lack of understanding of pgweb's globals, but I finally made it work :)
I dislike this way of testing, but I don't know how to do it better. What is actually done here is a fork of the main module. After the fork, the application is initialized so that the testing scenario has access both to the web API (it can send requests and get responses) and to the database directly. When testing the web API, the session stuff is sort of faked, so that there is no login / call API / logout sequence. IMO this is messy. What I really want to do here is:
This way it is a sort of API testing. There are options:
i) make a thin layer between main() and the app functionality, like
ii) use Docker; I think that's overkill
iii) write a separate Go application and use a makefile to run it (we would be unable to use
Your ideas?
Yeah, tests are kind of tricky due to the way pgweb is written (this will change with the 1.0 version).
Actually, I'm in the process of testing it via the API endpoint, and it seems I'm not very far from a prettier test. I personally prefer integration tests because they give more value per line of test code :) I hope to finish that today and maybe will do another squashed PR.
Well, now I'm satisfied with the result. I didn't squash (too tired today), and realized there was no need to export initOptions. Also not sure about removing "absolute" from some s.
Superseded by #422
Hi! I picked up a closed PR #308 for issue #281 and fixed all the issues you mentioned. The only problem is that I'm unable to do make test: make asks for a password several times and then fails. I changed the postgres user password, but it didn't help. I guess something is wrong with my setup. The application itself works fine. There are still no system tests for the CSV import, but I'd like to add them only once I'm sure I can run the test suite. Can you please elaborate on how to set things up?