
CSV Import #418

Closed
wants to merge 41 commits into from

Conversation


@budden budden commented Feb 15, 2019

Hi! I picked up the closed PR #308 for issue #281 and fixed all the issues you mentioned. The only problem is that I'm unable to run make test: make asks for a password several times and then fails. I changed the postgres user's password, but it didn't help; I guess something is wrong with my setup. The application itself works fine. There are still no system tests for CSV import, but I'd like to add them only once I'm sure I can run the test suite. Can you please elaborate on how to set things up?

@sosedoff (Owner)

Are you on OSX? It might have something to do with database credentials. Here's my pg_hba.conf file:

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust

Make sure you have the postgres user created. Other than that, you should be able to run make test without any problems.

@sosedoff sosedoff (Owner) left a comment

Posted a review. I will also try to run this branch locally some time this weekend.

pkg/api/api.go Outdated
req.ParseMultipartForm(0)
table := req.FormValue("table")

field_delimiter, err := ParseFieldDelimiter(req.FormValue("field_delimiter"))
@sosedoff (Owner)

Use camelCase for vars: fieldDelimiter

@budden (Author)

Fixed

pkg/api/api.go Outdated
field_delimiter, err := ParseFieldDelimiter(req.FormValue("field_delimiter"))
if err != nil {
log.Print(err)
c.JSON(400, Error{err.Error()})
@sosedoff (Owner)

Use badRequest(c, err) instead of c.JSON(400, Error{err.Error()}). Same applies to the rest of the error checking.
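For illustration, the error handling from the diff above would become roughly the following (a sketch; badRequest is the project's existing helper named in this comment, its exact behavior is assumed):

fieldDelimiter, err := ParseFieldDelimiter(req.FormValue("field_delimiter"))
if err != nil {
	log.Print(err)
	badRequest(c, err) // replaces c.JSON(400, Error{err.Error()})
	return
}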

@budden (Author)

Fixed

pkg/api/api.go Outdated
createQuery := statements.CreateNewTableQuery(table, header)
insertQuery := statements.GenerateBulkInsertQuery(table, header, len(data))

_, err = db.NewTable(createQuery)
@sosedoff (Owner)

You can add db.CreateTable / db.DropTable functions instead of using db.NewTable, which does not do any specific work; you can use db.query instead.

@budden (Author)

Sorry, I didn't understand this.

@sosedoff (Owner)

Yep, I didn't realize which package I was looking at at the time; db.query is not available in this case.
I suggest you rename the NewTable func into something else, like QueryWithoutHistory, QueryInternal, or Exec, so that we can use it for any queries that don't need to be recorded.
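As a minimal sketch of the suggested rename (assuming the existing NewTable just runs a statement against the connection; the Client and Result types and the unexported query helper are taken on assumption from pgweb's client package):

// Exec runs a statement without recording it in the query history.
// Renamed from NewTable so it can be reused for any internal query.
func (client *Client) Exec(query string) (*Result, error) {
	return client.query(query)
}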

return
}

result, err := db.BulkInsert(insertQuery, statements.Flatten(data))
@sosedoff sosedoff (Owner) Feb 15, 2019

We should clean up that newly created table (and only if it was created during the import) if the insert fails. Alternatively, you can perform the import in a single transaction, so that if something fails the whole import is rolled back and there are no unwanted artifacts.
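A minimal sketch of the single-transaction alternative using database/sql (createQuery and insertQuery follow the diff above; the helper itself is hypothetical):

import "database/sql"

// importInTx runs the whole import in one transaction, so a failure
// at any step rolls back both the CREATE TABLE and the inserts.
func importInTx(db *sql.DB, createQuery, insertQuery string, args []interface{}) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	if _, err := tx.Exec(createQuery); err != nil {
		tx.Rollback()
		return err
	}
	if _, err := tx.Exec(insertQuery, args...); err != nil {
		tx.Rollback()
		return err
	}
	return tx.Commit()
}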

@budden budden (Author) Feb 16, 2019

Alternatively you can perform the import in a single transaction

There is a contradiction here: huge transactions are bad, so if a table is really large, it is impractical to populate the entire table in a single transaction. But if we use multiple transactions, then to clear the table we must explicitly issue an additional "truncate table" or "drop table" statement. And what if the import failed because the SQL server connection was lost? In that case we can't enforce cleanup, so the state of the table after a failure depends on the nature of the failure.

So if it were up to me, I would not do that. If something failed, I would leave cleaning up the table to the user, and simply document that. Failures then become easier to understand: whenever one happens, the user knows they must clean up the table themselves.

@sosedoff (Owner)

I'm fine with your proposal; that approach would be similar to the SQL import (using psql), and in case of an error it's up to the user to decide what to do next.

@budden budden (Author) commented Feb 16, 2019

Are you on OSX?

I'm on Debian Stretch, but I did some manual setup when installing postgres (I've already forgotten the details). I think I'll first reinstall postgres afresh in the official way, and then retry.

@budden budden (Author) commented Feb 16, 2019

Here's my pg_hba.conf

Thanks, now the tests are working. I also had issues with my non-US locale. I tried to work around that with changes like SET lc_messages='C', which helps in some cases. If you like, I can add that commit to this PR, or open it as a separate PR. But I was unable to change psql's (client-side) message language, so in the end I had to change my Linux locale.

So I'll try to fix the rest of your notes. As for cleaning the table, I still suggest not doing it; just say in the import dialog that we are adding records, instead of calling it "import". But if you confirm your decision, I'll implement it.

@budden budden (Author) commented Feb 16, 2019

Haha, I found this one: https://www.postgresql.org/docs/9.6/sql-copy.html
It allows feeding data via stdin and can read the CSV format. So it looks like we should not parse CSV ourselves, but instead pipe the file contents to psql via stdin...
I'll experiment with it.

@budden budden (Author) commented Feb 16, 2019

So something as simple as

cat /y/a.csv | psql -c "\\copy zzz (text,id) from stdin with (format csv, header true)"

does the job. The \copy command is advertised as the counterpart of COPY that one can run from psql without the permissions needed to call COPY. One can also use plain copy, without the backslash; both work.

So one way to go is to simply run psql with the file on stdin: we check that the fields match, generate a field order so that the CSV and the Postgres table line up, direct the file contents to psql's stdin, and spawn psql from pgweb. We could fail, however, on platforms like MS Windows where stdin can sometimes be broken (e.g. failing to support UTF-8), or if there are bugs/limitations around process spawning, so further experiments are required. Can you please give feedback on this approach?
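A sketch of the spawn-psql idea in Go, for discussion only (os/exec; the connection URL, table name, and column list are placeholders):

package main

import (
	"fmt"
	"os"
	"os/exec"
)

func main() {
	// Placeholders: adjust the connection URL, table, and column list.
	copyCmd := `\copy zzz (text,id) from stdin with (format csv, header true)`
	cmd := exec.Command("psql", "postgres://localhost/mydb", "-c", copyCmd)

	f, err := os.Open("/y/a.csv")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	cmd.Stdin = f // psql reads the CSV directly from stdin
	out, err := cmd.CombinedOutput()
	fmt.Printf("%s", out)
	if err != nil {
		panic(err)
	}
}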

@budden budden (Author) commented Feb 16, 2019

Fixed all the issues that I was able to understand and that I believe are real issues. The only thing missing is tests.

pkg/api/api.go Outdated
if utf8.RuneCountInString(fieldDelimiter) != 1 {
return '?', errors.New("field delimiter must be a single character (comma is a standard one, CR and LF and 0xFFFD are not allowed)")
}
delimiter_char, delimiter_char_size := utf8.DecodeRuneInString(fieldDelimiter)
@sosedoff (Owner)

Needs to be camelcase'd

@budden (Author)

Done
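After the fix, the parser might look roughly like this (a sketch; only the names changed relative to the diff above, and the full function body is an assumption built from the visible fragment):

import (
	"errors"
	"unicode/utf8"
)

func ParseFieldDelimiter(fieldDelimiter string) (rune, error) {
	if utf8.RuneCountInString(fieldDelimiter) != 1 {
		return '?', errors.New("field delimiter must be a single character (comma is a standard one, CR and LF and 0xFFFD are not allowed)")
	}
	delimiterChar, _ := utf8.DecodeRuneInString(fieldDelimiter)
	// utf8.RuneError covers the disallowed 0xFFFD replacement character.
	if delimiterChar == utf8.RuneError || delimiterChar == '\r' || delimiterChar == '\n' {
		return '?', errors.New("field delimiter must not be CR, LF, or 0xFFFD")
	}
	return delimiterChar, nil
}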

@@ -52,4 +52,5 @@ func SetupRoutes(router *gin.Engine) {
api.GET("/history", GetHistory)
api.GET("/bookmarks", GetBookmarks)
api.GET("/export", DataExport)
api.POST("/importCSV", DataImportCSV)
@sosedoff (Owner)

Could you reformat to /import/csv?

@budden (Author)

Done
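That is, the route line from the diff above becomes:

api.POST("/import/csv", DataImportCSV)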

@sosedoff sosedoff (Owner)

I posted a few more notes. As for COPY with psql: I'm okay with that approach as long as it works with the SSH tunnel (see the database dump feature).

@budden budden (Author) commented Feb 21, 2019

I seem to have fixed everything. I also added refreshing of the table list and some simple input data validation. The most questionable change is in the CSS files: I removed a fixed alignment for the output pane. Everything seems fine at a glance, though maybe I overlooked something; without that change the import button didn't work.
I didn't add a test for the entire import. I didn't find any substantial API tests, so there is no infrastructure for them, and so far I'm not smart enough to create one quickly. Any hints? As for COPY, I decided not to go into it now: it is a whole other project.

@budden budden (Author) commented Feb 22, 2019

Added an integration test. Maybe it is horrible due to my lack of understanding of pgweb's globals, but I finally made it work :)

@budden budden (Author) commented Feb 24, 2019

I dislike this way of testing, but I don't know how to do it better. What actually happens here is a fork of the main module. After the fork, the application is initialized so that the testing scenario has access both to the web API (it can send requests and read responses) and to the database directly. When testing the web API, the session handling is partly faked, so there is no login / call API / logout sequence. IMO this is messy. What I really want to do here is:

  • create and import the DB
  • run pgweb as an application from the script (with Exec.Command and pipes)
  • login into it
  • call /import/csv api to import data
  • call /query api
  • logout
  • drop db

This way it would be a sort of API testing.
However, there are questions: I don't see how to do that with the built-in testing framework, because I need to build pgweb before running the test.

There are options:

i) make a thin layer between main() and the app functionality, like func main() { Main() }. For testing purposes, run Main in a goroutine and run a test script in the main thread. The test script communicates with Main only via HTTP and, when done, sends a termination signal (see the sketch after this list).

ii) use Docker. I think that's overkill.

iii) write a separate Go application and use a makefile to run it (we would be unable to use go test).
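A minimal sketch of option (i), assuming a hypothetical exported Main() that starts the pgweb HTTP server on localhost:8081 (the port and the /api/info probe are assumptions as well):

package main

import (
	"net/http"
	"testing"
	"time"
)

func TestEndToEnd(t *testing.T) {
	go Main() // hypothetical exported entry point that starts the server

	// Poll until the server answers, then drive the API over HTTP.
	deadline := time.Now().Add(5 * time.Second)
	for {
		resp, err := http.Get("http://localhost:8081/api/info")
		if err == nil {
			resp.Body.Close()
			break
		}
		if time.Now().After(deadline) {
			t.Fatalf("server did not start: %v", err)
		}
		time.Sleep(100 * time.Millisecond)
	}

	// ... POST a CSV to /import/csv, query via /api/query, then terminate ...
}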

Your ideas?

@sosedoff sosedoff (Owner)

Yeah, tests are kind of tricky due to the way pgweb is written (this'll change with the 1.0 version).
I'd be interested in a test in client_test.go that handles the actual CSV import; don't bother too much with testing the API endpoint. If you're not sure how to proceed, do a full squash of all your changes and I can add the tests.

@budden budden (Author) commented Feb 24, 2019

Actually I'm in the process of testing it via the API endpoint, and it seems I'm not far from a prettier test. I personally prefer integration tests because they give more value per line of test code :) I hope to finish that today and maybe will open another, squashed PR.

@budden budden (Author) commented Feb 24, 2019

Well, now I'm satisfied with the result. I didn't squash (too tired today), and realized that there was no need to export initOptions. Also not sure about removing "absolute" from some of the CSS rules.

@budden budden closed this Feb 25, 2019
@budden budden (Author) commented Feb 25, 2019

Superseded by #422
