Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: parser handling comments and empty lines #446

Merged
merged 14 commits into from
Jan 26, 2023
Merged

Conversation

mfridman
Copy link
Collaborator

@mfridman mfridman commented Jan 20, 2023

Fix #226, fix #198. Preserve #200 and #372

There are two ways we can approach this:

  1. Require users to enclose such statements with -- +goose StatementBegin and -- +goose StatementEnd. And we guarantee the statement will have comments preserved.
  2. Always preserve leading comments ^-- within a statement before a delimiter (;) is found.

@mfridman mfridman changed the title fix: parser with leading commnets fix: parser with leading comments Jan 21, 2023
@@ -11,6 +11,10 @@ BEGIN
v_url ALIAS FOR $3;
BEGIN '';

--
Copy link
Collaborator Author

@mfridman mfridman Jan 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes the issue reported here: #198 (comment)

Once we're in the middle of a statement, we preserve comments.

Furthermore, we preserve whitespace which was reported here: #200 (comment)

And initially fixed here: #372 (I've captured this test)

@@ -16,4 +16,27 @@ L48zPl/CFvA+KJJ6LklxfwWeVDQ+ve2OIW0B1uLhR/MsoYbDQztbgIayg6ieMO/KlQIDAQAB
');
-- +goose StatementEnd

INSERT INTO ssh_keys (id, "publicKey") VALUES (2, '-----BEGIN OPENSSH PRIVATE KEY-----
Copy link
Collaborator Author

@mfridman mfridman Jan 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This statement is not wrapped with -- +goose Statement, and we validate that the final line (below) is correctly included in parsed statement.

-----END OPENSSH PRIVATE KEY-----

var buf bytes.Buffer
scanBuf := bufferPool.Get().([]byte)
defer bufferPool.Put(scanBuf)
scanBufPtr := bufferPool.Get().(*[]byte)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fixes an issue staticcheck found:

SA6002: argument should be pointer-like to avoid allocations (staticcheck)

Nice explanation from the docs.

A sync.Pool is used to avoid unnecessary allocations and reduce the amount of work the garbage collector has to do.

When passing a value that is not a pointer to a function that accepts an interface, the value needs to be placed on the heap, which means an additional allocation. Slices are a common thing to put in sync.Pools, and they’re structs with 3 fields (length, capacity, and a pointer to an array). In order to avoid the extra allocation, one should store a pointer to the slice instead.

@@ -161,7 +161,7 @@ func ParseSQLMigration(r io.Reader, direction bool) (stmts []string, useTx bool,
continue
}
default:
return nil, false, fmt.Errorf("failed to parse migration: unexpected state %q on line %q, see https://github.com/pressly/goose#sql-migrations", stateMachine, line)
return nil, false, fmt.Errorf("failed to parse migration: unexpected state %d on line %q, see https://github.com/pressly/goose#sql-migrations", stateMachine, line)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The %q verb prints it out as hex, changing to %d so it's the int value.

- unexpected state '\x00' on line ..
+ unexpected state '0' on line

@mfridman mfridman added this to the v3.9.0 milestone Jan 22, 2023
@mfridman mfridman changed the title fix: parser with leading comments fix: parser handling comments and empty lines Jan 23, 2023
END;
$$ LANGUAGE plpgsql;

-- 3 this comment will be preserved
Copy link
Collaborator Author

@mfridman mfridman Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In isolation, it looks like we should not be including these comments, but I think it's defensible to say this is a user error and they should avoid putting trailing comments after the statement but before the final -- +goose StatementEnd annotation.

-- +goose StatementBegin
[statement ...];
-- 3 this comment will be preserved
  -- 4 this comment will be preserved


-- +goose StatementEnd

Copy link
Collaborator Author

@mfridman mfridman Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do a second pass over the complex statement and trim any empty lines/comments after the final semicolon. I'll keep this in mind, likely in a follow-up PR.

Copy link
Collaborator Author

@mfridman mfridman Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added logic in 98ca499 that does a final cleanup of the statement. Effectively this:

func cleanupStatement(input string) string {
	if n := strings.LastIndex(input, ";"); n > 0 {
		return input[:n+1]
	}
	return input
}

So that this input:

-- +goose StatementBegin




-- 1 this comment will NOT be preserved
  -- 2 this comment will NOT be preserved


CREATE FUNCTION do_something(sql TEXT) RETURNS INTEGER AS $$
BEGIN
  -- initiate technology
  PERFORM something_or_other(sql);

  -- increase technology
  PERFORM some_other_thing(sql);

  -- technology was successful
  RETURN 1;
END;
$$ LANGUAGE plpgsql;

-- 3 this comment will NOT be preserved
  -- 4 this comment will NOT be preserved


-- +goose StatementEnd

Will end up as the following statement:

CREATE FUNCTION do_something(sql TEXT) RETURNS INTEGER AS $$
BEGIN
  -- initiate technology
  PERFORM something_or_other(sql);

  -- increase technology
  PERFORM some_other_thing(sql);

  -- technology was successful
  RETURN 1;
END;
$$ LANGUAGE plpgsql;

@mfridman
Copy link
Collaborator Author

Alright @VojtechVitek , pretty happy with how this ended up. It's a lot easier to test locally and the logic fixes some edge cases with the parser around leading/trailing empty lines and comments, as well as comments within statements that should be preserved.

Copy link
Collaborator

@VojtechVitek VojtechVitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

that was a tricky one! nice work 🎉

@mfridman mfridman merged commit bf26560 into master Jan 26, 2023
@mfridman mfridman deleted the gh-226-parser branch January 26, 2023 14:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Multi-line SQL data containing lines with leading --'s is corrupted. not comments added
2 participants