Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the ability to parse/unparse files #127

Merged
merged 5 commits into from
Dec 10, 2022

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Dec 10, 2022

First of all, thank you again for this library -- it is really awesome.

Rationale

I am trying to implement "test script completion mode" in DataFusion. In order to do so, I would like to be able to update parsed Records with new output and then write them back to a file. Ideally the only changes in the file would be the new results and the original file will have all the comments, whitespace, etc.

DataFusion ticket: apache/datafusion#4570, apache/datafusion#4570 (comment)

Changes

  1. Add impl Display for Record (refactor unparse)
  2. Add tests for roundtrip parsing / unparsing
  3. Add Record::Whitespace so the whitespace in the original files can be reconstructed (found during testing)

Questions

  1. I had a question about whitespace after Record::Statement and RecordQuery which I will note inline
  2. If you accept this change, I would be happy to add more round trip parsing tests for the other example files

Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Copy link
Contributor Author

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sqllogictest/src/parser.rs Show resolved Hide resolved
// the statement parser includes a newline between the items but the display
// output does not, so add it here
// Not sure about this one
format!("{}\n", record)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this to be necessary to recreate the original .slt file.

I think it would more naturally go in the impl Display itself, however I didn't want to mess up any of your existing tests. What do you think @xxchan ?

Here is what the output looks like without this clause:

Click me
running 2 tests
test parser::tests::test_include_glob ... ok
test parser::tests::test_basic ... FAILED

failures:

---- parser::tests::test_basic stdout ----

*********
diff:
*********
statement ok
create table t(v1 int not null, v2 int not null, v3 int not null)
-   
statement ok
insert into t values(1,4,2), (2,3,3), (3,4,4), (4,3,5)

-   
query I
select * from example_basic
----
Alice
Bob
Eve
-   
statement ok
drop table t
-   
# The error message is: Hey you got FakeDBError!

statement error
give me an error
-   
statement error FakeDB
give me an error
-   
statement error (?i)fakedb
give me an error
-   
statement error Hey you
give me an error
-   
# statement is expected to fail with error: "Hey we", but got error: "Hey you got FakeDBError!"
# statement error Hey we
# give me an error

# query error is actually equivalent to statement error
query error Hey you
give me an error
-   
hash-threshold 3

query I
select * from example_basic
----
Alice
Bob
Eve
-   
hash-threshold 1

query I
select * from example_basic
----
3 values hashing to b5b44edac84d34d6af3be2a88bfae352
-   
hash-threshold 0

query I
select * from example_basic
----
Alice
Bob
Eve
-   


*********
output:
*********
statement ok
create table t(v1 int not null, v2 int not null, v3 int not null)
statement ok
insert into t values(1,4,2), (2,3,3), (3,4,4), (4,3,5)

query I
select * from example_basic
----
Alice
Bob
Eve
statement ok
drop table t
# The error message is: Hey you got FakeDBError!

statement error
give me an error
statement error FakeDB
give me an error
statement error (?i)fakedb
give me an error
statement error Hey you
give me an error
# statement is expected to fail with error: "Hey we", but got error: "Hey you got FakeDBError!"
# statement error Hey we
# give me an error

# query error is actually equivalent to statement error
query error Hey you
give me an error
hash-threshold 3

query I
select * from example_basic
----
Alice
Bob
Eve
hash-threshold 1

query I
select * from example_basic
----
3 values hashing to b5b44edac84d34d6af3be2a88bfae352
hash-threshold 0

query I
select * from example_basic
----
Alice
Bob
Eve

Copy link
Member

@xxchan xxchan Dec 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I understand the problem wrongly here.


I am aware of the trailing newline problem. AFAIK usually Display doesn't contain trailing newline so that users can control it. e.g., println won't contain an additional newline. So I decided to format Display without trailing newline, and in sqllogictest-bin, I append newline after a record, and insert newline between records.

if format {
record.unparse(outfile)?;
writeln!(outfile)?;
return Ok(());
}

if !*first_record {
writeln!(outfile)?;
}
update_record(outfile, &mut runner, record, format)
.await
.context(format!("failed to run `{}`", style(filename).bold()))?;
*first_record = false;

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why it doesn't work to add newline to every record here in record_to_string?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it is because newline is consumed when parsing Statement and Query 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I think it may be reasonable to add trailing newline in Display for Query and Statement.. Since empty line is their terminator. Other records doesn't require empty lines in between.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh it is because newline is consumed when parsing Statement and Query 🤔

Yes, I think this is the reason.

Logically a Statement and Query must always be followed by a blank line (that newline isn't the equivalent of println). Much like a multi line comment contains a "\n" within it even though there impl Display

Or put another way, the following would not be parsed correctly.

query II
select a, b from foo
---
1 2
query II
select a, b from bar
---
3 4

Query must always end with a newline (other than the last one in the file):

query II
select a, b from foo
---
1 2

select a, b from bar
---
3 4

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why it doesn't work to add newline to every record here in record_to_string?

You if you do that, then there are extra newlines between every record (as I am already using writeln)

// the statement parser includes a newline between the items but the display
// output does not, so add it here
// Not sure about this one
format!("{}\n", record)
Copy link
Member

@xxchan xxchan Dec 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update: I understand the problem wrongly here.


I am aware of the trailing newline problem. AFAIK usually Display doesn't contain trailing newline so that users can control it. e.g., println won't contain an additional newline. So I decided to format Display without trailing newline, and in sqllogictest-bin, I append newline after a record, and insert newline between records.

if format {
record.unparse(outfile)?;
writeln!(outfile)?;
return Ok(());
}

if !*first_record {
writeln!(outfile)?;
}
update_record(outfile, &mut runner, record, format)
.await
.context(format!("failed to run `{}`", style(filename).bold()))?;
*first_record = false;

Comment on lines +632 to +633
/// Parses the specified file into Records, and ensures the
/// results of unparsing them are the same
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess current implementation is still not enough to ensure unparse(parse()) is identity,
because spaces can be anywhere in the records, not only between records.

e.g.,

    # aa
  query    II
...

Currently it will be unparsed to

# aa
query II

But do we really need identical reconstruction (It can be hard)? If not, maybe we can just test parse(unparse) is the same record?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But preserving newlines makes some sense.

Copy link
Contributor Author

@alamb alamb Dec 10, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But do we really need identical reconstruction (It can be hard)? If not, maybe we can just test parse(unparse) is the same record?

I don't really need exact reconstruction -- what I really want is to be able to "update" a file in place with expected output. If part of that update also ends up "normalizing" some of the commands (like collapsing query III query III) that would be fine with me

However I think removing all blank newlines makes the files harder to read so I would prefer to keep them in

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, maybe we can just test parse(unparse) is the same record?

Testing parse/unparse is a great idea -- I will add that

// the statement parser includes a newline between the items but the display
// output does not, so add it here
// Not sure about this one
format!("{}\n", record)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW why it doesn't work to add newline to every record here in record_to_string?

@@ -132,6 +132,7 @@ pub enum Record {
},
Condition(Condition),
Comment(Vec<String>),
Whitespace(String),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the comment below, what about renaming it to Newline(usize)?

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To summarize, I think we can:

  • Add trailing newline to Display for Statement and Query
  • Rename Whitespace to Newline
  • Test parse(unparse(parse(..))) instead of unparse(parse(..))

Others LGTM. Thanks a lot!

@alamb
Copy link
Contributor Author

alamb commented Dec 10, 2022

I tried to add a test that Records from the parsed result was the same as the original Records (code below for my own reference).

However, it turns out that Record doesn't implement PartialEq (for comparison) and when I tried to derive it failed as regex can't be derived. If you think it would be a valuable test, I can write when this PR is merged.

+
+        // The orignal and parsed records should be logically requivalent
+        let output_file = tempfile::NamedTempFile::new()
+            .expect("Error creating tempfile");
+        output_file.write_all(output_contents.as_bytes())
+            .expect("Unable to write file");
+
+        let output_path = output_file.into_temp_path();
+        let reparsed_records = parse(&output_path.to_string_lossy());
+        assert_eq!(records, reparsed_records);
+

Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
Signed-off-by: Andrew Lamb <andrew@nerdnetworks.org>
@alamb
Copy link
Contributor Author

alamb commented Dec 10, 2022

Thank you @xxchan !

To summarize, I think we can:

  1. Add trailing newline to Display for Statement and Query

Done

Rename Whitespace to Newline

Done

Test parse(unparse(parse(..))) instead of unparse(parse(..))

I think there is value to testing unparse(parse()) as I described in #127 (comment). However, I will remove this test if you disagree.

I would like to add the test for parse(unparse(parse())) as a subsequent PR if that is OK with you to keep this PR smaller as explained in #127 (comment)

Copy link
Member

@xxchan xxchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@alamb
Copy link
Contributor Author

alamb commented Dec 10, 2022

Here is the proposed mechanism for testing parse(unparse(parse())) is identity: #129

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants