Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change the Ruby code generator to emit a serialized proto instead of the DSL #12319

Closed
wants to merge 13 commits into from

Conversation

haberman
Copy link
Member

This PR removes the DSL from the code generator, in anticipation of splitting the DSL out into a separate package.

Given a .proto file like:

syntax = "proto3";

package pkg;

message TestMessage {
  optional int32 i32 = 1;
  optional TestMessage msg = 2;
}

Generated code before:

# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: protoc_explorer/main.proto

require 'google/protobuf'

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("test.proto", :syntax => :proto3) do
    add_message "pkg.TestMessage" do
      proto3_optional :i32, :int32, 1
      proto3_optional :msg, :message, 2, "pkg.TestMessage"
    end
  end
end

module Pkg
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end

Generated code after:

# frozen_string_literal: true
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: test.proto

require 'google/protobuf'
  
descriptor_data = "\n\ntest.proto\x12\x03pkg\"S\n\x0bTestMessage\x12\x10\n\x03i32\x18\x01 \x01(\x05H\x00\x88\x01\x01\x12\"\n\x03msg\x18\x02 \x01(\x0b\x32\x10.pkg.TestMessageH\x01\x88\x01\x01\x42\x06\n\x04_i32B\x06\n\x04_msgb\x06proto3"
begin
  Google::Protobuf::DescriptorPool.generated_pool.add_serialized_file(descriptor_data)
rescue TypeError => e
  # <compatibility code, see below>
end

module Pkg
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end

This change fixes nearly all remaining conformance problems that existed previously. This is a side effect of moving from the DSL (which is lossy) to a serialized descriptor (which preserves all information).

Backward Compatibility

This change should be 100% compatible with Ruby Protobuf >= 3.18.0, released in Sept 2021. Additionally, it should be compatible with all existing users and deployments. However there is some special compatibility code I inserted to achieve this level of backward compatibility.

Without the compatibility code, there is an edge case that could break backward compatibility. The existing code is lax in a way that the new code would be more strict.

When we use a full serialized descriptor, it will contain a list of all .proto files imported by this file (whereas the DSL never added dependencies properly):

// Names of files imported by this file.
repeated string dependency = 3;

add_serialized_file will verify that all dependencies listed in the descriptor were previously added with add_serialized_file. Generally that should be fine, because the generated code will contain Ruby require statements for all dependencies, and the descriptor will fail to load anyway if the types we depend on were not previously defined in the DescriptorPool.

But there is a potential for problems if there are ambiguities around file paths. For example, consider the following scenario:

// foo/bar.proto

syntax = "proto2";

message Bar {}
// foo/baz.proto

syntax = "proto2";

import "bar.proto";

message Baz {
  optional Bar bar = 1;
}

If you invoke protoc like so, it will work correctly:

$ protoc --ruby_out=. -Ifoo foo/bar.proto foo/baz.proto
$ RUBYLIB=. ruby baz_pb.rb

However if you invoke protoc like so, and didn't have any compatibility code, it would fail to load:

$ protoc --ruby_out=. -I. -Ifoo foo/baz.proto
$ protoc --ruby_out=. -I. -Ifoo foo/bar.proto
$ RUBYLIB=foo ruby foo/baz_pb.rb
foo/baz_pb.rb:10:in `add_serialized_file': Unable to build file to DescriptorPool: Depends on file 'bar.proto', but it has not been loaded (Google::Protobuf::TypeError)
	from foo/baz_pb.rb:10:in `<main>'

The problem is that bar.proto is being referred to by two different canonical names: bar.proto and foo/bar.proto. This is a user error: each import should always be referred to by a consistent full path. Hopefully user errors of this sort are rare, but it is hard to know without trying.

The code in this PR prints a warning using warn if we detect that this edge case has occurred. We will plan to remove this compatibility code in the next major version.

@haberman haberman requested review from a team as code owners March 23, 2023 00:42
@haberman haberman requested review from acozzette and esorot and removed request for a team March 23, 2023 00:42
@haberman haberman added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Mar 23, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Mar 23, 2023
Copy link
Contributor

@dazuma dazuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks reasonable in general. (Thank you in particular for using an escaped string, instead of __END__ -- that's been a headache for the current descriptor_pb.rb.)

Are there golden outputs where we can actually see what this looks like (including the compatibility code)?

@dazuma
Copy link
Contributor

dazuma commented Apr 6, 2023

@haberman What's the next step on this?

@haberman
Copy link
Member Author

I'm going to try to improve the error message for the case that the fallback code kicks in.

@haberman
Copy link
Member Author

I improved the error message when the compatibility code is triggered.

The message now looks something like this:

Warning: Protobuf detected import path issue while loading generated file ruby/tests/basic_test_pb.rb
- tests/basic_test.proto imports test_import_proto2.proto, but that file was compiled using the name tests/test_import_proto2.proto
Each proto file must use a consistent fully-qualified name.
This will become an error in the next version.

@haberman haberman added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Apr 12, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Apr 12, 2023
@haberman haberman added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Apr 12, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Apr 12, 2023
@haberman haberman added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Apr 12, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Apr 12, 2023
@haberman haberman added the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Apr 12, 2023
@github-actions github-actions bot removed the 🅰️ safe for tests Mark a commit as safe to run presubmits over label Apr 12, 2023
Copy link
Contributor

@dazuma dazuma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general. Couple minor nits on the warning message.

end
end
warn "Each proto file must use a consistent fully-qualified name."
warn "This will become an error in the next version."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: "next major version"

parsed.clear_dependency
serialized = parsed.class.encode(parsed)
file = pool.add_serialized_file(serialized)
warn "Warning: Protobuf detected import path issue while loading generated file #{__FILE__}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grammar nit: "an import path issue" or "import path issues"?

copybara-service bot pushed a commit that referenced this pull request Apr 13, 2023
…the DSL (#12319)

This PR removes the DSL from the code generator, in anticipation of splitting the DSL out into a separate package.

Given a .proto file like:

```proto
syntax = "proto3";

package pkg;

message TestMessage {
  optional int32 i32 = 1;
  optional TestMessage msg = 2;
}
```

Generated code before:

```ruby
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: protoc_explorer/main.proto

require 'google/protobuf'

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("test.proto", :syntax => :proto3) do
    add_message "pkg.TestMessage" do
      proto3_optional :i32, :int32, 1
      proto3_optional :msg, :message, 2, "pkg.TestMessage"
    end
  end
end

module Pkg
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end
```

Generated code after:

```ruby
# frozen_string_literal: true
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: test.proto

require 'google/protobuf'

descriptor_data = "\n\ntest.proto\x12\x03pkg\"S\n\x0bTestMessage\x12\x10\n\x03i32\x18\x01 \x01(\x05H\x00\x88\x01\x01\x12\"\n\x03msg\x18\x02 \x01(\x0b\x32\x10.pkg.TestMessageH\x01\x88\x01\x01\x42\x06\n\x04_i32B\x06\n\x04_msgb\x06proto3"
begin
  Google::Protobuf::DescriptorPool.generated_pool.add_serialized_file(descriptor_data)
rescue TypeError => e
  # <compatibility code, see below>
end

module Pkg
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end
```

This change fixes nearly all remaining conformance problems that existed previously. This is a side effect of moving from the DSL (which is lossy) to a serialized descriptor (which preserves all information).

## Backward Compatibility

This change should be 100% compatible with Ruby Protobuf >= 3.18.0, released in Sept 2021. Additionally, it should be compatible with all existing users and deployments. However there is some special compatibility code I inserted to achieve this level of backward compatibility.

Without the compatibility code, there is an edge case that could break backward compatibility. The existing code is lax in a way that the new code would be more strict.

When we use a full serialized descriptor, it will contain a list of all `.proto` files imported by this file (whereas the DSL never added dependencies properly): https://github.com/protocolbuffers/protobuf/blob/dfb71558a2226718dc3bcf5df27cbc11c1f72382/src/google/protobuf/descriptor.proto#L65-L66

`add_serialized_file` will verify that all dependencies listed in the descriptor were previously added with `add_serialized_file`.  Generally that should be fine, because the generated code will contain Ruby `require` statements for all dependencies, and the descriptor will fail to load anyway if the types we depend on were not previously defined in the DescriptorPool.

But there is a potential for problems if there are ambiguities around file paths. For example, consider the following scenario:

```proto
// foo/bar.proto

syntax = "proto2";

message Bar {}
```

```proto
// foo/baz.proto

syntax = "proto2";

import "bar.proto";

message Baz {
  optional Bar bar = 1;
}
```

If you invoke `protoc` like so, it will work correctly:

```
$ protoc --ruby_out=. -Ifoo foo/bar.proto foo/baz.proto
$ RUBYLIB=. ruby baz_pb.rb
```

However if you invoke `protoc` like so, and didn't have any compatibility code, it would fail to load:

```
$ protoc --ruby_out=. -I. -Ifoo foo/baz.proto
$ protoc --ruby_out=. -I. -Ifoo foo/bar.proto
$ RUBYLIB=foo ruby foo/baz_pb.rb
foo/baz_pb.rb:10:in `add_serialized_file': Unable to build file to DescriptorPool: Depends on file 'bar.proto', but it has not been loaded (Google::Protobuf::TypeError)
	from foo/baz_pb.rb:10:in `<main>'
```

The problem is that `bar.proto` is being referred to by two different canonical names: `bar.proto` and `foo/bar.proto`. This is a user error: each import should always be referred to by a consistent full path. Hopefully user errors of this sort are rare, but it is hard to know without trying.

The code in this PR prints a warning using `warn` if we detect that this edge case has occurred. We will plan to remove this compatibility code in the next major version.

Closes #12319

COPYBARA_INTEGRATE_REVIEW=#12319 from haberman:ruby-gencode-binary 5c0e8f2
FUTURE_COPYBARA_INTEGRATE_REVIEW=#12319 from haberman:ruby-gencode-binary 5c0e8f2
PiperOrigin-RevId: 524059189
copybara-service bot pushed a commit that referenced this pull request Apr 13, 2023
…the DSL (#12319)

This PR removes the DSL from the code generator, in anticipation of splitting the DSL out into a separate package.

Given a .proto file like:

```proto
syntax = "proto3";

package pkg;

message TestMessage {
  optional int32 i32 = 1;
  optional TestMessage msg = 2;
}
```

Generated code before:

```ruby
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: protoc_explorer/main.proto

require 'google/protobuf'

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("test.proto", :syntax => :proto3) do
    add_message "pkg.TestMessage" do
      proto3_optional :i32, :int32, 1
      proto3_optional :msg, :message, 2, "pkg.TestMessage"
    end
  end
end

module Pkg
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end
```

Generated code after:

```ruby
# frozen_string_literal: true
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: test.proto

require 'google/protobuf'

descriptor_data = "\n\ntest.proto\x12\x03pkg\"S\n\x0bTestMessage\x12\x10\n\x03i32\x18\x01 \x01(\x05H\x00\x88\x01\x01\x12\"\n\x03msg\x18\x02 \x01(\x0b\x32\x10.pkg.TestMessageH\x01\x88\x01\x01\x42\x06\n\x04_i32B\x06\n\x04_msgb\x06proto3"
begin
  Google::Protobuf::DescriptorPool.generated_pool.add_serialized_file(descriptor_data)
rescue TypeError => e
  # <compatibility code, see below>
end

module Pkg
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end
```

This change fixes nearly all remaining conformance problems that existed previously. This is a side effect of moving from the DSL (which is lossy) to a serialized descriptor (which preserves all information).

## Backward Compatibility

This change should be 100% compatible with Ruby Protobuf >= 3.18.0, released in Sept 2021. Additionally, it should be compatible with all existing users and deployments. However there is some special compatibility code I inserted to achieve this level of backward compatibility.

Without the compatibility code, there is an edge case that could break backward compatibility. The existing code is lax in a way that the new code would be more strict.

When we use a full serialized descriptor, it will contain a list of all `.proto` files imported by this file (whereas the DSL never added dependencies properly): https://github.com/protocolbuffers/protobuf/blob/dfb71558a2226718dc3bcf5df27cbc11c1f72382/src/google/protobuf/descriptor.proto#L65-L66

`add_serialized_file` will verify that all dependencies listed in the descriptor were previously added with `add_serialized_file`.  Generally that should be fine, because the generated code will contain Ruby `require` statements for all dependencies, and the descriptor will fail to load anyway if the types we depend on were not previously defined in the DescriptorPool.

But there is a potential for problems if there are ambiguities around file paths. For example, consider the following scenario:

```proto
// foo/bar.proto

syntax = "proto2";

message Bar {}
```

```proto
// foo/baz.proto

syntax = "proto2";

import "bar.proto";

message Baz {
  optional Bar bar = 1;
}
```

If you invoke `protoc` like so, it will work correctly:

```
$ protoc --ruby_out=. -Ifoo foo/bar.proto foo/baz.proto
$ RUBYLIB=. ruby baz_pb.rb
```

However if you invoke `protoc` like so, and didn't have any compatibility code, it would fail to load:

```
$ protoc --ruby_out=. -I. -Ifoo foo/baz.proto
$ protoc --ruby_out=. -I. -Ifoo foo/bar.proto
$ RUBYLIB=foo ruby foo/baz_pb.rb
foo/baz_pb.rb:10:in `add_serialized_file': Unable to build file to DescriptorPool: Depends on file 'bar.proto', but it has not been loaded (Google::Protobuf::TypeError)
	from foo/baz_pb.rb:10:in `<main>'
```

The problem is that `bar.proto` is being referred to by two different canonical names: `bar.proto` and `foo/bar.proto`. This is a user error: each import should always be referred to by a consistent full path. Hopefully user errors of this sort are rare, but it is hard to know without trying.

The code in this PR prints a warning using `warn` if we detect that this edge case has occurred. We will plan to remove this compatibility code in the next major version.

Closes #12319

COPYBARA_INTEGRATE_REVIEW=#12319 from haberman:ruby-gencode-binary 5c0e8f2
FUTURE_COPYBARA_INTEGRATE_REVIEW=#12319 from haberman:ruby-gencode-binary 5c0e8f2
PiperOrigin-RevId: 524059189
copybara-service bot pushed a commit that referenced this pull request Apr 13, 2023
…the DSL (#12319)

This PR removes the DSL from the code generator, in anticipation of splitting the DSL out into a separate package.

Given a .proto file like:

```proto
syntax = "proto3";

package pkg;

message TestMessage {
  optional int32 i32 = 1;
  optional TestMessage msg = 2;
}
```

Generated code before:

```ruby
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: protoc_explorer/main.proto

require 'google/protobuf'

Google::Protobuf::DescriptorPool.generated_pool.build do
  add_file("test.proto", :syntax => :proto3) do
    add_message "pkg.TestMessage" do
      proto3_optional :i32, :int32, 1
      proto3_optional :msg, :message, 2, "pkg.TestMessage"
    end
  end
end

module Pkg
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end
```

Generated code after:

```ruby
# frozen_string_literal: true
# Generated by the protocol buffer compiler.  DO NOT EDIT!
# source: test.proto

require 'google/protobuf'

descriptor_data = "\n\ntest.proto\x12\x03pkg\"S\n\x0bTestMessage\x12\x10\n\x03i32\x18\x01 \x01(\x05H\x00\x88\x01\x01\x12\"\n\x03msg\x18\x02 \x01(\x0b\x32\x10.pkg.TestMessageH\x01\x88\x01\x01\x42\x06\n\x04_i32B\x06\n\x04_msgb\x06proto3"
begin
  Google::Protobuf::DescriptorPool.generated_pool.add_serialized_file(descriptor_data)
rescue TypeError => e
  # <compatibility code, see below>
end

module Pkg
  TestMessage = ::Google::Protobuf::DescriptorPool.generated_pool.lookup("pkg.TestMessage").msgclass
end
```

This change fixes nearly all remaining conformance problems that existed previously. This is a side effect of moving from the DSL (which is lossy) to a serialized descriptor (which preserves all information).

## Backward Compatibility

This change should be 100% compatible with Ruby Protobuf >= 3.18.0, released in Sept 2021. Additionally, it should be compatible with all existing users and deployments. However there is some special compatibility code I inserted to achieve this level of backward compatibility.

Without the compatibility code, there is an edge case that could break backward compatibility. The existing code is lax in a way that the new code would be more strict.

When we use a full serialized descriptor, it will contain a list of all `.proto` files imported by this file (whereas the DSL never added dependencies properly): https://github.com/protocolbuffers/protobuf/blob/dfb71558a2226718dc3bcf5df27cbc11c1f72382/src/google/protobuf/descriptor.proto#L65-L66

`add_serialized_file` will verify that all dependencies listed in the descriptor were previously added with `add_serialized_file`.  Generally that should be fine, because the generated code will contain Ruby `require` statements for all dependencies, and the descriptor will fail to load anyway if the types we depend on were not previously defined in the DescriptorPool.

But there is a potential for problems if there are ambiguities around file paths. For example, consider the following scenario:

```proto
// foo/bar.proto

syntax = "proto2";

message Bar {}
```

```proto
// foo/baz.proto

syntax = "proto2";

import "bar.proto";

message Baz {
  optional Bar bar = 1;
}
```

If you invoke `protoc` like so, it will work correctly:

```
$ protoc --ruby_out=. -Ifoo foo/bar.proto foo/baz.proto
$ RUBYLIB=. ruby baz_pb.rb
```

However if you invoke `protoc` like so, and didn't have any compatibility code, it would fail to load:

```
$ protoc --ruby_out=. -I. -Ifoo foo/baz.proto
$ protoc --ruby_out=. -I. -Ifoo foo/bar.proto
$ RUBYLIB=foo ruby foo/baz_pb.rb
foo/baz_pb.rb:10:in `add_serialized_file': Unable to build file to DescriptorPool: Depends on file 'bar.proto', but it has not been loaded (Google::Protobuf::TypeError)
	from foo/baz_pb.rb:10:in `<main>'
```

The problem is that `bar.proto` is being referred to by two different canonical names: `bar.proto` and `foo/bar.proto`. This is a user error: each import should always be referred to by a consistent full path. Hopefully user errors of this sort are rare, but it is hard to know without trying.

The code in this PR prints a warning using `warn` if we detect that this edge case has occurred. We will plan to remove this compatibility code in the next major version.

Closes #12319

COPYBARA_INTEGRATE_REVIEW=#12319 from haberman:ruby-gencode-binary 5c0e8f2
FUTURE_COPYBARA_INTEGRATE_REVIEW=#12319 from haberman:ruby-gencode-binary 5c0e8f2
PiperOrigin-RevId: 524059189
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants