Adjustment to the function that determines whether to import AvroTypeGen in the code to be generated #141

thiagodpf · 2024-08-31T00:08:47Z

The sort.Search() function is not producing the desired result for the logic inside the shouldImportAvroTypeGen() function.

Assuming that the definitions slice contains two elements, the first being an element that requires the inclusion of the import and the second as an enumerator (which does not require the import). When iterating through the elements of the namespace.Definitions map, when the code is iterating over the element that needs the import, the sort.Search() function will internally try to iterate through the slice from back to front, since the last element of the slice is not the element that needs the import and since the comparison is being made with ==, the sort.Search() function will return a new index (which should be used for a future ordered insertion in the slice).

In short, I think that the sort package is not the best option for what is intended to be done in the shouldImportAvroTypeGen() function.

This tweak seems to be much simpler to understand and makes more sense for the purpose of the shouldImportAvroTypeGen() function.

Fix to the problem reported in this issue: #130

The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next. Therefore, iterating through the namespace.Definitions map may sometimes not work with the sort.Search() function. This tweak seems to be much simpler to understand and makes more sense for the purpose of the shouldImportAvroTypeGen() function.

rogpeppe · 2024-08-31T17:04:38Z

Have you found a case where this actually goes wrong? From a very brief look at the code (away from my laptop!) it looks like parseFile sorts the definitions slice before returning it so it looks ok to use sort.Search to me.

thiagodpf · 2024-09-02T11:58:25Z

Hi @rogpeppe !
Here is my .avsc

{
    "type": "record",
    "name": "BusinessModelChangeAudit",
    "namespace": "br.com.ifd.content",
    "fields": [
        {
            "name": "context",
            "type": [
                "null",
                "string"
            ],
            "default": null,
            "doc": "information about the execution of the transaction"
        },
        {
            "name": "created_at",
            "doc": "The time that this event has created (epoch)",
            "type": "long",
            "logicalType": "timestamp-millis"
        },
        {
            "name": "current_providers",
            "doc": "current providers related to the event",
            "type": {
                "type": "array",
                "items": "string"
            }
        },
        {
            "name": "previous_providers",
            "doc": "previous providers related to the event",
            "type": {
                "type": "array",
                "items": "string"
            }
        },
        {
            "name": "product_external_id",
            "type": "string",
            "doc": "identifier product salesforce"
        },
        {
            "name": "response_transmitted",
            "type": "boolean",
            "default": false,
            "doc": "indicates whether the return was reported to salesforce"
        },
        {
            "name": "status",
            "type": {
                "type": "enum",
                "name": "status",
                "symbols": [
                    "PENDING",
                    "SUCCESS",
                    "FAILURE",
                    "ABORTED",
                    "REJECTED",
                    "INTERMEDIATE"
                ]
            },
            "doc": "transaction status"
        },
        {
            "name": "step",
            "type": "string",
            "doc": "This field indicates at which step the intermediate status occurs during a process or workflow"
        },
        {
            "name": "transaction_id",
            "doc": "transaction unique identifier",
            "type": "string",
            "logicalType": "UUID"
        },
        {
            "name": "updated_at",
            "doc": "The time that this event has updated (epoch)",
            "type": "long",
            "logicalType": "timestamp-millis"
        }
    ]
}

and here is the generated code:
(missing import "github.com/heetch/avro/avrotypegen")

// Code generated by avrogen. DO NOT EDIT.

package bmca

import (
	"fmt"
	"strconv"
)

type BusinessModelChangeAudit struct {
	// information about the execution of the transaction
	Context *string `json:"context"`

	// The time that this event has created (epoch)
	Created_at int64 `json:"created_at"`

	// current providers related to the event
	Current_providers []string `json:"current_providers"`

	// previous providers related to the event
	Previous_providers []string `json:"previous_providers"`

	// identifier product salesforce
	Product_external_id string `json:"product_external_id"`

	// indicates whether the return was reported to salesforce
	Response_transmitted bool `json:"response_transmitted"`

	// transaction status
	Status Status `json:"status"`

	// This field indicates at which step the intermediate status occurs during a process or workflow
	Step string `json:"step"`

	// transaction unique identifier
	Transaction_id string `json:"transaction_id"`

	// The time that this event has updated (epoch)
	Updated_at int64 `json:"updated_at"`
}

// AvroRecord implements the avro.AvroRecord interface.
func (BusinessModelChangeAudit) AvroRecord() avrotypegen.RecordInfo {
	return avrotypegen.RecordInfo{
		Schema: `{"fields":[{"default":null,"doc":"information about the execution of the transaction","name":"context","type":["null","string"]},{"doc":"The time that this event has created (epoch)","logicalType":"timestamp-millis","name":"created_at","type":"long"},{"doc":"current providers related to the event","name":"current_providers","type":{"items":"string","type":"array"}},{"doc":"previous providers related to the event","name":"previous_providers","type":{"items":"string","type":"array"}},{"doc":"identifier product sales force","name":"product_external_id","type":"string"},{"default":false,"doc":"indicates whether the return was reported to salesforce","name":"response_transmitted","type":"boolean"},{"doc":"transaction status","name":"status","type":{"name":"status","symbols":["PENDING","SUCCESS","FAILURE","ABORTED","REJECTED","INTERMEDIATE"],"type":"enum"}},{"doc":"This field indicates at which step the intermediate status occurs during a process or workflow","name":"step","type":"string"},{"doc":"transaction unique identifier","logicalType":"UUID","name":"transaction_id","type":"string"},{"doc":"The time that this event has updated (epoch)","logicalType":"timestamp-millis","name":"updated_at","type":"long"}],"name":"br.com.ifd.content.BusinessModelChangeAudit","type":"record"}`,
		Required: []bool{
			1: true,
			2: true,
			3: true,
			4: true,
			6: true,
			7: true,
			8: true,
			9: true,
		},
	}
}

type Status int

const (
	StatusPending Status = iota
	StatusSuccess
	StatusFailure
	StatusAborted
	StatusRejected
	StatusIntermediate
)

var _Status_strings = []string{
	"PENDING",
	"SUCCESS",
	"FAILURE",
	"ABORTED",
	"REJECTED",
	"INTERMEDIATE",
}

// String returns the textual representation of Status.
func (e Status) String() string {
	if e < 0 || int(e) >= len(_Status_strings) {
		return "Status(" + strconv.FormatInt(int64(e), 10) + ")"
	}
	return _Status_strings[e]
}

// MarshalText implements encoding.TextMarshaler
// by returning the textual representation of Status.
func (e Status) MarshalText() ([]byte, error) {
	if e < 0 || int(e) >= len(_Status_strings) {
		return nil, fmt.Errorf("Status value %d is out of bounds", e)
	}
	return []byte(_Status_strings[e]), nil
}

// UnmarshalText implements encoding.TextUnmarshaler
// by expecting the textual representation of Status.
func (e *Status) UnmarshalText(data []byte) error {
	// Note for future: this could be more efficient.
	for i, s := range _Status_strings {
		if string(data) == s {
			*e = Status(i)
			return nil
		}
	}
	return fmt.Errorf("unknown value %q for Status", data)
}

thiagodpf · 2024-09-02T12:38:55Z

I believe the problem in the current algorithm is not related to the ordering of the definitions slice, but rather with the == comparator is being used within the sort.Search() function, when in fact (I believe) the comparator >= should be used if you want to keep using the sort.Search() function (although there would be side effects to think about 😅).

I don't know if it will help, but below is a screenshot where I managed to reproduce the problem, notice the order of the elements within the namespace.Definitions map and the if condition (since the sort.Search() function did not find the index and returned the size of the slice):

thiagodpf · 2024-09-02T13:20:03Z

I was reviewing the description I left in the pull request and I saw that I gave too much emphasis to the iteration through namespace.Definitions map, when in fact the problem is in the condition within the sort.Search() function (my bad 😅) so I thought it best to edit it.

I think (just my opinion) that the sort.Search() function is not the most suitable for this case since it seems to be better used in "comparable" values (less than, greater than, etc...) but not with raw strings.

I even thought that the sort.SearchStrings() could be useful to us, but the problem is that when it doesn't find the element in the slice, it will return the index where the element should be inserted, but it's not what we want at that point...

Still it wouldn't be a big problem given the current if conditions later on... 🤔
I can redo the PR to use sort.SearchStrings() if you prefer, what do you think?

rogpeppe · 2024-09-02T21:21:51Z

cmd/avrogo/generate.go

@@ -29,10 +29,7 @@ const nullType = "avrotypegen.Null"
 // schema.RecordDefinition by looking at their match within given parsed namespace
 func shouldImportAvroTypeGen(namespace *parser.Namespace, definitions []schema.QualifiedName) bool {


I haven't looked in details but that == comparison is definitely wrong!

How about something like this instead, which keeps the binary search but (hopefully!) fixes the logic:

// shouldImportAvroTypeGen reports whether avrotypegen is required. It checks that the definitions given are of type // schema.RecordDefinition by looking at their match within given parsed namespace func shouldImportAvroTypeGen(namespace *parser.Namespace, definitions []schema.QualifiedName) bool { for _, def := range namespace.Definitions { searchName := def.AvroName().Name _, found := sort.Find(len(definitions), func(i int) int { return strings.Compare(searchName, definitions[i].Name) }) if found { switch def.(type) { case *schema.RecordDefinition, *schema.FixedDefinition: return true } } } return false }

Note: I haven't actually run the above code :)

It worked perfectly! I'll adjust the PR.

Thanks for the help, I really don't have much experience with the sort package and I was able to learn some cool stuff! 🙇‍♂️

Adjustment in the function that determines whether or not the avrotypegen package should be imported into the file to be generated.

thiagodpf mentioned this pull request Aug 31, 2024

undefined: avrotypegen #130

Open

rogpeppe reviewed Sep 2, 2024

View reviewed changes

Update generate.go

d093b3e

Adjustment in the function that determines whether or not the avrotypegen package should be imported into the file to be generated.

thiagodpf changed the title ~~Update generate.go~~ Adjustment to the function that determines whether to import AvroTypeGen in the code to be generated Sep 3, 2024

thiagodpf requested a review from rogpeppe September 3, 2024 01:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adjustment to the function that determines whether to import AvroTypeGen in the code to be generated #141

Adjustment to the function that determines whether to import AvroTypeGen in the code to be generated #141

thiagodpf commented Aug 31, 2024 •

edited

Loading

rogpeppe commented Aug 31, 2024

thiagodpf commented Sep 2, 2024

thiagodpf commented Sep 2, 2024 •

edited

Loading

thiagodpf commented Sep 2, 2024 •

edited

Loading

rogpeppe Sep 2, 2024 •

edited

Loading

thiagodpf Sep 3, 2024

		@@ -29,10 +29,7 @@ const nullType = "avrotypegen.Null"
		// schema.RecordDefinition by looking at their match within given parsed namespace
		func shouldImportAvroTypeGen(namespace *parser.Namespace, definitions []schema.QualifiedName) bool {

Adjustment to the function that determines whether to import AvroTypeGen in the code to be generated #141

Are you sure you want to change the base?

Adjustment to the function that determines whether to import AvroTypeGen in the code to be generated #141

Conversation

thiagodpf commented Aug 31, 2024 • edited Loading

rogpeppe commented Aug 31, 2024

thiagodpf commented Sep 2, 2024

thiagodpf commented Sep 2, 2024 • edited Loading

thiagodpf commented Sep 2, 2024 • edited Loading

rogpeppe Sep 2, 2024 • edited Loading

Choose a reason for hiding this comment

thiagodpf Sep 3, 2024

Choose a reason for hiding this comment

thiagodpf commented Aug 31, 2024 •

edited

Loading

thiagodpf commented Sep 2, 2024 •

edited

Loading

thiagodpf commented Sep 2, 2024 •

edited

Loading

rogpeppe Sep 2, 2024 •

edited

Loading