Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

NameSpaceRequiredException when reusing child class #246

Closed
mhils opened this issue Feb 24, 2023 · 5 comments
Closed

NameSpaceRequiredException when reusing child class #246

mhils opened this issue Feb 24, 2023 · 5 comments

Comments

@mhils
Copy link
Contributor

mhils commented Feb 24, 2023

Hi there! First off, thank you for the fantastic library. 馃嵃

I'm running into the following issue in a migration to dataclasess-avroschema:

from dataclasses_avroschema.avrodantic import AvroBaseModel


class ChildA(AvroBaseModel):
    id: int


class ChildB(AvroBaseModel):
    id: int


class Parent(AvroBaseModel):
    x: ChildA | ChildB
    y: ChildA | ChildB

    class Meta:
        namespace = "com.example"


d = Parent(x=ChildA(id=42), y=ChildB(id=43))
print(d.serialize(serialization_type="avro-json"))
  File "C:\Python3.11\Lib\site-packages\dataclasses_avroschema\fields.py", line 773, in get_avro_type
    raise NameSpaceRequiredException(field_type=self.type, field_name=self.name)
dataclasses_avroschema.exceptions.NameSpaceRequiredException: Required namespace in Meta for type <class '__main__.ChildA'>. The field y is using an exiting type

Removing y fixes the issue and correctly produces:

{"x": {"com.example.ChildA": {"id": 42}}}

The problem seems to be that exist_type turns True, but I don't understand enough of the codebase to know what the correct fix is. 馃槂

@marcosschroh
Copy link
Owner

marcosschroh commented Feb 24, 2023

You're welcome. The error dataclasses_avroschema.exceptions.NameSpaceRequiredException: Required namespace in Meta for type <class '__main__.ChildA'>. The field y is using an exiting type says that you have to define a namespace for type <class '__main__.ChildA'>. The same should be done for type <class '__main__.ChildB'>:

from dataclasses_avroschema.avrodantic import AvroBaseModel


class ChildA(AvroBaseModel):
    id: int

    class Meta:
        namespace = "com.example"

class ChildB(AvroBaseModel):
    id: int

    class Meta:
        namespace = "com.example"

class Parent(AvroBaseModel):
    x: ChildA | ChildB
    y: ChildA | ChildB

    
d = Parent(x=ChildA(id=42), y=ChildB(id=43))
print(d.serialize(serialization_type="avro-json"))

# >>> b'{"x": {"com.example.ChildA": {"id": 42}}, "y": {"com.example.ChildA": {"id": 43}}}'

@mhils
Copy link
Contributor Author

mhils commented Feb 24, 2023

Thanks for the swift reply! Maybe I reduced the example above too much: I'm using the model generator to generate ChildA/ChildB/Parent from an .asvc file (which in turn is generated from an .avdl definition). In this case, only the top-level record has a Meta class with a namespace.

  1. .avdl -> .avsc conversion using avro-tools.jar: Except for the top-level record, there are no explicit namespaces in the asvc file. This is ok according to the Avro spec:

    In this case the namespace is taken from the most tightly enclosing schema or protocol.

  2. .avsc -> pydantic conversion using the model generator:
    Child classes do not get equipped with explicit namespaces (mirroring the asvc file)
  3. .serialize() eventually borks on missing namespaces.

I guess avro-tools's behavior is pretty much set in stone, so the question is whether 2) or 3) should be fixed? It feels like 3) is the culprit here (hence my minimal example focusing on this), but I'm happy to be convinced otherwise. :)

@marcosschroh
Copy link
Owner

You're welcome. The problem here is that when there are "schema relationships" and a type is used more than once (like in your example) the namespace is required. I see this as a good practice to always include the namespace but in reality it is not required (the avsc schema generated with avro-tools is also correct if it does not include the namespaces). Perhaps we should remove this hard restriction

@mhils
Copy link
Contributor Author

mhils commented Feb 25, 2023

I see that the namespace is required at the moment, but I don't think I don't exactly understand why that's the case? It seems like the existing type could just be reused then? If I apply the patch below, things just work for me (except for a test that basically matches the problem here and expects an error).

patch.diff

diff --git a/dataclasses_avroschema/fields.py b/dataclasses_avroschema/fields.py
index 10cb8a1..6a4a472 100644
--- a/dataclasses_avroschema/fields.py
+++ b/dataclasses_avroschema/fields.py
@@ -748,8 +748,9 @@ class RecordField(BaseField):
             record_type["name"] = name
         else:
             if metadata.namespace is None:
-                raise NameSpaceRequiredException(field_type=self.type, field_name=self.name)
-            record_type = f"{metadata.namespace}.{name}"
+                record_type = name
+            else:
+                record_type = f"{metadata.namespace}.{name}"
         if self.default is None:
             return [field_utils.NULL, record_type]

Here's a full repro that matches more closely what I'm doing:

from pathlib import Path
from subprocess import run
import json

if False:
    # generate avsc
    Path("example.avdl").write_text("""
    @namespace("com.example")
    protocol MyProtocol {
      record Child {
        int id;
      }
    
      record Parent {
        union { null, Child } x;
        union { null, Child } y;
      }
    }
    """)
    run(["java", "-jar", "avro-tools-1.11.1.jar", "idl2schemata", "./example.avdl"], check=True)
else:
    # same as above, just hardcoded.
    Path("Parent.avsc").write_text("""
    {
      "type" : "record",
      "name" : "Parent",
      "namespace" : "com.example",
      "fields" : [ {
        "name" : "x",
        "type" : [ "null", {
          "type" : "record",
          "name" : "Child",
          "fields" : [ {
            "name" : "id",
            "type" : "int"
          } ]
        } ]
      }, {
        "name" : "y",
        "type" : [ "null", "Child" ]
      } ]
    }
    """)


# Convert avsc to Python
from dataclasses_avroschema import ModelGenerator

model_generator = ModelGenerator()
schema = json.loads(Path("Parent.avsc").read_text())
result = model_generator.render(schema=schema)
Path("generated.py").write_text(result)


# Try to serialize generated model
from generated import Parent, Child

d = Parent(
    x=Child(id=42),
    y=Child(id=43),
)
print(d.serialize(serialization_type="avro-json"))  # NameSpaceRequiredException

@marcosschroh
Copy link
Owner

Yes, the existing type can be reused so you don鈥檛 have to repeat the whole schema again, instead use the field.name. We should applied the patch, probably after merging your PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants