Question: Supporting Parquet datatypes #108

Open
ayushbindlish opened this issue Mar 19, 2024 · 5 comments

Comments

@ayushbindlish

I have a use case where I need to generate data for Parquet datatypes. I am currently using a custom version of JSF for this. Would you like to have this feature here?

The schema JSON looks like the following:

"UInt32": {
  "type": "uint32"
},
"UInt64": {
  "type": "uint64"
},
"Float16": {
  "type": "float16"
}

[number.py:jsf.src.schema_types.number:line 304 - generate()] - INFO: Generating random uint32
[number.py:jsf.src.schema_types.number:line 52 - generate()] - DEBUG: is_float: False
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 35227457
[number.py:jsf.src.schema_types.number:line 333 - generate()] - INFO: Generating random uint64
[number.py:jsf.src.schema_types.number:line 52 - generate()] - DEBUG: is_float: False
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 4669327448559716910
[number.py:jsf.src.schema_types.number:line 362 - generate()] - INFO: Generating random float16
[number.py:jsf.src.schema_types.number:line 57 - generate()] - DEBUG: is_float: True
[number.py:jsf.src.schema_types.number:line 72 - generate()] - INFO: Generated number: 1.920763087895552e+17

@ghandic
Owner

ghandic commented Mar 19, 2024

I don't believe JSON Schema supports those numeric types. Can you point me to where they are defined in the specification?

@ayushbindlish
Author

ayushbindlish commented Mar 19, 2024

As you correctly mentioned, JSON Schema does not support these datatypes; this is just for ease of data generation. Each datatype has an implicit range. "minimum" and "maximum" can narrow that range, but generated values should always fall within the range for the datatype.
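
As a rough illustration of the range rule described above, here is a minimal sketch in plain Python. It does not use jsf; the `IMPLICIT_RANGES` table and the `effective_range` and `generate_value` helpers are hypothetical names made up for this example. The float16 bound assumes IEEE 754 half precision, whose largest finite value is 65504.

```python
import math
import random

# Hypothetical table of implicit bounds per datatype: (low, high, is_float).
# These names are illustrative only and are not part of jsf.
IMPLICIT_RANGES = {
    "uint32": (0, 2**32 - 1, False),
    "uint64": (0, 2**64 - 1, False),
    "float16": (-65504.0, 65504.0, True),  # largest finite half-precision value
}


def effective_range(schema):
    """Intersect the datatype's implicit range with "minimum"/"maximum"."""
    low, high, is_float = IMPLICIT_RANGES[schema["type"]]
    low = max(low, schema.get("minimum", low))
    high = min(high, schema.get("maximum", high))
    if not is_float:
        # Integer types: round fractional user bounds inwards.
        low, high = math.ceil(low), math.floor(high)
    if low > high:
        raise ValueError(f"empty range for type {schema['type']!r}")
    return low, high


def generate_value(schema, rng=random):
    """Draw one value that respects both the schema and the datatype."""
    low, high = effective_range(schema)
    if IMPLICIT_RANGES[schema["type"]][2]:
        # Uniform in [low, high]; not rounded to a representable float16.
        return rng.uniform(low, high)
    return rng.randint(low, high)
```

With this approach a schema such as `{"type": "uint32", "minimum": -5, "maximum": 10}` is narrowed to the range 0 to 10, because the user-supplied minimum falls outside what the datatype allows.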

@ghandic
Owner

ghandic commented Mar 19, 2024

I wouldn't add those types to jsf directly, but a PR that lets people extend the JSON Schema types with custom types would work.

You would then manage your custom type generator classes outside of jsf.

@ayushbindlish
Author

@ghandic Any ideas on how this can be done?

@ghandic
Owner

ghandic commented Apr 3, 2024

Yes. Unfortunately I don't have time to implement it, but PRs are welcome.

It would be along the lines of making subclasses of the given base class and defining a mapping from each type name to the class that should be run.
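
The subclass-plus-mapping idea could be sketched roughly as follows. This is a standalone illustration, not jsf's actual extension API: `BaseType`, `CUSTOM_TYPES`, `register`, and `generate` are hypothetical names, and `BaseType` only stands in for whatever base class jsf would expose.

```python
import random


class BaseType:
    """Hypothetical stand-in for a generator base class (not jsf's own)."""

    def __init__(self, schema):
        self.schema = schema

    def generate(self):
        raise NotImplementedError


# Mapping from a schema "type" string to the class that should be run.
CUSTOM_TYPES = {}


def register(type_name):
    """Class decorator that adds a generator class to the mapping."""
    def decorator(cls):
        CUSTOM_TYPES[type_name] = cls
        return cls
    return decorator


@register("uint32")
class UInt32(BaseType):
    """Custom type kept outside the library, as suggested in the thread."""

    def generate(self):
        low = max(0, self.schema.get("minimum", 0))
        high = min(2**32 - 1, self.schema.get("maximum", 2**32 - 1))
        return random.randint(low, high)


def generate(schema):
    """Look up the class registered for the schema's type and run it."""
    try:
        cls = CUSTOM_TYPES[schema["type"]]
    except KeyError:
        raise ValueError(f"no generator for type {schema['type']!r}") from None
    return cls(schema).generate()
```

A decorator-based registry keeps the mapping next to each class definition, so a user could add "uint64" or "float16" in their own module without touching the library.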
