Question: pyorc.StructRepr difference between Tuple and Dict? #22

Closed
jornfranke opened this issue Oct 15, 2020 · 2 comments

@jornfranke

jornfranke commented Oct 15, 2020

Hi,

Can you explain the difference in pyorc.StructRepr between Tuple and Dict?
Is the only difference how Writer.write accepts the values as its parameter?
If so, can you please provide an example with a dict? Are the keys then the column names, and the values the contents of the corresponding cells?

Thank you.
Very nice library, by the way: I use it to provide an ORC exporter for the Scrapy framework: https://github.com/zuinnote/scrapy-contrib-bigexporters

Best regards

@noirello
Owner

Hi,

It affects both the Reader and the Writer.
When the struct_repr parameter is set to pyorc.StructRepr.DICT, the Reader returns the rows as dicts, and the Writer's write method expects a dict containing a key for every column name. (The dict may also contain keys that are not in the schema; those are ignored.)
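To make the Writer rule concrete, here is a minimal plain-Python sketch that assumes only the behaviour described above. `dict_row_to_tuple` and `FIELDS` are hypothetical names, not part of pyorc: the helper reads each schema field by name, so a missing field raises `KeyError` and any extra key is simply never read. In this model the dict's own key order does not matter, because the schema order decides the output.

```python
from typing import Any, Mapping, Sequence, Tuple


def dict_row_to_tuple(row: Mapping[str, Any], field_names: Sequence[str]) -> Tuple[Any, ...]:
    """Hypothetical helper (not part of pyorc): order a dict row by schema.

    Looks up every schema field by name, so a missing field raises KeyError,
    while keys that are not schema fields are never read and thus ignored.
    """
    return tuple(row[name] for name in field_names)


# Field names of 'struct<a:int,b:float,c:string>' from the Writer example.
FIELDS = ("a", "b", "c")

print(dict_row_to_tuple({"a": 1, "b": 2.0, "c": "test"}, FIELDS))  # (1, 2.0, 'test')
# Key order in the dict is irrelevant here; the extra key is ignored.
print(dict_row_to_tuple({"c": "test", "extra": 42, "a": 1, "b": 2.0}, FIELDS))  # (1, 2.0, 'test')
try:
    dict_row_to_tuple({"a": 1, "b": 2.0}, FIELDS)
except KeyError as exc:
    print("missing field:", exc)  # missing field: 'c'
```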

```python
>>> import pyorc
>>> input = open('./deps/examples/TestOrcFile.test1.orc', 'rb')
>>> r = pyorc.Reader(input, struct_repr=pyorc.StructRepr.DICT)
>>> print(next(r))
{'boolean1': False, 'byte1': 1, 'short1': 1024, 'int1': 65536, 'long1': 9223372036854775807, 'float1': 1.0, 'double1': -15.0, 'bytes1': b'\x00\x01\x02\x03\x04', 'string1': 'hi', 'middle': {'list': [{'int1': 1, 'string1': 'bye'}, {'int1': 2, 'string1': 'sigh'}]}, 'list': [{'int1': 3, 'string1': 'good'}, {'int1': 4, 'string1': 'bad'}], 'map': {}}
>>> r = pyorc.Reader(input, struct_repr=pyorc.StructRepr.TUPLE)
>>> print(next(r))
(False, 1, 1024, 65536, 9223372036854775807, 1.0, -15.0, b'\x00\x01\x02\x03\x04', 'hi', ([(1, 'bye'), (2, 'sigh')],), [(3, 'good'), (4, 'bad')], {})
>>> output = open('test.orc', 'wb')
>>> w = pyorc.Writer(output, 'struct<a:int,b:float,c:string>', struct_repr=pyorc.StructRepr.DICT)
>>> w.write({'a': 1, 'b': 2.0, 'c': 'test'})
>>> w.write({'a': 1, 'b': 2.0})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'c'
```
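Judging from the two Reader outputs above, the representations differ only in how structs appear: every struct (the row itself, `middle`, and the list elements) becomes a dict, while lists stay lists and the map stays a dict. `middle` is a one-field struct, which is why the TUPLE output shows the one-element tuple `([...],)`. The sketch below is a hypothetical converter that rebuilds the DICT row from the TUPLE row. Its schema model (`ListOf`, `MapOf`, plain dicts for structs, `None` for primitives) is made up for illustration and is not pyorc's `TypeDescription`; the map's value type is also an assumption, since the sample row's map is empty.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class ListOf:
    """Hypothetical schema node: list<element>."""
    element: Any


@dataclass
class MapOf:
    """Hypothetical schema node: map<primitive key, value>."""
    value: Any


# A struct is modeled as an ordered dict {field name: child schema};
# None marks a primitive leaf. (Made up for this sketch, not pyorc's API.)
Schema = Union[dict, ListOf, MapOf, None]


def tuple_to_dict(value: Any, schema: Schema) -> Any:
    """Rebuild the DICT-repr form of a TUPLE-repr value."""
    if value is None:  # a NULL stays NULL at any nesting level
        return None
    if isinstance(schema, dict):  # struct: pair field names with tuple positions
        return {
            name: tuple_to_dict(item, child)
            for (name, child), item in zip(schema.items(), value)
        }
    if isinstance(schema, ListOf):  # list: convert every element
        return [tuple_to_dict(item, schema.element) for item in value]
    if isinstance(schema, MapOf):  # map: stays a dict, convert the values
        return {key: tuple_to_dict(item, schema.value) for key, item in value.items()}
    return value  # primitive leaf: unchanged


# Schema of the first row above, reconstructed from its DICT keys.
# Assumption: the map's value type; the sample row's map is empty.
INNER = {"int1": None, "string1": None}
TEST1 = {
    "boolean1": None, "byte1": None, "short1": None, "int1": None,
    "long1": None, "float1": None, "double1": None, "bytes1": None,
    "string1": None,
    "middle": {"list": ListOf(INNER)},
    "list": ListOf(INNER),
    "map": MapOf(INNER),
}

# The TUPLE-repr row printed above.
row = (False, 1, 1024, 65536, 9223372036854775807, 1.0, -15.0,
       b"\x00\x01\x02\x03\x04", "hi", ([(1, "bye"), (2, "sigh")],),
       [(3, "good"), (4, "bad")], {})

print(tuple_to_dict(row, TEST1)["middle"])
# {'list': [{'int1': 1, 'string1': 'bye'}, {'int1': 2, 'string1': 'sigh'}]}
```

The sketch relies on Python dicts preserving insertion order, so the schema dict's field order must match the tuple positions.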

@jornfranke
Author

Great, thanks a lot for the detailed and fast answer.