
I have a JSON file made up of many dicts. How can I use ijson.items to get the dicts one by one? #51

Closed
faker09 opened this issue May 13, 2021 · 7 comments
Labels
question Further information is requested

Comments


faker09 commented May 13, 2021

No description provided.


rtobar commented May 13, 2021

@faker09 you'll need to provide more details, otherwise it's nearly impossible to give any advice. Also check whether any of the ijson questions on Stack Overflow cover what you need.

rtobar added the question label May 13, 2021

faker09 commented May 13, 2021

Because my JSON file is too large (8 GB), I want to use ijson.items to read it.

Here is an example of its contents:

{"datum":{"com.bbn.tc.schema.avro.cdm20.Event":{"uuid":"A.-\u0003\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000","sequence":null,"type":"EVENT_CREATE_OBJECT","threadId":{"int":3056},"subject":{"com.bbn.tc.schema.avro.cdm20.UUID":"u*N\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},"predicateObject":{"com.bbn.tc.schema.avro.cdm20.UUID":">\"-\u0003\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},"predicateObjectPath":{"string":"C:\\Users\\admin\\Documents\\Documents"},"predicateObject2":null,"predicateObject2Path":null,"timestampNanos":1557272968456297200,"names":null,"parameters":null,"location":null,"size":null,"programPoint":null,"properties":{"map":{"HasMacro":"4"}}}},"CDMVersion":"20","type":"RECORD_HOST","hostId":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000","sessionNumber":0,"source":"SOURCE_WINDOWS_MARPLE"}

{"datum":{"com.bbn.tc.schema.avro.cdm20.Event":{"uuid":"B.-\u0003\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000","sequence":null,"type":"EVENT_OTHER","threadId":{"int":3056},"subject":{"com.bbn.tc.schema.avro.cdm20.UUID":"u*N\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},"predicateObject":{"com.bbn.tc.schema.avro.cdm20.UUID":">\"-\u0003\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},"predicateObjectPath":{"string":"C:\\Users\\admin\\Documents\\Documents"},"predicateObject2":null,"predicateObject2Path":null,"timestampNanos":1557272968456415900,"names":{"array":["FileIoClose"]},"parameters":null,"location":null,"size":null,"programPoint":null,"properties":null}},"CDMVersion":"20","type":"RECORD_HOST","hostId":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000","sessionNumber":0,"source":"SOURCE_WINDOWS_MARPLE"}

{"datum":{"com.bbn.tc.schema.avro.cdm20.Event":{"uuid":"C.-\u0003\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000","sequence":null,"type":"EVENT_CREATE_OBJECT","threadId":{"int":3056},"subject":{"com.bbn.tc.schema.avro.cdm20.UUID":"u*N\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},"predicateObject":{"com.bbn.tc.schema.avro.cdm20.UUID":">\"-\u0003\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000"},"predicateObjectPath":{"string":"C:\\Users\\admin\\Documents\\Documents"},"predicateObject2":null,"predicateObject2Path":null,"timestampNanos":1557272968456469400,"names":null,"parameters":null,"location":null,"size":null,"programPoint":null,"properties":{"map":{"HasMacro":"4"}}}},"CDMVersion":"20","type":"RECORD_HOST","hostId":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000","sessionNumber":0,"source":"SOURCE_WINDOWS_MARPLE"}


faker09 commented May 13, 2021

I mean that if I use

data_json = ijson.items(filename, 'item')
for i in data_json:
    print(i)

it doesn't output anything.


rtobar commented May 14, 2021

@faker09 the problem is you are passing ijson the filename instead of a file object. You need to open your file yourself, then give the resulting file object to ijson.


faker09 commented May 14, 2021

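@faker09 the problem is you are passing ijson the filename instead of a file object. You need to open your file yourself, then give the resulting file object to ijson.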

Sorry, I used a confusing parameter name. Actually, I opened the file and passed a file object to ijson.items().


rtobar commented May 14, 2021

@faker09 can you please then provide a cleaner extract of your JSON file? In particular, how does the file start? I get the impression you have a file with multiple top-level JSON objects instead of a single one, in which case you'll need to use an empty prefix and multiple_values=True when invoking ijson.items. But again, if you provide a cleaner extract of the JSON file (make sure it formats correctly when putting it into the comments here), the code you are using, and any error you might be receiving, it would be better.


rtobar commented May 19, 2021

Closing for lack of clearer explanations.

rtobar closed this as completed May 19, 2021