Pandas 0.23.0 - json_normalize - 'STIXdatetime' object has no attribute 'nanosecond' #17
Comments
Hey @Cyb3rWard0g...I'm going to pass this along to the folks working on the python-stix2 lib to get their feedback on this. As always, thanks so much for bringing this to our attention! |
Thank you @jburns12 👍 it is probably nothing, but I figured I would let you guys know first just in case 😄 . Np. Thank you guys! |
Hi @Cyb3rWard0g, it sounds like you might be working with
For example, assuming |
Hey @emmanvg Thank you for the suggestion and for taking a look at this issue. When I work with the data from ATT&CK content in STIX, I first parse every field to rename some of the columns and put everything in dictionaries. Those dictionaries then get aggregated in a list. STIX objects do not get passed to the list. I checked every object in techniques with a for loop:
Now, I don't know how to check whether those dictionaries are also considered STIX objects. Calling serialize() on them does not work because I am now working with lists and dictionaries. I loop through the STIX objects, copy their fields into dictionaries, and then append those to a long list to manage the fields better and keep them consistent across all objects. I am releasing the first beta version of a script I am working on today. I hope this helps provide more context for the question. 👍 |
Hi @Cyb3rWard0g, the behavior you are experiencing may be a bug that needs to be addressed in
Don't put them in dictionaries and instead serialize them and load the strings back to have a JSON object that you can pass to the |
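A minimal sketch of the serialize-then-load round trip suggested above. It does not import stix2: `FakeStixObject` is a hypothetical stand-in that models only a `serialize()` method returning JSON text, and the field values are made up for illustration. The point is just that parsing the serialized string with `json.loads` yields builtin dicts, lists, and strings.

```python
import json


class FakeStixObject:
    """Hypothetical stand-in for a stix2 object; only serialize() is modeled."""

    def __init__(self, **fields):
        self._fields = fields

    def serialize(self):
        # Assumption: the serialized form is JSON text with string timestamps.
        return json.dumps(self._fields)


def to_plain_dict(stix_obj):
    """Serialize the object, then parse the JSON text back into builtin types."""
    return json.loads(stix_obj.serialize())


technique = FakeStixObject(
    type="attack-pattern",
    name="Example Technique",            # illustrative value only
    created="2017-05-31T21:30:19.735Z",  # illustrative value only
)

plain = to_plain_dict(technique)
print(type(plain).__name__, type(plain["created"]).__name__)
```

After the round trip every value is a builtin type, so nothing downstream ever sees a library-specific timestamp class.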
Hey @emmanvg ! Thank you very much for the information. I will then use that method to convert the STIX objects to Json first. The reason why I rename most of the fields is because when you want to have mappings going matrix -> tactic -> technique -> technique id -> group -> group id -> software -> software id , I cannot use the field "name" or simply "id" since it will create conflicts. I will work on the serialize update. thank you again!!! 👍 |
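The renaming motivation above can be sketched roughly as follows. `prefix_fields` is a hypothetical helper, not part of any library mentioned in this thread, and the records and IDs are invented. It only shows why generic keys such as "name" and "id" need a prefix before several object types are merged into one row.

```python
def prefix_fields(record, prefix):
    """Return a copy of record with every key prefixed by '<prefix>_'."""
    return {"{}_{}".format(prefix, key): value for key, value in record.items()}


# Invented records; both use the generic keys "name" and "id".
technique = {"name": "Example Technique", "id": "T0000"}
group = {"name": "Example Group", "id": "G0000"}

# Merging without prefixes would let the group values overwrite the technique ones.
collided = {**technique, **group}

# With prefixes, both sets of fields survive in a single flat row.
row = {**prefix_fields(technique, "technique"), **prefix_fields(group, "group")}
print(sorted(row))
```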
@Cyb3rWard0g, I took a quick look at your project. Based on your usage basics notebook I noticed what you meant by modifying the pandas matrix and how you manipulate the STIX data (I am not a pandas expert). Though, when looking at your code this method creates new dictionaries based on direct interaction with |
I really appreciate the help and thank you for providing all those details that make perfect sense!!! 👍 I will try str(obj['created']) then; it seems that will solve the issue since it returns a string instead of a STIXdatetime object. This is why I wanted to share it with you guys first before opening all kinds of issues in Pandas without much information on how cti-python-stix2 was handling those functions. I am glad I could help 👍 Thank you again. I will share some updates on the changes to the script in the next couple of days. |
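A rough sketch of applying the str() workaround to every timestamp in a nested structure, rather than field by field. `FakeSTIXdatetime` is a hypothetical stand-in, assumed here to be a `datetime.datetime` subclass like the real type. The string format produced by the real STIXdatetime may differ from the stdlib format shown by this stand-in.

```python
import datetime


class FakeSTIXdatetime(datetime.datetime):
    """Hypothetical stand-in for stix2's STIXdatetime (assumed datetime subclass)."""


def stringify_datetimes(value):
    """Recursively replace datetime instances (subclasses included) with str()."""
    if isinstance(value, datetime.datetime):
        return str(value)
    if isinstance(value, dict):
        return {key: stringify_datetimes(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [stringify_datetimes(item) for item in value]
    return value


# Invented record shaped like the renamed dictionaries described in this thread.
records = [
    {
        "technique": "Example Technique",
        "created": FakeSTIXdatetime(2017, 5, 31, 21, 30, 19),
    }
]

clean = stringify_datetimes(records)
print(clean[0]["created"])
```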
This is easy to reproduce without python-stix2. If you subclass python's

    import datetime
    import pandas as pd

    class DTTest(datetime.datetime):
        pass

    our_dt = DTTest(2012, 1, 2)
    pd.to_datetime(our_dt)

produces:
The stacktrace is a little different (I called a different function, to try to simplify), but I bet it's the same underlying issue. There is no documented "nanosecond" field on datetime. Instance attributes are documented here. It works fine with a plain datetime object. Pandas might be doing some voodoo it shouldn't be doing. I'd ask what's going on in a pandas help forum. Give them a simple test like this and ask why pandas is misbehaving. In the meantime, I guess you need to avoid passing STIXdatetime instances into pandas! |
Actually, this existing pandas issue is probably the cause. Opened only 11 days ago, and is still open. |
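Since the thread reports that a plain datetime works, one possible way to avoid handing a subclass instance to pandas is to rebuild it as an exact `datetime.datetime` first. This is a stdlib-only sketch; `to_plain_datetime` is a hypothetical helper, and whether pandas 0.23.0 accepts the result is not verified here. It also shows that the stdlib type has no `nanosecond` attribute.

```python
import datetime


class DTTest(datetime.datetime):
    pass


def to_plain_datetime(value):
    """Rebuild a datetime subclass instance as an exact datetime.datetime."""
    return datetime.datetime(
        value.year, value.month, value.day,
        value.hour, value.minute, value.second,
        value.microsecond, tzinfo=value.tzinfo,
    )


our_dt = DTTest(2012, 1, 2)
plain_dt = to_plain_datetime(our_dt)

print(type(our_dt) is datetime.datetime)    # False: it is the subclass
print(type(plain_dt) is datetime.datetime)  # True: exact stdlib type
print(hasattr(plain_dt, "nanosecond"))      # False: no such stdlib attribute
print(plain_dt == our_dt)                   # True: same point in time
```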
Hey @chisholm ! very interesting. so when you say
You mean applying the str(obj['created']) example to any field that returns a STIXdatetime, correct? |
@chisholm thank you very much for providing more details. It is a 0.23.0 version issue since I am not having the same issues with Pandas 0.22.0 in Python2.7 and 3.6. I mentioned that they added this to their Pandas version 0.23.0 https://github.com/pandas-dev/pandas/blob/0.23.x/pandas/core/dtypes/cast.py#L920
I tracked the error I had back to that part of the library. |
If
Actually, both of our stacktraces track further back than cast.py (or datetimes.py in my case). They track back to a "tslib.pyx" file. I suspect that's where the problem is, and is probably not Python code. My stacktrace references an |
niceee makes sense. Thank you @chisholm 😄 |
Good evening @chisholm. I wanted to provide an update on how I am handling the STIXdatetime object type error. I noticed that even without using Pandas, simply calling "json.dumps(stix_object)" gives me the same error. I pass the STIX object to a dictionary to apply a standard naming convention to all the data I retrieve from TAXII. For example, when I run the function from my library, it returns a dictionary with STIX object types.
I can confirm that the created field is a STIXdatetime:
Then I just simply:
So I was wondering if it is also related to what @emmanvg mentioned earlier in this thread
Anyway, I am just using the workaround of converting the STIXdatetime object to a str.
When I do that I get the following date, and it can be passed as a string instead of a STIXdatetime:
Just sharing some updates on how I am approaching this error 👍 Thank you for all your help and time! Have a great weekend! |
@Cyb3rWard0g, you should not call |
I think you must mean the same bit of code emmanvg linked to earlier in the thread. Yeah, STIXdatetime is not JSON serializable via the default encoder. It's not a bug. Python's built-in encoder doesn't know anything about stix2 types. The types supported by the built-in encoder are listed here. As he notes, people who stick with the plain stix2 objects can call the serialize() method to obtain JSON. If you've got stix2 types embedded in a different data structure and want to serialize it all to JSON, you could try the encoder provided by the library, which he linked to. E.g. |
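As an illustration of the custom-encoder idea, here is a stdlib-only sketch. It is not the encoder shipped with the stix2 library: `FakeSTIXdatetime` and `TimestampEncoder` are hypothetical names, and the ISO output format is an assumption. It shows the default encoder rejecting a datetime subclass embedded in a dictionary, and a `json.JSONEncoder` subclass handling it.

```python
import datetime
import json


class FakeSTIXdatetime(datetime.datetime):
    """Hypothetical stand-in for a stix2 timestamp type."""


class TimestampEncoder(json.JSONEncoder):
    """Hypothetical encoder that emits datetimes as ISO 8601 strings."""

    def default(self, o):
        if isinstance(o, datetime.datetime):
            return o.isoformat()
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


record = {
    "technique": "Example Technique",  # illustrative value only
    "created": FakeSTIXdatetime(2017, 5, 31, 21, 30, 19),
}

try:
    json.dumps(record)
    default_encoder_failed = False
except TypeError:
    # The built-in encoder does not know how to serialize datetime objects.
    default_encoder_failed = True

encoded = json.dumps(record, cls=TimestampEncoder)
print(default_encoder_failed, encoded)
```

Passing `cls=` keeps the conversion in one place, so the dictionaries themselves can stay untouched until they are serialized.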
No problem @Cyb3rWard0g! I think it is OK to close. If you have stix2 library problems in the future I would recommend opening them there to better address it. |
Yep. The main problem in this issue was a pandas bug, not a stix2 bug. |
Thanks for the detailed report, @Cyb3rWard0g . And thanks for the helpful follow-ups, @emmanvg and @chisholm . Based on what everyone has said, this is:
There's not currently a way to convert a Thanks again! |
Thank you so much for all the detailed information and for teaching me the right way to do things. I learned a lot in this issue thread. Thank you again. Any future issues will be opened in the STIX repo. 👍 I hope you all have a great weekend!!! |
Just for confirmation as I found nothing in the doc. Is it still the recommended solution today if we need data as native dict ? |
As far as I know yes that's the best way to do it. You may get more info if you ask on the cti-python-stix2 issue tracker though since the team over there would know more about the capabilities of their library. FWIW if you're trying to use ATT&CK with Pandas in 2021, we have an official way of doing that: https://github.com/mitre-attack/mitreattack-python/tree/master/mitreattack/attackToExcel#accessing-the-pandas-dataframes |
Good evening Team,
I hope you guys are having a good day. I have been playing with the ATT&CK STIX content for the past week and I wanted to report an issue, although I am not sure whether it is a problem with Pandas or with the STIX library. I figured it would be good to share it here first in case I am missing something, and in case anyone else runs into it when using pandas 0.23.0 (latest version) with ATT&CK STIX content via TAXII.
I tested ATT&CK STIX content with Pandas 0.21.0 and 0.22.0 and everything was working fine. I was getting results like this:
However, when I tested it with Pandas 0.23.0, I got the following error:
The reason I initially thought it was the STIX library was the following error message at the end:
This error is specific to version 0.23.0, so I checked the changes to that function definition in pandas:
**Pandas Version 0.22.0:**
https://github.com/pandas-dev/pandas/blob/0.22.x/pandas/core/dtypes/cast.py#L879
**Pandas Version 0.23.0:**
https://github.com/pandas-dev/pandas/blob/0.23.x/pandas/core/dtypes/cast.py#L908
So they added the following in version 0.23.0:
https://github.com/pandas-dev/pandas/blob/0.23.x/pandas/core/dtypes/cast.py#L920
I checked the STIXdatetime class arguments and I don't see nanoseconds as an option:
https://github.com/oasis-open/cti-python-stix2/blob/master/stix2/utils.py#L24
I am not sure if there is anything that needs to be done on the STIX library side.
I downgraded the Python3 Pandas package to 0.22.0 and it worked fine. I didn't want to start an issue in Pandas before asking you guys if this makes sense and if it is possible that nanoseconds needs to be defined as an argument for the STIXdatetime class.
I hope you all have a great weekend! No rush at all on this one. I will keep working with Pandas 0.22.0 for now. I don't need pandas to collect or filter the data initially. I use it for a better representation of the results after collecting everything via the STIX and TAXII libraries. Therefore, if you want to close this issue since I am using an external library, I would understand. It is just that the STIXdatetime error message caught my attention and I wasn't sure if nanosecond is a standard or something that needs to be defined on the STIX side. If not, then this issue can be closed 😄
Once again guys, great job and thank you for all your help!! I hope you all have a great weekend!!!