Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance, Design and other fixes #100

Merged
merged 12 commits into from
Nov 8, 2017
Merged

Conversation

emmanvg
Copy link
Contributor

@emmanvg emmanvg commented Nov 3, 2017

The major changes include:

  • A new serialize() method to add flexibility for performance. You can pass the same arguments you would for a json.dumps() call. For example, sort_keys=True, indent=4, etc.
  • Created a CustomProperty class, which developers can extend from to create their custom properties. It is wrapper, but it helps to signal the usage of custom properties. This fixes the problem when an object is parsed with allow_custom=True but the _properties dictionary would not get updated.
  • ABC for DataStore, DataSink and DataSource. I think the changes made are good for proper concrete implementations of these ABCs. For example, DataStore was never abstract...

closes #99
closes #98
closes #96

@codecov-io
Copy link

codecov-io commented Nov 3, 2017

Codecov Report

Merging #100 into master will increase coverage by 0.12%.
The diff coverage is 94.52%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #100      +/-   ##
==========================================
+ Coverage   97.35%   97.47%   +0.12%     
==========================================
  Files          56       56              
  Lines        5102     5119      +17     
==========================================
+ Hits         4967     4990      +23     
+ Misses        135      129       -6
Impacted Files Coverage Δ
stix2/test/test_bundle.py 100% <100%> (ø) ⬆️
stix2/test/test_memory.py 100% <100%> (ø) ⬆️
stix2/sources/memory.py 100% <100%> (ø) ⬆️
stix2/sources/__init__.py 100% <100%> (+3.77%) ⬆️
stix2/test/test_identity.py 100% <100%> (ø) ⬆️
stix2/base.py 100% <100%> (ø) ⬆️
stix2/test/test_data_sources.py 100% <100%> (ø) ⬆️
stix2/test/test_filesystem.py 100% <100%> (ø) ⬆️
stix2/sources/filesystem.py 98.16% <100%> (-0.04%) ⬇️
stix2/sources/taxii.py 34.17% <33.33%> (+0.84%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5aa8c66...258ea4b. Read the comment docs.

@clenk
Copy link
Contributor

clenk commented Nov 3, 2017

We talked about this offline, but just to document it: We also need to make sure custom properties' ordering works if they are passed to the constructor via a custom_properties dictionary instead of allow_custom. Like the 2nd example on https://stix2.readthedocs.io/en/latest/guide/custom.html#Custom-Properties.

@gtback
Copy link
Contributor

gtback commented Nov 3, 2017

@clenk: Is there a reason you think they would be handled differently? Everything gets collapsed into the same dictionary, whether they came from custom_properties dict or a standard kwarg (when allow_custom=True). It's interesting (and probably not desirable) that the custom properties come first in both examples on the Jupyter notebook you linked to.

Ignore this, I was looking at master and didn't see how this PR changed the behavior.

stix2/base.py Outdated
# The custom properties will get added to the bottom.
# Matched with a CustomProperty.
extra_kwargs = list(set(kwargs) - set(cls._properties))
self._properties.update([(x, CustomProperty()) for x in extra_kwargs])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we want to do this. It updates the class instance of _properties, meaning that any custom properties on the first instantiation of an SDO are allowed on any future instance, even if allow_custom is false for the future ones.

It's perhaps likely that people want to use the same custom properties on every instance of the same SDO that they create, but I'm not sure how I feel about the way this is done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point! So, could we (possible approaches):

  • Make _properties an instance variable?
  • Handle custom in a completely different structure?
  • Go back with the old method (required checking the _inner dict)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll go back to the #85 fix. That was working fine.

@gtback
Copy link
Contributor

gtback commented Nov 7, 2017

I'd like to figure out better ways to track the version history of the Jupyter notebooks. I noticed that a couple switched from Python 2 to Python 3 (which is OK, but I just want to make sure what we do is deliberate). @clenk , @mbastian1135 any ideas?

self.id = make_id()
self.source = source
self.sink = sink

def get(self, stix_id, allow_custom=False):
@abstractmethod
def get(self, stix_id): # pragma: no cover
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than making these abstract methods, can we just define them similar to get(self, *args, **kwargs), and pass them on the source? I'd rather not require subclasses to override these.

@clenk
Copy link
Contributor

clenk commented Nov 7, 2017

@gtback The notebooks will always switch Python version when someone re-executes cells in them unless we all use the exact same version of Python. I think we could use a git filter to ignore the lines about python version. See https://stackoverflow.com/q/16244969/ We could have git ignore the execution_count lines as well.

If you're referring to tracking changes in the content of output cells instead, there are some ways to handle that too: http://nbsphinx.readthedocs.io/en/0.2.16/usage.html#Using-Notebooks-with-Git.

@gtback
Copy link
Contributor

gtback commented Nov 7, 2017

@clenk: I just want to make sure we don't accidentally introduce extraneous diffs in the Notebook JSON files (output, Python version, execution count, etc.). That link to the nbsphinx output is extremely useful, but do we really want to not pre-run any of the cells? Will that even work on ReadTheDocs? (sorry, more questions than answers, I know)

Copy link
Contributor

@gtback gtback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There just seem to be a lot of intrusive changes to the data sources; to me, it's a code smell that we need to refactor this all a bit. I wish there were an easier way to handle these functions. Let's talk about this a bit more.



class DataSink(object):
class DataSink(with_metaclass(ABCMeta)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having DataSink and DataSource as abstract classes is OK. I just think DataStore can provide a lot of default functionality (delegating to source and sink attributes.

@@ -35,6 +35,85 @@ def __init__(self, stix_dir, bundlify=False):
self.source = FileSystemSource(stix_dir=stix_dir)
self.sink = FileSystemSink(stix_dir=stix_dir, bundlify=bundlify)

def get(self, stix_id, allow_custom=False, version=None, _composite_filters=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this is specific to the FileSystem, I don't think. I'm hoping we can suitably abstract it with *args and **kwargs in the base class.

@clenk
Copy link
Contributor

clenk commented Nov 8, 2017

As the Jupyter notebook versioning discussion seems tangential to this PR, I've moved it to #101.

@emmanvg
Copy link
Contributor Author

emmanvg commented Nov 8, 2017

Hopefully I have addressed the concerns with the DataStores. It looks a lot cleaner now. Note: I did not change anything related to Jupyter Notebook since there is a new issue open. If nothing else needs to be fixed i'd say its ready to be merged! 👍

@gtback gtback merged commit ef6dade into oasis-open:master Nov 8, 2017
@gtback gtback added this to the 0.4 milestone Jun 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants