New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Export and merge functions #17
base: master
Are you sure you want to change the base?
Export and merge functions #17
Conversation
…e.py The new attribute system will involve meta classes other helper objects I prefer to keep separate from the objects that use them for neatness.
Rather than specify which data model the collection is a 'collection' of in an overridden __init__ - now subclasses override the object_class class attribute with the correct data model. Not hugely important, stylistic choice - prefer avoiding super calls.
Previously a MembershipCollection object would be generated on request on accessing the membership property. Now this is generated and assigned at creation. Reason: collections need to be permanent so that changes to their objects can be repackaged into a file.
In new system, rather than the properties of a PopoloObject being defined as @property-ed functions accessing properties of self.data - this will be replaced by the 'Attribute' class - that given a attribute will control getting and setting to self.data. Subclasses of Attribute control different ways of talking to data e.g. talking to data that is a date. A small meta-class is used for PopoloObject to avoid repetition in definitions. e.g. person = Property(attr=person) can be expressed as person = Property()
Almost all previous @property-ed functions replaced with Attribute object. Functionality for 'getting' and default values are identical to old arrangement.
When a new 'twitter' value is set, if a url - the link value is set - and the username is also extracted for contact details. If not a url, just updates the contact details.
Date fields for Person and Organization were set to default to None rather than the vague .PAST and .FUTURE from approxdate. This has been changed to bring it into sync with other Membership and Event objects. Person Test updated to reflect this change.
Popolo objects can now export to json (to_json) or direct to a file name.
Two popolo objects can now be merged. Each collectiontype has a value that must be made 'unique' (either name or id). In a collision, the 'larger' will be kept (assumed more information) and all refs to discarded ids replaced. More complex functionality e.g. the bigger person object picking up all possible alternate names of the smaller one, can be specified by overriding the absorb function.
Datetime objects will be reduced to just their date, then converted to isoformat
Exposed json_data as something that can be accessed, but that is not independent of child objects. New objects are based on a deepcopy of the incoming dict.
Updated person tests with test for __gt__ and __lt__
…dAttributes Originally this was so you could build RelatedAttributes on something other than an ID field. This was never necessary, so removed to tidy up.
These features will also be useful in the approx_dates package directly, and will be ported there at some point.
Returns either unique time from membership or that from overall legislative period.
Nicer way of getting a wikidata, parlparse, etc for an individual.
Replaces with a cross-platform compatible hashing method so membership ids are generated consistently
Will absorb gender of 'other' when self has none.
c718cf5
to
6df1ea7
Compare
6df1ea7
to
37b644c
Compare
pop.to_filename(filename) | ||
os.remove(filename) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line at end of file
blank line contains whitespace
filename = mktemp() | ||
pop.to_filename(filename) | ||
os.remove(filename) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line contains whitespace
assert pop2.to_json() == pop.to_json() | ||
|
||
assert pop.memberships[0].id == pop2.memberships[0].id | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line contains whitespace
j = pop.to_json() | ||
pop2 = Popolo(json.loads(j)) | ||
assert pop2.to_json() == pop.to_json() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line contains whitespace
pop.add(m) | ||
|
||
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
too many blank lines (2)
membership_attributes = {"role":"MP", | ||
"person_id":"person1", | ||
"organisation_id":"org1", | ||
"area_id":"area1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing whitespace after ':'
|
||
membership_attributes = {"role":"MP", | ||
"person_id":"person1", | ||
"organisation_id":"org1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing whitespace after ':'
pop.add(e) | ||
|
||
membership_attributes = {"role":"MP", | ||
"person_id":"person1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing whitespace after ':'
assert getattr(e, k) == v | ||
pop.add(e) | ||
|
||
membership_attributes = {"role":"MP", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing whitespace after ':'
setattr(e, k, v) | ||
assert getattr(e, k) == v | ||
pop.add(e) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line contains whitespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @ajparsons - I think I'm OK with these changes in principle, but there are few things that make this hard to review for me. I've added HoundCI now, who's automatically commented on the not very interesting stylistic things. I've also left some line comments (although GitHub showing some of these as "outdated" now after I've rebased).
A key thing is that I'd like the tests to pass at each commit; I mentioned this particularly in the case of a commit that should have just been a refactoring, but caused the tests to start failing. A nice trick for checking that this is the case is to use:
git rebase -i --exec 'tox' master
So basically, I like the direction this is heading in, but I'd like to come back to it once some of these things are sorted out.
@@ -7,3 +7,6 @@ | |||
/htmlcov | |||
/everypolitician_popolo.egg-info | |||
/dist | |||
.project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think generally I'd expect people to put ignore rules that are specific to a particular IDE into .git/info/exclude
(or to have them in a per-user gitignore configured with: git config --global core.excludesfile $HOME/.gitignore
. (I'm happy with adding the .gitattributes
line-endings configuration, though.)
self.all_popolo = all_popolo | ||
self.object_class = object_class | ||
self.object_class = self.__class__.object_class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this line is actually necessary, is it? self.foo
falls back to trying to find a class variable foo, e.g.:
In [1]: class Thing(object):
...: foo = 'class foo'
...: bar = 'class bar'
...: def __init__(self):
...: self.foo = 'instance foo'
...: def print_stuff(self):
...: print self.foo
...: print self.bar
...:
In [2]: Thing().print_stuff()
instance foo
class bar
@@ -4,44 +4,20 @@ | |||
|
|||
|
|||
from approx_dates.models import ApproxDate | |||
from six.moves.urllib_parse import urlsplit |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With a refactoring like this commit, I'd really like the tests to still pass at this point, but the imports in other modules haven't been updated and the tests don't pass at this point, so I think the commit is incomplete. (That would also make some of the later commits easier to read.)
popolo_data/importer.py
Outdated
|
||
def __init__(self, json_data=None): | ||
if json_data == None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny stylistic thing - since None
is a singleton, PEP 8 suggests if json_data is None:
@@ -1,30 +1,180 @@ | |||
from datetime import date |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some typos in this commit message (e.g. "In new system", "a attribute", extra space before "to self.data"
will return None rather than an error if the value is absent. | ||
|
||
''' | ||
def __init__(self,attr="",default=None,null=False,allow_multiple=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be spaces after the commas here (we really should have an automated check for this kind of thing)
self.allow_multiple = allow_multiple | ||
|
||
def __get__(self, obj, type=None): | ||
if self.allow_null_default == False and self.default_value == None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't this be if (not self.allow_null_default) and (self.default_value is None):
?
37b644c
to
ccbd20d
Compare
Repackaged version of the export feature branch - split into more commits to separate different features and refactoring. Incorporating changes made to package since December.
Now also includes a feature for intelligent merging of similar popolo (where there will be overlap of individuals - but one file might have additional memberships).
Includes workaround for no to_string function in approx_date package - this logic also in mysociety/approx_dates#2 to approx_dates.
Tests written for export + merge features.