Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export and merge functions #17

Open
wants to merge 24 commits into
base: master
Choose a base branch
from

Conversation

ajparsons
Copy link

Repackaged version of the export feature branch - split into more commits to separate different features and refactoring. Incorporating changes made to package since December.

Now also includes a feature for intelligent merging of similar popolo (where there will be overlap of individuals - but one file might have additional memberships).

Includes workaround for no to_string function in approx_date package - this logic also in mysociety/approx_dates#2 to approx_dates.

Tests written for export + merge features.

…e.py

The new attribute system will involve meta classes other helper objects
I prefer to keep separate from the objects that use them for neatness.
Rather than specify which data model the collection is a 'collection'
of in an overridden __init__ - now subclasses override the
object_class class attribute with the correct data model.

Not hugely important, stylistic choice - prefer avoiding super calls.
Previously a MembershipCollection object would be
generated on request on accessing the membership property.
Now this is generated and assigned at creation.

Reason: collections need to be permanent so that changes to
their objects can be repackaged into a file.
In new system, rather than the properties of a PopoloObject
being defined as @property-ed functions accessing properties
of self.data - this will be replaced by the 'Attribute' class
 - that given a attribute will control getting and setting
 to self.data.

Subclasses of Attribute control different ways of talking to data
e.g. talking to data that is a date.

A small meta-class is used for PopoloObject to avoid repetition
in definitions.
e.g. person = Property(attr=person) can be expressed as
person = Property()
Almost all previous @property-ed functions replaced with
Attribute object.

Functionality for 'getting' and default values are identical
to old arrangement.
When a new 'twitter' value is set, if a url - the link value is
set - and the username is also extracted for contact details.

If not a url, just updates the contact details.
Date fields for Person and Organization were set to default to
None rather than the vague .PAST and .FUTURE from approxdate.

This has been changed to bring it into sync with other Membership
and Event objects.

Person Test updated to reflect this change.
Popolo objects can now export to json (to_json) or
direct to a file name.
Two popolo objects can now be merged. Each collectiontype has
a value that must be made 'unique' (either name or id).

In a collision, the 'larger' will be kept (assumed more information)
and all refs to discarded ids replaced.

More complex functionality e.g. the bigger person object picking
up all possible alternate names of the smaller one, can be
specified by overriding the absorb function.
Datetime objects will be reduced to just their date, then converted to isoformat
Exposed json_data as something that can be accessed,
but that is not independent of child objects.
New objects are based on a deepcopy of the incoming dict.
Updated person tests with test for __gt__ and __lt__
…dAttributes

Originally this was so you could build RelatedAttributes on something other than an ID field.
This was never necessary, so removed to tidy up.
These features will also be useful in the approx_dates package directly,
and will be ported there at some point.
@coveralls
Copy link

coveralls commented Feb 6, 2017

Coverage Status

Coverage remained the same at 100.0% when pulling 37b644c on ajparsons:export_merge_function into 21670d6 on everypolitician:master.

Returns either unique time from membership or that from
overall legislative period.
Nicer way of getting a wikidata, parlparse, etc
for an individual.
Replaces with a cross-platform compatible hashing method
so membership ids are generated consistently
Will absorb gender of 'other' when self has none.
pop.to_filename(filename)
os.remove(filename)


Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line at end of file
blank line contains whitespace

filename = mktemp()
pop.to_filename(filename)
os.remove(filename)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line contains whitespace

assert pop2.to_json() == pop.to_json()

assert pop.memberships[0].id == pop2.memberships[0].id

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line contains whitespace

j = pop.to_json()
pop2 = Popolo(json.loads(j))
assert pop2.to_json() == pop.to_json()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line contains whitespace

pop.add(m)


"""

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

too many blank lines (2)

membership_attributes = {"role":"MP",
"person_id":"person1",
"organisation_id":"org1",
"area_id":"area1",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing whitespace after ':'


membership_attributes = {"role":"MP",
"person_id":"person1",
"organisation_id":"org1",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing whitespace after ':'

pop.add(e)

membership_attributes = {"role":"MP",
"person_id":"person1",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing whitespace after ':'

assert getattr(e, k) == v
pop.add(e)

membership_attributes = {"role":"MP",

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing whitespace after ':'

setattr(e, k, v)
assert getattr(e, k) == v
pop.add(e)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line contains whitespace

Copy link
Contributor

@mhl mhl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ajparsons - I think I'm OK with these changes in principle, but there are few things that make this hard to review for me. I've added HoundCI now, who's automatically commented on the not very interesting stylistic things. I've also left some line comments (although GitHub showing some of these as "outdated" now after I've rebased).

A key thing is that I'd like the tests to pass at each commit; I mentioned this particularly in the case of a commit that should have just been a refactoring, but caused the tests to start failing. A nice trick for checking that this is the case is to use:

git rebase -i --exec 'tox' master

So basically, I like the direction this is heading in, but I'd like to come back to it once some of these things are sorted out.

@@ -7,3 +7,6 @@
/htmlcov
/everypolitician_popolo.egg-info
/dist
.project
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think generally I'd expect people to put ignore rules that are specific to a particular IDE into .git/info/exclude (or to have them in a per-user gitignore configured with: git config --global core.excludesfile $HOME/.gitignore. (I'm happy with adding the .gitattributes line-endings configuration, though.)

self.all_popolo = all_popolo
self.object_class = object_class
self.object_class = self.__class__.object_class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this line is actually necessary, is it? self.foo falls back to trying to find a class variable foo, e.g.:

In [1]: class Thing(object):
   ...:     foo = 'class foo'
   ...:     bar = 'class bar'
   ...:     def __init__(self):
   ...:         self.foo = 'instance foo'
   ...:     def print_stuff(self):
   ...:         print self.foo
   ...:         print self.bar
   ...:         

In [2]: Thing().print_stuff()
instance foo
class bar

@@ -4,44 +4,20 @@


from approx_dates.models import ApproxDate
from six.moves.urllib_parse import urlsplit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With a refactoring like this commit, I'd really like the tests to still pass at this point, but the imports in other modules haven't been updated and the tests don't pass at this point, so I think the commit is incomplete. (That would also make some of the later commits easier to read.)


def __init__(self, json_data=None):
if json_data == None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tiny stylistic thing - since None is a singleton, PEP 8 suggests if json_data is None:

@@ -1,30 +1,180 @@
from datetime import date
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are some typos in this commit message (e.g. "In new system", "a attribute", extra space before "to self.data"

will return None rather than an error if the value is absent.

'''
def __init__(self,attr="",default=None,null=False,allow_multiple=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be spaces after the commas here (we really should have an automated check for this kind of thing)

self.allow_multiple = allow_multiple

def __get__(self, obj, type=None):
if self.allow_null_default == False and self.default_value == None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't this be if (not self.allow_null_default) and (self.default_value is None): ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants