Skip to content
This repository has been archived by the owner on Mar 1, 2018. It is now read-only.

Refactor election expenses classifier #89

Merged
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,13 @@ class ElectionExpensesClassifier(TransformerMixin):
Brazilian Federal Revenue category of companies, preceded by its code.
"""

def fit(self, X):
return self
def fit(self, dataframe):
pass

def transform(self, X=None):
return self
def transform(self, dataframe=None):
pass

def predict(self, X):
return X['legal_entity'] == '409-0 - CANDIDATO A CARGO POLITICO ELETIVO'
def predict(self, dataframe):
ELECTION_LEGAL_ENTITY = '409-0 - CANDIDATO A CARGO POLITICO ELETIVO'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 says that constants usually should be defined in the module level (not method level), would you mind changing this minor detail?


return dataframe['legal_entity'] == ELECTION_LEGAL_ENTITY
Original file line number Diff line number Diff line change
Expand Up @@ -9,18 +9,27 @@
class TestElectionExpensesClassifier(TestCase):

def setUp(self):
self.dataset = pd.read_csv('rosie/chamber_of_deputies/tests/fixtures/election_expenses_classifier.csv',
dtype={'name': np.str, 'legal_entity': np.str})
self.subject = ElectionExpensesClassifier()
self.election_expenser_classifier = ElectionExpensesClassifier()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering: as each test file refers to only one classifier I wouldn't bother shortening it to self.classifier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I aiming to achieve people that have no context of a classifier, to understand what is this referencing to.

How do you self.classifier helping with it? Or is this tip addressing something else that I didn't get here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I aiming to achieve people that have no context of a classifier

Hopefully readers know which file they are reading… or maybe I'm just too optimistic.

But nothing against self.election_expenser_classifier per se except that I was trying something a bit shorter (minor enhancement in code readability) and as meaningful as before (guessing the file name would be a context to inform the reader about which classifier we're talking about). But that was just a minor suggestion… feel free to throw it away ; )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that way, I'm with @cuducos, I prefer something meaningful but simple at the same way :)


def test_is_election_company(self):
self.assertEqual(self.subject.predict(self.dataset)[0], True)
def test_legal_entity_is_a_election_company(self):
self.dataframe = pd.read_csv('rosie/chamber_of_deputies/tests/fixtures/election_expenses_classifier.csv',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @anaschwendler @cuducos

How could I create a data frame here from a string or a data structure? Just to avoid have tests accessing file system and be able to be more specific with just what is necessary for this test.
Something like: pd.read('just the information necessary for this test)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A dataframe is a structure of rows and columns. So something like pd.DataFrame(data=[['my text']]) might work (the first list is the row, the inside list is the single column I guess).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice with your tip and some search I was able to do what I was planing. :)

dtype={'name': np.str, 'legal_entity': np.str})

def test_is_not_election_company(self):
self.assertEqual(self.subject.predict(self.dataset)[1], False)
prediction_result = self.election_expenser_classifier.predict(self.dataframe)

def test_fit(self):
self.assertEqual(self.subject.fit(self.dataset), self.subject)
self.assertEqual(prediction_result[0], True)

def test_tranform(self):
self.assertEqual(self.subject.transform(), self.subject)
def test_legal_entity_is_not_election_company(self):
self.dataframe = pd.read_csv('rosie/chamber_of_deputies/tests/fixtures/election_expenses_classifier.csv',
dtype={'name': np.str, 'legal_entity': np.str})

prediction_result = self.election_expenser_classifier.predict(self.dataframe)

self.assertEqual(prediction_result[1], False)

def test_fit_just_for_formality_because_its_never_used(self):
empty_dataframe = pd.DataFrame()
self.assertTrue(self.election_expenser_classifier.fit(empty_dataframe) is None)

def test_transform_just_for_formality_because_its_never_used(self):
self.assertTrue(self.election_expenser_classifier.transform() is None)