Skip to content
This repository has been archived by the owner on Mar 1, 2018. It is now read-only.

Refactor election expenses classifier #89

Merged
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -14,11 +14,13 @@ class ElectionExpensesClassifier(TransformerMixin):
Brazilian Federal Revenue category of companies, preceded by its code.
"""

def fit(self, X):
return self
def fit(self, dataframe):
pass

def transform(self, X=None):
return self
def transform(self, dataframe=None):
pass

def predict(self, X):
return X['legal_entity'] == '409-0 - CANDIDATO A CARGO POLITICO ELETIVO'
def predict(self, dataframe):
ELECTION_LEGAL_ENTITY = '409-0 - CANDIDATO A CARGO POLITICO ELETIVO'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 says that constants usually should be defined in the module level (not method level), would you mind changing this minor detail?


return dataframe['legal_entity'] == ELECTION_LEGAL_ENTITY

This file was deleted.

Original file line number Diff line number Diff line change
Expand Up @@ -9,18 +9,28 @@
class TestElectionExpensesClassifier(TestCase):

def setUp(self):
self.dataset = pd.read_csv('rosie/chamber_of_deputies/tests/fixtures/election_expenses_classifier.csv',
dtype={'name': np.str, 'legal_entity': np.str})
self.subject = ElectionExpensesClassifier()
self.election_expenser_classifier = ElectionExpensesClassifier()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering: as each test file refers to only one classifier I wouldn't bother shortening it to self.classifier

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I aiming to achieve people that have no context of a classifier, to understand what is this referencing to.

How do you self.classifier helping with it? Or is this tip addressing something else that I didn't get here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I aiming to achieve people that have no context of a classifier

Hopefully readers know which file they are reading… or maybe I'm just too optimistic.

But nothing against self.election_expenser_classifier per se except that I was trying something a bit shorter (minor enhancement in code readability) and as meaningful as before (guessing the file name would be a context to inform the reader about which classifier we're talking about). But that was just a minor suggestion… feel free to throw it away ; )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that way, I'm with @cuducos, I prefer something meaningful but simple at the same way :)


def test_is_election_company(self):
self.assertEqual(self.subject.predict(self.dataset)[0], True)
def test_legal_entity_is_a_election_company(self):
self.dataframe = self._create_dataframe([['CARLOS ALBERTO DA SILVA', 'ELEICAO 2006 CARLOS ALBERTO DA SILVA DEPUTADO', '409-0 - CANDIDATO A CARGO POLITICO ELETIVO']])

def test_is_not_election_company(self):
self.assertEqual(self.subject.predict(self.dataset)[1], False)
prediction_result = self.election_expenser_classifier.predict(self.dataframe)

def test_fit(self):
self.assertEqual(self.subject.fit(self.dataset), self.subject)
self.assertEqual(prediction_result[0], True)

def test_tranform(self):
self.assertEqual(self.subject.transform(), self.subject)
def test_legal_entity_is_not_election_company(self):
self.dataframe = self._create_dataframe([['PAULO ROGERIO ROSSETO DE MELO', 'POSTO ROTA 116 DERIVADOS DE PETROLEO LTDA', '401-4 - EMPRESA INDIVIDUAL IMOBILIARIA']])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about making this test a bit more readable?

    def test_legal_entity_is_not_election_company(self):
        data = [[
            'PAULO ROGERIO ROSSETO DE MELO',
            'POSTO ROTA 116 DERIVADOS DE PETROLEO LTDA',
            '401-4 - EMPRESA INDIVIDUAL IMOBILIARIA'
         ]]
        self.dataframe = self._create_dataframe(data)
        prediction_result = self.election_expenser_classifier.predict(self.dataframe)
        self.assertEqual(prediction_result[0], False)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cuducos, I fixed the point mentioned above. :)


prediction_result = self.election_expenser_classifier.predict(self.dataframe)

self.assertEqual(prediction_result[0], False)

def test_fit_just_for_formality_because_its_never_used(self):
empty_dataframe = pd.DataFrame()
self.assertTrue(self.election_expenser_classifier.fit(empty_dataframe) is None)

def test_transform_just_for_formality_because_its_never_used(self):
self.assertTrue(self.election_expenser_classifier.transform() is None)

def _create_dataframe(self, dataframe_data):
return pd.DataFrame(data=dataframe_data, columns=['congressperson_name', 'name', 'legal_entity'])