Add spider for Maringa/Pr #83

antoniovendramin · 2018-06-06T23:39:56Z

No description provided.

rennerocha · 2018-06-07T12:54:22Z

processing/data_collection/gazette/spiders/pr_maringa.py

+            )
+
+    def parse_year(self, response):
+        # print(response.body)


Remove debug code.

rennerocha · 2018-06-07T12:57:27Z

processing/data_collection/gazette/spiders/pr_maringa.py

+    TERRITORY_ID = '4115200'
+    name = 'pr_maringa'
+    allowed_domains = ['maringa.pr.gov.br']
+    starting_year = 2015


Unused variable. It can be removed.

antoniovendramin · 2018-06-07T15:36:25Z

Done.

endersonmenezes · 2018-06-07T22:59:02Z

I had just started studying this. Congratulations on the initiative. Onward, Maringá.

giovanisleite · 2018-06-08T16:08:55Z

processing/data_collection/gazette/spiders/pr_maringa.py

+            gazette_id = row.css('td:nth-child(1) a::attr(href)').re_first('.*/[oO]{2}[mM] (.*)')
+            gazette_date = row.css('td:nth-child(2) font > font::text').extract_first()
+            yield Gazette(
+                date=parse(f'{gazette_date}', languages=['pt']).date(),


You don't need an f-string here; gazette_date on its own should be enough. Or is it needed for some reason?
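
To illustrate the point, a minimal sketch (the sample value is hypothetical, not taken from the Maringá site): wrapping a string in an f-string returns an equal string, so it adds nothing when the selector matches. The one behavioural difference is that a missing value, which extract_first() returns as None, becomes the literal text 'None' instead of staying None.

```python
# Hypothetical value standing in for what extract_first() might return.
gazette_date = '7 de junho de 2018'

# For a str, the f-string produces an equal string: no conversion happens.
wrapped = f'{gazette_date}'
same_text = wrapped == gazette_date

# extract_first() returns None when nothing matches; the f-string hides that
# by turning it into the four-character string 'None'.
missing = None
wrapped_missing = f'{missing}'

print(same_text, repr(wrapped_missing))
```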

giovanisleite · 2018-06-08T16:12:20Z

processing/data_collection/gazette/spiders/pr_maringa.py

+                file_urls=[f'http://venus.maringa.pr.gov.br/arquivos/orgao_oficial/arquivos/oom%20{gazette_id}'],
+                is_extra_edition=any(extra_char in gazette_id for extra_char in ['A', 'B', 'C', 'D']),
+                territory_id=self.TERRITORY_ID,
+                power='executive_legislature',


Power confirmed

giovanisleite · 2018-06-08T16:22:14Z

processing/data_collection/gazette/spiders/pr_maringa.py

+            yield Gazette(
+                date=parse(f'{gazette_date}', languages=['pt']).date(),
+                file_urls=[f'http://venus.maringa.pr.gov.br/arquivos/orgao_oficial/arquivos/oom%20{gazette_id}'],
+                is_extra_edition=any(extra_char in gazette_id for extra_char in ['A', 'B', 'C', 'D']),


Extra edition identification is correct based on the 2009 gazettes list. It works as it is, but I would prefer to match any letter rather than just these 4, in case there are more than 4 extra editions in a day. What do you think?
Something like: is_extra_edition=any(caracter.isalpha() for caracter in gazette_id),
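
A minimal sketch comparing the two checks, under stated assumptions: the link texts below are hypothetical examples, the function names are made up for illustration, and the .pdf suffix is assumed to be stripped from the id before the check (otherwise the letters in the extension would make isalpha() match every edition).

```python
import re

# Assumed shape of the PDF links; the edition ids used below are hypothetical.
ID_PATTERN = re.compile(r'.*/[oO]{2}[mM] (.*)\.pdf')


def extract_id(href):
    """Return the edition id from a link, or None if it does not match."""
    match = ID_PATTERN.match(href)
    return match.group(1) if match else None


def is_extra_fixed(gazette_id):
    """Current check: only the letters A to D mark an extra edition."""
    return any(extra_char in gazette_id for extra_char in ['A', 'B', 'C', 'D'])


def is_extra_any_letter(gazette_id):
    """Suggested check: any letter in the id marks an extra edition."""
    return any(caracter.isalpha() for caracter in gazette_id)


for href in ['arquivos/OOM 1234.pdf', 'arquivos/oom 1234B.pdf', 'arquivos/oom 1234E.pdf']:
    gazette_id = extract_id(href)
    print(gazette_id, is_extra_fixed(gazette_id), is_extra_any_letter(gazette_id))
```

With these sample ids, a fifth extra edition such as the hypothetical 1234E is only flagged by the second check.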

giovanisleite · 2018-06-13T14:02:35Z

processing/data_collection/gazette/spiders/pr_maringa.py

+            gazette_id = row.css('td:nth-child(1) a::attr(href)').re_first('.*/[oO]{2}[mM] (.*)\.pdf')
+            gazette_date = row.css('td:nth-child(2) font > font::text').extract_first()
+            yield Gazette(
+                date=parse(gazette_date).date(),


Perhaps I expressed myself badly here, or languages=['pt'] was removed by accident.
My review was only about the f-string; languages=['pt'] should be kept. Sorry for the trouble.

endersonmenezes · 2019-10-31T18:08:32Z

Need more modifications? I can help!

rennerocha requested changes Jun 7, 2018

antoniovendramin added 3 commits June 7, 2018 12:35

Make the resquest works

bd33d68

Addin new scraper for Maringa/PR

06dc102

Remove unused code

9b64462

antoniovendramin force-pushed the pr_maringa branch from a20fadd to 9b64462 Compare June 7, 2018 15:35

Remove wrong text

8ad8a26

giovanisleite reviewed Jun 8, 2018

Apply @giovanisleite's suggestions

c28582c

giovanisleite suggested changes Jun 13, 2018

Adding languages back to the parse of date

3a7ce9d

giovanisleite approved these changes Jun 14, 2018

Irio merged commit 0840311 into okfn-brasil:master Nov 1, 2019

endersonmenezes mentioned this pull request Nov 7, 2019

Change Status - Maringá/PR #133

Merged
