Add spider for Maringa/Pr #83

Merged
merged 6 commits into from Nov 1, 2019
antoniovendramin
Contributor

No description provided.

)

def parse_year(self, response):
    # print(response.body)
Member

Remove debug code.

TERRITORY_ID = '4115200'
name = 'pr_maringa'
allowed_domains = ['maringa.pr.gov.br']
starting_year = 2015
Member

Unused variable; it can be removed.

@antoniovendramin
Contributor Author

Done.

@endersonmenezes
Contributor

endersonmenezes commented Jun 7, 2018

I had just started studying this. Congratulations on the initiative. Onward, Maringá.

gazette_id = row.css('td:nth-child(1) a::attr(href)').re_first('.*/[oO]{2}[mM] (.*)')
gazette_date = row.css('td:nth-child(2) font > font::text').extract_first()
yield Gazette(
    date=parse(f'{gazette_date}', languages=['pt']).date(),
Contributor

You don't need an f-string here; passing gazette_date directly should be enough. Or is there a reason it is needed?
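To make the point concrete, here is a minimal sketch, assuming gazette_date is whatever extract_first() returned (a str, or None when the selector matches nothing). The sample value is made up for illustration.

```python
# Hypothetical value standing in for what the selector would return.
gazette_date = "01/11/2019"

# Wrapping a str in an f-string produces an equal string,
# so the wrapper adds nothing in the normal case.
wrapped = f"{gazette_date}"
print(wrapped == gazette_date)

# The one behavioural difference: a missing value (None) is turned into
# the text "None" instead of being passed through as None.
missing = None
wrapped_missing = f"{missing}"
print(repr(wrapped_missing))
```

So the f-string only changes behaviour when the selector finds nothing, and in that case it hides the missing value behind the literal text "None" rather than surfacing it.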

file_urls=[f'http://venus.maringa.pr.gov.br/arquivos/orgao_oficial/arquivos/oom%20{gazette_id}'],
is_extra_edition=any(extra_char in gazette_id for extra_char in ['A', 'B', 'C', 'D']),
territory_id=self.TERRITORY_ID,
power='executive_legislature',
Contributor

yield Gazette(
    date=parse(f'{gazette_date}', languages=['pt']).date(),
    file_urls=[f'http://venus.maringa.pr.gov.br/arquivos/orgao_oficial/arquivos/oom%20{gazette_id}'],
    is_extra_edition=any(extra_char in gazette_id for extra_char in ['A', 'B', 'C', 'D']),
Contributor

The extra edition identification is correct, based on the 2009 gazettes list. It works as it is, but I would prefer to check for any letter rather than only these 4, in case there are more than 4 extra editions in a day. What do you think?
Something like: is_extra_edition=any(caracter.isalpha() for caracter in gazette_id),
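A small sketch comparing the two checks side by side. The gazette IDs below are hypothetical, and the helper names is_extra_fixed and is_extra_any_letter are made up for this comparison; only the two expressions inside them come from this thread.

```python
# The four letters checked by the expression currently in the diff.
EXTRA_LETTERS = ["A", "B", "C", "D"]


def is_extra_fixed(gazette_id: str) -> bool:
    """Current check: flag IDs containing one of A, B, C or D."""
    return any(extra_char in gazette_id for extra_char in EXTRA_LETTERS)


def is_extra_any_letter(gazette_id: str) -> bool:
    """Suggested check: flag IDs containing any alphabetic character."""
    return any(caracter.isalpha() for caracter in gazette_id)


# Hypothetical IDs: a regular edition, two extras, and an ID that still
# carries its file extension.
for gazette_id in ["3210", "3210 A", "3210 E", "3210.pdf"]:
    print(gazette_id, is_extra_fixed(gazette_id), is_extra_any_letter(gazette_id))
```

One caveat with the letter-based check: it only works if gazette_id no longer includes the file extension, because the letters in ".pdf" would otherwise mark every edition as extra.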

gazette_id = row.css('td:nth-child(1) a::attr(href)').re_first('.*/[oO]{2}[mM] (.*)\.pdf')
gazette_date = row.css('td:nth-child(2) font > font::text').extract_first()
yield Gazette(
    date=parse(gazette_date).date(),
Contributor

Perhaps I expressed myself badly here, or `, languages=['pt']` was removed by accident.
My review was only about the f-string; the `, languages=['pt']` argument should be kept. Sorry for the trouble.
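A sketch of why the language hint is worth keeping, assuming the page prints numeric day-first dates. The sample string is made up; the real page format is not shown in this thread. It uses only the standard library to show the ambiguity. As far as I understand, languages=['pt'] is what lets the date parser used in the spider (presumably dateparser) prefer the Portuguese day-first reading, but that is an assumption about the library's defaults, not something verified here.

```python
from datetime import datetime

# Hypothetical raw value scraped from the gazette table.
raw = "01/02/2019"

# Day-first reading, as a Portuguese-language page would normally intend.
day_first = datetime.strptime(raw, "%d/%m/%Y").date()

# Month-first reading, which a parser without a language hint may choose.
month_first = datetime.strptime(raw, "%m/%d/%Y").date()

# Same text, two different calendar days.
print(day_first.isoformat(), month_first.isoformat())
```

Both readings are valid dates, so a wrong choice would not raise an error; the gazette would simply be stored under the wrong day.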

@endersonmenezes
Contributor

Are any more changes needed? I can help!

@Irio Irio merged commit 0840311 into okfn-brasil:master Nov 1, 2019
5 participants