rennerocha left a comment
@lcsvillela thanks for your PR. It is quite good. I added a few issues and suggestions in your spider code. Please let me know if you have any questions.
| import datetime | ||
| from urllib.parse import urlencode | ||
|
|
||
| import dateparser |
issue: dateparser is imported but unused (F401). Remove this import.
| name = "sp_araraquara" | ||
| allowed_domains = ["diariooficialcmararaquara.sp.gov.br"] | ||
| start_date = datetime.date(2021, 3, 4) # First gazette available | ||
| end_date = datetime.datetime.today() |
suggestion: end_date already defaults to today in the BaseGazetteSpider definition, so you don't need to set it in your spider.
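To illustrate why the override is redundant, here is a minimal sketch of attribute inheritance. `BaseSpiderSketch` and `AraraquaraSketch` are hypothetical stand-ins, not the project's real classes, and the way the default is assigned here is an assumption about how a base class might do it.

```python
import datetime


class BaseSpiderSketch:
    """Hypothetical stand-in for a base spider; not the project's real class."""

    # Assumed default: evaluated once, when the class body runs.
    end_date = datetime.date.today()


class AraraquaraSketch(BaseSpiderSketch):
    name = "sp_araraquara"
    start_date = datetime.date(2021, 3, 4)  # First gazette available
    # No end_date here: attribute lookup falls back to the base class.


spider = AraraquaraSketch()
print(spider.start_date, spider.end_date)
```

The subclass only declares what differs from the base, so the default stays in one place.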
| date = datetime.datetime.strptime(date, "%d/%m/%Y").date() | ||
| url = card.css(".row ::attr(href)").get() | ||
| url = self.base_url + url | ||
| if card.css(".event-edicao p ::text").get() == "Edição Única": |
nitpick: This if statement could be replaced by a single assignment:
extra_edition = card.css(".event-edicao p ::text").get() == "Edição Extra"
This is just a personal preference anyway.
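A small sketch of the equivalence behind this nitpick. The function names are hypothetical, and the body of the original if statement is not visible in the diff, so the verbose version below is only an assumed shape of it.

```python
def is_extra_edition_verbose(label):
    # Assumed shape of the original branch; its body is not shown in the diff.
    if label == "Edição Extra":
        return True
    else:
        return False


def is_extra_edition(label):
    # The comparison already yields a bool, so it can be used directly.
    return label == "Edição Extra"


# Both versions agree, including when the selector found nothing (None).
for label in ["Edição Extra", "Edição Única", None]:
    assert is_extra_edition_verbose(label) == is_extra_edition(label)
```

Since `.get()` returns None when the selector matches nothing, the one-line form evaluates to False in that case rather than raising.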
| for gazette in gazettes: | ||
| card = gazette.css(".event-card") | ||
|
|
||
| edition_number = card.css(".event-data h4 ::text").re_first(r"[0-9]+") |
praise: Good use of regexes.
|
|
||
| def parse_gazette(self, response): | ||
|
|
||
| gazettes = response.css(".event-card.animated.flipInX") |
suggestion: Every element with the class event-card is a gazette, so you can simplify this to gazettes = response.css(".event-card")
| gazettes = response.css(".event-card.animated.flipInX") | ||
|
|
||
| for gazette in gazettes: | ||
| card = gazette.css(".event-card") |
suggestion: If you replace the definition of gazettes, you won't need this card variable.
| card = gazette.css(".event-card") | ||
|
|
||
| edition_number = card.css(".event-data h4 ::text").re_first(r"[0-9]+") | ||
| date = card.re_first(r"[0-9]+/[0-9]+/[0-9]+") |
nitpick: [0-9] can be replaced by \d in a regex.
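As a quick check of that substitution, the sketch below runs both patterns with Python's `re` module on a made-up string. The sample text is hypothetical and not taken from the site. One caveat worth knowing: on `str` patterns, `\d` also matches non-ASCII Unicode digits unless `re.ASCII` is passed, which should not matter for dates in this format.

```python
import re

text = "Edição nº 123 - 04/03/2021"  # hypothetical card text

# Pattern as written in the spider.
old = re.search(r"[0-9]+/[0-9]+/[0-9]+", text).group()

# Shorter form suggested in the review.
new = re.search(r"\d+/\d+/\d+", text).group()

# Same pattern restricted to ASCII digits, matching [0-9] exactly.
strict = re.search(r"\d+/\d+/\d+", text, flags=re.ASCII).group()

print(old, new, strict)
```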
Creates the spider for Araraquara/SP municipality.