Spider for Campo Mourão/PR #438

Closed
wants to merge 3 commits into from


@rodps rodps commented Apr 26, 2021

Issue #430

@allisonsampaio

Hi @giuliocc,
I already talked to you about this contribution; can you take a look? I also created a PR to add the city to CITIES.md, but I don't know if this is the right way to do it.

Member

@ogecece ogecece left a comment

Hey @allisonsampaio and @rodps! You don't actually need to create another PR hehehehe. The "soon" status is optional (and rarely used), precisely to avoid this overhead.

You can add the city to CITIES.md with the "done" status directly in this PR. That said, I'll close the other PR.

I requested some changes and gave some tips. Thanks for the contribution!

data_collection/gazette/spiders/pr_campo_mourao.py (three review threads, outdated)
Member

@ogecece ogecece left a comment

Nice work on those corrections! ❤️

The gazettes with edition numbers 1511-1516 (the first ones) weren't extracted. Could you look into that?

Since the PR is almost ready to be merged I'll make a suggestion for further work if you are interested :)

Looking at the census data, I noticed that atende.net is a system used by many cities. Searching for "diario oficial atende.net" in a search engine turns up many of them (I don't know if we can get the full list somewhere). If you are interested in contributing further to the project, a nice addition would be making another PR generalizing this spider to a base spider and add the other cities :)
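As a rough illustration of what "generalizing to a base spider" could look like, here is a minimal plain-Python sketch of the pattern only. It does not use Scrapy and is not the project's actual base class: the class names, the `city_subdomain` attribute, the "campomourao" subdomain, and the URL template are all hypothetical placeholders, since the real listing endpoint is not given in this thread.

```python
class BaseAtendeSpider:
    """Hypothetical base class: logic shared by every atende.net city lives here."""

    # Subclasses are expected to override these (illustrative attribute names).
    name = ""
    city_subdomain = ""

    # Placeholder template; the real listing URL is an assumption, not taken from the PR.
    listing_url_template = "https://{subdomain}.atende.net/PLACEHOLDER?page={page}"

    def listing_url(self, page=1):
        """Build the listing URL for one results page of this city."""
        return self.listing_url_template.format(
            subdomain=self.city_subdomain, page=page
        )

    @staticmethod
    def is_extra_edition(edition_type):
        """Same check discussed in this PR: extra editions are labeled 'Extraordinária'."""
        return edition_type == "Extraordinária"


class PrCampoMouraoSpider(BaseAtendeSpider):
    """A city spider then only declares what differs from the base."""

    name = "pr_campo_mourao"
    city_subdomain = "campomourao"  # assumed subdomain, for illustration only


spider = PrCampoMouraoSpider()
page_two_url = spider.listing_url(page=2)
```

With this shape, adding another city would mostly mean declaring a new subclass with its own `name` and `city_subdomain`, while pagination and parsing stay in one place.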

@TZorawski

Hi @giuliocc. I would like to do "another PR generalizing this spider to a base spider and add the other cities". Should I use this branch (rodps:main) or okfn-brasil:main as the base?

code = gazette.xpath("//button[@data-acao='download']/@data-codigo").get()
id = gazette.xpath("//button[@data-acao='download']/@data-id").get()

is_extra = True if edition_type == "Extraordinária" else False
Collaborator

Suggested change
- is_extra = True if edition_type == "Extraordinária" else False
+ is_extra = edition_type == "Extraordinária"
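
The suggestion relies on the fact that `==` between strings already evaluates to a `bool`, so wrapping it in a conditional expression adds nothing. A small sketch that checks both forms agree; the helper names and the sample values other than "Extraordinária" are illustrative only:

```python
def is_extra_verbose(edition_type):
    # Original form from the PR.
    return True if edition_type == "Extraordinária" else False


def is_extra_direct(edition_type):
    # Suggested form: the comparison result is already True or False.
    return edition_type == "Extraordinária"


# Sample inputs; only "Extraordinária" comes from the PR, the rest are made up.
samples = ("Extraordinária", "Ordinária", "", None)
results = [(is_extra_verbose(s), is_extra_direct(s)) for s in samples]
```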

def parse(self, response, page=1):

gazettes = response.xpath("//div[@class='nova_listagem ']/div[@class='linha']")
follow_next_page = False if not gazettes else True
Collaborator

Suggested change
- follow_next_page = False if not gazettes else True
+ follow_next_page = bool(gazettes)
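
Here `bool(gazettes)` works because an empty sequence is falsy and a non-empty one is truthy. The sketch below uses plain lists as a stand-in for the result of `response.xpath(...)`, on the assumption that it behaves like a list (Scrapy's `SelectorList` is a list subclass); the helper names are hypothetical:

```python
def follow_verbose(gazettes):
    # Original form from the PR.
    return False if not gazettes else True


def follow_direct(gazettes):
    # Suggested form: truthiness of the sequence, made explicit.
    return bool(gazettes)


# Plain lists stand in for the selector results of an empty and a non-empty page.
pages = ([], ["row 1"], ["row 1", "row 2"])
results = [(follow_verbose(p), follow_direct(p)) for p in pages]
```

An empty page therefore stops pagination, and any page with at least one row keeps it going.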

@anapaulagomes
Collaborator

Hello @TZorawski, I'd advise waiting for @rodps's PR to be ready. I'm afraid that future changes in this PR would slow you down. But if you want to start anyway, you should fork rodps:main so you get the changes from this PR.

@rodps do you need any help with this one? I left two comments with minor suggestions, but the priority is addressing Giulio's comment: #438 (review)

@anapaulagomes anapaulagomes added the spider (Adds or updates a scraper bot) label Aug 9, 2021
@rodps
Author

rodps commented Aug 9, 2021

Hello everybody. Sorry for the delay. The earlier problem mentioned by @giuliocc came down to the month and the day being swapped in the 'start_date' variable. Since then, the gazette website has changed how it renders precisely the part where the gazette is downloaded: the content is now generated via JavaScript. I didn't find a way to handle this that fits the project, and I'm also short on time at the moment. If @TZorawski or anyone else wants to continue with this issue, please feel free.
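
To illustrate how a swapped month and day in a start date can silently drop the earliest editions, here is a small sketch. The dates are made up for illustration; the spider's real `start_date` value is not stated in this thread.

```python
from datetime import date

# date() takes its arguments in (year, month, day) order.
intended_start = date(2012, 2, 3)  # 3 February 2012 (illustrative value)
swapped_start = date(2012, 3, 2)   # 2 March 2012: month and day exchanged

# Both are valid dates, so no error is raised, but the swapped one is later,
# and anything published between the two would fall before the start date.
gap_in_days = (swapped_start - intended_start).days

# The swap is only caught automatically when the day value exceeds 12.
try:
    date(2012, 25, 1)
    swap_detected = False
except ValueError:
    swap_detected = True
```

Because both orderings are often valid dates, this kind of mistake tends to show up only as missing early results rather than as an exception.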

@TZorawski

Thanks @rodps, so I will continue

@anapaulagomes
Collaborator

Oh, one more thing: could you update CITIES.md, please?

@rennerocha
Member

Closed as stale.

@rennerocha rennerocha closed this Sep 5, 2022
AlexJBSilva added a commit to AlexJBSilva/querido-diario that referenced this pull request Dec 4, 2023
… in the creation of the base spider for the replicable Atende system.
AlexJBSilva added a commit to AlexJBSilva/querido-diario that referenced this pull request Dec 4, 2023
to work with the base spider of the replicable 'Atende' system.
Resolves okfn-brasil#430
Adds spider for Campo Mourão - PR.
Labels
spider (Adds or updates a scraper bot)

6 participants