Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix closespider_itemcount/timeout features #79

Merged
merged 6 commits into from Apr 10, 2020
Merged

Conversation

oltarasenko
Copy link
Collaborator

No description provided.

@oltarasenko oltarasenko requested a review from Ziinc April 9, 2020 14:34
@oltarasenko
Copy link
Collaborator Author

@Ziinc so far my test spiders are running normally. Noticed two problems that are addressed in this PR.

@oltarasenko
Copy link
Collaborator Author

Another question, maybe you could advise if there is a way to include the contents of the quickstart from documentation/quickstart.md into README.md? It will be better to have just one source...

@Ziinc
Copy link
Collaborator

Ziinc commented Apr 9, 2020

Curious. I think the tests don't cover scrape speed.

One of them should be item_count though.

For the docs, do this:

# under extras, add README.md
def extras do
 [
    "README.md": [filename: "readme", title: "Introduction"]
]
end 

# under :main option, set it as the README.md file
   main: "readme"

then remove the quickstart.md file from the docs folder

This makes the README.md file the source of truth for the quickstart.

@@ -99,7 +99,7 @@ defmodule Crawly.Manager do

maybe_stop_spider_by_itemcount_limit(
state.name,
items_count,
delta,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incorrect, items_count should be passed, as we are comparing items scraped, not scrape speed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, fixed

@oltarasenko
Copy link
Collaborator Author

Curious. I think the tests don't cover scrape speed.

It is a bit tricky, indeed. E.g. the items count is an accumulated value. Which requires a bit of a spider runtime. In general, I think I would need to plan more time on improving these tests (the first implementation was quite basic, and these mutational change showed that coverage is not as good as it should be)

One of them should be item_count though.

Regarding the documentation, as I understand we can't restrict it to the Quickstart only. E.g. I would not want to include the fill readme.md there :(

@oltarasenko oltarasenko requested a review from Ziinc April 9, 2020 18:41
@Ziinc
Copy link
Collaborator

Ziinc commented Apr 9, 2020

What's wrong with merging the current introduction.md with the readme.md? The content is mostly the same, and in the SERPs, if people were to click on the hexdocs link first (instead of the github repo), at least they would still get the same breakdown on what the project is

Ziinc
Ziinc previously approved these changes Apr 9, 2020
@oltarasenko
Copy link
Collaborator Author

What's wrong with merging the current introduction.md with the readme.md? The content is mostly the same, and in the SERPs, if people were to click on the hexdocs link first (instead of the github repo), at least they would still get the same breakdown on what the project is

It looks like a good idea. Maybe, in this case, we can replace the example from the introduction with the example from quickstart. And then I will remove the quickstart section from hex.

@oltarasenko
Copy link
Collaborator Author

Sorry, could you please have another glance? I have made the suggested changes to the documentation.

Ziinc
Ziinc previously approved these changes Apr 10, 2020
Copy link
Collaborator

@Ziinc Ziinc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deleted quickstart.md

@Ziinc
Copy link
Collaborator

Ziinc commented Apr 10, 2020

Just did a side by side check for the example, the content is the same

Changed to correct extension, removed unnecessary port setting.
@oltarasenko
Copy link
Collaborator Author

Thanks! I will plan the new release for 14 of April!

@oltarasenko oltarasenko merged commit deb06c3 into master Apr 10, 2020
@dogweather dogweather deleted the pre_release_fixes branch December 14, 2023 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants