
Cover https://michael-shub.github.io/curl2scrapy/ in the documentation #4206

Closed
Gallaecio opened this issue Dec 3, 2019 · 5 comments · Fixed by #4455

Comments

@Gallaecio (Member) commented Dec 3, 2019

See #3991 (comment)

In the parts of the documentation where we currently cover Request.from_curl we may want to mention this online tool as well.
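To make the connection concrete: a minimal, standard-library-only sketch of the kind of conversion a tool like curl2scrapy (or `Request.from_curl`) performs, turning a cURL command line into request keyword arguments. The function name and the exact set of supported flags are illustrative assumptions, not the tool's actual implementation.

```python
import argparse
import shlex


def curl_to_request_kwargs(curl_command):
    """Rough sketch: parse a cURL command line into keyword arguments
    suitable for building a request (url, method, headers, body).
    Only a handful of common cURL flags are handled here."""
    parser = argparse.ArgumentParser()
    parser.add_argument("url")
    parser.add_argument("-H", "--header", action="append", default=[])
    parser.add_argument("-X", "--request", default=None)
    parser.add_argument("-d", "--data", default=None)

    tokens = shlex.split(curl_command)
    if tokens and tokens[0] == "curl":
        tokens = tokens[1:]
    # parse_known_args: silently skip cURL flags we do not model
    args, _unknown = parser.parse_known_args(tokens)

    headers = {}
    for header in args.header:
        name, _, value = header.partition(":")
        headers[name.strip()] = value.strip()

    # cURL defaults to POST when a request body is given
    method = args.request or ("POST" if args.data is not None else "GET")
    kwargs = {"url": args.url, "method": method, "headers": headers}
    if args.data is not None:
        kwargs["body"] = args.data
    return kwargs
```

For example, `curl_to_request_kwargs("curl 'https://example.com/api' -H 'Accept: application/json' -d 'a=1'")` yields a POST request to `https://example.com/api` with the `Accept` header and body `a=1`.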

@mirceachira commented Dec 8, 2019

@Gallaecio something as simple as #4221?

@nyov (Contributor) commented Mar 7, 2020

Do you really want to start covering 3rd party tools in the official scrapy docs?

I'm not opposed to this here, but just a warning about the potential minefield if others request inclusion of their Scrapy tools in the docs, following this precedent.

Then you might want to have some rules about what is acceptable and what is not (3rd-party code repos of middlewares, etc.? Links to helper code like scrapy-inline-callbacks? shub competitor websites with Scrapy tools like the above?) and about who gets to decide what is allowed.
Otherwise, others who might want to advertise 3rd-party tools here could consider this unfair.

Seeing how the original request includes a link to https://github.com/croqaz/awesome-scrapy, that might be a better location for it? (Or perhaps even have a similar link-list project under the github.com/scrapy/ namespace?)

@Gallaecio (Member, Author) commented Mar 12, 2020

Yes, I would love to have 3rd-party open source tools and libraries that are commonly used along with Scrapy covered in the parts of the documentation where they make sense.

In my vision of the Scrapy documentation, the documentation would not explain how to “use Scrapy”, but rather how to “do web scraping using Scrapy”, and that involves using 3rd-party libraries and tools. I think users should discover things like Dateparser, Price Parser, Spidermon, etc. from the Scrapy documentation.

I am aware it is a potential minefield, but I hope that, if it comes to that in specific scenarios, we can find ways to handle them, and in the worst case have the documentation link to a resource like Stack Overflow or Software Recommendations.

In this case, I would really like Scrapy users reading the documentation to find out there is a free web service out there to transform cURL command lines into Scrapy code. And if later a new similar service pops up and is arguably better than the current one, we can replace it in the documentation. And if that is not clear, we can open a discussion in Software Recommendations and point to that as a source for services that allow converting cURL command lines to Scrapy code.

@nyov (Contributor) commented Mar 17, 2020

First off: oh, cool, I think a write-up like this would be nice. Let's discuss it, but that's off-topic and not related to this issue, so don't block a PR on my thoughts on the matter.

I think documentation in the style of “do web scraping using Scrapy”, covering the whole ecosystem, would be great. But that is a high-level view that people who know the basics of scraping, and just want the gist of Scrapy, would not appreciate having to wade through. (Like me, when I just want to refresh my memory.)

So I'd love for it to be a separate project: perhaps a GitHub Pages repo in the style of a book or tutorial, covering the whole ecosystem as you say, but decoupled from the Scrapy API docs. The two could then reference each other wherever sensible (“for more in-depth examples, see here” / “for the class API, see over there”). Perhaps Scrapy has become big enough for that split to make sense now.

I also think Scrapy's docs are starting to get a bit overloaded with all that extra info, and confuse people who are not already familiar with the architecture and structure of Scrapy.
I recently talked with a dyslexic person, and they had some trouble understanding the docs even as they are right now. And I have been asked whether there are other, better docs out there.

Of course, there can be any number of reasons people might have trouble understanding the current docs, from language barriers to attention span, but I think keeping the core API docs succinct would ease the burden for all of them, and help readers get an overall picture before diving into the details.
(And I don't understand, for example, why the architecture overview isn't the first thing people get to see in "scrapy at a glance".)

@Gallaecio (Member, Author) commented Mar 17, 2020

Personally, I would prefer to keep such content within the Scrapy documentation, so that it is updated as affected Scrapy parts change, in the same pull request. I think a new documentation section between First steps and Basic concepts would be a good place, and people familiar with web scraping could just skip that section.

As for the documentation confusing people, I think the core issue is not that there is too much information, but rather that the documentation needs to improve how it introduces complexity. On my personal to-do list is the goal of rereading the documentation from the beginning with the perspective of someone who knows nothing about Scrapy or asynchronous programming and has only a basic knowledge of Python, and making sure pages do not assume knowledge that users may not have yet, including things that are only covered in later pages of our documentation.

And I think a section like the one I’m proposing, about web scraping, could help a lot in that regard; topics like https://docs.scrapy.org/en/latest/topics/dynamic-content.html can really help people starting with both Scrapy and web scraping. But I have too many documentation-related pull requests open at the moment (#3688, #3706, #4039, #4090, #4192, #4310, #4399), so I stopped working on improving how the documentation introduces complexity for the time being.

A section about web scraping in general can also help keep the API docs succinct, by allowing us to move documentation that is now in the API docs into those new web scraping topics, which people looking for reference documentation can simply ignore.
