scrapy shell <url>
Opens an interactive shell with a response object we can use to access the page's data

print(response.text)
Prints the whole page content

response.css('div.author')
Returns a list of selector objects matching that div

response.css('div.author').extract()
Returns the actual HTML of the selected elements

response.css('div.author::text').extract()
Returns a list containing only the text of those elements

response.css('div.author::text')[0].extract()
Returns the text of the first matching element as a string
scrapy genspider <spider-name> <domain-name>
After running this, a file named <spider-name>.py is created in the current directory

scrapy runspider filename.py
Runs the spider file

scrapy runspider filename.py -o file-name.json
Runs the spider and saves the scraped items to file-name.json

more file-name.json
Shows the file content
sudo apt install docker.io
Installs Docker

sudo docker pull scrapinghub/splash
Downloads the Splash JavaScript rendering engine image

docker run -p 8050:8050 scrapinghub/splash
Runs Splash on port 8050

pip install scrapy-splash
Installs the scrapy-splash plugin for integrating Splash with Scrapy
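After installing, scrapy-splash also has to be enabled in the project's settings.py. A sketch following the plugin's documented settings (the port matches the docker run command above):

```python
# settings.py — scrapy-splash configuration
# Point Scrapy at the Splash instance started by docker
SPLASH_URL = "http://localhost:8050"

# Enable the scrapy-splash downloader middlewares
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}

# Enable the scrapy-splash spider middleware
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}

# Use the Splash-aware duplicate filter
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"
```

With this in place, spiders can issue SplashRequest objects instead of plain Requests so pages are rendered by Splash before Scrapy sees them.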