A Python library for scraping websites by specifying a template in JSON.
pip install pyinstantcrawl
- Create a template like the one below and save it as sample.json:
{
  "tip-of-day": {
    "expression": "string(//div[@class='tip-of-day'])",
    "type": "xpath",
    "getter": "get"
  },
  "testimonial": {
    "expression": ".testimonial",
    "type": "css",
    "getter": "getall"
  }
}
- Run the command below, passing the URL to scrape and the template file:
pyinstantcrawl https://pragprog.com sample.json
It now also works with a parent + child structure. See the examples folder.
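
Before running the command, it can help to sanity-check a template file. The snippet below is a minimal sketch, not part of pyinstantcrawl: `check_template` is a hypothetical helper, and the sets of accepted `type` and `getter` values are assumptions taken only from the flat sample.json template above (the library may accept more, and the parent + child format is not covered).

```python
import json

# Values seen in the sample template. Assumption: the library may accept others.
KNOWN_TYPES = {"xpath", "css"}
KNOWN_GETTERS = {"get", "getall"}
REQUIRED_KEYS = {"expression", "type", "getter"}


def check_template(template):
    """Return a list of problems found in a flat template dict (empty if none)."""
    problems = []
    for field, spec in template.items():
        if not isinstance(spec, dict):
            problems.append(f"{field}: expected an object")
            continue
        missing = REQUIRED_KEYS - spec.keys()
        if missing:
            problems.append(f"{field}: missing {sorted(missing)}")
        if "type" in spec and spec["type"] not in KNOWN_TYPES:
            problems.append(f"{field}: unknown type {spec['type']!r}")
        if "getter" in spec and spec["getter"] not in KNOWN_GETTERS:
            problems.append(f"{field}: unknown getter {spec['getter']!r}")
    return problems


# The same content as sample.json, inlined so the sketch runs on its own.
TEMPLATE_TEXT = """
{
  "tip-of-day": {
    "expression": "string(//div[@class='tip-of-day'])",
    "type": "xpath",
    "getter": "get"
  },
  "testimonial": {
    "expression": ".testimonial",
    "type": "css",
    "getter": "getall"
  }
}
"""

template = json.loads(TEMPLATE_TEXT)
for problem in check_template(template):
    print(problem)
```

Each top-level key is an output field name. In the sample, "get" is paired with an expression that yields a single value and "getall" with one that can match several elements; that reading is inferred from the names, so check the examples folder for the actual behavior.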