This repository contains some of my Scrapy projects.
scrapy_tutorial is an archive of the material I worked through while learning Scrapy.
get_image contains projects that scrape images from actual websites.
An open source and collaborative framework
for extracting the data you need from websites.
In a fast, simple, yet extensible way.
-
Create a virtual environment in Anaconda.
Select Python 3.8.
As of 2021/09, Scrapy is not available on Python 3.9.
-
Install the components needed to run Scrapy in the Anaconda virtual environment:
pip install -r requirements.txt
-
Install VS Code in your local environment and add the Python extension.
-
Create downloadfolder_path.txt in the same directory as README.md.
The file must contain only the path to the folder
where downloaded files will be saved.
-
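As a rough illustration of how downloadfolder_path.txt could be consumed, the sketch below reads the single path stored in the file. The helper name `read_download_folder` is hypothetical and is not taken from the projects in this repository; it only assumes the file holds one path and nothing else.

```python
from pathlib import Path


def read_download_folder(path_file: Path) -> Path:
    """Return the folder path stored in path_file (hypothetical helper)."""
    # The file is expected to hold only a path, so strip the
    # surrounding whitespace and the trailing newline.
    text = path_file.read_text(encoding="utf-8").strip()
    if not text:
        raise ValueError(f"{path_file} is empty; write the download folder path in it")
    # Expand a leading "~" so a home-relative path also works.
    return Path(text).expanduser()
```

For example, calling `read_download_folder(Path("downloadfolder_path.txt"))` from the directory that holds README.md would return the configured folder as a `Path` object.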
Clone this repository and move into the project you need.
-
Run the spider with this command:
scrapy crawl <spider name you need>
Run a simple benchmark test
scrapy bench
Create a new Scrapy project
scrapy startproject <project name>
Create a new spider in the current project
(When typing the URL, it is best to remove 'https://' and the trailing '/')
scrapy genspider [-t <template name>] <spider name> <URL>
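The URL clean-up suggested above can be sketched as a small helper. The function name `strip_scheme_and_trailing_slash` is hypothetical, and handling 'http://' in addition to 'https://' is an assumption that goes beyond the note above.

```python
def strip_scheme_and_trailing_slash(url: str) -> str:
    """Remove a leading scheme and any trailing '/' from url (hypothetical helper)."""
    # 'http://' is handled as well; that is an assumption, the note only names 'https://'.
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
            break
    # rstrip removes every trailing slash, not just one.
    return url.rstrip("/")


print(strip_scheme_and_trailing_slash("https://example.com/"))  # example.com
```

The result can then be passed as the URL argument of scrapy genspider.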
Run a spider
scrapy crawl <spider name>
Open the Scrapy shell
You can use it to test XPath expressions, CSS selectors, and so on.
scrapy shell