lilerjee/ecproduct

Welcome to ECProduct's documentation!

ECProduct is a spider (web crawler) system for e-commerce websites.

It can collect information about a specific product, shop (market), or category, starting from an index page, search page, product page, shop page, or category page.

Please refer to the 'docs' directory (index file) for more information.

Configuration of ECProduct

  1. Set up Database
    1. Install the MariaDB server:

      # apt install mariadb-server
    2. Create a user and database:

      # mysql -u root
      MariaDB [(none)]> CREATE DATABASE ecproduct CHARACTER SET utf8;
      MariaDB [(none)]> CREATE USER 'ecproduct'@'localhost' IDENTIFIED BY 'ecproduct@pwd';
      MariaDB [(none)]> GRANT ALL PRIVILEGES ON ecproduct.* TO 'ecproduct'@'localhost';
    3. Import database data:

      $ mysql -u ecproduct -pecproduct@pwd ecproduct < database/platform.sql
      $ mysql -u ecproduct -pecproduct@pwd ecproduct < database/ecproduct.sql
    4. Modify settings.py:

      MYSQL_HOST = 'localhost'
      MYSQL_USERNAME = 'ecproduct'
      MYSQL_PASSWORD = 'ecproduct@pwd'
      MYSQL_DATABASE = 'ecproduct'
      MYSQL_CHARSET = 'utf8'
  2. Create the spider for the e-commerce website:
    • The directory of the spider: <project_home_dir>/ecproduct/spiders/
    • The file name of the spider: <e-commerce website's domain name>.py
  3. Create input data for the spider:

    • For the test environment, the input data file is: <project_home_dir>/input/<site's domain name>_url_test.txt
    • For the production environment, the input data file is: <project_home_dir>/input/<site's domain name>_url.txt

    In the file, write one URL per line; a line can be commented out with a hash '#'. Use URLs of the page type that the specific spider expects.

  4. Run the spider:
    • For the test environment:

      $ python main.py vvic product product -f test
    • For the production environment:

      $ python main.py vvic product product -f product
    • For a specific spider:

      $ scrapy crawl jd -a url=https://www.jd.com/allSort.aspx -a entrance_page=category -a data_type=category -o output/jd.jl
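
The input file convention from step 3 can be sketched in Python as follows. This is only an illustration, not ECProduct's actual loader: the helper names `input_file_for` and `read_input_urls` are hypothetical, and the sketch assumes that blank lines are ignored and that only whole-line comments (lines starting with '#') are skipped, since a '#' inside a URL may be a legitimate fragment.

```python
from pathlib import Path
from typing import List


def input_file_for(project_home: str, site: str, env: str) -> Path:
    """Build the input file path for a site (hypothetical helper).

    env is "test" for <site>_url_test.txt; any other value
    (e.g. "product") maps to <site>_url.txt.
    """
    suffix = "_url_test.txt" if env == "test" else "_url.txt"
    return Path(project_home) / "input" / f"{site}{suffix}"


def read_input_urls(path: str) -> List[str]:
    """Return the URLs in an input file, one per line.

    Assumption: blank lines and lines whose first non-blank
    character is '#' are skipped; the real loader may differ.
    """
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls
```

For example, `read_input_urls(str(input_file_for(".", "vvic", "test")))` would list the URLs that a test run for the vvic site reads, assuming the file exists at that location.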
