
hind-sagar-biswas/ScrapPyJS


ScrapPyJS


The ScrapPyJS class provides web-scraping functionality built on Selenium. It lets you scrape data by running a JavaScript script directly from Python.
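Conceptually, this pattern combines two standard Selenium WebDriver calls: get(url) loads the page, and execute_script(script) runs the JavaScript and returns whatever its return statement produces. The sketch below illustrates that pattern under that assumption. It is not ScrapPyJS's actual implementation. scrape_with_js, ScriptDriver and FakeDriver are hypothetical names, and the fake driver stands in for a real browser so the example runs without Selenium.

```python
from typing import Any, Protocol


class ScriptDriver(Protocol):
    """The two WebDriver methods this sketch relies on."""

    def get(self, url: str) -> None: ...

    def execute_script(self, script: str, *args: Any) -> Any: ...


def scrape_with_js(driver: ScriptDriver, url: str, script: str) -> Any:
    """Hypothetical helper: load a page, then return the JS script's result."""
    driver.get(url)  # navigate the browser to the target page
    return driver.execute_script(script)  # value of the script's `return`


class FakeDriver:
    """Stand-in for a real Selenium driver so the sketch runs offline."""

    def __init__(self) -> None:
        self.visited: list[str] = []

    def get(self, url: str) -> None:
        self.visited.append(url)

    def execute_script(self, script: str, *args: Any) -> Any:
        # A real browser would evaluate the JavaScript; this fake returns a fixed value.
        return "ScrapPy scraping!"


driver = FakeDriver()
print(scrape_with_js(driver, "https://example.com/", "return 'ScrapPy scraping!'"))
```

With Selenium installed, you would pass a real driver such as webdriver.Chrome() instead of FakeDriver().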

Installing

pip install ScrapPyJS

How to Use

Importing and Initializing

from ScrapPyJS import ScrapPyJS

# initialize ScrapPyJS
scrappy = ScrapPyJS()

# set the JS script
JS_SCRIPT = "return 'ScrapPy scraping!'"
scrappy.set_script(JS_SCRIPT)

# rest of the code goes here...

# close ScrapPyJS
scrappy.end()
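Because end() closes the scraper, it should also be called when scraping raises an exception. Here is a minimal sketch using contextlib.contextmanager that assumes only the end() method shown above. scrappy_session and DummyScraper are hypothetical names. The dummy class stands in for ScrapPyJS so the example runs without a browser.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol, TypeVar


class Closable(Protocol):
    def end(self) -> None: ...


T = TypeVar("T", bound=Closable)


@contextmanager
def scrappy_session(factory: Callable[[], T]) -> Iterator[T]:
    """Create a scraper, yield it, and always call end() afterwards."""
    scraper = factory()
    try:
        yield scraper
    finally:
        scraper.end()  # runs on normal exit and when an exception escapes


class DummyScraper:
    """Hypothetical stand-in for ScrapPyJS (only end() is modelled)."""

    def __init__(self) -> None:
        self.ended = False

    def end(self) -> None:
        self.ended = True


with scrappy_session(DummyScraper) as s:
    pass  # scraping calls would go here
print(s.ended)  # True
```

With the real class, this would presumably become "with scrappy_session(ScrapPyJS) as scrappy:", which replaces the explicit scrappy.end() call.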

Simple way

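  1. Use the scrap method to scrape a webpage, where url is the page address. The wait arguments below tell it to wait for the element whose id is elementId: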

    result = scrappy.scrap(url, wait=True, wait_for='id', wait_target='elementId')
  2. Retrieve the result of the scraping operation:

    print(result)
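The wait_for='id' / wait_target='elementId' pair reads like a Selenium locator, which consists of a strategy and a value. This sketch assumes ScrapPyJS forwards these arguments to Selenium's explicit waits. It shows how such a pair maps onto the (by, value) tuple that Selenium's expected_conditions.presence_of_element_located accepts. The strategy strings are the values of Selenium's By constants (By.ID == "id", and so on). Which strategies ScrapPyJS actually supports is not stated here, and to_locator is a hypothetical helper.

```python
# Values of selenium.webdriver.common.by.By constants (By.ID == "id", etc.)
SELENIUM_STRATEGIES = {
    "id", "name", "xpath", "class name", "css selector",
    "tag name", "link text", "partial link text",
}


def to_locator(wait_for: str, wait_target: str) -> tuple[str, str]:
    """Hypothetical helper: turn wait_for/wait_target into a Selenium locator tuple."""
    strategy = wait_for.strip().lower()
    if strategy not in SELENIUM_STRATEGIES:
        raise ValueError(f"unsupported locator strategy: {wait_for!r}")
    if not wait_target:
        raise ValueError("wait_target must be a non-empty string")
    return (strategy, wait_target)


print(to_locator("id", "elementId"))  # ('id', 'elementId')
```

With Selenium installed, such a tuple could be passed to WebDriverWait(driver, 10).until(expected_conditions.presence_of_element_located(locator)).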

Loop through a list of URLs

  1. Set up a list of target URLs

    URLS = [
        'https://url1.com/',
        'https://url2.com/homepage/',
        'https://url2.com/about',
    ]
  2. Use the loop_through method to scrape every target webpage:

    # The result value will be a list if save mode is on, else a JSON string
    # (assumes loop_through takes the URL list plus the same wait options as scrap)
    result = scrappy.loop_through(URLS, wait=True, wait_for='id', wait_target='elementId')
  3. Retrieve the result of the scraping operation:

    print(result)
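According to the comment in step 2, loop_through returns either a list or a JSON string depending on save mode. If downstream code should not care which one it gets, a small normalizer can accept both. This sketch relies only on that comment. as_list is a hypothetical helper, and any value that is not a list after decoding is wrapped in a one-element list.

```python
import json
from typing import Any


def as_list(result: Any) -> list[Any]:
    """Hypothetical helper: normalise a loop_through() result to a Python list."""
    if isinstance(result, str):
        result = json.loads(result)  # JSON string -> Python object
    if isinstance(result, list):
        return result
    return [result]  # wrap any single non-list value


print(as_list('["a", "b"]'))  # ['a', 'b']
print(as_list(["a", "b"]))    # ['a', 'b']
```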

Save results to a file

Activate save mode

  1. Via toggle:

    scrappy.toggle_save_mode()

    Save mode is off (False) by default, and toggling it switches it to True. The save-file settings (file name, format and location) keep their default values.

  2. Via set_save_info method:

    scrappy.set_save_info(save=True)

    This sets save mode to True directly and leaves the other save settings at their defaults.
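If toggle_save_mode() flips the current value, as its name suggests, then calling it twice turns saving off again. set_save_info(save=True), by contrast, leaves save mode on no matter how often it runs. The stand-in class below only illustrates that difference under this assumption. SaveSettings is hypothetical and does not reflect ScrapPyJS's internal structure.

```python
class SaveSettings:
    """Hypothetical stand-in modelling only the save flag."""

    def __init__(self) -> None:
        self.save = False  # save mode is off by default

    def toggle_save_mode(self) -> None:
        self.save = not self.save  # flip the current value

    def set_save_info(self, save: bool) -> None:
        self.save = save  # set an explicit value


settings = SaveSettings()
settings.toggle_save_mode()
settings.toggle_save_mode()
print(settings.save)  # False: two toggles cancel out

settings.set_save_info(save=True)
settings.set_save_info(save=True)
print(settings.save)  # True: setting an explicit value is idempotent
```

When you need a known state, setting the value explicitly is the safer choice.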

Configure save mode

  1. Via set_save_info method:

    FILE_NAME = "output"
    FILE_FORMAT = "json"
    SAVE_LOCATION = "path/to/file/"
    
    scrappy.set_save_info(save=True, file_name=FILE_NAME, file_format=FILE_FORMAT, location=SAVE_LOCATION)
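With these settings, you would reasonably expect results to land in path/to/file/output.json. The sketch below shows one way to build that path from location, file_name and file_format and write JSON to it. The naming scheme is an assumption, not ScrapPyJS's code, and save_results is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any


def save_results(results: Any, file_name: str, file_format: str, location: str) -> Path:
    """Hypothetical helper: write results to <location>/<file_name>.<file_format>."""
    if file_format != "json":
        raise ValueError("this sketch only handles the 'json' format")
    folder = Path(location)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = folder / f"{file_name}.{file_format}"
    target.write_text(json.dumps(results, indent=2), encoding="utf-8")
    return target
```

For example, save_results(result, FILE_NAME, FILE_FORMAT, SAVE_LOCATION) would create path/to/file/output.json relative to the working directory.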

Note: to use ScrapPyJS, you need Selenium installed along with a WebDriver for the browser you want to drive.
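To check these prerequisites before running a scrape, confirm that the selenium package is importable and that a driver executable is on PATH. The driver names below (chromedriver for Chrome, geckodriver for Firefox) are an assumption about your browser choice. Selenium 4.6 and later can also fetch a driver automatically through Selenium Manager, so a missing executable is not always fatal. check_prerequisites is a hypothetical helper.

```python
import importlib.util
import shutil


def check_prerequisites(
    drivers: tuple[str, ...] = ("chromedriver", "geckodriver"),
) -> dict[str, bool]:
    """Hypothetical helper: report whether Selenium and common drivers are available."""
    report = {"selenium": importlib.util.find_spec("selenium") is not None}
    for name in drivers:
        report[name] = shutil.which(name) is not None  # is it on PATH?
    return report


if __name__ == "__main__":
    for item, ok in check_prerequisites().items():
        print(f"{item}: {'found' if ok else 'missing'}")
```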

Documentation

Detailed information on the ScrapPyJS class is available in CLASS_STRUCTURE.md in the repository root.

License

This code is licensed under the MIT License, a permissive open-source license.

Author

NAME: Hind Sagar Biswas

Website: coderaptors.epizy.com
