scrapy Unit Tests

Description 🖼️

This repository holds (toy) pytest unit tests for scrapy, a Python library for scraping and crawling websites. It was created for the "Cybersecurity Incidents Management" course at the Faculty of Automatic Control and Computers, University POLITEHNICA of Bucharest.

Overview

In total, there are 52 tests that pass with the frozen library versions. All 78 assertions carry a descriptive message. Each test has a short docstring explaining what it checks and a timeout attached: 0.1 seconds for offline tests and more for those that require an Internet connection (for example, those scraping a website).
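As an illustration of this structure, a test written in this style might look like the following sketch. It is a hypothetical example rather than a test copied from the repository; the timeout decorator assumes the pytest-timeout plugin, and the custom marks correspond to the suites described in the Test Suites section below.

```python
# Hypothetical example of a test in the described style (not taken verbatim
# from the repository); @pytest.mark.timeout assumes the pytest-timeout plugin.
import pytest
from scrapy.http import HtmlResponse


@pytest.mark.offline
@pytest.mark.principle_right
@pytest.mark.timeout(0.1)
def test_title_extraction() -> None:
    """Checks that a CSS selector extracts the title of a static page."""
    response = HtmlResponse(
        url="https://example.com",
        body=b"<html><head><title>Example</title></head><body></body></html>",
        encoding="utf-8",
    )

    title = response.css("title::text").get()

    assert title == "Example", "The extracted title differs from the expected one"
```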

In addition, the source files were formatted with Black and isort, linted with Flake8 (including the requirement that every assert have a message), and type-checked with MyPy.

Test Suites

The tests are split into test suites by using pytest marks for the following aspects:

  • Internet connection requirements: online, offline;
  • Unit testing principles: principle_*;
  • Unit testing techniques: technique_*; and
  • High-level concept testing: sitemap_testing, robotstxt_testing, and crawlers_testing.

The principles above come from a pool obtained by combining Right-BICEP and CORRECT.
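For pytest not to warn about unknown marks, custom marks like these are normally registered. The snippet below is only a sketch of one possible registration (in a conftest.py); the repository may register its marks elsewhere, for example in pyproject.toml or pytest.ini.

```python
# conftest.py - hypothetical sketch of registering the custom marks so pytest
# does not emit "unknown mark" warnings; the actual project may register them
# in pyproject.toml or pytest.ini instead, and with different descriptions.
def pytest_configure(config) -> None:
    marks = (
        ("online", "test requires an Internet connection"),
        ("offline", "test runs without network access"),
        ("principle_right", "checks that the returned results are right"),
        ("technique_fake", "uses a fake object instead of a real dependency"),
    )
    for name, description in marks:
        config.addinivalue_line("markers", f"{name}: {description}")
```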

Right-BICEP and CORRECT Principles
  • Are the returned results right?
  • Are the results at boundaries correct? The boundaries can be identified by following these aspects (CORRECT):
    • Conformance: Compliance with a formal definition of the type
    • Ordering (for example, of an ordered list)
    • Range
    • References (to external objects or methods)
    • Existence (of a method, parameter)
    • Cardinality: Tests with 0, 1 and N elements
    • Time
  • Check for inverse relationships, where the operations support it.
  • Cross-check results using other means.
  • Force error conditions to happen.
  • Are performance characteristics verified?
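
To make one of these principles concrete, the sketch below exercises the CORRECT cardinality aspect (0, 1, and N elements) on link extraction; it is a hypothetical illustration, not a test taken from the suite.

```python
# Hypothetical illustration of the "cardinality" principle: the same check is
# run with 0, 1 and N anchors in the page body.
import pytest
from scrapy.http import HtmlResponse


@pytest.mark.parametrize(
    ("anchors", "expected_count"),
    [
        ("", 0),
        ('<a href="/one"></a>', 1),
        ('<a href="/one"></a><a href="/two"></a>', 2),
    ],
)
def test_extracted_link_count(anchors: str, expected_count: int) -> None:
    """Checks that the number of extracted links matches the markup."""
    response = HtmlResponse(
        url="https://example.com",
        body=f"<html><body>{anchors}</body></html>".encode(),
        encoding="utf-8",
    )

    links = response.css("a::attr(href)").getall()

    assert len(links) == expected_count, "Unexpected number of extracted links"
```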

The distribution of marks (generated with analysis/marks_analysis.py) is listed in the following table:

Mark                      Count
crawlers_testing          1
offline                   51
online                    1
principle_cardinality_0   4
principle_cardinality_1   4
principle_cardinality_n   4
principle_conformance     9
principle_error           7
principle_existence       4
principle_inverse         10
principle_performance     52
principle_range_lower     12
principle_right           46
principle_time            52
robotstxt_testing         3
sitemap_testing           8
technique_fake            6
technique_monkey          5
Setup 🔧

  1. Install Poetry.
  2. Install the Python dependencies using Poetry: poetry install.

Usage 🧰

Just run PYTHONPATH="tests" .venv/bin/pytest tests. To run only the tests from one suite, add -m <mark> to the previous command (for example, -m offline).

Resources 📚

The only resources used are the libraries specified in Poetry's pyproject.toml file.
