Changelog

3.0.0

Unreleased

-- see the milestone tracking on GitHub.

Breaking Changes
- Imagecrawler for Instagram is no longer part of the standard distribution. It might come back as a plugin one day.
  This crawler was just too brittle because of Instagram's web application firewalls and other bot protections.
Changes
- API supports the HTTP method "GET" only. It supported all HTTP methods in the past.
Added
- New method nichtparasoup.server.has_image() -> bool.
Fixes
- API /get no longer falsely responds with the HTTP status code "404 EXHAUSTED".
- nichtparasoup.server.get_image() no longer falsely returns None.
Removed
- Crawler nichtparasoup.imagecrawlers.instagram was removed from shipped imagecrawlers.

3.0.0a2

Released 2020-10-18

Added
- nichtparasoup.core.imagecrawler.RemoteFetcher got the ability to write communication logs.
  The log target dir is controlled via env var NP_DEBUG_REMOTEFETCHER_STOREDIR.
  The log target dir must exist and be writeable.
- Crawler nichtparasoup.imagecrawlers.instagram.InstagramProfile got a new optional config: profile_id.
  See the InstagramProfile's docs for details.
- All functions that take a path to a server's config file now support path-like objects (os.PathLike, for example pathlib.Path), and still support str as before.
- New optional config setting imageserver.reset_timeout. Its default value is 3600.
  See the docs.
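The requirement on the RemoteFetcher log directory (set via NP_DEBUG_REMOTEFETCHER_STOREDIR, must exist, must be writeable) can be checked before a run. The following is a minimal sketch; the helper name usable_store_dir is hypothetical and not part of nichtparasoup, and the real RemoteFetcher may validate the directory differently.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Env var name as given in the changelog entry.
ENV_VAR = "NP_DEBUG_REMOTEFETCHER_STOREDIR"


def usable_store_dir(environ: Mapping[str, str]) -> Optional[Path]:
    """Return the log dir if it is set, exists and is writeable, else None.

    Hypothetical helper for illustration only.
    """
    raw = environ.get(ENV_VAR)
    if not raw:
        # Env var is missing or empty: no communication logs requested.
        return None
    path = Path(raw)
    if path.is_dir() and os.access(path, os.W_OK):
        return path
    return None


if __name__ == "__main__":
    # Check the environment of the current process.
    print(usable_store_dir(os.environ))
```

A result of None means that either the env var is not set or the directory is not usable.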
Changes
- Config setting imageserver.crawler_upkeep is now optional. Its default value is 30.
  See the docs.
- JSON API responses
  - /get now sends valid JSON when a "404" occurred. See the docs.
  - /reset now sends HTTP status code "202", was "200".
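Since /get now sends valid JSON on a "404" as well, a client can parse the body regardless of the status code. The following is a minimal sketch of that idea; the function name parse_get_response is hypothetical, and the JSON bodies in the usage part are placeholders because the real payload fields are only described in the docs.

```python
import json
from typing import Any, Tuple


def parse_get_response(status: int, body: bytes) -> Tuple[bool, Any]:
    """Parse the body of a /get response as JSON, also for a 404.

    Returns (found, payload). Treating every status other than 404 as
    "found" is an assumption of this sketch.
    """
    payload = json.loads(body.decode("utf-8"))
    return status != 404, payload


if __name__ == "__main__":
    # Placeholder bodies, not the real response schema.
    print(parse_get_response(200, b'{"placeholder": "image data"}'))
    print(parse_get_response(404, b'null'))
```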
Misc
- Tests can persist logs of nichtparasoup - controlled via env var NP_TESTLOG_NAME.
  If the env var is present:
  - A directory is created: ./tests/.logs/${NP_TESTLOG_NAME}.
  - The env var NP_DEBUG_REMOTEFETCHER_STOREDIR defaults to ./tests/.logs/${NP_TESTLOG_NAME}.
    Tests via tox enable these logs by default and add a suffix to NP_TESTLOG_NAME: _{envname}.
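The NP_TESTLOG_NAME behaviour described above can be sketched as follows. This is an illustration under assumptions, not the project's test setup: the function names prepare_test_logs and tox_log_name and the tests_root parameter are hypothetical.

```python
from pathlib import Path
from typing import Dict, Mapping


def tox_log_name(base: str, envname: str) -> str:
    """Append the tox suffix _{envname} to a log name."""
    return f"{base}_{envname}"


def prepare_test_logs(environ: Mapping[str, str],
                      tests_root: Path = Path("tests")) -> Dict[str, str]:
    """Return a copy of environ with the log settings applied.

    Hypothetical helper. If NP_TESTLOG_NAME is present, the directory
    <tests_root>/.logs/<name> is created and used as the default for
    NP_DEBUG_REMOTEFETCHER_STOREDIR.
    """
    resolved = dict(environ)
    name = resolved.get("NP_TESTLOG_NAME")
    if name is None:
        # Env var is not present: nothing is created or defaulted.
        return resolved
    log_dir = tests_root / ".logs" / name
    log_dir.mkdir(parents=True, exist_ok=True)
    # Only a default: an explicitly set value is kept.
    resolved.setdefault("NP_DEBUG_REMOTEFETCHER_STOREDIR", str(log_dir))
    return resolved
```

An explicitly set NP_DEBUG_REMOTEFETCHER_STOREDIR is left untouched, which is how "defaults to" is read here.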

3.0.0a1

Released 2020-09-05

Breaking changes
- Requires python>=3.6 -- was python>=3.5.
- Command line interface overhaul. See the CLI help via nichtparasoup --help.
  - CLI is done via click now (was done via argparse before).
  - Shell completion was removed temporarily. See the issue.
  - Proper subcommands are used now.
    Also available via python3 -m nichtparasoup.commands.* - see the "Added" section below.
- Web-API:
  - version of /status/server was moved to /status.
  - Crawler.type of /status/crawlers is now a fully qualified class name. See the docs.
    The old short-typed version is still available as the optional Crawler.name (optional means: can be missing or null, if manually added).
- Package nichtparasoup.imagecrawler was renamed to nichtparasoup.imagecrawlers. Everything needed to implement an imagecrawler was moved to a clean module nichtparasoup.imagecrawler.
- Class nichtparasoup.testing.config.ConfigFileTest was moved to nichtparasoup.testing.configfile.ConfigFileTest. It also behaves differently now. Read the code and annotations for a deeper insight.
- Some methods of class nichtparasoup.core.server.Server got reworked return types:
  - Server.get_image() returns optional nichtparasoup.core.server.ImageResponse -- was optional dict.
  - Server.refill() returns None -- was dict.
  - Server.request_reset() returns nichtparasoup.core.server.ResetResponse -- was dict.
- Changes to nichtparasoup.core.ServerRefiller.__init__():
  - Parameter sleep was renamed to delay and is now float (was int or float).
- Class nichtparasoup.core.server.Status was removed.
  Its former static methods that returned dictionaries were reworked to be DataClasses:
  - nichtparasoup.core.server.ServerStatus -- replaces .Status.server().
  - nichtparasoup.core.server.CrawlerStatus -- replaces .Status.crawlers().
  - nichtparasoup.core.server.BlacklistStatus -- replaces .Status.blacklist().
- Removed the install-extras development and testing.
- Defaulting arguments of nichtparasoup.core.Crawler.__init__() became kwargs.
- Arguments of nichtparasoup.core.Crawler.fill_up_to() changed:
  - filled_by became a kwarg.
  - timeout became a kwarg and was renamed to delay.
- Arguments of nichtparasoup.core.NPCore.fill_up_to() changed:
  - on_refill became a kwarg.
  - timeout became a kwarg and was renamed to delay.
- Some arguments of nichtparasoup.core.NPCore.add_imagecrawler() became kwargs and got default values.
- Arguments of nichtparasoup.core.imagecrawler.RemoteFetcher became kwargs.
- Method nichtparasoup.core.imagecrawler.BaseImageCrawler.__init__() became abstract in favour of proper argument definition and typing in implementations.
- All builtin imagecrawlers' __init__() got proper argument definition and typing as they are implementations of nichtparasoup.core.imagecrawler.BaseImageCrawler.__init__().
- Package nichtparasoup.testing got a huge overhaul.
  - Classes no longer implement unittest.TestCase.
  - Functionality was split into chunks for easier use.
  - Class .configfile.ConfigFileTest (previously named .config.ConfigFileTest) was reworked.
  - Class .imagecrawler.FileFetcher supports fully qualified urls now, including schema and netloc. Therefore an optional argument base_url was added.
- Server's imagecrawler can get exhausted when the crawling source's end is reached. Resolves issue #152.
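Several entries above state that arguments "became a kwarg" and that timeout was renamed to delay. The following sketch shows what such a change means for callers, assuming "kwarg" means keyword-only. The function below is a hypothetical stand-in with made-up behaviour, not the real nichtparasoup.core.Crawler.fill_up_to().

```python
from typing import Callable, Optional


def fill_up_to(to: int, *,
               filled_by: Optional[Callable[[], None]] = None,
               delay: float = 0.0) -> int:
    """Hypothetical stand-in with keyword-only filled_by and delay.

    The bare * makes every following parameter keyword-only.
    The body is a placeholder and does no real crawling.
    """
    if filled_by is not None:
        filled_by()
    return to


if __name__ == "__main__":
    # New style: optional arguments are passed by name.
    fill_up_to(10, delay=0.5)
    # Old style: positional arguments are rejected with a TypeError.
    try:
        fill_up_to(10, None, 0.5)  # type: ignore[misc]
    except TypeError as error:
        print("positional call rejected:", error)
```

Under this assumption, calls that pass timeout positionally or by its old name need to be updated to use delay=.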
Removed
- nichtparasoup.core.server.type_module_name_str()
- The install-extras development and testing were removed and replaced by files in the requirements/ folder. See "Changes".
Changes
- Method nichtparasoup.core.Crawler.crawl() returns the number of actually added images, was the number of crawled images.
- nichtparasoup.core.imagecrawler.ImageRecognizer also detects .webp.
- nichtparasoup.core.imagecrawler.BaseImageCrawler does not call self._reset() on first run anymore.
- Class nichtparasoup.core.server.ServerStatus is not abstract anymore.
- nichtparasoup.VERSION was moved to nichtparasoup.__version__, therefore nichtparasoup.__version__ is no longer a module but a string.
- Install-extras development and testing were changed to be separate (pip-compiled pinned) files:
  - dev requirements
  - tests requirements
Fixed
- False-positives in nichtparasoup.core.imagecrawler.ImageRecognizer.path_is_image().
- Fixed a possible endless loop of nichtparasoup.core.Crawler.fill_up_to().
Added
- Web-API: Crawler.name to status/crawlers API. See the docs.
- Public CLI package nichtparasoup.cli for use via python3 -m.
- Public CLI command modules for use via python3 -m:
  - nichtparasoup.commands.imagecrawler_desc
  - nichtparasoup.commands.imagecrawler_list
  - nichtparasoup.commands.server_config_check
  - nichtparasoup.commands.server_config_dump_defaults
  - nichtparasoup.commands.server_run
- Class nichtparasoup.webserver.WebServer got an optional argument developer_mode (default: False) which enables an insecure web-developer mode and sets CORS to "*".
- Class nichtparasoup.testing.config.ConfigTest was added.
- Property nichtparasoup.core.server.Server.stats was made available to the public.
- New classes in nichtparasoup.core.server were added to represent response types of nichtparasoup.core.server.Server's methods:
  - .ResetResponse represents response of .Server.request_reset().
  - .ImageResponse represents response of .Server.get_image().
- New DataClasses were added to module nichtparasoup.core.server:
  - .ServerStatus
  - .CrawlerStatus
  - .BlacklistStatus
- nichtparasoup.core.Crawler got a new kwarg restart_at_front_when_exhausted.
  nichtparasoup.core.NPCore.add_imagecrawler() got a new kwarg restart_at_front_when_exhausted.
  See the docs.
- Implementations of nichtparasoup.core.imagecrawler.BaseImageCrawler got new features:
  - Method .get_internal_name() to return the internal name. If instance was made via nichtparasoup.config.get_imagecrawler() the value is set to represent the "name" from the config.
  - Property .internal_name - read-only shortcut for method .get_internal_name().
  - Method .__str__() .
    Returns <NamedImagecrawler {INTERNAL_NAME} {CONFIG!r}> if internal_name is set, otherwise the behaviour falls back to __repr__().
- Public stuff in module nichtparasoup.imagecrawlers.instagram (was nonpublic before):
  - Class .BaseInstagramCrawler became public, since it allocates its Lock() automatically now.
  - Class .InstagramQueryHashFinder became public for extending .BaseInstagramCrawler.
  - Constants .INSTAGRAM_URL_ROOT and .INSTAGRAM_ICON_URL became public for extending .BaseInstagramCrawler.
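The described __str__() behaviour (named form if internal_name is set, otherwise fallback to __repr__()) can be sketched as follows. SketchCrawler is a hypothetical class for illustration; it does not derive from BaseImageCrawler, and its __repr__() format and the _internal_name attribute are assumptions.

```python
from typing import Any, Dict, Optional


class SketchCrawler:
    """Hypothetical stand-in that mimics the described string behaviour."""

    def __init__(self, **config: Any) -> None:
        self._config: Dict[str, Any] = config
        # Assumption: the name is stored on the instance and None if unset.
        self._internal_name: Optional[str] = None

    def get_internal_name(self) -> Optional[str]:
        return self._internal_name

    @property
    def internal_name(self) -> Optional[str]:
        # Read-only shortcut for get_internal_name().
        return self.get_internal_name()

    def __repr__(self) -> str:
        # Assumed repr format, for this sketch only.
        return f"<{type(self).__name__} {self._config!r}>"

    def __str__(self) -> str:
        name = self.internal_name
        if name is None:
            # No internal name set: fall back to __repr__().
            return repr(self)
        return f"<NamedImagecrawler {name} {self._config!r}>"


if __name__ == "__main__":
    crawler = SketchCrawler(width=100)
    print(str(crawler))
    crawler._internal_name = "Echo"
    print(str(crawler))
```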
Misc
- Build process is now isolated and conforms to PEP 517 and PEP 518.
  ATTENTION: pip install's --editable flag might require the --no-build-isolation flag.
- Improved some docs. Added an index.md to all folders. Restructured some docs.
- Internal
  - All internal imports were made relative.
  - Logging was reviewed and now uses %-strings as params.
  - try/except got an overhaul to cover only the needed parts.
- Removed ddt from the testing dependencies. Closes issue #233.
- Version-bumped some dependencies, pinned dev dependencies via pip-compile.
- Added some more tests.
- Improved venv support when it comes to testing.
- Tests via tox were split. Code style tests are now done via their own test named style (was part of standard tests).
- Repo layout changed to be a monorepo. See the repo.
  This also means that the plugin example was moved out of the project into its own project.

2.4.3

Released 2020-09-10

Fixed
- Instagram Profile Crawler - issue #381.

2.4.2

Released 2020-06-20

Fixed
- config yaml parser when yamale>=2.1 is installed.

2.4.1

Released 2020-02-21

Fixed
- commandline completion for config files to properly suggest *.yaml & *.yml files.

2.4.0

Released 2020-02-21

Changes
- upgraded dependency werkzeug from >=0.15 to >=1.0.
- dependencies pinned to greater/equal the current (latest) minor version.
Fixed
- issue #187.
Added
- commandline autocompletion via argcomplete.

2.3.1

Released 2020-01-28

Fixed
- paging of the Pr0gramm ImageCrawler in promoted=True mode.

2.3.0

Released 2020-01-26

Breaking changes
- nichtparasoup config --check's "duplicate image crawler" is no longer a Warning but an Error.
- renamed nichtparasoup.testing.config.ConfigFilesTest to ConfigFileTest - without an "s".
Changed
- nichtparasoup config --check now does a probe crawl.
- class ImageCrawlerInfo lost support for positional arguments, supports keyword-arguments only - to prepare for future extensibility.
- class Image lost support for positional arguments, supports keyword-arguments only - to prepare for future extensibility.
Added
- image crawler for pr0gramm - Read the docs.
- additional test function: nichtparasoup.testing.config.ConfigFileTest.probe().

2.2.2

Released 2020-01-12

Fixed
- exception catch in instagram imagecrawler.
- hyperlinks in the README.md.
Added
- keywords in setup.py.

2.2.1

Released 2019-12-20

Fixed
- web UI settings storage.

2.2.0

Released 2019-12-20

Breaking changes
- in the config the crawlers' type was renamed to name.
- the Dummy ImageCrawler was renamed to Echo.
Changed
- ImageCrawlerInfo's desc was renamed to description.
- ImageCrawlerInfo doesn't require a version anymore.
Fixed
- the web UI no longer tries to load the non-existent favicon.ico.
Added
- plugin support for ImageCrawlers. You may write your own, now :-)
  - plugin recognition via EntryPoint "nichtparasoup_imagecrawler".
  - testing added: test helpers are now part of the package for public use by plugin-devs.
  - example implementation added.
  - doc space was prepared.
- commandline interface got a --debug switch to help plugin developers.
- webserver now uses mako template engine.
- ImageCrawlerInfo may have an icon_uri, now.
- ImageCrawlerInfo may have a long_description, now.
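Plugins are recognized via the entry point group "nichtparasoup_imagecrawler" mentioned above. The following sketch lists whatever is registered under that group in the current environment, using the standard library's importlib.metadata (Python 3.8 or later). How nichtparasoup itself loads plugins is not stated here and may differ; the function name list_plugin_entry_points is hypothetical.

```python
from importlib.metadata import entry_points
from typing import List

# Group name as given in the changelog entry.
GROUP = "nichtparasoup_imagecrawler"


def list_plugin_entry_points(group: str = GROUP) -> List[str]:
    """Return "name = value" strings of all entry points in the group."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later: selectable entry points.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9: a plain dict keyed by group name.
        selected = eps.get(group, [])
    return sorted(f"{ep.name} = {ep.value}" for ep in selected)


if __name__ == "__main__":
    for line in list_plugin_entry_points():
        print(line)
```

The list is empty if no plugin is installed in the environment.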

2.1.1

Released 2019-11-28

Fixed
- auto-play is no longer broken when image-gallery-mode is canceled by the browser's builtin functions.

2.1.0

Released 2019-11-28

Added
- ImageCrawler for Instagram: InstagramProfile & InstagramHashtag.
- web UI: added image zoom.
- web UI: hide scroll bar in FullScreen mode when the scroll position is at the top.

2.0.1

Released 2019-11-26

Fixed
- internal version detection.

2.0.0

Released 2019-11-26

Rewrite from scratch.

Breaking changes
- removed support for python2.7 and lower.
- removed support for python3.4 and lower.
- the config format completely changed. Read the docs/ for more.
- everything changed ... due to a complete rewrite. Read the docs/dev/ for more.
Added
- publishing to PyPI
- image crawler for "picsum"
- image crawler "dummy"
- documentation in docs/
- setup.py-based packaging support - for PIP
- testing support via pytest and test coverage report via coverage
- code style tests via flake8, mypy and extensions for those - also added them to tox-based automatisation
- tox-based automatisation for testing
- CI tests for tox-based tests on py35, py36, py37, py38 - via github actions
- version history file HISTORY.md
Modified
- README.md to match current implementation
- web UI to match the latest web server specs; rewritten from scratch
- config system - now using YAML file format
- core image crawler architecture
- core server
- web server
- command line interface
- reddit crawler
Removed
- Some image crawlers were removed, so they can be rewritten from scratch.
  - image crawler for "giphy"
  - image crawler for "soup.io"
  - image crawler for "pr0gramm"
  - image crawler for "4chan"
  - image crawler for "9gag"

1.x.x

Rolling releases in repository until 2019-10-10

basic feature complete implementation

supports: python2.6 and later
supports: python3.4 and later
implemented: config system - using INI file format
implemented: commandline interface
implemented: web UI
implemented: web server
implemented: core server architecture to draw a random crawled image
implemented: image crawler for giphy, soup.io, pr0gramm, 4chan, 9gag, reddit
implemented: image crawler architecture
