Unreleased
-- see the milestone tracking on GitHub.
- Breaking Changes
- The imagecrawler for Instagram is no longer part of the standard distribution.
It might come back as a plugin one day.
This crawler was just too brittle because of Instagram's web application firewalls and other bot protections.
- Changes
- The API supports the HTTP method "GET" only. It supported all HTTP methods in the past.
- Added
- New method nichtparasoup.server.has_image() -> bool.
- Fixes
- API /get no longer falsely responds with the HTTP status code "404 EXHAUSTED".
nichtparasoup.server.get_image() no longer falsely returns None.
- Removed
- Crawler nichtparasoup.imagecrawlers.instagram was removed from the shipped imagecrawlers.
Released 2020-10-18
- Added
- nichtparasoup.core.imagecrawler.RemoteFetcher got the ability to write communication logs.
The log target dir is controlled via env var NP_DEBUG_REMOTEFETCHER_STOREDIR.
The log target dir must exist and be writeable.
- Crawler nichtparasoup.imagecrawlers.instagram.InstagramProfile got a new optional config: profile_id.
See the InstagramProfile's docs for details.
- All functions that take a path to a server's config file now support pathlib.PathLike, and still support str like before.
- New optional config setting imageserver.reset_timeout. Its default value is 3600.
See the docs.
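The note above on config file paths can be illustrated with a small sketch. It is not nichtparasoup's implementation: the helper name `normalize_config_path` is hypothetical, and the sketch assumes the path-like protocol meant is the standard library's `os.PathLike`, which `pathlib.Path` implements.

```python
import os
from pathlib import Path
from typing import Union


# Hypothetical helper for illustration only; not part of nichtparasoup.
def normalize_config_path(config_file: Union[str, "os.PathLike[str]"]) -> str:
    """Return the file system path of a config file as a plain string."""
    # os.fspath() returns a str unchanged and calls __fspath__() on path-like objects.
    return os.fspath(config_file)


# Both call styles yield the same plain string.
as_str = normalize_config_path("config.yaml")
as_path = normalize_config_path(Path("config.yaml"))
```

Normalizing once at the function boundary lets the rest of the code work with a single type, which is one common way to stay backwards compatible with callers that pass `str`.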
- Changes
- Misc
- Tests can persist logs of nichtparasoup, controlled via env var NP_TESTLOG_NAME.
If the env var is present:
  - A directory is created: ./tests/.logs/${NP_TESTLOG_NAME}.
  - The env var NP_DEBUG_REMOTEFETCHER_STOREDIR defaults to ./tests/.logs/${NP_TESTLOG_NAME}.
Tests via tox enable these logs by default and will add a suffix to NP_TESTLOG_NAME: _{envname}.
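The test log behaviour described above could be sketched roughly as follows. This is a simplified illustration, not the project's test setup: the function `setup_test_logs` and its parameters `base_dir` and `envname` are hypothetical, and only the env var names and the directory layout are taken from the notes above.

```python
import os
from pathlib import Path
from typing import Optional


# Hypothetical helper; the real test setup may differ.
def setup_test_logs(base_dir: Path, envname: Optional[str] = None) -> Optional[Path]:
    """Prepare the log directory if NP_TESTLOG_NAME is present."""
    name = os.environ.get("NP_TESTLOG_NAME")
    if not name:
        return None  # env var absent: no logs are persisted
    if envname:
        name = f"{name}_{envname}"  # suffix as added for tox environments
    log_dir = base_dir / "tests" / ".logs" / name
    # The log target dir must exist and be writeable, so create it up front.
    log_dir.mkdir(parents=True, exist_ok=True)
    # Only provide a default; an explicitly set value is left untouched.
    os.environ.setdefault("NP_DEBUG_REMOTEFETCHER_STOREDIR", str(log_dir))
    return log_dir
```

Using `os.environ.setdefault` mirrors the word "defaults" in the note: a value set by the developer takes precedence over the derived one.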
Released 2020-09-05
- Breaking changes
- Requires python>=3.6 -- was python>=3.5.
- CommandLine Interface overhaul. See cli help via nichtparasoup --help.
- Web-API:
  - version of /status/server was moved to /status.
  - Crawler.type of /status/crawlers is now a fully qualified class name. See the docs.
    The old short-typed version is still available as the optional Crawler.name (optional means: can be missing or null, if manually added).
- Package nichtparasoup.imagecrawler was renamed to nichtparasoup.imagecrawlers. Everything needed to implement an imagecrawler was moved to a clean module nichtparasoup.imagecrawler.
- Class nichtparasoup.testing.config.ConfigFileTest was moved to nichtparasoup.testing.configfile.ConfigFileTest. It also behaves differently now. Read the code and annotations for a deeper insight.
- Some methods of class nichtparasoup.core.server.Server got reworked return types:
  - Server.get_image() returns optional nichtparasoup.core.server.ImageResponse -- was optional dict.
  - Server.refill() returns None -- was dict.
  - Server.request_reset() returns nichtparasoup.core.server.ResetResponse -- was dict.
- Changes to nichtparasoup.core.ServerRefiller.__init__():
  - Parameter sleep was renamed to delay and is now float (was int or float).
- Class nichtparasoup.core.server.Status was removed.
Its former static methods that returned dictionaries were reworked to be DataClasses:
  - nichtparasoup.core.server.ServerStatus -- replaces .Status.server().
  - nichtparasoup.core.server.CrawlerStatus -- replaces .Status.crawlers().
  - nichtparasoup.core.server.BlacklistStatus -- replaces .Status.blacklist().
- Removed the install-extras development and testing.
- Defaulting arguments of nichtparasoup.core.Crawler.__init__() became kwargs.
- Arguments of nichtparasoup.core.Crawler.fill_up_to() changed:
  - filled_by became a kwarg.
  - timeout became a kwarg and was renamed to delay.
- Arguments of nichtparasoup.core.NPCore.fill_up_to() changed:
  - on_refill became a kwarg.
  - timeout became a kwarg and was renamed to delay.
- Some arguments of nichtparasoup.core.NPCore.add_imagecrawler() became kwargs and got default values.
- Arguments of nichtparasoup.core.imagecrawler.RemoteFetcher became kwargs.
- Method nichtparasoup.core.imagecrawler.BaseImageCrawler.__init__() became abstract in favour of proper argument definition and typing in implementations.
- All builtin imagecrawlers' __init__() got proper argument definition and typing, as they are implementations of nichtparasoup.core.imagecrawler.BaseImageCrawler.__init__().
- Package nichtparasoup.testing got a huge overhaul.
  - Classes no longer implement unittest.TestCase.
  - Functionality was split into chunks for easier use.
  - Class .configfile.ConfigFileTest (previously named .config.ConfigFileTest) was reworked.
  - Class .imagecrawler.FileFetcher supports fully qualified urls now, including schema and netloc. Therefore an optional argument base_url was added.
- Server's imagecrawler can get exhausted when the crawling source's end is reached. Resolves issue #152.
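Several breaking changes above state that arguments "became kwargs". The following sketch shows what that means for callers in general Python terms. The function is a hypothetical stand-in that merely borrows parameter names from the notes above; it does not reproduce the signature of any nichtparasoup method.

```python
from typing import Any, Callable, Dict, Optional


# Hypothetical stand-in; not the signature of any nichtparasoup method.
def fill_up_to(
    to: int,
    *,  # everything after the bare star must be passed by keyword
    filled_by: Optional[Callable[[], None]] = None,
    delay: float = 0.0,
) -> Dict[str, Any]:
    """Echo the received arguments so the call style can be inspected."""
    return {"to": to, "filled_by": filled_by, "delay": delay}


# Keyword style works.
result = fill_up_to(5, delay=1.5)

# Positional style for keyword-only parameters raises a TypeError.
try:
    fill_up_to(5, None, 1.5)  # type: ignore[misc]
    positional_call_failed = False
except TypeError:
    positional_call_failed = True
```

Callers that previously passed such arguments positionally need to switch to the keyword form, and to the new name where a parameter was renamed (for example timeout to delay).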
- Removed:
- nichtparasoup.core.server.type_module_name_str()
- development and testing extras were removed. They were replaced by files in the requirements/ folder. See "Changes".
- Changes
- Method nichtparasoup.core.Crawler.crawl() returns the number of actually added images; it was the number of crawled images.
- nichtparasoup.core.imagecrawler.ImageRecognizer also detects .webp.
- nichtparasoup.core.imagecrawler.BaseImageCrawler does not call self._reset() on first run anymore.
- Class nichtparasoup.core.server.ServerStatus is not abstract anymore.
- nichtparasoup.VERSION was moved to nichtparasoup.__version__, therefore nichtparasoup.__version__ is no longer a module but a string.
- Install-extras development and testing were changed to be separate (pip-compiled, pinned) files.
- Fixed
- False-positives in nichtparasoup.core.imagecrawler.ImageRecognizer.path_is_image().
- Fixed a possible endless loop of nichtparasoup.core.Crawler.fill_up_to().
- Added
- Web-API: Crawler.name was added to the /status/crawlers API. See the docs.
- Public CLI package nichtparasoup.cli for use via python3 -m.
- Public CLI command modules for use via python3 -m:
  - nichtparasoup.commands.imagecrawler_desc
  - nichtparasoup.commands.imagecrawler_list
  - nichtparasoup.commands.server_config_check
  - nichtparasoup.commands.server_config_dump_defaults
  - nichtparasoup.commands.server_run
- Class nichtparasoup.webserver.WebServer got an optional argument developer_mode (default: False) which enables an insecure web-developer mode and sets CORS to "*".
- Class nichtparasoup.testing.config.ConfigTest was added.
- Property nichtparasoup.core.server.Server.stats was made available to the public.
- New classes in nichtparasoup.core.server were added to represent response types of nichtparasoup.core.server.Server's methods:
  - .ResetResponse represents response of .Server.request_reset().
  - .ImageResponse represents response of .Server.get_image().
- New DataClasses were added to module nichtparasoup.core.server:
  - .ServerStatus
  - .CrawlerStatus
  - .BlacklistStatus
- nichtparasoup.core.Crawler got a new kwarg restart_at_front_when_exhausted.
- nichtparasoup.core.NPCore.add_imagecrawler() got a new kwarg restart_at_front_when_exhausted.
See the docs.
- Implementations of nichtparasoup.core.imagecrawler.BaseImageCrawler got new features:
  - Method .get_internal_name() to return the internal name. If the instance was made via nichtparasoup.config.get_imagecrawler() the value is set to represent the "name" from the config.
  - Property .internal_name - read-only shortcut for method .get_internal_name().
  - Method .__str__().
    Returns <NamedImagecrawler {INTERNAL_NAME} {CONFIG!r}> if internal_name is set, otherwise the behaviour falls back to __repr__().
- Public stuff in module nichtparasoup.imagecrawlers.instagram (was nonpublic before):
  - Class .BaseInstagramCrawler became public (since it does Lock() allocation automatically now).
  - Class .InstagramQueryHashFinder became public for extending .BaseInstagramCrawler.
  - Constants .INSTAGRAM_URL_ROOT and .INSTAGRAM_ICON_URL became public for extending .BaseInstagramCrawler.
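The naming features listed above for imagecrawler implementations (get_internal_name(), internal_name and __str__()) can be pictured with a stand-in class. This is only a sketch: `DemoCrawler`, its constructor and its `__repr__()` format are hypothetical, and "is set" is interpreted here as "is a non-empty string". Only the `<NamedImagecrawler ...>` format is taken from the note above.

```python
from typing import Any, Dict, Optional


class DemoCrawler:
    """Hypothetical stand-in; not nichtparasoup's BaseImageCrawler."""

    def __init__(self, internal_name: Optional[str] = None, **config: Any) -> None:
        self._internal_name = internal_name
        self._config: Dict[str, Any] = config

    def get_internal_name(self) -> Optional[str]:
        return self._internal_name

    @property
    def internal_name(self) -> Optional[str]:
        # Read-only shortcut for get_internal_name().
        return self.get_internal_name()

    def __repr__(self) -> str:
        # Assumed format, chosen for this sketch only.
        return f"<{type(self).__name__} {self._config!r}>"

    def __str__(self) -> str:
        if self.internal_name:
            return f"<NamedImagecrawler {self.internal_name} {self._config!r}>"
        # No internal name: fall back to __repr__().
        return repr(self)


unnamed = DemoCrawler(subreddit="aww")
named = DemoCrawler("MyReddit", subreddit="aww")
```

Here `str(named)` carries the configured name, while `str(unnamed)` is identical to `repr(unnamed)`.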
- Misc
- Build process is now isolated and conforms to PEP517 & PEP518.
ATTENTION: pip install's --editable flag might require the --no-build-isolation flag.
- Improved some docs. Added an index.md to all folders. Restructured some docs.
- Internal
  - All internal imports were made relative.
  - Logging reviewed, uses %-strings as params now.
  - try/except got some overhaul to cover needed parts only.
- Removed ddt from the testing dependencies. Closes issue #233.
- Version-bumped some dependencies, pinned dev dependencies via pip-compile.
- Added some more tests.
- Improved venv support when it comes to testing.
- Tests via tox were split. Code style tests are done via their own test named style now (was part of standard tests).
- Repo layout changed to be a monorepo. See the repo.
This also means that the plugin-example was moved out of the project into its own project.
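The Misc note above about logging with %-strings as params refers to a standard `logging` idiom: the format string and its arguments are handed over separately instead of being formatted up front. A minimal sketch follows; the logger name `np_demo` and the function `report_crawl` are hypothetical and not taken from the project.

```python
import logging

logger = logging.getLogger("np_demo")  # hypothetical logger name


def report_crawl(crawler_name: str, added: int) -> None:
    """Log a crawl result using %-style params."""
    # The arguments are passed as params; the message is only formatted
    # when a handler actually emits the record.
    logger.debug("crawler %s added %d images", crawler_name, added)
```

Compared to building the string with an f-string or `%` operator before the call, this avoids the formatting cost for records that are filtered out by the log level.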
Released 2020-09-10
- Fixed
- Instagram Profile Crawler - issue #381.
Released 2020-06-20
- Fixed
- config yaml parser when yamale>=2.1 is installed.
Released 2020-02-21
- Fixed
- commandline completion for config files to properly suggest *.yaml & *.yml files.
Released 2020-02-21
- Changes
- upgraded dependency werkzeug from >=0.15 to >=1.0.
- dependencies pinned to greater/equal current (latest) minor version.
- Fixed
- issue #187.
- Added
- commandline autocompletion via argcomplete.
Released 2020-01-28
- Fixed
- paging of the Pr0gramm ImageCrawler in promoted=True mode.
Released 2020-01-26
- Breaking changes
- nichtparasoup config --check's "duplicate image crawler" is no longer a Warning but an Error.
- renamed nichtparasoup.testing.config.ConfigFilesTest to ConfigFileTest - without an "s".
- Changed
- nichtparasoup config --check now does a probe crawl.
- class ImageCrawlerInfo lost support for positional arguments and supports keyword-arguments only, to prepare for future extensibility.
- class Image lost support for positional arguments and supports keyword-arguments only, to prepare for future extensibility.
- Added
Released 2020-01-12
- Fixed
- exception catch in instagram imagecrawler.
- hyperlinks in the README.md.
- Added
- keywords in setup.py.
Released 2019-12-20
- Fixed
- web UI settings storage.
Released 2019-12-20
- Breaking changes
- in the config the crawlers' type was renamed to name.
- the Dummy ImageCrawler was renamed to Echo.
- Changed
- ImageCrawlerInfo's desc was renamed to description.
- ImageCrawlerInfo doesn't require a version anymore.
- Fixed
- loading of the non-existing favicon.ico is no longer attempted.
- Added
- plugin support for ImageCrawlers. You may write your own, now :-)
- commandline interface got a --debug switch to help plugin developers.
- webserver now uses mako template engine.
- ImageCrawlerInfo may have an icon_uri, now.
- ImageCrawlerInfo may have a long_description, now.
Released 2019-11-28
- Fixed
- auto-play is no longer broken, when image-gallery-mode is canceled by browser's builtin functions.
Released 2019-11-28
- Added
- ImageCrawler for Instagram: InstagramProfile & InstagramHashtag.
- web UI: added image zoom.
- web UI: hide scroll bar in FullScreen mode when the scroll position is at top.
Released 2019-11-26
- Fixed
- internal version detection.
Released 2019-11-26
Rewrite from scratch.
- Breaking changes
- removed support for python2.7 and lower.
- removed support for python3.4 and lower.
- the config format completely changed. Read the docs/ for more.
- everything changed ... due to a complete rewrite. Read the docs/dev/ for more.
- Added
- publishing to PyPI
- image crawler for "picsum"
- image crawler "dummy"
- documentation in docs/
- setup.py-based packaging support - for pip
- testing support via pytest and test coverage report via coverage
- code style tests via flake8, mypy and extensions for those - also added them to tox-based automation
- tox-based automation for testing
- CI tests for tox-based tests on py35, py36, py37, py38 - via github actions
- version history file HISTORY.md
- Modified
- README.md to match current implementation
- web UI to match latest web server specs
- Rewrote from scratch
- config system - now using YAML file format
file format - core image crawler architecture
- core server
- web server
- command line interface
- reddit crawler
- Removed
- Some image crawlers were removed, so they can be rewritten from scratch.
- image crawler for "giphy"
- image crawler for "soup.io"
- image crawler for "pr0gramm"
- image crawler for "4chan"
- image crawler for "9gag"
Rolling releases in repository until 2019-10-10
basic feature complete implementation
- supports: python2.6 and later
- supports: python3.4 and later
- implemented: config system - using INI file format
- implemented: commandline interface
- implemented: web UI
- implemented: web server
- implemented: core server architecture to draw a random crawled image
- implemented: image crawler for giphy, soup.io, pr0gramm, 4chan, 9gag, reddit
- implemented: image crawler architecture