Bugfix and folders pruning, post blacklist
* Default subfolders were added to all sections if
  the global `[Blacklist]` section was used. This
  happened because blacklist tags were added to a
  section before checking whether the section had an
  actual search, that is, tags or conditions. Fixed

* Folders can now be pruned of files that are no
  longer needed, whether from sections removed from
  the config or cached files that no longer have a
  single copy in `downloads`

* Posts can be blacklisted by simply dragging them
  into the special folder `to_blocked_posts`, or by
  manually adding ids to `blocked_posts.txt`, one
  post id per line
lurkbbs committed Dec 14, 2019
1 parent b468377 commit 3fd76fd
Showing 5 changed files with 273 additions and 49 deletions.
33 changes: 24 additions & 9 deletions README.md
@@ -28,6 +28,8 @@ Main features of this fork:
- Max downloaded files per section limit. You can download only, say, 100 last files.
- **`order:` metatag family support**. With max downloads limit you can get e.g. top 10 most rated or top 25 least favorite.
- **Multithreaded**.
- **Folder pruning**. All files that are no longer required can be removed on the next run. This is **not** the default behavior.
- Easy blacklisting. Just move files you don't want to see again into a special folder, and you won't. The files will also be removed from that folder.

# Installing and Setting Up **e621dl**

@@ -91,12 +93,17 @@ Create sections in the `config.ini` to specify which posts you would like to dow
;; GENERAL ;;
;;;;;;;;;;;;;;

;These are default values
;The meaning of these settings is described
;in the README.md, in the [Settings] section
;These settings are all false by default
;[Settings]
;include_md5 = false
;include_md5 = true
;make_hardlinks = true
;make_cache = true
;db = true
;offline = true
;prune_downloads = true
;prune_cache = true

;These are default settings for all search groups below
;[Defaults]
@@ -633,13 +640,15 @@ This section has only one option, `tags`, and essentially every tag from here a

Settings for e621dl. All settings are boolean values that accept `true` or `false`.

| Name | Description |
| -------------- | ------------------------------------------------------------ |
| include_md5 | Changed in e621dl 5.4.0. If `true`, and format field in [defaults] is not set, default format became id.md5.id.ext instead of id.ext. This way you can deduplicate files and see md5 in a filename |
| make_hardlinks | If `true`, if a file was already downloaded somewhere else, hardlink will be created. Otherwise, full copy of a file will be created. |
| make_cache | If `true`, every downloaded file will be hardlinked/copied to `cache` folder. |
| db | If `true`, every post info will be stored in local database. If it's false, but database already is created, it can be used as a post info source, but no entries will be updated/created. |
| offline | If `true`, no requests whatsoever will be sent to e621. Tag aliasing is skipped, so if you use `cat` instead of `domestic_cat` and so on, you get incorrect result. Art description will be taken from local database (you have to have one, just use `db=true` at least once). If some files are not in cache or other folders, it won't be dowloaded. You can use it to fast recreate folder structure. If you want to just download new section without stopping for one second every 320 art infos, you can use `post_source = db` in default section. Info will be acquired from local database, but tags will be checked and files will be downloaded. |
| Name | Description |
| --------------- | ------------------------------------------------------------ |
| include_md5     | Changed in e621dl 5.4.0. If `true` and the format field in [defaults] is not set, the default format becomes id.md5.id.ext instead of id.ext. This way you can deduplicate files and see the md5 in the filename. |
| make_hardlinks  | If `true` and a file was already downloaded somewhere else, a hardlink will be created. Otherwise, a full copy of the file will be created. |
| make_cache      | If `true`, every downloaded file will be hardlinked/copied to the `cache` folder. |
| db              | If `true`, info for every post will be stored in a local database. If `false` but the database already exists, it can be used as a post info source, but no entries will be updated or created. |
| offline         | If `true`, no requests whatsoever will be sent to e621. Tag aliasing is skipped, so if you use `cat` instead of `domestic_cat` and so on, you will get incorrect results. Art info will be taken from the local database (you have to have one, so use `db = true` at least once). Files that are not in the cache or other folders will not be downloaded. You can use this to quickly recreate the folder structure. If you just want to download a new section without stopping for one second after every 320 art infos, use `post_source = db` in the default section: info will be taken from the local database, but tags will be checked and files will be downloaded. |
| prune_downloads | If `true`, all files in `downloads` that do not meet any of the search criteria will be removed. It is as if you removed everything and then downloaded only what you need. |
| prune_cache     | If `true` and you have a cache folder, then any file that does not have a single copy/hardlink in `downloads` will be deleted. It is as if you manually removed all files in the cache and then copied them back from `downloads`. |
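
For example, a minimal `[Settings]` block in `config.ini` that enables caching and both of the new pruning options could look like the sketch below (the option names are the ones documented in the table above):

```ini
[Settings]
; keep a shared cache folder and hardlink files into section folders
make_cache = true
make_hardlinks = true
; on the next run, delete downloads that no longer match any section
prune_downloads = true
; and delete cached files that no longer have a single copy in downloads
prune_cache = true
```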



@@ -740,6 +749,12 @@ Note that if e621dl started with double click, its window closes by itself on ex

If for some reason Cloudflare thinks your IP is DDOS'ing e621, use this instruction to solve a captcha: [Cloudflare solution](Cloudflare.md)

# Individual post blacklisting

After the first download, the app folder will contain a `to_blocked_posts` folder and a `blocked_posts.txt` file. You can either move/copy files you want blocked into the folder, or manually add the id of a post to the file, one id per line. On the next `e621dl` run, all files in `to_blocked_posts` will be removed (but not subfolders, for now), and those posts will never be downloaded again.
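
For example, a `blocked_posts.txt` with the contents below (the ids are placeholders, not real posts) would keep those three posts from ever being downloaded again; dropping the corresponding files into `to_blocked_posts` has the same effect without editing the file by hand:

```
1234567
2345678
3456789
```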



# Automation of **e621dl**

It should be recognized that **e621dl**, as a script, can be scheduled to run as often as you like, keeping your local collections always up to date. However, methods for doing so depend on your platform and are outside the scope of this quick-guide.
80 changes: 53 additions & 27 deletions e621dl.py
@@ -27,13 +27,14 @@ def is_prefilter(section_name):
def default_condition(x):
return True

def has_actual_search(whitelist, blacklist, anylist, cond_func, **dummy):
def check_has_actual_search(whitelist, blacklist, anylist, cond_func, **dummy):
return whitelist or blacklist or anylist or cond_func != default_condition


def process_result(post, whitelist, blacklist, anylist, cond_func, ratings, min_score, min_favs, days_ago, **dummy):
def process_result(post, whitelist, blacklist, anylist, cond_func, ratings, min_score, min_favs, days_ago, has_actual_search, **dummy):
tags = post.tags
if not has_actual_search(whitelist, blacklist, anylist, cond_func):

if not has_actual_search:
return []
if whitelist and not all( any(reg.fullmatch(tag) for tag in tags) for reg in whitelist ):
return []
@@ -75,8 +76,8 @@ def get_directories(post, root_dirs, search, searches_dict):
search_result = process_result(post, **search)

# We travel below only if current folder matches
# our criteria
if search_result or not has_actual_search(**search):
# our criteria or there is nothing to look for
if search_result or not search['has_actual_search']:
for directory in subdirectories:
#preventing recursions in cases like cat/dog/cat/dog/...
if directory in root_dirs:
@@ -96,14 +97,9 @@ def get_directories(post, root_dirs, search, searches_dict):
return results


def get_files(post, format, directories, files, session, cachefunc, duplicate_func, download_post, search):
def get_files(post, filename, directories, files, session, cachefunc, duplicate_func, download_post, search):
with download_set.context_id(post.id):
if format:
id_ext = f'{post.id}.{post.file_ext}'
custom_prefix = format.format(**post.generate())[:100]
filename = f'{custom_prefix}.{id_ext}'
else:
filename = f'{post.id}.{post.file_ext}'


for directory in directories:
file_id=post.id
@@ -130,6 +126,8 @@ def prefilter_build_index(kwargses, use_db, searches):
if use_db:
storage.connect()

blocked_ids = local.get_blocked_posts()

try:
if download_queue.completed:
return
@@ -144,12 +142,11 @@ def prefilter_build_index(kwargses, use_db, searches):
append_func=kwargs['append_func']
max_days_ago=kwargs['days_ago']

results_num = 0

for results in gen(last_id, **kwargs):
local.printer.increment_posts(len(results))
append_func(results)
filtered_results=process_results(results, **kwargs)
filtered_results=[post for post in results if post.id not in blocked_ids]
filtered_results=process_results(filtered_results, **kwargs)
local.printer.increment_filtered(len(set(results) - set(filtered_results)))

download_queue.append( (directory, filtered_results) )
@@ -223,7 +220,8 @@ def main():
use_db = False
allow_append = False
full_offline = False

prune_downloads = False
prune_cache = False
# Iterate through all sections (lines enclosed in brackets: []).
for section in config.sections():

@@ -246,7 +244,13 @@ def main():
default_append_func = storage.append
use_db = True
allow_append = True

elif option.lower() in {'prune_downloads'}:
if value.lower() == 'true':
prune_downloads = True
elif option.lower() in {'prune_cache'}:
if value.lower() == 'true':
prune_cache = True

if section.lower() == 'settings':
for option, value in config.items(section):
if option.lower() in {'full_offline', 'offline'}:
@@ -416,15 +420,19 @@ def main():

section_tags += ['-'+tag for tag in blacklist+section_blacklisted]
section_search_tags = section_tags[:5]
section_blacklist=[re.compile(re.escape(mask).replace('\\*','.*')) for mask in section_blacklist+blacklist+section_blacklisted]
section_blacklist=[re.compile(re.escape(mask).replace('\\*','.*')) for mask in section_blacklist+section_blacklisted]
section_whitelist=[re.compile(re.escape(mask).replace('\\*','.*')) for mask in section_whitelist]
section_anylist = [re.compile(re.escape(mask).replace('\\*','.*')) for mask in section_anylist]

if has_actual_search(section_whitelist, section_blacklist, section_anylist, section_cond_func):
section_has_actual_search = \
check_has_actual_search(section_whitelist, section_blacklist, section_anylist, section_cond_func)
if section_has_actual_search:
section_subdirectories.update(default_subdirectories)
# Append the final values that will be used for the specific section to the list of searches.
# Note section_tags is a list within a list.

section_blacklist +=[re.compile(re.escape(mask).replace('\\*','.*')) for mask in blacklist]

if section_id[0] == "*":
section_directory = section_id[1:]
else:
@@ -446,7 +454,8 @@ def main():
'posts_countdown': section_post_limit,
'format':section_format,
'subdirectories': section_subdirectories,
'session' : session}
'session' : session,
'has_actual_search': section_has_actual_search,}

if is_prefilter(section_id):
prefilter.append(section_dict)
@@ -462,7 +471,7 @@ def main():
remote.finish_partial_downloads(session, cachefunc, duplicate_func)

local.printer.change_status("Building downloaded files dict")
files = local.get_files_dict(bool(cachefunc))
files = local.get_files_dict(bool(cachefunc), download_queue.is_reset())


if prefilter:
@@ -477,6 +486,7 @@ def main():
queue_thread.start()

download_pool=ThreadPoolExecutor(max_workers=2)
pathes_storage=local.PathesStorage()
try:
while True:
try:
@@ -489,7 +499,6 @@ def main():
sleep(0.5)
continue

filtered_away = set(chunk)
results_pair = []
for search in searches:
directory = search['directory']
@@ -498,10 +507,11 @@ def main():

results_pair += list(zip([search]*len(chunk), chunk))

#local.printer.increment_filtered(len(filtered_away))
while results_pair:
futures = []
remaining_from_countdown=[]

pathes_storage.begin()
for search, post in results_pair:
directory = search['directory']
format = search['format']
@@ -511,14 +521,23 @@ def main():

directories = get_directories(post, [directory], search, searches_dict)
if directories:
if format:
id_ext = f'{post.id}.{post.file_ext}'
custom_prefix = format.format(**post.generate())[:100]
filename = f'{custom_prefix}.{id_ext}'
else:
filename = f'{post.id}.{post.file_ext}'

pathes_storage.add_pathes(directories, filename)
futures.append(download_pool.submit(get_files,
post, format, directories, files,
post, filename, directories, files,
session, cachefunc, duplicate_func, download_post, search))

search['posts_countdown'] -= 1
else:
local.printer.increment_filtered(1)

pathes_storage.commit()

for future in futures:
if future.exception():
raise future.exception()
@@ -550,12 +569,19 @@ def main():

if download_queue.completed:
download_queue.reset()


if prune_downloads:
local.printer.change_status("Pruning downloads")
pathes_storage.remove_old()

if prune_cache:
local.printer.change_status("Pruning cache")
local.prune_cache()

local.printer.change_status("All complete")
local.printer.stop()
local.printer.join()
local.printer.step()
raise SystemExit


# This block will only be read if e621dl.py is directly executed as a script. Not if it is imported.
9 changes: 7 additions & 2 deletions e621dl_lib/constants.py
@@ -1,4 +1,4 @@
VERSION = '5.6.2'
VERSION = '5.7.0'

MAX_RESULTS = 320
PARTIAL_DOWNLOAD_EXT = 'request'
@@ -16,12 +16,17 @@
;; GENERAL ;;
;;;;;;;;;;;;;;
;These are default values
;The meaning of these settings is described
;in the README.md, in the [Settings] section
;These settings are all false by default
;[Settings]
;include_md5 = false
;make_hardlinks = true
;make_cache = true
;db = true
;offline = true
;prune_downloads = true
;prune_cache = true
;These are default settings for all search groups below
;[Defaults]
