pyFileSearcher

pyFileSearcher was designed to be lightweight, easy to use, but capable of handling a large volume of files tool. A tool that I personally could use on large corporate servers to find out - which files have taken all my space in the last few days? It's free, it's opensource, it's for linux and windows.

The program is written in Python 3 using the Qt5.

What are you getting

Search by name, size, file type. Search by part of the path. Search for files listed in the index no earlier than N days ago
Saving information about deleted files, searching for them as well as for regular files
Ability to save search settings for future use
Ability to save search results in csv
Highlighting non-existent (deleted) files in search results
Logging access errors - you will know which folders were not indexed for some reason
Support for long paths (> 256 characters) in windows

How it works

The program runs through your hard disk and saves the minimum necessary information about the files: size, time of creation, modification, and time of the first indexing of the file (convenient for finding new files without looking at the attributes). To store this information, you can use the sqlite database (one for each target directory you want to index), or the MySQL database if you want to index hundreds of thousands and millions of files. In the latter case, you can use only one database, but specify several target directories. In both cases, each target directory is indexed in parallel with the others.

After you have set up simple indexing parameters (target directories, and white or black lists of extensions in the case of using sqlite), you can run the program with the "--scan" parameter to automatically start indexing, after which the program will be closed. Use this key to run via the scheduler.

During the scanning process, a pid-file is created in the working ("data") directory. Its existence blocks the process of launching a scan, if the program crashed - remove it manually.

Tests

The program was tested on a file server with about 20 million files. Scan time - about 5 hours. Files in biggest thread: ~7000000

MySQL non-default (for debian stretch) parameters:

innodb_buffer_pool_size = 3000M

innodb_log_file_size = 128M

innodb_log_buffer_size = 4M

innodb_flush_method = O_DIRECT

Name		Name	Last commit message	Last commit date
Latest commit History 276 Commits
ui_files		ui_files
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
app_dirs.py		app_dirs.py
icons_rc.py		icons_rc.py
main.py		main.py
requirements.txt		requirements.txt
utilities.py		utilities.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ui_files

ui_files

CHANGELOG.md

CHANGELOG.md

LICENSE

LICENSE

README.md

README.md

app_dirs.py

app_dirs.py

icons_rc.py

icons_rc.py

main.py

main.py

requirements.txt

requirements.txt

utilities.py

utilities.py

Repository files navigation

pyFileSearcher

What are you getting

How it works

Tests

About

Releases 3

Packages

Languages

License

qiwichupa/pyFileSearcher

Folders and files

Latest commit

History

Repository files navigation

pyFileSearcher

What are you getting

How it works

Tests

About

Topics

Resources

License

Stars

Watchers

Forks

Languages