Proposals to improve PyFunceble #41

maravento · 2019-07-15T23:40:30Z

Problems:

The installation method (described here) is confusing and does not match the manual: some commands require root privileges and others do not, and installation is mixed with execution, with setting up the environment, etc.
The minimum hardware and OS requirements are not documented.
It has no debug mode or logs, so there is no information when an error occurs (for example, it sometimes freezes and the cause cannot be determined).
There are inconsistencies between what the manual says and what the creator suggests in issues.
There is no technical performance data, and no warning about the program's resource consumption or how to control it. I have consulted other projects that use this program, and they do not provide this data either.
It becomes unstable and collapses or freezes with large lists (3M+ lines).
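Until a built-in debug mode or log file exists, a run can at least be wrapped so that its output, start time, end time and exit status end up in a file. A minimal sketch in bash; `run_logged` is a hypothetical helper, not part of PyFunceble, and it only records what the program already prints to stdout/stderr:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of PyFunceble): run a command, append its
# stdout/stderr to a log file, and record start time, end time and exit status.
# Example: run_logged pyfunceble.log PyFunceble -f source.txt
run_logged() {
  local log="$1"
  shift
  echo "[$(date '+%F %T')] start: $*" >> "$log"
  "$@" >> "$log" 2>&1
  local status=$?
  echo "[$(date '+%F %T')] exit status $status: $*" >> "$log"
  return "$status"
}
```

A "start" entry with no matching "exit status" entry means the run was still going, froze, or was interrupted when the log was read, which at least gives a timestamp to correlate with other system logs.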

Possible bugs:

Freezing: On Ubuntu 18.04.x x64 the program freezes, and the only way out is Ctrl+C. It happens with both small and large (3M+) lists. The cause is unknown because the program has no debug mode or logs.
Wrong instructions: According to the instructions, after interrupting the program with Ctrl+C, it must be run again with the --clean flag. This is very bad because all previous work is lost.
Warnings: The --clean flag should warn about what it does, to avoid partial or total loss of work.
Auto-continue failure: The auto-continue system does not work: when the program is interrupted or freezes, it does not resume where it left off, and as a result it generates duplicates in the output.
Inconsistencies in the output: When processing a list, 3 files are generated in the hosts folder (ACTIVE/hosts, INACTIVE/hosts, INVALID/hosts). If, once processing of the source list has finished, we take the INACTIVE/hosts file, for example, and reprocess it, the output should in theory be the same. It is not: part of this "inactive" list comes back as active. So the result is not reliable.
Execution modes and log file: The program needs execution modes (debug, safe, normal, minimal, etc.) so that it does not compromise system stability and so that problems can be diagnosed more thoroughly. It also needs a log file to make auditing and troubleshooting easier.
Virtual env: The suggested virtual environment (python3-virtualenv) does not work as it should.
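Until auto-continue is fixed, the duplicate lines it leaves behind can be removed after the fact. A minimal sketch using awk that keeps the first occurrence of each line and preserves the original order; `dedupe_in_place` is a hypothetical helper, and the path in the example comment follows the folder names above but is an assumption about where the files live:

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove duplicate lines from a file in place,
# keeping the first occurrence of each line and the original order.
# Example: dedupe_in_place hosts/INACTIVE/hosts
dedupe_in_place() {
  local file="$1"
  local tmp
  tmp=$(mktemp) || return 1
  # seen[$0]++ evaluates to 0 (false) the first time a line is seen,
  # so !seen[$0]++ prints only first occurrences.
  awk '!seen[$0]++' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat rather than mv so the file keeps its permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

Identical comment lines (for example repeated file headers) are collapsed as well, which is usually harmless for a hosts file.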

Hardware tests:
I ran different tests on physical machines with Ubuntu 18.04.3 x64 and large lists (3M+ lines). These are the results:

PC1: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz, RAM 32028 MiB
a. PyFunceble -m -p 200 -f file = system collapses
b. PyFunceble -m -p 150 -f file = freezes after running for a while
c. PyFunceble -m -p 100 -f file = freezes after running for a while
d. PyFunceble -m -p 50 -f file = test aborted (see "CPU usage" below)
e. PyFunceble -f file = stable, but slower than a plain bash script

PC2: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz, RAM 15903 MiB
a. PyFunceble -m -p 200 -f file = system collapses
b. PyFunceble -m -p 150 -f file = freezes after running for a while
c. PyFunceble -m -p 100 -f file = freezes after running for a while
d. PyFunceble -m -p 50 -f file = freezes after running for a while
e. PyFunceble -f file = stable, but slower than a plain bash script

CPU usage:
In all tests, the program reaches 100% CPU usage with large lists (3M+ lines).

Speed test: PyFunceble vs bash
Bash:
#!/bin/bash
while IFS= read -r line; do
  curl -o /dev/null --silent --head --write-out '%{http_code}' "$line"
  echo " $line"
done < source.txt
PyFunceble:
PyFunceble -f source.txt
Results after more than 1 hour:
PyFunceble: 1364 processed lines (counted across hosts/ACTIVE, hosts/INACTIVE and hosts/INVALID)
Bash: 2930 processed lines
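To put these rates in perspective, here is a rough extrapolation to a 3M-line list, treating each count as a per-hour rate and assuming it stays constant (which it may not). Since the run lasted more than an hour, the real rates are slightly lower, so these figures are lower bounds:

```shell
#!/usr/bin/env bash
# Rough extrapolation: hours needed to process a list at a constant rate.
# Integer division, so the result is rounded down.
hours_for() {
  local lines="$1" rate_per_hour="$2"
  echo $(( lines / rate_per_hour ))
}

hours_for 3000000 2930   # bash loop: 1023 hours, about 43 days
hours_for 3000000 1364   # PyFunceble without -m: 2199 hours, about 92 days
```

Either way, testing a list of this size one entry at a time takes weeks to months, which is why the -m -p options matter so much here.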

Conclusion:
This application is only faster than a simple bash script when run with the -m -p flags, but then it becomes unstable and freezes or brings down the system. I suggest improving this so that it is usable. Regards


mitchellkrogza · 2019-08-06T16:22:54Z

A containerised version, i.e. a Docker image or a VM, is a good idea and something we have discussed before. It would make it easy for someone to run it with set limits in a preconfigured environment, but those who know their specs and limitations can, as things stand, determine them through a few simple tests on smaller lists. For example, I know that on my Intel i7-3960X, 200 processes is my own maximum before it starts to drag the machine down.

I also run this daily on Ubuntu 18.04.2 x64, with tests running 4-8 hours without freezing, and all my tests use multiprocessing. The same goes for Ubuntu 16.04.2 and the latest Arch Linux. As for debug logs: YES, oh YES, we do indeed need them, and I know they are coming soon.

mitchellkrogza · 2019-08-11T05:53:54Z

Here's a test of mine from last night using only 100 processes; it finished in just under 9 hours without freezing. It ran on my Ubuntu 18.04.2 server, which kept serving Nginx sites during the test, with MySQL running for those sites and PyFunceble using the same MySQL database.

mitchellkrogza · 2019-08-13T08:11:53Z

Most of my posts are from my smartphone, as that is often the only time I have. Commenting on GitHub from a smartphone is not user-friendly by any means.

First:
My specs for the tests above: a 12-core KVM virtual machine (virtual processors, Proxmox)
16 GB of memory allocated to this server VM
The physical processor is a Xeon E5-1650, shared across 6 VMs (the attached screenshots are from the same test running this morning, including CPU and memory usage during a burst of multiprocessing activity)

Second:
Are you using the default JSON database, or have you tried MySQL/MariaDB as suggested? With such a massive list of domains, JSON is bound to cause issues. As you will notice, my test list above has 181,000+ entries and grows weekly, so splitting at 100K is not an option for me right now. So far it gets processed without freezing; the only thing that varies is how long the tests take.

Third:
Have you tried running any of your tests without multiprocessing?
That is essentially the same as your bash script, one test at a time, and could probably still be a bit faster than the bash method. Before PyFunceble had the multiprocessing option, we all ran it this way, one test at a time, in Travis-CI Docker containers across 50+ repos daily for almost 3 years. Some of our tests on big lists would take weeks to complete.

it seems very good program

It is indeed, and you should not give up on it. It may not be as perfect as you want, but we have massive projects that have been running on it and relying on it day in and day out for 3 years. There are always improvements and fixes from @funilrys when time allows, but in its current state we run it on so many different environments and distributions, and we cannot replicate the freezing, and believe me, I have tried.

To create a flag to control the hardware resources assigned when running the program (CPU Core/RAM/bandwidth)

I doubt this would ever be practical (I may be wrong, though), because it would be impossible to know what else is running on someone's machine besides PyFunceble. Such a switch might say "OK, let's allocate X processes because the CPU is X and memory is X", but then 20 minutes into the test something else gets launched by the system or the user, and the situation changes.
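Short of such a flag, standard OS tools can already make a run yield to everything else on the machine. A minimal sketch using `nice`, which lowers the CPU scheduling priority of the whole process tree; `run_low_priority` is a hypothetical name, and this does not cap memory or bandwidth:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command at the lowest CPU priority (niceness 19).
# Child processes inherit the niceness, so multiprocessing workers started
# with -m are deprioritised too. This limits CPU contention only; it does not
# cap memory or bandwidth.
# Example: run_low_priority PyFunceble -m -p 50 -f source.txt
run_low_priority() {
  nice -n 19 "$@"
}
```

The program can still show 100% CPU usage under `nice`; the difference is that other tasks get the CPU first, so the rest of the system stays responsive.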

The key here is simply finding the sweet spot for how many processes to allocate before things go wrong. For safety's sake you could use as few as 25 processes, which is still way faster than any bash method or running one test at a time. Even 10 processes is faster than 1 🤔, even 5 is faster than 1; it's just too tempting to push many, many processes to get such massive tests finished.
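One way to look for that sweet spot is to time a small sample list at a few process counts and see where the gains stop. A rough sketch: `benchmark_processes` and the `PYFUNCEBLE` override are hypothetical, it only uses the -m, -p and -f options shown in this thread, and it runs each count in a fresh scratch directory on the assumption that PyFunceble keeps its output and auto-continue state in the working directory (with a MySQL/MariaDB backend, state lives in the database instead):

```shell
#!/usr/bin/env bash
# Hypothetical benchmark: time one sample list at several process counts.
# PYFUNCEBLE defaults to the real CLI; set it to a stub (e.g. "true") to dry-run.
# Example:
#   head -n 2000 source.txt > sample.txt
#   benchmark_processes sample.txt 5 10 25 50
PYFUNCEBLE="${PYFUNCEBLE:-PyFunceble}"

benchmark_processes() {
  local sample p start end
  # Absolute path, because each run below happens in another directory.
  sample="$(cd "$(dirname "$1")" && pwd)/$(basename "$1")"
  shift
  for p in "$@"; do
    start=$(date +%s)
    # Fresh scratch directory per run so earlier results are not reused
    # (assumption about where PyFunceble keeps its state).
    (cd "$(mktemp -d)" && "$PYFUNCEBLE" -m -p "$p" -f "$sample" > /dev/null 2>&1)
    end=$(date +%s)
    echo "$p processes: $((end - start)) s"
  done
}
```

Network conditions vary, so a single timing per count is noisy; repeating the sweep, and watching CPU and memory while it runs, gives a better picture than any one number.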

I have automated PyFunceble tests on the same server, run every hour 24/7 from cron, but they are only allocated 50 processes to make sure (as this morning) that they don't bring the server down while a manual test of the bigger lists is in progress.
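For an hourly cron job like this, one extra safeguard is to skip a run while the previous one is still going, so two 50-process tests never stack up. A minimal sketch using a lock directory (`mkdir` either creates it or fails, in one step); `with_lock`, the lock path and the crontab line are illustrative, not taken from the setup described above:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper for a cron-driven test. Example crontab entry (hourly):
#   0 * * * * /path/to/hourly-test.sh
# where hourly-test.sh calls:
#   with_lock /tmp/pyfunceble-hourly.lock PyFunceble -m -p 50 -f list.txt
with_lock() {
  local lockdir="$1"
  shift
  # mkdir fails if the directory already exists, i.e. a run is in progress.
  if ! mkdir "$lockdir" 2>/dev/null; then
    echo "previous run still active, skipping" >&2
    return 0
  fi
  "$@"
  local status=$?
  rmdir "$lockdir"
  return "$status"
}
```

If a run is killed outright, the lock directory stays behind and has to be removed by hand before the next run will start.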

Let me correct myself a bit here: I HAVE indeed been able to freeze PyFunceble, when I gave it 250 processes on my local machine. The most processes I can ever run on my local machine is 200, and even then I am limited in what else I can do while it runs. By contrast, I can run 50 processes day in and day out with 5 browsers open, my email, and anything else I like, without really noticing it running in the background.

It truly is about finding a sweet spot. With your VERY large list, it is also, for now, a matter of splitting the load into smaller chunks so that you don't lose data. But my suggestion would be to simply pick 25-50 processes, let it run, and use the MySQL/MariaDB database option.
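Splitting a list into chunks can be done with GNU `split`. A minimal sketch, assuming one entry per line; `split_list`, the chunk size and the `chunk_` prefix are illustrative choices:

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a list into numbered chunks of at most N lines
# (prefix000, prefix001, ...). -d (numeric suffixes) is a GNU split option.
# Example:
#   split_list source.txt 100000 chunk_
#   for chunk in chunk_*; do PyFunceble -m -p 50 -f "$chunk"; done
split_list() {
  local file="$1" lines="$2" prefix="$3"
  split -l "$lines" -d -a 3 "$file" "$prefix"
}
```

If a chunk freezes or has to be interrupted, only that chunk needs to be redone, and the chunks already finished are not at risk.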

I have been discussing improvements to the MySQL/MariaDB database structure with @funilrys, which I know are coming soon. They will dramatically improve the situation where you have to kill PyFunceble during a multiprocess test: it will be able to carry on where it left off and also NOT lose the data in the output folder. With this change, the output folder files will only be created at the very end of testing, by pulling the data from MySQL/MariaDB and generating the files from the database. I doubt this kind of change would ever work with the current default JSON database structure, which is one reason SQL was introduced: we are all running into very large lists.
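Until that change lands, a manual way to carry on where a run left off is to subtract what has already been written to the output files from the source list and test only the remainder. A rough sketch, assuming the hosts output files contain lines of the form `0.0.0.0 example.com` plus `#` comment lines, and that the source list holds one plain domain per line; `remaining_entries` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the entries of a source list that do not yet
# appear in any of the given hosts-format output files.
# Assumes output lines look like "0.0.0.0 example.com" and comments start with "#".
# Example:
#   remaining_entries source.txt ACTIVE/hosts INACTIVE/hosts INVALID/hosts > remaining.txt
remaining_entries() {
  local source="$1"
  shift
  if [ "$#" -eq 0 ]; then
    # No output files yet: everything is still untested.
    cat "$source"
    return 0
  fi
  local tested
  tested=$(mktemp) || return 1
  # The second field of each non-comment line is the tested domain.
  awk '!/^#/ && NF >= 2 { print $2 }' "$@" > "$tested"
  # -v invert, -x whole-line match, -F fixed strings, -f patterns from file.
  grep -vxFf "$tested" "$source"
  rm -f "$tested"
  return 0
}
```

The remainder can then be tested on its own with -f. If the output folder is going to be cleaned first, copy the existing output files somewhere safe beforehand, since cleaning is exactly what loses the earlier work.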

mitchellkrogza · 2019-08-13T15:48:21Z

it is better to remove it from the program and set default to mySQL / mariaDB

There are too many existing users who run it on smaller lists, where JSON is still fine, so it will remain the default.

I know a debug log is coming in the short term, but to be honest you will never succeed with very big lists like yours, or even mine, without using MySQL. That is why I switched the moment the option became available, as lists are growing rapidly.

mitchellkrogza · 2019-08-15T07:30:19Z

@maravento See the bash script I added to #39


maravento mentioned this issue Jul 16, 2019

The program freezes every 2 - 5 hours #40

Closed

maravento changed the title ~~Proposal to improve the HowTO~~ Proposal to improve installation method Jul 16, 2019

maravento changed the title ~~Proposal to improve installation method~~ Proposals to improve PyFunceble Jul 22, 2019

This was referenced Aug 3, 2019

Inconsistency in manual/docs and help message #38

Closed

[GUIDE] Running PyFunceble in Conda Virtual Environments #39

Closed

maravento closed this as completed Aug 21, 2019

funilrys pushed a commit that referenced this issue Feb 21, 2021

Merge pull request #41 from spirillen/spirillen/issue39

9ccb46e

Missing default date in sql:whois
