Inconsistency in manual/docs and help message #38
Lists should be plain-text domains, one per line. Simple example: |
Here's just one of my many repos using it. This script uses all the Travis CI functionality, which you won't need in your local environment: https://github.com/mitchellkrogza/Badd-Boyz-Hosts/blob/master/dev-tools/DataTesting.sh |
domain1.com
(a simple list with just one domain per line)

The output folder is created in whatever folder you run PyFunceble from. See the output folder created by PyFunceble here: https://github.com/mitchellkrogza/Phishing.Database/tree/master/phishing-domains You can specify the location of the output by using export |
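For illustration only, a minimal Python sketch that turns a messy input into the plain-text, one-domain-per-line format described above. It assumes that lines starting with `#` are comments and that duplicates and case differences can be dropped; `normalize_domain_list` and `write_domain_list` are hypothetical helpers, not part of PyFunceble.

```python
from pathlib import Path
from typing import Iterable, List


def normalize_domain_list(raw_text: str) -> List[str]:
    """Return one lower-cased entry per line, without blanks, comments or duplicates."""
    seen = set()
    entries = []
    for line in raw_text.splitlines():
        entry = line.strip().lower()
        # Assumption: lines starting with '#' are comments and can be dropped.
        if not entry or entry.startswith("#"):
            continue
        if entry not in seen:  # keep the first occurrence, preserve input order
            seen.add(entry)
            entries.append(entry)
    return entries


def write_domain_list(entries: Iterable[str], path: str) -> None:
    """Write entries as a plain-text file, one per line, ending with a newline."""
    Path(path).write_text("\n".join(entries) + "\n", encoding="utf-8")
```

For example, `write_domain_list(normalize_domain_list(Path("raw.txt").read_text()), "domains.txt")` (both file names hypothetical) produces a file that can then be passed to PyFunceble with `-f`.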
It must be just the -f parameter, not -uf. Then try |
The ones I use in all my projects: ACTIVE = my list of active domains |
So INACTIVE is what you want for your dead-domain lists. |
Also, if you want to force re-testing of everything every time you run PyFunceble, add the -db flag to stop it from re-testing based on its own database. But once you learn how smart the database is, you should just let it do its own thing. |
See my usage of ACTIVE, INACTIVE and INVALID here: https://github.com/mitchellkrogza/Phishing.Database |
I use the INVALID lists every now and again to clean up my input lists of any formatting errors. As you will see on Phishing Database, INVALID entries hardly feature anymore for me, but they were crucial in the beginning to get all the cleaning functions for my input sources right. |
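A small Python sketch for checking how many entries ended up in each status list. It assumes that the `output/domains/<STATUS>/list` layout mentioned later in this thread for ACTIVE also applies to INACTIVE and INVALID, and that header lines start with `#`; `count_status_entries` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict

STATUSES = ("ACTIVE", "INACTIVE", "INVALID")


def count_status_entries(output_dir: str = "output") -> Dict[str, int]:
    """Count non-empty, non-comment lines in <output_dir>/domains/<STATUS>/list."""
    counts = {}
    for status in STATUSES:
        path = Path(output_dir) / "domains" / status / "list"
        if not path.is_file():
            counts[status] = 0  # no file yet means nothing has been classified there
            continue
        with path.open(encoding="utf-8") as handle:
            counts[status] = sum(
                1 for line in handle
                if line.strip() and not line.lstrip().startswith("#")
            )
    return counts
```

Comparing the sum of these counts with the number of lines in the input list is a quick way to see how much of a run is still pending.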
Hi @maravento, and thanks for your feedback. Sorry that it wasn't clear. I'll do my best to improve the documentation in the future. To recapitulate:

Test of a file with URLs

If you want to test a list of URLs, i.e. in this format:
You can pass the file path with

Test of a file in plain text or hosts file format

If you want to test a list of domains or IPs which are in plain text or hosts format, i.e. in this format:
You can pass the file path with

Confusions (to fix in docs)

Sorry for the confusion I created. Indeed both

For example, let's say I want to test this file, I can give it to

What is the difference between ACTIVE vs VALID and INACTIVE vs INVALID?

Documentation: https://pyfunceble.readthedocs.io/en/latest/columns/status.html#status

Because there are many possibilities, I created the structure of this project into one file called

What I'm doing is I generate the
Difference between availability and syntax test

Availability test

The availability test consists of finding out whether a domain, IP or URL is available.

Domain and IP

The availability of a domain or IP is determined from WHOIS records, NSLOOKUP and the HTTP status code.

URL

The availability of a URL is determined from the HTTP status code. cf: documentation

Syntax test

The syntax test is just a syntax test. As you understand Python, you can review our syntax test/check logic here.

Auto continue

Your question:
Documentation: https://pyfunceble.readthedocs.io/en/latest/components/auto-continue.html

As the auto continue system is activated by default (unless you disable it in your personal

How does it work?

Documentation: https://pyfunceble.readthedocs.io/en/latest/components/auto-continue.html#how-does-it-work

In other words, everything happens in

The idea is to log everything that has been tested and, on the next run (after the power cut in your example), remove the tested elements from the original list to test. In Python, we do the equivalent of the following on a bigger scale:

to_test = [1, 2, 3, 4, 5]
already_tested = [2, 3, 4]
# Keep only what has not been tested yet (note: set() does not preserve order).
to_test = list(set(to_test) - set(already_tested))

Thanks again for your feedback. I hope that I clarified things here. If not, please let me know. Cheers, |
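The same auto-continue idea, sketched slightly more concretely: already-tested subjects are persisted to a plain-text log so that a later run (for example after a power cut) can skip them. The log file and the helper names are hypothetical, and this is not PyFunceble's actual auto-continue storage. Unlike `list(set(a) - set(b))`, this version keeps the original order of the input.

```python
from pathlib import Path
from typing import List


def pending_subjects(to_test: List[str], log_path: str) -> List[str]:
    """Return the subjects of to_test not yet recorded in log_path, in input order."""
    log = Path(log_path)
    already_tested = set()
    if log.exists():
        already_tested = {
            line.strip()
            for line in log.read_text(encoding="utf-8").splitlines()
            if line.strip()
        }
    return [subject for subject in to_test if subject not in already_tested]


def record_tested(subject: str, log_path: str) -> None:
    """Append one tested subject to the log so a later run can skip it."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(subject + "\n")
```

Appending each subject as soon as it has been tested keeps the amount of repeated work small after an interruption.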
Perfect, well explained. Thanks a lot. |
@maravento Awesome! Now let me make that even better for you, as I currently process 60,000 domains in 4 hours. Welcome to the absolutely brilliant multiprocessing of PyFunceble. Now add the flags |
I'm running the command like this:

-p PROCESSES, --processes PROCESSES

Why is there no value for the flag "-m"? What is the maximum level of processing allowed, and what is the resource consumption per process? P.S.: I am using an HP ProLiant M110 G9 test server 24/7, with 8 GB of free RAM and 10 Mb bandwidth. |
No |
Then, according to my resources described above, how should I run the command for maximum performance and speed? |
Try 100 processes. If it's too much, drop it to 50; if it's too little, raise it to a maximum of 250. -m is just the switch to turn multiprocessing on; then you specify how many processes with -p xx. With that CPU you should comfortably get away with running 150 processes... Just try the exact command line I gave and let us know. |
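To illustrate what `-m` (switch multiprocessing on) and `-p` (number of processes) mean conceptually, here is a Python sketch that fans a list out to a pool of workers. It is not PyFunceble's internal implementation: `fake_check` is a hypothetical offline stand-in for a real availability test, and `multiprocessing.dummy.Pool` (the thread-backed twin of `multiprocessing.Pool`, with the same API) is used so the example runs anywhere; swapping in `multiprocessing.Pool` gives real processes.

```python
from multiprocessing.dummy import Pool  # thread-backed, same API as multiprocessing.Pool
from typing import Callable, Dict, Iterable, Tuple


def fake_check(subject: str) -> Tuple[str, str]:
    """Hypothetical offline stand-in for a real availability test."""
    # A real test would query WHOIS/DNS/HTTP; here anything without a dot is "INVALID".
    return subject, ("ACTIVE" if "." in subject else "INVALID")


def run_in_pool(subjects: Iterable[str], processes: int,
                check: Callable[[str], Tuple[str, str]] = fake_check) -> Dict[str, str]:
    """Test subjects with `processes` parallel workers and return {subject: status}."""
    with Pool(processes=processes) as pool:
        # imap_unordered hands out work in chunks and yields results as workers finish.
        return dict(pool.imap_unordered(check, subjects, chunksize=16))
```

Raising `processes` only helps while the machine and the network keep up, which is why the advice above is to start around 100 and adjust.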
@maravento try |
Usage: |
Did you install it with pip or pip3 🤔 |
My bad, sorry: uninstall indeed has no --user option. I'm helping you from my phone as best I can. It should be just pip uninstall package or pip3 uninstall package 🤔 @funilrys will have to assist further. For now, why not just leave it as is, fire up Conda and run it there? It won't matter that you have it installed on your system, as you will be running a new instance from inside the Conda environment. |
Just going back a few posts: are you doing all this in a VM on VirtualBox, or did you want a guide to creating a foolproof VM environment for running PyFunceble? |
On a dedicated physical server (description is HERE) |
OK, got that; I was just referencing your request about doing it in a VM. I could build one tomorrow which will work and may benefit others too. Still, I cannot explain why you are experiencing freezing on your hardware. We run PyFunceble in Docker containers with multiprocessing and don't get freezes or anything; @funilrys will have to assist you to trace that. |
Please bear in mind I'm a user just like you. I'm not the author, but I have been using this extensively since Nissar started building it from some of my crazy ideas. |
I think you need to add the |
No logs (nl) defaults to false, but adding -nl toggles it to true. |
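Reading "toggles" literally, a switch like `-nl` flips the configured value rather than always setting it to true. A hypothetical argparse re-implementation of that behaviour (not PyFunceble's own code) makes the difference visible when the configured default is already true:

```python
import argparse


def parse_no_logs(argv, config_default: bool = False) -> bool:
    """Return the effective 'no logs' value for a toggle-style -nl switch (hypothetical)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-nl", "--no-logs", action="store_true",
                        help="switch the configured 'no logs' value")
    args = parser.parse_args(argv)
    # A toggle flips the configured value; it does not force it to True.
    return (not config_default) if args.no_logs else config_default
```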
My pleasure, and don't stress: we will get you up and running for sure. @funilrys works Mon-Fri and his time is limited, so I help where I can. He will respond once he's online; he hasn't been online all day, so I know he's hammering away at some code somewhere. |
Hello there, sorry for being so silent. I'm juggling work, the next version of this tool, a huge private project and family :) So let's go!

Multiprocessing
The
I can't really answer that, as there are too many variables. But generally, on modern x64 machines, 100-150 is sufficient if you have other business running. These are some of the variables that come to mind directly and are obvious:
It really depends on the machine most of the time.

Reduce memory impact (and freezes?)

For your large amount of data (I didn't think you would test 5 million entries), I recommend setting up a MySQL/MariaDB database to

It's actually way better, as we don't have to keep the following datasets/subsystems in memory:
The (short) documentation about the database can be found here: https://pyfunceble.readthedocs.io/en/latest/components/databases.html I should cover that in more depth in the documentation. Thanks for mentioning it. Please read more about it in the documentation.
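As a self-contained illustration of why a database backend eases memory pressure, the sketch below keeps test statuses in a table and streams them back on demand instead of holding everything in a Python dict. It uses the standard-library `sqlite3` module purely as a stand-in for the MySQL/MariaDB backend recommended above; the table schema and helper names are hypothetical, not PyFunceble's.

```python
import sqlite3
from typing import Iterator


def open_status_db(path: str) -> sqlite3.Connection:
    """Open (or create) a tiny status table; pass a file path to keep data on disk."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status (subject TEXT PRIMARY KEY, status TEXT NOT NULL)"
    )
    return conn


def save_status(conn: sqlite3.Connection, subject: str, status: str) -> None:
    # Upsert: re-testing a subject replaces its previous status.
    conn.execute(
        "INSERT OR REPLACE INTO status (subject, status) VALUES (?, ?)", (subject, status)
    )
    conn.commit()


def iter_subjects_with_status(conn: sqlite3.Connection, status: str) -> Iterator[str]:
    # Rows are fetched from the database as the caller iterates, not all at once.
    cursor = conn.execute(
        "SELECT subject FROM status WHERE status = ? ORDER BY subject", (status,)
    )
    for (subject,) in cursor:
        yield subject
```

The `subject` primary key also means each subject can only be stored once, so duplicates cannot accumulate in this table.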
Freeze
I'm not aware of any freeze, but I hope that using the MariaDB/MySQL database type can solve that.

Uninstallation

Well, it depends on how you installed it, but I never thought it was necessary. It will be added to the documentation.

Arch Linux

Arch users can simply do:

$ yourFavoriteAurHelper -Rns pyfunceble

PyPI

A package installed from PyPI can be uninstalled as follows:

$ pip3 uninstall pyfunceble

I don't understand why you get the following.
It might be because if your version of
Can you try to

Otherwise, you can delete the paths returned by the following commands:

$ pip show pyfunceble | grep Location
$ which pyfunceble
$ which PyFunceble

Virtualenv/Conda

You can start from the beginning by setting up a virtualenv.

Advantages

You don't need to rely on the system version of

(Mini)Conda

@mitchellkrogza already explained it there and I have nothing to add except: Mitch @mitchellkrogza, please make a PR from it!! 😸

Advantages of Conda

Conda lets you install and use a Python version of your choice and work from there, while virtualenv will only use the one installed on the system.

Virtualenv

Here is my routine when I'm at work using Debian 9 (from memory, as I'm out of the office).

$ apt-get install python3-virtualenv
# Create the virtualenv and install it into the venv directory
$ virtualenv -p python3 venv
# Activate the environment (installed)
$ . venv/bin/activate
$ pip3 --version
# update pip
$ pip3 install pip --upgrade # Will be installed inside the venv directory.
# Install and play with what we need
$ pip3 install pyfunceble # Will be installed inside the venv directory.
# play with pyfunceble and other
$ pip3 --version
$ PyFunceble --version
$ PyFunceble -d microsoft_google.com
# When done and you want to go back to your system.
# Deactivate the virtual env.
$ deactivate
# Now you are back into your system
# Proof that PyFunceble is installed system-wide.
$ pip show pyfunceble | grep Location

Logs
Not at the moment, but I have a private branch with work on it. It was never my priority, but it will be for 2.5+. The only logs generated are the ones we produce after each test, so you can keep track of which output belongs to which domain, for example.

Warnings
|
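Related to the uninstallation and virtualenv steps above: a hypothetical Python helper that reports whether the current interpreter runs inside a virtualenv and where a distribution is installed, roughly what `pip show pyfunceble | grep Location` and `which pyfunceble` report. It relies only on `sys`, `shutil.which` and `importlib.metadata` (Python 3.8+); the distribution name `PyFunceble` is assumed to match the PyPI project name.

```python
import shutil
import sys
from importlib import metadata


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv/virtualenv."""
    # venv and modern virtualenv change sys.prefix; legacy virtualenv set sys.real_prefix.
    return hasattr(sys, "real_prefix") or sys.prefix != sys.base_prefix


def install_locations(dist_name="PyFunceble", commands=("pyfunceble", "PyFunceble")):
    """Return (site-packages location or None, {command: resolved path or None})."""
    try:
        location = str(metadata.distribution(dist_name).locate_file(""))
    except metadata.PackageNotFoundError:
        location = None
    executables = {cmd: shutil.which(cmd) for cmd in commands}
    return location, executables


if __name__ == "__main__":
    print("inside virtualenv:", in_virtualenv())
    print(install_locations())
```

Running it once inside the activated venv and once after `deactivate` shows which copy each environment would use.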
@maravento I highly recommend the MariaDB solution. If you're not OK with it right now, you could just split your large file into parts of maybe 500,000 lines each with |
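A minimal Python sketch of splitting one large list into parts with a fixed number of lines (500,000 by default, as suggested above) so that each part can be tested by its own PyFunceble instance. The output naming (`part_000`, `part_001`, ...) and the helper name are hypothetical.

```python
from itertools import islice
from pathlib import Path
from typing import List


def split_file(source: str, out_dir: str, lines_per_part: int = 500_000,
               prefix: str = "part_") -> List[Path]:
    """Split a one-entry-per-line file into numbered parts; return the created paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    created = []
    with open(source, encoding="utf-8") as handle:
        index = 0
        while True:
            chunk = list(islice(handle, lines_per_part))  # next block of lines
            if not chunk:
                break
            part = out / f"{prefix}{index:03d}"
            with open(part, "w", encoding="utf-8") as dest:
                dest.writelines(chunk)
            created.append(part)
            index += 1
    return created
```

It reads the source lazily with `itertools.islice`, so only one part is held in memory at a time.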
@mitchellkrogza Hi. A question: for example, my file has 5 M lines, host-active has 1.5 M and host-inactive has 1.3 M (host-invalid has few, so it doesn't matter for the example). |
@maravento It's hard to say why you got such results. I am currently testing your entire list in 5 parts of 1 M each, all at the same time, using (Mini)Conda environments running in parallel, with each environment/instance of PyFunceble using multiprocessing with 50 processes, all using the MariaDB database system. I estimate it will be finished by tomorrow morning, and then I can push my results to my fork of your repo. The only way I can tell is to compare my results with yours. |
The data is definitely real and there will be no duplicates. Go and look for yourself at the contents of output/domains/ACTIVE/list |
You can look at any of the files while they are being created, or just tail them, and you will see. |
@mitchellkrogza Hi. Same problem. At this point the program has processed the following data:
And I lost all the work, and it started from the beginning again. |
--clean will clean your output folders. Be careful using it; I should have been clearer about that. I can't explain the duplicates. I've never seen any before, but I will have to check some of my big lists to see if ACTIVE has any. For now, you can just run a final sort on the active and inactive files when the test is finished to remove any duplicates, until @funilrys can look into what might cause that. Just run |
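A minimal Python sketch of such a final sort-and-deduplicate pass over a finished result file. It assumes one entry per line with `#` header lines, which it keeps on top; `sort_unique_in_place` is a hypothetical helper, and it should only be run after the test has finished, since PyFunceble may still be appending to the file.

```python
from pathlib import Path


def sort_unique_in_place(path: str) -> int:
    """Sort a result file, drop duplicate entries and return how many were removed."""
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines()
    headers = [line for line in lines if line.startswith("#")]  # assumed header lines
    entries = [line.strip() for line in lines if line.strip() and not line.startswith("#")]
    unique = sorted(set(entries))
    body = headers + unique
    file.write_text("\n".join(body) + "\n" if body else "", encoding="utf-8")
    return len(entries) - len(unique)
```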
That's why I reopened the ticket. I just lost 3 weeks of work by following the instructions of @funilrys. I have summarized the proposals for improvements and bug fixes in issue 41. |
@maravento If you have a problem with the output and multiprocessing, then use the API and manage your file and your multiprocessing yourself. I do it for @Ultimate-Hosts-Blacklist. You can do it too, and it is as simple as the following. Again, it's documented.

from PyFunceble import test as PyFunceble
print(PyFunceble("google.com", complete=True))

I don't actually have time to go deep into reproducing what you do (@mitchellkrogza might help with that), but full database (i.e. MariaDB/MySQL) processing is in my plans, so that files are generated only when the test is really done.

What database type do you use? If it's JSON, then no, and that's normal: it's one of the reasons I introduced the database types. It's not in the documentation yet, but I talked about it in the Reduce memory impact (and freezes?) section. When you use the multiprocessing option, auto continue is guaranteed only if you use the MySQL/MariaDB database types. That's what @mitchellkrogza implicitly said and what I confirmed:
I largely agree about the state of the documentation, and fixing it is in my workflow. But for the rest, you're using PyFunceble in a way we have never used it before. Indeed, I tested it with 1.2 million records, but never with so many. That's why we need to go further with the database types implementation: JSON is not good for multiprocessing and memory. Cheers, P.S.: Please keep this open; it does not make sense to close it while the documentation and the things you mentioned here are not fixed/handled. |
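A minimal sketch of managing the file yourself around the API, in the spirit of the suggestion above: every subject is tested through a `check` callable, and the per-status files are written only once everything is done, so an interrupted run leaves no partial lists. Passing PyFunceble's `test` (imported as shown above) as `check` assumes that, without `complete=True`, it returns just the status string such as `ACTIVE`; verify that against the API documentation for your version. The helper name and output file names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path
from typing import Callable, Dict


def run_file_with_api(input_path: str, output_dir: str,
                      check: Callable[[str], str]) -> Dict[str, int]:
    """Test every subject of input_path with `check`; write one file per status at the end."""
    by_status = defaultdict(list)
    with open(input_path, encoding="utf-8") as handle:
        for line in handle:
            subject = line.strip()
            if subject and not subject.startswith("#"):
                by_status[check(subject)].append(subject)

    # Nothing is written until every subject has a status.
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for status, subjects in by_status.items():
        (out / f"{status}.txt").write_text("\n".join(subjects) + "\n", encoding="utf-8")
    return {status: len(subjects) for status, subjects in by_status.items()}
```

To add your own parallelism, the calls to `check` can be handed to a worker pool; the output files are still written in one place at the end.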
It is not necessary to keep it open. I think everything is clear. And I summarized my experiences and proposal for improvement in issue 41 |
I have been told that this is a "magic" tool, and I congratulate you for that. However, I have read the instructions several times:
https://pyfunceble.readthedocs.io/en/latest/what-can-we-do.html
And I still have no idea how to verify a list of URLs, nor what format this list should have.
You could review the manual and make it friendlier, with examples. Thank you.