A PHP crawler that finds emails on the internet
Latest commit add05bc Oct 20, 2016 @hedii fix where statement

php-crawler

A crawler written in PHP with Laravel that finds email addresses on the internet. Given an entry point URL, the crawler searches for emails in all the URLs available under the entry point's domain name. The emails found can be downloaded as a text file at any time. Several users can run searches without seeing each other's searches (each search belongs to a user).
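The idea described above can be sketched in a few lines of plain PHP. This is only an illustration under stated assumptions, not the crawler's actual code: the function names `extractEmails` and `extractSameDomainUrls` are hypothetical, the email pattern is deliberately simple, and relative links are ignored.

```php
<?php
// Hypothetical helpers for illustration; these names are not from the php-crawler codebase.

/**
 * Return the unique, lowercased email addresses found in a chunk of HTML or text.
 * The pattern is deliberately simple and will miss some valid addresses.
 */
function extractEmails($html)
{
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $html, $matches);

    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

/**
 * Return the absolute http(s) links in $html that share the entry point's host.
 * Relative links are skipped to keep the sketch short.
 */
function extractSameDomainUrls($html, $entryPointUrl)
{
    $host = (string) parse_url($entryPointUrl, PHP_URL_HOST);
    preg_match_all('/href\s*=\s*["\'](https?:\/\/[^"\'#\s]+)/i', $html, $matches);

    $urls = array();
    foreach ($matches[1] as $url) {
        // Keep only links whose host matches the entry point's host.
        if (strcasecmp((string) parse_url($url, PHP_URL_HOST), $host) === 0) {
            $urls[] = $url;
        }
    }

    return array_values(array_unique($urls));
}

$html = '<a href="http://example.com/contact">Contact</a> '
      . '<a href="http://other.test/">Elsewhere</a> '
      . 'Write to Info@Example.com or info@example.com.';

print_r(extractEmails($html));                                // one address: info@example.com
print_r(extractSameDomainUrls($html, 'http://example.com/')); // one link: http://example.com/contact
```

A real crawl would repeat these two steps for every same-domain URL it discovers, keeping track of the URLs already visited.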

Installation

  • Create a MySQL database (default name: php_crawler)
  • Install the project with Composer:
composer create-project hedii/php-crawler php-crawler
cd php-crawler
  • Install npm dependencies (optional):
npm install
  • Open the .env file, check the database credentials, and modify them if needed:
DB_HOST=127.0.0.1
DB_DATABASE=php_crawler
DB_USERNAME=root
DB_PASSWORD=root
  • Build the app:
php artisan crawler:build
  • Point your webserver to the public directory: php-crawler/public
  • Done
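If the build step fails on the database, it can help to check the credentials from the .env file on their own. The sketch below is an assumption-laden illustration, not part of the project: `parseEnv` and `mysqlDsn` are hypothetical helpers, the parser only handles plain KEY=value lines, and Laravel itself reads .env through its own loader.

```php
<?php
// Hypothetical helpers for illustration; not part of php-crawler.

/** Parse simple KEY=value lines, ignoring blank lines and # comments. */
function parseEnv($contents)
{
    $values = array();
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#' || strpos($line, '=') === false) {
            continue;
        }
        list($key, $value) = explode('=', $line, 2);
        $values[trim($key)] = trim($value);
    }

    return $values;
}

/** Build the PDO DSN for the MySQL settings found in $env. */
function mysqlDsn(array $env)
{
    return sprintf('mysql:host=%s;dbname=%s', $env['DB_HOST'], $env['DB_DATABASE']);
}

// The same values as the default .env shown in the installation steps.
$env = parseEnv("DB_HOST=127.0.0.1\nDB_DATABASE=php_crawler\nDB_USERNAME=root\nDB_PASSWORD=root\n");
echo mysqlDsn($env), PHP_EOL; // mysql:host=127.0.0.1;dbname=php_crawler

// To test the connection for real, uncomment the next line
// (requires the pdo_mysql driver and a running MySQL server):
// new PDO(mysqlDsn($env), $env['DB_USERNAME'], $env['DB_PASSWORD']);
```

In practice you would pass `file_get_contents('.env')` to `parseEnv` instead of the inline string.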

Usage

  • Navigate to your php-crawler website
  • Register a new account
  • Create a new search
  • Create more searches
  • Download the found resources

Troubleshooting

Server requirements

  • Curl
  • PHP >= 5.5.9
  • Curl PHP Extension
  • OpenSSL PHP Extension
  • PDO PHP Extension
  • Mbstring PHP Extension
  • Tokenizer PHP Extension
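The PHP side of this list can be checked with a short script. This is a minimal sketch, assuming the usual internal extension names (curl, openssl, pdo, mbstring, tokenizer); `missingRequirements` is a hypothetical helper, and the script does not check for the curl command-line tool.

```php
<?php
// Hypothetical helper for illustration; not part of php-crawler.

/**
 * Return a list of unmet requirements for the given PHP version
 * and list of loaded extension names.
 */
function missingRequirements($phpVersion, array $loadedExtensions)
{
    $missing = array();
    if (version_compare($phpVersion, '5.5.9', '<')) {
        $missing[] = 'PHP >= 5.5.9 (found ' . $phpVersion . ')';
    }

    // Extension names are compared case-insensitively (PHP reports "PDO" in upper case).
    $loaded = array_map('strtolower', $loadedExtensions);
    foreach (array('curl', 'openssl', 'pdo', 'mbstring', 'tokenizer') as $extension) {
        if (!in_array($extension, $loaded, true)) {
            $missing[] = $extension . ' extension';
        }
    }

    return $missing;
}

$missing = missingRequirements(PHP_VERSION, get_loaded_extensions());
echo $missing ? 'Missing: ' . implode(', ', $missing) : 'All PHP requirements met', PHP_EOL;
```

Run it with the same PHP binary your webserver uses, since the command-line and webserver configurations can load different extensions.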

Blank space in path

On some systems, the crawler app won't work if the path to its public directory contains a blank space. Rename any folder in that path so that it contains no spaces.
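A quick way to spot this problem is to test the path for whitespace. The snippet below is a small sketch; `pathHasWhitespace` is a hypothetical helper, and it assumes the script is saved in and run from the project root.

```php
<?php
// Hypothetical helper for illustration; not part of php-crawler.

/** Return true when the given filesystem path contains any whitespace. */
function pathHasWhitespace($path)
{
    return preg_match('/\s/', $path) === 1;
}

// Assumes this file sits in the project root, next to the public directory.
$publicPath = __DIR__ . '/public';
echo pathHasWhitespace($publicPath)
    ? "Path contains a space: $publicPath"
    : 'Path looks fine', PHP_EOL;
```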

MAMP server

If you are running the crawler on a MAMP server, edit config/database.php and add a unix_socket entry to the mysql connection:

'mysql' => [
    'driver'    => 'mysql',
    'host'      => env('DB_HOST', 'localhost'),
    'database'  => env('DB_DATABASE', 'forge'),
    'username'  => env('DB_USERNAME', 'forge'),
    'password'  => env('DB_PASSWORD', ''),
    'charset'   => 'utf8',
    'collation' => 'utf8_unicode_ci',
    'prefix'    => '',
    'strict'    => false,
    'engine'    => null,

    'unix_socket' => '/Applications/MAMP/tmp/mysql/mysql.sock', // add this line
],

Todo

  • Write PHP tests
  • Write JS tests
  • Crawl for things other than emails
  • ...

Screenshots