manual / automated authenticated crawling with project options (headless) #209

Open
ehsandeep opened this issue Nov 29, 2022 · 1 comment
Labels: Priority: Medium, Type: Enhancement


ehsandeep commented Nov 29, 2022

Feature suggested by @parthmalhotra

Please describe your feature request:

New options to add for this feature:

HEADLESS:
   -ob, -open-browser      open Chrome browser for manual browsing and crawling

PROJECT:
   -cp, -crawl-project     create/use project data for authenticated crawl
   -lp, -list-project      list previously stored projects

Example runs:

katana -headless -open-browser -crawl-project test # open browser with a blank URL
katana -headless -open-browser -u https://hackerone.com -crawl-project h1 # open browser and navigate to https://hackerone.com
katana -headless -u https://hackerone.com -cp h1  # automated crawling of the given URL from project data (session information)
katana -headless -cp h1  # automated crawling from project data (session information), no URL given
katana -list-project

/Users/geekboy/Github/katana/test1  Nov 29 15:13 20MB
/Users/geekboy/Github/katana/test2  Nov 29 15:13 10MB
/Users/geekboy/Github/katana/test3  Nov 29 15:13 100KB
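
A listing like the one above could be approximated outside katana with a few lines of shell. This is only a sketch under assumptions: `list_projects` is a hypothetical helper, it treats every subdirectory of a given root as a project, and it prints only the path and size (the modification date column is left out to keep it portable).

```shell
# Sketch only: list_projects is a hypothetical helper, not part of katana.
# It assumes each project is a directory directly under the given root.
list_projects() {
  root="$1"
  for dir in "$root"/*/; do
    # Skip the unexpanded glob when the root has no subdirectories.
    [ -d "$dir" ] || continue
    # du -sh prints "<size><TAB><path>"; keep only the size column.
    size=$(du -sh "$dir" | cut -f1)
    printf '%s\t%s\n' "${dir%/}" "$size"
  done
}
```

Calling `list_projects /path/to/projects` prints one tab-separated line per project directory.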

The following headless browser option (Chrome's user data directory) can be used internally, together with the -sc or -scp option (#202), to create / reuse session information on disk:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --user-data-dir="test"
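
To illustrate how a project name might map onto that Chrome invocation, here is a minimal sketch. The helper name `chrome_cmd_for_project` and the `CHROME_BIN` variable are hypothetical, and using `about:blank` as the blank start page is an assumption; only `--user-data-dir` comes from the command above. The helper prints the command line instead of launching the browser.

```shell
# Sketch only: chrome_cmd_for_project and CHROME_BIN are hypothetical names.
# Prints the Chrome command line for a project directory and optional start URL.
chrome_cmd_for_project() {
  project_dir="$1"
  # Assumption: with no URL given, the browser opens a blank page.
  start_url="${2:-about:blank}"
  chrome_bin="${CHROME_BIN:-/Applications/Google Chrome.app/Contents/MacOS/Google Chrome}"
  printf '"%s" --user-data-dir="%s" "%s"\n' "$chrome_bin" "$project_dir" "$start_url"
}

chrome_cmd_for_project h1 https://hackerone.com
```

Because the profile directory is the only state carried between runs, reusing the same project name would reuse the cookies and logins stored in it.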

Note:

  1. When the -open-browser option is used, manually browsed pages are used as seeds for crawling in the background, and the results are displayed in the CLI output.
  2. -ob and -sb cannot be used together.
  3. -sc and -scp cannot be used together.
  4. When used with -ob, the -cp option can create or reuse session information.
  5. When used without -ob, the -cp option can only reuse existing session information.
  6. The -cp option currently works only in headless mode; support for standard mode is planned.
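
The flag rules in the notes above can be sketched as a small validation routine. This is not katana's implementation: `validate_flags` is a hypothetical helper, it only looks at the flag names mentioned in this issue, and the "project mode" strings it prints are invented labels for notes 4 and 5.

```shell
# Sketch only: validate_flags is a hypothetical helper, not katana code.
# It checks the combinations described in notes 2, 3 and 6, then reports
# how -cp would behave according to notes 4 and 5.
validate_flags() {
  headless=0; ob=0; sb=0; sc=0; scp=0; cp=0
  for arg in "$@"; do
    case "$arg" in
      -headless) headless=1 ;;
      -ob|-open-browser) ob=1 ;;
      -sb) sb=1 ;;
      -sc) sc=1 ;;
      -scp) scp=1 ;;
      -cp|-crawl-project) cp=1 ;;
    esac
  done
  if [ "$ob" -eq 1 ] && [ "$sb" -eq 1 ]; then
    echo "error: -ob and -sb cannot be used together"; return 1
  fi
  if [ "$sc" -eq 1 ] && [ "$scp" -eq 1 ]; then
    echo "error: -sc and -scp cannot be used together"; return 1
  fi
  if [ "$cp" -eq 1 ] && [ "$headless" -eq 0 ]; then
    echo "error: -cp requires -headless"; return 1
  fi
  if [ "$cp" -eq 1 ] && [ "$ob" -eq 1 ]; then
    echo "project mode: create-or-reuse"
  elif [ "$cp" -eq 1 ]; then
    echo "project mode: reuse-only"
  fi
  return 0
}
```

For example, `validate_flags -headless -ob -cp h1` reports the create-or-reuse mode, while `validate_flags -cp h1` fails because headless mode is missing.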

ehsandeep added the Type: Enhancement and Priority: Medium labels on Nov 29, 2022

dogancanbakir (Member) commented

Related, #43
