<a href="https://colab.research.google.com/github/lwmurph/port_scanner/blob/main/Password_scraper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Password Generator from Website Scraper
This script generates passwords by scraping text from a website and hashing it. It includes a loading bar to show progress and allows saving the result to a file. To confirm the password is repeatable, the script scrapes the page 20 times and raises an error unless every scrape produces the same password.

## Features

- Scrapes text from a specified URL.
- Generates a deterministic password using SHA-3-512 hashing.
- Displays a loading bar to indicate progress.
- Compares passwords from multiple scrapes to ensure consistency.
- Offers options to save the generated password or the website URL to a file.

## Requirements

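- requests: For making HTTP requests to the website.
- beautifulsoup4: For parsing HTML and extracting text.
- hashlib (standard library): For hashing text to generate passwords.
- re (standard library): For regular expression operations.
- sys (standard library): For displaying progress in the terminal.
- urllib.request (standard library): For checking that the website is reachable.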

## Installation

Only the two third-party libraries need installing; the other modules ship with Python. You can install them using pip:

```
pip install requests beautifulsoup4
```

## Usage

1. **Run the Script:**

```
python password_generator.py
```

2. **Input Prompts:**
 - Website URL: Enter the URL of the website you want to scrape.
 - Password Length: Enter the desired length of the password (between 4 and 50 characters).

3. **Progress Indicator:**
 - The script will display a loading bar showing the progress of password generation.

4. **Save Options:**
 - After generating passwords, you can choose to save either the password or the URL to a file.
 - Enter a filename and a descriptive name for the password (e.g., Netflix, Instagram).
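
When the `url` option is chosen, the script appends one line containing the label, the URL, and the password length, which is enough to regenerate the password later. The sketch below shows one way such a line could be read back. The `parse_url_entry` helper and the `ENTRY_RE` pattern are hypothetical names and not part of the script; the pattern assumes the line format the script currently writes.

```python
import re

# Assumed format of the line written by the "url" save option:
# "password for <name> - website url: <url>, password length = <n>"
ENTRY_RE = re.compile(
    r"^password for (?P<name>.+?) - website url: (?P<url>\S+), "
    r"password length = (?P<length>\d+)$"
)


def parse_url_entry(line):
    """Return (name, url, length) from a saved URL entry, or None if it does not match."""
    match = ENTRY_RE.match(line.strip())
    if match is None:
        return None
    return match["name"], match["url"], int(match["length"])


entry = (
    "password for github - website url: "
    "https://www.gutenberg.org/cache/epub/74171/pg74171-images.html, "
    "password length = 15"
)
print(parse_url_entry(entry))
```

The recovered URL and length can then be entered at the script's two input prompts to reproduce the same password, provided the page has not changed.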

## Example
```
Enter the website URL: http://example.com
Enter a password length (4-50 characters): 12
[testing website][██████████████████████████████████████] 100.00% Complete
[info] Passwords are consistent after multiple scrapes
Would you like to save the password or the website URL to a file? (password/url/[N]o): password
Enter the filename to save to: passwords.txt
input name entry for password (e.g. netflix, instagram, ...): netflix

```
In this example, a password is generated from http://example.com, checked for consistency, and appended to passwords.txt as a line of the form `netflix:<password>`.


## Error Handling


- Invalid URL Format: The script checks if the URL is valid.
- Website Unreachable: The script ensures the website is reachable and responds with HTTP status code 200.
- Inconsistent Passwords: If passwords differ between scrapes, an error is raised.

## Notes
- Ensure that the website you are scraping is static and does not change frequently to get consistent results.
- The script currently assumes that the content of the website is in `<p>` tags. Adjust the scraping logic if necessary.
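
One possible adjustment to the scraping logic is to collapse whitespace before hashing, so that purely cosmetic differences in spacing or line breaks do not change the result. The sketch below is an assumption about what such a step might look like; `normalized_digest` is a hypothetical helper, and adding it to the script would change the passwords it produces compared with the current version.

```python
import hashlib
import re


def normalized_digest(text):
    """Hash text after collapsing every whitespace run to a single space."""
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha3_512(normalized.encode("utf-8")).hexdigest()


# Two strings that differ only in whitespace
a = "First paragraph.\n  Second paragraph."
b = "First paragraph. Second paragraph.  "
print(normalized_digest(a) == normalized_digest(b))  # True
```

Normalization only helps with whitespace; any change to the words themselves still yields a different digest.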



**To keep password generation repeatable, choose websites or sources whose content stays static and unchanging. Here are some examples of such sources:**

> - **Public Domain Books:** Websites hosting public domain books, where the content of each book remains unchanged. *Example: Project Gutenberg*
> - **Government Websites:** Certain government documents or archives that do not change frequently. *Example: USA.gov*
> - **Static Educational Resources:** Websites hosting educational resources or historical documents that are not updated frequently. *Example: Khan Academy*
> - **Archived Web Pages:** Websites that provide archived snapshots of web pages. *Example: Internet Archive*
> - **PDF Documents:** Direct links to PDF documents hosted on websites that are not updated frequently. Note that the script as written only extracts text from HTML `<p>` tags, so PDFs would need different extraction logic.
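
Whether a candidate source is stable enough can be checked before relying on it. The sketch below fetches content several times and compares digests. `is_stable` is a hypothetical helper, and the two lambda sources are stand-ins so the example runs without network access; a real caller might pass something like `lambda: scrape_text(url)`, assuming it returns a string.

```python
import hashlib


def is_stable(fetch_text, attempts=3):
    """Return True if fetch_text() yields identical content on every attempt."""
    digests = {
        hashlib.sha3_512(fetch_text().encode("utf-8")).hexdigest()
        for _ in range(attempts)
    }
    return len(digests) == 1


# Stand-in sources: one never changes, one embeds a visit counter
static_source = lambda: "unchanging page text"
counter = iter(range(100))
changing_source = lambda: f"page text with visit counter {next(counter)}"

print(is_stable(static_source))    # True
print(is_stable(changing_source))  # False
```

Repeated fetches within a few seconds only catch content that changes on every request, such as counters or timestamps; a page that is edited occasionally would still pass.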

In [65]:
import hashlib
import re
import sys
import urllib.request

import requests
from bs4 import BeautifulSoup

def print_loading_bar(iteration, total, length=40):
    percent = (iteration / total) * 100
    filled_length = int(length * iteration // total)
    bar = '█' * filled_length + '-' * (length - filled_length)
    sys.stdout.write(f'\r[testing website][{bar}] {percent:.2f}% Complete')
    sys.stdout.flush()

def is_valid_url(url):
    # Simple regex to check for a valid URL format
    regex = re.compile(
        r'^(?:http|ftp)s?://'  # http:// or https://
        r'(?:(?:[A-Z0-9](?:[A-Z0-9-]{0,61}[A-Z0-9])?\.)+(?:[A-Z]{2,6}\.?|[A-Z0-9-]{2,}\.?)|'  # domain
        r'localhost|'  # localhost
        r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}|'  # or ipv4
        r'\[?[A-F0-9]*:[A-F0-9:]+\]?)'  # or ipv6
        r'(?::\d+)?'  # optional port
        r'(?:/?|[/?]\S+)$', re.IGNORECASE)
    return re.match(regex, url) is not None

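def is_reachable(url):
    # Send a request to the website and check for HTTP status code 200.
    # urlopen raises on HTTP errors, DNS failures, and timeouts (all subclasses
    # of OSError), so treat any of those as "not reachable" instead of crashing.
    try:
        webcode = urllib.request.urlopen(url, timeout=10).getcode()
    except OSError:
        return False
    return webcode == 200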

def scrape_text(url):
    try:
        # Use a timeout so an unresponsive server cannot hang the script
        response = requests.get(url, timeout=10)
        # Raise HTTPError for bad responses
        response.raise_for_status()
        soup = BeautifulSoup(response.content, 'html.parser')
        paragraphs = soup.find_all('p')
        text = ' '.join([p.get_text() for p in paragraphs])
        return text
    except requests.exceptions.RequestException as e:
        print(f"Error fetching the URL: {e}")
        return None

def generate_password(text, length):
    hash_object = hashlib.sha3_512(text.encode())
    hex_dig = hash_object.hexdigest()

    # Create a character pool from alphanumeric and special characters
    char_pool = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!@#$%^&*()-_+=<>?'

    # Deterministically select characters from the hash
    password = ''.join([char_pool[int(hex_dig[i:i+2], 16) % len(char_pool)] for i in range(0, length * 2, 2)])

    return password

def save_to_file(password, filename):
    with open(filename, 'a') as file:
        file.write(password + '\n')


if __name__ == "__main__":
    url = input("Enter the website URL: ").strip()
    input_length = input("Enter a password length (4-50 characters): ").strip()

    # Input validation for the password length
    try:
        length = int(input_length)
    except ValueError:
        print("That is not an integer.")
        sys.exit(1)

    # Make sure password length is within range
    if length > 50 or length < 4:
        raise ValueError("Password length needs to be in the range 4 to 50 characters")

    # Make sure the URL is valid and the website is up and running (200 HTTP status code).
    # Stop here on failure so the scraping loop below never runs with a bad URL.
    if not is_valid_url(url):
        print("Invalid URL format. Please enter a valid URL.")
        sys.exit(1)
    if not is_reachable(url):
        print("The website is not reachable. Please check the URL or your network connection.")
        sys.exit(1)

    passwords = [None] * 20

    for i in range(20):
        text = scrape_text(url)
        if text:
            password = generate_password(text, length)
            passwords[i] = password
            if i < 1:
                print(f'[info] generated Password: {password}')
        else:
            raise ValueError("Failed to scrape the text from the website.")
        # Update and display the loading bar
        print_loading_bar(i + 1, 20)

    # Compare the passwords
    for i in range(19):
        if passwords[i] != passwords[i+1]:
            raise ValueError("\n Passwords differ after multiple scrapes. Make sure you are using a static website")
    print("\n[info] Passwords are consistent after multiple scrapes")

    save_choice = input("Would you like to save the password or the website URL to a file? (password/url/[N]o): ").strip().lower()
    # The input is already lowercased; an empty answer selects the default ([N]o)
    if save_choice in ['password', 'url', 'no', 'n', '']:
        if save_choice == 'password':
            filename = input("Enter the filename to save to: ").strip()
            app_pass = input("input name entry for password (e.g. netflix, instagram, ...)")
            combo_pass = app_pass + ":" + password
            save_to_file(combo_pass, filename)
            print(f'{save_choice.capitalize()} has been saved to {filename}')
        elif save_choice == 'url':
            filename = input("Enter the filename to save to: ").strip()
            app_pass = input("input name entry for password (e.g. netflix, instagram, ...)")
            url_combo = 'password for ' + app_pass + ' - website url: ' + url + ", password length = " + str(length)
            save_to_file(url_combo, filename)
            print(f'{save_choice.capitalize()} has been saved to {filename}')
        else:
            print(f'Password: {password}')
    else:
        print("Invalid choice. Nothing was saved.")

Enter the website URL: https://www.gutenberg.org/cache/epub/74171/pg74171-images.html
Enter a password length (4-50 characters): 15
[info] generated Password: j!mW@3^lu8UjjiY
[testing website][████████████████████████████████████████] 100.00% Complete
[info] Passwords are consistent after multiple scrapes
Would you like to save the password or the website URL to a file? (password/url/[N]o): url
Enter the filename to save to: url.txt
input name entry for password (e.g. netflix, instagram, ...)github
Url has been saved to url.txt
