# Existing Estimators Research

This notebook evaluates a number of existing password strength estimators, with two goals:

1. Establish a baseline of how well they work against a pre-defined dataset
2. Understand how well they handle corner cases (that is, passwords that appear to be secure but aren't, and the other way around)

We start with 4 arbitrarily chosen, reasonably popular libraries.

In [None]:
# Let's start by installing the required libraries
!pip install zxcvbn password_strength passwordmeter password-validator

# We'll also need the following libraries for the actual evaluation
!pip install datasets scikit-learn ipywidgets numpy pandas tqdm



We will now write a function that lets us check password strength using all 4 libraries:

In [34]:
def check_password(password, method="zxcvbn"):
    """
    Check the strength of a password using one of several popular Python libraries.
    
    The following methods are available:
      - "zxcvbn": Uses the zxcvbn-python library. Secure if score >= 3.
      - "password_validator": Uses the password-validator package with a defined schema
                              (8-100 characters, at least one uppercase letter,
                              one lowercase letter, and one digit).
      - "password_strength": Uses the password_strength package.
                             Secure if strength() > 0.66.
      - "passwordmeter": Uses the passwordmeter package. Secure if score > 0.5.
                            
    Args:
        password (str): The password to be evaluated.
        method (str): Which method to use for estimation.
        
    Returns:
        int: 1 if the password is considered secure, 0 otherwise.
    """
    
    if password is None:
        return 0
    
    if method.lower() == "zxcvbn":
        try:
            from zxcvbn import zxcvbn
            result = zxcvbn(password)
            # zxcvbn scores range from 0 (weak) to 4 (strong)
            return 1 if result.get("score", 0) >= 3 else 0
        except ImportError:
            print("Error: zxcvbn library is not installed.")
            return 0

    elif method.lower() == "password_validator":
        try:
            from password_validator import PasswordValidator
            # Define a schema: minimum 8 characters, maximum 100, at least one uppercase, one lowercase, and one digit.
            schema = PasswordValidator()
            schema.min(8).max(100).has().uppercase().has().lowercase().has().digits()
            is_valid = schema.validate(password)
            return 1 if is_valid else 0
        except ImportError:
            print("Error: password-validator module is not installed.")
            return 0
        except Exception as e:
            print(f"password_validator check failed: {e}")
            return 0

    elif method.lower() == "password_strength":
        try:
            from password_strength import PasswordStats
            stats = PasswordStats(password)
            
            # Documentation suggests 0.66 as a good magic number
            return 1 if stats.strength() > 0.66 else 0
        except ImportError:
            print("Error: password_strength module is not installed.")
            return 0
        except Exception as e:
            print(f"password_strength check failed: {e}")
            return 0

    elif method.lower() == "passwordmeter":
        try:
            import passwordmeter
            score, improvements = passwordmeter.test(password)
            # Assume a score greater than 0.5 indicates a secure password.
            return 1 if score > 0.5 else 0
        except ImportError:
            print("Error: passwordmeter module is not installed.")
            return 0
        except Exception as e:
            print(f"passwordmeter check failed: {e}")
            return 0

    else:
        raise ValueError("Unknown method specified. Choose one of: zxcvbn, password_validator, password_strength, passwordmeter.")


# Example usage:
test_password = "ExamplePass123!"
VALIDATION_METHODS = ["zxcvbn", "password_validator", "password_strength", "passwordmeter"]

for m in VALIDATION_METHODS:
    result = check_password(test_password, method=m)
    print(f"Method '{m}': {'Secure' if result == 1 else 'Not secure'}")

Method 'zxcvbn': Secure
Method 'password_validator': Secure
Method 'password_strength': Secure
Method 'passwordmeter': Secure
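
Each branch of `check_password` hard-codes its own cut-off. As a minimal sketch (not part of the evaluation below), the cut-offs could live in one table so that they are easy to inspect and adjust. `THRESHOLDS` and `binarize` are hypothetical names introduced here; the values mirror the ones used above, and `password_validator` is left out because it returns a boolean rather than a score.

```python
import operator

# Cut-offs copied from check_password above; the table itself is an illustration.
THRESHOLDS = {
    "zxcvbn": (operator.ge, 3),                # score is an integer from 0 to 4
    "password_strength": (operator.gt, 0.66),  # strength() is a float
    "passwordmeter": (operator.gt, 0.5),       # score is a float
}


def binarize(method, score):
    """Map a raw score to 1 (secure) or 0 (not secure) using the table."""
    compare, cutoff = THRESHOLDS[method]
    return 1 if compare(score, cutoff) else 0


print(binarize("zxcvbn", 3))                # 1
print(binarize("password_strength", 0.66))  # 0, because the comparison is strict
```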


We can now load a dataset which we'll be using for evaluation. We'll use `InfinitodeLTD/PWLDS` which is a very large synthetic dataset created for this very purpose. 

In [36]:
import random
from datasets import load_dataset

ds = load_dataset("InfinitodeLTD/PWLDS", split="train")

# Let's only take a ~1% random subset of the data for faster processing.
# Value 2 corresponds to neither secure nor insecure passwords, so we'll exclude it.
# A seeded generator keeps the sample the same between runs.
rng = random.Random(42)

def keep_row(row):
    return row["Strength_Level"] != 2 and rng.random() < 0.01

ds = ds.filter(keep_row)

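Let's jot down some general information about the dataset. Note that the output below was captured in a different run than the evaluation further down (7,878 rows here versus 79,638 there), so the row counts don't match: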

In [31]:
print("General Dataset Information:")
print("----------------------------")
print("Total examples:", ds.num_rows)
print("Columns:", ds.column_names)
print("Features:", ds.features)
print("\nSample entries:")
for i in range(min(5, ds.num_rows)):
    print(ds[i])

General Dataset Information:
----------------------------
Total examples: 7878
Columns: ['Password', 'Strength_Level']
Features: {'Password': Value(dtype='string', id=None), 'Strength_Level': Value(dtype='int64', id=None)}

Sample entries:
{'Password': '6eftv', 'Strength_Level': 0}
{'Password': '6fi', 'Strength_Level': 0}
{'Password': '9arad', 'Strength_Level': 0}
{'Password': 'vreq', 'Strength_Level': 0}
{'Password': 'Abiee', 'Strength_Level': 0}


Cool, let's run this dataset through our estimators:

In [37]:
from sklearn.metrics import confusion_matrix, classification_report
from tqdm.notebook import tqdm

def map_label(strength):
    return 1 if strength in [3, 4] else 0

true_labels = [map_label(sl) for sl in ds["Strength_Level"]]

for method in VALIDATION_METHODS:
    print("\nEvaluating method:", method)
    
    # Get predictions for each password in the dataset using a progress bar.
    predictions = [check_password(password, method=method) 
                   for password in tqdm(ds["Password"], desc=f"Processing with {method}")]
    
    # Compute confusion matrix and classification report.
    cm = confusion_matrix(true_labels, predictions)
    cr = classification_report(true_labels, predictions)
    
    print("Confusion Matrix:")
    print(cm)
    print("\nClassification Report:")
    print(cr)
    
    # Identify and display mispredictions (limit to first 5 FPs and 5 FNs)
    passwords = ds["Password"]  # read the column once rather than once per printed row
    pairs = list(zip(true_labels, predictions))
    fp_indices = [i for i, (t, p) in enumerate(pairs) if t == 0 and p == 1][:5]
    fn_indices = [i for i, (t, p) in enumerate(pairs) if t == 1 and p == 0][:5]
    
    print("\nSome mispredictions (up to 5 FPs and 5 FNs):")
    if not fp_indices and not fn_indices:
        print("No mispredictions!")
    else:
        for idx in fp_indices:
            print(f"False Positive - Password: '{passwords[idx]}', True label: {true_labels[idx]}, Prediction: {predictions[idx]}")
        for idx in fn_indices:
            print(f"False Negative - Password: '{passwords[idx]}', True label: {true_labels[idx]}, Prediction: {predictions[idx]}")


Evaluating method: zxcvbn


Processing with zxcvbn:   0%|          | 0/79638 [00:00<?, ?it/s]

Confusion Matrix:
[[23711 15704]
 [ 1797 38426]]

Classification Report:
              precision    recall  f1-score   support

           0       0.93      0.60      0.73     39415
           1       0.71      0.96      0.81     40223

    accuracy                           0.78     79638
   macro avg       0.82      0.78      0.77     79638
weighted avg       0.82      0.78      0.77     79638


Some mispredictions (up to 5 FPs and 5 FNs):
False Positive - Password: 'adhamantqweRty', True label: 0, Prediction: 1
False Positive - Password: '8888adiatiOn8', True label: 0, Prediction: 1
False Positive - Password: 'adoninUuuu', True label: 0, Prediction: 1
False Positive - Password: 'tttt9abnerval', True label: 0, Prediction: 1
False Positive - Password: '5acidic7777', True label: 0, Prediction: 1
False Negative - Password: 'accession<', True label: 1, Prediction: 0
False Negative - Password: 'acetatedP<', True label: 1, Prediction: 0
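False Negative - Password: 'abusedlyoi', True label: 1, Prediction: 0
False Negative - Password: 'acceptedz>', True label: 1, Prediction: 0
False Negative - Password: 'accord&,r*', True label: 1, Prediction: 0

Evaluating method: password_validator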

Processing with password_validator:   0%|          | 0/79638 [00:00<?, ?it/s]

Confusion Matrix:
[[34148  5267]
 [14992 25231]]

Classification Report:
              precision    recall  f1-score   support

           0       0.69      0.87      0.77     39415
           1       0.83      0.63      0.71     40223

    accuracy                           0.75     79638
   macro avg       0.76      0.75      0.74     79638
weighted avg       0.76      0.75      0.74     79638


Some mispredictions (up to 5 FPs and 5 FNs):
False Positive - Password: '8888adiatiOn8', True label: 0, Prediction: 1
False Positive - Password: 'achinglY7', True label: 0, Prediction: 1
False Positive - Password: 'dddd1Adamical', True label: 0, Prediction: 1
False Positive - Password: '4Adnervalqwe', True label: 0, Prediction: 1
False Positive - Password: 'AcontIas1111', True label: 0, Prediction: 1
False Negative - Password: 'z>ca<YIxd]U', True label: 1, Prediction: 0
False Negative - Password: 'accession<', True label: 1, Prediction: 0
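False Negative - Password: 'acardiac$I', True label: 1, Prediction: 0
False Negative - Password: '[gEmaccentor', True label: 1, Prediction: 0
False Negative - Password: 'abutter/ib', True label: 1, Prediction: 0

Evaluating method: password_strength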

Processing with password_strength:   0%|          | 0/79638 [00:00<?, ?it/s]

Confusion Matrix:
[[39364    51]
 [19920 20303]]

Classification Report:
              precision    recall  f1-score   support

           0       0.66      1.00      0.80     39415
           1       1.00      0.50      0.67     40223

    accuracy                           0.75     79638
   macro avg       0.83      0.75      0.73     79638
weighted avg       0.83      0.75      0.73     79638


Some mispredictions (up to 5 FPs and 5 FNs):
False Positive - Password: '3Aconitumqwerty', True label: 0, Prediction: 1
False Positive - Password: 'qwertyaedilian3', True label: 0, Prediction: 1
False Positive - Password: 'qwerty7acquired', True label: 0, Prediction: 1
False Positive - Password: 'aconital1qwerty', True label: 0, Prediction: 1
False Positive - Password: '9addimentQwerty', True label: 0, Prediction: 1
False Negative - Password: 'accollevY@6', True label: 1, Prediction: 0
False Negative - Password: 'AbaditeG7m', True label: 1, Prediction: 0
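False Negative - Password: 'o/ZNX[J0Ab<', True label: 1, Prediction: 0
False Negative - Password: 'z>ca<YIxd]U', True label: 1, Prediction: 0
False Negative - Password: 'woSlwt,59*tb,', True label: 1, Prediction: 0

Evaluating method: passwordmeter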

Processing with passwordmeter:   0%|          | 0/79638 [00:00<?, ?it/s]

Confusion Matrix:
[[39415     0]
 [16441 23782]]

Classification Report:
              precision    recall  f1-score   support

           0       0.71      1.00      0.83     39415
           1       1.00      0.59      0.74     40223

    accuracy                           0.79     79638
   macro avg       0.85      0.80      0.79     79638
weighted avg       0.85      0.79      0.78     79638


Some mispredictions (up to 5 FPs and 5 FNs):
False Negative - Password: 'AbaditeG7m', True label: 1, Prediction: 0
False Negative - Password: 'z>ca<YIxd]U', True label: 1, Prediction: 0
False Negative - Password: 'accession<', True label: 1, Prediction: 0
False Negative - Password: 'acardiac$I', True label: 1, Prediction: 0
False Negative - Password: '[gEmaccentor', True label: 1, Prediction: 0


For reading convenience, below is the output formatted to be more digestible:

### Evaluating method: zxcvbn

**Confusion Matrix:**

|              | Predicted 0 | Predicted 1 |
|--------------|-------------|-------------|
| **Actual 0** | 23711       | 15704       |
| **Actual 1** | 1797        | 38426       |

**Classification Report:**

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| 0     | 0.93      | 0.60   | 0.73     | 39415   |
| 1     | 0.71      | 0.96   | 0.81     | 40223   |
| **Accuracy**     |           |        | 0.78     | 79638   |
| **Macro Avg**    | 0.82      | 0.78   | 0.77     | 79638   |
| **Weighted Avg** | 0.82      | 0.78   | 0.77     | 79638   |
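
Every figure in the report can be derived from the four cells of the confusion matrix. As a quick sanity check, the following sketch recomputes the zxcvbn numbers by hand; `per_class_metrics` is a helper name made up for this example, not something from scikit-learn.

```python
def per_class_metrics(tp, fp, fn):
    """Precision, recall and F1 for one class, from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# zxcvbn confusion matrix: rows are actual labels, columns are predictions.
tn, fp = 23711, 15704
fn, tp = 1797, 38426

# Class 1 (secure): a "positive" is a password predicted as secure.
secure = per_class_metrics(tp=tp, fp=fp, fn=fn)
# Class 0 (insecure): swap the roles of the two classes.
insecure = per_class_metrics(tp=tn, fp=fn, fn=fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print("class 0:", [round(v, 2) for v in insecure])  # [0.93, 0.6, 0.73]
print("class 1:", [round(v, 2) for v in secure])    # [0.71, 0.96, 0.81]
print("accuracy:", round(accuracy, 2))              # 0.78
```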

**Mispredictions (up to 5 FPs and 5 FNs):**

| Type            | Password          | True Label | Prediction |
|-----------------|-------------------|------------|------------|
| False Positive  | `adhamantqweRty`  | 0          | 1          |
| False Positive  | `8888adiatiOn8`   | 0          | 1          |
| False Positive  | `adoninUuuu`      | 0          | 1          |
| False Positive  | `tttt9abnerval`   | 0          | 1          |
| False Positive  | `5acidic7777`     | 0          | 1          |
| False Negative  | `accession<`      | 1          | 0          |
| False Negative  | `acetatedP<`      | 1          | 0          |
| False Negative  | `abusedlyoi`      | 1          | 0          |
| False Negative  | `acceptedz>`      | 1          | 0          |
| False Negative  | `accord&,r*`      | 1          | 0          |

---

### Evaluating method: password_validator

**Confusion Matrix:**

|              | Predicted 0 | Predicted 1 |
|--------------|-------------|-------------|
| **Actual 0** | 34148       | 5267        |
| **Actual 1** | 14992       | 25231       |

**Classification Report:**

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| 0     | 0.69      | 0.87   | 0.77     | 39415   |
| 1     | 0.83      | 0.63   | 0.71     | 40223   |
| **Accuracy**     |           |        | 0.75     | 79638   |
| **Macro Avg**    | 0.76      | 0.75   | 0.74     | 79638   |
| **Weighted Avg** | 0.76      | 0.75   | 0.74     | 79638   |

**Mispredictions (up to 5 FPs and 5 FNs):**

| Type            | Password          | True Label | Prediction |
|-----------------|-------------------|------------|------------|
| False Positive  | `8888adiatiOn8`   | 0          | 1          |
| False Positive  | `achinglY7`       | 0          | 1          |
| False Positive  | `dddd1Adamical`   | 0          | 1          |
| False Positive  | `4Adnervalqwe`    | 0          | 1          |
| False Positive  | `AcontIas1111`    | 0          | 1          |
| False Negative  | `z>ca<YIxd]U`     | 1          | 0          |
| False Negative  | `accession<`      | 1          | 0          |
| False Negative  | `acardiac$I`      | 1          | 0          |
| False Negative  | `[gEmaccentor`    | 1          | 0          |
| False Negative  | `abutter/ib`      | 1          | 0          |

---

### Evaluating method: password_strength

**Confusion Matrix:**

|              | Predicted 0 | Predicted 1 |
|--------------|-------------|-------------|
| **Actual 0** | 39364       | 51          |
| **Actual 1** | 19920       | 20303       |

**Classification Report:**

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| 0     | 0.66      | 1.00   | 0.80     | 39415   |
| 1     | 1.00      | 0.50   | 0.67     | 40223   |
| **Accuracy**     |           |        | 0.75     | 79638   |
| **Macro Avg**    | 0.83      | 0.75   | 0.73     | 79638   |
| **Weighted Avg** | 0.83      | 0.75   | 0.73     | 79638   |

**Mispredictions (up to 5 FPs and 5 FNs):**

| Type            | Password           | True Label | Prediction |
|-----------------|--------------------|------------|------------|
| False Positive  | `3Aconitumqwerty`  | 0          | 1          |
| False Positive  | `qwertyaedilian3`  | 0          | 1          |
| False Positive  | `qwerty7acquired`  | 0          | 1          |
| False Positive  | `aconital1qwerty`  | 0          | 1          |
| False Positive  | `9addimentQwerty`  | 0          | 1          |
| False Negative  | `accollevY@6`      | 1          | 0          |
| False Negative  | `AbaditeG7m`       | 1          | 0          |
| False Negative  | `o/ZNX[J0Ab<`      | 1          | 0          |
| False Negative  | `z>ca<YIxd]U`      | 1          | 0          |
| False Negative  | `woSlwt,59*tb,`    | 1          | 0          |

---

### Evaluating method: passwordmeter

**Confusion Matrix:**

|              | Predicted 0 | Predicted 1 |
|--------------|-------------|-------------|
| **Actual 0** | 39415       | 0           |
| **Actual 1** | 16441       | 23782       |

**Classification Report:**

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| 0     | 0.71      | 1.00   | 0.83     | 39415   |
| 1     | 1.00      | 0.59   | 0.74     | 40223   |
| **Accuracy**     |           |        | 0.79     | 79638   |
| **Macro Avg**    | 0.85      | 0.80   | 0.79     | 79638   |
| **Weighted Avg** | 0.85      | 0.79   | 0.78     | 79638   |

**Mispredictions (up to 5 FPs and 5 FNs):**

| Type            | Password         | True Label | Prediction |
|-----------------|------------------|------------|------------|
| False Negative  | `AbaditeG7m`     | 1          | 0          |
| False Negative  | `z>ca<YIxd]U`    | 1          | 0          |
| False Negative  | `accession<`     | 1          | 0          |
| False Negative  | `acardiac$I`     | 1          | 0          |
| False Negative  | `[gEmaccentor`   | 1          | 0          |

# Password Complexity Estimators Evaluation Summary

## 1. zxcvbn

**Accuracy:** 78%

**Strengths:**
- High recall (96%) for detecting secure passwords.
- Excellent precision (93%) for identifying insecure passwords.

**Weaknesses:**
- Low recall (60%) for insecure passwords, resulting in many false positives.
- Tends to misclassify passwords with predictable patterns (e.g., `adhamantqweRty`) as secure.

## 2. password_validator

**Accuracy:** 75%

**Strengths:**
- High recall (87%) for insecure passwords.
- Good precision (83%) when identifying strong passwords.

**Weaknesses:**
- Low recall (63%) for secure passwords, leading to many false negatives.
- Rejects any password that lacks an uppercase letter, a lowercase letter, or a digit, no matter how complex it otherwise is (e.g., `abutter/ib` and `accession<` contain neither uppercase letters nor digits).
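
The false negatives follow directly from the schema. The sketch below re-implements the rule in plain Python, as an approximation of what the `password_validator` branch of `check_password` enforces rather than the library's own code (`meets_schema` is a hypothetical name), and applies it to passwords from the misprediction table.

```python
def meets_schema(password):
    """Approximate the schema: 8-100 chars, with upper, lower and digit."""
    return (
        8 <= len(password) <= 100
        and any(c.isupper() for c in password)
        and any(c.islower() for c in password)
        and any(c.isdigit() for c in password)
    )


# Labelled secure in the dataset, yet rejected: none of them contains a digit.
for pw in ["z>ca<YIxd]U", "accession<", "abutter/ib"]:
    print(pw, meets_schema(pw))  # False for all three

# Labelled insecure, yet accepted: all three character classes are present.
for pw in ["8888adiatiOn8", "achinglY7", "AcontIas1111"]:
    print(pw, meets_schema(pw))  # True for all three
```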

## 3. password_strength

**Accuracy:** 75%

**Strengths:**
- Near-perfect recall (rounds to 100%) for insecure passwords: only 51 of 39,415 are misclassified as secure.
- Near-perfect precision (rounds to 100%) for secure passwords.

**Weaknesses:**
- Poor recall (50%) for secure passwords, causing numerous false negatives.
- Often misclassifies complex passwords with special characters (`accollevY@6`, `woSlwt,59*tb,`) as weak.

## 4. passwordmeter

**Accuracy:** 79%

**Strengths:**
- Perfect precision (100%) in identifying secure passwords; no false positives.
- Perfect recall (100%) for insecure passwords.

**Weaknesses:**
- Moderate recall (59%) for secure passwords, frequently missing unconventional but secure passwords (`AbaditeG7m`, `z>ca<YIxd]U`).

---

## Overall Analysis and Recommendations

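**Common Strengths:**
- Three of the four estimators (password_validator, password_strength, passwordmeter) reliably flag weak passwords, with 87-100% recall on the insecure class. zxcvbn is the exception at 60%.

**Common Weaknesses:**
- The same three estimators often reject strong passwords, missing 37-50% of the secure class; many of these false negatives contain special characters or unusual patterns. zxcvbn errs in the opposite direction, accepting about 40% of insecure passwords.

In short, no single estimator stands out as perfect, and most of them struggle with passwords that are in fact secure.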
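
To make the trade-off concrete, the sketch below computes, from the confusion matrices above, how often each estimator accepts an insecure password and how often it rejects a secure one. The variable names are chosen for this example only.

```python
# (TN, FP, FN, TP) per estimator, copied from the confusion matrices above.
MATRICES = {
    "zxcvbn": (23711, 15704, 1797, 38426),
    "password_validator": (34148, 5267, 14992, 25231),
    "password_strength": (39364, 51, 19920, 20303),
    "passwordmeter": (39415, 0, 16441, 23782),
}

rates = {}
for name, (tn, fp, fn, tp) in MATRICES.items():
    accepted_insecure = fp / (fp + tn)  # share of insecure passwords accepted
    rejected_secure = fn / (fn + tp)    # share of secure passwords rejected
    rates[name] = (accepted_insecure, rejected_secure)
    print(f"{name:20s} accepts insecure: {accepted_insecure:6.1%}   "
          f"rejects secure: {rejected_secure:6.1%}")
```

zxcvbn accepts roughly 40% of insecure passwords but rejects under 5% of secure ones, while the other three accept at most about 13% of insecure passwords and reject 37-50% of secure ones.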