# Classifying Password Strength Using Neural Networks

*Author: Tyler Rios*

*Instructor: Nikhil Krishnaswamy*

*CS445: Introduction to Machine Learning*

*10 May 2024*

[GitHub](https://github.com/rios240/ML-Password-Assessment.git)

## Introduction

### Traditional Methods of Password Strength Assessment

Traditionally, password strength is assessed using a set of predefined rules or metrics. These methods generally focus on the superficial attributes of a password rather than its intrinsic strength against attack methods. Common criteria include:

* Length of the Password: Typically, a longer password is considered more secure. A minimum number of characters is often required.
* Use of Special Characters: Passwords that include non-alphanumeric characters such as @, #, $, etc., are often ranked as stronger.
* Mix of Upper and Lowercase Letters: The inclusion of both uppercase and lowercase letters is encouraged to increase password complexity.
* Presence of Numbers: Adding numeric characters is viewed as another way to enhance password security.
* Avoidance of Common Passwords and Patterns: Passwords that match common patterns or known weak passwords (like "password123" or "123456") are flagged as weak.

While these rules are straightforward and relatively easy to implement, they have significant drawbacks. They do not account for the holistic security of a password but rather check it against a checklist. As a result, a password might meet all traditional criteria and still be vulnerable due to predictable patterns or sequences that are easy for a human or a computer algorithm to guess.
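
The checklist approach described above can be sketched in a few lines. This is a minimal illustration, not code from the project: the function names, the minimum length of 8, and the two-entry denylist are all assumptions chosen for the example.

```python
import string

# Hypothetical checklist-style scorer; the rules mirror the criteria listed above.
COMMON_PASSWORDS = {"password123", "123456"}  # tiny illustrative denylist


def rule_based_checks(password, min_length=8):
    """Return a dict of rule name -> bool for a checklist-style assessment."""
    return {
        "long_enough": len(password) >= min_length,
        "has_special": any(ch in string.punctuation for ch in password),
        "has_upper": any(ch.isupper() for ch in password),
        "has_lower": any(ch.islower() for ch in password),
        "has_digit": any(ch.isdigit() for ch in password),
        "not_common": password.lower() not in COMMON_PASSWORDS,
    }


def passes_all_rules(password):
    """True only if every rule in the checklist is satisfied."""
    return all(rule_based_checks(password).values())


print(passes_all_rules("Password1!"))  # a predictable pattern that passes every rule
print(passes_all_rules("kzde5577"))    # fails: no uppercase or special character
```

The first call illustrates the drawback: a dictionary word with a capital letter, a digit, and a trailing symbol satisfies every rule even though the pattern is easy to guess.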

### Why Use Machine Learning for Password Strength Assessment?

Machine learning, particularly through the use of neural networks, offers a more nuanced approach to assessing password strength. Instead of relying on static rules, machine learning models can learn and predict based on the data from numerous password breaches and security patterns. Here are some advantages of using machine learning over traditional methods:

* Pattern Recognition: Neural networks excel in recognizing complex patterns in data. In the context of passwords, they can learn to identify subtle patterns that might make a password easy to guess, even if it complies with traditional strength criteria.
* Adaptive Learning: Unlike static rule-based systems, machine learning models can improve over time. They adapt as new types of password breaches occur and as users create new kinds of password combinations.
* Assessment of Contextual Strength: Machine learning can evaluate not just the structure of a password but its strength relative to the most recent hacking techniques and leaked password databases.
* Predictive Capabilities: By training on historical data, neural networks can better predict the likelihood that a given password will be compromised, thereby providing a dynamic assessment of password strength.

In this notebook, we will explore how to harness the power of neural networks to create a model that assesses password strength more effectively and accurately than traditional methods. We will build and train a model that classifies passwords as weak, moderate, or strong based on their intrinsic characteristics and learned patterns, demonstrating the potential of machine learning to enhance cybersecurity.

## Data Preprocessing

Before we can train our neural network to assess password strength, we need to preprocess the dataset to transform raw password strings into a format suitable for machine learning models. The dataset, sourced from Kaggle, categorizes passwords into three classes: weak, moderate, and strong. This classification will serve as the target labels for our training process. To prepare our data, we will encode the password strings into numerical features via ASCII encoding.

### ASCII Encoding

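ASCII (American Standard Code for Information Interchange) encoding is a character encoding standard that represents text in computers and digital devices. Each character is assigned a unique number between 0 and 127. By converting each character of a password into its corresponding ASCII value, we create a numeric representation of the password which can be used as input for machine learning models. The code in this notebook uses Python's built-in `ord`, which returns a character's Unicode code point; for ASCII characters this equals the ASCII value, while any non-ASCII character in the dataset maps to a value above 127.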

Steps for ASCII Encoding:

1. Convert each character: Transform every character in the password to its corresponding ASCII value.
2. Handle variable length: Since passwords can vary in length but the input to our model needs a fixed size, we will pad shorter passwords with a special value (like 0) up to a certain length.
3. Normalize: To help the learning process, we normalize these values to ensure consistent scale across all features.
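
The three steps can be tried on a toy example before applying them to the full dataset. The sketch below uses NumPy only; the helper names and the three sample passwords are made up for illustration, and the scaling function is a hand-written stand-in for per-column min-max scaling rather than the scaler used later in the notebook.

```python
import numpy as np


def encode_and_pad(passwords, pad_value=0):
    """Map each character to its code point and right-pad to the longest password."""
    max_len = max(len(p) for p in passwords)
    encoded = np.full((len(passwords), max_len), pad_value, dtype=np.float64)
    for row, password in enumerate(passwords):
        encoded[row, :len(password)] = [ord(ch) for ch in password]
    return encoded


def min_max_scale_columns(X):
    """Scale each column to [0, 1]; constant columns are left at 0."""
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero
    return (X - col_min) / col_range


sample = ["abc", "a1", "zzzz"]  # toy passwords, not taken from the dataset
encoded = encode_and_pad(sample)
scaled = min_max_scale_columns(encoded)
print(encoded)
print(scaled)
```

Because the scaling is done per column, a single unusually large code point in a column compresses every other value in that column toward 0, which is worth keeping in mind when reading the scaled features.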

### Considerations

ASCII encoding imposes an artificial ordering on characters: the numeric value reflects only a character's position in the character set, not how the character is used in passwords, so numerically close values need not be related in any meaningful way. In simpler terms, there is a degree of information loss with ASCII encoding. In contrast, one-hot encoding provides a more distinct representation of input features, but at the cost of higher computational overhead. In fact, an attempt to one-hot encode this dataset consumed so many computational resources that the notebook kernel died. Despite its drawbacks, ASCII encoding is therefore used here.
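
A rough size estimate helps explain why one-hot encoding is so much heavier. The figures below are a back-of-the-envelope sketch under stated assumptions: about 670,000 rows (approximated from the DataFrame index printed in the notebook output), 220 character positions (the maximum password length in this dataset), a 128-symbol vocabulary, and dense 8-byte values. The function name is hypothetical.

```python
def dense_array_gigabytes(n_rows, n_positions, n_symbols=1, bytes_per_value=8):
    """Size in GB (10**9 bytes) of a dense array with one value per row/position/symbol."""
    return n_rows * n_positions * n_symbols * bytes_per_value / 1e9


N_ROWS = 670_000   # assumption: approximate row count, based on the printed index
N_POSITIONS = 220  # maximum password length in this dataset
N_SYMBOLS = 128    # assumption: a 128-symbol ASCII vocabulary

ordinal_gb = dense_array_gigabytes(N_ROWS, N_POSITIONS)
one_hot_gb = dense_array_gigabytes(N_ROWS, N_POSITIONS, N_SYMBOLS)
print(f"ordinal encoding: {ordinal_gb:.1f} GB")
print(f"one-hot encoding: {one_hot_gb:.1f} GB")
```

Under these assumptions the one-hot array is 128 times larger than the ordinal one, which is consistent with the kernel running out of resources; the exact figure depends on the data type and vocabulary actually used.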

In [18]:
import pandas as pd
import requests
import zipfile
import io

In [19]:
url = 'https://github.com/rios240/ML-Password-Assessment/raw/main/archive.zip'
response = requests.get(url)
print(response.raise_for_status())

zip_file = zipfile.ZipFile(io.BytesIO(response.content))
zip_file.extractall('.')
csv_filename = 'data.csv'

None


In [20]:
with open(csv_filename, 'r') as file:
    lines = file.readlines()

expected_delimiters = 1
correct_lines = [line for line in lines if line.count(',') == expected_delimiters]

with open(csv_filename, 'w') as file:
    file.writelines(correct_lines)

In [21]:
data = pd.read_csv(csv_filename)
data.dropna(inplace=True)
print(data.head())
print(data.tail())

      password  strength
0     kzde5577         1
1     kino3434         1
2    visi7k1yr         1
3     megzy123         1
4  lamborghin1         1
            password  strength
669635    10redtux10         1
669636     infrared1         1
669637  184520socram         1
669638     marken22a         1
669639      fxx4pw4g         1


In [22]:
password_strength_counts = data['strength'].value_counts(normalize=True)
print("Distribution of password strengths (as percentages):")
print(password_strength_counts * 100)

Distribution of password strengths (as percentages):
strength
1    74.189377
0    13.395426
2    12.415197
Name: proportion, dtype: float64


In [23]:
import numpy as np

In [24]:
max_length = data['password'].str.len().max()
print(f'The maximum password length is: {max_length}.')

def ascii_encode_padded(password, max_len):
    ascii_values = [ord(char) for char in password]
    padded_ascii = ascii_values + [0] * (max_len - len(ascii_values))
    return padded_ascii

data['ascii_encoded'] = data['password'].apply(ascii_encode_padded, max_len=max_length)

print(data['ascii_encoded'].head())

The maximum password length is: 220.
0    [107, 122, 100, 101, 53, 53, 55, 55, 0, 0, 0, ...
1    [107, 105, 110, 111, 51, 52, 51, 52, 0, 0, 0, ...
2    [118, 105, 115, 105, 55, 107, 49, 121, 114, 0,...
3    [109, 101, 103, 122, 121, 49, 50, 51, 0, 0, 0,...
4    [108, 97, 109, 98, 111, 114, 103, 104, 105, 11...
Name: ascii_encoded, dtype: object


In [25]:
ascii_data = pd.DataFrame(data['ascii_encoded'].tolist(), index=data.index)
ascii_data.columns = [f'char_{i}' for i in range(ascii_data.shape[1])]

ascii_data['label'] = data['strength']

print(ascii_data.head())

   char_0  char_1  char_2  char_3  char_4  char_5  char_6  char_7  char_8  \
0     107     122     100     101      53      53      55      55       0   
1     107     105     110     111      51      52      51      52       0   
2     118     105     115     105      55     107      49     121     114   
3     109     101     103     122     121      49      50      51       0   
4     108      97     109      98     111     114     103     104     105   

   char_9  ...  char_211  char_212  char_213  char_214  char_215  char_216  \
0       0  ...         0         0         0         0         0         0   
1       0  ...         0         0         0         0         0         0   
2       0  ...         0         0         0         0         0         0   
3       0  ...         0         0         0         0         0         0   
4     110  ...         0         0         0         0         0         0   

   char_217  char_218  char_219  label  
0         0         0      

In [26]:
from sklearn.preprocessing import MinMaxScaler

In [27]:
scaler = MinMaxScaler(feature_range=(0, 1))
ascii_features = ascii_data.columns[ascii_data.columns.str.startswith('char_')]
ascii_data[ascii_features] = scaler.fit_transform(ascii_data[ascii_features])

print(ascii_data.head())

     char_0    char_1    char_2    char_3    char_4    char_5    char_6  \
0  0.497585  0.324468  0.265957  0.012242  0.006424  0.006440  0.006484   
1  0.497585  0.279255  0.292553  0.013455  0.006182  0.006318  0.006013   
2  0.550725  0.279255  0.305851  0.012727  0.006667  0.013001  0.005777   
3  0.507246  0.268617  0.273936  0.014788  0.014667  0.005954  0.005895   
4  0.502415  0.257979  0.289894  0.011879  0.013455  0.013852  0.012143   

     char_7    char_8    char_9  ...  char_211  char_212  char_213  char_214  \
0  0.006667  0.000000  0.000000  ...       0.0       0.0       0.0       0.0   
1  0.006304  0.000000  0.000000  ...       0.0       0.0       0.0       0.0   
2  0.014668  0.303191  0.000000  ...       0.0       0.0       0.0       0.0   
3  0.006183  0.000000  0.000000  ...       0.0       0.0       0.0       0.0   
4  0.012608  0.279255  0.292553  ...       0.0       0.0       0.0       0.0   

   char_215  char_216  char_217  char_218  char_219  label  
0      

## Training a Neural Network for Password Strength Classification

In this section, we will develop a neural network to classify password strengths based on their ASCII-encoded values. Our goal is to use a neural network classifier to analyze patterns in the numerical representation of passwords, transcending simple metrics like password length or the presence of special characters. Given the complexity and subtleties of password strength, neural networks are particularly well-suited for capturing nonlinear interactions between features.

### Why a Neural Network?

Neural networks are powerful computational models that can model complex nonlinear relationships between inputs and outputs. They consist of layers of interconnected nodes or neurons, where each connection represents a weight that is adjusted during the training process. For password strength assessment, a neural network can learn to recognize complex patterns in ASCII-encoded sequences that might indicate the robustness of a password against common attack vectors like brute force or dictionary attacks.

### Addressing Data Imbalance with K-Fold Cross-Validation

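As seen in the Data Preprocessing section, the dataset is imbalanced: roughly 74% of passwords belong to class 1, with the remainder split between classes 0 and 2. To check that our model generalizes across all password strengths, we will utilize K-fold cross-validation. This technique involves dividing the dataset into 'K' subsets (or folds). In the standard form, the model is trained on 'K-1' folds and tested on the remaining fold, iteratively, such that each fold serves as the test set exactly once. The variant implemented in this notebook additionally holds out one fold for validation, so the model is trained on the remaining 'K-2' folds for every ordered pair of validation and test folds. Averaging over these partitions gives a more reliable estimate of the model’s performance than a single split. Note that cross-validation does not rebalance the classes by itself; the folds here are drawn by random shuffling, so each fold is expected to roughly mirror the overall class proportions.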
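
If the class mix in every fold needs to be controlled explicitly, a stratified split is one option. The sketch below is an alternative technique, not what this notebook does: it assigns fold ids class by class using NumPy only, and the function name, seed, and toy labels are assumptions made for the example.

```python
import numpy as np


def stratified_fold_ids(labels, n_folds, seed=0):
    """Assign each sample a fold id so every fold gets a similar class mix."""
    labels = np.asarray(labels).ravel()
    rng = np.random.default_rng(seed)
    fold_ids = np.empty(labels.shape[0], dtype=int)
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)
        # Deal the shuffled samples of this class out to folds in round-robin order
        fold_ids[idx] = np.arange(idx.size) % n_folds
    return fold_ids


# Toy labels with roughly the class proportions seen in the dataset
labels = np.array([1] * 74 + [0] * 13 + [2] * 13)
fold_ids = stratified_fold_ids(labels, n_folds=5)
for fold in range(5):
    print(fold, np.bincount(labels[fold_ids == fold], minlength=3))
```

With this assignment, the per-class counts in any two folds differ by at most one sample, so no fold ends up without examples of a minority class.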

### Utilizing PCA for Dimensionality Reduction

Given the high dimensionality of our ASCII-encoded data (after padding, every password is represented by 220 features, one per character position), computational efficiency and model performance can be affected. High-dimensional spaces often suffer from the curse of dimensionality, where the volume of the space increases exponentially with the number of dimensions, making the data sparse and learning less effective.

To address this, we can employ Principal Component Analysis (PCA). PCA is a statistical technique that reduces the dimensionality of the data by transforming the original variables into a new set of variables, which are linear combinations of the original variables. These new variables, called principal components, are ordered so that the first few retain most of the variation present in all of the original variables.

Benefits of PCA in Our Context:

* Reduction in Computational Complexity: By reducing the number of features, PCA can decrease the training time of our neural network, making it computationally more efficient.
* Noise Reduction: PCA can help in noise reduction by eliminating variability that might be noise and retaining features that contain more signal.
* Improved Model Performance: Reducing dimensionality with PCA might help in alleviating overfitting and improve the model’s ability to generalize.
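
To make the "retain most of the variation" idea concrete, the sketch below computes how many principal components are needed to reach a target share of the variance. It uses a plain NumPy SVD on synthetic data rather than the notebook's dataset; the function name and the synthetic setup are assumptions for illustration. Passing a fraction such as `n_components=0.95` to scikit-learn's `PCA` selects the number of components by the same kind of cumulative-variance criterion.

```python
import numpy as np


def n_components_for_variance(X, target=0.95):
    """Smallest number of principal components whose variance ratios sum to target."""
    X_centered = X - X.mean(axis=0)
    # Singular values of the centered data give the variance along each component
    singular_values = np.linalg.svd(X_centered, compute_uv=False)
    variance_ratio = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(variance_ratio)
    return int(np.searchsorted(cumulative, target) + 1), cumulative


rng = np.random.default_rng(0)
# Synthetic data: 3 informative directions embedded in 20 features, plus small noise
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 20))

k, cumulative = n_components_for_variance(X)
print(k, cumulative[:4])
```

In this synthetic case at most three components are needed, because only three directions carry real variance; on the password data the number depends on how the padded code points are distributed.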

In [28]:
url = 'https://raw.githubusercontent.com/rios240/ML-Password-Assessment/main/optimizers.py'
response = requests.get(url)
print(response.raise_for_status())

with open('optimizers.py', 'wb') as file:
    file.write(response.content)

url = 'https://raw.githubusercontent.com/rios240/ML-Password-Assessment/main/neuralnetworks.py'
response = requests.get(url)
print(response.raise_for_status())

with open('neuralnetworks.py', 'wb') as file:
    file.write(response.content)

None
None


In [29]:
import neuralnetworks as nn
import time
from sklearn.decomposition import PCA

In [30]:
def generate_k_fold_cross_validation_sets(X, T, n_folds, shuffle=True):

    if shuffle:
        # Randomly order X and T
        randorder = np.arange(X.shape[0])
        np.random.shuffle(randorder)
        X = X[randorder, :]
        T = T[randorder, :]

    # Partition X and T into folds
    n_samples = X.shape[0]
    n_per_fold = round(n_samples / n_folds)
    n_last_fold = n_samples - n_per_fold * (n_folds - 1)

    folds = []
    start = 0
    for foldi in range(n_folds-1):
        folds.append( (X[start:start + n_per_fold, :], T[start:start + n_per_fold, :]) )
        start += n_per_fold
    folds.append( (X[start:, :], T[start:, :]) )

    # Yield k(k-1) assignments of Xtrain, Ttrain, Xvalidate, Tvalidate, Xtest, Ttest

    for validation_i in range(n_folds):
        for test_i in range(n_folds):
            if test_i == validation_i:
                continue

            train_i = np.setdiff1d(range(n_folds), [validation_i, test_i])

            Xvalidate, Tvalidate = folds[validation_i]
            Xtest, Ttest = folds[test_i]
            if len(train_i) > 1:
                Xtrain = np.vstack([folds[i][0] for i in train_i])
                Ttrain = np.vstack([folds[i][1] for i in train_i])
            else:
                Xtrain, Ttrain = folds[train_i[0]]

            yield Xtrain, Ttrain, Xvalidate, Tvalidate, Xtest, Ttest
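
The generator above yields one partition for every ordered (validation, test) pair of folds. The small sketch below enumerates only the fold indices, without touching any data, to show how many partitions that produces; the helper name is hypothetical.

```python
from itertools import permutations


def fold_role_assignments(n_folds):
    """List (train_folds, validation_fold, test_fold) for every ordered validation/test pair."""
    assignments = []
    for validation_i, test_i in permutations(range(n_folds), 2):
        train = [i for i in range(n_folds) if i not in (validation_i, test_i)]
        assignments.append((train, validation_i, test_i))
    return assignments


assignments = fold_role_assignments(5)
print(len(assignments))  # 5 * 4 = 20 partitions
print(assignments[0])    # ([2, 3, 4], 0, 1)
```

With 5 folds, each hyperparameter configuration is therefore trained and evaluated 20 times, which is worth factoring into the expected running time.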

In [31]:
def run_k_fold_cross_validation(X, T, n_folds, list_of_n_hiddens, 
                                list_of_n_epochs, list_of_learning_rates):
    
    results = []
    classes = np.arange(3)
    method = 'adam'
    act_func = 'relu'
    for n_epochs in list_of_n_epochs:
        for learning_rate in list_of_learning_rates:
            for n_hiddens in list_of_n_hiddens:
                print(f'Running {n_hiddens} hiddens with {n_epochs} epochs and {learning_rate} learning rate.')

                train_acc_list = []
                validate_acc_list = []
                test_acc_list = []

                for Xtrain, Ttrain, Xvalidate, Tvalidate, Xtest, Ttest in generate_k_fold_cross_validation_sets(X, T, n_folds, shuffle=True):
                    classifier = nn.NeuralNetworkClassifier(Xtrain.shape[1], n_hiddens, len(classes), activation_function=act_func)

                    classifier.train(Xtrain, Ttrain, n_epochs, learning_rate, method=method, verbose=False)
                    
                    train_acc = np.mean(classifier.use(Xtrain)[0] == Ttrain)
                    validate_acc = np.mean(classifier.use(Xvalidate)[0] == Tvalidate)
                    test_acc = np.mean(classifier.use(Xtest)[0] == Ttest)

                    train_acc_list.append(train_acc)
                    validate_acc_list.append(validate_acc)
                    test_acc_list.append(test_acc)

                mean_train_acc = np.mean(train_acc_list)
                mean_validate_acc = np.mean(validate_acc_list)
                mean_test_acc = np.mean(test_acc_list)

                results.append([n_hiddens, n_epochs, learning_rate, mean_train_acc, mean_validate_acc, mean_test_acc])

    df_results = pd.DataFrame(results, columns=['Hidden Layers', 'Epochs', 'LR', 'Train Acc', 'Validate Acc', 'Test Acc'])
    return df_results

In [32]:
X = ascii_data.drop('label', axis=1).to_numpy()
T = ascii_data['label'].to_numpy().reshape(-1, 1)

pca = PCA(n_components=0.95)
X_pca = pca.fit_transform(X)

In [33]:
np.random.seed(42)

start = time.time()

results = run_k_fold_cross_validation(X_pca, T, 5,
                                      [[10], [10, 10], [30, 20, 10]],
                                      [40], [0.001])

elapsed = (time.time() - start) / 60 / 60
print(f'Took {elapsed:.2f} hours')
results

Running [10] hiddens with 40 epochs and 0.001 learning rate.
Running [10, 10] hiddens with 40 epochs and 0.001 learning rate.
Running [30, 20, 10] hiddens with 40 epochs and 0.001 learning rate.
Took 0.43 hours


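|   | Hidden Layers | Epochs | LR | Train Acc | Validate Acc | Test Acc |
|---|---------------|--------|-------|-----------|--------------|----------|
| 0 | [10] | 40 | 0.001 | 0.659363 | 0.659484 | 0.659549 |
| 1 | [10, 10] | 40 | 0.001 | 0.705582 | 0.705495 | 0.705516 |
| 2 | [30, 20, 10] | 40 | 0.001 | 0.739902 | 0.739838 | 0.739773 |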


## Concluding Analysis of Neural Network Performance

The results from our experiment using a neural network classifier on PCA-reduced ASCII-encoded data reveal interesting insights into the model's performance and the nature of the data used. Here we break down the implications of these results and the potential benefits of alternative encoding methods like one-hot encoding.

### Discussion of Results

The table of results indicates a consistent improvement in accuracy as the complexity of the neural network increases:

* The simplest model with a single layer of 10 neurons achieves an accuracy of around 65.9% across training, validation, and test datasets.
* Expanding to two layers of 10 neurons each enhances the accuracy to about 70.5%.
* The most complex model tested, with layers of 30, 20, and 10 neurons, respectively, shows the best performance, with accuracy reaching approximately 73.98%.

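These results suggest that increasing model complexity benefits performance, at least over the range tested. The accuracy levels should be read against the class distribution, however: class 1 accounts for about 74.2% of the dataset, so a classifier that always predicts class 1 would score about 74.2%, slightly above the best model here. This suggests that the network captures only limited signal from this representation, and that there may be limitations due to the nature of the data encoding and the dimensionality reduction technique used.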
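
A quick way to put these accuracies in context is to compare them with always predicting the most common class. The sketch below does this using the class shares and mean test accuracies printed earlier in the notebook; the variable names are illustrative, and the baseline is computed on the whole dataset, so the share within an individual fold may differ slightly.

```python
# Class shares (percent) and mean test accuracies copied from the notebook outputs
class_share_percent = {1: 74.189377, 0: 13.395426, 2: 12.415197}
mean_test_accuracy = {"[10]": 0.659549, "[10, 10]": 0.705516, "[30, 20, 10]": 0.739773}

majority_class = max(class_share_percent, key=class_share_percent.get)
baseline_accuracy = class_share_percent[majority_class] / 100

print(f"majority-class baseline: {baseline_accuracy:.4f} (always predict {majority_class})")
for hiddens, accuracy in mean_test_accuracy.items():
    print(f"{hiddens}: {accuracy:.4f} ({accuracy - baseline_accuracy:+.4f} vs. baseline)")
```

Because accuracy alone can hide poor performance on the minority classes, per-class metrics or a confusion matrix would be a natural complement to this comparison.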

### Implications of ASCII Encoding and PCA

ASCII encoding translates each character into a numerical value, providing a straightforward but somewhat crude numerical representation of passwords. This method captures the order and type of characters but may not effectively encapsulate the relationships or patterns that more directly relate to password strength, such as combinations of characters or the presence of common password phrases and structures.

Using PCA on ASCII-encoded data helps to reduce dimensionality and computational load, which is crucial given the high number of features. PCA retains the directions of greatest variance (here, enough components to explain 95% of it), but variance is not the same as relevance to the label, so some details that matter for predicting password strength may be discarded.

### Potential Advantages of One-Hot Encoding

One-hot encoding could potentially offer improvements over ASCII encoding in the context of neural network performance for several reasons:

* Feature Representation: One-hot encoding transforms each character into a binary vector where only the position corresponding to the character is marked with a 1, and all other positions are 0. This method can better represent the presence or absence of specific characters, providing a richer feature set for the neural network to learn from.
* Data Sparsity and Network Focus: Although one-hot encoding increases data sparsity, it also allows the network to focus on the presence of specific characters without interference from numerical values of unrelated characters. This could enhance the ability of the network to learn important patterns related to password strength.
* Handling Complexity: Neural networks are well-equipped to handle the increased dimensionality from one-hot encoding, especially with sufficient training data and computational power. The detailed features provided by one-hot encoding might enable the network to make more nuanced distinctions between different password strengths.

However, as discussed earlier, one-hot encoding could not be performed due to limitations in computational resources.
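
For reference, a one-hot encoder for a small batch of passwords might look like the sketch below. It is an untested idea for future work rather than part of this project: the function name, the 128-symbol vocabulary, the `uint8` data type, and the choice to skip characters outside the vocabulary are all assumptions.

```python
import numpy as np


def one_hot_encode(passwords, max_len, n_symbols=128):
    """Return a (n_passwords, max_len, n_symbols) uint8 array; padding stays all-zero."""
    encoded = np.zeros((len(passwords), max_len, n_symbols), dtype=np.uint8)
    for row, password in enumerate(passwords):
        for pos, ch in enumerate(password[:max_len]):
            code = ord(ch)
            if code < n_symbols:  # characters outside the vocabulary are skipped
                encoded[row, pos, code] = 1
    return encoded


batch = one_hot_encode(["kzde5577", "kino3434"], max_len=12)
print(batch.shape)                 # (2, 12, 128)
print(batch.reshape(2, -1).shape)  # flattened for a dense network: (2, 1536)
```

Encoding one batch at a time with a one-byte data type, or truncating passwords to a shorter maximum length, might keep memory use manageable, though whether that is enough for this dataset would need to be checked experimentally.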

### Conclusion

While the PCA-reduced ASCII encoding provided a computationally efficient approach with a best accuracy of roughly 74%, the intrinsic limitations of ASCII encoding and loss of information through PCA suggest that exploring one-hot encoding could be advantageous. Future experiments should consider implementing one-hot encoding to assess whether the potential for capturing more detailed patterns in the data can translate into significantly improved model performance. This approach may require more computational resources but could lead to a more robust and nuanced understanding of password strengths, thereby enhancing predictive accuracy and cybersecurity measures.