Let’s start by importing the necessary Python libraries:

In [None]:
import pandas as pd
import numpy as np
import getpass
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

... and then load the dataset:

In [None]:
data = pd.read_csv('PasswordStrengthCheckerData/data.csv', on_bad_lines='skip')
display(data.head())
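The `on_bad_lines='skip'` argument (available since pandas 1.3) tells `read_csv` to drop any line whose field count does not match the header instead of raising a `ParserError`. The snippet below is a minimal sketch on a tiny, made-up CSV held in memory via `io.StringIO`. The extra comma in one password is an illustrative assumption, not a claim about what the real file contains.

```python
import io

import pandas as pd

# Made-up CSV: the fourth line has an extra comma, so it has three
# fields instead of the two declared in the header
raw = """password,strength
abc123,0
Xy9$kLm2,2
Pass,Word!,2
hello,1
"""

# on_bad_lines='skip' silently drops the malformed line
df = pd.read_csv(io.StringIO(raw), on_bad_lines='skip')
print(df)
print(len(df))  # 3
```

Skipping keeps loading simple, but it discards rows without telling you; `on_bad_lines='warn'` reports each skipped line if you want to see what was lost.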

Let's look at the column names, data types, and non-null counts:

In [None]:
# info() prints its summary directly and returns None, so no display() is needed
data.info()

Let's check whether the dataset has any null values:

In [None]:
display(data.isna().sum())

The dataset contains one missing value, so let's drop the row that contains it:

In [None]:
data = data.dropna()
display(data.isna().sum())
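Before dropping anything, it can be useful to look at the offending rows. This sketch uses a small, made-up DataFrame standing in for `data`; the boolean mask `isna().any(axis=1)` selects every row with at least one missing value, and `reset_index(drop=True)` renumbers the remaining rows after `dropna()`.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for the password data, with one missing value
df = pd.DataFrame({
    'password': ['abc123', np.nan, 'Xy9$kLm2'],
    'strength': [0, 1, 2],
})

# Rows containing at least one missing value
missing_rows = df[df.isna().any(axis=1)]
print(missing_rows)

# Drop them and renumber the remaining rows 0..n-1
clean = df.dropna().reset_index(drop=True)
print(clean)
```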

The dataset has two columns: password and strength. In the strength column:

1. 0 means the password is weak;
2. 1 means the password is of medium strength;
3. 2 means the password is strong.

Before moving on, I will map the values 0, 1, and 2 in the strength column to the labels Weak, Medium, and Strong:

In [None]:
data['strength'] = data['strength'].map({0: 'Weak', 
                                         1: 'Medium',
                                         2: 'Strong'})
display(data.sample(5))
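One detail of `Series.map` with a dict is worth knowing: any value that has no key in the dict becomes `NaN` rather than raising an error. A quick count of missing values after mapping catches unexpected codes. The series below is hypothetical; the value `3` is deliberately outside the mapping.

```python
import pandas as pd

labels = {0: 'Weak', 1: 'Medium', 2: 'Strong'}

# Hypothetical strength codes; 3 has no entry in `labels`
strength = pd.Series([0, 2, 1, 3])
mapped = strength.map(labels)
print(mapped.tolist())  # ['Weak', 'Strong', 'Medium', nan]

# Unmapped values turn into NaN, so count them after mapping
unmapped = mapped.isna().sum()
print(unmapped)  # 1
```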

Now let’s train a machine learning model to predict the strength of a password. First, we need to **tokenize** the passwords into individual characters so that the model can learn from the combinations of digits, letters, and symbols they contain. Here’s how to tokenize the passwords, turn them into TF-IDF features, and split the data into training and test sets:

In [None]:
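def word(password):
    # Split a password into a list of its individual characters
    return list(password)

x = np.array(data['password'])
y = np.array(data['strength'])

# Character-level TF-IDF. token_pattern=None avoids the warning that the default
# regex is ignored when a custom tokenizer is given; lowercase=False keeps
# uppercase and lowercase letters as distinct features
tdif = TfidfVectorizer(tokenizer=word, token_pattern=None, lowercase=False)
x = tdif.fit_transform(x)

# Hold out 10% of the rows as a test set
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.10, random_state=38)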
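To see what the character tokenizer produces, here is a toy run on two made-up passwords. Each distinct character becomes one column of the TF-IDF matrix. The settings `token_pattern=None` and `lowercase=False` are choices made for this sketch: by default `TfidfVectorizer` lowercases its input, which would merge `'A'` and `'a'` into a single feature.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def word(password):
    # Split a password into its individual characters
    return list(password)


# Two made-up passwords; 'A' and 'a' should stay separate tokens
passwords = ['aA1!', 'aaa1']

vec = TfidfVectorizer(tokenizer=word, token_pattern=None, lowercase=False)
matrix = vec.fit_transform(passwords)

# One column per distinct character
print(sorted(vec.vocabulary_))  # ['!', '1', 'A', 'a']
print(matrix.shape)             # (2, 4)
```

Each row is L2-normalized by default, so passwords of different lengths produce feature vectors on the same scale.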

Now here’s how to train a classification model (a random forest) to predict the strength of a password:

In [None]:
model = RandomForestClassifier()
model.fit(xtrain, ytrain)
# score() returns the mean accuracy on the held-out test set
display(model.score(xtest, ytest))
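`model.score` reports a single accuracy figure across all three classes, which can hide a class the model often confuses with another. A confusion matrix and a per-class report give a finer picture. The labels below are made up and stand in for `ytest` and `model.predict(xtest)`.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Made-up labels standing in for ytest and model.predict(xtest)
y_true = ['Weak', 'Medium', 'Medium', 'Strong', 'Medium', 'Weak']
y_pred = ['Weak', 'Medium', 'Strong', 'Strong', 'Medium', 'Medium']

order = ['Weak', 'Medium', 'Strong']

# Rows are true classes, columns are predicted classes, both in `order`
cm = confusion_matrix(y_true, y_pred, labels=order)
print(cm)
# [[1 1 0]
#  [0 2 1]
#  [0 0 1]]

# Precision, recall and F1 score for each class
print(classification_report(y_true, y_pred, labels=order))
```

Passing `labels=order` fixes the row and column order; without it, scikit-learn sorts the labels alphabetically (Medium, Strong, Weak).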

Now here’s how we can check the strength of a password using the trained model:

In [None]:
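# getpass hides the characters as they are typed
user = getpass.getpass('Enter Password: ')
# Reuse the fitted vectorizer; a new name avoids overwriting the `data` DataFrame
features = tdif.transform([user])
output = model.predict(features)
display(output)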
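Calling `tdif.transform` by hand works, but it is easy to forget or to apply a different vectorizer at prediction time. One alternative, sketched below, is to wrap the vectorizer and the classifier in a scikit-learn pipeline so that `predict` accepts raw strings. The tiny training set is invented purely to exercise the code and says nothing about real password strength.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline


def word(password):
    # Split a password into its individual characters
    return list(password)


# Tiny, made-up training set; not real data
passwords = ['abc', 'password', 'qwerty1', 'Tr0ub4dor', 'x9$Lm#2Qa!', 'K7!vP@e4Zr']
labels = ['Weak', 'Weak', 'Medium', 'Medium', 'Strong', 'Strong']

# fit() trains the vectorizer and the classifier together;
# predict() applies the same fitted vectorizer automatically
checker = make_pipeline(
    TfidfVectorizer(tokenizer=word, token_pattern=None, lowercase=False),
    RandomForestClassifier(random_state=0),
)
checker.fit(passwords, labels)

# Raw strings go straight in; no manual transform() call is needed
pred = checker.predict(['hello123', 'G#8qLz!w2'])
print(pred)
```

Bundling both steps also means a single object can be saved and reloaded later, rather than two that must be kept in sync.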