## Introduction
<p><img src="https://i.imgur.com/kjWF1So.jpg" alt="Different characters on a computer screen"></p>
<p>According to a 2019 <a href="https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/PasswordCheckup-HarrisPoll-InfographicFINAL.pdf">Google / Harris Poll</a>, 24% of Americans have used common passwords, like <code>abc123</code>, <code>Password</code>, and <code>Admin</code>. Even more concerning, 59% of Americans have incorporated personal information, such as their name or birthday, into their password. This makes it unsurprising that 4 in 10 Americans have had their personal information compromised online. Passwords built from commonly used phrases and personal information are drastically easier to crack.</p>
<p>You may have noticed over the years that password requirements have increased in complexity, including recommendations to change your passwords every couple of months. Below is a list of password requirements, compiled from industry recommendations, that you will be asked to test:</p>
<p><strong>Password Requirements:</strong></p>
<ol>
<li>Must be at least 10 characters in length</li>
<li>Must contain at least:<ul>
<li>one lower case letter </li>
<li>one upper case letter </li>
<li>one numeric character </li>
<li>one non-alphanumeric character</li></ul></li>
<li>Must not contain the phrase <code>password</code> (case insensitive)</li>
<li>Must not contain the user's first or last name, e.g., if the user's name is <code>John Smith</code>, then <code>SmItH876!</code> is not a valid password.</li>
</ol>
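<p>As a rough illustration, the four requirements above can be expressed as a single check on one password. This is only a sketch: the function name <code>is_valid_password</code> and the sample password are hypothetical, and "non-alphanumeric" is assumed here to mean any character outside the ASCII letters and digits.</p>

```python
import re

def is_valid_password(password: str, first_name: str, last_name: str) -> bool:
    """Return True if `password` meets all four requirements (hypothetical helper)."""
    lowered = password.lower()
    checks = [
        len(password) >= 10,                               # 1. at least 10 characters
        re.search(r"[a-z]", password) is not None,         # 2. one lower case letter
        re.search(r"[A-Z]", password) is not None,         #    one upper case letter
        re.search(r"[0-9]", password) is not None,         #    one numeric character
        re.search(r"[^A-Za-z0-9]", password) is not None,  #    one non-alphanumeric character (assumed ASCII definition)
        "password" not in lowered,                         # 3. no "password", case insensitive
        first_name.lower() not in lowered,                 # 4. no first name
        last_name.lower() not in lowered,                  #    no last name
    ]
    return all(checks)

print(is_valid_password("SmItH876!", "John", "Smith"))         # False
print(is_valid_password("Tr1cky-Orchid-42", "John", "Smith"))  # True
```

<p>The first call fails for two reasons: the password is only 9 characters long and it contains the last name.</p>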
<p>Here is the dataset that you will investigate in this project:</p>
<div style="background-color: #ebf4f7; color: #595959; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/logins.csv</b></div>
Each row represents a login credential. There are no missing values and you can consider the dataset "clean".
<ul>
    <li><b>id:</b> the user's unique ID.</li>
    <li><b>username:</b> the username with the format {firstname}.{lastname}.</li>
    <li><b>password:</b> the password that may or may not meet the requirements. <i>Note: passwords should never be saved in plaintext. Always hash them (with a salt) when working with real, live passwords!</i></li>
</ul>
</div>
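<p>The note above warns against storing plaintext passwords. In practice, passwords are usually stored as salted hashes produced by a deliberately slow key-derivation function, rather than being reversibly encrypted. The sketch below uses PBKDF2 from Python's standard library as one possible option. The function names and the iteration count are assumptions made for illustration, not part of this project.</p>

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 200_000  # assumed work factor; tune it for your own hardware

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storage instead of the plaintext password."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

salt, digest = hash_password("Tr1cky-Orchid-42")
print(verify_password("Tr1cky-Orchid-42", salt, digest))  # True
print(verify_password("wrong-guess", salt, digest))       # False
```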
<p>Warning: This dataset contains some <strong>real</strong> passwords leaked from <strong>real</strong> websites. These passwords have been filtered, but may still include words that are explicit and offensive.</p>
<p>From here on out, it will be your task to explore and manipulate the existing data until you can answer the two questions described in the instructions panel. Feel free to import as many packages as you need to complete your task, and add cells as necessary. Finally, remember that you are only tested on your answer, not on the methods you use to arrive at the answer!</p>
<p><strong>Note:</strong> To complete this project, you need to know how to manipulate strings in pandas DataFrames and be familiar with regular expressions. Before starting this project we recommend that you have completed the following courses: <a href="https://learn.datacamp.com/courses/data-cleaning-in-python">Data Cleaning in Python</a> and <a href="https://learn.datacamp.com/courses/regular-expressions-in-python">Regular Expressions in Python</a>.</p>

In [28]:
# Use this cell to begin your analysis, and add as many as you would like!
import pandas as pd
login_dataset = pd.read_csv("datasets/logins.csv")
login_dataset.head(25)

Unnamed: 0,id,username,password
0,1,vance.jennings,vanceRules888!
1,2,consuelo.eaton,Mail_Pen%Scarlets.414
2,3,mitchel.perkins,Z00+1960
3,4,odessa.vaughan,D-rockyou
4,5,araceli.wilder,Araceli}r3
5,6,shawn.harrington,126_239_123
6,7,evelyn.gay,`4:&iAt$'o~(
7,8,noreen.hale,25941829163
8,9,gladys.ward,=Wj1`i)xYYZ
9,10,brant.zimmerman,L?4)OSB$r


In [29]:
# check for null values
print(login_dataset.shape)
login_dataset.info()

(982, 3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 982 entries, 0 to 981
Data columns (total 3 columns):
id          982 non-null int64
username    982 non-null object
password    982 non-null object
dtypes: int64(1), object(2)
memory usage: 23.1+ KB


In [30]:
# Requirement 1: at least 10 characters
len_check = login_dataset['password'].str.len() >= 10
len_check

0       True
1       True
2      False
3      False
4       True
5       True
6       True
7       True
8       True
9      False
10      True
11      True
12      True
13      True
14      True
15      True
16     False
17     False
18     False
19     False
20      True
21      True
22      True
23      True
24      True
25     False
26     False
27     False
28      True
29      True
       ...  
952     True
953    False
954     True
955    False
956    False
957     True
958     True
959     True
960    False
961     True
962     True
963     True
964     True
965    False
966     True
967     True
968    False
969     True
970     True
971     True
972     True
973     True
974    False
975     True
976     True
977    False
978     True
979     True
980    False
981     True
Name: password, Length: 982, dtype: bool

In [31]:
import re
re.findall(r"[A-Z]+","Milford<3Tompassword")

['M', 'T']

In [32]:
re.findall(r"[a-z]+","Milford<3Tompassword")

['ilford', 'ompassword']

In [33]:
# Requirement 3: True where the password contains "password" (case insensitive)
phrase_check = login_dataset['password'].str.contains("password", case=False)
phrase_check

0      False
1      False
2      False
3      False
4      False
5      False
6      False
7      False
8      False
9      False
10     False
11     False
12     False
13     False
14     False
15     False
16     False
17     False
18     False
19     False
20     False
21     False
22     False
23     False
24     False
25     False
26     False
27     False
28     False
29     False
       ...  
952    False
953    False
954    False
955    False
956    False
957    False
958    False
959    False
960    False
961    False
962    False
963    False
964    False
965    False
966    False
967    False
968    False
969    False
970    False
971    False
972    False
973    False
974    False
975    False
976    False
977    False
978    False
979    False
980    False
981    False
Name: password, Length: 982, dtype: bool

In [34]:
# Requirement 2: at least one upper case letter
capital_check = login_dataset['password'].str.contains(r"[A-Z]")
capital_check

0       True
1       True
2       True
3       True
4       True
5      False
6       True
7      False
8       True
9       True
10     False
11      True
12     False
13      True
14     False
15      True
16      True
17      True
18     False
19      True
20      True
21      True
22     False
23      True
24      True
25      True
26      True
27     False
28      True
29      True
       ...  
952    False
953     True
954     True
955     True
956     True
957     True
958     True
959     True
960     True
961     True
962     True
963    False
964     True
965     True
966     True
967     True
968    False
969    False
970     True
971     True
972     True
973     True
974     True
975     True
976    False
977    False
978     True
979     True
980    False
981     True
Name: password, Length: 982, dtype: bool

In [35]:
# Requirement 2: at least one lower case letter
lowercheck = login_dataset['password'].str.contains(r"[a-z]")
lowercheck

0       True
1       True
2      False
3       True
4       True
5      False
6       True
7      False
8       True
9       True
10     False
11      True
12      True
13      True
14     False
15      True
16      True
17     False
18     False
19     False
20      True
21      True
22     False
23      True
24      True
25      True
26      True
27      True
28      True
29      True
       ...  
952    False
953     True
954     True
955     True
956     True
957     True
958     True
959     True
960     True
961     True
962     True
963     True
964     True
965     True
966     True
967     True
968     True
969     True
970     True
971     True
972     True
973     True
974     True
975     True
976     True
977     True
978     True
979     True
980     True
981     True
Name: password, Length: 982, dtype: bool

In [36]:
# Requirement 2: at least one numeric character
number_check = login_dataset['password'].str.contains(r"[0-9]")
number_check

0       True
1       True
2       True
3      False
4       True
5       True
6       True
7       True
8       True
9       True
10      True
11      True
12      True
13      True
14      True
15      True
16     False
17      True
18      True
19      True
20      True
21      True
22      True
23      True
24     False
25      True
26      True
27     False
28     False
29      True
       ...  
952     True
953    False
954     True
955     True
956    False
957     True
958     True
959     True
960     True
961     True
962     True
963    False
964     True
965    False
966     True
967     True
968     True
969     True
970     True
971     True
972     True
973     True
974    False
975     True
976     True
977     True
978     True
979     True
980    False
981     True
Name: password, Length: 982, dtype: bool

In [37]:
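# Requirement 2: at least one non-alphanumeric character.
# Raw string avoids an invalid-escape warning; note that \W does not match "_".
non_alpha_check = login_dataset['password'].str.contains(r"\W")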
non_alpha_check

0       True
1       True
2       True
3       True
4       True
5      False
6       True
7      False
8       True
9       True
10      True
11      True
12      True
13      True
14     False
15      True
16      True
17      True
18     False
19     False
20     False
21      True
22      True
23      True
24      True
25     False
26      True
27      True
28      True
29      True
       ...  
952    False
953     True
954    False
955     True
956     True
957    False
958     True
959     True
960     True
961     True
962    False
963    False
964     True
965     True
966     True
967    False
968    False
969     True
970     True
971     True
972     True
973    False
974     True
975    False
976     True
977    False
978     True
979    False
980     True
981    False
Name: password, Length: 982, dtype: bool

In [38]:
# Split "firstname.lastname" on the literal dot into two columns
username_dataset = login_dataset["username"].str.split(".", expand=True)
login_dataset["first_name"] = username_dataset.iloc[:, 0]
login_dataset["last_name"] = username_dataset.iloc[:, 1]

In [39]:
index_login=login_dataset['password'].index
login_dataset.iloc[index_login]

Unnamed: 0,id,username,password,first_name,last_name
0,1,vance.jennings,vanceRules888!,vance,jennings
1,2,consuelo.eaton,Mail_Pen%Scarlets.414,consuelo,eaton
2,3,mitchel.perkins,Z00+1960,mitchel,perkins
3,4,odessa.vaughan,D-rockyou,odessa,vaughan
4,5,araceli.wilder,Araceli}r3,araceli,wilder
5,6,shawn.harrington,126_239_123,shawn,harrington
6,7,evelyn.gay,`4:&iAt$'o~(,evelyn,gay
7,8,noreen.hale,25941829163,noreen,hale
8,9,gladys.ward,=Wj1`i)xYYZ,gladys,ward
9,10,brant.zimmerman,L?4)OSB$r,brant,zimmerman


In [40]:
first = "vance"

In [41]:
last = "jennings"

In [42]:
first_name_occur= re.findall(first,"VanceRules888!".lower())
bool(first_name_occur)
   

True

In [43]:
second_name_occur = re.findall(last,"VanceRules888!".lower())
bool(second_name_occur) | bool(first_name_occur)

True

In [44]:
userPassLi =[]

In [45]:
def check_userexistInPass():
    """Append True to userPassLi for each row whose password contains the first or last name."""
    # Run once per fresh userPassLi: re-running appends a second set of results.
    for _, row in login_dataset.iterrows():
        password = row["password"].lower()
        # Plain substring tests, so names are never interpreted as regex patterns
        userPassLi.append(row["first_name"] in password or row["last_name"] in password)
    
    

In [46]:
check_userexistInPass()
login_dataset["check_user_pass"] = userPassLi


In [47]:
print(login_dataset.head())

   id         username               password first_name last_name  \
0   1   vance.jennings         vanceRules888!      vance  jennings   
1   2   consuelo.eaton  Mail_Pen%Scarlets.414   consuelo     eaton   
2   3  mitchel.perkins               Z00+1960    mitchel   perkins   
3   4   odessa.vaughan              D-rockyou     odessa   vaughan   
4   5   araceli.wilder             Araceli}r3    araceli    wilder   

   check_user_pass  
0             True  
1            False  
2            False  
3            False  
4             True  


In [48]:
total_users = len(login_dataset)
total_users

982

In [49]:
valid_password = login_dataset[
    len_check & capital_check & lowercheck & number_check & non_alpha_check
    & ~login_dataset["check_user_pass"] & ~phrase_check
]
valid_password

Unnamed: 0,id,username,password,first_name,last_name,check_user_pass
1,2,consuelo.eaton,Mail_Pen%Scarlets.414,consuelo,eaton,False
6,7,evelyn.gay,`4:&iAt$'o~(,evelyn,gay,False
8,9,gladys.ward,=Wj1`i)xYYZ,gladys,ward,False
13,14,jamie.cochran,Deviants.Assists.Impede+24,jamie,cochran,False
15,16,lorrie.gay,Q0G:[@u9*_`_,lorrie,gay,False
21,22,leticia.sanford,Parole:Seagull+Cession-148,leticia,sanford,False
23,24,brandie.webster,321.Snuffs-Pinball.Nougat,brandie,webster,False
29,30,rene.small,"]9""mP(kM4c",rene,small,False
30,31,rosanna.reid,Outguess%Dresser:Derails=669,rosanna,reid,False
33,34,patrica.hicks,Wanderer.849+Enlarges:Olympia,patrica,hicks,False


In [50]:
valid_pass_length = len(valid_password)
valid_pass_length

246

In [51]:
invalid_perc =(total_users - valid_pass_length)/total_users
bad_pass = round(invalid_perc,2)
bad_pass


0.75

In [52]:
invalid_users = login_dataset[
    ~len_check | ~capital_check | ~lowercheck | ~number_check | ~non_alpha_check
    | login_dataset["check_user_pass"] | phrase_check
]
invalid_users


Unnamed: 0,id,username,password,first_name,last_name,check_user_pass
0,1,vance.jennings,vanceRules888!,vance,jennings,True
2,3,mitchel.perkins,Z00+1960,mitchel,perkins,False
3,4,odessa.vaughan,D-rockyou,odessa,vaughan,False
4,5,araceli.wilder,Araceli}r3,araceli,wilder,True
5,6,shawn.harrington,126_239_123,shawn,harrington,False
7,8,noreen.hale,25941829163,noreen,hale,False
9,10,brant.zimmerman,L?4)OSB$r,brant,zimmerman,False
10,11,leanna.abbott,"@_2.#,%~>~&+",leanna,abbott,False
11,12,milford.hubbard,Milford<3Tom,milford,hubbard,True
12,13,mamie.fox,chichi821?,mamie,fox,False


In [53]:
email_list = invalid_users["username"].sort_values()
email_list

931           abdul.rowland
713            addie.cherry
857            adele.moreno
291            adeline.bush
663             adolfo.kane
775             adolfo.lara
51             ahmad.hopper
298              aida.combs
898           aisha.jenkins
471               al.dunlap
356            alana.franco
546         alberta.leblanc
306            alec.robbins
831    alejandra.stephenson
44          alejandro.burke
195        alejandro.nieves
483        alexander.thomas
920       alexandria.hinton
93        alexis.mccullough
219         alexis.reynolds
456          alfonso.weaver
366           alfonzo.johns
595          alisa.campbell
781             alisa.cohen
442             alison.neal
452          allan.marshall
338           alonzo.fowler
751           amado.bridges
207        amado.fitzgerald
543           amber.summers
               ...         
64              ursula.wood
664       valentin.castillo
551           valeria.curry
0            vance.jennings
731           vaness