- Eleonora Barocco
- Hafiz Muhammad Hassan
- Daniele Figoli
There were two mandatory tasks.
-
For the first task, we implemented two clusterings and compared the results. We created two datasets and populated each with the data we collected. We used k-means++ with the elbow method, then used Jaccard similarity to find the top 3 pairs of clusters. Finally, we created a word cloud for each of the top 3 pairs of clusters.
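As a rough illustration of the Jaccard step, here is a minimal sketch. It is not the notebook's actual code. It assumes each clustering is available as a dictionary mapping a cluster label to the set of item IDs in that cluster. The names `jaccard`, `top_cluster_pairs`, `clusters_a` and `clusters_b` are hypothetical, and the toy data is made up.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_cluster_pairs(clusters_a, clusters_b, k=3):
    """Return the k most similar (label_a, label_b, score) triples."""
    scored = [
        (label_a, label_b, jaccard(members_a, members_b))
        for (label_a, members_a), (label_b, members_b)
        in product(clusters_a.items(), clusters_b.items())
    ]
    # Highest similarity first; ties keep their original order.
    return sorted(scored, key=lambda t: t[2], reverse=True)[:k]


# Hypothetical toy clusterings: cluster label -> set of item IDs.
clusters_a = {0: {1, 2, 3, 4}, 1: {5, 6, 7}, 2: {8, 9}}
clusters_b = {0: {1, 2, 3}, 1: {4, 5, 6, 7}, 2: {8, 9, 10}}

top_pairs = top_cluster_pairs(clusters_a, clusters_b)
for label_a, label_b, score in top_pairs:
    print(f"cluster {label_a} vs cluster {label_b}: {score:.2f}")
```

Each of the returned pairs could then be passed to whatever word cloud routine is used for the plots.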
-
The second task was to find the duplicates in the password2.txt file, which is 2.2 GB. Because of machine limitations, we could not process the whole file, but we completed the task on a sample of the passwords.
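One possible way to handle a file that does not fit in memory is to split it into hash buckets on disk and count each bucket separately. The sketch below shows that idea on a tiny made-up sample. It is not the approach used in the notebook, and `find_duplicates`, `n_buckets` and the sample passwords are hypothetical.

```python
import os
import tempfile
import zlib
from collections import Counter


def find_duplicates(lines, n_buckets=16):
    """Return {password: count} for passwords that occur more than once.

    Lines are first spread over n_buckets temporary files by hash, so only
    one bucket has to be counted in memory at a time.
    """
    duplicates = {}
    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket_{i}.txt") for i in range(n_buckets)]
        buckets = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            for line in lines:
                password = line.rstrip("\n")
                # Equal passwords always hash to the same bucket.
                idx = zlib.crc32(password.encode("utf-8")) % n_buckets
                buckets[idx].write(password + "\n")
        finally:
            for bucket in buckets:
                bucket.close()

        # Count one bucket at a time and keep only repeated passwords.
        for path in paths:
            with open(path, encoding="utf-8") as f:
                counts = Counter(row.rstrip("\n") for row in f)
            duplicates.update({pw: c for pw, c in counts.items() if c > 1})
    return duplicates


# Hypothetical sample standing in for lines read from the password file.
sample = ["123456\n", "hunter2\n", "123456\n", "qwerty\n", "hunter2\n", "123456\n"]
dupes = find_duplicates(sample, n_buckets=4)
print(dupes)
```

Because the function only iterates over `lines`, an open file handle can be passed in place of the list, so the input is streamed rather than loaded at once. The number of buckets would need to be chosen so that each bucket fits in the available memory.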
We created some sample files to store the data after scraping and after each task, but anyone running the code should be able to do so using only the Homework_4.ipynb file.
Let us know what we need to improve. Thanks.