ihassantariq/grp1-hw4

Algorithmic Methods of Data Mining (Sc.M. in Data Science): Homework 4, completed by Group 1

Names

  1. Eleonora Barocco
  2. Hafiz Muhammad Hassan
  3. Daniele Figoli

Description

There were two mandatory tasks.

  1. For the first task, we implemented two clusterings and compared the results. We created two datasets and filled each with the data we collected. We used K-Means++ with the elbow method, then used Jaccard similarity to find the top 3 pairs of clusters. Finally, we created word clouds for those top 3 pairs of clusters.

  2. The second task was to find duplicates in the password2.txt file, which is 2.2 GB. Because of machine limitations we could not process the whole file, but we completed the task on a sample of the passwords.
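
The Jaccard step of the first task can be sketched roughly as follows. This is a minimal illustration, not the notebook's code: it assumes each clustering is given as a list of cluster labels per item, and that "top 3 pairs" means the three highest-scoring pairs formed by one cluster from each clustering. The names `jaccard`, `top_cluster_pairs`, `labels_a`, and `labels_b` are hypothetical.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity: size of intersection over size of union (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_cluster_pairs(labels_a, labels_b, k=3):
    """Return the k most similar (cluster_a, cluster_b, score) triples.

    labels_a and labels_b are hypothetical label lists: position i holds the
    cluster id that each of the two clusterings assigned to item i.
    """
    def members(labels):
        # Group item indices by their cluster label.
        groups = {}
        for idx, label in enumerate(labels):
            groups.setdefault(label, set()).add(idx)
        return groups

    groups_a, groups_b = members(labels_a), members(labels_b)
    # Score every cross-clustering pair of clusters.
    scored = [
        (ca, cb, jaccard(groups_a[ca], groups_b[cb]))
        for ca, cb in product(groups_a, groups_b)
    ]
    return sorted(scored, key=lambda t: t[2], reverse=True)[:k]


# Toy labels, made up for illustration only.
labels_a = [0, 0, 0, 1, 1, 2, 2, 2]
labels_b = [1, 1, 0, 0, 0, 2, 2, 2]
pairs = top_cluster_pairs(labels_a, labels_b)
```

Each returned triple names one cluster from each clustering plus their score, so the members of the best pairs could then be passed to a word cloud generator.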

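For the second task, working on a sample of a file too large for memory might look like the sketch below. It assumes a "duplicate" is an exact repeat of the same line and that the sample is simply the first `sample_size` lines; the notebook may define duplicates or draw its sample differently. The function names and the default sample size are hypothetical.

```python
from collections import Counter
from itertools import islice


def find_duplicates(lines, sample_size=None):
    """Count exact repeats among the first sample_size lines (all lines if None)."""
    sample = islice(lines, sample_size)
    counts = Counter(line.rstrip("\n") for line in sample)
    # Keep only passwords that occur more than once.
    return {pw: n for pw, n in counts.items() if n > 1}


def duplicates_in_file(path, sample_size=1_000_000):
    """Stream the file so that only the sampled passwords are held in memory."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return find_duplicates(handle, sample_size)


# Toy input, made up for illustration; a real run would pass the path of
# the password file to duplicates_in_file instead.
toy = ["abc123\n", "hunter2\n", "abc123\n", "qwerty\n", "hunter2\n", "abc123\n"]
dups = find_duplicates(toy)
```

Because the file handle is iterated lazily, lines beyond the sample are never read, which keeps memory use bounded by the number of distinct passwords in the sample.
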
Files

The full description is in the file below:

  1. Homework 4

Note

We created some sample files to store the data after scraping and after each task, but anyone running the code should be able to do so using only the Homework_4.ipynb file.

Let us know what we need to improve. Thanks.
