linlejiang/Data-Mining-Projects

Data_Mining_Projects

Description

These are class projects for USC DSCI553 - Foundations and Applications of Data Mining. To process large-scale data efficiently, all projects are implemented in Python with Spark. The dataset for each project can be found here.

| Topic | Code | Keywords | Data Size |
| --- | --- | --- | --- |
| Identifying_Frequent_Itemsets | Python | PCY, Apriori, SON | 9.20M Lines (5.59GB) |
| Recommendation_Systems | Python | Collaborative Filtering, MinHash, LSH | 0.62M Lines (528MB) |
| Community_Detection_Algorithm | Python | Betweenness, Community Detection, Girvan-Newman Algorithm | 38.7k Lines (1.8MB) |
| Clustering_Algorithm | Python | K-Means, Bradley-Fayyad-Reina (BFR) Algorithm, NMI | 1.46M Lines (666MB) |
| Mining_Streaming_Data | Python | Bloom Filter, Flajolet-Martin Algorithm, Twitter Streaming, Reservoir Sampling | 0.38M Lines (293MB) & streaming data |
| Hybrid_Recommendation_System | Python | Item-Based Collaborative Filtering, Switching, Cascade | 1.39M (1.31GB) |
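
As a small illustration of one technique listed for the Mining_Streaming_Data project, the following is a minimal sketch of reservoir sampling in plain Python. It is not taken from the project code: the function name `reservoir_sample`, its parameters, and the seed value are hypothetical and chosen only for this example, and the sketch does not use Spark.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Return a uniform random sample of up to k items from an iterable.

    Hypothetical helper for illustration only; not the project's code.
    """
    rng = random.Random(seed)
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the n-th item with probability k/n by drawing a slot
            # uniformly from [0, n) and replacing only if it falls in [0, k).
            j = rng.randrange(n)
            if j < k:
                reservoir[j] = item
    return reservoir


# Sample 10 items from a stream of 1000 integers.
sample = reservoir_sample(range(1000), k=10, seed=553)
print(sample)
```

The sample stays at a fixed size `k` no matter how long the stream is, which is why the technique suits data that arrives continuously and cannot be stored in full.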
