By clustering similar tweets together, we can generate a more concise and organized representation of the raw tweets, which is useful for many Twitter-based applications (e.g., truth discovery, trend analysis, and search ranking).
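To illustrate what "clustering similar tweets" can mean in practice, here is a minimal sketch of a single assignment step. It assumes tweets are compared by Jaccard distance over their word sets and that each cluster is represented by one seed tweet. The distance measure, the function names, and the sample tweets are all illustrative assumptions and are not taken from k_mean.py.

```python
def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| for two sets of words."""
    union = a | b
    if not union:
        # Two empty tweets are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def assign_to_clusters(tweets, seed_ids):
    """Assign every tweet id to its nearest seed tweet (one assignment step)."""
    # Tokenize by lowercasing and splitting on whitespace (a simplifying assumption).
    tokens = {tid: set(text.lower().split()) for tid, text in tweets.items()}
    clusters = {sid: [] for sid in seed_ids}
    for tid, words in tokens.items():
        nearest = min(seed_ids, key=lambda sid: jaccard_distance(words, tokens[sid]))
        clusters[nearest].append(tid)
    return clusters


# Made-up sample data, for illustration only.
tweets = {
    1: "heavy rain expected today",
    2: "heavy rain and wind expected",
    3: "new phone release date",
    4: "phone release date leaked",
}
clusters = assign_to_clusters(tweets, seed_ids=[1, 3])
print(clusters)
```

A full k-means run would repeat this step, choosing a new representative for each cluster after every pass until the assignments stop changing.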
Steps to run: i) Open a command-line interface (or the Anaconda command prompt) in the directory that contains k_mean.py, the Python file included in this project.
ii) Type the following command, enclosing each full path in double quotes (quoting the pruning factor is optional, as in the example):
python k_mean.py <pruning factor> "<initial seeds file>" "<tweets JSON file>" "<output file>"
E.g.: python k_mean.py 25 "C:/Users/Rishav/InitialSeeds.txt" "C:/Users/Rishav/data.json" "C:/Users/Rishav/tweets-k-means-output.txt"
iii) Give the script some time to finish running.
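The argument order in the example command can be sketched as follows. This is only a guess at how a script might read the four positional arguments, in the order shown in the example. The function name parse_args and the variable names are hypothetical, and k_mean.py may handle its arguments differently.

```python
def parse_args(argv):
    """Split the four positional arguments used in the example command."""
    if len(argv) != 5:
        raise SystemExit(
            "usage: python k_mean.py <number> <seeds file> <tweets JSON file> <output file>"
        )
    # argv[0] is the script name; the rest follow the order of the example.
    _, number, seeds_path, tweets_path, output_path = argv
    # Assumption: the first argument is an integer, as in the example (25).
    return int(number), seeds_path, tweets_path, output_path


# The argument list as the script would see it for the example command.
example_argv = [
    "k_mean.py",
    "25",
    "C:/Users/Rishav/InitialSeeds.txt",
    "C:/Users/Rishav/data.json",
    "C:/Users/Rishav/tweets-k-means-output.txt",
]
number, seeds_path, tweets_path, output_path = parse_args(example_argv)
print(number, seeds_path, tweets_path, output_path)
```

In a real script the list would come from sys.argv. The shell removes the double quotes before Python sees the arguments, so the quotes only matter for paths that contain spaces.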