Twitter Persian news tagcloud extraction

Final project for the Information Retrieval course.

TPNT is a tag cloud generator that extracts hot keywords from the Twitter page of a major Persian news agency, in the fields of economics and society, for each month of a year.


Dependencies

  • GetOldTweets-java v1.2.0
  • Lucene 7.2.1

News agency

How to Run

This project has two main steps. First, tweets are stored in a CSV file with the help of the Crawler class. This class takes the following options:

| Flag | Description | Required |
| --- | --- | --- |
| -i | ID of the Twitter page | yes |
| -s | Start date of extraction, format: YYYY-MM-DD | yes |
| -e | End date of extraction, format: YYYY-MM-DD | no |
| -m | Maximum number of tweets to retrieve | no |
| -p | Path of the CSV file | no |
| -n | Name of the CSV file | no |

An example that retrieves tweets from @TasnimNews_Fa between 2018-06-01 and 2018-07-01 and stores them under $PWD/result/:

java -cp ProjectNews.jar ir.ac.um.ce.projectnews.crawler.Crawler -i Tasnimnews_Fa -s 2018-06-01 -e 2018-07-01 -p result/
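
To make the flag rules concrete, here is a minimal sketch of how such options could be parsed and validated. It is not the project's actual Crawler code: the `CrawlerOptions` class and its `parse` method are hypothetical names introduced only for illustration, and the sketch assumes flags always come in "-x value" pairs.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of ProjectNews: shows one way to read the documented flags.
class CrawlerOptions {
    final Map<String, String> values = new HashMap<>();

    static CrawlerOptions parse(String[] args) {
        CrawlerOptions opts = new CrawlerOptions();
        // Assumption: every flag is followed by exactly one value ("-x value").
        for (int i = 0; i + 1 < args.length; i += 2) {
            opts.values.put(args[i], args[i + 1]);
        }
        // -i and -s are the two flags documented as required.
        for (String required : new String[] {"-i", "-s"}) {
            if (!opts.values.containsKey(required)) {
                throw new IllegalArgumentException("Missing required flag " + required);
            }
        }
        // LocalDate.parse throws if the value is not a valid YYYY-MM-DD date.
        LocalDate.parse(opts.values.get("-s"));
        if (opts.values.containsKey("-e")) {
            LocalDate.parse(opts.values.get("-e"));
        }
        return opts;
    }
}
```

Calling `CrawlerOptions.parse` with the arguments from the command above would yield a map holding `-i`, `-s`, `-e` and `-p`; leaving out `-i` or `-s` raises an `IllegalArgumentException`.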

The next step is indexing the documents. After removing stop words from the documents, we use the Searcher and Classifier classes plus a bag of words to create queries that estimate the correlation of each document with the context. Finally, we use the most correlated words to generate a tag cloud.
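
As a rough illustration of the idea of picking keywords after stop-word removal, the sketch below ranks terms by plain frequency. This is a deliberate simplification: it does not use Lucene or the project's Searcher and Classifier classes, and it does not compute the correlation described above. The `KeywordCounter` class and `topKeywords` method are hypothetical names, and whitespace splitting stands in for real Persian tokenization.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration, not the project's Searcher/Classifier code.
class KeywordCounter {
    // Returns the n most frequent terms across the tweets, ignoring stop words.
    static List<String> topKeywords(List<String> tweets, Set<String> stopWords, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tweet : tweets) {
            // Assumption: splitting on whitespace is enough for this sketch.
            for (String token : tweet.trim().split("\\s+")) {
                if (!token.isEmpty() && !stopWords.contains(token)) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts.entrySet().stream()
                // Highest count first; ties are broken alphabetically for a stable order.
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

In a tag cloud, the returned terms would typically be drawn with a size proportional to their count; running the method once per month of tweets would give one cloud per month.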

Contributors
