An openFrameworks library for computing tf–idf. Tf–idf, short for term frequency–inverse document frequency, is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus.

nullboundary/ofxTFIDF


Name    : ofxTFIDF Library                         
Author  : Noah Shibley, http://socialhardware.net                       
Date    : July 28th, 2014                                 
Version : 0.1                                               
Notes   : An addon for openFrameworks to do TF-IDF
Dependencies: openFrameworks


Function list:

    void loadDocsToMap(string filePath,string fileExt,string splitChar); //store each text document as a word map object
    void scoreWords(); //score every word in the map objects via tf-idf
    void printAllScores(bool toggle); //print detailed scoring for every word
    docScoreList //a vector of vectors holding the final tf-idf scores for each document
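
The function list says scoreWords() scores words via tf-idf but does not give the exact weighting the addon uses. As a rough illustration only, here is a minimal sketch of one common textbook variant, assuming tf is the word's share of the document and idf is the natural log of (number of documents / number of documents containing the word). The names WordCounts and tfidf are hypothetical and are not part of ofxTFIDF.

```cpp
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One document represented as word -> raw occurrence count (hypothetical type).
using WordCounts = std::map<std::string, int>;

// Hypothetical helper, not the addon's implementation.
// Returns the tf-idf score of `word` in docs[docIndex], where
//   tf  = occurrences of the word in the document / total words in the document
//   idf = log(number of documents / number of documents containing the word)
float tfidf(const std::vector<WordCounts>& docs, std::size_t docIndex, const std::string& word)
{
    const WordCounts& doc = docs.at(docIndex);

    // Total number of words in this document.
    int totalWords = 0;
    for (const auto& entry : doc) totalWords += entry.second;

    // A word that does not occur in the document scores zero.
    auto found = doc.find(word);
    if (found == doc.end() || totalWords == 0) return 0.0f;

    // Count the documents that contain the word (at least one: this document).
    int docsWithWord = 0;
    for (const auto& d : docs)
        if (d.count(word) > 0) ++docsWithWord;

    float tf = static_cast<float>(found->second) / totalWords;
    float idf = std::log(static_cast<float>(docs.size()) / docsWithWord);
    return tf * idf;
}
```

Under this variant a word that appears in every document gets an idf of log(1) = 0, so very common words such as "the" score zero no matter how often they occur.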

Example:

Step 1. in the testApp.h:

#include "ofxTFIDF.h"	
ofxTFIDF wordFreq;

Step 2. in the testApp.cpp:

wordFreq.printAllScores(false); //set to true to see detailed scoring per word.
wordFreq.loadDocsToMap("articles","txt"," "); //articles path, txt extension, split words by space.
wordFreq.scoreWords();

//Print the top 5 highest scores per document.
for(size_t i=0;i<wordFreq.docScoreList.size();i++)
{
    //scores for one document, stored as (word, score) pairs
    const vector < pair<string,float> >& wordDocScore = wordFreq.docScoreList.at(i);

    //print at most 5 scores; a short document may hold fewer than 5 words,
    //and at(j) would throw if j ran past the end
    size_t count = wordDocScore.size() < 5 ? wordDocScore.size() : 5;
    for(size_t j=0;j<count;j++)
    {
        cout << j+1 << ". " << wordDocScore.at(j).first << " " << wordDocScore.at(j).second << endl;
    }
    cout << "--------------------" << endl;
}
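
The example above takes the first five entries of each list as the top scores, which appears to rely on every list in docScoreList already being ordered highest first; the README does not state this explicitly. If that ordering cannot be assumed, a small helper along these lines could pick the top entries from an unsorted list. The names ScoreList and topScores are hypothetical and are not part of ofxTFIDF.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Same shape as one entry of docScoreList in the example: (word, score) pairs.
using ScoreList = std::vector<std::pair<std::string, float>>;

// Hypothetical helper: returns up to n entries with the highest scores,
// highest first. Takes the list by value so the caller's copy is untouched.
ScoreList topScores(ScoreList scores, std::size_t n)
{
    // Never ask for more entries than the list holds.
    n = std::min(n, scores.size());

    // Order only the first n positions by descending score.
    std::partial_sort(scores.begin(), scores.begin() + n, scores.end(),
        [](const std::pair<std::string, float>& a,
           const std::pair<std::string, float>& b) { return a.second > b.second; });

    // Drop everything after the top n.
    scores.erase(scores.begin() + n, scores.end());
    return scores;
}
```

With such a helper, the inner loop could iterate over topScores(wordFreq.docScoreList.at(i), 5) instead of indexing the raw list directly.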
