.idea
inputs
venv
README.md
all_documents_tf_WordCloud.pdf
all_documents_tf_list.csv
all_documents_tfidf_WordCloud.pdf
all_documents_tfidf_list.csv
document_1_tf_WordCloud.pdf
document_1_tf_list.csv
document_1_tfidf_WordCloud.pdf
document_1_tfidf_list.csv
document_2_tf_WordCloud.pdf
document_2_tf_list.csv
document_2_tfidf_WordCloud.pdf
document_2_tfidf_list.csv
document_3_tf_WordCloud.pdf
document_3_tf_list.csv
document_3_tfidf_WordCloud.pdf
document_3_tfidf_list.csv
document_4_tf_WordCloud.pdf
document_4_tf_list.csv
document_4_tfidf_WordCloud.pdf
document_4_tfidf_list.csv
document_5_tf_WordCloud.pdf
document_5_tf_list.csv
document_5_tfidf_WordCloud.pdf
document_5_tfidf_list.csv
document_6_tf_WordCloud.pdf
document_6_tf_list.csv
document_6_tfidf_WordCloud.pdf
document_6_tfidf_list.csv
document_7_tf_WordCloud.pdf
document_7_tf_list.csv
document_7_tfidf_WordCloud.pdf
document_7_tfidf_list.csv
document_8_tf_WordCloud.pdf
document_8_tf_list.csv
document_8_tfidf_WordCloud.pdf
document_8_tfidf_list.csv
simple_pdf_mining.py
stopwords_lib.py
tf_idf_lib.py


Simple PDF Mining

This program is a simple PDF text miner. It extracts the text from a PDF file, parses that text into separate words, calculates the TF (Term Frequency) and IDF (Inverse Document Frequency) values of each word, and finally generates word clouds from these values.
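
The TF and IDF steps described above can be sketched roughly as follows. This is a minimal illustration and not the code in tf_idf_lib.py or stopwords_lib.py: the function names, the tiny stopword set, the max-count normalization of TF, and the log(N / df) form of IDF are all assumptions made for the example. PDF text extraction and word cloud rendering are left out.

```python
import math
import re
from collections import Counter

# Hypothetical stopword set for illustration; the real list lives in stopwords_lib.py.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def term_frequency(words):
    """Count each word and divide by the count of the most frequent word.

    Assumption: TF is normalized so that the top word scores 1.0.
    """
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    return {w: c / top for w, c in counts.items()}


def inverse_document_frequency(documents):
    """Return log(N / df) per word, where df is the number of documents containing it."""
    n = len(documents)
    df = Counter(w for words in documents for w in set(words))
    return {w: math.log(n / d) for w, d in df.items()}


def tf_idf(documents):
    """Multiply each document's TF values by the corpus-wide IDF values."""
    idf = inverse_document_frequency(documents)
    return [
        {w: tf * idf[w] for w, tf in term_frequency(words).items()}
        for words in documents
    ]


# Two made-up sentences stand in for the text extracted from two PDFs.
texts = [
    "The ant algorithm finds the shortest path in the network.",
    "A genetic algorithm searches for a solution to the problem.",
]
docs = [tokenize(t) for t in texts]
tf = term_frequency(docs[0])
scores = tf_idf(docs)
```

With this IDF form, a word that occurs in every document (here "algorithm") gets a TF-IDF score of zero, while words unique to one document keep a positive score. A smoothed IDF variant would avoid the zero if that behavior is not wanted.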

Word Term Frequency
problem 1.0000000000000000
solution 0.8835616438356164
link 0.7979452054794520
physical 0.7876712328767124
vt 0.7157534246575342
shortest 0.7054794520547946
number 0.6643835616438356
aco 0.6404109589041096
survivable 0.5821917808219178
algorithm 0.5342465753424658
paths 0.5342465753424658
topology 0.5068493150684932
ant 0.4931506849315068
lightpaths 0.4863013698630137
set 0.4760273972602740
lightpath 0.4726027397260274
ea 0.4657534246575342
mapping 0.4554794520547945
network 0.4383561643835616
cost 0.4349315068493151
algorithms 0.4315068493150685
virtual 0.4315068493150685
nodes 0.4178082191780822
node 0.4143835616438356
search 0.3938356164383562
based 0.3938356164383562
networks 0.3869863013698630
table 0.3869863013698630
path 0.3767123287671233
design 0.3767123287671233
best 0.3561643835616438
pheromone 0.3527397260273973
time 0.3458904109589041
solutions 0.3458904109589041
wavelength 0.3184931506849315
optical 0.3116438356164384
mutation 0.3082191780821918
wdm 0.2979452054794521
total 0.2876712328767123
hub 0.2773972602739726
links 0.2705479452054795
success 0.2671232876712329
study 0.2671232876712329
values 0.2671232876712329
resource 0.2534246575342466
hubs 0.2534246575342466
first 0.2500000000000000
fitness 0.2500000000000000
data 0.2363013698630137
ga 0.2294520547945205