Phrase net is full of stop words #31

Open
brekhusr opened this Issue Mar 5, 2013 · 5 comments

brekhusr commented Mar 5, 2013

[Two attached phrase net images: Phrase-Net-x-a-y and Phrase-Net-x-y]

These two phrase nets did not tell me very much about my texts. Is there a way to avoid this kind of result when working with PDFs that contain a lot of embedded text or metadata?

Contributor

corajr commented Mar 6, 2013

By adding your own stop words (1 per line) to the file "stopwords.txt" in the Paper Machines data folder, you should be able to get a clearer picture of your data. I will shortly add the ability to add stop words through a comma-separated list in the preferences.
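For anyone who would rather script the edit than type into the file by hand, here is a minimal Python sketch of the "one stop word per line" approach. The function name `append_stop_words` and the path passed to it are hypothetical; the thread does not say where the Paper Machines data folder lives on a given system, so point it at your own copy of "stopwords.txt". Lowercasing and skipping duplicates are assumptions made for tidiness, not documented Paper Machines behavior.

```python
from pathlib import Path


def append_stop_words(path, new_words):
    """Append words to a one-word-per-line stop list, skipping duplicates.

    Returns the list of words that were actually added.
    """
    path = Path(path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    # Existing entries, compared case-insensitively (an assumption).
    seen = {line.strip().lower() for line in text.splitlines() if line.strip()}

    added = []
    for word in new_words:
        cleaned = word.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            added.append(cleaned)

    if added:
        # newline="\n" writes Unix line endings on every platform.
        with path.open("a", encoding="utf-8", newline="\n") as handle:
            # Avoid gluing the first new word onto the last existing one.
            if text and not text.endswith("\n"):
                handle.write("\n")
            handle.write("\n".join(added) + "\n")
    return added
```

A call such as `append_stop_words("stopwords.txt", ["page", "doi", "copyright"])` would add only the words not already in the list. It may be wise to back up the original file first.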

brekhusr commented Mar 7, 2013

When I open the text files (stopwords, stopwords_en, stopwords_pt, search_stopwords) that come up when I search my computer for files called stopwords.txt and select results from the Paper Machines data folder, I don't see "lines" that would allow me to add 1 stopword per line. I just see a sort of unbroken stream of stopwords that don't even have spaces between them. Should I just go to the end and start typing additional stopwords? If so, how will it know where I mean to delimit them? Thanks, and sorry to be ignorant!

Contributor

corajr commented Mar 7, 2013

Ah, the line endings are in Unix format rather than Windows, so it shows up for you without line breaks. I've already implemented a preference pane that will allow additional entries, one per line, so you won't have to navigate to the file or anything. That will be released probably tonight, or as soon as I figure out a bug with geodict (it's about 90% there).
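To illustrate the line-ending point: Unix files separate lines with a single LF byte (`\n`), while Windows uses CR LF (`\r\n`), and some Windows editors show LF-only files as one unbroken run of text. The Python sketch below checks which convention a file uses and builds a CRLF version for viewing. The function names are illustrative, and since the thread does not say whether Paper Machines accepts CRLF files, it is probably safer to convert a copy rather than the original. Lone CR endings (old Mac style) are not handled.

```python
def detect_line_ending(data: bytes) -> str:
    """Classify the line endings in raw file bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract those to get bare LFs.
    lf = data.count(b"\n") - crlf
    if crlf and lf:
        return "mixed"
    if crlf:
        return "CRLF"
    if lf:
        return "LF"
    return "none"


def to_crlf(data: bytes) -> bytes:
    """Return a copy of the data with Windows (CRLF) line endings."""
    # Normalize to LF first so existing CRLF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


sample = b"the\nand\nof\n"
print(detect_line_ending(sample))  # LF
print(to_crlf(sample))             # b'the\r\nand\r\nof\r\n'
```

Reading the real file with `open(path, "rb").read()` and writing the converted bytes to a separate file would give a copy that displays one word per line in a basic Windows editor.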

brekhusr commented Mar 7, 2013

Terrific! Meanwhile, I'll try writing to the Austrian National Library, which maintains http://europeana-geo.isti.cnr.it/geoparser, in German, and ask them if/when they're planning to bring that back online.

mkane2 commented Feb 26, 2015

Has this been resolved? I get the pane to add one stopword per line, but the added stopwords don't seem to be used, even after multiple restarts of Zotero and Firefox (I tried it in both versions) and restarts of the computer. The list of added stopwords itself persists across restarts.
