Running the parser with one alternative keyword provides much more accurate results than a list of keywords. #25

dkennedy778 · 2017-12-22T20:55:55Z

The first time I ran the parser, I used the following call:

memes = findMemes("meme",Meme_secondary_keywords,i)

Here "findMemes" is simply the parser wrapped in a method, Meme_secondary_keywords is a list of meme-related terms such as "greg", "trump", and "kobe", and i is superfluous.

The second time I ran the parser, I switched to this approach, which passes one secondary keyword per call:

for word in variety_keywords:
    Secondarykeyword = []
    Secondarykeyword.append(word)
    memes = findMemes(search_keyword,Secondarykeyword,i)
    i = i + 1

In the above, i just acts as a tag that is added onto each unique results folder.
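A minimal, self-contained sketch of this per-keyword pattern is below. Note that `find_memes` here is a hypothetical stub that only mirrors the argument order of the call above; it is not the real parser, and the way it joins the main and secondary keyword with a space is an assumption made for illustration.

```python
def find_memes(search_keyword, secondary_keywords, tag):
    """Hypothetical stand-in for the parser wrapper.

    Instead of downloading images, it records which queries this call
    would run and which tag its results folder would carry.
    """
    return {
        "folder_tag": tag,
        # Assumption: each query is the main keyword followed by one secondary keyword.
        "queries": [f"{search_keyword} {kw}" for kw in secondary_keywords],
    }


def run_per_keyword(search_keyword, variety_keywords):
    """Call the parser once per secondary keyword, passing a one-element list each time."""
    runs = []
    # enumerate supplies the running tag that the manual i = i + 1 provided.
    for tag, word in enumerate(variety_keywords):
        runs.append(find_memes(search_keyword, [word], tag))
    return runs


runs = run_per_keyword("meme", ["greg", "trump", "kobe"])
for run in runs:
    print(run["folder_tag"], run["queries"])
```

Each call receives a single-element list, so every secondary keyword gets its own fresh search and its own tagged results folder rather than sharing one long run.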

The results from the second method contained FAR more "meme-like" content. Reviewing the results from the first method, I noticed that after the first 50 or so results I was getting very few memes.

For example, the word "Trump" is the 11th element of the list, so it will run after 1,100 searches have been done. The first method did not return any Trump memes; all of the results were just generic pictures of Trump. The second method returned over 95 pictures that I would classify as memes.

This is likely an idiosyncrasy of Google's image search algorithm. I would hypothesize that as you get further from the original results page, the scope of the results widens dramatically. I do not know if this problem can be fixed on your end, but I think it would be useful for end users to be aware of the second method. If I had run my parse with the first method, my results would have been unusable, but the second method gave me very good data.

See the master class in my memeClassifier for an example of this behavior: https://github.com/dkennedy778/memeClassifier

Searching for memes with the for loop provides very good results, but searching with just the method and the keyword list is practically useless.

Made some minor changes to make it easier to run the code as a loop. Running the parser from a main loop provides much better results when using alternative keywords. If you are parsing for a large set of alternative keywords, I recommend you structure your search as I detail in the issue below. See the master.py file in my memeClassifier project for the entire implementation. hardikvasa#25

hardikvasa · 2018-02-18T23:43:22Z

Hi @dkennedy778 , thanks for opening up this issue.

Fast forwarding 30+ commits from when you forked this code: the secondary keywords were removed and have now been added back to this repo. They are now called suffix keywords because they suffix the main keyword.

I use loops to add each suffix (secondary) keyword to the main keyword in the following way:

for every main keyword    
    for every secondary keyword
        search for main keyword + secondary keyword

If this is not what you meant, please feel free to correct me :)
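The nested loop described above could be sketched in Python roughly as follows. This is only an illustration: `build_queries` is a hypothetical helper, not a function from the repository, and joining the main and suffix keyword with a single space is an assumption.

```python
from itertools import product


def build_queries(main_keywords, suffix_keywords):
    """Pair every main keyword with every suffix keyword, main keyword first.

    product() iterates in the same order as the nested loops:
    the outer loop is the main keywords, the inner loop is the suffix keywords.
    """
    return [
        f"{main} {suffix}"  # assumption: keywords are joined by one space
        for main, suffix in product(main_keywords, suffix_keywords)
    ]


queries = build_queries(["meme"], ["greg", "trump", "kobe"])
print(queries)  # ['meme greg', 'meme trump', 'meme kobe']
```

Each resulting string would then be submitted as its own search, which matches the one-keyword-per-call approach from the original report.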

dkennedy778 · 2018-02-19T02:12:25Z

You've got it exactly, hardikvasa. Thanks for adding this in!

hardikvasa added the feature-request label Jan 18, 2018

hardikvasa closed this as completed Feb 18, 2018
