The first time I ran the parser I used the following.
memes = findMemes("meme",Meme_secondary_keywords,i)
Here, "findMemes" is simply the parser wrapped in a method, Meme_secondary_keywords is a list of meme-related terms such as "greg", "trump", and "kobe", and i is superfluous.
The second time I ran the parser I switched to this method
for word in variety_keywords:
    Secondarykeyword = []
    Secondarykeyword.append(word)
    memes = findMemes(search_keyword,Secondarykeyword,i)
    i = i + 1
In the above, i just acts as a tag that is added onto each unique results folder.
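As a minimal, self-contained sketch of the same pattern: the `find_memes` function below is a hypothetical stand-in for the real parser wrapper (it only records the query it would issue, it does not download anything), and `enumerate` replaces the manual counter. The point is only that each call receives a one-element keyword list and its own tag.

```python
def find_memes(search_keyword, secondary_keywords, tag):
    """Hypothetical stand-in for the real findMemes wrapper.

    It only returns the query strings that would be searched; the real
    parser would download images into a results folder labelled by `tag`.
    """
    return [f"{search_keyword} {kw}" for kw in secondary_keywords]


def search_one_keyword_at_a_time(search_keyword, variety_keywords):
    """Run one independent search per secondary keyword."""
    results = {}
    # enumerate() supplies the tag that the original loop tracked by hand in `i`.
    for tag, word in enumerate(variety_keywords):
        # Pass a single-element list so every keyword gets a fresh search.
        results[tag] = find_memes(search_keyword, [word], tag)
    return results


runs = search_one_keyword_at_a_time("meme", ["greg", "trump", "kobe"])
```

With the real parser substituted for the stub, every keyword starts from its own first page of results instead of being reached late in one long run.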
The results from the second method contained FAR more "meme-like" content. Reviewing the results from the first method, I noticed that after the first 50 or so results I was getting very few memes.
For example, the word "Trump" is the 11th element of the list, so it will run after 1,100 searches have been done. The first way of running the parser did not return any Trump memes; all of the results were just generic pictures of Trump. The second method returned over 95 pictures that I would classify as memes.
This is likely an idiosyncrasy of Google's image search algorithm. I would hypothesize that as you get further from the original results page, the scope of the results widens dramatically. I do not know if this problem can be fixed on your end, but I think it would be useful for end users to be aware of the second method. If I had run my parse with the first method, my results would have been unusable, but the second method gave me very good data.
Made some minor changes to make it easier to run the code as a loop.
Running the parser from a main loop provides much better results when using alternative keywords. If you are parsing a large set of alternative keywords, I recommend structuring your search as I describe in the issue below. See the master.py file in my memeClassifier project for the entire implementation.
hardikvasa#25
Hi @dkennedy778 , thanks for opening up this issue.
Fast forwarding 30+ commits after you forked this code, the secondary keywords were removed and have now been added back to this repo. They are now called suffix keywords because they suffix the main keyword.
I use loops to add suffix (secondary) keyword to the main keyword in the following way:
for every main keyword
    for every secondary keyword
        search for main keyword + secondary keyword
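In Python, the nested loop above could be sketched roughly as follows. This is only an illustration: `build_queries` is a hypothetical helper, not a function from this repo, and joining the two keywords with a single space is an assumption.

```python
from itertools import product


def build_queries(main_keywords, suffix_keywords):
    """Return one query string per (main keyword, suffix keyword) pair.

    Hypothetical helper for illustration; the space separator is an assumption.
    """
    # product() iterates the suffix keywords in the inner loop, matching the
    # "for every main keyword / for every secondary keyword" pseudocode.
    return [f"{main} {suffix}" for main, suffix in product(main_keywords, suffix_keywords)]


queries = build_queries(["meme", "funny"], ["greg", "trump"])
```

Each resulting query would then be searched on its own, so every main + suffix pair gets a separate search.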
If this is not what you meant, please feel free to correct me :)
See the master class in my memeClassifier for an example of this behavior https://github.com/dkennedy778/memeClassifier
Searching for memes with the for loop provides very good results, but searching with just the method and keyword is practically useless