Script: search_keywords_from_pptx_slides.py
Input: folder name, keyword.
Note: the keyword must match exactly.
Run from the command line:
python search_keywords_from_pptx_slides.py PPTXFOLDER keyword
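The exact-match search can be sketched as follows. This is a minimal illustration, not the script's actual code: it assumes the python-pptx package is used to read slides, and it assumes "exact match" means a case-sensitive, whole-word match. The names slide_texts and find_keyword are hypothetical.

```python
import re


def slide_texts(pptx_path):
    """Yield (slide_number, text) for each slide of one .pptx file."""
    # Assumption: python-pptx is installed (pip install python-pptx).
    from pptx import Presentation

    for number, slide in enumerate(Presentation(pptx_path).slides, start=1):
        parts = [
            shape.text_frame.text
            for shape in slide.shapes
            if shape.has_text_frame
        ]
        yield number, "\n".join(parts)


def find_keyword(slides, keyword):
    """Return the slide numbers whose text contains keyword as a whole word.

    Assumption: matching is case-sensitive and must not be part of a
    longer word, so "network" does not match "networks".
    """
    pattern = re.compile(r"(?<!\w)" + re.escape(keyword) + r"(?!\w)")
    return [number for number, text in slides if pattern.search(text)]
```

For a single file, find_keyword(slide_texts("PPTXFOLDER/talk.pptx"), "keyword") would return the matching slide numbers; the file name talk.pptx is only a placeholder.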
Completed: files in the specified folder that are not PowerPoint (.pptx) files are filtered out.
Check: the package cannot read temporary or hidden files, for example: Folder/~$**.pptx
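One possible way to handle both points above is to filter file names before opening anything. The sketch below uses only the standard library and assumes that temporary files are the lock files PowerPoint creates with a "~$" prefix and that hidden files start with a dot. The function names are hypothetical and not taken from the script.

```python
from pathlib import Path


def is_readable_pptx(path):
    """Return True for .pptx files that are not hidden or temporary."""
    name = Path(path).name
    # Skip PowerPoint lock files ("~$...") and dot-prefixed hidden files.
    if name.startswith(("~$", ".")):
        return False
    return name.lower().endswith(".pptx")


def list_pptx_files(folder):
    """Return the readable .pptx files directly inside folder, sorted."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and is_readable_pptx(p)
    )
```

Calling list_pptx_files("PPTXFOLDER") would then give only the files that are safe to pass to the slide reader.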
Script: extract_words_from_pptx.py
Run from the command line:
python extract_words_from_pptx.py
Sample results
The input slides can be found at https://luoluo-l.github.io/files/sbm_gnn.pdf