Using Tika and OpenSearch to search the contents of PDF files across S3 bucket(s)
OpenSearch:
brew update
brew install opensearch
opensearch
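Once the opensearch process is running, it can help to confirm that it answers HTTP requests before going further. This is a minimal sketch using only the standard library. It assumes the Homebrew install listens on http://localhost:9200 without TLS or authentication; that address and the helper name cluster_info are assumptions, not part of this repo.

```python
import json
import urllib.error
import urllib.request


def cluster_info(base_url="http://localhost:9200", timeout=3.0):
    """Return the JSON served at the cluster root, or None if it cannot be read."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a non-JSON reply.
        return None
```

Calling cluster_info() returns a dict with cluster details when the server is up and None otherwise. If your setup has the security plugin enabled, the URL and credentials will differ.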
App dependencies:
pip install -r requirements.txt
- Add your AWS keys to config.py
- In insert.py, set the s3_file_name variable to the name of the file you want to search
- Create the index by running
python insert.py create_index
- Download the file, extract its contents with Tika, and insert them into OpenSearch via
python insert.py download_file
- Run the app via
python app.py
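
The steps above (create the index, download and extract the file, insert it, then search) can be sketched roughly as follows. This is not the code in insert.py or app.py. It assumes the opensearch-py, tika, and boto3 packages, an index named pdfs, a mapping with file_name and content fields, and boto3's default credential lookup instead of config.py; all of those names and choices are hypothetical.

```python
INDEX = "pdfs"  # hypothetical index name


def index_body():
    """Mapping with a keyword field for the file name and a full-text field."""
    return {
        "mappings": {
            "properties": {
                "file_name": {"type": "keyword"},
                "content": {"type": "text"},
            }
        }
    }


def make_document(file_name, text):
    """Build the document to index, collapsing runs of whitespace."""
    return {"file_name": file_name, "content": " ".join(text.split())}


def search_body(query, size=10):
    """Full-text match on the extracted content, with highlighted snippets."""
    return {
        "size": size,
        "query": {"match": {"content": query}},
        "highlight": {"fields": {"content": {}}},
    }


def download(bucket, key, dest):
    """Fetch one object from S3 (credentials come from boto3's default chain)."""
    import boto3  # imported lazily so the helpers above work without it

    boto3.client("s3").download_file(bucket, key, dest)


def extract_text(path):
    """Extract plain text from a PDF with Tika; returns '' if nothing is found."""
    from tika import parser  # tika-python starts a local Tika server on first use

    parsed = parser.from_file(path)
    return parsed.get("content") or ""


def index_and_search(bucket, key, query, local_path="/tmp/download.pdf"):
    """Run the whole pipeline for one file, then return the matching hits."""
    from opensearchpy import OpenSearch

    # Assumes a local node without TLS or authentication.
    client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])
    if not client.indices.exists(index=INDEX):
        client.indices.create(index=INDEX, body=index_body())

    download(bucket, key, local_path)
    doc = make_document(key, extract_text(local_path))
    # refresh=True makes the document searchable immediately.
    client.index(index=INDEX, id=key, body=doc, refresh=True)

    result = client.search(index=INDEX, body=search_body(query))
    return result["hits"]["hits"]
```

For example, index_and_search("my-bucket", "report.pdf", "invoice") would index that file and return the hits for "invoice"; the bucket and key here are placeholders. Using the S3 key as the document id means re-running the insert overwrites the earlier copy instead of creating a duplicate.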