API-first image file text search ๐
findfile
is the root API implementation of the file search service.
Store, query, and manage your JPGs, PNGs, and PDFs like you're searching text documents.
In order to work with the scripts in bin
, you'll need to have the following installed:
- jq - version
jq-1.6
- AWS CLI - version
aws-cli/1.19.53 Python/3.8.10 Linux/5.11.0-36-generic botocore/1.20.53
For quickstart run the following command and follow the prompts.
bash <(curl -s https://raw.githubusercontent.com/forstmeier/findfile/master/bin/quickstart) | tee "quickstart-$(date +%Y%m%d-%H%M).log"
For more in-depth usage and configuration, clone this repository, add an etc/config/config.json
file (in the structure seen in the bin/create_release
script), and run the scripts available in bin
.
The findfile
application listens to file events emitted by configured target S3 buckets. It then updates the database with that file data which can then be queried by the user. Two endpoints are provided:
/buckets
is responsible for adding and removing target buckets ๐ชฃ/documents
is responsible for running queries against the database ๐๏ธ
Below is an example buckets
query to add and remove buckets.
curl -X PUT https://7z8ruudxc9.execute-api.us-east-1.amazonaws.com/production/buckets --header "Content-Type: application/json" --header "x-findfile-security-key: 6758db58-9534-4e63-8eb9-ff402f6c29d7" --data '{"add": ["new-target-bucket"], "remove": ["old-target-bucket"]}'
Below is an example documents
query searching for the text "find me"
.
curl -X PUT https://7z8ruudxc9.execute-api.us-east-1.amazonaws.com/production/documents --header "Content-Type: application/json" --header "x-findfile-security-key: 6758db58-9534-4e63-8eb9-ff402f6c29d7" --data '{"text": "find me"}'
A successful query response will contain the bucket and key values for any files matching the query text.
A couple of caveats and potential future changes to be aware of:
- AWS does not currently support the correct event when deleting files through the S3 console for
findfile
to correctly listen to; if this is a significant issue, we can look into a solution. - S3 event notifications may be introduced to the current "listening" architecture (this would likely address the above issue).
- The stack is not currently very configurable but it could be expanded going forward if needed.
- Current database implementation defaults are in order to maintain a free tier option but these can be increased if there is interest.
Fork this repository and send a pull request. Follow Go best practices for structure and formatting! ๐