About the Project
A global charity search engine.
- All data from official open data sources.
- Scripts to fully automate processing source files.
Test it out at: http://www.charitius.org
Background info on the concept at: http://juliushuijnk.nl/project/charitius/
Built with
- Scripts to process source files (Python 3, Bash, Make)
- Elasticsearch
Prerequisites
- Docker installed
- Python 3.6 installed
- Unzip, Git
Install Python libraries
The scripts that manipulate the CSV files use Python libraries.
- You can install the required Python libraries globally with
python3.6 -m pip install -r requirements.txt.
Create data to search
Run make in the root folder. It downloads the source files, processes them into a .json file, and splits that file into parts that can be uploaded to ES.
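The splitting step can be illustrated with a minimal sketch. It assumes the parts are meant for the Elasticsearch `_bulk` API, which expects newline-delimited JSON with an action line before each document. The function name `bulk_parts` and the `docs_per_part` parameter are hypothetical; the project's own scripts may split the data differently.

```python
import json
from typing import Iterable, Iterator, List


def bulk_parts(docs: Iterable[dict], docs_per_part: int = 1000) -> Iterator[str]:
    """Yield newline-delimited JSON bodies sized for the Elasticsearch _bulk API.

    Hypothetical helper: each document is preceded by an {"index": {}} action
    line, and every body ends with a newline, as the bulk format requires.
    """
    lines: List[str] = []
    count = 0
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(doc))            # document line
        count += 1
        if count == docs_per_part:
            yield "\n".join(lines) + "\n"
            lines, count = [], 0
    if lines:
        # Emit the final, possibly smaller, part.
        yield "\n".join(lines) + "\n"
```

Each yielded string could be written to its own file and posted to the index's `_bulk` endpoint. Smaller parts keep every request well under the request size limit of the server.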
Create docker-compose.override.yml file
During development we want to use a Kibana container, but in production we don't want to make it available, since it uses RAM without being needed. Therefore the development docker-compose file differs from the production (live) version.
- We extend the docker-compose.yml file with docker-compose.override.yml. During development, let the override file hold the content of docker-compose.dev.yml by creating a symlink:
ln -s docker-compose.dev.yml docker-compose.override.yml
For production (live)
- Create a symlink to the prod file:
ln -s docker-compose.prod.yml docker-compose.override.yml
- If you want to track usage with Google Analytics, set an environment variable via
export ANALYTICS=UA-XXXXXXX-X, replacing UA-XXXXXXX-X with your Google Analytics ID.
- Run docker-compose up -d. It takes a minute or two for Elasticsearch to fully start.
- Verify that the containers are running with
docker ps, which should show
flask. During development also
Upload charity data
- Once ES has fully started, and assuming it is still empty, run
bash scripts/upload.sh to upload the file(s) to Elasticsearch.
- Verify the expected number of records ('documents') using:
curl -X GET "localhost:9200/_cat/count/chars?pretty"
- Verify the expected structure of data using:
curl -X GET "localhost:9200/chars/?pretty"
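The record count check above can also be scripted. The sketch below assumes the default `_cat/count` response, a single line with three whitespace-separated columns (epoch, timestamp, count). The function names are hypothetical and not part of the project.

```python
from urllib.request import urlopen


def parse_cat_count(text: str) -> int:
    """Return the document count from a `_cat/count` response.

    The default response columns are: epoch, timestamp, count.
    The count is the last whitespace-separated value.
    """
    return int(text.split()[-1])


def count_documents(base_url: str = "http://localhost:9200", index: str = "chars") -> int:
    """Ask a running Elasticsearch instance how many documents the index holds."""
    with urlopen(f"{base_url}/_cat/count/{index}") as response:
        return parse_cat_count(response.read().decode("utf-8"))
```

With Elasticsearch running locally, `count_documents()` returns the number of uploaded records, which you can compare with the number of records in your source files.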
Play around with search
Open http://localhost:5601 for the Kibana interface, where you can search using the console.
More info on how to do search queries in the Kibana console:
During development the website is available at
http://localhost:5000, which contains the search bar.
For production (live) the website is on port 80 (default port).
Replace data & update code
On the live server you might need to re-upload the data to ES, for instance when the mapping of the fields has changed. In this case you can:
- Remove all data with:
curl -X DELETE "localhost:9200/chars?pretty"
- You might also need to recreate all JSON files with:
- (And perhaps remove even more files)
- Run make to create new JSON files to upload, if required.
- Upload to ES via
After a
git pull or any other update of the Flask code,
you might need to run
docker restart flask01 for the changes to take effect.
If you changed, for instance, the required Python libraries, you need to rebuild your Docker image. First stop and remove the flask01 container via
docker stop flask01 and
docker rm flask01.
Then find the correct image via
docker images, and remove that image via
docker image rm <image name>.
Then you can run
docker-compose up -d again, which builds the new image and starts the containers.
Debugging with Flask outside container
To (more easily) set breakpoints and debug Flask, you can run it outside the docker container on your local developer computer.
Install the dependencies. To do this globally, run
python3 -m pip install -r requirements.txt inside the flask-app folder.
Then start it from the flask-app folder with
python3 app.py outside
The 'outside' argument makes Flask look for Elasticsearch at 0.0.0.0 instead of the
address used inside the Docker network.
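How the 'outside' part might select the Elasticsearch address can be sketched as follows. This is not the project's actual app.py: the function name is hypothetical, and the in-network host name `elasticsearch` is an assumption based on a typical docker-compose service name.

```python
import sys
from typing import List

OUTSIDE_HOST = "0.0.0.0"       # from the README: used when Flask runs outside Docker
INSIDE_HOST = "elasticsearch"  # assumption: service name on the Docker network


def elasticsearch_host(argv: List[str]) -> str:
    """Pick the Elasticsearch host based on an optional 'outside' argument."""
    return OUTSIDE_HOST if "outside" in argv[1:] else INSIDE_HOST


if __name__ == "__main__":
    # Example: `python3 app.py outside` selects the outside-Docker host.
    print(elasticsearch_host(sys.argv))
```

Keeping the choice in one small function makes it easy to see which address Flask will use before connecting.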
1.0 A working search engine [ DONE ]
Most basic working search engine.
- Name, City, Country, Website, sourceURL, sourceDate.
- Datasets from more countries.
- Improved scripts to upload to ES replacing old files.
- Improved ways to search, for instance with categories.
I use the todo.taskpaper file to keep track of my TODOs.
You may read the
todo.taskpaper file, but it's probably (mostly) in Dutch and could be outdated. This README.md file should make sense and be up to date.
If you'd like to help, great! You can do so by:
- Suggesting code or feature improvements.
- Suggesting good open data sources.
There is no fixed process yet. If you have suggestions, open an issue here, or contact me directly.
Distributed under the GNU GPLv3 license. In short: Use it freely, share improvements with me, no warranties.
See LICENSE.md for more information.
Easy to comprehend page on license: