A school project for the course Programming Project 2 (Group 2), Professional Bachelor "Applied IT" at Erasmushogeschool Brussel.
InsuraQuest is a search engine application for insurance documents. Insurance companies store documents on legislation, jurisprudence and legal doctrine in their particular field. The goal is to give employees an easy-to-use search application built on the algorithms of the Elasticsearch search engine.
- Insuraquest
- Unit testing using a mock Elastic client
- Contributing
- Wrap up
- Registration and login of users
- Management of roles and authorisations: guest, user, librarian, admin
- Upload of documents in .pdf, .png or .jpeg format on server location
- At upload, the librarian enters all metadata (tags) with the upload
- Documents are picked up by FSCrawler, converted to JSON and sent, together with all the manual metadata, to Elasticsearch for indexing
- Elasticsearch stores all documents in the index "insuraquest" in JSON format
- All users (except for guests) can perform full text searches on content and add filters based on criteria such as language, issuer, insurance type, etc.
- Search results are shown in order of relevance (highest scores on top); thanks to highlighting, only the matching fragments of the content are rendered
- Full reading of the document is only a click away.
- Modification of tags
- Tables related to users and authorisations, as well as metadata options, are stored in an SQL database
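The full-text search with filters and highlighting described in the list above could be expressed as a single Elasticsearch request. The sketch below is not the project's actual query code: `buildSearchParams()` is a hypothetical helper, and it assumes the extracted text lives in FSCrawler's `content` field and the manual metadata under `external.*`.

```php
<?php
// Sketch only: buildSearchParams() is a hypothetical helper, not part of InsuraQuest.
// Assumed field names: "content" (text extracted by FSCrawler) and "external.*" (manual tags).
function buildSearchParams(string $text, array $filters = []): array
{
    // Each non-empty filter becomes an exact-match term clause on a keyword field.
    $filterClauses = [];
    foreach ($filters as $field => $value) {
        if ($value !== null && $value !== '') {
            $filterClauses[] = ['term' => ["external.$field" => $value]];
        }
    }

    return [
        'index' => 'insuraquest',
        'body'  => [
            'query' => [
                'bool' => [
                    // Scored full-text clause: this drives the relevance ordering.
                    'must'   => [['match' => ['content' => $text]]],
                    // Non-scored clauses: they only narrow the result set.
                    'filter' => $filterClauses,
                ],
            ],
            // Ask Elasticsearch for matching fragments instead of the whole text.
            'highlight' => ['fields' => ['content' => new stdClass()]],
        ],
    ];
}

// Empty filter values (here: issuer) are skipped.
$params = buildSearchParams('flood damage', ['language' => 'en', 'issuer' => '']);
echo json_encode($params['body']['query'], JSON_PRETTY_PRINT), PHP_EOL;
```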
Component | Version |
---|---|
Linux Ubuntu | |
FsCrawler | |
Elasticsearch | 6.8 |
Elasticsearch-PHP | 6.7 |
PHP | 7.4.10 (cli) |
Composer | 2.0.6 |
Laravel Installer | 4.1.0 |
MySQL | |
Note: FSCrawler vXXXX is only compatible with Elasticsearch 6.8. Consequently, require the Elasticsearch-PHP v6.7 package in your Laravel composer.json file.
Full documentation can be found here. Docs are stored within the repo under /docs/, so if you see a typo or problem, please submit a PR to fix it!
We also provide a code examples generator for PHP, the util/GenerateDocExamples.php script. This command parses the util/alternative_report.spec.json file produced from this JSON specification and generates the PHP examples for each digest value. The examples are stored in asciidoc format under the docs/examples folder.
The recommended method to install Elasticsearch-PHP is through Composer.
- Add elasticsearch/elasticsearch as a dependency in your project's composer.json file (change the version to suit your version of Elasticsearch, for instance for ES 7.0):
{
    "require": {
        "elasticsearch/elasticsearch": "^7.0"
    }
}
- Download and install Composer:
curl -s http://getcomposer.org/installer | php
- Install your dependencies:
php composer.phar install
- Require Composer's autoloader
Composer also prepares an autoload file that's capable of autoloading all the classes in any of the libraries that it downloads. To use it, just add the following lines to your code's bootstrap process:
<?php
use Elasticsearch\ClientBuilder;
require 'vendor/autoload.php';
$client = ClientBuilder::create()->build();
You can find out more on how to install Composer, configure autoloading, and other best-practices for defining dependencies at getcomposer.org.
Version 7.0 of this library requires at least PHP version 7.1. In addition, it requires the native JSON extension to be version 1.3.7 or higher.
Elasticsearch-PHP Branch | PHP Version |
---|---|
7.0 | >= 7.1.0 |
6.0 | >= 7.0.0 |
5.0 | >= 5.6.6 |
2.0 | >= 5.4.0 |
0.4, 1.0 | >= 5.3.9 |
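The compatibility table can also be checked in code before choosing a client branch. This is a minimal sketch; `minimumPhpFor()` and `supportsBranch()` are hypothetical helpers, and the minimum versions are simply copied from the table above.

```php
<?php
// Sketch only: both functions are hypothetical helpers for illustration.
// The minimum PHP versions are copied from the compatibility table.
function minimumPhpFor(string $branch): ?string
{
    $minimums = [
        '7.0' => '7.1.0',
        '6.0' => '7.0.0',
        '5.0' => '5.6.6',
        '2.0' => '5.4.0',
        '1.0' => '5.3.9',
        '0.4' => '5.3.9',
    ];
    return $minimums[$branch] ?? null;
}

function supportsBranch(string $branch, string $phpVersion = PHP_VERSION): bool
{
    $minimum = minimumPhpFor($branch);
    // Unknown branches are reported as unsupported.
    return $minimum !== null && version_compare($phpVersion, $minimum, '>=');
}

// The PHP version used by this project against the 6.0 branch.
var_dump(supportsBranch('6.0', '7.4.10'));
```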
Since InsuraQuest uses Laravel Jetstream, it includes login, registration, email verification, two-factor authentication and session management out of the box. Jetstream uses Laravel Fortify, which is a front-end agnostic authentication backend for Laravel.
In the config/fortify.php configuration file you can customize the different aspects and choose which features to enable in your project.
The logic executed on authentication requests can be found and modified in App\Actions\Fortify.
More information and documentation can be found on the Jetstream website.
InsuraQuest implements authorisation through the attribute 'type', which is included in each user instance. There are four types: guest, user, librarian and admin. Types are cascading: each level has the permissions of the level below plus additional permissions.
Authorisation is enforced on the routes (web.php). On mixed views, it is also enforced at view level with the native Laravel @can and @cannot directives.
Types can be adjusted in the database directly, or on the 'user administration' page when you are signed in with an admin account.
A visitor who is not yet signed in is redirected to the login screen. By default, a newly registered user is assigned the type 'guest'. A guest can see the landing page and the documentation, but cannot query any documents. A user can query documents, open them and mail them. A librarian can upload new files, delete files and change the tags on them. An admin can view all the users and their information, and adjust their type.
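The cascading types can be modelled as an ordered ranking, where a check passes when the user's rank is at least the required rank. The sketch below is illustrative only: `TYPE_RANKS` and `hasAtLeast()` are hypothetical names, not code from the InsuraQuest repository.

```php
<?php
// Sketch only: the rank table and hasAtLeast() are hypothetical names for illustration.
const TYPE_RANKS = ['guest' => 0, 'user' => 1, 'librarian' => 2, 'admin' => 3];

function hasAtLeast(string $userType, string $requiredType): bool
{
    // Unknown types are rejected rather than silently treated as guests.
    if (!array_key_exists($userType, TYPE_RANKS) || !array_key_exists($requiredType, TYPE_RANKS)) {
        return false;
    }
    // A higher level inherits every permission of the levels below it.
    return TYPE_RANKS[$userType] >= TYPE_RANKS[$requiredType];
}

var_dump(hasAtLeast('librarian', 'user')); // a librarian may also search
var_dump(hasAtLeast('guest', 'user'));     // a guest may not
```

A check like this could back a Laravel gate so that routes and @can directives share one rule, although how the project wires this up is not shown here.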
Description of deployment set-up.
- v. 6.8.13
- 1 shard
- 1 replica
- single node
- analyzer: fscrawler_path
- production index: insuraquest
- custom fields in index mapping:
"external": {
    "properties": {
        "title": { "type": "text" },
        "language": { "type": "keyword" },
        "date_published": { "type": "date" },
        "issuer": { "type": "keyword" },
        "category": { "type": "keyword" },
        "tag": { "type": "keyword" }
    }
}
- insuraquest index created on first run of FSCrawler
- v. 6.8.13
- v. 6-2.6
- utility has been converted into a systemd unit so it runs as a service -> /etc/systemd/system/fscrawler.service
- utility run by dedicated user fscrawler
- analyzer of FSCrawler makes use of Apache Tika to parse and tokenize binary documents, including pdfs
- fscrawler exposes a REST API running at http://127.0.0.1:8080/fscrawler
- custom fields added to mapping defined under /home/student/.fscrawler/_default/6/_settings.json
Considered more robust than built-in Laravel server.
- Configuration file -> /etc/nginx/sites-available/default:
server {
    listen 80;
    server_name 10.3.50.7;
    root /var/www/insuraquest_production/insuraquest/public;
    add_header X-Frame-Options "SAMEORIGIN";
    add_header X-XSS-Protection "1; mode=block";
    add_header X-Content-Type-Options "nosniff";
    index index.php;
    charset utf-8;
    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }
    location = /favicon.ico { access_log off; log_not_found off; }
    location = /robots.txt { access_log off; log_not_found off; }
    error_page 404 /index.php;
    location ~ \.php$ {
        fastcgi_pass unix:/var/run/php/php7.4-fpm.sock;
        fastcgi_param SCRIPT_FILENAME $realpath_root$fastcgi_script_name;
        include fastcgi_params;
    }
    location ~ /\.(?!well-known).* {
        deny all;
    }
}
- Default configuration
- Populated after deployment of git repo with:
php artisan migrate:fresh --seed
- Deployment via bare git repo living under /home/student/insuraquest/bare_project.git
git init --bare /home/student/insuraquest/bare_project.git
- A post-receive hook checks out pushed changes into the working directory under /var/www/insuraquest_production
#!/bin/bash
#check out the files
git --work-tree=/var/www/insuraquest_production --git-dir=/home/student/insuraquest/bare_project.git checkout -f
chmod +x /path/to/bare_project.git/hooks/post-receive
- Configuration of local repo to push to the server
git remote add live 'student@10.3.50.7:/home/student/insuraquest/bare_project.git'
git push --set-upstream live main
- After project is pushed to /var/www/insuraquest_production, composer update is called to update all dependencies:
composer update
- Set the group ownership of /var/www/insuraquest_production to www-data to grant Nginx read and execute permissions.
chgrp -R www-data insuraquest_production
- Fix broken storage symbolic links
php artisan storage:link
In elasticsearch-php, almost everything is configured by associative arrays. The REST endpoint, document and optional parameters - everything is an associative array.
To index a document, we need to specify three pieces of information: index, id and a document body. This is done by constructing an associative array of key:value pairs. The request body is itself an associative array with key:value pairs corresponding to the data in your document:
$params = [
'index' => 'my_index',
'id' => 'my_id',
'body' => ['testField' => 'abc']
];
$response = $client->index($params);
print_r($response);
The response that you get back indicates the document was created in the index that you specified. The response is an associative array containing a decoded version of the JSON that Elasticsearch returns:
Array
(
[_index] => my_index
[_type] => _doc
[_id] => my_id
[_version] => 1
[result] => created
[_shards] => Array
(
[total] => 1
[successful] => 1
[failed] => 0
)
[_seq_no] => 0
[_primary_term] => 1
)
Let's get the document that we just indexed. This will simply return the document:
$params = [
'index' => 'my_index',
'id' => 'my_id'
];
$response = $client->get($params);
print_r($response);
The response contains some metadata (index, version, etc.) as well as a _source field, which is the original document that you sent to Elasticsearch.
Array
(
[_index] => my_index
[_type] => _doc
[_id] => my_id
[_version] => 1
[_seq_no] => 0
[_primary_term] => 1
[found] => 1
[_source] => Array
(
[testField] => abc
)
)
If you want to retrieve the _source field directly, there is the getSource method:
$params = [
'index' => 'my_index',
'id' => 'my_id'
];
$source = $client->getSource($params);
print_r($source);
The response will be just the _source value:
Array
(
[testField] => abc
)
Searching is a hallmark of Elasticsearch, so let's perform a search. We are going to use the Match query as a demonstration:
$params = [
'index' => 'my_index',
'body' => [
'query' => [
'match' => [
'testField' => 'abc'
]
]
]
];
$response = $client->search($params);
print_r($response);
The response is a little different from the previous responses. We see some metadata (took, timed_out, etc.) and an array named hits. This represents your search results. Inside of hits is another array named hits, which contains individual search results:
Array
(
[took] => 33
[timed_out] =>
[_shards] => Array
(
[total] => 1
[successful] => 1
[skipped] => 0
[failed] => 0
)
[hits] => Array
(
[total] => Array
(
[value] => 1
[relation] => eq
)
[max_score] => 0.2876821
[hits] => Array
(
[0] => Array
(
[_index] => my_index
[_type] => _doc
[_id] => my_id
[_score] => 0.2876821
[_source] => Array
(
[testField] => abc
)
)
)
)
)
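In application code, the nested hits array is usually flattened before it is passed to a view. The sketch below shows one way to do that; `extractHits()` is a hypothetical helper and not part of elasticsearch-php. Note that the sample response above uses the value/relation form of hits.total, whereas Elasticsearch 6.8 returns a plain integer there, so the helper deliberately does not read that field.

```php
<?php
// Sketch only: extractHits() is a hypothetical helper, not part of elasticsearch-php.
function extractHits(array $response): array
{
    $results = [];
    // The inner hits array holds one entry per matching document.
    foreach ($response['hits']['hits'] ?? [] as $hit) {
        $results[] = [
            'id'     => $hit['_id'],
            'score'  => $hit['_score'],
            'source' => $hit['_source'] ?? [],
        ];
    }
    return $results;
}

// Trimmed copy of the sample search response.
$response = ['hits' => ['hits' => [
    ['_index' => 'my_index', '_id' => 'my_id', '_score' => 0.2876821, '_source' => ['testField' => 'abc']],
]]];
print_r(extractHits($response));
```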
Alright, let's go ahead and delete the document that we added previously:
$params = [
'index' => 'my_index',
'id' => 'my_id'
];
$response = $client->delete($params);
print_r($response);
You'll notice this is identical syntax to the get syntax. The only difference is the operation: delete instead of get. The response will confirm the document was deleted:
Array
(
[_index] => my_index
[_type] => _doc
[_id] => my_id
[_version] => 2
[result] => deleted
[_shards] => Array
(
[total] => 1
[successful] => 1
[failed] => 0
)
[_seq_no] => 1
[_primary_term] => 1
)
Due to the dynamic nature of Elasticsearch, the first document we added automatically built an index with some default settings. Let's delete that index because we want to specify our own settings later:
$deleteParams = [
'index' => 'my_index'
];
$response = $client->indices()->delete($deleteParams);
print_r($response);
The response:
Array
(
[acknowledged] => 1
)
Now that we are starting fresh (no data or index), let's add a new index with some custom settings:
$params = [
'index' => 'my_index',
'body' => [
'settings' => [
'number_of_shards' => 2,
'number_of_replicas' => 0
]
]
];
$response = $client->indices()->create($params);
print_r($response);
Elasticsearch will now create that index with your chosen settings, and return an acknowledgement:
Array
(
[acknowledged] => 1
)
A librarian can upload new files. When uploading a document, the librarian adds tags to it. The options for the tags are pulled from a MySQL table and added to the form.
- Title, Language, Date Published, Issuer, Category, Keyword.
- The librarian must enter all of these values to upload a document.
- The uploaded file must be a PDF of at most 2048 KB.
- A document is required for upload.
FileUploadController.php
$this->validate($request, [
'title' => 'required',
'language' => 'required',
'date' => 'required|date',
'issuer' => 'required',
'category' => 'required',
'tag' => 'required',
'file' => 'required|mimes:pdf|max:2048'
]);
When a document is uploaded, the file and tags are posted to FSCrawler, which parses the document and indexes it in our Elasticsearch node.
FileUploadController.php
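use GuzzleHttp\Client;
use GuzzleHttp\Exception\GuzzleException;

// Store the upload and resolve its absolute path on disk
$file = $request->file('file');
$pathname = $file->store('public');
$fully_qualified_pathname = storage_path('app/' . $pathname);
$client = new Client();
try {
    // Post to the FSCrawler REST endpoint
    $client->request('POST', 'http://127.0.0.1:8080/fscrawler/_upload',
        // request options (file and tags payload) are not shown in this excerpt
    );
} catch (GuzzleException $e) {
    echo $e;
}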
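The request options are left out of the excerpt above. As a rough sketch, the file and tags could be sent as a Guzzle multipart body. `buildUploadOptions()` is a hypothetical helper; the `tags` part name and the `external` wrapper are assumptions based on the FSCrawler REST documentation and the index mapping, so verify them against the FSCrawler version in use.

```php
<?php
// Sketch only: buildUploadOptions() is a hypothetical helper. The "tags" part name and
// the "external" wrapper are assumptions, not confirmed by the project's code.
function buildUploadOptions(string $path, array $tags): array
{
    return [
        'multipart' => [
            // The binary document that FSCrawler parses and indexes.
            ['name' => 'file', 'contents' => fopen($path, 'r'), 'filename' => basename($path)],
            // The librarian's metadata, sent as a JSON document alongside the file.
            ['name' => 'tags', 'contents' => json_encode(['external' => $tags]), 'filename' => 'tags.json'],
        ],
    ];
}

// Usage with a throwaway file standing in for the stored upload.
$path = tempnam(sys_get_temp_dir(), 'iq');
file_put_contents($path, 'dummy');
$options = buildUploadOptions($path, ['title' => 'Policy terms', 'language' => 'en']);
// $client->request('POST', 'http://127.0.0.1:8080/fscrawler/_upload', $options);
```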
A plugin is added for form layout -> tailwind.config.js
https://tailwindcss-custom-forms.netlify.app/
plugins: [
require('@tailwindcss/custom-forms'),
]
After running a search, a user can view more details on any of the results.
From the detail view it is possible to edit, delete or mail the PDF shown.
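Editing the tags of an indexed document maps naturally onto a partial update in Elasticsearch. The sketch below only builds the parameter array; `buildTagUpdateParams()` is a hypothetical helper, and the `_doc` type and the `external` field are assumptions rather than confirmed project code.

```php
<?php
// Sketch only: buildTagUpdateParams() is a hypothetical helper. The "_doc" type and
// the "external" field name are assumptions for illustration.
function buildTagUpdateParams(string $id, array $changedTags): array
{
    return [
        'index' => 'insuraquest',
        'type'  => '_doc',
        'id'    => $id,
        // A partial "doc" update merges these fields into the stored document,
        // leaving the extracted content and the untouched tags as they are.
        'body'  => ['doc' => ['external' => $changedTags]],
    ];
}

$params = buildTagUpdateParams('abc123', ['category' => 'jurisprudence']);
// $response = $client->update($params);
```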
The following files were created or modified for the mail functionality:
- MailController.php
- EmailInsuraquest.php
- insuraEmail.blade.php
- web.php
The Laravel Markdown Mailable class used for creating emails was generated with:
php artisan make:mail EmailInsuraquest --markdown=Email.insuraEmail
The mail controller holds the logic for sending the email. Run the following command to create the controller:
php artisan make:controller MailController
The email function can be tested at http://localhost:8000/send-email, which sends a mail to Mailtrap (account Bart).
todo: integrate the mail functionality into the view of a single search result
use GuzzleHttp\Ring\Client\MockHandler;
use Elasticsearch\ClientBuilder;
// The connection class requires 'body' to be a file stream handle
// Depending on what kind of request you do, you may need to set more values here
$handler = new MockHandler([
'status' => 200,
'transfer_stats' => [
'total_time' => 100
],
'body' => fopen('somefile.json', 'r'),
'effective_url' => 'localhost'
]);
$builder = ClientBuilder::create();
$builder->setHosts(['somehost']);
$builder->setHandler($handler);
$client = $builder->build();
// Do a request and you'll get back the 'body' response above
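When a fixture file on disk is inconvenient, the stream handle for 'body' can also be created in memory. This is a minimal sketch using only core PHP; `jsonStream()` is a hypothetical helper name.

```php
<?php
// Sketch only: jsonStream() is a hypothetical helper for building the 'body' value.
function jsonStream(array $payload)
{
    // php://memory gives a read/write stream without touching the filesystem.
    $stream = fopen('php://memory', 'r+');
    fwrite($stream, json_encode($payload));
    rewind($stream); // the reader must start at the beginning of the body
    return $stream;
}

$body = jsonStream(['_id' => 'my_id', 'found' => true]);
// $body can be passed as the 'body' entry of the mock handler instead of a file handle.
// Reading it back here only demonstrates that the stream holds the JSON document.
$decoded = json_decode(stream_get_contents($body), true);
```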
That was just a crash-course overview of the client and its syntax. If you are familiar with Elasticsearch, you'll notice that the methods are named just like REST endpoints.
You'll also notice that the client is configured in a manner that facilitates easy discovery via the IDE. All core actions are available under the $client object (indexing, searching, getting, etc.). Index and cluster management are located under the $client->indices() and $client->cluster() objects, respectively.
Check out the rest of the Documentation to see how the entire client works.
Please note that this project is for use within the school context.
For further development, please contact te
The user may choose which license they wish to use. Since there is no discriminating executable or distribution bundle
to differentiate licensing, the user should document their license choice externally, in case the library is re-distributed.
If no explicit choice is made, assumption is that redistribution obeys rules of both licenses.