Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

REST API: Incoherent /search result #1481

Closed
NicolasToussaint opened this issue Sep 20, 2019 · 8 comments · Fixed by #1499 or #2390
Closed

REST API: Incoherent /search result #1481

NicolasToussaint opened this issue Sep 20, 2019 · 8 comments · Fixed by #1499 or #2390
Labels

Comments

@NicolasToussaint
Copy link
Member

Description

There seems to be a problem with the REST API /search feature.

When uploading several times the same files, searching the file returns the correct number of entries, but all information is identical (match the first upload)

Tested via the REST API.

How to reproduce

  • Upload a zip file
    • { "code": 201, "message": 31, "type": "INFO" }
  • Add a file in the zip archive, and upload again
    • { "code": 201, "message": 32, "type": "INFO" }
  • Search for the file:
    • curl -k -s -S -X GET https://<server>/api/v1/search -H "Authorization:Bearer eyJ..." -H filename:reuse_3.zip | jq .
  • The results show twice the first upload:
[
  {
    "upload": {
      "folderid": 11,
      "foldername": "Reuse_D",
      "id": 31,
      "description": "Uploaded via REST",
      "uploadname": "reuse_3.zip",
      "uploaddate": "2019-09-20 09:19:29.074775+00",
      "filesize": 1144
    },
    "uploadTreeId": 269,
    "filename": "reuse_3.zip"
  },
  {
    "upload": {
      "folderid": 11,
      "foldername": "Reuse_D",
      "id": 31,
      "description": "Uploaded via REST",
      "uploadname": "reuse_3.zip",
      "uploaddate": "2019-09-20 09:19:29.074775+00",
      "filesize": 1144
    },
    "uploadTreeId": 275,
    "filename": "reuse_3.zip"
  }
]

Versions

Recent version fro Master

@NicolasToussaint
Copy link
Member Author

OK, this is driving me mad.
Something insane is happening here:

$result = $this->dbManager->getRows($sql);

Any help, at least to reproduce is welcome :-)

In this function, we build a SQL query to retrieve the uploads with the correct UserID and UploadID.
In my test:

  • I have 3 uploads with file reuse_4.zip
  • I call the REST API to search those uploads: curl -k -s -S -X GET https://.../api/v1/search -H "Authorization:Bearer $tk" -H filename:reuse_4.zip | jq .

The problem:

  • We call the function with the right uploadID (7, 8 and 9)
  • The SQL query is correct, and works well when executed directly in psql.
  • BUT printing the resulting array always shows the first of the two uploads (ID 7).

Now, if I force the uploadId to 8, at the start of the function, then the query executes well and produces returns the uploadId 8.

So the incoherent result only happen when building the SQL query using the input variable.

Same story with logs:
I print:

  • the uploadID at the start of the function
  • the SQL Query once it is built
  • the array $result after calling $result = $this->dbManager->getRows($sql);
  1. With untouched master (commit abafc8f)
--------------------
uploadId: 7 
SQL: SELECT
upload.upload_pk, upload.upload_desc, upload.upload_ts, upload.upload_filename,
folder.folder_pk, folder.folder_name, pfile.pfile_size
FROM upload
INNER JOIN folderlist ON folderlist.upload_pk = upload.upload_pk
INNER JOIN folder ON folder.folder_pk = folderlist.parent
INNER JOIN pfile ON pfile.pfile_pk = upload.pfile_fk
WHERE upload.user_fk = 3
AND upload.upload_pk = 7
ORDER BY upload.upload_pk; 
result: Array
(
    [0] => Array
        (
            [upload_pk] => 7
            [upload_desc] => Uploaded via REST
            [upload_ts] => 2019-09-20 14:36:42.570426+00
            [upload_filename] => reuse_4.zip
            [folder_pk] => 4
            [folder_name] => Reuse_C
            [pfile_size] => 1144
        )

)
 
--------------------
uploadId: 8 
SQL: SELECT
upload.upload_pk, upload.upload_desc, upload.upload_ts, upload.upload_filename,
folder.folder_pk, folder.folder_name, pfile.pfile_size
FROM upload
INNER JOIN folderlist ON folderlist.upload_pk = upload.upload_pk
INNER JOIN folder ON folder.folder_pk = folderlist.parent
INNER JOIN pfile ON pfile.pfile_pk = upload.pfile_fk
WHERE upload.user_fk = 3
AND upload.upload_pk = 8
ORDER BY upload.upload_pk; 
result: Array
(
    [0] => Array
        (
            [upload_pk] => 7
            [upload_desc] => Uploaded via REST
            [upload_ts] => 2019-09-20 14:36:42.570426+00
            [upload_filename] => reuse_4.zip
            [folder_pk] => 4
            [folder_name] => Reuse_C
            [pfile_size] => 1144
        )
)
 
--------------------
uploadId: 9 
SQL: SELECT
upload.upload_pk, upload.upload_desc, upload.upload_ts, upload.upload_filename,
folder.folder_pk, folder.folder_name, pfile.pfile_size
FROM upload
INNER JOIN folderlist ON folderlist.upload_pk = upload.upload_pk
INNER JOIN folder ON folder.folder_pk = folderlist.parent
INNER JOIN pfile ON pfile.pfile_pk = upload.pfile_fk
WHERE upload.user_fk = 3
AND upload.upload_pk = 9
ORDER BY upload.upload_pk; 
result: Array
(
    [0] => Array
        (
            [upload_pk] => 7
            [upload_desc] => Uploaded via REST
            [upload_ts] => 2019-09-20 14:36:42.570426+00
            [upload_filename] => reuse_4.zip
            [folder_pk] => 4
            [folder_name] => Reuse_C
            [pfile_size] => 1144
        )

)
  1. Forcing theuploadIdvariable
    At the start of the function, I execute: $uploadId = 8;
    All results are coherent (ie. returning uploads with ID 8)
--------------------
uploadId: 8 
SQL: SELECT
upload.upload_pk, upload.upload_desc, upload.upload_ts, upload.upload_filename,
folder.folder_pk, folder.folder_name, pfile.pfile_size
FROM upload
INNER JOIN folderlist ON folderlist.upload_pk = upload.upload_pk
INNER JOIN folder ON folder.folder_pk = folderlist.parent
INNER JOIN pfile ON pfile.pfile_pk = upload.pfile_fk
WHERE upload.user_fk = 3
AND upload.upload_pk = 8
ORDER BY upload.upload_pk; 
result: Array
(
    [0] => Array
        (
            [upload_pk] => 8
            [upload_desc] => Uploaded via REST
            [upload_ts] => 2019-09-20 14:36:44.671703+00
            [upload_filename] => reuse_4.zip
            [folder_pk] => 4
            [folder_name] => Reuse_C
            [pfile_size] => 1314
        )

)
 
--------------------
uploadId: 8 
SQL: SELECT
upload.upload_pk, upload.upload_desc, upload.upload_ts, upload.upload_filename,
folder.folder_pk, folder.folder_name, pfile.pfile_size
FROM upload
INNER JOIN folderlist ON folderlist.upload_pk = upload.upload_pk
INNER JOIN folder ON folder.folder_pk = folderlist.parent
INNER JOIN pfile ON pfile.pfile_pk = upload.pfile_fk
WHERE upload.user_fk = 3
AND upload.upload_pk = 8
ORDER BY upload.upload_pk; 
result: Array
(
    [0] => Array
        (
            [upload_pk] => 8
            [upload_desc] => Uploaded via REST
            [upload_ts] => 2019-09-20 14:36:44.671703+00
            [upload_filename] => reuse_4.zip
            [folder_pk] => 4
            [folder_name] => Reuse_C
            [pfile_size] => 1314
        )

)
 
--------------------
uploadId: 8 
SQL: SELECT
upload.upload_pk, upload.upload_desc, upload.upload_ts, upload.upload_filename,
folder.folder_pk, folder.folder_name, pfile.pfile_size
FROM upload
INNER JOIN folderlist ON folderlist.upload_pk = upload.upload_pk
INNER JOIN folder ON folder.folder_pk = folderlist.parent
INNER JOIN pfile ON pfile.pfile_pk = upload.pfile_fk
WHERE upload.user_fk = 3
AND upload.upload_pk = 8
ORDER BY upload.upload_pk; 
result: Array
(
    [0] => Array
        (
            [upload_pk] => 8
            [upload_desc] => Uploaded via REST
            [upload_ts] => 2019-09-20 14:36:44.671703+00
            [upload_filename] => reuse_4.zip
            [folder_pk] => 4
            [folder_name] => Reuse_C
            [pfile_size] => 1314
        )

)

@NicolasToussaint
Copy link
Member Author

It turns out that only the first query is properly executed - and result is somehow "reused" for following queries.

If I execute a random query with $this->dbManager->getRows(), the result from the second is ignored:

    $result = $this->dbManager->getRows("select user_pk, user_name from users  where user_pk = 3;");
    file_put_contents('php://stderr', "result: " . print_r($result, true) . " \n");

    $result = [];
    $result = $this->dbManager->getRows($sql);
    file_put_contents('php://stderr', "result: " . print_r($result, true) . " \n");

Gives

result: Array
(
    [0] => Array
        (
            [user_pk] => 3
            [user_name] => fossy
        )

)
 
result: Array
(
    [0] => Array
        (
            [user_pk] => 3
            [user_name] => fossy
        )

)

@piotreke12
Copy link
Contributor

Hi ,

I managed to find cause of those incorrect results.

Problem sits in src/www/ui/api/Helper/DbHelper/getUpload() method
For each search result, src/www/ui/api/Controllers/SearchController.php invokes getUpload() method for each upload_fk found in previously fetched search results:

$currentUpload = $this->dbHelper->getUploads( $this->restHelper->getUserId(), $results[$i]["upload_fk"])[0];

Inside getUpload method there is SELECT statement preparation (result can vary depending on parameters passed) and invokation of dbManager.getRows() method, but with only one parameter called $sqlStatement

$result = $this->dbManager->getRows($sql)

And here comes the problem:
dbManager->getRows($sql) uses preparedStatement mechanism that works like cache for sql statements. When there is a call to getRows() method with only sqlStatement parameter, getRows() method creates preparedStatement with name containing only caller's class and method names.

When there is another call form the same class::method pair, this caching mechanism uses and executes previously cached preparedStatement. There is no verification if new $sqlStatement is same with version stored in previously created preparedStatement

That's why Nicolas each time gets results for first query used in first getRows() method invocation.

To solve this issue, in src/www/ui/api/Helper/DbHelper/getUpload() we should prepare our own $statementName and pass it while invoking getRows() method.

It could be something like: REST_API_GetUploads_userId:3_uploadId:21. Then, when there is a method call for the same values, we will still use preparedStatament caching mechanism.

What do you think about this?
I can prepare PullRequest with my fix proposal if it sounds ok :)

As an alternative, we might consider changing this preparedStatement mechanism and change default statementName generation to contain sort of sqlStatement hash.

@piotreke12
Copy link
Contributor

I found a better solution instead of creating $statementName with parameters values in it - I will reuse $params[] array to pass parameters to getRows() method (same as it is in DbHelper/getUsers() method)

I will create a PR for this change as it is quite crucial for us

@YanickNoblanc
Copy link

Queries like the following
curl --connect-timeout "${CURL_TIMEOUT}" -k -s -S -X GET "${FOSSURL}/repo/api/v1/search" -H "Authorization: Bearer ${FOSSTOKEN}" -H "filename:${PKGFILENAME}"

Were working fine when using fossology 3.11.0. In order to try solving another problem, I upgraded fossology to 4.2.1.
And now all REST api search requests are failing returning something like this

<script>(function (c) {if (c && c.groupCollapsed) {
c.log("%c% ...
...
l");
}})(console);</script>

Is search function of REST API supposed to work with fossology 4.2.1 ?

@GMishx
Copy link
Member

GMishx commented Mar 10, 2023

And now all REST api search requests are failing returning something like this

<script>(function (c) {if (c && c.groupCollapsed) {
c.log("%c% ...
...
l");
}})(console);</script>

@YanickNoblanc that's not supposed to happen. Can you please share your error.log file?
It is generally located in /var/log/apache2/error.log

@YanickNoblanc
Copy link

YanickNoblanc commented Mar 10, 2023 via email

@GMishx
Copy link
Member

GMishx commented Mar 14, 2023

Thank you for the logs @YanickNoblanc . I will reopn the issue and provide a fix for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants