Full Text Search Finds No Results For External Storage #546

Closed
kleinfelter opened this issue Nov 3, 2019 · 30 comments

@kleinfelter

Full Text Search returns no results from my local external storage, using Nextcloud 17.0 on Ubuntu 18.04.3 with Apache 2.4.29 and PHP 7.2. It does find files in internal storage.

  • I mounted local external storage as the folder ‘documents’.
  • I added 8000+ files to my ‘documents’ folder (not using Nextcloud - just copied them).
  • I installed and configured:
    • Elastic Search
    • Full text search
    • Full text search - Elasticsearch Platform
    • Full text search - Files

Configuration:

  • Address of the servlet = http://localhost:9200
  • Index = nextcloud
  • External Files = Index path and content

I ran occ fulltextsearch:index and waited for it to complete.

Search finds no documents.

The documents ARE indexed:

  • Running curl -XGET 'localhost:9200/nextcloud/_search?q=kevin&pretty' finds lots of documents.

Full Text Search does find documents in internal storage; just not external storage:

  • I uploaded a file to my Nextcloud home folder in the browser, and search can find it.
  • I uploaded a file to my external ‘documents’ folder, and search cannot find it.

The output of your Nextcloud log in Admin > Logging:

No server logs
Everything is working fine

config.php is:

<?php
$CONFIG = array (
  'trusted_domains' =>
  array (
    0 => 'private1',
    1 => 'private2',
  ),
  'memcache.local' => '\\OC\\Memcache\\APCu',
  'trusted_proxies' =>
  array (
    0 => '192.168.1.1',
    1 => '192.168.1.10',
    2 => '127.0.0.1',
  ),
  'overwriteprotocol' => 'https',
  'overwritewebroot' => '',
  'overwritecondaddr' => '^192.168.1.10$',
  'forwarded_for_headers' =>
  array (
    0 => 'HTTP_X_FORWARDED_FOR',
  ),
  'onlyoffice' =>
  array (
    'verify_peer_off' => true,
  ),
  'session_lifetime' => 21600,
  'instanceid' => '12345',
  'passwordsalt' => '12345',
  'secret' => '23456',
  'datadirectory' => '/var/www/html/nextcloud/data',
  'dbtype' => 'mysql',
  'version' => '17.0.0.9',
  'overwrite.cli.url' => 'http://boxtop',
  'dbname' => 'nextcloud',
  'dbhost' => '127.0.0.1:3306',
  'dbport' => '',
  'dbtableprefix' => 'oc_',
  'mysql.utf8mb4' => true,
  'dbuser' => 'private3',
  'dbpassword' => 'private4',
  'installed' => true,
  'maintenance' => false,
  'app_install_overwrite' =>
  array (
    0 => 'onlyoffice',
    1 => 'files_clipboard',
    2 => 'files_external_gdrive',
  ),
  'twofactor_enforced' => 'true',
  'twofactor_enforced_groups' =>
  array (
  ),
  'twofactor_enforced_excluded_groups' =>
  array (
  ),
  'mail_from_address' => 'email_relay',
  'mail_smtpmode' => 'smtp',
  'mail_sendmailmode' => 'smtp',
  'mail_domain' => 'example.com',
  'mail_smtpauthtype' => 'LOGIN',
  'mail_smtpauth' => 1,
  'mail_smtphost' => 'smtp.gmail.com',
  'mail_smtpport' => '587',
  'mail_smtpname' => 'private@example.com',
  'mail_smtppassword' => 'veryprivate',
  'mail_smtpsecure' => 'tls',
);
@kleinfelter
Author

The user interface doesn't return search results from local external storage on NC 16 either.

It is frustrating, because I can see the results when I use curl, but I can't get them into the user interface. (My users aren't going to be running curl by hand!)

@wiphi

wiphi commented Dec 1, 2019

Same problem here. Indexing with ./occ fulltextsearch:index and searching via curl for docs on external storage works fine.
Searching in Nextcloud displays no results from external storages, only from local files.
Setup:

  • Ubuntu 18 LTS
  • NC 17
  • PHP 7.2
  • Fulltextsearch with Elasticsearch

Does anyone have any suggestions? Thanks in advance.

@tomthecat

Same here, but with shared folders...

Indexing with ./occ fulltextsearch:index and searching via curl works as expected.
Searching in NC 17 (on CentOS) doesn't return any results, not even a message like "no results found".

Since I use FTS in a production environment, I am very interested in a solution.

Thanks in advance.

@wiphi

wiphi commented Dec 16, 2019

OK, I debugged for a while and I think there is a problem with the owner check in the search request.
In the file apps/fulltextsearch_elasticsearch/lib/Service/SearchMappingService.php, the function:

	private function generateSearchQueryAccess(IDocumentAccess $access): array {
		$query = [];
		$query[] = ['term' => ['owner' => $access->getViewerId()]];
		$query[] = ['term' => ['users' => $access->getViewerId()]];
		$query[] = ['term' => ['users' => '__all']];
		foreach ($access->getGroups() as $group) {
			$query[] = ['term' => ['groups' => $group]];
		}
		foreach ($access->getCircles() as $circle) {
			$query[] = ['term' => ['circles' => $circle]];
		}
		return $query;
	}

https://github.com/nextcloud/fulltextsearch_elasticsearch/blob/7dad22ec36df258eaeb2759080ebce3d1857206d/lib/Service/SearchMappingService.php#L318-334

builds a search request for the logged-in user (owner). Files on external storage don't have an owner (the owner is an empty string). I can provide a PR, but I'm unsure about the security risks.

Anyone interested in a workaround can fix this by adding a single line of code:

	private function generateSearchQueryAccess(IDocumentAccess $access): array {
		$query = [];
		$query[] = ['term' => ['owner' => $access->getViewerId()]];
		// add this:
		$query[] = ['term' => ['owner' => '']]; // files on external storage have no owner information
		$query[] = ['term' => ['users' => $access->getViewerId()]];
		$query[] = ['term' => ['users' => '__all']];
		foreach ($access->getGroups() as $group) {
			$query[] = ['term' => ['groups' => $group]];
		}
		foreach ($access->getCircles() as $circle) {
			$query[] = ['term' => ['circles' => $circle]];
		}
		return $query;
	}

In my case the solution works and I can search for files on external storages 👍.
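
To see why the extra clause changes the outcome, here is a small Python model of the access filter. It is only a sketch, not the app's code: it assumes the generated term clauses are OR-combined (a document is visible if any one of them matches), and the function name matches_access, the allow_empty_owner flag, the user "alice" and the group "staff" are all made up for illustration.

```python
def matches_access(doc, viewer_id, groups=(), circles=(), allow_empty_owner=False):
    """Return True if any access clause matches the indexed document.

    Models generateSearchQueryAccess() under the ASSUMPTION that its
    term clauses are OR-combined. `allow_empty_owner` stands in for the
    extra ['owner' => ''] clause from the workaround.
    """
    clauses = [
        doc.get("owner") == viewer_id,                       # term: owner == viewer
        viewer_id in doc.get("users", []),                   # term: users contains viewer
        "__all" in doc.get("users", []),                     # term: users contains '__all'
        any(g in doc.get("groups", []) for g in groups),     # one term per group
        any(c in doc.get("circles", []) for c in circles),   # one term per circle
    ]
    if allow_empty_owner:
        clauses.append(doc.get("owner") == "")               # workaround clause
    return any(clauses)


# Illustrative access fields for a file on external storage:
# empty owner, no users, a single empty group name.
external_doc = {"owner": "", "users": [], "groups": [""], "circles": []}

print(matches_access(external_doc, "alice", groups=["staff"]))                          # False
print(matches_access(external_doc, "alice", groups=["staff"], allow_empty_owner=True))  # True
```

In this model the workaround makes the document match for every viewer, not only for users who can actually reach the mount, which is the security concern mentioned above.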

@tomthecat

Thank you for your investigation. Glad to read that fixing the owner's rights fixes the problem for you.

Unfortunately, it doesn't help in my case. Do you have any advice on how I can track down the root cause?

@wiphi

wiphi commented Dec 17, 2019

I think it's an access problem, too.
Try the following: run
curl -XGET 'localhost:9200/nextcloud/_search?q=[search]' > search.json on your NC server and post the output here. The JSON contains the plain search result from Elasticsearch, without access checks. I'm interested in the owner field. In my case it's empty.
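
A minimal Python sketch for pulling just the access-related fields out of such a response, so they are easier to compare. It assumes the response was saved as valid JSON and has the usual hits.hits[]._source layout; the helper names access_summary and print_access_summary are made up for this example.

```python
import json


def access_summary(response):
    """Return (id, source, owner, users, groups) for each hit in a _search response."""
    rows = []
    for hit in response.get("hits", {}).get("hits", []):
        src = hit.get("_source", {})
        rows.append((
            hit.get("_id"),
            src.get("source"),
            src.get("owner"),
            src.get("users", []),
            src.get("groups", []),
        ))
    return rows


def print_access_summary(path):
    """Load a saved _search response from `path` and print one line per hit."""
    with open(path, encoding="utf-8") as fh:
        response = json.load(fh)
    for doc_id, source, owner, users, groups in access_summary(response):
        print(f"{doc_id}: source={source!r} owner={owner!r} users={users} groups={groups}")
```

Calling print_access_summary("search.json") after running the curl command prints one line per hit; an empty owner shows up as owner=''.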

@tomthecat

I adapted your search query to curl -XGET 'localhost:9200/nextcloud/_search?q=[search term]&pretty' > search.json and received:

{
  "took" : 88,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 10.184617,
    "hits" : [
      {
        "_index" : "nextcloud",
        "_type" : "standard",
        "_id" : "files:21831",
        "_score" : 10.184617,
        "_source" : {
          "owner" : "",
          "groups" : [
            "Allgemein"
          ],
          "circles" : [ ],
          "metatags" : [
            "files_group_folders"
          ],
          "source" : "files_group_folders",
          "title" : "path/to/pdf",
          "users" : [ ],
          "content" : (skipped content),
          "tags" : [ ],
          "attachment" : {
            "date" : "2010-09-29T07:53:37Z",
            "content_type" : "application/pdf",
            "author" : "pdfmaster",
            "language" : "de",
            "title" : "filename.ppt",
            "content_length" : 9935
          },
          "provider" : "files",
          "subtags" : [ ],
          "parts" : {
            "comments" : ""
          },
          "links" : [ ],
          "share_names" : {
            "USR1" : "path/to/pdf",
            "USR2" : "path/to/pdf",
            "USR3" : "path/to/pdf",
            "USR4" : "path/to/pdf"
          },
          "hash" : "4ddb49406b8c6ddccec2ba844e92effc"
        }
      }
    ]
  }
}

It seems you are right. But your fix didn't help.

@wiphi

wiphi commented Dec 18, 2019

        "_source" : {
          "owner" : "",
          "groups" : [
            "Allgemein"
          ],
          "circles" : [ ],
          "metatags" : [
            "files_group_folders"
          ],
          "source" : "files_group_folders",

Maybe there is another problem with group folders... In my case the groups and metatags are:

      "groups" : [
        ""
      ],
      "metatags" : [
        "files_external"
      ],

Is your user a member of "Allgemein"?

@theroch

theroch commented Jan 26, 2020

I can confirm this problem with Nextcloud 16.0.7 and fulltextsearch 1.3.6.
Searching with curl -XGET 'localhost:9200/nextcloud/_search?q=[search term]&pretty' works fine.
Searching with Kibana works too.

But the search in Nextcloud shows only the number of results, not the results themselves.
Your patch doesn't work for me.

But I think my current problem has to do with the Nextcloud core. The fulltextsearch 1.3.6 releases are all from July or August 2019, and search worked for my user account until December 2019. I think it stopped working for me with the 16.0.7 update.
Even until December it only worked for my user; all other users never saw search results for external storage. So once I solve my current problem, I will try your patch again for the other users.

@theroch

theroch commented Jan 30, 2020

Today I installed fulltextsearch 1.3.7, and search results are showing up again for my user.
I will test it for the other users in the next few days.

@theroch

theroch commented Feb 1, 2020

I installed fulltextsearch 1.3.8 today, while using files_fulltextsearch 1.3.6, fulltextsearch_elasticsearch 1.3.6 and Nextcloud 16.0.7.
Search now works for all my users without problems.

@wiphi

wiphi commented Feb 1, 2020

Hmm, I've updated my NC installation (17.0.3) to Full text search version 1.3.8 and Full text search - Elasticsearch Platform 1.4.1, and I get the following result:
[Screenshot: 2020-02-01_16_06_10-Dateien_Nextcloud]
It seems that NC finds something but shows no results...

@R0Wi
Member

R0Wi commented Feb 9, 2020

I'm on Full text search 1.4.1 and the Elasticsearch platform 1.5 with NC 18, and I'm also seeing that external files (a Samba share in my case) are not found. As @wiphi stated, there is a problem when checking the owner of a file, because external files are indexed without an owner. If I add the mentioned line of code, or if I manually edit the Elasticsearch document (setting my user as owner of the external file), everything works fine.

In my opinion the code is correct but lacks a sufficient check for external files. So there should be a check whether

  • the file is stored in an external storage (source="files_external") and
  • the owner is empty and
  • an external storage that the current user is allowed to (at least) read is part of the title field. (E.g. my user is allowed to read the Samba share MyShare and the title of the document is MyShare/MySubfolder/MyFile.pdf)

If all conditions match, the document should be taken into account for the current search.

By the way, it's definitely not enough to check only whether the owner field is empty. That would let all users see results for external files that they may not be allowed to read.
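
The three conditions above could be sketched roughly like this in Python. This is an illustration of the proposed check, not code from the app: the function name external_doc_visible and the readable_mounts argument are hypothetical, and it assumes the mount point name is the leading path segment of the title, as in the MyShare example.

```python
def external_doc_visible(doc, readable_mounts):
    """Apply the three proposed conditions to one indexed document.

    `readable_mounts` is a HYPOTHETICAL collection of mount point names
    the current user may read; how to obtain it is not covered here.
    """
    if doc.get("source") != "files_external":
        return False
    if doc.get("owner") != "":
        return False
    title = doc.get("title", "")
    # Compare whole path segments so that a mount named "Musik" does not
    # accidentally match a title stored under "Musik2".
    return any(
        title == mount or title.startswith(mount.rstrip("/") + "/")
        for mount in readable_mounts
    )


doc = {
    "source": "files_external",
    "owner": "",
    "title": "MyShare/MySubfolder/MyFile.pdf",
}
print(external_doc_visible(doc, ["MyShare"]))     # True
print(external_doc_visible(doc, ["OtherShare"]))  # False
```

Checking only the empty owner would correspond to dropping the last condition, which is exactly what would expose the document to users without access to the share.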

@jtitov

jtitov commented Feb 10, 2020

Hi!
I can also confirm problems with the full text search component.
tcpdump on port 9200 shows no traffic when I submit the search form:
[image]
But when I run a search via occ fulltextsearch:test, tcpdump does show traffic:
[image]

curl "http://localhost:9200/my_index/_search?q=test&pretty"
{
"took" : 2287,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 10.02721,
"hits" : [
{
"_index" : "my_index",
"_type" : "standard",
"_id" : "files:710884",
"_score" : 10.02721,
"_source" : {
"owner" : "9F284F50-C2AD-420D-AAF4-822695D2BBEB",

I have installed NC 17.0.3 (stable) and fulltextsearch v1.3.8

@R0Wi
Member

R0Wi commented Feb 10, 2020

@jtitov for me that sounds like a different kind of problem. In my case the request is properly sent to the elastic server but with "incomplete" parameters. The situation you mentioned is different because it seems that no request is sent from the NC php-backend to the elasticsearch-server. So maybe it would be worth opening a different issue?

@jtitov

jtitov commented Feb 11, 2020

> @jtitov for me that sounds like a different kind of problem. In my case the request is properly sent to the elastic server but with "incomplete" parameters. The situation you mentioned is different because it seems that no request is sent from the NC php-backend to the elasticsearch-server. So maybe it would be worth opening a different issue?

@R0Wi, the description of the problem says that the problem is with both local and external storage. I think it is the same application problem. Can you show your tcpdump output?

@R0Wi
Member

R0Wi commented Feb 12, 2020

@jtitov for me it sounds like a connection problem between your webserver (NC instance) and elasticsearch server. The occ fulltextsearch:test command is executed via php cli so it can behave differently sometimes. Did you check your webserver and NC log files?

@jtitov

jtitov commented Feb 12, 2020

@R0Wi, yes, I inspected all the logs and they show no errors or warnings. That is why I started looking at the TCP traffic, where I don't see any packets.
I kept investigating: I printed the result of the search function in the SearchService class and saw that it returns null for any search. Inside search, the call to providerService->getFilteredProviders($request) also returns null. I believe the problem lies somewhere deeper.

@lhurt

lhurt commented Mar 7, 2020

Seems to be a duplicate of #301

@ravermeister

Hi, any news on this issue? I have the same symptoms. Files that belong directly to the user work fine.
I installed and configured Elasticsearch successfully.

  • fulltextsearch:check > no problems
  • fulltextsearch:index > no problems
  • fulltextsearch:live > didn't do that (where is the difference to the one above?)

output of fulltextsearch:check:

www-data@virusrockpro:~/cloud.rimkus.it$ php occ fulltextsearch:check
Full text search 1.4.2
 
- Search Platform:
Elasticsearch 1.5.2
{
    "elastic_host": [
        "http://192.168.1.12:9200"
    ],
    "elastic_index": "rimkuscloud",
    "fields_limit": "10000",
    "es_ver_below66": "0",
    "analyzer_tokenizer": "standard"
} 
 
- Content Providers:
Files 1.4.3
{
    "files_local": "1",
    "files_external": "1",
    "files_group_folders": "1",
    "files_encrypted": "0",
    "files_federated": "0",
    "files_size": "20",
    "files_pdf": "1",
    "files_office": "1",
    "files_image": "0",
    "files_audio": "0",
    "files_fulltextsearch_tesseract": {
        "version": "1.4.2",
        "enabled": "1",
        "psm": "4",
        "lang": "eng,ger",
        "pdf": "0",
        "pdf_limit": "0"
    }
}

A search via php occ fulltextsearch:search ravermeister kalkbrenner does not return any results:

www-data@virusrockpro:~/cloud.rimkus.it$ php occ fulltextsearch:search ravermeister kalkbrenner
search
> Files

But searching via Elasticsearch directly returns a list of files (I shortened the result here):

 curl -XGET '192.168.1.12:9200/rimkuscloud/_search?q=kalkbrenner&pretty=true'
{
  "took" : 11,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 97,
      "relation" : "eq"
    },
    "max_score" : 10.674267,
    "hits" : [
      {
        "_index" : "rimkuscloud",
        "_type" : "standard",
        "_id" : "files:245709",
        "_score" : 10.674267,
        "_source" : {
          "share_names" : [ ],
          "owner" : "",
          "users" : [ ],
          "groups" : [ ],
          "circles" : [ ],
          "links" : [ ],
          "metatags" : [
            "files_external"
          ],
          "subtags" : [ ],
          "tags" : [ ],
          "hash" : "",
          "provider" : "files",
          "source" : "files_external",
          "title" : "Musik/Alben/Paul Kalkbrenner - Maximalive/Kalkbrenner - Maximalive CD Cover.jpg",
          "parts" : {
            "comments" : ""
          },
          "content" : ""
        }
      },
      {
        "_index" : "rimkuscloud",
        "_type" : "standard",
        "_id" : "files:232042",
        "_score" : 10.185858,
        "_source" : {
          "share_names" : [ ],
          "owner" : "",
          "users" : [ ],
          "groups" : [ ],
          "circles" : [ ],
          "links" : [ ],
          "metatags" : [
            "files_external"
          ],
          "subtags" : [ ],
          "tags" : [ ],
          "hash" : "",
          "provider" : "files",
          "source" : "files_external",
          "title" : "Musik/Andy Mantel/Paul Kalkbrenner - Bingo Bongo (2008)/Paul Kalkbrenner - Sky and Sand (Live).mp3",
          "parts" : {
            "comments" : ""
          },
          "content" : ""
        }
      },
      {
        "_index" : "rimkuscloud",
        "_type" : "standard",
        "_id" : "files:232041",
        "_score" : 10.093491,
        "_source" : {
          "share_names" : [ ],
          "owner" : "",
          "users" : [ ],
          "groups" : [ ],
          "circles" : [ ],
          "links" : [ ],
          "metatags" : [
            "files_external"
          ],
          "subtags" : [ ],
          "tags" : [ ],
          "hash" : "",
          "provider" : "files",
          "source" : "files_external",
          "title" : "Musik/Andy Mantel/Paul Kalkbrenner - Bingo Bongo (2008)/Paul Kalkbrenner - Sky and Sand (Bootleg Mix).mp3",
          "parts" : {
            "comments" : ""
          },
          "content" : ""
        }
      },
..... 

Permissions for the external storage look fine to me:

www-data@virusrockpro:/media/icy1/daten$ ls -ld musik
drwxrwxr-x 58 root users 4096 Sep  9  2018 musik
www-data@virusrockpro:/media/icy1/daten$ id www-data
uid=33(www-data) gid=33(www-data) groups=33(www-data),100(users),124(redis),134(clamav)

The external storage folder is mounted as Musik in Nextcloud. What am I doing wrong?
Thanks in advance and kind regards
Jonny

@lhurt

lhurt commented Sep 7, 2020

...

  • fulltextsearch:check > no problems
  • fulltextsearch:index > no problems
  • fulltextsearch:live > didn't do that (where is the difference to the one above?)

...

fulltextsearch:live is for updates of the index, see here

@lhurt

lhurt commented Sep 7, 2020

@daita
This issue seems to be a duplicate of #301
Wouldn't it make sense to merge them?

And maybe it would be nice to hear about any plan for how Nextcloud intends to handle this issue, which has existed in the code for more than 2 years now.
Nextant, which was able to handle external storage, was dropped, and now most of us with this problem feel left alone.

Thanks a lot.

@R0Wi
Member

R0Wi commented Sep 7, 2020

@lhurt @ravermeister I implemented a fix for this in nextcloud/fulltextsearch_elasticsearch#100. From the perspective of a modular software design, a few things are currently mixed up and we need a plan for implementing this properly, but it works for the moment.
Note that this only fixes problems with the search results themselves, so if you have problems indexing your external files, that would be a different issue.

@ravermeister

ravermeister commented Sep 7, 2020

@R0Wi thanks, but I couldn't get your version to run with the current Nextcloud:

www-data@virusrockpro:~/cloud.rimkus.it$ php occ status
  - installed: true
  - version: 19.0.2.2
  - versionstring: 19.0.2
  - edition: 
www-data@virusrockpro:~/cloud.rimkus.it$ php occ app:enable fulltextsearch_elasticsearch
App "Full text search - Elasticsearch Platform" cannot be installed because it is not compatible with this version of the server.

www-data@virusrockpro:~/cloud.rimkus.it$ cd apps/fulltextsearch_elasticsearch
www-data@virusrockpro:~/cloud.rimkus.it/apps/fulltextsearch_elasticsearch$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
www-data@virusrockpro:~/cloud.rimkus.it/apps/fulltextsearch_elasticsearch$ 
www-data@virusrockpro:~/cloud.rimkus.it/apps/fulltextsearch_elasticsearch$ git remote -v
origin	https://github.com/R0Wi/fulltextsearch_elasticsearch.git (fetch)
origin	https://github.com/R0Wi/fulltextsearch_elasticsearch.git (push)
www-data@virusrockpro:~/cloud.rimkus.it/apps/fulltextsearch_elasticsearch$ 

@R0Wi
Member

R0Wi commented Sep 7, 2020

@ravermeister sorry about that; this is because the PR is a bit older now and NC had a major upgrade in the meantime. For a quick fix you could just change max-version to 19 here, or you could instead check out the current stable version v1.5.2 from the official repo and apply my patch as I described here.

@ravermeister

ravermeister commented Sep 7, 2020

@R0Wi thanks, applying the patch to the 1.5.2 branch/tag worked, but

www-data@virusrockpro:~/cloud.rimkus.it$ php occ fulltextsearch:search ravermeister kalkbrenner
search
> Files
www-data@virusrockpro:~/cloud.rimkus.it$ 

Still no results. Do I have to re-index even though
curl -XGET '192.168.1.12:9200/rimkuscloud/_search?q=kalkbrenner&pretty=true'
prints results?

Update: searching inside nextcloud works now. I don't understand but Thanks 👍

@ravermeister

ravermeister commented Sep 7, 2020

Hmm, but something is still weird. I have the following scenario:

External Folder          | Mount Point | Comment
/media/icy1/daten/musik  | Musik       | scanned, and I can find files after the patch
/media/icy1/daten/musik2 | Musik2      | not scanned, and no results before or after the patch
/media/icy1/daten/videos | Videos      | scanned, and I can find files after the patch

Why is musik2 not scanned? Is it because part of the folder name is the name of an already scanned folder?
What can I do to investigate?

Update: adding a new file (tested with .md file) seems to work (I ran the php occ fulltextsearch:live during the creation)
doing a re-index now. Will report back.

The Re-indexing did it. 👍

Thanks in advance

@claudiopolis

For those still facing this issue:
For me (NC19 installed as a plugin on FreeNAS, with the external files mounted as local storage via the jail mount point mechanism), the key was not to leave the "Available for" field blank (which, as indicated, makes the storage available to all users). Once I added the admin group and the default admin user and re-indexed, I got search results, even though the curl results still showed the owner as blank (but not the group).

@R0Wi
Member

R0Wi commented Sep 23, 2020

Interesting workaround :-) So if the group setting is explicitly set for the external files provider (I guess the SMB provider in that case), this setting is reflected in the Elasticsearch groups property and is therefore taken into account at indexing time?

I will test that too, but from my point of view it would be nicer to check permissions for external files at the moment the search request is created (see #546 (comment)). Otherwise you would have to run a full reindex after any change to permissions.
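
The trade-off can be illustrated with a toy Python model. It assumes, as guessed above and not confirmed, that the mount's "Available for" groups are copied into the document's groups field at index time; the names index_document and visible_to, the groups and the file title are invented for this sketch.

```python
def index_document(title, mount_groups):
    """Snapshot the mount's groups into the indexed document (ASSUMED behaviour)."""
    return {"owner": "", "groups": list(mount_groups), "title": title}


def visible_to(doc, viewer_groups):
    """Group clause only: visible if the viewer shares a group with the document."""
    return any(g in doc["groups"] for g in viewer_groups)


mount_groups = ["admin"]
doc = index_document("MyShare/report.pdf", mount_groups)
print(visible_to(doc, ["admin"]))  # True: matched via the group clause

# The mount is later restricted to another group, but the index is not rebuilt.
mount_groups = ["staff"]
print(visible_to(doc, ["admin"]))  # still True: the indexed copy is stale

doc = index_document("MyShare/report.pdf", mount_groups)  # full reindex
print(visible_to(doc, ["admin"]))  # False
```

A check performed when the search request is built would read the current mount permissions instead of the indexed snapshot, so the middle case would not occur.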

@R0Wi
Member

R0Wi commented Oct 23, 2022

Seems to be fixed in version 24. Please see nextcloud/fulltextsearch_elasticsearch#100 (comment) and comment if the issue still exists.

@R0Wi R0Wi closed this as completed Oct 23, 2022