Fulltextsearch doesn't add new documents from external cifs/samba storage #301

Open
lhurt opened this issue Apr 19, 2018 · 47 comments

Comments

@lhurt

lhurt commented Apr 19, 2018

Hi,
maybe this is just a configuration issue.
I have been using Nextcloud/ownCloud for a long time, at least since version 6. I'm now on Nextcloud 13.0.1 and recently (on version 12) switched from Nextant to Elasticsearch.

Indexing of local files stored in Nextcloud itself works like a charm. But when it comes to indexing files on externally mounted SMB shares, new files are not recognized as new and thus not indexed.

Just a few remarks about my installation.

OS: Ubuntu 16.04 (i7-3773 @3.4 Ghz, 32GB RAM)
DB: Postgres 10
Webserver: Nginx

Elasticsearch 6.2.3 in Docker container with indices on SSD

Nextcloud cron selected for background job processing (not Ajax or webcron)

crontab
*/15  *  *  *  * www-data php -f /var/www/nextcloud/cron.php
*/17  *  *  *  * www-data php -f /var/www/nextcloud/occ files:scan --all --quiet
systemd daemon for live indexing
[Unit]
Description=Elasticsearch Worker for Nextcloud Fulltext Search
After=network.target

[Service]
User=www-data
Group=www-data
WorkingDirectory=/var/www/nextcloud
ExecStart=/usr/bin/php /var/www/nextcloud/occ fulltextsearch:live
Nice=19
Restart=always

[Install]
WantedBy=multi-user.target

All fulltext apps on most recent version 0.61 except Full text search - Files which is on 0.60 due to the lack of a newer version

Any help would be greatly appreciated.

Many thanks in advance.

@zafai

zafai commented Apr 20, 2018

Hey @lhurt
It looks like a known problem:
#250

@lhurt lhurt closed this as completed Apr 25, 2018
@lhurt
Author

lhurt commented Apr 25, 2018

Thanks for the information. We'll see after the update.

@lhurt lhurt reopened this Apr 25, 2018
@ferdiga

ferdiga commented Apr 30, 2018

Hi
I think there is a new, unexplained term, "workingdirectory".
I assume this is where the Nextcloud code resides, as opposed to "datadirectory", where the data is.

@ArtificialOwl
Member

Please keep me updated after the release of fulltextsearch 0.7 (within the next few days).

@lhurt
Author

lhurt commented May 21, 2018

Works for me now with 0.72.

@lhurt lhurt closed this as completed May 21, 2018
@lhurt
Author

lhurt commented May 21, 2018

Sorry. I tested only by adding files to the external storage in Nextcloud via the web interface. This works. If I add files to a Samba share from a Windows client, they are not indexed.

@lhurt lhurt reopened this May 21, 2018
@lhurt lhurt changed the title Fulltextsearch 0.61 doesn't add new documents from external storage (samba) Fulltextsearch 0.61 nd 0.72 doesn't add new documents from external storage (samba) May 22, 2018
@lhurt lhurt changed the title Fulltextsearch 0.61 nd 0.72 doesn't add new documents from external storage (samba) Fulltextsearch 0.61 and 0.72 doesn't add new documents from external storage (samba) May 22, 2018
@ArtificialOwl
Member

This might be an issue with the sync, and/or the event for a new file is not dispatched; therefore fulltextsearch is not aware that a new file has been uploaded.

@lhurt
Author

lhurt commented May 24, 2018

How can this be solved? I'd expect this to be a very common use case.

@Sanookmakmak

If I add files to a samba share on a windows client it's not indexed.

@lhurt did you tell your cifs client that it should use version 2?

@lhurt
Author

lhurt commented May 27, 2018

@Sanookmakmak I suppose you mean the smb protocol version? I set it to smb2 but the issue remains.

@lhurt
Author

lhurt commented May 29, 2018

@Sanookmakmak Now I even went to smb3 and still the issue remains.

@lhurt lhurt changed the title Fulltextsearch 0.61 and 0.72 doesn't add new documents from external storage (samba) Fulltextsearch 0.61, 0.72 and 0.80 doesn't add new documents from external storage (samba) Jun 15, 2018
@lhurt
Author

lhurt commented Jun 15, 2018

Upgraded to 13.0.4 and unfortunately still no improvement

@lhurt
Author

lhurt commented Sep 30, 2018

Upgraded to 14.0.1 and fulltextsearch 0.99.2/3/4 and still no solution

Current steps to reproduce

  1. Store any text file in a directory located on the Samba share
  2. Delete the index
  3. Rebuild the index
  4. Start live indexing
  5. Copy the file from step 1 into the same directory under a different name
  6. Wait 1 day
  7. Search for text contained in the text file from step 1
  8. Result: only one hit, with the filename of the file from step 1

Expected:

2 hits. Files from step 1 and step 5
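
Steps 1 and 5 can be scripted so that every run searches for a fresh, unique token, which rules out stale hits from earlier attempts. This is only a minimal sketch: the function names, the token format and the file names are made up for illustration, and the directory argument is whatever local path backs the Samba share.

```shell
# Hypothetical helpers for steps 1 and 5 of the reproduction above.

# Step 1: write a text file containing a unique token and print the token.
create_probe() {
    dir="$1"
    token="ftsprobe$(date +%s)$$"
    printf 'fulltextsearch probe %s\n' "$token" > "$dir/probe_original.txt"
    printf '%s\n' "$token"
}

# Step 5: copy the probe under a different name in the same directory.
copy_probe() {
    dir="$1"
    cp "$dir/probe_original.txt" "$dir/probe_copy.txt"
}
```

Call `create_probe` before deleting and rebuilding the index, call `copy_probe` once live indexing is running, then search for the printed token. Two hits are expected.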

Very disappointing.

@lhurt lhurt closed this as completed Sep 30, 2018
@lhurt
Author

lhurt commented Oct 3, 2018

Unintentionally closed. The issue still persists even with version 1.0

@lhurt lhurt reopened this Oct 3, 2018
@lhurt lhurt changed the title Fulltextsearch 0.61, 0.72 and 0.80 doesn't add new documents from external storage (samba) Fulltextsearch 0.61, 0.72,0.80, 0.99.3 and 1.0 doesn't add new documents from external storage (samba) Oct 3, 2018
@lhurt
Author

lhurt commented Dec 20, 2018

With version 1.2.3 the issue seems to be resolved and everything is working as expected so far.

@lhurt lhurt closed this as completed Dec 20, 2018
@lhurt
Author

lhurt commented Apr 7, 2019

Have to reopen it.

With Nextcloud 15.0.6, Full text search 1.2.5 and Full text search - Files 1.2.6, files are not added to the index when they are created directly on the file system, e.g. when the folder is mounted as a Windows drive.

Did a complete reinstall/reindex to verify that it's not corrupted leftover data.
and yes, the entry
d /var/run/samba 2775 root www-data - -
is also present in /usr/lib/tmpfiles.d/samba.conf of the host system

I don't understand why it doesn't work.

@lhurt lhurt reopened this Apr 7, 2019
@lhurt lhurt changed the title Fulltextsearch 0.61, 0.72,0.80, 0.99.3 and 1.0 doesn't add new documents from external storage (samba) Fulltextsearch doesn't add new documents from external cifs storage Apr 7, 2019
@lhurt lhurt changed the title Fulltextsearch doesn't add new documents from external cifs storage Fulltextsearch doesn't add new documents from external cifs/samba storage Apr 7, 2019
@lhurt
Author

lhurt commented May 17, 2019

New update 16.0.1 didn't improve the situation. Same behavior.

@ArtificialOwl
Member

@icewind1991 would you mind having a look?

@theroch

theroch commented May 22, 2019

You are using Ubuntu 16.04 with smb protocol v2 or v3?
Maybe this is related to icewind1991/SMB/issues/56

@lhurt
Author

lhurt commented May 22, 2019

Of course I use SMB > 1, as SMB1 is deprecated and Windows will only connect with workarounds that I don't want to apply.

Here's my relevant part of smb.conf

---- snip ----
client min protocol = SMB2
client max protocol = SMB3
---- snip ----

@lhurt
Author

lhurt commented May 22, 2019

Nevertheless, I just noticed that there may be an issue due to the fact that I'm using Docker. My installation is based on the fpm image, which is itself based on Debian stretch. There the smbclient version is 4.5.16, which is far behind the current 4.10.

I'll try changing the base image to fpm-alpine, which should have a 4.10 smbclient; maybe this solves it. As soon as I have results I'll post them.
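
For anyone who wants to check the same thing inside their own container, here is a small sketch of a version comparison. It assumes GNU `sort` with `-V` is available and that `smbclient --version` prints a line shaped like `Version 4.5.16-Debian`; both function names are made up for this example.

```shell
# Extract the numeric version from a line such as "Version 4.5.16-Debian"
# (the output format is an assumption, check it on your own system).
parse_smbclient_version() {
    sed -n 's/^Version \([0-9][0-9.]*\).*/\1/p'
}

# Succeed if version $1 is greater than or equal to version $2.
# sort -V compares numerically per component, so 4.10 sorts after 4.5.16.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}
```

A possible use would be `version_ge "$(smbclient --version | parse_smbclient_version)" 4.10 || echo "smbclient is older than 4.10"`. A plain string comparison would get this wrong, because "4.5.16" sorts after "4.10" lexically.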

@theroch

theroch commented May 22, 2019

But the notify problem only affects you if you use occ files_external:notify.
As far as I can see, you are using cron with files:scan and you don't use notify as described under External Storage SMB/CIFS.

@lhurt
Author

lhurt commented Jun 13, 2019

This issue has now existed for over 1 year and it seems like I'm the only one having this problem. Is this really the case? Is it such an extraordinary use case?

@lhurt
Author

lhurt commented Jul 24, 2019

Still doesn't work with fulltext 1.3.6 and fulltext_files 1.3.5

Does anyone else have this working?

@theroch

theroch commented Jul 24, 2019

I can confirm this problem with nextcloud 16.0.3 and fulltext 1.3.6 and fulltext_files 1.3.5.
If I create a new file directly on the share this file is not indexed. occ files_external:notify is running and live index is running too.

@theroch

theroch commented Jul 26, 2019

I tested something one time and created a new file. The file doesn't seem to be updated via notify, but after about 24h the file was included in the fulltextsearch.

@Sx3

Sx3 commented Oct 3, 2019

Same problem with N16.
I'm going to test N17 and will post the results here.

@ArtificialOwl
Member

@Sx3 That would be nice. Please keep us updated.

@icewind1991, who is the one working on external storage, told me it should work flawlessly on NC17 if the cifs/samba is well configured

@Sx3

Sx3 commented Oct 4, 2019

Okay I have tested Fulltext Search with Nextcloud 17.0.0...
Here is my set up.
CentOS 7
PHP 7.3.10
Apache 2.4.6
MariaDB Server 5.5.64
Elasticsearch 7.4.0
OpenJDK Runtime 1.8.0_222

Samba Settings
min protocol = SMB2

Config on /usr/lib/tmpfiles.d/samba.conf is updated with
d /var/run/samba 2775 root apache - -

And live search is enabled

[Unit]
Description=Elasticsearch Worker for Nextcloud Fulltext Search
After=network.target

[Service]
User=apache
Group=apache
WorkingDirectory=/var/www/html/nextcloud
ExecStart=/usr/bin/php /var/www/html/nextcloud/occ fulltextsearch:live -q
ExecStop=/usr/bin/php /var/www/html/nextcloud/occ fulltextsearch:stop
Nice=19
Restart=always

[Install]
WantedBy=multi-user.target

Then I have tested the following....
1 - A Local Folder is mapped using External Storage App - folder has following rights

image

2 - Windows network shared folder is mounted into a folder inside vm .

image
NOTE: the permissions allow other users and groups to edit, write, etc.

it was mounted using cifs tools
sudo mount.cifs "//192.168.1.20/Company/search_test" /media/nextcloud_data_mounted/ -o credentials=/home/.smbcredentials,iocharset=utf8,file_mode=0777,dir_mode=0777

3 - A folder on a Windows server is mapped using the SMB/CIFS option of the External storage app.

Uploading, deleting and editing files in all 4 locations (including the data folder) works fine.

After stopping the live full text search service I had created previously, I ran indexing using the following commands:

sudo -u apache php /var/www/html/nextcloud/occ fulltextsearch:stop
sudo -u apache php /var/www/html/nextcloud/occ fulltextsearch:reset
sudo -u apache php /var/www/html/nextcloud/occ fulltextsearch:test (tested everything works ok...)
sudo -u apache php /var/www/html/nextcloud/occ fulltextsearch:index

The indexing completed successfully. PDF and Word files were present in all 3 locations above and in the Nextcloud data folder.

Results: only the files inside the data folder showed up in the search results; there were no results for the 3 locations above.

Then I started the live full text search service and dragged and dropped files into the 4 locations; the same results occurred.

Is anything wrong with my configuration above?
Also

if the cifs/samba is well configured

Can you publish some configs from a working SMB-mapped setup (one where full text search works)? It would be very helpful.

Also I have noticed that the elasticsearch index contains the extracted contents of files in SMB locations.
ex: -
image

Thank you.

@ArtificialOwl
Member

So you're saying that when running fulltextsearch:index, all your files (local and remote) are indexed, but the search is not returning the files from an external folder? Can you check with a search on the filename instead of the content?

When adding a file to the remote filesystem, fulltextsearch:live is not triggered (you should run fulltextsearch:live in a screen instead to have more details). Can you check when adding a file to the local data folder (using the webclient)?

Also, please paste the result from fulltextsearch:check.

@ArtificialOwl
Member

ArtificialOwl commented Oct 4, 2019

note that you can use

  • fulltextsearch:document:provider <userid> files 150
  • fulltextsearch:document:index <userid> files 150
  • fulltextsearch:document:platform files 150

to see how the data is handled from different points of view (provider means the current files, platform is the data from Elasticsearch).
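
To compare the three views for one file in a single step, the commands can be wrapped as in the sketch below. The default `OCC` value (install path and web server user) and the function name are assumptions for illustration; only the three `fulltextsearch:document:*` invocations are taken from the list above.

```shell
# OCC is the command used to invoke occ; the default below is an assumption
# about the install path and web server user, adjust it to your setup.
OCC="${OCC:-sudo -u www-data php /var/www/nextcloud/occ}"

# Print the provider, index and platform views of one file, in that order.
inspect_document() {
    user="$1"
    file_id="$2"
    $OCC fulltextsearch:document:provider "$user" files "$file_id"
    $OCC fulltextsearch:document:index "$user" files "$file_id"
    $OCC fulltextsearch:document:platform files "$file_id"
}
```

For example, `inspect_document <userid> 150` runs the three commands for file ID 150. If the provider view shows the file but the platform view does not, the file never reached Elasticsearch.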

I also think that when you're moving files around on the SMB, a new FileID might be generated

@Sx3

Sx3 commented Oct 7, 2019

so, you're saying that when running fulltextsearch:index all your files (local and remote) are indexed ? but the search is not returning the files from an external folder ?

yes

Can you check with a search on the filename instead of the content ?

It does not return any results when searching by filename either.

when adding a file to the remote filesystem, the fulltextsearch:live is not triggered (you should run fulltextsearch:live in a screen instead to have more details). Can you check when adding a file to the local data folder (using the webclient) ?

Yes, all the new files were added using the webclient. But at the time I ran the index for the first time, there were already some files in all 4 places. I have also tested adding new files to the 4 locations through the web client while live indexing was running.

fulltextsearch:check

image

@lhurt
Author

lhurt commented Oct 11, 2019

Good Morning,
an update from my end. Just moved to NC 17. Issue still exists.

A new behavior is that I get the file change via occ:notify only if there's a smbclient command running inside the php container with notification on the respective directory.

The shell commands were

ludwig@mediacenter:/home/_documents/_ignoriere_mich$ cp test3.txt test4.txt
ludwig@mediacenter:/home/_documents/_ignoriere_mich$ cp test3.txt test6.txt
ludwig@mediacenter:/home/_documents/_ignoriere_mich$ rm test6.txt

The output of occ:notify was

docker exec --user www-data nextcloud_php_fpm php occ files_external:notify -vvvu user -p password 17
Self-test successful
added /_ignoriere_mich/test4.txt
modified /_ignoriere_mich/test4.txt
added /_ignoriere_mich/test6.txt
modified /_ignoriere_mich/test6.txt
removed /_ignoriere_mich/test6.txt

These lines were only printed if an smbclient notify was running at the same time within the container

root@efa04dc51dca:/var/www/html# smbclient -U user //mediacenter/documents password
Try "help" to get a list of possible commands.
smb: \> notify _ignoriere_mich\
0001 test4.txt
0003 test4.txt
0001 test6.txt
0003 test6.txt
0002 test6.txt
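
The two transcripts line up one to one, which suggests how the numeric codes from smbclient map to the labels printed by occ. The sketch below encodes that mapping. It is inferred only from the outputs shown here (0001 = added, 0002 = removed, 0003 = modified), and the function name is made up.

```shell
# Translate smbclient "notify" lines such as "0001 test4.txt" into the labels
# that occ files_external:notify printed in the run above.
# The code-to-label mapping is inferred from the two transcripts.
translate_notify() {
    while read -r code name; do
        case "$code" in
            0001) label="added" ;;
            0002) label="removed" ;;
            0003) label="modified" ;;
            *)    label="unknown($code)" ;;
        esac
        printf '%s %s\n' "$label" "$name"
    done
}
```

Piping the five smbclient lines above through `translate_notify` yields the same sequence of added, modified and removed events that occ reported, minus the directory prefix.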

I'm aware of the note in the admin section of NC17 about SMB update notifications, but the finding above suggests that an smbclient should be running where it currently is not.

@daita: Do you think I'm right? Could this help to find a solution?

Update: One more thing
In the file overview, NC shows the most recently changed files, and what's really staggering is that it lists files created via the local file system. I copy a file like this directly in my shell:
ludwig@mediacenter:/home/_documents/_ignoriere_mich$ cp test3.txt test8.txt
And the system immediately shows it as most recently changed when accessing the file overview
image
/home/_documents is mounted as _Dokumente, and neither occ:notify nor smbclient/notify was running. So the system seems to have information about this change, but fulltextsearch doesn't return any result when searching for the test8.txt filename or its file content.

Maybe this helps to get a better understanding of what's going on.

Thanks a lot

@Sx3

Sx3 commented Dec 6, 2019

Any updates on the issue?

@lhurt
Author

lhurt commented Jan 18, 2020

Same with NC18.
Is my configuration that exceptional?
Is it really so hard to do it?
Am I doing something wrong?

NC knows about the file changes, so why are they not used for indexing?

@trendzetter

trendzetter commented Feb 26, 2020

Same issue for me on Nextcloud 18.0.1. I found that this bug is reported in multiple issues and on the Nextcloud forum, but with very little response so far: https://help.nextcloud.com/t/full-text-search-finds-no-files-on-local-external-storage/62884

  • I can find the nextcloud manual from the sample content when I search for "test"
  • I can find the external files and content when searching directly in Elasticsearch (e.g. curl -XGET 'localhost:9200/nextcloud/_search?q=kevin&pretty'). But I cannot find anything external in Nextcloud full text search.
  • No errors in the logs.

@daita sudo -u www-data php /var/www/nextcloud/occ fulltextsearch:check
Full text search 1.4.1

  • Search Platform:
    Elasticsearch 1.5.0
    {
    "elastic_host": [
    "http://127.0.0.1:9200"
    ],
    "elastic_index": "index_1",
    "fields_limit": "10000",
    "es_ver_below66": "0",
    "analyzer_tokenizer": "standard"
    }

  • Content Providers:
    Files 1.4.1
    {
    "files_local": "1",
    "files_external": "1",
    "files_group_folders": "1",
    "files_encrypted": "0",
    "files_federated": "0",
    "files_size": "20",
    "files_pdf": "1",
    "files_office": "1",
    "files_image": "0",
    "files_audio": "0"
    }

@lhurt
Author

lhurt commented Mar 7, 2020

This is so frustrating.

@R0Wi
Member

R0Wi commented Mar 7, 2020

@daita could it be that the files from external smb are indexed but have no owner and therefore they're not taken into account when searching via NC? If so you might be interested in having a look at my comment here #546 (comment)

@lhurt
Author

lhurt commented Mar 7, 2020

@daita could it be that the files from external smb are indexed but have no owner and therefore they're not taken into account when searching via NC? If so you might be interested in having a look at my comment here #546 (comment)

Don't think so. In my case the problem is not getting results. It works fine after the initial indexing. As soon as there are changes, the live update doesn't detect them. To get the new files into the result list, I have to run a manual indexing first.

@R0Wi
Member

R0Wi commented Mar 7, 2020

@lhurt this sounds indeed like a different problem. Could you check your elastic index after the initial indexing and tell me which owner is written for the external files?

@lhurt
Author

lhurt commented Mar 8, 2020

@R0Wi Just checked and the owner is ""

--- snip ---
"owner":"",
"groups":[
    "Familie",
    "Eltern"
],
--- snip ---
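
To see how widespread the empty owner is across a whole search response rather than a single document, a rough count can be taken with grep. This is only a sketch: the function name is made up, and it does a plain text match on the `"owner"` field shown above, not real JSON parsing.

```shell
# Count how many "owner" fields in the JSON text on stdin are empty strings.
# This is a text match, not a JSON parser: it expects the key and the empty
# value on the same line, as in the snippet above.
count_empty_owners() {
    { grep -o '"owner"[[:space:]]*:[[:space:]]*""' || true; } | wc -l | tr -d ' '
}
```

It could be fed from a direct Elasticsearch query, for example `curl -s 'localhost:9200/nextcloud/_search?q=kevin' | count_empty_owners`, where the host, index name and search term are taken from the curl call earlier in this thread and will differ per installation.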

@lhurt
Author

lhurt commented Sep 17, 2020

Looks like PR #100 in fulltext_elasticsearch will solve this issue.

@lhurt
Author

lhurt commented Jul 27, 2021

Looks like PR #100 in fulltext_elasticsearch will solve this issue.

Unfortunately merging this PR seems to be a very time consuming task. So I still have hope, but not too much. For me it's surprising that so few people are complaining.

@mudi0

mudi0 commented Mar 13, 2023

I have the same issue: after running a manual index, the files can be found via the search box, but new files do not come up.
I have to run fulltextsearch:index again.

I have a local external storage, no SMB share.
