Insufficient data: No recent data available #174

Open
jcheger opened this issue Oct 14, 2019 · 9 comments

Comments

@jcheger

jcheger commented Oct 14, 2019

I installed the plugin on two sites. One works as expected, but the second one is stuck.

  • the plugin was installed about 6 months ago (should be long enough)
  • Training data statistics: So far the app has captured 30013533 logins (including client connections), of which 37 are distinct (IP, UID) tuples.
  • php -f occ suspiciouslogin:train => Not enough data, try again later (Insufficient data: No recent data available)

Any help getting out of this would be welcome. Is there any file or DB table I should delete?

@ChristophWurst
Member

Could you try again? Do you have both ipv4 and ipv6 data? What version of the app do you use?

@jcheger
Author

jcheger commented Oct 21, 2019

Nextcloud 16.0.5
Suspicious Login 1.0.0
IPv4 only

Still the same result: Not enough data, try again later (Insufficient data: No recent data available)

@ChristophWurst
Member

That is strange. Could you run an SQL query to count the number of rows in oc_login_address_aggregated that have a first_seen larger than the unix timestamp from a week ago?

The only case where you might not have new IPs for the last week is when your IPs never change. But that seems unlikely.
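The check requested above can be sketched with Python's standard-library `sqlite3` module. The in-memory table is a hypothetical, trimmed-down stand-in for `oc_login_address_aggregated` (only `id` and `first_seen`; the real schema has more columns). The key point is that `first_seen` holds Unix epoch seconds, so the cutoff must be an epoch value too, not a DATETIME such as `DATE_SUB(NOW(), INTERVAL 1 WEEK)`. On MariaDB the equivalent filter would be `WHERE first_seen > UNIX_TIMESTAMP(NOW() - INTERVAL 1 WEEK)`.

```python
import sqlite3
import time

WEEK_SECONDS = 7 * 24 * 60 * 60


def count_recent_addresses(conn, now=None):
    """Count aggregated rows whose first_seen is newer than one week before `now`.

    Assumes first_seen stores Unix epoch seconds, as described in this thread.
    """
    if now is None:
        now = int(time.time())
    cutoff = now - WEEK_SECONDS
    # Integer-to-integer comparison: both sides are epoch seconds.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM login_address_aggregated WHERE first_seen > ?",
        (cutoff,),
    ).fetchone()
    return count


# Hypothetical stand-in for oc_login_address_aggregated.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE login_address_aggregated (id INTEGER PRIMARY KEY, first_seen INTEGER)"
)
now = 1571616000  # 2019-10-21T00:00:00Z
conn.executemany(
    "INSERT INTO login_address_aggregated (first_seen) VALUES (?)",
    [(now - 30 * 86400,), (now - 3 * 86400,)],  # 30 days old, 3 days old
)
print(count_recent_addresses(conn, now=now))  # 1
```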

@jcheger
Author

jcheger commented Oct 21, 2019

MariaDB [nextcloud]> SELECT id,seen,
    ->   DATE_FORMAT(FROM_UNIXTIME(first_seen),'%Y-%m-%dT%TZ') as first_seen,
    ->   DATE_FORMAT(FROM_UNIXTIME(last_seen),'%Y-%m-%dT%TZ') as last_seen
    ->   FROM oc_login_address_aggregated
    ->   WHERE first_seen>DATE_SUB(NOW(), INTERVAL 1 WEEK);
Empty set, 44 warnings (0.00 sec)

I don't know what the records in this table mean. However, I logged out and back in via a web browser, and restarted the client on one machine, without any change in this table (not even in the last_seen column).

FYI, I use TOTP for my own account, but I also have a Synology that syncs via WebDAV. One of my colleagues also syncs his Synology, but I'm not sure he uses the client. Users are also authenticated via LDAP (Active Directory).

In case you doubt my query, here is the full content of the table:

MariaDB [nextcloud]> SELECT id,seen,
    ->   DATE_FORMAT(FROM_UNIXTIME(first_seen),'%Y-%m-%dT%TZ') as first_seen,
    ->   DATE_FORMAT(FROM_UNIXTIME(last_seen),'%Y-%m-%dT%TZ') as last_seen
    ->   FROM oc_login_address_aggregated;
+----------+----------+----------------------+----------------------+
| id       | seen     | first_seen           | last_seen            |
+----------+----------+----------------------+----------------------+
|        1 | 29307778 | 2019-06-04T22:34:36Z | 2019-10-09T06:41:36Z |
|      648 |    30123 | 2019-06-04T22:37:11Z | 2019-10-09T02:55:32Z |
|   215970 |       18 | 2019-06-05T15:43:41Z | 2019-09-27T15:13:51Z |
|   461456 |        3 | 2019-06-06T12:37:05Z | 2019-06-06T13:38:26Z |
|   564536 |        4 | 2019-06-07T21:49:51Z | 2019-06-07T21:57:41Z |
|  1537240 |        4 | 2019-06-11T11:59:12Z | 2019-06-11T11:59:13Z |
|  2160305 |        4 | 2019-06-14T09:52:23Z | 2019-06-14T10:40:49Z |
|  4678419 |       10 | 2019-06-23T19:45:52Z | 2019-06-25T19:41:16Z |
|  4884910 |      532 | 2019-06-24T10:17:24Z | 2019-10-08T08:59:55Z |
|  6286938 |       22 | 2019-06-28T14:21:34Z | 2019-07-06T13:25:16Z |
|  6664333 |     1317 | 2019-06-29T17:52:47Z | 2019-06-29T19:29:01Z |
|  6932598 |       26 | 2019-06-30T12:06:36Z | 2019-06-30T12:06:55Z |
|  8461734 |      104 | 2019-07-12T10:14:57Z | 2019-10-07T19:57:56Z |
|  9462170 |        2 | 2019-07-15T15:37:51Z | 2019-07-15T15:37:51Z |
|  9491559 |        2 | 2019-07-30T16:57:41Z | 2019-07-30T16:57:41Z |
|  9865499 |        2 | 2019-07-31T17:33:21Z | 2019-07-31T17:33:21Z |
| 12189113 |        3 | 2019-08-07T16:30:22Z | 2019-09-03T11:16:40Z |
| 12433925 |        4 | 2019-08-08T09:38:24Z | 2019-09-03T15:31:29Z |
| 13613275 |        2 | 2019-08-12T10:10:24Z | 2019-08-12T10:10:24Z |
| 13982567 |        3 | 2019-08-13T10:14:15Z | 2019-08-13T15:43:00Z |
| 14338698 |        3 | 2019-08-14T10:04:00Z | 2019-09-05T19:11:05Z |
| 14446679 |        2 | 2019-08-14T22:08:39Z | 2019-08-14T22:08:39Z |
| 14491331 |        2 | 2019-08-18T18:51:33Z | 2019-08-18T18:51:33Z |
| 14775786 |        2 | 2019-08-19T17:08:13Z | 2019-08-19T17:08:13Z |
| 15064891 |        3 | 2019-08-20T13:36:23Z | 2019-08-20T13:43:03Z |
| 15105664 |        6 | 2019-08-20T16:16:07Z | 2019-08-26T17:42:29Z |
| 17149344 |        2 | 2019-08-26T11:37:13Z | 2019-08-26T11:37:13Z |
| 17244033 |        2 | 2019-08-26T17:50:48Z | 2019-08-26T17:50:48Z |
| 18222581 |        7 | 2019-08-29T13:04:13Z | 2019-09-23T10:13:30Z |
| 19597374 |        2 | 2019-09-02T10:14:29Z | 2019-09-02T10:14:29Z |
| 19996955 |        4 | 2019-09-06T09:05:14Z | 2019-09-10T08:24:08Z |
| 20025304 |       79 | 2019-09-06T15:17:35Z | 2019-10-09T03:28:32Z |
| 20057593 |        2 | 2019-09-06T22:13:21Z | 2019-09-06T22:13:21Z |
| 20561952 |        3 | 2019-09-12T13:20:51Z | 2019-09-13T12:54:36Z |
| 20659650 |        3 | 2019-09-13T10:53:53Z | 2019-09-13T11:02:03Z |
| 21006513 |        2 | 2019-09-16T14:00:32Z | 2019-09-16T14:00:32Z |
| 21118706 |        5 | 2019-09-17T13:34:24Z | 2019-09-18T13:55:06Z |
| 22025968 |        2 | 2019-09-25T13:52:14Z | 2019-09-25T13:52:14Z |
| 22028864 |        2 | 2019-09-25T14:31:02Z | 2019-09-25T14:31:02Z |
| 22129515 |        2 | 2019-09-26T14:26:17Z | 2019-09-26T14:26:17Z |
| 22190039 |        5 | 2019-09-27T07:34:53Z | 2019-09-27T23:42:03Z |
| 22203054 |        2 | 2019-09-27T10:43:51Z | 2019-09-27T10:43:51Z |
| 22571308 |        2 | 2019-10-01T14:33:50Z | 2019-10-01T14:33:50Z |
| 22596178 |        2 | 2019-10-01T22:13:25Z | 2019-10-01T22:13:25Z |
+----------+----------+----------------------+----------------------+
44 rows in set (0.00 sec)

@ChristophWurst
Member

> I don't know what the records in this table mean. However, I logged out and back in via a web browser, and restarted the client on one machine, without any change in this table (not even in the last_seen column).

The login data is not fed directly into that table. It first goes into oc_login_address, and a background job then updates oc_login_address_aggregated asynchronously.
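The two-stage flow described above can be illustrated with a small in-memory model: each login is appended to a raw list (standing in for `oc_login_address`), and only a periodic job folds those rows into per-(uid, ip) aggregates (standing in for `oc_login_address_aggregated`). This is a hypothetical sketch, not the app's code: `LoginStore`, `Aggregate`, and their methods are invented names, and clearing processed raw rows after aggregation is an assumption. It shows why a fresh login leaves the aggregated table unchanged until the background job runs.

```python
from dataclasses import dataclass, field


@dataclass
class Aggregate:
    seen: int
    first_seen: int
    last_seen: int


@dataclass
class LoginStore:
    # Hypothetical stand-in for oc_login_address: one (uid, ip, timestamp) per login.
    raw: list = field(default_factory=list)
    # Hypothetical stand-in for oc_login_address_aggregated, keyed by (uid, ip).
    aggregated: dict = field(default_factory=dict)

    def record_login(self, uid, ip, timestamp):
        """Called on every login: only the raw table is touched."""
        self.raw.append((uid, ip, timestamp))

    def run_background_job(self):
        """Fold pending raw rows into the aggregates, then drop them (assumed)."""
        for uid, ip, ts in self.raw:
            agg = self.aggregated.get((uid, ip))
            if agg is None:
                self.aggregated[(uid, ip)] = Aggregate(1, ts, ts)
            else:
                agg.seen += 1
                agg.first_seen = min(agg.first_seen, ts)
                agg.last_seen = max(agg.last_seen, ts)
        self.raw.clear()


store = LoginStore()
store.record_login("alice", "192.0.2.10", 100)
print(len(store.aggregated))  # 0: nothing is aggregated before the job runs
store.run_background_job()
store.record_login("alice", "192.0.2.10", 250)
store.run_background_job()
print(store.aggregated[("alice", "192.0.2.10")])  # Aggregate(seen=2, first_seen=100, last_seen=250)
```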

> In case you doubt my query, here is the full content of the table:

That is indeed strange. Do you use some sort of proxy in front of Nextcloud? Does Nextcloud even see the client IPs?

@ChristophWurst
Member

> I don't know what the records in this table mean

It's basically a compressed version of oc_login_address, in which every login is stored as a row. The aggregated data uses a counter to group identical (uid, ip) tuples, and the timestamps show when a (uid, ip) tuple was first and last used. In your case this compressed 30M entries into fewer than 50 rows ;)
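The compression described above amounts to a group-by over (uid, ip) with a count plus the minimum and maximum timestamps. A minimal sketch, assuming each login event is a `(uid, ip, timestamp)` tuple; the user names and IPs are made up, and `aggregate_logins`/`Row` are illustrative names, not the app's API:

```python
from collections import namedtuple

Row = namedtuple("Row", ["uid", "ip", "seen", "first_seen", "last_seen"])


def aggregate_logins(events):
    """Collapse (uid, ip, timestamp) login events into one row per (uid, ip)."""
    groups = {}
    for uid, ip, ts in events:
        key = (uid, ip)
        if key not in groups:
            groups[key] = [0, ts, ts]  # seen, first_seen, last_seen
        g = groups[key]
        g[0] += 1
        g[1] = min(g[1], ts)
        g[2] = max(g[2], ts)
    return [Row(uid, ip, *vals) for (uid, ip), vals in groups.items()]


events = [
    ("alice", "192.0.2.10", 100),
    ("alice", "192.0.2.10", 160),
    ("alice", "192.0.2.10", 130),
    ("bob", "198.51.100.7", 200),
]
rows = aggregate_logins(events)
print(f"{len(events)} events -> {len(rows)} rows")  # 4 events -> 2 rows
```

With a few users who always connect from the same addresses, the number of rows stays tiny no matter how many logins are recorded, which matches the 30M-to-under-50 ratio seen here.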

@jcheger
Author

jcheger commented Oct 22, 2019

This is the only Nextcloud instance I have without a reverse proxy. Instead, I have a 1:1 NAT configured in pfSense (meaning there is a dedicated IP address for this service, which is also used for outgoing traffic).

The 50 rows are not such a surprise. We are only a few users, usually connecting from the same IP addresses.

@ChristophWurst
Copy link
Member

The problem here is that the current logic splits the collected data into two sets: training data and validation data. Validation data consists of the IPs that have only been seen in the last week. The idea is to measure how well the model handles data that is new compared to its history. If your IPs hardly ever change, there won't be anything new recently, so the validation set is empty.
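The split described above, and why it fails for a small user base with stable addresses, can be sketched as follows. This is a simplified illustration, not the app's actual PHP implementation: it assumes each aggregated row carries a Unix `first_seen` timestamp and treats rows first seen within the last seven days as validation data. `split_train_validation` and `InsufficientDataError` are hypothetical names; only the error text mirrors the message reported in this issue.

```python
WEEK_SECONDS = 7 * 24 * 60 * 60


class InsufficientDataError(Exception):
    """Raised when the rows cannot be split into training and validation sets."""


def split_train_validation(rows, now):
    """Split aggregated rows into (training, validation) by first_seen.

    rows: iterable of dicts with at least a 'first_seen' epoch timestamp.
    Rows first seen within the last week become validation data.
    """
    cutoff = now - WEEK_SECONDS
    training = [r for r in rows if r["first_seen"] <= cutoff]
    validation = [r for r in rows if r["first_seen"] > cutoff]
    if not validation:
        # Every known (uid, ip) pair is older than a week: nothing to validate on.
        raise InsufficientDataError("No recent data available")
    return training, validation


# Latest first_seen in the table posted above: 2019-10-01T22:13:25Z.
reporter_rows = [{"first_seen": 1569968005}]
try:
    split_train_validation(reporter_rows, now=1571616000)  # 2019-10-21T00:00:00Z
except InsufficientDataError as exc:
    print(f"Insufficient data: {exc}")  # Insufficient data: No recent data available
```

With long-established addresses only, every row lands on the training side, so training is refused even though plenty of historical data exists.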

This is a conceptual problem. I'm not sure it can be solved easily.

@diyoyo

diyoyo commented Dec 12, 2022

So basically, you're saying that this app is irrelevant if the instance is safe and only used by a few users?
What if there is one big attacker during these early stages of the Nextcloud instance?

Honestly, I believe hackers have better things to do than target ultra-small teams, so if this add-on is not useful in that particular case, I'd rather disable it to avoid warnings in the log section.

It keeps telling me that the models are not present (Could not predict suspiciousness: No models found) or that there is not enough data.
