sbserver differs from online/browser lookup? #30

Open
serpiente opened this Issue Aug 31, 2016 · 31 comments

@serpiente

serpiente commented Aug 31, 2016

I have noticed that sbserver returns an empty response for some URLs, while the Chrome browser and the online lookup tool ( https://www.google.com/transparencyreport/safebrowsing/diagnostic/ ) do return a correct danger response. I have checked, and the server is updating its list. Does anyone know what is happening?

A sample URL for which this happens:

http://www.precision-mouldings.com/.ls/.https:/.www.paypal.co.uk/uk.web.apps.mpp.home.sign.in.country.a.GB.locale.a.en.GB-6546refhs8ehgf8-890b7fefut9546954543ds867hgf9-1egey3ds4820435t546ggc-u4ydstgu5438gjksssGB/plmgeo.php

@dsnet

Member

dsnet commented Aug 31, 2016

Thanks for the bug report. We'll look into it shortly.

@gliwka

gliwka commented Nov 23, 2016

Anything new on this?

@Heavenwalker

Heavenwalker commented Jan 26, 2017

Bumping this.... Anything new ?

@asieira

asieira commented May 2, 2017

Just wanted to confirm that sblookup also reports this URL as safe:

| => echo "https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc%E2%80%9D" | sblookup -apikey '<redacted>'
safebrowsing: 2017/05/02 16:18:26 database.go:106: no database file specified
safebrowsing: 2017/05/02 16:18:30 database.go:336: database is now healthy
safebrowsing: 2017/05/02 16:18:30 safebrowser.go:504: Next update in 30m11s
Safe URL: https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc%E2%80%9D

Plus, this is the output of the test as indicated in the README file:

| => go test github.com/google/safebrowsing -v -run TestSafeBrowser -apikey '<redacted>'
=== RUN   TestSafeBrowser
--- PASS: TestSafeBrowser (0.78s)
PASS
ok  	github.com/google/safebrowsing	0.933s

Finally, I can confirm that there is no problem with my API key, since I can successfully query this URL using https://github.com/afilipovich/gglsbl on the same machine:

| => python
Python 2.7.13 (default, Dec 18 2016, 07:03:39)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.42.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from gglsbl import SafeBrowsingList
>>> sbl = SafeBrowsingList('<redacted>')
>>> sbl.update_hash_prefix_cache()
>>> sbl.lookup_url('https://resolve-paypal.com-resolve-costumer.net/*id/webapps/a37f8/websrc')
[SOCIAL_ENGINEERING/OSX/URL]
@asieira

asieira commented May 3, 2017

With further testing I noticed that when I specified a database file for sbserver and sblookup, the created file is only 6 megabytes. In comparison, the gglsbl Python module creates a local sqlite database that is over 1.4 gigs in size.

So maybe what's happening here is that the go client is silently failing to download and/or save the hash database locally.

@asieira

asieira commented May 15, 2017

Just wanted to share that the lack of feedback on this issue has led me to file this repository under "abandonware".

I am using https://github.com/afilipovich/gglsbl instead. It works great, is fast and the author is very very responsive to reported issues. Would recommend that @serpiente, @gliwka and @Heavenwalker take a look at this alternative too if they haven't found another already.

@gliwka

gliwka commented May 15, 2017

@asieira Thanks for the hint! Unfortunately I need the REST API, although it should be possible to combine gglsbl with Flask to get there.

@dsnet @colonelxc Any progress on this? sbserver isn't working correctly at this point, and the worst part is that it's failing silently! This could leave applications that depend on it, and their users, vulnerable!

@gliwka

gliwka commented May 15, 2017

/cc: @alexwoz

@asieira

asieira commented May 15, 2017

I have actually built a Flask + gunicorn dockerized REST server on top of gglsbl and was planning on open sourcing it. Would that help?

@gliwka

gliwka commented May 15, 2017

@asieira Sure, that would be amazing :-)

@dsnet

Member

dsnet commented May 15, 2017

I do not work in this team anymore, but I can assure you that this project is not abandonware.

@alexwoz

Member

alexwoz commented May 18, 2017

Hi everyone,

Thank you for all of your contributions to this repo and your patience while we investigated -- based on your reports/comments we've been able to clarify the issue.

As part of our API, some clients receive a different list of threats due to data sharing restrictions. This is why you may see discrepancies between the Go client and Safe Browsing-enabled browsers like Chrome. Upon investigating the bugs filed in this repo, we realized that there was a different problem afoot - a bug on the server-side - that will be patched in the coming weeks.

Thanks,
Alex

@hbakhtiyor

hbakhtiyor commented May 21, 2017

@asieira any updates?

@asieira

asieira commented Jun 5, 2017

Finally published the repo I had talked about before, you can find it at https://github.com/mlsecproject/gglsbl-rest if you want to try it out. Any comments and suggestions are most welcome.

@gliwka

gliwka commented Sep 11, 2017

@alexwoz @colonelxc
Any update on this issue? It's been a year since this issue was created.

@alexwoz

Member

alexwoz commented Oct 30, 2017

@gliwka This issue should be resolved. Please update this bug if you continue to experience any inconsistencies.

@wjgilmore

wjgilmore commented Nov 1, 2017

I'm running into the same issues described by other users who commented earlier in this issue thread. Notably, if I use https://transparencyreport.google.com/safe-browsing/search to search for a known malware URL such as 999fitness.com I'm correctly told "Some pages on this site are unsafe".

Yet when I use Postman/cURL/sblookup to classify 999fitness.com I receive an "empty" 200 response, indicating there is nothing wrong with the URL.

When I use the Google API Explorer (https://developers.google.com/apis-explorer/?hl=en_US#p/safebrowsing/v4/safebrowsing.threatMatches.find) to classify the same URL, it just "spins" endlessly. As of right now the explorer has been running for 23 minutes without actually returning a response.

Reviewing the Google Cloud Platform API monitor, I'm told everything is just fine, and every one of my queries returned a 200.

I was going to post a question on the Google Safe Browsing API forum (https://groups.google.com/forum/#!forum/google-safe-browsing-api) but ironically it is full of spam.

Not complaining; just trying to figure out what exactly is going on with this service.

Jason

@colonelxc

Collaborator

colonelxc commented Nov 1, 2017

@wjgilmore

  1. I see the same problem as you with the API Explorer. I have created an internal bug with the applicable team.

  2. Regarding the transparency report, as compared to the safebrowsing lookup, there are some slight differences in utility and function. It is best explained with an example.

| URL | API lookup | Transparency Report |
| --- | --- | --- |
| foo.com | Safe | Some pages unsafe |
| foo.com/bad/ | Malware | This page unsafe/Malware |
| foo.com/bad/baz/ | Malware | This page unsafe/Malware |
| foo.com/good/ | Safe | Safe |

Essentially the API is focused on answering the question, "Do we think it is safe to go to this site right now?". For foo.com, it is. The malware was on a different (more specific) path (or subdomain). This often happens when a site has been hacked. The attacker will add their own content and redirect users from other sites to the specific path/subdomain. This sometimes has no impact on the rightful content of the site, and so we try to minimize the scope of what is blocked to only the paths that will actually try to infect you.

The transparency report does API-style checks, but it also checks if there are more specific paths/subdomains that are known to be bad. So for the second and third URLs, it is responding the same as the API does. For the first URL, it knows that there are more specific paths that are known to be bad. So it says some pages are unsafe, even though foo.com is fine to visit on its own.

Does that help?
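
The distinction in the table above can be sketched as a toy model. This is only an illustration under stated assumptions, not how either service is implemented: the `BLOCKLIST` contents, the function names, and the plain string-prefix matching are all hypothetical. It assumes the API-style check flags a URL when the URL itself or one of its parent paths is listed, while the report-style check additionally looks for listed entries underneath the URL.

```python
# Toy model only. BLOCKLIST and both function names are hypothetical and
# exist purely to reproduce the four rows of the table.
BLOCKLIST = {"foo.com/bad/"}


def _normalize(url: str) -> str:
    """Ensure a trailing slash so prefix comparisons work on whole path segments."""
    return url if url.endswith("/") else url + "/"


def api_lookup(url: str) -> str:
    """API-style check: unsafe only if the URL or one of its parent paths is listed."""
    url = _normalize(url)
    return "Malware" if any(url.startswith(entry) for entry in BLOCKLIST) else "Safe"


def transparency_report(url: str) -> str:
    """Report-style check: also notices listed entries below the given URL."""
    if api_lookup(url) != "Safe":
        return "This page unsafe/Malware"
    url = _normalize(url)
    if any(entry.startswith(url) for entry in BLOCKLIST):
        return "Some pages unsafe"
    return "Safe"


if __name__ == "__main__":
    for u in ["foo.com", "foo.com/bad/", "foo.com/bad/baz/", "foo.com/good/"]:
        print(u, "|", api_lookup(u), "|", transparency_report(u))
```

In this model, `foo.com` comes back as "Safe" from the API-style check and "Some pages unsafe" from the report-style check, which is the discrepancy being discussed.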

@alexwoz

Member

alexwoz commented Nov 1, 2017

Hi @wjgilmore,

Thanks for your message, and apologies for the confusion. I can see why the Transparency Report wording and Safe Browsing API responses appear to contradict one another. The Transparency Report communicates the extent to which the provided site is bad; in this case, the site is only "partially" bad ("Some pages on this site..."). The Safe Browsing API, however, will only return a verdict when the provided URL is definitively bad; i.e. we have determined that all URLs (including the root domain) are unsafe for a user to access.

Hopefully that makes sense!

Alex

@wjgilmore

wjgilmore commented Nov 2, 2017

Hi @colonelxc and @alexwoz Thank you both for these detailed explanations. To summarize:

  • The Transparency Report is useful for determining whether a URL (and its associated siblings/children/parents/grandparents) is "safe".
  • The Safe Browsing API is useful for determining whether a specific URL is safe.

Is my understanding correct? Our project attempts to determine whether any URLs found in an incoming text message contain potentially dangerous links (phishing, malware, etc). We were under the impression the Safe Browsing API would offer an ideal solution. However it is certainly possible the URL found in a text message would be "safe" yet ultimately lead the unsuspecting user to a subsequently dangerous endpoint. So it sounds like we're going to have to look for an alternative solution.

Thanks again, I really appreciate your time.

Jason

@alexwoz

Member

alexwoz commented Nov 2, 2017

Hey @wjgilmore,

As @colonelxc mentioned, the Safe Browsing API answers the question of whether the provided URL is safe for a user to access at this time. Your use case sounds very well-suited for this check. The Safe Browsing lists are intended to contain URL expressions from various points of the navigation, including those that users receive links to (e.g. through an SMS). If the initial URL redirects a user to an unsafe endpoint, then there's a good chance that the initial URL and those of subsequent navigations are all on a Safe Browsing list.

Hopefully that addresses some of your concerns.

Alex

@summera

summera commented Apr 5, 2018

@alexwoz @colonelxc I'm finding differences between the Safe Browsing API (what's returned from running the sbserver) and what's on https://transparencyreport.google.com as well.

The transparency report is saying that the url is unsafe but sbserver is returning an empty response.

screen shot 2018-04-04 at 8 52 07 pm

@summera

summera commented Apr 5, 2018

Found another:
screen shot 2018-04-04 at 9 03 29 pm

Is it possible that results from the API are more up to date than https://transparencyreport.google.com or are they using the same api?

@afilipovich

afilipovich commented Apr 5, 2018

Thanks @summera

Yeah, I saw such discrepancy in the past but I cannot tell which source is more up to date as I am not affiliated with Google.
Transparency report states "This info was last updated on Apr 1, 2018."

@summera

summera commented Apr 5, 2018

@afilipovich Thanks for the response! Very weird. So have you or anyone else been able to determine how accurate this is in a real world production environment? It seems to me, based on what's been reported in this issue and the google group and with my own simple tests, that there are a lot of false negatives being returned from the API. Since phishing and malware urls are constantly changing it's challenging to determine whether this is really going to catch much and how accurate it will be.

@alexwoz

Member

alexwoz commented Apr 5, 2018

Due to data sharing restrictions, the set of URLs accessible via the Safe Browsing API, Transparency Report, and web browser integrations may differ. It is our goal to ensure these discrepancies are as rare as possible, but it's not guaranteed.

@asieira

asieira commented Apr 5, 2018

I think any detection technology will have false negatives, no solution can claim to catch everything. So that is something we should already expect.

In particular, it seems to me the Google Safe Browsing API must be removing malicious entries from its database, either through an aging process or by detecting when they are no longer active. In any case, I will take a solution that does that to minimize false positives over a very noisy one every time.

@afilipovich

afilipovich commented Apr 5, 2018

You can try to compare results from gglsbl with Google Safe Browsing Lookup API.
https://developers.google.com/safe-browsing/v4/lookup-api

It does not use local cache so it has performance limitations, but it excludes possible issues with gglsbl client code.
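
For anyone who wants to run that comparison, here is a minimal sketch of a Lookup API (v4 `threatMatches:find`) call using only the Python standard library. The helper names (`build_lookup_body`, `flagged_urls`, `lookup`), the client ID/version values, and the chosen threat types are assumptions for illustration; check the request fields against the Lookup API documentation linked above before relying on them. The Lookup API returns an empty JSON object when nothing matches, which the sketch treats as "no URLs flagged".

```python
import json
import urllib.request

LOOKUP_ENDPOINT = "https://safebrowsing.googleapis.com/v4/threatMatches:find"


def build_lookup_body(urls, client_id="example-client", client_version="0.1"):
    """Build the JSON body for threatMatches:find (client values are placeholders)."""
    return {
        "client": {"clientId": client_id, "clientVersion": client_version},
        "threatInfo": {
            "threatTypes": ["MALWARE", "SOCIAL_ENGINEERING", "UNWANTED_SOFTWARE"],
            "platformTypes": ["ANY_PLATFORM"],
            "threatEntryTypes": ["URL"],
            "threatEntries": [{"url": u} for u in urls],
        },
    }


def flagged_urls(response):
    """Map each matched URL to its threat types; an empty response means no matches."""
    result = {}
    for match in response.get("matches", []):
        result.setdefault(match["threat"]["url"], []).append(match["threatType"])
    return result


def lookup(urls, api_key):
    """POST the URLs to the Lookup API and return the flagged ones (needs network access)."""
    request = urllib.request.Request(
        f"{LOOKUP_ENDPOINT}?key={api_key}",
        data=json.dumps(build_lookup_body(urls)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return flagged_urls(json.load(resp))
```

If `lookup([...], api_key)` and the local client disagree on the same URL at the same time, that points at the client or its local database rather than at the list itself.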

@pravee9

pravee9 commented Jun 4, 2018

Which database is specified at line 110 of the database.go file?

@imfht

imfht commented Jun 28, 2018

same issue at http://58.194.172.18/Thesis/, any update?
