
fingerprinting #620

Closed
Thorin-Oakenpants opened this issue Jan 25, 2019 · 14 comments

Comments

@Thorin-Oakenpants
Contributor

Thorin-Oakenpants commented Jan 25, 2019

snip

@KOLANICH

KOLANICH commented Jan 25, 2019

Privacy means no-one else but the sending and receiving party can read it.

Privacy means no-one else but the sending and receiving party can read about it.

Only Tor Browser (TB) users have a Tor fingerprint

There is no such thing as a Tor Browser fingerprint. Font fingerprints are fairly unique for a given combination of OS + FF version + environment (fonts, prefs, etc.).

is not trying to hide the fact that it is TB, and FF is not trying to hide the fact that it is FF, even with RFP - and no matter what you do or change to look like TB, you will always be FF, and easily spotted.

Isn't TB a Firefox? So if all the patches are uplifted, FF will differ from TB only by prefs and logo. Changing the prefs would effectively make TB and FF the same.

@crssi

crssi commented Jan 25, 2019

I have been thinking for a long time now that FPing is quite useless if there is no tracking.
I am not saying that FP should not be addressed (quite the opposite), but if, for example, EvilCorp is unable to track you (and your browsing), then FP has little value (almost none).

@Thorin-Oakenpants, Am I wrong?

@crssi

crssi commented Jan 25, 2019

Do you see WebGL as security risk or FP risk or both?
I know about security. I am more interested in your opinion about FP. 😉

@crssi

crssi commented Jan 25, 2019

Wot? That's what FP actually is... the ability to track a specific browser when all other tracking methods are nullified

I understand that, but when FP values (also referrers, origins, etc.) are not passed to a 3rd party, tracking is very limited or nonexistent.
So in reverse order: if tracking/sharing is eliminated, then FP has lower value.
I haven't forgotten that XHRs can be evil either... but that is an "extended" story.

@Atavic

Atavic commented Jan 25, 2019

a real world study done approx a year ago in France

Maybe Beauty and the Beast? They used data from AmIunique.org

...the same data used by P. Laperdrix to design two countermeasures called Blink and FPRandom.

@Atavic

Atavic commented Jan 25, 2019

^^ FP-Scanner: The Privacy Implications of Browser Fingerprint Inconsistencies

From reddit comment:

You achieve privacy by looking like everybody else

No, you don't. You're just in a big flock, herded around by the Google and Cloudflare wolves.

Do you see WebGL as security risk or FP risk or both?

Largely a FP risk: it exposes your graphics driver version, and hence your GPU.

Re: WebGL Security, see: https://www.khronos.org/webgl/security/

@seanob86

Not sure if this is the right place to ask... I'm not very technical on this FP discussion, but it is something I am aware of.

I have decided to set RFP to false. I've been having issues with this set to true and will re-visit it at a later time.

With RFP = false I have enabled section 4600 of the user.js, but I have also installed the "CanvasBlocker" extension with block mode set to "fake readout API". The effect is that it randomizes the canvas fingerprint on each page visit/refresh.
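(For context, the randomize-each-read idea can be illustrated with a toy sketch. This is not CanvasBlocker's actual code; the pixel buffer and noise model here are invented purely to show why the canvas hash stops being a stable identifier:)

```python
import hashlib
import random

def canvas_hash(pixels):
    """Hash a canvas readout, the way a fingerprinting script would."""
    return hashlib.sha256(bytes(pixels)).hexdigest()

def noisy_readout(pixels, rng):
    """Fake readout: add -1/0/+1 noise per channel (clamped to 0..255),
    so every read of the 'same' canvas hashes differently."""
    return [min(255, max(0, p + rng.choice((-1, 0, 1)))) for p in pixels]

pixels = [120, 64, 200, 255] * 16      # toy RGBA pixel buffer
rng = random.Random()                  # fresh noise on every read
read1 = canvas_hash(noisy_readout(pixels, rng))
read2 = canvas_hash(noisy_readout(pixels, rng))
# read1 and read2 almost surely differ, so the site cannot use the
# canvas hash as a stable identifier for this browser.
```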

I have also read various posts on reddit which only seem to suggest two options (1) Enable RFP or (2) install an extension such as "CanvasBlocker" to randomize the Canvas Fingerprint.

So, with RFP disabled and the settings under section 4600 of the ghacks user.js enabled, should I also bother installing an extension such as "CanvasBlocker" to randomize the canvas fingerprint?

Thanks in advance!

@KOLANICH

KOLANICH commented Jan 26, 2019

you'll always return a unique value, but it will change every time it is asked = not identifiable.

IMHO the wrong approach:

  1. it is an additional fingerprinting vector, making the use of an addon that randomizes the fingerprint easily detectable.
  2. it may open a window for exploiting PRNGs for identification, if an insecure PRNG is used.

IMHO a better approach is to generate a random identity for every party capable of tracking, and to return it deterministically.

An even better approach is to always return the same values, eliminating fingerprinting entirely.

The devil is in how to define the party. IMHO, a party is a webapp as a whole, with all its CORS resources.

So the attacks like

evil1.com/fp.js (creates a fp1, spawns id, creates an iframe to evil2.com, sends fp1 to own server)  -> evil2.com/fp.html?id=hfjdkdkdjdj -> evil2.com/fp.js (creates a fp2, sends result to evil1.com)

evil1.com compares 2 fingerprints

won't work: the fingerprints are equal because evil1 and evil2 are considered the same party.

and neither

evil1.com/fp.js (computes the fingerprint, sends to the tracking network)
evil2.com/fp.js (computes the fingerprint, sends to the tracking network)
tracking network compares the fingerprints to crosslink a user between sites

would work, because evil1 and evil2 are different parties.

Though

evil1.com/track.html -> ./fp.js (redirects to evil2.com/track.html?id=ololo)
evil2.com/track.html?id=ololo -> ./fp.js 

would probably succeed in detecting spoofing.
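The deterministic per-party identity scheme above can be sketched as follows. All names here (BROWSER_SECRET, party_identity) are hypothetical, not an actual browser API; the point is that a seed derived from a per-profile secret and the first party drives the fake values:

```python
import hashlib
import hmac
import random

# Hypothetical per-profile secret; it would be rotated whenever the
# user clears cookies, giving a fresh set of identities.
BROWSER_SECRET = b"per-profile secret"

def party_identity(first_party: str) -> int:
    """Return a fake 64-bit fingerprint that is stable for one party.
    Everything loaded under the same first party, including CORS
    iframes, is handed the same value."""
    seed = hmac.new(BROWSER_SECRET, first_party.encode(), hashlib.sha256).digest()
    return random.Random(seed).getrandbits(64)

# The iframe attack: evil2.com framed inside evil1.com is grouped into
# evil1.com's party, so both reads return the same identity and
# comparing the two fingerprints learns nothing.
fp_top = party_identity("evil1.com")
fp_iframe = party_identity("evil1.com")   # iframe inherits the first party

# Separate top-level visits are different parties, so the values differ
# and a tracking network cannot crosslink them.
fp_other = party_identity("evil2.com")
```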

@KOLANICH

As an overall strategy, I do not agree with raising entropy at all - all that does is create more work.

Do you mean that spoiling the stuff with fake data (if it is cheap to produce, and if the fake data is indistinguishable from the real data within the budget affordable to tracking parties) should not be used?

IMHO quite the contrary. Let's define privacy as the capability to hide the data and metadata the subject prefers to hide. So even if we can produce deterministic fingerprints, randomizing fingerprints, if they are unlinkable:

  1. causes tracking parties to waste storage collecting fake data and computational resources analysing it. But it should be done with care: the simulators can be fingerprinted themselves.
  2. spoils the performance and metrics of machine-learning models.
  3. makes the predictions for the user's own case less accurate.

1 + 2 make the whole industry more costly to operate.
2 + 3 make it harder to extract useful information, so better privacy for the end user.
1 + 2 + 3 make the whole industry less profitable, so less investment into tracking people.

Though there is a flaw: it may be hard to implement randomization cheaply and securely, because defeating it is the primary business of the tracking industry, so they have the researchers and resources to train the models, unlike Mozilla (they have done some research on ML, but I have not seen comparable progress on anti-tracking features). Still, even eliminating the stockpiling of tracking data by everyone smaller than government-sponsored orgs should be beneficial.

@KOLANICH

KOLANICH commented Jan 27, 2019

Assume the worst case scenario - that tracking companies will do whatever they have to in order to get more data, money is no object, they have a gazillion jillion bazillion trillion dollars.

If they had that, there would be no point in this activity. If they wanted money, they would already have it. If they wanted power, it would be much easier to take over the world by buying whole states and then passing draconian laws. In reality, every party has limited resources.

This is the problem with raising entropy. You then need to tell lies, and then more lies, and then eventually you slip up.

It's definitely true.

Seriously, just go read the TB design doc about lowering vs raising entropy

Let's analyse the section.

Strategies for Defense: Randomization versus Uniformity
When applying a form of defense to a specific fingerprinting vector or source, there are two general strategies available: either the implementation for all users of a single browser version can be made to behave as uniformly as possible, or the user agent can attempt to randomize its behavior so that each interaction between a user and a site provides a different fingerprint.

It's a false dichotomy; they can be combined.

The fact that randomization causes behaviors to differ slightly with every site visit makes it appealing at first glance, but this same property makes it very difficult to objectively measure its effectiveness. By contrast, an implementation that strives for uniformity is very simple to evaluate. Despite their current flaws, a properly designed version of Panopticlick or Am I Unique could report the entropy and uniqueness rates for all users of a single user agent version, without the need for complicated statistics about the variance of the measured behaviors. FPCentral is trying to achieve that for Tor Browser by providing feedback on acceptable browser properties and giving guidance on possible improvements.

It is both true and false. Measurement of the features and of their randomization should be done separately: one measurement to evaluate whether the features are fingerprintable, another to evaluate whether the simulator model is fingerprintable. For measurement purposes, it should be possible to disable randomization.
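The entropy reporting that Panopticlick-style sites do can be sketched as a plain Shannon-entropy calculation over one observed attribute (the user-agent strings below are invented for illustration):

```python
import math
from collections import Counter

def entropy_bits(observations):
    """Shannon entropy, in bits, of one fingerprintable attribute as
    observed across a population: 0 means everyone looks identical,
    log2(n) means every user is unique."""
    counts = Counter(observations)
    n = len(observations)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

uniform_cohort = ["same-ua"] * 8                  # uniformity strategy
varied_cohort = ["ua1", "ua2", "ua3", "ua4"] * 2  # 4 distinct values
```

With a uniform cohort the attribute contributes 0 bits; with 4 equally common values it contributes 2 bits, which is why uniformity is so easy to evaluate objectively.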

Randomization (especially incomplete randomization) may also provide a false sense of security. When a fingerprinting attempt makes naive use of randomized information, a fingerprint will appear unstable, but may not actually be sufficiently randomized to impede a dedicated adversary. Sophisticated fingerprinting mechanisms may either ignore randomized information, or incorporate knowledge of the distribution and range of randomized values into the creation of a more stable fingerprint (by either removing the randomness, modeling it, or averaging it out).

Improper randomization might introduce a new fingerprinting vector, as the process of generating the values for the fingerprintable attributes could be itself susceptible to side-channel attacks, analysis, or exploitation.

As I have said: it may be tricky to implement, but that doesn't mean it is useless.
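The "averaging it out" attack quoted above takes only a few lines. The stable value and the noise range here are invented for illustration; the point is that fresh additive noise on every call does not survive repeated sampling:

```python
import random
from statistics import median

TRUE_VALUE = 143.048   # hypothetical stable value a fingerprint API exposes

def naive_noisy_api(rng):
    """Naive randomization: the true value plus fresh noise per call."""
    return TRUE_VALUE + rng.uniform(-0.5, 0.5)

rng = random.Random(42)
samples = [naive_noisy_api(rng) for _ in range(1000)]
recovered = median(samples)   # the noise averages out
# 'recovered' lands very close to TRUE_VALUE, handing the tracker back
# a stable fingerprint despite the per-call randomization.
```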

Randomization is not a shortcut

It definitely is not; we need BOTH.

While many end-user configuration details that the browser currently exposes may be safely replaced by false information, randomization of these details must be just as exhaustive as an approach that seeks to make these behaviors uniform. When confronting either strategy, the adversary can still make use of any details which have not been altered to be either sufficiently uniform or sufficiently random.

This is true.

Furthermore, the randomization approach seems to break down when it is applied to deeper issues where underlying system functionality is directly exposed. In particular, it is not clear how to randomize the capabilities of hardware attached to a computer in such a way that it either convincingly behaves like other hardware, or such that the exact properties of the hardware that vary from user to user are sufficiently randomized.

It's true.

Similarly, truly concealing operating system version differences through randomization may require multiple reimplementations of the underlying operating system functionality to ensure that every operating system version is covered by the range of possible behaviors.

It's also true, but this is not really about randomization; it should be done in order to unify the stack. For example, we need the TCP stack to be unified. If it is unsuitable to unify it at kernel level, a userland TCP stack should be used.

When randomization is introduced to features that affect site behavior, it can be very distracting for this behavior to change between visits of a given site. For the simplest cases, this will lead to minor visual nuisances. However, when this information affects reported functionality or hardware characteristics, sometimes a site will function one way on one visit, and another way on a subsequent visit.

That's the price. When one unifies the stuff, the same issues arise.

Randomizing involves performance costs. This is especially true if the fingerprinting surface is large (like in a modern browser) and one needs more elaborate randomizing strategies (including randomized virtualization) to ensure that the randomization fully conceals the true behavior. Many calls to a cryptographically secure random number generator during the course of a page load will both serve to exhaust available entropy pools, as well as lead to increased computation while loading a page.

It's again true, but it doesn't address two facts:

  1. low-entropy features need fewer random numbers.
  2. high-entropy features may need more, but on one side it is usually better to provide them than to be denied service. On the other side it may make sense to boycott websites that blackmail users and create a free alternative without any fingerprinting, because the alternative won't build itself ;), but that is the hard way of free-software zealots, and it is definitely infeasible to rebuild everything needed. Most people are not zealots, so either we randomize, or a user disables all the protections and allows himself to be fingerprinted when spoofing is detected.
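On the performance point, a common general technique (not something RFP or TB is documented to do here) is to draw a single seed from the OS CSPRNG once per page load and expand it with a fast non-cryptographic PRNG, so the many per-attribute values never drain the entropy pool:

```python
import os
import random

# One call to the OS CSPRNG per page load...
page_seed = os.urandom(32)

# ...then a fast, non-cryptographic PRNG expands that seed into the many
# per-attribute values a page needs, without repeated CSPRNG calls.
fast_rng = random.Random(page_seed)
pixel_noise = [fast_rng.choice((-1, 0, 1)) for _ in range(10_000)]
```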

@KOLANICH

per first party domain? but what if load youtube as 3rd party on two sites?

Yes, per first-party webapp. A webapp is determined by cookie sharing and CORS. Each webapp gets a separate identity, cleared "when cookies are cleared". Then each instance of youtube gets a different fingerprint. The fingerprints are unlinkable to each other because of the different cookie sets. Of course there is a problem with the IP address, so Tor should be used.

but what if close all those tabs, but then revisit?

If one keeps cookies between sessions, one is tracked.

but what if I change VPNs mid session?

The new VPN will be linked to your new identity, and it will stay linked, since the cookies are kept.

what happens in PB mode?

Everything discussed was for PB mode. In non-PB mode there are cookies.

what about sites I log into? If I randomize each session they will still know. Wait, I'm logged in. Phew. But now they know (and could share that). Better build in something else to always return the same fake canvas per site but only when logged in. I better store than info somewhere. Also, OMG .. how do I tell if you're logged in? I also better return this same value in third party requests.

  1. We can introduce pinning.
  2. There were addons like Multifox, which kept separate cookie sets for each website, and they detected signing in somehow (I have not analysed their source, but I guess they detected sign-in forms), so they could detect and display usernames. So we can associate usernames with identities.

Even if pinning fails, from the website's PoV it would look as if the user had changed their browser: not within a session, but on each sign-in. (BTW, I constantly get messages from services saying that I have changed my browser; it may mean that RFP already incorporates some randomization.)

@KOLANICH

KOLANICH commented Feb 20, 2019

If some features of the fingerprint change, it is very probable that the fingerprint will become unique

IMHO not a problem, since a fake fingerprint is useless for linking profiles.

but it shows that inconsistencies and the lies that are told by the extension were easy to spot.

Are there any papers where the lies were generated by a neural net?

@seanob86

seanob86 commented May 8, 2019

Here's mine: FF66 macOS

[Screenshot: Screen Shot 2019-05-08 at 18 39 30]

@seanob86

seanob86 commented May 8, 2019

OK, thanks for the feedback regarding the toolbar. I've now disabled it. Regarding the fonts, I had the following in my overrides:

user_pref("browser.display.use_document_fonts", 1);
user_pref("gfx.downloadable_fonts.woff2.enabled", true);

If I disable the above overrides and just use what's in the user.js file, I now get the following:

[Screenshot: Screen Shot 2019-05-09 at 08 47 05]
