Browser attribute fingerprinting analysis [WIP] #78

14Richa · 2019-03-22T21:05:37Z

Jupyter notebook doing the analysis
Notes files to keep a list of threads and questions to follow


14Richa · 2019-03-26T22:07:41Z

Analysis in Pandas is the main file.
BrowserAttributeFingerprinting.md contains my notes from the literature survey.
Overview And Notes.md contains an overview and my understanding of the problem and dataset.

birdsarah

Notes as I go

Excellent introduction and write-ups
It is not sufficient to run this on just the sample file; please run it on the 10% sample. Now that you have honed your analysis, this should be straightforward: only read in the columns you need and it should go pretty quickly.
You use df_plugins['script_url'].value_counts() to determine "which script is being used the most" - think about what you're actually counting here and how / why it might be biased.
You end up stumped by the observation that metrika js appears to look only at the Flash plugin, based on res_df['symbol'].value_counts(), but you've restricted your data to only plugins and mimeTypes - is that what you wanted?
You chose not to use 'Cwm fjordbank glyphs....' as a heuristic for finding incidences of fingerprintjs because it is a pangram. What are other uses for pangrams in JavaScript, and how would they show up in a dataset like this? Can you show me a script that is using 'Cwm fjordbank ....' but is not fingerprintjs or a slight modification of it?
You say "almost all calls are made around same time and it is querying a bunch of attributes to produce a hash" - what is the distribution of timestamps and how "close" are the timestamps you're referring to relative to the general distribution of timestamps.
You say "I want to test if I just filter on rare symbols can I catch fingerprint.js calls? Hypothesis is that these rare calls to symbols is only done by fingerprinting scripts. As expected sessionStorage is pretty common followed by ShockWaveLength. The count reduces a lot for FingerPrint, doNotTrack and FuturesplashSuffixes." I don't feel you've made your case well, if at all. A statistical justification would certainly be possible. But more simply than that show me a bar chart (or something) with the average population prevalence compared to the fingerprint prevalence. (Then my follow-up question will be how does that compare to hs-analytics or akam.)
"Some of these like cloudfront.net are CDNs and can be overlooked." Why can they be overlooked?
Good work on the metrika detection.
I don't see why you need both the "domains" and the "base_url". As you work with dask you'll want to keep this processing to a minimum, so pick one; it probably doesn't matter too much which for now.

Small coding things

General clean-up, python formatting and coding style:
- `parse_base_url` needs an extra space before `return`
- `write_csv` function is unnecessary
- commas should have a space after them
- `get_end_of_path` function isn't used
Don't do `Rdf = df[IMP_COLUMNS]`. Only read in what you need to start with: `df = pd.read_parquet(PARQUET_FILE, engine='pyarrow', columns=IMP_COLUMNS)`
You don't need all the columns in IMP_COLUMNS; now that you've got a better feel for the data, only read in what you need each time.

Big picture round-up

You need to run this on the 10% dataset.
Think about all the comments in "Notes as I go" and keep pulling at the threads you're developing.
I'd love it if you actually took a position: declared your heuristic, ran it, and had a resulting list of scripts that your heuristic labels as "browser attribute fingerprinting". How would you then go about deciding whether it had done a good job?
Great work so far.
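To make the "declare a heuristic, run it, then judge it" loop concrete, here is a minimal pandas sketch. Everything in it is an assumption for illustration: the rule (plugin/mimeType reads plus at least two other attributes), the symbol list `OTHER_ATTRIBUTES`, and the helper names `flag_scripts` and `precision_recall` are hypothetical, and the only columns assumed are `script_url` and `symbol`. One way to judge the result is to hand-review a sample of scripts and measure precision and recall against that sample.

```python
import pandas as pd

# Hypothetical rule: a script reads plugin/mimeType data AND at least
# `min_other` distinct symbols from an illustrative attribute list.
PLUGIN_PATTERN = r"navigator\.(?:plugins|mimeTypes)"
OTHER_ATTRIBUTES = {  # illustrative only, not a vetted list
    "window.navigator.userAgent",
    "window.navigator.doNotTrack",
    "window.screen.colorDepth",
    "window.sessionStorage",
}


def flag_scripts(df: pd.DataFrame, min_other: int = 2) -> set:
    """Return the script_urls the hypothetical heuristic labels as fingerprinting."""
    reads_plugins = df.symbol.str.contains(PLUGIN_PATTERN, regex=True, na=False)
    plugin_scripts = set(df.loc[reads_plugins, "script_url"])
    other = df[df.symbol.isin(OTHER_ATTRIBUTES)]
    n_other = other.groupby("script_url").symbol.nunique()
    rich_scripts = set(n_other[n_other >= min_other].index)
    return plugin_scripts & rich_scripts


def precision_recall(flagged: set, labels: dict) -> tuple:
    """Score flagged scripts against a hand-reviewed sample.

    `labels` maps script_url -> True/False from manual review; scripts that
    were never reviewed are ignored.
    """
    reviewed = set(labels)
    positives = {url for url, is_fp in labels.items() if is_fp}
    tp = len(flagged & positives)
    fp = len((flagged & reviewed) - positives)
    fn = len(positives - flagged)
    precision = tp / (tp + fp) if tp + fp else float("nan")
    recall = tp / (tp + fn) if tp + fn else float("nan")
    return precision, recall
```

Because scripts outside the reviewed sample are ignored, the sample needs to include scripts the heuristic did not flag; otherwise missed fingerprinters never show up and recall cannot be estimated.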

14Richa · 2019-03-29T22:32:50Z

Hey Sarah,
Thanks for your suggestions; I am adding changes as I go. I thought a few points in my analysis were not clear, so I have added explanations for them here.

* Excellent introduction and write-ups

Thanks!

* It is not sufficient to run this on just the sample file, please run on the 10% sample - now you have honed your analysis, this should be straight forward - only read in the columns you need and it should go pretty quickly.

Done

* You use `df_plugins['script_url'].value_counts()` to determine "which script is being used the most" - think about what you're actually counting here and how / why it might be biased.

I am not actually counting which script is being used the most; I am interested in which script is calling navigator.plugins and navigator.mimeTypes the most. Therefore I am doing this on an already reduced dataset (df_plugins, which contains only those rows with calls to the above-mentioned symbols). The hypothesis here is that this would highlight the scripts which are abusing these symbols to gather information about multiple plugins, etc.

* You end up stumped on the question that metrika js appears to be only looking at flash plugin based on `res_df['symbol'].value_counts()` but you've restricted your data to only be about plugins and mimeTypes - is that what you wanted?

Yes, my reasoning is something like this: shortlist the scripts which query information on plugins, find the scripts which make these queries the most, and then see which plugins these scripts query. metrika.js is the top user of navigator.plugins and navigator.mimeTypes, but it only queries Flash and not other kinds of plugins. That is why I am stuck.

* You chose not to use 'Cwm fjordbank glyphs....' as heuristic for finding incidences of fingerprintjs because it is a pangram. What are other uses for pangrams in javascript, how would they show up in a dataset like this? Can you show me a script that is using 'Cwm fjordbank ....' but is not fingerprintjs or a slight modification of it?

Yes, what I am trying to do here is find all instances of fingerprint.js or fingerprint2.js. There are other scripts that also use 'Cwm fjordbank glyphs....', but I do not want to focus on them in the current analysis. I just want to look at what fingerprint.js/fingerprint2.js is doing. As I said in the notebook: "Here I am interested in looking at all calls of fingerprint2.js. I want to understand what all arguments and values are associated with calls to fingerprint2.js. Can I infer a pattern with such calls and filter the calls to fingerprint2.js without explicitly looking for it?" I agree that we can find more scripts with the pangram, but I am interested only in fingerprint.js and fingerprint2.js.

* You say "almost all calls are made around same time and it is querying a bunch of attributes to produce a hash" - what is the distribution of timestamps and how "close" are the timestamps you're referring to relative to the general distribution of timestamps.

Interesting point; let me think more about how I can see the general distribution of timestamps. Do you know any visualization tools for this? I want something like clustering, but along the time axis.

* You say "I want to test if I just filter on rare symbols can I catch fingerprint.js calls? Hypothesis is that these rare calls to symbols is only done by fingerprinting scripts. As expected sessionStorage is pretty common followed by ShockWaveLength. The count reduces a lot for FingerPrint, doNotTrack and FuturesplashSuffixes." I don't feel you've made your case well, if at all. A statistical justification would certainly be possible. But more simply than that show me a bar chart (or something) with the average population prevalence compared to the fingerprint prevalence.  (Then my follow-up question will be how does that compare to hs-analytics or akam.)

I am a little confused here. My idea was simply that less common symbols would be called by fingerprinting scripts (assuming fingerprinting scripts are very less in number compared to clean scripts). I think plotting calls to FuturesplashSuffixes in the general population vs. the reduced dataset (containing only fingerprinting scripts) could help check this point.
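One way to produce the population-vs-fingerprint comparison is to measure, for each symbol, the share of distinct scripts that call it in the whole population and in a flagged set. The sketch below assumes only the `script_url` and `symbol` columns; `symbol_prevalence` is a hypothetical helper, and how the flagged set is chosen is left open. Counting distinct scripts rather than rows keeps a single chatty script from dominating.

```python
import pandas as pd


def symbol_prevalence(df: pd.DataFrame, flagged_scripts: set, symbols: list) -> pd.DataFrame:
    """Share of scripts calling each symbol: whole population vs. a flagged set.

    Counts distinct script_urls per symbol (not rows), so a script that calls
    a symbol thousands of times still counts once. The population includes
    the flagged scripts.
    """
    calls = df[df.symbol.isin(symbols)].drop_duplicates(["script_url", "symbol"])
    n_all = df.script_url.nunique()
    n_flagged = len(flagged_scripts)
    population = calls.groupby("symbol").script_url.nunique() / n_all
    flagged_calls = calls[calls.script_url.isin(flagged_scripts)]
    flagged = flagged_calls.groupby("symbol").script_url.nunique() / n_flagged
    result = pd.DataFrame({"population": population, "flagged": flagged})
    # Symbols never called by a group get 0 rather than NaN.
    return result.reindex(symbols).fillna(0.0)
```

The returned frame can be passed to `DataFrame.plot.bar()` (which needs matplotlib) for a side-by-side bar chart; rerunning it with hs-analytics or akam scripts as the flagged set gives the follow-up comparison.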

* "Some of these like cloudfront.net are CDNs and can be overlooked." Why can they be overlooked?

These are content delivery networks that host many files, so they can't directly be blamed for serving fingerprinting files.

* Good work on the metrika detection.

Thanks!

* I don't see why you need the "domains" and the "base_url" as you work with dask you'll want to keep this processing to a minimum - pick one - probably doesn't matter too much which for now.

Sure, will do.

birdsarah · 2019-03-30T15:19:52Z

(I'm replying one at a time as I'm on my phone)

* You use df_plugins['script_url'].value_counts() to determine "which script is being used the most" - think about what you're actually counting here and how / why it might be biased.

* I am not actually counting which script is being used the most

Your words said "the most," so that's what I read. Definitely focus on being specific.

* I am interested to know which script is calling navigator.plugin and navigator.mimeTypes the most. Therefore I am doing this on already reduced dataset (df_plugins -- which contains only those rows with calls to above mentioned symbols). Hypothesis here is that this would highlight the scripts which are abusing these symbols to gather information about multiple plugins etc.

What I want you to think about is this: you are using the number of rows to make inferences. What does that mean in the dataset? What do the rows represent? And what can you infer if that's what you choose to count?
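One way to explore this question: if each row is one recorded JavaScript call, `value_counts()` on `script_url` ranks scripts by how many calls they made, which rewards scripts that call a symbol many times per page. A minimal sketch, assuming the `script_url`, `location` and `symbol` columns, puts that count next to the number of distinct pages (`location`) a script appears on and the number of distinct symbols it reads; `compare_counts` is a hypothetical name.

```python
import pandas as pd


def compare_counts(df: pd.DataFrame) -> pd.DataFrame:
    """Rank scripts three ways: raw rows, distinct pages, distinct symbols."""
    return pd.DataFrame({
        # Rows per script: inflated by scripts that repeat calls on a page.
        "n_calls": df.script_url.value_counts(),
        # Distinct pages (location) on which the script made any call.
        "n_locations": df.groupby("script_url").location.nunique(),
        # Distinct symbols the script touched anywhere.
        "n_symbols": df.groupby("script_url").symbol.nunique(),
    }).sort_values("n_locations", ascending=False)
```

A script that makes five calls on one page then ranks above a script seen once on each of three pages by `n_calls`, but below it by `n_locations`.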

birdsarah · 2019-03-30T16:40:40Z

* You chose not to use 'Cwm fjordbank glyphs....' as heuristic for finding incidences of fingerprintjs because it is a pangram. What are other uses for pangrams in javascript, how would they show up in a dataset like this? Can you show me a script that is using 'Cwm fjordbank ....' but is not fingerprintjs or a slight modification of it?

* Yes, so what I am trying to do here is to find all instances of fingerprint.js or fingerprint2.js. There are other scripts as well which use 'Cwm fjordbank glyphs....' I do not want to focus on them for the current analysis.

It's not a huge deal for your analysis, but to be clear about the point I'm trying to make: the goal was to find instances of this library. Which is more likely: that a developer has kept the name fingerprint.js, or that they have kept using its "Cwm fjordbank ...." methodology? In fairness, I never provided evidence that the "Cwm fjordbank...." lookup is superior, but similarly you haven't demonstrated that all instances of scripts named "fingerprint.js" are the correct library.

I don't particularly want you to change anything but to think critically about the choices you are making.

birdsarah · 2019-03-30T16:46:28Z

* Do you know any visualization tools for this? I want something of a clustering but in time-space.

Nothing springs to mind; histogram-type plots should be good enough for thinking about distributions.
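A histogram sketch of the timestamp question: for each (page, script) pair, take the time between the script's first and last call, then look at how those spans are distributed. This assumes the timestamps live in a column called `time_stamp` that `pd.to_datetime` can parse; that column name, the helpers `call_spans_seconds` and `span_histogram`, and any bin edges you pass are assumptions.

```python
import numpy as np
import pandas as pd


def call_spans_seconds(df: pd.DataFrame) -> pd.Series:
    """Seconds between the first and last call of each script on each page.

    Assumes a parseable timestamp column named `time_stamp` (hypothetical).
    """
    ts = pd.to_datetime(df["time_stamp"])
    grouped = ts.groupby([df["location"], df["script_url"]])
    return (grouped.max() - grouped.min()).dt.total_seconds()


def span_histogram(spans: pd.Series, edges) -> np.ndarray:
    """Counts per bin; numpy bins are half-open except the last one."""
    counts, _ = np.histogram(spans.to_numpy(), bins=edges)
    return counts
```

Computing the same histogram for all scripts and for the suspected fingerprinting scripts shows whether the "calls made around the same time" pattern actually stands out from the general distribution.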

birdsarah · 2019-03-30T16:58:20Z

* You say "I want to test if I just filter on rare symbols can I catch fingerprint.js calls? Hypothesis is that these rare calls to symbols is only done by fingerprinting scripts. As expected sessionStorage is pretty common followed by ShockWaveLength. The count reduces a lot for FingerPrint, doNotTrack and FuturesplashSuffixes." I don't feel you've made your case well, if at all. A statistical justification would certainly be possible. But more simply than that show me a bar chart (or something) with the average population prevalence compared to the fingerprint prevalence. (Then my follow-up question will be how does that compare to hs-analytics or akam.)

* I am a little confused here. My idea was simply that less common symbols would be called by fingerprinting scripts (assuming fingerprinting scripts are very less in number compared to clean scripts).

There are quite a few assumptions in these ideas, and you haven't made a case for any of them. Let's unpack them.

(1) "less common symbols would be called by fingerprinting scripts" - I don't believe that to be true, but you certainly could present evidence to make that point and that would be interesting to see

(2) "assuming fingerprinting scripts are very less in number compared to clean scripts"
(a) First, going back to my earlier point, make sure you're clear on what you're counting and whether it helps you find the information you want.
(b) I really don't see how the commonness of fingerprinting scripts relates to the relative frequency of symbol calls by those scripts.
(c) How are you going to separate clean scripts from fingerprinting scripts to answer this question? Is everything that is not a fingerprinting script "clean"? What about all the scripts we haven't identified yet?

birdsarah · 2019-03-30T17:00:58Z

* These are content delivery networks, host to many files. Can't directly be blamed for serving fingerprinting files

You need to make this justification in your writing, not to me.

Just as a thought experiment: If you are going to take the position that CDNs cannot be "blamed" for fingerprinting scripts then all fingerprinters would just move their content to a CDN. What should we do in that case if we want to stop fingerprinting?

birdsarah · 2019-03-30T21:57:32Z

I think my comments read more negatively than I intend; I don't intend them negatively at all. There's a LOT to dig into here and you're well on your way.

In particular, I have deleted "You are missing my point." - that is not helpful language to use on my part and I apologize.

14Richa · 2019-04-01T22:44:01Z

Added a new file: Analysis in dask. It contains the analysis on the 10% dataset using dask. Please ignore Analysis in pandas; that is an old file.

birdsarah · 2019-04-04T08:22:18Z

* Added a new file --- Analysis in dask. It contains analysis on 10% dataset using dask. Please ignore Analysis in pandas. That is an old file.

Please remove obsolete files. If you're not familiar with doing this in git, don't hesitate to ask.

14Richa · 2019-04-04T20:57:37Z

* Added a new file --- Analysis in dask. It contains analysis on 10% dataset using dask. Please ignore Analysis in pandas. That is an old file.

* Please remove obsolete files. If doing this with git isn't familiar to you don't hesitate to ask.

I was wondering whether it should be removed entirely. Isn't it a good idea to keep the analysis in pandas as well, for someone to use in case they have memory/system constraints? Though I agree that the two notebooks will go out of sync very soon and it will be a hassle to keep both of them updated.

birdsarah · 2019-04-04T23:31:21Z

* Isn't it a good idea to have the analysis in pandas as well, for someone to use it in case they have memory/system constraints.

I would say no. The hello_world.ipynb already shows loading data in dask vs pandas. There's no case for duplicate analysis. Analysis of one file with pandas isn't meaningful; it was just a useful stepping stone for you getting to where you are. Also, you're not totally removing it; it will always be in the commit history.

birdsarah

Notes as I go:

Don't leave the print out of hundreds or thousands of rows in your notebook, it hinders comprehension. You will definitely look at this content while exploring, but clean up before review.
`len(df.script_url.unique())` -> `df.script_url.nunique()`
`df['location_domain'] = df.location.apply(extract_domain)` -> `df['location_domain'] = df.location.apply(extract_domain, meta='O')` ('O' is object which is all we have available for strings)
`df[df.symbol.str.contains('navigator.mimeTypes|navigator.plugins')]` nice
"These days some browsers don't return an array of plugins directly, except the most common plugins such as Shockwave flash, Java, etc." citation please
"That is all queries to window.navigator.plugins[Shockwave Flash].description resulted in Shockwave Flash 28.0 r0. This is strange." Why is it strange? This data was collected in a crawl. That is, identical machines were set up to crawl the web, and their profiles were reset between every visit to a website. "There seems to be a bias in the dataset." Agreed. "Strange but on a brighter side difficult to fingerprint :)" Unfortunately you can't make this inference because this is not a population sample of the variation of plugins.
"Memory usage for df_plugins is less. I can take all of this in pd dataframe and use pivots to analyze." Good thinking. Dask does have a pivot option. But converting to pandas when you can definitely makes things nicer.
I was about to write: "I don't think you needed a pivot table. I think a groupby would have got you there `df_plugins_pd.groupby(['location', 'script_url', 'symbol']).count()`" but that is wrong. I see what you've done: you were getting the number of unique symbols. Perhaps at some point we can brainstorm how to make this a bit cleaner and more obvious.
In your analysis 2 you find 0 hs-analytics. Earlier you noted that hs-analytics is a fingerprinting script; what do you think is going on?
Avoid hardcoding numbers: `There are 166862 unique script_urls in the dataset. From this we have identified 790 (725+53+12) unique URLs which definitely host fingerprinting scripts and another 888 potential urls worth checking out.` You could rewrite this as a code cell: `f'There are {len(unique_scripts):,} unique script_urls in the dataset. From this we have identified {sum(n_scripts)} unique URLs which definitely host fingerprinting scripts and another {n_new} potential urls worth checking out.'` While it might seem counter to other things I'm arguing for, the oddness of duplicated text is outweighed by the robustness of not transcribing numbers and the reusability for running this against a future dataset.
"So this script always asks for same 10 symbols which we can see below." Only because you've restricted your starting point to df_plugins which is the subset of scripts that calls plugins. Maybe that's what you're interested in but this statement is misleading.
"metrika/watch.s can be used for browser plugin fingerprinting." This is true, but I don't think you've really shown it. There is much more evidence in the dataset for you to make this claim much more convincing. Why not just look at all the symbols metrika is getting?
"I have found that the above string is a panagram and can be used in other fingprinting scripts" - I clearly still haven't explained this well enough. Let me try again. fingerprintjs2 is not just a browser attribute fingerprinting script; it also does canvas fingerprinting. Its characteristic canvas fingerprinting feature is the call with "Cwm fjordbank....". This enables you to find as many of the fingerprintjs2 / fingerprintjs2-like scripts as possible and then examine them for the browser attribute fingerprinting within them.
"Therefore I have looked for "fingerprint" in the URL column of the dataset." As I already mentioned, you haven't provided evidence that this is actually capturing the fingerprintjs2 library and not just other scripts with the word fingerprint in the url which may or may not be what you want
The reason I care about this is because I believe you're cutting yourself off from data: `df[df.script_url.str.contains('fingerprint', case=False)].script_url.nunique().compute()` returns 78 scripts, while `df[df.argument_0.str.contains('Cwm fjordbank glyphs vext quiz')].script_url.nunique().compute()` returns 505 scripts. The intersection between the lists is 36. If you look at the remaining 42, I can see a number that fall into this category, although I am happy to concede there are many that do look on quick inspection like what we were looking for but were not picked up by "Cwm fjordbank....". That said, of the 505 scripts detected by "Cwm fjordbank", 499 have plugin calls. If you only wanted to take the 499 that have plugin calls as an indicator of browser attribute fingerprinting, that seems fairly reasonable. You'll have missed a few that the "fingerprint" approach got, but you're still up by roughly 400 examples with, I believe, fewer false positives.
What you've got in the current analysis seems to me like an example of how you can have your data tell you what you're expecting it to. You haven't asked questions against your initial belief - e.g. how many "Cwm fjordbank" scripts are reading plugins - which you've already articulated as a hallmark of browser attribute fingerprinting.
Well done for exploring and then ruling out, in the interests of time, the timestamp work - you were on the right track with your thinking here and I definitely think it could be explored in the future.
"Is there an automatic way to download the linked javascript file from script_url and parse it to look for keywords like "murmurhash", "hashset", "fingerprint"?" Not without much pain :D. But it is partially possible. That said, future crawls will collect that data at the time of crawl and include it in the dataset.
I would be interested to see not just the list of symbols, but the value_counts or perhaps more comparable normalized value counts per script. You could plot these to see if they all look similar - a somewhat tricky problem but potentially illuminating.
Your questions at the end are starting to get into this.
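A sketch of normalized value counts per script, assuming only the `script_url` and `symbol` columns: each script becomes a row whose entries are the share of its calls going to each symbol, so scripts with very different call volumes can be compared. The helper names are hypothetical, and the distance function (half the L1 distance, 0 for identical mixes and 1 for no overlap) is one simple, assumed choice of comparison rather than the only one.

```python
import pandas as pd


def symbol_profiles(df: pd.DataFrame) -> pd.DataFrame:
    """One row per script: share of that script's calls going to each symbol."""
    return (
        df.groupby("script_url")["symbol"]
        .value_counts(normalize=True)
        .unstack(fill_value=0.0)  # symbols become columns; absent symbols are 0
    )


def profile_distance(profiles: pd.DataFrame, a: str, b: str) -> float:
    """Half the L1 distance between two scripts' symbol mixes (0 to 1)."""
    return 0.5 * float((profiles.loc[a] - profiles.loc[b]).abs().sum())
```

Each row of `symbol_profiles` can be plotted as a bar chart, or the pairwise distances can be used to see whether the suspected fingerprinting scripts cluster together.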

Overall: A great improvement. Well done. Above are a lot of points. I think the most important of them is not the specifics but the principles in our ongoing back and forth about the validity of "fingerprint" vs "Cwm fjordbank". Moving forward, this could go on forever, but it shouldn't! I would like to think through a concrete set of refinements that will get this to a mergeable analysis contribution. I'm afraid I don't have this for you today as I have a lot of PRs to review, but let's touch base maybe next week. If you haven't heard from me, please ping me back on this PR.

14Richa · 2019-04-23T21:58:55Z

Notes as I go:

* Don't leave the print out of hundreds or thousands of rows in your notebook, it hinders comprehension. You will definitely look at this content while exploring, but clean up before review.

* `len(df.script_url.unique())` -> `df.script_url.nunique()`

* `df['location_domain'] = df.location.apply(extract_domain)` -> `df['location_domain'] = df.location.apply(extract_domain, meta='O')` ('O' is object which is all we have available for strings)

* `df[df.symbol.str.contains('navigator.mimeTypes|navigator.plugins')]` nice

* "These days some browsers don't return an array of plugins directly, except the most common plugins such as Shockwave flash, Java, etc." citation please

Addressed the above points.

* "That is all queries to window.navigator.plugins[Shockwave Flash].description resulted in Shockwave Flash 28.0 r0. This is strange." Why is it strange? This data was collected in a crawl. That is identical machines were setup to crawl the web and their profiles were reset between every visit to a website. "There seems to be a bias in the dataset." Agreed. "Strange but on a brighter side difficult to fingerprint :)" Unfortunately you can't make this inference because this is not a population sample of the variation of plugins.

Agreed. Thanks for pointing out the flaw in the reasoning.

* "Memory usage for df_plugins is less. I can take all of this in pd dataframe and use pivots to analyze." Good thinking. Dask does have a pivot option. But converting to pandas when you can definitely makes things nicer.

* I was about to write: "I don't think you needed a pivot table. I think a groupby would have got you there `df_plugins_pd.groupby(['location', 'script_url', 'symbol']).count()`" but that is wrong. I see what you've done and I see that you're were getting the length of unique symbols. Perhaps at somepoint we can brainstorm how to make this a bit cleaner and more obvious.
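For the pivot-table step, one possibly cleaner way to count the distinct symbols each script reads on each page is `groupby(...).nunique()`; `.count()` counts rows, which is why the groupby in the quote above does not give the same answer. This is a minimal sketch assuming the `location`, `script_url` and `symbol` columns and assuming the pivot was used for exactly this count; `unique_symbols_per_script_page` is a hypothetical helper name.

```python
import pandas as pd


def unique_symbols_per_script_page(df: pd.DataFrame) -> pd.DataFrame:
    """Number of distinct symbols each script reads on each page."""
    return (
        df.groupby(["location", "script_url"])["symbol"]
        .nunique()  # distinct symbols, unlike .count() which counts rows
        .rename("n_unique_symbols")
        .reset_index()
        .sort_values("n_unique_symbols", ascending=False, ignore_index=True)
    )
```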

* In your analysis 2 you find 0 hs-analytics. Earlier you have noted that hs-analytics is a fingeprinting script, what do you think is going on?

Addressed the issue: hs-analytics is a fingerprinting script but doesn't use plugin information. This also gives me the idea of including other symbols, on top of plugin information, when flagging scripts.

* Avoid hardcoding numbers `There are 166862 unique script_urls in the dataset. From this we have identified 790 (725+53+12) unique URLs which definitely host fingerprinting scripts and another 888 potential urls worth checking out.` You could rewrite this as a code cell `f'There are {len(unique_scripts):,} unique script_urls in the dataset. From this we have identified {sum(n_scripts)} unique URLs which definitely host fingerprinting scripts and another {n_new} potential urls worth checking out.'` While it might seem counter to other things I'm arguing for the oddness of duplicated text is out-weighed by the robustness of not transcribing numbers, and the re-usability for running this against a future dataset.

Addressed.
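A minimal sketch of generating the summary sentence from computed values instead of transcribed numbers. The function name `summary_sentence` and its arguments (`n_scripts_by_group` for the per-group counts that get summed, `n_new` for the candidates still to check) are hypothetical; only the wording comes from the notebook text quoted above.

```python
import pandas as pd


def summary_sentence(df: pd.DataFrame, n_scripts_by_group: list, n_new: int) -> str:
    """Build the summary text from the data so re-running updates the numbers."""
    n_unique = df.script_url.nunique()
    return (
        f"There are {n_unique:,} unique script_urls in the dataset. "
        f"From this we have identified {sum(n_scripts_by_group):,} unique URLs "
        f"which definitely host fingerprinting scripts and another {n_new:,} "
        "potential urls worth checking out."
    )
```

Running the same cell against a future crawl then updates every number in the sentence without any manual transcription.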

* "So this script always asks for same 10 symbols which we can see below." Only because you've restricted your starting point to df_plugins which is the subset of scripts that calls plugins. Maybe that's what you're interested in but this statement is misleading.

* "metrika/watch.s can be used for browser plugin fingerprinting." This is true, but I don't think you've really shown it. There is much more evidence in the dataset for you to make this claim much more convincing. Why not just look at all the symbols metrika is getting?

Included more analysis of the symbols metrika is getting. This goes back to the point about hs-analytics: I should check more symbols that can be used for browser fingerprinting.
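A sketch of looking at all the symbols a given script family reads, counted as the number of distinct pages (`location`) on which each symbol was read rather than as raw rows. The URL pattern (for example `"metrika"`) is supplied by the caller, and the helper name `symbols_read_by` is an assumption; the `script_url`, `location` and `symbol` columns are assumed to exist.

```python
import pandas as pd


def symbols_read_by(df: pd.DataFrame, url_pattern: str) -> pd.Series:
    """For scripts whose URL matches `url_pattern`, count distinct pages per symbol."""
    matches = df.script_url.str.contains(url_pattern, case=False, regex=True, na=False)
    subset = df[matches]
    # Distinct pages per symbol, most widely read symbols first.
    return subset.groupby("symbol").location.nunique().sort_values(ascending=False)
```

Calling `symbols_read_by(df, "metrika")` lists everything metrika reads, not only the plugin symbols, which is the broader evidence the review asked for.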

* "I have found that the above string is a panagram and can be used in other fingprinting scripts" - I clearly still haven't explained this well enough. Let me try again. fingerprintjs2 is not _just_ a browser attribute fingerprinting script. It also does canvas fingerprinting. It's characteristic canvas fingerprinting feature is the call to "Cwm fjordbank....". This enables you find as many of the fingerprintjs2 / fingerprintjs2-like scripts and then examine them for the browser attribute fingerprinting within them

Aah, now I get what you meant here. Working on it.

* "Therefore I have looked for "fingerprint" in the URL column of the dataset." As I already mentioned, you haven't provided evidence that this is actually capturing the fingerprintjs2 library and not just other scripts with the word fingerprint in the url which may or may not be what you want

I agree with the point about catching false positives here, though I feel that likelihood is low. I should check, though.

* The reason I care about this is because I believe you're cutting yourself off from data `df[df.script_url.str.contains('fingerprint', case=False)].script_url.nunique().compute()` returns 78 scripts, `df[df.argument_0.str.contains('Cwm fjordbank glyphs vext quiz')].script_url.nunique().compute()` returns 505 scripts. The intersection between the lists is 36. If you look at the remaining 42 I can see a number that fall into this category - although i am happy to concede there are many that do look on quick inspection like what we were looking for but were not picked up by "Cwm fjordbank...." That said, of the 505 scripts detected by "Cwm fjordbank" 499 have plugin calls. If you only wanted to take the 499 that have plugin calls as an indicator of browser attribute fingerprinting, that seems fairly reasonable. You'll have missed a few that the "fingerprint" approach got, but you're still up 400 examples with, I believe, fewer false positives.

I see your point clearly now, thanks for giving examples.
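A sketch of the comparison described in the review, written in pandas so it can be tried on a small sample first: collect the scripts matched by each heuristic, then look at the overlap and at how many canvas-text scripts also read plugins. It assumes the `script_url`, `symbol` and `argument_0` columns; `na=False` treats missing values as non-matches so the boolean masks are valid, and `regex=False` matches the pangram literally. `compare_heuristics` and `scripts_where` are hypothetical helpers.

```python
import pandas as pd

CANVAS_TEXT = "Cwm fjordbank glyphs vext quiz"
PLUGIN_PATTERN = r"navigator\.(?:plugins|mimeTypes)"


def scripts_where(df: pd.DataFrame, mask: pd.Series) -> set:
    """Distinct script_urls for rows selected by a boolean mask."""
    return set(df.loc[mask, "script_url"])


def compare_heuristics(df: pd.DataFrame) -> dict:
    """Overlap between the URL-name heuristic and the canvas-text heuristic."""
    by_name = scripts_where(df, df.script_url.str.contains("fingerprint", case=False, na=False))
    by_canvas = scripts_where(df, df.argument_0.str.contains(CANVAS_TEXT, regex=False, na=False))
    reads_plugins = scripts_where(df, df.symbol.str.contains(PLUGIN_PATTERN, na=False))
    return {
        "by_name": len(by_name),
        "by_canvas": len(by_canvas),
        "both": len(by_name & by_canvas),
        "name_only": sorted(by_name - by_canvas),  # worth inspecting by hand
        "canvas_with_plugins": len(by_canvas & reads_plugins),
    }
```

The `name_only` list is the set to inspect manually when deciding whether the "fingerprint" URL filter adds true positives or mostly noise.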

* What you've got in the current analysis seems to me like an example of how you can have your data tell you what you're expecting it to. You haven't asked questions against your initial belief - e.g. how many "Cwm fjordbank" scripts are reading plugins - which you've already articulated as a hallmark of browser attribute fingerprinting.

Right, working on this.

* Well done for exploring and then ruling out, in the interests of time, the timestamp work - you were on the right track with your thinking here and i definitely think it could be explored in the future.

* "Is there an automatic way to download the linked javascript file from script_url and parse it to look for keywords like "murmurhash", "hashset", "fingerprint"?" Not without much pain :D. But it is partially possible. That said, future crawls will collect that data at the time of crawl and include it in the dataset.

* I would be interested to see not just the list of symbols, but the value_counts or perhaps more comparable normalized value counts per script. you could plot these to see if they all look similar - a somewhat tricky problem but potentially illuminating.

* Your questions at the end are starting to get into this.

* Overall: A great improvement. Well done. Above are a lot of points. I think the most important of them is not the specifics but the principles in our ongoing back and forth about the validity of "fingerprint" vs "Cwm fjordbank". Moving forward, this could go on forever, but it shouldn't! I would like to think through a concrete set of refinements that will get this to a mergeable analysis contribution. I'm afraid I don't have this for you today as I have a lot of PRs to review, but lets touch base maybe next week. If you haven't heard from me, please ping me back on this PR.

Thanks for your review. I have addressed a few points and am working on the remaining ones (pangram and onwards). Analysis 1, 2 and 3 have been updated, and I am working on Analysis 4 and 5. I have updated the PR with the latest changes; feel free to take a look.

aliamcami · 2019-10-24T20:01:32Z

Hi @14Richa, are you still planning to submit the remaining requested changes?

aliamcami · 2019-10-29T21:45:06Z

Closing this PR due to lack of activity, please feel free to reopen.

14Richa and others added 18 commits March 17, 2019 03:17

Added a overview and notes file in data_prep

2712f7f

Added information about JS call data

30741a6

Added section - Research overview

1a6ded9

Added more details on different research topics

4b0b5ea

Added information on possible research topics

ddca9a6

Added a new jupyter notebook for analysis

815278e

Merge pull request #1 from mozilla/master

29ec4b4

Update from mozilla overscripted

Added read/write

89bf78b

Merging new updates to master

61f7d7a

Browser attribute fingerprinting analysis begin

ecefc63

Added notes from -- How unique is your web browser

2114e02

Added research threads

1e65d52

More details in notes and addidtion to jupyter notebook

fc954a0

More details in notes and addition to jupyter notebook

664ccfa

Merge pull request #2 from mozilla/master

80bd877

Analysis jupyter notebook and notes [WIP]

Added more analysis on doNotTrack and suffixes

aa5566d

Merge branch 'analysis_intro' of https://github.com/14Richa/overscripted into analysis_intro

beec503

Completed preliminary analysis

210c5d5

14Richa changed the title ~~Browser attribute fingerprinting analysis [WIP]~~ Browser attribute fingerprinting analysis Mar 26, 2019

birdsarah suggested changes Mar 26, 2019


birdsarah changed the title ~~Browser attribute fingerprinting analysis~~ Browser attribute fingerprinting analysis [WIP] Mar 28, 2019

Added analysis in dask

a38fd2d

Added concluding remarks

69359e5

14Richa changed the title ~~Browser attribute fingerprinting analysis [WIP]~~ Browser attribute fingerprinting analysis Apr 1, 2019

Added analysis in the readme. Cleaned up some code

e298862

14Richa mentioned this pull request Apr 2, 2019

Can we build a heuristic for browser attribute fingerprinting? #34

Open

some formatting changes

4016bd3

birdsarah suggested changes Apr 5, 2019


14Richa changed the title ~~Browser attribute fingerprinting analysis~~ Browser attribute fingerprinting analysis [WIP] Apr 6, 2019

14Richa added 2 commits April 7, 2019 02:28

Removed old files

ab9ec85

Addressed some of the requested changes

a3de1a4

aliamcami closed this Oct 29, 2019
