From cc34556c8aca8b1efddb97f0c0ae94e2beecaf04 Mon Sep 17 00:00:00 2001 From: Joel Purra Date: Wed, 4 Feb 2015 13:46:16 +0100 Subject: [PATCH] Write about privacy tool reliability, more entries in results table --- report/report.lyx | 144 ++++++++++++++++++------------------- report/results-summary.tsv | 27 +++---- 2 files changed, 80 insertions(+), 91 deletions(-) diff --git a/report/report.lyx b/report/report.lyx index f58e26f..8105e8b 100644 --- a/report/report.lyx +++ b/report/report.lyx @@ -509,7 +509,10 @@ reference "sec:Results-Tracker-detection" Most websites also have at least one known tracker present; 53-72% of random domains have at least one tracker installed, while 88-98% of top websites have trackers and 78-100% of websites in the Swedish curated lists. - The number of known tracker organizations is interesting to look at, as +\begin_inset Newline newline +\end_inset + +The number of known tracker organizations is interesting to look at, as a higher number means users have less control over where leaked data ends up \begin_inset CommandInset ref @@ -560,7 +563,10 @@ reference "sub:Domain-and-organization-counts" personal opinions \emph default these individuals hold. - It is clear that Google has the widest coverage by far -- Google trackers +\begin_inset Newline newline +\end_inset + +It is clear that Google has the widest coverage by far -- Google trackers alone are present on \emph on over 90% @@ -2515,35 +2521,6 @@ name "chap:Results" \end_inset -\end_layout - -\begin_layout Standard -\begin_inset Note Greyedout -status open - -\begin_layout Plain Layout -For each result: -\end_layout - -\begin_layout Enumerate -Describe expectations. -\end_layout - -\begin_layout Enumerate -Describe what is shown in figure/table. -\end_layout - -\begin_layout Enumerate -Describe key observations. -\end_layout - -\begin_layout Enumerate -Provide insight/discussion about this observation. 
-\end_layout - -\end_inset - - \end_layout \begin_layout Standard @@ -4547,14 +4524,14 @@ recognized \emph default tracker domains across the datasets? See Section \begin_inset CommandInset ref -LatexCommand eqref +LatexCommand ref reference "sec:Undetected-external-domains" \end_inset and Figure \begin_inset CommandInset ref -LatexCommand eqref +LatexCommand ref reference "fig:Undetected-external-domains" \end_inset @@ -4584,47 +4561,6 @@ numprint{15746} have a higher detection rate at 30% and more. \end_layout -\begin_layout Standard -Can a privacy tool using a -\emph on -fixed blacklist -\emph default - of domains to block be -\emph on -trusted -\emph default - -- or can it only be trusted to be 10% effective? Regular expression based - blocking, such as EasyList used by AdBlock, might be more effective, as - it can block resources by URL path separate from the URL domain name -\begin_inset CommandInset ref -LatexCommand eqref -reference "sec:Ad-and-privacy-blocking-lists" - -\end_inset - - -- but it's no cure-all. - It does seem as if the blacklist model needs to be improved -- perhaps - by using whitelisting instead of blacklisting. - The question then becomes an issue of either -\emph on -cat and mouse -\emph default - -\begin_inset CommandInset ref -LatexCommand eqref -reference "sec:Cat-and-Mouse" - -\end_inset - - -- if the whitelist is shared by many users -- or -\emph on -convenience -\emph default - -- if each user maintains their own whitelist. - At the moment it seems convenience and blacklists are winning, at the cost - of playing cat and mouse with third parties who end up being blocked. -\end_layout - \begin_layout Chapter Discussion \end_layout @@ -5768,6 +5704,58 @@ Advertisers could also provide their own expanded article categorization such as education, income groups and political stance. 
\end_layout +\begin_layout Section +Privacy tool reliability +\end_layout + +\begin_layout Standard +Can a privacy tool using a +\emph on +fixed blacklist +\emph default + of domains to block be +\emph on +trusted +\emph default + -- or can it only be trusted to be 10% effective +\begin_inset CommandInset ref +LatexCommand eqref +reference "sec:Undetected-external-domains" + +\end_inset + +? Regular expression based blocking, such as EasyList used by AdBlock, might + be more effective, as it can block resources by URL path separate from + the URL domain name +\begin_inset CommandInset ref +LatexCommand eqref +reference "sec:Ad-and-privacy-blocking-lists" + +\end_inset + + -- but it's no cure-all. + It does seem as if the blacklist model needs to be improved -- perhaps + by using whitelisting instead of blacklisting. + The question then becomes an issue of weighing a +\emph on +cat and mouse +\emph default + game +\begin_inset CommandInset ref +LatexCommand eqref +reference "sec:Cat-and-Mouse" + +\end_inset + + -- if the whitelist is shared by many users -- against +\emph on +convenience +\emph default + -- if each user maintains their own whitelist. + At the moment it seems convenience and blacklists are winning, at the cost + of playing cat and mouse with third parties who end up being blocked. +\end_layout + \begin_layout Section Open source contributions \end_layout @@ -8801,6 +8789,14 @@ description "(SLD) A domain that is directly below a TLD. Can be a domain regist \end_inset +\begin_inset CommandInset nomenclature +LatexCommand nomenclature +symbol "Parked domain" +description "A domain that has been purchased from a domain name retailer, but only shows a placeholder message – usually an advertisement for the domain name retailer itself." 
+ +\end_inset + + \end_layout \begin_layout Standard diff --git a/report/results-summary.tsv b/report/results-summary.tsv index 1bc4489..f8cbcdf 100644 --- a/report/results-summary.tsv +++ b/report/results-summary.tsv @@ -9,49 +9,42 @@ There are at least as many external resources, meaning as much tracking, on secu { Only 13 of 290 municipalities have fully secure websites; no Swedish media sites are completely secure (\ref{sec:Insecure-versus-secure-resources}) } { -YYY +Disconnect \emph{only detects 3\%} of external primary domains as trackers (\ref{sub:Tracker-detection-effectiveness}) } { 94\% of \numprint{5959} HTTPS-www variation domains call external domains (\ref{sec:Domain-and-request-counts}) } { 25\% of Swedish municipalities responding to secure requests load 90\% of their resources securely -- it's close, but still considered insecure (Figure \ref{fig:Result-selection-internal-secure-organizations}(b)) } { -YYY +39\% use \emph{only} external resources (Figure \ref{fig:Result-selection-internal-external-insecure-tracker-categories-top-organizations}(a)) } { 78\% of \numprint{123000} HTTP-www variation domains call external domains (\ref{sec:Domain-and-request-counts}) } { -XXX +Swedish media seems very social, with the highest Twitter and Facebook coverage (\ref{sub:Top-organizations}) } { Only 0.3\% respond to secure requests, in line with .dk and .net, while .com has 0.5-0.6\% response rate (\ref{sec:Failed-versus-non-failed}) } { -ZZZ +Twitter has about half the coverage of Facebook (\ref{sub:Top-organizations}) } { Financial instititions redirect from secure \emph{to insecure} sites for 20\% of responding domains (\ref{sec:HTTP,-HTTPS-and-redirects}) } { -Many random domains use only external resources due to redirects away from the origin domain (\ref{sec:HTTP,-HTTPS-and-redirects}) +Many random domains use only external resources due to being parked \ref{sec:Results-Internal-and-external-requests} or redirecting away from the origin domain 
(\ref{sec:HTTP,-HTTPS-and-redirects}) } { 50\% of top sites always redirect to the www subdomain, 13\% always redirect to their primary domain (\ref{sec:HTTP,-HTTPS-and-redirects}) } { -XXX +A single visit to each media site would leak information to at least 57 organizations (\ref{sub:Domain-and-organization-counts}) } { -40\% of random .se domains have no known trackers -- 60\% do (Figure \ref{fig:Result-selection-internal-external-insecure-tracker-categories-top-organizations}(a)) +Over 40\% use Google Analytics or Google API (\ref{sub:Top-domains-Google}) } { A few global top domains load more than 75 known trackers on their front page alone (\ref{sub:Domain-and-organization-counts}) } { -XXX -} { -In the 100k .se domains Disconnect \emph{only detects 3%} of external primary domains as trackers (\ref{sub:Tracker-detection-effectiveness}) -} { -Disconnect's blocking list \emph{only detects 10%} of external primary domains as trackers for top website datasets (\ref{sub:Tracker-detection-effectiveness}) -} -{ -Swedish media seems very social, with the highest Twitter and Facebook coverage (\ref{sub:Top-organizations}) +70\% use content from known trackers (\ref{sub:Disconnect-categories-coverage}) } { -YYY +58\% use content from known trackers (\ref{sub:Disconnect-categories-coverage}) } { -Twitter has about half the coverage of Facebook (\ref{sub:Top-organizations}) +Disconnect's blocking list \emph{only detects 10\%} of external primary domains as trackers for top website datasets (\ref{sub:Tracker-detection-effectiveness}) } \ No newline at end of file