Write about privacy tool reliability, more entries in results table
joelpurra committed Feb 4, 2015
1 parent e9c782f commit cc34556
Showing 2 changed files with 80 additions and 91 deletions.
144 changes: 70 additions & 74 deletions report/report.lyx
@@ -509,7 +509,10 @@ reference "sec:Results-Tracker-detection"
Most websites also have at least one known tracker present; 53-72% of random
domains have at least one tracker installed, while 88-98% of top websites
have trackers and 78-100% of websites in the Swedish curated lists.
The number of known tracker organizations is interesting to look at, as
\begin_inset Newline newline
\end_inset

The number of known tracker organizations is interesting to look at, as
a higher number means users have less control over where leaked data ends
up
\begin_inset CommandInset ref
@@ -560,7 +563,10 @@ reference "sub:Domain-and-organization-counts"
personal opinions
\emph default
these individuals hold.
It is clear that Google has the widest coverage by far -- Google trackers
\begin_inset Newline newline
\end_inset

It is clear that Google has the widest coverage by far -- Google trackers
alone are present on
\emph on
over 90%
@@ -2515,35 +2521,6 @@ name "chap:Results"
\end_inset


\end_layout

\begin_layout Standard
\begin_inset Note Greyedout
status open

\begin_layout Plain Layout
For each result:
\end_layout

\begin_layout Enumerate
Describe expectations.
\end_layout

\begin_layout Enumerate
Describe what is shown in figure/table.
\end_layout

\begin_layout Enumerate
Describe key observations.
\end_layout

\begin_layout Enumerate
Provide insight/discussion about this observation.
\end_layout

\end_inset


\end_layout

\begin_layout Standard
@@ -4547,14 +4524,14 @@ recognized
\emph default
tracker domains across the datasets? See Section
\begin_inset CommandInset ref
LatexCommand eqref
LatexCommand ref
reference "sec:Undetected-external-domains"

\end_inset

and Figure
\begin_inset CommandInset ref
LatexCommand eqref
LatexCommand ref
reference "fig:Undetected-external-domains"

\end_inset
@@ -4584,47 +4561,6 @@ numprint{15746}
have a higher detection rate at 30% and more.
\end_layout

\begin_layout Standard
Can a privacy tool using a
\emph on
fixed blacklist
\emph default
of domains to block be
\emph on
trusted
\emph default
-- or can it only be trusted to be 10% effective? Regular expression based
blocking, such as EasyList used by AdBlock, might be more effective, as
it can block resources by URL path separate from the URL domain name
\begin_inset CommandInset ref
LatexCommand eqref
reference "sec:Ad-and-privacy-blocking-lists"

\end_inset

-- but it's no cure-all.
It does seem as if the blacklist model needs to be improved -- perhaps
by using whitelisting instead of blacklisting.
The question then becomes an issue of either
\emph on
cat and mouse
\emph default

\begin_inset CommandInset ref
LatexCommand eqref
reference "sec:Cat-and-Mouse"

\end_inset

-- if the whitelist is shared by many users -- or
\emph on
convenience
\emph default
-- if each user maintains their own whitelist.
At the moment it seems convenience and blacklists are winning, at the cost
of playing cat and mouse with third parties who end up being blocked.
\end_layout

\begin_layout Chapter
Discussion
\end_layout
@@ -5768,6 +5704,58 @@ Advertisers could also provide their own expanded article categorization
such as education, income groups and political stance.
\end_layout

\begin_layout Section
Privacy tool reliability
\end_layout

\begin_layout Standard
Can a privacy tool using a
\emph on
fixed blacklist
\emph default
of domains to block be
\emph on
trusted
\emph default
-- or can it only be trusted to be 10% effective
\begin_inset CommandInset ref
LatexCommand eqref
reference "sec:Undetected-external-domains"

\end_inset

? Regular expression based blocking, such as EasyList used by AdBlock, might
be more effective, as it can block resources by URL path separate from
the URL domain name
\begin_inset CommandInset ref
LatexCommand eqref
reference "sec:Ad-and-privacy-blocking-lists"

\end_inset

-- but it's no cure-all.
It does seem as if the blacklist model needs to be improved -- perhaps
by using whitelisting instead of blacklisting.
The question then becomes an issue of weighing a
\emph on
cat and mouse
\emph default
game
\begin_inset CommandInset ref
LatexCommand eqref
reference "sec:Cat-and-Mouse"

\end_inset

-- if the whitelist is shared by many users -- against
\emph on
convenience
\emph default
-- if each user maintains their own whitelist.
At the moment it seems convenience and blacklists are winning, at the cost
of playing cat and mouse with third parties who end up being blocked.
\end_layout

\begin_layout Section
Open source contributions
\end_layout
@@ -8801,6 +8789,14 @@ description "(SLD) A domain that is directly below a TLD. Can be a domain regist
\end_inset


\begin_inset CommandInset nomenclature
LatexCommand nomenclature
symbol "Parked domain"
description "A domain that has been purchased from a domain name retailer, but only shows a placeholder message – usually an advertisement for the domain name retailer itself."

\end_inset


\end_layout

\begin_layout Standard
27 changes: 10 additions & 17 deletions report/results-summary.tsv
@@ -9,49 +9,42 @@ There are at least as many external resources, meaning as much tracking, on secu
{
Only 13 of 290 municipalities have fully secure websites; no Swedish media sites are completely secure (\ref{sec:Insecure-versus-secure-resources})
} {
YYY
Disconnect \emph{only detects 3\%} of external primary domains as trackers (\ref{sub:Tracker-detection-effectiveness})
} {
94\% of \numprint{5959} HTTPS-www variation domains call external domains (\ref{sec:Domain-and-request-counts})
}
{
25\% of Swedish municipalities responding to secure requests load 90\% of their resources securely -- it's close, but still considered insecure (Figure \ref{fig:Result-selection-internal-secure-organizations}(b))
} {
YYY
39\% use \emph{only} external resources (Figure \ref{fig:Result-selection-internal-external-insecure-tracker-categories-top-organizations}(a))
} {
78\% of \numprint{123000} HTTP-www variation domains call external domains (\ref{sec:Domain-and-request-counts})
}
{
XXX
Swedish media seems very social, with the highest Twitter and Facebook coverage (\ref{sub:Top-organizations})
} {
Only 0.3\% respond to secure requests, in line with .dk and .net, while .com has 0.5-0.6\% response rate (\ref{sec:Failed-versus-non-failed})
} {
ZZZ
Twitter has about half the coverage of Facebook (\ref{sub:Top-organizations})
}
{
Financial institutions redirect from secure \emph{to insecure} sites for 20\% of responding domains (\ref{sec:HTTP,-HTTPS-and-redirects})
} {
Many random domains use only external resources due to redirects away from the origin domain (\ref{sec:HTTP,-HTTPS-and-redirects})
Many random domains use only external resources due to being parked (\ref{sec:Results-Internal-and-external-requests}) or redirecting away from the origin domain (\ref{sec:HTTP,-HTTPS-and-redirects})
} {
50\% of top sites always redirect to the www subdomain, 13\% always redirect to their primary domain (\ref{sec:HTTP,-HTTPS-and-redirects})
}
{
XXX
A single visit to each media site would leak information to at least 57 organizations (\ref{sub:Domain-and-organization-counts})
} {
40\% of random .se domains have no known trackers -- 60\% do (Figure \ref{fig:Result-selection-internal-external-insecure-tracker-categories-top-organizations}(a))
Over 40\% use Google Analytics or Google API (\ref{sub:Top-domains-Google})
} {
A few global top domains load more than 75 known trackers on their front page alone (\ref{sub:Domain-and-organization-counts})
}
{
XXX
} {
In the 100k .se domains Disconnect \emph{only detects 3%} of external primary domains as trackers (\ref{sub:Tracker-detection-effectiveness})
} {
Disconnect's blocking list \emph{only detects 10%} of external primary domains as trackers for top website datasets (\ref{sub:Tracker-detection-effectiveness})
}
{
Swedish media seems very social, with the highest Twitter and Facebook coverage (\ref{sub:Top-organizations})
70\% use content from known trackers (\ref{sub:Disconnect-categories-coverage})
} {
YYY
58\% use content from known trackers (\ref{sub:Disconnect-categories-coverage})
} {
Twitter has about half the coverage of Facebook (\ref{sub:Top-organizations})
Disconnect's blocking list \emph{only detects 10\%} of external primary domains as trackers for top website datasets (\ref{sub:Tracker-detection-effectiveness})
}
