This repository has been archived by the owner on Dec 22, 2021. It is now read-only.

What are the unique IDs in the dataset? #35

Open
birdsarah opened this issue Mar 11, 2019 · 8 comments
Labels
good first issue Good for newcomers research question Outstanding questions that have not been investigated yet.

Comments

@birdsarah
Contributor

Develop a function for pulling out unique ids from the dataset.

We want to identify scripts that have stored / created unique ids. We also want to identify scripts that have not been storing / creating unique ids.
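As a starting point, a minimal sketch of such a function is shown here. It is only one possible heuristic, not an agreed method for this issue: it treats long, high-entropy tokens inside a stored value as candidate unique IDs. The names `extract_candidate_ids` and `shannon_entropy` and both thresholds are assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumed thresholds, chosen for illustration only; tune against real data.
MIN_LENGTH = 16
MIN_ENTROPY_BITS = 3.0

# Candidate tokens: runs of letters, digits, hyphens, and underscores.
TOKEN_RE = re.compile(r"[A-Za-z0-9_-]+")


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def extract_candidate_ids(value):
    """Return tokens in a stored value that are long and random-looking."""
    return [
        token
        for token in TOKEN_RE.findall(value)
        if len(token) >= MIN_LENGTH
        and shannon_entropy(token) >= MIN_ENTROPY_BITS
    ]
```

A heuristic like this will produce both false positives (hashes of static content) and false negatives (short IDs), so its output is better treated as a list of candidates to inspect than as a final answer.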

@birdsarah birdsarah added good first issue Good for newcomers research question Outstanding questions that have not been investigated yet. labels Mar 11, 2019
@birdsarah
Contributor Author

https://www.ghostery.com/lp/study/ published a way to detect unique IDs, also discussed in https://arxiv.org/pdf/1804.08959.pdf. What is their technique, and can it be used here?

@ayushi1998

Hey

I would like to work on this. I will write a Python script to detect unique IDs using the papers you posted in the comment.
Could you point me to the CSV files I should look at and any source code I need to get started?

@birdsarah
Contributor Author

Great! Just to re-emphasize, this is not a one-issue-per-person repo. All the questions are very open-ended, and different people may find very different and complementary things when looking at a question.

@birdsarah
Contributor Author

@ayushi1998 - feel free to hop into Gitter for support questions. Please carefully read the home page of the repo to get started. You can also try running the hello_*.ipynb notebooks in analyses. Once you are up and running, I would encourage you to contribute any information you looked for but could not find.

@mozilla mozilla deleted a comment from ayushi1998 Mar 12, 2019
@muskankhedia

muskankhedia commented Mar 16, 2019

Hi @birdsarah,

I have been through all the parameters in the dataset, and I think none of them acts as a unique id. Instead of looking for such a parameter, we could assign a unique id to every distinct script_url and store it in the dataset in a new column.

If you have different thoughts or ideas about any parameter acting as a unique_id, please share them. It would help me with further analysis of the data.

@birdsarah
Contributor Author

@muskankhedia I think you are misunderstanding the nature of this issue.

We want to look at calls, probably to storage APIs, to discover scripts that have been setting unique ids.

The creation of unique IDs is a symptom of fingerprinting. Storing a unique ID does not prove that fingerprinting has occurred, and failing to detect a stored unique ID does not prove that it has not. But it is a piece of evidence we can use to build a picture of what a script is doing.

There are a number of routes for storage that we may have captured e.g. window.localStorage, window.storage, window.document.cookie, ....

An example of a value that's commonly set that is not a unique ID is modernizr.
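To make this concrete, here is a rough sketch of collecting the values each script writes through storage routes while skipping known non-ID values such as modernizr. It assumes each row is a dict with `symbol`, `operation`, `value`, and `script_url` keys, and that a write appears as `operation == "set"` with the written data in `value`; those column semantics, the function name, and the prefix list are assumptions to check against the real dataset. Writes made through method calls (for example a `setItem` call) would need their arguments parsed separately.

```python
from collections import defaultdict

# Assumed symbol prefixes for storage routes; extend to match the dataset.
STORAGE_PREFIXES = ("window.localStorage", "window.document.cookie")

# Values known to be commonly set without being unique IDs.
KNOWN_NON_IDS = {"modernizr"}


def stored_values_by_script(rows):
    """Map each script_url to the set of values it wrote via storage routes."""
    result = defaultdict(set)
    for row in rows:
        # Keep only rows that touch one of the storage routes.
        if not row["symbol"].startswith(STORAGE_PREFIXES):
            continue
        # Keep only writes (assumed to be recorded as "set").
        if row["operation"] != "set":
            continue
        value = row["value"]
        # Drop values that are known not to be unique IDs.
        if value.strip().lower() in KNOWN_NON_IDS:
            continue
        result[row["script_url"]].add(value)
    return dict(result)
```

Scripts that never appear in the result are the ones for which no storage write was detected, which, as noted above, does not show that they did no fingerprinting.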

@noahwalugembe

Can I also work on this?

@birdsarah
Contributor Author

As already mentioned on this issue, @noahwalugembe, there are no assigned issues. Everyone is free to work on any research question. There are many possible analyses for a given research question. You are welcome to work on this question.

Aimaanhasan pushed a commit to Aimaanhasan/overscripted that referenced this issue Mar 29, 2019
Analysis on JavaScript API symbols storing unique IDs

Issue mozilla#35
4 participants