hunie-son/DefWeb

DefWeb: Defending User Privacy against Cache-based Website Fingerprinting Attacks with Intelligent Noise Injection

We develop a generative learning-based defense technique, DefWeb, to protect user privacy against cache-based Website Fingerprinting (WF) attacks by injecting precise noise into the WFs. For this purpose, (1) we train a Variational Autoencoder (VAE) to represent high-dimensional fingerprints in a low-dimensional space while creating distinct clusters for each website. (2) Minimal noise templates are extracted in the low-dimensional space to obfuscate the fingerprints efficiently. (3) We create practical noise templates that can be added to WFs during website rendering by leveraging self-modifying code (SMC). We implement DefWeb in simulation and real-world setups to degrade the attacker’s model accuracy. DefWeb decreases the model accuracy to 1.1% in the simulation setup and to 28.8% in the real-world setup.

Experimental Setup:

  • Intel Tiger Lake Microarchitecture

    • CPU Model: Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
    • OS: Ubuntu 20.04.4 LTS
    • Linux Kernel: 5.13.0-44-generic
    • Google Chrome version 101.0.4951.64
    • Tor browser version 10.5.10
  • Deep Learning environment

    • Nvidia GeForce RTX 3090 GPU card
    • Jupyter Notebook (or Google Colab)
  • Additional

    • Software: MATLAB R2021

Environment explanations:

  • Data Collection

    • Data collection should be run on the target microarchitecture (in our paper, Intel Tiger Lake).
    • We provide collection code for the Chrome and Firefox browsers.
    • Approximate time: 1.3 hours for each website
  • CNN, LSTM and VAE model training

    • This training process is done in a server environment (Nvidia GeForce RTX 3090 GPU card).
    • We use Jupyter Notebook (.ipynb) for the Demo version.
    • If a GPU is unavailable, Google Colab can be used to run the Demo version.
    • The full version with 100 websites is provided as Python code (.py).
    • A bash script (DefWeb_Autorun.sh) is provided to train on the 100-website fingerprinting dataset automatically.
    • At least 3GB of disk space is needed to save all the datasets (WF datasets, Reconstructed datasets, Noisy datasets, and pre-trained models).
    • Approximate time: more than 4 hours (depending on the GPU performance)

Data Collection:

Cache-based WF attack

  • (Demo) The demo uses two Website Fingerprint (WF) datasets: www.google.com and www.amazon.com.

  • In our project, we collected 100 measurements for each of 100 different websites (see the Website list), 10,000 measurements overall.

  • Data collection for Google Chrome Browser:

    • (Demo) PrimeProbe_google.html and PrimeProbe_amazon.html contain the JavaScript code of the Prime and Probe attack created by Dr. Yossi Oren's research team. (You can create 100 different Prime and Probe attack pages based on the Website list.)

    • (Demo) Collect the WF dataset for 2 websites (www.amazon.com and www.google.com) automatically using a bash script (≈2.5 hours)
      cd DefWeb/Data_Collection
      ./run_demo_chrome.sh

    • Collect the WF dataset for 100 websites automatically using a bash script (Chrome) (≈120 hours)
      cd DefWeb/Data_Collection/Chrome
      ./run_chrome.sh

    • The collected WF datasets for 100 websites are provided in DefWeb/Data_Collection/Chrome/Chrome_Collected_100

  • Data collection for Mozilla Firefox Browser:

    • (Demo) Collect the WF dataset for 2 websites (www.amazon.com and www.google.com) automatically using a bash script (≈2.5 hours)
      cd DefWeb/Data_Collection
      ./run_demo_firefox.sh

    • The procedure is the same as for the Chrome browser and uses the same Website list.

    • Collect the WF dataset for 100 websites automatically using a bash script (Firefox) (≈120 hours)
      cd DefWeb/Data_Collection/Firefox
      ./run_firefox.sh

    • The collected WF datasets for 100 websites are provided in DefWeb/Data_Collection/Firefox/Firefox_Collected_100

  • Preprocessing:

    • The merged CSV file (10,000 x 6,000) is too large to upload to the GitHub repository.

    • We provide preprocessing Python code (Preprocessing_WF.ipynb) that merges the measurements, in order, into a WF dataset stored as a single CSV file.

    • (Demo) trainX_2_US_Chrome.csv and trainY_2_US_Chrome.csv are the preprocessed WF datasets for www.amazon.com and www.google.com.

After preprocessing the data, the final training data (trainX and trainY) will be created.
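
The merge step can be pictured with the following minimal sketch. It is not the code in Preprocessing_WF.ipynb: the function names, the in-memory input format (a dict from website name to a list of traces), and the choice of labeling websites in sorted-name order are all assumptions made for illustration.

```python
import csv


def build_dataset(traces_by_site):
    """Stack per-website traces into feature rows (trainX) and labels (trainY).

    Assumption: websites are labeled 0..N-1 in sorted-name order so that the
    label mapping stays stable between runs.
    """
    rows, labels = [], []
    for label, site in enumerate(sorted(traces_by_site)):
        for trace in traces_by_site[site]:
            rows.append(list(trace))
            labels.append(label)
    # Every trace must have the same number of samples to form a matrix.
    lengths = {len(r) for r in rows}
    if len(lengths) > 1:
        raise ValueError(f"traces have inconsistent lengths: {sorted(lengths)}")
    return rows, labels


def write_csv(path, rows):
    """Write a list of rows to a CSV file, one row per line."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
```

With the full dataset described above, `build_dataset` would return 10,000 rows of 6,000 values each; `write_csv("trainX.csv", rows)` and `write_csv("trainY.csv", [[l] for l in labels])` would then store the two files (the output file names here are placeholders).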

WF Attack:

  • We use Convolutional Neural Networks (CNN) and Long Short-Term Memory Networks (LSTM) to conduct WF attacks.
    • We provide CNN Python code (CNN_Final.ipynb) with explanations.

    • We provide LSTM Python code (LSTM_Final.ipynb) with explanations.

    • (Demo) Input data: Preprocessed data (trainX_2_US_Chrome.csv and trainY_2_US_Chrome.csv)

    • Output: Accuracy

DefWeb :

  • VAE: We use a generative deep learning model, a Variational Autoencoder (VAE), to create dynamic noise.

    • (Demo) DefWeb_VAE_Demo.ipynb contains detailed explanations (≈10 min).

    • (Demo) Input data: Preprocessed data (trainX_2_US_Chrome.csv and trainY_2_US_Chrome.csv)

    • (Demo) Output data: Reconstructed data (reconstructed_x_chrome_w2.csv), Noisy Reconstructed data (reconstructed_nosiy_x_chrome_w2.csv)

    • DefWeb_VAE_Auto_GC.py: Trains the VAE and reconstructs the Noisy WF dataset (100 websites).

    • ExtractCo3Noise_RetrainCNN_Auto.py: Extracts 1/3 of the noise, adds the dynamic noise to the original WF dataset, and retrains the attacker's CNN model.

    • Automatically execute DefWeb using bash script (≈4.2 hours, depending on GPU performance)
      cd DefWeb/Defense_Model/VAE
      ./DefWeb_Autorun.sh

  • SMC: We use self-modifying code (SMC) to generate practical noise templates.

    • noise_template.csv: An example noise template generated by VAE. This is an analytical noise template that we need to mimic.

    • repeat_delay_template.csv: This file consists of repeat and delay values that will create a specific SMC practical noise.

    • smc_example.csv: An example SMC practical noise collected with the practical noise generation tool.

    • prime_probe_smc_analysis.html: Prime and Probe attack code to be run in a browser. This file was created by Dr. Yossi Oren's team.

    • smc_analysis_for_paper_GC.sh: This script runs the practical noise code. This is the script you are expected to run.

    • SMC_generation_visualization_artifact_code.m: The MATLAB code to visualize both analytical and practical noise templates.

    • smc.asm: This code creates practical noise with given repeat and delay values.

    • main.c: This is the main code that runs the assembly file to create SMC noise.

    • Compile procedure for prog_800:
      nasm smc.asm -f elf64 -o smc_800.o
      gcc -c main.c -O2 -o main.o
      gcc main.o smc_800.o -o prog_800

Noise Template

  • Noise template datasets generated by DefWeb are provided for future research.
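
How such a noise template might be derived and applied can be sketched as follows. This is a hedged illustration, not the logic of ExtractCo3Noise_RetrainCNN_Auto.py: it assumes the noise is the element-wise difference between the noisy reconstruction and the plain reconstruction, and it reads "extract 1/3 of noise" as scaling the noise amplitude by one third. Both readings, and the function names, are assumptions.

```python
def extract_noise(reconstructed, noisy_reconstructed):
    """Element-wise difference between noisy and plain VAE reconstructions.

    Assumption: this difference is what plays the role of the noise template.
    """
    return [
        [n - r for r, n in zip(r_row, n_row)]
        for r_row, n_row in zip(reconstructed, noisy_reconstructed)
    ]


def apply_noise_template(original, noise, scale=1 / 3):
    """Add a scaled copy of the noise to the original fingerprints.

    Assumption: scale=1/3 models the "1/3 of noise" step as amplitude scaling.
    """
    return [
        [o + scale * d for o, d in zip(o_row, d_row)]
        for o_row, d_row in zip(original, noise)
    ]
```

The defended traces returned by `apply_noise_template` would then be the input for retraining the attacker's model, which is how the drop in accuracy is measured.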

Noise Template Repeat and Sleep

  • Precise noise is created with SMC based on the repeat and sleep datasets.
  • These datasets are used by the Performance Tool (Chrome browser with Prime and Probe (PnP) attack + Noise).

Performance Tool

  • In this part, we automatically render the target website (using a Python library) together with the Prime and Probe attack (JavaScript file location) and measure the webpage loading time.
  • Prerequisites: Chrome driver and Firefox driver need to be installed.
  • We also measure the website loading time while the noise template created by DefWeb is running.
    • Automatically execute Performance Tool (Chrome browser with PnP attack) using bash script
      cd DefWeb/Performance_Tool
      ./loadingTime_PnP_GC.sh

    • Automatically execute Performance Tool (Mozilla Firefox with PnP attack) using bash script
      cd DefWeb/Performance_Tool
      ./loadingTime_PnP_FF.sh

    • Automatically execute Performance Tool (Chrome browser with PnP attack + Noise) using bash script
      cd DefWeb/Performance_Tool
      ./loadingTime_PnP_noise_GC.sh

    • Automatically execute Performance Tool (Mozilla Firefox with PnP attack + Noise) using bash script
      cd DefWeb/Performance_Tool
      ./loadingTime_PnP_noise_FF.sh
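
The loading-time measurement can be sketched as below. This is not the code behind the scripts above: it assumes a Selenium-style driver object (anything with `get` and `execute_script` methods) and uses the browser's Navigation Timing fields `navigationStart` and `loadEventEnd`. The function name and the polling parameters are hypothetical.

```python
import time

# Read the two Navigation Timing fields needed for the load time.
TIMING_JS = (
    "return {navigationStart: performance.timing.navigationStart,"
    " loadEventEnd: performance.timing.loadEventEnd};"
)


def measure_load_time_ms(driver, url, timeout_s=30.0, poll_s=0.1):
    """Load `url` and return the page load time in milliseconds.

    loadEventEnd stays 0 until the load event has finished, so poll for it.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout_s
    while True:
        timing = driver.execute_script(TIMING_JS)
        if timing["loadEventEnd"] > 0:
            return timing["loadEventEnd"] - timing["navigationStart"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"load event did not finish for {url}")
        time.sleep(poll_s)
```

With Selenium installed, `driver` would typically be a `webdriver.Chrome()` or `webdriver.Firefox()` instance. Running the function once with and once without the noise process active, over several repetitions, gives the loading-time overhead.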
