icemonster/SWIPE
SWIPE

This is the artifact for SWIPE, a DOM-XSS analysis infrastructure comprising the following components:

  • Passive: The baseline component, which replicates passive navigation on a page
  • Fuzzer: Simulates user interactions on the webpage once it finishes loading
  • DSE: A symbolic execution engine that synthesizes GET parameters
  • Webarchive: A component that improves the stability of results by archiving pages and replaying previously created archives with the other components.

This artifact is associated with the NDSS'26 paper #1467, "DOM-XSS Detection via Webpage Interaction Fuzzing and URL Component Synthesis".

Prerequisites

We recommend at least 4GB of RAM and 4 cores to run SWIPE smoothly. You will need about 30GB of storage to install SWIPE and follow these instructions without issues. There is no GPU requirement.

In terms of software, Docker is required; unzip is useful for extracting the artifact; and, optionally, a VNC viewer such as TigerVNC can be used to watch the browser while the tool works on a page and to evaluate the webarchive component.

SWIPE was tested on macOS and Ubuntu 24.04.

Installation (with Docker)

Open a terminal in the root of this project (where the Dockerfile is located). Build the SWIPE image using the following command:

$ docker build --platform=linux/amd64 -t swipe:latest .

This takes ~10 minutes for us on a machine with 8 cores and 32GB RAM. Once it is done, you can confirm the image was built with the following command:

$ docker image ls
REPOSITORY   TAG       IMAGE ID       CREATED             SIZE
swipe        latest    c59dc313cfaf   3 minutes ago       9.56GB

Minimal working example

We will use a single page to show how to run Passive, Fuzzer and DSE (the symbolic execution component), and then a different page to showcase the webarchive.

Example webpage for Passive, Fuzzer and DSE

In the next few steps, we will launch SWIPE against the page located in the tests/example_page.html file at the root of SWIPE. That page is hosted at http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html. The page contains 3 DOM-XSS vulnerabilities:

  • One in the vulnerable_passive function, which can be found by Passive because it is called during page initialization
  • One in vulnerable_fuzzer, which can only be found by Fuzzer because it requires an onmousewheel event to be triggered
  • A third and final DOM-XSS in vulnerable_dse, which can only be found by DSE because it requires a certain string to be included in the GET parameters
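The actual page lives in tests/example_page.html; the following sketch only mimics the shape of the three patterns as pure functions, so they can be reasoned about outside a browser. All names here are illustrative, not the artifact's actual source; `search` stands in for location.search, and the return value is the string that would reach the sink.

```javascript
// Illustrative sketch of the three DOM-XSS patterns in the example page
// (NOT the artifact's real code).

// Pattern 1: runs during page initialization -> found by Passive.
// In a real page this would resemble eval(location.search.slice(1)).
function vulnerablePassive(search) {
  return search.slice(1); // string that would be passed to eval()
}

// Pattern 2: only runs inside an onmousewheel handler -> found by Fuzzer,
// which triggers the wheel event that a passive page load never fires.
function vulnerableFuzzer(search) {
  return search.slice(1); // string that would be assigned to innerHTML
}

// Pattern 3: guarded by a string check on the GET parameters -> found by
// DSE, which synthesizes a query satisfying the branch condition.
function vulnerableDse(search) {
  if (search.includes("mustcontainthisstring")) {
    return search.slice(1); // reaches the HTML sink only on this path
  }
  return null;
}
```

The guard string in the third pattern matches the sink_arg that DSE reports later in this walkthrough ('BmustcontainthisstringA?').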

Initializing SWIPE

  • First, create a container using the SWIPE image: docker run --rm -it -p 5550:5550 --entrypoint=bash swipe
    • Note that this exposes port 5550 to the host, to make it possible to see the browser via VNC
  • Next, start an Xvfb server in the container to listen for VNC connections. To do that, run this command in the container: ./jalangi2-workspace/run_xvfb.sh
    • Note that an error message is expected: _XSERVTransmkdir: ERROR: euid != 0, directory /tmp/.X11-unix will not be created. You can press ENTER to keep using that terminal.
  • Open a VNC viewer in the host and connect to the address localhost:5550. Use the password DEBUG. You should see a blank screen in the VNC viewer, as the browser is not open yet.
  • Finally, go to SWIPE's main folder in the container (all steps and commands below are supposed to be executed in the container): cd ~/jalangi2-workspace/scripts/swipe/

Running Passive

  • SWIPE is configured to run Passive by default, but you still need to specify which page to analyze:
  • Make sure that the URL that we have provided for the example above is placed in the config/sample_targets file:
scan@35ac4b3e72e3:~/jalangi2-workspace/scripts/swipe$ cat config/sample_targets 
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp
    • SWIPE reads the target webpages to analyze from that file.
  • Run SWIPE with ./run.sh. You should now see the browser open in the VNC window, but please do not interact with the browser. Passive should take around 3 minutes to finish.
  • Once Passive is finished, SWIPE automatically runs a taint parser over the taint logs reported by the modified Chromium. A summary of those results is printed at the end of SWIPE's output and also written to the file ./output.txt, which is created during the run.
$ cat output.txt 
Unique potential flows:
	{'sink': 'JAVASCRIPT', 'ranges': [(0, 2, 'URL_SEARCH', 'URL_COMPONENT_DECODED')], 'sink_arg': 'gp', 'iframe': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'stack': [{'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': 'vulnerable_passive', 'col': 22, 'lineno': 22}, {'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': '', 'col': 13, 'lineno': 24}]}

URLs with markers to confirm:
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?#&marker<>'"
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?marker<>'"

Unique confirmed flows:
	None

Summary:
	#Unique potential: 1
	#Unique confirmed: 0
	#URLs with markers: 2

Interpreting the flows summary in output.txt:

SWIPE runs a modified browser that implements dynamic taint analysis. Whenever information from a possibly attacker-controlled source (like the URL of a page) can flow to the argument of a dangerous sink (like eval or document.write) then that fact is reported in ./output.txt. That file has the following structure:

  • First, a list of unique potential flows. These are flows where information appears to flow from a source to a sink, but it is unclear whether the flow is exploitable
  • Then, a list of URLs with markers. These URLs are constructed so that, if you run SWIPE against them, SWIPE will try to confirm that the flow is exploitable by checking whether a certain marker is present in the final sink argument. URLs with markers are only created for certain potential flows.
  • Then, a list of confirmed flows: flows where the marker was present in the final sink argument
  • Finally, a summary with how many flows were found and how many URLs were created.
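The confirmation idea can be sketched in a few lines. This is not SWIPE's actual code, just an illustration of the principle: the marker contains characters (<, >, ', ") that browsers normally percent-encode, so a flow is confirmed only when those characters survive, un-encoded, into the string that reaches the sink.

```javascript
// Hedged sketch of marker-based confirmation (NOT SWIPE's real code).
// The marker string matches the one visible in the generated URLs above.
const MARKER = "marker<>'\"";

function isConfirmed(sinkArg) {
  // Had the browser percent-encoded the payload, sinkArg would contain
  // something like "marker%3C%3E%27%22" instead, and this check would fail.
  return sinkArg.includes(MARKER);
}
```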

Each flow has the following structure, regardless of it being confirmed or just potential:

  • sink: The category of the dangerous sink that the attacker seems to be able to influence. The set of possible values is:
    • JAVASCRIPT: The attacker influences what JavaScript code is executed, by passing information to functions like eval or the Function constructor
    • JAVASCRIPT_EVENT_HANDLER_ATTRIBUTE: The attacker influences what code is executed when a certain event handler is triggered
    • HTML: The attacker has HTML injection capabilities, controlling the argument to functions like document.write or innerHTML assignments.
  • sink_arg: The final string argument that was passed to the sink
  • iframe: The URL of the iframe that contained the vulnerable script
  • stack: A stacktrace, starting from the final sink call location and going upward through the call chain.
  • ranges: A description of the taint provenance of the tainted bytes in the sink argument. This is a list of 4-tuples, each representing a source with the following structure:
    • Start index of the tainted bytes in the sink argument
    • End index
    • Name of the attacker-controlled source from which the tainted bytes come. This will usually be some part of the URL, like the GET parameters (URL_SEARCH) or the fragment value (URL_HASH).
    • Encoding that was used. For a flow to be exploitable, the tainted bytes need to be URL-decoded, as they are automatically encoded by built-in mechanisms of browsers like Chromium.
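Applying the ranges to sink_arg recovers exactly which bytes are tainted. A small illustrative helper (not part of SWIPE); judging by the flows shown above, the end index appears to be exclusive, since the range (0, 2) covers the whole two-character argument 'gp':

```javascript
// Extract the tainted substrings of a sink argument from its `ranges` field.
// Each range is [startIndex, endIndex, sourceName, encoding]; the indices
// refer to positions in sinkArg, with the end index treated as exclusive.
// Illustrative helper, not shipped with SWIPE.
function taintedParts(sinkArg, ranges) {
  return ranges.map(([start, end, source, encoding]) => ({
    bytes: sinkArg.slice(start, end),
    source,
    encoding,
  }));
}
```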

Thus, after the Passive run, one potential flow will be reported in ./output.txt. The stacktrace in that flow should show that the vulnerability is in the vulnerable_passive function. SWIPE also generated two URLs with markers that can be used to confirm the vulnerability. We will show how to confirm vulnerabilities later.

Running DSE

  • To configure SWIPE to run our DSE component, edit the config/config.json file:
    • Make sure the Fuzzer is disabled ("run-ui-fuzzer": false)
    • Enable DSE: Set try-alternative-paths to true
  • Run SWIPE: ./run.sh. DSE should take around 6 minutes to finish. ./output.txt should reflect the fact that at least one other flow was found: the one in function vulnerable_dse:
$ cat output.txt | grep "vulnerable_dse"
	{'sink': 'HTML', 'ranges': [(0, 24, 'URL_SEARCH', 'URL_COMPONENT_DECODED')], 'sink_arg': 'BmustcontainthisstringA?', 'iframe': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?BmustcontainthisstringA?', 'stack': ...  'function': 'vulnerable_dse', ...
  • Note that DSE acts on an instrumented version of the page, which may cause false flows to appear. In our crawls, we always re-analyzed every URL discovered by DSE in a non-instrumented version of the page, to have clean results with respect to flows.
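The idea behind DSE's GET-parameter synthesis can be illustrated with a toy example. SWIPE's real engine collects constraints via Jalangi instrumentation and dispatches them to Z3; the sketch below only mimics the simplest case, a string-containment constraint like the one guarding vulnerable_dse, and the constraint representation is an invention for this illustration:

```javascript
// Toy illustration of GET-parameter synthesis (NOT SWIPE's real DSE).
// Suppose instrumentation recorded a path constraint of the form
// search.includes(needle) on the branch guarding the sink:
const constraint = { op: "includes", needle: "mustcontainthisstring" };

// "Solving" a containment constraint is trivial: any query string that
// embeds the needle satisfies it. Real constraints go to the Z3 solver.
function synthesizeQuery(c) {
  if (c.op === "includes") return "?" + c.needle;
  throw new Error("unsupported constraint");
}
```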

Running Fuzzer

  • To configure SWIPE to run our Fuzzer component, similarly to the above, you simply need to edit the config/config.json file:
    • Make sure DSE is disabled: ("try-alternative-paths": false)
    • Enable Fuzzer: Set run-ui-fuzzer to true
  • Run ./run.sh. Our Fuzzer should take around 3 minutes to finish. ./output.txt should reflect the fact that SWIPE found the vulnerability in the vulnerable_fuzzer function.
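At its core, an interaction fuzzer enumerates the event handlers a page registers and triggers each one; the onmousewheel handler guarding vulnerable_fuzzer is reached this way. The sketch below shows that idea on a plain object standing in for a DOM node (cf. src/fuzz/action_executor.py in the artifact, though this is not its actual logic):

```javascript
// Toy sketch of interaction fuzzing (NOT SWIPE's real Fuzzer): find the
// on* handler properties of a node and invoke each with a synthetic event.
function fuzzHandlers(node) {
  const fired = [];
  for (const key of Object.keys(node)) {
    if (key.startsWith("on") && typeof node[key] === "function") {
      node[key]({ type: key.slice(2) }); // e.g. { type: "mousewheel" }
      fired.push(key);
    }
  }
  return fired;
}
```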

Confirming a potential vulnerability found by the Fuzzer

  • First, look at the URLs that SWIPE wants to confirm given the last Fuzzer run:
$ cat output.txt 
Unique potential flows:
	{'sink': 'JAVASCRIPT', 'ranges': [(0, 2, 'URL_SEARCH', 'URL_COMPONENT_DECODED')], 'sink_arg': 'gp', 'iframe': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'stack': [{'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': 'vulnerable_passive', 'col': 22, 'lineno': 22}, {'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': '', 'col': 13, 'lineno': 24}]}
	{'sink': 'HTML', 'ranges': [(0, 2, 'URL_SEARCH', 'URL_COMPONENT_DECODED')], 'sink_arg': 'gp', 'iframe': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'stack': [{'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': 'vulnerable_fuzzer', 'col': 26, 'lineno': 7}, {'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': 'onmousewheel', 'col': 48, 'lineno': 3}]}

URLs with markers to confirm:
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?marker<>'"
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?#&marker<>'"

Unique confirmed flows:
	None

Summary:
	#Unique potential: 2
	#Unique confirmed: 0
	#URLs with markers: 2
  • Place those URLs with markers in the config/sample_targets file. The generated URLs with markers are also written to /tmp/urls_to_confirm.txt by the taint parser.
$ cp /tmp/urls_to_confirm.txt config/sample_targets
$ cat config/sample_targets 
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?marker<>'"
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?#&marker<>'"
  • Run SWIPE's Fuzzer again: ./run.sh.
    • In around 5 minutes, SWIPE should finish and ./output.txt will be recreated, this time containing confirmed flows, since the vulnerabilities in our example webpage are indeed exploitable:
$ cat output.txt | grep "Unique confirmed:"
        #Unique confirmed: 2
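When confirming vulnerabilities in batches, the summary block at the end of ./output.txt can be checked mechanically instead of by eye. A small illustrative parser (not part of SWIPE):

```javascript
// Pull the confirmed-flow count out of the Summary block of output.txt.
// Illustrative helper, not shipped with SWIPE.
function confirmedCount(outputText) {
  const m = outputText.match(/#Unique confirmed:\s*(\d+)/);
  return m ? parseInt(m[1], 10) : 0;
}
```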

Example webpage for the webarchive:

  • This experiment demonstrates that our webarchive can archive a live webpage and reproduce its previous state
    • You will need the VNC viewer open in order to visualize the effects of this component
  • The website https://randomwordgenerator.com/picture.php shows a random image every time the user navigates to the page (this page is not maintained by us).
  • First, put that URL in the config/sample_targets file:
$ cat config/sample_targets 
https://randomwordgenerator.com/picture.php
  • Disable the Fuzzer and DSE for this experiment: Both run-ui-fuzzer and try-alternative-paths should be set to false in config/config.json.
    • This will keep the browser window stable and avoid any page interactions.
  • Make sure that the mitmproxy-archivemode flag is set to true in config/config.json
  • Run SWIPE with ./run.sh. A few minutes later, an archive will be created for this website; we expect it to be located at output/82bb3aedccd06f0478e34746e2c5dd2bca042a41e1c397370a0e2797ee707102.warc.gz
    • Pay attention to the image that is shown in the VNC window, above "Other Random Generators". Disregard the flows output once SWIPE finishes.
  • Now, configure the webarchive to be in replay mode, by ensuring the following in config/config.json:
    • mitmproxy-archivemode flag is set to false
    • mitmproxy-replaymode flag is set to true
  • Run SWIPE with ./run.sh; SWIPE will replay from the previously created webarchive. Feel free to use Passive, Fuzzer or DSE any number of times. Regardless, the image that is loaded will no longer be random: it will be the same one that was seen during archiving.
    • Note that our webarchive attempts to consistently replay all resources like images, scripts and frames, as they were seen during archiving, but it may not be able to solve all sources of non-determinism.

SWIPE's config

  • Although not required for the basic artifact evaluation, we describe each flag in config/config.json below:
    • run-ui-fuzzer: Whether the Fuzzer is enabled
    • try-alternative-paths: Whether DSE is enabled
    • taint-enabled-chromium-path: Path to the modified Chromium binary
    • chrome-data-dir: Path to Chromium profile folder, in case of stateful crawls
    • site-list: Path to a file with the URLs to analyze
    • output-dir: Path where most of SWIPE's output will be located
    • mitmproxy-host: Host of the server running our webarchive
    • mitmproxy-port-instr: We run 2 webarchive instances. One with DSE instrumentation and a non-instrumented version. This is the port for the instrumented one.
    • mitmproxy-port-noinstr: The port where the non-instrumented version of our webarchive is running
    • mitmproxy-appendmode: During replay, whether to append requests to the webarchive when they are outside what was archived before, instead of just returning 404.
    • mitmproxy-archivemode: Whether to archive pages
    • mitmproxy-replaymode: Whether to replay previously created webarchives; the name of the webarchive to use is deterministically derived from the target URL
    • mitmproxy-warcPath: Path where archives will be stored
    • analysis-script: A JavaScript file describing how to instrument pages before serving them to the browser
    • browser_timeout: Maximum amount of time in seconds that the browser can be open
    • browser_timeout_after_load: How many seconds to wait after page loading until analysis starts
    • max_retries: How many times DSE attempts to visit the page in case of an error
    • save-per-x-sites: DSE will permanently store analysis results every X navigations.
    • dse-time-budget: Time budget in seconds for DSE. -1 means unlimited
    • max_dse_attempts: Maximum number of page visits for DSE
    • strip_get_params: Whether to strip GET parameters and fragment values from URLs before analysis
    • replay-mode: This allows for replaying fuzzing actions, but such a process is beyond the scope of this artifact
    • fuzzer-time-budget: Time budget in seconds for Fuzzer
    • fuzzer-idle-time: How many seconds the Fuzzer spends idle on the page after load before fuzzing starts
    • fuzzer-progress-timeout: Interval in seconds at which the Fuzzer reports results.
    • fuzzer-results-path: Path where Fuzzer results will be stored, including coverage and other metrics
    • z3-path: Path to Z3 SMT solver binary to use in DSE
    • z3-timeout: Timeout in seconds for Z3 to solve each constraint
    • parse-taint-log: Whether to parse taint logs after SWIPE runs
    • stateless: Whether the run should be stateless or stateful
    • add_dummy_params: Whether dummy GET parameters should be added to each URL before analysis
    • enable_tracking_measurements: Whether tracking measurements are enabled; this is beyond the scope of the artifact
    • tracking_script_path: Path to the tracking-measurement script that is injected into pages
    • track_coverage: Whether to track code coverage
    • domeventbreakpoints: Whether to measure how many and what event handlers are executed on the page
    • clean-intermediate-results: Whether to remove intermediate results after a SWIPE run
    • compress-results: Whether to compress results after a SWIPE run
    • fuzzer-symexec-enabled: Whether to enable symbolic execution during fuzzing; this is beyond the scope of the artifact
    • daemon-startup-time: How many seconds to wait for SWIPE to set up before analysis starts
    • jalangi2-webextension-dir: Path to jalangi's web extension
    • chrome-csp-disable-dir: Path to Chrome CSP disable extension
    • all-flow-path: File where DSE constraints summary is stored
    • script-dir: Path where SWIPE's main scripts are located
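Of all these flags, the walkthrough above only ever toggles two to switch between components: Passive runs when both are false, the Fuzzer when run-ui-fuzzer is true, and DSE when try-alternative-paths is true. A tiny helper capturing that mapping (illustrative only; SWIPE itself has no such function, and how it resolves both flags being true at once is not documented here):

```javascript
// Map the two mode flags in config/config.json to the component they select.
// Illustrative helper, not part of SWIPE.
function selectedMode(config) {
  if (config["run-ui-fuzzer"]) return "fuzzer";
  if (config["try-alternative-paths"]) return "dse";
  return "passive"; // the default when both flags are false
}
```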

Feel free to experiment with SWIPE on webpages of your own. Be responsible when running SWIPE against live webpages.

Artifact Structure:

We also provide a description of the most relevant folders and files contained in this artifact:

  • analysis/ - Contains our implementation of symbolic execution on top of Jalangi instrumentation
  • packages/ - Has most of SWIPE's dependencies
    • chrome-csp-disable: An extension to disable CSP
    • chromium: A Chromium binary that was modified to use dynamic taint analysis
    • jalangi2: Our webarchive implementation, written as a mitmproxy plugin
    • z3: The SMT solver we use to solve symbolic execution constraints
  • scripts/swipe - Contains most of SWIPE's source code
    • config - Configuration files for SWIPE
      • config.json - Main JSON configuration file for SWIPE
      • sample_targets - Default file where target URLs are placed to be analyzed. It can contain more than one URL
    • src - Contains main SWIPE components
      • constraint/: Main DSE code for processing constraints
      • driver/: Our own driver that interacts with Chrome Devtools Protocol
      • fuzz/: Where Fuzzing source code is located
        • action_executor.py: Code that triggers event handlers
        • coordinator.py: Fuzzer engine
      • replay/: Replaying mechanism for confirming vulnerabilities
      • executor.py: Main SWIPE engine, including Passive
  • tests/: Contains pages that we use to test SWIPE
  • selection/: Data and scripts used for precrawling Tranco to collect the core dataset
