This is the artifact for SWIPE, a DOM-XSS analysis infrastructure comprising the following components:
- Passive: The baseline, replicating passive navigation on a page
- Fuzzer: Simulates user interactions on the webpage once it finishes loading
- DSE: A symbolic execution engine that synthesizes GET parameters
- Webarchive: Improves the stability of results by archiving pages and replaying previously created archives with the other components
This artifact is associated with NDSS'26 paper #1467, "DOM-XSS Detection via Webpage Interaction Fuzzing and URL Component Synthesis".
We recommend at least 4GB of RAM and 4 cores to run SWIPE smoothly. You will need about 30GB of storage to install SWIPE and follow these instructions without issues. There is no GPU requirement.
In terms of software, Docker is required; unzip is useful for extracting the artifact; and, optionally, a VNC viewer such as TigerVNC can be used to visualize the browser while the tool works on a page and to evaluate the webarchive component.
SWIPE was tested on macOS and Ubuntu 24.04.
Open a terminal in the root of this project (where the Dockerfile is located). Build the SWIPE image using the following command:
$ docker build --platform=linux/amd64 -t swipe:latest .
This takes ~10 minutes for us on a machine with 8 cores and 32GB RAM. Once it is done, you can confirm the image was built with the following command:
$ docker image ls
REPOSITORY   TAG      IMAGE ID       CREATED         SIZE
swipe        latest   c59dc313cfaf   3 minutes ago   9.56GB

We will use a single page to show how to run Passive, Fuzzer and DSE (the symbolic execution component), and then a different page to showcase the webarchive.
In the next few steps, we will launch SWIPE against the page located in the tests/example_page.html file at the root of SWIPE. That page is hosted at http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html. The page contains 3 DOM-XSS vulnerabilities:
- One in the `vulnerable_passive` function, which can be found by Passive because it is called during page initialization
- One in `vulnerable_fuzzer`, which can only be found by Fuzzer, because it requires an `onmousewheel` event to be triggered
- A third and final DOM-XSS in `vulnerable_dse`, which can only be found by DSE, because it requires a certain string to be included in the GET parameters
- First, create a container using the SWIPE image:

  $ docker run --rm -it -p 5550:5550 --entrypoint=bash swipe

  Note that this exposes port 5550 to the host, to make it possible to see the browser via VNC.
- Next, start an XVFB server in the container to listen for VNC connections. To do that, run this command in the container:

  $ ./jalangi2-workspace/run_xvfb.sh

  Note that an error message is expected: `_XSERVTransmkdir: ERROR: euid != 0, directory /tmp/.X11-unix will not be created.`. You can press ENTER to keep using that terminal.
- Open a VNC viewer on the host and connect to the address `localhost:5550`, using the password `DEBUG`. You should see a blank screen in the VNC viewer, as the browser is not open yet.
- Finally, go to SWIPE's main folder in the container (all steps and commands below are supposed to be executed in the container):

  $ cd ~/jalangi2-workspace/scripts/swipe/
- SWIPE is configured to run Passive by default, but you still need to specify which page to analyze. Make sure that the URL that we have provided for the example above is placed in the `config/sample_targets` file:

  scan@35ac4b3e72e3:~/jalangi2-workspace/scripts/swipe$ cat config/sample_targets
  http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp

  SWIPE looks for target webpages to analyze in that file.
- Run SWIPE with `./run.sh`. You should now see the browser open in the VNC window, but please do not interact with the browser. Passive should take around 3 minutes to finish.
- Once Passive is finished, SWIPE automatically runs a taint parser, parsing the taint logs that the modified Chromium may have reported. A summary of those results will be printed at the end of SWIPE's output. You can also see that same summary in the file `./output.txt`, which should have been created meanwhile.
$ cat output.txt
Unique potential flows:
{'sink': 'JAVASCRIPT', 'ranges': [(0, 2, 'URL_SEARCH', 'URL_COMPONENT_DECODED')], 'sink_arg': 'gp', 'iframe': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'stack': [{'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': 'vulnerable_passive', 'col': 22, 'lineno': 22}, {'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': '', 'col': 13, 'lineno': 24}]}
URLs with markers to confirm:
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?#&marker<>'"
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?marker<>'"
Unique confirmed flows:
None
Summary:
#Unique potential: 1
#Unique confirmed: 0
#URLs with markers: 2
SWIPE runs a modified browser that implements dynamic taint analysis. Whenever information from a possibly attacker-controlled source (such as the page URL) can flow to the argument of a dangerous sink (such as `eval` or `document.write`), that fact is reported in `./output.txt`. That file has the following structure:
- First, a list of unique potential flows. These are flows where information seems to be flowing from a source to a sink, but it is unclear whether the flow is exploitable.
- Then, a list of URLs with markers. These URLs are constructed in such a way that, if you run SWIPE against them, SWIPE will try to confirm that the flow is exploitable by checking whether a certain marker is present in the final sink argument. URLs with markers are only created for certain potential flows.
- Then, a list of confirmed flows: flows where the marker was present in the final sink argument.
- Finally, a summary with how many flows were found and how many URLs were created.
Each flow has the following structure, regardless of whether it is confirmed or just potential:
- `sink`: The category of the dangerous sink that the attacker seems to be able to influence. The set of possible values is:
  - `JAVASCRIPT`: The attacker influences what JavaScript code is executed, by passing information to functions like `eval` or the `Function` constructor
  - `JAVASCRIPT_EVENT_HANDLER_ATTRIBUTE`: The attacker influences what code is executed when a certain event handler is triggered
  - `HTML`: The attacker has HTML injection capabilities, controlling the argument to functions like `document.write` or `innerHTML` assignments
- `sink_arg`: The final string argument that was passed to the sink
- `iframe`: The URL of the iframe that contained the vulnerable script
- `stack`: A stacktrace, starting from the final sink call location and going upwards in the call chain
- `ranges`: A description of the taint provenance of the tainted bytes in the sink argument. This is a list of 4-tuples, each representing a source with the following structure:
  - Start index of the tainted bytes in the sink argument
  - End index
  - Name of the attacker-controlled source from which the tainted bytes come. This will usually be some part of the URL, like the GET parameters (`URL_SEARCH`) or the fragment value (`URL_HASH`)
  - Encoding that was used. For a flow to be exploitable, the tainted bytes need to be URL-decoded, as they are automatically encoded by built-in mechanisms of browsers like Chromium
Thus, after the Passive run, one potential flow will be reported in `./output.txt`. The stacktrace in that flow should show that the vulnerability is in the `vulnerable_passive` function. SWIPE also generated two URLs with markers that can be used to confirm the vulnerability; we will show how to confirm vulnerabilities later.
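As an illustration (not part of the artifact's tooling), each flow line in `./output.txt` is printed as a Python dict literal, so it can be loaded with `ast.literal_eval`; the indices in each `ranges` tuple address the tainted bytes of `sink_arg`:

```python
import ast

# One potential-flow line, as printed by the taint parser in ./output.txt
# (stack frames abbreviated for readability)
line = ("{'sink': 'JAVASCRIPT', "
        "'ranges': [(0, 2, 'URL_SEARCH', 'URL_COMPONENT_DECODED')], "
        "'sink_arg': 'gp', "
        "'stack': [{'function': 'vulnerable_passive', 'col': 22, 'lineno': 22}]}")

flow = ast.literal_eval(line)          # each flow is a literal Python dict

# The first stack frame is the sink call site
print(flow['stack'][0]['function'])    # -> vulnerable_passive

# Each range is (start, end, source, encoding); start/end index into sink_arg
start, end, source, encoding = flow['ranges'][0]
print(flow['sink_arg'][start:end], source)   # -> gp URL_SEARCH
```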
- To configure SWIPE to run our DSE component, edit the `config/config.json` file:
  - Make sure the Fuzzer is disabled: set `"run-ui-fuzzer"` to `false`
  - Enable DSE: set `"try-alternative-paths"` to `true`
- Run SWIPE: `./run.sh`. DSE should take around 6 minutes to finish. `./output.txt` should reflect the fact that at least one other flow was found: the one in function `vulnerable_dse`:
$ cat output.txt | grep "vulnerable_dse"
{'sink': 'HTML', 'ranges': [(0, 24, 'URL_SEARCH', 'URL_COMPONENT_DECODED')], 'sink_arg': 'BmustcontainthisstringA?', 'iframe': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?BmustcontainthisstringA?', 'stack': ... 'function': 'vulnerable_dse', ...

- Note that DSE acts on an instrumented version of the page, which may cause false flows to appear. In our crawls, we always re-analyzed every URL discovered by DSE in a non-instrumented version of the page, to obtain clean results with respect to flows.
- To configure SWIPE to run our Fuzzer component, similarly to the above, simply edit the `config/config.json` file:
  - Make sure DSE is disabled: set `"try-alternative-paths"` to `false`
  - Enable the Fuzzer: set `"run-ui-fuzzer"` to `true`
- Run `./run.sh`. Our Fuzzer should take around 3 minutes to finish. `./output.txt` should reflect the fact that SWIPE found the vulnerability in the `vulnerable_fuzzer` function.
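The config toggles above can also be applied programmatically. The following is a minimal sketch, assuming `config/config.json` is plain JSON with the flag names exactly as listed in this guide; the demo below operates on a throwaway copy rather than the real file:

```python
import json, os, tempfile

def set_flags(path, **flags):
    """Read a JSON config, overwrite the given boolean flags, write it back."""
    with open(path) as f:
        cfg = json.load(f)
    # Python identifiers cannot contain '-', so map '_' back to '-'
    for name, value in flags.items():
        cfg[name.replace('_', '-')] = value
    with open(path, 'w') as f:
        json.dump(cfg, f, indent=2)

# Demo on a throwaway file (in the container you would pass 'config/config.json')
demo = os.path.join(tempfile.mkdtemp(), 'config.json')
with open(demo, 'w') as f:
    json.dump({'run-ui-fuzzer': False, 'try-alternative-paths': True}, f)

# Enable the Fuzzer and disable DSE, as in the steps above
set_flags(demo, run_ui_fuzzer=True, try_alternative_paths=False)
with open(demo) as f:
    print(json.load(f))   # -> {'run-ui-fuzzer': True, 'try-alternative-paths': False}
```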
- First, look at the URLs that SWIPE wants to confirm given the last Fuzzer run:
$ cat output.txt
Unique potential flows:
{'sink': 'JAVASCRIPT', 'ranges': [(0, 2, 'URL_SEARCH', 'URL_COMPONENT_DECODED')], 'sink_arg': 'gp', 'iframe': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'stack': [{'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': 'vulnerable_passive', 'col': 22, 'lineno': 22}, {'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': '', 'col': 13, 'lineno': 24}]}
{'sink': 'HTML', 'ranges': [(0, 2, 'URL_SEARCH', 'URL_COMPONENT_DECODED')], 'sink_arg': 'gp', 'iframe': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'stack': [{'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': 'vulnerable_fuzzer', 'col': 26, 'lineno': 7}, {'url': 'http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?gp', 'function': 'onmousewheel', 'col': 48, 'lineno': 3}]}
URLs with markers to confirm:
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?marker<>'"
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?#&marker<>'"
Unique confirmed flows:
None
Summary:
#Unique potential: 2
#Unique confirmed: 0
#URLs with markers: 2

- Place those URLs with markers in the `config/sample_targets` file. The generated URLs with markers are also written to `/tmp/urls_to_confirm.txt` by the taint parser.
$ cp /tmp/urls_to_confirm.txt config/sample_targets
$ cat config/sample_targets
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?marker<>'"
http://swipeexample.s3-website-eu-west-1.amazonaws.com/example_page.html?#&marker<>'"
- Run SWIPE's Fuzzer again: `./run.sh`. In around 5 minutes, SWIPE should finish and `./output.txt` will be recreated, this time containing confirmed flows, since the vulnerabilities in our example webpage are indeed exploitable:
$ cat output.txt | grep "Unique confirmed:"
#Unique confirmed: 2

- This experiment will demonstrate that our web archive can archive a live webpage and reproduce its previous state.
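Conceptually, confirmation reduces to checking whether the marker's special characters survive un-encoded into the final sink argument. The check below is an illustrative sketch of that idea, not SWIPE's actual implementation; the marker string is modeled after the URLs shown above:

```python
# Hypothetical marker, modeled after the '?marker<>\'"' URLs shown above
MARKER = "marker<>'\""

def flow_confirmed(sink_arg: str, marker: str = MARKER) -> bool:
    """Illustrative check: a flow counts as exploitable only if the
    marker's special characters reach the sink without being URL-encoded."""
    return marker in sink_arg

# Special characters survived into the sink argument -> confirmed
print(flow_confirmed("var x = 'marker<>'\"';"))   # -> True
# Characters were percent-encoded by the browser -> not confirmed
print(flow_confirmed("marker%3C%3E%27%22"))       # -> False
```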
- You will need the VNC viewer open in order to visualize the effects of this component.
- The website https://randomwordgenerator.com/picture.php shows a random image every time the user navigates to the page (this page is not maintained by us).
- First, put that URL in the `config/sample_targets` file:

  $ cat config/sample_targets
  https://randomwordgenerator.com/picture.php
- Disable the Fuzzer and DSE for this experiment: both `run-ui-fuzzer` and `try-alternative-paths` should be set to `false` in `config/config.json`. This will keep the browser window stable and avoid any page interactions.
- Make sure that the `mitmproxy-archivemode` flag is set to `true` in `config/config.json`.
- Run SWIPE: `./run.sh`. A few minutes later an archive will be created for this website, which we expect to be located at `output/82bb3aedccd06f0478e34746e2c5dd2bca042a41e1c397370a0e2797ee707102.warc.gz`. Pay attention to the image that is shown in the VNC window, above "Other Random Generators". Disregard the flows output once SWIPE finishes.
- Now, configure the webarchive to be in replay mode, by ensuring the following in `config/config.json`:
  - The `mitmproxy-archivemode` flag is set to `false`
  - The `mitmproxy-replaymode` flag is set to `true`
- Run SWIPE: `./run.sh`. SWIPE will replay from the previously created webarchive. Feel free to use Passive, Fuzzer or DSE any number of times. Regardless, the image that is loaded will no longer be random; it will be the same one that was seen during archiving. Note that our webarchive attempts to consistently replay all resources (images, scripts and frames) as they were seen during archiving, but it may not be able to eliminate all sources of non-determinism.
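The archive filename above is a 64-character hex string, the length of a SHA-256 digest, and the archive name is derived deterministically from the target URL. Purely as an illustration of that kind of scheme (not a claim about SWIPE's exact derivation, which may normalize the URL differently):

```python
import hashlib

def warc_name(url: str) -> str:
    """Illustrative only: map a target URL to a stable .warc.gz filename
    via SHA-256. SWIPE's actual derivation may differ."""
    return hashlib.sha256(url.encode()).hexdigest() + '.warc.gz'

name = warc_name('https://randomwordgenerator.com/picture.php')
print(len(name))   # 64 hex chars + len('.warc.gz') -> 72
```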
- Although unnecessary for following the basic artifact evaluation, we now describe each flag in `config/config.json`:
  - `run-ui-fuzzer`: Whether Fuzzing is enabled
  - `try-alternative-paths`: Whether DSE is enabled
  - `taint-enabled-chromium-path`: Path to the modified Chromium binary
  - `chrome-data-dir`: Path to the Chromium profile folder, in case of stateful crawls
  - `site-list`: Path to a file with the URLs to analyze
  - `output-dir`: Path where most of SWIPE's output will be located
  - `mitmproxy-host`: Host of the server running our webarchive
  - `mitmproxy-port-instr`: We run 2 webarchive instances, one with DSE instrumentation and one non-instrumented. This is the port for the instrumented one.
  - `mitmproxy-port-noinstr`: The port where the non-instrumented version of our webarchive is running
  - `mitmproxy-appendmode`: During replay, whether to append requests to the webarchive when they are outside what was archived before, instead of just returning 404
  - `mitmproxy-archivemode`: Whether to archive pages
  - `mitmproxy-replaymode`: Whether to replay previously created webarchives; the name of the webarchive that is used is deterministically obtained from the target URL
  - `mitmproxy-warcPath`: Path where archives will be stored
  - `analysis-script`: A JavaScript file describing how to instrument pages before serving them to the browser
  - `browser_timeout`: Maximum amount of time in seconds that the browser can be open
  - `browser_timeout_after_load`: How many seconds to wait after page loading until analysis starts
  - `max_retries`: How many times DSE attempts to visit the page in case of an error
  - `save-per-x-sites`: DSE will permanently store analysis results every X navigations
  - `dse-time-budget`: Time budget in seconds for DSE; -1 means unlimited
  - `max_dse_attempts`: Maximum number of page visits for DSE
  - `strip_get_params`: Whether to strip GET parameters and fragment values from URLs before analysing
  - `replay-mode`: This allows for replaying fuzzing actions, but such a process is beyond the scope of this artifact
  - `fuzzer-time-budget`: Time budget in seconds for the Fuzzer
  - `fuzzer-idle-time`: How many seconds the Fuzzer spends idle on the page after load before fuzzing starts
  - `fuzzer-progress-timeout`: Frequency in seconds at which the Fuzzer reports results
  - `fuzzer-results-path`: Path where Fuzzer results will be stored, including coverage and other metrics
  - `z3-path`: Path to the Z3 SMT solver binary to use in DSE
  - `z3-timeout`: Timeout in seconds for Z3 to solve each constraint
  - `parse-taint-log`: Whether to parse taint logs after SWIPE runs
  - `stateless`: Whether the run should be stateless or stateful
  - `add_dummy_params`: Whether dummy GET parameters should be added to each URL before analysis
  - `enable_tracking_measurements`: Whether tracking measurements are enabled; this is beyond the scope of the artifact
  - `tracking_script_path`: Tracking measurement script that is injected on pages
  - `track_coverage`: Whether to track code coverage
  - `domeventbreakpoints`: Whether to measure how many and which event handlers are executed on the page
  - `clean-intermediate-results`: Whether to remove intermediate results after a SWIPE run
  - `compress-results`: Whether to compress results after a SWIPE run
  - `fuzzer-symexec-enabled`: Whether to enable symbolic execution during fuzzing; this is beyond the scope of the artifact
  - `daemon-startup-time`: How many seconds to wait for SWIPE to set up before analysis starts
  - `jalangi2-webextension-dir`: Path to Jalangi's web extension
  - `chrome-csp-disable-dir`: Path to the Chrome CSP disable extension
  - `all-flow-path`: File where the DSE constraints summary is stored
  - `script-dir`: Path where SWIPE's main scripts are located
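Putting the flags exercised in this walkthrough together, a replay-mode config might contain a fragment like the one below. This is illustrative only: values are examples, and the real `config/config.json` contains many more entries (paths, ports, timeouts) that must be kept as shipped:

```json
{
  "run-ui-fuzzer": true,
  "try-alternative-paths": false,
  "site-list": "config/sample_targets",
  "mitmproxy-archivemode": false,
  "mitmproxy-replaymode": true,
  "parse-taint-log": true
}
```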
Feel free to play around with SWIPE with your own created webpages. Be responsible when running SWIPE against live webpages.
We also provide a description of the most relevant folders and files contained in this artifact:
- `analysis/`: Contains our implementation of symbolic execution on top of Jalangi instrumentation
- `packages/`: Has most of SWIPE's dependencies
  - `chrome-csp-disable`: An extension to disable CSP
  - `chromium`: A Chromium binary that was modified to use dynamic taint analysis
  - `jalangi2`: Our webarchive implementation, written as a mitmproxy plugin
  - `z3`: The SMT solver we use to solve symbolic execution constraints
- `scripts/swipe`: Contains most of SWIPE's source code
  - `config`: Configuration files for SWIPE
    - `config.json`: Main JSON configuration file for SWIPE
    - `sample_targets`: Default file where target URLs are placed to be analyzed. It can contain more than one URL
  - `src`: Contains the main SWIPE components
    - `constraint/`: Main DSE code for processing constraints
    - `driver/`: Our own driver that interacts with the Chrome DevTools Protocol
    - `fuzz/`: Where the Fuzzing source code is located
      - `action_executor.py`: Code that triggers event handlers
      - `coordinator.py`: Fuzzer engine
    - `replay/`: Replaying mechanism for confirming vulnerabilities
    - `executor.py`: Main SWIPE engine, including Passive
  - `tests/`: Contains pages that we use to test SWIPE
  - `selection/`: Data and scripts used for precrawling Tranco to collect the core dataset