Command line tool to parse XML files and evaluate queries

mayank-02/xmlalchemy

XMLAlchemy is a command line tool that can be used to parse XML files and evaluate queries. It supports the following operations:

  1. Evaluate XPath expressions.
  2. Evaluate XQuery expressions.
  3. Rewrite a certain class of XQuery expressions and optimize their evaluation.

The XPath and XQuery reference specifications can be found here, whereas the join optimizations are described here.

Setup

  1. mvn clean install -U to install the dependencies.
  2. mvn test to run the tests.
  3. mvn clean package to package the project and create the jar file.
  4. java -jar target/xmlalchemy-1.0.0.jar --help to run the program.

Authors

  1. Mayank Jain
  2. Jonathan Woenardi

Architecture

Directory Structure

├── main                                   # Main source code
│   ├── antlr4                             # ANTLR4 grammar files
│   │   └── edu/ucsd/xmlalchemy
│   ├── java
│   │   └── edu/ucsd/xmlalchemy            # Main package
│   │                ├── xpath             # Classes for XPath expressions
│   │                ├── xquery            # Classes for XQuery expressions
│   │                ├── Expression.java   # Interface which all other expressions implement
│   │                ├── Formatter.java    # Format XQuery expressions
│   │                ├── Optimizer.java    # Rewrite and optimize XQuery expressions
│   │                ├── Visitor.java      # Parses query and constructs IR/expressions
│   │                ├── XPath.java        # XPath CLI
│   │                └── XQuery.java       # XQuery CLI
│   └── resources/style.xslt               # Style file for formatting XML output
├── test                                   # Test source code
│   ├── java
│   │   └── edu/ucsd/xmlalchemy            # Main package
│   └── resources
│        └── milestone{1,2,3}              # Test cases for each milestone
│           ├── document                   # XML files
│           ├── input                      # Input queries
│           └── output                     # Expected output and rewritten queries
├── README.md                              # This file
└── pom.xml                                # Maven configuration file
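Expression.java is described only as the interface that every other expression implements; its contents are not shown here. The following is a minimal, hypothetical sketch of how such an IR can be built over the W3C DOM: each expression maps a list of context nodes to a list of result nodes, and a path like `lib/book/title` is assembled by composing objects rather than by manipulating strings. The names `ExpressionSketch`, `Expression`, `ChildStep`, `Slash`, `parse`, and `textOf` are illustrative assumptions, not the project's actual API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ExpressionSketch {
    // Hypothetical shared interface: an expression maps context nodes to result nodes.
    public interface Expression {
        List<Node> evaluate(List<Node> context);
    }

    // Hypothetical child step (XPath "tag"): element children of each context node with that name.
    public static final class ChildStep implements Expression {
        private final String tag;

        public ChildStep(String tag) {
            this.tag = tag;
        }

        @Override
        public List<Node> evaluate(List<Node> context) {
            List<Node> result = new ArrayList<>();
            for (Node n : context) {
                NodeList children = n.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    Node c = children.item(i);
                    if (c.getNodeType() == Node.ELEMENT_NODE && c.getNodeName().equals(tag)) {
                        result.add(c);
                    }
                }
            }
            return result;
        }
    }

    // Hypothetical "left/right" composition: evaluates right on the results of left.
    // Duplicate elimination is omitted to keep the sketch short.
    public static final class Slash implements Expression {
        private final Expression left;
        private final Expression right;

        public Slash(Expression left, Expression right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public List<Node> evaluate(List<Node> context) {
            return right.evaluate(left.evaluate(context));
        }
    }

    // Parses an XML string into a DOM Document, wrapping checked exceptions for brevity.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid XML", e);
        }
    }

    // Evaluates an expression starting from the document node; returns each result's text content.
    public static List<String> textOf(Expression expr, Document doc) {
        List<String> out = new ArrayList<>();
        for (Node n : expr.evaluate(List.of(doc))) {
            out.add(n.getTextContent());
        }
        return out;
    }

    public static void main(String[] args) {
        Document doc = parse("<lib><book><title>A</title></book><book><title>B</title></book></lib>");
        // IR for the path lib/book/title, built by composing expression objects.
        Expression query = new Slash(new ChildStep("lib"),
                new Slash(new ChildStep("book"), new ChildStep("title")));
        System.out.println(textOf(query, doc)); // prints [A, B]
    }
}
```

Because every construct shares one interface, a rewriter can replace a subtree with an equivalent one (for example, a join node) without touching the query text.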

Program Flow

[Figure: Program Flow]

Highlights

  1. Internal representation for all kinds of XQuery constructs, which provides modularity and extensibility and allows queries to be rewritten without string manipulation.
  2. Implement the Wong-Youssefi algorithm for join order optimization.
  3. Optimize hash-based join by choosing the smaller table to build the hash table.
  4. Cache file reads for better performance.
  5. Implement custom serializer and formatter for XQuery queries.
  6. CLI for evaluating XPath and XQuery expressions with support for output and optimize flags.
  7. Maven project setup for dependency management and build automation.
  8. 100% test coverage, backed by a comprehensive, fully automated suite of ~100 test cases covering evaluation, serialization, and query rewriting.
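Highlight 3 (building the hash table from the smaller input) can be illustrated with a minimal equi-join sketch. It assumes tuples are plain `Map<String, String>` variable bindings and that the two sides use disjoint variable names; the project itself joins tuples bound to XML nodes, and `HashJoinSketch`, `hashJoin`, and `keyOf` are hypothetical names rather than its actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {
    // Equi-join of two tuple lists on paired key columns (leftKeys[i] must equal rightKeys[i]).
    // The smaller input builds the hash table; the larger one is streamed once as the probe side.
    public static List<Map<String, String>> hashJoin(List<Map<String, String>> left, List<String> leftKeys,
                                                     List<Map<String, String>> right, List<String> rightKeys) {
        boolean buildLeft = left.size() <= right.size();
        List<Map<String, String>> build = buildLeft ? left : right;
        List<Map<String, String>> probe = buildLeft ? right : left;
        List<String> buildKeys = buildLeft ? leftKeys : rightKeys;
        List<String> probeKeys = buildLeft ? rightKeys : leftKeys;

        // Build phase: group build-side tuples by their composite key.
        Map<List<String>, List<Map<String, String>>> table = new HashMap<>();
        for (Map<String, String> t : build) {
            table.computeIfAbsent(keyOf(t, buildKeys), k -> new ArrayList<>()).add(t);
        }

        // Probe phase: merge each probe tuple with every build tuple that has an equal key.
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> p : probe) {
            for (Map<String, String> b : table.getOrDefault(keyOf(p, probeKeys), List.of())) {
                Map<String, String> merged = new LinkedHashMap<>(b);
                merged.putAll(p); // assumes the two sides bind disjoint variable names
                result.add(merged);
            }
        }
        return result;
    }

    // Composite key: the values of the given columns, in order.
    private static List<String> keyOf(Map<String, String> tuple, List<String> columns) {
        List<String> key = new ArrayList<>(columns.size());
        for (String c : columns) {
            key.add(tuple.get(c));
        }
        return key;
    }

    public static void main(String[] args) {
        List<Map<String, String>> books = List.of(
                Map.of("b", "b1", "author", "Smith"),
                Map.of("b", "b2", "author", "Jones"),
                Map.of("b", "b3", "author", "Smith"));
        List<Map<String, String>> people = List.of(Map.of("p", "p1", "name", "Smith"));
        // people is smaller, so it becomes the build side regardless of argument order.
        System.out.println(hashJoin(books, List.of("author"), people, List.of("name")).size()); // prints 2
    }
}
```

Building on the smaller side keeps the in-memory table as small as possible, while the larger side only needs a single sequential pass.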

References

  1. ANTLR tutorials and best practices
  2. XPath and XQuery Semantics
  3. Join Optimizations
  4. W3C Document and Node API
  5. BaseX and xpather.com for XPath and XQuery reference evaluation
  6. Debugger for Java in Visual Studio Code