- Choose 4 states from The World Factbook and represent them in XML format.
- Develop a DTD and RelaxNG schema to validate the structure of the XML documents.
- Utilize XSLT to generate HTML outputs, including one page for each state and an index that features links to all pages.
- Create a navigation menu for every subpage with CSS.
- Use XSL-FO to generate a PDF output, featuring a single PDF for each state and an additional PDF containing all states. Include headers, footers, and page numbers.
- Add pictures of maps and flags to both the HTML and PDF output.
src
├── author
├── DTDConcatenation
├── fop
├── generatedPdf
├── generatedWeb
│ ├── css
│ ├── html
│ └── javaScript
├── generatedXML
├── htmlParser
├── htmlSourceData
├── images
│ ├── flags
│ └── maps
├── saxon
├── scripts
├── validators
├── xslfo
└── xslt
The following folders are included in this project:
- The author folder holds information about the author in XML format.
- The DTDConcatenation folder contains a DTD file which combines all generated XML files into one.
- The fop folder contains Apache FOP 2.6, which converts FO files into PDF.
- The generatedPdf folder holds the PDF files produced by the scripts.
- The generatedWeb folder holds the website; its html subfolder contains all generated HTML files and can be created by the scripts.
- The generatedXML folder contains the XML files representing the states; they can be generated by the scripts.
- The htmlParser folder has a Node.js program that parses the pages in htmlSourceData and creates an XML representation of them.
- The htmlSourceData folder stores HTML pages downloaded from The World Factbook. These are source files for the HTML parser.
- The images folder contains images used for both the web and PDF.
- The saxon folder contains jar files for Saxon home edition 10.3 software.
- The scripts folder includes bash scripts for generating and validating files.
- The validators folder has DTD and RelaxNG validation schemas.
- The xslfo folder contains fo files for generating PDF, which can be generated by scripts.
- The xslt folder contains XSLT stylesheets for transforming XML files into HTML and fo files.
The following are the software requirements for running the scripts:
- Linux: All scripts run on bash.
- Node.js: A JavaScript runtime used by the HTML parser.
- npm: The package manager that normally ships with Node.js.
- xmllint: Used for DTD and RelaxNG validation; it is preinstalled on many Linux distributions.
- trang: Used for generating an rng file from rnc.
- Java: Used by Saxon (XSLT processor).
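Before running anything, it can help to confirm that these tools are on the PATH. The following is a minimal sketch, not part of the project scripts; `check_tools` is a hypothetical helper name, and the binary names in the example call (`node`, `npm`, `xmllint`, `trang`, `java`) are assumed to be the usual ones.

```shell
#!/usr/bin/env bash
# Report which of the given command-line tools are missing from PATH.
# Usage: check_tools TOOL...   (returns non-zero if any tool is missing)
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example call with the tools listed above (binary names are assumptions):
# check_tools bash node npm xmllint trang java
```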
To run all the steps at once, follow these instructions:
- Use a bash script to generate and validate files.
- Run all commands from the project root directory.
- Make all scripts executable by running
chmod +x src/scripts/*
- Run the "run all" script.
src/scripts/runAll.sh
- If any XML file is invalid, rerun the script. It will recreate the files and delete the old invalid files.
Please follow these steps in the given order as they rely on each other.
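Because each step depends on the previous one, a wrapper should stop at the first failure. The sketch below illustrates that idea only; `run_steps` is a hypothetical helper and does not reproduce the contents of runAll.sh, and the script order in the usage comment is assumed from the order of the steps in this README.

```shell
#!/usr/bin/env bash
# Run the given commands in order and stop at the first one that fails.
# Usage: run_steps COMMAND...
run_steps() {
  local step
  for step in "$@"; do
    echo "==> $step"
    if ! "$step"; then
      echo "step failed: $step" >&2
      return 1
    fi
  done
}

# Possible call from the project root (order assumed from this README):
# run_steps src/scripts/generateXML.sh src/scripts/validateDTD.sh \
#   src/scripts/validateRelaxNG.sh src/scripts/concatenateStates.sh \
#   src/scripts/generateWeb.sh src/scripts/generatePdf.sh
```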
The htmlParser program converts the HTML files downloaded from the internet into XML files. It is written in JavaScript and runs on Node.js. If the src/generatedXML folder does not exist, the program creates it; if it already exists, the folder is deleted and recreated. The HTML sources are located in src/htmlSourceData. This step requires Node.js and the npm package manager. Run all commands from the project root directory.
You can generate the XML files automatically using the bash script or manually. To generate XML files automatically:
- Make script executable by running
chmod +x src/scripts/generateXML.sh
- Run script by running
src/scripts/generateXML.sh
To generate XML files manually:
- Go to folder by running
cd src/htmlParser
- Download dependencies by running the following command. This will download the dependencies defined in package.json.
npm install
- Start the program by running
node main.js
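The manual steps change the working directory, while every other step expects to start from the project root. One way to avoid having to `cd` back is to run the parser in a subshell. This is a sketch under that assumption; `run_parser` is a hypothetical helper, not one of the project scripts.

```shell
#!/usr/bin/env bash
# Install dependencies and run the parser inside a subshell, so the
# caller's working directory is left unchanged.
# Usage: run_parser PARSER_DIR
run_parser() {
  (
    cd "$1" || exit 1
    npm install || exit 1
    node main.js
  )
}

# Example call from the project root:
# run_parser src/htmlParser
```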
To validate your XML files using DTD schema, follow the steps below:
- The DTD schema can be found in src/validators/stateType.dtd.
- Before proceeding, make sure to generate the XML files that you want to validate from the previous step.
- To perform the validation process, you will need to have the software xmllint installed.
- All commands should be run from the project root directory.
- For automatic validation of all states using a bash script, follow these steps:
- Make the script executable by running
chmod +x src/scripts/validateDTD.sh
- Run the script using
src/scripts/validateDTD.sh
- For manual validation of individual states, use the following command, replacing <xml file> with the file that you wish to validate:
xmllint --noout --dtdvalid src/validators/stateType.dtd src/generatedXML/<xml file>
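To apply the manual command to several files at once, a loop along these lines could be used. It is a sketch rather than the contents of validateDTD.sh; `validate_dtd_all` is a hypothetical helper. It skips concatenated.xml on the assumption that the concatenated file, if already present in the same folder, follows a different DTD.

```shell
#!/usr/bin/env bash
# Validate every *.xml file in a directory against a DTD with xmllint.
# Usage: validate_dtd_all DTD_FILE XML_DIR   (non-zero if any file is invalid)
validate_dtd_all() {
  local dtd="$1" dir="$2" file failed=0
  for file in "$dir"/*.xml; do
    [ -e "$file" ] || continue                 # directory has no XML files
    case "${file##*/}" in
      concatenated.xml) continue ;;            # assumed to use another DTD
    esac
    if xmllint --noout --dtdvalid "$dtd" "$file"; then
      echo "valid:   $file"
    else
      echo "invalid: $file"
      failed=1
    fi
  done
  return "$failed"
}

# Example call from the project root:
# validate_dtd_all src/validators/stateType.dtd src/generatedXML
```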
To perform RelaxNG validation, follow these steps:
- The RelaxNG grammar can be found in src/validators/stateType.rnc.
- Before validating, you must generate the XML file.
- This step requires the software trang and xmllint.
- Make sure to run all commands from the project root directory.
- To automatically validate all states using a bash script, make the script executable by running
chmod +x src/scripts/validateRelaxNG.sh
and then run the script using
src/scripts/validateRelaxNG.sh
- To manually validate individual states, generate the rng file by running
trang src/validators/stateType.rnc src/validators/stateType.rng
Then validate the state by replacing <xml file> with the file you want to evaluate, and running
xmllint --noout --relaxng src/validators/stateType.rng src/generatedXML/<xml file>
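The two manual commands can be combined so that the rng file is regenerated only when it is missing or older than the rnc grammar. The following is a sketch of that idea; `validate_relaxng` is a hypothetical helper and not necessarily what validateRelaxNG.sh does.

```shell
#!/usr/bin/env bash
# Convert the compact grammar to .rng only when needed, then validate one file.
# Usage: validate_relaxng RNC_FILE XML_FILE
validate_relaxng() {
  local rnc="$1" xml="$2"
  local rng="${rnc%.rnc}.rng"
  # -nt is true when the left file is newer than the right one.
  if [ ! -e "$rng" ] || [ "$rnc" -nt "$rng" ]; then
    trang "$rnc" "$rng" || return 1
  fi
  xmllint --noout --relaxng "$rng" "$xml"
}

# Example call from the project root (France.xml is an example file name):
# validate_relaxng src/validators/stateType.rnc src/generatedXML/France.xml
```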
To concatenate generated files into one, follow these steps:
- The DTD definition of the concatenated file can be found in src/DTDConcatenation/concatenatedXML.xml.
- The resulting file will be saved in src/generatedXML/concatenated.xml.
- Before starting, ensure that you have installed xmllint software.
- All commands should be run from the project root directory.
- Automatic concatenation using a bash script
- First, make the script executable by running
chmod +x src/scripts/concatenateStates.sh
- Then, run the script using
src/scripts/concatenateStates.sh
- Manual concatenation
- Run the following command:
xmllint --noent src/DTDConcatenation/concatenateXML.xml > src/generatedXML/concatenated.xml
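Redirecting straight into concatenated.xml leaves a broken file behind if xmllint fails halfway. A more defensive variant writes to a temporary file first and only moves it into place once it parses as well-formed XML. This is a sketch; `concatenate_states` is a hypothetical helper, and it assumes the wrapper file pulls in the state files through entities that `--noent` expands.

```shell
#!/usr/bin/env bash
# Expand the entities of a wrapper document into OUT_FILE, keeping the
# result only if xmllint can parse it again.
# Usage: concatenate_states WRAPPER_FILE OUT_FILE
concatenate_states() {
  local wrapper="$1" out="$2"
  local tmp="$out.tmp"
  # --noent substitutes entity references with their values.
  if xmllint --noent "$wrapper" > "$tmp" && xmllint --noout "$tmp"; then
    mv "$tmp" "$out"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Example call from the project root (wrapper path as in the command above):
# concatenate_states src/DTDConcatenation/concatenateXML.xml src/generatedXML/concatenated.xml
```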
- To generate HTML files, apply the XSLT stylesheets src/xslt/htmlHomePageStyle.xsl and src/xslt/htmlStateStyle.xsl to the XML files in src/generatedXML.
- Once generated, the files will be available in src/generatedWeb/html.
- This process requires Java, and all commands must be run from the project root directory.
Automatic generation using bash script
- Make the script executable by running
chmod +x src/scripts/generateWeb.sh
- Run the script by running
src/scripts/generateWeb.sh
Manual generation
- Create a directory for output by running
mkdir -p src/generatedWeb/html > /dev/null 2>&1
- Generate the index by running
java -jar src/saxon/saxon-he-10.3.jar src/generatedXML/concatenated.xml src/xslt/htmlHomePageStyle.xsl > src/generatedWeb/html/index.html
- Generate the states by running the following command. Remember to replace <state> with the name of the state (for example, France or Germany).
java -jar src/saxon/saxon-he-10.3.jar src/generatedXML/<state>.xml src/xslt/htmlStateStyle.xsl > src/generatedWeb/html/<state>.html
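Rather than repeating the state command by hand, the pages could be produced in a loop over the generated XML files. The sketch below is illustrative and not taken from generateWeb.sh; `generate_state_pages` and the `SAXON_JAR` override variable are hypothetical names, and concatenated.xml is skipped because it feeds the index page instead.

```shell
#!/usr/bin/env bash
# Transform every state XML file in XML_DIR into an HTML page in OUT_DIR.
# Usage: generate_state_pages XML_DIR STYLESHEET OUT_DIR
generate_state_pages() {
  local xml_dir="$1" xsl="$2" out_dir="$3" file state
  # SAXON_JAR is an optional override; the default is the path used above.
  local saxon="${SAXON_JAR:-src/saxon/saxon-he-10.3.jar}"
  mkdir -p "$out_dir"
  for file in "$xml_dir"/*.xml; do
    [ -e "$file" ] || continue            # directory has no XML files
    state="${file##*/}"                   # strip the directory part
    state="${state%.xml}"                 # strip the extension
    if [ "$state" = "concatenated" ]; then
      continue                            # used for the index, not a state
    fi
    java -jar "$saxon" "$file" "$xsl" > "$out_dir/$state.html" || return 1
  done
}

# Example call from the project root:
# generate_state_pages src/generatedXML src/xslt/htmlStateStyle.xsl src/generatedWeb/html
```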
- PDFs are generated from src/xslfo/*.fo files
- src/xslfo/*.fo files are generated from XML files located in src/generatedXML using src/xslt/pdf*.xsl stylesheets
- Generated pdf files will be in the src/generatedPdf folder.
- This step requires the fop software, which I included in the src/fop directory.
- All commands should be run from the project root directory.
- Automatic generation using a bash script
- Make the script executable by running
chmod +x src/scripts/generatePdf.sh
- Run the script by running
src/scripts/generatePdf.sh
- Manual generation
- Create the output directories by running the following commands
mkdir -p src/generatedPdf > /dev/null 2>&1
mkdir -p src/xslfo > /dev/null 2>&1
- Generate fo file from all the states (Germany, UK, Switzerland, France)
java -jar src/saxon/saxon-he-10.3.jar src/generatedXML/concatenated.xml src/xslt/pdfAllStatesStyle.xsl > src/xslfo/allStates.fo
- Generate fo file for single state and don't forget to replace <state> by the actual name of the state.
java -jar src/saxon/saxon-he-10.3.jar src/generatedXML/<state>.xml src/xslt/pdfStateStyle.xsl > src/xslfo/<state>.fo
- Generate pdf file of all the states
src/fop/fop/fop src/xslfo/allStates.fo src/generatedPdf/allStates.pdf
- Generate pdf file of single state. The <state> should be replaced by the actual name of the state.
src/fop/fop/fop src/xslfo/<state>.fo src/generatedPdf/<state>.pdf
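The two per-state commands (XML to FO, then FO to PDF) can be wrapped in one function so that a state name is typed only once. This is a sketch and not the contents of generatePdf.sh; `generate_state_pdf` and the `SAXON_JAR` and `FOP` override variables are hypothetical names, with defaults taken from the paths in the commands above.

```shell
#!/usr/bin/env bash
# Build src/generatedPdf/STATE.pdf from src/generatedXML/STATE.xml.
# Must be run from the project root. Usage: generate_state_pdf STATE
generate_state_pdf() {
  local state="$1"
  local saxon="${SAXON_JAR:-src/saxon/saxon-he-10.3.jar}"
  local fop="${FOP:-src/fop/fop/fop}"
  local xml="src/generatedXML/$state.xml"
  if [ ! -f "$xml" ]; then
    echo "no such file: $xml" >&2
    return 1
  fi
  mkdir -p src/xslfo src/generatedPdf
  # Step 1: XML -> XSL-FO with Saxon.
  java -jar "$saxon" "$xml" src/xslt/pdfStateStyle.xsl > "src/xslfo/$state.fo" || return 1
  # Step 2: XSL-FO -> PDF with FOP.
  "$fop" "src/xslfo/$state.fo" "src/generatedPdf/$state.pdf"
}

# Example call (state names as in the example given for the HTML step):
# for s in France Germany; do generate_state_pdf "$s"; done
```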
http://saxon.sourceforge.net/
https://relaxng.org/jclark/trang.html
https://relaxng.org/compact-tutorial-20030326.html
https://www.w3schools.com/xml/default.asp
https://www.w3schools.com/xml/xpath_intro.asp
https://www.w3schools.com/xml/xsl_intro.asp
https://www.w3schools.com/xml/xml_dtd_intro.asp
http://zvon.org/comp/r/tut-XSLT_1.html
https://www.youtube.com/watch?v=W--Yhp0m35A
https://www.youtube.com/watch?v=D2YzF4hm9NM
https://undraw.co/illustrations
https://fonts.google.com/
https://fontawesome.com/account
https://w3schools.sinsixx.com/xslfo/xslfo_lists.asp.htm
https://xmlgraphics.apache.org/fop/
https://www.kosek.cz/xml/schema/rng.html