SitemapXMLParser

How to read a website's Sitemap.xml with C# and parse its contents.

An XML sitemap is a specially structured XML file that gives search engine crawlers structural information about a website for indexing purposes, such as which pages exist and when they last changed. The basic sitemap structure looks like this:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

  <url>
    <loc>https://lonewolfonline.net/</loc>
    <priority>1.0</priority>
    <lastmod>2010-09-14</lastmod>
    <changefreq>daily</changefreq>
  </url>

  <url>
    <loc>https://lonewolfonline.net/simple-xml-parser/</loc>
    <priority>0.5</priority>
    <lastmod>2009-09-14</lastmod>
    <changefreq>monthly</changefreq>
  </url>

</urlset>
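Every element in the sample sits in the default namespace `http://www.sitemaps.org/schemas/sitemap/0.9` declared on <urlset>. A query for a plain name such as "loc" therefore matches nothing. Each element name has to be qualified with that namespace.

The following is a minimal sketch using LINQ to XML (System.Xml.Linq). It assumes the sitemap text is already in a string. The class and method names (SitemapLocations, GetLocations, CountUnqualifiedLocs) are hypothetical and not taken from the original project.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class SitemapLocations
{
    // The default namespace declared by xmlns="..." on <urlset>.
    private static readonly XNamespace Sm = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Namespace-qualified query: returns the trimmed text of each <url>/<loc>.
    public static List<string> GetLocations(string sitemapXml)
    {
        XDocument doc = XDocument.Parse(sitemapXml);
        return doc.Root.Elements(Sm + "url")
            .Select(url => (string)url.Element(Sm + "loc"))
            .Where(loc => !string.IsNullOrWhiteSpace(loc))
            .Select(loc => loc.Trim())
            .ToList();
    }

    // Unqualified query: "loc" means "loc in no namespace", so it matches nothing here.
    public static int CountUnqualifiedLocs(string sitemapXml) =>
        XDocument.Parse(sitemapXml).Descendants("loc").Count();
}
```

Called on the sample above, GetLocations returns the two <loc> URLs, while CountUnqualifiedLocs returns 0.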

Individual <url> elements are wrapped inside the root <urlset> element, and each <url> represents one page on the site. A <url> element can contain four child nodes. Only <loc> is required by the sitemap protocol; the other three are optional.

The <loc> node contains the page's full URL.

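The <priority> node holds the priority the webmaster assigns to this page relative to other pages on the same site. It is a value from 0.0 to 1.0, and the protocol default is 0.5.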

The <lastmod> node contains the date the page was last modified, in W3C Datetime format (for example, 2010-09-14).

The <changefreq> node indicates how often the page is likely to change and suggests to search engines how often to crawl it again. Valid values are always, hourly, daily, weekly, monthly, yearly and never.
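The four child nodes map naturally onto a small typed class. The following is a sketch, not the original project's code; SitemapEntry, ChangeFrequency and SitemapParser are hypothetical names. It makes these choices:

- It skips a <url> without a <loc>.
- It turns missing or unparseable optional nodes into null.
- It parses <priority> with the invariant culture, so "0.5" is read the same way on machines whose decimal separator is a comma.
- It accepts <changefreq> only if it is one of the seven protocol values.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// The values the sitemap protocol allows in <changefreq>.
public enum ChangeFrequency { Always, Hourly, Daily, Weekly, Monthly, Yearly, Never }

// One <url> entry; optional nodes that are missing or invalid are null.
public sealed class SitemapEntry
{
    public SitemapEntry(string location, double? priority, DateTimeOffset? lastModified, ChangeFrequency? changeFreq)
    {
        Location = location;
        Priority = priority;
        LastModified = lastModified;
        ChangeFreq = changeFreq;
    }

    public string Location { get; }
    public double? Priority { get; }
    public DateTimeOffset? LastModified { get; }
    public ChangeFrequency? ChangeFreq { get; }
}

public static class SitemapParser
{
    private static readonly XNamespace Sm = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static List<SitemapEntry> Parse(string sitemapXml)
    {
        XDocument doc = XDocument.Parse(sitemapXml);
        return doc.Root.Elements(Sm + "url")
            .Select(ToEntry)
            .Where(entry => entry != null)
            .ToList();
    }

    private static SitemapEntry ToEntry(XElement url)
    {
        string loc = ((string)url.Element(Sm + "loc"))?.Trim();
        if (string.IsNullOrEmpty(loc))
            return null; // <loc> is required, so skip this entry.

        return new SitemapEntry(
            loc,
            ParsePriority((string)url.Element(Sm + "priority")),
            ParseLastMod((string)url.Element(Sm + "lastmod")),
            ParseChangeFreq((string)url.Element(Sm + "changefreq")));
    }

    // Invariant culture so "0.5" parses the same on every machine.
    private static double? ParsePriority(string text) =>
        double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out double p)
            ? p : (double?)null;

    // W3C Datetime: a bare date ("2010-09-14") or a full timestamp; no offset means UTC.
    private static DateTimeOffset? ParseLastMod(string text) =>
        DateTimeOffset.TryParse(text, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal, out DateTimeOffset d)
            ? d : (DateTimeOffset?)null;

    // Case-insensitive match; IsDefined rejects numeric strings such as "3".
    private static ChangeFrequency? ParseChangeFreq(string text) =>
        Enum.TryParse(text?.Trim(), true, out ChangeFrequency f) && Enum.IsDefined(typeof(ChangeFrequency), f)
            ? f : (ChangeFrequency?)null;
}
```

Keeping the optional fields nullable lets a caller tell "not specified" apart from a real value such as a priority of 0.0.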

Writing a Simple XML Parser in C#

For this example I am creating a small console application and outputting the results to the screen. I am also reading the sitemap from a local file, but you could just as easily download it from a website instead.
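Here is one possible shape for that console application. This is a minimal sketch, assuming C# 7.1 or later for the async Main. It reads the sitemap from a local file by default. If the first argument starts with http:// or https://, it downloads the sitemap with HttpClient instead. It then prints each <loc> with its <lastmod>.

The SitemapConsole class, its method names, and the "Sitemap.xml" default path are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;

public static class SitemapConsole
{
    private static readonly XNamespace Sm = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // One shared HttpClient instance, reused for every download.
    private static readonly HttpClient Http = new HttpClient();

    // Reads the sitemap text from a web address or, otherwise, from a local file.
    public static async Task<string> LoadSitemapTextAsync(string source)
    {
        if (source.StartsWith("http://", StringComparison.OrdinalIgnoreCase) ||
            source.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            return await Http.GetStringAsync(source);
        }
        return File.ReadAllText(source);
    }

    // Builds one output line per <url>: the location plus its <lastmod>, if present.
    public static List<string> FormatEntries(string sitemapXml)
    {
        XDocument doc = XDocument.Parse(sitemapXml);
        return doc.Root.Elements(Sm + "url")
            .Select(url => string.Format("{0}  (last modified: {1})",
                ((string)url.Element(Sm + "loc") ?? "").Trim(),
                (string)url.Element(Sm + "lastmod") ?? "unknown"))
            .ToList();
    }

    public static async Task Main(string[] args)
    {
        // Hypothetical default: a Sitemap.xml in the working directory.
        string source = args.Length > 0 ? args[0] : "Sitemap.xml";
        string xml = await LoadSitemapTextAsync(source);
        foreach (string line in FormatEntries(xml))
            Console.WriteLine(line);
    }
}
```

Running it with no arguments reads the local file. Passing a sitemap URL as the first argument fetches it over the network instead. The parsing code is the same in both cases because both paths produce a string.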

Full tutorial - https://lonewolfonline.net/simple-xml-parser/
