# XML parsing in Python ([GeeksforGeeks](http://www.geeksforgeeks.org/xml-parsing-python/))

__XML:__ XML stands for eXtensible Markup Language. It was designed to store and transport data, and to be both human- and machine-readable. Its design goals therefore emphasize simplicity, generality, and usability across the Internet.

The XML file parsed in this tutorial is an RSS feed.

## RSS
__RSS (Rich Site Summary, often called Really Simple Syndication)__ uses a family of standard web feed formats to publish frequently updated information such as blog entries, news headlines, audio, and video. An RSS document is plain text formatted as XML.

* The RSS format is relatively easy to read for both automated processes and humans.
* The RSS processed in this tutorial is a news feed from a popular news website; its URL appears in the download cell below. Our goal is to process this RSS feed (an XML file) and save it in another format for future use.
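
To make the structure concrete, here is a minimal sketch of an RSS 2.0 document parsed from a string. The feed content (titles and links) is invented for illustration and is not taken from the real feed; only the `rss` / `channel` / `item` layout reflects the format.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 document; every title and URL here is a placeholder.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Illustrative feed</description>
    <item>
      <title>First story</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""

sample_root = ET.fromstring(SAMPLE_RSS)

# Collect the title of every <item>, wherever it sits in the tree.
sample_titles = [item.findtext('title') for item in sample_root.iter('item')]

print(sample_root.tag, sample_root.attrib)
print(sample_titles)
```

A single `channel` element wraps the feed metadata and one `item` element per story, which is why the parsing code later walks `./channel/item`.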

In [1]:
import csv
import requests
import xml.etree.ElementTree as ET
from pprint import pprint

## Methods in ElementTree
* __tree = ET.parse(xmlfile)__ : parses the XML file and returns an ElementTree object
* __root = tree.getroot()__ : returns the root element of the tree
* __child.tag__ : the tag name of an element, as a string
* __Element.findall()__ : given a plain tag name, finds all direct children of the current element with that tag; it also accepts a path such as `./channel/item`
* __Element.find()__ : finds the first child with a particular tag, or returns `None` if nothing matches
* __Element.text__ : the element’s text content
* __Element.get()__ : returns the value of one of the element’s attributes
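
The methods above can be tried on a small in-memory document. This is only a sketch: the `library` / `book` XML below is hypothetical and exists purely to show what each call returns.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML used only for this demonstration.
doc = ET.fromstring(
    '<library city="Pune">'
    '<book id="b1"><title>First</title></book>'
    '<book id="b2"><title>Second</title></book>'
    '<magazine id="m1"><title>Third</title></magazine>'
    '</library>'
)

child_tags = [child.tag for child in doc]    # tag of each direct child
books = doc.findall('book')                  # direct children tagged <book>
first_book = doc.find('book')                # first matching child
first_title = first_book.find('title').text  # text content of <title>
first_id = first_book.get('id')              # value of the id attribute
missing = doc.find('newspaper')              # no match, so this is None
book_titles = [t.text for t in doc.findall('./book/title')]  # path lookup

print(child_tags, len(books), first_title, first_id, missing, book_titles)
```

Because `find()` returns `None` when nothing matches, it is worth checking the result before reading `.text` from it.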


In [2]:
# URL of the RSS feed
url = 'http://www.hindustantimes.com/rss/football/rssfeed.xml'

# send an HTTP GET request and keep the response
# (exercise: try converting this data to JSON and processing it)
request_data = requests.get(url)

# save the raw XML bytes to a file
with open('topnewsfeed.xml', 'wb') as f:
    f.write(request_data.content)
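
One thing to watch in the download step: `requests.get` does not raise an exception for a 4xx or 5xx response unless `raise_for_status()` is called on the response, so an HTML error page could be saved as if it were the feed. The sketch below checks that the bytes are well-formed XML before they are used. `parse_feed` is a hypothetical helper name, not part of the original notebook, and the sample bytes stand in for `request_data.content` so the example runs without network access.

```python
import xml.etree.ElementTree as ET


def parse_feed(content):
    """Parse raw XML bytes and return the root element.

    Hypothetical helper: raises ValueError if the bytes are not well-formed XML.
    """
    try:
        return ET.fromstring(content)
    except ET.ParseError as exc:
        raise ValueError(f"response is not well-formed XML: {exc}") from exc


# Stand-in bytes; in the notebook this would be request_data.content,
# ideally after calling request_data.raise_for_status().
demo_root = parse_feed(b'<rss version="2.0"><channel/></rss>')
print(demo_root.tag)
```

Parsing directly from the response bytes also makes the intermediate file optional; writing `topnewsfeed.xml` is still useful if you want a cached copy.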

In [3]:
# create element tree object
tree = ET.parse('topnewsfeed.xml')
pprint(tree)

<xml.etree.ElementTree.ElementTree object at 0x0000007F99C81A90>


In [4]:
# get root element
root = tree.getroot()
root

<Element 'rss' at 0x0000007F99C985E8>

In [5]:
# iterate over the direct children of the root element
print("Children of root : \n")
for child in root:
    print(child.tag)  # the tag attribute holds the tag name of each child element

Children of root : 

channel


In [6]:
root.tag

'rss'

In [7]:
root.attrib  # As an Element, root has a tag and a dictionary of attributes

{'version': '2.0'}

In [18]:
# create empty lists for the tags seen and for the news items
elements_items = []
newsitems = []
# iterate over the news items
# (no trailing slash: './channel/item/' would select the children of each item instead)
for item in root.findall('./channel/item'):
    elements_items.append(item.tag)
    news = {}  # empty news dictionary
    for child in item:   # iterate over the child elements of item
        # special handling for the namespaced element media:content
        if child.tag == '{http://search.yahoo.com/mrss/}content':
            news['media'] = child.attrib['url']
        else:
            # child.text is None for an empty element; the values are stored as UTF-8 bytes
            news[child.tag] = (child.text or '').encode('utf8')

    newsitems.append(news)  # append the news dictionary once per item

In [22]:
#pprint(elements_items)        
pprint(newsitems[0])

{'description': b'With the mundanity of transfer paperwork finally resolved, N'
                b'eymar will line up for Paris Saint-Germain for the first tim'
                b'e since his mind-boggling 222 million euro ($261 million) mo'
                b've from FC\xe2\x80\x89Barcelona.',
 'guid': b'http://www.hindustantimes.com/football/neymar-braced-for-french-cult'
         b'ure-shock-in-paris-saint-germain-debut/story-8HEXdfFTv40K3NN67iScpO.'
         b'html',
 'link': b'http://www.hindustantimes.com/football/neymar-braced-for-french-cult'
         b'ure-shock-in-paris-saint-germain-debut/story-8HEXdfFTv40K3NN67iScpO.'
         b'html',
 'media': 'http://www.hindustantimes.com/rf/image_size_630x354/HT/p2/2017/08/13/Pictures/fbl-fra-psg-training_770f924e-801b-11e7-ba32-a280bea68af6.jpg',
 'pubDate': b'Sun, 13 Aug 2017 11:42:19 GMT ',
 'title': b'Neymar braced for French culture shock in Paris Saint-Germain de'
          b'but'}
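
The `media` entry comes from a namespaced element: ElementTree expands a prefix such as `media:` into `{namespace-uri}tag`, which is why the loop compares against `'{http://search.yahoo.com/mrss/}content'`. As an alternative sketch, a prefix-to-URI dictionary can be passed to `find()`. The `item` below is a made-up fragment with a placeholder image URL, and `MEDIA_NS` is just an illustrative name.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping; the prefix only needs to match the query below.
MEDIA_NS = {'media': 'http://search.yahoo.com/mrss/'}

# Made-up item fragment; the URL is a placeholder.
item = ET.fromstring(
    '<item xmlns:media="http://search.yahoo.com/mrss/">'
    '<title>Example story</title>'
    '<media:content url="http://example.com/picture.jpg"/>'
    '</item>'
)

# Tags are stored in expanded {uri}name form.
expanded_tags = [child.tag for child in item]

# Look the element up by prefix instead of spelling out the full URI.
content = item.find('media:content', MEDIA_NS)
media_url = content.get('url') if content is not None else None

print(expanded_tags)
print(media_url)
```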


## Saving the results to a CSV file
### See [DictWriter](https://docs.python.org/3/library/csv.html#csv.DictWriter)

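In [20]:
filename = "hindustan_times.csv"
# field names must match the dictionary keys exactly ('pubDate', not 'pubdate')
fieldnames = ["description", "guid", "link", "media", "pubDate", "title"]

In [23]:
# the parsed values may be bytes; decode them so the CSV holds plain text
rows = [{k: v.decode('utf8') if isinstance(v, bytes) else v for k, v in news.items()}
        for news in newsitems]

with open(filename, 'w', newline='', encoding='utf8') as fh:
    writer = csv.DictWriter(fh, fieldnames)  # creating a csv dict writer object
    writer.writeheader()    # writing headers (field names)
    writer.writerows(rows)   # writing data rows: a list of dictionaries

Note: calling `writer.writerows(newsitems[0])` instead fails with `AttributeError: 'str' object has no attribute 'keys'`. `writerows` expects a list of dictionaries; given a single dictionary, it iterates over the keys and treats each string as a row.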
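
A minimal sketch of how `writerow` and `writerows` differ on a `DictWriter`: the first takes one dictionary, the second takes an iterable of dictionaries. The rows here are invented, and an in-memory `io.StringIO` buffer stands in for the CSV file so the example has no side effects.

```python
import csv
import io

# Invented rows, used only for illustration.
rows = [
    {'title': 'First story', 'link': 'http://example.com/first'},
    {'title': 'Second story', 'link': 'http://example.com/second'},
]

buffer = io.StringIO()  # in-memory stand-in for a file opened with newline=''
writer = csv.DictWriter(buffer, fieldnames=['title', 'link'])
writer.writeheader()
writer.writerow(rows[0])     # one dictionary, one row
writer.writerows(rows[1:])   # a list of dictionaries, one row each

csv_text = buffer.getvalue()
print(csv_text)
```

Every key in a row must appear in `fieldnames`; by default `DictWriter` raises `ValueError` for an unexpected key, so a mismatch in capitalization is enough to stop the write.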