# File formats

* Configuration files (`*.ini`)
* CSV
* XML
* JSON: see [REST](REST.ipynb)
* Binary files: see [Binary](Binary.ipynb)
* Network configuration (`.netrc`): see module [netrc](https://docs.python.org/3/library/netrc.html)

# Configuration files

* Configuration files are very similar to `*.ini` files under Windows.
* They are separated into sections that contain options.
* The `configparser` module provides functions to access them.
* They separate code and configuration.
* They can be modified by non-developers, e.g. operations.
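
As a minimal sketch of how sections and options map onto `configparser`, the following reads a configuration from an in-memory string instead of a file. The text in `CONFIG_TEXT` is made up for illustration and only mirrors the `myapp` section used in the example below.

```python
import configparser

# Hypothetical configuration text; a real application would read a file.
CONFIG_TEXT = """
[myapp]
timeout=10
fullscreen=true
"""

config = configparser.ConfigParser()
config.read_string(CONFIG_TEXT)

# Sections behave like dictionaries of options; all values are strings.
section_names = config.sections()
option_names = list(config['myapp'])
timeout_text = config['myapp']['timeout']

print(section_names)
print(option_names)
print(timeout_text)
```

Because every value arrives as a string, typed getters such as `getint()` and `getboolean()` are needed to convert them.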

## Example configuration

```ini
# Example configuration file.

# Application settings
[myapp]
connection=driver={SQL Server};server=someserver;database=SomeDatabase;uid=me;pwd=1234;
timeout=10
fullscreen=true
```

## Example application

In [1]:
import configparser

class MyApp:
    def __init__(self, config_path):
        # Set up config with defaults.
        config = configparser.ConfigParser()
        config['myapp'] = {}
        config['myapp']['fullscreen'] = 'false'
        config['myapp']['timeout'] = '60'

        # Update config from file.
        with open(config_path, 'r', encoding='utf-8') as config_file:
            config.read_file(config_file)

        # Assign config values to attributes (using the proper type).
        self.connection = config.get('myapp', 'connection')
        self.is_fullscreen = config.getboolean('myapp', 'fullscreen')
        self.timeout = config.getint('myapp', 'timeout')

    def work(self):
        print('connection={}'.format(self.connection))
        print('is_fullscreen={}'.format(self.is_fullscreen))
        print('timeout={}'.format(self.timeout))

## Running the example application

In [2]:
import os.path

config_path = os.path.join('examples', 'myapp.cfg')
myapp = MyApp(config_path)
myapp.work()

connection=driver={SQL Server};server=someserver;database=SomeDatabase;uid=me;pwd=1234;
is_fullscreen=True
timeout=10


## Error handling

The quality of error messages varies.

In [3]:
config = configparser.ConfigParser()
config['myapp'] = {}
with open(os.path.join('examples', 'myapp.cfg'), 'r', encoding='utf-8') as config_file:
    config.read_file(config_file)

Messages on missing options are specific:

In [4]:
try:
    config.getint('myapp', 'no_such_option')
except configparser.Error as error:
    print(error)

No option 'no_such_option' in section: 'myapp'


Messages on wrong types are less helpful because they name neither the section nor the option:

In [5]:
try:
    config.getboolean('myapp', 'timeout')
except ValueError as error:
    print(error)

Not a boolean: 10
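
One possible way to get more helpful messages is to wrap the typed getter and add the section and option to the error. The helper `get_boolean_option` is hypothetical and not part of `configparser`; the configuration text is made up for illustration.

```python
import configparser


def get_boolean_option(config, section, option):
    """Like config.getboolean() but name section and option on errors."""
    try:
        return config.getboolean(section, option)
    except ValueError as error:
        raise ValueError(
            'option {!r} in section {!r} must be a boolean: {}'.format(
                option, section, error)) from error


config = configparser.ConfigParser()
config.read_string('[myapp]\ntimeout=10\nfullscreen=true\n')

try:
    get_boolean_option(config, 'myapp', 'timeout')
except ValueError as error:
    message = str(error)
    print(message)
```

The original `ValueError` stays attached as the cause, so no detail is lost.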


# CSV

* "comma separated values"
* for tabular data
* delimiter can be a comma (`,`) but also others, e.g. semicolon (`;`) or tab (`\t`)
* special characters and newlines must be quoted (see [RFC4180](http://www.rfc-editor.org/rfc/rfc4180.txt))
* to exchange data across platforms and systems
* for data-driven tests
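
The quoting rules can be left to the `csv` module. The following sketch writes made-up rows containing a delimiter, a quote and a newline to an in-memory buffer and reads them back; the column names and values are assumptions for illustration only.

```python
import csv
import io

# Made-up rows: values contain the delimiter, a quote and a newline.
rows = [
    ['name', 'remark'],
    ['Alice', 'likes tea; dislikes coffee'],
    ['Bob', 'known as "Bobby"'],
    ['Bärbel', 'first line\nsecond line'],
]

# newline='' lets the csv module control line endings itself.
buffer = io.StringIO(newline='')
csv.writer(buffer, delimiter=';').writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back yields the original values.
rows_read = list(
    csv.reader(io.StringIO(csv_text, newline=''), delimiter=';'))
print(rows_read == rows)
```

Values that need it are enclosed in double quotes, and a quote inside a value is doubled.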

## Example CSV file

```
name;size;date_of_birth
Alice;172;1987-03-11
Bob;168;1976-04-27
Bärbel;;1991-02-15
```

## Reading a CSV as list

In [6]:
# %load examples/csvlist.py
import csv
import os.path

csv_path = os.path.join('examples', 'persons.csv')
with open(csv_path, 'r', encoding='utf-8', newline='') as csv_file:
    for items in csv.reader(csv_file, delimiter=';'):
        print(items)


['name', 'size', 'date_of_birth']
['Alice', '172', '1987-03-11']
['Bob', '168', '1976-04-27']
['Bärbel', '', '1991-02-15']


## Reading a CSV as dictionary

In [7]:
# %load examples/csvdict.py
import csv
import os.path

csv_path = os.path.join('examples', 'persons.csv')
with open(csv_path, 'r', encoding='utf-8', newline='') as csv_file:
    for person in csv.DictReader(csv_file, delimiter=';'):
        print(person)


{'name': 'Alice', 'date_of_birth': '1987-03-11', 'size': '172'}
{'name': 'Bob', 'date_of_birth': '1976-04-27', 'size': '168'}
{'name': 'Bärbel', 'date_of_birth': '1991-02-15', 'size': ''}


## CSV data type

* Values read from CSV are strings.
* Empty items are empty strings (`''`).
* Missing items are `None` (when a row has too few columns for `DictReader`).
* For other types, conversion is necessary (see [Conversion](Conversion.ipynb)).

In [8]:
person['name']

'Bärbel'

In [9]:
int(person['size']) if person['size'] not in ('', None) else 0

0

In [10]:
from datetime import datetime
datetime.strptime(person['date_of_birth'], '%Y-%m-%d') \
    if person['date_of_birth'] not in ('', None) else None

datetime.datetime(1991, 2, 15, 0, 0)
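
The conversions above can be collected in a small helper. This is only a sketch: `optional` and `person_from_row` are hypothetical names, the CSV text is made up, and unlike the cell above it maps an empty or missing size to `None` instead of `0`.

```python
import csv
import io
from datetime import datetime


def optional(text, convert):
    """Convert text unless it is empty ('') or missing (None)."""
    return convert(text) if text not in ('', None) else None


def person_from_row(row):
    """Convert the string values of a DictReader row to proper types."""
    return {
        'name': row['name'],
        'size': optional(row['size'], int),
        'date_of_birth': optional(
            row['date_of_birth'],
            lambda text: datetime.strptime(text, '%Y-%m-%d').date()),
    }


# Made-up data: Bob has an empty size, Carol's row lacks two columns.
CSV_TEXT = (
    'name;size;date_of_birth\n'
    'Alice;172;1987-03-11\n'
    'Bob;;1976-04-27\n'
    'Carol\n'
)
rows = list(csv.DictReader(io.StringIO(CSV_TEXT), delimiter=';'))
persons = [person_from_row(row) for row in rows]
for person in persons:
    print(person)
```

Whether a missing value should become `None`, `0` or an error depends on the application.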

# XML

* eXtensible Markup Language
* a blueprint for other file formats
* can represent sequences and hierarchies
* text based (binary somewhat possible using e.g. UUEncode)
* human readable
* somewhat verbose
* supports a Document Object Model (DOM)

## XML with Python

* [`xml`](https://docs.python.org/3/library/xml.html): part of the standard library
  * `xml.etree.ElementTree` - XML as pythonic trees
  * `xml.dom.minidom` - DOM, warts and all
  * `xml.sax` - sequential parsing of large documents
  * works, but has limited support for namespaces, XPath etc.
* `lxml`: available from http://lxml.de/
  * Python wrapper to C based XML libraries
  * full support for namespaces, XPath, schemas etc.
  * universally used for "serious" XML processing
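
To illustrate the sequential parsing offered by `xml.sax`, here is a minimal sketch that collects names while the document streams through the parser. The handler class `PersonNameHandler` and the XML text are made up for illustration.

```python
import xml.sax


class PersonNameHandler(xml.sax.ContentHandler):
    """Collect the name attribute of each person element."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        # Without namespace processing, name is the raw qualified name.
        if name == 'people:person':
            self.names.append(attrs['name'])


PEOPLE_XML = """<?xml version="1.0" encoding="utf-8"?>
<people:list xmlns:people="https://www.example.org/xml/people">
   <people:person name="Alice" />
   <people:person name="Bob" />
</people:list>
"""

handler = PersonNameHandler()
xml.sax.parseString(PEOPLE_XML.encode('utf-8'), handler)
print(handler.names)
```

The parser never builds a tree, so memory use stays low even for large documents, at the cost of having to track any needed context in the handler.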

## Example XML file

```xml
<?xml version="1.0" encoding="utf-8"?>
<people:list xmlns:people="https://www.example.org/xml/people">
   <people:updated date="2016-02-16" />
   <people:person name="Alice" phone="0650/12345678" size="172" />
   <people:person name="Bob" phone="0654/23456789" size="167" />
   <people:person name="Bärbel" phone="0699/34567890" size="182" />
   <people:person name="Günther" size="172">
      <people:note>Ask for phone number.</people:note>
   </people:person>
</people:list>
```

## XML namespaces

In our example

> `xmlns:people="https://www.example.org/xml/people"`

assigns the shortcut `people` to the namespace identified by `https://www.example.org/xml/people`.
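
The shortcut itself carries no meaning; only the namespace it stands for does. As a small sketch using `xml.etree.ElementTree` from the standard library, two made-up documents with different shortcuts for the same namespace yield the same tag:

```python
import xml.etree.ElementTree as ET

NAMESPACE = 'https://www.example.org/xml/people'

# Same namespace, different shortcuts ("people" versus "p").
first = ET.fromstring(
    '<people:list xmlns:people="{}" />'.format(NAMESPACE))
second = ET.fromstring(
    '<p:list xmlns:p="{}" />'.format(NAMESPACE))

# The parser replaces the shortcut with the full namespace.
print(first.tag)
print(first.tag == second.tag)
```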

## XPath

XPath is a query language to find nodes in XML documents. Examples:

* `/people:list/people:person` - all `person` elements directly below the root element `list`
* `/people:list/people:person[@phone]` - the same `person` elements, limited to those with a `phone` attribute

Tutorial: http://www.w3schools.com/xsl/xpath_intro.asp
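
Simple queries like these also work with the limited XPath subset of `xml.etree.ElementTree`. The following sketch uses a shortened, made-up variant of the example document. Note that with `findall()` the path is relative to the element being searched, so the leading `/people:list` is omitted.

```python
import xml.etree.ElementTree as ET

PEOPLE_XML = """<people:list xmlns:people="https://www.example.org/xml/people">
   <people:person name="Alice" phone="0650/12345678" />
   <people:person name="Günther" />
</people:list>"""
NAMESPACES = {'people': 'https://www.example.org/xml/people'}

# fromstring() returns the root element, here <people:list>.
list_element = ET.fromstring(PEOPLE_XML)

all_names = [
    element.attrib['name']
    for element in list_element.findall('people:person', NAMESPACES)]
names_with_phone = [
    element.attrib['name']
    for element in list_element.findall(
        'people:person[@phone]', NAMESPACES)]

print(all_names)
print(names_with_phone)
```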

## Extract information from XML

Compute the path to our example XML file:

In [11]:
import os.path
people_xml_path = os.path.join('examples', 'people.xml')

Parse the file into an element tree:

In [12]:
from lxml import etree
people_root = etree.parse(people_xml_path)

## Set up the namespace

In [13]:
NAMESPACES = {
    'people': 'https://www.example.org/xml/people',
}

## Find persons and print details

In [14]:
# Find persons matching XPath.
person_elements = people_root.xpath(
    '/people:list/people:person[@phone]',
    namespaces=NAMESPACES)

# Print name and phone of persons found.
for person_element in person_elements:
    print(
        person_element.attrib['name'] + ': ' +
        person_element.attrib['phone'])

Alice: 0650/12345678
Bob: 0654/23456789
Bärbel: 0699/34567890


## Examining XML elements

Elements have a `tag`, where namespaces are represented using the [Clark notation](http://www.jclark.com/xml/xmlns.htm) `{namespace}tag`:

In [15]:
person_element.tag

'{https://www.example.org/xml/people}person'

XML attributes behave like a simple dictionary:

In [16]:
person_element.attrib

{'phone': '0699/34567890', 'name': 'Bärbel', 'size': '182'}

## Text nodes

Print notes about persons without a phone:

In [17]:
note_elements_for_persons_without_phone = \
    people_root.xpath(
        '/people:list/people:person[not(@phone)]/people:note',
        namespaces=NAMESPACES)

for note_element in note_elements_for_persons_without_phone:
    person_element = note_element.getparent()
    person_name = person_element.attrib['name']
    note_text = note_element.text
    print(person_name + ': ' + note_text)

Günther: Ask for phone number.


Use `getparent()` to access the enclosing XML element (as seen above).

# Summary

* Configuration files are useful to separate configuration from code.
* CSV is useful to store tabular data.
* XML is useful for hierarchical data. Use `lxml` for serious XML processing.