# Python Standard Library - CSV serialization - exercises

In [14]:
from csv import DictReader, DictWriter, QUOTE_ALL

## Serialization CSV DictReader

1. Use the data from the "Input" section (see below)
1. Download the file https://python.astrotech.io/_static/iris.csv and save it as ``iris.csv`` in the directory with your scripts
1. Use ``csv.DictReader`` to read the contents of the file
1. Pass ``encoding``, ``delimiter`` and ``quotechar`` explicitly
1. Replace the column names with ``FIELDNAMES``
1. Skip the first line (the header)
1. Print the data rows
1. Compare the results with the "Output" section (see below)

Output:
```python
{'Sepal Length': '5.4', 'Sepal Width': '3.9', 'Petal Length': '1.3', 'Petal Width': '0.4', 'Species': 'setosa'}
{'Sepal Length': '5.9', 'Sepal Width': '3.0', 'Petal Length': '5.1', 'Petal Width': '1.8', 'Species': 'virginica'}
{'Sepal Length': '6.0', 'Sepal Width': '3.4', 'Petal Length': '4.5', 'Petal Width': '1.6', 'Species': 'versicolor'}
...
```
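
The "replace the column names" and "skip the first line" steps go together: when ``fieldnames`` is passed, ``DictReader`` no longer consumes the first line as a header and returns it as an ordinary data row. The sketch below shows this on a small in-memory sample. The header text in ``SAMPLE`` is an assumption made for illustration and may differ from the real ``iris.csv``; ``read_rows`` is a hypothetical helper.

```python
import io
from csv import DictReader

# Hypothetical stand-in for iris.csv; the real header line may differ
SAMPLE = (
    'sepal_length,sepal_width,petal_length,petal_width,species\n'
    '5.4,3.9,1.3,0.4,setosa\n'
    '5.9,3.0,5.1,1.8,virginica\n'
)

FIELDNAMES = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species']


def read_rows(text, skip_header):
    """Read CSV text with custom column names, optionally dropping the first line."""
    reader = DictReader(io.StringIO(text), fieldnames=FIELDNAMES,
                        delimiter=',', quotechar='"')
    if skip_header:
        next(reader)  # discard the original header row
    return list(reader)


with_header = read_rows(SAMPLE, skip_header=False)
without_header = read_rows(SAMPLE, skip_header=True)

print(with_header[0]['Species'])     # species (the old header leaked into the data)
print(without_header[0]['Species'])  # setosa
```

Note that every value comes back as ``str``; converting to ``float`` is a separate step.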

In [1]:
FIELDNAMES = [
    'Sepal Length',
    'Sepal Width',
    'Petal Length',
    'Petal Width',
    'Species',
]

In [9]:
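FILE = r'iris.csv'
result = []

# newline='' lets the csv module handle line endings itself
with open(FILE, encoding='utf-8', newline='') as fp:
    reader = DictReader(fp, fieldnames=FIELDNAMES, delimiter=',', quotechar='"')
    next(reader)  # skip the original header row, replaced by FIELDNAMES
    for line in reader:
        result.append(line)

result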

[{'Sepal Length': '5.4',
  'Sepal Width': '3.9',
  'Petal Length': '1.3',
  'Petal Width': '0.4',
  'Species': 'setosa'},
 {'Sepal Length': '5.9',
  'Sepal Width': '3.0',
  'Petal Length': '5.1',
  'Petal Width': '1.8',
  'Species': 'virginica'},
 {'Sepal Length': '6.0',
  'Sepal Width': '3.4',
  'Petal Length': '4.5',
  'Petal Width': '1.6',
  'Species': 'versicolor'},
 {'Sepal Length': '7.3',
  'Sepal Width': '2.9',
  'Petal Length': '6.3',
  'Petal Width': '1.8',
  'Species': 'virginica'},
 {'Sepal Length': '5.6',
  'Sepal Width': '2.5',
  'Petal Length': '3.9',
  'Petal Width': '1.1',
  'Species': 'versicolor'},
 {'Sepal Length': '5.4',
  'Sepal Width': '3.9',
  'Petal Length': '1.3',
  'Petal Width': '0.4',
  'Species': 'setosa'},
 {'Sepal Length': '5.5',
  'Sepal Width': '2.6',
  'Petal Length': '4.4',
  'Petal Width': '1.2',
  'Species': 'versicolor'},
 {'Sepal Length': '5.7',
  'Sepal Width': '2.9',
  'Petal Length': '4.2',
  'Petal Width': '1.3',
  'Species': 'versicolor'},
 {

## Serialization CSV DictWriter

1. Use the data from the "Input" section (see below)
1. Use ``csv.DictWriter()`` to save ``DATA`` to a file
1. Try opening the file in a spreadsheet, e.g. Microsoft Excel, LibreOffice or Numbers
1. Try opening the file in an IDE and in a plain text editor, e.g. Notepad, vim or gedit
1. Compare the results with the "Output" section (see below)
1. Non-functional requirements:

    * All fields must be enclosed in double quotes ``"``
    * Use ``,`` to separate columns
    * Use ``utf-8`` encoding
    * Use the Unix line ending ``\n``

Output:
```text
"firstname","lastname"
"Jan","Twardowski"
"José","Jiménez"
"Mark","Watney"
"Ivan","Ivanovic"
"Melissa","Lewis"
```
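
To see what the quoting requirement changes, it can help to write into an in-memory buffer and compare the raw text. This is a minimal sketch: ``ROWS`` is a shortened sample and ``dump`` is a hypothetical helper, neither is part of the exercise.

```python
import io
from csv import DictWriter, QUOTE_ALL, QUOTE_MINIMAL

# Shortened sample, for illustration only
ROWS = [
    {'firstname': 'Jan', 'lastname': 'Twardowski'},
    {'firstname': 'Mark', 'lastname': 'Watney'},
]


def dump(rows, quoting):
    """Serialize rows to a CSV string using the given quoting mode."""
    buffer = io.StringIO()
    writer = DictWriter(buffer, fieldnames=['firstname', 'lastname'],
                        quoting=quoting, quotechar='"',
                        delimiter=',', lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


quoted = dump(ROWS, QUOTE_ALL)       # every field wrapped in quotes
minimal = dump(ROWS, QUOTE_MINIMAL)  # quotes only where a field needs them

print(repr(quoted))
print(repr(minimal))
```

Printing with ``repr()`` also makes the line terminator visible, which is hard to check in a spreadsheet.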

In [16]:
DATA = [
    {'firstname': 'Jan',  'lastname': 'Twardowski'},
    {'firstname': 'José', 'lastname': 'Jiménez'},
    {'firstname': 'Mark', 'lastname': 'Watney'},
    {'firstname': 'Ivan', 'lastname': 'Ivanovic'},
    {'firstname': 'Melissa', 'lastname': 'Lewis'},
]

FILE = '_temporary.csv'

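# newline='' stops Python from translating '\n', so the Unix line ending survives on every platform
with open(FILE, mode='w', encoding='utf-8', newline='') as file:
    writer = DictWriter(file, fieldnames=DATA[0].keys(), quoting=QUOTE_ALL, quotechar='"', delimiter=',', lineterminator='\n')
    writer.writeheader()
    writer.writerows(DATA)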

In [18]:
!cat $FILE
# !type $FILE  ## for Windows

"firstname","lastname"
"Jan","Twardowski"
"José","Jiménez"
"Mark","Watney"
"Ivan","Ivanovic"
"Melissa","Lewis"


## Serialization CSV List of Tuples

1. Use the data from the "Input" section (see below)
1. Use ``csv.DictWriter()`` to save ``DATA`` to a file
1. Compare the results with the "Output" section (see below)
1. Non-functional requirements:

    * Do not use quote characters in the resulting CSV file
    * Use ``,`` to separate columns
    * Use ``utf-8`` encoding
    * Use the Unix line ending ``\n``

Output:
```text
Sepal length,Sepal width,Petal length,Petal width,Species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
6.3,2.9,5.6,1.8,virginica
6.4,3.2,4.5,1.5,versicolor
4.7,3.2,1.3,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
7.6,3.0,6.6,2.1,virginica
4.9,3.0,1.4,0.2,setosa
4.9,2.5,4.5,1.7,virginica
7.1,3.0,5.9,2.1,virginica
4.6,3.4,1.4,0.3,setosa
5.4,3.9,1.7,0.4,setosa
5.7,2.8,4.5,1.3,versicolor
5.0,3.6,1.4,0.3,setosa
5.5,2.3,4.0,1.3,versicolor
6.5,3.0,5.8,2.2,virginica
6.5,2.8,4.6,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
6.9,3.1,4.9,1.5,versicolor
4.6,3.1,1.5,0.2,setosa
```
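
There are two ways to get output without quote characters, and they behave differently once a field contains the delimiter. The sketch below uses made-up rows (``HEADER``, ``ROWS`` and the ``dump`` helper are illustrative assumptions) to show that ``QUOTE_MINIMAL`` quotes only the problematic field, while ``QUOTE_NONE`` without an ``escapechar`` refuses to write it.

```python
import io
from csv import DictWriter, Error, QUOTE_MINIMAL, QUOTE_NONE

# Made-up data: the second note contains the delimiter
HEADER = ('Species', 'Note')
ROWS = [
    ('setosa', 'plain'),
    ('virginica', 'large, blue'),
]


def dump(quoting):
    """Convert tuples to dicts with zip() and serialize them to a CSV string."""
    buffer = io.StringIO()
    writer = DictWriter(buffer, fieldnames=HEADER, quoting=quoting,
                        delimiter=',', lineterminator='\n')
    writer.writeheader()
    writer.writerows(dict(zip(HEADER, row)) for row in ROWS)
    return buffer.getvalue()


minimal = dump(QUOTE_MINIMAL)
print(repr(minimal))

try:
    dump(QUOTE_NONE)
    none_failed = False
except Error:
    # No escapechar was set, so the comma inside the field cannot be written
    none_failed = True

print(none_failed)  # True
```

For the iris data in this exercise no field contains a comma, so both modes produce the same unquoted file.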

In [19]:
DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
    (7.0, 3.2, 4.7, 1.4, 'versicolor'),
    (7.6, 3.0, 6.6, 2.1, 'virginica'),
    (4.9, 3.0, 1.4, 0.2, 'setosa'),
    (4.9, 2.5, 4.5, 1.7, 'virginica'),
    (7.1, 3.0, 5.9, 2.1, 'virginica'),
    (4.6, 3.4, 1.4, 0.3, 'setosa'),
    (5.4, 3.9, 1.7, 0.4, 'setosa'),
    (5.7, 2.8, 4.5, 1.3, 'versicolor'),
    (5.0, 3.6, 1.4, 0.3, 'setosa'),
    (5.5, 2.3, 4.0, 1.3, 'versicolor'),
    (6.5, 3.0, 5.8, 2.2, 'virginica'),
    (6.5, 2.8, 4.6, 1.5, 'versicolor'),
    (6.3, 3.3, 6.0, 2.5, 'virginica'),
    (6.9, 3.1, 4.9, 1.5, 'versicolor'),
    (4.6, 3.1, 1.5, 0.2, 'setosa'),
]

In [23]:
from csv import QUOTE_NONE

header, *data = DATA
result = []

# Pair each value with its column name so DictWriter can consume the rows
for row in data:
    pairs = zip(header, row)
    result.append(dict(pairs))

FILE = '_temporary.csv'

# QUOTE_NONE, because the task asks for no quote characters in the output
with open(FILE, mode='w', encoding='utf-8', newline='') as file:
    writer = DictWriter(file, fieldnames=result[0].keys(), quoting=QUOTE_NONE, delimiter=',', lineterminator='\n')
    writer.writeheader()
    writer.writerows(result)

In [24]:
!cat $FILE

Sepal length,Sepal width,Petal length,Petal width,Species
5.8,2.7,5.1,1.9,virginica
5.1,3.5,1.4,0.2,setosa
5.7,2.8,4.1,1.3,versicolor
6.3,2.9,5.6,1.8,virginica
6.4,3.2,4.5,1.5,versicolor
4.7,3.2,1.3,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
7.6,3.0,6.6,2.1,virginica
4.9,3.0,1.4,0.2,setosa
4.9,2.5,4.5,1.7,virginica
7.1,3.0,5.9,2.1,virginica
4.6,3.4,1.4,0.3,setosa
5.4,3.9,1.7,0.4,setosa
5.7,2.8,4.5,1.3,versicolor
5.0,3.6,1.4,0.3,setosa
5.5,2.3,4.0,1.3,versicolor
6.5,3.0,5.8,2.2,virginica
6.5,2.8,4.6,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
6.9,3.1,4.9,1.5,versicolor
4.6,3.1,1.5,0.2,setosa


## Serialization CSV Schemaless

1. Use the data from the "Input" section (see below)
1. Use ``csv.DictWriter()`` to save data with a variable structure to a CSV file
1. ``fieldnames`` must be generated automatically from ``DATA``
1. ``fieldnames`` must always come out in the same order
1. Compare the results with the "Output" section (see below)
1. Non-functional requirements:

    * All fields must be enclosed in double quotes ``"``
    * Use ``,`` to separate columns
    * Use ``utf-8`` encoding
    * Use the Unix line ending ``\n``

Output:
```csv
"Petal length","Petal width","Sepal length","Sepal width","Species"
"","","5.1","3.5","setosa"
"4.1","1.3","","","versicolor"
"","1.8","6.3","","virginica"
"","0.2","5.0","","setosa"
"4.1","","","2.8","versicolor"
"","1.8","","2.9","virginica"
```

In [25]:
DATA = [
    {'Sepal length': 5.1, 'Sepal width': 3.5, 'Species': 'setosa'},
    {'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
    {'Sepal length': 6.3, 'Petal width': 1.8, 'Species': 'virginica'},
    {'Sepal length': 5.0, 'Petal width': 0.2, 'Species': 'setosa'},
    {'Sepal width': 2.8, 'Petal length': 4.1, 'Species': 'versicolor'},
    {'Sepal width': 2.9, 'Petal width': 1.8, 'Species': 'virginica'},
]

In [49]:
%%timeit -r 100 -n 1000

fieldnames = []

for row in DATA:
    fieldnames.extend(list(row.keys()))
    
fieldnames = set(fieldnames)
fieldnames

3.47 µs ± 330 ns per loop (mean ± std. dev. of 100 runs, 1000 loops each)


In [50]:
%%timeit -r 100 -n 1000

fieldnames = []

for row in DATA:
    for key in row.keys():
        fieldnames.append(key)
    
fieldnames = set(fieldnames)
fieldnames

3.4 µs ± 260 ns per loop (mean ± std. dev. of 100 runs, 1000 loops each)


In [51]:
%%timeit -r 100 -n 1000

fieldnames = []

for row in DATA:
    for key in row.keys():
        if key not in fieldnames:
            fieldnames.append(key)
    
fieldnames

3.02 µs ± 208 ns per loop (mean ± std. dev. of 100 runs, 1000 loops each)


In [52]:
%%timeit -r 100 -n 1000

fieldnames = {key
              for row in DATA
              for key in row.keys()}
    
fieldnames

1.99 µs ± 230 ns per loop (mean ± std. dev. of 100 runs, 1000 loops each)


In [53]:
%%timeit -r 100 -n 1000

fieldnames = set()
fieldnames.update(key
              for row in DATA
              for key in row.keys())
    
fieldnames

3.02 µs ± 254 ns per loop (mean ± std. dev. of 100 runs, 1000 loops each)


In [54]:
%%timeit -r 100 -n 1000

fieldnames = set()

for row in DATA:
    for key in row.keys():
        fieldnames.add(key)
    
fieldnames

2.83 µs ± 191 ns per loop (mean ± std. dev. of 100 runs, 1000 loops each)


In [55]:
%%timeit -r 100 -n 1000

fieldnames = set()

for row in DATA:
    fieldnames.update(row.keys())
    
fieldnames

2.1 µs ± 181 ns per loop (mean ± std. dev. of 100 runs, 1000 loops each)


In [56]:
%%timeit -r 100 -n 1000

fieldnames = set()

for row in DATA:
    fieldnames.update(row)
    
fieldnames

1.21 µs ± 119 ns per loop (mean ± std. dev. of 100 runs, 1000 loops each)
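
All of the timed variants except the list-based one end with a ``set``, and the iteration order of a set of strings is not guaranteed to be the same from one interpreter run to the next. To meet the "same order" requirement the keys still need to be put into a deterministic order. The sketch below shows two options on a shortened sample (``ROWS`` is an illustrative subset): alphabetical order via ``sorted()`` and first-seen order via ``dict.fromkeys()``. It also shows that ``DictWriter`` fills missing keys with ``restval``, which defaults to an empty string.

```python
import io
from csv import DictWriter, QUOTE_ALL

# Shortened sample with a different set of keys in every row
ROWS = [
    {'Sepal length': 5.1, 'Sepal width': 3.5, 'Species': 'setosa'},
    {'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
    {'Sepal length': 6.3, 'Petal width': 1.8, 'Species': 'virginica'},
]

# Option 1: alphabetical order, independent of row order
alphabetical = sorted({key for row in ROWS for key in row})

# Option 2: order of first appearance, relying on dict insertion order
first_seen = list(dict.fromkeys(key for row in ROWS for key in row))

buffer = io.StringIO()
writer = DictWriter(buffer, fieldnames=alphabetical, restval='',
                    quoting=QUOTE_ALL, quotechar='"',
                    delimiter=',', lineterminator='\n')
writer.writeheader()
writer.writerows(ROWS)  # keys missing from a row are written as restval
text = buffer.getvalue()

print(alphabetical)
print(first_seen)
print(text)
```

The alphabetical variant is the one that matches the column order in the "Output" section.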


In [57]:
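FILE = '_temporary.csv'

# Build fieldnames explicitly; sorted() gives the stable column order the task requires
fieldnames = sorted({key for row in DATA for key in row})

with open(FILE, mode='w', encoding='utf-8', newline='') as file:
    writer = DictWriter(file, fieldnames=fieldnames, quoting=QUOTE_ALL, quotechar='"', delimiter=',', lineterminator='\n')
    writer.writeheader()
    writer.writerows(DATA)  # write this exercise's DATA, not `result` from the previous one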

## Serialization CSV Objects

1. Use the data from the "Input" section (see below)
1. Use ``csv.DictWriter()`` to save the data to a CSV file
1. Non-functional requirements:

    * All fields must be enclosed in double quotes ``"``
    * Use ``,`` to separate columns
    * Use ``utf-8`` encoding
    * Use the Unix line ending ``\n``
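
``DictWriter`` needs one mapping per row, so the core of this exercise is turning an object into a ``dict``. As a side note, here is a minimal sketch of the same idea with a dataclass, which is an assumption for illustration (the exercise itself uses a plain class): ``dataclasses.fields()`` supplies the column names and ``dataclasses.asdict()`` the row. ``FLOWERS`` is a shortened sample.

```python
import io
from csv import DictWriter, QUOTE_ALL
from dataclasses import asdict, dataclass, fields


@dataclass
class Iris:
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float
    species: str


# Shortened sample, for illustration only
FLOWERS = [
    Iris(5.1, 3.5, 1.4, 0.2, 'setosa'),
    Iris(5.8, 2.7, 5.1, 1.9, 'virginica'),
]

buffer = io.StringIO()
writer = DictWriter(buffer,
                    fieldnames=[field.name for field in fields(Iris)],
                    quoting=QUOTE_ALL,
                    quotechar='"',
                    delimiter=',',
                    lineterminator='\n')
writer.writeheader()
writer.writerows(asdict(flower) for flower in FLOWERS)
text = buffer.getvalue()

print(text)
```

Taking the column names from the class rather than from the first object also works when the list of objects is empty.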

In [60]:
class Iris:
    def __init__(self, sepal_length, sepal_width,
                 petal_length, petal_width, species):

        self.sepal_length = sepal_length
        self.sepal_width = sepal_width
        self.petal_length = petal_length
        self.petal_width = petal_width
        self.species = species


DATA = [
    Iris(5.1, 3.5, 1.4, 0.2, 'setosa'),
    Iris(5.8, 2.7, 5.1, 1.9, 'virginica'),
    Iris(5.1, 3.5, 1.4, 0.2, 'setosa'),
    Iris(5.7, 2.8, 4.1, 1.3, 'versicolor'),
    Iris(6.3, 2.9, 5.6, 1.8, 'virginica'),
    Iris(6.4, 3.2, 4.5, 1.5, 'versicolor'),
]


with open(FILE, mode='w', encoding='utf-8', newline='') as file:
    writer = DictWriter(file,
                        fieldnames=list(DATA[0].__dict__.keys()),
                        quoting=QUOTE_ALL,
                        quotechar='"',
                        delimiter=',',
                        lineterminator='\n')
    
    writer.writeheader()

    for iris in DATA:
        writer.writerow(iris.__dict__)

In [64]:
!cat $FILE

"sepal_length","sepal_width","petal_length","petal_width","species"
"5.1","3.5","1.4","0.2","setosa"
"5.8","2.7","5.1","1.9","virginica"
"5.1","3.5","1.4","0.2","setosa"
"5.7","2.8","4.1","1.3","versicolor"
"6.3","2.9","5.6","1.8","virginica"
"6.4","3.2","4.5","1.5","versicolor"


## Serialization CSV Relations

1. Use the data from the "Input" section (see below)
1. Use ``csv.DictWriter()`` to save the contacts from the address book to a file
1. How can relational data (a contact has many addresses) be stored in CSV?
1. Recreate the object structure from the data read back from the file
1. Non-functional requirements:

    * All fields must be enclosed in double quotes ``"``
    * Use ``;`` to separate columns
    * Use ``utf-8`` encoding
    * Use the Unix line ending ``\n``
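
One possible answer to the relational question is to denormalize: write one row per (contact, address) pair and group the rows again when reading. The sketch below is only one option and rests on two assumptions that the exercise does not state: a contact is identified by its ``(firstname, lastname)`` pair, and a contact without addresses is stored as a single row with empty address columns. ``dump`` and ``load`` are hypothetical helpers, and ``CONTACTS`` is a shortened sample.

```python
import io
from csv import DictReader, DictWriter, QUOTE_ALL


class Address:
    def __init__(self, location, city):
        self.location = location
        self.city = city


class Contact:
    def __init__(self, firstname, lastname, addresses=()):
        self.firstname = firstname
        self.lastname = lastname
        self.addresses = tuple(addresses)


FIELDNAMES = ['firstname', 'lastname', 'location', 'city']


def dump(contacts):
    """Write one row per (contact, address) pair and return the CSV text."""
    buffer = io.StringIO()
    writer = DictWriter(buffer, fieldnames=FIELDNAMES, quoting=QUOTE_ALL,
                        quotechar='"', delimiter=';', lineterminator='\n')
    writer.writeheader()
    for contact in contacts:
        # Assumption: no addresses -> one row with empty address columns
        addresses = contact.addresses or (Address('', ''),)
        for address in addresses:
            writer.writerow({'firstname': contact.firstname,
                             'lastname': contact.lastname,
                             **vars(address)})
    return buffer.getvalue()


def load(text):
    """Group rows by (firstname, lastname) and rebuild Contact objects."""
    contacts = {}
    for row in DictReader(io.StringIO(text), delimiter=';', quotechar='"'):
        key = (row['firstname'], row['lastname'])
        contact = contacts.setdefault(key, Contact(*key))
        if row['location'] or row['city']:
            contact.addresses += (Address(row['location'], row['city']),)
    return list(contacts.values())


# Shortened sample, for illustration only
CONTACTS = [
    Contact('Jan', 'Twardowski', addresses=(
        Address('Johnson Space Center', 'Houston, TX'),
        Address('Kennedy Space Center', 'Merritt Island, FL'),
    )),
    Contact('Mark', 'Watney'),
]

text = dump(CONTACTS)
restored = load(text)

print(text)
print([(c.firstname, len(c.addresses)) for c in restored])
```

Compared with packing all addresses into one field, this keeps every column atomic at the cost of repeating the contact's name on each row.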

In [81]:
class Contact:
    def __init__(self, firstname, lastname, addresses=()):
        self.firstname = firstname
        self.lastname = lastname
        self.addresses = addresses


class Address:
    def __init__(self, location, city):
        self.location = location
        self.city = city


DATA = [
    Contact(firstname='Jan', lastname='Twardowski', addresses=(
        Address(location='Johnson Space Center', city='Houston, TX'),
        Address(location='Kennedy Space Center', city='Merritt Island, FL'),
        Address(location='Jet Propulsion Laboratory', city='Pasadena, CA'),
    )),
    Contact(firstname='Mark', lastname='Watney'),
    Contact(firstname='Melissa', lastname='Lewis', addresses=()),
]

    
with open(FILE, mode='w', encoding='utf-8', newline='') as file:
    writer = DictWriter(file,
                        fieldnames=list(DATA[0].__dict__.keys()),
                        quoting=QUOTE_ALL,
                        quotechar='"',
                        delimiter=';',
                        lineterminator='\n')

    writer.writeheader()

    for contact in DATA:
        # Flatten the one-to-many relation into one field: ', ' joins an address's
        # attributes, ';' separates addresses; the contact objects stay untouched
        addresses = [', '.join(addr.__dict__.values()) for addr in contact.addresses]
        writer.writerow({**contact.__dict__, 'addresses': ';'.join(addresses)})

In [82]:
!cat $FILE

"firstname";"lastname";"addresses"
"Jan";"Twardowski";"Johnson Space Center, Houston, TX;Kennedy Space Center, Merritt Island, FL;Jet Propulsion Laboratory, Pasadena, CA"
"Mark";"Watney";""
"Melissa";"Lewis";""
