In [None]:
CHAPTER 6
Object Serialization
=================================================================================================================
Python’s built-in file object and its read/write methods are invaluable,
because the ability to store data in a persistent medium is as important as
processing it. However, the file object returned by Python’s built-in open()
function has one important shortcoming, as you must have noted in the
previous chapter.
When a file is opened in ‘w’ mode, the write() method accepts only a string
object. That means data represented in any non-string form, whether an object
of a built-in class (number, dictionary, list, or tuple) or of a user-defined
class, cannot be written to the file directly.

In [None]:
Example 6.1
>>> numbers=[10,20,30,40]
>>> file=open('numbers.txt','w')
>>> file.write(numbers)
Traceback (most recent call last):
  File "<pyshell#10>", line 1, in <module>
    file.write(numbers)
TypeError: write() argument must be str, not list
>>> class person:
...     def __init__(self):
...         self.name='Anil'
...
>>> p1=person()
>>> file=open('persons.txt','w')
>>> file.write(p1)
Traceback (most recent call last):
  File "<pyshell#20>", line 1, in <module>
    file.write(p1)
TypeError: write() argument must be str, not person
>>>

In [None]:
Before writing, you need to convert the object to its string representation.
Example 6.2
>>> numbers=[10,20,30,40]
>>> file=open('numbers.txt','w')
>>> file.write(str(numbers))
>>> file.close()
In case of a user-defined class:
Example 6.3
>>> class person:
...     def __init__(self):
...         self.name='Anil'
...
>>> p1=person()
>>> file=open('persons.txt','w')
>>> file.write(str(p1))
>>> file.close()
A file object opened in ‘wb’ mode requires a bytes-like object as the
argument to its write() method. In the above case, the list of integers can
be converted to bytes by the bytearray() function (which accepts only
integers in the range 0 to 255) and then written to the file as below:
Example 6.4
>>> numbers=[10,20,30,40]
>>> data=bytearray(numbers)
>>> file=open('numbers.txt','wb')
>>> file.write(data)
>>> file.close()
To read the data back from the file in its original data type, the reverse
conversion needs to be done.
Example 6.5
>>> file=open('numbers.txt','rb')
>>> data=file.read()
>>> list(data)
[10, 20, 30, 40]
>>> file.close()
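The reverse conversion for text written with str() can be sketched as follows. This is a minimal illustration rather than part of the original examples: it assumes the file holds the text produced by str(numbers), rebuilds the list with ast.literal_eval() from the standard library, and works in a temporary directory (an assumption of this sketch) so that no file is left behind.

```python
import ast
import os
import tempfile

numbers = [10, 20, 30, 40]

# Work in a temporary directory so no file is left behind (an assumption
# made for this sketch; the chapter's examples use the working directory).
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'numbers.txt')

    # Write the string representation of the list.
    with open(path, 'w') as file:
        file.write(str(numbers))

    # Read the text back; at this point it is just a string.
    with open(path, 'r') as file:
        text = file.read()

# list(text) would split the string into single characters, so the
# literal is parsed instead to rebuild the original list.
restored = ast.literal_eval(text)
print(restored)
```

Note that literal_eval() only handles Python literals (numbers, strings, lists, tuples, dictionaries, and so on); it cannot rebuild objects of user-defined classes.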

In [None]:
In the case of a user-defined class, the attributes of its objects have to be
converted to bytes objects before writing to a disk file:
Example 6.6
>>> file=open('persons.txt','wb')
>>> file.write(p1.name.encode())
>>> file.close()
This kind of manual conversion of objects to string or byte format (and
vice versa) is cumbersome and error-prone. Python has better solutions for
this requirement: several built-in modules store and retrieve a Python object
directly to or from a file or byte string. A Python object is said to be
serialized when it is translated into a format from which it can be
reconstructed later when required. The serialized format can be stored in a
disk file or a byte string, or it can be transmitted via network sockets.
Bringing serialized data back into a form identical to the original is
called de-serialization.

In [None]:
The serialization formats used by some built-in modules are Python-specific,
whereas other modules use standard serialization formats such as JSON and
XML. The Pythonic term for serialization is pickling, while de-serialization
is often referred to as unpickling in the Python documentation.
Python-specific serialization/de-serialization is achieved by the built-in
pickle and shelve modules. Python’s marshal module offers similar
functionality, but it is primarily meant for internal use in reading and
writing pseudo-compiled versions of Python modules with the .pyc extension
and is not recommended as a general persistence tool.
The serialized byte stream can optionally be written to a disk file. This is
called object persistence. The File API discussed in the previous chapter
stores data persistently, but not in a serialized format. The Python
serialization libraries that we are going to explore in this chapter are
useful for storing serialized object data in disk files.

In [None]:
6.1 pickle Module
The serialization format used by the pickle module, which is a part of
Python’s standard library, is Python-specific. This is an advantage in that
the format is not restricted by external standards such as JSON; its major
disadvantage is that non-Python applications may not be able to reconstruct
‘pickled’ objects. Also, the pickle module is not secure when it comes to
unpickling data received from an unauthenticated or untrusted source.
The pickle module defines the module-level dumps() function to obtain a
byte string ‘pickled’ representation of a Python object. Its counterpart
function loads() reconstructs (‘unpickles’) an identical Python object from
the byte string.

In [None]:
The following code snippet demonstrates the use of the dumps() and loads()
functions (the exact bytes shown depend on the pickle protocol in use):
Example 6.7
>>> import pickle
>>> numbers=[10,20,30,40]
>>> pickledata=pickle.dumps(numbers)
>>> pickledata
b'\x80\x03]q\x00(K\nK\x14K\x1eK(e.'
>>> #unpickled data
...
>>> unpickledata=pickle.loads(pickledata)
>>> unpickledata
[10, 20, 30, 40]
>>>
The dump() and load() functions work with a file-like object instead of a
byte string. dump() writes pickled data to a file-like object (such as a disk
file, an in-memory buffer, or a file object obtained from a network socket)
opened for binary writing (‘wb’ mode), and load() reconstructs an identical
object from a file-like object opened for binary reading (‘rb’ mode).

In [None]:
Example 6.8
>>> #pickle to file
...
>>> import pickle
>>> numbers=[10,20,30,40]
>>> file=open('numbers.dat','wb')
>>> pickle.dump(numbers, file)
>>> file.close()
>>> #unpickle from file
...
>>> file=open('numbers.dat','rb')
>>> unpickledata=pickle.load(file)
>>> unpickledata
[10, 20, 30, 40]
>>>
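Because dump() and load() only need a binary file-like object, the same round trip can be tried without touching the disk. The following is a minimal sketch, assuming an io.BytesIO buffer as the stand-in for a file; the variable names are illustrative.

```python
import io
import pickle

numbers = [10, 20, 30, 40]

# An in-memory binary buffer stands in for a disk file.
buffer = io.BytesIO()
pickle.dump(numbers, buffer)

# Rewind to the start before reading the pickled bytes back.
buffer.seek(0)
restored = pickle.load(buffer)

print(restored)
# The restored list is equal to the original but is a separate object.
print(restored == numbers, restored is numbers)
```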
Almost any type of Python object can be pickled. This includes built-in
types, built-in and user-defined functions (which are pickled by reference
to their name), and objects of user-defined classes.
The pickle module also provides an object-oriented API as a substitute for
the module-level dump()/load() functions. The module has a Pickler class
whose object invokes its dump() method to ‘pickle’ an object to a file.
Conversely, the Unpickler class defines a load() method.

In [None]:
The following script has a User class whose objects are pickled to a file
using the Pickler class. The original objects are obtained by the load()
method of the Unpickler class.
Example 6.9
from pickle import Pickler, Unpickler
class User:
    def __init__(self, name, email, pw):
        self.name=name
        self.email=email
        self.pw=pw
    def __str__(self):
        return ('Name: {} email: {} password: {}'. \
                format(self.name, self.email, self.pw))
user1=User('Rajan', 'r123@gmail.com', 'rajan123')
user2=User('Sudheer', 's.11@gmail.com', 's_11')
print ('before pickling..')
print (user1)
print (user2)
# write both objects, one after the other, to the same file
file=open('users.dat','wb')
Pickler(file).dump(user1)
Pickler(file).dump(user2)
file.close()
# each load() call returns the next pickled object
file=open('users.dat','rb')
obj1=Unpickler(file).load()
print ('unpickled objects')
print (obj1)
obj2=Unpickler(file).load()
print (obj2)
file.close()
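When the number of pickled objects in a stream is not known in advance, load() can be called repeatedly until it raises EOFError. The sketch below illustrates this under stated assumptions: it uses an in-memory io.BytesIO buffer instead of ‘users.dat’ and plain strings instead of User objects, purely to stay self-contained.

```python
import io
import pickle

records = ['Rajan', 'Sudheer', 'Anil']

# Pickle the objects one after another into the same binary stream
# (an in-memory buffer here; a file opened in 'wb' mode behaves alike).
stream = io.BytesIO()
pickler = pickle.Pickler(stream)
for record in records:
    pickler.dump(record)

# Read objects back until the stream is exhausted.
stream.seek(0)
unpickler = pickle.Unpickler(stream)
restored = []
while True:
    try:
        restored.append(unpickler.load())
    except EOFError:
        # load() raises EOFError when no pickled data is left.
        break

print(restored)
```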

In [None]:
6.2 shelve Module
Serialization and persistence in this module rely on the pickle storage
format, although the module is meant to deal with a dictionary-like object
only and not with other Python objects directly. The shelve module defines
the all-important open() function, which returns a ‘shelf’ object
representing the underlying disk file in which the ‘pickled’ values of the
dictionary are persistently stored.
Example 6.10
>>> import shelve
>>> obj=shelve.open('shelvetest')
In addition to the filename, the open() function has optional parameters,
two of which are discussed here. One is ‘flag’, which is set to ‘c’ by
default, meaning that the file is opened for reading and writing and is
created if it does not exist. Other accepted values for the flag parameter
are ‘w’ (open an existing file for reading and writing), ‘r’ (read only),
and ‘n’ (always create a new, empty file with read/write access). The second
optional parameter is ‘writeback’, whose default value is False. If this
parameter is set to True, all entries accessed are cached in memory and
written back to the file only on calling the sync() or close() methods. This
makes it easier to mutate stored values in place, but it can consume more
memory and make closing the shelf slow.

In [None]:
Once a shelf object is created, you can store key-value pairs in it.
However, the shelf object accepts only a string as the key. The value can be
any picklable Python object.
Example 6.11
>>> obj['name']='Virat Kohli'
>>> obj['age']=29
>>> obj['teams']=['India', 'IndiaU19', 'RCB', 'Delhi']
>>> obj.close()

In [None]:
The above data is stored in the current working directory in one or more
files whose names begin with ‘shelvetest’; the exact file names and
extensions (for example, ‘shelvetest.dir’) depend on the dbm implementation
available. Since the shelf is a dictionary-like object, it can invoke the
familiar methods of the built-in dict class once it has been reopened. Using
the get() method, one can fetch the value associated with a certain key.
Similarly, the update() method can be used to add or modify key-value pairs
in the shelf object.
Example 6.12
>>> obj=shelve.open('shelvetest')
>>> obj.get('name')
'Virat Kohli'
>>> dct={'100s':64, '50s':69}
>>> obj.update(dct)
>>> dict(obj)
{'name': 'Virat Kohli', 'age': 29, 'teams': ['India', 'IndiaU19', 'RCB',
'Delhi'], '100s': 64, '50s': 69}
The shelf object also returns views of keys, values, and items, same as the
built-in dictionary object.

In [None]:
Example 6.13
>>> keys=list(obj.keys())
>>> keys
['name', 'age', 'teams', '100s', '50s']
>>> values=list(obj.values())
>>> values
['Virat Kohli', 29, ['India', 'IndiaU19', 'RCB', 'Delhi'], 64, 69]
>>> items=list(obj.items())
>>> items
[('name', 'Virat Kohli'), ('age', 29), ('teams', ['India',
'IndiaU19', 'RCB', 'Delhi']), ('100s', 64), ('50s', 69)]
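The effect of the writeback parameter described earlier in this section is easiest to see with a mutable value. The following is a minimal sketch, assuming a temporary directory and a shortened ‘teams’ list chosen only for illustration.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'shelvetest')

    # Default writeback=False: mutating a fetched value changes only a
    # temporary copy, so the stored list stays as it was.
    with shelve.open(path) as shelf:
        shelf['teams'] = ['India', 'RCB']
        shelf['teams'].append('Delhi')
    with shelve.open(path) as shelf:
        without_writeback = shelf['teams']

    # writeback=True: accessed entries are cached and written on close().
    with shelve.open(path, writeback=True) as shelf:
        shelf['teams'].append('Delhi')
    with shelve.open(path) as shelf:
        with_writeback = shelf['teams']

print(without_writeback)
print(with_writeback)
```

Without writeback, the usual alternative is to fetch the value, modify it, and assign it back to the same key.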

In [None]:
6.3 dbm Modules
These modules in Python’s standard library provide a generic dictionary-like
interface to different variants of DBM-style databases. These databases
store both keys and values as bytes; string keys and values are encoded to
bytes when stored. The dbm.gnu module is an interface to the DBM library
version implemented by the GNU project. The dbm.ndbm module, on the other
hand, provides an interface to the UNIX ndbm implementation. Another module,
dbm.dumb, is used as a fallback option in the event that other dbm
implementations are not found. It requires no external dependencies but is
slower than the others.
Example 6.14
>>> import dbm
>>> db=dbm.open('mydbm.db','n')
>>> db['title']='Introduction to Python'
>>> db['publisher']='BPB'
>>> db['year']='2019'
>>> db.close()

In [None]:
As in the case of a shelve database, the user-specified database name may
carry a suffix such as ‘.dir’, depending on the dbm implementation in use.
The dbm module’s whichdb() function reports which dbm implementation should
be used to open a given database file.
Example 6.15
>>> dbm.whichdb('mydbm.db')
'dbm.dumb'


In [None]:
The open() function accepts these flags: ‘r’ opens an existing database in
read-only mode (the default), ‘w’ opens an existing database for reading and
writing, ‘c’ opens the database for reading and writing, creating it if it
does not exist, and ‘n’ always creates a new, empty database with read/write
permissions.
The dbm object is a dictionary-like object, just as a shelf object is.
Hence, most dictionary operations can be performed on it. The following code
opens ‘mydbm.db’ with the ‘r’ flag and iterates over its keys, printing each
key-value pair.
Example 6.16
>>> db=dbm.open('mydbm.db','r')
>>> for k in db.keys():
...     print (k, ':', db[k])
...
b'title' : b'Introduction to Python'
b'publisher' : b'BPB'
b'year' : b'2019'
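Since keys and values come back as bytes, non-string data has to be converted on the way in and on the way out. The sketch below shows one way to do this; it assumes dbm.dumb is used explicitly (so that it behaves the same on any installation) and a temporary directory for the database files.

```python
import dbm.dumb
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'mydbm')

    # dbm.dumb is used explicitly so the sketch behaves the same everywhere.
    db = dbm.dumb.open(path, 'n')
    db['title'] = 'Introduction to Python'
    # Non-string values such as integers must be converted first.
    db['year'] = str(2019)
    db.close()

    db = dbm.dumb.open(path, 'r')
    raw_title = db['title']              # bytes, not str
    title = raw_title.decode('utf-8')    # back to str
    year = int(db['year'])               # int() accepts ASCII digits in bytes
    db.close()

print(raw_title, title, year)
```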

In [None]:
6.4 csv module
The Comma Separated Values (CSV) format is very widely used to import
and export data in spreadsheets and RDBMS tables. The csv module, another
module in Python’s standard library, provides the functionality to convert
Python sequence objects to CSV format and write them to a disk file.
Conversely, data from CSV files can be brought into Python objects. The
reader and writer objects defined in this module perform read/write
operations on CSV files. In addition, the module has DictReader and
DictWriter classes to work with Python’s dictionary objects.
A writer object is obtained by calling the writer() function, which needs
a file object opened in ‘w’ mode. An optional ‘dialect’ parameter specifies
the variant of the CSV format to use; the default is ‘excel’, the format
preferred by the MS Excel spreadsheet software. The writer object can then
write one or more rows to the file it represents.
Example 6.17
>>> import csv
>>> data=[('TV','Samsung',25000),('Computer','Dell',40000),
... ('Mobile','Redmi',15000)]
>>> file=open('pricelist.csv','w', newline='')
>>> obj=csv.writer(file)
>>> #write single row
...
>>> obj.writerow(data[0])
>>> #write multiple rows
...
>>> obj.writerows(data[1:])
>>> file.close()

In [None]:
Note that the open() function is given the newline='' parameter so that the
csv module can handle line endings itself, including newlines embedded inside
quoted fields. The file ‘pricelist.csv’ should be created in the current
working directory. Its contents can be verified by opening it in any text
editor of your choice.

In [None]:
The reader object, on the other hand, is an iterator over the rows in the
CSV file. A simple for loop or the built-in next() function can be used to
traverse all rows. Note that every field is read back as a string.
Example 6.18
>>> file=open('pricelist.csv','r', newline='')
>>> obj=csv.reader(file)
>>> for row in obj:
...     print (row)
...
['TV', 'Samsung', '25000']
['Computer', 'Dell', '40000']
['Mobile', 'Redmi', '15000']
>>>
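Two details of the write/read cycle are worth trying out: a field that contains a comma is quoted automatically by the writer, and numeric fields have to be converted back by hand after reading. The following is a minimal sketch; the io.StringIO buffer and the ‘TV, 42 inch’ value are assumptions made for this illustration.

```python
import csv
import io

data = [('TV, 42 inch', 'Samsung', 25000), ('Computer', 'Dell', 40000)]

# io.StringIO stands in for a file opened with newline=''.
buffer = io.StringIO(newline='')
csv.writer(buffer).writerows(data)
text = buffer.getvalue()
print(text)

# Read the rows back and restore the price column to int by hand.
buffer.seek(0)
rows = [(name, brand, int(price)) for name, brand, price in csv.reader(buffer)]
print(rows)
```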


In [None]:
The csv module offers powerful DictWriter and DictReader classes that
deal with dictionary objects. DictWriter maps a sequence of dictionary
objects to rows in the CSV file. As always, the DictWriter constructor needs
a writable file object. It also needs a fieldnames parameter whose value is
a list of field names; these determine the order of the columns and can be
written as the first row of the resulting CSV file. Let us convert the list
of tuples in the above example to a list of dict objects and send it to CSV
format using a DictWriter object.

In [None]:
Example 6.19
>>> data=[{'product':'TV','brand':'Samsung','price':25000},
... {'product':'Computer','brand':'Dell','price':40000},
... {'product':'Mobile','brand':'Redmi','price':15000}]
>>> file=open('pricelist.csv','w',newline='')
>>> fields=list(data[0].keys())
>>> obj=csv.DictWriter(file,fields)

In [None]:
The DictWriter’s writeheader() method uses the fieldnames parameter to write
the header row in the CSV file. Each row following the header contains the
values of one dictionary item.
Example 6.20
>>> obj.writeheader()
>>> obj.writerows(data)
>>> file.close()


In [None]:
The resulting ‘pricelist.csv’ will show data, as follows:
Example 6.21
product,brand,price
TV,Samsung,25000
Computer,Dell,40000
Mobile,Redmi,15000
Reading rows as dictionaries is easy as well. The DictReader object is
obtained from the source CSV file. The object stores the strings in the first
row in its fieldnames attribute. A simple for loop can fetch the subsequent
rows. In Python 3.6 and 3.7, each row is returned as an OrderedDict object
(from Python 3.8 onwards, it is a regular dict). Use the dict() function to
obtain a normal dictionary object out of each row.

In [None]:
Example 6.22
>>> file=open('pricelist.csv','r',newline='')
>>> obj=csv.DictReader(file)
>>> obj.fieldnames
['product', 'brand', 'price']
>>> for row in obj:
...     print (dict(row))
...
{'product': 'TV', 'brand': 'Samsung', 'price': '25000'}
{'product': 'Computer', 'brand': 'Dell', 'price': '40000'}
{'product': 'Mobile', 'brand': 'Redmi', 'price': '15000'}

In [None]:
6.5 json Module
JavaScript Object Notation (JSON) is an easy-to-use, lightweight
data-interchange format. It is a language-independent text format, supported
by many programming languages. This format is commonly used for data
exchange between web servers and clients. Python’s json module, a part of
Python’s standard library, provides serialization functionality with an API
similar to that of the pickle module.
The json module offers functions with the same names as those in pickle for
serialization and de-serialization of Python objects. The module-level
functions dumps() and loads() convert Python data to its serialized string
representation and vice versa. The dump() function uses a file object to
persistently store serialized objects, whereas the load() function
reconstructs the original data from the file.
Notably, the dumps() function accepts an additional argument, sort_keys. Its
default value is False, but if it is set to True, the JSON representation of
a Python dictionary object holds its keys in sorted order.
Example 6.23
>>> import json
>>> data=[{'product':'TV','brand':'Samsung','price':25000},
... {'product':'Computer','brand':'Dell','price':40000},
... {'product':'Mobile','brand':'Redmi','price':15000}]
>>> JString=json.dumps(data, sort_keys=True)
>>> JString
'[{"brand": "Samsung", "price": 25000, "product": "TV"}, {"brand": "Dell",
"price": 40000, "product": "Computer"}, {"brand": "Redmi", "price": 15000,
"product": "Mobile"}]'
The loads() function retrieves data in the original format.

In [None]:
Example 6.24
>>> string=json.loads(JString)
>>> string
[{'brand': 'Samsung', 'price': 25000, 'product': 'TV'},
{'brand': 'Dell', 'price': 40000, 'product': 'Computer'},
{'brand': 'Redmi', 'price': 15000, 'product': 'Mobile'}]
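A JSON round trip does not always give back exactly the same Python types, because JSON has fewer types than Python. The following sketch, using a small dictionary invented for illustration, shows a tuple coming back as a list and an integer key coming back as a string.

```python
import json

original = {'sizes': (32, 42), 1: 'one', 'stock': None}

text = json.dumps(original)
restored = json.loads(text)

print(text)
print(restored)
# The tuple became a list and the int key became a str,
# so the restored dictionary is not equal to the original.
print(restored == original)
```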

In [None]:
To store JSON data in a disk file and to retrieve it, use the dump() and
load() functions respectively.
Example 6.25
>>> file=open('json.txt','w')
>>> json.dump(data,file)
>>> file.close()
>>> file=open('json.txt','r')
>>> data=json.load(file)
>>> file.close()
Python’s built-in types are easily serialized in JSON format. Conversion is
done as per Table 6.1 below:

In [None]:
Table 6.1 Python to JSON
Python                                    JSON
dict                                      object
list, tuple                               array
str                                       string
int, float, int- & float-derived Enums    number
True                                      true
False                                     false
None                                      null
However, converting an object of a custom class is a little tricky. The json
module defines the JSONEncoder and JSONDecoder classes. We need to subclass
them to perform encoding and decoding of objects of user-defined classes.

In [None]:
The following script defines a User class and an encoder class inherited
from the JSONEncoder class. This subclass overrides the default() method
to return a serializable version of a User object, which can then be
encoded.
Example 6.26
import json
class User:
    def __init__(self, name, email, pw):
        self.name=name
        self.email=email
        self.pw=pw
    def __str__(self):
        return ('Name: {} email: {} password: {}'. \
                format(self.name, self.email, self.pw))
class UserEncoder(json.JSONEncoder):
    def default(self, z):
        if isinstance(z, User):
            # a tuple is serialized as a JSON array
            return (z.name, z.email, z.pw)
        else:
            # let the base class raise TypeError for unsupported types
            return super().default(z)
user1=User('Rajan','R@a.com','**')
encoder=UserEncoder()
# obj is a JSON-formatted string describing user1
obj=encoder.encode(user1)
file=open('jsonOO.txt','w')
json.dump(obj, file)
file.close()

In [None]:
To obtain the original data, we use a subclass of JSONDecoder. The subclass
has a method that is passed as the value of the object_hook parameter. This
method is called internally for every JSON object (decoded as a dict) that
the decoder encounters.
Example 6.27
import json
class UserDecoder(json.JSONDecoder):
    def __init__(self):
        json.JSONDecoder.__init__(self, object_hook=self.hook)
    def hook(self, obj):
        return dict(obj)
decoder=UserDecoder()
# read the file written in the previous example
file=open('jsonOO.txt','r')
retobj=json.load(file)
file.close()
print (decoder.decode(retobj))
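A complete round trip that returns a User object, rather than its attributes, can be sketched as below. This is one possible approach and not the only one: the encoder emits a dictionary, the ‘__user__’ marker key and the user_hook() function are names invented for this sketch, and the encoder and hook are passed through the cls and object_hook parameters of dumps() and loads().

```python
import json

class User:
    def __init__(self, name, email, pw):
        self.name = name
        self.email = email
        self.pw = pw

class UserEncoder(json.JSONEncoder):
    def default(self, z):
        if isinstance(z, User):
            # The '__user__' marker key is a convention invented for this sketch.
            return {'__user__': True, 'name': z.name,
                    'email': z.email, 'pw': z.pw}
        return super().default(z)

def user_hook(dct):
    # Called for every decoded JSON object; rebuild a User where marked.
    if dct.get('__user__'):
        return User(dct['name'], dct['email'], dct['pw'])
    return dct

text = json.dumps(User('Rajan', 'R@a.com', '**'), cls=UserEncoder)
user = json.loads(text, object_hook=user_hook)

print(text)
print(type(user).__name__, user.name, user.email)
```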
The json.tool module also provides a command-line interface that validates
JSON data in a string or file and pretty-prints it.

In [None]:
Assume that the current working directory has ‘file.txt’ that contains text
in JSON format, as below:
Example 6.28
{"name": "Rajan", "email": "r@a.com", "pw": "**"}
The following command produces pretty-printed output of the above string:
python -m json.tool file.txt
It also accepts a --sort-keys command line option to display keys in
ascending order.
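A comparable result can be obtained from within a script. The sketch below is an illustration under the assumption that the JSON text is already available as a string; it uses the indent and sort_keys parameters of dumps() to format the output.

```python
import json

text = '{"name": "Rajan", "email": "r@a.com", "pw": "**"}'

# Parsing validates the text; json.JSONDecodeError is raised if it is invalid.
data = json.loads(text)

# indent=4 and sort_keys=True format the output with sorted keys.
pretty = json.dumps(data, indent=4, sort_keys=True)
print(pretty)
```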

In [None]:
6.6 xml Package
XML is another well-known data interchange format, used by a large
number of applications. One of the main features of eXtensible Markup
Language (XML) is that its format is both human readable and machine
readable. XML is widely used by web services, office tools, and Service
Oriented Architectures (SOA).
The xml package in Python’s standard library consists of modules for XML
processing as per different models. In this section, we discuss the
ElementTree module, which provides a simple and lightweight API for XML
processing.

In [None]:
An XML document is arranged in a tree-like hierarchical format. The
document tree comprises elements. Each element is a single node in the tree;
it has a tag name and content enclosed between an opening <tag> and a closing
</tag>, and may carry attributes. Each element may have one or more
sub-elements following the same structure.
A typical XML document appears as follows:
Example 6.29
<?xml version="1.0" encoding="iso-8859-1"?>
<pricelist>
  <product>
    <name>TV</name>
    <brand>Samsung</brand>
    <price>25000</price>
  </product>
  <product>
    <name>Computer</name>
    <brand>Dell</brand>
    <price>40000</price>
  </product>
  <product>
    <name>Mobile</name>
    <brand>Redmi</brand>
    <price>15000</price>
  </product>
</pricelist>
The ElementTree module provides the Element class and the SubElement()
function for building such a tree. Each Element has a tag and an attrib
property, which is a dict object. For an element created without attributes,
such as the root element below, attrib is an empty dictionary.

In [None]:
Example 6.30
>>> import xml.etree.ElementTree as xmlobj
>>> root=xmlobj.Element('PriceList')
>>> root.tag
'PriceList'
>>> root.attrib
{}
Now, we can add one or more nodes, i.e., elements, under the root. Each
Element object may have sub-elements, each having its own attrib and text
properties. Let us set up a ‘Product’ element with ‘name’, ‘brand’, and
‘price’ as its sub-elements.
Example 6.31
>>> product=xmlobj.Element('Product')
>>> nm=xmlobj.SubElement(product, 'name')
>>> nm.text='TV'
>>> brand=xmlobj.SubElement(product, 'brand')
>>> brand.text='Samsung'
>>> price=xmlobj.SubElement(product, 'price')
>>> price.text='25000'
The root node has an append() method to add this node to it.
Example 6.32
>>> root.append(product)
Construct a tree from this root object and write its contents to the XML file.

In [None]:
Example 6.33
>>> tree=xmlobj.ElementTree(root)
>>> file=open('pricelist.xml','wb')
>>> tree.write(file)
>>> file.close()
The file ‘pricelist.xml’ should now be visible in the current working
directory. The following script writes a list of dictionary objects to the
XML file:
Example 6.34
import xml.etree.ElementTree as xmlobj
root=xmlobj.Element('PriceList')
pricelist=[{'name':'TV','brand':'Samsung','price':'25000'},
           {'name':'Computer','brand':'Dell','price':'40000'},
           {'name':'Mobile','brand':'Redmi','price':'15000'}]
i=0
for row in pricelist:
    i=i+1
    print (i)
    element=xmlobj.Element('Product'+str(i))
    for k,v in row.items():
        sub=xmlobj.SubElement(element, k)
        sub.text=v
    root.append(element)
tree=xmlobj.ElementTree(root)
file=open('pricelist.xml','wb')
tree.write(file)
file.close()
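To inspect a tree without writing it to a file, the module-level tostring() function can be used. The following is a minimal sketch that builds a one-product tree along the lines of the examples above; passing encoding='unicode' is a choice made here so that a str rather than bytes is returned.

```python
import xml.etree.ElementTree as xmlobj

root = xmlobj.Element('PriceList')
product = xmlobj.SubElement(root, 'Product')
for tag, text in (('name', 'TV'), ('brand', 'Samsung'), ('price', '25000')):
    sub = xmlobj.SubElement(product, tag)
    sub.text = text

# encoding='unicode' makes tostring() return str instead of bytes.
xml_text = xmlobj.tostring(root, encoding='unicode')
print(xml_text)
```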

In [None]:
To parse the XML file, construct a document tree by giving the file name as
the file parameter of the ElementTree constructor.
Example 6.35
import xml.etree.ElementTree as xmlobj
tree = xmlobj.ElementTree(file='pricelist.xml')
The getroot() method of the tree object fetches the root element, and
list(root) returns a list of the elements directly below it. (Older code
used the getchildren() method for this purpose; it was deprecated and then
removed in Python 3.9.)
Example 6.36
root = tree.getroot()
children = list(root)


In [None]:
We can now construct a dictionary object corresponding to each child node
by iterating over its collection of sub-elements.
Example 6.37
for child in children:
    product={}
    pairs = list(child)
    for pair in pairs:
        product[pair.tag]=pair.text
Each dictionary is then appended to a list, giving back the original list of
dictionary objects. The complete code for parsing the XML file into a list
of dictionaries is as follows:
Example 6.38
import xml.etree.ElementTree as xmlobj
tree = xmlobj.ElementTree(file='pricelist.xml')
root = tree.getroot()
products=[]
children = list(root)
for child in children:
    product={}
    pairs = list(child)
    for pair in pairs:
        product[pair.tag]=pair.text
    products.append(product)
print (products)
Save the above script as ‘xmlreader.py’ and run it from the command line.
Of the other modules in the xml package, xml.dom implements the Document
Object Model (DOM) of the XML format and xml.sax defines functionality to
implement the SAX model.
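When the tag names are known in advance, ElementTree’s findall() and findtext() methods offer a more direct way to pick out values than looping over every sub-element. The sketch below assumes a document shaped like the one in Example 6.29, parsed from a string with fromstring() to keep it self-contained, and converts the price to int as an illustration.

```python
import xml.etree.ElementTree as xmlobj

xml_text = """<pricelist>
  <product><name>TV</name><brand>Samsung</brand><price>25000</price></product>
  <product><name>Computer</name><brand>Dell</brand><price>40000</price></product>
</pricelist>"""

# fromstring() parses XML text and returns the root element.
root = xmlobj.fromstring(xml_text)

products = []
# findall() returns the direct children with the given tag.
for child in root.findall('product'):
    products.append({
        'name': child.findtext('name'),
        'brand': child.findtext('brand'),
        'price': int(child.findtext('price')),
    })
print(products)
```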

In [None]:
6.7 plistlib Module
Lastly, we have a look at the plistlib module, which is used to read and
write ‘property list’ files (they usually have a ‘.plist’ extension). This
type of file is mainly used by macOS (formerly Mac OS X). These files are
essentially XML documents, typically used to store and retrieve properties of
an object.
The functionality of the plistlib module is more or less similar to that of
other serialization libraries. It defines dumps() and loads() functions for
converting Python objects to and from a bytes representation. The load() and
dump() functions read and write plist disk files.

In [None]:
The following script stores a dict object in a plist file.
Example 6.39
import plistlib
proplist = {
    "name" : "Ramesh",
    "class":"XII",
    "div":"B",
    "marks" : {"phy":50, "che":60, "maths":80}
}
fileName=open('marks.plist','wb')
plistlib.dump(proplist, fileName)
fileName.close()
The load() function retrieves an identical dictionary object from the file.
Example 6.40
with open('marks.plist', 'rb') as fp:
    pl = plistlib.load(fp)
print(pl)
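The in-memory counterparts dumps() and loads() can be tried in the same way. The following is a minimal sketch using a shortened version of the dictionary above; it relies on the default XML format, so the resulting bytes can be decoded and printed as text.

```python
import plistlib

proplist = {
    "name": "Ramesh",
    "marks": {"phy": 50, "che": 60, "maths": 80},
}

# dumps() returns bytes holding the plist document (XML by default).
data = plistlib.dumps(proplist)
# loads() rebuilds the dictionary from those bytes.
restored = plistlib.loads(data)

print(type(data).__name__)
print(data.decode('utf-8'))
print(restored == proplist)
```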
Another important data persistence library in Python is the sqlite3 module.
It deals with read/write operations on the SQLite relational database. Before
we explore its functionality, let us get acquainted with RDBMS concepts and
the basics of SQL, which are the subject of the next chapter.