In [None]:
CHAPTER 7 RDBMS Concepts
----------------------------------------------------------------------------------------------------------------
The previous chapter discussed various tools offered by Python for data
persistence. While the built-in file object can perform basic read/write
operations on a disk file, other built-in modules such as pickle and shelve
enable storage and retrieval of serialized data to and from disk files. We also
explored Python libraries that handle well-known data storage formats like
CSV, JSON, and XML.

In [None]:
7.1 Drawbacks of Flat Files
However, files created using the above libraries are flat. They are of little
use when real-time, random access or in-place updates are required. Also,
files are largely unstructured. Although CSV files do have a field header,
the comma-delimited nature of the data makes it very difficult to modify the
contents of a certain field in a particular row. The only alternative is to
read the file into a Python object such as a dictionary, manipulate its
contents, and rewrite it after truncating the file. This approach is not
feasible, especially for large files, as it can become time-consuming and
cumbersome.
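The read-modify-rewrite cycle described above can be sketched with Python's csv module. This is a minimal illustration under two assumptions: an io.StringIO buffer stands in for the disk file 'pricelist.csv', and the rows are the first two records of the pricelist in Table 7.1. Even though only two fields change, every row has to be written out again.

```python
import csv
import io

# An in-memory buffer stands in for 'pricelist.csv' (assumption for the demo).
source = io.StringIO(
    "InvNo,CustomerName,Product,Price,Quantity,Total\n"
    "1,Ravikumar,Laptop,25000,2,50000\n"
    "2,John,TV,40000,1,40000\n"
)

# Step 1: read the entire file into memory as a list of dictionaries.
rows = list(csv.DictReader(source))

# Step 2: change one record (invoice 2 now has quantity 2) in memory.
for row in rows:
    if row["InvNo"] == "2":
        row["Quantity"] = "2"
        row["Total"] = str(int(row["Price"]) * int(row["Quantity"]))

# Step 3: rewrite every row. With a real file this would be
# open("pricelist.csv", "w", newline=""), which truncates it first.
rewritten = io.StringIO()
writer = csv.DictWriter(rewritten, fieldnames=list(rows[0]), lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
print(rewritten.getvalue())
```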


In [None]:
Even if we set aside the issue of in-place modification for a while, there
is another problem: providing concurrent read/write access to multiple
applications, as may be required in a client-server environment. None of
Python's persistence libraries has built-in support for concurrent access to
files. If required, we have to rely upon the locking features of the
operating system itself.

In [None]:
Another problem that may arise is that of data redundancy and inconsistency.
This arises primarily from the unstructured nature of data files. The term
‘redundancy’ refers to the repetition of the same data more than once
while describing the collection of records in a file. The first row of a typical
CSV file defines the column headings, often called fields, and subsequent
rows are records.

In [None]:
Table 7.1 shows the contents of ‘pricelist.csv’ represented in the form of a table.
Popular word processors (MS Word, OpenOffice Writer) and spreadsheet
programs (MS Excel, OpenOffice Calc) can convert text delimited by a comma
or any other character into a table.
Table 7.1 Pricelist.csv
InvNo CustomerName Product Price Quantity Total
1 Ravikumar Laptop 25000 2 50000
2 John TV 40000 1 40000
3 Divya Laptop 25000 1 25000
4 Divya Mobile 15000 3 45000
5 John Mobile 15000 2 30000
6 Ravikumar TV 40000 1 40000

In [None]:
As we can see, data items such as the customer’s name, the product’s name, and
the price appear repeatedly in the rows. This can lead to two issues. First,
manual errors, such as misspellings or inconsistent upper/lower case, can
creep in. Second, a change in the value of a certain data item must be
reflected in all its occurrences. For example, if the price of a TV goes up to
45000, the Price and Total columns of invoice numbers 2 and 6 must both be
updated; missing any one of them leads to inconsistency in further processing
of the data. These problems can be overcome by using a relational database.

In [None]:
7.2 Relational Database
The term ‘database’ refers to a collection of data organized so as to
remove redundancy and inconsistency, and to ensure data integrity. Over the
years, different database models have been in use. The early days of
computing saw the use of hierarchical and network database models. Soon,
they were replaced by the relational database model, which is still the
predominant one. The last 10-15 years have seen the emergence of NoSQL
databases like MongoDB and Cassandra.

In [None]:
The relational database model, proposed by Edgar Codd in 1970, aims to
arrange data according to entities. Each entity is represented by a table
(called a relation). You can think of an entity as a class. Just like a class, an
entity is characterized by attributes (also called fields, in database
terminology) that form the columns of the table. Each instance of the entity is
described in a row below the heading row. The table structure designates one
attribute whose value is unique for each row. Such an attribute is called the
‘primary key’.
If we analyze the pricelist example above, it involves three entities:
Customers, Products, and Invoices. We have prepared three tables
representing them, as follows: (Figure 7.1)

In [None]:
The important aspect of relational database design is to establish
relationships between tables. In the three tables above, the attributes
‘ProductID’, ‘CustID’, and ‘InvID’ are the primary keys of the Products,
Customers, and Invoices tables respectively.
Further, the structure of the ‘Invoices’ table uses the ‘CustID’ and ‘ProductID’
attributes, which are the primary keys of the other two tables. When the primary
key of one table appears in the structure of another table, it is called a
‘foreign key’, and this forms the basis of the relationship between the two.


In [None]:
This approach to database design has two distinct advantages. First, using
the relationship between the primary and foreign keys, details of the
corresponding row can be fetched without repetition. For example, the
‘Invoices’ table has the ‘ProductID’ foreign key, which is the primary key of the
‘Products’ table; hence the ‘Name’ and ‘Price’ attributes can be fetched using
this relationship. The same is true of ‘CustID’, which appears as a foreign key
in ‘Invoices’ and is the primary key of the ‘Customers’ table. We can thus
reconstruct the original pricelist table by using these relationships.
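To make the reconstruction concrete, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table and column names (Customers, Products, Invoices, CustID, ProductID, InvNo) and their types are assumptions standing in for Figure 7.1; the rows are the data of Table 7.1, split across the three tables. Joining on the primary/foreign keys brings the original pricelist back, with Total computed rather than stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price INTEGER);
    CREATE TABLE Invoices (InvNo INTEGER PRIMARY KEY, CustID INTEGER,
                           ProductID INTEGER, Quantity INTEGER);
    INSERT INTO Customers VALUES (1, 'Ravikumar'), (2, 'John'), (3, 'Divya');
    INSERT INTO Products VALUES (1, 'Laptop', 25000), (2, 'TV', 40000),
                                (3, 'Mobile', 15000);
    INSERT INTO Invoices VALUES (1, 1, 1, 2), (2, 2, 2, 1), (3, 3, 1, 1),
                                (4, 3, 3, 3), (5, 2, 3, 2), (6, 1, 2, 1);
""")

# Join the three tables on their primary/foreign keys to rebuild Table 7.1.
rows = conn.execute("""
    SELECT i.InvNo, c.Name, p.Name, p.Price, i.Quantity,
           p.Price * i.Quantity AS Total
    FROM Invoices i
    JOIN Customers c ON c.CustID = i.CustID
    JOIN Products p ON p.ProductID = i.ProductID
    ORDER BY i.InvNo
""").fetchall()
for row in rows:
    print(row)
```

Each customer name and each price is now stored exactly once, which is what removes the redundancy discussed in Section 7.1.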

In [None]:
Second, if either the name or the price of a product changes, you need not
make any changes in the ‘Invoices’ table. The change made in the ‘Products’
table is automatically reflected in all rows of the Invoices table because of the
primary-foreign key relationship. Also, the database engine won’t allow deleting
a certain row in the Customers or Products table if its primary key is being used
as a foreign key in the Invoices table. This ensures data integrity.
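Both advantages can be observed in a small sketch with Python's sqlite3 module. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection; without it, the DELETE below would succeed. The reduced two-table schema and the invoice rows (invoices 2 and 6, each for one TV) are assumptions taken loosely from Table 7.1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only after this pragma is set on the connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price INTEGER);
    CREATE TABLE Invoices (InvID INTEGER PRIMARY KEY, Quantity INTEGER,
                           ProductID INTEGER REFERENCES Products (ProductID));
    INSERT INTO Products VALUES (1, 'Laptop', 25000), (2, 'TV', 40000);
    INSERT INTO Invoices VALUES (2, 1, 2), (6, 1, 2);
""")

# Change the TV price once, in the Products table only ...
conn.execute("UPDATE Products SET Price = 45000 WHERE Name = 'TV'")
# ... and every invoice that refers to the TV sees the new price.
totals = conn.execute("""
    SELECT i.InvID, p.Price * i.Quantity
    FROM Invoices i JOIN Products p ON p.ProductID = i.ProductID
    ORDER BY i.InvID
""").fetchall()
print(totals)  # [(2, 45000), (6, 45000)]

# Deleting a product that invoices still refer to is refused.
blocked = False
try:
    conn.execute("DELETE FROM Products WHERE ProductID = 2")
except sqlite3.IntegrityError:
    blocked = True
print(blocked)  # True
```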
Software products based on this relational model are popularly called
Relational DataBase Management Systems (RDBMS). Some of the renowned
RDBMS brands are Oracle, MySQL, MS SQL Server, PostgreSQL, DB2, SQLite,
etc.

In [None]:
7.3 RDBMS Products
Relational Software Inc. (now Oracle Corp.) developed its first SQL-based
RDBMS software, called Oracle V2. IBM started System R as a research RDBMS
project in 1974 and followed it with the very successful DB2 product.
Microsoft released SQL Server for Windows NT in 1994. Newer versions
of MS SQL Server are integrated with Microsoft’s .NET Framework.
SAP, best known for its Enterprise Resource Planning (ERP) software, also
offers enterprise-level database products such as SAP HANA.
An open-source RDBMS product named MySQL, developed by the
Swedish company MySQL AB, was later acquired by Sun Microsystems,
which in turn has since been acquired by Oracle Corporation. Being an
open-source product, MySQL is a highly popular choice, second only to Oracle.
MS Access, shipped with the Microsoft Office suite, is widely used in small-scale
projects. The entire database is stored in a single file and hence is
easily portable. It provides excellent GUI tools to design tables, queries,
forms, and reports.


In [None]:
PostgreSQL is also an open-source object-relational DBMS, which evolved from
the POSTGRES project (a successor to the Ingres project) at the University of
California, Berkeley. It is available on diverse operating system platforms,
and its SQL implementation is considered to be among the closest to the SQL
standard.
SQLite is a very popular relational database used in a wide variety of
applications. Unlike databases such as Oracle and MySQL, SQLite does not run
as a separate server process. Its official documentation describes it as a
self-contained, serverless, zero-configuration, transactional SQL database
engine. The entire database is a single file that can be placed anywhere in
the file system.


In [None]:
SQLite was developed by D. Richard Hipp in 2000. At the time of writing, its
current version was 3.27.2. It is fully ACID compliant, which ensures that
transactions are atomic, consistent, isolated, and durable.
Because of its open-source nature, very small footprint, and zero
configuration, SQLite databases are popularly used in embedded devices,
IoT, and mobile apps. Many web browsers and operating systems also use
SQLite databases internally. It is also used for prototyping and demonstrating
applications that will eventually run on a larger enterprise RDBMS.
Despite being very lightweight, it is a full-featured SQL implementation
with many advanced capabilities. SQLite databases can be interfaced with
most of the mainstream languages like C/C++, Java, PHP, etc. Python’s
standard library contains the sqlite3 module, which provides all the
functionality needed to interface a Python program with an SQLite database.

In [None]:
7.4 SQLite Installation
Installing SQLite is simple and straightforward. On Windows, the entire
command-line application is a single self-contained executable, ‘sqlite3.exe’.
The official website of SQLite (https://sqlite.org/download.html) provides
pre-compiled binaries for various operating system platforms, containing the
command-line shell bundled with other utilities. All you have to do is
download the zip archive of the SQLite command-line tools, unzip it to a
suitable location, and invoke sqlite3.exe from the command prompt followed
by the name of the database you want to open.
If the database already exists, SQLite connects to it; otherwise, a new
database is created. If the name is omitted, a transient in-memory database
is opened. Let us ask SQLite to open a new database, mydatabase.sqlite3.
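The same open-or-create behavior applies when a database is opened from Python with `sqlite3.connect()`. In the sketch below, the temporary folder is an assumption made only to keep the example tidy, and a table is created so that the new file certainly has content. The special name ":memory:" corresponds to the transient in-memory database that the console opens when no name is given.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "mydatabase.sqlite3")
    existed_before = os.path.exists(path)    # the file is not there yet

    conn = sqlite3.connect(path)             # opens the database, creating it if needed
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.close()

    exists_after = os.path.exists(path)      # now a database file is on disk
    print(existed_before, exists_after)      # False True

# A transient database that disappears when the connection is closed.
memory_db = sqlite3.connect(":memory:")
memory_db.close()
```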

In [None]:
The command window then shows the sqlite> prompt, at which any SQL query can
be executed. In addition, there are “dot commands” (beginning with a dot “.”),
typically used to change the output format of queries or to execute certain
prepackaged query statements.
An existing database can also be opened using the .open command.
The first step is to create a table in the database. As mentioned above, we
need to define its structure, specifying the name and data type of each column.

In [None]:
7.5 SQLite Data Types
ANSI SQL defines generic data types, which are implemented by various
RDBMS products with a few variations on their own. Most of the SQL
database engines (Oracle, MySQL, SQL Server, etc.) use static typing.
SQLite, on the other hand, uses a more general dynamic type system. Each
value stored in an SQLite database (or manipulated by the database engine) has
one of the following storage classes:
NULL
INTEGER
REAL
TEXT
BLOB


In [None]:
A storage class is more general than a data type. SQLite maps standard SQL
data types onto these classes through type affinity. For example, a column
declared with any integer type, such as INT, SMALLINT, BIGINT, or TINYINT,
has INTEGER affinity. Similarly, columns declared as FLOAT or DOUBLE have
REAL affinity, and standard SQL character types such as VARCHAR, CHAR, and
NCHAR are equivalent to TEXT in SQLite.
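SQLite's built-in `typeof()` function reports the storage class actually used for a value, which makes affinity easy to observe. In this sketch (the table `demo` and its values are made up for illustration), a text '5' stored in an INTEGER column is converted to an integer, '2' in a DOUBLE column becomes a real, and the number 3 in a VARCHAR column is stored as text. Text that cannot be converted, such as 'abc', is kept as TEXT even in an INTEGER column: this is the dynamic typing mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Declared types map to affinities: INTEGER -> INTEGER, DOUBLE -> REAL,
# VARCHAR(10) -> TEXT.
conn.execute("CREATE TABLE demo (i INTEGER, r DOUBLE, v VARCHAR(10))")
conn.execute("INSERT INTO demo VALUES ('5', '2', 3)")
conn.execute("INSERT INTO demo VALUES ('abc', 1.5, 'xyz')")

types = conn.execute(
    "SELECT typeof(i), typeof(r), typeof(v) FROM demo ORDER BY rowid").fetchall()
for row in types:
    print(row)
# ('integer', 'real', 'text')
# ('text', 'real', 'text')
```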
SQL as a language consists of many declarative statements that perform
various operations on databases and tables. These statements are popularly
called queries. The CREATE TABLE query defines a table's structure using the
above data types.

In [None]:
7.6 CREATE TABLE
This statement is used to create a new table, specifying the following details:
Name of new table
Names of columns (fields) in the desired table
Type, width, and the default value of each column.
Optional constraints on columns (PRIMARY KEY, NOT NULL,
FOREIGN KEY)

In [None]:
Example 7.1
CREATE TABLE table_name (
column1 datatype [width] [default] [constraint],
column2 ....,
column3 ...,
....
);

In [None]:
7.7 Constraints
Constraints enforce restrictions on the data that a column can contain. They
help maintain the integrity and reliability of data in the table. The following
clauses are used in the definition of one or more columns of a table to
enforce constraints:
PRIMARY KEY: A table can have only one primary key, typically a single
column. The value of this column uniquely identifies each row (a record) in
the table. In SQLite, the primary key can be set to AUTOINCREMENT if its
type is INTEGER. In that case, its value need not be filled in manually.
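Here is a concrete instance of the CREATE TABLE template from Example 7.1, run through Python's sqlite3 module as a minimal sketch. The `DEFAULT 0` on Price is an assumption added only to show the [default] part of the template. Neither INSERT supplies ProductID, so SQLite generates it; the second INSERT also leaves out Price, which therefore takes its default value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Products (
        ProductID INTEGER PRIMARY KEY AUTOINCREMENT,
        Name      TEXT NOT NULL,
        Price     INTEGER DEFAULT 0
    )""")
# ProductID is not supplied, so SQLite generates it.
conn.execute("INSERT INTO Products (Name, Price) VALUES ('Laptop', 25000)")
# Price is not supplied either, so its DEFAULT value is used.
conn.execute("INSERT INTO Products (Name) VALUES ('TV')")

rows = conn.execute("SELECT * FROM Products ORDER BY ProductID").fetchall()
print(rows)  # [(1, 'Laptop', 25000), (2, 'TV', 0)]
```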


In [None]:
NOT NULL: By default, any column in a row can be left as NULL. The NOT NULL
constraint ensures that, while inserting a new row in the table or updating an
existing row, the contents of the specified columns are not allowed to be
NULL. For example, to ensure that a ‘Name’ column always has a value, the
NOT NULL constraint is applied to it.
FOREIGN KEY: This constraint enforces an ‘exists’ relationship between two
tables: a value in the foreign key column must match an existing primary key
value in the referenced table.
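A violation of either constraint makes SQLite reject the statement, which Python's sqlite3 module reports as `sqlite3.IntegrityError`. In this sketch the two small tables are assumptions for illustration, and `PRAGMA foreign_keys = ON` is needed because SQLite leaves foreign key checks off by default. The exact wording of the error messages may differ between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # FK checks are off by default in SQLite
conn.execute("CREATE TABLE Customers (CustID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")
conn.execute("""CREATE TABLE Invoices (
    InvID  INTEGER PRIMARY KEY,
    CustID INTEGER REFERENCES Customers (CustID))""")
conn.execute("INSERT INTO Customers (Name) VALUES ('Ravikumar')")  # gets CustID 1

errors = []
for sql in ("INSERT INTO Customers (Name) VALUES (NULL)",   # violates NOT NULL
            "INSERT INTO Invoices (CustID) VALUES (99)"):  # no customer 99 exists
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))
print(errors)
```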


In [None]:
Let us create a Products table in the ‘mydatabase’ database that we created
above. As shown in Figure 7.1, the ‘Products’ table consists of the ProductID,
Name, and Price columns, with ProductID as its primary key.
(Ensure that the SQL statement ends with a semicolon. You may span one
statement over multiple lines in the console.)
We also create another ‘Customers’ table in the same database with CustID
and Name fields. The CustID field should be defined as the primary key.

In [None]:
Finally, we create the ‘Invoices’ table. As shown in Figure 7.1, this table has
InvID as its primary key and two foreign key columns referring to ProductID in
the ‘Products’ table and CustID in the ‘Customers’ table.
The ‘Invoices’ table also contains the ‘Quantity’ column.
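One possible set of CREATE TABLE statements for these three tables is sketched below, run through Python's sqlite3 module. The column types, the AUTOINCREMENT keys, and the NOT NULL constraints on Name are assumptions chosen to match Tables 7.3 to 7.5, not a copy of the original statements. GSTIN (Table 7.4) is left out on the assumption that it is the column added later with ALTER TABLE. The two PRAGMA queries confirm the column list and the foreign key targets of Invoices.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Products (
        ProductID INTEGER PRIMARY KEY AUTOINCREMENT,
        Name      TEXT NOT NULL,
        Price     INTEGER
    );
    CREATE TABLE Customers (
        CustID INTEGER PRIMARY KEY AUTOINCREMENT,
        Name   TEXT NOT NULL
    );
    CREATE TABLE Invoices (
        InvID     INTEGER PRIMARY KEY AUTOINCREMENT,
        CustID    INTEGER REFERENCES Customers (CustID),
        ProductID INTEGER REFERENCES Products (ProductID),
        Quantity  INTEGER
    );
""")

# Column names of Invoices, in definition order (index 1 of each row is the name).
columns = [row[1] for row in conn.execute("PRAGMA table_info(Invoices)")]
# Tables referenced by Invoices' foreign keys (index 2 is the parent table).
parents = {row[2] for row in conn.execute("PRAGMA foreign_key_list(Invoices)")}
print(columns)  # ['InvID', 'CustID', 'ProductID', 'Quantity']
print(sorted(parents))  # ['Customers', 'Products']
```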

In [None]:
To confirm that our tables have been successfully created, use the .tables
command:
SQLite stores the schema of each database in its SQLITE_MASTER table. We
can fetch the names of the tables in our database with the following command:
To terminate the current session of SQLite3 activity, use the .quit command.
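The same SQLITE_MASTER query works from Python. In this sketch, two of the tables are recreated in an in-memory database (their definitions are assumptions). Notice that the result also lists `sqlite_sequence`: SQLite creates this internal table automatically, as soon as a table with an AUTOINCREMENT column is defined, to remember the last number used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY AUTOINCREMENT,
                           Name TEXT, Price INTEGER);
    CREATE TABLE Customers (CustID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT);
""")

# Each row of sqlite_master describes one schema object (table, index, view, trigger).
names = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(names)  # ['Customers', 'Products', 'sqlite_sequence']
```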

In [None]:
7.8 INSERT Statement
Now that we have created tables in our database, let us add a few records to
them. SQL provides the INSERT statement for this purpose. Its standard
syntax is as follows:
Example 7.2
INSERT INTO tablename (col1, col2, ...) VALUES (val1, val2, val3, ...);
The name of the table to which a new record (row) is to be added follows the
mandatory keywords INSERT INTO. The column list is given after the name
in parentheses, followed by the VALUES clause. The data corresponding to
each column is given in another set of parentheses.
The following statement adds one record to the Products table:
We insert a row in the ‘Customers’ table by executing the following statement
in the SQLite console:

In [None]:
Similarly, the following statement adds a record to the ‘Invoices’ table:
Note that, in the above INSERT statements, we have not included the
ProductID, CustID, and InvID columns in the respective column lists because
they have been defined as autoincrement fields. The column list may be
omitted altogether if you provide values for all columns in the table. The
values must then be given in the VALUES list in exactly the same order in
which their fields have been defined, and NULL can be supplied for an
autoincrement field so that SQLite assigns its value automatically.
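Both forms of INSERT are shown in this sketch with Python's sqlite3 module; the Products definition is an assumption. It also shows the `?` placeholder style that Python programs use to pass values safely, and what happens when the column list is omitted but the autoincrement value is left out of VALUES as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Products (
    ProductID INTEGER PRIMARY KEY AUTOINCREMENT, Name TEXT, Price INTEGER)""")

# With a column list: the autoincrement column is simply left out.
conn.execute("INSERT INTO Products (Name, Price) VALUES ('Laptop', 25000)")
# Without a column list: one value per column, in definition order;
# NULL for ProductID lets SQLite assign the next number.
conn.execute("INSERT INTO Products VALUES (NULL, 'TV', 40000)")
# Values passed from Python through '?' placeholders.
conn.execute("INSERT INTO Products (Name, Price) VALUES (?, ?)", ("Router", 2000))

# Omitting both the column list and the ProductID value is an error.
mismatch = None
try:
    conn.execute("INSERT INTO Products VALUES ('Scanner', 5000)")
except sqlite3.OperationalError as exc:
    mismatch = str(exc)

rows = conn.execute("SELECT * FROM Products ORDER BY ProductID").fetchall()
print(rows)      # [(1, 'Laptop', 25000), (2, 'TV', 40000), (3, 'Router', 2000)]
print(mismatch)  # table Products has 3 columns but 2 values were supplied
```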

In [None]:
You may add a few more records in these three tables. Sample data for these
tables is given below: (table 7.3, table 7.4, and table 7.5)
Table 7.3 Products Table
ProductID Name Price
1 Laptop 25000
2 TV 40000
3 Router 2000
4 Scanner 5000
5 Printer 9000
6 Mobile 15000

In [None]:
Table 7.4 Customers Table
CustID Name GSTIN
1 Ravikumar 27AAJPL7103N1ZF
2 Patel 24ASDFG1234N1ZN
3 Nitin 27AABBC7895N1ZT
4 Nair 32MMAF8963N1ZK
5 Shah 24BADEF2002N1ZB
6 Khurana 07KABCS1002N1ZV
7 Irfan 05IIAAV5103N1ZA
8 Kiran 12PPSDF22431ZC
9 Divya 15ABCDE1101N1ZA
10 John 29AAEEC4258E1ZK

In [None]:
Table 7.5 Invoices Table
InvID CustID ProductID Quantity
1 1 1 2
2 10 2 1
3 9 6 3
4 4 1 6
5 10 5 3
6 2 2 5
7 2 1 4
8 5 3 10
9 7 5 2
10 3 4 3

In [None]:
7.9 SELECT Statement
This is one of the most frequently used SQL statements. The purpose of the
SELECT statement is to fetch data from a database table and return it as a
result set. In its simplest form, the SELECT statement is used as follows:
Example 7.3
SELECT col1, col2, ..., coln FROM table_name;
The SQLite console displays data from the named table for all rows in the
specified columns. The SQLite console offers two useful ‘dot’ commands for
neat, formatted output of the SELECT statement. The ‘.header on’ command
displays the column names as the header of the output. The ‘.mode column’
command displays the data in left-aligned columns.

In [None]:
You can use the ‘*’ wildcard character to indicate all columns in the table.
The ORDER BY clause sorts the selected rows in ascending order (the default)
of the data in the specified column. The following statement displays the
records of the Products table in ascending order of price.
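The same ascending-order query can be run from Python as a quick sketch; the Products rows are those of Table 7.3, loaded with `executemany()`, and the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price INTEGER)")
# Sample rows from Table 7.3
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
    (1, "Laptop", 25000), (2, "TV", 40000), (3, "Router", 2000),
    (4, "Scanner", 5000), (5, "Printer", 9000), (6, "Mobile", 15000),
])

# ORDER BY sorts in ascending order unless DESC is given.
ascending = conn.execute("SELECT Name, Price FROM Products ORDER BY Price").fetchall()
for name, price in ascending:
    print(name, price)  # Router 2000 comes first, TV 40000 last
```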

In [None]:
To enforce descending order, attach ‘DESC’ to the ORDER BY clause.
You can filter the selected rows by using the WHERE clause. The WHERE
keyword is followed by a condition built with comparison and logical operators
(<, >, <=, >=, =, IN, LIKE, AND, OR, etc.). In the following example, only those
rows for which the value of the ‘Price’ column is less than 10000 will be selected.
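A short sketch of DESC and WHERE through Python's sqlite3 module, again on the Table 7.3 rows. Here the limit 10000 is passed through a `?` placeholder rather than written into the SQL text, which is the usual practice in Python programs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price INTEGER)")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
    (1, "Laptop", 25000), (2, "TV", 40000), (3, "Router", 2000),
    (4, "Scanner", 5000), (5, "Printer", 9000), (6, "Mobile", 15000),
])

# Most expensive product first.
descending = conn.execute(
    "SELECT Name, Price FROM Products ORDER BY Price DESC").fetchall()
# Only the rows that satisfy the WHERE condition are returned.
cheap = conn.execute(
    "SELECT Name, Price FROM Products WHERE Price < ? ORDER BY Price",
    (10000,)).fetchall()
print(descending[0])  # ('TV', 40000)
print(cheap)          # [('Router', 2000), ('Scanner', 5000), ('Printer', 9000)]
```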

In [None]:
A big advantage of the relational model comes through when data from two
related tables can be fetched together. In our ‘Invoices’ table, ProductID is
one of the columns, and it is the primary key of the ‘Products’ table. The
following example uses the WHERE clause to join two tables, Invoices and
Products, and fetch data from them in a single SELECT statement.
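A sketch of such a WHERE-based join from Python, using the rows of Tables 7.3 and 7.5 (column types are assumptions). Each invoice row is paired with the product row whose ProductID matches, so the product name can be shown next to the quantity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price INTEGER);
    CREATE TABLE Invoices (InvID INTEGER PRIMARY KEY, CustID INTEGER,
                           ProductID INTEGER, Quantity INTEGER);
""")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
    (1, "Laptop", 25000), (2, "TV", 40000), (3, "Router", 2000),
    (4, "Scanner", 5000), (5, "Printer", 9000), (6, "Mobile", 15000),
])
# Rows of Table 7.5: (InvID, CustID, ProductID, Quantity)
conn.executemany("INSERT INTO Invoices VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 2), (2, 10, 2, 1), (3, 9, 6, 3), (4, 4, 1, 6), (5, 10, 5, 3),
    (6, 2, 2, 5), (7, 2, 1, 4), (8, 5, 3, 10), (9, 7, 5, 2), (10, 3, 4, 3),
])

# The WHERE condition matches each invoice with its product row.
rows = conn.execute("""
    SELECT Invoices.InvID, Products.Name, Invoices.Quantity
    FROM Invoices, Products
    WHERE Invoices.ProductID = Products.ProductID
    ORDER BY Invoices.InvID
""").fetchall()
print(rows[:3])  # [(1, 'Laptop', 2), (2, 'TV', 1), (3, 'Mobile', 3)]
```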

In [None]:
It is also possible to generate a calculated column from some operation on
other columns. Any column heading can also be given an alias using the AS
keyword.
The following SELECT statement displays a column holding the value of the
expression Products.Price*Quantity, given the alias Total with the AS keyword.
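The calculated column can be checked from Python as well. In this sketch (same assumed schema and Table 7.3/7.5 data as before), `cursor.description` confirms that the alias given with AS becomes the column heading.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price INTEGER);
    CREATE TABLE Invoices (InvID INTEGER PRIMARY KEY, CustID INTEGER,
                           ProductID INTEGER, Quantity INTEGER);
""")
conn.executemany("INSERT INTO Products VALUES (?, ?, ?)", [
    (1, "Laptop", 25000), (2, "TV", 40000), (3, "Router", 2000),
    (4, "Scanner", 5000), (5, "Printer", 9000), (6, "Mobile", 15000),
])
conn.executemany("INSERT INTO Invoices VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 2), (2, 10, 2, 1), (3, 9, 6, 3), (4, 4, 1, 6), (5, 10, 5, 3),
    (6, 2, 2, 5), (7, 2, 1, 4), (8, 5, 3, 10), (9, 7, 5, 2), (10, 3, 4, 3),
])

cur = conn.execute("""
    SELECT Invoices.InvID, Products.Price * Invoices.Quantity AS Total
    FROM Invoices, Products
    WHERE Invoices.ProductID = Products.ProductID
    ORDER BY Invoices.InvID
""")
heading = cur.description[1][0]  # the alias given with AS
totals = [total for (_, total) in cur.fetchall()]
print(heading, totals)
```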

In [None]:
7.10 UPDATE Statement
It is possible to modify the data of a certain field in a given table using the
UPDATE statement. The syntax of the UPDATE query is as follows:
Example 7.4
UPDATE table_name SET col1=val1, col2=val2, ..., colN=valN WHERE
[expression];
Note that the WHERE clause is not mandatory when executing the UPDATE
statement. However, you would normally want to modify only those records
satisfying a certain condition. If the WHERE clause is not specified, all
records will be modified.
For example, the following statement changes the price of ‘Printer’ to 10000.
However, if you want to increase the price of each product by 10 percent,
you don’t have to specify the WHERE clause.

In [None]:
7.11 DELETE Statement
If you need to remove one or more records from a certain table, use the
DELETE statement. The general syntax of the DELETE query is as follows:
Example 7.5
DELETE FROM table_name WHERE [condition];
In most circumstances, the WHERE clause should be specified unless you
intend to remove all records from the table. The following statement removes
those records from the Invoices table having Quantity>5.
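Run from Python against the Table 7.5 rows (schema assumed), the same DELETE removes invoices 4 and 8, the only ones with a quantity above 5; `cursor.rowcount` reports how many rows were deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Invoices (InvID INTEGER PRIMARY KEY, CustID INTEGER,
                                       ProductID INTEGER, Quantity INTEGER)""")
# Rows of Table 7.5: (InvID, CustID, ProductID, Quantity)
conn.executemany("INSERT INTO Invoices VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 2), (2, 10, 2, 1), (3, 9, 6, 3), (4, 4, 1, 6), (5, 10, 5, 3),
    (6, 2, 2, 5), (7, 2, 1, 4), (8, 5, 3, 10), (9, 7, 5, 2), (10, 3, 4, 3),
])

cur = conn.execute("DELETE FROM Invoices WHERE Quantity > 5")
deleted = cur.rowcount  # number of rows removed by the DELETE
remaining = [inv for (inv,) in conn.execute("SELECT InvID FROM Invoices ORDER BY InvID")]
print(deleted)    # 2
print(remaining)  # [1, 2, 3, 5, 6, 7, 9, 10]
```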

In [None]:
7.12 ALTER TABLE Statement
On many occasions, you may want to make changes in a table’s structure.
This can be done with the ALTER TABLE statement. It is possible to change
the name of a table or a column, or add a new column to the table.
The following statement adds a new column to the ‘Customers’ table:
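A sketch of adding a column from Python. The column name GSTIN is an assumption based on Table 7.4, which lists a GSTIN for every customer. Existing rows get NULL in the new column until it is filled in, here with the UPDATE statement from Section 7.10.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO Customers (Name) VALUES ('Ravikumar')")

# Add the new column; the existing row holds NULL in it for now.
conn.execute("ALTER TABLE Customers ADD COLUMN GSTIN TEXT")
conn.execute("UPDATE Customers SET GSTIN = '27AAJPL7103N1ZF' WHERE CustID = 1")

columns = [row[1] for row in conn.execute("PRAGMA table_info(Customers)")]
row = conn.execute("SELECT * FROM Customers").fetchone()
print(columns)  # ['CustID', 'Name', 'GSTIN']
print(row)      # (1, 'Ravikumar', '27AAJPL7103N1ZF')
```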

In [None]:
7.13 DROP TABLE Statement
This statement removes the specified table from the database. If you try
to drop a non-existing table, the SQLite engine shows an error.
When the ‘IF EXISTS’ option is used, the named table is deleted only if it
exists, and the statement is ignored if it doesn’t.
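The difference between the two forms is visible from Python: dropping a missing table raises `sqlite3.OperationalError`, while the IF EXISTS form completes silently. The table name `Temp` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Temp (x INTEGER)")
conn.execute("DROP TABLE Temp")            # the table is removed

message = None
try:
    conn.execute("DROP TABLE Temp")        # dropping it again fails
except sqlite3.OperationalError as exc:
    message = str(exc)
print(message)                             # no such table: Temp

conn.execute("DROP TABLE IF EXISTS Temp")  # no error: the statement is ignored
```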

In [None]:
7.14 Transaction Control
As mentioned above, SQLite is a transactional database and all transactions
are ACID compliant. ACID stands for Atomic, Consistent, Isolated, and
Durable. As a result, the SQLite database doesn’t lose integrity even if a
transaction such as an INSERT, DELETE, or UPDATE is interrupted for any
reason whatsoever.
A transaction is the propagation of changes to the database. The operation
performed by an INSERT, UPDATE, or DELETE statement results in a
transaction.
Atomicity: When we say that a transaction should be atomic, it means that a
change cannot be effected in parts. Either the entire transaction is applied or
none of it is.
Consistency: A transaction must take the database from one valid state to
another, so that all defined rules and constraints still hold after it completes.
Isolation: It must be ensured that a transaction such as an INSERT, UPDATE,
or DELETE performed by one client becomes visible to other clients only after
its successful completion.
Durability: The result of a successfully committed transaction must be
permanent in the database regardless of conditions such as a power failure
or a program crash.
SQLite provides two statements for transaction control: COMMIT and
ROLLBACK. Changes made by INSERT, UPDATE, and DELETE statements
become permanent only when they are saved (committed) to the disk file. By
default, however, SQLite works in autocommit mode: each statement is
committed as soon as it completes, without giving any chance to undo (roll
back) the changes.


In [None]:
To control committing and rolling back manually, start a transaction by
issuing the directive BEGIN TRANSACTION. Whatever operations are done
thereafter will not be confirmed until COMMIT is issued, and will be annulled
if ROLLBACK is issued.
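The same BEGIN/ROLLBACK/COMMIT sequence can be reproduced from Python. By default, Python's sqlite3 module opens and commits transactions on its own; passing `isolation_level=None` switches that behavior off, so the statements below reach SQLite exactly as written. The Router price of 2200 comes from the example that follows; committing the value 2000 is an assumption made for illustration.

```python
import sqlite3

# isolation_level=None: no implicit transactions, so BEGIN/COMMIT/ROLLBACK
# are controlled entirely by the statements we issue.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Products (ProductID INTEGER PRIMARY KEY, Name TEXT, Price INTEGER)")
conn.execute("INSERT INTO Products VALUES (3, 'Router', 2200)")

def router_price():
    return conn.execute("SELECT Price FROM Products WHERE Name = 'Router'").fetchone()[0]

conn.execute("BEGIN TRANSACTION")
conn.execute("UPDATE Products SET Price = 2000 WHERE Name = 'Router'")
conn.execute("ROLLBACK")           # the change is discarded
after_rollback = router_price()

conn.execute("BEGIN TRANSACTION")
conn.execute("UPDATE Products SET Price = 2000 WHERE Name = 'Router'")
conn.execute("COMMIT")             # the change becomes permanent
after_commit = router_price()

print(after_rollback, after_commit)  # 2200 2000
```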

In [None]:
In the above example, the price of ‘Router’ is initially 2200. It was changed
to 2000, but the change was rolled back; hence its earlier value is restored.
The following example shows the effect of the COMMIT statement, where the
change made by the UPDATE statement is confirmed.

In [None]:
7.15 MySQL
So far, we have learned how some basic SQL operations are performed on a
relational database using the SQLite console. Similar console-driven
interaction is possible with other RDBMS products. The MySQL console works
much like the SQLite console we’ve used in this chapter (barring certain
syntactical differences). The following piece of code shows a sample MySQL
console session:
MS SQL Server also has a console-based frontend called SQLCMD, which works
similarly. The command-line interface of Oracle is called SQL*Plus.
As far as PostgreSQL is concerned, its primary command-line interface is the
psql program.
All the RDBMS products also provide GUI-based environments to perform
various SQL-related operations instead of command-line actions. Oracle’s
SQL Developer, Microsoft’s SQL Server Management Studio, pgAdmin for
PostgreSQL, and MySQL Workbench are respective examples. The SQL Server
client is integrated with Visual Studio, which helps the user perform
database operations graphically. MySQL is shipped with various web server
software bundles (for example, LAMP, XAMPP, etc.), often together with a
web-based administration interface called phpMyAdmin. (Figure 7.2)

In [None]:
Although SQLite doesn’t provide its own GUI tool for database
management, many third-party tools are available. One such utility is
SQLiteStudio, which is widely used.

In [None]:
7.16 SQLiteStudio
7.16 SQLiteStudio
SQLiteStudio is open-source software available from https://sqlitestudio.pl.
It is portable, which means it can be run directly without having to be
installed. It is powerful, fast, and yet very light. You can perform CRUD
operations on a database using the GUI as well as by writing SQL queries.
Download and unpack the zip archive of the latest version for Windows from
the downloads page. Run SQLiteStudio.exe to launch SQLiteStudio. Its opening
GUI appears as follows: (Figure 7.3)
Figure 7.3 SQLiteStudio GUI


In [None]:
Currently attached databases appear as expandable nodes in the left column.
Click any one to select it, and the ‘Tables’ sub-node shows the tables in the
selected database. On the right, there is a tabbed pane. The first tab shows
the structure of the selected table and the second tab shows its data. Both
the structure and the data can be modified. Right-click the Tables sub-node
on the left or use the Structure menu to add a new table. User-friendly
buttons are provided in the Structure and Data tabs to insert/modify
columns/rows and to commit or roll back transactions.
This concludes the current chapter on RDBMS concepts with a focus on the
SQLite database. As mentioned in the beginning, this is not a complete
tutorial on SQLite but a quick hands-on experience of interacting with an
SQLite database, to prepare for Python’s interaction with databases through
the DB-API, which is the subject of the next chapter.