<center>
  <a href="10.1-Working-with-text-files.ipynb">Previous Page</a> | <a href="./">Content Page</a> | <a href="12.1-working-with-web-data.ipynb">Next Page</a>
</center>

# 1.11 Working with databases

Now that we know how to access data from text files, let's explore how to work with databases and use loops to iterate through the results. CSV files are easy to access, but they have limitations. One is that a large CSV file can be slow to load, or may not fit in memory at all. Databases are designed to handle large data sets and let us read only the rows we need.

We will be working with the same data set from https://data.gov.sg, the `Number of Parcels cleared at the Parcel Post Centre` data set. 

This data set has already been imported as a `sqlite3` database named `parcels.db` in the same directory as this notebook.

To interact with `sqlite3` databases, we use a language called `SQL`, or Structured Query Language. We use SQL to query and modify the data in a database. SQL is the most common language for working with databases and an important tool in any data professional's toolkit.

Let's explore how we can use SQL to work with the data.

In order to work with a `sqlite3` database, we need to import the `sqlite3` module. Let's do that now.

In [1]:
import sqlite3

To read the database, we need to create a `Connection` object that represents the database. Let's do that now.

In [2]:
conn = sqlite3.connect("parcels.db")

# let's see what the conn object is
conn

<sqlite3.Connection at 0x7f770509c990>

Once we have a `Connection` object, we can create a `Cursor` object that points to the data in the database and call the `.execute()` method to perform SQL commands:

In [3]:
c = conn.cursor()

# use the cursor object to do a SELECT command from the parcels table
c.execute('SELECT * from parcels')

<sqlite3.Cursor at 0x7f7705096880>

Let's print out a row from the data set with the `.fetchone()` method.

In [4]:
print(c.fetchone())

('2004', 1175900)


We can see that each row is a `tuple` object. In this case, the tuple has two elements: the year (a string) and the parcel count for that year (an integer).
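To experiment with rows without touching `parcels.db`, here is a minimal sketch that builds a small in-memory stand-in using three rows from the data set. The column names `year` and `parcels_count` and the `demo_` variable names are assumptions made for illustration; the real table may use different column names.

```python
import sqlite3

# In-memory stand-in for parcels.db (assumption: the real column names may differ).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE parcels (year TEXT, parcels_count INTEGER)")
demo_conn.executemany(
    "INSERT INTO parcels VALUES (?, ?)",
    [("2004", 1175900), ("2005", 1148900), ("2006", 1190384)],
)

demo_cursor = demo_conn.cursor()
# ORDER BY makes the row order explicit instead of relying on storage order.
demo_cursor.execute("SELECT * FROM parcels ORDER BY year")

first_row = demo_cursor.fetchone()        # one row, returned as a tuple
year, parcels_count = first_row           # unpack the tuple into two variables
remaining_rows = demo_cursor.fetchall()   # all rows not yet fetched, as a list of tuples

print(first_row)                          # ('2004', 1175900)
print(type(year), type(parcels_count))
print(remaining_rows)

demo_conn.close()
```

Note that `.fetchall()` only returns the rows the cursor has not handed out yet, which is why the 2004 row does not appear a second time.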

We can also print all the rows in the parcels table.

In [5]:
# Let's iterate via a for loop and print all the row data
for row in c.execute('SELECT * from parcels'):
    print(row)

('2004', 1175900)
('2005', 1148900)
('2006', 1190384)
('2007', 1346800)
('2008', 1432700)
('2009', 1527400)
('2010', 1564000)
('2011', 2176900)
('2012', 2602700)
('2013', 2987800)
('2014', 3646600)
('2015', 4829700)
('2016', 5162576)


This is not much different from reading in the CSV file previously. What else can we do with the row data?

Let's say we want to count the total number of parcels in the data set. How would you do this?

In [6]:
total_parcels = 0
for row in c.execute('SELECT * from parcels'):
    year, parcels_count = row
    total_parcels = total_parcels + parcels_count
    
print("The total number of parcels is " + str(total_parcels))

The total number of parcels is 30792360
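
The loop above adds the counts up in Python. As an alternative, SQL can do the addition itself with the `SUM()` aggregate function. The sketch below assumes an in-memory copy of the 13 rows printed earlier and a hypothetical column name `parcels_count`; with the real `parcels.db`, the column name would need to match the actual table.

```python
import sqlite3

# The 13 rows printed earlier in this notebook.
demo_rows = [
    ("2004", 1175900), ("2005", 1148900), ("2006", 1190384),
    ("2007", 1346800), ("2008", 1432700), ("2009", 1527400),
    ("2010", 1564000), ("2011", 2176900), ("2012", 2602700),
    ("2013", 2987800), ("2014", 3646600), ("2015", 4829700),
    ("2016", 5162576),
]

# In-memory stand-in for parcels.db (assumption: column names are illustrative).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE parcels (year TEXT, parcels_count INTEGER)")
demo_conn.executemany("INSERT INTO parcels VALUES (?, ?)", demo_rows)

# SUM() returns a single row with a single column, so we take element 0.
sql_total = demo_conn.execute(
    "SELECT SUM(parcels_count) FROM parcels"
).fetchone()[0]

print("The total number of parcels is " + str(sql_total))

demo_conn.close()
```

Letting the database compute the total means only one value is sent back to Python, which matters more as tables grow.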


We can also modify existing data and add new data. For more information, please refer to the `sqlite3` documentation (https://docs.python.org/3/library/sqlite3.html).
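
As a rough illustration of adding and modifying data, the sketch below works on an in-memory database so that `parcels.db` stays untouched. The column names `year` and `parcels_count` are assumptions, and the `"demo"` row uses made-up values that are not part of the real data set.

```python
import sqlite3

# In-memory stand-in for parcels.db (assumption: column names are illustrative).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE parcels (year TEXT, parcels_count INTEGER)")
demo_conn.execute("INSERT INTO parcels VALUES (?, ?)", ("2016", 5162576))

# Add a new row. The ? placeholders let sqlite3 substitute the values safely.
# ("demo", 100) is a made-up row used only for this example.
demo_conn.execute("INSERT INTO parcels VALUES (?, ?)", ("demo", 100))

# Modify the row we just added.
demo_conn.execute(
    "UPDATE parcels SET parcels_count = ? WHERE year = ?", (250, "demo")
)

# Changes are only made permanent once they are committed.
demo_conn.commit()

rows_after_changes = demo_conn.execute(
    "SELECT * FROM parcels ORDER BY year"
).fetchall()
print(rows_after_changes)

demo_conn.close()
```

With a file-based database, `.commit()` is the step that writes the changes to disk, so it is worth trying changes on a copy first.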
