# seed.ipynb library module

### Parse a csv-formatted file to seed a SQLite database table.

**Part of [Building a Desktop Database Application](./Building a Desktop Database Application.ipynb)**

**Developer: [David Schenck](https://github.com/zero2cx/)**

**App 5 - [The Python Mega Course](https://www.udemy.com/the-python-mega-course/) (Course Creator & Facilitator: [Ardit Sulce](http://pythonhow.com/author))**

Note: Python 3.6 or higher is required to execute this notebook application.

In [1]:
%reload_ext autoreload
%autoreload 2

In [2]:
import os
import csv

Import the notebook that contains the **Database** class.

In [3]:
%%capture
%run db.ipynb

### Function definition: _assign_column_types()

Examine the records parsed from a csv-formatted file.

Assign an appropriate SQLite data type to each column of the seed data.

In [4]:
def _assign_column_types(records):
    '''Return a list that holds one SQLite column type per column of records.'''

    # Build a list of lists that makes note of the data type of each field
    # within each data record.
    data_types = [[] for _ in records[0]]
    for record in records:
        for i in range(len(data_types)):
            try:
                int(record[i])
            except ValueError:
                try:
                    float(record[i])
                except ValueError:
                    data_types[i].append('TEXT')
                else:
                    data_types[i].append('REAL')
            else:
                data_types[i].append('INTEGER')

    # Assign types to the table columns to match the parsed data types from
    # the above list-of-lists. Any REAL field makes the column REAL, otherwise
    # any INTEGER field makes it INTEGER, otherwise the column is TEXT.
    column_types = []
    for types in data_types:
        if 'REAL' in types:
            column_types.append('REAL')
        elif 'INTEGER' in types:
            column_types.append('INTEGER')
        else:
            column_types.append('TEXT')

    return column_types
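
The precedence rule in the second half of the function above may be easier to see on a tiny in-memory sample. The following is only an illustrative sketch: `classify_field` and `pick_column_type` are hypothetical helper names that are not part of this notebook, and the sample values are made up.

```python
def classify_field(value):
    # Hypothetical helper: mirror the int -> float -> text fallback.
    try:
        int(value)
    except ValueError:
        try:
            float(value)
        except ValueError:
            return 'TEXT'
        return 'REAL'
    return 'INTEGER'


def pick_column_type(values):
    # Hypothetical helper: REAL wins over INTEGER, which wins over TEXT.
    found = {classify_field(value) for value in values}
    if 'REAL' in found:
        return 'REAL'
    if 'INTEGER' in found:
        return 'INTEGER'
    return 'TEXT'


# Made-up sample columns, not taken from the seed files.
print(pick_column_type(['2004', '2010']))    # integers only
print(pick_column_type(['1', '2.5']))        # integers mixed with a real
print(pick_column_type(['12', 'n/a']))       # an integer mixed with text
print(pick_column_type(['n/a', 'unknown']))  # text only
```

Under this rule, a column that mixes numbers with text such as `n/a` still resolves to a numeric type, and only a column with no numeric fields at all becomes TEXT.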

### Function definition: _get_seed_data()

Read a csv-formatted file.

Return a 2-tuple that contains a list of database-table column names and a list of data records.

In [5]:
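def _get_seed_data(seed_file):
    '''Return a 2-tuple of (column names, data records) from a csv file.'''

    with open(file=seed_file, newline='') as file:
        rows = list(csv.reader(file))

    # An empty file has no header row, so return two empty lists rather than
    # raising IndexError. The caller checks for empty column names.
    if not rows:
        return [], []
    return rows[0], rows[1:]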
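
A minimal sketch of the same header/records split, using an in-memory string in place of a seed file. The sample rows are adapted from the garden data shown later in this notebook, with a quoted comma added to one field for illustration. `split_header` is a hypothetical name used only here.

```python
import csv
import io


def split_header(text):
    # Hypothetical helper: parse csv text and separate the header row from
    # the data rows, returning two empty lists for empty input.
    rows = list(csv.reader(io.StringIO(text, newline='')))
    if not rows:
        return [], []
    return rows[0], rows[1:]


sample = (
    'NAME,SPECIES,TYPE\n'
    'Bugleweed,Ajuga reptans,Mint\n'
    '"Orchid, Scorpion",Arachnis breviscapa,Orchid\n'
)
column_names, records = split_header(sample)
print(column_names)
print(records)
```

The quoted field shows why the `csv` module is used here instead of a plain `str.split(',')`: the comma inside the quotes stays part of the field.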

### Function definition: seed_database()

Read the contents of a csv-formatted file that contains seed data.

Populate a new SQLite database and save it to file. If a database file already exists, it is deleted and replaced by the new one.

In [6]:
def seed_database(path, name):
    '''Seed a new database file from the csv file of the same name.'''

    # Fetch the seed data, including column names, from the seed file.
    column_names, records = _get_seed_data(f'{path}/{name}.csv')
    if not column_names:
        return 1

    # Parse each data record, and designate the appropriate data type for
    # each table column.
    columns = dict(zip(column_names, _assign_column_types(records=records)))

    # Delete the old database file, if it exists. The path is built the same
    # way as the seed-file path above.
    db_file = f'{path}/{name}.db'
    if os.path.isfile(db_file):
        os.remove(db_file)

    # Create an empty database file and connect to it. Add seed data to it,
    # then close the connection.
    db = Database(path=path, name=name, **columns)
    for record in records:
        db.add_record(record=record)
    db.close()

### TODO: extend flexibility to support a variety of seed data field-separators.
### TODO: possibly write a data pre-processor to standardize sloppy seed data. DIFFICULT!!!
### TODO: add web-scraping as a source of seed data.
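
One possible direction for the first TODO above is the standard library's `csv.Sniffer`, which tries to guess the delimiter from a sample of the text. This is only a sketch under assumptions: `read_rows_any_delimiter` and its `candidates` parameter are hypothetical names, the semicolon-separated sample is made up from the garden data, and the sniffer is a heuristic that can raise `csv.Error` when it cannot decide.

```python
import csv
import io


def read_rows_any_delimiter(text, candidates=',;\t|'):
    # Hypothetical helper: let csv.Sniffer guess the delimiter from the
    # candidate characters, then parse the text with the guessed dialect.
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    return list(csv.reader(io.StringIO(text, newline=''), dialect))


# Made-up sample that uses semicolons instead of commas.
sample = (
    'NAME;SPECIES;TYPE\n'
    'Bugleweed;Ajuga reptans;Mint\n'
    'Scorpion Orchid;Arachnis breviscapa;Orchid\n'
)
rows = read_rows_any_delimiter(sample)
print(rows[0])
print(rows[1:])
```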

### Example usage: _assign_column_types() function

Examine each field of each record for three database tables.

Determine the appropriate column-type for all columns in each table.

In [7]:
csv_files = ['garden', 'books', 'ledzep']

for file in csv_files:
    
    column_names, records = _get_seed_data(f'./data/{file}.csv')
    columns = dict(zip(column_names, _assign_column_types(records=records)))

    print(f'{file}:')
    print(f'column names/types {columns}')

garden:
column names/types {'NAME': 'TEXT', 'SPECIES': 'TEXT', 'TYPE': 'TEXT', 'SIZE': 'TEXT', 'HABITAT': 'TEXT'}
books:
column names/types {'TITLE': 'TEXT', 'AUTHOR': 'TEXT', 'YEAR': 'INTEGER', 'ISBN': 'INTEGER'}
ledzep:
column names/types {'ALBUM_TITLE': 'TEXT', 'RELEASE_DATE': 'TEXT', 'LABEL': 'TEXT', 'CATEGORY': 'TEXT'}


### Example usage: _get_seed_data() function

Fetch the seed data from three seed files.

In [8]:
csv_files = ['garden', 'books', 'ledzep']

for file in csv_files:
    print(f'{file}:')
    display(_get_seed_data(seed_file=f'./data/{file}.csv'))

garden:


(['NAME', 'SPECIES', 'TYPE', 'SIZE', 'HABITAT'],
 [['Bugleweed',
   'Ajuga reptans',
   'Mint',
   '4 to 14 inches tall',
   'Woods and rough pastures'],
  ['Scorpion Orchid',
   'Arachnis breviscapa',
   'Orchid',
   'n/a',
   'Island of Borneo'],
  ['Gold Blossom Tree',
   'Barklya syringifolia',
   'Peacock Flower',
   '60 feet tall',
   'Australia'],
  ['Dinosaur Food',
   'Gunnera manicata',
   'Giant Rhubarb',
   '8 foot tall by 13 feet wide',
   'Brazil']])

books:


(['TITLE', 'AUTHOR', 'YEAR', 'ISBN'],
 [['Python Programming: An Introduction to Computer Science',
   'John M. Zelle',
   '2004',
   '1887902996'],
  ['Programming Python: Powerful Object-Oriented Programming',
   'Mark Lutz',
   '2010',
   '1449302750'],
  ['Core Python Programming', 'Wesley J Chun', '2006', '0137061595'],
  ['Expert Python Programming', 'Tarek Ziade', '2008', '1847194958'],
  ['Python for Kids: A Playful Introduction to Programming',
   'Jason R. Briggs',
   '2013',
   '1593274076'],
  ['Learning Python: Powerful Object-Oriented Programming',
   'Mark Lutz',
   '2013',
   '1449355692'],
  ['Python Programming in Context',
   'Bradley N. Miller & \u200eDavid L. Ranum',
   '2010',
   '1449660347'],
  ['Python Programming for the Absolute Beginner: Third Edition',
   'Michael Dawson',
   '2010',
   '1435456017'],
  ['Python Programming On Win32: Help for Windows Programmers',
   'Mark Hammond & \u200eAndy Robinson',
   '2000',
   '1565926218'],
  ['Functional Python Pr

ledzep:


(['ALBUM_TITLE', 'RELEASE_DATE', 'LABEL', 'CATEGORY'],
 [['Led Zeppelin', '12 January 1969', 'Atlantic', 'Studio'],
  ['Led Zeppelin II', '22 October 1969', 'Atlantic', 'Studio'],
  ['Led Zeppelin III', '5 October 1970', 'Atlantic', 'Studio'],
  ['Led Zeppelin IV', '8 November 1971', 'Atlantic', 'Studio'],
  ['Houses of the Holy', '28 March 1973', 'Atlantic', 'Studio'],
  ['Physical Graffiti', '24 February 1975', 'Swan Song', 'Studio'],
  ['The Song Remains the Same', '22 October 1976', 'Swan Song', 'Live'],
  ['Presence', '31 March 1976', 'Swan Song', 'Studio'],
  ['In Through the Out Door', '15 August 1979', 'Swan Song', 'Studio'],
  ['Coda', '19 November 1982', 'Swan Song', 'Studio'],
  ['Led Zeppelin Boxed Set', '7 September 1990', 'Atlantic', 'Compilation'],
  ['Led Zeppelin Remasters', '15 October 1990', 'Atlantic', 'Compilation'],
  ['Led Zeppelin Boxed Set 2', '21 September 1993', 'Atlantic', 'Compilation'],
  ['The Complete Studio Recordings',
   '24 September 1993',
   'Atlan

### Example usage: seed_database() function

Seed fresh data records into three database files.

Overwrite each current database file with a freshly-seeded file.

In [9]:
from IPython.display import HTML

!rm -f ./data/*.db

csv_files = ['garden', 'books', 'ledzep']

for file in csv_files:
    print(f'{file}:')

    seed_database('./data', file)

    db = Database(path='./data', name=file)
    display(HTML(db.to_html(True)))
    db = None

garden:
CREATE TABLE IF NOT EXISTS garden (id INTEGER PRIMARY KEY, name TEXT, species TEXT, type TEXT, size TEXT, habitat TEXT)
INSERT INTO garden VALUES (NULL, "Bugleweed", "Ajuga reptans", "Mint", "4 to 14 inches tall", "Woods and rough pastures")
INSERT INTO garden VALUES (NULL, "Scorpion Orchid", "Arachnis breviscapa", "Orchid", "n/a", "Island of Borneo")
INSERT INTO garden VALUES (NULL, "Gold Blossom Tree", "Barklya syringifolia", "Peacock Flower", "60 feet tall", "Australia")
INSERT INTO garden VALUES (NULL, "Dinosaur Food", "Gunnera manicata", "Giant Rhubarb", "8 foot tall by 13 feet wide", "Brazil")
SELECT * FROM garden


id,name,species,type,size,habitat
1,Bugleweed,Ajuga reptans,Mint,4 to 14 inches tall,Woods and rough pastures
2,Scorpion Orchid,Arachnis breviscapa,Orchid,,Island of Borneo
3,Gold Blossom Tree,Barklya syringifolia,Peacock Flower,60 feet tall,Australia
4,Dinosaur Food,Gunnera manicata,Giant Rhubarb,8 foot tall by 13 feet wide,Brazil


books:
CREATE TABLE IF NOT EXISTS books (id INTEGER PRIMARY KEY, title TEXT, author TEXT, year INTEGER, isbn INTEGER)
INSERT INTO books VALUES (NULL, "Python Programming: An Introduction to Computer Science", "John M. Zelle", "2004", "1887902996")
INSERT INTO books VALUES (NULL, "Programming Python: Powerful Object-Oriented Programming", "Mark Lutz", "2010", "1449302750")
INSERT INTO books VALUES (NULL, "Core Python Programming", "Wesley J Chun", "2006", "0137061595")
INSERT INTO books VALUES (NULL, "Expert Python Programming", "Tarek Ziade", "2008", "1847194958")
INSERT INTO books VALUES (NULL, "Python for Kids: A Playful Introduction to Programming", "Jason R. Briggs", "2013", "1593274076")
INSERT INTO books VALUES (NULL, "Learning Python: Powerful Object-Oriented Programming", "Mark Lutz", "2013", "1449355692")
INSERT INTO books VALUES (NULL, "Python Programming in Context", "Bradley N. Miller & ‎David L. Ranum", "2010", "1449660347")
INSERT INTO books VALUES (NULL, "Python Programm

id,title,author,year,isbn
1,Python Programming: An Introduction to Computer Science,John M. Zelle,2004,1887902996
2,Programming Python: Powerful Object-Oriented Programming,Mark Lutz,2010,1449302750
3,Core Python Programming,Wesley J Chun,2006,137061595
4,Expert Python Programming,Tarek Ziade,2008,1847194958
5,Python for Kids: A Playful Introduction to Programming,Jason R. Briggs,2013,1593274076
6,Learning Python: Powerful Object-Oriented Programming,Mark Lutz,2013,1449355692
7,Python Programming in Context,Bradley N. Miller & ‎David L. Ranum,2010,1449660347
8,Python Programming for the Absolute Beginner: Third Edition,Michael Dawson,2010,1435456017
9,Python Programming On Win32: Help for Windows Programmers,Mark Hammond & ‎Andy Robinson,2000,1565926218
10,Functional Python Programming,Steven Lott,2015,1784396992


ledzep:
CREATE TABLE IF NOT EXISTS ledzep (id INTEGER PRIMARY KEY, album_title TEXT, release_date TEXT, label TEXT, category TEXT)
INSERT INTO ledzep VALUES (NULL, "Led Zeppelin", "12 January 1969", "Atlantic", "Studio")
INSERT INTO ledzep VALUES (NULL, "Led Zeppelin II", "22 October 1969", "Atlantic", "Studio")
INSERT INTO ledzep VALUES (NULL, "Led Zeppelin III", "5 October 1970", "Atlantic", "Studio")
INSERT INTO ledzep VALUES (NULL, "Led Zeppelin IV", "8 November 1971", "Atlantic", "Studio")
INSERT INTO ledzep VALUES (NULL, "Houses of the Holy", "28 March 1973", "Atlantic", "Studio")
INSERT INTO ledzep VALUES (NULL, "Physical Graffiti", "24 February 1975", "Swan Song", "Studio")
INSERT INTO ledzep VALUES (NULL, "The Song Remains the Same", "22 October 1976", "Swan Song", "Live")
INSERT INTO ledzep VALUES (NULL, "Presence", "31 March 1976", "Swan Song", "Studio")
INSERT INTO ledzep VALUES (NULL, "In Through the Out Door", "15 August 1979", "Swan Song", "Studio")
INSERT INTO ledzep VA

id,album_title,release_date,label,category
1,Led Zeppelin,12 January 1969,Atlantic,Studio
2,Led Zeppelin II,22 October 1969,Atlantic,Studio
3,Led Zeppelin III,5 October 1970,Atlantic,Studio
4,Led Zeppelin IV,8 November 1971,Atlantic,Studio
5,Houses of the Holy,28 March 1973,Atlantic,Studio
6,Physical Graffiti,24 February 1975,Swan Song,Studio
7,The Song Remains the Same,22 October 1976,Swan Song,Live
8,Presence,31 March 1976,Swan Song,Studio
9,In Through the Out Door,15 August 1979,Swan Song,Studio
10,Coda,19 November 1982,Swan Song,Studio
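
One detail in the books output above deserves a closer look: the ISBN `0137061595` is inserted as text but displayed as `137061595`. This is consistent with SQLite's type affinity, where a column declared INTEGER stores integer-looking text as a number, which drops the leading zero. The sketch below uses the standard `sqlite3` module directly, not the `Database` class from db.ipynb, and the table name `demo` and its two columns are made up for illustration.

```python
import sqlite3

# In-memory database, so no file is created.
connection = sqlite3.connect(':memory:')
connection.execute('CREATE TABLE demo (isbn_int INTEGER, isbn_text TEXT)')

# Insert the same text value into an INTEGER column and a TEXT column.
connection.execute(
    'INSERT INTO demo VALUES (?, ?)', ('0137061595', '0137061595'))

row = connection.execute('SELECT isbn_int, isbn_text FROM demo').fetchone()
print(row)
connection.close()
```

The INTEGER column returns the number `137061595`, while the TEXT column keeps the string `'0137061595'`. If leading zeros matter for identifiers like ISBNs, storing such a column as TEXT is one option to consider.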


### Information about this notebook:

In [10]:
from IPython.display import FileLink

print('Additional notebooks in this application (click to open in a new tab):')
display(FileLink('Building a Desktop Database Application.ipynb'))
display(FileLink('db.ipynb'))

print('Associated files that contain the seed data (click to download):')
display(FileLink('./data/books.csv'))
display(FileLink('./data/garden.csv'))
display(FileLink('./data/ledzep.csv'))

license = '''
This software is licensed under the GNU GPLv3
(c) 2017 David Schenck https://github.com/zero2cx/

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program.  If not, see <https://www.gnu.org/licenses/>.
'''

