<a id='home'></a>

<a id='1'></a>

# Project: iTunes DB

In this small project, I build a PostgreSQL database from an iTunes library, which is stored as an XML file.
- The database is manipulated with the psycopg2 library.
- The database and the XML file are inspected with commands run directly from the shell (`psql`, `head`).

The workflow is as follows:

- First, the database and tables are initialized.
- Second, the library file is parsed and converted into a format that PostgreSQL can read.
- Third, the data is inserted into the appropriate tables of the database.
- (todo) Finally, some interesting queries are run to get an idea of the data.

#### Create the Database

In [1]:
!psql -c 'DROP DATABASE IF EXISTS itunes;' postgres

ERROR:  database "itunes" is being accessed by other users
DETAIL:  There is 1 other session using the database.


In [2]:
! psql -c 'CREATE DATABASE itunes;' postgres

ERROR:  database "itunes" already exists


In [3]:
# check the databases
! psql -l

                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   
-----------+----------+----------+-------------+-------------+-----------------------
 itunes    | samet    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 testdb    | samet    | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
(5 rows)



#### Create Tables

In [4]:
import psycopg2

In [5]:
con = psycopg2.connect(database='itunes', user='samet' , host='/var/run/postgresql/')  
cur = con.cursor()

cur.execute ('''
DROP TABLE IF EXISTS Artist;
DROP TABLE IF EXISTS Album;
DROP TABLE IF EXISTS Track;

CREATE TABLE Artist ( 
    id SERIAL PRIMARY KEY ,
    name TEXT UNIQUE
);

CREATE TABLE Album ( 
    id SERIAL PRIMARY KEY, 
    artist_id INTEGER, 
    title TEXT UNIQUE
);

CREATE TABLE Track ( 
    id SERIAL PRIMARY KEY, 
    title TEXT UNIQUE, 
    album_id INTEGER, 
    len INTEGER, 
    rating INTEGER, 
    count INTEGER 
) ''' )  

con.commit()  
con.close()

In [6]:
# check the tables
! psql -c "\dt" itunes

        List of relations
 Schema |  Name  | Type  | Owner 
--------+--------+-------+-------
 public | album  | table | samet
 public | artist | table | samet
 public | track  | table | samet
(3 rows)



#### Check the XML file

In [7]:
! head -50 Library.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Major Version</key><integer>1</integer>
	<key>Minor Version</key><integer>1</integer>
	<key>Date</key><date>2015-11-24T11:12:10Z</date>
	<key>Application Version</key><string>12.3.1.23</string>
	<key>Features</key><integer>5</integer>
	<key>Show Content Ratings</key><true/>
	<key>Music Folder</key><string>file:///Users/csev/Music/iTunes/iTunes%20Music/</string>
	<key>Library Persistent ID</key><string>B7006C9E9799282E</string>
	<key>Tracks</key>
	<dict>
		<key>369</key>
		<dict>
			<key>Track ID</key><integer>369</integer>
			<key>Name</key><string>Another One Bites The Dust</string>
			<key>Artist</key><string>Queen</string>
			<key>Composer</key><string>John Deacon</string>
			<key>Album</key><string>Greatest Hits</string>
			<key>Genre</key><string>Rock</string>
			<key>Kind</ke

#### Parse the XML

In [8]:
import xml.etree.ElementTree as ET

In [9]:
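def lookup(d, key):
    # In a plist <dict>, every <key> element is followed by its value element.
    # Return the text of the element that follows the matching <key>, or None.
    found = False
    for child in d:
        if found:
            return child.text
        if child.tag == 'key' and child.text == key:
            found = True
    return None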
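
To see what `lookup` does, the sketch below runs it on a tiny hand-written track `dict`. The fragment is hypothetical: it is modeled on the entries shown in `Library.xml` above, not read from the file. Note that the returned values are always strings (or `None` when the key is absent).

```python
import xml.etree.ElementTree as ET

def lookup(d, key):
    # Same logic as the notebook's lookup(): return the text of the
    # element that immediately follows the <key> whose text equals `key`.
    found = False
    for child in d:
        if found:
            return child.text
        if child.tag == 'key' and child.text == key:
            found = True
    return None

# Hypothetical sample entry, hand-written for illustration.
sample = ET.fromstring(
    '<dict>'
    '<key>Track ID</key><integer>369</integer>'
    '<key>Name</key><string>Another One Bites The Dust</string>'
    '<key>Artist</key><string>Queen</string>'
    '</dict>'
)

track_id = lookup(sample, 'Track ID')  # text of the <integer> element, a string
name = lookup(sample, 'Name')          # text of the <string> element
missing = lookup(sample, 'Rating')     # key not present in the sample
```

Because a missing key yields `None`, the insert loop can skip incomplete entries with a simple `is None` check.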

In [10]:
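fname = "Library.xml"
stuff = ET.parse(fname)
# Track entries are <dict> elements nested three levels below the root:
# top-level dict -> "Tracks" dict -> one dict per track.
all = stuff.findall('dict/dict/dict')
print 'Dict count:', len(all)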

Dict count: 404
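
The path `'dict/dict/dict'` is resolved relative to the root `plist` element: the top-level `dict`, then the `Tracks` `dict`, then one `dict` per track. Here is a minimal sketch on a hand-written two-track document (hypothetical data with the same nesting, not the real library):

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature library with the same nesting as Library.xml.
doc = ET.fromstring(
    '<plist version="1.0"><dict>'
    '<key>Major Version</key><integer>1</integer>'
    '<key>Tracks</key>'
    '<dict>'
    '<key>369</key><dict><key>Track ID</key><integer>369</integer></dict>'
    '<key>371</key><dict><key>Track ID</key><integer>371</integer></dict>'
    '</dict>'
    '</dict></plist>'
)

# The root is <plist>; the path walks top-level dict -> Tracks dict -> track dicts.
entries = doc.findall('dict/dict/dict')

# Each entry holds a single <integer> here, so find() returns the Track ID value.
track_ids = [e.find('integer').text for e in entries]
```

Only elements at exactly that depth and with the tag `dict` at every step are matched, so the scalar settings at the top of the file are not included.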


#### Insert Values to the DB

In [11]:
con = psycopg2.connect(database='itunes', user='samet' , host='/var/run/postgresql/')  
cur = con.cursor()

for entry in all:
    if ( lookup(entry, 'Track ID') is None ) : continue

    name = lookup(entry, 'Name')
    artist = lookup(entry, 'Artist')
    album = lookup(entry, 'Album')
    count = lookup(entry, 'Play Count')
    rating = lookup(entry, 'Rating')
    length = lookup(entry, 'Total Time')
    
    if name is None or artist is None or album is None : 
        continue

    cur.execute('''INSERT INTO Artist (name) 
        VALUES ( %s ) 
        ON CONFLICT DO NOTHING;''', ( artist, ) )
    cur.execute('SELECT id FROM Artist WHERE name = (%s) ', (artist, ))
    artist_id = cur.fetchone()[0]

    cur.execute('''INSERT INTO Album (title, artist_id) 
        VALUES ( %s, %s ) 
        ON CONFLICT DO NOTHING;''', ( album, artist_id ) )
    cur.execute('SELECT id FROM Album WHERE title = %s ', (album, ))
    album_id = cur.fetchone()[0]

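    # On a title conflict, update the existing row with the new values
    # (EXCLUDED refers to the row that was proposed for insertion).
    cur.execute('''INSERT INTO Track
        (title, album_id, len, rating, count) 
        VALUES ( %s, %s, %s, %s, %s ) 
        ON CONFLICT (title)
        DO UPDATE SET len = EXCLUDED.len , rating = EXCLUDED.rating , count = EXCLUDED.count
        ;''', ( name, album_id, length, rating, count ) )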

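# Commit once after the loop and close the connection; a session left open
# blocks a later DROP DATABASE.
con.commit()
con.close()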
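
The loop combines two patterns: get-or-create for `Artist` and `Album` (`INSERT ... ON CONFLICT DO NOTHING` followed by a `SELECT` for the id) and an upsert for `Track`. The sketch below reproduces both with Python's built-in `sqlite3` instead of PostgreSQL and psycopg2, purely so that it runs without a server. This is a swapped-in backend, not the notebook's setup. It assumes SQLite 3.24 or newer for `ON CONFLICT`, uses `?` placeholders rather than `%s`, and trims the schema to two small tables. The helper names `get_or_create_artist` and `upsert_track` are hypothetical.

```python
import sqlite3

con = sqlite3.connect(':memory:')  # stand-in for the PostgreSQL connection
cur = con.cursor()
cur.executescript('''
CREATE TABLE Artist (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE Track (id INTEGER PRIMARY KEY, title TEXT UNIQUE, count INTEGER);
''')

def get_or_create_artist(cur, name):
    # Insert if missing, then read the id back (works for new and existing rows).
    cur.execute('INSERT INTO Artist (name) VALUES (?) '
                'ON CONFLICT DO NOTHING', (name,))
    cur.execute('SELECT id FROM Artist WHERE name = ?', (name,))
    return cur.fetchone()[0]

def upsert_track(cur, title, count):
    # `excluded` refers to the row that could not be inserted.
    cur.execute('INSERT INTO Track (title, count) VALUES (?, ?) '
                'ON CONFLICT (title) DO UPDATE SET count = excluded.count',
                (title, count))

# Hypothetical rows; repeating a name or title must not create duplicates.
queen = get_or_create_artist(cur, 'Queen')
queen_again = get_or_create_artist(cur, 'Queen')
upsert_track(cur, 'Another One Bites The Dust', 55)
upsert_track(cur, 'Another One Bites The Dust', 56)
con.commit()

cur.execute('SELECT title, count FROM Track')
rows = cur.fetchall()
con.close()
```

After the second `upsert_track` call the table still holds one row for the title, with the newer play count, and both artist lookups return the same id.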


In [12]:
# have a look
! psql -c "SELECT * FROM Track ORDER BY title LIMIT 10 " itunes

 id  |                     title                      | album_id |   len   | rating | count 
-----+------------------------------------------------+----------+---------+--------+-------
 102 | A Boy Named Sue (live)                         |       95 |  226063 |        |    37
 235 | A Brief History of Packets                     |      211 | 1004643 |        |      
 124 | Aguas De Marco                                 |      118 |  179408 |        |   407
 318 | Anant Agarwal                                  |      211 |  494000 |        |      
 245 | Andrew S. Tanenbaum on MINIX                   |      211 |  603000 |        |      
 212 | Andrew Tanenbaum: Writing the Book on Networks |      211 |  535040 |        |     4
 293 | Anil Jain: 25 Years of Biometric Recognition   |      211 |  661368 |        |      
 264 | An Interview with Don Waters                   |      264 | 1411082 |        |     2
   1 | Another One Bites The Dust                     |        1 |  

[Home](#home)