## Storing and retrieving imported data in an RDBMS

#### Storing and retrieving data with pandas and SQLite

#### Tags:
    Data: labeled data, Kaggle competition
    Technologies: python, pandas, sqlite
    Techniques: Storing and retrieving data in a relational database

In [61]:
import pandas as pd
import sqlite3 as lite
import os


In [4]:
'''
With pandas we can import data from, e.g., a .csv file and then store it in a SQLite database table.
'''

births = pd.read_csv('births.csv')
# Data taken from: https://www.kaggle.com/xvivancos/barcelona-data-sets#births.csv

births.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 734 entries, 0 to 733
Data columns (total 7 columns):
Year                 734 non-null int64
District Code        734 non-null int64
District Name        734 non-null object
Neighborhood Code    734 non-null int64
Neighborhood Name    734 non-null object
Gender               734 non-null object
Number               734 non-null int64
dtypes: int64(4), object(3)
memory usage: 40.2+ KB


In [5]:
births.head()

,Year,District Code,District Name,Neighborhood Code,Neighborhood Name,Gender,Number
0,2017,1,Ciutat Vella,1,el Raval,Boys,283
1,2017,1,Ciutat Vella,2,el Barri Gòtic,Boys,56
2,2017,1,Ciutat Vella,3,la Barceloneta,Boys,51
3,2017,1,Ciutat Vella,4,"Sant Pere, Santa Caterina i la Ribera",Boys,90
4,2017,2,Eixample,5,el Fort Pienc,Boys,117


In [34]:
def retrieve_all_tables(cur):
    '''
    Use the native sqlite3 cursor methods to return the table names as a list of tuples.
    '''
    cur.execute(
        '''SELECT name FROM sqlite_master WHERE type='table';'''
    )
    return cur.fetchall()

In [49]:
def retrieve_all_rows(con, table):
    '''
    Use the DataFrame-ready pandas methods to retrieve the results
    into a DataFrame.
    '''
    # The table name is concatenated into the SQL text, so pass only trusted names.
    sql = '''SELECT * FROM ''' + table + ''';'''

    df = pd.read_sql(sql, con)

    return df
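
Because the table name is spliced into the SQL string, it can help to check the name against `sqlite_master` before querying. The following is a minimal sketch of that idea, not part of the original notebook: the function name `retrieve_all_rows_checked`, the table `demo`, and the sample rows are all hypothetical.

```python
import sqlite3

import pandas as pd


def retrieve_all_rows_checked(con, table):
    """Read a whole table into a DataFrame, refusing unknown table names."""
    # Table names cannot be bound as SQL parameters, so look the name up
    # in sqlite_master first; the lookup itself uses a bound parameter.
    known = pd.read_sql(
        "SELECT name FROM sqlite_master WHERE type='table' AND name = ?;",
        con,
        params=(table,),
    )
    if known.empty:
        raise ValueError(f"no such table: {table!r}")
    # Double-quote the identifier (doubling any embedded quotes)
    # so that names containing spaces also work.
    quoted = '"' + table.replace('"', '""') + '"'
    return pd.read_sql(f"SELECT * FROM {quoted};", con)


# Small in-memory demo with made-up rows (not the births data).
demo_con = sqlite3.connect(":memory:")
pd.DataFrame({"Year": [2017, 2016], "Number": [283, 56]}).to_sql(
    "demo", con=demo_con, index=False
)
demo_df = retrieve_all_rows_checked(demo_con, "demo")
```

Calling `retrieve_all_rows_checked(demo_con, "missing")` raises `ValueError` instead of sending an arbitrary string to the database.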

In [30]:
# Connect to the local SQLite db (the file is created if it does not exist yet)

con = lite.connect('test.db')
cur = con.cursor()

In [33]:
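# 'births' is already listed below, presumably because test.db still held the table
# from an earlier run; on a fresh database this call returns []
retrieve_all_tables(cur)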

[('births',)]

In [None]:
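# Store the df in SQLite. By default the DataFrame index is written as an extra
# 'index' column, and to_sql raises ValueError if the table already exists.
births.to_sql('births',con=con)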
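
If the extra `index` column is not wanted, or the cell needs to be re-runnable, `to_sql` accepts `index=False` and `if_exists="replace"`. A minimal sketch, assuming an in-memory database and made-up sample rows rather than the real births data (the names `sample`, `mem_con`, and `births_sample` are hypothetical):

```python
import sqlite3

import pandas as pd

# Made-up sample rows standing in for the births data.
sample = pd.DataFrame(
    {"Year": [2017, 2017], "Gender": ["Boys", "Girls"], "Number": [283, 260]}
)

mem_con = sqlite3.connect(":memory:")

# index=False skips the extra "index" column.
# if_exists="replace" drops and recreates the table instead of raising ValueError,
# so running the same statement twice is safe.
sample.to_sql("births_sample", con=mem_con, index=False, if_exists="replace")
sample.to_sql("births_sample", con=mem_con, index=False, if_exists="replace")

# PRAGMA table_info yields one row per column; position 1 holds the column name.
stored_columns = [
    row[1] for row in mem_con.execute("PRAGMA table_info('births_sample');")
]
row_count = mem_con.execute("SELECT COUNT(*) FROM births_sample;").fetchone()[0]
mem_con.close()
```

With these options the stored table has exactly the DataFrame's columns, and the second call overwrites the first rather than duplicating rows.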

In [37]:
# Check that the table 'births' has now been created
retrieve_all_tables(cur)

[('births',)]

In [52]:
births_retrieved = retrieve_all_rows(con,'births')

In [53]:
# Check the datatype of the retrieved data
type(births_retrieved)

pandas.core.frame.DataFrame

In [54]:
# Check that the retrieved df resembles the original
# (the extra 'index' column is the DataFrame index written by to_sql)
births_retrieved.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 734 entries, 0 to 733
Data columns (total 8 columns):
index                734 non-null int64
Year                 734 non-null int64
District Code        734 non-null int64
District Name        734 non-null object
Neighborhood Code    734 non-null int64
Neighborhood Name    734 non-null object
Gender               734 non-null object
Number               734 non-null int64
dtypes: int64(5), object(3)
memory usage: 46.0+ KB
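
Comparing `info()` output by eye works, but the round trip can also be checked programmatically. The sketch below is one possible way to do that, assuming a small made-up DataFrame and an in-memory database; `original`, `restored`, and `births_check` are hypothetical names.

```python
import sqlite3

import pandas as pd

# Made-up rows standing in for the births data.
original = pd.DataFrame(
    {
        "Year": [2017, 2016],
        "District Name": ["Ciutat Vella", "Eixample"],
        "Number": [283, 117],
    }
)

check_con = sqlite3.connect(":memory:")
original.to_sql("births_check", con=check_con)  # default: index stored as "index"

# index_col turns the stored "index" column back into the DataFrame index.
restored = pd.read_sql("SELECT * FROM births_check;", check_con, index_col="index")
restored.index.name = None  # the original index has no name

# Compare values and labels only; dtypes and the index class may differ
# slightly after a trip through the database.
pd.testing.assert_frame_equal(
    original, restored, check_dtype=False, check_index_type=False
)
round_trip_ok = True
check_con.close()
```

`assert_frame_equal` raises `AssertionError` with a description of the first difference if the two frames do not match.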


In [62]:
# Close the connection, then remove the test database that was created
con.close()
os.remove('test.db')
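
For throwaway databases, cleanup can be made automatic. This is a sketch under assumptions, not the notebook's method: it uses a temporary directory and `contextlib.closing`, and the file name `scratch.db` and table `t` are hypothetical.

```python
import os
import sqlite3
import tempfile
from contextlib import closing

import pandas as pd

# Hypothetical scratch location; the notebook itself uses 'test.db'
# in the working directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    db_path = os.path.join(tmp_dir, "scratch.db")

    # closing() guarantees the connection is closed even if a statement raises.
    with closing(sqlite3.connect(db_path)) as scratch_con:
        pd.DataFrame({"Number": [1, 2, 3]}).to_sql("t", con=scratch_con, index=False)
        total = scratch_con.execute("SELECT SUM(Number) FROM t;").fetchone()[0]

    existed_before_cleanup = os.path.exists(db_path)

# The directory and the database file are removed when the outer with-block exits.
removed_after_cleanup = not os.path.exists(db_path)
```

Closing the connection before the file is deleted matters most on platforms that refuse to delete a file that is still open.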

In [63]:
# We now have a simple way of storing data in the RDBMS and retrieving it again.
# In the next project we will look at how to answer some questions using SQL on an RDBMS through Python.
