In [1]:
import polars as pl
import sqlite3
import sqlalchemy

# 9.1 Reading data from SQL databases

So far we've only talked about reading data from CSV files. That's a pretty common way to store data, but there are many others! Polars can read from JSON, Parquet, Excel (!!!), SQL databases, and a few other formats. In this chapter we'll talk about reading data from SQL databases.

You can read data from a SQL database using the `pl.read_database` function. `read_database` will automatically convert SQL column names to DataFrame column names.

`read_database` takes 2 arguments: a `SELECT` statement (`query`), and a database connection object (`connection`). This is great because it means you can read from *any* kind of SQL database -- it doesn't matter if it's MySQL, SQLite, PostgreSQL, or something else.

This example reads from a SQLite database, but any other database would work the same way.

In [2]:
con = sqlite3.connect("./data/weather_2012.sqlite")
df = pl.read_database(query="SELECT * from weather_2012 LIMIT 3", connection=con)
df

id,date_time,temp
i64,str,f64
1,"""2012-01-01 00:00:00""",-1.8
2,"""2012-01-01 01:00:00""",-1.8
3,"""2012-01-01 02:00:00""",-1.8


In [4]:
df.schema

Schema([('id', Int64), ('date_time', String), ('temp', Float64)])

Unlike pandas, Polars DataFrames have no index, so `read_database` has no `index_col` argument and the primary key (`id`) is read as an ordinary column. If you want the rows ordered by `id`, sort by that column. If the rows already come back in `id` order, as they do here, you can use `set_sorted` to tell Polars that the column is sorted without reordering anything.


In [5]:
df = pl.read_database(query="SELECT * from weather_2012 LIMIT 3", connection=con)
df = df.set_sorted("id")
df

id,date_time,temp
i64,str,f64
1,"""2012-01-01 00:00:00""",-1.8
2,"""2012-01-01 01:00:00""",-1.8
3,"""2012-01-01 02:00:00""",-1.8


# 9.2 Writing to a SQLite database

Polars DataFrames have a `write_database` method which creates a database table from a dataframe. Let's use it to move our 2012 weather data into SQL.

You'll notice that here the connection is passed as a URI string (`sqlite:///data/test_db.sqlite`) rather than as a `sqlite3` connection object, and that `if_table_exists="replace"` overwrites the table if it already exists. There are a ton of useful functions for reading and writing various kinds of data in Polars, and it's worth spending some time exploring them. (For the pandas equivalents, [see the documentation!](http://pandas.pydata.org/pandas-docs/stable/io.html))

In [6]:
# Load the 2012 weather data from CSV
weather_df = pl.read_csv('./data/weather_2012.csv')
# Write it to a SQLite table, replacing the table if it already exists
weather_df.write_database(
    table_name="weather_2012",
    connection="sqlite:///data/test_db.sqlite",
    if_table_exists="replace"
)

744

We can now read from the `weather_2012` table in `test_db.sqlite`, and we see that we get the same data back:

In [8]:
con = sqlite3.connect("./data/test_db.sqlite")
df = pl.read_database(query="SELECT * from weather_2012 LIMIT 3", connection=con)
df

Longitude (x),Latitude (y),Station Name,Climate ID,Date/Time (LST),Year,Month,Day,Time (LST),Temp (C),Temp Flag,Dew Point Temp (C),Dew Point Temp Flag,Rel Hum (%),Rel Hum Flag,Precip. Amount (mm),Precip. Amount Flag,Wind Dir (10s deg),Wind Dir Flag,Wind Spd (km/h),Wind Spd Flag,Visibility (km),Visibility Flag,Stn Press (kPa),Stn Press Flag,Hmdx,Hmdx Flag,Wind Chill,Wind Chill Flag,Weather
f64,f64,str,i64,str,i64,i64,i64,str,f64,null,f64,null,i64,null,null,null,i64,null,i64,null,f64,null,f64,null,null,null,i64,null,str
-73.75,45.47,"""MONTREAL/PIERRE ELLIOTT TRUDEA…",7025250,"""2012-03-01 00:00""",2012,3,1,"""00:00""",-5.5,,-9.7,,72,,,,5,,24,,4.0,,100.97,,,,-13,,"""Snow"""
-73.75,45.47,"""MONTREAL/PIERRE ELLIOTT TRUDEA…",7025250,"""2012-03-01 01:00""",2012,3,1,"""01:00""",-5.7,,-8.7,,79,,,,6,,26,,2.4,,100.87,,,,-13,,"""Snow"""
-73.75,45.47,"""MONTREAL/PIERRE ELLIOTT TRUDEA…",7025250,"""2012-03-01 02:00""",2012,3,1,"""02:00""",-5.4,,-8.3,,80,,,,5,,28,,4.8,,100.8,,,,-13,,"""Snow"""


The nice thing about having your data in a database is that you can do arbitrary SQL queries. This is especially handy if you're more familiar with SQL than with Polars. Here's an example of sorting by the Weather column:

<style>
    @font-face {
        font-family: "Computer Modern";
        src: url('http://mirrors.ctan.org/fonts/cm-unicode/fonts/otf/cmunss.otf');
    }
    div.cell{
        width:800px;
        margin-left:16% !important;
        margin-right:auto;
    }
    h1 {
        font-family: Helvetica, serif;
    }
    h4{
        margin-top:12px;
        margin-bottom: 3px;
       }
    div.text_cell_render{
        font-family: Computer Modern, "Helvetica Neue", Arial, Helvetica, Geneva, sans-serif;
        line-height: 145%;
        font-size: 130%;
        width:800px;
        margin-left:auto;
        margin-right:auto;
    }
    .CodeMirror{
            font-family: "Source Code Pro", source-code-pro,Consolas, monospace;
    }
    .text_cell_render h5 {
        font-weight: 300;
        font-size: 22pt;
        color: #4057A1;
        font-style: italic;
        margin-bottom: .5em;
        margin-top: 0.5em;
        display: block;
    }
    
    .warning{
        color: rgb( 240, 20, 20 )
        }  