<div style="position: relative;">
<img src="https://user-images.githubusercontent.com/7065401/98728503-5ab82f80-2378-11eb-9c79-adeb308fc647.png"></img>

<h1 style="color: white; position: absolute; top:27%; left:10%;">
    MySQL and MariaDB for Python Developers
</h1>

<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:55%; left:10%;">
    David Mertz, Ph.D.
</h3>

<h3 style="color: #ef7d22; font-weight: normal; position: absolute; top:62%; left:10%;">
    Data Scientist
</h3>
</div>

# MySQL functions

MySQL provides a large number of built-in functions, and you may also define your own.  In this lesson we also look at MySQL views, which work nicely with functions.

In [1]:
import mysql.connector
cred = dict(user='ine_student', password='ine-password', database='ine', host='localhost')
from collections import namedtuple

conn = mysql.connector.connect(**cred)
cur = conn.cursor()

We will use a small function presented in a prior lesson.

In [2]:
import pandas as pd
def table_schema(table_name):
    cur.execute(f"SHOW columns FROM {table_name}")
    info_cols = [c[0] for c in cur.description]
    schema = cur.fetchall()
    df = pd.DataFrame(schema, columns=info_cols)
    # Cleaner to show DataFrame with str rather than bytes
    df['Type'] = df.Type.str.decode('utf-8')
    # And nullable as Bool value
    df['Null'] = df.Null == 'YES'
    return df
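Because `table_schema()` interpolates `table_name` directly into the SQL text, it is only safe with trusted names; DB-API placeholders such as `%s` cannot stand in for identifiers.  A minimal sketch of one way to guard this, assuming a hypothetical helper `quote_identifier()` that is not part of the original lesson:

```python
def quote_identifier(name):
    """Wrap a table or column name in backticks for use in a MySQL statement.

    Hypothetical helper for illustration.  Backticks inside the name are
    doubled, which is how MySQL escapes them within a quoted identifier.
    """
    if not name or "\x00" in name:
        raise ValueError(f"Not a usable identifier: {name!r}")
    return "`" + name.replace("`", "``") + "`"


print(quote_identifier("census_zipcode_integrity"))
```

Inside `table_schema()`, the query could then be built as `f"SHOW columns FROM {quote_identifier(table_name)}"`.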

## Built-in functions

The hundreds of functions available as built-ins in MySQL can be loosely broken out by the data type(s) they operate on.  For example, we have numeric functions like:

* ABS(): Return the absolute value
* CEIL(): Return the smallest integer value not less than the argument
* COS(): Return the cosine
* DEGREES(): Convert radians to degrees
* LN(): Return the natural logarithm of the argument
* MOD(): Return the remainder
* SIGN(): Return the sign of the argument

Other functions deal with string manipulation, datetimes, regular expression matching, geometry, and more.

Another special kind of function is an aggregation that takes many inputs—typically the many values in a query column—and combines them into a single value.  Particularly notable among those are `COUNT()`, `AVG()`, `MIN()`, `MAX()`, and `SUM()`.  But more specialized ones like `VAR_POP()` (population variance) or `BIT_XOR()` are also available.
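To make the idea of aggregation concrete, here is a rough Python analogue of what each aggregate computes over one column of values.  The sample values are made up for illustration, and the sketch ignores SQL details such as `NULL` handling and numeric types:

```python
from functools import reduce
from operator import xor
from statistics import mean, pvariance

values = [3, 5, 5, 8]   # made-up stand-in for the values in one query column

aggregates = {
    "COUNT": len(values),
    "AVG": mean(values),
    "MIN": min(values),
    "MAX": max(values),
    "SUM": sum(values),
    "VAR_POP": pvariance(values),        # population (not sample) variance
    "BIT_XOR": reduce(xor, values, 0),   # XOR of every value, bit by bit
}
for name, result in aggregates.items():
    print(f"{name:8} {result}")
```

In each case many inputs collapse into a single result, which is what distinguishes an aggregate from a row-by-row function like `ABS()`.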

## User-defined functions

MySQL allows you to write external functions in C or C++, and install them into a MySQL server.  Writing C/C++ code is outside the scope of this lesson.  If you happen to have functions you have written or obtained, installing them is simple, e.g.:

```sql
CREATE FUNCTION metaphon
  RETURNS STRING
  SONAME 'udf_example.so';
```

Let us show a function we can write purely in MySQL's own SQL dialect.  In earlier lessons, we created tables that have geographic information about United States zip codes.  This version only stores `POINT` data, but has generated columns for `lat` and `lon`.

In [3]:
table_schema('census_zipcode_integrity')

| | Field | Type | Null | Key | Default | Extra |
|---|---|---|---|---|---|---|
| 0 | zipcode | char(5) | False | PRI | | |
| 1 | usps | text | True | | | VIRTUAL GENERATED |
| 2 | aland | bigint | True | | | |
| 3 | awater | bigint | True | | | |
| 4 | aland_sqmi | decimal(8,3) | True | | | VIRTUAL GENERATED |
| 5 | awater_sqmi | decimal(8,3) | True | | | VIRTUAL GENERATED |
| 6 | lat | double | True | | | VIRTUAL GENERATED |
| 7 | lon | double | True | | | VIRTUAL GENERATED |
| 8 | location | point | True | | | |


To define a UDF (user-defined function) we need to have administrative privileges.  Those have not been granted to the user `ine_student`, so as root I do the following in the MySQL shell.  The specific math is not important, but the formula in `haversine()` is a standard way of measuring surface distances between latitude/longitude pairs.

```sql
mysql> DROP FUNCTION IF EXISTS haversine;
Query OK, 0 rows affected (0.02 sec)

mysql> DELIMITER $$
mysql> CREATE FUNCTION haversine(
    ->         lat1 FLOAT, lon1 FLOAT,
    ->         lat2 FLOAT, lon2 FLOAT
    ->      ) RETURNS FLOAT
    ->     NO SQL DETERMINISTIC
    ->     COMMENT 'Returns the distance in degrees on the Earth
    '>              between two known points of latitude and longitude'
    -> BEGIN
    ->     RETURN DEGREES(ACOS(
    ->               COS(RADIANS(lat1)) *
    ->               COS(RADIANS(lat2)) *
    ->               COS(RADIANS(lon2) - RADIANS(lon1)) +
    ->               SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))
    ->             ));
    -> END$$
Query OK, 0 rows affected (0.01 sec)

mysql> DELIMITER ;
```

This uses a feature of the `mysql` shell: temporarily redefining the statement delimiter from `;`, because otherwise the shell would take the first `;` inside the function body as the end of the SQL command.  If you have logged in as an administrator within Python, the delimiter issue should not matter via the DB-API interface.  Declaring the function `DETERMINISTIC` and `NO SQL` (does not make SQL queries internally) lets the optimizer handle it more efficiently.
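As a sketch of the DB-API route, assuming a cursor opened on a connection with the necessary privileges (this was not run as part of the lesson), each statement is passed to its own `execute()` call, so no `DELIMITER` command is involved.  The name `install_haversine()` is hypothetical:

```python
HAVERSINE_DDL = """
CREATE FUNCTION haversine(
        lat1 FLOAT, lon1 FLOAT,
        lat2 FLOAT, lon2 FLOAT
     ) RETURNS FLOAT
    NO SQL DETERMINISTIC
    COMMENT 'Returns the distance in degrees on the Earth
             between two known points of latitude and longitude'
BEGIN
    RETURN DEGREES(ACOS(
              COS(RADIANS(lat1)) *
              COS(RADIANS(lat2)) *
              COS(RADIANS(lon2) - RADIANS(lon1)) +
              SIN(RADIANS(lat1)) * SIN(RADIANS(lat2))
            ));
END
"""


def install_haversine(cursor):
    """Drop and recreate haversine(), one execute() call per statement.

    `cursor` is assumed to be a DB-API cursor belonging to an account
    that is allowed to create functions (not `ine_student`).
    """
    cursor.execute("DROP FUNCTION IF EXISTS haversine")
    # The whole CREATE FUNCTION text travels as a single statement, so the
    # `;` inside BEGIN ... END is not treated as a statement boundary here
    cursor.execute(HAVERSINE_DDL)
```

With an administrative connection, say one named `admin_conn`, the call would be `install_haversine(admin_conn.cursor())`.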

In an earlier lesson, we tried to find zip codes that are "close" to a given latitude and longitude.  However, as a compromise for that lesson, we simply used Cartesian distance with the Pythagorean formula.  On the surface of the Earth, this is wrong, because a degree of longitude covers less ground the farther you are from the equator.

We can use the MySQL function purely as a function.  For example:

In [4]:
mylat = 45.024212
mylon = -69.289848
cur.execute("SELECT haversine(%s, %s, 46, -70)", (mylat, mylon))
cur.fetchone()

(1.09535,)
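For a sanity check, the same expression can be mirrored in plain Python.  This is only a sketch: `haversine_py()` and `KM_PER_DEGREE` are names introduced here, the expression is the spherical-law-of-cosines form of the great-circle calculation used in the SQL function, and the conversion to kilometers assumes a mean Earth radius of 6371 km.

```python
from math import acos, cos, degrees, radians, sin


def haversine_py(lat1, lon1, lat2, lon2):
    """Python mirror of the SQL haversine(): angular distance in degrees."""
    cos_angle = (cos(radians(lat1)) * cos(radians(lat2)) *
                 cos(radians(lon2) - radians(lon1)) +
                 sin(radians(lat1)) * sin(radians(lat2)))
    # Rounding can push the value a hair outside [-1, 1], where acos() fails
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return degrees(acos(cos_angle))


# Length of one degree of arc, assuming a mean Earth radius of 6371 km
KM_PER_DEGREE = 6371.0 * radians(1)

angle = haversine_py(45.024212, -69.289848, 46, -70)
print(f"{angle:.4f} degrees, roughly {angle * KM_PER_DEGREE:.0f} km")
```

The result should land close to the 1.09535 returned by MySQL; small differences are expected because the SQL function works with single-precision `FLOAT` values.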

In contrast, here is the Pythagorean formula distance:

In [5]:
from math import sqrt
def pythag(lat1, lon1, lat2, lon2):
    return sqrt((lat1-lat2)**2 + (lon1-lon2)**2)

pythag(mylat, mylon, 46, -70)

1.206846338208802

Those are definitely different, and the gap becomes starker as the actual distance increases.

In [6]:
cur.execute("SELECT haversine(%s, %s, 80, -120)", (mylat, mylon))
print("Haversine:", cur.fetchone()[0])
print("Cartesian:", pythag(mylat, mylon, 80, -120))

Haversine: 39.2511
Cartesian: 61.60215306370425


Let us use the function to answer the question of "what zip codes are near me" more accurately than in the earlier lesson.

In [7]:
sql_near = """
SELECT zipcode, aland_sqmi, awater_sqmi, 
       haversine(%s, %s, lat, lon) as distance
FROM census_zipcode_integrity
ORDER BY distance;
"""
cur.execute(sql_near, (mylat, mylon))
pd.DataFrame(cur.fetchall(), 
             columns=[c[0] for c in cur.description]).set_index('zipcode')

| zipcode | aland_sqmi | awater_sqmi | distance |
|---|---|---|---|
| 04930 | 53.969 | 2.373 | 0.020158 |
| 04928 | 38.712 | 0.767 | 0.083780 |
| 04479 | 38.441 | 1.359 | 0.100014 |
| 04939 | 37.670 | 0.269 | 0.106723 |
| 04923 | 21.358 | 0.071 | 0.110290 |
| ... | ... | ... | ... |
| 96910 | 15.120 | 0.053 | 113.896000 |
| 96915 | 82.008 | 6.421 | 113.982000 |
| 96928 | 0.459 | 0.001 | 113.993000 |
| 96917 | 17.947 | 0.048 | 114.041000 |
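The query above returns every zip code in the table.  If only the closest few are wanted, the filtering can be left to MySQL.  A minimal sketch, assuming a hypothetical helper `nearest_zipcodes()` and arbitrary default cutoffs; it relies on MySQL allowing `HAVING` to refer to a column alias:

```python
def nearest_zipcodes(cursor, lat, lon, max_degrees=0.25, limit=10):
    """Return up to `limit` (zipcode, distance) rows within `max_degrees`.

    Hypothetical helper; `cursor` is assumed to be a DB-API cursor on the
    `ine` database, where haversine() has already been created.
    """
    sql = """
        SELECT zipcode, haversine(%s, %s, lat, lon) AS distance
        FROM census_zipcode_integrity
        HAVING distance <= %s
        ORDER BY distance
        LIMIT %s
    """
    cursor.execute(sql, (lat, lon, max_degrees, limit))
    return cursor.fetchall()
```

With the cursor from this lesson, the call would look like `nearest_zipcodes(cur, mylat, mylon, limit=5)`.  MySQL still computes the distance for every row before filtering, so this trims the result set rather than the work done.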


## Views

A view is a virtualized table that is only generated when it is accessed.  Among other benefits, this lets us include function calls in queries without a user needing to think about those functions.  Views are also commonly useful when they are the result of JOINs, GROUP BYs, subqueries, and other more complex constructions.  The user of the virtual table does not need to think about how it is constructed, and can simply use it as if it were a simple table.

In [8]:
sql_hashes = """
CREATE OR REPLACE VIEW book_hashes (book_id, para_num, excerpt, sha1) AS 
SELECT book_id, para_num, left(para_text, 40), sha1(para_text)
FROM books;
"""
cur.execute(sql_hashes)

In [9]:
sql = """
SELECT para_num, excerpt, sha1 
FROM book_hashes 
ORDER BY para_num
LIMIT 500
OFFSET 1000;
"""
cur.execute(sql)
pd.DataFrame(cur.fetchall(), 
             columns=[c[0] for c in cur.description]).set_index('para_num')

| para_num | excerpt | sha1 |
|---|---|---|
| 1000 | The defects of written speech which have | cf84b2c5c56386d085ca15ea43483343627631ce |
| 1001 | The advantages of a fixed orthography ar | a1f30479d3dc089d8715f38fb55d8039afb1e5a6 |
| 1002 | On the whole, it is true that the natura | 9bdeefca3a7a7840f4a889327580e46107bdfa0a |
| 1003 | If we should institute a comparison betw | 2cb0e4c5e0b73fe08c16d4865e209b895a827d48 |
| 1004 | One of the most obvious difficulties tha | 78c45a50080537d1db61d5f7458788abbdb07c73 |
| ... | ... | ... |
| 1495 | Gossip, 337 | 571ab80c4ae1ba77575e69dd3d598fc928c5d117 |
| 1496 | Gradation of vowel-sound, effect of, o | 5d6bc0d9ed2090dae4fbaa1dbe8f37ffb42948ae |
| 1497 | Grain, 44 | 02b86ddb759336158ec80540f5f5e4092da5a3d8 |
| 1498 | Grammars, all incomplete, 6;\n histo | 432dbad0941cbe7d19bf6719a23f36af95144560 |
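The point that a view is generated only when accessed can be seen in a small experiment.  The sketch below uses Python's built-in `sqlite3` module purely as a stand-in that needs no server; SQLite's syntax differs slightly from MySQL's (for example `substr()` rather than `left()`), and the table contents and the view name `book_excerpts` are made up for illustration.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE books (book_id INTEGER, para_num INTEGER, para_text TEXT)")
demo.execute(
    "INSERT INTO books VALUES (1, 0, 'The defects of written speech')")

# The view stores only the query text, not a copy of its results
demo.execute("""
    CREATE VIEW book_excerpts AS
    SELECT book_id, para_num, substr(para_text, 1, 11) AS excerpt
    FROM books
""")
before = demo.execute("SELECT count(*) FROM book_excerpts").fetchone()[0]

# A row added to the base table afterwards is visible through the view
demo.execute(
    "INSERT INTO books VALUES (1, 1, 'The advantages of a fixed orthography')")
after = demo.execute("SELECT count(*) FROM book_excerpts").fetchone()[0]
excerpts = [row[0] for row in demo.execute(
    "SELECT excerpt FROM book_excerpts ORDER BY para_num")]
print(before, after, excerpts)
```

The view picks up the second row without being recreated, because its query is evaluated each time the view is read.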


## Summary

Custom functions can be a powerful enhancement to those MySQL provides.  Combining these with views can provide a simple face to quite complex underlying queries and synthesis of data.