### Objectives
---
This notebook explores the effect of setting grouped (multi-column) unique indices on MySQL tables.

The choice of unique index determines whether the insert function adds a new row or updates an existing one, so the index must match the intended behavior.

For example, with a unique index on (`date`, `product`, `delivery`), the table holds at most one price for each combination of those three columns.

Hence, when a row with an existing combination is inserted, `ON DUPLICATE KEY UPDATE` updates the stored values instead of adding a second row.
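
As a rough illustration of the two statements involved, the sketch below builds the MySQL text for a composite unique index and for a parameterized `INSERT ... ON DUPLICATE KEY UPDATE`. The helper names `build_unique_index_sql` and `build_upsert_sql` and the index name `uniq_key` are hypothetical and are not part of this notebook's `mysql` module; the `%s` placeholders assume a MySQLdb-style driver. The code only builds strings and does not connect to a database.

```python
# Hypothetical helpers: names and generated SQL are illustrative only,
# not taken from the notebook's own `mysql` module.

def build_unique_index_sql(table, key_cols, index_name="uniq_key"):
    """Return an ALTER TABLE statement that adds a composite unique index."""
    cols = ", ".join(f"`{c}`" for c in key_cols)
    return f"ALTER TABLE `{table}` ADD UNIQUE INDEX `{index_name}` ({cols})"


def build_upsert_sql(table, columns, key_cols):
    """Return a parameterized INSERT ... ON DUPLICATE KEY UPDATE statement.

    Key columns are left alone; every other column is overwritten with
    the incoming value when the key already exists.
    """
    updates = ", ".join(
        f"`{c}` = VALUES(`{c}`)" for c in columns if c not in key_cols
    )
    if not updates:
        # With the key on ALL columns there is nothing left to update.
        raise ValueError("no non-key columns to update")
    col_list = ", ".join(f"`{c}`" for c in columns)
    placeholders = ", ".join(["%s"] * len(columns))
    return (
        f"INSERT INTO `{table}` ({col_list}) VALUES ({placeholders}) "
        f"ON DUPLICATE KEY UPDATE {updates}"
    )


columns = ["date", "product", "delivery", "price", "updated"]
key_cols = ["date", "product", "delivery"]

index_sql = build_unique_index_sql("mpob_daily", key_cols)
upsert_sql = build_upsert_sql("mpob_daily", columns, key_cols)
print(index_sql)
print(upsert_sql)
```

The `VALUES(col)` form inside `ON DUPLICATE KEY UPDATE` is deprecated in recent MySQL 8.0 releases in favor of a row alias, but it remains widely supported.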

### Get data and select top 10 rows

In [2]:
import mpob
import mysql

df = mpob.get_daily_prices(2019, 2019) # months = list(range(1,2))
df = df.head(10)

Reading data for 2019-1 ... 
Massaging final frame ... 
All done! 3 DataFrames were downloaded.


### Connect to db and create table

In [7]:
db = mysql.DB("mpob_daily")
db.connect()
db.create()

<_mysql.connection open to '35.240.192.164' at 0x5576e7238d28>
Successfully created table: mpob_daily with schema date DATE, product VARCHAR(255), delivery DATE, price FLOAT, updated DATETIME


Table schema

### Insert data into db

In [8]:
for idx, row in df.iterrows():
    db.insert(row) # Default overwrite = N - query excludes ON DUPLICATE KEY UPDATE

Initial result set

### Case 1: Set the index on date, product, delivery

#### Update value for first row

Scenario: the MPOB website revises a published price at some point during the day.

In [9]:
gen = df.iterrows()
row = next(gen)[1]
row

date        2019-01-09 00:00:00
product                     CPO
delivery    2019-04-01 00:00:00
price                      2200
updated     2019-08-20 07:02:35
id                         NULL
Name: 8, dtype: object

In [10]:
# Price is now changed!
row['price'] = 9999
row['updated'] = '2019-08-20 16:25:07'
row

date        2019-01-09 00:00:00
product                     CPO
delivery    2019-04-01 00:00:00
price                      9999
updated     2019-08-20 16:25:07
id                         NULL
Name: 8, dtype: object

In [11]:
db.insert(row)

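The first row now carries the new price and the new `updated` timestamp. Because the unique index is on `date`, `product` and `delivery`, the insert matched the existing key and updated that row in place instead of adding a new one.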
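
The same behavior can be reproduced without a MySQL server. The sketch below uses Python's built-in `sqlite3` as a stand-in, so the syntax differs: SQLite's `ON CONFLICT ... DO UPDATE` plays the role of MySQL's `ON DUPLICATE KEY UPDATE`. The in-memory table and the index name `uniq_key` are assumptions made for illustration, and the values are the ones from the row above.

```python
import sqlite3

# In-memory stand-in for the mpob_daily table (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mpob_daily ("
    "date TEXT, product TEXT, delivery TEXT, price REAL, updated TEXT)"
)
# Case 1: unique index on the three key columns only.
conn.execute(
    "CREATE UNIQUE INDEX uniq_key ON mpob_daily (date, product, delivery)"
)

# SQLite's counterpart to MySQL's ON DUPLICATE KEY UPDATE.
UPSERT = (
    "INSERT INTO mpob_daily (date, product, delivery, price, updated) "
    "VALUES (?, ?, ?, ?, ?) "
    "ON CONFLICT (date, product, delivery) "
    "DO UPDATE SET price = excluded.price, updated = excluded.updated"
)

# Original price, then the revised price for the same key.
conn.execute(UPSERT, ("2019-01-09", "CPO", "2019-04-01", 2200.0, "2019-08-20 07:02:35"))
conn.execute(UPSERT, ("2019-01-09", "CPO", "2019-04-01", 9999.0, "2019-08-20 16:25:07"))

rows_case1 = conn.execute("SELECT price, updated FROM mpob_daily").fetchall()
print(rows_case1)  # [(9999.0, '2019-08-20 16:25:07')]
conn.close()
```

Only one row remains, holding the revised price and timestamp.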

### Case 2: Set the index on all columns

In [12]:
row['price'] = 8888
row['updated'] = '2019-08-20 20:25:07'
row

date        2019-01-09 00:00:00
product                     CPO
delivery    2019-04-01 00:00:00
price                      8888
updated     2019-08-20 20:25:07
id                         NULL
Name: 8, dtype: object

In [13]:
db.insert(row)

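With the index on all columns, the behavior is the same whether the insert uses overwrite = "Y" (with `ON DUPLICATE KEY UPDATE`) or overwrite = "N" (without it): the changed row is added as a new row.

This is because, with the index on ALL columns, a row that differs in any column (here `price` and `updated`) forms a new unique key, so no duplicate is detected and nothing is updated.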
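
A small experiment along the same lines, again using `sqlite3` as a stand-in for MySQL (so `ON CONFLICT ... DO UPDATE` replaces `ON DUPLICATE KEY UPDATE`). The helper `insert_twice` and the index name `uniq_all` are hypothetical. It inserts the 9999 row and then the 8888 row into a table whose unique index covers every column, once with a plain insert and once with an upsert.

```python
import sqlite3

COLS = "date, product, delivery, price, updated"

PLAIN_INSERT = f"INSERT INTO mpob_daily ({COLS}) VALUES (?, ?, ?, ?, ?)"
# Upsert variant; the conflict target is the all-column unique index.
UPSERT = PLAIN_INSERT + (
    f" ON CONFLICT ({COLS}) DO UPDATE SET updated = excluded.updated"
)


def insert_twice(statement):
    """Insert two versions of the same key into a fresh table; return prices."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mpob_daily ("
        "date TEXT, product TEXT, delivery TEXT, price REAL, updated TEXT)"
    )
    # Case 2: unique index on ALL columns.
    conn.execute(f"CREATE UNIQUE INDEX uniq_all ON mpob_daily ({COLS})")
    conn.execute(statement, ("2019-01-09", "CPO", "2019-04-01", 9999.0, "2019-08-20 16:25:07"))
    conn.execute(statement, ("2019-01-09", "CPO", "2019-04-01", 8888.0, "2019-08-20 20:25:07"))
    prices = [r[0] for r in conn.execute(
        "SELECT price FROM mpob_daily ORDER BY updated"
    )]
    conn.close()
    return prices


plain_prices = insert_twice(PLAIN_INSERT)
upsert_prices = insert_twice(UPSERT)
print(plain_prices, upsert_prices)  # [9999.0, 8888.0] [9999.0, 8888.0]
```

Both variants end up with two rows, since the second row never collides with the first.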

Note:
The insert function call has been simplified and no longer takes an overwrite option.

Previously, a boolean flag controlled whether `ON DUPLICATE KEY UPDATE` was appended to the `INSERT INTO` query.

To control whether existing values in the database are overwritten, define the unique index once the table has been created. Set unique_index(date, product, delivery) if prices can be overwritten, or set unique_index on ALL columns if updated prices are to be inserted as new rows.