<a href="https://colab.research.google.com/github/maratsmuk/daily_dashboards/blob/main/%5Bdaily_dashboards%5Ddb_generation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Generation of a test database for interactive dashboards embedded in banking systems 
## Created by M.S. Mukhametzhanov
### Result of this notebook: a database with the following tables: 
1. Accounts(id UNSIGNED INT PK, open_dt DATE NOT NULL, close_dt DATE NOT NULL)
2. Dates(dt DATE PK)
3. Transactions(id UNSIGNED INT PK, account_id UNSIGNED INT NOT NULL, dttm DATETIME NOT NULL, amt FLOAT)

Mount Google Drive, where the database will be stored:

In [1]:
from google.colab import drive
drive.mount('/content/drive')
!ls drive/MyDrive/daily_dashboards

Mounted at /content/drive
 accounts.csv					  dates.csv
'[daily_dashboards]db_generation.ipynb'		  DB_accounts.db
'[daily_dashboards]interactive_dashboard.ipynb'   transactions.csv
 datapane.yaml


Importing necessary libraries:

In [2]:
import pandas as pd
import datetime
import random
import sqlite3


Generate the Dates table. Dates are drawn at random from 2010/01/01 to 2023/12/31 and duplicates are discarded; today's date is added if it was not drawn. The size of the table can be increased manually through N_dates:

In [3]:
df_dates = pd.DataFrame(columns = ['dt'])
def random_date(start,end):
  return start + datetime.timedelta(days=random.randint(0, int((end - start).days)))
base = datetime.datetime(2010,1,1)
end = datetime.datetime(2023,12,31)
N_dates = 10000
date_list = list(set([random_date(base,end) for x in range(N_dates)]))
datenow = datetime.date.today()
datenow = datetime.datetime(datenow.year,datenow.month,datenow.day)
if datenow not in date_list:
  date_list.append(datenow)
date_list.sort()
df_dates['dt'] = date_list
print(df_dates.shape)
df_dates.head()

(4371, 1)


Unnamed: 0,dt
0,2010-01-01
1,2010-01-02
2,2010-01-04
3,2010-01-05
4,2010-01-06
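
The row count printed above is smaller than N_dates because the dates are drawn with replacement and duplicates are dropped by `set`. The sketch below estimates the expected number of distinct dates under the assumption of uniform draws. The helper name `expected_unique_dates` is hypothetical and not part of the notebook.

```python
import datetime

def expected_unique_dates(start, end, n_draws):
    """Expected number of distinct dates after n_draws uniform draws with replacement."""
    n_days = (end - start).days + 1  # both endpoints can be drawn, as with random.randint
    return n_days * (1 - (1 - 1 / n_days) ** n_draws)

base = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2023, 12, 31)
n_days = (end - base).days + 1            # number of calendar days in the range
estimate = expected_unique_dates(base, end, 10000)
print(n_days, round(estimate))
```

The range contains 5113 calendar days, so increasing N_dates can only bring the table closer to that ceiling (plus today's date). With 10000 draws the estimate comes out at roughly 4390, in line with the 4371 rows shown above.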


Generate the Accounts table. For each account, open_dt is chosen at random from the Dates table among the dates in the past, and close_dt is chosen at random among the later dates. N_accounts can be modified manually:

In [4]:
N_accounts = 10
accounts = []
for account_id in range(1,N_accounts+1):
  # random.randint includes both bounds, so the largest valid index is len(date_list)-1
  date_open_i = random.randint(0,len(date_list)-1)
  # Re-draw among earlier dates until the opening date lies in the past
  while date_list[date_open_i]>=datetime.datetime.now():
    date_open_i = random.randint(0,date_open_i)
  # The closing date is drawn after the opening date and capped at the last available date
  date_close_i = min(len(date_list)-1,date_open_i+random.randint(1,len(date_list)-date_open_i))
  accounts.append([account_id,date_list[date_open_i],date_list[date_close_i]])
df_accounts = pd.DataFrame(accounts)
df_accounts.rename(columns={0:'id',1:'open_dt',2:'close_dt'},inplace=True)
#df_accounts.set_index('id',drop=True,inplace=True)
df_accounts.head()

Unnamed: 0,id,open_dt,close_dt
0,1,2016-05-31,2023-03-24
1,2,2020-03-18,2022-08-01
2,3,2011-06-25,2013-03-21
3,4,2019-07-06,2019-08-22
4,5,2015-04-20,2023-02-02
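
The index arithmetic used to pick open_dt and close_dt can be checked in isolation. The sketch below wraps it in a hypothetical helper, `pick_open_close`, and runs it on a toy calendar with assumed values. It relies on the first date of the list lying in the past, otherwise the loop would not terminate.

```python
import datetime
import random

def pick_open_close(date_list, now):
    """Return (open_i, close_i): indices into a sorted date_list, open date before now."""
    open_i = random.randint(0, len(date_list) - 1)  # randint includes both bounds
    while date_list[open_i] >= now:
        open_i = random.randint(0, open_i)          # re-draw among earlier dates
    # Closing index is at least open_i + 1, capped at the last index of the list
    close_i = min(len(date_list) - 1,
                  open_i + random.randint(1, len(date_list) - open_i))
    return open_i, close_i

# Toy calendar (assumed values): 30 consecutive days, "now" is the 20th day
dates = [datetime.datetime(2023, 1, 1) + datetime.timedelta(days=d) for d in range(30)]
now = dates[19]
pairs = [pick_open_close(dates, now) for _ in range(1000)]
all_valid = all(dates[o] < now and o <= c < len(dates) for o, c in pairs)
print(all_valid)
```

Every pair satisfies open_dt < now and open_dt <= close_dt, and no index falls outside the list.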


Generate the Transactions table. Each transaction gets an account chosen at random from Accounts, a date chosen at random from Dates strictly between the account's open_dt and min(close_dt, today), and an amount drawn uniformly between -1000 and 1000. A random time of day (hours, minutes, seconds) is added to the transaction date. If too few dates lie strictly inside that interval, the transaction is placed on open_dt with no time offset. N_transactions can be chosen manually.

In [5]:
N_transactions = 1000
transactions = []
for transaction_id in range(1,N_transactions+1):
  account_id = random.randint(df_accounts.id.min(),df_accounts.id.max())
  dttm_min = df_accounts[df_accounts['id']==account_id].iloc[0,1]
  dttm_max = min(df_accounts[df_accounts['id']==account_id].iloc[0,2],datenow)
  amt = round(random.uniform(-1000,1000),2)
  # Positions of the first and last admissible dates of this account in date_list
  i_min = date_list.index(dttm_min)
  i_max = date_list.index(dttm_max)
  if i_min+1 >= i_max-1:
    # Fewer than two dates strictly inside the interval: fall back to the opening date
    dttm = date_list[i_min]
  else:
    # Random date strictly inside the interval plus a random time of day (at most 23:59:59)
    dttm = date_list[random.randint(i_min+1,i_max-1)]+datetime.timedelta(seconds = random.randint(0,86399))
  transactions.append([transaction_id,account_id,dttm,amt])
df_transactions = pd.DataFrame(transactions)
df_transactions.rename(columns={0:'id',1:'account_id',2:'dttm',3:'amt'},inplace=True)
#df_transactions.set_index('id',drop=True,inplace=True)
df_transactions.head()

Unnamed: 0,id,account_id,dttm,amt
0,1,7,2021-05-29 17:45:51,160.3
1,2,1,2018-11-03 05:08:54,-260.43
2,3,4,2019-08-11 15:06:23,-294.34
3,4,4,2019-08-13 21:04:09,195.38
4,5,7,2021-05-23 20:20:57,-288.46
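
The time of day is attached as an offset in seconds from midnight. An offset of exactly 86400 seconds lands on midnight of the following day, so the offset has to stay between 0 and 86399 for the transaction to remain on the chosen date. A minimal sketch follows; the helper name `add_random_time` is hypothetical.

```python
import datetime
import random

def add_random_time(day):
    """Attach a random time of day to a midnight datetime without leaving that day."""
    # randrange excludes the upper bound, so the largest offset is 86399 s (23:59:59)
    return day + datetime.timedelta(seconds=random.randrange(86400))

day = datetime.datetime(2021, 5, 29)
samples = [add_random_time(day) for _ in range(10000)]
same_day = all(s.date() == day.date() for s in samples)
# A full 86400-second offset moves the timestamp to the next calendar day
rolled_over = (day + datetime.timedelta(seconds=86400)).date() != day.date()
print(same_day, rolled_over)
```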


(Optional) Save the tables to CSV files:

In [6]:
df_dates.to_csv('drive/MyDrive/daily_dashboards/dates.csv')
df_accounts.to_csv('drive/MyDrive/daily_dashboards/accounts.csv')
df_transactions.to_csv('drive/MyDrive/daily_dashboards/transactions.csv')

Open or create a new database DB_accounts.db. Drop the tables if they already exist:

In [7]:
db_connection = sqlite3.connect('drive/MyDrive/daily_dashboards/DB_accounts.db')
cursor = db_connection.cursor()
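# IF EXISTS keeps the cell from failing when the database file has just been created
cursor.execute('''drop table if exists accounts''')
cursor.execute('''drop table if exists dates''')
cursor.execute('''drop table if exists transactions''')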
cursor.close()
db_connection.commit()
db_connection.close()
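
To confirm that this step leaves a clean database, the remaining tables can be listed from `sqlite_master`. The sketch below uses an in-memory database purely for illustration. It also shows that `DROP TABLE IF EXISTS` does not fail on a database that has never contained these tables, whereas a plain `DROP TABLE` raises `sqlite3.OperationalError` in that situation.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # in-memory database, used here only for illustration
cursor = conn.cursor()
for table in ('accounts', 'dates', 'transactions'):
    # IF EXISTS turns the statement into a no-op when the table is missing
    cursor.execute(f'DROP TABLE IF EXISTS {table}')
conn.commit()

# sqlite_master lists every table that currently exists in the database
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
remaining_tables = cursor.fetchall()
print(remaining_tables)
conn.close()
```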

Re-open the database, create the tables with their column definitions, and insert the generated rows. Print the first 5 rows of each table to verify that they were written correctly.

In [8]:
db_connection = sqlite3.connect('drive/MyDrive/daily_dashboards/DB_accounts.db')
cursor = db_connection.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS accounts
              (id UNSIGNED INT PRIMARY KEY, open_dt DATE NOT NULL, close_dt DATE NOT NULL)''')
for t in zip(df_accounts['id'],df_accounts['open_dt'],df_accounts['close_dt']):
  # Dates are stored as ISO 8601 text (YYYY-MM-DD)
  cursor.execute('''insert into accounts values (?, ?, ?)''',(t[0],t[1].date().isoformat(),t[2].date().isoformat()))
db_connection.commit()
cursor.close()
cursor = db_connection.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS dates
              (dt DATE PRIMARY KEY)''')

for t in df_dates['dt']:
  cursor.execute('''insert into dates values (?)''',(t.date().isoformat(),))
db_connection.commit()
cursor.close()

cursor = db_connection.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS transactions
              (id UNSIGNED INT PRIMARY KEY, account_id UNSIGNED INT NOT NULL, dttm DATETIME NOT NULL, amt FLOAT(8,2))''')
for t in zip(df_transactions['id'],df_transactions['account_id'],df_transactions['dttm'],df_transactions['amt']):
  cursor.execute('''insert into transactions values (?, ?, ?, ?)''',(t[0],t[1],str(t[2]),t[3]))
db_connection.commit()
cursor.close()

cursor = db_connection.cursor()
cursor.execute('''select * from accounts limit 5''')
print(cursor.fetchall())
cursor.execute('''select * from dates limit 5''')
print(cursor.fetchall())
cursor.execute('''select * from transactions limit 5''')
print(cursor.fetchall())
db_connection.close()


[(1, '2016-05-31', '2023-03-24'), (2, '2020-03-18', '2022-08-01'), (3, '2011-06-25', '2013-03-21'), (4, '2019-07-06', '2019-08-22'), (5, '2015-04-20', '2023-02-02')]
[('2010-01-01',), ('2010-01-02',), ('2010-01-04',), ('2010-01-05',), ('2010-01-06',)]
[(1, 7, '2021-05-29 17:45:51', 160.3), (2, 1, '2018-11-03 05:08:54', -260.43), (3, 4, '2019-08-11 15:06:23', -294.34), (4, 4, '2019-08-13 21:04:09', 195.38), (5, 7, '2021-05-23 20:20:57', -288.46)]
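
The dates in the output above come back as strings because SQLite has no dedicated date type: the declared column types only determine a type affinity. The sketch below inspects the actual storage classes with `typeof()` on an in-memory copy of the transactions table. The two rows are taken from the output above, and `executemany` is used here as an assumed alternative to the row-by-row loop.

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # in-memory database, used here only for illustration
conn.execute('''CREATE TABLE transactions
             (id UNSIGNED INT PRIMARY KEY, account_id UNSIGNED INT NOT NULL,
              dttm DATETIME NOT NULL, amt FLOAT(8,2))''')
rows = [(1, 7, '2021-05-29 17:45:51', 160.3),
        (2, 1, '2018-11-03 05:08:54', -260.43)]
# executemany inserts all rows with a single parameterized statement
conn.executemany('INSERT INTO transactions VALUES (?, ?, ?, ?)', rows)
conn.commit()

# typeof() reports the storage class SQLite actually used for each value
storage = conn.execute(
    'SELECT typeof(id), typeof(dttm), typeof(amt) FROM transactions LIMIT 1').fetchone()
print(storage)
conn.close()
```

Timestamps are kept as text, so any code reading this database has to parse them back into dates itself, for example with `pd.to_datetime`.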


# The database has been generated!