# Interactive Exploratory Data Analysis Dashboard

## To run the notebook without callback errors, first select the required column in each dropdown

### Introduction
This Jupyter Notebook is designed to create an interactive dashboard for Exploratory Data Analysis (EDA) of a given dataset. The dataset is assumed to be related to the banking sector with features like balance, income, age, etc. Also, the dataset contains a date-time column, which is used to create various time-based aggregates and visualizations.

The dashboard provides a variety of options for the user to choose and customize their EDA. These options include selecting the numeric column to explore, choosing the column to group the data by, selecting aggregation functions for time series analysis, and specifying a range of dates to focus on. Furthermore, the dashboard automatically places the five most interesting columns first for the initial exploration, based on a score that adds the standard deviation of each column to the absolute value of its skewness.

### Expected Input and Output
Input: A pandas DataFrame `df` that contains your dataset. This DataFrame is expected to include a column with date-time information.

Output: A set of interactive graphs that provide insights about the data. These graphs include histograms, time series, monthly averages, group time series, correlation graphs, weekday averages, yearly trends, and box plots for different time frames.

### Instructions to use
Follow these steps to use the dashboard:

1. Import the necessary libraries and load your dataset into a pandas DataFrame named `df`.
2. Run the cells to preprocess the data and start the dashboard.
3. The dashboard will open in a new browser window.
4. Use the dropdown menus to select the column you want to explore, the column to group the data by, the aggregation functions for time series analysis, and the numeric column for correlation analysis.
5. Use the date range picker to select the date range you want to focus on.
6. The graphs will update automatically based on your selections.

Note: The dashboard assumes that your DataFrame `df` already exists in your environment and has been preprocessed before the dashboard starts. For example, missing values have been handled and categorical variables have been encoded where necessary. The `Time` column is expected to be in datetime format. If it is not, convert it before starting the dashboard.
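
The datetime conversion mentioned in the note can be sketched as follows. This is only a minimal example on a toy frame: the column name `Time` and the sample values are assumptions made for illustration, not the notebook's actual data.

```python
import pandas as pd

# Toy frame standing in for df; the 'Time' column and its values are illustrative only.
toy = pd.DataFrame({
    "Time": ["2022-08-12", "2022-08-13", "not a date"],
    "balance": [100.0, 250.0, 75.0],
})

# errors="coerce" turns entries that cannot be parsed into NaT instead of raising.
toy["Time"] = pd.to_datetime(toy["Time"], errors="coerce")

# Drop rows whose timestamp could not be parsed before building time-based aggregates.
toy = toy.dropna(subset=["Time"])

is_datetime = pd.api.types.is_datetime64_any_dtype(toy["Time"])
print(is_datetime, len(toy))
```

Whether to drop or repair unparseable rows depends on the dataset; dropping is used here only to keep the sketch short.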

### Notice:

Note:
Since the batch date column does not contain enough distinct dates, I used the cancellation date column as the base date. This choice can be changed.

Because the dataset is small and the categorical variables have many distinct groups, some of the per-group line graphs are empty or incomplete. A group with only one data point produces no line at all, because there are no two points to connect. A group with only two valid points produces a single segment, since at least three points are needed to draw two segments.
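
One possible way to see in advance which groups can be drawn as a line is to count the distinct dates per group. The sketch below uses a made-up toy frame; the column names `date`, `group`, and `value` are assumptions for illustration and do not come from the dataset.

```python
import pandas as pd

# Toy data: group A has three dates, group B has two, group C has only one.
toy = pd.DataFrame({
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03",
                            "2022-01-01", "2022-01-02", "2022-01-01"]),
    "group": ["A", "A", "A", "B", "B", "C"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

# Number of distinct dates per group: a line needs at least two points.
points_per_group = toy.groupby("group")["date"].nunique()

drawable = points_per_group[points_per_group >= 2].index.tolist()
single_point = points_per_group[points_per_group < 2].index.tolist()
print(drawable, single_point)
```

Groups listed in `single_point` could be shown as markers instead of lines, or filtered out of the group dropdown.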

The age column in `df` is not numeric, and there are no columns for marital status or loan status, so the analyses that depend on them cannot be completed with this dataset.

Explanation:
To find the five most interesting columns, I first defined "interesting" as the sum of a column's standard deviation and the absolute value of its skewness. I computed this score for every numeric column, sorted the scores, and selected the five highest-scoring columns.
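
The scoring rule can be illustrated on synthetic data. This is a small sketch, not the notebook's own cell: the frame `toy`, its column names, and the helper `interest_score` are hypothetical and exist only for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic columns with clearly different spread and asymmetry (illustrative only).
toy = pd.DataFrame({
    "narrow": rng.normal(0, 0.01, 200),         # tiny spread, roughly symmetric
    "symmetric": rng.normal(0, 1, 200),         # unit spread, roughly symmetric
    "skewed_wide": rng.exponential(50.0, 200),  # large spread, right-skewed
})

def interest_score(frame, column):
    """Standard deviation plus absolute skewness of one column."""
    return frame[column].std() + abs(frame[column].skew())

scores = {col: interest_score(toy, col) for col in toy.columns}
ranked = sorted(scores, key=scores.get, reverse=True)
top = ranked[:2]
print(top)
```

Because the standard deviation depends on the scale of a column, columns with large magnitudes tend to dominate this score. Standardizing the columns first would be one way to reduce that effect, if it is not desired.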



In [1]:
import pandas as pd
import os
from jupyter_dash import JupyterDash
from dash import dcc, html
from dash.dependencies import Input, Output
import plotly.graph_objects as go
import numpy as np
import dash_bootstrap_components as dbc

In [2]:
# os.makedirs('my_directory', exist_ok=True)

In [3]:
data_folder_name = 'feature_store_data'
dfs = []
# Load every parquet file in the data folder into its own DataFrame
for file in os.listdir(data_folder_name):
    dfs.append(pd.read_parquet(os.path.join(data_folder_name, file)))

In [4]:
for df in dfs:
    print(df.shape)

## This makes sense: the files are different versions of the data.
## @MAX: different versions of the mock data are mixed together; when the data is updated, the out folder should first be cleared of old data

(7812, 548)
(7710, 548)
(7192, 548)
(7268, 548)
(6750, 548)
(8346, 548)
(6416, 548)
(8448, 548)
(6328, 548)
(9032, 548)
(5764, 548)
(8948, 548)
(7368, 548)
(7416, 548)
(7524, 548)
(7070, 548)
(8102, 548)
(7994, 548)
(8598, 548)
(5998, 548)
(8644, 548)
(6550, 548)
(9022, 548)
(8784, 548)
(6182, 548)
(8210, 548)
(7636, 548)
(6926, 548)
(6838, 548)
(5988, 548)
(6126, 548)
(8562, 548)
(6550, 548)
(6466, 548)
(8930, 548)
(8844, 548)
(7308, 548)
(6696, 548)
(6634, 548)
(7574, 548)
(7496, 548)
(7900, 548)
(9016, 548)
(6216, 548)
(8776, 548)
(6850, 548)
(6940, 548)
(7624, 548)
(8222, 548)
(6112, 548)
(5984, 548)
(8858, 548)
(8940, 548)
(6472, 548)
(6568, 548)
(8544, 548)
(7904, 548)
(6654, 548)
(6696, 548)
(7394, 548)
(7294, 548)
(7498, 548)
(7596, 548)
(6752, 548)
(7268, 548)
(7176, 548)
(7724, 548)
(7810, 548)
(8968, 548)
(5756, 548)
(9032, 548)
(6326, 548)
(8444, 548)
(6424, 548)
(8340, 548)
(8008, 548)
(8126, 548)
(7392, 548)
(7074, 548)
(7502, 548)
(5884, 548)
(8632, 548)
(6024, 548)
(860

In [5]:
df = pd.concat(dfs)
print(df.shape)

(2606036, 548)


In [6]:
for col in df.columns:
    if pd.api.types.is_integer_dtype(df[col]) and df[col].dtype == np.int32:
        df[col] = df[col].astype('int64')


In [7]:
pd.set_option('display.max_columns', 15000)

In [8]:
na_counts = df.isnull().sum()

# Print the number of missing values for each column
for name, count in na_counts.items():
    print(name, count)


CFK_CIF_NBR 0
AZ_BATCH_DATE 0
CFMR_BIRTHDATE 0
CFMR_DATE_OPENED 0
CFMR_DEATH_DT 0
CFMR_DT_CLOSED 0
CFMR_LAST_NAME 0
CFMR_APPL_CODE 0
CFMR_APPL_TYPE 0
CFMR_AGE 0
CFMR_1042_CORR 0
CFMR_NOSOL_EMAIL 0
CFMR_NOSOL_MAIL 0
CFMR_NOSOL_PHONE 0
CFMR_AGENT_OFFICER 0
CFMR_AR_FM_HHMM 0
CFMR_AR_PIN_OFF 0
CFMR_BUS_PH_INTL 0
CFMR_BUS_PHONE_EXT 0
CFMR_CELL_PH_INTL 0
CFMR_CENSUS_TRACT 0
CFMR_CRA_NBR 0
CFMR_EDUC_LEVEL 0
CFMR_FAX_PH_INTL 0
CFMR_HOME_PH_INTL 0
CFMR_HOUSEHOLD_NO 0
CFMR_INCOME_LEVEL 0
CFMR_LIFESTYLE_CD 0
CFMR_LOC_CODE 0
CFMR_NAICS_CODE 0
CFMR_NBR_DEPEND 0
CFMR_OCCUP_CODE 0
CFMR_OPEN_BRANCH 0
CFMR_OPEN_REASON 0
CFMR_SIC_CODE 0
CFMR_SMSA_NBR 0
CFMR_STATE_CODE 0
CFMR_TAX_ID_KEY 0
CFMR_TAX_ID_TYPE 0
CFMR_OCCUPATION 0
CFMR_CITY 0
CFMR_COUNTRY 0
CFMR_STATE 0
CFMR_RES_CNTRY_ABR 0
CFMR_ORG_CNTRY_ABR 0
CFMR_COUNTY 0
CFMR_SUFFIX 0
CFMR_CRED_RATING 0
CFMR_PREFIX 0
CFMR_NRA_NO_INC_YR 0
CFMR_RACE 0
CFMR_EMP_CD 0
CFMR_MARITAL_STAT 0
CFMR_NOTE_IND 0
CFMR_AR_STATUS 0
CFMR_CERT_TYPE 0
CFMR_NOSOL_FAX 0
CFMR_NR

In [9]:
df = df.dropna(axis='columns')
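
The cell above drops every column that contains at least one missing value. Below is a minimal sketch of the same step on a toy frame (the frame `toy` and its columns are made up for illustration). It also lists only the columns that actually have missing values, which is easier to read than printing every count.

```python
import numpy as np
import pandas as pd

# Toy frame: 'a' is complete, 'b' has one missing value, 'c' has two.
toy = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1.0, np.nan, 3.0],
    "c": ["x", None, None],
})

na_counts = toy.isnull().sum()

# Keep only the columns that have missing values, most affected first.
with_missing = na_counts[na_counts > 0].sort_values(ascending=False)
print(with_missing)

# Same operation as in the notebook: drop columns containing any missing value.
cleaned = toy.dropna(axis="columns")
print(cleaned.columns.tolist())
```

Dropping whole columns is a coarse choice. For columns with only a few missing values, imputing or dropping rows may keep more information.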

In [10]:
from IPython.display import display
with pd.option_context('display.min_rows', 200):
    display(df.head())

Unnamed: 0,CFK_CIF_NBR,AZ_BATCH_DATE,CFMR_BIRTHDATE,CFMR_DATE_OPENED,CFMR_DEATH_DT,CFMR_DT_CLOSED,CFMR_LAST_NAME,CFMR_APPL_CODE,CFMR_APPL_TYPE,CFMR_AGE,CFMR_1042_CORR,CFMR_NOSOL_EMAIL,CFMR_NOSOL_MAIL,CFMR_NOSOL_PHONE,CFMR_AGENT_OFFICER,CFMR_AR_FM_HHMM,CFMR_AR_PIN_OFF,CFMR_BUS_PH_INTL,CFMR_BUS_PHONE_EXT,CFMR_CELL_PH_INTL,CFMR_CENSUS_TRACT,CFMR_CRA_NBR,CFMR_EDUC_LEVEL,CFMR_FAX_PH_INTL,CFMR_HOME_PH_INTL,CFMR_HOUSEHOLD_NO,CFMR_INCOME_LEVEL,CFMR_LIFESTYLE_CD,CFMR_LOC_CODE,CFMR_NAICS_CODE,CFMR_NBR_DEPEND,CFMR_OCCUP_CODE,CFMR_OPEN_BRANCH,CFMR_OPEN_REASON,CFMR_SIC_CODE,CFMR_SMSA_NBR,CFMR_STATE_CODE,CFMR_TAX_ID_KEY,CFMR_TAX_ID_TYPE,CFMR_OCCUPATION,CFMR_CITY,CFMR_COUNTRY,CFMR_STATE,CFMR_RES_CNTRY_ABR,CFMR_ORG_CNTRY_ABR,CFMR_COUNTY,CFMR_SUFFIX,CFMR_CRED_RATING,CFMR_PREFIX,CFMR_NRA_NO_INC_YR,CFMR_RACE,CFMR_EMP_CD,CFMR_MARITAL_STAT,CFMR_NOTE_IND,CFMR_AR_STATUS,CFMR_CERT_TYPE,CFMR_NOSOL_FAX,CFMR_NRA_CLASS,CFMR_OWN_RENT,CFMR_SEX,CFMR_STATUS,CFMR_AFF_INFO_IND,CFMR_FOREIGN_FLG,CFMR_FOREIGN_IND,CFMR_PRIVACY_IND,CFMR_ADDRESS_IND,CFMR_MAIL_ADDR_IND,CFMR_NOSOL_BUS,CFMR_NOSOL_CELL,CFMR_NOSOL_PAGER,CFMR_CUSTOMER_ID,CFMR_ZIP,AMT_ACCT_CA,AMT_ACCT_OPEN_CA,AMT_ACCT_CLOSE_CA,MAX_ACCT_BAL_CA,MIN_ACCT_BAL_CA,AVG_ACCT_BAL_CA,SUM_ACCT_BAL_CA,SUM_DAYS_OVERDRAWN_PRV_YR_CA,SUM_DAYS_OVERDRAWN_YTD_CA,FRAUD_CA,MAX_OPEN_AMT_CA,MIN_OPEN_AMT_CA,AVG_OPEN_AMT_CA,MAX_IR_CA,MIN_IR_CA,AVG_IR_CA,MIN_DAYS_OPEN_CA,MAX_DAYS_OPEN_CA,AVG_DAYS_OPEN_CA,MAX_DEP_CA,MIN_DEP_CA,AVG_DEP_CA,MIN_DAYS_DEP_CA,MAX_DAYS_DEP_CA,AVG_DAYS_DEP_CA,MIN_DAYS_MONENT_TRANS_CA,MAX_DAYS_MONENT_TRANS_CA,AVG_DAYS_MONENT_TRANS_CA,ATM_CARD_CA,MAX_PREV_YTD_IR_CA,MIN_PREV_YTD_IR_CA,AVG_PREV_YTD_IR_CA,TRAN_ACCT_CA,MAX_YTD_IR_CA,MIN_YTD_IR_CA,AVG_YTD_IR_CA,MAX_BAL_YTD_CA,MIN_BAL_YTD_CA,AVG_BAL_YTD_CA,SUM_BAL_YTD_CA,MAX_BAL_PRV_CA,MIN_BAL_PRV_CA,AVG_BAL_PRV_CA,SUM_BAL_PRV_CA,IF_EMPLOYEE_CA,CASH_MNGMNT_CA,EXTND_OVERDRAFT_CA,NBR_WITHDRAWALS_OVERDARWN_CA,IF_LC_CODE_CA,SUM_NBR_CREDITS_CA,MIN_NBR_CREDITS_CA,MAX_NBR_CREDITS_CA,AVG_NBR_CREDITS_CA,SUM_NBR_DEBITS_CA,MAX_NBR_DEBITS_CA,MIN_NBR_DEBITS_CA,AVG_NBR_DEBITS_CA,SUM_NBR_DEP_ITEMS_CA,MAX_NBR_DEP_ITEMS_CA,MIN_NBR_DEP_ITEMS_CA,AVG_NBR_DEP_ITEMS_CA,SUM_NBR_DEPOSITS_CA,MAX_NBR_DEPOSITS_CA,MIN_NBR_DEPOSITS_CA,AVG_NBR_DEPOSITS_CA,SUM_OTC_CA,MAX_OTC_CA,MIN_OTC_CA,AVG_OTC_CA,MAX_INT_PR_STATE_CA,MIN_INT_PR_STATE_CA,AVG_INT_PR_STATE_CA,MAX_INT_YTD_STATE_CA,MIN_INT_YTD_STATE_CA,AVG_INT_YTD_STATE_CA,SUM_TIMES_NSF_LTD_CA,MAX_TIMES_NSF_LTD_CA,MIN_TIMES_NSF_LTD_CA,AVG_TIMES_NSF_LTD_CA,SUM_PATM_WD_CA,MAX_PATM_WD_CA,MIN_PATM_WD_CA,AVG_PATM_WD_CA,SUM_POS_WD_CA,MAX_POS_WD_CA,MIN_POS_WD_CA,AVG_POS_WD_CA,SUM_HARD_HOLDS_CA,MAX_HARD_HOLDS_CA,MIN_HARD_HOLDS_CA,AVG_HARD_HOLDS_CA,MAX_BAL_LAST_STMT_CA,MIN_BAL_LAST_STMT_CA,AVG_BAL_LAST_STMT_CA,SUM_BAL_LAST_STMT_CA,MIN_DORM_DAYS_CA,MAX_DORM_DAYS_CA,AVG_DORM_DAYS_CA,AMT_ACCT_RETAIL_CA,AMT_ACCT_COMMERCIAL_CA,MAX_EXC_ADJ_AMT_CA,MIN_EXC_ADJ_AMT_CA,AVG_EXC_ADJ_AMT_CA,MAX_NBR_OVC_CKS_CA,MIN_NBR_OVC_CKS_CA,AVG_NBR_OVC_CKS_CA
0,7006335,2022-08-12,2022-09-05,2022-06-23,2022-03-27,2022-08-05,192413.781313813,121080.50277171,182950.860276343,33,0,0,0,0,0,-954506,294768,0,0,0,0,-86523,0,0,0,0,0,11,0,0,721804,4,-576933,641027,-552023,0,0,-507255,2,0,10,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,0,0,0,0,0,7006335,vs8w5yuojx,1,1,0,228562.0,228562.0,228562.0,228562.0,-958070.0,139175.0,114278.0,179809.697259,179809.697259,179809.697259,137023.34408,137023.34408,137023.34408,296,296,296.0,72665.560899,72665.560899,72665.560899,326,326,326.0,319,319,319.0,1,59693.653624,59693.653624,59693.653624,1,10656.426398,10656.426398,10656.426398,66331.114073,66331.114073,66331.114073,66331.114073,171301.13845,171301.13845,171301.13845,171301.13845,1,1,1,838252.0,1,44902.0,44902.0,44902.0,44902.0,558473.0,558473.0,558473.0,558473.0,-125168.0,-125168.0,-125168.0,-125168.0,-408092.0,-408092.0,-408092.0,-408092.0,790156.0,790156.0,790156.0,790156.0,148026.878925,148026.878925,148026.878925,128247.317873,128247.317873,128247.317873,-771517.0,-771517.0,-771517.0,-771517.0,-590756.0,-590756.0,-590756.0,-590756.0,186527.0,186527.0,186527.0,186527.0,313790.0,313790.0,313790.0,313790.0,272571.0,272571.0,272571.0,272571.0,286,286,286.0,1,1,171493.0,171493.0,171493.0,-387675.0,-387675.0,-387675.0
1,7006337,2022-08-12,2021-06-09,2021-05-12,2021-06-27,2021-06-27,50957.187401005,25474.470996164,141970.104985779,35,0,0,0,0,0,800938,740806,0,0,0,0,271606,0,0,0,0,0,0,0,0,344226,0,-330730,-543125,-376472,0,8,210401,0,0,144,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,2,2,0,0,0,0,0,0,0,0,0,0,7006337,jt923pozgg,1,1,0,9157.0,9157.0,9157.0,9157.0,400525.0,-578249.0,862276.0,131197.400518,131197.400518,131197.400518,148544.057243,148544.057243,148544.057243,465,465,465.0,138361.379586,138361.379586,138361.379586,437,437,437.0,518,518,518.0,1,182098.57457,182098.57457,182098.57457,1,14935.367196,14935.367196,14935.367196,55092.93988,55092.93988,55092.93988,55092.93988,41992.789476,41992.789476,41992.789476,41992.789476,1,1,1,22980.0,1,-431083.0,-431083.0,-431083.0,-431083.0,-600459.0,-600459.0,-600459.0,-600459.0,-171603.0,-171603.0,-171603.0,-171603.0,8041.0,8041.0,8041.0,8041.0,712039.0,712039.0,712039.0,712039.0,18636.583046,18636.583046,18636.583046,538.464609,538.464609,538.464609,-107222.0,-107222.0,-107222.0,-107222.0,-534311.0,-534311.0,-534311.0,-534311.0,573235.0,573235.0,573235.0,573235.0,-599954.0,-599954.0,-599954.0,-599954.0,-900108.0,-900108.0,-900108.0,-900108.0,420,420,420.0,1,1,33929.0,33929.0,33929.0,806988.0,806988.0,806988.0
2,7006338,2022-08-12,2021-10-15,2021-06-04,2021-09-25,2021-08-16,116791.125076189,113320.627336352,70198.015788298,36,0,0,0,0,0,609277,-136514,0,0,0,0,765135,0,0,0,0,0,11,0,0,-74826,18,-101092,-602578,74324,0,1,-54364,0,26016,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,7006338,3sull82j5w,1,1,0,-13457.0,-13457.0,-13457.0,-13457.0,-879016.0,-486972.0,-785616.0,197086.005175,197086.005175,197086.005175,94401.140856,94401.140856,94401.140856,520,520,520.0,172981.954118,172981.954118,172981.954118,574,574,574.0,464,464,464.0,1,13839.750608,13839.750608,13839.750608,1,53298.598268,53298.598268,53298.598268,38289.145235,38289.145235,38289.145235,38289.145235,60423.985003,60423.985003,60423.985003,60423.985003,1,1,1,-453091.0,1,-985297.0,-985297.0,-985297.0,-985297.0,130863.0,130863.0,130863.0,130863.0,691379.0,691379.0,691379.0,691379.0,-886103.0,-886103.0,-886103.0,-886103.0,696412.0,696412.0,696412.0,696412.0,84288.554356,84288.554356,84288.554356,129415.374206,129415.374206,129415.374206,148195.0,148195.0,148195.0,148195.0,-132463.0,-132463.0,-132463.0,-132463.0,-726591.0,-726591.0,-726591.0,-726591.0,28831.0,28831.0,28831.0,28831.0,798088.0,798088.0,798088.0,798088.0,468,468,468.0,1,1,-496515.0,-496515.0,-496515.0,-120806.0,-120806.0,-120806.0
3,7006339,2022-08-12,2021-01-17,2021-04-07,2021-04-29,2021-03-21,4259.754373352,123020.605997751,146209.967851409,37,0,1,0,1,0,-971382,76246,0,0,0,0,-962713,0,0,0,0,0,0,0,0,-767945,0,-193862,185927,187428,0,0,179112,0,69,90,0,1,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,2,2,0,0,0,0,0,0,0,0,0,7006339,oos3s8nylq,1,1,0,174257.0,174257.0,174257.0,174257.0,42037.0,94523.0,-554210.0,15607.267697,15607.267697,15607.267697,58355.518973,58355.518973,58355.518973,451,451,451.0,74175.250064,74175.250064,74175.250064,451,451,451.0,485,485,485.0,1,76076.60025,76076.60025,76076.60025,1,198061.320854,198061.320854,198061.320854,114686.598959,114686.598959,114686.598959,114686.598959,77357.546795,77357.546795,77357.546795,77357.546795,1,1,1,53571.0,1,249390.0,249390.0,249390.0,249390.0,-143044.0,-143044.0,-143044.0,-143044.0,729681.0,729681.0,729681.0,729681.0,178528.0,178528.0,178528.0,178528.0,-95650.0,-95650.0,-95650.0,-95650.0,20184.179563,20184.179563,20184.179563,23772.997495,23772.997495,23772.997495,-993239.0,-993239.0,-993239.0,-993239.0,583875.0,583875.0,583875.0,583875.0,-474868.0,-474868.0,-474868.0,-474868.0,440853.0,440853.0,440853.0,440853.0,317082.0,317082.0,317082.0,317082.0,462,462,462.0,1,1,-462342.0,-462342.0,-462342.0,-465450.0,-465450.0,-465450.0
4,7006340,2022-08-12,2022-01-03,2022-04-05,2022-03-20,2022-01-28,43282.392350326,162401.090493371,16280.978220485,38,0,0,0,0,0,-361021,-937211,0,0,0,149,-965934,0,0,0,0,0,0,0,0,-417409,2,-590967,-81798,879370,0,0,-877617,0,0,19,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,7006340,8985jdhwkn,1,1,0,16847.0,16847.0,16847.0,16847.0,-84911.0,-349762.0,621795.0,168033.694102,168033.694102,168033.694102,128576.082345,128576.082345,128576.082345,470,470,470.0,40944.228128,40944.228128,40944.228128,512,512,512.0,547,547,547.0,1,171534.910485,171534.910485,171534.910485,1,181749.920804,181749.920804,181749.920804,19893.775853,19893.775853,19893.775853,19893.775853,43471.131283,43471.131283,43471.131283,43471.131283,1,1,1,966812.0,1,-829817.0,-829817.0,-829817.0,-829817.0,942302.0,942302.0,942302.0,942302.0,190382.0,190382.0,190382.0,190382.0,-626880.0,-626880.0,-626880.0,-626880.0,139643.0,139643.0,139643.0,139643.0,111975.373929,111975.373929,111975.373929,105457.126135,105457.126135,105457.126135,925532.0,925532.0,925532.0,925532.0,863581.0,863581.0,863581.0,863581.0,-60032.0,-60032.0,-60032.0,-60032.0,-67026.0,-67026.0,-67026.0,-67026.0,180200.0,180200.0,180200.0,180200.0,498,498,498.0,1,1,-525757.0,-525757.0,-525757.0,165926.0,165926.0,165926.0


In [11]:
Time = 'AZ_BATCH_DATE'

df[Time] = pd.to_datetime(df[Time])
df['year'] = df[Time].dt.year
df['quarter'] = df[Time].dt.quarter
df['month'] = df[Time].dt.month
df['week'] = df[Time].dt.isocalendar().week
df['day'] = df[Time].dt.day
df['hour'] = df[Time].dt.hour
df['day_of_week'] = df[Time].dt.dayofweek 
df['date'] = df[Time].dt.date
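
The calendar features derived above behave slightly differently from one another. The sketch below, on two made-up timestamps, shows that `dt.isocalendar().week` returns the nullable `UInt32` dtype (which is why a later cast to `int64` can be useful) and that ISO week numbers do not always match the calendar year. The variable names are illustrative only.

```python
import pandas as pd

# Two example dates: a Friday in August 2022 and the first Sunday of January 2021.
stamps = pd.Series(pd.to_datetime(["2022-08-12", "2021-01-03"]))

# ISO week numbers come back as the nullable pandas dtype 'UInt32'.
week = stamps.dt.isocalendar().week
week_dtype = str(week.dtype)

# Casting to int64 gives a plain NumPy integer column.
week_int64 = week.astype("int64")

# dayofweek counts from Monday=0 to Sunday=6.
day_of_week = stamps.dt.dayofweek

print(week_dtype, week_int64.tolist(), day_of_week.tolist())
```

Note that 2021-01-03 falls in ISO week 53 of the previous ISO year, while `dt.year` reports 2021. Grouping by `year` and `week` together can therefore place early-January rows in an unexpected bucket.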

In [1]:
"""
Exploratory Data Analysis Dashboard for Banking Dataset

This script creates a dashboard using Dash and Plotly to perform exploratory data analysis on a banking dataset.
The dashboard allows users to visualize and analyze the data using various graphs and interactive components.

Inputs:
    - df: pandas DataFrame containing the banking dataset

Outputs:
    - Dash application: An interactive dashboard for exploring the dataset

Constants:
    - Time: Name of the column containing the time information

"""




for col in df.columns:
    # Convert specific integer columns to int64 type
    if pd.api.types.is_integer_dtype(df[col]) and (df[col].dtype.name == 'UInt32' or df[col].dtype == np.int32):
        df[col] = df[col].astype('int64')

# Calculate the score for each column
def calculate_score(column):
    return df[column].std() + abs(df[column].skew())

# Get numeric columns only
numeric_columns = [col for col in df.columns if np.issubdtype(df[col].dtype, np.number)]
object_columns = [col for col in df.columns if df[col].dtype == 'object' and col != Time]

# Calculate the score for each column and sort them
scores = {col: calculate_score(col) for col in numeric_columns}
top_columns = sorted(scores, key=scores.get, reverse=True)[:5]

def get_filtered_options():
    new_options = []
    for col in object_columns:
        if df[col].nunique() <= 15:
            new_options.append({'label': col, 'value': col})
    return new_options

filtered_options = get_filtered_options()
default_value = filtered_options[0]['value'] if filtered_options else None

app = JupyterDash(__name__)

app.layout = dbc.Container([
    html.H1("Exploratory Data Analysis of Banking Dataset", style={'textAlign': 'center'}),
    # Report the actual shape of df instead of a hardcoded value
    html.P(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}", style={'fontSize': '18px'}),
    html.P('''
        This Dash app is designed to create an interactive dashboard for Exploratory Data Analysis (EDA) of a given dataset. 
        The dataset is assumed to be related to the banking sector with features like balance, income, age, etc. 
        Also, the dataset contains a date-time column, which is used to create various time-based aggregates and visualizations.
    ''', style={'fontSize': '18px'}),
    html.P('''
        The dashboard provides a variety of options for the user to choose and customize their EDA. 
        These options include selecting the numeric column to explore, choosing the column to group the data by, 
        selecting aggregation functions for time series analysis, specifying a range of dates to focus on, etc. 
        Furthermore, the dashboard will automatically select the top five most interesting columns for the initial exploration, 
        based on a score that adds the standard deviation of each column to the absolute value of its skewness.
    ''', style={'fontSize': '18px'}),

    html.P("What is the distribution of values in the numerical column across all clients? ", 
       style={'fontSize': '25px', 'fontWeight': 'bold'}),
    html.P("What is the average value of the numerical column for all clients?", 
       style={'fontSize': '25px', 'fontWeight': 'bold'}),
    html.P("What is the maximum value of the numerical column for all clients?", 
       style={'fontSize': '25px', 'fontWeight': 'bold'}),
    html.P("What is the minimum value of the numerical column for all clients?", 
       style={'fontSize': '25px', 'fontWeight': 'bold'}),

    
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select1',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
    ]),

    html.Button('Load/Hide Histogram for Distribution, Min, Max, and Mean', id='load-histogram-button', n_clicks=0),
    dcc.Graph(id='histogram'),

    html.P("How does the value of the numerical column vary over time?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select2',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Aggregation Functions for Time Series:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='agg-func-select',
                options=[{'label': i, 'value': i} for i in ['mean', 'median', 'std']],
                value=['mean'],
                multi=True
            ),
        ], width=6),
    ]),
    html.Button('Load/Hide Time Series', id='load-time-series-button', n_clicks=0),
    dcc.Graph(id='time-series'),
    
    
    html.P("How does the value of the numerical column change over time for a specific group of clients?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select4',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Group Column (Object Type):', style={'fontSize': 20}),
            dcc.Dropdown(
                id='group-column-select1',
                options=filtered_options,
                value=default_value
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Group Time Series', id='load-group-time-series-button', n_clicks=0),
    dcc.Graph(id='group-time-series'),
    
    html.P("What is the correlation between the numerical column and another numerical column?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select5',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
        
        dbc.Col([
            html.Label('Select Second Numeric Column for Correlation:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column2-select',
                options=[{'label': i, 'value': i} for i in numeric_columns],
                value=numeric_columns[1] if len(numeric_columns) > 1 else numeric_columns[0]
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Correlation Graph', id='load-correlation-button', n_clicks=0),
    dcc.Graph(id='correlation-graph'),
    
    html.P("What is the average value of the numerical column for each day of the week?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select6',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Year:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='global-year-select1',
                options=[{'label': 'All Years', 'value': 'ALL'}] +
                        [{'label': str(year), 'value': year} for year in df['year'].unique()],
                value=df['year'].min()
            ),
            dcc.DatePickerRange(
                id='date-picker-range-weekday',
                min_date_allowed=df['date'].min(),
                max_date_allowed=df['date'].max(),
                initial_visible_month=df['date'].min(),
                start_date=df['date'].min(),
                end_date=df['date'].max()
            )
        ], width=6)
    ]),
    html.Button('Load/Hide Weekday Average', id='load-weekday-average-button', n_clicks=0),
    dcc.Graph(id='weekday-average'),
    
    
    html.P("What is the distribution of values in the numerical column for different groups of clients?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select8',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Group Column (Object Type):', style={'fontSize': 20}),
            dcc.Dropdown(
                id='group-column-select2',
                options=filtered_options,
                value=default_value
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Grouped Box Plot', id='load-grouped-box-plot-button', n_clicks=0),
    dcc.Graph(id='grouped-box-plot'),
    
    
    html.P("What is the distribution of values in the numerical column for each day of the week?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select10',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Year:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='global-year-select3',
                options=[{'label': 'All Years', 'value': 'ALL'}] +
                        [{'label': str(year), 'value': year} for year in df['year'].unique()],
                value=df['year'].min()
            ),
            dcc.DatePickerRange(
                id='date-picker-range',
                min_date_allowed=df['date'].min(),
                max_date_allowed=df['date'].max(),
                initial_visible_month=df['date'].min(),
                start_date=df['date'].min(),
                end_date=df['date'].max()
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Day Week Box Plot', id='load-day-week-box-plot-button', n_clicks=0),
    dcc.Graph(id='day-week-box-plot'),
    
    html.P("What is the average value of the numerical column for each year?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select11',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
    ]),
    html.Button('Load/Hide Yearly Average Bar', id='load-yearly-average-bar-button', n_clicks=0),
    dcc.Graph(id='yearly-average-bar'),
    
    html.P("What is the distribution of values in the numerical column for each quarter of the year? ", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select12',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Year:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='global-year-select4',
                options=[{'label': 'All Years', 'value': 'ALL'}] +
                        [{'label': str(year), 'value': year} for year in df['year'].unique()],
                value=df['year'].min()
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Quarterly Box Plot', id='load-quarterly-box-plot-button', n_clicks=0),
    dcc.Graph(id='quarterly-box-plot'),
    
    html.P("What is the trend in the value of the numerical column over the days of the week?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select13',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Year:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='global-year-select5',
                options=[{'label': 'All Years', 'value': 'ALL'}] +
                        [{'label': str(year), 'value': year} for year in df['year'].unique()],
                value=df['year'].min()
            ),
            dcc.DatePickerRange(
                id='my-date-picker-range1',
                min_date_allowed=df['date'].min(),
                max_date_allowed=df['date'].max(),
                initial_visible_month=df['date'].min(),
                start_date=df['date'].min(),
                end_date=df['date'].max()
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Day of Week Trend', id='load-day-of-week-trend-button', n_clicks=0),
    dcc.Graph(id='day-of-week-trend'),
    
    html.P("What is the distribution of values in the numerical column for each year? ", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select14',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
    ]),
    html.Button('Load/Hide Yearly Distribution', id='load-yearly-distribution-button', n_clicks=0),
    dcc.Graph(id='yearly-distribution'),
    
    html.P("What is the average value of the numerical column for each month of the year?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select15',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Year:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='global-year-select6',
                options=[{'label': 'All Years', 'value': 'ALL'}] +
                        [{'label': str(year), 'value': year} for year in df['year'].unique()],
                value=df['year'].min()
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Monthly Average Bar', id='load-monthly-average-bar-button', n_clicks=0),
    dcc.Graph(id='monthly-average-bar'),
    
    html.P("What is the distribution of values in the numerical column for each week of the year?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select16',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0] if top_columns else None
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Year:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='global-year-select7',
                options=[{'label': 'All Years', 'value': 'ALL'}] +
                        [{'label': str(year), 'value': year} for year in df['year'].unique()],
                value=df['year'].min()
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Weekly Distribution', id='load-weekly-distribution-button', n_clicks=0),
    dcc.Graph(id='weekly-distribution'),
    
    html.P("What is the trend in the value of the numerical column over the days of the month?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select17',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0]  # single-select dropdown: default to one column name, not the whole list
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Year:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='global-year-select8',
                options=[{'label': 'All Years', 'value': 'ALL'}] +
                        [{'label': str(year), 'value': year} for year in df['year'].unique()],
                value=df['year'].min()
            ),
            dcc.DatePickerRange(
                id='my-date-picker-range2',
                min_date_allowed=df['date'].min(),
                max_date_allowed=df['date'].max(),
                initial_visible_month=df['date'].min(),
                start_date=df['date'].min(),
                end_date=df['date'].max()
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Day of Month Trend', id='load-day-of-month-trend-button', n_clicks=0),
    dcc.Graph(id='day-of-month-trend'),
    
    html.P("What is the distribution of values in the numerical column for each month of the year?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select18',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0]  # single-select dropdown: default to one column name, not the whole list
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Year:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='global-year-select9',
                options=[{'label': 'All Years', 'value': 'ALL'}] +
                        [{'label': str(year), 'value': year} for year in df['year'].unique()],
                value=df['year'].min()
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Monthly Distribution', id='load-monthly-distribution-button', n_clicks=0),
    dcc.Graph(id='monthly-distribution'),
    
    html.P("What is the average value of the numerical column for each day of the month?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select19',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0]  # single-select dropdown: default to one column name, not the whole list
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Year:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='global-year-select10',
                options=[{'label': 'All Years', 'value': 'ALL'}] +
                        [{'label': str(year), 'value': year} for year in df['year'].unique()],
                value=df['year'].min()
            ),
            dcc.DatePickerRange(
                id='my-date-picker-range3',
                min_date_allowed=df['date'].min(),
                max_date_allowed=df['date'].max(),
                initial_visible_month=df['date'].min(),
                start_date=df['date'].min(),
                end_date=df['date'].max()
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Daily Average Bar', id='load-daily-average-bar-button', n_clicks=0),
    dcc.Graph(id='daily-average-bar'),
    
    html.P("What is the distribution of values in the numerical column for each day of the month?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select20',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0]  # single-select dropdown: default to one column name, not the whole list
            ),
        ], width=6),
        dbc.Col([
            html.Label('Select Year:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='global-year-select11',
                options=[{'label': 'All Years', 'value': 'ALL'}] +
                        [{'label': str(year), 'value': year} for year in df['year'].unique()],
                value=df['year'].min()
            ),
            dcc.DatePickerRange(
                id='my-date-picker-range4',
                min_date_allowed=df['date'].min(),
                max_date_allowed=df['date'].max(),
                initial_visible_month=df['date'].min(),
                start_date=df['date'].min(),
                end_date=df['date'].max()
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Daily Distribution', id='load-daily-distribution-button', n_clicks=0),
    dcc.Graph(id='daily-distribution'),
    
    html.P("How does the value of the numerical column change over time for clients within specific ranges?", style={'fontSize': 25, 'fontWeight': 'bold'}),
    dbc.Row([
        dbc.Col([
            html.Label('Select Numeric Column:', style={'fontSize': 20}),
            dcc.Dropdown(
                id='numeric-column-select21',
                options=[{'label': i, 'value': i} for i in top_columns + [col for col in numeric_columns if col not in top_columns]],
                value=top_columns[0]  # single-select dropdown: default to one column name, not the whole list
            ),
        ], width=6),
        
        dbc.Col([
            html.Label('Select Number of Bins for Value Ranges:', style={'fontSize': 20}),
            dcc.Input(
                id='bin-count-input',
                type='number',
                value=5
            ),
        ], width=6)
    ]),
    html.Button('Load/Hide Range Time Series', id='load-range-time-series-button', n_clicks=0),
    dcc.Graph(id='range-time-series'), 
])
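
# Hypothetical helpers (not part of the original notebook): a minimal sketch of the
# two checks that the callbacks below repeat inline, namely "was the Load/Hide button
# clicked an odd number of times?" and "turn a date-picker string into a date".
# The names `should_render` and `parse_picker_date` are assumptions made for
# illustration; only the standard library is used.
from datetime import datetime


def should_render(n_clicks):
    """Return True when a Load/Hide button has been clicked an odd number of times."""
    return bool(n_clicks) and n_clicks % 2 == 1


def parse_picker_date(value):
    """Convert an ISO date string such as '2021-03-05' to a date; None stays None."""
    if not value:
        return None
    return datetime.fromisoformat(value).date()


# Usage sketch: should_render(3) is True (graph shown) and should_render(2) is False
# (graph hidden); parse_picker_date(None) is None, so a cleared picker can be skipped.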

app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('histogram', 'style'),
    Input('load-histogram-button', 'n_clicks')
)

@app.callback(
    Output('histogram', 'figure'),
    [Input('numeric-column-select1', 'value'),
     Input('load-histogram-button', 'n_clicks')]
)
def update_histogram(column, n_clicks):
    if n_clicks > 0 and n_clicks % 2 == 1:
        fig = go.Figure()
        fig.add_trace(go.Histogram(x=df[column], nbinsx=50, name=column))

        # Add lines and annotations for mean, min, and max
        shapes = []
        annotations = []

        # Mean
        shapes.append(dict(type='line', yref='paper', y0=0, y1=1, xref='x', x0=df[column].mean(), x1=df[column].mean()))
        annotations.append(dict(x=df[column].mean(), y=0.9, xref='x', yref='paper', showarrow=False, text='Mean: {:.2f}'.format(df[column].mean()), font=dict(color='black')))

        # Min
        shapes.append(dict(type='line', yref='paper', y0=0, y1=1, xref='x', x0=df[column].min(), x1=df[column].min(), line=dict(color='blue', dash='dash')))
        annotations.append(dict(x=df[column].min(), y=0.8, xref='x', yref='paper', showarrow=False, text='Min: {:.2f}'.format(df[column].min()), font=dict(color='blue')))

        # Max
        shapes.append(dict(type='line', yref='paper', y0=0, y1=1, xref='x', x0=df[column].max(), x1=df[column].max(), line=dict(color='red', dash='dash')))
        annotations.append(dict(x=df[column].max(), y=0.8, xref='x', yref='paper', showarrow=False, text='Max: {:.2f}'.format(df[column].max()), font=dict(color='red')))

        fig.update_layout(title='Histogram, Mean, Min, and Max of {}'.format(column), shapes=shapes, annotations=annotations)

        return fig
    else:
        return {}


    

app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('time-series', 'style'),
    Input('load-time-series-button', 'n_clicks')
)

@app.callback(
    Output('time-series', 'figure'),
    [Input('numeric-column-select2', 'value'),
     Input('agg-func-select', 'value'),
     Input('load-time-series-button', 'n_clicks')]
)
def update_time_series(column, agg_funcs, n_clicks):
    if n_clicks > 0 and n_clicks % 2 == 1:
        df_temp = df.copy()
        df_temp.set_index(Time, inplace=True) 

        fig = go.Figure()

        if not isinstance(agg_funcs, list):
            agg_funcs = [agg_funcs]

        for agg_func in agg_funcs:
            if agg_func in ['std', 'mean', 'median']:
                monthly_data = df_temp[column].resample('M').agg(agg_func)
                fig.add_trace(go.Scatter(x=monthly_data.index, y=monthly_data,
                                        mode='lines', name='Monthly {} of {}'.format(agg_func, column)))
        
        fig.update_layout(title_text='Time Series of {} with Selected Aggregation Functions'.format(column))
        return fig
    else:
        return {}





# Update the group time series graph
app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('group-time-series', 'style'),
    Input('load-group-time-series-button', 'n_clicks')
)

@app.callback(
    Output('group-time-series', 'figure'),
    [Input('numeric-column-select4', 'value'),
     Input('group-column-select1', 'value'),
     Input('load-group-time-series-button', 'n_clicks')]
)
def update_group_time_series(column, group_column, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        df_temp = df.copy()
        df_temp[Time] = pd.to_datetime(df_temp[Time])
        df_temp.set_index(Time, inplace=True)

        fig = go.Figure()

        # Get unique values of the group column
        groups = df_temp[group_column].unique()

        for group in groups:
            # Calculate aggregation result for each group
            group_data = df_temp[df_temp[group_column] == group][column]
            group_agg = group_data.resample('M').mean()  # Using mean as an example aggregation function, modify as needed
        
            group_agg_sorted = group_agg.sort_index()  # Sort the data by index

            fig.add_trace(go.Scatter(x=group_agg_sorted.index, y=group_agg_sorted, mode='lines', name='{} - {}'.format(group_column, group)))

        fig.update_layout(title_text='Time Series of {} on Different {} Groups'.format(column, group_column))
        return fig
    else:
        return {}


# Update the correlation graph
app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('correlation-graph', 'style'),
    Input('load-correlation-button', 'n_clicks')
)

@app.callback(
    Output('correlation-graph', 'figure'),
    [Input('numeric-column-select5', 'value'),
     Input('numeric-column2-select', 'value'),
     Input('load-correlation-button', 'n_clicks')]
)
def update_correlation_graph(column1, column2, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        correlation = df[[column1, column2]].corr().iloc[0, 1]  # Calculate correlation coefficient

        fig = go.Figure()
        fig.add_trace(go.Scatter(x=df[column1], y=df[column2], mode='markers', name='Data Points'))  # Plot data points
        fig.update_layout(title='Correlation between {} and {}'.format(column1, column2),
                          xaxis_title=column1, yaxis_title=column2)
        fig.add_annotation(x=0.9, y=0.1, xref='paper', yref='paper', text='Correlation: {:.2f}'.format(correlation),
                           showarrow=False, font=dict(color='black'))  # Add annotation for correlation coefficient

        return fig
    else:
        return {}

app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('weekday-average', 'style'),
    Input('load-weekday-average-button', 'n_clicks')
)

@app.callback(
    Output('weekday-average', 'figure'),
    [Input('numeric-column-select6', 'value'),
     Input('global-year-select1', 'value'),
     Input('date-picker-range-weekday', 'start_date'),
     Input('date-picker-range-weekday', 'end_date'),
     Input('load-weekday-average-button', 'n_clicks')]
)
def update_weekday_average(column, year, start_date, end_date, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        df_temp = df.copy()

        if year != 'ALL':
            df_temp = df_temp[df_temp['year'] == year]
        
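        # Apply each bound only if the date picker provides it (a cleared picker yields None)
        if start_date:
            df_temp = df_temp[df_temp['date'] >= pd.to_datetime(start_date).date()]
        if end_date:
            df_temp = df_temp[df_temp['date'] <= pd.to_datetime(end_date).date()]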

        weekday_avg = df_temp.groupby('day_of_week')[column].mean()

        fig = go.Figure()
        fig.add_trace(go.Bar(x=weekday_avg.index, y=weekday_avg, name='Average of {}'.format(column)))

        fig.update_layout(title_text='Average of {} for Each Day of the Week in {}'.format(column, year),
                          xaxis_title='Day of the Week', yaxis_title='Average')
        return fig
    else:
        return {}




app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('grouped-box-plot', 'style'),
    Input('load-grouped-box-plot-button', 'n_clicks')
)
@app.callback(
    Output('grouped-box-plot', 'figure'),
    [Input('numeric-column-select8', 'value'),
     Input('group-column-select2', 'value'),
     Input('load-grouped-box-plot-button', 'n_clicks')]
)
def update_grouped_box_plot(column, group_column, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        fig = go.Figure()

        # Get unique values of the group column in the desired order
        groups = df[group_column].unique()
        groups = sorted(groups)  

        for group in groups:
            group_data = df[df[group_column] == group][column]
            fig.add_trace(go.Box(y=group_data, name='{} - {}'.format(group_column, group)))

        fig.update_layout(title_text='Box Plot of {} for Different {} Groups'.format(column, group_column))
        return fig
    else:
        return {}



app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('day-week-box-plot', 'style'),
    Input('load-day-week-box-plot-button', 'n_clicks')
)
@app.callback(
    Output('day-week-box-plot', 'figure'),
    [
        Input('numeric-column-select10', 'value'),
        Input('global-year-select3', 'value'),
        Input('load-day-week-box-plot-button', 'n_clicks'),
        Input('date-picker-range', 'start_date'),
        Input('date-picker-range', 'end_date')
    ]
)
def update_day_week_box_plot(column, year, n_clicks=None, start_date=None, end_date=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        if year == 'ALL':
            df_temp = df.copy()
        else:
            df_temp = df[df['year'] == year]

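        # Apply each bound only if the date picker provides it (a cleared picker yields None)
        if start_date:
            df_temp = df_temp[df_temp['date'] >= pd.to_datetime(start_date).date()]
        if end_date:
            df_temp = df_temp[df_temp['date'] <= pd.to_datetime(end_date).date()]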

        fig = go.Figure()

        for day in range(7):  # Assuming Monday=0, Sunday=6
            day_data = df_temp[df_temp['day_of_week'] == day][column]
            fig.add_trace(go.Box(y=day_data, name='Day {}'.format(day + 1)))

        fig.update_layout(title_text='Box Plot of {} for Each Day of the Week in {}'.format(column, year))
        return fig
    else:
        return {}






app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('yearly-average-bar', 'style'),
    Input('load-yearly-average-bar-button', 'n_clicks')
)

@app.callback(
    Output('yearly-average-bar', 'figure'),
    [Input('numeric-column-select11', 'value'),
     Input('load-yearly-average-bar-button', 'n_clicks')]
)
def update_yearly_average_bar(column, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        df_temp = df.copy()
        df_temp.set_index(Time, inplace=True)
        yearly_avg = df_temp[column].resample('Y').mean()

        fig = go.Figure()

        # Check if yearly average is empty
        if yearly_avg.empty:
            fig.update_layout(title_text='Yearly Average of {} (No data available)'.format(column))
        else:
            fig.add_trace(go.Bar(x=yearly_avg.index.year, y=yearly_avg, name='Yearly Average of {}'.format(column)))
            fig.update_layout(title_text='Yearly Average of {}'.format(column))

        return fig
    else:
        return {}


app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('quarterly-box-plot', 'style'),
    Input('load-quarterly-box-plot-button', 'n_clicks')
)

@app.callback(
    Output('quarterly-box-plot', 'figure'),
    [Input('numeric-column-select12', 'value'),
     Input('global-year-select4', 'value'),
     Input('load-quarterly-box-plot-button', 'n_clicks')]
)
def update_quarterly_box_plot(column, year, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        if year == 'ALL':
            df_year = df.copy()
        else:
            df_year = df[df['year'] == year]

        fig = go.Figure()

        for quarter in range(1, 5):  # 1 to 4 quarters
            quarter_data = df_year[df_year['quarter'] == quarter][column]
            fig.add_trace(go.Box(y=quarter_data, name='Quarter {}'.format(quarter)))

        fig.update_layout(title_text='Box Plot of {} for Each Quarter of the Year in {}'.format(column, year))
        return fig
    else:
        return {}


app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('day-of-week-trend', 'style'),
    Input('load-day-of-week-trend-button', 'n_clicks')
)
@app.callback(
    Output('day-of-week-trend', 'figure'),
    [Input('numeric-column-select13', 'value'),
     Input('global-year-select5', 'value'),
     Input('load-day-of-week-trend-button', 'n_clicks'),
     Input('my-date-picker-range1', 'start_date'),
     Input('my-date-picker-range1', 'end_date')]
)
def update_day_of_week_trend(column, year, n_clicks=None, start_date=None, end_date=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        if year == 'ALL':
            df_year = df.copy()
        else:
            # Combine the filters step by step so a cleared date picker (None) is skipped
            # instead of being AND-ed with the raw date column
            mask = df['year'] == year
            if start_date:
                mask &= df['date'] >= pd.to_datetime(start_date).date()
            if end_date:
                mask &= df['date'] <= pd.to_datetime(end_date).date()
            df_year = df[mask]
        
        fig = go.Figure()
        df_grouped = df_year.groupby('day_of_week')[column].mean()
        fig.add_trace(go.Scatter(x=df_grouped.index, y=df_grouped.values, mode='lines'))

        fig.update_layout(title='Trend in {} Over Days of the Week in {}'.format(column, year))
        return fig
    else:
        return {}




app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('yearly-distribution', 'style'),
    Input('load-yearly-distribution-button', 'n_clicks')
)

@app.callback(
    Output('yearly-distribution', 'figure'),
    [Input('numeric-column-select14', 'value'),
     Input('load-yearly-distribution-button', 'n_clicks')]
)
def update_yearly_distribution(column, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        fig = go.Figure()
        for year in df['year'].unique():
            fig.add_trace(go.Box(y=df[df['year'] == year][column], name=str(year)))
        fig.update_layout(title='Yearly Distribution of {}'.format(column))
        return fig
    else:
        return {}


app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks === undefined || n_clicks === null || n_clicks % 2 == 0){
            return {'display': 'none'}
        } else {
            return {'display': 'block'}
        }
    }
    """,
    Output('monthly-average-bar', 'style'),
    Input('load-monthly-average-bar-button', 'n_clicks')
)
@app.callback(
    Output('monthly-average-bar', 'figure'),
    [Input('numeric-column-select15', 'value'),
     Input('global-year-select6', 'value'),
     Input('load-monthly-average-bar-button', 'n_clicks')]
)
def update_monthly_avg_bar(column, year, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        if year == 'ALL':
            df_temp = df.copy()
        else:
            df_temp = df[df['year'] == year].copy()
        df_grouped = df_temp.groupby('month')[column].mean()
        fig = go.Figure()
        fig.add_trace(go.Bar(x=df_grouped.index, y=df_grouped.values, name='Average of {}'.format(column)))

        fig.update_layout(title_text='Average {} for Each Month in {}'.format(column, year),
                          xaxis_title='Month', yaxis_title='Average')
        return fig
    else:
        return {}



app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('weekly-distribution', 'style'),
    Input('load-weekly-distribution-button', 'n_clicks')
)
@app.callback(
    Output('weekly-distribution', 'figure'),
    [Input('numeric-column-select16', 'value'),
     Input('global-year-select7', 'value'),
     Input('load-weekly-distribution-button', 'n_clicks')]
)
def update_weekly_distribution(column, year, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        if year == 'ALL':
            df_temp = df.copy()
        else:
            df_temp = df[df['year'] == year]
        
        fig = go.Figure()
        for week in sorted(df_temp['week'].unique()):
            fig.add_trace(go.Box(y=df_temp[df_temp['week'] == week][column], name=str(week)))
        fig.update_layout(title='Weekly Distribution of {} in {}'.format(column, year))
        return fig
    else:
        return {}



app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('day-of-month-trend', 'style'),
    Input('load-day-of-month-trend-button', 'n_clicks')
)
@app.callback(
    Output('day-of-month-trend', 'figure'),
    [Input('numeric-column-select17', 'value'),
     Input('global-year-select8', 'value'),
     Input('load-day-of-month-trend-button', 'n_clicks'),
     Input('my-date-picker-range2', 'start_date'),
     Input('my-date-picker-range2', 'end_date')]
)
def update_day_of_month_trend(column, year, n_clicks=None, start_date=None, end_date=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        if year == 'ALL':
            df_temp = df.copy()
        else:
            # Combine the filters step by step so a cleared date picker (None) is skipped
            # instead of being AND-ed with the raw date column
            mask = df['year'] == year
            if start_date:
                mask &= df['date'] >= pd.to_datetime(start_date).date()
            if end_date:
                mask &= df['date'] <= pd.to_datetime(end_date).date()
            df_temp = df[mask]
        
        fig = go.Figure()
        df_grouped = df_temp.groupby('day')[column].mean()
        fig.add_trace(go.Scatter(x=df_grouped.index, y=df_grouped.values, mode='lines'))

        fig.update_layout(title='Trend in {} Over Days of the Month in {}'.format(column, year))
        return fig
    else:
        return {}



app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('monthly-distribution', 'style'),
    Input('load-monthly-distribution-button', 'n_clicks')
)
@app.callback(
    Output('monthly-distribution', 'figure'),
    [Input('numeric-column-select18', 'value'),
     Input('global-year-select9', 'value'),
     Input('load-monthly-distribution-button', 'n_clicks')]
)
def update_monthly_distribution(column, year, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        if year == 'ALL':
            df_temp = df.copy()
        else:
            df_temp = df[df['year'] == year]

        fig = go.Figure()
        for month in np.sort(df_temp['month'].unique()):
            fig.add_trace(go.Violin(y=df_temp[df_temp['month'] == month][column], name=str(month), box_visible=True))
        fig.update_layout(title='Monthly Distribution of {} in {}'.format(column, year))
        return fig
    else:
        return {}



app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('daily-average-bar', 'style'),
    Input('load-daily-average-bar-button', 'n_clicks')
)

@app.callback(
    Output('daily-average-bar', 'figure'),
    [Input('numeric-column-select19', 'value'),
     Input('global-year-select10', 'value'),
     Input('load-daily-average-bar-button', 'n_clicks'),
     Input('my-date-picker-range3', 'start_date'),
     Input('my-date-picker-range3', 'end_date')]
)
def update_daily_avg_bar(column, year, n_clicks=None, start_date=None, end_date=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        if year == 'ALL':
            df_temp = df.copy()
        else:
            # Combine the filters step by step so a cleared date picker (None) is skipped
            # instead of being AND-ed with the raw date column
            mask = df['year'] == year
            if start_date:
                mask &= df['date'] >= pd.to_datetime(start_date).date()
            if end_date:
                mask &= df['date'] <= pd.to_datetime(end_date).date()
            df_temp = df[mask]
        
        fig = go.Figure()
        df_grouped = df_temp.groupby('day')[column].mean().sort_index()
        fig.add_trace(go.Bar(x=df_grouped.index, y=df_grouped.values))

        fig.update_layout(title='Average {} for Each Day of the Month in {}'.format(column, year))
        return fig
    else:
        return {}




app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks % 2 == 1){
            return {'display': 'block'}
        } else {
            return {'display': 'none'}
        }
    }
    """,
    Output('daily-distribution', 'style'),
    Input('load-daily-distribution-button', 'n_clicks')
)
@app.callback(
    Output('daily-distribution', 'figure'),
    [Input('numeric-column-select20', 'value'),
     Input('global-year-select11', 'value'),
     Input('load-daily-distribution-button', 'n_clicks'),
     Input('my-date-picker-range4', 'start_date'),
     Input('my-date-picker-range4', 'end_date')]
)
def update_daily_distribution(column, year, n_clicks=None, start_date=None, end_date=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        if year == 'ALL':
            df_temp = df.copy()
        else:
            # Combine the filters step by step so a cleared date picker (None) is skipped
            # instead of being AND-ed with the raw date column
            mask = df['year'] == year
            if start_date:
                mask &= df['date'] >= pd.to_datetime(start_date).date()
            if end_date:
                mask &= df['date'] <= pd.to_datetime(end_date).date()
            df_temp = df[mask]
        
        fig = go.Figure()
        for day in np.sort(df_temp['day'].unique()):
            fig.add_trace(go.Violin(y=df_temp[df_temp['day'] == day][column], name=str(day), box_visible=True))
        fig.update_layout(title='Daily Distribution of {} in {}'.format(column, year))
        return fig
    else:
        return {}


app.clientside_callback(
    """
    function(n_clicks) {
        if(n_clicks === undefined || n_clicks === null || n_clicks % 2 == 0){
            return {'display': 'none'}
        } else {
            return {'display': 'block'}
        }
    }
    """,
    Output('range-time-series', 'style'),
    Input('load-range-time-series-button', 'n_clicks')
)
@app.callback(
    Output('range-time-series', 'figure'),
    [Input('numeric-column-select21', 'value'),
     Input('bin-count-input', 'value'),
     Input('load-range-time-series-button', 'n_clicks')]
)
def update_range_time_series(column, bin_count, n_clicks=None):
    if n_clicks > 0 and n_clicks % 2 == 1:
        df_temp = df.copy()
        df_temp[Time] = pd.to_datetime(df_temp[Time])
        df_temp.set_index(Time, inplace=True)

        # Bin the numeric column into quantile-based ranges
        bin_col = '{}_bin'.format(column)
        df_temp[bin_col] = pd.qcut(df_temp[column], q=bin_count)

        # qcut returns an ordered categorical whose categories are already sorted by
        # left boundary; using them directly also skips the NaN bin of missing values
        bins_sorted = df_temp[bin_col].cat.categories

        fig = go.Figure()

        for value_range in bins_sorted:  # Iterate over sorted bins (avoids shadowing the built-in `bin`)
            # Calculate aggregation result for each bin
            bin_data = df_temp[df_temp[bin_col] == value_range][column]
            bin_agg = bin_data.resample('M').mean()  # Using mean as an example aggregation function, modify as needed

            bin_agg_sorted = bin_agg.sort_index()  # Sort the data by index

            fig.add_trace(go.Scatter(x=bin_agg_sorted.index, y=bin_agg_sorted, mode='lines', name='{} - {}'.format(column, value_range)))

        fig.update_layout(title_text='Time Series of {} for Different Value Ranges'.format(column))
        return fig
    else:
        return {}

app.run_server(mode='external')


