## Table of Contents

### Setup:
- [Standard Library Imports](#Standard-Library-Imports)
- [Global Variables](#global-variables)

### Data Operations:
- [Load Data Function](#Action-1-Clean-Data-Function)
- [Action 1: Clean Data Function](#clean-data-function)
- [Save and Convert Function](#Save-and-Convert-Function)
- [Extract DAB Function](#Extract-DAB-Function)
- [Action 2: Join JSON 'Dictionary'](#Action-2-Join-JSON-Dictionary)
- [Action 3: Mean, median, mode](#Action-3-Filter-and-Mean-Median-Mode)
- [Action 4: Correlation: Chi-square Test Function](#Action-4-Correlation-Chi-square-Test-Function)
- [Action 5: Visualize Data](#Action-5-Visualize-Data)

### GUI Operations:
- [Button Functions](#Button-Functions)
- [Closing the App Functions](#Closing-the-App-Functions)
- [GUI and Roots](#GUI-and-Roots)



<a id='Standard-Library-Imports'></a>
## Standard Library Imports


In [4]:

# Libraries needed to run this notebook. Most of them ship with the standard Anaconda distribution,
# but some may need to be installed separately, particularly seaborn and chardet:
# !conda install chardet seaborn
# pip3 install chardet seaborn

# Standard library imports
import json
import math
import os

# additional imports
import chardet
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as ss
import seaborn as sns
from collections import Counter
from functools import reduce
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg
from matplotlib.figure import Figure
from scipy.stats import chi2_contingency
from tkinter import filedialog, simpledialog, messagebox, ttk
import tkinter as tk






<a id="global-variables"></a>
### Global Variables

In [5]:
# Global Variables:
# `dataframes` is a list used to store the dataframes used throughout the application.
# The others are flags to check if the data has been successfully loaded, cleaned and extracted.
# They are also used for correct order of operation, user feedback and error management

dataframes = []
is_data_loaded = False
is_data_cleaned = False
is_data_extracted = False




<a id="Action-1-Clean-Data-Function"></a>
### Load Data Function

In [6]:
# load_data function
# This function loads 1 or more csv or json files, checks them for encoding and then joins them on the 'id' column.
# There is an error check if no 'id' column exists
# This also displays a small preview of the data so that the user has some idea of what has been loaded

def load_data():
    global is_data_loaded
    temp_dfs = []  # local list for storing loaded dataframes
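    # allows loading the initial csv or the converted json
    filenames = filedialog.askopenfilenames(title="Select file(s)",
                                            filetypes=(("csv files", "*.csv"), ("json files", "*.json")))

    # Stop early if the user cancelled the dialog (reduce() below fails on an empty list)
    if not filenames:
        text_box.insert(tk.END, "No files selected.\n")
        return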

    for filename in filenames:
        print(f"Loading file: {filename}...")
        text_box.insert(tk.END, f"Loading file: {filename}...\n")

        # Detect encoding with Chardet
        with open(filename, 'rb') as f:
            result = chardet.detect(f.read())
        encoding = result['encoding']

        # Load the data with correct encoding
        if filename.endswith('.csv'):
            df = pd.read_csv(filename, encoding=encoding)
        elif filename.endswith('.json'):
            df = pd.read_json(filename, lines=True, encoding=encoding)
        else:
            print("Invalid file type. Only CSV and JSON files are supported.")
            text_box.insert(tk.END, "Invalid file type. Only CSV and JSON files are supported.\n")
            return

        print(f"File loaded successfully with encoding {encoding}")
        text_box.insert(tk.END, f"File loaded successfully with encoding {encoding}\n")
        temp_dfs.append(df)

    # Check if 'id' column exists in all dataframes
    for temp_df in temp_dfs:
        if 'id' not in temp_df.columns and 'ID' not in temp_df.columns:
            print(f"No 'id' or 'ID' column in one of the dataframes. Please check your data.")
            text_box.insert(tk.END, "No 'id' or 'ID' column in one of the dataframes. Please check your data.\n")
            return

    # Convert all 'id' columns to lowercase for merging
    for temp_df in temp_dfs:
        if 'ID' in temp_df.columns:
            temp_df.rename(columns={'ID': 'id'}, inplace=True)

    # Merge dataframes on 'id' column
    global dataframes  # modifying global variable
    dataframes.clear()  # error check to clear existing dataframes if any
    merged_df = reduce(lambda left, right: pd.merge(left, right, on='id'), temp_dfs)
    dataframes.append(merged_df)

    print("Files joined successfully on 'id' column")
    text_box.insert(tk.END, "Files joined successfully on 'id' column\n")

    # Display a small 5x5 preview of the joined dataframe for user feedback
    preview = merged_df.iloc[:5, :5].to_string()  # example of slicing
    print("Preview of the joined dataframe:\n", preview)
    text_box.insert(tk.END, f"Preview of the joined dataframe:\n{preview}\n")
    is_data_loaded = True
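
The merge step in `load_data` can be tried on its own. The sketch below is only an illustration: the two small frames and their values are made up and do not come from the client files. It normalises an `ID` header to `id` and folds the list with `functools.reduce`. Because `pd.merge` defaults to an inner join, ids that are missing from any one file drop out of the result.

```python
from functools import reduce

import pandas as pd

# Two toy frames standing in for the antenna and parameter files (made-up values)
antenna = pd.DataFrame({"ID": [1, 2, 3], "NGR": ["AA1", "BB2", "CC3"]})
params = pd.DataFrame({"id": [2, 3, 4], "EID": ["C18A", "C18F", "C188"]})

# Normalise the key column name so every frame exposes 'id'
frames = [frame.rename(columns={"ID": "id"}) for frame in (antenna, params)]

# Fold the list pairwise; pd.merge defaults to how='inner'
merged = reduce(lambda left, right: pd.merge(left, right, on="id"), frames)
print(merged)
```

If rows must be kept even when an id is missing from one file, passing `how="outer"` to `pd.merge` is one option, at the cost of NaN values in the unmatched columns.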



<a id="clean-data-function"></a>
### Clean Data Function

In [7]:
# Action (1) clean_data function
# This function performs Action 1 (removing specific NGRs) and also prepares the data for the later steps:
# stripping whitespace, converting 'Date' to datetime and standardizing column names.

def clean_data():
    global is_data_loaded
    global is_data_cleaned
    if not is_data_loaded:
        text_box.insert(tk.END, "Please load the data before cleaning.\n")
        return

    if len(dataframes) == 1:
        df = dataframes[0]  # Get the merged dataframe
        df = df.copy()  # Make a copy of the dataframe for cleaning
        text_box.insert(tk.END, "Cleaning dataframe...\n")

        # Remove leading and trailing whitespace from column names
        df.columns = df.columns.str.strip()
        text_box.insert(tk.END, "Removed leading and trailing whitespace from column names.\n")

        # Remove leading and trailing whitespace from the string columns of the dataframe
        df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
        text_box.insert(tk.END, "Removed leading and trailing whitespace from all string columns.\n")

        # convert 'Date' column for use in the calculations for 3rd requirement
        if 'Date' in df.columns:
            df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)
            text_box.insert(tk.END, "Converted 'Date' column to datetime.\n")

        # Remove commas from 'In-Use ERP Total' column and convert to float
        if 'In-Use ERP Total' in df.columns:
            if df['In-Use ERP Total'].dtype == 'object':  # Check if column type is 'object' which is equivalent to string
                df['In-Use ERP Total'] = df['In-Use ERP Total'].str.replace(',', '').astype(float)
                text_box.insert(tk.END, "Removed commas from 'In-Use ERP Total' column and converted it to float.\n")

        # Drop rows where 'Date' is before 1950. This was an issue mentioned in the report: there are 7 entries for '1900',
        # but DAB radio was only invented in the 1990s
        if 'Date' in df.columns:
            df = df[df['Date'].dt.year >= 1950]
            text_box.insert(tk.END, "Filtered rows where 'Date' is before 1950.\n")

        # Use numpy to replace all missing values 
        df.fillna(np.nan, inplace=True)
        text_box.insert(tk.END, "Replaced all missing values with numpy NaN.\n")

        # Action 1 : Remove rows with specific NGR stations
        if 'NGR' in df.columns:
            df = df[~df['NGR'].isin(['NZ02553847', 'SE213515', 'NT05399374', 'NT252675908'])]
            text_box.insert(tk.END, "Removed rows with specific NGR stations.\n")

        # Action 2 prep: rename columns to the names required for Action 2
        if 'In-Use ERP Total' in df.columns:
            df.rename(columns={'In-Use ERP Total': 'Power(kW)'}, inplace=True)
            text_box.insert(tk.END, "Renamed 'In-Use ERP Total' to 'Power(kW)'.\n")

        if 'In-Use Ae Ht' in df.columns:
            df.rename(columns={'In-Use Ae Ht': 'Aerial height(m)'}, inplace=True)
            text_box.insert(tk.END, "Renamed 'In-Use Ae Ht' to 'Aerial height(m)'.\n")

        if 'Freq.' in df.columns:
            df.rename(columns={'Freq.': 'Freq'}, inplace=True)
            text_box.insert(tk.END, "Renamed 'Freq.' to 'Freq'.\n")

        # Standardize all column titles. Lowercase would also have worked, but uppercase seemed more appropriate for column headings
        df.columns = [col.upper() for col in df.columns]
        text_box.insert(tk.END, "Standardized all column titles to uppercase.\n")

        # Replace the old dataframe in the list with the cleaned dataframe as the new global dataframe to be used by other functions
        dataframes[0] = df
    
        is_data_cleaned = True
        text_box.insert(tk.END, "Dataframe cleaned.\n")
    else:
        text_box.insert(tk.END, "Please load the data before cleaning.\n")
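
The core cleaning steps can be checked on a small frame without the GUI. This is a minimal sketch under stated assumptions: the frame and its values are invented for illustration, and only three of the steps from `clean_data` are shown (whitespace stripping, comma removal with float conversion, and the 1950 date cut-off).

```python
import pandas as pd

# Toy frame with the kinds of problems clean_data handles (made-up values)
raw = pd.DataFrame({
    " Site ": ["  Alpha", "Beta  ", "Gamma"],
    "Date": ["01/02/1900", "15/06/2001", "03/11/1998"],
    "In-Use ERP Total": ["1,000", "2.5", "10"],
})

cleaned = raw.copy()

# Strip whitespace from the headers, then from the text column
cleaned.columns = cleaned.columns.str.strip()
cleaned["Site"] = cleaned["Site"].str.strip()

# Day-first dates, as in the source data
cleaned["Date"] = pd.to_datetime(cleaned["Date"], dayfirst=True)

# Remove thousands separators before converting to float
cleaned["In-Use ERP Total"] = cleaned["In-Use ERP Total"].str.replace(",", "").astype(float)

# Keep only rows dated 1950 or later
cleaned = cleaned[cleaned["Date"].dt.year >= 1950]
print(cleaned)
```

Note that the year comparison is false for missing dates, so rows without a date are dropped by this filter as well.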




<a id="Save-and-Convert-Function"></a>
### Save and Convert Function


In [8]:
# Save and convert JSON function. This meets functionality requirement to convert the dataset, 
# and additionally back up the dataset. The function on_closing() also helps fulfill this requirement

def save_file():
    if len(dataframes) == 0:
        text_box.insert(tk.END, "No data to save. Please load and clean data first.\n")
        return

    df = dataframes[0]  # Get the merged dataframe

    try:
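        # Convert DataFrame to a JSON string with one record per line,
        # so that load_data (which reads JSON with lines=True) can reopen the file
        json_str = df.to_json(orient="records", lines=True)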

        # Ask the user where to save the json file. This meets the client functionality
        json_file_path = filedialog.asksaveasfilename(defaultextension=".json", filetypes=(("JSON files", "*.json"), ("All files", "*.*")))

        if json_file_path:
            # Save as json
            with open(json_file_path, 'w') as json_file:
                json_file.write(json_str)

            text_box.insert(tk.END, f"JSON file saved successfully at {json_file_path}.\n")

    except Exception as e:
        text_box.insert(tk.END, f"Error saving file: {e}\n")
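
Because `load_data` reads JSON with `lines=True`, a saved file can only be reloaded when it holds one record per line. The sketch below illustrates that round trip in memory; the frame is made up, and `io.StringIO` stands in for a file on disk.

```python
import io

import pandas as pd

# Toy frame (made-up values) standing in for the merged dataset
df = pd.DataFrame({"id": [1, 2], "EID": ["C18A", "C18F"]})

# Write one JSON object per line (JSON Lines)
json_lines = df.to_json(orient="records", lines=True)

# Read it back the same way load_data does
restored = pd.read_json(io.StringIO(json_lines), lines=True)
print(restored)
```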


<a id="Extract-DAB-Function"></a>
### Extract DAB Function


In [9]:
# Extract DAB function.
# This sets up actions 2, 3 and 4 by extracting the DAB multiplexes.
# The user chooses the multiplexes, which meets the functionality requirement to 'handle other sets of data'.
# It also creates the requested EID columns and fills them with '1' or '0' to represent a match.

import tkinter.simpledialog as simpledialog

def extract_data():
    global is_data_loaded
    global is_data_cleaned
    global is_data_extracted
    if not is_data_loaded or not is_data_cleaned:
        text_box.insert(tk.END, "Please load and clean the data.\n")
        return

    try:
        global dataframes  # modifying the global DataFrame

        # Asking user to specify multiplexes. This allows the functionality for other sets of data
        user_input = simpledialog.askstring("Input", "Please enter DAB for extraction separated by a space. Suggested: C18A C18F C188.")
        
        # Error check and feedback for closing selection box
        if user_input is None:
            text_box.insert(tk.END, "Data extraction cancelled by user.\n")
            return

        # Convert user input string to a list of multiplex EIDs
        multiplexes = user_input.split()
        if not multiplexes:
            text_box.insert(tk.END, "No DAB entered. Please enter at least one EID.\n")
            return

        # Validate multiplexes 
        valid_multiplexes = dataframes[0]['EID'].unique().tolist()  # get the unique EIDs from the DataFrame to check if they exist
        for multiplex in multiplexes:
            if multiplex not in valid_multiplexes:
                text_box.insert(tk.END, f"Invalid DAB entered: {multiplex}. Please enter valid DABs.\n")
                return

        # Filter df based on the user-specified multiplexes (copy so the new columns are added to an independent frame)
        dataframes[0] = dataframes[0][dataframes[0]['EID'].isin(multiplexes)].copy()

        # Add a 1/0 indicator column for each selected multiplex in df
        for multiplex in multiplexes:
            dataframes[0][multiplex] = (dataframes[0]['EID'] == multiplex).astype(int)
        
        text_box.insert(tk.END, "Extraction successful. DataFrame updated with selected EIDs.\n")
        is_data_extracted = True
    except Exception as e:
        text_box.insert(tk.END, f"Error extracting data: {e}\n")
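
The filter-and-flag step can be seen on a small frame. This is a sketch only: the rows are invented, and `C19Z` is a hypothetical EID added to show a row being filtered out.

```python
import pandas as pd

# Toy frame; 'C19Z' is a hypothetical EID that is not requested
df = pd.DataFrame({
    "EID": ["C18A", "C18F", "C188", "C19Z"],
    "SITE": ["Alpha", "Beta", "Gamma", "Delta"],
})

wanted = "C18A C18F C188".split()

# Keep only the requested multiplexes; copy before adding columns
subset = df[df["EID"].isin(wanted)].copy()

# One 1/0 indicator column per requested multiplex
for eid in wanted:
    subset[eid] = (subset["EID"] == eid).astype(int)

print(subset)
```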


<a id="Action-2-Join-JSON-Dictionary"></a>
### Action 2: Join JSON 'Dictionary'



In [10]:
# Action (2) Join JSON 'dictionary'
# This meets the specific column requirements to represent the client data as a 3-tier JSON dictionary with EID as the top tier then
# NGR at the second tier and then beneath NGR is every other column. This can be viewed in the GUI tabs as well as saved locally on the disk

def create_json_and_preview():
    global is_data_loaded
    global is_data_cleaned
    global is_data_extracted
    if not is_data_loaded or not is_data_cleaned or not is_data_extracted:
        text_box.insert(tk.END, "Please load, clean, and extract the data.\n")
        return
    try:
        # Build the nested EID -> NGR -> records dictionary from a copy of the cleaned dataframe
        df_json = dataframes[0].copy()
        df_json['DATE'] = df_json['DATE'].apply(lambda x: x.isoformat() if not pd.isnull(x) else '')
        nested_dict = df_json.groupby('EID').apply(lambda x: x.groupby('NGR')[['SITE', 'SITE HEIGHT', 'AERIAL HEIGHT(M)', 'POWER(KW)', 'DATE']].apply(lambda y: y.to_dict('records')).to_dict()).to_dict()
        
        json_str = json.dumps(nested_dict, indent=2)

        # Ask the user where to save the json file
        json_file_path = filedialog.asksaveasfilename(defaultextension=".json", filetypes=(("JSON files", "*.json"), ("All files", "*.*")))

        if json_file_path:
            # Save as json
            with open(json_file_path, 'w') as json_file:
                json_file.write(json_str)

            text_box.insert(tk.END, "JSON file created successfully.\n")

        # Create a new tab
        tab = ttk.Frame(tab_parent)
        tab_parent.add(tab, text = 'JSON Preview')

        # Create a text widget in the new tab
        text_widget = tk.Text(tab)
        text_widget.insert(tk.END, json_str)
        text_widget.pack()

        text_box.insert(tk.END, "JSON preview created successfully.\n")

    except Exception as e:
        text_box.insert(tk.END, f"Error creating JSON preview: {e}\n")
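
The three-tier layout (EID, then NGR, then the remaining columns) may be easier to follow with an explicit loop. The sketch below is an alternative formulation, not the nested `groupby(...).apply(...)` call used in the function above, and it runs on made-up rows with only two value columns.

```python
import json

import pandas as pd

# Toy rows (made-up values) with two sites under one multiplex
df = pd.DataFrame({
    "EID": ["C18A", "C18A", "C18F"],
    "NGR": ["AA1", "AA1", "BB2"],
    "SITE": ["Alpha", "Alpha", "Beta"],
    "POWER(KW)": [1.0, 2.0, 3.0],
})

nested = {}
# Grouping by both keys yields one (eid, ngr) tuple per group
for (eid, ngr), group in df.groupby(["EID", "NGR"]):
    records = group[["SITE", "POWER(KW)"]].to_dict("records")
    nested.setdefault(eid, {})[ngr] = records

json_str = json.dumps(nested, indent=2)
print(json_str)
```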



<a id="Action-3-Filter-and-Mean-Median-Mode"></a>
### Action 3: Mean, median, mode


In [11]:
# Action 3: Mean, median, mode
# This takes the requested DAB set and creates two statistical sets based on the requested filters (date & site height)
# it uses pandas for the calculations 

def calculate_stats_and_preview():
    global is_data_loaded
    global is_data_cleaned
    global is_data_extracted
    if not is_data_loaded or not is_data_cleaned or not is_data_extracted:
        text_box.insert(tk.END, "Please load, clean, and extract the data.\n")
        return

    try:
        if len(dataframes) == 0:
            text_box.insert(tk.END, "Please load and clean the data before calculating stats.\n")
            return

        df = dataframes[0]

        # Filter based on 'Site Height'
        df_filtered_height = df[df['SITE HEIGHT'] > 75]

        # Filter based on 'Date'
        df_filtered_year = df[df['DATE'].dt.year >= 2001]

        # Calculate stats for df_filtered_height
        mean_power_height = df_filtered_height['POWER(KW)'].mean()
        median_power_height = df_filtered_height['POWER(KW)'].median()
        mode_power_height = df_filtered_height['POWER(KW)'].mode()[0] if not df_filtered_height['POWER(KW)'].mode().empty else "No mode"

        # Calculate stats for df_filtered_year
        mean_power_year = df_filtered_year['POWER(KW)'].mean()
        median_power_year = df_filtered_year['POWER(KW)'].median()
        mode_power_year = df_filtered_year['POWER(KW)'].mode()[0] if not df_filtered_year['POWER(KW)'].mode().empty else "No mode"

        # Prepare the results string
        stats_str = (
            f"Stats for 'Site Height' > 75:\n"
            f"Mean of 'Power(kW)': {mean_power_height}\n"
            f"Median of 'Power(kW)': {median_power_height}\n"
            f"Mode of 'Power(kW)': {mode_power_height}\n\n"
            f"Stats for Year >= 2001:\n"
            f"Mean of 'Power(kW)': {mean_power_year}\n"
            f"Median of 'Power(kW)': {median_power_year}\n"
            f"Mode of 'Power(kW)': {mode_power_year}\n"
        )

        # Create a new tab
        tab = ttk.Frame(tab_parent)
        tab_parent.add(tab, text = 'Power(kW) Stats')

        # Create a text widget in the new tab
        text_widget = tk.Text(tab)
        text_widget.insert(tk.END, stats_str)
        text_widget.pack()

        text_box.insert(tk.END, "Stats calculated and previewed successfully.\n")

    except Exception as e:
        text_box.insert(tk.END, f"Error calculating and previewing stats: {e}\n")
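
The two filters and the three statistics can be verified by hand on a few rows. In this sketch the data is invented and `summarise` is a hypothetical helper introduced only to avoid repeating the mean/median/mode calls.

```python
import pandas as pd

# Toy rows (made-up values)
df = pd.DataFrame({
    "SITE HEIGHT": [50, 80, 120, 200],
    "DATE": pd.to_datetime(["1999-05-01", "2001-03-10", "2005-07-20", "1998-01-15"]),
    "POWER(KW)": [1.0, 2.0, 2.0, 5.0],
})


def summarise(power):
    """Return mean, median and first mode of a numeric Series (hypothetical helper)."""
    modes = power.mode()
    return {
        "mean": power.mean(),
        "median": power.median(),
        "mode": modes.iloc[0] if not modes.empty else None,
    }


by_height = summarise(df.loc[df["SITE HEIGHT"] > 75, "POWER(KW)"])
by_year = summarise(df.loc[df["DATE"].dt.year >= 2001, "POWER(KW)"])
print(by_height)
print(by_year)
```

When several values tie for the most frequent, `Series.mode()` returns all of them in sorted order, so taking the first entry reports only the smallest mode.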



<a id="Action-4-Correlation-Chi-square-Test-Function"></a>
### Action 4: Correlation: Chi-square Test Function


In [12]:
# Action (4) Correlation - Function to calculate Chi-square test
# This checks each pair of the listed labels for association using the scipy chi-square test, which returns
# c, p, dof and expected. Results are split at the .05 significance level into two groups for display in the GUI.


def calc_chi_square():
    global is_data_loaded
    global is_data_cleaned
    global is_data_extracted
    if not is_data_loaded or not is_data_cleaned or not is_data_extracted:
        text_box.insert(tk.END, "Please load, clean, and extract the data.\n")
        return

    significance_level = 0.05
    df = dataframes[0].copy()
    labels = ['SITE', 'FREQ', 'BLOCK', 'SERV LABEL1', 'SERV LABEL2', 'SERV LABEL3', 'SERV LABEL4', 'SERV LABEL10']
    
    significant_results = []
    not_significant_results = []

    for i in range(len(labels)):
        for j in range(i+1, len(labels)):
            crosstab = pd.crosstab(df[labels[i]], df[labels[j]])
            # Chi-square test of independence.
            c, p, dof, expected = chi2_contingency(crosstab)
            
            result_string = (f"Between {labels[i]} and {labels[j]}:\n"
                             f"P-value: {p:.4f}\n"
                             f"Chi-square: {c:.4f}\n"
                             f"Degrees of Freedom: {dof}\n"
                             f"Expected Frequencies:\n {expected}\n\n")

            if p < significance_level:
                significant_results.append(result_string)
            else:
                not_significant_results.append(result_string)
                
    return {"significant": significant_results, "not_significant": not_significant_results}


# This is the function to display the GUI for the correlation. Initially I only displayed the full detailed list but it was too long.
# So I made two lists: an initial one with simple significant pairs and then the detailed results.
def display_p_values(tab, p_values):
    text_box = tk.Text(tab)
    
    # Displaying simple list of significant pairs
    # The first line of each result reads "Between <label> and <label>:"
    text_box.insert(tk.END, "Significant Pairs:\n")
    for value in p_values["significant"]:
        pair = value.split("\n")[0].replace("Between ", "", 1).rstrip(":").split(" and ")
        text_box.insert(tk.END, f"{pair[0]} - {pair[1]}\n")
    text_box.insert(tk.END, "\n")
    
    # Displaying simple list of non-significant pairs
    text_box.insert(tk.END, "Not Significant Pairs:\n")
    for value in p_values["not_significant"]:
        pair = value.split("\n")[0].replace("Between ", "", 1).rstrip(":").split(" and ")
        text_box.insert(tk.END, f"{pair[0]} - {pair[1]}\n")
    text_box.insert(tk.END, "\n")

    # Displaying detailed results
    text_box.insert(tk.END, "\n\nDetailed Results:\n")
    text_box.insert(tk.END, "-" * 50 + "\n")
    text_box.insert(tk.END, "Significant Results:\n")
    text_box.insert(tk.END, "\n".join(p_values["significant"]))
    text_box.insert(tk.END, "-" * 50 + "\n")
    text_box.insert(tk.END, "Not Significant Results:\n")
    text_box.insert(tk.END, "\n".join(p_values["not_significant"]))
    
    text_box.pack()
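
A single iteration of the loop in `calc_chi_square` looks like this when run on its own. The sketch uses an invented six-row frame, so the numbers it prints say nothing about the client data.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy categorical data (made-up values)
df = pd.DataFrame({
    "BLOCK": ["11B", "11B", "12A", "12A", "11B", "12A"],
    "SITE": ["Alpha", "Alpha", "Beta", "Beta", "Alpha", "Beta"],
})

# Contingency table of observed counts for the two labels
table = pd.crosstab(df["BLOCK"], df["SITE"])

# Chi-square test of independence on the table
chi2, p, dof, expected = chi2_contingency(table)

print(table)
print(f"chi2={chi2:.4f}, p={p:.4f}, dof={dof}")
print(expected)
```

Two points are worth keeping in mind when reading the results. `chi2_contingency` applies Yates' continuity correction by default when the table has one degree of freedom, and the chi-square approximation is generally considered unreliable when expected counts are small, which is likely for sparse label pairs.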




<a id="Action-5-Visualize-Data"></a>
### Action 5: Visualize Data


In [13]:
# Action (5) Visualize data
# This creates a seaborn FacetGrid to display the relationship between the labels and the selected EIDs.
# The figure is then placed in a scrollable canvas

def plot_label(df, tab_parent):
    global is_data_loaded
    global is_data_cleaned
    global is_data_extracted
    if not is_data_loaded or not is_data_cleaned or not is_data_extracted:
        text_box.insert(tk.END, "Please load, clean, and extract the data.\n")
        return

    labels = ['FREQ', 'SITE', 'BLOCK', 'SERV LABEL1', 'SERV LABEL2', 'SERV LABEL3', 'SERV LABEL4', 'SERV LABEL10']
    df_melted = df.melt(id_vars=['EID'], value_vars=labels)

    g = sns.FacetGrid(df_melted, col='EID', row='variable', height=4, aspect=1)
    g.map_dataframe(sns.countplot, x='value')
    g.set_axis_labels("Values", "Count")
    g.set_titles(col_template="{col_name} EID", row_template="{row_name}")
    for ax in g.axes.flat:
        for label in ax.get_xticklabels():
            label.set_rotation(90)

    facet_tab = ttk.Frame(tab_parent)
    tab_parent.add(facet_tab, text="EID FacetGrid")

    # Scrollable canvas setup
    scrollable_canvas = tk.Canvas(facet_tab)
    scrollable_canvas.pack(side=tk.LEFT, fill=tk.BOTH, expand=True)
    
    scrollbar = tk.Scrollbar(facet_tab, orient="vertical", command=scrollable_canvas.yview)
    scrollbar.pack(side=tk.RIGHT, fill=tk.Y)

    scrollable_canvas.configure(yscrollcommand=scrollbar.set)

    canvas_frame = tk.Frame(scrollable_canvas)
    scrollable_canvas.create_window((0, 0), window=canvas_frame, anchor="nw")

    canvas_facet = FigureCanvasTkAgg(g.fig, master=canvas_frame)
    canvas_facet.draw()
    canvas_facet.get_tk_widget().pack(side=tk.TOP, fill=tk.BOTH, expand=True)

    canvas_frame.update_idletasks()  # Ensure the frame's size is updated
    scrollable_canvas.config(scrollregion=scrollable_canvas.bbox("all"))



<a id="Button-Functions"></a>
### Button Functions


In [14]:
# Button functions

def load_file_button_command():
    try:
        load_data()
        text_box.insert(tk.END, "Data loaded successfully.\n")
    except Exception as e:
        text_box.insert(tk.END, f"Error loading data: {e}\n")

def save_file_button_command():
    try:
        save_file()
        text_box.insert(tk.END, "Data saved successfully.\n")
    except Exception as e:
        text_box.insert(tk.END, f"Error saving file: {e}\n")

def clean_data_button_command():
    try:
        clean_data()
        text_box.insert(tk.END, "Data cleaned successfully.\n")
    except Exception as e:
        text_box.insert(tk.END, f"Error cleaning data: {e}\n")

def extract_data_button_command():
    try:
        extract_data()
        text_box.insert(tk.END, "Data extracted successfully.\n")
    except Exception as e:
        text_box.insert(tk.END, f"Error extracting data: {e}\n")

def calculate_stats_button_command():
    try:
        calculate_stats_and_preview()
        text_box.insert(tk.END, "Stats calculated and previewed successfully.\n")
    except Exception as e:
        text_box.insert(tk.END, f"Error calculating and previewing stats: {e}\n")

def create_json_button_command():
    try:
        create_json_and_preview()
        text_box.insert(tk.END, "JSON file created and previewed successfully.\n")
    except Exception as e:
        text_box.insert(tk.END, f"Error creating and previewing JSON: {e}\n")

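def plot_clustered_bars_button_command():
    # Guard against an empty list so the button does not raise IndexError before any data is loaded
    if not dataframes:
        text_box.insert(tk.END, "Please load, clean, and extract the data.\n")
        return
    try:
        plot_label(dataframes[0], tab_parent)
    except Exception as e:
        text_box.insert(tk.END, f"Error creating plot: {e}\n")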


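def plot_correlation_button_command():
    p_values = calc_chi_square()
    # calc_chi_square returns None (after reporting why) when the data is not ready
    if p_values is None:
        return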
    # Create a new tab
    tab = ttk.Frame(tab_parent)
    tab_parent.add(tab, text='Chi-Square Results')
    # Display p-values in the new tab
    display_p_values(tab, p_values)
    tab_parent.select(tab)  # Select (bring to front) the new tab
    # Display a message in the dialog box indicating successful calculation
    text_box.insert(tk.END, "Correlation calculation complete. Results displayed in a new tab.\n")
    


<a id="Closing-the-App-Functions"></a>
### Closing the App Functions


In [15]:
# Closing the app functions
# These functions, along with the save function, meet the client requirement to back up the data.
# They use the Tkinter filedialog module so that users can specify a filename and location for saving.
# Pandas is used for the conversion to JSON

def on_closing():
    if messagebox.askokcancel("Quit", "Do you want to quit?"):
        create_backup()  # Backup current state
        root.destroy()

def create_backup():
    global dataframes
    if len(dataframes) == 0:
        text_box.insert(tk.END, "No data to backup.\n")
        return

    try:
        # Ask the user where to save the JSON file
        json_file_path = filedialog.asksaveasfilename(defaultextension=".json", filetypes=(("JSON files", "*.json"), ("All files", "*.*")))

        if json_file_path:
            # Create a copy of the global dataframe for JSON serialization
            df_json = dataframes[0].copy()

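            # Convert 'DATE' column to string in ISO format in the copied dataframe
            # (the column only exists under this name once the data has been cleaned)
            if 'DATE' in df_json.columns:
                df_json['DATE'] = df_json['DATE'].apply(lambda x: x.isoformat() if not pd.isnull(x) else '')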

            # Save as JSON
            with open(json_file_path, 'w') as json_file:
                df_json.to_json(json_file, orient='records', lines=True)

            text_box.insert(tk.END, f"JSON backup created successfully at {json_file_path}.\n")
        else:
            text_box.insert(tk.END, "Backup operation cancelled.\n")

    except Exception as e:
        text_box.insert(tk.END, f"Error creating JSON backup: {e}\n")

<a id="GUI-and-Roots"></a>
### GUI and Roots


In [None]:
#GUI and root


# Initialize Tkinter root window
root = tk.Tk()

# Set the title of the window
root.title("DAB Data Management")

# root.protocol goes after on_closing is defined
root.protocol("WM_DELETE_WINDOW", on_closing)

# Initialize Notebook (tab manager)
tab_parent = ttk.Notebook(root)

# Create text box (it is placed with grid further down)
text_box = tk.Text(root)

# ttk modern theme
style = ttk.Style()
style.theme_use("clam")  # 'clam' is a modern-looking theme

# setting font for all buttons
button_font = ("Arial", 24, "bold")

# setting the color for each button
colors = ["#B200ED", "#B200ED", "#B200ED", "#0B6623", "#50C878", "#50C878", "#50C878", "#50C878", "#50C878"]

# Create buttons with their corresponding command functions
buttons = [
    ("LOAD DATA", load_file_button_command),
    ("CLEAN DATA", clean_data_button_command),
    ("SAVE DATA AS .json", save_file_button_command),
    ("EXTRACT DAB: C18A, C18F, C188", extract_data),
    ("(2)JOIN JSON DICT", create_json_button_command),
    ("(3)CALC MEAN/MEDIAN/MODE", calculate_stats_and_preview),
    ("(4)CORRELATIONS", plot_correlation_button_command),
    ("(5)VISUALIZE", plot_clustered_bars_button_command)
]

# Update the button style for all buttons
for i, (text, command) in enumerate(buttons):
    button_style = f'Accentbutton{i}.TButton'  
    style.configure(button_style, foreground='white', background=colors[i], font=button_font)
    button = ttk.Button(root, text=text, command=command, style=button_style)
    button.grid(row=i, column=0, padx=5, pady=5, sticky='ew')  

# Arrange text box and tabs using grid
text_box.grid(row=10, column=0, padx=5, pady=5, sticky='nswe')  # padx and pady add some padding around the text box
tab_parent.grid(row=0, column=1, rowspan=11, padx=5, pady=5, sticky='nswe')  # rowspan=11 makes the notebook span 11 rows

# Configure rows and columns for proper resizing
root.grid_rowconfigure(10, weight=1)  # The text box is now in row 10
root.grid_columnconfigure(1, weight=1)

# Enter the Tkinter main event loop
root.mainloop()




Loading file: /Users/jp/Downloads/TxAntennaDAB.csv...
File loaded successfully with encoding ascii
Loading file: /Users/jp/Downloads/TxParamsDAB.csv...
File loaded successfully with encoding ISO-8859-1
Files joined successfully on 'id' column
Preview of the joined dataframe:
        id         NGR  Longitude/Latitude  Site Height  In-Use Ae Ht
0  745392  NO76418994  002W23 24 57N00 00          325           230
1  745393  NJ76043299  002W24 01 57N23 12          245           138
2  745394  NJ98315700  002W01 48 57N36 11          225            35
3  745395  NJ94270253  002W05 46 57N06 49           87            41
4  745396  NS29181617  004W41 59 55N24 35          273            34
