
<div id="singlestore-header" style="display: flex; background-color: rgba(209, 153, 255, 0.25); padding: 5px;">
    <div id="icon-image" style="width: 90px; height: 90px;">
        <img width="100%" height="100%" src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/notes.png" />
    </div>
    <div id="text" style="padding: 5px; margin-left: 10px;">
        <div id="badge" style="display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%">SingleStore Notebooks</div>
        <h1 style="font-weight: 500; margin: 8px 0 0 4px;">Database Performance Troubleshoot Notebook</h1>
    </div>
</div>

<table style="border: 0; border-spacing: 0; width: 100%; background-color: #03010D"><tr>
    <td style="padding: 0; margin: 0; background-color: #03010D; width: 33%; text-align: center"><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-vertical.png" style="height: 200px;"/></td>
    <td style="padding: 0; margin: 0; width: 66%; background-color: #03010D; text-align: right"><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-jupyter.png" style="height: 250px"/></td>
</tr></table>




## Intro

<p class="has-text-justified">
    This notebook provides a Python script that simplifies performance analysis tasks for database management.
</p>
    
<ol>
    <li>Loads query definitions from a CSV file hosted at a public URL</li>
    <li>Executes the SQL queries against the selected database</li>
    <li>Exports the results to searchable HTML tables and uploads an archive of the generated HTML files, together with an index page, to the Stage area</li>
    <li>Handles Stage area operations using the SingleStore Python client, which uses the SingleStore Management API</li>
    <li>Streamlines routine troubleshooting workflows for administrators and developers alike</li>
</ol>
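
The archive step mentioned in the list above can be sketched in isolation. This is a minimal illustration, assuming a hypothetical report directory named `demo_PERF_REPORT` created in a temporary location; the helper name `archive_report_dir` is an assumption made for this example and is not part of the script below.

```python
import os
import tarfile
import tempfile
from pathlib import Path


def archive_report_dir(source_dir, output_filename):
    """Pack source_dir into a gzip-compressed tar archive and return its member names."""
    with tarfile.open(output_filename, "w:gz") as tar:
        # arcname keeps only the directory's own name, not its full local path
        tar.add(source_dir, arcname=os.path.basename(source_dir))
    # Re-open the archive to list what was stored
    with tarfile.open(output_filename, "r:gz") as tar:
        return sorted(tar.getnames())


# Hypothetical report directory with an index page and one result page
work_dir = Path(tempfile.mkdtemp())
report_dir = work_dir / "demo_PERF_REPORT"
report_dir.mkdir()
(report_dir / "index.html").write_text("<html><body>Index</body></html>")
(report_dir / "1.html").write_text("<html><body>Result 1</body></html>")

archive_members = archive_report_dir(str(report_dir), str(work_dir / "demo_PERF_REPORT.tar.gz"))
print(archive_members)
```

Storing the directory under its base name means the archive unpacks into a single folder wherever it is extracted.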


## What you will learn in this notebook:

1. How to download the DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv file from a URL and load it into a pandas DataFrame [Python]
2. How to execute queries and export the results to HTML files [Python]
3. How to use the SingleStore client for database operations and the Stage area [Python]
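
As a rough sketch of the export step, the snippet below turns a list of result rows into an HTML table with pandas and substitutes it for an `rstable` placeholder in a template. The template string and the sample rows are hypothetical stand-ins; the actual templates are downloaded from the template URL, and the helper name `render_result_page` is an assumption made for this example.

```python
import pandas as pd

# Hypothetical template; the real templates are downloaded from the template URL
template = "<html><body><h2>Sample check</h2>rstable</body></html>"
empty_result_table = "<p>No Matching Records Found</p>"


def render_result_page(template_html, rows):
    """Replace the 'rstable' placeholder with an HTML table built from rows."""
    if not rows:
        # No rows: show a short message instead of an empty table
        return template_html.replace("rstable", empty_result_table)
    result_df = pd.DataFrame(rows)
    # Upper-case the column names for display
    result_df.columns = [str(col).upper() for col in result_df.columns]
    table_html = result_df.to_html(table_id="rstbl", index=False,
                                   classes="table table-striped table-bordered")
    return template_html.replace("rstable", table_html)


# Hypothetical rows, shaped like dict results from a database cursor
rows = [{"database_name": "demo", "table_name": "orders"}]
page = render_result_page(template, rows)
empty_page = render_result_page(template, [])
print(page)
```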


## What benefits do you get out of using the notebook?

1. You can run the most commonly used performance checks
2. Results are exported to HTML for easier viewing
3. Along with the analysis of known scenarios, the script provides background and possible actions to take



## Questions?

Reach out to us through our [forum](https://www.singlestore.com/forum).


### Pre-requisites

The following parameters are required to proceed.



<ol type="A">
    <li>SingleStore Management API key. Follow this <a href="https://docs.singlestore.com/cloud/reference/management-api/">link</a> to obtain an API key.</li>
    <li>Directory path in the Stage area (target location for the uploaded archive)</li>
    <li>URL of the CSV file to download</li>
    <li>URL of the result template directory</li>
</ol>

<p>
    Note: You may use
    <ul>
        <li><a href="https://s2-garageutils.s3.amazonaws.com/DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv">DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv</a> as a template for adding your own queries.</li>
        <li><a href="https://s2-garageutils.s3.amazonaws.com/templates">templates</a> as templates for the results.</li>
    </ul>
</p>
<p>
    For simplicity, this demo uses a publicly accessible URL; adapt the access pattern to suit your needs.
</p>    

CSV File structure

<table class="table is-bordered is-narrow">
<tr>
     <th>QueryID</th>
     <th>QueryName</th>
     <th>QueryTxt</th>
</tr>
</table>
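
To illustrate the structure above, here is a small sketch that parses a CSV with these three columns into a pandas DataFrame and sorts it by `QueryID`. The two sample rows are hypothetical and are not taken from the published CSV file.

```python
import io

import pandas as pd

# Hypothetical CSV content with the three columns described above
sample_csv = """QueryID,QueryName,QueryTxt
2,Row count,"SELECT COUNT(*) FROM information_schema.TABLES"
1,Server version,"SELECT @@version"
"""

queries_df = pd.read_csv(
    io.StringIO(sample_csv),
    dtype={"QueryID": int, "QueryName": str, "QueryTxt": str},
)
# Sort by QueryID so the queries run in a predictable order
queries_df = queries_df.sort_values(by="QueryID").reset_index(drop=True)

for _, row in queries_df.iterrows():
    print(row["QueryID"], row["QueryName"], row["QueryTxt"])
```

Quoting the `QueryTxt` field keeps queries that contain commas in a single column.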


**Note** To enable logs:

 - Change 'set_logging_enabled(False)' to 'set_logging_enabled(True)' in the code below
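
The effect of this flag can be checked on its own. The sketch below mirrors the behaviour of `set_logging_enabled` in the script (INFO level when enabled, CRITICAL otherwise); the variables `enabled_level` and `disabled_level` exist only for this illustration.

```python
import logging


def set_logging_enabled(enabled):
    """Show INFO and above when enabled; otherwise only CRITICAL messages."""
    level = logging.INFO if enabled else logging.CRITICAL
    logging.getLogger().setLevel(level)


set_logging_enabled(True)
enabled_level = logging.getLogger().getEffectiveLevel()
logging.info("This message is emitted")

set_logging_enabled(False)
disabled_level = logging.getLogger().getEffectiveLevel()
logging.info("This message is suppressed")
```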




In [None]:
import io
import tarfile
import time
import logging
import getpass
import os

import pandas as pd
import singlestoredb as s2

from pathlib import Path
from urllib.request import urlopen
from urllib.error import HTTPError
from datetime import datetime

from IPython.display import display, HTML

query_data_url = "https://s2-garageutils.s3.amazonaws.com/DB_PERFORMANCE_TROUBLESHOOT_QUERIES.csv"
template_url_base = 'https://s2-garageutils.s3.amazonaws.com/templates/'
stage_folder_path = 'DBPERF-REPORT'

my_timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
my_db_conn_url = os.getenv('SINGLESTOREDB_URL')
database_name = my_db_conn_url[(my_db_conn_url.rfind('/') + 1):]
local_output_dir = database_name + '_' + my_timestamp + '_PERF_REPORT'

empty_result_table = '<p class="mb-3 mt-3" style="text-align:center;color:blue;">No Matching Records Found</p>'
result_table_html_classes = 'table table-striped table-bordered table-responsive my-2 px-2'

WORKGROUP_ID = os.getenv('SINGLESTOREDB_WORKSPACE_GROUP')

s2_workgroup_stage = None


def show_warn(warn_msg):
    """
    Display a warning message in a formatted HTML alert box.

    Parameters
    ----------
    warn_msg : str
        The warning message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-warning">
    <b class="fa fa-solid fa-exclamation-circle"></b>
    <div>
        <p><b>Action Required</b></p>
        <p>{warn_msg}</p>
    </div>
</div>'''))


def show_error(error_msg):
    """
    Display an error message in a formatted HTML alert box.

    Parameters
    ----------
    error_msg : str
        The error message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-danger">
    <b class="fa fa-solid fa-exclamation-triangle"></b>
    <div>
        <p><b>Error</b></p>
        <p>{error_msg}</p>
    </div>
</div>'''))


def show_success(success_msg):
    """
    Display a success message in a formatted HTML alert box.

    Parameters
    ----------
    success_msg : str
        The success message to display.
    """
    display(HTML(f'''<div class="alert alert-block alert-success">
    <b class="fa fa-solid fa-check-circle"></b>
    <div>
        <p><b>Success</b></p>
        <p>{success_msg}</p>
    </div>
</div>'''))


def execute_query(dbcon, query_txt):
    """
    Execute a SQL query on the specified database connection.

    Parameters
    ----------
    dbcon : connection
        The database connection object.
    query_txt : str
        The SQL query to execute.

    Returns
    -------
    list
        A list of rows returned by the query.
    """
    try:
        with dbcon.cursor() as cur:
            cur.execute(query_txt)
            # Fetch the rows while the cursor is still open
            return cur.fetchall()
    except Exception as e:
        logging.error(f"Failed to execute query: {e}")
        raise Exception('Failed to execute query') from e


def make_tarfile(output_filename, source_dir):
    """
    Create a tar.gz archive of a directory.

    Parameters
    ----------
    output_filename : str
        The name of the output archive file.
    source_dir : str
        The path to the directory to archive.

    Returns
    -------
    bool
        True if the archive was created successfully; an Exception is raised otherwise.
    """
    try:
        with tarfile.open(output_filename, "w:gz") as tar:
            tar.add(source_dir, arcname=os.path.basename(source_dir))
        time.sleep(2)
        return True
    except Exception as e:
        logging.error(f'Failed to create archive: {e}')
        raise Exception(f'Failed to create archive: {e}')


def generate_html_list(links):
    """
    Generate an HTML ordered list from a comma-separated list of links.

    Parameters
    ----------
    links : str
        A comma-separated list of links.

    Returns
    -------
    str
        The HTML formatted ordered list.
    """
    if 'nan' == links:
        return ''

    html_list = '<ol>'
    for item in links.split(','):
        html_list += f'<li><a href="{item}">{item}</a></li>'
    html_list += '</ol>'
    return html_list


def fetch_url_content(url):
    """
    Fetch the content of a URL.

    Parameters
    ----------
    url : str
        The URL to fetch.

    Returns
    -------
    str
        The content of the URL.
    """
    try:
        with urlopen(url) as response:
            if response.status == 200:
                my_bytes = response.read()
                file_content = my_bytes.decode("utf8")
                return file_content
            # Fail loudly instead of silently returning None on an unexpected status
            raise Exception(f'Failed to read {url} - unexpected HTTP status: {response.status}')
    except HTTPError as e:
        logging.error(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')
        raise Exception(f'Failed to read {url} - HTTP error code: {e.code} reason: {e.reason}')


def load_query_data(url):
    """
    Load CSV data from a URL into a pandas DataFrame.

    Parameters
    ----------
    url : str
        The URL of the CSV file.

    Returns
    -------
    pandas.DataFrame
        The loaded DataFrame.
    """
    csv_file_content = fetch_url_content(url)
    csv_df = pd.read_csv(io.StringIO(csv_file_content), sep=",",
                         dtype={'QueryID': int, 'QueryName': str, 'QueryTxt': str, 'QueryParams': str})
    csv_df.sort_values(by=['QueryID'], inplace=True)
    return csv_df


def set_logging_enabled(enabled):
    """
    Set the logging level based on the enabled flag.

    Parameters
    ----------
    enabled : bool
        True to enable logging, False to disable it.
    """
    if enabled:
        logging.getLogger().setLevel(logging.INFO)
    else:
        logging.getLogger().setLevel(logging.CRITICAL)


def verify_stage_area():
    """
    Verify the existence and writability of a stage area.

    Returns
    -------
    bool
        True if the stage area is valid, False otherwise.
    """
    try:
        global s2_workgroup_stage
        my_workspace_mngr = s2.manage_workspaces(management_api_key)
        workspace_group = my_workspace_mngr.get_workspace_group(WORKGROUP_ID)
        stage_obj = workspace_group.stage.mkdir(stage_path=stage_folder_path, overwrite=False)
        logging.info(
            f'Stage Path {stage_folder_path} is ok. Is Directory: {stage_obj.is_dir()}. Is Writeable: {stage_obj.writable}')
        if stage_obj.is_dir() and stage_obj.writable:
            s2_workgroup_stage = workspace_group.stage
            logging.info(f'stage is valid: {s2_workgroup_stage is not None}')
            return True
        else:
            logging.error('The provided path is either not a directory or not writable.')
            return False
    except Exception as stage_ex:
        logging.error(f'Stage Path Verification Failed. {stage_ex}')
        return False


def generate_stage_link(stg_path, curr_file_path):
    """
    Generate an HTML link to a stage area.

    Parameters
    ----------
    stg_path : str
        The path to the stage area.
    curr_file_path : str
        The current file path.

    Returns
    -------
    str
        The HTML formatted link.
    """
    url = f"https://portal.singlestore.com/organizations/{os.environ['SINGLESTOREDB_ORGANIZATION']}/workspaces/{os.environ['SINGLESTOREDB_WORKSPACE_GROUP']}#stage/{stg_path}"
    return f"""<div style=\"text-align:center;margin-top:5px; margin-bottom:5px;\">
                 File Uploaded to STAGE &nbsp;&nbsp;&nbsp;&nbsp; <a href='{url}'> {curr_file_path} </a>
               </div>"""


if __name__ == '__main__':

    if my_db_conn_url.endswith('/'):
        show_warn('Database not selected. Please select one from the dropdown at the top of the web page')
    else:
        management_api_key = getpass.getpass(prompt='Enter SingleStore Cloud API key: ')
        execution_success = True
        final_file_path = None
        error_msg = None
        try:
            set_logging_enabled(False)
            if verify_stage_area():

                conn = s2.connect(results_type='dict')
                logging.info('Database connection established')
                queries_df = load_query_data(url=query_data_url)
                logging.info('Query Data loaded')

                path = Path(local_output_dir)
                path.mkdir(exist_ok=True)

                for idx, row in queries_df.astype(str).iterrows():
                    query_id = row['QueryID']
                    query_name = row['QueryName']
                    query = row['QueryTxt']

                    logging.debug(f'about to execute {query_name}')

                    try:
                        result = execute_query(conn, query)

                        logging.info(f"Fetched query ID: {query_id} NAME: {query_name}")
                        template = fetch_url_content(template_url_base + 'Result-' + str(query_id) + '.template.html')
                        if not result:
                            logging.warning(f"Query result is empty for query '{query_name}'")
                            final_content = template.replace('rstable', empty_result_table)

                            # display(HTML(final_content))

                        else:
                            result_df = pd.DataFrame(result)
                            # capitalize column names
                            result_df.columns = map(str.upper, result_df.columns)
                            result_table_id = 'rstbl'
                            result_table_content = result_df.to_html(table_id=result_table_id,
                                                                     index=False,
                                                                     classes=result_table_html_classes)

                            final_content = template.replace('rstable', result_table_content)

                            # display(HTML(final_content))

                        report_file = f'{local_output_dir}/{query_id}.html'

                        with open(report_file, 'w') as writer:
                            writer.write(final_content)

                    except Exception as e:
                        logging.error(f"Error executing query ID: {query_id}, NAME: {query_name}: {e}")
                        logging.exception("Exception details")
                        show_warn(f"Error executing query ID: {query_id}, NAME: {query_name}")

                    logging.info(f'process completed for ID:{query_id} Name:{query_name}')

                logging.info('Result Pages are generated')

                index_file = f'{local_output_dir}/index.html'

                index_file_content = fetch_url_content(template_url_base + 'index.template.html')

                with open(index_file, 'w') as writer:
                    writer.write(str(index_file_content))

                logging.info('Index page is generated')

                zip_file_path = database_name + '_PERF_REPORT_' + my_timestamp + '.tar.gz'

                zip_success = make_tarfile(zip_file_path, local_output_dir)

                logging.info('archive created')

                if zip_success:
                    try:
                        uploaded_obj = s2_workgroup_stage.upload_file(local_path=zip_file_path,
                                                                      stage_path=f'{stage_folder_path}/{zip_file_path}')
                        logging.info(f'Upload success. Path: {uploaded_obj.abspath()} ')
                        print(f'File uploaded to STAGE AREA: {uploaded_obj.abspath()}')
                        logging.info('Upload success')
                        final_file_path = zip_file_path
                        os.remove(zip_file_path)
                        logging.info('Local archive file removed')
                        logging.info('about to clean previous generated files in local dir')
                        for root, dirs, files in os.walk(local_output_dir):
                            for file in files:
                                if file.endswith('.html'):
                                    os.remove(os.path.join(root, file))
                        os.rmdir(local_output_dir)
                        logging.info('Local files cleaned')
                    except Exception as e:
                        execution_success = False
                        logging.error(f'Failed during upload process: {e}')
                        error_msg = 'File Upload failed'

                else:
                    logging.error('Failed to create archive')
                    execution_success = False
                    error_msg = 'Failed to create archive'

            else:
                logging.info("Stage Area Verification Failed. Exiting.")
                print('Script execution Failed')
                execution_success = False
                error_msg = 'Failed to create missing stage area path or it is not writeable'
        except Exception as e:
            execution_success = False
            logging.error(f"An error occurred: {e}")
            logging.exception("Exception details")
            error_msg = f'Exception occurred. {str(e)}'

        if execution_success:
            show_success(generate_stage_link(stage_folder_path, final_file_path))
        else:
            show_error(error_msg)

        logging.info(f'Script execution completed successfully: {execution_success}')


**Important NOTE** 

 - The suggested actions suit most performance improvement scenarios. Still, we encourage you to test and verify them before applying them in production environments.

<div id="singlestore-footer" style="background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px"></div>
<div><img src="https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png" style="padding: 0px; margin: 0px; height: 24px"/></div>