### Dialing Hope: Examining the 988 Suicide Helpline in the United States
---------------------------------------------------------------------

###  By: Mohd Uwaish
----------------------------------------------------------------------
-  [Blog](https://www.digitaldankeschoen.com/)
- [GitHub](https://github.com/mohdUwaish59/)
-  [LinkedIn](https://www.linkedin.com/in/mohd-uwaish-72b779282/)  
-  [Xing](https://www.xing.com/profile/Mohd_Uwaish/cv)



### Datasets:
--------------------------------
### (1) [988Lifeline](https://988lifeline.org/our-network/)
### (2) [Global Data Lab](https://globaldatalab.org/shdi/table/shdi/USA/)

**Mental health in the US and the 988 line**

The USA has been in the throes of a mental health crisis since the start of the 21st century. With a 30% increase in the nationwide suicide rate since 1999, and rates rising steadily over the last decade in 49 of the 50 states, it is difficult to overstate the impact of declining mental health on American society. The many studies of the phenomenon over the last two decades have often overlooked the mental health issues faced by BIPOC and LGBTQ+ people in particular. More recently, however, decentralised, community-centric efforts have been made to address these inequities.   

It did not help that a pandemic which enforced isolation and made fewer human interactions a matter of public policy descended upon the whole world. COVID-19 aggravated the existing inequities in how the mental health crisis is addressed; the communities mentioned above experienced heightened rates of suicide, while the white majority and older people saw lower rates than in the previous year. Young people aged between 5 and 24 also saw increased rates, with people from marginalised communities again seeing disproportionately more deaths from suicide in 2021.  

In the midst of this fresh predicament, the US government unveiled a new suicide helpline, 988, with the aim of alleviating the country’s mental health struggles in the long term through timely, appropriate support and intervention. This 3-digit dialing code was proposed by Congress in 2020 in place of the then-existing National Suicide Prevention Lifeline. This new Suicide and Crisis Lifeline, a network of 200+ crisis centres across the 50 states and other US-held territories, promised 24/7 service via a toll-free hotline.   

However, criticism followed shortly, with detractors pointing to the lived experiences of people who had tried to use the service and were often redirected to the police, followed by detention and/or involuntary psychiatric treatment. The administrators of the helpline, Vibrant, along with government representatives, pointed out that only 2% of all calls in 2021 were classified as needing emergency intervention of any kind.  

The aim of this study is to draw out key performance indicators (KPIs) from the call data for July 2021 through July 2022. Notwithstanding the criticisms of such a support system, and regardless of their validity, the need for it cannot be a matter of contention. In fact, analysing this system and studying its pitfalls may also benefit decentralised efforts toward more intimate, community-based support in the long run. I wish to carry out this study to broaden my own understanding of the benefits and pitfalls of this system, and possibly to provide a starting point for analyses of similar crisis intervention systems across the world.   


Finally, a note: I understand that the issue being discussed is of a sensitive nature, and I have sought out and avoided frivolous and unnecessary comparisons to the best of my abilities.


---------------------------------------
**Suicide rate for Year 2021 (Jul-Dec)**
---------------------------------------
---------------------------------------

In [52]:
import pandas as pd
import plotly.express as px
import plotly.graph_objs as go


data = pd.read_csv('CSV_DATA/SUICIDE/suicide_data_2021.csv',delimiter=';')

# Find the state with the highest and lowest suicide rates
state_with_highest_rate = data.loc[data['RATE'].idxmax()]
state_with_lowest_rate = data.loc[data['RATE'].idxmin()]

# Create the choropleth map without GeoJSON
fig = px.choropleth(
    data_frame=data,
    locations='State',  # Assuming your state column is named 'State'
    locationmode='USA-states',  # Specify the location mode for US states
    color='RATE',  # Use the 'RATE' column for coloring
    color_continuous_scale="viridis",  # Choose your desired color scale
    scope="usa",  # Set the map scope to USA
    title='Suicide Rates by State in the USA (Jul 2021 - Dec 2021)',
    hover_data=['YEAR', 'DEATHS', 'URL']  # Columns to display in hover text
)

# Mark the states with the highest and lowest rates with custom symbols
marker_size = 20  # Adjust the marker size as needed
fig.add_trace(go.Scattergeo(
    locations=[state_with_highest_rate['State'], state_with_lowest_rate['State']],
    locationmode='USA-states',
    # Hover text for the two markers (hoverinfo='text' needs a text value)
    text=[f"Highest Rate ({state_with_highest_rate['RATE']})",
          f"Lowest Rate ({state_with_lowest_rate['RATE']})"],
    marker=dict(
        size=[marker_size, marker_size],
        symbol=['circle-x-open', 'triangle-up'],  # Choose custom marker symbols
        color=['purple', 'red'],
        opacity=0.7,  # Adjust marker opacity
        line=dict(width=2, color='black'),  # Marker border
    ),
    mode='markers',  # Markers only; the labels appear on hover and in the legend
    hoverinfo='text',
    showlegend=False,  # The legend entries are added separately below
))

# Add a legend
legend_title = 'Legend'
custom_markers = [
    {'label': f'{state_with_highest_rate["State"]} - Highest Rate ({state_with_highest_rate["RATE"]})', 'symbol': 'circle-x-open', 'color': 'purple'},
    {'label': f'{state_with_lowest_rate["State"]} - Lowest Rate ({state_with_lowest_rate["RATE"]})', 'symbol': 'triangle-up', 'color': 'red'}
]

# Create the legend items and add them to the figure
for marker_info in custom_markers:
    legend_marker = go.Scattergeo(
        lon=[None],  # Set the longitude to None for a legend item
        lat=[None],  # Set the latitude to None for a legend item
        mode='markers',
        marker=dict(
            size=10,
            symbol=marker_info['symbol'],
            color=marker_info['color'],
            opacity=1,
            line=dict(width=2, color='black'),
        ),
        name=marker_info['label'],  # Legend label
    )
    fig.add_trace(legend_marker)

# Set legend title and labels, and position the legend to the left
fig.update_layout(
    legend=dict(
        #title_text=legend_title,
        traceorder='normal',
        x=-0.5,  # Negative x places the legend to the left of the plot area
        y=0.5,  # Adjust y to position the legend vertically
    ),
)

fig.show()


The analysis of suicide rates reveals a stark contrast between states, with Montana and Wyoming standing out as having the highest rates, while New York, Illinois and California report the lowest rates. 
This disparity underscores the regional variations in suicide risk within the United States. Montana and Wyoming, both characterized by vast rural areas, face unique challenges related to mental health access, isolation, and economic factors, which may contribute to their elevated rates. Conversely, the states at the low end of the range, with their more densely populated urban centers and robust healthcare systems, exhibit lower suicide rates.
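
The ranking behind statements like these can be checked directly against the table. The following is a minimal sketch, assuming the same `State` and `RATE` columns as the cell above; the state codes and figures are illustrative placeholders, not values from the dataset, and `rank_states` is a hypothetical helper.

```python
import pandas as pd

# Illustrative placeholder values only; not taken from the dataset.
sample = pd.DataFrame({
    "State": ["AA", "BB", "CC", "DD", "EE"],
    "RATE": [30.1, 8.2, 15.4, 28.7, 9.9],
})

def rank_states(df, n=2):
    """Return the n highest-rate and the n lowest-rate rows (hypothetical helper)."""
    highest = df.nlargest(n, "RATE")
    lowest = df.nsmallest(n, "RATE")
    return highest, lowest

highest, lowest = rank_states(sample)
print("Highest:", highest["State"].tolist())
print("Lowest:", lowest["State"].tolist())
```

Listing several states at each end, rather than only the single maximum and minimum, makes it easier to keep the narrative consistent with the data.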


---------------------------------------
**Suicide rate for Year 2022 (Jan-Jul)**
---------------------------------------
---------------------------------------


In [53]:
import pandas as pd
import plotly.graph_objects as go

# Load the data from the semicolon-delimited CSV file
csv_file = "CSV_DATA/SUICIDE/suicide_data_2022.csv"  # Replace with the actual path to your CSV file
df = pd.read_csv(csv_file, delimiter=';')

# Filter data for months from January to July
months_to_include = ["January", "February", "March", "April", "May", "June", "July"]
df = df[df["Month"].isin(months_to_include)]

# Find the month with the maximum and minimum rate
max_rate_month = df[df["RATE"] == df["RATE"].max()]["Month"].iloc[0]
min_rate_month = df[df["RATE"] == df["RATE"].min()]["Month"].iloc[0]

# Create a stacked bar chart
fig = go.Figure()

# Create traces for each year
for year in df["YEAR"].unique():
    year_data = df[df["YEAR"] == year]
    fig.add_trace(go.Bar(
        x=year_data["Month"],
        y=year_data["RATE"],
        text=[f"{rate:.2f}" for rate in year_data["RATE"]],  # Format as two decimal places
        textposition="outside",  # Display text outside the bars
        name=str(year),
        marker_color='blue'  # Default color
    ))

# Change the color of the max and min rate bars. Match on the month name rather than
# the DataFrame index, because the index labels may not run from 0 after filtering.
fig.data[0].marker.color = ['yellow' if month == max_rate_month else 'green' if month == min_rate_month else 'skyblue' for month in fig.data[0].x]

# Update layout
fig.update_layout(
    barmode='stack',
    title="Stacked Bar Chart of Suicide Rates by Month (Jan to July, 2022)",
    xaxis_title="Month",
    yaxis_title="Suicide Rate",
    showlegend=True
)

# Show the plot
fig.show()


---------------------------------------
**States-wise HDI vs suicide rates**
---------------------------------------
---------------------------------------

The Human Development Index (HDI) is a widely recognized metric that serves as a measure of overall development and well-being in societies worldwide. It encompasses three key dimensions: health, education, and standard of living. 
By combining these components, the HDI offers a comprehensive snapshot of a nation's development status, making it a valuable tool for policymakers, researchers, and organizations seeking to assess and improve the well-being of populations around the world.

Examining the relationship between the Human Development Index (HDI) and suicide rates is a crucial aspect of the research because it delves into the multifaceted dynamics of societal well-being and its impact on mental health outcomes. By investigating this relationship, I aim to uncover critical insights into the factors that contribute to suicide rates across different states. 


In [54]:
import pandas as pd
import plotly.express as px

# Load the data into a DataFrame named 'df'
df = pd.read_csv("CSV_DATA/HDI/HDI_vs_RATE.csv", delimiter=';')

# Create a bubble plot with Plotly Express
fig = px.scatter(
    df,
    x='Value',      # HDI column name
    y='RATE',       # Suicide Rate column name
    #text='STATE',   # State column name for hover text
    labels={'Value': 'HDI (Human Development Index)', 'RATE': 'Suicide Rate'},
    title='State-wise HDI vs Suicide Rate (Bubble Plot)',
    size='RATE',    # Size of the bubble based on suicide rate
    color='STATE'   # Color each bubble by state
)

# Customize the appearance of the plot
fig.update_traces(marker=dict(opacity=0.7),
                  textposition='top center')

# Show the plot
fig.show()

Suicide rates vary widely even among states with a high HDI (Human Development Index): New York (HDI = 0.93), New Jersey (HDI = 0.94), and Massachusetts (HDI = 0.94) show lower rates, while Montana, Alaska and Wyoming show higher rates. While high-HDI states generally benefit from better access to healthcare, education, and economic opportunities, this result highlights that the relationship between development and suicide risk is complicated. 

Several factors may contribute to these variations, including regional demographics, social support systems, and mental health service availability. In high-HDI states with lower suicide rates, robust mental health infrastructure and supportive communities may contribute to better outcomes. Conversely, in high-HDI states with higher suicide rates, other factors such as isolation, inadequate mental health infrastructure or unique regional stressors may play a role in elevating the risk. This result underscores that high development levels alone may not fully mitigate suicide risk, and that targeted, region-specific suicide prevention strategies and resources are essential to address the complex factors influencing suicide rates within high-HDI states.
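
One way to put a number on how strongly HDI and suicide rate move together is a correlation coefficient. The sketch below is illustrative only: it assumes the `Value` (HDI) and `RATE` columns used in the cell above, and the figures are placeholders rather than values from `HDI_vs_RATE.csv`. Spearman's coefficient is computed here as the Pearson correlation of the ranks.

```python
import pandas as pd

# Illustrative placeholder values only; column names follow the cell above.
sample = pd.DataFrame({
    "STATE": ["AA", "BB", "CC", "DD", "EE"],
    "Value": [0.90, 0.92, 0.93, 0.94, 0.95],   # HDI
    "RATE": [28.0, 20.0, 22.0, 9.0, 8.0],      # suicide rate
})

# Pearson measures linear association between the raw values.
pearson = sample["Value"].corr(sample["RATE"])

# Spearman measures monotonic association: Pearson applied to the ranks.
spearman = sample["Value"].rank().corr(sample["RATE"].rank())

print(f"Pearson: {pearson:.2f}, Spearman: {spearman:.2f}")
```

A coefficient near zero on the real data would support the reading that development level alone explains little of the variation; either way, a correlation across roughly fifty states says nothing about causation.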


---------------------------------------
**Suicide helpline answer rates vs suicide rates (month wise)**
---------------------------------------
---------------------------------------

Suicide helplines play an indispensable role in suicide prevention, providing a lifeline for individuals in crisis and bridging the gap between those in need and professional mental health services.
By offering immediate help and hope to those in despair, they contribute to reducing the overall suicide rate and the devastating impact of suicide on individuals and communities.

In [55]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import pandas as pd
import plotly.express as px
import os

# Directory where your monthly data files are located
data_directory = "CSV_DATA/Suicide helpline answer rates vs suicide rates"

# List of file names for each month (assuming you have data for multiple months)
file_names = os.listdir(data_directory)

# Dropdown widget for file selection
file_dropdown = widgets.Dropdown(options=file_names, description='Select File:')

# Output area for displaying graphs
output_area = widgets.Output()

# Initialize the figures as None
fig = None

# Function to generate Plotly graphs
def generate_graph(selected_file):
    global fig  # Access the global figure
    
    # Construct the full file path
    file_path = os.path.join(data_directory, selected_file)
    
    # Read the dataset (semicolon-delimited CSV)
    df = pd.read_csv(file_path, delimiter=';')

    # Create a scatter plot for "In-State Answer Rate" vs. "RATE" with hover text as state names
    fig = px.scatter(df, 
                     x='In-State Answer Rate',
                     y='RATE', 
                     color='In-State Answer Rate',
                     color_continuous_scale="viridis",
                     title=f'Suicide Helpline Answer Rate vs. Suicide Rates ({selected_file})',
                     hover_name='State'  # Include state names in hover text
                    )
    fig.update_traces(marker=dict(size=12))
    
    # Clear the previous plot and display the new one
    with output_area:
        clear_output()
        fig.show()

# Event handler for dropdown change
def handle_dropdown_change(change):
    selected_file = change.new
    generate_graph(selected_file)

# Link the dropdown widget to the event handler
file_dropdown.observe(handle_dropdown_change, names='value')

# Display widgets
display(file_dropdown)
display(output_area)



States such as Wyoming (WY) and Alaska (AK) exhibit lower helpline answer rates alongside higher suicide rates, a concerning pattern. The lower answer rates suggest that individuals in these states may face challenges in accessing immediate crisis intervention, which may contribute to the elevated suicide rates. Improving answer rates in states with higher suicide rates calls for a comprehensive approach: securing increased funding and resources to support helpline capacity, recruiting and training more volunteers to handle crisis calls effectively, ensuring 24/7 availability of services, and providing cultural competency training to helpline staff so they can better address the unique needs of diverse populations.

---------------------------------------
**ASA vs abandoned calls**
---------------------------------------
---------------------------------------

In the context of helplines and crisis intervention, two metrics are especially significant. The Average Speed of Answer (ASA) measures how long a caller must wait before the call is answered by a trained professional or volunteer. The Abandoned Call Rate is the percentage of calls that callers disconnect before a helpline operator can answer.
Both metrics directly reflect the accessibility and efficiency of the helpline's response system and are essential for optimizing its performance. A shorter ASA is vital because it signifies swift access to critical support for individuals in distress.
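
As a rough illustration of how the two metrics can be derived from the monthly tables, the sketch below converts an ASA string to seconds and expresses abandoned calls as a percentage. It rests on assumptions: that ASA is stored as `MM:SS`, that counts are strings with thousands separators, and that `Received` is the appropriate denominator for the abandonment rate. The values are placeholders and `mmss_to_seconds` is a hypothetical helper.

```python
import pandas as pd

def mmss_to_seconds(value):
    """Convert an 'MM:SS' string to seconds (the format is an assumption)."""
    minutes, seconds = str(value).split(":")
    return int(minutes) * 60 + int(seconds)

# Illustrative placeholder values only.
sample = pd.DataFrame({
    "State": ["AA", "BB"],
    "Received": ["1,200", "800"],
    "Abandoned In-State": ["120", "40"],
    "ASA In-State": ["00:45", "01:30"],
})

# Strip thousands separators before converting the counts to integers.
for col in ["Received", "Abandoned In-State"]:
    sample[col] = sample[col].str.replace(",", "").astype(int)

sample["ASA seconds"] = sample["ASA In-State"].apply(mmss_to_seconds)
sample["Abandoned %"] = (100 * sample["Abandoned In-State"] / sample["Received"]).round(1)
print(sample[["State", "ASA seconds", "Abandoned %"]])
```

Working with a percentage rather than a raw count keeps large and small states comparable, since a populous state will have more abandoned calls simply because it receives more calls.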


In [56]:
import pandas as pd
import plotly.express as px
import os
import ipywidgets as widgets
from IPython.display import display, clear_output, HTML

# Directory containing the monthly datasets
data_dir = "CSV_DATA/DATA"

# List of files in the directory
file_list = os.listdir(data_dir)

# Dropdown widget for selecting a dataset
dataset_dropdown = widgets.Dropdown(
    options=file_list,
    description='Select Dataset:',
    disabled=False,
)

# Output area for displaying the scatter plot
output_area = widgets.Output()

# Function to generate and display the scatter plot based on user selection
def generate_scatter_plot(change):
    with output_area:
        clear_output()  # Clear the output area
        file_name = change.new
        file_path = os.path.join(data_dir, file_name)

        # Read the dataset
        df = pd.read_csv(file_path, delimiter=';')

        # Check for missing values in 'ASA In-State' and 'Abandoned In-State' columns
        if df['ASA In-State'].isnull().any() or df['Abandoned In-State'].isnull().any():
            raise ValueError(f"There are missing values in 'ASA In-State' or 'Abandoned In-State' columns in {file_name}.")

        # Prepare data
        #data['DEATHS'] = data['DEATHS'].str.replace(',', '').astype(int)
        df['Abandoned In-State'] = df['Abandoned In-State'].str.replace(',', '').astype(int)  # Ensure it's in integer format
        df['ASA In-State'] = df['ASA In-State'].str.split(':').apply(lambda x: int(x[0]) * 60 + int(x[1]))

        # Create an interactive scatter plot with hover functionality using Plotly Express.
        # ASA was converted above to seconds (assuming an MM:SS format), so label it accordingly.
        fig = px.scatter(
            df,
            x='ASA In-State',
            y='Abandoned In-State',  # Use 'Abandoned In-State' for the y-axis
            color='State',  # Use 'State' column for color
            hover_name='State',  # Show the state as the hover title; axis values are added automatically
            color_discrete_sequence=px.colors.qualitative.Set1,  # Set a color sequence
            labels={'ASA In-State': 'Average Speed of Answer (Seconds)', 'Abandoned In-State': 'Abandoned Call Count'},
            title=f'State Trends for Abandoned Calls vs Average Speed of Answer - {file_name}'
        )

        # Show the plot
        fig.show()

# Attach the callback to the dropdown widget
dataset_dropdown.observe(generate_scatter_plot, names='value')

# Display the dropdown widget initially
display(dataset_dropdown)

# Display the output area for the scatter plot
display(output_area)



The observed improvement in the Average Speed of Answer (ASA) for Virginia (VA) from August 2021 to February 2022 is a positive trend that suggests enhanced efficiency in addressing calls and reducing waiting times for individuals in crisis. This improvement may reflect investments in staffing, technology, or training that have enabled VA's helpline to respond more promptly to callers.

On the other hand, the states of Florida (FL), New York (NY), and California (CA) present a contrasting pattern with high abandoned call rates despite having ASA values that lie in the middle range. This result is notable as it indicates a potential issue with call handling capacity or other operational factors in these states. The high abandonment rate suggests that callers in FL, NY, and CA may be facing extended wait times or challenges in connecting with helpline operators, which could be a barrier to accessing critical support during their moments of crisis. Addressing this discrepancy between ASA and abandonment rates in these states is essential to ensure that individuals receive the timely assistance they need, emphasizing the importance of improving operational efficiency and resource allocation.
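
Month-over-month comparisons like the ones above are easier to follow when the monthly files are ordered chronologically; `os.listdir` makes no ordering guarantee, and a plain sort is alphabetical. The sketch below assumes file names of the form `MONTH-YYYY.csv` with English month names; `month_key` is a hypothetical helper.

```python
from datetime import datetime

def month_key(file_name):
    """Parse a name like 'AUGUST-2021.csv' into a sortable datetime (assumed naming scheme)."""
    stem = file_name.rsplit(".", 1)[0]
    return datetime.strptime(stem.title(), "%B-%Y")

# Example names following the assumed scheme.
file_names = ["APRIL-2022.csv", "AUGUST-2021.csv", "DECEMBER-2021.csv", "JULY-2021.csv"]

ordered = sorted(file_names, key=month_key)
print(ordered)
```

The ordered list can then be passed as the `options` of the dropdown so that the months appear in calendar order.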

---------------------------------------
**Talktime vs number of calls**
---------------------------------------
---------------------------------------

The relationship between talk time and the number of calls handled by a helpline is a critical aspect of understanding the helpline's operational dynamics. 
When a helpline experiences an increased number of calls, it often leads to shorter talk times per call, as operators need to manage the volume efficiently. Conversely, during periods of lower call volume, operators may have more time to engage in longer, more in-depth conversations with callers. Finding the right equilibrium between these factors is a challenge faced by helpline administrators. 
Analyzing this relationship can offer insights into the helpline's capacity to manage varying call volumes while maintaining the quality of support provided, contributing to its overall effectiveness in crisis intervention.  
  

In [57]:
import ipywidgets as widgets
from IPython.display import display, clear_output
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import os

# Directory where your monthly data files are located
data_directory = "CSV_DATA/DATA"

# List of file names for each month (assuming you have data for multiple months)
file_names = os.listdir(data_directory)

# Dropdown widget
file_dropdown = widgets.Dropdown(options=file_names, description='Select File:')

# Output area for displaying graphs
output_area = widgets.Output()

# Initialize the figures as None
fig = None
scatter_fig = None

# Function to generate Plotly graphs
def generate_graphs(selected_file):
    global fig, scatter_fig  # Access the global figures
    
    # Construct the full file path
    file_path = os.path.join(data_directory, selected_file)
    
    # Read the dataset (semicolon-delimited CSV)
    df = pd.read_csv(file_path, delimiter=';')
    df['Received'] = df['Received'].str.replace(',', '').astype(int)
    df['Avg. Talk Time In-State'] = df['Avg. Talk Time In-State'].apply(lambda x: int(x.split(':')[0]) * 60 + int(x.split(':')[1]))

    # Calculate statistics for the "Received" column
    lowest_value = df['Received'].min()
    highest_value = df['Received'].max()
    mean_value = round(df['Received'].mean(), 1)
    state_with_highest_received = df.loc[df['Received'].idxmax(), 'State']
    state_with_lowest_received = df.loc[df['Received'].idxmin(), 'State']
    
    # Create bar graph for the statistics
    bar_fig = go.Figure()
    
    bar_fig.add_trace(go.Bar(
        x=['Lowest Number of Calls', 'Highest Number of Calls', 'Mean Number of Calls'],
        y=[lowest_value, highest_value, mean_value],
        text=[lowest_value, highest_value, mean_value],
        hovertext=['State with Lowest Calls: ' + state_with_lowest_received, 
                   'State with Highest Calls: ' + state_with_highest_received, 
                   'Mean Calls'],
    ))

    bar_fig.update_layout(
        title=f'Statistics for Number of Calls in {selected_file}',
        xaxis_title='Statistic',
        yaxis_title='Number of Calls',
        yaxis_type='log'
    )
    # 'Avg. Talk Time In-State' was already converted from MM:SS to seconds above,
    # so it can be plotted directly without a second conversion.
    scatter_fig = px.scatter(df,
                             x='Avg. Talk Time In-State',  # Average talk time in seconds
                             y='Received', 
                             color='State',  # Use the State column for coloring
                             title=f'Talktime vs. Number of Calls for {selected_file}'
                            )
    scatter_fig.update_xaxes(title_text='Avg. Talk Time (Seconds)')  # Update the X-axis label
    scatter_fig.update_yaxes(title_text='Number of Calls')

    # Clear the previous plots and display the new ones
    with output_area:
        clear_output()
        bar_fig.show()
        scatter_fig.show()

# Event handler for dropdown change
def handle_dropdown_change(change):
    selected_file = change.new
    generate_graphs(selected_file)

# Link the dropdown widget to the event handler
file_dropdown.observe(handle_dropdown_change, names='value')

# Display widgets
display(file_dropdown)
display(output_area)


**Key Findings:**  
  
  - Consistently low talk times observed in CA, AZ and FL, along with a lack of improvement in NC over the year.  
    
**Challenges Identified:**  
  
  - Potential difficulties in managing call volumes effectively.
  - Limited time that operators can spend with each caller, which would affect the quality of service.  
   
**Possible Causes:** 
  
  - Heavy call traffic in the mentioned states.
  - Existing resources may be stretched thin.  
    
**Recommended Actions:**

  - Investment in increased staffing and additional training.
  - Dedicating more resources and attention to identified states.
  - Need for adequate personnel and infrastructure.
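
A finding such as "consistently low talk times" can be made reproducible with an explicit rule. The sketch below is one possible rule, not the method used for the findings above: a state is flagged when its talk time falls below that month's median in every month. The long-format table, its column names (`Month`, `TalkSeconds`) and its values are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per state per month, talk time in seconds.
talk = pd.DataFrame({
    "Month": ["2021-08", "2021-08", "2021-08", "2021-09", "2021-09", "2021-09"],
    "State": ["AA", "BB", "CC", "AA", "BB", "CC"],
    "TalkSeconds": [300, 700, 650, 320, 400, 720],
})

# A state counts as "low" in a month when it is strictly below that month's median.
monthly_median = talk.groupby("Month")["TalkSeconds"].transform("median")
talk["Low"] = talk["TalkSeconds"] < monthly_median

# Keep only the states that are low in every month.
low_in_all_months = talk.groupby("State")["Low"].all()
consistently_low = sorted(low_in_all_months[low_in_all_months].index)
print(consistently_low)
```

The median is only one possible threshold; a fixed number of seconds or a lower quantile would work the same way.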

---------------------------------------
**States-wise backup calls**
---------------------------------------
---------------------------------------

Backup calls are a vital resource in the context of helplines, serving as additional support during periods of high demand or surges in call volume. 
The concept of backup calls is crucial because it ensures that individuals in crisis do not face extended wait times, and their calls are not left unanswered. This mechanism allows helplines to effectively manage fluctuations in call volume, maintain responsiveness, and prevent callers from becoming discouraged or disconnected during peak periods. 
By offering immediate support or information, backup calls are a strategic component of a helpline's crisis intervention strategy, ensuring that help is accessible precisely when it is needed most.  
 


In [58]:
import os
import pandas as pd
import plotly.express as px
import ipywidgets as widgets
import plotly.graph_objects as go
from IPython.display import display, clear_output
from datetime import datetime

# Directory where your monthly data files are located
data_directory = "CSV_DATA/DATA"

# List of file names for each month (assuming you have data for multiple months)
file_names = os.listdir(data_directory)

# Create a dropdown widget for selecting the month
file_dropdown = widgets.Dropdown(
    options=file_names,
    description='Select File:',
)

# Output area for displaying plots
output_area = widgets.Output()

# Function to generate and display the choropleth map and the bar chart based on the selected month
def generate_choropleth(selected_file):
    file_path = os.path.join(data_directory, selected_file)
    df = pd.read_csv(file_path, delimiter=';')
    df['Flowout to Backup'] = df['Flowout to Backup'].str.replace(',', '').astype(int)

    # Derive a label such as "APRIL-2022" from the file name (files are named MONTH-YYYY.csv)
    file_date = os.path.splitext(selected_file)[0]
    
    # Calculate statistics for the "Flowout to Backup" column
    lowest_value = df['Flowout to Backup'].min()
    highest_value = df['Flowout to Backup'].max()
    mean_value = round(df['Flowout to Backup'].mean(), 1)
    state_with_highest_received = df.loc[df['Flowout to Backup'].idxmax(), 'State']
    # idxmin returns the first state that has the minimum value
    state_with_lowest_received = df.loc[df['Flowout to Backup'].idxmin(), 'State']
    
    # Create a list of states with zero calls
    states_with_zero_calls = df.loc[df['Flowout to Backup'] == 0, 'State'].tolist()
    
    # Create bar graph for the statistics
    bar_fig = go.Figure()
    
    bar_fig.add_trace(go.Bar(
        x=['Lowest Number of Calls', 'Highest Number of Calls', 'Mean Number of Calls'],
        y=[lowest_value, highest_value, mean_value],
        text=[lowest_value, highest_value, mean_value],
        hovertext=['State with Lowest Calls: ' + state_with_lowest_received, 
                   'State with Highest Calls: ' + state_with_highest_received, 
                   'Mean Calls'],
    ))

    bar_fig.update_layout(
        title=f'Statistics for Number of Calls in {selected_file}',
        xaxis_title='Statistic',
        yaxis_title='Number of Calls'
    )

    # Create a choropleth map for the month's data
    fig = px.choropleth(
        df,
        locations='State',
        locationmode='USA-states',
        color='Flowout to Backup',  # Use the "Flowout to Backup" column for coloring
        hover_name='State',
        hover_data=['Flowout to Backup'],  # Include the original data
        color_continuous_scale='plasma_r',
         #color_continuous_scale="plasma,inferno,magma,cividis,jet,viridis"
        scope='usa',
        title=f'States-wise Flowout to Backup for {selected_file}'
    )
    # Annotate the bar chart with the states that sent no calls to backup.
    # Plotly text uses <br> for line breaks; "\n" is not rendered as a new line.
    if states_with_zero_calls:
        zero_calls_text = "States with Zero Backup Calls:<br>" + "<br>".join(states_with_zero_calls)
    else:
        zero_calls_text = "No States with Zero Backup Calls"
    bar_fig.add_trace(go.Scatter(
        x=['Lowest Number of Calls'],
        y=[highest_value * 1.2],  # Place the text above the highest bar
        text=[zero_calls_text],
        mode='text',
        textposition='top center',
        textfont=dict(size=14),  # Larger font for readability
        showlegend=False
    ))

    # Display both the choropleth map and the bar chart
    with output_area:
        clear_output()
        display(fig, bar_fig)

# Event handler for dropdown change
def handle_dropdown_change(change):
    selected_file = change.new
    generate_choropleth(selected_file)

# Link the dropdown widget to the event handler
file_dropdown.observe(handle_dropdown_change, names='value')

# Display widgets
display(file_dropdown)
display(output_area)



**Key Findings:**  
    
  - TX, IL, and NY consistently show the highest flow-out to backup.
   
**Interpretation:**  
  
  - High flow-out means that many calls are redirected to backup resources.
  - This suggests that in-state centres cannot handle their call volume effectively.
  - Staffing is insufficient in proportion to the number of calls received.
    
**Recommended Actions:**  
To reduce reliance on backup calls:
  - Allocate more resources to these states.
  - Invest in better call management strategies, including:
       - Additional staffing.
       - Extensive training programs.
       - Technology upgrades (quick IVR systems, call queuing algorithms).
       - Predictive analytics to forecast peak times.
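
The word "consistently" in the key findings can be checked directly instead of being read off the maps one month at a time. The following is a minimal sketch, assuming each monthly table has a `State` column and a numeric `Flowout to Backup` column. The helper name `top_state_counts`, its `top_n` parameter, and every figure below are invented for illustration and are not taken from the 988 reports.

```python
import pandas as pd

# Hypothetical monthly figures, invented purely for illustration
monthly = {
    "2021-07": pd.DataFrame({"State": ["TX", "IL", "NY", "FL"],
                             "Flowout to Backup": [900, 800, 700, 100]}),
    "2021-08": pd.DataFrame({"State": ["TX", "IL", "NY", "FL"],
                             "Flowout to Backup": [950, 820, 300, 400]}),
    "2021-09": pd.DataFrame({"State": ["TX", "IL", "NY", "FL"],
                             "Flowout to Backup": [700, 900, 650, 200]}),
}

def top_state_counts(frames, column="Flowout to Backup", top_n=3):
    """Count in how many months each state ranks among the top_n for `column`."""
    counts = {}
    for frame in frames.values():
        # nlargest returns the top_n rows of this month, ordered by `column`
        for state in frame.nlargest(top_n, column)["State"]:
            counts[state] = counts.get(state, 0) + 1
    return pd.Series(counts).sort_values(ascending=False)

counts = top_state_counts(monthly)
print(counts)
```

A state that appears in the top three in most or all months is a reasonable candidate for the "consistently high" label; a state that appears only once is more likely a one-off spike.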

---------------------------------------
**State-wise talk time, month by month**
---------------------------------------
--------------------------------------

Talk time is a key indicator of the quality and depth of the support a helpline provides. Longer talk times often correlate with a more supportive response from helpline staff, and they give staff more time to assess the caller's emotional state and offer appropriate guidance and resources.

Ultimately, talk time reflects the helpline's commitment to providing effective support, ensuring that individuals in distress receive the attention and care they need during a crisis.

To get better insights, let's look at the choropleth map of the USA and the accompanying bar chart.


In [59]:
import os
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import ipywidgets as widgets
from IPython.display import display
from ipywidgets import interactive_output

# Directory where your monthly data files are located
data_directory = "CSV_DATA/DATA_1"

# List of file names for each month (assuming you have data for multiple months)
file_names = os.listdir(data_directory)

# Create a dropdown widget for selecting the month
month_dropdown = widgets.Dropdown(
    options=file_names,
    description='Select Month:',
)

# Function to generate and display the choropleth map and bar chart based on the selected month
def generate_choropleth(month):
    file_path = os.path.join(data_directory, month)
    df = pd.read_csv(file_path, delimiter=';')
    

    # Extract the two-digit month from a file name such as
    # "FINAL-2021-07_988-Monthly-State-Report.csv" and store it in a new column
    month_name = month.split("-")[2]
    df['Month'] = month_name[:2]

    # Convert "Avg. Talk Time In-State" (assumed to be MM:SS) to total seconds as an integer
    df['Seconds'] = df['Avg. Talk Time In-State'].apply(lambda x: int(x.split(':')[0]) * 60 + int(x.split(':')[1]))

    # Calculate the lowest, highest, and mean values
    lowest_value = df['Seconds'].min()
    highest_value = df['Seconds'].max()
    mean_value = df['Seconds'].mean()
    state_with_highest_received = df.loc[df['Seconds'].idxmax(), 'State']
    states_with_zero_calls = df.loc[df['Seconds'] == 0, 'State'].tolist()

    # Create bar graphs for lowest, highest, and mean values
    bar_fig = go.Figure()

    bar_fig.add_trace(go.Bar(
        x=['Lowest', 'Highest', 'Mean'],
        y=[lowest_value, highest_value, mean_value],
        text=[f"{lowest_value:.1f}", f"{highest_value:.1f}", f"{mean_value:.1f}"],  # Format the text with one decimal place
        hovertext=['State with Lowest Average Talk Time: ' + df.loc[df['Seconds'].idxmin(), 'State'],
                    'State with Highest Average Talk Time: ' + state_with_highest_received,
                    'Mean Average Talk Time Across States'],
        textposition='auto',
    ))

    bar_fig.update_layout(
        title=f'Statistics for Avg. Talk Time In-State in {month[6:13]}',
        xaxis_title='Statistic',
        yaxis_title='Seconds',
    )

    # Create a choropleth map for the month's data
    choropleth_fig = px.choropleth(
        df,
        locations='State',
        locationmode='USA-states',
        color='Seconds',  # Colour each state by its talk time in seconds
        hover_name='State',  # Display state names on hover
        hover_data=['Avg. Talk Time In-State'],
        color_continuous_scale='cividis',  # Use the 'Cividis' color scale
        scope='usa',
        title=f'States-wise Avg. Talk Time In-State for {month[6:13]}'
    )

    # Set custom hovertemplate to display state names and 'Avg. Talk Time In-State'
    choropleth_fig.update_traces(
        hovertemplate='%{hovertext}<br>Avg. Talk Time In-State: %{customdata}',
        customdata=df['Avg. Talk Time In-State']  # Include 'Avg. Talk Time In-State' in customdata
    )

    # Add a text annotation above the bars listing states with zero average talk time
    if states_with_zero_calls:
        # Plotly renders "<br>" (not "\n") as a line break in trace text
        zero_calls_text = "States with Zero Avg. Talk Time:<br>" + "<br>".join(states_with_zero_calls)
    else:
        zero_calls_text = "No States with Zero Avg. Talk Time"
    bar_fig.add_trace(go.Scatter(
        x=['Lowest Value'],
        y=[highest_value * 1.2],  # Place the label above the highest bar
        text=[zero_calls_text],
        mode='text',
        textposition='top center',
        textfont=dict(size=14)  # Larger font so the label stands out
    ))

    # Display the choropleth map and bar chart
    display(choropleth_fig, bar_fig)

# Link the dropdown widget to the plot generation function using interactive_output
output = interactive_output(generate_choropleth, {'month': month_dropdown})

# Display the widgets and the output
display(month_dropdown, output)



The graphs reveal that states with high flow-out to backup, such as New York (NY), Illinois (IL), and Texas (TX), also have high average talk times. The high flow-out indicates that backup resources are frequently needed to manage the call volume, while the high average talk times suggest that helpline operators are dedicating substantial time and attention to each caller.
This indicates a commitment to providing comprehensive and empathetic support despite heavy call traffic, and it highlights the dedication of helpline staff in these states to ensuring that individuals in crisis receive the time and assistance they need.

To further support these efforts, these states could invest in increased staffing so that enough operators are available to handle the demand. This would reduce the reliance on backup calls while still allowing operators to give each caller the time and attention they need.
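
The observation that high flow-out and high talk time occur together can be quantified with a rank correlation across states. The sketch below is illustrative only: the figures are invented, `talk_time_to_seconds` is a hypothetical helper that follows the same MM:SS convention as the notebook code, and the Spearman coefficient is computed as the Pearson correlation of the ranks so that only pandas is required.

```python
import pandas as pd

def talk_time_to_seconds(value):
    """Convert an 'MM:SS' string to seconds (convention assumed from the notebook)."""
    minutes, seconds = value.split(":")[:2]
    return int(minutes) * 60 + int(seconds)

# Hypothetical values, invented purely for illustration
sample = pd.DataFrame({
    "State": ["NY", "IL", "TX", "VT", "WY"],
    "Flowout to Backup": [1200, 1500, 1800, 40, 25],
    "Avg. Talk Time In-State": ["14:30", "15:10", "13:45", "09:20", "08:50"],
})
sample["Seconds"] = sample["Avg. Talk Time In-State"].apply(talk_time_to_seconds)

# Spearman rank correlation = Pearson correlation of the ranked values
rho = sample["Flowout to Backup"].rank().corr(sample["Seconds"].rank())
print(f"Spearman rho between flow-out and talk time: {rho:.2f}")
```

A coefficient close to 1 would support the pattern described above. It would not by itself show why the two move together, so the interpretation about operator dedication remains a reading of the data rather than a measured result.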

---------------------------------------
**Backup calls vs received calls**
---------------------------------------
---------------------------------------

The number of calls a helpline answers and its use of backup calls are closely linked. Backup calls bridge the gap between high demand and limited operator availability: as more calls are routed to backup, the helpline can handle a larger number of incoming calls, which increases the total number of calls answered. By deploying backup resources strategically, helplines can improve their responsiveness and, ultimately, the effectiveness of their crisis intervention efforts.

In [60]:
import pandas as pd
import plotly.express as px
import ipywidgets as widgets
from IPython.display import display

# Load your datasets into a dictionary
datasets = {
    "JUL-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-07_988-Monthly-State-Report.csv", delimiter=';'),
    "AUG-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-08_988-Monthly-State-Report.csv", delimiter=';'),
    "SEP-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-09_988-Monthly-State-Report.csv", delimiter=';'),
    "OCT-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-10_988-Monthly-State-Report.csv", delimiter=';'),
    "NOV-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-11_988-Monthly-State-Report.csv", delimiter=';'),
    "DEC-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-12_988-Monthly-State-Report.csv", delimiter=';'),
    "JAN-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-01_988-Monthly-State-Report.csv", delimiter=';'),
    "FEB-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-02_988-Monthly-State-Report.csv", delimiter=';'),
    "MAR-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-03_988-Monthly-State-Report.csv", delimiter=';'),
    "APR-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-04_988-Monthly-State-Report.csv", delimiter=';'),
    "MAY-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-05_988-Monthly-State-Report.csv", delimiter=';'),
    "JUN-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-06_988-Monthly-State-Report.csv", delimiter=';'),
    "JUL-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-07_988-Monthly-State-Report.csv", delimiter=';'),
}

# Create a function to generate the proportional map using Plotly
def generate_proportional_map(dataset_name):
    data = datasets.get(dataset_name)
    if data is None:
        print("Dataset not found.")
        return

    # Work on a copy so that re-selecting a dataset does not try to
    # re-parse columns that were already converted to integers
    data = data.copy()
    for column in ['Flowout to Backup', 'Received']:
        # Counts are stored as text with thousands separators, e.g. "1,234"
        data[column] = data[column].astype(str).str.replace(',', '').astype(int)

    # Percentage of received calls that flowed out to backup
    data['Proportion'] = data['Flowout to Backup'] / data['Received'] * 100

    # Create a choropleth map using Plotly Express
    fig = px.choropleth(
        data,
        locations="State",  # Column with two-letter state codes
        locationmode="USA-states",  # Set location mode to USA-states
        color="Proportion",  # Color scale based on the proportion
        range_color=(0, 100),  # Specify the color range
        hover_name="State",  # Display state name on hover
        custom_data=["Flowout to Backup", "Received"],  # Custom data for hover
        title=f"Percentage of Received Calls gone to Backup - Statewise for {dataset_name}"
    )
    fig.update_geos(
        visible=False,
        resolution=110,
        showcoastlines=True,
        coastlinecolor="Black",
        showland=True,
        landcolor="white",
    )

    
    # Custom hover text showing the raw counts and the computed percentage
    fig.update_traces(
        hovertemplate="<b>%{hovertext}</b><br>" +
                      "Flowout to Backup: %{customdata[0]}<br>" +
                      "Received: %{customdata[1]}<br>" +
                      "Proportion: %{z:.2f}%"
    )

    fig.show()
 

# Create a dropdown widget with dataset names as options
dataset_dropdown = widgets.Dropdown(
    options=list(datasets.keys()),
    description='Select Dataset:',
)

# Use the interact function to connect the dropdown to the map generation function
widgets.interact(generate_proportional_map, dataset_name=dataset_dropdown)



In [4]:
import pandas as pd
import folium
from geopy.geocoders import Nominatim
import ipywidgets as widgets
from IPython.display import display


# Load your datasets into a dictionary
datasets = {
    "JUL-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-07_988-Monthly-State-Report.csv", delimiter=';'),
    "AUG-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-08_988-Monthly-State-Report.csv", delimiter=';'),
    "SEP-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-09_988-Monthly-State-Report.csv", delimiter=';'),
    "OCT-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-10_988-Monthly-State-Report.csv", delimiter=';'),
    "NOV-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-11_988-Monthly-State-Report.csv", delimiter=';'),
    "DEC-2021": pd.read_csv("CSV_DATA/DATA_1/FINAL-2021-12_988-Monthly-State-Report.csv", delimiter=';'),
    "JAN-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-01_988-Monthly-State-Report.csv", delimiter=';'),
    "FEB-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-02_988-Monthly-State-Report.csv", delimiter=';'),
    "MAR-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-03_988-Monthly-State-Report.csv", delimiter=';'),
    "APR-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-04_988-Monthly-State-Report.csv", delimiter=';'),
    "MAY-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-05_988-Monthly-State-Report.csv", delimiter=';'),
    "JUN-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-06_988-Monthly-State-Report.csv", delimiter=';'),
    "JUL-2022": pd.read_csv("CSV_DATA/DATA_1/FINAL-2022-07_988-Monthly-State-Report.csv", delimiter=';'),
    
    # Add more datasets as needed
}

# Initialize Geopy's Nominatim geocoder with a custom user agent
geolocator = Nominatim(user_agent="my_geocoder")

# Create a function to generate the proportional map
def generate_proportional_map(dataset_name):
    data = datasets.get(dataset_name)
    if data is None:
        print("Dataset not found.")
        return

    # Work on a copy so that re-selecting a dataset does not try to
    # re-parse columns that were already converted to integers
    data = data.copy()
    for column in ['Flowout to Backup', 'Received']:
        data[column] = data[column].astype(str).str.replace(',', '').astype(int)

    # Create a Folium map centered around the contiguous U.S.
    m = folium.Map(location=[37.0902, -95.7129], zoom_start=4)

    # Add a proportional circle for each state or territory in the dataset
    for _, row in data.iterrows():
        state = row['State']
        proportion = row['Flowout to Backup'] / row['Received']

        # Use Geopy to fetch coordinates for the state or territory
        location = geolocator.geocode(state, exactly_one=True)

        if location:
            latitude = location.latitude
            longitude = location.longitude

            # Adjust the circle size based on the proportion
            radius = proportion * 20

            folium.CircleMarker(
                location=[latitude, longitude],
                radius=radius,
                color='red',
                fill=True,
                fill_color='red',
                fill_opacity=0.6,
                popup=f"{state}: {proportion:.2%}",
            ).add_to(m)

    # Display the map
    display(m)

# Create a dropdown widget with dataset names as options
dataset_dropdown = widgets.Dropdown(
    options=list(datasets.keys()),
    description='Select Dataset:',
)

# Use the interact function to connect the dropdown to the map generation function
widgets.interact(generate_proportional_map, dataset_name=dataset_dropdown)



As can be seen, Illinois (IL) has the highest rate of calls going to backup among all received calls in the mainland USA. This indicates a need for more staff in the main offices in the state.
The above visualization is slightly inaccurate because it was difficult to render.
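
One refinement that may make the circles easier to compare, offered as a suggestion and not as the cause of the rendering difficulty: the map code sets the radius in direct proportion to the backup share, whereas proportional symbol maps are usually read by area. A minimal sketch of area-based scaling follows. The function name `circle_radius` and the `max_radius` parameter are hypothetical and not part of the notebook.

```python
import math

def circle_radius(proportion, max_radius=20.0):
    """Return a radius whose circle area grows linearly with `proportion`.

    `proportion` is the backup share as a fraction between 0 and 1;
    values outside that range are clipped.
    """
    clipped = min(max(proportion, 0.0), 1.0)
    # Area is proportional to radius squared, so take the square root
    return max_radius * math.sqrt(clipped)

for share in (0.05, 0.25, 1.0):
    print(f"{share:.0%} backup share -> radius {circle_radius(share):.1f}")
```

In the Folium cell this would replace `radius = proportion * 20` with `radius = circle_radius(proportion)`, so that a state with four times the backup share gets a circle with four times the area instead of sixteen times.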

U.S. territories outside the mainland, such as American Samoa (AS), the Northern Mariana Islands (MP), the Virgin Islands (VI), and Puerto Rico (PR), also exhibit a high rate of backup calls relative to received calls. This suggests that these territories may face unique challenges in managing their call volumes and providing immediate crisis intervention. It is possible that these regions have limited access to trained helpline operators, or infrastructure constraints that result in a higher reliance on backup call resources.

To address this, it is crucial to prioritize investment in these territories, enhancing staffing, training, and technological support so that individuals in crisis receive timely and comprehensive assistance.
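
The comparison between territories and states can be made explicit by pooling the counts for each group before dividing. The sketch below assumes the count columns have already been converted to integers. The territory codes are the ones named above, while the helper name `share_by_region` and all figures are invented for illustration.

```python
import pandas as pd

TERRITORIES = {"AS", "MP", "VI", "PR"}  # codes named in the text above

def share_by_region(frame):
    """Pooled backup share (%) for territories versus states."""
    labelled = frame.assign(
        Region=frame["State"].map(
            lambda code: "Territory" if code in TERRITORIES else "State"
        )
    )
    # Sum the counts per group first, then divide, so that large and small
    # jurisdictions are weighted by their call volume
    totals = labelled.groupby("Region")[["Flowout to Backup", "Received"]].sum()
    return totals["Flowout to Backup"] / totals["Received"] * 100

# Hypothetical counts, invented purely for illustration
example = pd.DataFrame({
    "State": ["IL", "TX", "PR", "VI"],
    "Flowout to Backup": [300, 200, 90, 30],
    "Received": [1000, 1000, 100, 50],
})
shares = share_by_region(example)
print(shares.round(1))
```

Pooling matters here because territories typically receive far fewer calls than states, so a per-territory percentage can swing widely from a handful of calls.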

---------------------------------------
**ASA vs Backup Calls**
---------------------------------------
---------------------------------------

In [61]:
import pandas as pd
import plotly.express as px
import os
import ipywidgets as widgets
from IPython.display import display, clear_output

# Directory containing the monthly datasets
data_dir = "CSV_DATA/DATA"

# List of files in the directory
file_list = os.listdir(data_dir)

# Dropdown widget for selecting a dataset
dataset_dropdown = widgets.Dropdown(
    options=file_list,
    description='Select Dataset:',
    disabled=False,
)

# Button to manually clear the output
clear_button = widgets.Button(description="Clear Output")

# Output widget for displaying the biplot
output = widgets.Output()

# Function to generate and display the biplot based on user selection
def generate_biplot(file_name):
    with output:
        clear_output(wait=True)  # Clear the output area without closing the dropdown
        file_path = os.path.join(data_dir, file_name)

        # Read the dataset
        df = pd.read_csv(file_path, delimiter=';')

        # Check for missing values in the two columns that are plotted
        if df['ASA In-State'].isnull().any() or df['Flowout to Backup'].isnull().any():
            raise ValueError(f"There are missing values in 'ASA In-State' or 'Flowout to Backup' columns in {file_name}.")

        # Prepare data
        df['ASA In-State'] = df['ASA In-State'].str.split(':').apply(lambda x: int(x[0]) * 60 + int(x[1]))

        # Create a biplot using Plotly Express
        fig = px.scatter(df, x='ASA In-State', y='Flowout to Backup', color='State',
                         labels={'ASA In-State': 'Average Speed of Answer (Seconds)', 'Flowout to Backup': 'Backup Calls'},
                         title=f'Biplot of ASA vs. Backup Calls - {file_name}',)

        # Show the plot
        fig.show()

# Define a callback function to generate the plot when the dropdown value changes
def on_dropdown_change(change):
    if change['type'] == 'change' and change['name'] == 'value':
        generate_biplot(change['new'])

# Attach the callback to the dropdown widget
dataset_dropdown.observe(on_dropdown_change)

# Function to manually clear the output
def clear_output_button(b):
    # Only clear the plot area; the widgets are already displayed below
    with output:
        clear_output()

# Attach the clear_output_button callback
clear_button.on_click(clear_output_button)

# Display the dropdown widget initially
display(dataset_dropdown, clear_button, output)


Over the year, Florida (FL) and Minnesota (MN) show a worsening Average Speed of Answer (ASA) together with a significant increase in backup calls, whereas Illinois (IL) shows improvement.

The rising ASA values in FL and MN, coupled with the surge in backup calls, could indicate that demand for crisis helpline services in these states is growing faster than their capacity to respond promptly, which strains the helpline's ability to provide immediate assistance.

Conversely, the improvement in IL's backup call numbers indicates enhanced call management and responsiveness to callers' needs. To optimize helpline operations, FL and MN may consider strategies such as increased staffing, enhanced training, and process improvements.
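
The trend described above compares the start and end of the period, which the per-month scatter plots only show one month at a time. A minimal sketch of that comparison follows, assuming two monthly tables with `State`, `ASA In-State` (MM:SS) and a numeric `Flowout to Backup` column. The helper names `asa_to_seconds` and `yearly_change` are hypothetical, and the figures are invented; they only mimic the direction of the trends described and are not the reported values.

```python
import pandas as pd

def asa_to_seconds(value):
    """Convert an 'MM:SS' string to seconds (convention assumed from the notebook)."""
    minutes, seconds = value.split(":")[:2]
    return int(minutes) * 60 + int(seconds)

def yearly_change(first, last):
    """Per-state change in ASA (seconds) and backup calls between two months."""
    merged = first.merge(last, on="State", suffixes=("_first", "_last"))
    asa_first = merged["ASA In-State_first"].map(asa_to_seconds)
    asa_last = merged["ASA In-State_last"].map(asa_to_seconds)
    return pd.DataFrame({
        "State": merged["State"],
        "ASA change (s)": asa_last - asa_first,
        "Backup change": merged["Flowout to Backup_last"] - merged["Flowout to Backup_first"],
    }).set_index("State")

# Hypothetical first and last months, invented purely for illustration
first = pd.DataFrame({"State": ["FL", "MN", "IL"],
                      "ASA In-State": ["00:30", "00:25", "01:10"],
                      "Flowout to Backup": [400, 150, 2000]})
last = pd.DataFrame({"State": ["FL", "MN", "IL"],
                     "ASA In-State": ["00:50", "00:40", "00:45"],
                     "Flowout to Backup": [900, 350, 1500]})

change = yearly_change(first, last)
print(change)
```

Positive values in both columns correspond to the worsening pattern described for FL and MN, and negative values to the improvement described for IL.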

---------------------------------------
### References:
---------------------------------------
  - https://www.forbes.com/sites/anafaguy/2023/08/10/suicide-rate-reaches-all-time-high-in-2022-cdc-data-suggests/?sh=6a6d8d3d58b8
  - https://www.nytimes.com/2023/07/13/well/mind/988-suicide-crisis-hotline.html
  - https://www.npr.org/sections/health-shots/2022/08/11/1116769071/social-media-posts-warn-people-not-to-call-988-heres-what-you-need-to-know
  - https://hdr.undp.org/data-center/human-development-index#/indicies/HDI


###  Thank you
----------------------------------------------------------------------

- [GitHub](https://github.com/mohdUwaish59/)
-  [LinkedIn](https://www.linkedin.com/in/mohd-uwaish-72b779282/)  
-  [Xing](https://www.xing.com/profile/Mohd_Uwaish/cv)

