<div class='alert alert-info'>
This report tells the story of increasing suicide rates in India. The data comes from the Indian government's open data portal, where the latest available data is for 2022. Rather than a single dataset, I used several datasets from the catalogue <b>Accidental Deaths & Suicides in India (ADSI) - 2022</b> (https://www.data.gov.in/catalog/accidental-deaths-suicides-india-adsi-2022). The datasets used from the catalogue are listed below:<br>
<ul>
<li>https://www.data.gov.in/resource/cause-wise-age-and-gender-distribution-suicides-during-2022 - ADSI_Table_2.0</li>
<li>https://www.data.gov.in/resource/year-wise-details-incidence-and-rate-suicides-all-india-2012-2022 - ADSI_Table_2.1</li>
<li>https://www.data.gov.in/resource/stateutscity-wise-incidence-and-rate-suicides-during-2022 - ADSI_Table_2.2</li>
<li>https://www.data.gov.in/resource/stateuts-wise-profession-wise-distribution-suicides-during-2022 - ADSI_Table_2.7</li>
<li>https://www.data.gov.in/resource/stateuts-wise-marital-status-wise-distribution-suicides-during-2022 - ADSI_Table_2.9</li>
<li>https://www.data.gov.in/resource/stateuts-wise-economic-status-wise-distribution-suicides-during-2022 - ADSI_Table_2.10</li>
<li>https://www.data.gov.in/resource/stateuts-wise-educational-status-wise-distribution-suicides-during-2022 - ADSI_Table_2.11</li>
</ul>
</div>

***

In [55]:
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

`The year-wise statistics provide context for where the report is headed. For this, I use the year-wise dataset available in the catalogue.`

In [56]:
year_wise_suicide_rate_df = pd.read_csv('data/ADSI_Table_2.1.csv')
year_wise_suicide_rate_df.rename(columns={'Total Number of Suicides - Col. (3)': 'number_of_suicides',
                                           'Projected Mid -Year Population (In Lakh) - Col. (4)': 'proj_population_in_lakhs',
                                            'Rate of Suicides - Col. (5) (Col.3/Col.4)' : 'suicide_rate'}, inplace=True)
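
As a quick sanity check on the renamed columns, the rate column can be recomputed from the other two, since the source header defines it as Col. 3 divided by Col. 4 (suicides per lakh of population). The sketch below uses made-up numbers rather than ADSI figures, and the helper name `suicide_rate_per_lakh` is hypothetical.

```python
def suicide_rate_per_lakh(number_of_suicides, population_in_lakhs):
    """Return suicides per lakh (100,000) of population."""
    if population_in_lakhs <= 0:
        raise ValueError("population must be positive")
    return number_of_suicides / population_in_lakhs


# Illustrative values only (not taken from the ADSI tables).
example_rate = suicide_rate_per_lakh(number_of_suicides=1_200, population_in_lakhs=100.0)
print(round(example_rate, 1))
```

Applying the same division to `number_of_suicides` and `proj_population_in_lakhs` and comparing the result with `suicide_rate` would flag any row where the published rate and the raw counts disagree.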

In [57]:
fig = px.line(year_wise_suicide_rate_df, x='Year', y='suicide_rate', hover_data=['number_of_suicides'], markers=True)
fig.update_layout(title_text="Suicide Rates Over Time in India", xaxis_title="Year",
    yaxis_title='Suicide Rate Per Lakh Population', font_family="Rockwell", template = 'plotly_white+presentation')

fig.show()
# fig.write_image("timerange.svg")

`Further detail for the year 2022 comes from the state/UT (Union Territory) wise data.`

In [58]:
state_wise_suicide_rate_df = pd.read_csv('data/ADSI_Table_2.2.csv')

# pre-processing: drop the aggregate "Total" rows and the serial-number column,
# then strip the " - Col. (n)" suffix from the column headers.
# .copy() makes the filtered frame independent, so the in-place edits below are safe.
state_wise_suicide_rate_df = state_wise_suicide_rate_df[~state_wise_suicide_rate_df['State/UT/Cities'].str.contains(
    'Total', case=False, na=False)].copy()
state_wise_suicide_rate_df.drop(columns='Sl. No.', inplace=True)
state_wise_suicide_rate_df.columns = state_wise_suicide_rate_df.columns.str.replace(r" - Col\. \(\d+\)", "", regex=True)
state_wise_suicide_rate_df.rename(columns={'Rate of Suicides (Col.3/Col.5)' : 'Suicide Rate'}, inplace=True)
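
To see what the header-cleaning regex in the cell above does, here is a minimal standard-library sketch of the same pattern. The sample headers are assumptions modelled on the naming seen in this notebook, not copied from the CSV, and `strip_column_suffix` is a hypothetical helper.

```python
import re

# Same pattern as in the cell above: a literal " - Col. (n)" suffix.
COLUMN_SUFFIX = re.compile(r" - Col\. \(\d+\)")


def strip_column_suffix(name):
    """Remove the ' - Col. (n)' marker from a header, leaving the rest intact."""
    return COLUMN_SUFFIX.sub("", name)


# Hypothetical headers, for illustration only.
raw_headers = [
    "State/UT/Cities - Col. (2)",
    "Number of Suicides - Col. (3)",
    "Rate of Suicides (Col.3/Col.5) - Col. (6)",
]
cleaned_headers = [strip_column_suffix(header) for header in raw_headers]
print(cleaned_headers)
```

The pattern only matches the spaced ` - Col. (n)` form, so the formula text `(Col.3/Col.5)` inside a header is left alone, which is why the later rename can still look up `'Rate of Suicides (Col.3/Col.5)'`.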

In [59]:
# fetch only the rows for states and union territories, not cities.
# .copy() avoids pandas' SettingWithCopyWarning when the names are changed below.
state_wise_suicide_rate_df_sub = state_wise_suicide_rate_df.iloc[0:36].copy()

# rename states to match the ST_NM property in the GeoJSON; otherwise they won't show up in the map.
state_wise_suicide_rate_df_sub['State/UT/Cities'] = state_wise_suicide_rate_df_sub['State/UT/Cities'].replace(
    {'Jammu and Kashmir': 'Jammu & Kashmir', 'Delhi (UT)': 'Delhi'})

# for this portion referred: https://stackoverflow.com/a/62976794/12924114
fig = px.choropleth(
    state_wise_suicide_rate_df_sub,
    geojson="https://gist.githubusercontent.com/jbrobst/56c13bbbf9d97d187fea01ca62ea5112/raw/e388c4cae20aa53cb5090210a42ebb9b765c0a36/india_states.geojson",
    featureidkey='properties.ST_NM',
    locations='State/UT/Cities',
    color='Suicide Rate',
    hover_data=['Suicide Rate', 'Number of Suicides'],
    title='Heat Map of Suicide Rates Across Indian States 2022 (Per Lakh Population)',
    color_continuous_scale='Reds'
)

fig.update_geos(fitbounds="locations", visible=False)
fig.update_layout(height=600, font_family="Rockwell", template='plotly_white+presentation', margin=dict(t=50, l=25, r=25, b=25))
fig.show()
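
The renaming step in the cell above is needed because a choropleth only draws a location whose name matches the `featureidkey` property in the GeoJSON. A small check like the following can list the names that would not match. It is a sketch under assumptions: `unmatched_locations` is a hypothetical helper, and `toy_geojson` is a tiny stand-in for the real file, keeping only the `ST_NM` property.

```python
def unmatched_locations(location_names, geojson, property_name="ST_NM"):
    """Return the location names that have no matching feature in the GeoJSON."""
    known = {feature["properties"][property_name] for feature in geojson["features"]}
    return sorted(set(location_names) - known)


# Stand-in for the real GeoJSON (geometry omitted, three features only).
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"ST_NM": "Delhi"}, "geometry": None},
        {"type": "Feature", "properties": {"ST_NM": "Jammu & Kashmir"}, "geometry": None},
        {"type": "Feature", "properties": {"ST_NM": "Kerala"}, "geometry": None},
    ],
}

names_before = ["Delhi (UT)", "Jammu and Kashmir", "Kerala"]
names_after = ["Delhi", "Jammu & Kashmir", "Kerala"]

print(unmatched_locations(names_before, toy_geojson))
print(unmatched_locations(names_after, toy_geojson))
```

Run against the downloaded GeoJSON and the `State/UT/Cities` column, an empty result would suggest every state and UT has a shape to colour.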

`Using the same dataset to compare suicide rates across a few cities.`

In [60]:
kerala_cities = ['Kannur', 'Kollam', 'Kozhikode', 'Kochi', 'Malappuram', 'Thiruvananthapuram', 'Thrissur']
major_cities = ['Bengaluru', 'Chennai', 'Delhi (city)', 'Hyderabad', 'Kolkata', 'Mumbai', 'Surat']

state_wise_suicide_rate_df_cities = state_wise_suicide_rate_df[state_wise_suicide_rate_df['State/UT/Cities'].isin(
    major_cities + kerala_cities)].copy()
# label each city as Kerala or Others so the scatter plot can colour by group
state_wise_suicide_rate_df_cities['State'] = state_wise_suicide_rate_df_cities['State/UT/Cities'].apply(lambda x: 'Kerala' if x in kerala_cities else 'Others')
state_wise_suicide_rate_df_cities.rename(columns={'State/UT/Cities': 'City'}, inplace=True)

fig = px.scatter(state_wise_suicide_rate_df_cities, x='City', y='Suicide Rate', color='State', size='Number of Suicides', hover_data={'Number of Suicides': True},
            title="Suicide Rates in Indian Cities 2022<br><sup><i>Comparing suicide rates in cities and towns of Kerala vs. a few major cities of the country.</i></sup>")

fig.update_layout(height=600, yaxis_title='Suicide Rate per Lakh(100k) population',
    font_family="Rockwell", template='plotly_white+presentation')
fig.show()

`Next, we identify the major causes of suicide and compare how the cases are distributed by gender and age group.`

In [61]:
cause_wise_age_gender_df = pd.read_csv('data/ADSI_Table_2.0.csv')

# pre-processing: keep the gender totals and drop the listed rows (including the overall 'Total').
# .copy() makes the filtered frame independent, so the column assignment below does not trigger SettingWithCopyWarning.
cause_wise_age_gender_df_sub = cause_wise_age_gender_df[['Cause', 'Total - Male', 'Total - Female', 'Total - Transgender']]
cause_wise_age_gender_df_sub = cause_wise_age_gender_df_sub[~cause_wise_age_gender_df_sub['Cause'].isin(
    ['Non Settlement of Marriage','Dowry Related Issues', 'Extra Marital Affairs', 'Divorce', 'Others', 'AIDS/STD', 'Cancer',
      'Paralysis', 'Insanity/ Mental Illness', 'Other Prolonged Illness', 'Total'])].copy()
# strip parenthesised notes from the cause names
cause_wise_age_gender_df_sub['Cause'] = cause_wise_age_gender_df_sub['Cause'].str.replace(r'\(.*?\)', '', regex=True).str.strip()

# reshape to long format: one row per (Cause, Gender) pair
cause_wise_age_gender_df_sub = pd.melt(cause_wise_age_gender_df_sub, id_vars=['Cause'], var_name='Gender', value_name='Count')
cause_wise_age_gender_df_sub['Gender'] = cause_wise_age_gender_df_sub['Gender'].str.extract(r'Total - (\w+)', expand=False)
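
The wide-to-long reshape in the cell above is what lets the treemap nest gender under each cause. The plain-Python sketch below mirrors the idea of `pd.melt` followed by the `Total - (\w+)` extraction. It is not how pandas implements it, the row order differs from `pd.melt` (row by row here rather than column by column), and the cause name and counts are invented for illustration.

```python
import re


def melt_gender_totals(rows, id_key="Cause"):
    """Turn one wide row per cause into one long row per (cause, gender)."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            match = re.search(r"Total - (\w+)", column)
            if column == id_key or match is None:
                continue  # skip the identifier and any non-total column
            long_rows.append({id_key: row[id_key], "Gender": match.group(1), "Count": value})
    return long_rows


# Invented example row; the column names follow the ones used in the cell above.
wide_rows = [
    {"Cause": "Example cause A", "Total - Male": 5, "Total - Female": 3, "Total - Transgender": 0},
]
long_rows = melt_gender_totals(wide_rows)
for long_row in long_rows:
    print(long_row)
```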


In [62]:
cause_wise_age_gender_df_sub = cause_wise_age_gender_df_sub[~cause_wise_age_gender_df_sub['Cause'].isin(
    ['Other Causes', 'Causes Not Known'])]
fig = px.treemap(cause_wise_age_gender_df_sub, path=['Cause', 'Gender'], values='Count',
                 hover_data=['Gender', 'Count'], title="Major Reasons for Suicide and their distribution by Gender")
fig.update_layout(height=600, font_family="Rockwell", template='plotly_white+presentation', margin=dict(t=50, l=25, r=25, b=25))
fig.show()

<div class='alert alert-success'>
<h1>The figure above does not render when the Word file is converted to PDF, so I have also included the Word file to show how the figure fits into the full report.</h1>
</div>

In [63]:
sub_df = cause_wise_age_gender_df.copy()
# pre-processing
sub_df = sub_df.drop(columns=['Sl. No.', 'Below 18 years - Total', '18 yrs.- Below 30 years - Total', '30 yrs.- Below 45 years - Total',
    '45 yrs.- Below 60 years - Total', '60 years & above - Total', 'Total - Male', 'Total - Female', 'Total - Transgender', 'Total - Total'])
sub_df = sub_df[~sub_df['Cause'].isin(['Total'])]

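# reshape to long format, then split labels such as 'Below 18 years - Male' into age group and gender.
# rsplit with n=1 splits only on the last ' - ', so the result always has exactly two columns.
sub_df = pd.melt(sub_df, id_vars=['Cause'], var_name='Age group and Gender', value_name='Count')
sub_df[['Age group', 'Gender']] = sub_df['Age group and Gender'].str.rsplit(' - ', n=1, expand=True)
sub_df.drop(columns='Age group and Gender', inplace=True)
sub_df['Cause'] = sub_df['Cause'].str.replace(r'\(.*?\)', '', regex=True).str.strip()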

In [64]:
fig = px.bar(sub_df, x='Cause', y='Count', color='Age group', barmode='group',
    title="Suicide Causes by Age Group India 2022", labels={'Count': 'Number of Cases', 'Cause': 'Cause of Suicide'})

fig.update_layout(height = 700, xaxis={'categoryorder': 'total descending'}, font_family = "Rockwell",
    template = 'plotly_white+presentation')
fig.show()

`Comparing Kerala and Sikkim with the rest of the country across different sections of society.`

In [65]:
economic_status_wise_suicide_data_df = pd.read_csv('data/ADSI_Table_2.10.csv')

# fetch only the states' data
sub_df = economic_status_wise_suicide_data_df.iloc[0:28].copy()
sub_df.drop(columns=[
    'Sl. No.', 'less than Rs. 1 lakh - Total','Rs. 1 lakh & above - less than Rs.5 lakhs - Total',
    'Rs. 5 lakhs & above - less than Rs. 10 lakhs - Total', 'Rs. 10 lakhs and above - Total',
    'Total - Male', 'Total - Female', 'Total - Transgender', 'Total'], inplace=True)
sub_df = pd.melt(sub_df, id_vars=['State/UT'], var_name='income and gender', value_name='Count')
sub_df['State'] = sub_df['State/UT'].apply(lambda x: x if x in ['Kerala', 'Sikkim'] else 'Others')
sub_df[['Income group', 'Gender']] = sub_df['income and gender'].str.rsplit(' - ', n=1, expand=True)
sub_df.drop(columns=['State/UT', 'income and gender'], inplace=True)
sub_df = sub_df.groupby(['State', 'Income group', 'Gender']).mean().reset_index()
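
The `rsplit(' - ', n=1)` call in the cell above matters for this table because some income-band labels contain ` - ` themselves. The sketch below shows the difference on one such label. The `- Male` suffix is an assumption inferred from the `- Total` columns dropped above, and `split_label` is a hypothetical helper.

```python
def split_label(label):
    """Split a 'category - gender' header on the LAST ' - ' only."""
    category, gender = label.rsplit(" - ", 1)
    return category, gender


# Label shape assumed from the '... - Total' columns in the cell above.
label = "Rs. 1 lakh & above - less than Rs.5 lakhs - Male"

naive_parts = label.split(" - ")    # splits on every ' - '
category, gender = split_label(label)

print(naive_parts)
print(category, "|", gender)
```

A plain `split` yields three pieces for this label, which would not fit the two target columns `Income group` and `Gender`.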

In [66]:
fig = px.bar(sub_df, x='Count', y='Income group', color='Gender', facet_col='State', barmode='relative',
            title="Suicide Counts by Income Groups 2022 <br><sup><i>Comparing suicide counts by income group for Kerala and Sikkim vs. the rest of the country (avg)</i></sup>")

fig.update_layout(height=600, font_family="Rockwell", template = 'plotly_white+presentation')
fig.show()
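
When reading the chart above, note that the 'Others' panel shows a per-state average, while Kerala and Sikkim each contribute a single state, so their "mean" is just their own count. The sketch below reproduces that grouping logic in plain Python. The state names 'State X' and 'State Y', the band name, and all counts are invented, and `average_by_group` is a hypothetical helper rather than the notebook's code.

```python
from collections import defaultdict
from statistics import mean


def average_by_group(records, highlighted=("Kerala", "Sikkim")):
    """Average counts per (group, category, gender); non-highlighted states pool into 'Others'."""
    grouped = defaultdict(list)
    for state, category, gender, count in records:
        group = state if state in highlighted else "Others"
        grouped[(group, category, gender)].append(count)
    return {key: mean(values) for key, values in grouped.items()}


# Invented records: (state, category, gender, count).
records = [
    ("Kerala", "Band A", "Male", 40),
    ("State X", "Band A", "Male", 10),
    ("State Y", "Band A", "Male", 30),
]
averages = average_by_group(records)
print(averages[("Kerala", "Band A", "Male")])
print(averages[("Others", "Band A", "Male")])
```

Because 'Others' is an average of raw counts, large and small states weigh equally in it, and it is not adjusted for population.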

In [67]:
education_status_wise_suicide_data_df = pd.read_csv('data/ADSI_Table_2.11.csv', encoding='ISO-8859-1')
sub_df = education_status_wise_suicide_data_df.iloc[0:28].copy()

# pre-processing
sub_df.drop(columns=[
    'Sl. No.', 'No Education - Total','Primary (up to class5th) - Total', 'Middle (up to class8th) - Total',
    'Diploma/Certificate/ ITI - Total', 'Matriculate/ Secondary (up to class10th) - Total', 'Hr. Secondary/ Intermediate/Pre-University (up to class12th) - Total',
    'Graduate and above - Total', 'Professionals (MBA) - Total', 'Status Not known - Total',
    'Total - Male', 'Total - Female', 'Total - Transgender', 'Total'], inplace=True)

sub_df = pd.melt(sub_df, id_vars=['State/UT'], var_name='education and gender', value_name='Count')
sub_df['State'] = sub_df['State/UT'].apply(lambda x: x if x in ['Kerala', 'Sikkim'] else 'Others')
sub_df[['Education Level', 'Gender']] = sub_df['education and gender'].str.split(' - ', expand=True)
sub_df.drop(columns=['State/UT', 'education and gender'], inplace=True)
sub_df = sub_df.groupby(['State', 'Education Level', 'Gender']).mean().reset_index()
sub_df['Education Level'] = sub_df['Education Level'].str.replace(r'\x96', ' ', regex=True).str.strip()
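
The `\x96` cleanup in the cell above has a likely explanation, assuming the CSV was saved in Windows-1252: byte `0x96` is an en dash in that encoding, but ISO-8859-1 decodes it to the invisible control character U+0096. The sketch below shows both decodings on a made-up byte string, which is not taken from the dataset.

```python
# Hypothetical raw bytes containing 0x96, for illustration only.
raw = b"Class 1 \x96 5"

as_latin1 = raw.decode("ISO-8859-1")   # 0x96 becomes the control character U+0096
as_cp1252 = raw.decode("cp1252")       # 0x96 becomes an en dash (U+2013)

print(repr(as_latin1))
print(repr(as_cp1252))

# The same cleanup idea as in the cell above: replace the stray character with a space.
cleaned = as_latin1.replace("\x96", " ").strip()
print(repr(cleaned))
```

If that assumption holds, reading the file with `encoding='cp1252'` would be an alternative worth trying, though the labels would then contain en dashes instead of needing the replacement.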

In [68]:
fig = px.bar(sub_df, x='Count', y='Education Level', color='Gender', 
             facet_col='State', barmode='relative',
             title="Suicide Counts by Education Level 2022 <br><sup><i>Comparing suicide counts by education level for Kerala and Sikkim vs. the rest of the country (avg)</i></sup>")


fig.update_layout(height=600, font_family="Rockwell", template='plotly_white+presentation')
fig.show()

In [69]:
marital_status_wise_suicide_data_df = pd.read_csv('data/ADSI_Table_2.9.csv', encoding='ISO-8859-1')
sub_df = marital_status_wise_suicide_data_df.iloc[0:28].copy()

sub_df.drop(columns=[
    'Sl. No.', 'Un-Married - Total','Married - Total', 'Others - Total', 'Status Not Known - Total',
    'Widowed/Widower - Total', 'Divorcee - Total', 'Separated - Total',
    'Total - Male', 'Total - Female', 'Total - Transgender', 'Total'], inplace=True)

sub_df = pd.melt(sub_df, id_vars=['State/UT'], var_name='marital and gender', value_name='Count')
sub_df['State'] = sub_df['State/UT'].apply(lambda x: x if x in ['Kerala', 'Sikkim'] else 'Others')
sub_df[['Marital Status', 'Gender']] = sub_df['marital and gender'].str.split(' - ', expand=True)
sub_df.drop(columns=['State/UT', 'marital and gender'], inplace=True)
sub_df = sub_df.groupby(['State', 'Marital Status', 'Gender']).mean().reset_index()

In [70]:
fig = px.bar(sub_df, x='Count', y='Marital Status', color='Gender', 
             facet_col='State', barmode='relative',
             title="Suicide Counts by Marital Status 2022 <br><sup><i>Comparing suicide counts by marital status for Kerala and Sikkim vs. the rest of the country (avg)</i></sup>")

fig.update_layout(height=600, xaxis=dict(tickangle=90), xaxis2=dict(tickangle=90),
    xaxis3=dict(tickangle=90), font_family="Rockwell", template='plotly_white+presentation')
fig.show()

In [71]:
profession_wise_suicide_data_df = pd.read_csv('data/ADSI_Table_2.7.csv', encoding='ISO-8859-1')
sub_df = profession_wise_suicide_data_df.iloc[0:28].copy()
sub_df.fillna(0, inplace=True)

# pre-processing - removed many parent professions because the professions under each parent group are already included.
sub_df.drop(columns=[
    'Sl. No.', 'House wife - Total', 'Professionals/Salaried Persons (Total) - Male', 'Professionals/Salaried Persons (Total) - Female',
    'Professionals/Salaried Persons (Total) - Transgender', 'Professionals/Salaried Persons (Total) - Total', 'Professionals/Salaried Persons (Total) - Male.1',
    'Professionals/Salaried Persons (Total) - Female.1', 'Professionals/Salaried Persons (Total) - Transgender.1', 'Professionals/Salaried Persons (Total) - Total.1',
    'Professionals/Salaried Persons - Government Servants(Central/UT Govt. Servants) - Total', 'Professionals/Salaried Persons - Government Servants(State Govt. Servants) - Total',
    'Professionals/Salaried Persons - Government Servants (Other Statutory Body) - Total', 'Professionals/Salaried Persons (Private Sector Enterprises) - Total',
    'Professionals/Salaried Persons (Public Sector Undertaking) - Total', 'Students - Total', 'Unemployed Persons - Total', 'Self-employed Persons(Total) - Total',
    'Self-employed Persons - Business (Total) - Total', 'Self-employed Persons - Business(Vendor) - Total', 'Self-employed Persons - Business(Tradesmen) - Total',
    'Self-employed Persons - Business(Other Business) - Total', 'Other Self-employed Persons - Total', 'Persons Engaged in Farming Sector (Total) - Male',
    'Persons Engaged in Farming Sector (Total) - Female', 'Persons Engaged in Farming Sector (Total) - Transgender', 'Persons Engaged in Farming Sector (Total) - Total',
    'Persons Engaged in Farming Sector - Farmers/Cultivators (Total) - Total',  'Farmers/Cultivators (Who Cultivate Their Own Land) - Total',  'Farmers/Cultivators (Who Cultivate On Lease Land) - Total',
    'Persons engaged in Farming Sector (Agricultural Laborers) - Total',  'Daily Wage Earner - Total', 'Retired Persons - Total', 'Retired Persons - Total.1',
    'Self-employed Persons(Total) - Male', 'Self-employed Persons(Total) - Female', 'Self-employed Persons(Total) - Transgender', 'Self-employed Persons - Business (Total) - Male',
    'Self-employed Persons - Business (Total) - Female', 'Self-employed Persons - Business (Total) - Transgender', 'Persons Engaged in Farming Sector - Farmers/Cultivators (Total) - Male',
    'Persons Engaged in Farming Sector - Farmers/Cultivators (Total) - Female', 'Persons Engaged in Farming Sector - Farmers/Cultivators (Total) - Transgender',
    'Total - Male', 'Total - Female', 'Total - Transgender', 'Total'], inplace=True)
sub_df.rename(columns={'Retired Persons - Male.1': 'Retired Persons (Other) - Male', 'Retired Persons - Female.1': 'Retired Persons (Other) - Female',
 'Retired Persons - Transgender.1': 'Retired Persons (Other) - Transgender'}, inplace=True)

sub_df = pd.melt(sub_df, id_vars=['State/UT'], var_name='profession and gender', value_name='Count')
sub_df['State'] = sub_df['State/UT'].apply(lambda x: x if x in ['Kerala', 'Sikkim'] else 'Others')
sub_df[['Profession', 'Gender']] = sub_df['profession and gender'].str.rsplit(' - ', n=1, expand=True)
sub_df.drop(columns=['State/UT', 'profession and gender'], inplace=True)
sub_df['Profession'] = sub_df['Profession'].apply(lambda x: x.split(' - ', 1)[-1] if ' - ' in x else x)
sub_df = sub_df.groupby(['State', 'Profession', 'Gender']).mean().reset_index()
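
The last label-cleaning step in the cell above keeps only the most specific part of a profession name: anything before the first ` - ` is treated as the parent group and dropped. A minimal sketch of that rule follows, using label shapes taken from the column names listed above. The helper name `leaf_profession` is hypothetical.

```python
def leaf_profession(label):
    """Drop the parent group (text before the first ' - '), if there is one."""
    return label.split(" - ", 1)[-1] if " - " in label else label


print(leaf_profession("Self-employed Persons - Business(Vendor)"))
print(leaf_profession("Daily Wage Earner"))
```

The hyphen in "Self-employed" is untouched because the separator includes the surrounding spaces.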

In [72]:
fig = px.bar(sub_df, x='Count', y='Profession', color='Gender', 
             facet_col='State', barmode='relative',
             title="Suicide Counts by Profession 2022 <br><sup><i>Comparing suicide counts by profession for Kerala and Sikkim vs. the rest of the country (avg)</i></sup>")

fig.update_layout(height=600, font_family="Rockwell", template='plotly_white+presentation')
fig.show()