# Suicide rate in Sri Lanka between 1985 and 2016: A statistical and sociological investigation

## Introduction

Émile Durkheim defines suicide as "every case of death that results from an act carried out by the victim himself" [1]. That is, the person takes their own life.
What we usually see in individuals who resort to this act is the psychological situation they are in. Failure to recover from a trauma, whether recent or not, and the recent loss of a loved one are examples of factors that strongly influence a person's decision to put an end to what they are living through.

Yellow September was created in 2014 to help prevent suicide through media campaigns that bring the issue closer to the population and encourage people to think about the problem. After all, talking about this extreme act is becoming more and more necessary, since the number of cases has been increasing in recent years.

Depression has been identified as the main culprit in this story. Identifying and treating this disease, preferably in its early stages, is therefore of paramount importance. However, due to a lack of information, people still dismiss the disease as mere "fussiness", ignoring its signs and thus aggravating its consequences.
  
Like homicide and traffic accident rates, suicide rates vary with age group, sex and race. A first picture of the subject was provided by Durkheim (2014) in the book Suicide. Writing in 1897, the author stated that the number of voluntary deaths varied according to the degree of integration and regulation of individuals in society. Comparing Catholics, Protestants and Jews, Durkheim concluded that the weakening of traditional ties (visible in the family, political society and religion) indicated excessive individuation and a loss of cohesion, leading to an increase in voluntary deaths. On the other hand, too much social integration also led to suicide, as observed, for example, in the army, which cultivated a taste for impersonality, willingness to renounce, passive obedience and absolute submission. Durkheim also pointed out that suicide rates increased during periods of industrial or economic and financial crisis. The reason would be that in the anomic state, that is, a state of disturbance of the collective order, society leaves individual passions without restraint.

In [None]:
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import preprocessing
sns.set_style("dark")

In [None]:
color_gender = ['#F781D8', '#819FF7']

In [None]:
df = pd.read_csv('/kaggle/input/suicide-rates-overview-1985-to-2016/master.csv')
df.head()

In [None]:
df.info()

## Data dictionary

**Country**: Column representing countries (string)

**Year**: Column representing years (integer)

**Sex**: Column representing sexes (string)

**Age**: Column representing ages (string)

**Suicides_no**: Column representing the number of suicides (integer)

**Population**: Column representing the population (integer)

**Suicides / 100k pop**: Column that represents the number of suicides per 100 thousand inhabitants (float)

**Country-year**: Column that represents the combination of country and year (string)

**HDI for year**: Column that represents the HDI in each year (float)

**Gdp_for_year($)**: Column representing GDP in each year (string)

**Gdp_per_capita($)**: Column representing GDP per capita (float)

**Generation**: Column that represents the generation (string)

Since the code below analyzes the Sri Lankan case, we restrict the dataframe to the rows where the "country" column is "Sri Lanka".

In [None]:
df.country.unique()

In [None]:
df_Sri_Lanka = df[df['country']=='Sri Lanka'].copy()
df_Sri_Lanka.head()

Checking for missing data

In [None]:
print('World------------')
display(df.isnull().sum())
print('Sri Lanka----------')
display(df_Sri_Lanka.isnull().sum())


## Mean suicide rate in Sri Lanka x in the world

---
First, we will make a comparative analysis of the average suicide rate in Sri Lanka and in the world. It is important to check the trend over time in order to get a general diagnosis of how the variable behaves.

Once we know how the variable behaves, we can try to trace possible explanations for the phenomenon. Data science may not be the right tool for that; other tools may be needed to explain it.
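One caveat about the averages used in this section: taking `.mean()` of the `suicides/100k pop` column weights every age-and-sex group equally, whatever its population. The following is a minimal sketch, with invented numbers that are not taken from the dataset, of how that unweighted mean can differ from the pooled rate (total suicides over total population). The helper `rate_per_100k` is a hypothetical name introduced only for this illustration.

```python
# Hypothetical (suicides_no, population) pairs for three groups in one year.
# These values are illustrative only and do not come from master.csv.
groups = [
    (10, 1_000_000),
    (200, 2_000_000),
    (90, 300_000),
]

def rate_per_100k(suicides, population):
    """Suicides per 100,000 inhabitants."""
    return suicides / population * 100_000

# Unweighted mean of the group rates (what .mean() on the rate column gives)
group_rates = [rate_per_100k(s, p) for s, p in groups]
unweighted_mean = sum(group_rates) / len(group_rates)

# Pooled rate: total suicides over total population
total_suicides = sum(s for s, _ in groups)
total_population = sum(p for _, p in groups)
pooled_rate = rate_per_100k(total_suicides, total_population)

print(f"unweighted mean of group rates: {unweighted_mean:.2f}")
print(f"pooled rate:                    {pooled_rate:.2f}")
```

Small groups with high rates pull the unweighted mean upwards, so the averages plotted in this section are best read as a trend indicator rather than as the national or global rate.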

In [None]:
# Yearly averages of the suicide rate and GDP per capita (Sri Lanka and world)
suicide_SriLanka_avg = df_Sri_Lanka.groupby('year')['suicides/100k pop'].mean()
suicide_world_avg = df.groupby('year')['suicides/100k pop'].mean()
gdp_avg_world = df.groupby('year')['gdp_per_capita ($)'].mean()
gdp_avg_sri_lanka = df_Sri_Lanka.groupby('year')['gdp_per_capita ($)'].mean()
# Years of the Sri Lankan series, in the same sorted order as the yearly averages
# (despite its name, `ages` holds years; the name is kept because later cells use it)
ages = suicide_SriLanka_avg.index

# Drop 2016 from the world series before plotting
suicide_world_avg.drop(2016, inplace=True)

fig = plt.figure(figsize=(15,5))
# Each series is plotted against its own years, since the two may not cover the same ones
ax = sns.lineplot(x=suicide_world_avg.index, y=suicide_world_avg, label='World', color='blue')
ax = sns.lineplot(x=ages, y=suicide_SriLanka_avg, label='Sri Lanka', color='green')
plt.title('Average suicide rate over time (Sri Lanka x World)', fontsize=19)
plt.ylabel('Number of cases per 100 thousand people', fontsize=13);

The average suicide rate in Sri Lanka has been decreasing gradually: it went from **47.00 per 100,000 inhabitants** in 1985 to **27.5 per 100,000 inhabitants** in 2005. There is a sharp spike in 1997.

The global suicide rate has also been decreasing over time, but this was not always so. From 1985 to 1995 the number of suicides per 100 thousand inhabitants grew by about **115%**. This soon changed: between 1995 and 2015 the numbers fell by approximately **85.35%**, an average of approximately **4.26%** per year. Over the 30 years analyzed, the reduction was **68.47%**, approximately **2.30%** per year.
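The per-year figures quoted above correspond to dividing the total change by the number of years (85.35 / 20 is about 4.27). As a sketch only, the code below contrasts that simple average with a compound annual rate, which is a different quantity. The function names are hypothetical helpers, and the inputs are the percentages and the Sri Lankan rates (47.00 and 27.5) already quoted in this notebook.

```python
def percent_change(start, end):
    """Relative change from start to end, in percent (negative = reduction)."""
    return (end - start) / start * 100

def simple_annual_rate(total_pct, years):
    """Total percentage change spread evenly over the years."""
    return total_pct / years

def compound_annual_reduction(total_reduction_pct, years):
    """Constant yearly reduction (in percent) that compounds to the total."""
    remaining = 1 - total_reduction_pct / 100
    return (1 - remaining ** (1 / years)) * 100

# World: 85.35% reduction over the 20 years from 1995 to 2015
simple = simple_annual_rate(85.35, 20)
compound = compound_annual_reduction(85.35, 20)
print(f"simple average: {simple:.2f}% per year")
print(f"compound rate:  {compound:.2f}% per year")

# Sri Lanka: 47.00 per 100k in 1985 to 27.5 per 100k in 2005
sri_lanka_change = percent_change(47.00, 27.5)
print(f"Sri Lanka 1985 to 2005: {sri_lanka_change:.1f}%")
```

The simple average is easier to read, but it does not describe a constant yearly percentage decline; the compound figure does.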

We can observe that the world average has been decreasing since 1995, while the Sri Lankan average also declined overall, apart from the spike in 1997. To better understand this behavior, a study in the area of psychology and/or sociology would be more suitable for explaining the phenomenon and how Sri Lanka and its people relate to it.

## Age group

---

To better understand the people we are studying, we can classify them by age group, which gives a basic overview of who commits suicide. The objective here is to identify which age group is most represented among those who take their own lives, and whether this profile changed over the period studied.

In [None]:
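# Total suicides per year and age group (the default aggfunc would average the male and female rows)
tabela = pd.pivot_table(df_Sri_Lanka, values='suicides_no', index=['year'], columns=['age'], aggfunc='sum')
# Order the age groups from youngest to oldest, including the 75+ group
column_order = ['5-14 years', '15-24 years', '25-34 years', '35-54 years', '55-74 years', '75+ years']
tabela = tabela.reindex(column_order, axis=1)
tabela.head(10)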

The 5-14 years age group shows an increase in suicide cases from 1996 to 1998.

In [None]:
tabela2 = pd.pivot_table(df_Sri_Lanka, values ='suicides/100k pop',index=['year'],columns=['sex'])
tabela2.head()

In [None]:
tabela.plot.bar(stacked=True,figsize=(16,8))
plt.legend(title='Age')
plt.xlabel(' ')
plt.title(' Suicide by age group',fontsize=21);

In [None]:
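# Total number of rows with a generation label (value_counts() alone gives the per-generation counts)
df_Sri_Lanka['generation'].value_counts().sum()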

## About generations

**Generation X**: Generation X is an expression that refers to the generation born after the post-World War II baby boom. Although there is no agreement on the period it covers, it generally includes people born from the 1960s to the end of the 1970s. [2]

**Silent**: Silent Generation is a term used to refer to the population born between 1925 and 1942, namely during the Great Depression and World War II. [3]

**Millenials**: Generation Y, also called the millennial generation, the internet generation, or millennials, is a concept in sociology that refers to the cohort of those born from the early 1980s until approximately the end of the century. Some authors extend it to 2005. [4]

**Boomers**: Baby boomers is the name given to the generation of people born between 1946 and 1964. The "boom" in the name is used in the sense of "unbridled growth": a surge in births that caused a demographic boom on the planet, especially in the United States. [5]

**GI Generation**: The Greatest Generation is an expression coined by journalist and writer Tom Brokaw, in his book The Greatest Generation, to refer to the generation of individuals who grew up during the Great Depression (1929–39) in the United States and later fought in the Second World War (1939–45, with the USA participating between 1941 and 1945), as well as those who remained in the country and took part in the war effort on the so-called home front. [6]

**Generation Z**: Generation Z is the sociological definition for the generation of people born, on average, between the second half of the 1990s and the beginning of 2010. The theory most accepted by scholars is that it emerged as the successor to Generation Y (which dates from the end of 1982, the beginning of the Echo Boom). It is therefore the generation that coincides with the conception and birth of the World Wide Web, created in 1990 by Tim Berners-Lee, and with the "boom" in the creation of modern technological devices. The defining trait of this generation is "zapping": switching among many options, such as television channels, the internet, video games and smartphones. [7]

In [None]:
fig = plt.figure(figsize=(13,5))
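# countplot counts dataset rows per generation (not suicides); recent seaborn versions expect x as a keyword
sns.countplot(x='generation', order=df_Sri_Lanka['generation'].value_counts().index, data=df_Sri_Lanka)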
plt.xlabel('Generations', fontsize=13)
plt.ylabel(' ')
plt.title('Suicides by generation',fontsize=21);

The age group with the highest number of suicides was 35 to 54 years old, with **36%**, followed by 25 to 34 years old (**26%**).
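Shares like these can be obtained by summing `suicides_no` per age group and dividing by the overall total. The sketch below illustrates the computation on a toy dataframe; the numbers are invented for the example and are not the Sri Lankan figures, and only the column names are borrowed from the dataset.

```python
import pandas as pd

# Toy data with the same column names as master.csv; the numbers are
# invented for illustration and are not the Sri Lankan figures.
toy = pd.DataFrame({
    "age": ["15-24 years", "25-34 years", "35-54 years",
            "15-24 years", "25-34 years", "35-54 years"],
    "sex": ["female", "female", "female", "male", "male", "male"],
    "suicides_no": [10, 20, 30, 30, 40, 70],
})

# Total suicides per age group, then each group's share of the overall total
totals = toy.groupby("age")["suicides_no"].sum()
shares = (totals / totals.sum() * 100).round(1).sort_values(ascending=False)
print(shares)
```

Applied to the real data, the same two lines would use `df_Sri_Lanka` in place of `toy`.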


The generation with the highest number of suicides recorded was Generation X (**23.65%**), followed by the Silent Generation (**22.04%**).

By understanding the age group, we can direct our message more efficiently, as we will focus on those most affected. How? By intensifying campaigns and debates in the places these people usually frequent.

## Gender

---
Another interesting feature is gender. Understanding the gender difference in suicide tendency can again help us focus the message, in addition to revealing some social patterns.


In [None]:
genders = df_Sri_Lanka.groupby('sex').suicides_no.sum() / df_Sri_Lanka.groupby('sex').suicides_no.sum().sum()

fig = plt.figure(figsize=(6,6))
plt.pie(genders, labels=['Women', 'Men'], colors = color_gender, autopct='%1.1f%%', shadow = True, startangle=90);

Men commit suicide about **3** times as often as women, and as we can see in the graph below, this pattern has been repeated over time.
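The "3 times" figure follows from the shares in the pie chart. A small sketch of the arithmetic, assuming the male share of 75.6% reported in the conclusions of this notebook:

```python
# Share of suicides attributed to men, as reported in this notebook (75.6%)
male_share = 75.6
female_share = 100 - male_share

# Number of male suicides per female suicide
ratio = male_share / female_share
print(f"men : women = {ratio:.1f} : 1")
```

Note that this is a ratio of absolute counts; it does not adjust for the size of the male and female populations.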

In [None]:
tabela2.plot.bar(stacked=True, figsize=(15,5), color=color_gender)
plt.xlabel(' ')
plt.title('Gender over time', fontsize=19)
plt.ylabel('Number of suicides per 100 thousand people', fontsize=13);

In [None]:
tabela2.plot(stacked=True, figsize=(15,5), color=color_gender)
plt.xlabel(' ')
plt.title('Gender over time', fontsize=19)
plt.ylabel('Number of suicides per 100 thousand people', fontsize=13);

This graph doesn't tell us much; we need to normalize the data to observe the trends better.

In [None]:

min_max_scaler = preprocessing.MinMaxScaler()
np_scaled = min_max_scaler.fit_transform(tabela2)
tabela2_normal = pd.DataFrame(np_scaled, columns=['Women', 'Men'], index=tabela2.index)
tabela2_normal.head()

In [None]:
tabela2_normal.plot(stacked=True, figsize=(15,5), color=color_gender)
plt.xlabel(' ')
plt.title('Gender over time (normalized data)', fontsize=19)
plt.ylabel('Min-max normalized suicide rate', fontsize=13);

Now the trends are much easier to observe and understand.
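For reference, min-max scaling rescales each column independently to the range [0, 1] using (x - min) / (max - min). The following pure-Python sketch shows the idea on invented values; `min_max_scale` is a hypothetical helper written for this illustration, not the scikit-learn implementation.

```python
def min_max_scale(values):
    """Rescale a sequence to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Invented yearly rates for two columns on very different scales
women = [10.0, 12.0, 11.0, 14.0]
men = [40.0, 50.0, 45.0, 60.0]

print(min_max_scale(women))  # [0.0, 0.5, 0.25, 1.0]
print(min_max_scale(men))    # [0.0, 0.5, 0.25, 1.0]
```

Because each column is scaled on its own, the difference in level between men and women disappears after normalization; only the shape of the two curves over time remains comparable.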

## Age group by total number of suicides

In [None]:
# Total suicides by sex and age group in Sri Lanka, summed over all years
by_sex_age = df_Sri_Lanka.groupby(['sex', 'age'])['suicides_no'].sum()
women = by_sex_age.xs('female', level='sex')  # indexed by age group
men = by_sex_age.xs('male', level='sex')
w = [age.split(' ')[0] for age in women.index]  # age-group labels for women, e.g. '15-24'
m = [age.split(' ')[0] for age in men.index]    # age-group labels for men
wn = women.tolist()  # number of suicides per age group (women)
mn = men.tolist()    # number of suicides per age group (men)


In [None]:
fig = plt.figure(figsize=(10,5))
sns.barplot(x=w, y=wn)  # x and y are plain lists, so no data= argument is needed
plt.title('Suicides by age group (women)', fontsize=19);

In [None]:
fig = plt.figure(figsize=(10,5))
sns.barplot(x=m, y=mn)  # x and y are plain lists, so no data= argument is needed
plt.title('Suicides by age group (men)', fontsize=19);

In [None]:
print(f'''
Total of men: {sum(mn)}
Total of women: {sum(wn)}
''')

Now that we know age and gender, we can target our message even better


## Economics data

---
Finally, economic analyses are important, because with them we can see whether some variables influence suicide or make no difference.

In our data set, economic variables are **GDP, GDP per capita and HDI**


In [None]:
fig = plt.figure(figsize=(15,5))
sns.lineplot(x=ages, y =suicide_SriLanka_avg, color = 'green')
plt.title('Average suicide each year per 100 thousand inhabitants', fontsize=15)
plt.ylabel('Average suicide / 100k inhabitants', fontsize=13);

Read against GDP per capita, the suicide rate per 100 thousand inhabitants does not simply follow the country's economic growth (the spike in 1997 is one example), which suggests that an economic and statistical analysis alone may not be enough to explain the problem.
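One way to put a number on the relationship between GDP per capita and the suicide rate is the Pearson correlation coefficient between the two yearly series. The sketch below implements it in plain Python on invented values; the data are illustrative only and do not come from the dataset, and `pearson` is a hypothetical helper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented yearly values, for illustration only (not from master.csv)
gdp_per_capita = [500, 600, 700, 800, 900]
suicide_rate = [40.0, 38.0, 41.0, 35.0, 33.0]

r = pearson(gdp_per_capita, suicide_rate)
print(f"Pearson r = {r:.2f}")
```

With the real data, the same quantity is available through the pandas `Series.corr` method applied to the yearly GDP-per-capita and suicide-rate series. A coefficient close to zero would support the reading that economic growth alone does not explain the trend, although correlation by itself says nothing about causation.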

## Conclusion

Studying topics such as suicide helps us understand people's relationship with their own lives and realize that it is not a simple matter. Analyzing the global and national trends is very different from understanding what leads a person to commit suicide: data alone, and the insights drawn from it, do not tell the whole story. Behind the theme there are sciences dedicated to studying suicide from a perspective different from that of **Data Science, Statistics and Economics**.

## Conclusions from the data

**Suicide rates:**

The suicide rate in Sri Lanka decreased until 1996, rose sharply in 1997 and then gradually decreased again.

The suicide rate in the world has been decreasing since 1995 (**4.26%** per year).

**Major victims:**

Men (**75.6%**)

Age group 35-54 years old, while the 5-14 years age group shows an increase in suicide cases from 1996 to 1998

Generation X

## References:

[1] De Vares, S. F. (2007). The suicide problem in Émile Durkheim.

[2] Carlson, Elwood (2008). The Lucky Few: Between the Greatest Generation and the Baby Boom.

[3] Time Magazine (1951). "The Younger Generation".

[4] Dimock, Michael (2019). Defining generations: Where Millennials end and Generation Z begins. Pew Research Center.

[5] Willetts, David (2010). The Pinch: How the Baby Boomers Took their Children's Future and How They Can Give it Back. Atlantic.

[6] Brokaw, Tom (1998). The Greatest Generation. Random House.

[7] Savage, Sam (2006). The Generation Z Connection: Teaching Information Literacy to the Newest Net Generation.