**<p style="font-size: 35px; text-align: center">Exploratory Data Analysis UNODC Homicide</p>**



## ***<center>Data intake and initial processing</center>***

<hr/>

![homicide](https://i.imgur.com/UupWMeA.png)


<hr/>

## Table of contents <a name="home"/>
1. [Project settings](#settings)
2. [Description](#description)<br>
    2.1 [Data description](#data-description)
3. [Reading data](#reading)

<hr/>



## 1. Project Settings <a name="settings"/>[🏠](#home)

In [6]:
# --- Watermark --- #

%load_ext watermark
%watermark

# ----------------- #

2019-03-21T15:44:19-05:00

CPython 3.7.0
IPython 6.5.0

compiler   : MSC v.1912 64 bit (AMD64)
system     : Windows
release    : 10
machine    : AMD64
processor  : Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
CPU cores  : 4
interpreter: 64bit


In [1]:
# --- Importing tools --- #

import pandas as pd

# ----------------------- #

<hr/>

## 2. Description <a name="description"/> [🏠](#home) 


In this first section of our Exploratory Data Analysis report, we read the data that we will analyze and carry out some initial processing.

...

### 2.1 Data Description <a name="data-description"/>
The dataset contains 6 columns, which will be our variables:

- **Country or Area:** The country or area of the record.
- **Year:** The year of the record, from 1995 to 2012.
- **Count:** The number of deaths recorded in the country in that year.
- **Rate:** The number of deaths per 100,000 population.
- **Source:** The source that provides the information about the deaths.
- **Source Type:** The type of source that provides the information about the deaths: "CJ" (Criminal Justice) or "PH" (Public Health).

For more information about the dataset description, see here: [http://data.un.org/Data.aspx?d=UNODC&f=tableCode%3a1#UNODC](http://data.un.org/Data.aspx?d=UNODC&f=tableCode%3a1#UNODC)
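The description above can be turned into a quick sanity check. The following is only a sketch: the helper `check_schema` and the constants `EXPECTED_COLUMNS` and `VALID_SOURCE_TYPES` are hypothetical names introduced for illustration, and the two sample rows are the first records of the dataset, loaded from an in-memory string instead of the real file.

```python
import io

import pandas as pd

# Column names as listed in the data description (assumed to match the file header)
EXPECTED_COLUMNS = ["Country or Area", "Year", "Count", "Rate", "Source", "Source Type"]
# Source type codes from the description: Criminal Justice and Public Health
VALID_SOURCE_TYPES = {"CJ", "PH"}


def check_schema(df):
    """Return a list of problems found; an empty list means the frame matches the description."""
    problems = []
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        problems.append(f"missing columns: {missing}")
        return problems  # the remaining checks need these columns

    # Years outside the documented 1995-2012 range
    out_of_range = df.loc[~df["Year"].between(1995, 2012), "Year"].unique().tolist()
    if out_of_range:
        problems.append(f"years outside 1995-2012: {out_of_range}")

    # Source types other than the two documented codes
    unknown_types = set(df["Source Type"].dropna().unique()) - VALID_SOURCE_TYPES
    if unknown_types:
        problems.append(f"unknown source types: {sorted(unknown_types)}")
    return problems


# Two records from the dataset, used here in place of the full .csv file
sample_csv = io.StringIO(
    "Country or Area,Year,Count,Rate,Source,Source Type\n"
    "Afghanistan,2008,712,2.4,WHO,PH\n"
    "Albania,2010,127,4.0,CTS/Transmonee,CJ\n"
)
sample = pd.read_csv(sample_csv)
schema_problems = check_schema(sample)
print(schema_problems)
```

Running the same helper on the full DataFrame would show whether the file really matches the documented year range and source type codes.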

...

*****

## 3. Reading data <a name="reading"/> [🏠](#home)


We will read the data from the .csv file provided by the UNODC and load it into a pandas `DataFrame`.

In [7]:
# Reading the .csv file into a pandas DataFrame
homicides = pd.read_csv("../data/UNdata_Export_20190321_150129401.csv")

With this done, we now have the .csv information as a pandas `DataFrame` and can start working with it.

In [8]:
# Checking the shape of the DataFrame
homicides.shape

(1719, 6)

The dataset originally provided by the UNODC contains <mark>1719</mark> records and <mark>6</mark> columns; let's take a look at the first 5 records.

In [9]:
# Showing the first 5 records from our DataFrame
homicides.head()

| | Country or Area | Year | Count | Rate | Source | Source Type |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2008 | 712 | 2.4 | WHO | PH |
| 1 | Albania | 2010 | 127 | 4.0 | CTS/Transmonee | CJ |
| 2 | Albania | 2009 | 85 | 2.7 | CTS/Transmonee | CJ |
| 3 | Albania | 2008 | 93 | 2.9 | CTS/Transmonee | CJ |
| 4 | Albania | 2007 | 105 | 3.3 | CTS/Transmonee | CJ |
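Besides the shape and the first records, it can help to check the column types and the number of missing values before any processing. This is a minimal sketch that uses an in-memory copy of the five records shown above instead of the full file; the name `homicides_sample` is only for illustration, and the same calls should apply to the full `homicides` DataFrame.

```python
import io

import pandas as pd

# The five records shown above, used here in place of the full .csv file
sample_csv = io.StringIO(
    "Country or Area,Year,Count,Rate,Source,Source Type\n"
    "Afghanistan,2008,712,2.4,WHO,PH\n"
    "Albania,2010,127,4.0,CTS/Transmonee,CJ\n"
    "Albania,2009,85,2.7,CTS/Transmonee,CJ\n"
    "Albania,2008,93,2.9,CTS/Transmonee,CJ\n"
    "Albania,2007,105,3.3,CTS/Transmonee,CJ\n"
)
homicides_sample = pd.read_csv(sample_csv)

# Data type pandas inferred for each column
column_types = homicides_sample.dtypes
print(column_types)

# Number of missing values in each column
missing_per_column = homicides_sample.isna().sum()
print(missing_per_column)
```

If a numeric column such as `Count` or `Rate` shows up with the `object` type on the full file, that usually means some values could not be parsed as numbers and need cleaning.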


It is convenient to rename the columns with more descriptive names, to avoid possible mistakes or confusion.

In [13]:
# Renaming the dataset columns
homicides = homicides.rename(columns={
    "Country or Area": "country",
    "Year": "year",
    "Count": "amount",  # Total number of deaths per country per year
    "Rate": "rate",
    "Source": "source",
    "Source Type": "sourceType",
})

In [14]:
homicides.head()

| | country | year | amount | rate | source | sourceType |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 2008 | 712 | 2.4 | WHO | PH |
| 1 | Albania | 2010 | 127 | 4.0 | CTS/Transmonee | CJ |
| 2 | Albania | 2009 | 85 | 2.7 | CTS/Transmonee | CJ |
| 3 | Albania | 2008 | 93 | 2.9 | CTS/Transmonee | CJ |
| 4 | Albania | 2007 | 105 | 3.3 | CTS/Transmonee | CJ |
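One thing to keep in mind is that `DataFrame.rename` ignores, by default, any key in the mapping that does not match an existing column, so a typo in a column name goes unnoticed. A possible guard is sketched below; `rename_strict` is a hypothetical helper written for this example, not part of pandas, and the small DataFrame is built from the first two records only.

```python
import pandas as pd

# Same mapping used in the renaming cell above
COLUMN_NAMES = {
    "Country or Area": "country",
    "Year": "year",
    "Count": "amount",
    "Rate": "rate",
    "Source": "source",
    "Source Type": "sourceType",
}


def rename_strict(df, mapping):
    """Rename columns, raising KeyError if any key of the mapping is not a column of df."""
    not_found = sorted(set(mapping) - set(df.columns))
    if not_found:
        raise KeyError(f"columns not found: {not_found}")
    return df.rename(columns=mapping)


# First two records of the dataset, with the original column names
sample = pd.DataFrame({
    "Country or Area": ["Afghanistan", "Albania"],
    "Year": [2008, 2010],
    "Count": [712, 127],
    "Rate": [2.4, 4.0],
    "Source": ["WHO", "CTS/Transmonee"],
    "Source Type": ["PH", "CJ"],
})

renamed = rename_strict(sample, COLUMN_NAMES)
print(list(renamed.columns))
```

Calling the helper a second time on the already renamed DataFrame raises a `KeyError`, which also protects against re-running the cell by accident.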
