# Introduction
Since the recent killings in America, racial discrimination has been at the center of public debate. This is part of the reason why I chose this dataset. In addition, my objective is to visualize the data while answering some of the intriguing questions that revolve around racism.

# Objective
I chose this dataset for a personal project. Using Python and pandas, I will perform the exploratory data analysis (EDA) and the cleaning steps needed to prepare the data for visualization in Tableau.

# Important
I am not going to visualize anything here. However, at the end of this notebook I'll post a dashboard with many of the insights. Therefore, I'll work only with the dataframes here.

In [None]:
import pandas as pd
import os

# List every file available under the Kaggle input directory
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

Let's first load the data and take a look at the first few rows.

In [None]:
file_path = '/kaggle/input/data-police-shootings/fatal-police-shootings-data.csv'
shootings = pd.read_csv(file_path, index_col = 'id')
shootings.head()

Alright, it looks good. But let's first take care of the primary obligation, i.e. listing and handling missing values.

In [None]:
shootings.isnull().sum()

Since the focal point is **Race**, rows with a null race are of no use to us. Furthermore, there are no features from which race could be predicted.
# Action on Race
Remove rows where **Race** is null.

In [None]:
shootings = shootings.loc[shootings.race.notnull()]
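The boolean mask above keeps only the rows whose race is known. pandas can express the same filter directly with `dropna(subset=...)`, which drops a row only when the listed columns are missing. The sketch below uses a tiny made-up frame (not the real dataset) to show that the two approaches select the same rows and that rows missing other fields are kept.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the shootings data (values are made up for illustration)
toy = pd.DataFrame(
    {"race": ["W", np.nan, "B", "H", np.nan], "age": [34, 27, np.nan, 45, 19]},
    index=pd.Index([1, 2, 3, 4, 5], name="id"),
)

# Approach used in the notebook: boolean mask on non-null race
masked = toy.loc[toy.race.notnull()]

# Equivalent one-liner: drop a row only when 'race' is missing
dropped = toy.dropna(subset=["race"])

print(masked.equals(dropped))  # True
print(dropped.index.tolist())  # [1, 3, 4]
```

Row 3 survives even though its age is missing, because only `race` is checked at this stage.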

Let's have a look at the missing values again.

In [None]:
shootings.isnull().sum()

We still have missing values in a few other columns.
However, none of these values can be reliably inferred from the given features.
So, let's see how much data we would have to sacrifice by dropping them.

In [None]:
(round(shootings.isna().sum()/shootings.shape[0]*100, 2)).astype('str') + '%'

It looks like things are on our side: barely 5% of the data is going to be removed, so dropping these rows is a good move!
On another note, these rows could have been removed earlier, but given our goal we needed to inspect the most important column, **Race**, first and only then move on to the others.
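The percentages above are per column, so they do not directly give the share of rows that `dropna()` will remove: a row missing two fields is counted in two columns but removed only once. A quick way to see the actual row-level cost is to measure how many rows contain at least one missing value. Sketch on a toy frame with invented values, where the second row is missing two fields:

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; the second row is missing two fields
toy = pd.DataFrame({
    "armed": ["gun", np.nan, "knife", "unarmed"],
    "age": [30.0, np.nan, 41.0, np.nan],
    "flee": ["Not fleeing", "Car", "Foot", "Not fleeing"],
})

# Per-column share of missing values (what the notebook prints)
per_column = toy.isna().mean().mul(100).round(2)

# Share of rows that dropna() would remove: rows with any missing value
rows_lost = toy.isna().any(axis=1).mean() * 100

print(per_column.to_dict())  # {'armed': 25.0, 'age': 50.0, 'flee': 0.0}
print(rows_lost)             # 50.0
```

Here the column percentages add up to 75%, while only 50% of the rows are actually dropped.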

In [None]:
shootings = shootings.dropna()

**<h1>Congrats!</h1>** We now have 0% missing values.
Now let's move on to the column level and check what type of data each column holds.
By looking at the initial rows, <code>'name', 'date', 'age', 'gender', 'race', 'city', 'state'</code> are straightforward. But the other columns are less clear, so let's inspect their unique values.

In [None]:
print(shootings.manner_of_death.unique())
print(shootings.armed.unique())
print(shootings.signs_of_mental_illness.unique())
print(shootings.threat_level.unique())
print(shootings.flee.unique())
print(shootings.body_camera.unique())
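Printing `unique()` column by column works well for a handful of columns. A compact complement is `nunique()`, which counts the distinct values per column, so free-text columns such as `armed` stand out at once; `value_counts()` then lists each distinct value with its frequency. The toy values below are invented for illustration.

```python
import pandas as pd

# Toy frame with made-up values
toy = pd.DataFrame({
    "manner_of_death": ["shot", "shot", "shot and Tasered", "shot"],
    "armed": ["gun", "knife", "vehicle and gun", "gun and vehicle"],
    "body_camera": [False, True, False, False],
})

# Number of distinct values per column; a high count hints at free-text entries
counts = toy.nunique()
print(counts.to_dict())
# {'manner_of_death': 2, 'armed': 4, 'body_camera': 2}

# Each distinct value of 'armed' with its frequency
print(toy["armed"].value_counts())
```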

manner_of_death: **✓**
<br/>armed: **✗**
<ol>
    <li>Appears to have been entered manually</li>
    <li>The words 'unknown', 'unknown weapon', 'undetermined' have the same meaning</li>
    <li>Many repetitions of similar weapons, e.g. 'vehicle and gun', 'gun and vehicle'</li>
</ol>
<br/>signs_of_mental_illness: <b>✓</b>
<br/>threat_level: <b>✓</b>
<br/>flee: <b>✓</b>
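Point 3 above (the same weapons listed in a different order) could be handled by putting multi-weapon entries into a canonical form: split on `' and '`, sort the parts, and join them back. `canonical_armed` is a hypothetical helper, not part of the original notebook, and it assumes `' and '` is always the separator between weapons.

```python
import pandas as pd

def canonical_armed(value: str) -> str:
    """Hypothetical helper: sort the parts of a multi-weapon entry alphabetically."""
    parts = [part.strip() for part in value.lower().split(" and ")]
    return " and ".join(sorted(parts))

# Toy values for illustration
armed = pd.Series(["vehicle and gun", "gun and vehicle", "knife", "Gun and Knife"])
canonical = armed.map(canonical_armed)
print(canonical.tolist())
# ['gun and vehicle', 'gun and vehicle', 'knife', 'gun and knife']
```

Single weapons pass through unchanged apart from lower-casing. Entries such as 'bow and arrow' become 'arrow and bow', which the weapon dictionary in this notebook also lists under the same category.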


# ARMED normalization/categorization
I have created a static dictionary myself so that I can categorize each of the weapons.
But first, let's fix point #2 and normalize <code>['unknown', 'undetermined', 'unidentifiable', 'claimed to be armed']</code> to the single value 'unknown'.

In [None]:
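unknown_terms = ['unknown', 'undetermined', 'unidentifiable', 'claimed to be armed']
# Case-insensitive substring match; assign via .loc on the DataFrame itself (no chained assignment)
shootings.loc[shootings.armed.str.contains('|'.join(unknown_terms), case=False), 'armed'] = 'unknown'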
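Because `str.contains` receives a regular-expression alternation (`'unknown|undetermined|...'`), any value that merely contains one of the terms, such as 'unknown weapon', is mapped to 'unknown'. The sketch below shows this on toy values and writes through `DataFrame.loc[mask, 'armed']`, which assigns into the frame itself rather than into a possibly temporary `shootings.armed` object.

```python
import pandas as pd

# Toy values for illustration
toy = pd.DataFrame({"armed": ["Unknown weapon", "gun", "undetermined",
                              "claimed to be armed", "knife"]})
unknown_terms = ["unknown", "undetermined", "unidentifiable", "claimed to be armed"]

# Case-insensitive regex match against any of the terms
mask = toy["armed"].str.contains("|".join(unknown_terms), case=False)
toy.loc[mask, "armed"] = "unknown"

print(toy["armed"].tolist())
# ['unknown', 'gun', 'unknown', 'unknown', 'knife']
```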

# The Dictionary

In [None]:
dictionary = {
    'Guns': 
        ['bb gun', 'pellet gun', 'air pistol', 'bean-bag gun', 'gun'],
    'Blunt instruments':
        ['hammer', 'axe', 'ax', 'hatchet', 'crowbar', 'pole', 'rod', 'walking stick', 'stick', 'rock', 'baton', 'shovel',
        'metal object', 'baseball bat', 'bat', 'flagpole', 'metal pole', 'metal stick', 'blunt object', 'metal pipe', 
         'carjack', 'brick', 'garden tool', 'metal rake', 'mace', 'wrench', 'pipe'],
    'Sharp objects':
        ['knife', 'bayonet', 'razor', 'blade', 'machete', 'sword', 'chainsaw', 'chain saw', 'sharp object', 'scissor',
         'scissors', 'glass shard', 'samurai sword', 'lawn mower blade', 'box-cutter', 'straight edge razor',
         'beer bottle', 'bottle', 'meat cleaver', 'box cutter'],
    'Piercing objects':
        ['spear', 'pick-axe', 'pick axe', 'pitchfork', 'cordless drill', 'nail gun', 'nailgun', 'pen',
        'crossbow', 'arrow and bow', 'screwdriver', 'ice pick', 'bow and arrow'],
    'Other unusual objects':
        ['oar', 'chair', 'barstool', 'pepper spray', 'spray', 'wasp spray', 'piece of wood',
        'toy weapon', 'torch', 'flashlight', 'air conditioner', 'hand torch', "contractor's level", 'chain', 'stapler'],
    'Hand tools':
        ['metal hand tool'],
    'Vehicles':
        ['motorcycle', 'car', 'van', 'wagon', 'bike', 'vehicle'],
    'Electrical devices':
        ['taser'],
    'Explosives':
        ['fireworks', 'grenade', 'molotov cocktail', 'incendiary device']
}

Don't worry: I took the cheap approach and wrote it myself. Each category holds its own type of weapons.
Now I am mapping each weapon to a new categorical column, <code>arms_category</code>.

In [None]:
def categorize_arms(armed):
    """Map one raw 'armed' value to a weapon category (None if nothing matches)."""
    armed = armed.lower()
    if armed == 'unknown':
        return 'Unknown'
    if armed == 'unarmed':
        return 'Unarmed'
    # Check exact matches first, so entries such as 'bow and arrow' keep their category
    for key, value in dictionary.items():
        if armed in value:
            return key
    # Otherwise use 'and' as a separator to recognize multiple weapons
    if 'and' in armed.split(' '):
        return 'Multiple'
    return None

shootings['arms_category'] = shootings['armed'].apply(categorize_arms)
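The categorization above may scan every category list for each row. An alternative is to invert the dictionary once into a weapon-to-category lookup and pass it to `Series.map`, so each value costs a single dictionary lookup. Minimal sketch with a shortened copy of the dictionary (only a few entries kept for brevity); `weapon_to_category` is a name introduced here, not in the original notebook.

```python
import pandas as pd

# Shortened copy of the category dictionary, for illustration only
categories = {
    "Guns": ["gun", "bb gun"],
    "Sharp objects": ["knife", "machete"],
    "Piercing objects": ["bow and arrow", "spear"],
}

# Invert once: every weapon points to its category
# (if a weapon were listed under two categories, the later one would win)
weapon_to_category = {
    weapon: category
    for category, weapons in categories.items()
    for weapon in weapons
}

armed = pd.Series(["gun", "machete", "bow and arrow", "toy sword"])
# Series.map with a dict leaves values that are not in the dict as NaN
categorized = armed.map(weapon_to_category)
print(categorized.tolist())
# ['Guns', 'Sharp objects', 'Piercing objects', nan]
```

The 'Unknown', 'Unarmed' and 'Multiple' rules would still need to be applied separately, e.g. to the values left as NaN.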

We have now categorized the weapons and will use this column for further analysis, as it conveys the same information more concisely. See for yourself:

In [None]:
shootings.head()
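To confirm that every weapon received a category, one can list the `armed` values whose `arms_category` is still missing: any weapon that is absent from the dictionary and contains no 'and' ends up without a category (`None`). Sketch on a toy frame with invented values:

```python
import pandas as pd

# Toy frame with made-up values; 'toy sword' is deliberately not categorized
toy = pd.DataFrame({
    "armed": ["gun", "knife", "toy sword", "gun", "toy sword", "unarmed"],
    "arms_category": ["Guns", "Sharp objects", None, "Guns", None, "Unarmed"],
})

# Raw weapon labels that did not match any category, most frequent first
uncategorized = toy.loc[toy["arms_category"].isna(), "armed"].value_counts()
print(uncategorized.to_dict())  # {'toy sword': 2}
```

An empty result on the real `shootings` frame would back the claim that all weapons were categorized; otherwise the listed labels show what the dictionary is missing.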

# The last thing before visualization
I don't like how races are denoted. Single-letter codes will likely confuse a general audience:
<ul>
    <li>A for Asian</li>
    <li>B for Black</li>
    <li>W for White</li>
    <li>H for Hispanic</li>
    <li>N for Native</li>
    <li>O for Other</li>
</ul>
Now I am changing the letters to full words.

In [None]:
# Single-letter race codes mapped to full names; any other value is left unchanged
race_names = {
    'A': 'Asian',
    'B': 'Black',
    'H': 'Hispanic',
    'N': 'Native',
    'O': 'Other',
    'W': 'White',
}
shootings['race'] = shootings['race'].replace(race_names)

# All done. Our data is a lot better than it was!
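Since the cleaned frame is meant for Tableau, the remaining step is to write it to a file Tableau can open. A minimal sketch, assuming CSV is the chosen format; the file name `shootings_clean.csv` and the temporary output directory are hypothetical. Writing with `index=True` keeps the `id` index as a column, so rows can be traced back to the source data.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Toy stand-in for the cleaned shootings frame (values invented)
cleaned = pd.DataFrame(
    {"race": ["White", "Black"], "arms_category": ["Guns", "Unarmed"]},
    index=pd.Index([3, 7], name="id"),
)

out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "shootings_clean.csv"  # hypothetical file name

# Keep the 'id' index as the first CSV column
cleaned.to_csv(out_path, index=True)

# Read it back to check that nothing was lost on the way
round_trip = pd.read_csv(out_path, index_col="id")
print(round_trip)
```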

![Tableau Shot](https://i.ibb.co/M1pn354/ewadsdas.png)