<div class="alert alert-info">
    <h1>Global Fire Power Analysis</h1>
    <div>
        <ul>
            <li>Identify the distribution of each variable</li>
            <li>What happens if I run k-means without scaling the data?</li>
            <li>What do the scaled data look like?</li>
            <li>What does k-means look like on the scaled data?</li>
            <li>What happens if we apply PCA?</li>
            <li>What does k-means look like with PCA?</li>
            <li>What do the countries look like in 2-D with PCA?</li>
            <li>What does a world map of the clusters look like?</li>
        </ul>
    </div>
</div>

In [11]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Import/clean the data

Import the indicators

In [8]:
data = pd.read_csv('gfp_countries_indicators.csv')
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 145 entries, 0 to 144
Data columns (total 28 columns):
 #   Column                                     Non-Null Count  Dtype 
---  ------                                     --------------  ----- 
 0   country_longName                           145 non-null    object
 1   country_shortName                          145 non-null    object
 2   Total population                           145 non-null    int64 
 3   Reaching Military Age                      145 non-null    int64 
 4   Active Service                             145 non-null    int64 
 5   Active Reserves                            145 non-null    int64 
 6   Paramilitary                               145 non-null    int64 
 7   Fighters/Interceptors                      145 non-null    int64 
 8   Attack/Strike                              145 non-null    int64 
 9   Helicopter Fleets                          145 non-null    int64 
 10  Armored Fighting Vehicles             

We keep only the variables most directly related to military assets.

In [9]:
# Define a list with the military assets
mil_vars = ['Active Service', 'Active Reserves', 'Paramilitary', 'Fighters/Interceptors',
            'Attack/Strike', 'Helicopter Fleets', 'Armored Fighting Vehicles', 'Towed Artillery',
            'Submarines', 'Frigates', 'Corvettes']

data = data.set_index(['country_longName', 'country_shortName'])[mil_vars]
# info() prints its summary and returns None, so call it on its own
data.info()
display(data.head(5))

<class 'pandas.core.frame.DataFrame'>
MultiIndex: 145 entries, ('Afghanistan', 'AFG') to ('Zimbabwe', 'ZIM')
Data columns (total 11 columns):
 #   Column                     Non-Null Count  Dtype
---  ------                     --------------  -----
 0   Active Service             145 non-null    int64
 1   Active Reserves            145 non-null    int64
 2   Paramilitary               145 non-null    int64
 3   Fighters/Interceptors      145 non-null    int64
 4   Attack/Strike              145 non-null    int64
 5   Helicopter Fleets          145 non-null    int64
 6   Armored Fighting Vehicles  145 non-null    int64
 7   Towed Artillery            145 non-null    int64
 8   Submarines                 145 non-null    int64
 9   Frigates                   145 non-null    int64
 10  Corvettes                  145 non-null    int64
dtypes: int64(11)
memory usage: 23.5+ KB


country_longName,country_shortName,Active Service,Active Reserves,Paramilitary,Fighters/Interceptors,Attack/Strike,Helicopter Fleets,Armored Fighting Vehicles,Towed Artillery,Submarines,Frigates,Corvettes
Afghanistan,AFG,0,0,80000,0,0,11,6555,0,0,0,0
Albania,ALB,6600,2000,500,0,0,19,976,0,0,0,0
Algeria,ALG,325000,135000,150000,102,42,298,35990,483,6,8,16
Angola,ANG,107000,0,10000,57,26,116,5500,552,0,0,0
Argentina,ARG,108000,0,20000,24,10,90,21724,172,2,0,9

What do these variables look like visually?

In [73]:
%%html
<style>

    .indicators {
        display: flex;
        padding: 5%;
    }
    
    .row-of-indicators {
        display: flex;
        align-items: center;
        justify-content: center;
        height: auto;
        /* border: 1px solid white; debugging */
    }

    .indicator-el {
        width: 25%;
        height: 100%;
        padding: 1vh;
        /* border: 1px solid blue; debugging */
    }

    .indicator-el img {
        width: 50vh;
        height: 25vh;
    }

    .indicator-el p {
        text-align: center;
        height: 20%;
    }
</style>

<div>
    <div class="row-of-indicators">
        <div class="indicator-el">
            <img src="https://upload.wikimedia.org/wikipedia/commons/d/d1/F-106A_Chase_Dart_%28cropped%29.jpg" alt="fighters/interceptors">
            <p>Fighters/Interceptors aircraft</p>
        </div>
        <div class="indicator-el">
            <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/80/Fairchild_Republic_A-10_Thunderbolt_II_-_32156159151.jpg/1200px-Fairchild_Republic_A-10_Thunderbolt_II_-_32156159151.jpg" alt="attack/strike">
            <p>Attack/Strike aircraft</p>
        </div>
        <div class="indicator-el">
            <img src="https://upload.wikimedia.org/wikipedia/commons/4/48/Jordanian_Air_Force_UH-60_Black_Hawk_helicopter_%28cropped%29.jpg" alt="helicopter-fleets">
            <p>Helicopter Fleets</p>
        </div>
    </div>
    <div class="row-of-indicators">
        <div class="indicator-el">
            <img src="https://eco-cdn.iqpc.com/eco/images/channel_content/images/boxer.webp" alt="armored-fighting-vehicles">
            <p>Armored Fighting Vehicles</p>
        </div>
        <div class="indicator-el">
            <img src="https://upload.wikimedia.org/wikipedia/commons/a/ac/M777_howitzer_rear.jpg" alt="towed-artillery">
            <p>Towed Artillery</p>
        </div>
    </div>
    <div class="row-of-indicators">
        <div class="indicator-el">
            <img src="https://upload.wikimedia.org/wikipedia/commons/b/bb/US_Navy_040730-N-1234E-002_PCU_Virginia_%28SSN_774%29_returns_to_the_General_Dynamics_Electric_Boat_shipyard.jpg" alt="submarines">
            <p>Submarines</p>
        </div>
        <div class="indicator-el">
            <img src="https://upload.wikimedia.org/wikipedia/commons/thumb/5/5a/Admiral_Gorshkov_frigate_03.jpg/640px-Admiral_Gorshkov_frigate_03.jpg" alt="frigates">
            <p>Frigates</p>
        </div>
        <div class="indicator-el">
            <img src="https://upload.wikimedia.org/wikipedia/commons/d/d2/Zarechny_%28parade%29.jpg" alt="corvettes">
            <p>Corvettes</p>
        </div>
    </div>
</div>

## EDA

### Describe the variables, show their distributions, and identify possible correlations

In [12]:
# Describe the numeric variables
data.describe()

statistic,Active Service,Active Reserves,Paramilitary,Fighters/Interceptors,Attack/Strike,Helicopter Fleets,Armored Fighting Vehicles,Towed Artillery,Submarines,Frigates,Corvettes
count,145.0,145.0,145.0,145.0,145.0,145.0,145.0,145.0,145.0,145.0,145.0
mean,153884.3,201809.2,121307.0,73.668966,27.234483,144.337931,16538.241379,407.441379,3.296552,2.82069,2.951724
std,296848.8,575948.0,609642.4,209.141302,101.437584,506.244619,39810.58731,1014.412302,10.033364,5.507153,9.774777
min,0.0,0.0,0.0,0.0,0.0,0.0,100.0,0.0,0.0,0.0,0.0
25%,18400.0,0.0,2000.0,0.0,0.0,13.0,1112.0,0.0,0.0,0.0,0.0
50%,50000.0,26000.0,12500.0,11.0,0.0,38.0,4522.0,72.0,0.0,0.0,0.0
75%,162000.0,130000.0,55000.0,53.0,23.0,100.0,13710.0,300.0,2.0,4.0,2.0
max,2035000.0,5000000.0,6800000.0,1854.0,896.0,5737.0,360069.0,8356.0,65.0,42.0,83.0


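Notes

- At least half of the countries have no `Submarines`, `Frigates`, or `Corvettes` (the median of each is 0), i.e. naval assets are concentrated in a minority of countries.
- `Active Service`, `Active Reserves`, and `Paramilitary` are head counts whose maxima reach the millions (about 2.0, 5.0, and 6.8 million, respectively).
- `Armored Fighting Vehicles` has some of the largest values among the equipment indicators (a median of 4,522 and a maximum of 360,069), compared with other equipment such as ships, whose counts stay below 100.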
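
The observations in these notes can be checked numerically rather than read off the summary table. The following is a minimal sketch, not part of the original notebook: `sample` and `naval_vars` are illustrative names, and the five rows are copied from the `data.head(5)` output earlier, so the printed figures describe only those five countries. In the notebook itself, `data` would be used in place of `sample`.

```python
import pandas as pd

# Five rows copied from the data.head(5) output; a stand-in for the full
# 145-country `data` DataFrame (assumption: same column names).
sample = pd.DataFrame(
    {
        "Active Service": [0, 6600, 325000, 107000, 108000],
        "Armored Fighting Vehicles": [6555, 976, 35990, 5500, 21724],
        "Submarines": [0, 0, 6, 0, 2],
        "Frigates": [0, 0, 8, 0, 0],
        "Corvettes": [0, 0, 16, 0, 9],
    },
    index=["AFG", "ALB", "ALG", "ANG", "ARG"],
)

# Hypothetical helper list naming the naval indicators
naval_vars = ["Submarines", "Frigates", "Corvettes"]

# Share of countries that report zero units of each naval asset
zero_share = (sample[naval_vars] == 0).mean()

# Largest value per column, sorted, to compare orders of magnitude
col_max = sample.max().sort_values(ascending=False)

print(zero_share)
print(col_max)
```

A large spread in `col_max` between personnel counts and ship counts is the kind of scale difference that matters for the unscaled versus scaled k-means comparison listed at the top of the notebook.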