**Import pandas and NumPy**


In [1]:
import pandas as pd
import numpy as np

### ABO Blood Groups


In this notebook we will analyze the world distribution of the different blood groups.

Molecules determined by each individual's genetic code are expressed on the surface of red blood cells (also known as erythrocytes). Two of these molecules are especially important because they determine the blood type. The practical implication is that an individual who needs a blood transfusion cannot receive blood from just any donor: many people can receive only their own type. In addition, when a pregnant woman's blood type is incompatible with her baby's, the pregnancy is at risk of miscarriage, or a dangerous and potentially fatal reaction may develop in the mother. This is why testing the blood type of the whole population is important.

There are four groups that are responsible for blood compatibility:

-Group O: The most common blood group. People with this group can receive blood only from group O.

-Group A: The second most common group. People with this group can receive blood from groups A and O.

-Group B: The third most common group. People with this group can receive blood from groups B and O.

-Group AB: The least common group. People with this group can receive blood from any group (A, B, AB, and O).
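
The compatibility rules listed above can be written down as a small lookup table. This is only a sketch of the ABO rules as stated in this notebook: the names `CAN_RECEIVE_FROM` and `is_abo_compatible` are made up for illustration, and the Rh factor is deliberately ignored.

```python
# Donor ABO groups that each recipient group can receive, as listed above.
# Rh compatibility is not modelled in this sketch.
CAN_RECEIVE_FROM = {
    "O": {"O"},
    "A": {"A", "O"},
    "B": {"B", "O"},
    "AB": {"A", "B", "AB", "O"},
}


def is_abo_compatible(donor: str, recipient: str) -> bool:
    """Return True if `recipient` can receive blood from `donor` (ABO only)."""
    return donor in CAN_RECEIVE_FROM[recipient]


print(is_abo_compatible("O", "A"))   # O donors are accepted by group A
print(is_abo_compatible("A", "O"))   # group O accepts only group O
```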

Another molecule present on red blood cells that determines the **compatibility** of the blood is known as the <i>Rhesus D factor</i>. This factor was first discovered in rhesus monkeys (hence the name). When the factor is present, the blood type is said to be Rh-positive (Rh+); when it is absent, the blood type is considered Rh-negative (Rh-).
 
In summary, combining the ABO groups with the Rhesus D factor gives eight possible blood types: O+, A+, B+, AB+, O-, A-, B-, AB-. 
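
As a quick illustration, the eight labels can be generated by pairing every ABO group with both Rh signs. The variable names below are arbitrary choices for this sketch.

```python
from itertools import product

abo_groups = ["O", "A", "B", "AB"]
rh_signs = ["+", "-"]

# The Rh sign is the outer loop, so all positive types come first,
# which matches the order in which the types are listed above.
blood_types = [abo + rh for rh, abo in product(rh_signs, abo_groups)]

print(blood_types)
print(len(blood_types))
```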
 
In the present dataset we have the distribution of the eight blood types by country. We will analyze this data using the statistical knowledge we have learned. 




### Loading data by using Pandas

In [8]:
### use pandas to read the csv (read_csv())
### show the top rows
# "blood_groups_world_distribution.csv"
df_blood = pd.read_csv(r"blood_groups_world_distribution.csv")
df_blood.head() #The head() method returns the first 5 rows if a number is not specified.

Unnamed: 0,Country,Population,O+,A+,B+,AB+,O-,A-,B-,AB-
0,Armenia,2931568,0.29,0.463,0.12,0.056,0.02,0.037,0.01,0.004
1,Norway,5330986,0.33,0.415,0.068,0.034,0.06,0.075,0.012,0.006
2,Cyprus,1189395,0.3522,0.4035,0.1111,0.0472,0.0385,0.0348,0.0087,0.004
3,Portugal,10264672,0.363,0.4,0.066,0.029,0.06,0.066,0.011,0.005
4,Switzerland,8454321,0.35,0.4,0.07,0.03,0.06,0.07,0.01,0.01


In [9]:
df_blood.describe()

Unnamed: 0,Population,O+,A+,B+,AB+,O-,A-,B-,AB-
count,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0,101.0
mean,65705260.0,0.400744,0.30318,0.156366,0.047123,0.039508,0.03473,0.01322,0.005131
std,194418600.0,0.109435,0.066497,0.079631,0.023111,0.025815,0.026884,0.009315,0.003989
min,38140.0,0.2463,0.087,0.0228,0.005,0.0008,0.001,0.0001,0.0001
25%,5541328.0,0.32,0.26,0.09,0.03,0.0143,0.008,0.005,0.001
50%,16642880.0,0.38,0.31,0.14,0.043,0.04,0.028,0.011,0.0049
75%,49069270.0,0.462,0.357,0.207,0.0635,0.06,0.06,0.02,0.01
max,1388251000.0,0.855,0.463,0.3814,0.1132,0.09,0.08,0.0357,0.012
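
Because the eight columns are fractions of each country's population, one plausible sanity check is that they add up to roughly 1 in every row. The sketch below runs that check on three rows copied from the `head()` output above; the tolerance of 0.01 is an assumption chosen to absorb rounding, not a value taken from the dataset.

```python
import numpy as np
import pandas as pd

TYPE_COLUMNS = ["O+", "A+", "B+", "AB+", "O-", "A-", "B-", "AB-"]

# Three rows copied from the head() output above.
sample = pd.DataFrame(
    [
        ["Armenia", 2931568, 0.29, 0.463, 0.12, 0.056, 0.02, 0.037, 0.01, 0.004],
        ["Norway", 5330986, 0.33, 0.415, 0.068, 0.034, 0.06, 0.075, 0.012, 0.006],
        ["Cyprus", 1189395, 0.3522, 0.4035, 0.1111, 0.0472, 0.0385, 0.0348, 0.0087, 0.004],
    ],
    columns=["Country", "Population"] + TYPE_COLUMNS,
)

# The eight fractions of a country should add up to roughly 1.
row_sums = sample[TYPE_COLUMNS].sum(axis=1)

# Assumed tolerance of 0.01 to absorb rounding in the published figures.
is_consistent = np.isclose(row_sums, 1.0, atol=0.01)

# Countries whose fractions do not add up, if any.
suspicious = sample.loc[~is_consistent, "Country"].tolist()
print(suspicious)
```

On the full dataset the same check would be applied to `df_blood` instead of `sample`.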


### 1. Which is the most common blood type: O+ or O-?

### Adding new columns



**Adding new columns can cause pandas to copy the existing DataFrame data into new internal blocks.**

**If the new column has a dtype that is not yet present in the DataFrame, no copying happens.**

**The blocks are not re-created every time a column is added, but rather:**

**1) After many new columns have been added.**

**2) After calling certain functions or attributes, such as "values".**
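
A related option, sketched here under the assumption that keeping the original frame untouched is desirable, is `DataFrame.assign`, which returns a new DataFrame with the extra columns instead of modifying the existing one. The column names `amount_o_pos` and `amount_o_neg` are hypothetical and are not the ones used in the cells below.

```python
import pandas as pd

# Two rows copied from the dataset preview.
df = pd.DataFrame(
    {
        "Country": ["Armenia", "Norway"],
        "Population": [2931568, 5330986],
        "O+": [0.29, 0.33],
        "O-": [0.02, 0.06],
    }
)

# assign() returns a new DataFrame; `df` itself is left unchanged.
df_with_amounts = df.assign(
    amount_o_pos=lambda d: d["O+"] * d["Population"],
    amount_o_neg=lambda d: d["O-"] * d["Population"],
)

print(df_with_amounts.columns.tolist())
print("amount_o_pos" in df.columns)
```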


In [10]:
df_blood['Amount_O+']=df_blood['O+']*df_blood['Population']
df_blood['Amount_O-']=df_blood['O-']*df_blood['Population']
df_blood.head()


Unnamed: 0,Country,Population,O+,A+,B+,AB+,O-,A-,B-,AB-,Amount_O+,Amount_O-
0,Armenia,2931568,0.29,0.463,0.12,0.056,0.02,0.037,0.01,0.004,850154.72,58631.36
1,Norway,5330986,0.33,0.415,0.068,0.034,0.06,0.075,0.012,0.006,1759225.38,319859.16
2,Cyprus,1189395,0.3522,0.4035,0.1111,0.0472,0.0385,0.0348,0.0087,0.004,418904.919,45791.7075
3,Portugal,10264672,0.363,0.4,0.066,0.029,0.06,0.066,0.011,0.005,3726075.936,615880.32
4,Switzerland,8454321,0.35,0.4,0.07,0.03,0.06,0.07,0.01,0.01,2959012.35,507259.26


In [11]:
sum_o_positive=df_blood['Amount_O+'].sum()
sum_o_negative=df_blood['Amount_O-'].sum()

if sum_o_positive < sum_o_negative:
    print("The most common blood type is O-. The number of people with this blood type is:", int(sum_o_negative))
else:
    print("The most common blood type is O+. The number of people with this blood type is:", int(sum_o_positive))

The most common blood type is O+. The number of people with this blood type is: 2557531432
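
The same comparison can be sketched without helper columns or an `if`/`else`: multiply the fraction columns by the population row by row, sum over countries, and let `idxmax()` pick the larger total. The example uses three rows copied from the preview, so its totals are much smaller than the figure for the full dataset.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Country": ["Armenia", "Norway", "Cyprus"],
        "Population": [2931568, 5330986, 1189395],
        "O+": [0.29, 0.33, 0.3522],
        "O-": [0.02, 0.06, 0.0385],
    }
)

# Multiply each fraction column by the population row-wise, then sum over countries.
totals = df[["O+", "O-"]].mul(df["Population"], axis=0).sum()

# Label of the column with the larger total.
most_common = totals.idxmax()
print(f"The most common of the two types is {most_common}: about {int(totals[most_common]):,} people")
```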


### 2. In how many countries is the most common blood type A?

In [12]:
df_blood['A']=df_blood['A+']+df_blood['A-']
df_blood['B']=df_blood['B+']+df_blood['B-']
df_blood['O']=df_blood['O+']+df_blood['O-']
df_blood['AB']=df_blood['AB+']+df_blood['AB-']

df_blood.head()


Unnamed: 0,Country,Population,O+,A+,B+,AB+,O-,A-,B-,AB-,Amount_O+,Amount_O-,A,B,O,AB
0,Armenia,2931568,0.29,0.463,0.12,0.056,0.02,0.037,0.01,0.004,850154.72,58631.36,0.5,0.13,0.31,0.06
1,Norway,5330986,0.33,0.415,0.068,0.034,0.06,0.075,0.012,0.006,1759225.38,319859.16,0.49,0.08,0.39,0.04
2,Cyprus,1189395,0.3522,0.4035,0.1111,0.0472,0.0385,0.0348,0.0087,0.004,418904.919,45791.7075,0.4383,0.1198,0.3907,0.0512
3,Portugal,10264672,0.363,0.4,0.066,0.029,0.06,0.066,0.011,0.005,3726075.936,615880.32,0.466,0.077,0.423,0.034
4,Switzerland,8454321,0.35,0.4,0.07,0.03,0.06,0.07,0.01,0.01,2959012.35,507259.26,0.47,0.08,0.41,0.04


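With axis=1, the idxmax() method returns a Series containing, for each row, the label of the column that holds the maximum value (by default it works per column and returns the row index instead).
The tail() method returns the last 5 rows if a number is not specified.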
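
To make the difference between the two axes concrete, here is a small sketch that uses the A, B, O and AB values of three countries taken from the tables in this notebook. The frame name `groups` is arbitrary.

```python
import pandas as pd

groups = pd.DataFrame(
    {
        "A": [0.5, 0.2326, 0.224],
        "B": [0.13, 0.3797, 0.3554],
        "O": [0.31, 0.288, 0.3257],
        "AB": [0.06, 0.0997, 0.0949],
    },
    index=["Armenia", "Pakistan", "Bangladesh"],
)

# axis=1: for each row, the label of the column holding the largest value.
per_country = groups.idxmax(axis=1)

# Default axis=0: for each column, the index label of the row holding the largest value.
per_group = groups.idxmax()

print(per_country.to_dict())
print(per_group.to_dict())
```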

In [22]:
df_blood['most_common']=df_blood[['A','B','O','AB']].idxmax(axis=1)
df_blood.head()

Unnamed: 0,Country,Population,O+,A+,B+,AB+,O-,A-,B-,AB-,Amount_O+,Amount_O-,A,B,O,AB,most_common
0,Armenia,2931568,0.29,0.463,0.12,0.056,0.02,0.037,0.01,0.004,850154.72,58631.36,0.5,0.13,0.31,0.06,A
1,Norway,5330986,0.33,0.415,0.068,0.034,0.06,0.075,0.012,0.006,1759225.38,319859.16,0.49,0.08,0.39,0.04,A
2,Cyprus,1189395,0.3522,0.4035,0.1111,0.0472,0.0385,0.0348,0.0087,0.004,418904.919,45791.7075,0.4383,0.1198,0.3907,0.0512,A
3,Portugal,10264672,0.363,0.4,0.066,0.029,0.06,0.066,0.011,0.005,3726075.936,615880.32,0.466,0.077,0.423,0.034,A
4,Switzerland,8454321,0.35,0.4,0.07,0.03,0.06,0.07,0.01,0.01,2959012.35,507259.26,0.47,0.08,0.41,0.04,A


In [11]:
count_a=df_blood[df_blood["most_common"] =='A'].count().iloc[0] # count() gives the non-null count per column; iloc[0] selects the first of them by integer position.
type(count_a)

numpy.int64

In [12]:
count_a

34

In [13]:
print("A is the most common blood type in {} out of {} countries".format(count_a, len(df_blood)))

A is the most common blood type in 34 out of 101 countries
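
If the counts for all groups are of interest, `value_counts()` gives them in one call. The sketch below uses only the `most_common` labels of the ten countries displayed in this notebook's tables, so its numbers differ from the result for the full dataset.

```python
import pandas as pd

# most_common labels of the ten countries shown in this notebook's tables.
most_common = pd.Series(
    ["A", "A", "A", "A", "A", "A", "A", "A", "B", "B"],
    name="most_common",
)

# Number of countries per dominant blood group, sorted from most to least frequent.
counts = most_common.value_counts()
print(counts.to_dict())

# .get() avoids a KeyError if a group never appears as the most common one.
count_a = int(counts.get("A", 0))
count_ab = int(counts.get("AB", 0))
print(count_a, count_ab)
```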


### 3. Show the five countries with the highest percentage of AB types.

In [18]:
sorted_df_AB=df_blood.sort_values(['AB'], ascending=False)
sorted_df_AB.head(5)

Unnamed: 0,Country,Population,O+,A+,B+,AB+,O-,A-,B-,AB-,Amount_O+,Amount_O-,A,B,O,AB,most_common
49,North Korea,25432033,0.2715,0.3108,0.3015,0.1132,0.0008,0.001,0.001,0.0003,6904797.0,20345.63,0.3118,0.3025,0.2723,0.1135,A
37,South Korea,50748307,0.279,0.3387,0.2692,0.1098,0.001,0.0013,0.0008,0.0002,14158780.0,50748.31,0.34,0.27,0.28,0.11,A
94,Pakistan,180440005,0.2463,0.206,0.344,0.0952,0.0417,0.0266,0.0357,0.0045,44442370.0,7524348.0,0.2326,0.3797,0.288,0.0997,B
5,Japan,126044340,0.299,0.398,0.199,0.099,0.0015,0.002,0.001,0.0005,37687260.0,189066.5,0.4,0.2,0.3005,0.0995,A
92,Bangladesh,164833667,0.3118,0.2144,0.3458,0.0885,0.0139,0.0096,0.0096,0.0064,51395140.0,2291188.0,0.224,0.3554,0.3257,0.0949,B


In [23]:
print("The 5 countries with the highest percentage of AB types:\n{} ".format(sorted_df_AB['Country'][0:5]))

The 5 countries with the highest percentage of AB types:
49    North Korea
37    South Korea
94       Pakistan
5           Japan
92     Bangladesh
Name: Country, dtype: object 
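
An alternative to sorting the whole frame is `DataFrame.nlargest`, which returns only the top rows for a given column. The sketch below uses AB values copied from the tables above for seven countries and asks for the top three.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Country": ["Armenia", "Norway", "North Korea", "South Korea", "Pakistan", "Japan", "Bangladesh"],
        "AB": [0.06, 0.04, 0.1135, 0.11, 0.0997, 0.0995, 0.0949],
    }
)

# Rows with the three largest AB fractions, in descending order.
top_ab = df.nlargest(3, "AB")
print(top_ab["Country"].tolist())
```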


### 4. How many people (and what percentage of the total world population) have a negative Rh?

In [17]:
df_blood['sum_neg']=(df_blood["A-"]+df_blood["B-"]+df_blood["AB-"]+df_blood["O-"])*df_blood["Population"]
df_blood.head()

Unnamed: 0,Country,Population,O+,A+,B+,AB+,O-,A-,B-,AB-,Amount_O+,Amount_O-,A,B,O,AB,most_common,sum_neg
0,Armenia,2931568,0.29,0.463,0.12,0.056,0.02,0.037,0.01,0.004,850154.72,58631.36,0.5,0.13,0.31,0.06,A,208141.328
1,Norway,5330986,0.33,0.415,0.068,0.034,0.06,0.075,0.012,0.006,1759225.38,319859.16,0.49,0.08,0.39,0.04,A,815640.858
2,Cyprus,1189395,0.3522,0.4035,0.1111,0.0472,0.0385,0.0348,0.0087,0.004,418904.919,45791.7075,0.4383,0.1198,0.3907,0.0512,A,102287.97
3,Portugal,10264672,0.363,0.4,0.066,0.029,0.06,0.066,0.011,0.005,3726075.936,615880.32,0.466,0.077,0.423,0.034,A,1457583.424
4,Switzerland,8454321,0.35,0.4,0.07,0.03,0.06,0.07,0.01,0.01,2959012.35,507259.26,0.47,0.08,0.41,0.04,A,1268148.15


In [20]:
total_neg=df_blood['sum_neg'].sum()
print("The number of people with negative Rh is: {}".format(total_neg))

The number of people with negative Rh is: 397137348.7785


In [21]:
total_pop=df_blood['Population'].sum()
total_pop

6636231605

In [22]:
print("The percentage of people with negative Rh in the total world population is {}".format(total_neg/total_pop*100))

The percentage of people with negative Rh in the total world population is 5.9843804800194285
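
The percentage computed above is a population-weighted average of each country's Rh-negative fraction. The following sketch, based on three rows copied from the preview, shows the same idea with `np.average` and contrasts it with the unweighted mean across countries, which treats every country equally regardless of size.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Country": ["Armenia", "Norway", "Cyprus"],
        "Population": [2931568, 5330986, 1189395],
        "O-": [0.02, 0.06, 0.0385],
        "A-": [0.037, 0.075, 0.0348],
        "B-": [0.01, 0.012, 0.0087],
        "AB-": [0.004, 0.006, 0.004],
    }
)

# Rh-negative fraction of each country.
neg_fraction = df[["O-", "A-", "B-", "AB-"]].sum(axis=1)

# Weighting by population is equivalent to sum(fraction * population) / sum(population).
weighted_share = np.average(neg_fraction, weights=df["Population"])

# The plain mean ignores how many people live in each country.
unweighted_share = neg_fraction.mean()

print(f"Population-weighted Rh-negative share: {weighted_share * 100:.2f}%")
print(f"Unweighted mean across countries:      {unweighted_share * 100:.2f}%")
```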
