# Information about the data
## Column headers for the data
- Year: The year the data in the row is for
- WHO region: The region in which the country is located (e.g. Western Pacific, Americas)
- Country: The country the data in the row is for
- Beverage Types: The type of beverage
- Display Value: The average number of liters drunk per capita
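
The cells below address these columns by position (0 to 4). As a small readability aid, the positions can be given names. This is only a sketch: the constant names and the sample row are illustrative assumptions and do not appear in the original cells.

```python
# Hypothetical names for the column positions listed above
YEAR, REGION, COUNTRY, BEVERAGE, VALUE = range(5)

# One made-up row in the same five-column layout, used only to show the lookup
row = ['1986', 'Europe', 'Exampleland', 'Wine', '1.5']

country = row[COUNTRY]      # position 2
liters = float(row[VALUE])  # position 4, stored as text in the array
```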

In [1]:
# Import NumPy and read the data set into an ndarray.
# The dtype is set to Unicode strings so that no values are dropped.
# The header row is skipped.
import numpy as np
data = np.genfromtxt('world_alcohol.csv', delimiter=',',dtype='U75', skip_header=1)
data

array([['1986', 'Western Pacific', 'Viet Nam', 'Wine', '0'],
       ['1986', 'Americas', 'Uruguay', 'Other', '0.5'],
       ['1985', 'Africa', "Cte d'Ivoire", 'Wine', '1.62'],
       ...,
       ['1986', 'Europe', 'Switzerland', 'Spirits', '2.54'],
       ['1987', 'Western Pacific', 'Papua New Guinea', 'Other', '0'],
       ['1986', 'Africa', 'Swaziland', 'Other', '5.15']], dtype='<U75')

# Getting general information about the ndarray

In [2]:
# Check the shape of the ndarray
print(data.shape)

(3257, 5)


This shows that the data set has 3257 rows and 5 columns (the header row was skipped on load).

# Things that could be analyzed
1. Did alcohol consumption decrease or increase over time?
2. Which country and region drink the most alcohol?
3. Which type of alcohol is the most popular in the world?

# Defining helper functions for the analysis

In [3]:
# Replace empty display values with '0' so they can be converted to float later
bool_val = data[:, 4] == ''
data[bool_val, 4] = '0'
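
To make the masking step concrete, here is a minimal sketch on a few made-up rows in the same layout. The array `toy` and the names `is_empty`, `n_empty` and `as_float` are illustrative only; counting the empty cells before overwriting them is an extra step that the original cell does not perform.

```python
import numpy as np

# Made-up rows in the same five-column layout as the data set
toy = np.array([
    ['1986', 'Europe', 'A', 'Wine', '1.5'],
    ['1986', 'Europe', 'B', 'Beer', ''],
    ['1987', 'Africa', 'C', 'Other', ''],
], dtype='U75')

is_empty = toy[:, 4] == ''          # boolean mask, one entry per row
n_empty = int(is_empty.sum())       # how many values are about to be filled
toy[is_empty, 4] = '0'              # same kind of assignment as in the cell above
as_float = toy[:, 4].astype(float)  # the conversion now succeeds for every row
```

Keeping the count around helps later when deciding whether a zero means "no drinking" or "no data".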

### Function for making a pivot table

In [4]:
def display_value_pivot_table(column):
    # Collect the unique values in the given column
    unique_val = list(np.unique(data[:, column]))
    # Empty dict to return
    result = {}
    
    for i in unique_val:
        # Select the display values of the rows whose column value matches
        boolean_val = data[:,column] == i
        values = data[boolean_val, 4]
        # Convert the elements to float and add them up
        values = values.astype(float)
        values = values.sum()
        
        result[i] = values
    return result
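
The same grouping can also be done without an explicit loop. The sketch below is an alternative, not the function used in this notebook: it relies on `np.unique(..., return_inverse=True)` and `np.bincount(..., weights=...)`, and the name `pivot_sum`, its parameters and the toy rows are assumptions made for illustration.

```python
import numpy as np

def pivot_sum(rows, column, value_column=4):
    """Sum the value column per unique key of `column` (hypothetical helper)."""
    # `inverse` maps every row to the index of its key in `keys`
    keys, inverse = np.unique(rows[:, column], return_inverse=True)
    # bincount with weights adds up the values that share an index
    totals = np.bincount(inverse.ravel(), weights=rows[:, value_column].astype(float))
    return dict(zip(keys.tolist(), totals.tolist()))

# Made-up rows, only to show the call
toy = np.array([
    ['1986', 'Europe', 'A', 'Wine', '1.5'],
    ['1986', 'Europe', 'B', 'Beer', '2.5'],
    ['1987', 'Africa', 'C', 'Other', '0.5'],
], dtype='U75')

region_totals = pivot_sum(toy, 1)
```

Passing the array as an argument instead of reading the global `data` also makes the helper easier to try on small samples.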

### Function for converting the pivot table values to percentages


In [5]:
def display_value_pivot_ratio(column):
    # Build the pivot table for the column
    pivot_table = display_value_pivot_table(column)
    result = {}
    # Sum the totals of all keys
    values = pivot_table.values()
    value_sum = sum(values)
    # Build a new dict with each total as a percentage of the overall sum
    for key, value in pivot_table.items():
        result[key] = value / value_sum * 100
    return result
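
The percentage step on its own can be sketched on a plain dict. The function name `to_percentages` and the guard against a zero total are assumptions added here; the original function does not include that guard.

```python
def to_percentages(totals):
    """Return each value as a percentage of the sum of all values."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        # Assumption: report 0% everywhere rather than dividing by zero
        return {key: 0.0 for key in totals}
    return {key: value / grand_total * 100 for key, value in totals.items()}

# Made-up totals, only to show the call
shares = to_percentages({'a': 1.0, 'b': 3.0})
```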

    

## Analyzing the amount of alcohol drunk each year

In [6]:
# Using the pivot table function
year_pivot = display_value_pivot_table(0)
year_pivot

{'1984': 804.67,
 '1985': 790.9,
 '1986': 771.01,
 '1987': 767.0,
 '1989': 775.3799999999999}

In [7]:
# Compare each year's total with the previous year's total.
# np.unique returns sorted keys, so the dict iterates in chronological order.
previous_year = None
for key, value in year_pivot.items():
    if previous_year is not None:
        difference = value - previous_val
        print('The difference between year{0} and {1} is {2}'.format(previous_year, key, difference))
    previous_val = value
    previous_year = key


The difference between year1984 and 1985 is -13.769999999999982
The difference between year1985 and 1986 is -19.889999999999986
The difference between year1986 and 1987 is -4.009999999999991
The difference between year1987 and 1989 is 8.379999999999882


### Results 
Worldwide alcohol consumption, measured as the sum of the display values, declined gradually from 1984 to 1987 and then rose slightly in 1989 (the data set has no entry for 1988). More data is needed to find the exact reason for this. If it later turns out that the Americas account for most of the increase, the end of the Cold War in December 1989 might have had an influence.

Another way of reading this data is that some event around 1986 made people drink more, since the year-over-year decline slowed from that point on (from about -19.89 to about -4.01).
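
The year-over-year changes can be recomputed from the yearly totals printed earlier. A minimal sketch, assuming the totals are copied by hand from that output and rounded to two decimals; the names `yearly_totals` and `changes` are illustrative.

```python
# Totals copied from the pivot table output above, rounded to two decimals
yearly_totals = {'1984': 804.67, '1985': 790.9, '1986': 771.01,
                 '1987': 767.0, '1989': 775.38}

years = sorted(yearly_totals)
# Pair each year with the next available one; note that 1987 -> 1989 spans two years
changes = {
    (prev, curr): round(yearly_totals[curr] - yearly_totals[prev], 2)
    for prev, curr in zip(years, years[1:])
}

for (prev, curr), change in changes.items():
    print('{0} -> {1}: {2:+.2f}'.format(prev, curr, change))
```

Rounding the differences avoids floating-point noise such as `-13.769999999999982` in the printed result.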


## Amount and ratio of alcohol drunk by region

In [8]:
# Using the pivot table function to see the numbers for each region
display_value_pivot_table(1)

{'Africa': 735.68,
 'Americas': 902.69,
 'Eastern Mediterranean': 129.45999999999998,
 'Europe': 1737.01,
 'South-East Asia': 46.91,
 'Western Pacific': 357.21}

In [9]:
# Using the ratio function to see the ratio for each region
display_value_pivot_ratio(1)

{'Africa': 18.820351193156235,
 'Americas': 23.092843109164587,
 'Eastern Mediterranean': 3.311878351275019,
 'Europe': 44.436627645204865,
 'South-East Asia': 1.2000634439851008,
 'Western Pacific': 9.138236257214194}

### Results
Europe is the region that drinks the most alcohol, accounting for 44.44% of the world total, which is almost half. The Americas and Africa follow with about 23% and 19%. The remaining regions each account for less than ten percent, and South-East Asia in particular accounts for only about one percent. The low figure for South-East Asia may be connected to the laws and religions of the countries in that region.
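
Each regional total is a sum over a different number of rows, so a region with more countries can come out higher for that reason alone. As a cross-check, the mean per row can be computed next to the sum. This is only a sketch on made-up rows; `pivot_mean` is a hypothetical helper and is not part of the analysis above.

```python
import numpy as np

def pivot_mean(rows, column, value_column=4):
    """Mean of the value column per unique key of `column` (hypothetical helper)."""
    keys, inverse = np.unique(rows[:, column], return_inverse=True)
    inverse = inverse.ravel()
    sums = np.bincount(inverse, weights=rows[:, value_column].astype(float))
    counts = np.bincount(inverse)  # number of rows per key
    return dict(zip(keys.tolist(), (sums / counts).tolist()))

# Made-up rows: Europe has two rows and Africa one, with the same mean
toy = np.array([
    ['1986', 'Europe', 'A', 'Wine', '2.0'],
    ['1986', 'Europe', 'B', 'Beer', '4.0'],
    ['1986', 'Africa', 'C', 'Other', '3.0'],
], dtype='U75')

region_means = pivot_mean(toy, 1)
```

In this toy example the sum for Europe is twice that of Africa, while the means are equal.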

## Ranking the countries by alcohol consumption

The regional totals above make it clear that most alcohol is drunk in Europe and the Americas. The cell below ranks the individual countries.

In [10]:
country_dict = display_value_pivot_table(2)
# Convert the dict to a list of [country, total] pairs and sort it by total, highest first
country_list = [[key, value] for key, value in country_dict.items()]
country_list.sort(key=lambda x: x[1], reverse=True)

# Print the countries with their rank, starting from 1
for rank, (country, total) in enumerate(country_list, start=1):
    print('{0} is the {1} country that drinks the most alcohol with {2} for five years'.format(country, rank, total))

Hungary is the 1 country that drinks the most alcohol with 82.22 for five years
France is the 2 country that drinks the most alcohol with 81.56 for five years
Slovenia is the 3 country that drinks the most alcohol with 80.05 for five years
Luxembourg is the 4 country that drinks the most alcohol with 77.36999999999999 for five years
Portugal is the 5 country that drinks the most alcohol with 75.50999999999999 for five years
Germany is the 6 country that drinks the most alcohol with 74.48 for five years
Spain is the 7 country that drinks the most alcohol with 71.02000000000001 for five years
Switzerland is the 8 country that drinks the most alcohol with 70.64 for five years
Bahamas is the 9 country that drinks the most alcohol with 69.83999999999999 for five years
Austria is the 10 country that drinks the most alcohol with 68.2 for five years
Czech Republic is the 11 country that drinks the most alcohol with 67.75 for five years
Slovakia is the 12 country that drinks the most alcohol wi

### Results
One interesting fact is that the highest-ranked countries are not the ones with huge populations. This is expected, because the display value is already measured per capita, so the ranking compares drinking per person rather than total volume. Most of the highest-ranked countries appear to be in Europe, which supports the earlier regional result.

Another notable point in this data set is that five countries have a total of 0 over the five years. This might be because alcohol is illegal there or because data is difficult to collect in those countries. Empty display values were also replaced with '0' earlier, so a zero can simply mean a missing value.
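
Countries with a zero total can be listed directly from a country-to-total dict such as the one built for the ranking. A minimal sketch on made-up totals; the names and values below are illustrative and are not taken from the data set.

```python
# Made-up totals in the same shape as the country pivot table
country_totals = {'A': 12.5, 'B': 0.0, 'C': 0.0, 'D': 3.1}

# Keep the countries whose summed display value is exactly zero
zero_countries = sorted(name for name, total in country_totals.items() if total == 0)
print(zero_countries)
```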

## Analyzing the most popular alcohol in the world

In [11]:
popularity = display_value_pivot_table(3)
popularity

{'Beer': 1353.83,
 'Other': 415.27,
 'Spirits': 1282.45,
 'Wine': 857.4100000000001}

### Results 
Beer is the most consumed alcohol in the world, closely followed by spirits. It is interesting that the two most consumed types are so different: beer usually has a low alcohol content (around 5% or less), while spirits are much stronger. A large amount of wine is drunk too, but the remaining types, grouped as 'Other', are much less popular around the world.
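
The ordering described above can be made explicit by sorting the totals and expressing them as shares. A minimal sketch, assuming the totals are copied by hand from the output above and rounded to two decimals; `ranking` and `shares` are illustrative names.

```python
# Totals copied from the beverage pivot table output, rounded to two decimals
popularity = {'Beer': 1353.83, 'Other': 415.27, 'Spirits': 1282.45, 'Wine': 857.41}

# Beverage types from most to least consumed
ranking = sorted(popularity, key=popularity.get, reverse=True)

# Share of each type in percent, rounded to one decimal
total = sum(popularity.values())
shares = {name: round(popularity[name] / total * 100, 1) for name in ranking}
print(shares)
```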
    

## Most popular alcohol in each country

In [12]:
# Build the sets of unique countries and unique beverage types
country_set = set(data[:,2])
kinds_set = set(data[:,3])
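
The per-country lookup in the next cell can also be written with plain dicts, which avoids reshaping string arrays. This is an alternative sketch, not the notebook's method: the function name `top_beverage_per_country` and the toy rows are assumptions, empty values are treated as 0 as in the earlier cleaning step, and on a tie the first beverage encountered wins.

```python
from collections import defaultdict

import numpy as np

def top_beverage_per_country(rows):
    """Return {country: (beverage, total)} for the highest total per country."""
    totals = defaultdict(lambda: defaultdict(float))
    for _year, _region, country, beverage, value in rows.tolist():
        # Treat an empty display value as 0
        totals[country][beverage] += float(value) if value else 0.0
    # max() keeps the first beverage encountered when totals are tied
    return {country: max(per_type.items(), key=lambda item: item[1])
            for country, per_type in totals.items()}

# Made-up rows, only to show the call
toy = np.array([
    ['1986', 'Europe', 'A', 'Beer', '1.0'],
    ['1987', 'Europe', 'A', 'Beer', '2.0'],
    ['1986', 'Europe', 'A', 'Wine', '2.5'],
    ['1986', 'Africa', 'B', 'Wine', '1.0'],
    ['1986', 'Africa', 'B', 'Beer', ''],
], dtype='U75')

top = top_beverage_per_country(toy)
```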

In [59]:
list_of_popular_alcohol = []
for country in country_set:
    # Select the rows that belong to this country
    bool_val = data[:, 2] == country
    datas_of_country = data[bool_val, :]
    holder = np.array([])
    max_val = 0
    for kinds in kinds_set:
        # Sum the display values of this beverage type for the country
        bool_val = datas_of_country[:, 3] == kinds
        elements = datas_of_country[bool_val, :]
        display_sum = elements[:,4].astype(float).sum() 
        holder = np.append(holder, [country, kinds, display_sum])
        
        # Track the largest total seen so far
        if display_sum > max_val:
            max_val = display_sum
    # One row per beverage type: [country, type, total]
    holder = holder.reshape(-1, 3)
    # Keep the row(s) whose total equals the maximum (a tie keeps several rows)
    bool_val = holder[:,2].astype(float) == max_val
    element = holder[bool_val, :]
    list_of_popular_alcohol.append(element)
    

list_of_popular_alcohol

[[array(['Iraq', 'Spirits', '2.17'], dtype='<U32')],
 [array(['Austria', 'Beer', '34.8'], dtype='<U32')],
 [array(['Malawi', 'Other', '3.87'], dtype='<U32')],
 [array(['Senegal', 'Wine', '1.22'], dtype='<U32')],
 [array(['Venezuela (Bolivarian Republic of)', 'Beer', '23.18'],
        dtype='<U34')],
 [array(['Sierra Leone', 'Other', '22.03'], dtype='<U32')],
 [array(['Iceland', 'Spirits', '15.87'], dtype='<U32')],
 [array(['Sweden', 'Beer', '15.290000000000001'], dtype='<U32')],
 [array(['Mali', 'Other', '2.76'], dtype='<U32')],
 [array(['Ghana', 'Other', '5.95'], dtype='<U32')],
 [array(['New Zealand', 'Beer', '38.85'], dtype='<U32')],
 [array(['Republic of Korea', 'Other', '33.95'], dtype='<U32')],
 [array(['Algeria', 'Beer', '0.8999999999999999'], dtype='<U32')],
 [array(['Burundi', 'Other', '26.25'], dtype='<U32')],
 [array(['Thailand', 'Spirits', '16.6'], dtype='<U32')],
 [array(['Democratic Republic of the Congo', 'Other', '8.85'], dtype='<U32')],
 [array(["Democratic People's Re