In [1]:
import pandas as pd
import numpy as np

# Conditional Probability Demonstration

Let's start with a concrete example. According to recent OECD statistics, there are 2,652 professional fishers (though the term is not officially recognized, we use it as an alternative to fisherman/fisherwoman/fisherperson) employed in the fishing industry in Finland. The most "recent" statistics are from 2013, but this isn't important for the point being made. The total population of Finland in 2013 was 5,439,000, so fishing employed only about 0.049 percent, or fewer than one in 2000 Finns. Meanwhile, in Norway, the number of fishers was 11,611 out of a total population of 5,080,000, which is about 0.229 percent, or about one in 438 Norwegians. The following table contains the figures for all of the Nordic countries:

In [4]:
# Each Series holds, in order: population, number of fishers, fishers as % of population
denmark = pd.Series([5615000, 1891, 0.034])
finland = pd.Series([5439000, 2652, 0.049])
iceland = pd.Series([324000, 3800, 1.173])
norway = pd.Series([5080000, 11611, 0.229])
sweden = pd.Series([9609000, 1757, 0.018])

In [7]:
fishing_figures = pd.DataFrame({'Denmark': denmark,
                                'Finland': finland,
                                'Iceland': iceland,
                                'Norway': norway,
                                'Sweden': sweden})
fishing_figures

Unnamed: 0,Denmark,Finland,Iceland,Norway,Sweden
0,5615000.0,5439000.0,324000.0,5080000.0,9609000.0
1,1891.0,2652.0,3800.0,11611.0,1757.0
2,0.034,0.049,1.173,0.229,0.018
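As a quick sanity check, the percentage row can be recomputed from the first two rows. The sketch below is a minimal illustration, assuming the figures are retyped into NumPy arrays; the names `fisher_pct` and `one_in_n` are introduced here for illustration only and are not used elsewhere in the notebook.

```python
import numpy as np

# Figures copied from the table above (2013): population and number of fishers.
countries = ['Denmark', 'Finland', 'Iceland', 'Norway', 'Sweden']
populations = np.array([5615000, 5439000, 324000, 5080000, 9609000])
fishers = np.array([1891, 2652, 3800, 11611, 1757])

# Share of each country's population employed in fishing, in percent.
fisher_pct = fishers / populations * 100

# The same ratio expressed as "one in N" inhabitants.
one_in_n = populations / fishers

for name, pct, n in zip(countries, fisher_pct, one_in_n):
    print(f"{name}: {pct:.3f}% (about one in {n:.0f})")
```

Rounded to three decimals, `fisher_pct` should reproduce the last row of the table.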


In [15]:
# Add a column with the row totals across the five countries.
# Note: the total in row 2 (percentages) is a plain sum and has no meaningful interpretation.
nordic = ['Denmark', 'Finland', 'Iceland', 'Norway', 'Sweden']
fishing_figures['TOTAL'] = fishing_figures[nordic].sum(axis=1)

## Figures for All Nordic Countries

In [17]:
fishing_figures

Unnamed: 0,Denmark,Finland,Iceland,Norway,Sweden,TOTAL
0,5615000.0,5439000.0,324000.0,5080000.0,9609000.0,26067000.0
1,1891.0,2652.0,3800.0,11611.0,1757.0,21711.0
2,0.034,0.049,1.173,0.229,0.018,1.503


In [31]:
p_country_list = []

def p_country(dataframe):
    for eachcountry in dataframe.columns[0:5]:
        a = round((dataframe[str(eachcountry)].loc[0] / dataframe['TOTAL'].loc[0]) * 100, 1)
        p_country_list.append(a)

In [33]:
country = pd.Series(['Denmark', 'Finland', 'Iceland', 'Norway', 'Sweden'])
p_country(fishing_figures)  # fill p_country_list with each country's population share (%)
fisher_prob = pd.Series(p_country_list)

In [36]:
country_winner = pd.DataFrame({'Country': country, 'Probability of choosing a winner (%)': fisher_prob})

## Probability of Winner Being from Each Country

Let's imagine that we could run a lottery where we randomly choose any citizen of the Nordic countries. Since Sweden has a much larger population than Iceland, the winner is more likely to be a Swede than an Icelander. The probability of choosing a person from a specific country can be calculated by dividing the population of that country by the total population of the Nordic countries, 26,067,000.

In [37]:
country_winner

Unnamed: 0,Country,Probability of choosing a winner (%)
0,Denmark,21.5
1,Finland,20.9
2,Iceland,1.2
3,Norway,19.5
4,Sweden,36.9


the math👆🏾

`P(country) = population(country) ÷ total_population`
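The formula can also be applied to all countries at once. The following is a minimal sketch, assuming the populations are stored in a pandas Series indexed by country name; `p_country_pct` is a name chosen here for illustration, and this is an alternative to the loop used above, not a replacement for it.

```python
import pandas as pd

# Populations copied from the table of Nordic figures (2013).
populations = pd.Series(
    {'Denmark': 5615000, 'Finland': 5439000, 'Iceland': 324000,
     'Norway': 5080000, 'Sweden': 9609000}
)

# P(country) = population(country) / total population, expressed in percent.
p_country_pct = (populations / populations.sum() * 100).round(1)
print(p_country_pct)
```

Dividing a Series by a scalar applies the division element-wise, so no explicit loop or global list is needed.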

In [58]:
p_fisher_list = []

In [57]:
def p_fisher(dataframe):
    for eachcountry in dataframe.columns[0:5]:
        a = round((dataframe[str(eachcountry)].loc[1] / dataframe['TOTAL'].loc[1]) * 100, 1)
        p_fisher_list.append(a)

In [59]:
p_fisher(fishing_figures)

In [60]:
fish_winner = pd.Series(p_fisher_list)

fish_prob = pd.DataFrame({'Country': country, 'Probability of choosing a winner (%)': fish_winner})

## Probability of Winner Being from Each Country, Given That the Winner Is a Fisher

What if we were told that the winner has been chosen and that he or she turns out to be a fisher? The chances of the winner being from a specific country can now be calculated by dividing the number of fishers in the country by the total number of fishers in all of the above countries, which is 21,711. The probabilities of the winner being a citizen of a certain country, given that the winner makes their living in fishing, look quite different from the probabilities of the winner being a citizen of the same country without specifying whether they are a fisher or not:

In [61]:
fish_prob

Unnamed: 0,Country,Probability of choosing a winner (%)
0,Denmark,8.7
1,Finland,12.2
2,Iceland,17.5
3,Norway,53.5
4,Sweden,8.1
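One way to see why these numbers differ so much from the population shares is Bayes' rule: P(country | fisher) = P(fisher | country) × P(country) ÷ P(fisher). The sketch below is an illustrative check, not part of the original exercise; it assumes the figures are retyped into NumPy arrays and confirms that the Bayes' rule route agrees with simply dividing each country's fishers by the total number of fishers.

```python
import numpy as np

# Figures copied from the table of Nordic figures (2013).
populations = np.array([5615000, 5439000, 324000, 5080000, 9609000])
fishers = np.array([1891, 2652, 3800, 11611, 1757])

# Prior: probability that a randomly chosen Nordic citizen is from each country.
p_country = populations / populations.sum()

# Likelihood: probability of being a fisher within each country.
p_fisher_given_country = fishers / populations

# Law of total probability: overall probability of picking a fisher.
p_fisher = (p_fisher_given_country * p_country).sum()

# Bayes' rule: probability of each country given that the winner is a fisher.
p_country_given_fisher = p_fisher_given_country * p_country / p_fisher

# Direct route used in the table above: share of all Nordic fishers.
direct = fishers / fishers.sum()

print(np.round(p_country_given_fisher * 100, 1))
print(np.allclose(p_country_given_fisher, direct))
```

Norway's share jumps because its P(fisher | country) is far higher than Sweden's or Denmark's, even though its population share is smaller.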


### Q1 
What is the probability that the winner is a fisher given that they are Norwegian? 

Be mindful of the innocent-sounding distinction between the probability of X given Y and the probability of Y given X.

In [70]:
q1 = fishing_figures.Norway[1]/fishing_figures.Norway[0]  * 100

In [72]:
print(f'The probability that the winner is a fisher given that they are Norwegian is {round(q1, 3)}%')

The probability that the winner is a fisher given that they are Norwegian is 0.229%
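Three quantities are easy to confuse here: the probability of being a fisher given Norwegian, the probability of being Norwegian given fisher, and the joint probability of being both. The sketch below computes all three from the figures in the table; the variable names are chosen here for illustration and are not part of the original notebook.

```python
# Figures copied from the table of Nordic figures (2013).
norway_population = 5080000
norway_fishers = 11611
total_population = 26067000
total_fishers = 21711

# P(fisher | Norwegian): restrict attention to Norwegians only.
p_fisher_given_norwegian = norway_fishers / norway_population

# P(Norwegian | fisher): restrict attention to fishers only.
p_norwegian_given_fisher = norway_fishers / total_fishers

# P(fisher and Norwegian): no restriction, out of all Nordic citizens.
p_fisher_and_norwegian = norway_fishers / total_population

print(f"P(fisher | Norwegian)   = {p_fisher_given_norwegian * 100:.3f}%")
print(f"P(Norwegian | fisher)   = {p_norwegian_given_fisher * 100:.1f}%")
print(f"P(fisher and Norwegian) = {p_fisher_and_norwegian * 100:.4f}%")
```

The numerator is the same in all three cases; only the denominator, that is, the group being conditioned on, changes.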


### Q2
Write a program that uses statistics about the population and fishing industry employment to print out conditional probabilities of each nationality given that the winner works in the fishing industry.

The data is given in lists containing the population and the number of fishers in each country.

In [115]:
import numpy as np

countries = ['Denmark', 'Finland', 'Iceland', 'Norway', 'Sweden']
populations = np.array([5615000, 5439000, 324000, 5080000, 9609000])
fishers = np.array([1891, 2652, 3800, 11611, 1757])

total_fishers = sum(fishers)
total_population = sum(populations)


# write your solution here
# P(country | fisher), in percent: each country's share of all Nordic fishers
p_country_given_fisher = (fishers / total_fishers) * 100

In [116]:
for name, probability in zip(countries, p_country_given_fisher):
    print("%s %.2f%%" % (name, probability))

Denmark 8.71%
Finland 12.22%
Iceland 17.50%
Norway 53.48%
Sweden 8.09%
