# Introduction to Probability

The dataset contains information on flags of countries around the world. Each row is a country. Here are the relevant columns: 

- `name` -- name of the country 
- `landmass` -- which continent the country is in (1=N.America, 2=S.America, 3=Europe, 4=Africa, 5=Asia, 6=Oceania) 
- `area` -- country area, in thousands of square kilometers 
- `population` -- rounded to the nearest million 
- `bars` -- number of vertical bars in the flag 
- `stripes` -- number of horizontal stripes in the flag 
- `colors` -- number of different colors in the flag 
- `red, green, blue, gold, white, black, orange` -- 0 if color absent, 1 if color present in the flag

### Exploring the Data

In [1]:
import pandas as pd

In [2]:
flags = pd.read_csv("flags.csv")

In [3]:
flags.head()

Unnamed: 0,name,landmass,zone,area,population,language,religion,bars,stripes,colors,...,saltires,quarters,sunstars,crescent,triangle,icon,animate,text,topleft,botright
0,Afghanistan,5,1,648,16,10,2,0,3,5,...,0,0,1,0,0,1,0,0,black,green
1,Albania,3,1,29,3,6,6,0,0,3,...,0,0,1,0,0,0,1,0,red,red
2,Algeria,4,1,2388,20,8,2,2,0,3,...,0,0,1,1,0,0,0,0,green,white
3,American-Samoa,6,3,0,0,1,1,0,0,5,...,0,0,0,0,1,1,1,0,blue,red
4,Andorra,3,1,0,0,6,0,3,0,3,...,0,0,0,0,0,0,0,0,blue,red


In [7]:
flags.shape

(194, 30)

In [8]:
flags.columns.values

array(['name', 'landmass', 'zone', 'area', 'population', 'language',
       'religion', 'bars', 'stripes', 'colors', 'red', 'green', 'blue',
       'gold', 'white', 'black', 'orange', 'mainhue', 'circles', 'crosses',
       'saltires', 'quarters', 'sunstars', 'crescent', 'triangle', 'icon',
       'animate', 'text', 'topleft', 'botright'], dtype=object)

In [4]:
# The country with the most bars in its flag

flag_most_bars = flags.sort_values("bars", ascending=False)["name"].iloc[0]
flag_most_bars

'St-Vincent'

In [5]:
# The country with the highest population

country_highest_pop = flags.sort_values("population", ascending=False)["name"].iloc[0]
country_highest_pop

'China'

### Calculating Probability 

In [10]:
# The probability of a country having a flag with the color orange in it

num_countries = flags.shape[0]
orange_probability = flags[flags["orange"] == 1].shape[0] / num_countries
orange_probability

0.13402061855670103

In [11]:
# The probability of a country having a flag with more than 1 stripe in it

stripe_probability = flags[flags["stripes"] > 1].shape[0] / num_countries
stripe_probability

0.41237113402061853

### Conjunctive Probability

In [12]:
# The probability that 10 (or 100) fair coin flips in a row all come up heads.
# The flips are independent, so the individual probabilities multiply.

ten_heads = 0.5 ** 10
hundred_heads = 0.5 ** 100
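
The cell above does not display its results. As a minimal sketch, the same values can be checked exactly with the standard-library `fractions` module, and the ten-flip case can be confirmed by enumerating every possible sequence. The helper name `prob_all_heads` is hypothetical, and a fair coin with independent flips is assumed.

```python
from fractions import Fraction
from itertools import product


def prob_all_heads(n_flips, p_heads=Fraction(1, 2)):
    """Probability that n_flips independent flips all land heads."""
    return p_heads ** n_flips


ten_heads_exact = prob_all_heads(10)
hundred_heads_exact = prob_all_heads(100)

# Brute-force check for 10 flips: enumerate all 2**10 equally likely sequences
outcomes = list(product("HT", repeat=10))
all_heads = sum(1 for seq in outcomes if set(seq) == {"H"})

print(ten_heads_exact)            # 1/1024
print(float(ten_heads_exact))     # 0.0009765625
print(all_heads, len(outcomes))   # 1 1024
print(float(hundred_heads_exact))
```

Exactly one of the 1024 sequences is all heads, which matches `0.5 ** 10`. Enumeration is not practical for 100 flips, so only the product rule is used there.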

### Dependent Probabilities

In [13]:
# The probability of picking (without replacement) three countries in a row that all have red in their flags

red_in_flag_count = flags[flags["red"] == 1].shape[0]
one_red = (red_in_flag_count / num_countries) 
two_red = one_red * ((red_in_flag_count - 1) / (num_countries - 1))
three_red = two_red * ((red_in_flag_count - 2) / (num_countries - 2))
three_red

0.4884855242775493
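
The step-by-step product above generalizes to any number of draws. The sketch below is one way to write it, using a hypothetical helper `prob_all_marked` and made-up toy counts rather than the flags data. It also cross-checks the sequential product against the equivalent counting formula, C(marked, draws) / C(total, draws).

```python
from fractions import Fraction
from math import comb


def prob_all_marked(marked, total, draws):
    """Chance that `draws` picks without replacement are all 'marked' items."""
    prob = Fraction(1)
    for i in range(draws):
        # After i successful picks, one fewer marked item and one fewer item overall
        prob *= Fraction(marked - i, total - i)
    return prob


# Hypothetical toy numbers (not taken from the flags dataset):
# 5 of 8 countries have red in their flag, and we pick 3.
sequential = prob_all_marked(5, 8, 3)
combinatorial = Fraction(comb(5, 3), comb(8, 3))

print(sequential)                   # 5/28
print(sequential == combinatorial)  # True
```

Using `Fraction` keeps the result exact, which makes the comparison between the two formulas an equality test rather than a floating-point approximation.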

### Disjunctive Probability

In [18]:
# We have a random number generator that generates numbers from 1 to 18000.
# What is the probability of getting a number evenly divisible by 100, with no remainder?

count = 0
for i in range(1, 18001):  # range() excludes its end point, so stop at 18001 to include 18000
    if (i % 100) == 0:
        count += 1
count / 18000

0.01

In [19]:
# What is the probability of getting a number evenly divisible by 70, with no remainder?

count = 0
for i in range(1, 18001):  # include 18000 itself
    if (i % 70) == 0:
        count += 1
count / 18000

0.014277777777777778
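
The counting loops can also be replaced by integer division, and the two events can then be combined into an "or" with inclusion-exclusion. This is a sketch only: it assumes the range 1 to 18000 includes both endpoints, and the names `count_multiples`, `by_both`, and `by_either` are hypothetical.

```python
from math import gcd

N = 18000  # numbers 1..N, assumed inclusive at both ends


def count_multiples(divisor, upper=N):
    """How many integers in 1..upper are divisible by divisor."""
    return upper // divisor


by_100 = count_multiples(100)
by_70 = count_multiples(70)

# Numbers divisible by both are the multiples of the least common multiple
lcm = 100 * 70 // gcd(100, 70)
by_both = count_multiples(lcm)

# Inclusion-exclusion: subtract the overlap so it is not counted twice
by_either = by_100 + by_70 - by_both
brute_force = sum(1 for i in range(1, N + 1) if i % 100 == 0 or i % 70 == 0)

print(by_100, by_70, by_both)   # 180 257 25
print(by_either, brute_force)   # 412 412
print(by_either / N)
```

The brute-force count is included only as a check on the closed-form arithmetic.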

### Disjunctive Dependent Probabilities

In [24]:
# The probability of a flag having red or orange as a color

red_in_flag_prob = flags[flags["red"] == 1].shape[0] / flags.shape[0]
orange_in_flag_prob = flags[flags["orange"] == 1].shape[0] / flags.shape[0]
red_and_orange_in_flag_prob = flags[(flags["red"] == 1) & (flags["orange"] == 1)].shape[0] / flags.shape[0]

red_or_orange_in_flag_prob = red_in_flag_prob + orange_in_flag_prob - red_and_orange_in_flag_prob
red_or_orange_in_flag_prob

0.8247422680412371
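
The subtraction of the "red and orange" term avoids counting flags with both colors twice. As a small illustration, the sketch below applies the same formula to hypothetical toy data (five made-up countries, not rows from `flags.csv`) and compares it with a direct count of flags that have either color. The helper `prob` is also an assumption made for this example.

```python
from fractions import Fraction

# Hypothetical toy data (not from flags.csv): country -> colors in its flag
toy_flags = {
    "A": {"red", "white"},
    "B": {"orange", "green"},
    "C": {"red", "orange"},
    "D": {"blue"},
    "E": {"red"},
}
n = len(toy_flags)


def prob(predicate):
    """Fraction of toy flags whose color set satisfies the predicate."""
    matches = sum(1 for colors in toy_flags.values() if predicate(colors))
    return Fraction(matches, n)


p_red = prob(lambda c: "red" in c)                      # 3/5
p_orange = prob(lambda c: "orange" in c)                # 2/5
p_both = prob(lambda c: "red" in c and "orange" in c)   # 1/5

p_formula = p_red + p_orange - p_both
p_direct = prob(lambda c: "red" in c or "orange" in c)

print(p_formula, p_direct)  # 4/5 4/5
```

Without the subtraction, country "C" would be counted once for red and once for orange, giving 5/5 instead of 4/5.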

In [25]:
# The probability of a flag having at least one stripe or at least one bar

stripes_in_flag_prob = flags[flags["stripes"] > 0].shape[0] / flags.shape[0]
bars_in_flag_prob = flags[flags["bars"] > 0].shape[0] / flags.shape[0]
stripes_and_bars_in_flag_prob = flags[(flags["stripes"] > 0) & (flags["bars"] > 0)].shape[0] / flags.shape[0]

stripes_or_bars_in_flag_prob = stripes_in_flag_prob + bars_in_flag_prob - stripes_and_bars_in_flag_prob
stripes_or_bars_in_flag_prob

0.5927835051546392