# Setup: Python, Jupyter Notebook

## Python and Jupyter 

Python is a general purpose programming language that allows for both simple and complex data analysis. Python is incredibly versatile, allowing analysts, consultants, engineers, and managers to obtain and analyze information for insightful decision-making.

The Jupyter Notebook is an open-source web application that allows for Python code development. Jupyter further allows for inline plotting and provides useful mechanisms for viewing data that make it an excellent resource for a variety of projects and disciplines.

The following section will outline how to install and begin working with Python and Jupyter.

## Setting up the Python Environment

Instruction guides for Windows and macOS are included below. Follow the one that corresponds with your operating system.

### Windows Install:
Step 1: Open browser and go to https://www.anaconda.com/distribution/

Step 2: Click on "Windows" and then "Download" for Python 3.7 64-bit installer

Step 3: Run the downloaded file found in the downloads section from Step 2

Step 4: Click through the install prompts

Step 5: Go to menu (or Windows Explorer), find the Anaconda3 folder, and double-click to run

### macOS Install:
Step 1: Open browser and go to https://www.anaconda.com/distribution/

Step 2: Click on "macOS" and then "Download" for Python 3.7 64-bit installer

Step 3: Run the downloaded file found in the downloads section from Step 2

Step 4: Click through the install prompts

Step 5: In Finder (or Launchpad), browse to the Anaconda3 folder to find the Jupyter program, and double-click to run

## File Management with Python and Jupyter

It is common practice to have a main folder where all projects will be located (e.g. "jupyter_research"). The following are guidelines you can use for Python projects to help keep your code organized and accessible:

1. Create subfolders for each Jupyter-related project
2. Group related .ipynb files together in the same folder
3. It can be useful to create a "Data" folder within individual project folders if using a large number of related data files

You should now be set up and ready to begin coding in Python!
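As a quick check that the installation worked, a cell like the following can be run in a new notebook. This is a minimal sketch that only uses Python's standard `sys` module; the variable names `version` and `is_python3` are chosen here for illustration, and the exact version printed depends on the installer you downloaded.

```python
import sys

# Report which Python interpreter the notebook is using
version = sys.version_info
print("Python version: {0}.{1}.{2}".format(version.major, version.minor, version.micro))

# Check whether the interpreter is Python 3 (the version installed in the steps above)
is_python3 = version.major >= 3
print("Running Python 3:", is_python3)
```

If the second line does not report `True`, the notebook is probably using a different Python installation than the one set up above.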

# Identifying Expansion Opportunities for Luxury Commercial Airline Flights

## Case Introduction

**Business Context.** You are an employee for GrowthAir, a growing commercial airline company. In the past few years, GrowthAir has expanded luxury flight services to locations across the globe. Following your team's excellent performance in identifying new business opportunities last year, you have been tasked with identifying the top countries to further expand GrowthAir's luxury flight service.

**Business Problem.** Your manager has asked you to answer the following question: <b>"In which countries should GrowthAir expand its luxury flight service?"</b>

**Analytical Context.** The relevant data is a series of success estimates (i.e. probabilities of success) that your internal marketing research teams have come up with. Using your ability to conduct data analysis in Python, you will embark on summarizing the available success estimates to produce a concise recommendation to your boss.

## Fundamentals of Python


Python is an interpreted, high-level, general-purpose programming language that was first released in 1991. Python allows users to easily manipulate data and store values in what are known as <b>objects</b>. Everything in Python is an object and has a <b>type</b>. For example, if a user aims to store the integer 5 in an object named ```my_int```, this can be accomplished by the statement ```my_int = 5```. This statement tells Python to assign the integer value of 5 to the **variable** ```my_int``` (called a variable here because it can change value). ```my_int``` is a Python object, and has type ```int```.

Similar to how Excel distinguishes different data types (such as Text, Number, Currency, Scientific), Python offers a variety of data types. Here are a few common data types:

1. Integers, type ```int```: ```my_int = 1```
2. Floats, type ```float```: ```my_float = 25.5```
3. Strings, type ```str```: ```my_string = 'Hello'```

Here we see that (1) <b>integers</b> and (2) <b>floats</b> store numeric data. The difference between the two is that floats can store values with a decimal part, whereas integers can only store whole numbers. (3) is the <b>string</b> type. Strings are used to store textual data in Python. They are often used to store identifiers such as person names, city names, and more; this case will use strings to store country names.

There are other data types available in Python; however, these are the three fundamental types that you will see across almost every Python program. Always keep in mind that **every** object in Python has a type.
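To see the type of an object for yourself, Python's built-in ```type()``` function can be used. The following is a small sketch using the same example values as the list above; the variable ```total``` is introduced here only for illustration, and the outputs shown in the comments are what Python 3 prints.

```python
# Example values, one per data type
my_int = 1
my_float = 25.5
my_string = 'Hello'

# type() returns the type of any object
print(type(my_int))     # <class 'int'>
print(type(my_float))   # <class 'float'>
print(type(my_string))  # <class 'str'>

# Adding an int and a float produces a float
total = my_int + my_float
print(total)            # 26.5
print(type(total))      # <class 'float'>
```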

Now that we've covered the fundamentals of Python, let's take a look at GrowthAir's proprietary company data on country success estimates.

## Exploring company data on success estimates

Let's take a look at a common data structure used to hold your company's proprietary data on estimates of probability of success for global expansion projects by country.  The ```success_estimates``` variable below is a Python dictionary, which is being assigned certain data using the ```'='``` <b>assignment operator</b>. Each estimate here is a number (float) between 0 and 1, inclusive, which represents the probability that expanding to that country will be successful.

Python's dictionary type stores key-value pairs that allow users to quickly access information for a particular key. By specifying a key, the user can return the value corresponding to the given key. Python's syntax for dictionaries uses curly braces ```{}```:

```python
user_dictionary = {'Key1':Value1, 'Key2':Value2, 'Key3':Value3}
```

The ```success_estimates``` dictionary has keys which are strings and values which are of type <b>list</b>. A <b>list</b> is an incredibly useful data structure in Python that can store any number of Python objects and is denoted by square brackets ```[]```. In ```success_estimates``` below, each list contains floats. Lists are versatile and can be expanded by adding new elements to the end of the list (the right-most side is considered the end of the list). Moreover, list elements (i.e. the objects in the list) can be accessed easily using integer indices. Interestingly, lists can also store other lists (called a list of lists). This makes them a powerful tool for holding complex data sets.
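The list behaviours described above can be sketched in a few lines. The names ```estimates``` and ```nested``` and their values are made up for illustration and are not part of GrowthAir's data.

```python
# Hypothetical list of estimates, for illustration only
estimates = [0.5, 0.7]

# Add a new element to the end (right-most side) of the list
estimates.append(0.9)
print(estimates)      # [0.5, 0.7, 0.9]

# Access an element by its integer index (the first element has index 0)
print(estimates[0])   # 0.5

# A list of lists: each inner list is itself one element of the outer list
nested = [[0.5, 0.7], [0.9]]
print(nested[1])      # [0.9]
print(nested[0][1])   # 0.7
```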

Let's take a look at the ```success_estimates``` data:

In [1]:
# Data on probability of expansion success by country estimates
success_estimates = {'Australia': [0.6, 0.33, 0.11, 0.14],
                     'France': [0.66, 0.78, 0.98, 0.2],
                     'Italy': [0.6],
                     'Brazil': [0.22, 0.22, 0.43],
                     'USA': [0.2, 0.5, 0.3],
                     'England': [0.45],
                     'Canada': [0.25, 0.3],
                     'Argentina': [0.22],
                     'Greece': [0.45, 0.66, 0.75, 0.99, 0.15, 0.66],
                     'Morocco': [0.29], 
                     'Tunisia': [0.68, 0.56],
                     'Egypt': [0.99],
                     'Jamaica': [0.61, 0.65, 0.71],
                     'Switzerland': [0.73, 0.86, 0.84, 0.51, 0.99],
                     'Germany': [0.45, 0.49, 0.36]}

Python easily allows you to print the contents of any variable to the screen using the ```print()``` function:

In [2]:
print(success_estimates)

{'Brazil': [0.22, 0.22, 0.43], 'Canada': [0.25, 0.3], 'Italy': [0.6], 'USA': [0.2, 0.5, 0.3], 'France': [0.66, 0.78, 0.98, 0.2], 'Morocco': [0.29], 'Switzerland': [0.73, 0.86, 0.84, 0.51, 0.99], 'Argentina': [0.22], 'Jamaica': [0.61, 0.65, 0.71], 'Australia': [0.6, 0.33, 0.11, 0.14], 'England': [0.45], 'Tunisia': [0.68, 0.56], 'Egypt': [0.99], 'Germany': [0.45, 0.49, 0.36], 'Greece': [0.45, 0.66, 0.75, 0.99, 0.15, 0.66]}


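Notice that the dictionary elements are printed in a different order from the one in which we originally defined them. The output above was produced with an older version of Python, in which dictionaries were unordered. Since Python 3.7, dictionaries preserve the order in which keys were inserted, but their elements are still accessed by key rather than by position. (This is very different from lists, which are ordered and accessed by position. More on this later.)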

Now intuitively, we would like to recommend that the business put effort into the country with the highest success estimate. But what does this mean when there are multiple success estimates for some countries, and only one for others? We will explore this next.

## Interacting with dictionaries and lists

Taking a careful look at the ```success_estimates``` dictionary, you notice some countries only have one success estimate, while others have many. For example, England has only one estimate contained in its list [0.45], while Jamaica has three estimates contained in its list [0.61, 0.65, 0.71]. Let's zoom in on Jamaica and take a look at some summary statistics of the estimates.

In Python, the dictionary type has built-in methods (functions, which we will discuss later) to access the dictionary keys and values. These methods are called by typing ```.keys()``` or ```.values()``` after the dictionary object. We will convert the results of calling ```.keys()``` and ```.values()``` to lists by using the ```list()``` function.

In [3]:
# Look at the keys...
list(success_estimates.keys())

['Brazil',
 'Canada',
 'Italy',
 'USA',
 'France',
 'Morocco',
 'Switzerland',
 'Argentina',
 'Jamaica',
 'Australia',
 'England',
 'Tunisia',
 'Egypt',
 'Germany',
 'Greece']

In [4]:
# ...and their corresponding values
list(success_estimates.values())

[[0.22, 0.22, 0.43],
 [0.25, 0.3],
 [0.6],
 [0.2, 0.5, 0.3],
 [0.66, 0.78, 0.98, 0.2],
 [0.29],
 [0.73, 0.86, 0.84, 0.51, 0.99],
 [0.22],
 [0.61, 0.65, 0.71],
 [0.6, 0.33, 0.11, 0.14],
 [0.45],
 [0.68, 0.56],
 [0.99],
 [0.45, 0.49, 0.36],
 [0.45, 0.66, 0.75, 0.99, 0.15, 0.66]]

We will make use of the access to keys and values of a dictionary later in the case when comparing across numerous countries' estimates. For now, just remember that you can access a dictionary's full list of keys or values simply by calling built-in methods.

We'd also like to check if a country name is one of the keys in the dictionary. Python allows us to check if a key is in a dictionary through the use of the ```in``` keyword. The statement ```key in dictionary``` will return a <b>boolean type</b> of ```True``` if the key is one of the keys in the dictionary and ```False``` otherwise. Let's take a look at how this works.

In [5]:
print('Checking if Morocco key is present:')
print('Morocco' in success_estimates)

print('Checking if Japan key is present:')
print('Japan' in success_estimates)

Checking if Morocco key is present:
True
Checking if Japan key is present:
False
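One detail worth keeping in mind is that ```in``` applied to a dictionary tests the keys, not the values. The sketch below uses a small made-up dictionary, ```small_estimates```, to illustrate this; the variable names are hypothetical.

```python
# Hypothetical two-country dictionary, for illustration only
small_estimates = {'Italy': [0.6], 'Egypt': [0.99]}

# `in` checks the dictionary's keys
has_italy = 'Italy' in small_estimates
has_japan = 'Japan' in small_estimates
print(has_italy)   # True
print(has_japan)   # False

# `not in` is the opposite check
japan_missing = 'Japan' not in small_estimates
print(japan_missing)   # True

# To search the values instead, use .values()
has_value = [0.6] in small_estimates.values()
print(has_value)   # True
```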


We'd now like to access the value corresponding to a specific key in the ```success_estimates``` dictionary. Simply type the key name in square brackets adjacent to the dictionary name. For example, ```success_estimates['Jamaica']``` will return Jamaica's list of estimates:

In [6]:
success_estimates['Jamaica'] 

[0.61, 0.65, 0.71]

If you would like to store the result in a variable to be used later, use the assignment operator ```'='```:

In [7]:
jamaica_list = success_estimates['Jamaica'] 

You can then view the contents of the list via the ```print()``` function:

In [8]:
print(jamaica_list)

[0.61, 0.65, 0.71]


Here, you'll see that the order of the elements in the Jamaica list is the same as what was originally defined above. This is because lists are ordered objects. In fact, you can access elements of a list by an <i>index</i>. In Python, indices start at 0 (for the first element of a given list) and increment by 1 for each successive element. For example, let's print each element of the Jamaica list:

In [9]:
# Each successive print statement will print on a new line
print(jamaica_list[0]) # prints the first element of the list
print(jamaica_list[1]) # prints the second element of the list
print(jamaica_list[2]) # prints the third element of the list

0.61
0.65
0.71


Python also offers a simple way to determine the length of a list: the ```len()``` function. We expect the length of ```jamaica_list``` to be 3 since it has three elements:

In [10]:
len(jamaica_list) # returns the length of the list

3

### Exercise 1:

Print the length of the success estimate lists for France, Greece, and Morocco.

In [11]:
for key,val in success_estimates.items():
    print('For country ' + key + ' the size is ' + str(len(val)))

For country Brazil the size is 3
For country Canada the size is 2
For country Italy the size is 1
For country USA the size is 3
For country France the size is 4
For country Morocco the size is 1
For country Switzerland the size is 5
For country Argentina the size is 1
For country Jamaica the size is 3
For country Australia the size is 4
For country England the size is 1
For country Tunisia the size is 2
For country Egypt the size is 1
For country Germany the size is 3
For country Greece the size is 6


### Exercise 2:

Which of the following would be useful to store project success estimates if they were available at a regional level instead of a country level?

(a) List

(b) Dictionary

(c) Float

(d) String


Now that we're familiar with using lists and know that list elements are accessed by position while dictionary elements are accessed by key, let's begin to compare success estimates across countries.

## Calculating a country-specific average success estimate

Continuing our analysis of Jamaica, the list contains three numbers, [0.61, 0.65, 0.71]. Recall these numbers are of type ```float``` in Python, which stores numeric decimal values. One logical way to summarize these estimates so that they can be compared across countries is to use the arithmetic average. Let's use basic arithmetic operators to calculate the average success estimate for Jamaica, storing the result in a new variable ```avg_jamaica```:

In [12]:
avg_jamaica = (0.61 + 0.65 + 0.71) / 3
print(avg_jamaica)

0.656666666667


We see the average probability of success estimate for Jamaica is approximately 0.657.  However, we produced this estimate by hand-coding the values. If we were to do this for every country, it would take quite a long time. So we'd like to use a more automated way of producing the average.

To produce an average we can utilize a <b>function</b>. Functions operate on data and variables in Python to perform a desired action. Functions may have both <b>inputs</b> and <b>outputs</b>, just like familiar mathematical operators like addition, subtraction, multiplication, and division (which each have two inputs and one output). While functions in Python may still be for a mathematical purpose, such as squaring an integer, Python allows for more abstract function behaviour, such as printing to the screen. In this case, the ```print()``` function will print its input to the screen.

Let's use Python's built-in functions ```sum()``` (together with ```len()```), ```min()```, and ```max()``` to calculate Jamaica's average success estimate, minimum success estimate, and maximum success estimate, respectively:

In [13]:
country_name = 'Jamaica'
jamaica_list = success_estimates[country_name] # list of the estimates for Jamaica
print(jamaica_list)

[0.61, 0.65, 0.71]


In [14]:
avg_jamaica = sum(jamaica_list) / len(jamaica_list)
min_jamaica = min(jamaica_list)
max_jamaica = max(jamaica_list)
print("Country:",country_name,", Average:",avg_jamaica)
print("Country:",country_name,", Min:",min_jamaica)
print("Country:",country_name,", Max:",max_jamaica)

('Country:', 'Jamaica', ', Average:', 0.6566666666666666)
('Country:', 'Jamaica', ', Min:', 0.61)
('Country:', 'Jamaica', ', Max:', 0.71)


As expected, we get the same average result of approximately 0.657. Note that we could also have rounded the results to two decimal places using the ```round()``` function. This can improve readability.

In [15]:
avg_jamaica = round(sum(jamaica_list) / len(jamaica_list),2)
min_jamaica = round(min(jamaica_list),2)
max_jamaica = round(max(jamaica_list),2)
print("Country:",country_name,", Average:",avg_jamaica)
print("Country:",country_name,", Min:",min_jamaica)
print("Country:",country_name,", Max:",max_jamaica)

('Country:', 'Jamaica', ', Average:', 0.66)
('Country:', 'Jamaica', ', Min:', 0.61)
('Country:', 'Jamaica', ', Max:', 0.71)


Functions in Python are a very powerful tool to increase productivity and perform more complex tasks.
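Beyond the built-in functions, Python also lets you define your own with the ```def``` keyword. The following is a minimal sketch rather than part of the original analysis: the function name ```average``` and the variable names are hypothetical, and the function assumes it is given a non-empty list of numbers.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

# Same values as Jamaica's list of estimates above
jamaica_estimates = [0.61, 0.65, 0.71]
avg = average(jamaica_estimates)
print(round(avg, 2))   # 0.66
```

Once defined, the same function can be reused for any country's list instead of retyping the ```sum(...) / len(...)``` expression each time.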

### Exercise 3:

Write a script to calculate the average success for every country. Output (using ```print()```) each country's average success estimate to the screen. The print statements should output each country on a new line, for example:
```
Country: France , Average: 0.655
Country: Brazil , Average: 0.29
```

In [16]:
for k,v in success_estimates.items():
    prom= sum(v)/len(v)
    print('For country ' + k + ' the average is ' + str(round(prom,2)))

For country Brazil the average is 0.29
For country Canada the average is 0.28
For country Italy the average is 0.6
For country USA the average is 0.33
For country France the average is 0.66
For country Morocco the average is 0.29
For country Switzerland the average is 0.79
For country Argentina the average is 0.22
For country Jamaica the average is 0.66
For country Australia the average is 0.3
For country England the average is 0.45
For country Tunisia the average is 0.62
For country Egypt the average is 0.99
For country Germany the average is 0.43
For country Greece the average is 0.61


## Systematically determine the average success estimate for all of the countries

The end goal of this analysis is a recommendation for where global expansion opportunities should be considered. To reach a conclusion, it'd be ideal to have the average success probability for each country.

To achieve this, we will use a control flow element in Python -  the <b>for loop</b>.  The ```for``` loop allows one to execute the same statements over and over again (i.e. looping). This saves a significant amount of time coding repetitive tasks and aids in code readability.  The general structure of a for loop is:

```python
for iterator_variable in some_sequence:
    statement(s)
```

The for loop iterates over ```some_sequence``` and performs ```statement(s)``` at each iteration. That is, at each iteration the ```iterator_variable``` is updated to the next value in ```some_sequence```. As a concrete example, consider the loop:

```python
for i in [1,2,3,4]:
    print(i*i)
```

Here, the for loop will print to the screen four times; that is it will print ```1``` on the first iteration of the loop, ```4``` on the second iteration, ```9``` on the third, and ```16``` on the fourth. Hence, the for loop statement will iterate over all the elements of the list ```[1,2,3,4]```, and at each iteration it updates the iterator variable ```i``` to the next value in the list ```[1,2,3,4]```.
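A for loop can also carry a value from one iteration to the next. The sketch below, with hypothetical variable names and example values, adds up a list one element at a time and then divides by the number of elements, which is what ```sum()``` and ```len()``` did for us earlier.

```python
# Example values, for illustration only
estimates = [0.2, 0.5, 0.3]

# Start from zero and add each element in turn
running_total = 0
for value in estimates:
    running_total = running_total + value

# Divide the total by the number of elements to get the mean
mean = running_total / len(estimates)
print(round(mean, 2))   # 0.33
```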

Let's use a for loop on our country data by getting a list of all the keys in ```success_estimates```:

In [17]:
# Get all the keys from the success_estimates dictionary
country_name_list = list(success_estimates.keys())
print(country_name_list)

['Brazil', 'Canada', 'Italy', 'USA', 'France', 'Morocco', 'Switzerland', 'Argentina', 'Jamaica', 'Australia', 'England', 'Tunisia', 'Egypt', 'Germany', 'Greece']


Here we loop through all the elements in ```country_name_list```, extract the corresponding value from ```success_estimates``` (which will be of type list), and subsequently take the mean of the list. Detailed printing will guide you through the for loop execution.

In [18]:
# Loop through all countries and calculate their mean success estimate
for i in country_name_list:
    print('--Begin one iteration of loop--')
    print('Element of country_name_list, placeholder i = ' + i)
    print('Access value from dict success_estimates[i]: ', success_estimates[i])
    print('Average of list from success_estimates[i]: ', sum(success_estimates[i]) / len(success_estimates[i]))
    print('--Go to next iteration of loop--')

--Begin one iteration of loop--
Element of country_name_list, placeholder i = Brazil
('Access value from dict success_estimates[i]: ', [0.22, 0.22, 0.43])
('Average of list from success_estimates[i]: ', 0.29)
--Go to next iteration of loop--
--Begin one iteration of loop--
Element of country_name_list, placeholder i = Canada
('Access value from dict success_estimates[i]: ', [0.25, 0.3])
('Average of list from success_estimates[i]: ', 0.275)
--Go to next iteration of loop--
--Begin one iteration of loop--
Element of country_name_list, placeholder i = Italy
('Access value from dict success_estimates[i]: ', [0.6])
('Average of list from success_estimates[i]: ', 0.6)
--Go to next iteration of loop--
--Begin one iteration of loop--
Element of country_name_list, placeholder i = USA
('Access value from dict success_estimates[i]: ', [0.2, 0.5, 0.3])
('Average of list from success_estimates[i]: ', 0.3333333333333333)
--Go to next iteration of loop--
--Begin one iteration of loop--
Element of co

Let's take a closer look at the above ```for``` loop. The ```country_name_list``` has 15 countries which the ```for``` loop is iterating over. The ```for``` loop uses a placeholder variable, denoted ```i``` in this case, to store the element of ```country_name_list``` that each loop iteration corresponds to. Namely, for the first iteration of the ```for``` loop, ```i = 'Brazil'```. For the second iteration, ```i = 'Canada'```. And so on until the loop reaches the final element of ```country_name_list```; after processing that element, the loop exits.

Why is this looping process useful? Well, we've performed the same calculation statements 15 times while only writing the code once! Notice that for each iteration, the corresponding value from ```success_estimates``` is accessed, and the mean of the returned list is calculated. The ```for``` loop process also enhances code readability.
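Dictionaries also provide an ```.items()``` method, which yields each key together with its value, so a loop can receive both at once instead of looking up ```success_estimates[i]``` inside the loop body. The sketch below uses a small made-up dictionary (```sample_estimates```) and hypothetical variable names.

```python
# Hypothetical two-country dictionary, for illustration only
sample_estimates = {'Canada': [0.25, 0.3], 'Tunisia': [0.68, 0.56]}

# Collect each country's mean in a new dictionary
sample_means = {}

# .items() yields (key, value) pairs; here they are unpacked into two loop variables
for country, estimates in sample_estimates.items():
    sample_means[country] = sum(estimates) / len(estimates)
    print(country, round(sample_means[country], 2))
```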

### Exercise 4:

Write a for loop to instead calculate the minimum and maximum of each country's list of success estimates, printing each out consecutively as in the for loop example above.

In [19]:
for k,v in success_estimates.items():
    print('--Begin one iteration of loop--')
    print('For country ' + k + ' the minimum is ' + str(min(v)) + ' and the maximum is ' + str(max(v)))


--Begin one iteration of loop--
For country Brazil the minimum is 0.22 and the maximum is 0.43
--Begin one iteration of loop--
For country Canada the minimum is 0.25 and the maximum is 0.3
--Begin one iteration of loop--
For country Italy the minimum is 0.6 and the maximum is 0.6
--Begin one iteration of loop--
For country USA the minimum is 0.2 and the maximum is 0.5
--Begin one iteration of loop--
For country France the minimum is 0.2 and the maximum is 0.98
--Begin one iteration of loop--
For country Morocco the minimum is 0.29 and the maximum is 0.29
--Begin one iteration of loop--
For country Switzerland the minimum is 0.51 and the maximum is 0.99
--Begin one iteration of loop--
For country Argentina the minimum is 0.22 and the maximum is 0.22
--Begin one iteration of loop--
For country Jamaica the minimum is 0.61 and the maximum is 0.71
--Begin one iteration of loop--
For country Australia the minimum is 0.11 and the maximum is 0.6
--Begin one iteration of loop--
For country England the minimum is 0.45 and the maximum is 0.45
--Begin one iteration of loop

### Exercise 5:

Using the for loop, write code to determine the country with the largest range of success estimates (that is, the largest difference between the smallest and largest estimate for a country).

In [20]:
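range_dic = {}
for k,v in success_estimates.items():
    c = max(v) - min(v)
    range_dic[k] = c
# max() over a dictionary compares its keys, so pass key=range_dic.get to compare by range value
max_country = max(range_dic, key=range_dic.get)
print('The country with the largest range of success estimates is ' + max_country + ' with a value of ' + str(round(range_dic[max_country], 2)))


The country with the largest range of success estimates is Greece with a value of 0.84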
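When searching a dictionary for the entry with the largest value, note that calling ```max()``` directly on a dictionary compares its keys (for strings, alphabetically) rather than its values. The sketch below uses a made-up dictionary, ```ranges```, with hypothetical names to show the difference; the ```key``` argument of ```max()``` tells it what to compare.

```python
# Hypothetical ranges, chosen so that the two approaches disagree
ranges = {'Zeta': 0.1, 'Alpha': 0.8}

# Compares the keys: 'Zeta' comes last alphabetically
largest_key = max(ranges)
print(largest_key)              # Zeta

# Compares the values: 'Alpha' has the largest range
key_with_largest_value = max(ranges, key=ranges.get)
print(key_with_largest_value)   # Alpha
```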


## Using list comprehensions to determine the number of estimates for each country

Moving forward, we are interested in knowing the number of success estimates available for each country. Python offers a concise way to achieve this goal through the use of <b>list comprehensions</b>.

List comprehensions allow one to concisely build a list. Let's take a look at how this works.

In [21]:
key_name_list = [i for i in success_estimates] # loop over each item i in success_estimates and put i in the list
key_name_list

['Brazil',
 'Canada',
 'Italy',
 'USA',
 'France',
 'Morocco',
 'Switzerland',
 'Argentina',
 'Jamaica',
 'Australia',
 'England',
 'Tunisia',
 'Egypt',
 'Germany',
 'Greece']

Here we see that we've looped over each key of the dictionary success_estimates (hence each country), and extracted the country name, all in one line of code. We can also access the values of each key in success_estimates.

In [22]:
value_name_list = [success_estimates[i] for i in success_estimates] # loop over each item i in success_estimates and put success_estimates[i] in the list
value_name_list

[[0.22, 0.22, 0.43],
 [0.25, 0.3],
 [0.6],
 [0.2, 0.5, 0.3],
 [0.66, 0.78, 0.98, 0.2],
 [0.29],
 [0.73, 0.86, 0.84, 0.51, 0.99],
 [0.22],
 [0.61, 0.65, 0.71],
 [0.6, 0.33, 0.11, 0.14],
 [0.45],
 [0.68, 0.56],
 [0.99],
 [0.45, 0.49, 0.36],
 [0.45, 0.66, 0.75, 0.99, 0.15, 0.66]]

In the list comprehension above, each value of ```i``` is a country name, and the corresponding list of estimates is returned when ```success_estimates[i]``` is evaluated. We see the list comprehension is an effective and concise way to write a for loop that creates a list.
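To make the connection with the for loop explicit, the sketch below builds the same list twice: once with a loop that appends to an empty list, and once with a list comprehension. The dictionary ```sample_estimates``` and the other names are made up for illustration.

```python
# Hypothetical two-country dictionary, for illustration only
sample_estimates = {'Canada': [0.25, 0.3], 'Italy': [0.6]}

# for-loop version: start with an empty list and append one item per iteration
max_loop = []
for country in sample_estimates:
    max_loop.append(max(sample_estimates[country]))

# list comprehension version: the same logic in a single expression
max_comp = [max(sample_estimates[country]) for country in sample_estimates]

print(max_loop == max_comp)   # True
```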

We can then use this to quickly determine how many success estimates are available for each country.

In [26]:
# Number of estimates available for each country
[[i,len(success_estimates[i])]for i in success_estimates]

[['Brazil', 3],
 ['Canada', 2],
 ['Italy', 1],
 ['USA', 3],
 ['France', 4],
 ['Morocco', 1],
 ['Switzerland', 5],
 ['Argentina', 1],
 ['Jamaica', 3],
 ['Australia', 4],
 ['England', 1],
 ['Tunisia', 2],
 ['Egypt', 1],
 ['Germany', 3],
 ['Greece', 6]]

### Exercise 6:

Using list comprehensions, write a script to create a <b>list of lists</b> called ```sum_squares_list```, where each element of the list is a two-item list [country name, value]. The value item in the list should be the sum of squares of that country's success estimates. For example, one element of ```sum_squares_list``` should be for Jamaica, where the two-item list is [Jamaica, 1.2987] (since 1.2987 = 0.61^2 + 0.65^2 + 0.71^2).

In [40]:
sum_squares_list = [[i,round(sum([k*k for k in success_estimates[i]]),2)] for i in success_estimates]
sum_squares_list

[['Brazil', 0.28],
 ['Canada', 0.15],
 ['Italy', 0.36],
 ['USA', 0.38],
 ['France', 2.04],
 ['Morocco', 0.08],
 ['Switzerland', 3.22],
 ['Argentina', 0.05],
 ['Jamaica', 1.3],
 ['Australia', 0.5],
 ['England', 0.2],
 ['Tunisia', 0.78],
 ['Egypt', 0.98],
 ['Germany', 0.57],
 ['Greece', 2.64]]

### Exercise 7:

We'd like to determine the spread around the mean success estimate for each country. Using list comprehensions, write a script that subtracts the mean success estimate for a given country from each success estimate for that country. Store the results in a list named ```removed_mean_list```. Round values to two decimal places. Your output should produce the following list of lists:

```
[['Australia', [0.3, 0.03, -0.19, -0.16]],
 ['France', [0.01, 0.12, 0.32, -0.46]],
 ['Italy', [0.0]],
 ['Brazil', [-0.07, -0.07, 0.14]],
 ['USA', [-0.13, 0.17, -0.03]],
 ['England', [0.0]],
 ['Canada', [-0.03, 0.02]],
 ['Argentina', [0.0]],
 ['Greece', [-0.16, 0.05, 0.14, 0.38, -0.46, 0.05]],
 ['Morocco', [0.0]],
 ['Tunisia', [0.06, -0.06]],
 ['Egypt', [0.0]],
 ['Jamaica', [-0.05, -0.01, 0.05]],
 ['Switzerland', [-0.06, 0.07, 0.05, -0.28, 0.2]],
 ['Germany', [0.02, 0.06, -0.07]]]
```

In [61]:
removed_mean_list=[[k,[round(z-(sum(v)/len(v)),2) for z in v]] for k,v in success_estimates.items()]
removed_mean_list



[['Brazil', [-0.07, -0.07, 0.14]],
 ['Canada', [-0.03, 0.02]],
 ['Italy', [0.0]],
 ['USA', [-0.13, 0.17, -0.03]],
 ['France', [0.01, 0.13, 0.32, -0.46]],
 ['Morocco', [0.0]],
 ['Switzerland', [-0.06, 0.07, 0.05, -0.28, 0.2]],
 ['Argentina', [0.0]],
 ['Jamaica', [-0.05, -0.01, 0.05]],
 ['Australia', [0.3, 0.03, -0.19, -0.16]],
 ['England', [0.0]],
 ['Tunisia', [0.06, -0.06]],
 ['Egypt', [0.0]],
 ['Germany', [0.02, 0.06, -0.07]],
 ['Greece', [-0.16, 0.05, 0.14, 0.38, -0.46, 0.05]]]

## Reflecting on the country-specific mean success estimate

Based on the above analysis, we see the mean country success estimates vary widely, from the lowest, Argentina = 0.22, to the highest, Egypt = 0.99. However, notice that both of these means are calculated from a single success estimate. Are we confident in trusting a single estimate as a proxy for the average success estimate?

Given that the global expansion project will utilize valuable company resources, we decide it is best to restrict our analysis to countries that have two or more success estimates. To accomplish this task, we will use a control structure in Python known as the <b>if...elif...else statement</b>. The general structure follows.

```python
if test_expression_1:
    block1_statement(s)
elif test_expression_2:
    block2_statement(s)
else:
    block3_statement(s)
```

Here, ```test_expression_1``` and ```test_expression_2``` must evaluate to ```True``` or ```False```, a Python <b>boolean</b> type. The boolean type is associated with variables that are either ```True``` or ```False```.

If ```test_expression_1``` is True, ```block1_statement(s)``` will execute and the other block statements will not. If ```test_expression_1``` is False yet ```test_expression_2``` is True, then ```block2_statement(s)``` will execute and the others will not. Finally, if ```test_expression_1``` and ```test_expression_2``` are both False, then the else section's ```block3_statement(s)``` will execute. This conditional structure of an if statement allows one to control the flow of Python code.
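As a concrete illustration of this structure, the sketch below labels a count of estimates. The variable names and the labels are hypothetical; only one of the three blocks runs.

```python
# Hypothetical number of estimates available for some country
n_estimates = 3

if n_estimates == 0:
    label = 'no estimates'
elif n_estimates == 1:
    label = 'single estimate'
else:
    label = 'multiple estimates'

print(label)   # multiple estimates
```

Changing ```n_estimates``` to 1 would make the ```elif``` block run instead, and changing it to 0 would make the first block run.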

Let's use this to filter out the countries that only have one success estimate.

## Selecting only multi-observation countries for global expansion potential

We will use the if statement above to remove countries with only one success estimate. For convenience of viewing the result, we will store the mean estimates for each country in a new dictionary ```country_means```.

In [116]:
# Get a list of all the country names
country_name_list = list(success_estimates.keys())

# Create an empty dictionary to hold country mean estimates
country_means = {}

# Loop through all countries and calculate their mean success estimate
for i in country_name_list:
    list_country_estimates = success_estimates[i] # list of estimates for a country
    # if more than one country estimate, then record the mean estimate, otherwise go to next loop iteration
    if len(success_estimates[i]) > 1:
        country_mean_value = sum(list_country_estimates) / len(list_country_estimates)
        country_means[i] =  country_mean_value # insert country mean value into dict using country name as key
    
country_means
# For countries with a single estimate
# else:
#         country_mean_value=success_estimates[i]
#         type(country_mean_value)
#         country_means[i]=country_mean_value

{'Australia': 0.29500000000000004,
 'Brazil': 0.29,
 'Canada': 0.275,
 'France': 0.655,
 'Germany': 0.4333333333333333,
 'Greece': 0.61,
 'Jamaica': 0.6566666666666666,
 'Switzerland': 0.7859999999999999,
 'Tunisia': 0.6200000000000001,
 'USA': 0.3333333333333333}

Let's format our results, modifying the string output to the screen to use 2 decimals when printing the float type. This is accomplished using the string type's ```.format()``` functionality. The ```{0:s}``` and ```{1:.2f}``` in the string indicate to the ```.format()``` method to format the first variable it receives as input as a string and replace the ```{0:s}``` placeholder, and to format the second variable it receives as input as a 2-decimal float and replace the ```{1:.2f}``` placeholder.

With this formatting, the ```country_key``` variable will be displayed as a string in place of ```{0:s}```, while the ```country_means[country_key]``` variable will be displayed as a 2-decimal float in place of ```{1:.2f}```.  This advanced string formatting approach is useful to improve the clarity of the results.

In [118]:
# Nicely format the result for printing to the screen
for country_key in country_means: 
    print("Country: {0:s}, Avg Success Estimate: {1:.2f}".format(country_key, country_means[country_key]))

Country: Brazil, Avg Success Estimate: 0.29
Country: Canada, Avg Success Estimate: 0.28
Country: Australia, Avg Success Estimate: 0.30
Country: USA, Avg Success Estimate: 0.33
Country: Jamaica, Avg Success Estimate: 0.66
Country: Tunisia, Avg Success Estimate: 0.62
Country: France, Avg Success Estimate: 0.66
Country: Germany, Avg Success Estimate: 0.43
Country: Greece, Avg Success Estimate: 0.61
Country: Switzerland, Avg Success Estimate: 0.79


Observing the resulting country means, we notice that the country with the largest mean success estimate is Switzerland at 0.79, while the country with the lowest mean success estimate is Canada at 0.28.
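This observation can also be checked programmatically rather than by eye. The sketch below copies the ```country_means``` values from the output shown earlier and uses the built-in ```max()``` and ```min()``` functions with a ```key``` argument. The names ```best_country``` and ```worst_country``` are introduced here for illustration only.

```python
# Mean estimates copied from the country_means output shown earlier
country_means = {
    'Australia': 0.29500000000000004,
    'Brazil': 0.29,
    'Canada': 0.275,
    'France': 0.655,
    'Germany': 0.4333333333333333,
    'Greece': 0.61,
    'Jamaica': 0.6566666666666666,
    'Switzerland': 0.7859999999999999,
    'Tunisia': 0.6200000000000001,
    'USA': 0.3333333333333333,
}

# key=country_means.get compares countries by their mean value, not by name
best_country = max(country_means, key=country_means.get)
worst_country = min(country_means, key=country_means.get)

print("Highest: {0:s} ({1:.2f})".format(best_country, country_means[best_country]))
print("Lowest: {0:s} ({1:.2f})".format(worst_country, country_means[worst_country]))
```

Iterating over a dictionary yields its keys, so ```max()``` and ```min()``` return country names here.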

### Exercise 8:

After reviewing company policy on statistical procedures, you notice the company requires that every summary statistic (average, minimum, maximum) be based on at least three values. Write a for loop and use the ```if``` statement structure to select and print the average success estimates for the countries satisfying this policy. If a country does not satisfy the policy, print the country name and ``"*Does not meet company policy*"``. Each country should appear on a new line.

In [147]:
country_names = success_estimates.keys()
avr_country = {}
for i in country_names:
    country_success_list = success_estimates[i]
    # Keep only countries with at least three estimates
    if len(country_success_list) > 2:
        prom = sum(country_success_list) / len(country_success_list)
        mini = min(country_success_list)
        maxi = max(country_success_list)
        avr_country[i] = [prom, mini, maxi]
    else:
        print('The country ' + i + ' does not meet company policy')
print('-----------------------------------------------------------------------------')
# Print the mean, minimum, and maximum for each country that meets the policy
for k, v in avr_country.items():
    print('For the country ' + k + ' the statistics are: Mean ' + str(round(v[0], 2)) + ', Min ' + str(v[1]) + ', Max ' + str(v[2]))
    print('----------------------------------------------------------------------------------')

The country Canada does not meet company policy
The country Italy does not meet company policy
The country Morocco does not meet company policy
The country Argentina does not meet company policy
The country England does not meet company policy
The country Tunisia does not meet company policy
The country Egypt does not meet company policy
-----------------------------------------------------------------------------
For the country Brazil the statistics are: Mean 0.29, Min 0.22, Max 0.43
----------------------------------------------------------------------------------
For the country Jamaica the statistics are: Mean 0.66, Min 0.61, Max 0.71
----------------------------------------------------------------------------------
For the country Australia the statistics are: Mean 0.3, Min 0.11, Max 0.6
----------------------------------------------------------------------------------
For the country USA the s

### Exercise 9:

What is another approach to ameliorate the one-sample problem for some countries? Think in terms of the factors that drive confidence in data-driven business decisions.

(a) Group countries into larger regions to ensure each region has at least one estimate

(b) Only remove a country if its estimates are very large or very small compared to other estimates

(c) Use a different summary statistic for the analysis other than the average value

(d) Revisit why some countries only have one estimate and see if more data can be sourced for these countries

## Putting it all together

We've used for loops and control structures to calculate partial summary statistics for each of the countries. Let's put it all together to obtain a recommendation on which country to choose for expanding the luxury flight services.

### Exercise 10:

Write code to print each country name and summary statistics. Each line should show one country and the corresponding summary statistics: Min Estimate (float), Average Estimate (float), Max Estimate (float), Number of Estimates (int), Meets Company Policy of at least 3 estimates (bool). For example, the line for France would appear as:
```
Country: France , Min: 0.2 , Average: 0.655 , Max: 0.98 , NumEst: 4 , MeetsPolicy: True
```
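One possible way to approach this exercise is sketched below. It is not the case's official solution. The dictionary ```example_estimates``` holds invented placeholder values, not the actual case data. The France values were chosen only so that the minimum, average, maximum, and count match the example line above. The names ```MIN_ESTIMATES``` and ```summary_lines``` are also assumptions made for this sketch.

```python
# Invented placeholder data (NOT the case's real estimates)
example_estimates = {
    "France": [0.2, 0.7, 0.74, 0.98],
    "Canada": [0.25, 0.3],
}

# Company policy: at least 3 estimates per summary statistic
MIN_ESTIMATES = 3

summary_lines = []
for country, estimates in example_estimates.items():
    num_est = len(estimates)
    min_est = min(estimates)
    avg_est = sum(estimates) / num_est
    max_est = max(estimates)
    meets_policy = num_est >= MIN_ESTIMATES  # bool

    # Rounding to 2 or 3 decimals is a presentation choice made for this sketch
    line = (
        "Country: {0:s} , Min: {1:.2f} , Average: {2:.3f} , Max: {3:.2f} , "
        "NumEst: {4:d} , MeetsPolicy: {5}"
    ).format(country, min_est, avg_est, max_est, num_est, meets_policy)
    summary_lines.append(line)
    print(line)
```

With the real ```success_estimates``` dictionary, the same loop body should apply by looping over ```success_estimates.items()``` instead.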

Now that we have summary statistics for a variety of country estimates, let's construct our actionable business recommendation.

## Conclusions

From the analysis, Switzerland is the country with the highest estimated chance of success for global expansion, with a mean success estimate of 0.79. The summary statistic for Switzerland was calculated with an adequate number of estimates according to company policy. Thus, it is recommended that management explore opportunities in Switzerland for luxury flight services.

In addition, Jamaica and France should be closely monitored for luxury flight services, as each has a 0.66 average success estimate. If additional resources become available for further luxury service expansion, these countries may be apt choices.

## Takeaways

In this case, we've learned the foundations of Python. By identifying global expansion opportunities for an airline company, we've covered fundamental data types, control structures, and a useful Python workflow for analyzing a given set of data. We've also learned about various summary statistics and how much confidence we can have in drawing conclusions from them.

Building on this knowledge, you can use these Python tools as both a foundation and a framework to build more complex projects and solve critical business problems. Python continues to be an outstanding tool to perform data-driven analysis and deliver key business insights.