Often when we're coding we want to control the flow of our actions. This can be done by setting actions to occur only if a condition or a set of conditions is met. Alternatively, we can also set an action to occur a particular number of times.

There are several ways you can control flow in Python. For conditional statements, the most commonly used approaches are the constructs:

```
# if
if condition is true:
  perform action


# if ... else
if condition is true:
  perform action
else:                # that is, if the condition is false,
  perform alternative action
```

Say, for example, that we want Python to print a message if a variable x has a particular value:

In [15]:
# sample a random number from a Poisson distribution
# with a mean (lambda) of 8
import numpy as np

x = np.random.poisson(lam=8, size=1)

if x >= 10:
  print("x is greater than or equal to 10")

x

x is greater than or equal to 10


array([11])

Let's set a seed so that we generate the same 'pseudo-random' number each time, and then print more information:

In [16]:
import numpy as np

np.random.seed(10)

x = np.random.poisson(lam=8, size=1)

if x >= 10:
    print("x is greater than or equal to 10")
elif x>5:
    print("x is greater than 5")
else:
    print("x is less than or equal to 5")
    
x

x is greater than 5


array([6])

## Tip: pseudo-random numbers
In the above case, the function `np.random.poisson()` generates a random number following a Poisson distribution with a mean (i.e. lambda) of 8. The function `np.random.seed()` fixes the starting state of the random number generator, so the same sequence of 'pseudo-random' numbers is produced on every run. So if we call `np.random.seed(10)`, we see that `x` takes the value 6. You should get the exact same number.
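
To see the effect of the seed directly, here is a minimal sketch that seeds the generator, draws three values, then re-seeds and draws again. The names `first_draw` and `second_draw` are only illustrative.

```python
import numpy as np

# Seed the global generator, draw, then re-seed and draw again.
np.random.seed(10)
first_draw = np.random.poisson(lam=8, size=3)

np.random.seed(10)
second_draw = np.random.poisson(lam=8, size=3)

# Same seed, same sequence of 'pseudo-random' numbers.
print(first_draw)
print(second_draw)
print(np.array_equal(first_draw, second_draw))
```

If you remove the second call to `np.random.seed(10)`, the two draws will usually differ, because the generator simply continues from where it left off.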

**Important:** when Python evaluates the condition of an `if` statement, it is looking for a logical value, i.e., `True` or `False`. This can cause some headaches for beginners. For example:

In [17]:
x = 4 == 3
if x:
  print("4 equals 3")

As we can see, the message was not printed because the variable `x` is `False`:

In [18]:
x = 4 == 3
x

False
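
A related headache shows up with NumPy arrays. The sketch below (variable names are illustrative) shows that a one-element array can be reduced to a single logical value, whereas an array with several elements cannot be used as a condition until you say whether any or all of its elements must satisfy it.

```python
import numpy as np

# A comparison produces a plain Boolean that `if` can use directly.
x = 4 == 3
if x:
    print("4 equals 3")
else:
    print("4 does not equal 3")

# A one-element array can be reduced to a single True/False ...
single = np.array([11])
single_is_large = bool((single >= 10).all())

# ... but a longer array is ambiguous, so NumPy raises a ValueError.
several = np.array([3, 11, 7])
ambiguous = False
try:
    if several >= 10:
        print("never reached")
except ValueError:
    ambiguous = True

# Say explicitly whether any or all elements must satisfy the condition.
any_large = bool((several >= 10).any())
all_large = bool((several >= 10).all())
print(single_is_large, ambiguous, any_large, all_large)
```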

## Challenge 1

Use an `if` statement to print a suitable message reporting whether there are any records from 2002 in the `gapminder` dataset. Now do the same for 2012.

### Solution to Challenge 1

We first obtain a filtered data frame containing only the rows where `gapminder['year']` is equal to `2002`:

In [27]:
import pandas as pd

gapminder = pd.read_csv("https://raw.githubusercontent.com/mydatastory-dev/r_intro_class/master/data/gapminder.csv")
gapminder[gapminder.year == 2002].head()

Unnamed: 0,country,year,pop,continent,lifeExp,gdpPercap
10,Afghanistan,2002,25268405.0,Asia,42.129,726.734055
22,Albania,2002,3508512.0,Europe,75.651,4604.211737
34,Algeria,2002,31287142.0,Africa,70.994,5288.040382
46,Angola,2002,10866106.0,Africa,41.003,2773.287312
58,Argentina,2002,38331121.0,Americas,74.34,8797.640716


Then, we count the number of rows of the data frame `gapminder` that correspond to the year 2002:

In [30]:
rows2002_number = len(gapminder[gapminder.year == 2002])
rows2002_number

142

The presence of any record for the year 2002 is equivalent to the requirement that `rows2002_number` is one or more:

In [31]:
rows2002_number >= 1

True

Putting it all together, we obtain:

In [32]:
if len(gapminder[gapminder.year == 2002]) >= 1:
    print("Record(s) for the year 2002 found.")

Record(s) for the year 2002 found.
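
The challenge also asks about 2012. One way to handle both years, and to report the negative case too, is an `if ... else` wrapped in a small function. This is only a sketch: `report_year` is a hypothetical helper, and `toy_gapminder` is a tiny made-up stand-in that borrows the `country` and `year` column names so the block runs without downloading anything.

```python
import pandas as pd

# Made-up stand-in for the gapminder data frame (same column names).
toy_gapminder = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "year": [2002, 2002, 2007],
})

def report_year(df, year):
    """Return a message saying whether df has any rows for the given year."""
    if (df["year"] == year).any():
        return f"Record(s) for the year {year} found."
    else:
        return f"No records for the year {year} found."

print(report_year(toy_gapminder, 2002))
print(report_year(toy_gapminder, 2012))
```

With the real data you would call `report_year(gapminder, 2012)` in the same way.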


# Repeating operations

If you want to iterate over a set of values, when the order of iteration is important, and perform the same operation on each, a `for` loop will do the job. We saw `for` loops in the shell lessons earlier. This is the most flexible of looping operations, but therefore also the hardest to use correctly. Avoid using `for` loops unless the order of iteration is important: i.e. the calculation at each iteration depends on the results of previous iterations.

The basic structure of a `for` loop is:

```
for (iterator) in (set of values):
  do a thing
```

For example:

In [33]:
for i in range(10):
  print(i)

0
1
2
3
4
5
6
7
8
9


The `range(10)` call generates the integers 0 to 9 on the fly; you can iterate over any other sequence, such as a list or a NumPy array, in the same way.
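
As a small illustration of a loop over a NumPy array where the order of iteration matters, the sketch below computes a running total: each step needs the result of the previous one. The values and variable names are made up for the example.

```python
import numpy as np

values = np.array([3, 1, 4, 1, 5])

# Pre-allocate the results, then fill them in one position at a time.
totals = np.zeros(len(values), dtype=int)
running_total = 0
for position, value in enumerate(values):
    # Each step depends on the total from the previous step.
    running_total += value
    totals[position] = running_total

print(totals.tolist())  # [3, 4, 8, 9, 14]
```

For this particular calculation NumPy already provides `np.cumsum(values)`, which gives the same result without an explicit loop.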

We can use a `for` loop nested within another `for` loop to iterate over two things at once.

In [34]:
for i in range(5):
  for j in ['a', 'b', 'c', 'd', 'e']:
    print(i,j)

0 a
0 b
0 c
0 d
0 e
1 a
1 b
1 c
1 d
1 e
2 a
2 b
2 c
2 d
2 e
3 a
3 b
3 c
3 d
3 e
4 a
4 b
4 c
4 d
4 e


Rather than printing the results, we could write the loop output to a new object.

In [38]:
output_list = []
for i in range(5):
  for j in ['a', 'b', 'c', 'd', 'e']:
    temp_output = (i,j)
    output_list.append(temp_output)
    
output_list

[(0, 'a'),
 (0, 'b'),
 (0, 'c'),
 (0, 'd'),
 (0, 'e'),
 (1, 'a'),
 (1, 'b'),
 (1, 'c'),
 (1, 'd'),
 (1, 'e'),
 (2, 'a'),
 (2, 'b'),
 (2, 'c'),
 (2, 'd'),
 (2, 'e'),
 (3, 'a'),
 (3, 'b'),
 (3, 'c'),
 (3, 'd'),
 (3, 'e'),
 (4, 'a'),
 (4, 'b'),
 (4, 'c'),
 (4, 'd'),
 (4, 'e')]

This approach can be useful, but 'growing your results' (building the result object incrementally) can be computationally inefficient, particularly for NumPy arrays and data frames, so avoid it when you are iterating through a lot of values.

### Tip: don't grow your results

One of the biggest things that trips up novice and experienced Python users alike is building a results object (array, matrix, data frame) as the `for` loop progresses. Appending to a plain Python list is comparatively cheap, but growing a NumPy array or a data frame copies the whole object at every step, so your calculations can very quickly slow to a crawl. It's much better to define an empty results object of the appropriate dimensions beforehand. So if you know the end result will be stored in a matrix, create an empty matrix with 5 rows and 5 columns, then at each iteration store the result in the appropriate location.

A better way is to define your (empty) output object before filling in the values. For this example, it looks more involved, but is still more efficient.

In [44]:
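# pre-allocate a 5 x 5 array of short strings before the loop starts
output_matrix = np.empty((5, 5), dtype='U3')

for i in range(5):
  for j, letter in enumerate(['a', 'b', 'c', 'd', 'e']):
    # store each result at its own (row, column) position
    output_matrix[i, j] = f"{i} {letter}"

output_matrix

array([['0 a', '0 b', '0 c', '0 d', '0 e'],
       ['1 a', '1 b', '1 c', '1 d', '1 e'],
       ['2 a', '2 b', '2 c', '2 d', '2 e'],
       ['3 a', '3 b', '3 c', '3 d', '3 e'],
       ['4 a', '4 b', '4 c', '4 d', '4 e']], dtype='<U3')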

### Tip: While loops

Sometimes you will find yourself needing to repeat an operation until a certain condition is met. You can do this with a `while` loop.

```
while (this condition is true):
  do a thing
```

As an example, here's a while loop that generates random numbers from a normal distribution with a mean of 1 and a standard deviation of 1 (the call `np.random.normal(1)`) until it gets one that is no greater than 0.1.

In [45]:
z = 1
while z > 0.1:
    z = np.random.normal(1)
    print(z)

0.34900912034698395
1.5125677606032588
1.262388824861749
0.4172683521575127
2.3684219313478225
1.053368656156947
2.4905736372489926
1.7563669302802125
1.4696409251121245
0.766037409654811
1.3536579387976073
2.783700941384823
0.6496987861532273
0.010350395858569339


`while` loops will not always be appropriate. You have to be particularly careful that you don't end up in an infinite loop because your condition is never met.
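
One common safeguard against an infinite loop is to add a second stopping condition that caps the number of iterations. The sketch below draws from a uniform distribution between 0 and 1; the limit `max_draws` and the seed are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=10)  # seeded so the run is repeatable
max_draws = 1000                      # arbitrary safety limit

z = 1.0
draws = 0
# Stop when z is small enough OR when the safety limit is reached.
while z > 0.1 and draws < max_draws:
    z = rng.uniform(0, 1)  # uniform draw between 0 and 1
    draws += 1

if z <= 0.1:
    print(f"Stopped after {draws} draws with z = {z:.3f}")
else:
    print(f"Gave up after {max_draws} draws")
```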

## Challenge 2

Compare the objects output_vector and output_vector2. Are they the same? If not, why not? How would you change the last block of code to make output_vector2 the same as output_vector?

# Needs to be looked at: issues with NumPy matrices containing characters

## Challenge 3

Write a script that loops through the `gapminder` data by continent and prints out whether the mean life expectancy is smaller or larger than 50 years.

### Solution to Challenge 3

Can't find FiveYearData.csv
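
One possible approach, sketched here on a small made-up data frame rather than the real file, is to group the rows by continent and test each group's mean inside the loop. Only the column names `continent` and `lifeExp` come from the gapminder data shown earlier; the values, `toy_gapminder`, and `comparison` are invented for illustration.

```python
import pandas as pd

# Made-up stand-in for gapminder; only the column names match the real data.
toy_gapminder = pd.DataFrame({
    "continent": ["Africa", "Africa", "Europe", "Europe", "Asia"],
    "lifeExp": [41.0, 53.0, 75.5, 78.5, 42.0],
})

threshold = 50
comparison = {}  # continent -> "smaller" or "larger"

# groupby hands back one (continent, sub-frame) pair per continent.
for continent, group in toy_gapminder.groupby("continent"):
    mean_life_exp = group["lifeExp"].mean()
    if mean_life_exp < threshold:
        comparison[continent] = "smaller"
    else:
        # Not smaller: a mean of exactly 50 would also land here.
        comparison[continent] = "larger"
    print(f"Mean life expectancy in {continent} is "
          f"{comparison[continent]} than {threshold} years")
```

With the real data, replacing `toy_gapminder` by `gapminder` should be enough, assuming the data frame has been loaded as in Challenge 1.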

# Long & Wide Data Formats

Often, you'll need to convert a dataframe from wide to long format. And sometimes, you may need to go the other way. To illustrate, the file home_ownership.txt contains "homeownership rates, in percentages, by state for the years 1985, 1996, and 2002. These values represent the proportion of homes owned by the occupant to the total number of occupied homes" (Ott & Longnecker, 2016, p. 129). The file used in the code below contains just 11 rows of the complete dataset.

In [46]:
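# wide to long: use DataFrame.melt(); long to wide: use DataFrame.pivot()
# (.T only transposes rows and columns; it does not stack the year columns)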

Data Source: Statistical Methods and Data Analysis (p. 129)  

As you can see, the year columns in the wide format are stacked when converted to the long format.
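
The conversion can be sketched with pandas' `melt` (wide to long) and `pivot` (long to wide). The table below is a made-up stand-in, not the contents of home_ownership.txt: the state names and percentages are invented, and the column names `state`, `year`, and `rate` are assumptions chosen for the example.

```python
import pandas as pd

# Made-up wide table: one row per state, one column per year.
wide_df = pd.DataFrame({
    "state": ["StateA", "StateB"],
    "1985": [70.0, 61.0],
    "1996": [71.0, 63.0],
    "2002": [73.0, 67.0],
})

# Wide -> long: stack the year columns into (year, rate) pairs.
long_df = wide_df.melt(id_vars="state", var_name="year", value_name="rate")

# Long -> wide: spread the years back out into separate columns.
wide_again = long_df.pivot(index="state", columns="year", values="rate").reset_index()
wide_again.columns.name = None  # drop the leftover 'year' label on the columns

print(long_df)
print(wide_again)
```

The long table has one row per state and year (six rows here), and pivoting it recovers the original two-row layout.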

## Challenge 6

Now that you've had a chance to see how to convert a dataframe from wide to long format, let's practice doing this with the data in wide.csv. Because the data in this file is in a wide format, you'll need to write code to convert it to a long format. And then, as a test of your knowledge, convert the file back to wide.

## Challenge 7