# Logic, Control Flow and Filtering
**Boolean logic is the foundation of decision-making in Python programs. Learn about different comparison operators, how to combine them with Boolean operators, and how to use the Boolean outcomes in control structures. You'll also learn to filter data in pandas DataFrames using logic.**

## Compare arrays
Out of the box, you can also use comparison operators with NumPy arrays.

Remember `areas`, the list of area measurements for different rooms in your house from Introduction to Python? This time there are two NumPy arrays: `my_house` and `your_house`. They both contain the areas for the kitchen, living room, bedroom and bathroom in the same order, so you can compare them.

Using comparison operators, generate boolean arrays that answer the following questions:

- Which areas in `my_house` are greater than or equal to `18`?
- You can also compare two NumPy arrays element-wise. Which areas in `my_house` are smaller than the ones in `your_house`?
- Make sure to wrap both commands in a `print()` call so that you can inspect the output!

In [2]:
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than or equal to 18
print(my_house >= 18)

# my_house less than your_house
print(my_house < your_house)

[ True  True False False]
[False  True  True False]
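
A boolean array like the ones printed above can also be used to select or count elements. The following is a minimal sketch using the same `my_house` and `your_house` values; the names `big_rooms` and `n_smaller` are illustrative and not part of the exercise.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Boolean indexing: keep only the areas that are at least 18
big_rooms = my_house[my_house >= 18]
print(big_rooms)

# True counts as 1, so summing a boolean array counts the matches
n_smaller = int(np.sum(my_house < your_house))
print(n_smaller)
```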


## Boolean operators with NumPy
Before, the comparison operators like `<` and `>=` worked with NumPy arrays out of the box. Unfortunately, this is not true for the boolean operators `and`, `or`, and `not`.

To use these operators with NumPy, you will need `np.logical_and()`, `np.logical_or()` and `np.logical_not()`. Here's an example on the `my_house` and `your_house` arrays from before to give you an idea:

```python
np.logical_and(my_house > 13, 
               your_house < 15)
```
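
To see why the NumPy functions are needed, this sketch applies the plain `and` keyword to two boolean arrays. Python asks each array for a single truth value, which is ambiguous for an array with more than one element, so NumPy raises a `ValueError`. The names `error_message` and `result` are illustrative.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

error_message = ""
try:
    # `and` needs one True/False value per operand, not one per element
    result = my_house > 13 and your_house < 15
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# The element-wise version works on the whole arrays
result = np.logical_and(my_house > 13, your_house < 15)
print(result)
```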

- Generate boolean arrays that answer the following questions:
    - Which areas in `my_house` are greater than `18.5` or smaller than `10`?
    - Which areas are smaller than `11` in both `my_house` and `your_house`? Make sure to wrap both commands in a `print()` call, so that you can inspect the output.

In [4]:
# Create arrays
import numpy as np
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))

# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))

[False  True False  True]
[False False False  True]
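
`np.logical_not()` is not needed for this exercise, but it follows the same pattern. As a sketch, the code below inverts the second result and also shows that, on boolean arrays, the `|` operator gives the same element-wise result as `np.logical_or()`. The names `both_small`, `not_both_small` and `either` are illustrative.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

both_small = np.logical_and(my_house < 11, your_house < 11)

# Invert every element: True where at least one of the two areas is 11 or larger
not_both_small = np.logical_not(both_small)
print(not_both_small)

# On boolean arrays, | is the element-wise "or".
# The parentheses are required because | binds tighter than > and <.
either = (my_house > 18.5) | (my_house < 10)
print(either)
```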


## Driving right (1)
Remember that `cars` dataset, containing the cars per 1000 people (`cars_per_cap`) and whether people drive right (`drives_right`) for different countries (`country`)? 

Let's start simple and try to find all observations in `cars` where `drives_right` is `True`.

`drives_right` is a boolean column, so you'll have to extract it as a Series and then use this boolean Series to select observations from `cars`.

- Extract the `drives_right` column as a Pandas Series and store it as `dr`.
- Use `dr`, a boolean Series, to subset the `cars` DataFrame. Store the resulting selection in `sel`.
- Print `sel`, and assert that `drives_right` is `True` for all observations.

In [6]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Extract drives_right column as Series: dr
dr = cars['drives_right']

# Use dr to subset cars: sel
sel = cars[dr]

# Print sel
print(sel)

     cars_per_cap        country  drives_right
US            809  United States          True
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
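
If `cars.csv` is not at hand, the same selection can be reproduced with a DataFrame built inline. The sketch below reconstructs `cars` from the values printed in the outputs of this chapter; building it with `pd.DataFrame()` instead of `pd.read_csv()` is an assumption made so that the example is self-contained.

```python
import pandas as pd

# Values copied from the printed outputs in this chapter
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

# A boolean Series can be passed directly inside the square brackets
dr = cars["drives_right"]
sel = cars[dr]
print(sel)

# Check that every selected observation drives on the right
assert sel["drives_right"].all()
```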


## Driving right (2)
The code in the previous example worked fine, but it created an unnecessary variable, `dr`. You can achieve the same result without this intermediate variable. Put the code that computes `dr` straight into the square brackets that select observations from `cars`.

- Convert the code to a one-liner that calculates the variable `sel` as before.

In [11]:
# Convert code to a one-liner
sel = cars[cars['drives_right']]

# Print sel
print(sel)

     cars_per_cap        country  drives_right
US            809  United States          True
RU            200         Russia          True
MOR            70        Morocco          True
EG             45          Egypt          True
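
The same one-liner pattern works for the opposite question. As a sketch, the `~` operator inverts a boolean Series, which selects the countries that drive on the left. The inline `cars` DataFrame is rebuilt from the printed outputs, and the name `left` is illustrative.

```python
import pandas as pd

# Values copied from the printed outputs in this chapter
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

# ~ flips True and False element-wise before the selection happens
left = cars[~cars["drives_right"]]
print(left)
```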


## Cars per capita (1)
Let's stick to the `cars` data some more. This time you want to find out which countries have a high cars per capita figure. In other words, in which countries do many people have a car, or maybe multiple cars?

Similar to the previous example, you'll want to build up a boolean Series that you can then use to subset the `cars` DataFrame to select certain observations. If you want to do this in a one-liner, that's perfectly fine!

- Select the `cars_per_cap` column from `cars` as a Pandas Series and store it as `cpc`.
- Use `cpc` in combination with a comparison operator and `500`. You want to end up with a boolean Series that's `True` if the corresponding country has a `cars_per_cap` of more than `500` and `False` otherwise. Store this boolean Series as `many_cars`.
- Use `many_cars` to subset `cars`, similar to what you did before. Store the result as `car_maniac`.
- Print out `car_maniac` to see if you got it right.

In [14]:
# Create car_maniac: observations that have a cars_per_cap over 500
cpc = cars['cars_per_cap']
many_cars = cpc > 500
car_maniac = cars[many_cars]

# Print car_maniac
print(car_maniac)

     cars_per_cap        country  drives_right
US            809  United States          True
AUS           731      Australia         False
JAP           588          Japan         False
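
The one-liner mentioned above could look like the sketch below. It also shows `.loc`, which accepts the boolean Series as a row selector together with a column label. The inline `cars` DataFrame is rebuilt from the printed outputs, and `maniac_countries` is an illustrative name.

```python
import pandas as pd

# Values copied from the printed outputs in this chapter
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

# One-liner: the comparison goes straight into the square brackets
car_maniac = cars[cars["cars_per_cap"] > 500]
print(car_maniac)

# .loc filters the rows and picks a single column in one step
maniac_countries = cars.loc[cars["cars_per_cap"] > 500, "country"]
print(maniac_countries)
```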


## Cars per capita (2)
Remember `np.logical_and()`, `np.logical_or()` and `np.logical_not()`, the NumPy variants of the `and`, `or` and `not` operators? You can also use them on Pandas Series to do more advanced filtering operations.

Take this example that selects the observations that have a `cars_per_cap` between 10 and 80. Try out these lines of code step by step to see what's happening.
```python
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 10, cpc < 80)
medium = cars[between]
```

- Use the code sample provided to create a DataFrame `medium` that includes all the observations of `cars` that have a `cars_per_cap` between `100` and `500`.
- Print out `medium`.

In [17]:
# Create medium: observations with cars_per_cap between 100 and 500
cpc = cars['cars_per_cap']
between = np.logical_and(cpc > 100, cpc < 500)
medium = cars[between]

# Print medium
print(medium)

    cars_per_cap country  drives_right
RU           200  Russia          True
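
As an alternative sketch, the same range filter can be written with the `&` operator or with `Series.between()`. Passing `inclusive="neither"` excludes both endpoints, matching `> 100` and `< 500`; that spelling assumes pandas 1.3 or newer. The inline `cars` DataFrame is rebuilt from the printed outputs, and `medium_between` is an illustrative name.

```python
import pandas as pd

# Values copied from the printed outputs in this chapter
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)
cpc = cars["cars_per_cap"]

# & is the element-wise "and"; each comparison needs its own parentheses
medium = cars[(cpc > 100) & (cpc < 500)]
print(medium)

# between() expresses the same range check in a single call
# (inclusive="neither" assumes pandas 1.3 or newer)
medium_between = cars[cpc.between(100, 500, inclusive="neither")]
print(medium_between.equals(medium))
```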


---

# Loops
**There are several techniques you can use to repeatedly execute Python code. `while` loops are like repeated `if` statements, whereas the `for` loop iterates over all kinds of data structures. Learn all about them in this chapter.**

## Loop over list of lists
Remember the `house` variable from the Intro to Python course? Have a look at its definition in the script. It's basically a list of lists, where each sublist contains the name and area of a room in your house.

It's up to you to build a `for` loop from scratch this time!

- Write a `for` loop that goes through each sublist of `house` and prints out `the x is y sqm`, where x is the name of the room and y is the area of the room.

In [18]:
# house list of lists
house = [["hallway", 11.25], 
         ["kitchen", 18.0], 
         ["living room", 20.0], 
         ["bedroom", 10.75], 
         ["bathroom", 9.50]]
         
# Build a for loop from scratch
for item in house:
    print('the ' + item[0] + ' is ' + str(item[1]) + ' sqm')

the hallway is 11.25 sqm
the kitchen is 18.0 sqm
the living room is 20.0 sqm
the bedroom is 10.75 sqm
the bathroom is 9.5 sqm
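
Because every sublist has exactly two elements, the loop can also unpack them into two names directly. The sketch below does this and builds the text with an f-string; the names `room`, `area` and `lines` are illustrative.

```python
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]

lines = []
# Each sublist is unpacked into its name and its area
for room, area in house:
    line = f"the {room} is {area} sqm"
    lines.append(line)
    print(line)
```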


## Loop over dictionary
In Python 3, you need the `items()` method to loop over the key-value pairs of a dictionary:

```python
world = { "afghanistan":30.55, 
          "albania":2.77,
          "algeria":39.21 }

for key, value in world.items() :
    print(key + " -- " + str(value))
```

Remember the `europe` dictionary that contained the names of some European countries as key and their capitals as corresponding value? Go ahead and write a loop to iterate over it!

- Write a `for` loop that goes through each key:value pair of `europe`. On each iteration, `"the capital of x is y"` should be printed out, where x is the key and y is the value of the pair.

In [19]:
# Definition of dictionary
europe = {'spain':'madrid', 'france':'paris', 'germany':'berlin',
          'norway':'oslo', 'italy':'rome', 'poland':'warsaw', 'austria':'vienna' }
          
# Iterate over europe
for key, value in europe.items():
    print('the capital of ' + key + ' is ' + value)

the capital of spain is madrid
the capital of france is paris
the capital of germany is berlin
the capital of norway is oslo
the capital of italy is rome
the capital of poland is warsaw
the capital of austria is vienna
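
To illustrate why `items()` matters, this sketch loops over a shortened copy of `europe` without it. Iterating over a dictionary directly yields only the keys, so unpacking into two names fails. The names `keys_only` and `pairs` are illustrative.

```python
europe = {'spain': 'madrid', 'france': 'paris', 'germany': 'berlin'}

# Looping over the dictionary itself yields only the keys
keys_only = [key for key in europe]
print(keys_only)

try:
    # Each key is a string such as 'spain', which cannot be split into two names
    for key, value in europe:
        print(key, value)
except ValueError as err:
    print("ValueError:", err)

# items() yields (key, value) tuples, which unpack cleanly
pairs = list(europe.items())
print(pairs[0])
```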


## Loop over DataFrame (1)
Iterating over a Pandas DataFrame is typically done with the `iterrows()` method. Used in a `for` loop, every observation is iterated over and on every iteration the row label and actual row contents are available:
```python
for lab, row in brics.iterrows() :
    ...
```
In this and the following exercises you will be working on the `cars` DataFrame. It contains information on the cars per capita and whether people drive right or left for seven countries in the world.

- Write a `for` loop that iterates over the rows of `cars` and on each iteration performs two `print()` calls: one to print out the row label and one to print out all of the row's contents.

In [20]:
# Iterate over rows of cars
for lab, row in cars.iterrows():
    print(lab)
    print(row)

US
cars_per_cap              809
country         United States
drives_right             True
Name: US, dtype: object
AUS
cars_per_cap          731
country         Australia
drives_right        False
Name: AUS, dtype: object
JAP
cars_per_cap      588
country         Japan
drives_right    False
Name: JAP, dtype: object
IN
cars_per_cap       18
country         India
drives_right    False
Name: IN, dtype: object
RU
cars_per_cap       200
country         Russia
drives_right      True
Name: RU, dtype: object
MOR
cars_per_cap         70
country         Morocco
drives_right       True
Name: MOR, dtype: object
EG
cars_per_cap       45
country         Egypt
drives_right     True
Name: EG, dtype: object


## Loop over DataFrame (2)
The row data that's generated by `iterrows()` on every run is a Pandas Series. This format is not very convenient to print out. Luckily, you can easily select variables from the Pandas Series using square brackets:
```python
for lab, row in brics.iterrows() :
    print(row['country'])
```

- Using the iterators `lab` and `row`, adapt the code in the for loop such that the first iteration prints out `"US: 809"`, the second iteration `"AUS: 731"`, and so on.
- The output should be in the form `"country: cars_per_cap"`. Make sure to print out this exact string (with the correct spacing).
    - *You can use `str()` to convert your integer data to a string so that you can print it in conjunction with the country label.*

In [21]:
# Adapt for loop
for lab, row in cars.iterrows():
    print(lab + ': ' + str(row['cars_per_cap']))

US: 809
AUS: 731
JAP: 588
IN: 18
RU: 200
MOR: 70
EG: 45
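
Selecting by column label keeps the loop readable and does not depend on column order. The sketch below does so with an f-string and shows `row.iloc[0]` as the explicit way to select by position; plain integer keys such as `row[0]` on a label-indexed Series trigger a deprecation warning in recent pandas versions. The inline `cars` DataFrame is rebuilt from the printed outputs, and `lines` is an illustrative name.

```python
import pandas as pd

# Values copied from the printed outputs in this chapter
cars = pd.DataFrame(
    {
        "cars_per_cap": [809, 731, 588, 18, 200, 70, 45],
        "country": ["United States", "Australia", "Japan", "India",
                    "Russia", "Morocco", "Egypt"],
        "drives_right": [True, False, False, False, True, True, True],
    },
    index=["US", "AUS", "JAP", "IN", "RU", "MOR", "EG"],
)

lines = []
for lab, row in cars.iterrows():
    by_label = row["cars_per_cap"]   # select by column name
    by_position = row.iloc[0]        # explicit positional selection
    assert by_label == by_position
    lines.append(f"{lab}: {by_label}")

print("\n".join(lines))
```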


## Add column (1)
Inside a `for` loop over `iterrows()`, you can use `.loc` to write a new value for each row. This example adds the length of each country name to the `brics` DataFrame as a new column:
```python
for lab, row in brics.iterrows() :
    brics.loc[lab, "name_length"] = len(row["country"])
```
You can do similar things on the `cars` DataFrame.

- Use a `for` loop to add a new column, named `COUNTRY`, that contains an uppercase version of the country names in the `"country"` column. You can use the string method `upper()` for this.
- To see if your code worked, print out `cars`. Don't indent this code, so that it's not part of the `for` loop.

In [24]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Code for loop that adds COUNTRY column
for lab, row in cars.iterrows():
    cars.loc[lab, 'COUNTRY'] = row['country'].upper()

# Print cars
print(cars)

     cars_per_cap        country  drives_right        COUNTRY
US            809  United States          True  UNITED STATES
AUS           731      Australia         False      AUSTRALIA
JAP           588          Japan         False          JAPAN
IN             18          India         False          INDIA
RU            200         Russia          True         RUSSIA
MOR            70        Morocco          True        MOROCCO
EG             45          Egypt          True          EGYPT


## Add column (2)
Using `iterrows()` to iterate over every observation of a Pandas DataFrame is easy to understand, but not very efficient. On every iteration, you're creating a new Pandas Series.

If you want to add a column to a DataFrame by calling a function on another column, the `iterrows()` method in combination with a `for` loop is not the preferred way to go. Instead, you'll want to use `apply()`.

Compare the `iterrows()` version with the `apply()` version to get the same result in the `brics` DataFrame:
```python
for lab, row in brics.iterrows() :
    brics.loc[lab, "name_length"] = len(row["country"])

brics["name_length"] = brics["country"].apply(len)
```
We can do a similar thing to call the `upper()` method on every name in the `country` column. However, `upper()` is a **method**, so we'll need a slightly different approach:

- Replace the `for` loop with a one-liner that uses `.apply(str.upper)`. The call should give the same result: a column `COUNTRY` should be added to `cars`, containing an uppercase version of the country names.
- As usual, print out `cars` to see the fruits of your hard labor.

In [25]:
# Import cars data
import pandas as pd
cars = pd.read_csv('cars.csv', index_col = 0)

# Use .apply(str.upper)
cars['COUNTRY'] = cars['country'].apply(str.upper)

print(cars)

     cars_per_cap        country  drives_right        COUNTRY
US            809  United States          True  UNITED STATES
AUS           731      Australia         False      AUSTRALIA
JAP           588          Japan         False          JAPAN
IN             18          India         False          INDIA
RU            200         Russia          True         RUSSIA
MOR            70        Morocco          True        MOROCCO
EG             45          Egypt          True          EGYPT
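
The reason `.apply(str.upper)` works is that a method accessed through its class behaves like a plain function that takes the string as its argument. As a sketch, the code below demonstrates this on a shortened `country` Series and compares the result with the vectorized `.str.upper()` accessor, which is another common way to do the same thing. The names `via_apply` and `via_str` are illustrative.

```python
import pandas as pd

country = pd.Series(["United States", "Australia", "Japan"],
                    index=["US", "AUS", "JAP"])

# Called through the class, upper() takes the string as an explicit argument
shouted = str.upper("Japan")   # same as "Japan".upper()
print(shouted)

# apply() calls str.upper once per element
via_apply = country.apply(str.upper)

# The .str accessor applies the string method to the whole Series
via_str = country.str.upper()

print(via_apply.tolist() == via_str.tolist())
```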
