# Control Structures
The last chapter promised a look at loops as a way around a few limitations of the base Python functions. We were never truly limited, of course! Programming in general relies heavily on lines of code that repeat, change values, and follow steps of logic written by the programmer. Think of any repetitive task you perform often, and how much easier it would be if you could automate it. Suppose, for example, that one of your responsibilities is to update your school's ACLED conflict data once every quarter. This means downloading each region's individual CSV file, merging the files, and cleaning the variables you pay close attention to. This is what you will practice in this chapter: creating loops (and functions!) that repeat these steps as many times as there are CSV files to merge, so your job will be that much easier by next quarter.

It has been our personal experience that many Python lessons and tutorials give abstract examples when teaching learners about looping and conditional statements. Hopefully this chapter will keep the lessons and examples grounded in real data with real tasks you might perform as a social science researcher and analyst. However, you will still need to see some abstract examples to start off each section of this chapter. It is important that you can recognize all the components of a loop before focusing on practical applications. 

## Basic Concepts
When writing loops, you will be repeating (or _iterating_) an operation _x_ number of times. The value of that _x_ is _very_ important, and it usually comes from the shape of a data object: its rows or its columns. If we wanted to capitalize all the string characters in a dataframe, we would loop over the dataframe's columns. If we wanted to get True and False values for values greater than 170 in our `height` data, we would loop over the observations. To access these, we are going to use a number of tools that you've come across in this book, like pandas' `.columns` attribute, and `len()` to obtain the number of observations in a vector. You will also combine `len()` with `range()`, which generates the integers from 0 up to (but not including) that length, one for each iteration. And you will most definitely become better acquainted with vector indexes to extract specific values from a vector.

Let's start by thinking about how much data we're dealing with. Create the `height` object once again and see what `len()` and `range()` give you.

In [1]:
# This code cell will be in every one of our chapters in Jupyter Notebook
# The setting below displays the output of every line in a cell, not just the last line
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

In [2]:
height = [177, 174, 170, 183, 168, 
          182, 163, 191, 177, 176, 
          173, 186, 174, 168, 184, 
          170, 170, 192, 181, 173]
height
len(height)
range(20)
range(len(height))

[177,
 174,
 170,
 183,
 168,
 182,
 163,
 191,
 177,
 176,
 173,
 186,
 174,
 168,
 184,
 170,
 170,
 192,
 181,
 173]

20

range(0, 20)

range(0, 20)
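The output `range(0, 20)` can look opaque, because a range object does not display the numbers it will produce. A minimal sketch of how to peek inside one, using a shortened, hypothetical `sample` list (our own name) rather than the full `height` data:

```python
# 'sample' is a hypothetical, shortened stand-in for the height list
sample = [177, 174, 170]

n = len(sample)              # number of observations: 3
positions = range(n)         # a range object, displayed as range(0, 3)

# Wrapping the range in list() shows the integers it will produce
position_list = list(positions)
print(position_list)         # [0, 1, 2]

# The range stops one short of n, which matches zero-based indexing
last_value = sample[position_list[-1]]
print(last_value)            # 170
```

The range ends at `n - 1` on purpose: a list with three values has positions 0, 1, and 2.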

## For Loops
The for-loop repeats itself once for each value you give it: "for this many values". If a vector has twenty values, a for-loop that iterates over this vector does so twenty times, until it reaches the end of the object. This is why `len()` and `range()` work so well together in for-loops. The basic grammar is:

`for i in x: do y()`

After 'for' we place a variable with an arbitrary name (a letter, as in algebra) that stores the value of the current iteration. By convention this is the letter `i`. It does not _need_ to be `i`; it is merely the first letter of the word 'iteration'.
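To see that the name after 'for' really is arbitrary, here is a minimal sketch that runs the same loop twice with different variable names. The `sample` list and the names `total_i` and `total_cm` are hypothetical, chosen only for illustration:

```python
# 'sample' is a hypothetical, shortened stand-in for the height list
sample = [177, 174, 170]

total_i = 0
for i in sample:             # conventional name
    total_i = total_i + i

total_cm = 0
for cm in sample:            # descriptive name, identical behavior
    total_cm = total_cm + cm

print(total_i, total_cm)     # 521 521
```

A descriptive name such as `cm` can make longer loops easier to read, but Python treats both versions the same way.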

<div class="alert alert-block alert-info">Note: as you type the first line "for i in height:" and press the Enter key after the colon, Jupyter Notebook automatically indents the next line. The indentation tells Python that the line belongs to the body of the loop rather than being a separate statement. You could also keep a short loop body on the same line as the colon, without a line break; we just think the indented version looks neater.</div>

The loop body in these examples is just `i` because we want you to see what `i`, the variable after 'for', holds on each iteration. Depending on the Python object after 'in', the value of `i` changes. If you are still confused, refer to the previous code cell, where we displayed the values of `height`, `len(height)`, and `range(len(height))`.

In [3]:
for i in height:
    i # here, 'i' is the value itself, the same value an index lookup into height would return


for i in range(len(height)):
    i # here, 'i' is a position (index) in the range 0 to 19

177

174

170

183

168

182

163

191

177

176

173

186

174

168

184

170

170

192

181

173

0

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

With the second loop, where we iterate over the _range of the length_ of the height vector, we get values from 0 to 19. These are exactly the indexes for the position of each observation in the vector, so we can extract a vector's values in order and perform math operations on each value. Remember that in the last chapter we needed NumPy to convert `height` into inches?

In [4]:
for i in range(len(height)):
    height[i]*0.393701

69.685077

68.503974

66.92917

72.04728300000001

66.141768

71.653582

64.173263

75.19689100000001

69.685077

69.291376

68.110273

73.228386

68.503974

66.141768

72.440984

66.92917

66.92917

75.590592

71.25988100000001

68.110273

This is a step in the right direction, but we didn't really get a list object back, only 20 independent values for height in inches. We can fix that by calling the `append()` method on an empty list!

In [5]:
height_inches=list()
height_inches


for i in range(len(height)):
    height_inches.append(height[i]*0.393701)

height_inches
type(height_inches)

[]

[69.685077,
 68.503974,
 66.92917,
 72.04728300000001,
 66.141768,
 71.653582,
 64.173263,
 75.19689100000001,
 69.685077,
 69.291376,
 68.110273,
 73.228386,
 68.503974,
 66.141768,
 72.440984,
 66.92917,
 66.92917,
 75.590592,
 71.25988100000001,
 68.110273]

list
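The loop above reaches each value through its index. A sketch of an equivalent variation that iterates over the values directly follows; the shortened `sample` list and the constant name `CM_TO_INCHES` are our own hypothetical names, not part of the chapter's code:

```python
CM_TO_INCHES = 0.393701      # same conversion factor used in the chapter
sample = [177, 174, 170]     # hypothetical, shortened stand-in for height

# Version 1: loop over positions, then look each value up by index
by_index = []
for i in range(len(sample)):
    by_index.append(sample[i] * CM_TO_INCHES)

# Version 2: loop over the values themselves
by_value = []
for cm in sample:
    by_value.append(cm * CM_TO_INCHES)

print(by_index == by_value)  # True
```

Looping over positions is most useful when you need the index itself, for example to line up two vectors of the same length.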

## List Comprehension
If we are mainly interested in creating a new vector, however, a __list comprehension__ is the idiomatic choice in Python. It looks and behaves like the for-loop, but it has a more succinct syntax and usually runs somewhat faster than appending inside a loop. The basic form of a list comprehension looks like this:

`[do y for i in x]` 

The list comprehension is wrapped in square brackets, the same brackets you use to create a list. Look at some examples of the most basic list comprehension syntax below. The iteration process is similar to the for-loop, but the output is automatically a list object.

In [6]:
[i for i in height] # 'i' stands for the value in the corresponding index of the vector. It returns the exact same vector.

[i for i in range(len(height))] # 'i' stands for the position of the index, as opposed to the value. It still outputs a list class object

[i/10 for i in height] # all our operations happen to the left of the 'for'

[177,
 174,
 170,
 183,
 168,
 182,
 163,
 191,
 177,
 176,
 173,
 186,
 174,
 168,
 184,
 170,
 170,
 192,
 181,
 173]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

[17.7,
 17.4,
 17.0,
 18.3,
 16.8,
 18.2,
 16.3,
 19.1,
 17.7,
 17.6,
 17.3,
 18.6,
 17.4,
 16.8,
 18.4,
 17.0,
 17.0,
 19.2,
 18.1,
 17.3]

Now all that's left to get the height-in-inches vector again (and more efficiently) is to assign the list comprehension to a new object, with nearly identical syntax to our last for-loop. The difference, as you can see below, is that the assignment sits to the left of the whole comprehension, and the operation $height \times 0.393701$ sits to the left of the 'for' term.

In [7]:
height_inches = [height[i]*0.393701 for i in range(len(height))]

type(height_inches)
height_inches

list

[69.685077,
 68.503974,
 66.92917,
 72.04728300000001,
 66.141768,
 71.653582,
 64.173263,
 75.19689100000001,
 69.685077,
 69.291376,
 68.110273,
 73.228386,
 68.503974,
 66.141768,
 72.440984,
 66.92917,
 66.92917,
 75.590592,
 71.25988100000001,
 68.110273]
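Some of the values above, such as 72.04728300000001, carry a long tail of digits. That is ordinary floating-point behavior, not a mistake in the loop. A minimal sketch of one way to tidy the result, assuming one decimal place is enough (the `sample` list and the name `rounded` are hypothetical):

```python
sample = [177, 174, 170]     # hypothetical, shortened stand-in for height

# Iterate over the values directly and round each result to one decimal
rounded = [round(cm * 0.393701, 1) for cm in sample]
print(rounded)               # [69.7, 68.5, 66.9]
```

Any expression can sit to the left of 'for', including a function call such as `round()`.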

In case you were wondering about the awkward-looking nested expression `range(len(height))`, we used it because the length of `height` is the number of iterations we are interested in, and we don't know if that number might ever change. Who's to say that the height vector won't have thirty observations tomorrow? However, you can certainly give for-loops and list comprehensions an explicit range if you really want to:

In [8]:
[height[i]*0.393701 for i in range(20)]

[69.685077,
 68.503974,
 66.92917,
 72.04728300000001,
 66.141768,
 71.653582,
 64.173263,
 75.19689100000001,
 69.685077,
 69.291376,
 68.110273,
 73.228386,
 68.503974,
 66.141768,
 72.440984,
 66.92917,
 66.92917,
 75.590592,
 71.25988100000001,
 68.110273]
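The explicit range works here only because `height` happens to hold exactly twenty values. A sketch of what can go wrong otherwise, using a hypothetical three-value list; the `try`/`except` wrapper is there only so the example runs to the end instead of stopping at the error:

```python
CM_TO_INCHES = 0.393701
shorter = [177, 174, 170]          # hypothetical list with only 3 values

# range(len(...)) adapts to however many values the list holds
adaptive = [shorter[i] * CM_TO_INCHES for i in range(len(shorter))]
print(len(adaptive))               # 3

# A hard-coded range of 20 asks for positions that do not exist
error_message = None
try:
    fixed = [shorter[i] * CM_TO_INCHES for i in range(20)]
except IndexError as err:
    error_message = str(err)
    print('IndexError:', error_message)
```

The opposite case is quieter and arguably worse: if the list grew to thirty values, `range(20)` would raise no error and would silently skip the last ten.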

## While Loops
When the number of iterations depends on a condition rather than on the length of an object, a while loop lets you keep iterating for as long as a logical true-false condition remains "true". This is called "indefinite iteration", and there is no object after 'for' to step through.


In [9]:
i=0 # manually set a starting value
while i < 10:
    print(i,' is less than 10')
    i=i+1 # and manually increment the value of 'i' since there is no reference object to follow along

0  is less than 10
1  is less than 10
2  is less than 10
3  is less than 10
4  is less than 10
5  is less than 10
6  is less than 10
7  is less than 10
8  is less than 10
9  is less than 10
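The counter example above could just as easily be a for-loop. A while loop becomes more useful when you cannot tell in advance how many iterations you need. A minimal sketch, assuming a made-up `threshold` of 500 and the first five `height` values: keep adding heights until the running total passes the threshold.

```python
sample = [177, 174, 170, 183, 168]   # first five values of the height list
threshold = 500                      # hypothetical cutoff, chosen for illustration

total = 0
count = 0
# Stop when the total passes the threshold or the list runs out of values
while total <= threshold and count < len(sample):
    total = total + sample[count]
    count = count + 1

print(count, 'values were needed to pass', threshold)
print('running total:', total)
```

The second condition, `count < len(sample)`, is a safeguard: without it, a threshold larger than the sum of all values would push `count` past the end of the list.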


## Break and Continue
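`continue` skips the rest of the current iteration and moves on to the next value, while `break` ends the loop altogether. A minimal sketch, assuming a hypothetical list of readings in which `None` marks a missing value and 999 is a made-up end-of-data code:

```python
# Hypothetical data: None marks a missing measurement, 999 an end-of-data code
readings = [177, None, 170, 183, 999, 168]

valid = []
for value in readings:
    if value is None:
        continue          # skip this iteration, move on to the next value
    if value == 999:
        break             # leave the loop entirely
    valid.append(value)

print(valid)              # [177, 170, 183]
```

The final 168 never reaches `valid` because the loop has already stopped at 999.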

## Nested Loops
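A nested loop places one loop inside the body of another, so the inner loop runs in full for every iteration of the outer loop. A minimal sketch, assuming a small hypothetical table stored as a list of rows:

```python
# Hypothetical table stored as a list of rows
table = [[177, 174],
         [170, 183]]

cells = []
for row in range(len(table)):              # outer loop: one pass per row
    for col in range(len(table[row])):     # inner loop: every column in that row
        cells.append((row, col, table[row][col]))

print(cells)
# [(0, 0, 177), (0, 1, 174), (1, 0, 170), (1, 1, 183)]
```

Using a different variable name for each level (`row` and `col` here) keeps the two iteration variables from overwriting each other.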

## Conditional Statements
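Conditional statements let a loop treat values differently depending on a true-false test: `if` checks the first condition, `elif` checks another condition only when the earlier ones failed, and `else` catches everything left over. A minimal sketch with a hypothetical, shortened list of heights:

```python
sample = [177, 170, 163]   # hypothetical, shortened stand-in for height

labels = []
for cm in sample:
    if cm > 170:
        labels.append('above 170')
    elif cm == 170:
        labels.append('exactly 170')
    else:
        labels.append('below 170')

print(labels)              # ['above 170', 'exactly 170', 'below 170']

# A comparison on its own already produces True and False values
above = [cm > 170 for cm in sample]
print(above)               # [True, False, False]
```

The last two lines give the True and False values for heights greater than 170 in a single list comprehension.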

## Applied Example
- Reintroduce data briefly
- Will be using three packages: `os` and `zipfile` from the Python standard library, plus `pandas`
- will use os to see local files and turn these into string values using list comp
- will use zipfile to extract CSV files from ZIP files.
- will use pandas to make our dataframes within a for-loop
- will keep track of progress with print and the iteration variable, as well as appending the number of unique countries in each iteration to a vector to check our final number of countries is correct.


In [55]:
# Import the 'os' package that lets you browse your computer directories.
import os
os.getcwd()

# make a text vector that has the filenames of all the zip files in the data folder. we use list comprehension for this task
path = '../../Data/ACLED/'
zip_files = [i for i in os.listdir(path) if i.endswith('.zip')]
zip_files

'/home/fernando/Documents/UCLA/DataX/Python_for_Social_Science/lessons/control_structures'

['1900-01-01-2022-04-22-Middle_East.csv.zip',
 '1900-01-01-2022-04-22-South_Asia.csv.zip']
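The list comprehension in this cell has one more part than the earlier ones: an `if` clause after the loop that keeps only the values passing a test. A minimal sketch with a hypothetical directory listing in place of `os.listdir(path)`:

```python
# Hypothetical directory listing; the real code would use os.listdir(path)
names = ['Europe.csv', 'Middle_East.csv.zip', 'notes.txt', 'South_Asia.csv.zip']

# Keep a name only when the condition after 'if' is True
zip_only = [name for name in names if name.endswith('.zip')]
csv_only = [name for name in names if name.endswith('.csv')]

print(zip_only)   # ['Middle_East.csv.zip', 'South_Asia.csv.zip']
print(csv_only)   # ['Europe.csv']
```

Because `endswith()` looks only at the end of the string, a name like `Middle_East.csv.zip` counts as a zip file and not as a CSV file.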

Unzip all the files in your list. 

In [56]:
# import the zipfile package to unzip it all in a simple for-loop

from zipfile import ZipFile 

for i in range(len(zip_files)):
    filename=path+zip_files[i]
    with ZipFile(filename, 'r') as f:
        f.extractall(path)
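The same unzip loop can be tried without the ACLED files. The sketch below builds a throwaway folder containing one tiny zip file and then extracts it. The folder, the file name `Example_Region.csv.zip`, and its contents are all hypothetical. It also iterates over the file names directly and uses `os.path.join()` to build each path, a variation on the string concatenation above.

```python
import os
import tempfile
from zipfile import ZipFile

# Build a throwaway folder with one small zip file so the sketch runs anywhere
folder = tempfile.mkdtemp()
zip_path = os.path.join(folder, 'Example_Region.csv.zip')   # hypothetical name
with ZipFile(zip_path, 'w') as f:
    f.writestr('Example_Region.csv', 'country,year\nA,2020\n')

# Loop over the file names themselves; os.path.join adds the separator for us
zip_names = [name for name in os.listdir(folder) if name.endswith('.zip')]
for name in zip_names:
    with ZipFile(os.path.join(folder, name), 'r') as f:
        f.extractall(folder)

extracted = sorted(os.listdir(folder))
print(extracted)   # ['Example_Region.csv', 'Example_Region.csv.zip']
```

`os.path.join()` avoids a common slip with concatenation, where a missing trailing slash in `path` produces a file name that does not exist.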

You are going to make another vector for all the CSV files in the ACLED directory. We can reuse the `path` object and we should also see the newly unzipped 'Middle East' and 'South Asia' CSV files. 

In [57]:
csv_files = [i for i in os.listdir(path) if i.endswith('.csv')]
csv_files

['1900-01-01-2022-04-22-South_Asia.csv',
 '1900-01-01-2022-04-22-Southern_Africa.csv',
 '1900-01-01-2022-04-22-East_Asia.csv',
 '1900-01-01-2022-04-22-Southeast_Asia.csv',
 '1900-01-01-2022-04-22-Middle_Africa.csv',
 '1900-01-01-2022-04-22-Caucasus_and_Central_Asia.csv',
 '1900-01-01-2022-04-22-South_America.csv',
 '1900-01-01-2022-04-22-North_America.csv',
 '1900-01-01-2022-04-22-Northern_Africa.csv',
 '1900-01-01-2022-04-22-Central_America.csv',
 '1900-01-01-2022-04-22-Western_Africa.csv',
 '1900-01-01-2022-04-22-Europe.csv',
 '1900-01-01-2022-04-22-Caribbean.csv',
 '1900-01-01-2022-04-22-Eastern_Africa.csv',
 '1900-01-01-2022-04-22-Middle_East.csv',
 '1900-01-01-2022-04-22-Oceania.csv']

In [51]:
import pandas as pd

# start with an empty dataframe that the loop in the next cell will add to
acled_merge = pd.DataFrame()
type(acled_merge)

pandas.core.frame.DataFrame

In [58]:
# loop over the csv file names: on each iteration 'i' is one file name,
# which we load, add to the merged dataframe, and summarize
unique_countries = list()
for i in csv_files:
    filename=path+i
    temp=pd.read_csv(filename, low_memory=False)
    acled_merge=pd.concat([acled_merge, temp])
    print('For csv file ',i, ' there are ', len(temp['country'].unique()), ' unique countries')
    unique_countries.append(len(temp['country'].unique()))

len(acled_merge['country'].unique())
sum(unique_countries)

For csv file  1900-01-01-2022-04-22-South_Asia.csv  there are  7  unique countries
For csv file  1900-01-01-2022-04-22-Southern_Africa.csv  there are  8  unique countries
For csv file  1900-01-01-2022-04-22-East_Asia.csv  there are  6  unique countries
For csv file  1900-01-01-2022-04-22-Southeast_Asia.csv  there are  11  unique countries
For csv file  1900-01-01-2022-04-22-Middle_Africa.csv  there are  9  unique countries
For csv file  1900-01-01-2022-04-22-Caucasus_and_Central_Asia.csv  there are  9  unique countries
For csv file  1900-01-01-2022-04-22-South_America.csv  there are  14  unique countries
For csv file  1900-01-01-2022-04-22-North_America.csv  there are  5  unique countries
For csv file  1900-01-01-2022-04-22-Northern_Africa.csv  there are  6  unique countries
For csv file  1900-01-01-2022-04-22-Central_America.csv  there are  7  unique countries
For csv file  1900-01-01-2022-04-22-Western_Africa.csv  there are  16  unique countries
For csv file  1900-01-01-2022-04-22-Eu

228

228
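The two totals agree, which suggests that no country is listed under more than one region. A sketch of why the comparison is informative, using two hypothetical in-memory files and the standard `csv` module rather than pandas: when a country appears in two files, the sum of the per-file counts exceeds the number of unique countries overall.

```python
import csv
import io

# Two hypothetical regional files held in memory; country 'B' appears in both
files = {
    'Region_1.csv': 'country,year\nA,2020\nB,2020\nA,2021\n',
    'Region_2.csv': 'country,year\nB,2021\nC,2021\n',
}

per_file_counts = []
all_countries = set()
for name, text in files.items():
    rows = csv.DictReader(io.StringIO(text))
    countries = {row['country'] for row in rows}   # unique countries in this file
    per_file_counts.append(len(countries))
    all_countries = all_countries | countries      # union across all files so far

print(sum(per_file_counts))   # 4
print(len(all_countries))     # 3
```

A mismatch like this one is not necessarily an error, but it is a prompt to check whether a country was coded under two regions.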

In [64]:
acled_merge.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1399627 entries, 0 to 1734
Data columns (total 31 columns):
 #   Column            Non-Null Count    Dtype  
---  ------            --------------    -----  
 0   data_id           1399627 non-null  int64  
 1   iso               1399627 non-null  int64  
 2   event_id_cnty     1399627 non-null  object 
 3   event_id_no_cnty  1399627 non-null  float64
 4   event_date        1399627 non-null  object 
 5   year              1399627 non-null  int64  
 6   time_precision    1399627 non-null  int64  
 7   event_type        1399627 non-null  object 
 8   sub_event_type    1399627 non-null  object 
 9   actor1            1399627 non-null  object 
 10  assoc_actor_1     529562 non-null   object 
 11  inter1            1399627 non-null  int64  
 12  actor2            717933 non-null   object 
 13  assoc_actor_2     179397 non-null   object 
 14  inter2            1399627 non-null  int64  
 15  interaction       1399627 non-null  int64  
 16  region  

## To-do
- Control structures
    - Loops: for and while,
    - list comprehension
    - break and continue
    - nesting.
- Conditional statements:
    - if,
    - else,
    - elif.
- Errors - logs, try-except, debugging.
