# Week 2 - Controlling the flow and pace of your code

#### The following play critical roles:  
1. Indentation - running blocks of code.
2. Time Delays - pacing the speed of our code.
3. For Loops - iterating through data
4. Data iteration with ```List Comprehension```
5. For Loops through multiple but related lists


## 1. Indentation

* Python is unusual in making indentation part of its syntax.
* Indentations signify the start and end of code that belongs together (code blocks).
* Without proper indentation, your code won't do what you expect.
* Not working as expected? Check if you have indented correctly!
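
To make the point about code blocks concrete, here is a minimal sketch. The variable names `ticks_inside` and `ticks_outside` are made up for illustration. The only difference between the two loops is whether the last line is indented.

```python
# Loop 1: the tick line is indented, so it belongs to the while block.
ticks_inside = 0
counter = 0
while counter < 3:
    counter += 1
    ticks_inside += 1   # indented: runs on every pass through the loop

# Loop 2: the tick line is NOT indented, so it sits outside the while block.
ticks_outside = 0
counter = 0
while counter < 3:
    counter += 1
ticks_outside += 1      # not indented: runs once, after the loop has finished

print(ticks_inside)   # 3
print(ticks_outside)  # 1
```

Same statements, different indentation, different result.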

### Basic Flow Example: A Counter

In [10]:
## Using a while loop, build a counter that counts from 1 to 5.
## Print each number in a statement that reads "The count is" followed by the current count.
## Once it reaches 5, it should print "Done counting to 5"

counter = 0
while counter < 5:
    counter += 1
    print(f"The count is {counter}")
    
#     counter = counter + 1

print("Done counting to 5")
    



The count is 1
The count is 2
The count is 3
The count is 4
The count is 5
Done counting to 5


In [11]:
## the order matters

counter = 1
while counter <= 5:
  print(f"The count is {counter}")
  # counter = counter + 1
  counter+=1
print("Done counting to 5")

The count is 1
The count is 2
The count is 3
The count is 4
The count is 5
Done counting to 5


### You just controlled flow using indentation and a while loop.

(<a href="https://docs.google.com/presentation/d/1ZsrzpQHTXK35pd3io8rwQUX10rBHf1T9iYz_SaqbhQo/edit?usp=sharing">slides</a>)

## How fast does our code run?

In [13]:
## import a package that keeps time
import datetime as dt

In [14]:
## loop through with time keeping

counter = 1
while counter <= 5:
    current_time = dt.datetime.now()
    print(f"The count is {counter} at exactly {current_time}")
    counter+=1
print("Done counting to 5")

The count is 1 at exactly 2022-09-12 14:31:15.194668
The count is 2 at exactly 2022-09-12 14:31:15.195084
The count is 3 at exactly 2022-09-12 14:31:15.195163
The count is 4 at exactly 2022-09-12 14:31:15.195231
The count is 5 at exactly 2022-09-12 14:31:15.195296
Done counting to 5


## 2. Time Delays

**Delay timers** are critical when scraping data from websites for several reasons. The **two** most important reasons are:

1. Sometimes your scraper clicks on links and must wait for the content to actually populate on the new page. Your script is likely to run faster than a page can load.


2. You don't want your scraper to be mistaken for a hostile attack on a server. You have to slow down the scrapes.

### Step 1 - Import required libraries

In [15]:
# time is required. we will use its sleep function

import time

#### Let's add a 5-second delay:

In [16]:
## A DELAY
counter = 1
while counter <= 5:
    current_time = dt.datetime.now()
    print(f"The count is {counter} at exactly {current_time}")
    counter+=1
    time.sleep(5)
print("Done counting to 5")


The count is 1 at exactly 2022-09-12 14:36:17.735550
The count is 2 at exactly 2022-09-12 14:36:22.742287
The count is 3 at exactly 2022-09-12 14:36:27.750708
The count is 4 at exactly 2022-09-12 14:36:32.755111
The count is 5 at exactly 2022-09-12 14:36:37.760052
Done counting to 5


### Randomize

Software that tracks traffic to a server might grow suspicious about a hit exactly every n seconds.

Let's **randomize** the time between hits by using ```randint``` from the ```random``` library.


You might sometimes see me use ```randrange``` from the ```random``` library: ```from random import randrange```.

#### What's the difference?

**Difference 1**

```randrange``` is exclusive of the final range value.

```randint``` is inclusive of the final range value.

**Difference 2**

```randrange``` allows you to add a step: ```randrange(start, end, step)```

```randint``` only has start and end: ```randint(start, end)```
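
A quick way to see both differences is to draw many random values and look at which ones ever appear. This is only a sketch. The set names are made up, and because the draws are random, the comments describe what is possible rather than a guaranteed printout.

```python
from random import randint, randrange

# Draw many samples so every possible value has a chance to show up.
randint_values = {randint(1, 3) for _ in range(1000)}
randrange_values = {randrange(1, 3) for _ in range(1000)}
stepped_values = {randrange(1, 10, 3) for _ in range(1000)}

print(sorted(randint_values))    # may include 3: the end value is inclusive
print(sorted(randrange_values))  # never includes 3: the end value is exclusive
print(sorted(stepped_values))    # only ever drawn from 1, 4 and 7 (step of 3)
```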


In [25]:
# import randint from the random library
from random import randint
randint(1,10)

10

In [47]:
## import randrange from the random library
from random import randrange

randrange(1,10,3)

4

In [50]:
# RANDOMIZE OUR WAIT TIME
counter = 1
while counter <= 5:
    current_time = dt.datetime.now()
    snoozer = randint(3, 9)
    print(f"The count is {counter} at exactly {current_time}")
    counter+=1
    print(f"Let's snooze for {snoozer}")
    time.sleep(snoozer)
    
print("Done counting to 5")

The count is 1 at exactly 2022-09-12 14:55:02.826655
Let's snooze for 8
The count is 2 at exactly 2022-09-12 14:55:10.833535
Let's snooze for 5
The count is 3 at exactly 2022-09-12 14:55:15.839454
Let's snooze for 8
The count is 4 at exactly 2022-09-12 14:55:23.847413
Let's snooze for 8
The count is 5 at exactly 2022-09-12 14:55:31.853396
Let's snooze for 4
Done counting to 5
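
In a scraper, the same randomized pause would sit between page visits. The sketch below is one possible shape for that, not a finished scraper. `visit_politely`, `fake_fetch` and the example.com URLs are all hypothetical stand-ins, and it uses a for loop, which section 3 covers. The `sleep` parameter exists only so the example can record the waits instead of actually pausing.

```python
import time
from random import randint

def visit_politely(urls, fetch, min_wait=3, max_wait=9, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing a random number of seconds after each call."""
    results = []
    for url in urls:
        results.append(fetch(url))
        sleep(randint(min_wait, max_wait))  # random pause, inclusive of both ends
    return results

# Stand-in for a real page request so the sketch never touches a website.
def fake_fetch(url):
    return f"content of {url}"

waits = []
pages = visit_politely(
    ["https://example.com/a", "https://example.com/b", "https://example.com/c"],
    fetch=fake_fetch,
    sleep=waits.append,  # record each wait instead of actually sleeping
)
print(pages)
print(waits)  # three values, each between 3 and 9
```

In real use you would leave `sleep` at its default so the script really pauses.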


## 3. ```for loops```... a data journalist's favorite Python expression

We use it to **iterate** over:
* data stored in a list and run some calculation on each value;
* a list of URLs and visit each site to scrape data;
* data stored in dictionary keys and values and return what you are looking for.
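
The cells that follow loop over lists. For the dictionary case in the last bullet, a minimal sketch might look like this. The `story_word_counts` data is made up for illustration.

```python
# Hypothetical data: story slugs mapped to word counts.
story_word_counts = {"budget": 850, "election": 1200, "weather": 300}

long_stories = []
# .items() hands back each key together with its value.
for slug, word_count in story_word_counts.items():
    if word_count >= 800:
        long_stories.append(slug)

print(long_stories)  # ['budget', 'election']
```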

In [60]:
## FOR LOOP through this list
fav_animals = ["cats", "dogs", "birds", "snakes", "horses"]

for beast in fav_animals:
    print(beast.upper())

CATS
DOGS
BIRDS
SNAKES
HORSES


In [61]:
fav_animals

['cats', 'dogs', 'birds', 'snakes', 'horses']

In [67]:
## save a new list with uppercased values

upper_animals = []
for animal in fav_animals:   
#     print(type(animal))
    animal = animal.upper()
    upper_animals.append(animal)
    print(upper_animals)

['CATS']
['CATS', 'DOGS']
['CATS', 'DOGS', 'BIRDS']
['CATS', 'DOGS', 'BIRDS', 'SNAKES']
['CATS', 'DOGS', 'BIRDS', 'SNAKES', 'HORSES']


In [64]:
upper_animals

['CATS', 'DOGS', 'BIRDS', 'SNAKES', 'HORSES']

In [55]:
##call the temporary variable
beast

'horses'

In [56]:
## name dog lucy
dog = "lucy"

In [57]:
## you can target an individual item
dog.upper()

'LUCY'

In [59]:
type(dog)

str

In [58]:
## a list has no .upper() method; a for loop lets you target each item in the list instead
fav_animals.upper()

AttributeError: 'list' object has no attribute 'upper'

In [None]:
## this will break


Let's take **For Loops** for a test drive:

In [None]:
## RUN THIS CELL - Use this list of CEO salaries from 1985 
ceo_salaries_1985 = [150_000, 201_000, 110_000, 75_000, 92_000, 55_000]


In [None]:
## Print each salary in the following format:
## "A CEO earned [some value] in 1985."


In [None]:
## Now update each salary to 2019 dollars.
## Print the following info:
## "A CEO's salary of [1985 salary] in 1985 is worth [updated salary] in 2019 dollars."
## The CPI for 1985 is 107.6
## The 2019 CPI is 255.657
## The formula is: updated_salary = (oldSalary/oldCPI) * currentCPI



In [None]:
## add formatting


In [None]:
## store the updated values



In [None]:
## call CEO salaries
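
If the CPI formula is unclear, here is a small sketch that applies it to a single made-up salary rather than to the exercise list. The function name `adjust_for_inflation` and the 100,000 figure are assumptions for illustration. The CPI values are the ones given in the exercise.

```python
CPI_1985 = 107.6
CPI_2019 = 255.657

def adjust_for_inflation(old_salary, old_cpi=CPI_1985, current_cpi=CPI_2019):
    """Apply updated_salary = (oldSalary / oldCPI) * currentCPI."""
    return (old_salary / old_cpi) * current_cpi

example_salary = 100_000  # hypothetical value, not from the CEO list
updated = adjust_for_inflation(example_salary)

# :, adds thousands separators and .0f rounds to whole dollars
print(f"A salary of ${example_salary:,} in 1985 is worth ${updated:,.0f} in 2019 dollars.")
```

The same function could be called inside a for loop over `ceo_salaries_1985`.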


## 4. List Comprehension

In [69]:
## run this list
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [72]:
## use a FOR LOOP (FL) to create a list that holds the numbers times 10
x10_fl = []

for number in numbers:
#     number = number * 10
#     x10_fl.append(number)
    x10_fl.append(number * 10)
    
x10_fl





[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

In [73]:
## use list comprehension (LC) to create a list that holds the numbers times 10

x10_lc = [number * 10 for number in numbers]
x10_lc

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

## The Zen of Python

In [74]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [85]:
## assign x the value 2
x = 2

In [86]:
## check for equality of 2
x == 2

True

In [87]:
## check for equality of 3
x == 3


False

In [88]:
x != 2

False

## Modulo Operator

The ```%``` is also known as the ```modulo operator``` in Python.

The expression ```10 % 2``` means: if you divide 10 by 2, what is the remainder?


In [89]:
### try it
10 % 2


0

In [90]:
## try 13 divided by 2
13% 2

1

In [91]:
## call our number list again

numbers

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

In [94]:
## use LC to create a list that holds only the even numbers 
even_lc = [number for number in numbers if number % 2 == 0 ]
even_lc

[0, 2, 4, 6, 8, 10]

In [96]:
## use LC to create a list that holds only the odd numbers, each multiplied by 10

odd_lc = [number * 10 for number in numbers if number % 2 !=0]
odd_lc

[10, 30, 50, 70, 90]

In [97]:
# use LC to express as true or false

truthiness = [number % 2 == 0 for number in numbers]
truthiness

[True, False, True, False, True, False, True, False, True, False, True]
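
The examples above put `if` at the end of the comprehension to filter items out. A related pattern, sketched here as an extension rather than something the cells above use, puts an `if ... else` at the front to transform every item without dropping any. The name `labels` is made up.

```python
numbers = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# if at the END filters: only even numbers survive.
evens_only = [number for number in numbers if number % 2 == 0]

# if/else at the FRONT transforms: every number gets a label, none are dropped.
labels = ["even" if number % 2 == 0 else "odd" for number in numbers]

print(evens_only)  # [0, 2, 4, 6, 8, 10]
print(labels[:4])  # ['even', 'odd', 'even', 'odd']
```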

## 5. For Loops through multiple but related lists

When we scrape data from a website, we pull down various data points (income, name, location, id code, etc.) and store each into its own separate list.

A final step is always to group the data points from each observation together. For example:

- "Sandeep Junnarkar"
- "Professor"
- "Male"
- "CUNY"



### Two ways to ```zip()``` these lists together

But first, let's explore the ```zip()``` function:
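
One behavior worth knowing before relying on `zip()` with scraped data: it stops at the shortest list and does not warn you. The sketch below uses a deliberately short, made-up `companies_found` list to show this.

```python
names = ["Irene", "Ursula", "Elon", "Tim"]
companies_found = ["Kraft Foods", "Xerox", "Tesla"]  # hypothetical: the scraper missed one

# zip pairs items by position and stops when the shortest list runs out.
pairs = list(zip(names, companies_found))
print(pairs)
# [('Irene', 'Kraft Foods'), ('Ursula', 'Xerox'), ('Elon', 'Tesla')]

# A simple guard: compare lengths before zipping.
same_length = len(names) == len(companies_found)
print(same_length)  # False, so "Tim" was silently dropped above
```

Checking that your lists are the same length before zipping is a cheap way to catch a scrape that missed a value.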




In [98]:
##  RUN THIS CELL - 
## Here we have a list of CEOs and their relevant data points.
first_names = ["Irene", "Ursula", "Elon", "Tim"]
last_names = ["Rosenfeld", "Burns", "Musk", "Cook"]
titles = ["Chairman and CEO", "Chairman and CEO", "CEO", "CEO"]
companies = ["Kraft Foods", "Xerox", "Tesla", "Apple"]
industries = ["Food and Beverage", "Process and Document Management", "Auto Manufacturing", "Consumer Technology"]

In [104]:
## with zip
## loop through all five lists at once and print each CEO's data points together

for (f_name, l_name, role, co, sector)\
in zip(first_names, last_names, titles, companies, industries):
    print(f"{f_name} {l_name}: {role}, {co}, {sector}")

Irene Rosenfeld: Chairman and CEO, Kraft Foods, Food and Beverage
Ursula Burns: Chairman and CEO, Xerox, Process and Document Management
Elon Musk: CEO, Tesla, Auto Manufacturing
Tim Cook: CEO, Apple, Consumer Technology


### Method 1 – Zip lists into dictionaries

In [109]:
## declare empty list and for loop zip
method_1 = []

## for loop:
for (f_name, l_name, role, co, sector)\
in zip(first_names, last_names, titles, companies, industries):
    method_1.append({
        "first_name": f_name,
        "last_name": l_name,
        "title": role,
        "company": co,
        "Industry": sector
    })



In [110]:
f_name

'Tim'

In [106]:
## call the method_1 list
method_1

[{'first_name': 'Irene',
  'last_name': 'Rosenfeld',
  'title': 'Chairman and CEO',
  'company': 'Kraft Foods',
  'Industry': 'Food and Beverage'},
 {'first_name': 'Ursula',
  'last_name': 'Burns',
  'title': 'Chairman and CEO',
  'company': 'Xerox',
  'Industry': 'Process and Document Management'},
 {'first_name': 'Elon',
  'last_name': 'Musk',
  'title': 'CEO',
  'company': 'Tesla',
  'Industry': 'Auto Manufacturing'},
 {'first_name': 'Tim',
  'last_name': 'Cook',
  'title': 'CEO',
  'company': 'Apple',
  'Industry': 'Consumer Technology'}]

## List of Dictionaries to Dataframes

Recall that a list of dictionaries is like the rows and columns of a CSV: each dictionary is a row, and each key is a column name.
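
To see that mapping without pandas, here is a minimal sketch using the standard library's `csv.DictWriter`. The two-row `rows` list is a trimmed-down stand-in for `method_1`, and the output is written to an in-memory buffer rather than to a file.

```python
import csv
import io

# Trimmed-down stand-in for method_1: each dictionary is one row.
rows = [
    {"first_name": "Irene", "company": "Kraft Foods"},
    {"first_name": "Tim", "company": "Apple"},
]

buffer = io.StringIO()  # in-memory "file" so nothing is written to disk
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()    # dictionary keys become the column names
writer.writerows(rows)  # each dictionary becomes one line of values

csv_lines = buffer.getvalue().splitlines()
print(csv_lines)
# ['first_name,company', 'Irene,Kraft Foods', 'Tim,Apple']
```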

In [107]:
## import pandas

import pandas as pd

In [108]:
## Turn list into a dataframe

df = pd.DataFrame(method_1)
df

| | first_name | last_name | title | company | Industry |
|---|---|---|---|---|---|
| 0 | Irene | Rosenfeld | Chairman and CEO | Kraft Foods | Food and Beverage |
| 1 | Ursula | Burns | Chairman and CEO | Xerox | Process and Document Management |
| 2 | Elon | Musk | CEO | Tesla | Auto Manufacturing |
| 3 | Tim | Cook | CEO | Apple | Consumer Technology |


In [111]:
## export as csv

df.to_csv("method_1.csv", encoding = "UTF-8", index = False)

### One more datatype: ```tuple```:

- Create a ```tuple``` by using parentheses.
- It's just like a list but cannot be changed once it has been created.
- You can call items in a ```tuple``` using indexing and slicing, just as with a list.
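
A short sketch of those last two points, using the same `grades` values as the cells below. The names `first_two` and `message` are made up for illustration.

```python
grades = (95, 25, 30)

# Slicing works the same way as on a list and returns a new tuple.
first_two = grades[:2]
print(first_two)  # (95, 25)

# Changing an item in place is not allowed.
try:
    grades[0] = 100
except TypeError as error:
    message = str(error)
    print(message)  # 'tuple' object does not support item assignment

print(grades)  # (95, 25, 30), unchanged
```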

In [112]:
## create a tuple

grades = (95, 25, 30)

In [113]:
## call the tuple
grades

(95, 25, 30)

In [114]:
## confirm data type

type(grades)

tuple

In [115]:
## call the first item in our tuple
grades[0]

95

In [116]:
## append a grade of 100 to the grades tuple
## this will break
grades.append(100)


AttributeError: 'tuple' object has no attribute 'append'

In [117]:
## only way to add to a tuple is to create a new tuple
## append a grade of 100 to the grades tuple

updated_grade = grades + (100, )
updated_grade

(95, 25, 30, 100)

I don't use ```tuples``` too often except in one situation – they provide a shortcut to turning items in a list into a dataframe.

### Method 2 – Zip into tuples

In [118]:
## recall we named each item in the for loop

for (f_name, l_name, title, co, sector)\
in zip(first_names, last_names, titles, companies, industries):
  print(f"{f_name} {l_name}; {title}, {co}, {sector}")

Irene Rosenfeld; Chairman and CEO, Kraft Foods, Food and Beverage
Ursula Burns; Chairman and CEO, Xerox, Process and Document Management
Elon Musk; CEO, Tesla, Auto Manufacturing
Tim Cook; CEO, Apple, Consumer Technology


In [120]:
## zip it and print

for x in zip(first_names, last_names, titles, companies, industries):
    print(x)

('Irene', 'Rosenfeld', 'Chairman and CEO', 'Kraft Foods', 'Food and Beverage')
('Ursula', 'Burns', 'Chairman and CEO', 'Xerox', 'Process and Document Management')
('Elon', 'Musk', 'CEO', 'Tesla', 'Auto Manufacturing')
('Tim', 'Cook', 'CEO', 'Apple', 'Consumer Technology')


In [122]:
x

('Tim', 'Cook', 'CEO', 'Apple', 'Consumer Technology')

In [123]:
## we need to store into a list called method_2
method_2 = []
for item in zip(first_names, last_names, titles, companies, industries):
    method_2.append(item)

In [124]:
## call method 2
method_2

[('Irene',
  'Rosenfeld',
  'Chairman and CEO',
  'Kraft Foods',
  'Food and Beverage'),
 ('Ursula',
  'Burns',
  'Chairman and CEO',
  'Xerox',
  'Process and Document Management'),
 ('Elon', 'Musk', 'CEO', 'Tesla', 'Auto Manufacturing'),
 ('Tim', 'Cook', 'CEO', 'Apple', 'Consumer Technology')]

In [125]:
## what data type is each item in this list?
type(method_2[0])

tuple
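
Since each item that `zip()` produces is already a tuple, the append loop can be collapsed by passing the zip straight to `list()`. This is a sketch of that shortcut using only two of the lists. The names `looped` and `shortcut` are made up.

```python
first_names = ["Irene", "Ursula", "Elon", "Tim"]
last_names = ["Rosenfeld", "Burns", "Musk", "Cook"]

# The loop version: append each zipped tuple one at a time.
looped = []
for item in zip(first_names, last_names):
    looped.append(item)

# The shortcut: list() collects every tuple from zip in one step.
shortcut = list(zip(first_names, last_names))

print(shortcut[0])         # ('Irene', 'Rosenfeld')
print(shortcut == looped)  # True
```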

In [127]:
## turn the list of tuples into a pandas dataframe
df = pd.DataFrame(method_2)
## name the columns
df.columns = ["first", "last", "title", "company", "industry"]

In [128]:
df

| | first | last | title | company | industry |
|---|---|---|---|---|---|
| 0 | Irene | Rosenfeld | Chairman and CEO | Kraft Foods | Food and Beverage |
| 1 | Ursula | Burns | Chairman and CEO | Xerox | Process and Document Management |
| 2 | Elon | Musk | CEO | Tesla | Auto Manufacturing |
| 3 | Tim | Cook | CEO | Apple | Consumer Technology |


In [130]:
## export to a csv
df.to_csv("method_2.csv", encoding = "UTF-8", index = False)