# Course: Intro to Python & R for Data Analysis
## Lecture: Learning Your Python ABC's
Professor: Mary Kaltenberg

Fall 2020

contact: mkaltenberg@pace.edu

About me: www.mkaltenberg.com

# Python as a Language

**Coding is a language**

Natural languages are far more complex, but coding is a language between you and the computer. You'll want to learn how to communicate to the computer to execute tasks.

We'll start by learning words - their function and what they do. 

Then we will build sentences where we can combine different parts of a sentence to operationalize logic.

This lecture aims to go over:
- Verbs (Functions & Packages)
- Nouns (Dictionaries, Strings, Lists, Objects)
- Conjunctions (Conditionals and Loops)

These are the basic building blocks that will help you operationalize actions. It's kind of like learning how to use words to communicate effectively.

<img src ="https://media.giphy.com/media/xT1XGzXhVgWRLN1Cco/giphy.gif">


# Verbs 
## Functions

- Functions operationalize tasks
- Functions require arguments to operationalize these tasks
- Arguments are inputs, and inputs can vary in their requirements

Functions always look like this:

*function*(*arguments*)

The function name comes first, and the parentheses that follow indicate that it's a function. You put the required arguments inside the parentheses - there can be many required inputs.


To find out exactly which arguments are required and which are optional, you can put a `?` at the end of a function name in a notebook cell (for example, `print?`) to pull up its documentation.
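
Outside a notebook the `?` shortcut is not available, but the built-in `help()` function and the `__doc__` attribute show the same documentation. A minimal sketch (the variable name `doc` is just for illustration):

```python
# help() prints the documentation for any function
help(len)

# every function also stores its documentation as text in __doc__
doc = print.__doc__
print(doc)
```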

`print` is the *function*, and I gave it a *string*, 'hello, world', as the *argument*. In this case, the function prints the text "hello, world".

A string is an input type - we'll get into that in a moment.

I input the code in the cell, and after I run it, I get the output just below.
<img src ="print function.png">

Tip: 

`shift` + `return` on Mac to run the code in a cell

`shift` + `enter` on Windows to run the code in a cell


In [None]:
print('hello, world')

Let's make python do some math for us

In [1]:
print(1+25*2)
print(76*8366)
print(5-8-2*54)

51
635816
-111


In [2]:
76*8366
5-8-2*54

-111

You can also create your own functions - this is called a definition.

We won't go through working examples of definitions, yet - but, we'll get to it after we learn how to make sentences.

# Complex Verbs
## Packages


Packages are bundles of functions. Some packages come standard with python, but others might require you to install them via pip or conda (package managers for Python).

Think of pip as a library and packages as a book. You'll want to go into your library and pick up a reference book that has a bunch of definitions that you can look up.

Knowing specific packages has become so important that you may be asked in a job interview, "which packages do you use?" This course will go into detail on numpy, pandas, matplotlib, and seaborn (if we have time). However, for certain tasks we may call other packages when needed.

<img src = "https://imgs.xkcd.com/comics/python.png" width= 350>



In [6]:
#What an error looks like
print('hello')
print('world')
1+89
print('hi)
    87+988

SyntaxError: EOL while scanning string literal (<ipython-input-6-0d27bf523c09>, line 4)

Let's start by installing a package called **numpy** (pronounced num-PIE).

The NumPy library contains multidimensional array and matrix data structures. It operationalizes linear algebra and is incredibly useful for numerical data - for everyone from engineers to data scientists. This is one of a few packages that we will use throughout this course.


Go to your terminal/shell and input:

`pip install numpy`

Go back to your notebook and input into a cell:

`import numpy as np`

This line of code is saying 

*"import the package numpy (which is stored in your library on your computer) and name it np"*

numpy is like a book name, but I call it by a nickname "*np*"

It's like how Game of Thrones is known as GoT by its fans.

These nicknames are used by the entire python community. You'll almost always see import numpy as np. You can choose to name it whatever you want, but for ease, I use the same nicknames the community has chosen.

Generally, I input all of my packages in the first cell at the top of the notebook. Then I don't have to bother with it, again. Once it is loaded it works for the whole notebook. However, if you have to restart your *kernel* you'll have to import the packages again.

When you call a function from a package, you always write the package name first and then the function:

*package*.*function*()


In [7]:
import numpy as np
# I've imported the package numpy as np
np.arange(4) 
# I am calling the package numpy that I nicknamed np from my library and using a function called arange with an input of 4

array([0, 1, 2, 3])
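
The same *package*.*function*() pattern applies to packages that come standard with python. As a small sketch, here it is with the built-in `math` package (the variable name `root` is only for illustration):

```python
import math

# package.function(argument)
root = math.sqrt(16)
print(root)

# packages can also hold values, not just functions
print(math.pi)
```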

# Nouns

## Types


Python handles a variety of 'types' seamlessly. We'll focus on three basic ones: **strings**, **floats**, and **integers** (there are others, such as the True/False values we'll meet later).

This is important to understand because when we talk about data manipulation, you will be dealing with strings, integers, and floats. They have different functionalities, and bugs in code are often related to a mix-up between these types.

### Strings
Strings are a sequence of characters. You can denote a string with single or double quotation marks around text.

Even if you input numbers and use quotations, it becomes a string - it doesn't matter if it is a letter or number.

In [8]:
print('hello, world!')
print('1456')
print("I'm coding!")

hello, world!
1456
I'm coding!


### Float

Float is essentially anything with a decimal - also called "floating-point numbers"

In [11]:
float(105)
float(5/4)

1.25

### Integers 
Integers are, well, integers. Whole numbers without a decimal point

In [12]:
int(105)

105

<img src = "https://media.giphy.com/media/jRen3r5KuFwur3EuJf/giphy.gif">

YES! But, there will be a time that you will ask yourself, why isn't this working? And I'm willing to bet that you didn't define the right type. So always be aware of the type! Issues really arise in differences of type between string and float/integer.
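
To see how a type mix-up happens, you can ask python for the type of a value with `type()` and convert between types with `int()`, `float()`, and `str()`. A short sketch with made-up values (all variable names are hypothetical):

```python
number_as_text = '1456'
print(type(number_as_text))   # a string, even though it looks like a number

# adding strings glues them together instead of doing math
glued = number_as_text + number_as_text
print(glued)

# convert to an integer first if you want arithmetic
total = int(number_as_text) + int(number_as_text)
print(total)

# mixing a string and a number raises a TypeError
try:
    number_as_text + 5
except TypeError as error:
    print('TypeError:', error)
```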

## Variables & Reassignment

I became a Python Evangelist when I learned about the ease of assignment statements for creating variables.

You can assign a variable name to whatever you want it to be. Really. Anything. The name itself just has to begin with a letter (or an underscore).

This is generally useful - but, particularly when we work with data and different classes (we'll get to that).

Coming from other coding languages, I was legit:
<img src ="https://media.giphy.com/media/OK27wINdQS5YQ/giphy.gif" width = 300>


In [13]:
#I assign the variable x to something (in this case a value)
x = 5

In [14]:
#I tell python to return whatever is inside the variable x
x

5

In [29]:
# I reassign the variable to something else
x=25
print(x)
x=30
x

25


30

In [17]:
# I reassign the variable to something else (a string, in this case)
x ='I\'m a new variable!'

In [18]:
#now it is a string
x

"I'm a new variable!"

In [19]:
new_variable = 500*34

In [20]:
new_variable

17000

- The thing about variables - you'll want to try to do your best to name them appropriately. It's easy to forget what is stored in them otherwise. 

Computer scientists are WAY better at this than economists. They have super long names for variables, while economists like to abbreviate everything. 

It doesn't *really* matter, but it will make your life easier. You can also always document well and when you have completed your work, you will always want to attach a code book so people know the definitions of your variables.

# Lists

Lists are sequences of values (strings are sequences that are limited to characters)

Lists have elements within them - elements can be composed of numbers, letters, words, variables, etc.

Lists are defined by hard (square) brackets `[]`, and elements within the brackets are separated by commas `,`

Whenever you want to create a list, you tell python this with hard brackets `[]`

\[*element1*, *element2*, *element3*\]

You can turn lists into a variable (named lists, essentially)

*variable* = \[*element1*, *element2*, *element3*\]

In [30]:
['dog','cat','bird']

['dog', 'cat', 'bird']

In [38]:
#list1 has 3 elements
list1 = ['dog','cat','bird']
print(list1)

['dog', 'cat', 'bird']


In [39]:
print(list1)
list1 = [5,7,3500]
list1

['dog', 'cat', 'bird']


[5, 7, 3500]

In [41]:
x= 55
y='87'
list1 = [x,y]
list1

[55, '87']

In [77]:
#Here's a cool trick where you can convert a string into a list of its characters
w = 'hi, luna!'
list_w = list(w)
list_w

['h', 'i', ',', ' ', 'l', 'u', 'n', 'a', '!']

In [164]:
#you can split up words, too, with a delimiter (something that defines what separates values) 
# this turns a string into a list
w.split(' ')

['hi,', 'luna!']

In [56]:
w.split(',')

['hi', ' luna!']
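
Once a string is split into a list, each element can be reached by its position, and new elements can be added with `.append()`. A small sketch using a made-up string (the names `pets_text` and `pets` are hypothetical):

```python
pets_text = 'dog, cat, bird'

# split on the comma, then strip the leftover spaces from each piece
pets = []
for piece in pets_text.split(','):
    pets.append(piece.strip())

print(pets)
print(len(pets))    # how many elements are in the list
print(pets[0])      # the first element (counting starts at 0)
```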

# Dictionaries

Dictionaries are like lists, but more general. They map **keys** to **values**; each key-value pair is called an **item**.

Dictionaries are defined when you use curly brackets:

`Dict = {'item1': 1, 'item2': 2}`

The key is listed first and you match it with a value with a colon (:) and separate pairs with a comma (,).

This can be very useful in **for loops** (we will get to this in a moment) or when you want to recall a particular key associated with a value. You'll also use these dictionaries in *pandas* when you define or rename columns. We'll get to pandas, but it's not a cute bear from China.

For example, let's say I have a set of countries, that I always want to be associated with a set of country codes. We can create a dictionary that defines the name of a country with their country code.

In [57]:
country_codes = {'Afghanistan':'AFG', 'Brazil':'BRA', 'Botswana':'BWA', 
                 'China':'CHN', 'El Salvadaor': 'SLV', 'Egypt': 'EGY',
                 'Kenya':'KEN', 'India': 'IND', 'The Netherlands': 'NLD' , 
                 'New Zealand':'NZL','Oman':'OMN','Poland':'POL',
                 'United States': 'USA', 'Viet Nam': 'VNM'}

In [61]:
country_codes.get('United States')

'USA'
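
Besides `.get()`, you can look up a key with hard brackets, add new pairs, and ask for a fallback when a key is missing. A sketch with a shortened, stand-alone dictionary (the name `codes` and the missing key are made up for illustration):

```python
codes = {'Brazil': 'BRA', 'Kenya': 'KEN'}

# hard brackets look up the value for a key
print(codes['Brazil'])

# assigning to a new key adds a pair
codes['Poland'] = 'POL'
print(codes)

# .get() returns a fallback you choose when the key is missing,
# while hard brackets would raise a KeyError
missing = codes.get('Atlantis', 'not found')
print(missing)
```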

Some technical things - 
Dictionaries are implemented using a *hashtable*, and the keys must be hashable. A *hash* is a function that takes a value (of any kind) and returns an integer. Dictionaries use these integers, called hash values, to store and look up key-value pairs. When you see an error relating to something not being hashable - this is why. Be sure to check that your keys are types that can't be changed, such as strings, numbers, or tuples (lists and dictionaries can't be keys).


In [63]:
list1

[55, '87']

In [65]:
# here is an example: a list is not hashable, so python can't store it inside curly brackets
d = {list1}

TypeError: unhashable type: 'list'
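
To make the hashable rule concrete: a tuple can be a dictionary key because it can't change, while a list can't. A minimal sketch with made-up coordinates (all names are hypothetical):

```python
# a tuple is immutable, so it is hashable and works as a key
city_by_coordinates = {(40.7, -74.0): 'New York'}
print(city_by_coordinates[(40.7, -74.0)])

# a list is mutable, so using it as a key raises a TypeError
message = ''
try:
    bad_dictionary = {[40.7, -74.0]: 'New York'}
except TypeError as error:
    message = str(error)
    print('TypeError:', message)
```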

In [64]:
#However, you can call the values within a list
for i in list1:
    print(i)

55
87


# Tuples

Tuples are a sequence of values. I rarely use them independently, but you will see that they are embedded in many of the things we do with data (when we get to pandas). The key difference between tuples and lists is that tuples are *immutable* (lists are not). It means that whatever you put in the tuple can't be changed: you can, of course, reassign the entire variable, but you cannot change one element in the tuple.

A tuple is a comma-separated sequence of values, usually wrapped in parentheses:

`Tup = ('value1', 'value2')`


In [66]:
#tuples break down each element within a sequence
Luna = tuple('luna barks')
print(Luna)
# There are 10 values in this tuple (the space counts as one)

('l', 'u', 'n', 'a', ' ', 'b', 'a', 'r', 'k', 's')
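
Immutability is easiest to see side by side with a list. A short sketch with made-up values:

```python
pets_list = ['dog', 'cat', 'bird']
pets_tuple = ('dog', 'cat', 'bird')

# a list element can be replaced in place
pets_list[0] = 'fish'
print(pets_list)

# a tuple element can't be replaced
try:
    pets_tuple[0] = 'fish'
except TypeError as error:
    print('TypeError:', error)

# but the whole variable can still be reassigned to a new tuple
pets_tuple = ('fish', 'cat', 'bird')
print(pets_tuple)
```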


### Slicing
You can use "slicing" on lists, tuples, or strings. Slicing uses indices, which give the position of each element in the sequence, to pull out the part you want.

<img src = 'slicing.png'>

In [None]:
# the character at index 5 (the sixth character, since the index starts at 0)
print(Luna[5])

In [69]:
# you can call particular items within a tuple, here I am asking for the items from index 1 up to (but not including) index 3 in the tuple named Luna.
print(Luna[1:3])
# you're going to be using this in dataframes when we want to call particular rows or columns in the data.
# Why is this starting at 'u'?
# Because the index starts at 0

('u', 'n')


In [70]:
# you can call values in a tuple in a lot of different ways
# This means give me all values from index 3 onward
print(Luna[3:])

('a', ' ', 'b', 'a', 'r', 'k', 's')


In [71]:
# give me the values before index 3
print(Luna[:3])

('l', 'u', 'n')


In [72]:
#give me the last two values
print(Luna[-2:])

('k', 's')


In [73]:
#exclude the last two values going backwards
print(Luna[:-2])

('l', 'u', 'n', 'a', ' ', 'b', 'a', 'r')
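
Slicing works the same way on lists and strings, and an optional third number sets the step. A short sketch with a made-up list (variable names are hypothetical):

```python
numbers = [10, 20, 30, 40, 50, 60]

first_three = numbers[:3]      # indices 0, 1, 2
every_other = numbers[::2]     # start to end, counting by 2
last_one = numbers[-1]         # a negative index counts from the end

print(first_three)
print(every_other)
print(last_one)

# strings can be sliced too
name = 'luna barks'[:4]
print(name)
```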


Knowing the format of these types of items can be helpful when you debug in pandas. Often an error is due to a type mismatch - when you ask python to do something and it says it's unhashable or immutable, it's due to this type of error. Knowing the syntax of these types is also important.

For example, when I want to change the name of a column in a dataframe, the syntax is actually a dictionary, so I must be sure that I use the curly brackets and not the square brackets, and that to assign a key a value, I need to use the colon. Many errors can arise from not understanding the syntax related to the particular type or operator.

# Basic Verbs

So technically, packages are more like paragraphs. But, since we use packages to do stuff, it made sense to call them complex verbs to give you an idea of what they do. This section goes over logical statements and expressions. These are the basic building blocks of most code - it's like learning the verb 'to be'.



# Logic Statements and Expressions

A big part of coding is using logic, which consists of conditional statements. Python operates on these conditional statements - this is particularly important when we get to data management. Essentially, you will constantly be asking python a series of True or False questions, and it will return the answers to you.

When you query data, you'll have to think about how you can get the data you want with a series of true or false queries. 

## Boolean expressions
A *Boolean expression* is something that is true or false. Long ago in a HS far far away, I was taught boolean expressions by the librarian to search the library catalog. 

These will be really important in data management, so memorize them!

<img src = "https://media.giphy.com/media/GoeAgxrr0lkqI/giphy.gif" width = 300>

In [83]:
# Equals (always a double =)
5 == 20

False

In [84]:
x = 5 # assignment
x == 5 #boolean

True

In [94]:
# Boolean expressions
x = 5
y = 9
print (x != y) # does not equal
print (x > y) # greater than
print (x < y) # less than
print (x >= y) # greater than or equal
print (x <= y) # less than or equal

True
False
True
False
True

In [97]:
## Logical expressions
x = 5
x<10 and x <4

False

In [98]:
x<10 or x>10

True

In [99]:
not x<10 

False

In [100]:
x<10 is not 10
# you can do this, but you'll get a warning, because `is not` checks identity rather than value.
# Generally, it's always better to use a boolean comparison such as != in these cases

False

## Identity Operators 

Identity operators ask whether two names refer to the very same object #meta 
It is about comparing one object to another - this can be useful at times when you are searching for something. Usually this is already coded within an existing function, so you might not use it directly.

In [101]:
a = 'luna'
b = 'woof'
a is b

False

In [102]:
a is not b

True

In [103]:
b = 'luna'
a is b

True
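
The last result is `True` most likely because python reuses a single object for short, identical strings; that is an implementation detail rather than something to rely on. To compare values, `==` is the safer choice. A sketch with lists, where the difference is visible (variable names are made up):

```python
first = [1, 2, 3]
second = [1, 2, 3]

same_values = first == second   # do they hold the same values?
same_object = first is second   # are they the very same object?
print(same_values)
print(same_object)

# a second name for the same object
third = first
print(third is first)
```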

# If, Ands, and Buts

## Membership Operators

Membership operators are used to test whether a value is present in an object. This can be used in a lot of ways, for example to see if a value is in or is not in a particular list or dictionary.

In [104]:
Luna

('l', 'u', 'n', 'a', ' ', 'b', 'a', 'r', 'k', 's')

In [105]:
'a' in Luna

True

In [106]:
'a' not in Luna

False

In [107]:
'USA' in country_codes
# In dictionaries it only looks in the keys, not the values (so what you assign as a key or value matters)

False

In [108]:
'United States' in country_codes

True
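
To check the values rather than the keys, you can ask for them explicitly with `.values()`. A sketch with a shortened, stand-alone dictionary (the name `codes` is hypothetical):

```python
codes = {'Brazil': 'BRA', 'United States': 'USA'}

in_keys = 'USA' in codes             # looks only at the keys
in_values = 'USA' in codes.values()  # looks at the values

print(in_keys)
print(in_values)
```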

In [110]:
country_codes

{'Afghanistan': 'AFG',
 'Brazil': 'BRA',
 'Botswana': 'BWA',
 'China': 'CHN',
 'El Salvadaor': 'SLV',
 'Egypt': 'EGY',
 'Kenya': 'KEN',
 'India': 'IND',
 'The Netherlands': 'NLD',
 'New Zealand': 'NZL',
 'Oman': 'OMN',
 'Poland': 'POL',
 'United States': 'USA',
 'Viet Nam': 'VNM'}

## Bitwise Operators

These operators work on numbers (integers and True/False values), not on strings. Everything earlier can be used with any type of data. These operators are incredibly useful when cleaning data - so memorize them. It'll help you speed through data tasks in the future. With the True/False conditions we will build when working with data, they behave much like the previous logical operators.

and = `&`

or = `|`

not = `~`

exclusive or (one or the other, but not both) = `^`

We will get to these operators when working with data. For now it's useful to know them and love them. Working with them outside of the data world can get very CS technical (it's about bits and how computers encode them, so I'll spare you these details; it's not so important in practice, other than that you can't use them with strings).

<img src ="https://media.giphy.com/media/kcesjPPcMH7EV7Q0Q5/giphy.gif" width = 200>

In [166]:
x = 'cat'
y = 'dog'
x & y
# Example showing that it doesn't work with strings

TypeError: unsupported operand type(s) for &: 'str' and 'str'

In [169]:
#example of using bitwise operators with a boolean expression
x = 15
y = 20
x&y ==20

False
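
Why is the result `False`? Bitwise operators compare numbers bit by bit, and `&` is evaluated before `==`. A minimal sketch that spells out the steps (`bin()` is used only to display the bits; the names `combined` and `both` are hypothetical):

```python
x = 15
y = 20

print(bin(x))    # 0b1111
print(bin(y))    # 0b10100

combined = x & y        # only the bits that are 1 in both numbers survive
print(combined)         # 4

# & runs before ==, so this is the same as (x & y) == 20
print(x & y == 20)

# with True/False conditions, wrap each condition in parentheses
both = (x < 20) & (y >= 20)
print(both)
```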

## Conditional execution

These are magic words that you will be using a lot. They operationalize many tasks and when you combine them with loops, that's when the real magic of coding begins. 

<img src ='https://media.giphy.com/media/MqyWCZbMMFFL2/giphy.gif'  width = 150>

Conditional executions are a set of logical expressions that help you control the flow of your operations (aka code). You want to have an order in the way that you do things so that you get the outcome that you want. If the order is not correct, this can often lead to problems. The best way to do this is to think in logical steps. Before writing your code, you might want to actually write out (when you start learning how to code) each step you need to take to get to the outcome that you want. In fact, that's basically documentation - writing in words the steps that you are taking in a block of code.

`if`
is a conditional statement; the code under it will run if the condition is met

`else`
is an alternative execution. Essentially, if there are two possibilities, it will execute one of them depending on the condition. It is often used in conjunction with `if`. 

`elif`
is a chained conditional. If there are more than two possibilities, put this between the `if` and the `else`. You can have more than one, too. 


In [112]:
# This is a simple statement that compares x and y. 
x= 35
y = 100
#print out a statement if x is less than y.
if x < y:
    print('X is less!')
#Otherwise, print out that it is not less
else:
    print('X is not less')
    

X is less!


In [114]:
# Let's expand this. I reassign x so that it is now greater than y
x = 350
# print out a statement if x is less than y
if x < y:
    print('X is less!')
#print out a different statement if x is greater than y
elif x>y:
    print('X is greater than y')
#print out a different statement if it is anything else
else:
    print('x = y')

X is greater than y


In [115]:
# Here, we can see the same code prints out that they are something else
x = 100
y = 100
if x < y:
    print('X is less!')
elif x>y:
    print('X is greater than y')
else:
    print('It is something else')

It is something else
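
In a chain of `if`/`elif`/`else`, python runs only the first branch whose condition is true, so the order of the checks matters. A sketch with a made-up score (the names `score` and `grade` are hypothetical):

```python
score = 95

# checks run top to bottom and stop at the first one that is true
if score >= 90:
    grade = 'A'
elif score >= 80:     # also true for 95, but never reached
    grade = 'B'
else:
    grade = 'C or below'

print(grade)
```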


# NOTE

There are some symbols that don't work as plain text in python because python uses them for other purposes than we do in our everyday writing. To tell python to treat the symbol as ordinary text, you put `\` right before it (this is called escaping).


In [122]:
print('it's a girl!')

SyntaxError: invalid syntax (<ipython-input-122-b68c0c580c82>, line 1)

In [123]:
print('it\'s a girl!')
print ('she said, \"hello\" and I said \"good bye\"')


it's a girl!
she said, "hello" and I said "good bye"
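
Another way around the problem is to wrap the text in the other kind of quotation mark, so that no `\` is needed. A short sketch (variable names are made up):

```python
# double quotes on the outside let you use a single quote inside
girl = "it's a girl!"
print(girl)

# single quotes on the outside let you use double quotes inside
greeting = 'she said, "hello"'
print(greeting)

# the escaped version gives exactly the same string
same = girl == 'it\'s a girl!'
print(same)
```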


# Sentences 

And finally, we can make full sentences! We will start with simple sentences and you'll work your way up by 
incorporating functions from packages, and before you know it you're a wizard.


<img src= 'https://media.giphy.com/media/gL2yywTiuK7fTobybJ/giphy.gif' width = 200>
yeah, this gif is kind of ridiculous in #nerdlevel, but I like it.

## For loops

Here is where the automation begins. We may want to repeat a task multiple times so that we don't have to type code out repetitively - we may want to repeat an operation tens of times. Great coders write complex actions with few lines of code. 

for loops always begin with `for` followed by `in`. Generally it is:

`for item in iterable:
    Code to run`
    
An item is an element from an iterable (this can be a list, dictionary, tuple, or anything iterable in python). 

In [None]:
#You can make a for loop out of complex conditional statements. Let's take this example:
x = 5*3
print(5)
if 5 <10:
    print('done')
elif 5<1000:
    print(5)
    x = 5+10
else:
    x = 5/100

What is this command telling python, step by step:
1. Multiply 5 by 3, name the result the variable x, and print 5.
2. Check if 5 is less than 10. If true, it will print done and skip the remaining checks. If false, it won't print and will go to the next statement.
3. It will check if 5 is less than 1000. If true, it will print 5, then add 5+10 and assign that sum to the variable x. If false, it will go to the last statement.
4. If neither check is true, it will assign 5/100 to the variable x.
5. No more statements, the code is complete.

I can use this order of statements and create a for loop out of it. For example, let's say I want it to calculate this for a list of numbers by replacing the number 5 with an item that I will call i.

In [158]:
list1 = [5,25,30,35]

for i in list1:
    x = i*3
    if x <10:
        print('done')
    elif i<1000:
        print(i)
        x = x+10
    else:
        x = x/100        

5
25
30
35


In [130]:
#I can simplify the above for loop even more by using range
for i in range(5,40,5):
    x = i*3
    if x <10:
        print('done')
    elif i<1000:
        print(i)
        x = x+10
    else:
        x = x/100  

5
10
15
20
25
30
35


In [None]:
for i in range(8,16):
    print(i*20)
# notice that range stops at 15: the end value (16) is not included, so the loop never runs for 16.

160
180
200
220
240
260
280
300

In [None]:
# we can also recall the syntax here
range?

### range

range has a particular way in which it starts, ends, and counts by (step). It stops just before the end number, so the end number itself is not included.

`range(start#, end#, step)`
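
Wrapping `range` in `list()` shows exactly which numbers it produces. A quick sketch (variable names are made up):

```python
first_five = list(range(5))           # start defaults to 0, end is not included
eight_to_fifteen = list(range(8, 16)) # 8 through 15
by_fives = list(range(5, 40, 5))      # counting by 5
countdown = list(range(10, 0, -2))    # a negative step counts down

print(first_five)
print(eight_to_fifteen)
print(by_fives)
print(countdown)
```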

In [126]:
for i in country_codes:
    print(i)
    print(i+' ,new')
# When I use the print function, I am just telling it to print stuff,
#so it doesn't actually change the existing keys in country_codes

Afghanistan
Afghanistan ,new
Brazil
Brazil ,new
Botswana
Botswana ,new
China
China ,new
El Salvadaor
El Salvadaor ,new
Egypt
Egypt ,new
Kenya
Kenya ,new
India
India ,new
The Netherlands
The Netherlands ,new
New Zealand
New Zealand ,new
Oman
Oman ,new
Poland
Poland ,new
United States
United States ,new
Viet Nam
Viet Nam ,new


### Initializing
Often, we may want to add 1 to an existing number and build from that. 
In order to do this, I must **initialize** the for loop with some value

In [137]:
#I initialize the variable x to start at 0
x=0
# range has the ability that you can count by any number (not just 1 by 1, but whatever amount)
for i in range(5,25,5):
    x = i+1
    print(x) #I'm printing to see what the for loop is doing

6
11
16
21


What value will x be after this for loop?

In [138]:
x

21
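
In the loop above, x is replaced on every pass, so the starting value of 0 never matters. Initializing becomes essential when each pass builds on the previous one, as in a running total. A sketch (the name `total` is just for illustration):

```python
# start the running total at 0 before the loop
total = 0
for i in range(5, 25, 5):
    total = total + i    # build on the value from the previous pass
    print(total)
```

Without the `total = 0` line, the first pass would fail because `total` would not exist yet.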

### Nested Conditionals

In [140]:
#Let's add an if condition to this for loop
x= 0 
for i in range(5,25,5):
    x = i+1
    if x >8:
        y = x*2
        print(y)

22
32
42


What value is y?

In [None]:
# This is a loop: it repeats until it has gone through every item in the range.
# A for loop over a range always ends, but a while loop can become a never ending loop (there is a fix for this with break - more on that later)
x= 0 
for i in range(5,25,5):
    x = i+1
    if x >8:
        y = x*2
    else:
        y = x/2

In [150]:
list1 = [1, 450]

In [None]:
### Debugging for loops
#name that error!
for i in list1:
    x = i*3
    if x <10:
        print('done)
    elif x<1000:
        x = x+10
    else:
        x = x/100

In [160]:
### Debugging for loops
for i in list1:
    x = i*3
    if x <10
        print('done')
    elif x<1000:
        x = x+10
    else:
        x = x/100

SyntaxError: invalid syntax (<ipython-input-160-1222e56803ee>, line 4)

In [None]:
list1 = [500, 6, 98]

In [None]:
for i in list1:
    print('item value', i)
    if i <100:
        print('less than 100')
    else:
        x = i+20
    print(x)
    

In [162]:
### Debugging for loops
# I want it to print every time a value is less than 10, how do I fix this?
for i in list1:
    x = i*3
    if x <10:
        print(x)
        print('done')
    elif x<10:
        x = x+10
    else:
        x = x/100              

3
done


In [163]:
list1

[1, 450]

In [None]:
for i in list1:
    if i <10:
        x = i
        print('done')
    else:
        x = x/100   

## break
`break` stops your loop as soon as a condition that you set is met. It's useful to have breaks in many situations, particularly if it's something that could potentially go on infinitely. Knowing the bounds of your data is particularly useful, as you can tell it to break a loop if the data shouldn't go above a certain point, for example.

In [None]:
for i in range(10,1000):
    i+1
    if i >10:
        break
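
A common use of `break` is to stop searching as soon as you find what you're looking for. A sketch with made-up values (the names `values` and `first_large` are hypothetical):

```python
values = [3, 8, 120, 15, 400]

first_large = None
for value in values:
    if value > 100:
        first_large = value
        break            # stop the loop; 15 and 400 are never checked

print(first_large)
```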

## Nested Loops

A loop within a loop. For each item in the outer (first) loop, it runs through the entire inner (second) loop before moving on to the next outer item.

In [None]:
x_list = [10,57,89,76,350,126]
for i in range(25, 350, 25):
    x = i +20
    for j in x_list:
        if j>100:
            x_list = [x-1,j]
        else: 
            x_list2 = [x,j]

In [None]:
x_list2

1. For every item i in the range of 25 to 350 going by 25 (25, 50, 75, etc until 325) add 20 and call that variable x
    2. For each item j in x_list check to see if the item j in x_list is above 100, if that's true, subtract 1 from x, create a list and call it x_list, otherwise create a list with x and j and name it x_list2
    3. Go back to the next item in the range 25 to 350 going by 25 and do it again until there are no more items.
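
The order in which a nested loop visits items is easier to see in a smaller sketch (the lists and the name `pairs` are made up for illustration):

```python
pairs = []
for letter in ['a', 'b']:
    for number in [1, 2, 3]:
        # the inner loop finishes all three numbers before the next letter
        pairs.append((letter, number))

print(pairs)
```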

## While
while loops are useful when the number of iterations needed depends on the outcome of the loop contents. while loops are commonly used when a loop should only stop if a certain condition is met, such as when the change in some parameter is small. The generic structure of a while loop is:

`while logical:
     run code
     update logical`

Two things are crucial when using a while loop: first, the logical expression should evaluate to true when the loop begins (or the loop will be ignored) and second, the inputs to the logical expression must be updated inside the loop. If they are not, the loop will continue indefinitely (hit CTRL+C to break an interminable loop in IPython). The simplest while loops are (wordy) drop-in replacements for for loops.
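
As a sketch of a while loop standing in for a for loop, here is the same count written both ways (the counter name `count` is just for illustration):

```python
# for loop version
for i in range(3):
    print(i)

# while loop version: set up the counter, test it, and update it yourself
count = 0
while count < 3:
    print(count)
    count = count + 1    # without this update the loop would never end
```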


# A moment of your time

Sometimes you will run code that might take a very long time, or you might want to learn how to speed up code and see which line of code is faster than another in order to improve it. There is a module called time that helps you do this. It literally just keeps track of time.

<img src = 'https://media.giphy.com/media/4QX6y0fQT7yVO/giphy.gif' width = 200>

In [None]:
import time
start = time.time()
iterations = range(5, 20000000)
for i in iterations:
    x= 5*26
    x = x*2
    x= x%3
    it1 = time.time()
end = time.time()

total_time = end-start
iteration_time = it1-start
# you can print this throughout your for loop to keep track of time for each part of the for loop. 
#it1 is updated at the end of every iteration, so it1-start is the time elapsed up to the end of the last iteration

In [None]:
print(total_time)
iteration_time
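
To compare two ways of doing the same task, you can time each one separately. The sketch below uses `time.perf_counter()`, a function in the same `time` module that is intended for measuring short durations; the two tasks being compared and all variable names are made up for illustration:

```python
import time

# option 1: build a list with a for loop
start = time.perf_counter()
squares_loop = []
for i in range(100000):
    squares_loop.append(i * i)
loop_time = time.perf_counter() - start

# option 2: build the same list in one line (a list comprehension)
start = time.perf_counter()
squares_short = [i * i for i in range(100000)]
short_time = time.perf_counter() - start

print('for loop:', loop_time)
print('one line:', short_time)
```

The exact numbers will differ from run to run and from computer to computer, so it helps to run a comparison like this a few times before drawing a conclusion.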

In [None]:
time?