# Python Language Basics: 

Data Types, Data Structures, Variables, Control Flow, Generator Objects, Assert Statements

# I. Importing Necessary Packages

The following code imports the packages you'll need for this notebook. Packages are collections of reusable code, some third-party and some shipped with Python itself, that add functionality beyond the core Python language.

Reminder: To run cells in this notebook you have three options. Place your cursor in the cell you want to run, then:
1) hold down the shift key while hitting the enter key (Shift+Enter),
2) click the play button that you should see on the notebook's tab just under the notebook name, or
3) click on the Run menu and click Run Selected Cell.

In [None]:
# # Numpy supports multidimensional arrays and has a lot of math functions.
# import numpy as np 

# # Pandas is great for tabular data (data in rows and columns)
# import pandas as pd

# # Matplotlib is for drawing plots, graphs, etc., and Seaborn is another package
# # that makes Matplotlib easier to use and produces more attractive graphics
# import matplotlib.pyplot as plt 
# import matplotlib.image as mpimg 
# import seaborn as sns 

# # Geopandas adds geographical analysis capabilities to Pandas - one or more of your
# # columns can be geographical (points, lines, shapes, etc.)
# import geopandas as gpd
# from geopandas import GeoDataFrame 

# # These ship with Python (the standard library) but you still have to import them to use them
# import random 
# import os 
# import json 

# # This package adds the ability to work with Excel files
# import openpyxl

# II. Hello World
It's traditional for your first code in any language to be a "hello world." So, let's store "Hello World" in a variable and then print it out with Python.

This is a good time to turn on your line numbers through View menu -> Show Line Numbers.

Also, let's experiment with commenting out code using a pound symbol (#). You can also select multiple lines and hit Ctrl+/ on Windows or Command+/ on Mac.

In [1]:
myMessage = "Hello World"
print(myMessage)

Hello World


# III. Python Variables and Types
In the previous section we stored the phrase "Hello World" in a variable named `myMessage`. A variable is a container for data. Variables have *names* and they can contain data of many different *types* (numbers, character strings, etc.)

## What Types Does Python Support?
We won't cover all of these data types but Python supports the following types by default:

| Kind of Data | Python Types | Examples |
| :----------- | :----------- | :----------- |
|Text|str|varName = "I love milkshakes."|
|Numeric|int,<br /> float,<br /> complex|varName = 3<br />varName = 3.0<br />varName = 3j|
|Sequence|list,<br /> tuple,<br /> range|varName = ["chocolate", "strawberry", "vanilla"]<br />varName = ("chocolate", "strawberry", "vanilla")<br />varName = range(6)|
|Mapping|dict|varName = {"name" : "Bully", "species" : "Dog"}|
|Set|set,<br /> frozenset|varName = {"strawberry", "vanilla", "chocolate"}<br />varName = frozenset({"strawberry", "vanilla", "chocolate"})|
|Boolean|bool|varName = True|
|Binary|bytes,<br /> bytearray,<br /> memoryview|varName = b"hello"<br />varName = bytearray(5)<br />varName = memoryview(bytes(5))|
|None|NoneType|varName = None|


If you've programmed in another language, you will probably know what the following types are without any explanation:
- Text - str (text goes in what programmers call 'strings')
- Numeric - int (integers are whole numbers)
- Numeric - float (floating point numbers have a decimal and some amount of precision)
- Boolean - bool (boolean values are either True/1 or False/0)

Other data types we'll use in this course that may need more explanation are:
- Sequence - list, tuple, and range (iterable objects containing values) 
- Mapping - dict (dictionary of "key/value" pairs)
- NoneType (a special data type representing the absence of a value)

Links to more info about data types we won't cover:
- Complex numbers [Real Python: Simplify Complex Numbers with Python](https://realpython.com/python-complex-numbers/)
- Sets and frozen sets [Real Python: Sets in Python](https://realpython.com/python-sets/)
- Bytes and byte arrays (as well as other data types) [Real Python: Basic Data Types in Python: A Quick Exploration](https://realpython.com/python-data-types/) 
- Memoryview [Codecademy: Memoryview](https://www.codecademy.com/resources/docs/python/built-in-functions/memoryview), [Python 3 Documentation: Built-in Types, Memoryview](https://docs.python.org/3/library/stdtypes.html#memory-views)
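
To make the table of types more concrete, here is a small sketch that asks Python for the type of several sample values. The list name `samples` and the helper list `typeNames` are made up for this illustration; `type()` is the built-in function used throughout this notebook.

```python
# A few sample values, one for each kind of data we will use in this course
samples = ["I love milkshakes.", 3, 3.0, True, [1, 2], (1, 2), range(6), {"name": "Bully"}, None]

# typeNames (a hypothetical helper list) collects the name of each type
typeNames = []
for value in samples:
    # type(value) returns the type; .__name__ is the type's short name as text
    typeNames.append(type(value).__name__)
    print(value, "->", type(value))

# Booleans behave like the integers 1 and 0
print(True == 1, False == 0, True + True)
```

Notice that `None` has its own type, `NoneType`, and that `True + True` evaluates to 2 because booleans can stand in for 1 and 0.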

### Sequences - list [] and tuple ()
Lists and tuples (pronounced too-pulls) are used to store lists of things like the integer values 1,2,3,4 or the strings "dog","cat","horse","zebra". The difference is that lists can be changed after you create them (they are *mutable*) and tuples cannot be changed (they are *immutable*). The code below initializes a list and a tuple and illustrates how Python also lets you figure out what type any variable is.

In [19]:
# initialize a list with square brackets
myList = [8,7,19,9]
# initialize a tuple with parentheses
myTuple = (8,7,19,9)

# print out the type of each
print(type(myList))
print(type(myTuple))

<class 'list'>
<class 'tuple'>


In [20]:
#let's try to change a list; you should have no problem
myList[0] = 9

# print out that changed list - you'll see a 9 in the first position since you changed it
print(myList)

[9, 7, 19, 9]


In [21]:
# you can also print out just the first member of the tuple or list like this
print(myTuple[0])

8


In [22]:
# but, let's try to change a tuple -  you should see an error
myTuple[0] = 9

# and so because of the error, this line will not execute at all
print(myTuple)

TypeError: 'tuple' object does not support item assignment

Hey, don't worry about that error - I *knew* that would happen. Notice a few things about that code:
- Lists are assigned using square brackets ```[]``` and tuples are created using parentheses ```()```
- The built-in ```type()``` function tells you what type a value or variable is
- I can access an individual member of a list using brackets ```myList[0]``` and the position index of the item. In the example above, I assign a new value to the first item in the list and it switches from 8 to 9. Notice that the first item is accessed with an index of 0. Python and many other programming languages start counting from zero and not one. You can access the second member of the list using ```myList[1]```, the third with ```myList[2]```, and so on. 
- Tuples are not changeable. Even though I can access the first item using ```myTuple[0]``` the same way I did with lists, when I try to change myTuple, I get a "TypeError" because you can't change tuples once they are created, i.e. they are *immutable*. This is why ```myTuple[0] = 9``` generates an error.

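Tuples exist mainly because immutability is useful. A tuple signals that its contents are not meant to change while the program runs, and, as long as its items are themselves immutable, it can be used in places where a changeable list cannot, such as a dictionary key. Tuples also take a little less memory than lists and are slightly faster to create, although the difference is usually small.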
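
One practical consequence of immutability can be sketched with a dictionary (dictionaries are covered later in this notebook). The grid positions and labels below are invented purely for illustration.

```python
# Hypothetical data: map (row, column) grid positions to labels.
# Tuples of numbers are immutable, so they are allowed as dictionary keys.
labelByPosition = {(0, 0): "top left", (0, 1): "top right"}
print(labelByPosition[(0, 1)])

# A list is mutable, so Python refuses to use it as a key
listKeyWorked = True
try:
    labelByPosition[[1, 0]] = "bottom left"
except TypeError as error:
    listKeyWorked = False
    print(f"TypeError: {error}")
```

The `try ... except` block is only there so the sketch keeps running after the error; the lookup with the tuple key succeeds, while the assignment with the list key raises a TypeError.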

### Sequences  - range (start,stop,step)
The built-in function ```range``` is used to hold a series of numbers. ```range``` returns a range object that contains an immutable sequence of integers and is often used for looping through a specific number of iterations in a ```for``` loop. 

```range(start,stop,step)``` excludes the stop value. In the example below, ```range(10,15,1)``` produces the numbers 10 through 14; the stop value of 15 is not included.

In [8]:
# see what the range function returns (a range object)
range(10,15) # range(start,stop,step)

range(10, 15)

In [5]:
# how to see all the values in the range object?
# loop through the range object and print out every number.
# notice that at each iteration of the loop, the variable "number" 
# will contain the next number in the range
for number in range(10,15,1):
    print(number)

10
11
12
13
14


As you can see, our range sequence goes from 10 to 14 incremented by 1.  

This example also used a ```for``` loop, meaning that the block of code indented below the ```for``` loop executes multiple times (i.e. multiple iterations). In this case the number of iterations is determined by how many members there are in the range sequence.

If you leave out the start number and the step size in the range function, Python assumes you want to start at 0 and increment by 1. Here's an example:

In [6]:
# when one number is given to the range function
# python infers that it is the stop value (exclusive)
# and that the start=0 and the step=1
for number in range(7):
    print(number)

0
1
2
3
4
5
6


### Sidebar: Indentation in Python

Notice that the ```for``` line ends with a colon and that the lines contained within the loop are indented by four spaces. Python doesn't use curly brackets like C-like languages do; it uses indentation to mark out blocks of code. Jupyter will take care of this for you for the most part, automatically indenting when you press Enter after a colon. If you are using a plain text editor, be consistent: indent with either tabs or a fixed number of spaces (four is the usual convention), and don't mix the two styles or you will get errors.
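
Here is a minimal sketch of how indentation levels nest when one block sits inside another. The list name `labels` is made up for this illustration, and the ```%``` operator (remainder after division) is used only to have something to test.

```python
labels = []  # collects one label per number so we can inspect them afterwards

for number in range(4):
    # indented once: these lines belong to the for loop
    if number % 2 == 0:
        # indented twice: this line belongs to the if block
        label = "even"
    else:
        label = "odd"
    # back to one level: still inside the loop, but outside the if/else
    print(number, "is", label)
    labels.append(label)

# back at the left margin: this line runs once, after the loop has finished
print(labels)
```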

### Mapping - Dictionary {"key":"value","key2":"value2"}

Dictionaries are very useful for data science. They consist of a collection of "key/value" pairs. Just as a printed dictionary pairs a word (key) with a definition (value), a Python dictionary pairs each key with a value.

Dictionaries are enclosed by curly brackets, use colons between each key and its associated value, and use commas between each key/value pair. For example, here's a dictionary that stores US Department of Labor occupation codes:

In [10]:
# let's initialize our dictionary to contain three occupations
# notice that dictionaries use curly brackets
occupationCodes = {"welder" : "51-4121.00", "nurse" : "29-1141.00", "computer programmer" : "15-1251.00"}

occupationCodes

{'welder': '51-4121.00',
 'nurse': '29-1141.00',
 'computer programmer': '15-1251.00'}

In [11]:
# oops, I forgot one; I can also add entries one at a time with this syntax
occupationCodes["accountant"] = "13-2011.00"

occupationCodes

{'welder': '51-4121.00',
 'nurse': '29-1141.00',
 'computer programmer': '15-1251.00',
 'accountant': '13-2011.00'}

Imagine our dictionary was very large, containing all occupations and labor codes. How can we search the dictionary for a specific occupation and its code? 

In [12]:
# let's search for welder
keyword = "welder"

# check if the key "welder" exists in the dictionary
# if so, print the labor code for welder
# if not, print a different response
if keyword in occupationCodes:
    print(f"The occupation code for {keyword} is {occupationCodes[keyword]}")
else:
    print(f"Sorry, don't have information on {keyword}")

The occupation code for welder is 51-4121.00


There are lots of cool things to notice in this code:
- You can modify dictionaries (as we did when we added "accountant" and its labor code)
- You can check to see whether a key exists in the dictionary (as we did with "welder")
- You can use plain brackets and the key (```occupationCodes["welder"]```) to retrieve the corresponding value from a dictionary
- The ```f``` placed just before the opening quotation mark marks the string as a formatted string (an "f-string"). Formatted strings let you put variables or expressions inside curly brackets within the string, and Python fills in their values.

Try modifying the keyword to something like "firefighter" that we know isn't in the dictionary and then running it.

This code snippet above also uses the "if ... else" functionality in Python. This is called a conditional statement and is a way to check conditions before executing code. We'll cover more about this and other "control flow" functionality in the next section.

Starting in Python 3.7, dictionaries are *ordered* collections, which means you can count on key/value pairs being stored in the order in which you added them to the dictionary.
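
The sketch below illustrates both points: insertion order is preserved, and a missing key such as "firefighter" can be handled without an ```if ... else``` by using the dictionary's ```.get()``` method, which returns a fallback value when the key is absent. The variable names `keysInOrder` and `firefighterCode` and the fallback text "unknown" are choices made for this illustration.

```python
occupationCodes = {"welder": "51-4121.00", "nurse": "29-1141.00"}
occupationCodes["accountant"] = "13-2011.00"

# converting a dictionary to a list gives its keys, in insertion order
keysInOrder = list(occupationCodes)
print(keysInOrder)      # ['welder', 'nurse', 'accountant']

# .get() returns the second argument instead of raising a KeyError
firefighterCode = occupationCodes.get("firefighter", "unknown")
print(firefighterCode)  # unknown
```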

# IV. Iterating through Lists and Dictionaries
One thing you'll need to do quite frequently in data science is to iterate through the values in a list, one by one. There are a few ways to do this. Here's the first:

In [None]:
# Create a List
greatLanguages = ["Python", "R", "Rust", "C++"]
for language in greatLanguages:
    print(f"{language} is a great language.")

Notice that:
- The variable "language" holds, one at a time, each item in the greatLanguages list (Line 3)
- Within the for loop, at each step, you can work with the variable language and do things like printing it (Line 4)

Sometimes you want to iterate in such a way as to store the index of each item and not just get the value of each item in the list:

In [None]:
greatLanguages = ["Python", "R", "Rust", "C++"]
for position, language in enumerate(greatLanguages):
    print(position, language)

In the example above:
- The variable *position* contains the index number of the item at each iteration (0-3)
- The variable *language* contains the value of the item (just as before)
- The built-in "enumerate" function is what supplies the values of the *position* and *language* variables at each step
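
If you would rather count from 1 than from 0, ```enumerate``` accepts an optional ```start``` argument. A minimal sketch, where the names `rank` and `rankedLines` are made up for this illustration:

```python
greatLanguages = ["Python", "R", "Rust", "C++"]
rankedLines = []  # collects the formatted lines so we can inspect them later

# start=1 makes the counter begin at 1; the items themselves are unchanged
for rank, language in enumerate(greatLanguages, start=1):
    line = f"{rank}. {language}"
    rankedLines.append(line)
    print(line)
```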

Iterating through a dictionary works similarly:

In [None]:
# let's initialize our dictionary to contain three occupation titles with their corresponding codes
occupationCodes = {"welder" : "51-4121.00", "nurse" : "29-1141.00", "computer programmer" : "15-1251.00"}

for key, value in occupationCodes.items():
    print(f"{key} is coded as {value}")
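
When only the keys or only the values are needed, a dictionary's ```.keys()``` and ```.values()``` methods can be looped over in the same way. A short sketch, with the list names `occupations` and `codes` made up for this illustration:

```python
occupationCodes = {"welder": "51-4121.00", "nurse": "29-1141.00", "computer programmer": "15-1251.00"}

# loop over just the keys
for occupation in occupationCodes.keys():
    print(occupation)

# loop over just the values
for code in occupationCodes.values():
    print(code)

# both can also be turned into plain lists
occupations = list(occupationCodes.keys())
codes = list(occupationCodes.values())
```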


### Exercise 1: Iterate through a list

Create a list of your favorite band names and then iterate through the list and output each one in the format like "I love The Rolling Stones".

In [None]:
# add your code here

# V. Boolean Operators and Comparison Operations

# VI. Control Flow

for loops, while loops, if statements, break and continue, user created functions, lambda functions, list comprehension 

## For loops

## While loops

## If statements

## Break

## Continue

## User created functions

You're familiar now with Python's built-in functions like "print." Sometimes you'll need to create custom functions.

For example, let's create and use a trivial function that takes three values and returns their average.

In [None]:
# define our function
def averageThreeVars(var1,var2,var3):
    # calculate the average of the three
    average = (var1 + var2 + var3) / 3
    # return the average
    return average

# let's find the average height of the three highest mountains in the world

# store the height in feet of three mountains in separate variables
everestHeight = 29032
k2Height = 28251
kangchenjungaHeight = 28169

# call the function averageThreeVars and pass in our three peaks, store the
# result in a global variable called averageHeightThreePeaks
averageHeightThreePeaks = averageThreeVars(everestHeight,k2Height,kangchenjungaHeight)

# print out our global variable
print(f"The average height of our three peaks is {averageHeightThreePeaks}")


Notice that our function received the three mountain heights, calculated an average, and then "returned" that average where we stored it into a global variable called "averageHeightThreePeaks."

In many languages, a function can return only one value. That value can be a collection (e.g., a list), so this isn't very limiting, but Python makes returning several values especially easy: list them after ```return``` separated by commas and Python packs them into a single tuple. We won't go into that today, but keep it in mind for the future.
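
For future reference, here is a minimal sketch of returning two values from one function. The function name `lowestAndHighest` is hypothetical; ```min``` and ```max``` are Python built-ins, and the heights are the same three peaks used above.

```python
# define a function that returns two values at once
def lowestAndHighest(values):
    # the comma makes Python pack both results into one tuple
    return min(values), max(values)

peakHeights = [29032, 28251, 28169]

# "unpack" the returned tuple into two separate variables
lowest, highest = lowestAndHighest(peakHeights)
print(f"The lowest peak is {lowest} feet and the highest is {highest} feet")
```

Under the hood the function still returns a single object, a tuple, which is why the result can be unpacked into two names on one line.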

## Lambda functions

So far, this notebook has focused on defined functions. But there is another kind of function in Python called a "Lambda" function. This is probably the most advanced topic we cover here, but you'll see us use it later and may want to come back to review this.

The following sample code will call a traditional function and a lambda version of the same function to add three to the number we give the function.

In [None]:
# we have some variable we want to change
myVariable = 7

# define a traditional function
def addThreeTraditional(number):
    return number+3

# call it and pass in myVariable to add 3
myResult = addThreeTraditional(myVariable)

# print the result
print(myResult)

# now let's see a lambda version
addThreeLambda = lambda x: x + 3

# call the lambda
myResult = addThreeLambda(myVariable)

# print the result
print(myResult)

Note that you should get the same result for both the regular and lambda versions of the function (10).

Why on earth would we use lambdas? Well, a key reason would be that we don't have to define them ahead of time like we did above. The following is equivalent but it uses what are called "anonymous" functions - a lambda with no name.


In [None]:
# get a result by, instead of using "addThreeLambda", just put the lambda 
# in parentheses and call with an argument in parentheses

myResult = (lambda x:x+3)(myVariable)
print(myResult)


Amazing, right? You can also pass in multiple variables just like with a defined function.

In [None]:
var1 = "Amazing"
var2 = "Spiderman"

# notice our lambda has two variables - x and y
# and so we pass in the arguments in order
superhero = (lambda x,y:x + " " + y)(var1,var2)

print(superhero)

So we passed in two variables to the lambda, it squished them together with a space between and then returned them. You should see "Amazing Spiderman."
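
Anonymous lambdas are most often handed to another function as an argument. As one sketch of that pattern, the built-in ```sorted``` function accepts a ```key``` function that tells it what to sort by. The list `peaks` and the name `peaksByHeight` are made up for this illustration, reusing the mountain heights from earlier.

```python
# each item is a (name, height in feet) tuple
peaks = [("K2", 28251), ("Everest", 29032), ("Kangchenjunga", 28169)]

# the lambda picks out the height (position 1) of each tuple to sort by;
# reverse=True sorts from tallest to shortest
peaksByHeight = sorted(peaks, key=lambda peak: peak[1], reverse=True)

for name, height in peaksByHeight:
    print(name, height)
```

Writing the lambda inline saves defining and naming a one-line function that would only be used once.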

Now, just so you know the terminology of the parts of a lambda:

<img src="https://dsci.msstate.edu/downloads/wrangling/lab6/lambda.png" alt="parts of a lambda" width="400"/>


## List Comprehension

# VII. Python: Learning More

Well, there's a lot more to Python, but the above gives you a start on variables, loops, collections, and iterating through collections.

The great thing about Python is that you can google "How do I (do whatever) in python?" and you'll get help in many places.