# Introduction to Python

## Contents
1. Variables
2. Lists
3. Strings
4. Dictionaries
5. If/Else and Bools
6. Loops
7. Functions
8. Reading a file
9. Plotting a histogram
10. Plotting a scatterplot

## How to use this notebook
- To execute a single cell (code or markdown), press Ctrl+Enter or Shift+Enter, or click the run arrow to the left of the cell (Colaboratory only).
- To reset the notebook, select "Factory reset runtime" from the Runtime menu at the top of Colaboratory.

## 1. Variables

In [ ]:
# Assign the integer 2 to the variable "a"
# Calling print(a) displays 2
# Try changing the value of a and re-running this cell with Shift+Enter
a = 2
print(a)

In [ ]:
# You can do the same with a string (text)
b = "Hello World!"
print (b)

In [ ]:
# Now assign values to 2 variables and then perform arithmetic
x = 2
y = 3.5
print("This is addition:", x + y)
print("This is subtraction:", x - y)
print("This is multiplication:", x * y)
print("This is division:", x / y)
print("This is modulus (the remainder after division):", x % y)
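
The modulus operator returns the remainder left over after division. The sketch below uses whole numbers so the remainder is easy to see. It also shows two more standard Python operators that pair naturally with modulus: `//` for floor division and `**` for powers. The names `p` and `q` are made up so that `x` and `y` are left alone.

```python
# Whole numbers make the remainder easy to see
p = 7
q = 3

print("Division:", p / q)         # true division always returns a float
print("Floor division:", p // q)  # rounds down to a whole number: 2
print("Modulus:", p % q)          # remainder after floor division: 1
print("Power:", p ** 2)           # 7 squared: 49

# Floor division and modulus fit together: q * (p // q) + (p % q) == p
print(q * (p // q) + (p % q) == p)  # True
```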

In [ ]:
# Can you calculate the average of x and y?
# Store the average of x and y in a variable and print. Type your code below and ctrl+enter or shift+enter to execute.



In [ ]:
# Can you calculate the factorial of 5?
# Store the factorial of 5 in a variable and print. Type your code below and ctrl+enter or shift+enter to execute.



## 2. Lists
Lists are ordered collections that can hold many values.

In [ ]:
# Lists can contain either a single type of variable...
listA = [1, 2, 3, 4, 5]

# ... or several different types
listB = [1, "Hello World!", 3.5, ['List', 'In', 'List']]

print("List A:", listA)
print("List B:", listB)

In [ ]:
# As you saw from listB, a list can contain other lists. A "list of lists" can represent a matrix, if you are familiar with linear algebra
matrixC = [[1, 0, 1], [0, 3, 0], [2, 0, 2]]
print(matrixC)

### Lists can be "indexed", meaning we can use the position of the values in a list to retrieve them.

In [ ]:
# Return the 3rd value of listA
listA = [1, 2, 3, 4, 5]
A3 = listA[3]
print(A3)

In [ ]:
# Oops, it looks like when we use the index "3", the returned value is the 4th value of listA.
# This is because Python starts list indices at "0" and not "1".
A3 = listA[2]
print(A3)

In [ ]:
# We can even grab a range of values (a "slice"). Python includes every value up to, but not including, the end index.
# Let's get the first 3 values of listA.
print( listA[0:3] )

In [ ]:
# Is there a simple way to get the last value without counting the number of items in a list?
# Yes! Use negative indices, basically counting backwards.
print( listA[-1] )
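
Slices have a few handy shortcuts. You can leave out the start or end index, count from the end with negative numbers, or add a third "step" value. The sketch below uses a separate, made-up list `nums` so that `listA` is left unchanged.

```python
nums = [10, 20, 30, 40, 50]

print(nums[:3])    # omit the start: [10, 20, 30]
print(nums[2:])    # omit the end: [30, 40, 50]
print(nums[-2:])   # the last two values: [40, 50]
print(nums[::2])   # every second value: [10, 30, 50]
print(nums[::-1])  # the whole list reversed: [50, 40, 30, 20, 10]
```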

In [ ]:
# You can add a value to the end of a list with the list method "append".
listA.append(6)
listA

In [ ]:
# Now try to get the value in row 2, column 2 (the middle value) of matrixC. If you did not modify matrixC, the value is 3.
# Each index selects one level of the nested list, so a value in a matrix is referenced as variable[row][column] (remember that indices start at 0).

matrixC = [[1, 0, 1], [0, 3, 0], [2, 0, 2]]

# Type your code below and ctrl+enter or shift+enter to execute.




## 3. Strings
Strings are a special type of variable that represents text. They can be stored, indexed, and even used in training machine learning algorithms!

In [ ]:
# Store a sentence as a string variable
text = "This is a sentence stored in the variable text."

# In a notebook, you can display the value of a variable by writing its name on the last line of a cell.
text

In [ ]:
# What if I want a part of the string? Strings are similar to lists and can be indexed.
# Let's get the 3rd character of the string
text[2]

In [ ]:
# But what if I want an entire word? You can use a range index to retrieve a word.
# Let's grab the word "sentence"
text[10:18]
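
Counting characters by hand to find `10:18` is error-prone. The built-in string method `str.find` returns the index where a substring starts, or -1 if the substring is absent. This lets you compute the slice bounds instead of counting them.

```python
text = "This is a sentence stored in the variable text."

word = "sentence"
start = text.find(word)     # index of the first character of "sentence": 10
end = start + len(word)     # one past the last character: 18
print(start, end)
print(text[start:end])      # sentence
print(text.find("banana"))  # -1 means the substring was not found
```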

In [ ]:
# Sometimes you want to know the length of a string or a list; the built-in function "len" does this.
print("The text variable has the following number of characters.")
len(text)

In [ ]:
# What about combining sentences?
# String concatenation (joining text) in Python uses the + operator
"Hello" + " World" + "!"
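
The `+` operator only joins strings with other strings, so a number has to be converted first. This sketch shows the error, the `str()` fix, and an f-string (a standard Python feature that converts values for you). The variable `count` is a made-up example.

```python
count = 3

# "apples: " + count raises a TypeError because + cannot join a str and an int
try:
    message = "apples: " + count
except TypeError as err:
    print("TypeError:", err)

# Convert the number to a string first...
message = "apples: " + str(count)
print(message)

# ...or use an f-string, which converts the value inside the braces
message_f = f"apples: {count}"
print(message_f)
```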

## 4. Dictionaries
A dictionary stores key:value pairs. With a list, you retrieve values by their numerical position; with a dictionary, you retrieve them by key (often a string, but numbers work too). Dictionaries are one of the most useful variable types. Python implements them as hash tables (also called hashmaps), a very common data structure that keeps key lookups fast even in large collections.
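
The sketch below shows the everyday dictionary operations. It looks up a value by key, checks whether a key exists with `in`, and uses `.get` to supply a default instead of raising a `KeyError`. It also adds a new key by assignment. The names and ages are made up for illustration.

```python
ages = {"Alice": 31, "Bob": 27}

print(ages["Alice"])         # look up a value by its key: 31
print("Carol" in ages)       # check whether a key exists: False
print(ages.get("Carol", 0))  # .get returns the default instead of raising KeyError: 0

ages["Carol"] = 45           # assigning to a new key adds it
print(len(ages))             # 3
```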

In [ ]:
# A dictionary is written as comma-separated key:value pairs inside curly braces.
dictA = { "key1" : 1,
          "key2" : 2,
          "key3" : "Hi Mom!",
          "key4" : 16.55
        }

print(dictA)

In [ ]:
# Values are retrieved by their key, using the same square brackets as list indexing.
# Let's get "Hi Mom!"
dictA["key3"]

In [ ]:
# What if I want to access the keys of the dictionary?
dictA.keys()

In [ ]:
# And just the values?
dictA.values()

In [ ]:
# What if we want to use numerical indices?
# Since Python 3.7, dictionaries preserve insertion order, but they cannot be indexed by position directly; convert the values to a list first
list(dictA.values())[2]

In [ ]:
# Now practice by using the following dictionary dictB and add the two strings together into a sentence.
# Your output should be "Hello World!"

dictB = { "number_1" : 1,
          "text_1" : "Hello",
          "number_2" : 5.5,
          "text_2" : " World!"
        }

# Type your code below and ctrl+enter or shift+enter to execute.




In [ ]:
# Hard mode: retrieve the 3rd key of dictB
# Type your code below and ctrl+enter or shift+enter to execute.




## 5. If/Else and Bools
1. If/else statements let a program choose between different blocks of code based on a condition. Hand-written rules built from these statements are the backbone of heuristic (rule-based) artificial intelligence.
2. Boolean values are either True or False. A condition evaluates to a boolean, and that value decides which branch of an if/else statement runs.
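
Comparisons produce the two boolean values, True and False. An `if` statement also accepts other values and treats them as True or False, which is often called "truthiness". The sketch below checks a few common cases with the built-in `bool()`. The list `samples` is a made-up example.

```python
# Comparisons produce one of the two boolean values
is_bigger = 5 > 3
print(is_bigger, type(is_bigger))  # True <class 'bool'>

# Zero, the empty string, and the empty list count as False;
# non-zero numbers and non-empty strings or lists count as True
samples = [0, 7, "", "hi", [], [1]]
truthiness = [bool(s) for s in samples]
print(truthiness)  # [False, True, False, True, False, True]
```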

In [ ]:
# Below is a conditional statement using if/else. Yes, this is actually (very simple) artificial intelligence!
# Try changing the value of December and see what output you get!
December = 12

if December == 12:
    print ("Yes December is the 12th month.")
else:
    print ("December is not the 12th month.")

In [ ]:
# OK, hold on a minute: what is the double equal sign (==)?
# == is a comparison operator: it checks whether the left and right sides are equal and returns a boolean (True or False).
print("This is a false statement and should evaluate to False:", 6 == 5)
print("This is a true statement and should evaluate to True:", 5 == 5)

In [ ]:
# What other types of boolean operators are there?
print("This is the OR operator (or) and should evaluate to True:", (6 == 5) or (5 == 5))
print("This is the AND operator (and) and should evaluate to False:", (6 == 5) and (5 == 5))
# The symbols | and & also work on True/False values; pandas uses them for element-wise conditions
print("This is the less than operator and should evaluate to True:", 5 < 6)
print("This is the greater than operator and should evaluate to False:", 5 > 6)
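
Python has a few more comparison and logical operators that come in handy when writing conditions. The sketch below collects the results in a list called `checks`, a made-up name, so they can be printed together.

```python
n = 6

checks = [
    n != 5,       # "not equal to": True
    n >= 6,       # "greater than or equal to": True
    n <= 5,       # "less than or equal to": False
    not (n > 5),  # "not" flips True to False: False
    0 < n < 10,   # chained comparison, same as (0 < n) and (n < 10): True
]
print(checks)     # [True, True, False, False, True]
```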

In [ ]:
# Basically we can combine boolean operators to make some really flexible and complex logic.
# Try changing the value of "a" below and see what output you get.

a = 6

if a == 5:
    print ("a is 5.")
elif a > 5: # elif is "else if" and is used to add conditions to the if/else conditional tree.
    print ("a is greater than 5.")
else:
    print ("a is less than 5.")

In [ ]:
# Practice time! Use NESTED if statements to print "YES" when a, b, and c all evaluate to True and "NO" when any of them is False (non-zero numbers count as True).
a = 5
b = 6
c = True

# Type your code below and ctrl+enter or shift+enter to execute.




## 6. Loops
1. For loops iterate through a specific range of values or through the items of a list.
2. While loops keep executing as long as their condition is True. BE CAREFUL! If the condition never becomes False, the loop runs forever and your notebook will hang. :(
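
The built-in `range` accepts up to three arguments: `range(start, stop, step)`. As with list slicing, the stop value is excluded. Wrapping a range in `list()` shows the values a for loop would visit. The variable names below are made up.

```python
# range(start, stop, step): the stop value is excluded, just like list slicing
from_zero = list(range(4))         # start defaults to 0
by_threes = list(range(2, 10, 3))  # a step of 3
countdown = list(range(5, 0, -1))  # a negative step counts down

print(from_zero)  # [0, 1, 2, 3]
print(by_threes)  # [2, 5, 8]
print(countdown)  # [5, 4, 3, 2, 1]
```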

In [ ]:
# This for loop iterates through a range of values and prints each one.
for i in range(0,4):
    print(i)

In [ ]:
# You can also iterate through a list
listA = [0, 1, 1, 2, 3, 5, 8]
for i in listA:
    print(i)

In [ ]:
# You can iterate with both the index AND the value of a list
listB = ["Hello", "World", "!"]
for i, j in enumerate(listB):
    print(i, j)

In [ ]:
# You can even iterate through a dictionary
dictA = { "key1" : 1,
          "key2" : 2,
          "key3" : "Hi Mom!",
          "key4" : 16.55
        }

for k in dictA:
    print ("Key:", k)
    print ("Value:", dictA[k])
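
Looping over a dictionary gives you its keys, so the loop above looks each value up a second time. The dictionary method `.items()` yields (key, value) pairs directly. The sketch below also collects the pairs in a made-up list `pairs`.

```python
dictA = {"key1": 1, "key2": 2, "key3": "Hi Mom!", "key4": 16.55}

# .items() yields (key, value) pairs, so no second lookup is needed
pairs = []
for k, v in dictA.items():
    print("Key:", k, "Value:", v)
    pairs.append((k, v))
```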

In [ ]:
# While loops are typically combined with a conditional statement or a bool
a = 4
i = 0
while (i <= a):
    print(i)
    i = i + 1  # Remember to increment i: once i is greater than a, the condition is False and the loop ends. Without this line it would run forever!

In [ ]:
# You can use a conditional to "break" a loop early and end the loop
a = 4
i = 0
while(True):
    print(i)
    i = i + 1
    if (i > a):
        break
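
Alongside `break`, Python has `continue`. It skips the rest of the current pass through the loop and moves on to the next value, rather than ending the loop. The list `kept` is a made-up name.

```python
# "continue" skips the rest of the current pass and moves on to the next value
kept = []
for i in range(6):
    if i == 3:
        continue  # skip 3 only; the loop keeps going
    kept.append(i)
print(kept)  # [0, 1, 2, 4, 5]
```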

In [ ]:
# Practice time! Using either a for loop or a while loop, print only the odd values of the following list.
listC = [1, 2, 3, 4, 5, 6, 7]

# Type your code below and ctrl+enter or shift+enter to execute.




## 7. Functions
Functions are reusable pieces of code. Except in special cases (such as names declared with the `global` keyword), variables created inside a function exist only within that function; use a return statement to pass a value back to the caller.
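
Here is a small sketch of that scoping rule. The function name `double` and the variable names are made up. The returned value is available outside the function, but the local variable `result` is not.

```python
def double(value):
    result = value * 2  # "result" is local: it only exists while double() runs
    return result       # returning it is how the value gets out

answer = double(21)
print(answer)  # 42

# Outside the function, the local name does not exist
try:
    print(result)
except NameError:
    print("result is not defined outside double()")
```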

In [ ]:
# Let's write a function to add two numbers and then return the result.
def myfunc(x, y):
    """Returns the sum of x and y"""
    return x + y

In [ ]:
# To call the function, write its name followed by the arguments in parentheses. You can also store the result in a new variable
z = myfunc(2, 3)
print("Sum is:", z)

In [ ]:
# Practice time! Create a function that takes a list as the first input and an integer as the second, and returns a new list of every INDEX where the value at that index equals the integer
# Finish creating the function below
def match_integers(listA, b):
    """
    This function returns all indices of listA where the value at that index matches integer b
    """

    return []  # replace [] with the list of matching indices

# Execute function
# The statement below should print [1, 2]
print(match_integers([1, 2, 2, 3, 4, 5], 2))

## 8. Reading a file
One of the most powerful things you can do with Python is read a file, manipulate the data, and then store the results in a new file. This lets you automate day-to-day tasks that would otherwise take minutes or hours of manual work.
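
pandas (introduced below) is the usual tool for tabular data, but plain Python can already read, manipulate, and write files. The sketch below uses only the standard library. The file names `prices.txt` and `prices_doubled.txt` are hypothetical and are created in a temporary folder so nothing on your machine is overwritten.

```python
import os
import tempfile

folder = tempfile.mkdtemp()                           # a scratch folder for this example
in_path = os.path.join(folder, "prices.txt")          # hypothetical input file
out_path = os.path.join(folder, "prices_doubled.txt") # hypothetical output file

# Write a small input file: one number per line
with open(in_path, "w") as f:
    f.write("100\n250\n75\n")

# Read it back; float() ignores the trailing newline on each line
with open(in_path) as f:
    prices = [float(line) for line in f]

# Manipulate the data
doubled = [value * 2 for value in prices]

# Store the results in a new file
with open(out_path, "w") as f:
    for value in doubled:
        f.write(f"{value}\n")

with open(out_path) as f:
    print(f.read())
```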

Now let's read a file of Boston housing data and see what we can learn from some light exploration. First we must import a new library called "pandas". pandas is one of the most widely used data exploration and manipulation libraries in Python, and it is a frequently listed skill in data science job postings on LinkedIn, Indeed, and similar sites.

In [ ]:
# Let's read the csv file as a pandas data frame
import pandas as pd

url = 'https://raw.githubusercontent.com/jzhangab/DS101/master/1_Data/housing.csv'
df = pd.read_csv(url, sep = ',')

In [ ]:
# We can use the following command to peek at the first 5 rows of the dataframe
df.head()

In [ ]:
# How many data points do we have in this dataset?
len(df)

In [ ]:
# What is the average value of property in Boston in 2019?
# We can reference a specific column in a dataframe similar to how we do it in a dictionary.
df['Value'].mean()

In [ ]:
# We can also retrieve items in a column by position
# .iloc[1] grabs the 2nd value from the Value column (positions start at 0)
df['Value'].iloc[1]

In [ ]:
# In pandas dataframes we can subset the dataframe using conditions
# The following grabs all properties with values less than 1 million dollars and stores the result in a new dataframe df_1M
df_1M = df.loc[df['Value'] < 1000000]
df_1M.head()
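
Conditions can be combined when subsetting a dataframe. Each comparison produces a Series of True/False values, and `&` (and) or `|` (or) combines them element by element. The keyword `and` does not work here: pandas raises a ValueError because a whole Series has no single truth value. The tiny dataframe below and its columns `price` and `rooms` are made up for illustration; they are not the housing data.

```python
import pandas as pd

# A tiny made-up table (hypothetical columns, not the housing data)
demo = pd.DataFrame({"price": [50, 120, 300, 80],
                     "rooms": [1, 3, 5, 2]})

# The parentheses are required because & binds more tightly than > and <
mask = (demo["price"] > 60) & (demo["rooms"] < 5)
subset = demo.loc[mask]
print(subset)  # keeps the rows with index 1 and 3
```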

In [ ]:
# Practice Time!
# Sometimes it's necessary to remove unusual data points; let's remove all properties where the Size is 0
# Type in your code below



## 9. Plotting a Histogram
It can often be useful to visualize data by plotting it. Histograms are super useful because they give us a quick glance at the shape/concentration of the distribution.
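
Under the hood, a histogram splits the range of the data into intervals (bins) and counts how many values land in each one. The sketch below does this with equal-width bins in plain Python, similar in spirit to the binning `df.hist` performs. The function name `histogram_counts` and the sample data are made up. As in NumPy's convention, the maximum value is placed in the last bin.

```python
def histogram_counts(values, bins):
    """Count how many values fall into each of `bins` equal-width intervals."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        # Which interval does v fall into? Guard against all-equal data (width 0).
        index = int((v - low) / width) if width > 0 else 0
        # The maximum value would land one past the end, so clamp it to the last bin
        counts[min(index, bins - 1)] += 1
    return counts

data = [1, 2, 2, 3, 3, 3, 4, 4, 9]
print(histogram_counts(data, 4))  # bins [1,3), [3,5), [5,7), [7,9]: [3, 5, 0, 1]
```

Try changing the number of bins: fewer bins hide detail, and more bins spread the counts thinly.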

In [ ]:
# Let's read the csv file as a pandas data frame
import pandas as pd

url = 'https://raw.githubusercontent.com/jzhangab/DS101/master/1_Data/housing.csv'
df = pd.read_csv(url, sep = ',')

In [ ]:
# Let's plot histograms of the Boston property data
# You can set the number of bins in each histogram with the bins parameter
# Let's also only consider properties valued above $0 and below $2 million

df = df.loc[(df['Value'] > 0) & (df['Value'] < 2000000)]

# Play around with the number of bins to see how the data is separated
%matplotlib inline
df.hist(bins = 10)

In [ ]:
# Practice Time!
# We subsetted the data using ranges of Value; now let's try to keep only houses that were built AFTER 1900
# Below subset the data to remove the Year Built outliers older than 1900 and then replot the histograms




## 10. Plotting a Scatterplot
Scatterplots are very useful for quickly visualizing the relationship between two factors.
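
A scatterplot lets you see a relationship; the Pearson correlation coefficient puts a number on how closely the points follow a straight line. It ranges from -1 (a perfect downward line) to 1 (a perfect upward line). The sketch below is a plain-Python version. The function name `pearson_r` and the sample points are made up. For dataframes, pandas computes the same quantity with `Series.corr`.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: do x and y move above/below their means together?
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

years = [1900, 1950, 2000]  # hypothetical points, not the housing data
values = [100, 200, 300]
print(pearson_r(years, values))  # very close to 1.0: a perfect upward line
```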

In [ ]:
# Let's read the csv file as a pandas data frame
import pandas as pd

url = 'https://raw.githubusercontent.com/jzhangab/DS101/master/1_Data/housing.csv'
df = pd.read_csv(url, sep = ',')

In [ ]:
# Let's plot a scatterplot of Year Built vs. Value
%matplotlib inline
df.plot.scatter(x = 'Year Built',
                y = 'Value')

In [ ]:
# Practice Time!
# It seems like there are a lot of outliers above $10,000,000
# Subset the data to values below $10,000,000 and replot below




# CONGRATS!
### You have finished Python basics in under 1 hour! Remember that you don't have to memorize syntax; if you are not sure how to do something in code, use trusty Google + Stack Overflow.
## Now get ready to dive into Machine Learning!