# Introduction to Python
---

*Updated 5/12/2023*

This course teaches you how to perform the data analysis tasks that a variety of workflows require. You will learn how to combine datasets, calculate aggregated values, remove bad data, and create visuals. We will use the Pandas library, which is built for Python. This lesson focuses on basic Python syntax that applies to almost everything you will do in Python. We assume minimal prior knowledge of programming or Python syntax.

## Basic Operations
Many basic operations can be performed through common mathematical operators (like `+`, `-`, and `*`). We discuss many of these below.

### Variables
A critical part of any script or workflow is the ability to save information for later use. The most common way to do this is with **variables**. In Python, we use the equals (`=`) operator, with the variable name on the left and the value on the right. See the example below. Saving a value to a variable name like this allows us to modify, reference, and use the value in a later section of the code.

In [None]:
# <- this symbol starts a comment. Feel free to include as many comments as
# needed to ensure that you always understand your code whenever you come back
# to it.
x = 5
y = 3
sum_of_x_and_y = 8

print(x)
print(y)
print(sum_of_x_and_y)

In [None]:
print(x)
print(y)

### Basic Math Functions
Python follows a fairly intuitive syntax for performing mathematical operations. Each of these operators accepts one value on the left and one on the right. If both values are numeric, Python performs the given operation. Below are examples of some basic mathematical operations.

In [None]:
# Addition
print(x + y)

# subtraction
print(x - y)

# multiplication
print(x * y)

# division
print(x / y)

# exponentiation (x to the power of y). Python uses **, not ^
print(x ** y)

There are some other operators that have specific symbols. We can talk about those as you need them. Note that Python follows the traditional order of operations for mathematical expressions. A common mnemonic used to remember the order is PEMDAS, which suggests the order of
1. **P**arentheses
2. **E**xponents
3. **M**ultiplication and **D**ivision
4. **A**ddition and **S**ubtraction

In [None]:
'hi' + 'friend'*2
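
To make the order of operations concrete, here is a short sketch with numbers, followed by the same string expression as the cell above. The variable names are hypothetical.

```python
# Hypothetical variable names, used only for this illustration
a = 2 + 3 * 4      # multiplication happens first: 2 + 12
b = (2 + 3) * 4    # parentheses happen first: 5 * 4
c = 2 * 3 ** 2     # the exponent happens first: 2 * 9

# With strings, * repeats and + joins, and * still happens before +
greeting = 'hi' + 'friend' * 2

print(a, b, c)   # 14 20 18
print(greeting)  # hifriendfriend
```

When in doubt, adding parentheses makes the intended order explicit and costs nothing.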

### Calling Functions
Functions are pre-written code that you can call at any time to execute the same code over and over without rewriting it. Some examples include
- `print("Some text goes here")`: prints text to the screen (see above)
- `open(filename)`: opens a file, for example to read in a text file or a dataset
- `abs(number)`: returns the absolute value of a number

In [None]:
print("Hello World")
print(abs(-34), "+", abs(49), "!=", abs(-34 + 49))

The syntax here is important. The parentheses enclose the information or values that need to be sent to the code behind the function. These values are called arguments. Functions can receive positional arguments, where the order of the arguments matters, or keyword arguments, where you assign a value to a named parameter of the function. Examples of both can be seen below, and will be called out when encountered.
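
As a minimal sketch of the difference, the cell below calls a few built-in functions with positional arguments only, and then with an added keyword argument. The variable names are hypothetical.

```python
values = [3, 1, 2]  # hypothetical example data

# Positional argument only: the list is the first (and only) argument
ascending = sorted(values)

# Positional argument plus a keyword argument named `reverse`
descending = sorted(values, reverse=True)

print(ascending)   # [1, 2, 3]
print(descending)  # [3, 2, 1]

# print accepts any number of positional arguments; `sep` is a keyword argument
print("length", "width", sep=" x ")  # length x width

# int takes the text positionally and the number base as a keyword
hex_value = int("ff", base=16)
print(hex_value)   # 255
```

Keyword arguments are written as `name=value`, so their order among themselves does not matter, but they must come after any positional arguments.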

**Pro tip:** Always search for an existing function before trying to write the code yourself. If you need it, it's very likely that someone else has also needed it and has already written the code. The most common resource for these functions will be the Libraries and Packages described below.

## Libraries and Packages
(WARNING: These two words have slightly different meanings, but I will (accidentally) be using them interchangeably)

Python is an open source programming language, and has a *very* large community of developers. There are libraries for pretty much anything that will make your life easier if you know how to use them. Using the libraries is simple. 
1. Make sure the package is installed on your computer. Do this by running the command `pip install [package_name]` in a command line environment (Bash, PowerShell, CMD, etc.). If you are using a package and environment manager like `poetry` or `conda`, make sure to use the approved method by that package manager when installing new packages.
2. Import the functionality that you need into your script or notebook.

Below is an example of how to import code from a package.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets

Given that there are some complications in exactly how code is brought into your script, we won't go into too much detail about exactly what is happening. The basic keywords are defined below.
- `import`: This keyword brings everything from the following module into your code. In the first example, we will have access to the entirety of `pandas` in our code. In the second example we import everything from a subset of `matplotlib`, specifically `pyplot`.
- `as`: This keyword gives the package an alias. This is important when calling the functions or classes. Instead of having to type `pandas.DataFrame` every time you want to create that class, you only need to type `pd.DataFrame`. Optional, but recommended for your sanity, as it can add up quickly when you import many different libraries.
- `from`: This keyword will always need to be paired with the `import` keyword; however, it is a different way to subset the import feature. It will only import the explicit things listed in the comma-separated list after the `import` keyword. Importing only what is necessary keeps your code tidy and makes it clear which parts of a library you rely on.
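
The three keywords can be tried out without installing anything. The sketch below uses `math` and `statistics` from Python's standard library instead of the packages above, and the alias `stats` is just a hypothetical choice.

```python
import math                  # whole module, accessed as math.<name>
import statistics as stats   # same idea, but with a shorter alias
from math import sqrt, pi    # only the listed names, usable without a prefix

print(math.floor(2.7))           # 2
print(stats.mean([1, 2, 3, 4]))  # 2.5
print(sqrt(16))                  # 4.0
print(round(pi, 2))              # 3.14
```
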

Now that we have these functions at our disposal, we can write code faster. Below is an example of importing a specific dataset from scikit-learn, a machine learning library, saving it to a dataframe, and plotting its data as a scatter plot.

In [None]:
iris = datasets.load_iris()
# Save to dataframe (table) 
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Print some information about the dataframe
print(iris_df.head())
print(iris_df.describe())

In [None]:
# save parts of our dataframe to distinct variables
sepal_length = iris_df['sepal length (cm)']
sepal_width  = iris_df['sepal width (cm)']
petal_length = iris_df['petal length (cm)']
petal_width  = iris_df['petal width (cm)']

In [None]:
# let's plot some data
fig = plt.figure()
# Note the arguments on scatter. The first two are positional arguments, and the last 3 are keyword arguments.
plt.scatter(sepal_length, sepal_width, c='Red', marker='.', label='sepal')
plt.scatter(petal_length, petal_width, c='Blue', marker='x', label='petal')

plt.legend()
plt.xlabel("length (cm)")
plt.ylabel("width (cm)")
plt.title("Iris Dataset")
plt.show()

Using code that other people have already written, we are able to fairly quickly and effectively create a graphic to start our visual exploration of the data. 

However, there is rarely a single way to implement a solution in python. 

In [None]:
# Note this variation on scatter where all arguments are assigned to keywords. 
ax1 = iris_df.plot.scatter(x="sepal length (cm)", y="sepal width (cm)", color="red", marker='.', label="sepal")
iris_df.plot.scatter(x="petal length (cm)", y="petal width (cm)", color="blue", marker='x', ax=ax1, label="petal")

ax1.set_xlabel("length (cm)")
ax1.set_ylabel("width (cm)")
ax1.set_title("Iris Dataset");

In the first example, we extracted the data from the dataframe, saved it to variables, then created a scatter plot, passing in those columns as arguments. In the second, we utilized the fact that the `DataFrame` object has plotting functionality as a method of the object. Depending on the programmer, the team, or even the workflow, one method might make more sense than the other, based on readability, workflow, or optimization. The decision as to which approach to use is left to the individual programmer, with their style and needs.

## Control Flow
When building a workflow (especially in the context of data analysis), some actions will need to be performed only when certain conditions are met. Other times, we want to perform the same action multiple times with slightly different data. Below, we discuss the concepts of boolean values, `if` statements, and `for` and `while` loops.

### Boolean
Boolean is a data type which essentially represents a `True` or a `False` statement. The keywords themselves can be used as the value to a variable or as the input to some function, but often times, boolean values are best represented as logical statements. Recall from the beginning of the file, we saved values to `x`, `y`, and `sum_of_x_and_y`. We can check that the addition was performed correctly simply by using the "is equal" (`==`) operator. 

In [None]:
x + y == sum_of_x_and_y

This comparison operator (as well as many others) will turn a set of statements into a single `True` or `False` value. This is most commonly combined with `if` statements to determine if code should be executed or with loops to determine when the loop should end. Other comparison operators include: 
- `!=`: not equal
- `>`: greater than
- `>=`: greater than or equal to
- `<`: less than
- `<=`: less than or equal to

Combining these operators with parentheses and keywords like `not`, `and`, and `or` will enable you to create complex logical checks for whatever situation you might encounter. There are other keywords and functions that can be used as well, which fall outside the scope of this lesson.
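
As a short sketch of how comparison operators combine with `not`, `and`, and `or`, consider the checks below. The variables and thresholds are hypothetical and exist only for this illustration.

```python
# Hypothetical values, used only for this illustration
temperature = 72
is_raining = False

# `and` is True only when both sides are True
is_mild = temperature >= 60 and temperature <= 80

# `not` flips a boolean value
go_outside = is_mild and not is_raining

# `or` is True when at least one side is True
stay_inside = is_raining or temperature < 32

# Parentheses group checks so they are evaluated together first
needs_jacket = (temperature < 60 or is_raining) and not stay_inside

print(is_mild, go_outside, stay_inside, needs_jacket)  # True True False False
```
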

### If Statements
Armed with the ability to evaluate logical statements, we can now selectively run code based on whether a condition has been met. There are 3 keywords used for `if` statements.
- `if`: This opens the if block. This keyword expects a boolean value or a logical statement (which computes a boolean value) immediately after it. Parentheses are not required around the condition, but they can be used to group multiple statements together (for example, `(A and B) or C`). After the condition, be sure to include a colon (`:`) to enter the if block. The `elif` and `else` lines also end with a colon.
- `elif`: This is often also read as "else if". This will always be used in combination with an `if` statement just above it. The `elif` keyword allows for an alternate condition to be checked after the first condition returns false, without nesting those conditions within an `else` statement. The boolean value should follow the keyword, just like with `if` statements. Multiple `elif` statements can be used, but note that the order of the `if` and `elif` checks is important. When one is found to be true, all other checks will be skipped and ignored.
- `else`: If all other conditions fail, whatever code is placed in the else block will be executed. It *does not* accept any condition, and will fail with a syntax error if one is included. Like `elif`, an `else` block is not required, but will be helpful in cases where it is needed.
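
To illustrate why the order of the `if` and `elif` checks matters, here is a small sketch that runs the same two checks in two different orders. The score and the grade cut-offs are hypothetical.

```python
score = 95  # hypothetical value

# Most restrictive check first: 95 matches `score >= 90` and gets an "A"
if score >= 90:
    grade = "A"
elif score >= 80:
    grade = "B"
else:
    grade = "C"
print(grade)  # A

# Same checks in the wrong order: 95 also satisfies `score >= 80`,
# so the first check wins and the "A" check is never reached
if score >= 80:
    misordered_grade = "B"
elif score >= 90:
    misordered_grade = "A"
else:
    misordered_grade = "C"
print(misordered_grade)  # B
```
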

*Note:* While whitespace (spaces, tabs, and new lines) is often ignored, there are a handful of places where it is a required part of the syntax. Python uses indentation (by convention, 4 spaces) to define what is "inside" the if statement. The syntax is demonstrated in some examples below. Other blocks require this indenting syntax to define scope, and will be called out when discussed.

In the below example, we define a date variable, and check to see what day of the week it is.

In [None]:
from datetime import datetime
date = datetime.today()
# date = datetime(2023,6,7)
print(date)
print(date.weekday())

if date.weekday() == 4:
    # Note here that the indent (4 spaces) is required to define that this print statement is "inside" the if block.
    print("🎵 It's Friday! 🎵")
    
elif date.weekday() == 2:
    # Identically for elif and else
    print("Hump day!")

else:
    print("Boo!!")

### Loops (for/while)
Sometimes, you have some specific steps you need to do multiple times in a row. Python offers 2 types of loops: `for` and `while`.
- `for` loops are best used when you have a specific set of values that you know you want to iterate over. This could be a specific range of numbers, or a list of input data.
- `while` loops are best used in situations where you only want to leave once a certain condition has been met, but you do not necessarily know how many iterations will be needed.

The syntax is very similar to that of the `if` blocks mentioned above, that being:
1. keyword
2. how to loop (condition for `while` loops, or iterator for `for` loops)
3. colon (`:`)
4. indent 4 spaces
5. code to be run inside the loop.

#### Nuances of `for` Loops
The `for` loop depends on objects that can be iterated over. Python's formal name for such an object is an *iterable*; this lesson loosely calls it an "iterator." In simple terms, it is an object that has multiple values in it, and has the ability to pass a single value at a time. Below is an example of a `for` loop.

In [None]:
sepal_length

In [None]:
for x in sepal_length:
    print(x)

Let's break down the syntax of the above.
- We start with the `for` keyword. This tells Python that we are going to write a `for` loop, and to expect a specific syntax.
- We next need to provide a variable name to save values to. It is also possible to loop with 2 or more variables, but that is a bit more advanced and is out of the scope of this lesson. In this example, we use the variable `x`.
- The `in` keyword comes next. This is the operator that connects the variable `x` to the list-like object that follows.
- The last thing needed is the actual iterator. Commonly, iterators will be in the form of lists, dictionaries, pandas columns, dataframe rows, numpy arrays, or even specific functions like `range`. When used in a `for` loop, the indented code following the colon will be executed once for each subsequent value of the iterator. In the above, we see that the `print` statement puts one value from `sepal_length` on the screen at a time. Specifically, it does so in the order present in `sepal_length`. This is critical if your current iteration depends on previous iterations or if the ordering is important for any reason.
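
As a sketch of a loop where each iteration depends on the previous ones, the cell below keeps a running total over a short list and then loops over `range`. The list `readings` and the other names are hypothetical.

```python
readings = [4.0, 2.5, 3.5]  # hypothetical example data

# Each iteration builds on the result of the previous one
running_total = 0
for value in readings:
    running_total += value
    print(value, running_total)

# range(1, 4) produces 1, 2, 3 (the stop value is not included)
squares = []
for n in range(1, 4):
    squares.append(n ** 2)

print(squares)  # [1, 4, 9]
```
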

This course provides several chances to practice `for` loops and get comfortable with the syntax and its nuances.

#### Nuances of `while` Loops
While loops are a type of loop that will execute as many times as needed until a condition is met. For example, if you are running some sort of numeric simulation, and you need to run the simulation until a certain metric reaches a threshold, you could use a `while` loop to check at every iteration whether the threshold has been passed. Below is an example of a `while` loop.

In [None]:
x = 0
while x < 10:
    x += 1
    print(x)

Most of the syntax is identical to that of a `for` loop, with the primary difference being that instead of passing in a variable and an iterator, we provide a boolean statement. In this case we are passing in `x < 10`. Given the code within the `while` loop, we know that `x` will approach 10 with each iteration. The condition is checked at the beginning of each iteration, and if the value of the boolean statement is `True`, then the iteration will execute. When (or if) the condition evaluates to `False`, no more iterations will be executed. This can create 2 unexpected situations if the logic is not written correctly.
1. Providing a statement that is already `False` on the first check. In that case the loop body is skipped entirely, much like an `if` block whose condition is `False`. Whether this happens depends on the context and the code provided before the `while` loop.
2. Providing a statement that is always `True`. One example is a statement like `x > 0 or x < 10`. Assuming `x` is a number, every number passes at least one of the two checks, so this boolean statement ultimately translates to "is x a number". Another bug that could cause this is checking a logical statement that never changes. In the example above, if we left out the line `x += 1`, `x` would never reach 10, and therefore the loop would never stop.
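
One common way to guard against a loop that never ends is to add a second condition that caps the number of iterations. The sketch below is a hypothetical example, not part of the course data: it grows a balance by 10% per step until it passes a target, and `max_years` is an assumed safety limit.

```python
# Hypothetical values, used only for this illustration
balance = 100.0
target = 150.0
growth_rate = 0.10
years = 0
max_years = 1000  # safety cap so the loop always ends

# Keep looping while the target has not been reached AND the cap is not hit
while balance < target and years < max_years:
    balance = balance * (1 + growth_rate)
    years += 1

print(years)  # 5
```

If `growth_rate` were accidentally set to `0`, the balance would never reach the target, but the loop would still stop once `years` reached `max_years`.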

#### Final Note
Technically, both types of loops do the same job: they repeat a block of code. The only difference is how the iterations are defined. In `while` loops, we define the exit condition with a boolean value, and in `for` loops, we define the iterator object that is being used. In most data analysis tasks, `for` loops are going to be most commonly used because the dataset (and therefore the number of iterations, and what we need to iterate over) is defined at the beginning of the script. You can do all this with a `while` loop; however, the `for` loop is a bit more efficient and is easier to read.
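
To see the two loop types doing the same work, here is a sketch that builds the same list twice, once with each loop. The list `names` and the other variable names are hypothetical.

```python
names = ["sepal", "petal"]  # hypothetical example data

# for loop: Python hands us each value in turn
for_result = []
for name in names:
    for_result.append(name.upper())

# while loop: we track the position ourselves and decide when to stop
while_result = []
index = 0
while index < len(names):
    while_result.append(names[index].upper())
    index += 1

print(for_result)                  # ['SEPAL', 'PETAL']
print(for_result == while_result)  # True
```

The `while` version needs three extra pieces (the starting index, the exit condition, and the increment), and each is a place where a bug could hide.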

## Closing
There is so much more that could be covered here, such as classes, user-defined functions, more complex operators, and many other functions. Much of this will be touched on in later lessons as we use it.

## Homework Assignment: 
> Find some Python library, package, or function that you believe could help you with what you are doing as an intern at Summit. Explain in the meeting chat why you believe it would be helpful to your workflow. *Reason*: finding helpful packages is a common first step when starting a new project or solution. Doing so ensures that you avoid writing code for a workflow that has already been solved.