# Class 4: Loops, conditional statements and array computations

In this notebook we will continue learning some of the fundamentals of Python. We will also begin to learn about array computations, which are particularly useful for processing images.

## Notes on the class Jupyter setup

If you have the *ydata123_2023e* environment set up correctly, you can get the class code using the code below (which presumably you've already done given that you are seeing this notebook).  

In [None]:
import YData

# YData.download.download_class_code(4)   # get class 4 code    

# YData.download.download_class_code(4, True) # get the code with the answers 

There are also similar functions to download the homework:

In [None]:
YData.download.download_homework(1)  # downloads the first homework 

If you are using Google Colab, you should install the polars and YData packages by uncommenting and running the code below.

In [None]:
# !pip install polars
# !pip install https://github.com/emeyers/YData_package/tarball/master

If you are using Google Colab, you should also uncomment and run the code below to mount your Google Drive.

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

## Review of lists, dictionaries and statistics by looking at NBA Salaries

Let's look at the salaries of basketball players in the NBA! The data we will analyze contains information about each player, including their salary from the 2015-2016 season listed in millions of dollars.  

We will load the data as a polars DataFrame, which is a data structure we will discuss more in a couple of weeks. We will then convert the data to lists and dictionaries to explore it further. 

This table can be found online: https://www.statcrunch.com/app/index.php?dataid=1843341


In [None]:
import polars as pl

nba = pl.read_csv("nba_salaries_2015_16.csv")  # load in the data

nba.head()  # show the first 5 rows


In [None]:
# get the salaries as a list

salary_list = nba["SALARY"].to_list()
player_list = nba["PLAYER"].to_list()

salary_list[0:10]

In [None]:
# What are the maximum and minimum salaries? 
print(max(salary_list))
print(min(salary_list))

In [None]:
# What is the average salary?
salary_tot = sum(salary_list)
salary_tot/len(salary_list)

In [None]:
# we can also use the mean() and median() functions in the statistics module to get the mean and the median values

import statistics

print(statistics.mean(salary_list))
print(statistics.median(salary_list))



In [None]:
# What was Stephen Curry's salary in the 2015-2016 season? 
salary_dict = dict(zip(player_list, salary_list))
salary_dict["Stephen Curry"]
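
Dictionaries support a few other handy operations besides looking up a single key. The sketch below uses a small made-up dictionary (the player names and salaries are hypothetical, not values from the class data set) to show a lookup, a lookup with a fallback value, and one way to find the key with the largest value.

```python
# Hypothetical salaries (in millions of dollars), not taken from the class data set
example_salaries = {"Player A": 12.5, "Player B": 3.0, "Player C": 20.1}

# Look up a single value by its key
print(example_salaries["Player B"])

# .get() returns a default instead of raising a KeyError when the key is missing
missing_lookup = example_salaries.get("Player Z", "not found")
print(missing_lookup)

# max() with key=... returns the key whose value is largest
top_player = max(example_salaries, key=example_salaries.get)
print(top_player, example_salaries[top_player])
```

The same pattern should work on `salary_dict` above, since it is an ordinary Python dictionary.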

In [None]:
# Visualize a histogram of the data with vertical lines at the mean and the median
# Don't worry about this code for now. We will go over creating visualizations soon....

import matplotlib.pyplot as plt
%matplotlib inline

plt.hist(nba["SALARY"], bins=20, edgecolor='k', color = 'c');
plt.xlabel("Salary (million $)");
plt.ylabel("Count");
plt.axvline(statistics.mean(salary_list), color='r', linestyle='dashed', linewidth=1, label = "Mean");
plt.axvline(statistics.median(salary_list), color='b', linestyle='dashed', linewidth=1, label = "Median");
plt.legend();

# View the counts in the different histogram bins
import numpy as np
counts, bins = np.histogram(salary_list, bins = 10)
dict(zip(list(zip(np.round(bins[0:-1], 1), np.round(bins[1:], 1))), counts))


We will learn much easier ways to manipulate structured data tables when we learn how to use the polars package. 

## Loops

Loops allow us to repeat a process many times. They are particularly useful in conjunction with lists to process and store multiple values. 


In [None]:
a_list = ["first", "second", "third", "fourth"]

for item in a_list:
    print(item)


In [None]:
# looping over numbers using the range() function

for i in range(10):
    print(i)


In [None]:
# Can you print the squares of the numbers from 1 to 6? 

for i in range(1, 7): 
    print(i**2)

We can use a loop to build up values in a list...

In [None]:
# create a list that has the squares of the numbers 1 to 6

my_squares = []

for i in range(1, 7): 
    my_squares.append(i**2)


my_squares


How can we sum the numbers 1 to 10? Or, to use mathematical notation, how can we compute $\sum_{i=1}^{10} i$ ?


In [None]:

my_sum = 0

for i in range(1, 11):
    my_sum = my_sum + i

my_sum
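
A quick way to check the loop's answer is the closed form $\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$, which gives $\frac{10 \cdot 11}{2} = 55$ for $n = 10$. The sketch below compares the loop, the formula, and Python's built-in `sum()`; the variable names are just for illustration.

```python
n = 10

# Sum the numbers 1 to n with a loop, as above
loop_sum = 0
for i in range(1, n + 1):
    loop_sum = loop_sum + i

# Closed-form value n(n+1)/2; // is integer division
formula_sum = n * (n + 1) // 2

# The built-in sum() function applied to a range gives the same answer
builtin_sum = sum(range(1, n + 1))

print(loop_sum, formula_sum, builtin_sum)  # prints 55 55 55
```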

In [None]:
# we can use enumerate(my_list) to get both values from a list and sequential index numbers

a_list = ["first", "second", "third", "fourth"]

for i, curr_val in enumerate(a_list):
    print(str(i) + " " + curr_val)
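
By default `enumerate()` starts counting at 0. If we would rather count from 1, we can pass the optional `start` argument, as in this small sketch (the name `pairs` is only for illustration).

```python
a_list = ["first", "second", "third", "fourth"]

# start=1 makes the counter begin at 1 instead of the default 0
for position, curr_val in enumerate(a_list, start=1):
    print(str(position) + " " + curr_val)

# Collect the (index, value) pairs in a list to inspect them
pairs = list(enumerate(a_list, start=1))
print(pairs[0])  # prints (1, 'first')
```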



## Comparisons

We can do simple mathematical and string comparisons in Python, which return Boolean values.

In [None]:
# basic math comparison
3 > 1

In [None]:
# checking the type of a basic math comparison
type(3 > 1)

In [None]:
# another basic math comparison
3 < 1

In [None]:
# We can type in Boolean values ourselves
True

In [None]:
# We use == to compare whether two items are equal (not 3 = 3)
3 == 3

In [None]:
x = 14
y = 3

In [None]:
# we can compare whether a value is between two values
12 < x < 18

In [None]:
# we can also use arithmetic expressions inside a comparison
12 < x-y < 18

In [None]:
# we can use the `and` keyword to combine multiple logical statements 
x > 10 and y > 5

In [None]:
# we can also use the `or` keyword to combine multiple logical statements 
x > 10 or y > 5
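
To make the rules above concrete, the sketch below spells out what a chained comparison means and why the `and` and `or` cells give different answers for the same `x` and `y`. The variable names holding the results are just for illustration.

```python
x = 14
y = 3

# A chained comparison is shorthand for two comparisons joined by `and`
chained = 12 < x < 18
spelled_out = (12 < x) and (x < 18)
print(chained, spelled_out)  # prints True True

# `and` needs both sides to be True; `or` needs at least one
both = x > 10 and y > 5     # False, because y > 5 is False
either = x > 10 or y > 5    # True, because x > 10 is True
print(both, either)  # prints False True
```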

In [None]:
# We can also compare strings
"my string" == "my string"

In [None]:
# Strings compare alphabetically, character by character (note that uppercase letters come before lowercase ones)
"cats" < "dogs"

In [None]:
# Shorter words occur earlier than longer words that have matching letters
"cat" < "catastrophe"

## Conditional Statements 

Conditional statements allow us to execute particular pieces of code when certain conditions are met; i.e., they execute a piece of code when a Boolean value is True. 

Let's explore!

In [None]:
num_semesters = 7

if num_semesters <= 0:
    print('Not a valid input')
elif num_semesters <= 2:
    print('First Year')
elif num_semesters <= 4:
    print('Sophomore')
elif num_semesters <= 6:
    print('Junior')
elif num_semesters <= 8:
    print('Senior')
else:
    print("NA")


In [None]:
# let's look at a conditional statement in a loop

for num_semesters in range(10):
    
    print(num_semesters)

    if num_semesters <= 0:
        print('Not a valid input')
    elif num_semesters <= 2:
        print('First Year')
    elif num_semesters <= 4:
        print('Sophomore')
    elif num_semesters <= 6:
        print('Junior')
    elif num_semesters <= 8:
        print('Senior')
    else:
        print("NA")
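
A common use of a conditional inside a loop is to count or collect only the values that meet some condition. Here is a minimal sketch using a short made-up list of salaries (the numbers and the 5 million cutoff are hypothetical, not from the class data set).

```python
# Hypothetical salaries in millions of dollars (not the class data set)
example_salaries = [0.5, 12.0, 3.2, 25.0, 7.8, 1.1]

num_high = 0          # how many salaries are at least 5 million
high_salaries = []    # the salaries that met the condition

for salary in example_salaries:
    if salary >= 5:
        num_high = num_high + 1
        high_salaries.append(salary)

print(num_high)        # prints 3
print(high_salaries)   # prints [12.0, 25.0, 7.8]
```

The same loop could be run on `salary_list` from the NBA data by swapping in that list.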


## Array computations

Often we want to process data that is all of the same type. For example, we might want to do processing on a data set of numbers (e.g., if we were just analyzing salary data). 

When we have data that is all of the same type, there are faster ways to process data than using a list. In Python, the `numpy` package offers ways to store and process data that is all of the same type using a data structure called an `ndarray`. There are also functions that operate on `ndarrays` that can do computations very efficiently. 

Let's explore this now!


In [None]:
# import the numpy package
import numpy as np

In [None]:
# create an ndarray of numbers
a_list = [2, 3, 4, 5]
an_array = np.array(a_list)

an_array
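
One practical difference between a list and an ndarray is how arithmetic operators behave. The sketch below shows that `*` repeats a list but multiplies each element of an array, so with a list we would need an explicit loop to get the same result.

```python
import numpy as np

a_list = [2, 3, 4, 5]
an_array = np.array(a_list)

# Multiplying a list by 2 repeats the list
doubled_list = a_list * 2
print(doubled_list)   # prints [2, 3, 4, 5, 2, 3, 4, 5]

# Multiplying an ndarray by 2 multiplies each element
doubled_array = an_array * 2
print(doubled_array)  # prints [ 4  6  8 10]

# With a list, element-wise doubling needs an explicit loop
looped = []
for value in a_list:
    looped.append(value * 2)
print(looped)         # prints [4, 6, 8, 10]
```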

In [None]:
# we can get the type of elements in an array by accessing the dtype property
an_array.dtype

In [None]:
# get the shape of the array (its size along each dimension)
an_array.shape

In [None]:
# create an array of strings
string_array = np.array(["a", "b", "c"])
string_array

In [None]:
# get the type in the string array
string_array.dtype      # < little endian, U Unicode string, 1 = at most 1 character per element

In [None]:
# create a boolean array
boolean_array = np.array([True, True, False])
boolean_array

In [None]:
# get the type in the boolean array
boolean_array.dtype

In [None]:
# what happens if we make an array from a list of mixed values
mixed_array = np.array([1, 2, "three"])
mixed_array

In [None]:
# get the dtype 
mixed_array.dtype

In [None]:
mixed_array[0]

In [None]:
type(mixed_array[0])

In [None]:
mixed_array[0] == 1

In [None]:
mixed_array[0] == '1'
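
The cells above show that NumPy converts every element to a single common type. As a sketch of how this plays out in other cases, the example below mixes integers with a float, mixes numbers with a string, and then converts digit strings back to integers with `astype()`. The array names are only for illustration.

```python
import numpy as np

# Integers mixed with a float are all stored as floats
float_array = np.array([1, 2, 3.5])
print(float_array.dtype)   # prints float64

# Numbers mixed with a string are all stored as strings
mixed_array = np.array([1, 2, "three"])
print(mixed_array.dtype.kind)   # prints U (Unicode string)

# Strings that look like integers can be converted back with astype()
digit_strings = np.array(["1", "2", "3"])
as_ints = digit_strings.astype(int)
print(as_ints + 1)   # prints [2 3 4]
```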

## NumPy functions on numerical arrays

The NumPy package has a number of functions that operate very efficiently on numerical ndarrays.

Let's explore these functions by looking at the price of gas!

The data comes from: https://www.eia.gov/opendata/v1/qb.php?category=240692&sdid=PET.EMM_EPM0_PTE_NUS_DPG.W


In [None]:
all_gas_prices = pl.read_csv("US_Gasoline_Prices_Weekly.csv", parse_dates=True)  # load in the data
all_gas_prices.head()

In [None]:
# Get an ndarray of the gas prices from each week of 2022
# You can ignore this code for now...

from datetime import datetime

gas_prices_2022 = np.array(all_gas_prices.with_column(pl.col('Week').str.strptime(pl.Date, fmt='%m/%d/%Y').cast(pl.Datetime)).filter(
    pl.col("Week").is_between(datetime(2022, 1, 1), datetime(2022, 12, 31), closed = "both"),
)["DollarsPerGallon"])

gas_prices_2022


In [None]:
gas_prices_2022.shape   # prices for all 52 weeks in 2022

In [None]:
# One dollar is currently 0.92 euros.
# What have gas prices been in euros?
gas_prices_2022 * 0.92

In [None]:
# what if there was a constant tax of $2 on each gallon purchased? 
gas_prices_2022 + 2

In [None]:
# basic functions of: min, max, sum, mean and median
print([np.min(gas_prices_2022), np.max(gas_prices_2022)])

In [None]:
# if you bought one gallon each week, what would you pay over the whole year? 
print(np.sum(gas_prices_2022))  

In [None]:
# what do you pay on average? 
print(np.mean(gas_prices_2022))
print(np.median(gas_prices_2022))

In [None]:
# If you bought one gallon each week, how much would you have paid in total by the end of each week of the year? 
np.cumsum(gas_prices_2022)

In [None]:
# How much does the gas price go up or down each week? 
np.diff(gas_prices_2022)
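
To see exactly what `np.cumsum()` and `np.diff()` compute, it can help to try them on a tiny array where the arithmetic is easy to check by hand. The prices below are made up for illustration and are not the real 2022 data.

```python
import numpy as np

# Hypothetical weekly prices in dollars per gallon (not the real 2022 data)
prices = np.array([3.0, 3.5, 3.25, 4.0])

running_total = np.cumsum(prices)   # total spent after each week
weekly_change = np.diff(prices)     # prices[i + 1] - prices[i]

print(running_total.tolist())   # prints [3.0, 6.5, 9.75, 13.75]
print(weekly_change.tolist())   # prints [0.5, -0.25, 0.75]

# The last running total equals the overall sum
print(running_total[-1] == np.sum(prices))   # prints True
```

Note that the result of `np.diff()` has one fewer element than the input, since there is no change to compute for the first week.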

## Measuring how long a function takes to run

Jupyter notebooks have a special set of ["magic commands"](https://ipython.readthedocs.io/en/stable/interactive/magics.html) that can be used to add additional functionality to a notebook. 

We can use the `%%timeit` magic command to evaluate how long a piece of code takes to run. In particular, let's compare summing our gas prices using:

1. A for loop
2. Python's standard library sum() function
3. NumPy's np.sum() function 

In [None]:
%%timeit

# use a for loop to sum all the values
for_loop_sum = 0
for p in gas_prices_2022:
    for_loop_sum = for_loop_sum + p
    
for_loop_sum

In [None]:
%%timeit

# use Python's standard library sum() function
standard_python_sum = sum(gas_prices_2022)

In [None]:
%%timeit

# NumPy's np.sum() function 
np.sum(gas_prices_2022)


There is not a huge difference here because this is such a small data set, but using efficient code can make a huge difference on large data sets!
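
To get a feel for the difference on a larger data set, here is a rough sketch that uses the standard library `timeit` module instead of the notebook magic, so it also runs outside Jupyter. The array of 100,000 values is made up for illustration, and the exact timings will vary from machine to machine.

```python
import timeit

import numpy as np

# Hypothetical data: 100,000 evenly spaced "prices" between 2 and 5 dollars
many_prices = np.linspace(2.0, 5.0, 100_000)


def for_loop_sum(values):
    """Sum the values one at a time with a for loop."""
    total = 0.0
    for v in values:
        total = total + v
    return total


# Run each approach 5 times and record the total elapsed seconds
loop_seconds = timeit.timeit(lambda: for_loop_sum(many_prices), number=5)
builtin_seconds = timeit.timeit(lambda: sum(many_prices), number=5)
numpy_seconds = timeit.timeit(lambda: np.sum(many_prices), number=5)

print("for loop:", loop_seconds)
print("sum():   ", builtin_seconds)
print("np.sum():", numpy_seconds)
```

On most machines we would expect `np.sum()` to be noticeably faster than the other two approaches here, because it does the looping in compiled code rather than in Python.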

![gas_prices](https://cdn.quotesgram.com/img/69/59/1803591020-high-gas-prices.jpg)