# Module 3a. Python Programming 101

**Author:** Lauryn Bruce lbruce@ucsd.edu

This notebook contains additional introductory exercises on basic programming concepts, specifically in the Python programming language.

Topics included below are:
- Data Types/Objects
- Data Structures
- Comparison Operators
- Python Statements/Control Flow
- Functions
- User Defined Data Objects using Classes
- Read/Write data to/from Files
- Pandas data cleaning/manipulation
- Pandas DateTime index
- Testing & Debugging

### Jupyter Commands
**Note:** Shortcuts are listed as Windows or Mac where the two differ.
- Run a cell: Shift + Enter
- Run a cell and create a new one below: Alt + Enter or option ⌥ + Enter
- Run selected cells: Ctrl + Enter or Command ⌘ + Enter
- Save: Ctrl + s or Command ⌘ + s

- In command mode (Esc to activate)
    - Create a new cell above: a
    - Create a new cell below: b
    - Cut cell: x
    - Copy cell: c
    - Paste cell: v

- Functions
    - Show the required inputs for a function: Shift + Tab
    - Example: my_function(  --> Shift + Tab


Resources:<br>
- https://www.udemy.com/course/complete-python-bootcamp
- https://towardsdatascience.com/jypyter-notebook-shortcuts-bf0101a98330

## Basic Python Data Types/Objects

#### Integers: Whole Numbers

In [None]:
x=3
y=9
print(x)
print(y)

#### Floats: Numbers with a Decimal point

In [None]:
x=3.3
y=9.1

#### Strings: Ordered sequence of Characters

In [None]:
z = 'Hello'
zz = 'you'
print(z + ' ' + zz)

## Basic Data Structures

#### Lists: mutable ordered sequence of objects

In [None]:
my_list=[1,2,"three",[4.0, 5]] # can contain a mixture of data types
print(my_list[0]) # access an item using the index.

In [None]:
# Slicing: use my_list[start:stop] to access a sub-list (0-based; stop is excluded)
my_sub_list = my_list[:2]
print(my_sub_list)
# Chained indexing: my_list[2] is the string "three", so [1] is its second character
second_char = my_list[2][1]
print(second_char)

#### Dictionaries: key:value pairs (insertion-ordered since Python 3.7)

In [None]:
# Create empty dictionary
my_dict = {}

In [None]:
# Add key value pairs
my_dict['key1'] = 1
my_dict['key2'] = 2
my_dict['key3'] = 3
print(my_dict)

In [None]:
# Create populated dictionary
my_new_dict = {'key4': 4, 'key5': 5}
# Change a value within a dictionary
my_new_dict['key4'] = 6
print(my_new_dict)

# Get all keys
print(my_dict.keys())
# Get all values
print(my_dict.values())
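
Looking up a key that is not in a dictionary raises a `KeyError`. The sketch below shows two common ways to handle that, using a hypothetical `inventory` dictionary that is not part of the original notebook.

```python
# Hypothetical inventory dictionary, used only for illustration
inventory = {'apples': 3, 'pears': 0}

# Square-bracket access raises KeyError when the key is missing
try:
    inventory['kiwis']
except KeyError as err:
    missing_key = err.args[0]

# .get() returns a default value instead of raising an error
kiwi_count = inventory.get('kiwis', 0)

# .pop() removes a key and hands back its value
removed = inventory.pop('pears')

print(missing_key, kiwi_count, removed, inventory)
```

`.get()` is usually the simpler choice when a missing key is expected and a sensible default exists.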

#### Boolean: logical value indicating True or False

In [None]:
my_bool = True
my_bool == False

#### Sets: unordered collection of unique objects

In [None]:
my_set = {'a', 'b', 'c'}  # curly braces create a set

my_list = ['a', 'b', 'c', 'd', 'd', 'b']
print(set(my_list))  # duplicates are removed
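
Sets also support the usual set operations. This is a minimal sketch with two made-up sets (`group_a` and `group_b` are illustrative names, not from the notebook).

```python
# Two illustrative sets with some overlap
group_a = {'a', 'b', 'c'}
group_b = {'b', 'c', 'd'}

union = group_a | group_b          # items in either set
intersection = group_a & group_b   # items in both sets
difference = group_a - group_b     # items only in group_a

# sorted() gives a predictable order for printing, since sets are unordered
print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```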

#### Tuples: ordered immutable sequence of objects

In [None]:
my_tup = (10, "hello", 200.3)
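
To see what "immutable" means in practice, the sketch below reads from a tuple, tries to change it, and then unpacks it into separate variables. The variable names other than `my_tup` are illustrative.

```python
my_tup = (10, "hello", 200.3)

# Reading by index works just like a list
first = my_tup[0]

# Assigning to an index fails because tuples cannot be changed
try:
    my_tup[0] = 99
except TypeError as err:
    error_message = str(err)
    print('TypeError:', error_message)

# Unpacking assigns each item to its own variable
number, greeting, value = my_tup
print(first, number, greeting, value)
```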

## Comparison Operators

Operator & Description
<br>== returns True if the two operands are equal
<br>!= returns True if the two operands are not equal
<br>\> returns True if the left operand is greater than the right
<br>\< returns True if the left operand is less than the right
<br>\>= returns True if the left operand is greater than OR equal to the right
<br>\<= returns True if the left operand is less than OR equal to the right

In [None]:
x=3
y=3.0
x == y

In [None]:
x > y

In [None]:
x = 'Hello'
y = 'Hi'
x != y

## Python Statements: Control flow

#### if, elif, else

In [None]:
x = 4

if x == 5:
    print('x equals 5')
elif x < 5:
    print('x is less than 5')
else:
    print('x is greater than 5')

#### for loops

In [None]:
# Loop 10 times (0 to 9 inclusive)
for x in range(10):
    print('hell' + (x * 'o'))

In [None]:
my_list = [1,2,3,4,5,6,7,8]
for i,x in enumerate(my_list):
    print(i,x)
    # check if number is even
    if x%2 == 0:
        print('even ' + str(x))
    elif i != 0:
        print(i, x, x + my_list[i-1])
    else:
        print(x)

In [None]:
# iterate through a dictionary
d = {'k1': 1, 'k2': 2, 'k3': 3}
for item in d:
    print(item)

for item in d.items():
    print(item)
    
for key,value in d.items():
    print(key)
    print(value)

In [None]:
# More flow control
x = [1,2,3]

for item in x:
    # Do nothing at all
    pass

my_string = 'Sam'
for letter in my_string:
    if letter == 'a':
        # Goes to the top of the closest enclosing loop
        continue
    print(letter)
    
for letter in my_string:
    if letter == 'a':
        #  breaks out of the closest enclosing loop
        break
    print(letter)

#### while loop

In [None]:
# Be careful with while loops: an infinite loop may hang your kernel or, worse, your computer!

x = 10
while x != 1:
    print('x = ' + str(x))
    x -= 1
    
    if x < 5:
        print('x is less than 5')

In [None]:
x = 0

while x < 5:
    if x == 2:
        break 
        
    print(x)
    x += 1

#### Other useful operators

In [None]:
# Range operator
# Print numbers from 0 to 9 (the stop value, 10, is not included)
for num in range(10):
    print(num)

# Print numbers from 3 to 9 (the stop value, 10, is not included)
for num in range(3,10):
    print(num)

# Print every other number from 3 to 9 (the stop value, 10, is not included)
for num in range(3,10,2):
    print(num)

In [None]:
# Enumerate Function
for index_count, letter in enumerate('abcde'):
    print('At index {} the letter is {}'.format(index_count, letter))

In [None]:
# Zip function: pairs items by position and stops at the shortest input
my_list_1 = [1,2,3]
my_list_2 = ['a', 'b', 'c']
my_list_3 = [100, 200]
my_zipped_list = list(zip(my_list_1, my_list_2, my_list_3))
print(my_zipped_list)
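
Because `zip` stops at the shortest input, the third items of the longer lists are dropped. If that is not wanted, one option is `itertools.zip_longest` from the standard library, sketched here with illustrative list names.

```python
from itertools import zip_longest

numbers = [1, 2, 3]
letters = ['a', 'b', 'c']
hundreds = [100, 200]  # one item shorter than the others

# zip stops when the shortest list runs out
truncated = list(zip(numbers, letters, hundreds))
print(truncated)

# zip_longest keeps going and fills the gaps with fillvalue
padded = list(zip_longest(numbers, letters, hundreds, fillvalue=None))
print(padded)
```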

In [None]:
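# The `in` operator checks membership in a list, dictionary, string, etc.
print('x' in [1,2,3])           # False
print('x' in ['x', 'y', 'z'])   # True
d = {'mykey':345}
print('mykey' in d)             # True: `in` checks a dictionary's keys
print(345 in d.keys())          # False: 345 is a value, not a key
print(345 in d.values())        # True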

In [None]:
# Min/Max
print(min([1,2,3]))
print(max([4,5,6]))

In [None]:
# Functions from the random module
# They need to be imported from the standard library

from random import shuffle
# shuffle returns None; it only shuffles the list in place
my_list = [3,4,5,6,7,8,9]
shuffle(my_list)
print(my_list)

from random import randint
# randomly pull integer in range
my_random_number = randint(50,100)
print(my_random_number)

In [None]:
# Accept user input
my_input_number = input('Enter a number here: ')

In [None]:
# Be careful: input() always returns a string!
print(my_input_number, type(my_input_number))
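
Since `input()` returns a string, the text usually needs to be converted before doing arithmetic. The helper below, `parse_number`, is a hypothetical function written for this sketch; it tries `int` first, then `float`, and returns `None` when the text is not a number.

```python
def parse_number(text):
    """Convert text to an int or float; return None if it is not a number."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        pass  # not an integer, try a float next
    try:
        return float(text)
    except ValueError:
        return None

# Strings like these are what input() would hand back
for raw in ['42', ' 3.5 ', 'abc']:
    print(repr(raw), '->', parse_number(raw))
```

In a notebook, this could be used as `parse_number(input('Enter a number here: '))`.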

## Functions

In [None]:
# Basic Function
def sum_numbers(num1, num2):
    return num1+num2

sum_numbers(19,1)

In [None]:
def check_even_list(num_list):
    """
    Doc String: Check to see if any number in a list is even
    Input: list of numbers
    Return: If any number is even, return True, otherwise False
    """
    
    for num in num_list:
        if num % 2 == 0:
            return True

    # Only return False after every number has been checked
    return False
        
check_even_list([1,4,7])

In [None]:
# Special functions

def square(num):
    return num**2

def check_even(num):
    return num % 2 == 0

my_nums = [1, 2, 3, 4, 5, 6]

## Map: executes a function on a list of arguments
squared_list = list(map(square, my_nums))
print(squared_list)

## Filter: filter based on a boolean function's return
even_nums = list(filter(check_even, my_nums))
print(even_nums)

## Lambda / anonymous function: shortened version of a simple function,
## generally for one-time use
display(list(map(lambda num: num**2, my_nums)))

## Another lambda example, this time on strings:
## reverse each name
name_list = ['Andy', 'Wall-e', 'Sully']
list(map(lambda x:x[::-1], name_list))
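
List comprehensions are a common alternative to `map` and `filter` in Python. The sketch below reproduces the results from the cell above in that style; it is one possible way to write them, not a replacement for the functions shown.

```python
my_nums = [1, 2, 3, 4, 5, 6]
name_list = ['Andy', 'Wall-e', 'Sully']

# Same result as list(map(square, my_nums))
squared_list = [num**2 for num in my_nums]

# Same result as list(filter(check_even, my_nums))
even_nums = [num for num in my_nums if num % 2 == 0]

# Same result as the name-reversing lambda
reversed_names = [name[::-1] for name in name_list]

print(squared_list)
print(even_nums)
print(reversed_names)
```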

## User Defined Objects via Classes
### Glimpse of Object Oriented Programming

In [None]:
# Class names use CapWords (every word is capitalized)
class Patient():
    
    # Class Object Attribute: same for any instance of a class
    genus = 'Homo'
    
    # __init__ runs when an instance is created; 'self' refers to that instance
    # define other user defined attributes
    def __init__(self, age, gender, birthdate, note=None):
        self.age = age
        self.gender = gender
        self.birthdate = birthdate
        self.note = note  # stays None until add_note is called
        
    # Methods: operations/actions that use the self attributes
    def add_note(self, string):
        self.note = string
    
    # __repr__ defines the text shown when the object is displayed;
    # print() also uses it because no __str__ method is defined
    def __repr__(self):
        summary = ':'.join([str(self.age), self.gender, self.birthdate])
        if self.note is None:
            return summary
        return summary + '\n' + self.note
    
    def legal(self):
        return self.age >= 21

In [None]:
my_patient = Patient(20, 'non-binary', '2000-09-16')
my_patient.add_note('This patient is a student')

print(my_patient.genus)
print(my_patient)
print(my_patient.legal())

### Editing primitives vs objects

In [None]:
# Primitive points
x1,y1 = 3,4 	# create a point
x2,y2 = x1,y1 	# create another point at the same place
x1 = 13 		# move point 1
print(x1,y1) 	# 13 4
print(x2,y2) 	# 3 4

# Object points
class Point:
    def __init__(self,x,y):
        self.x=x
        self.y=y
    def __repr__(self):
        return '('+str(self.x)+','+str(self.y)+')'
    
pt1 = Point(3,4)# create a point
pt2 = pt1 		# create(?) another point at the same place
pt2.x = 13 		# move pt2
print(pt1) 		# (13,4)
print(pt2) 		# (13,4)
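
In the object example, `pt2 = pt1` does not create a second point; both names refer to the same object. One way to get an independent point is the standard library's `copy` module, sketched below (the names `alias` and `independent` are illustrative).

```python
import copy

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __repr__(self):
        return '(' + str(self.x) + ',' + str(self.y) + ')'

pt1 = Point(3, 4)
alias = pt1                    # second name for the same object
independent = copy.copy(pt1)   # new object with the same attribute values

alias.x = 13                   # also changes pt1, because alias is pt1
print(pt1)
print(independent)
print(alias is pt1, independent is pt1)
```

For objects that contain other mutable objects (for example a list attribute), `copy.deepcopy` may be the safer choice.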

## Math Functions
For full list see: https://docs.python.org/3/library/math.html

In [None]:
import math

In [None]:
x = 3.71
y = 4.74

# Round down
display(math.floor(x))

# Round up
display(math.ceil(x))

# IEEE 754-style remainder of x/y (can be negative; use x % y for the modulo)
display(round(math.remainder(x,y), 3))

# Exponential: return e raised to the power of x
display(round(math.exp(x),3))

# Logarithm (default is log base e; a base can be provided, calculated as log(x) / log(base))
print('Natural Log: ', round(math.log(x),3))
print('Log base 10: ', round(math.log(x, 10),3))

# Constants
display(math.pi, math.e, math.nan, math.inf)

### Reading data from input file and Writing to output file

In [None]:
%%bash
# Determine where you are 
pwd

In [None]:
def total_function(numbers_list):
    return sum(numbers_list)

In [None]:
### Read one line at a time
# You know there are only two numbers

import os

# Read in a file with two lines and output 
input_file = 'inputs/known_length_data_input.txt'

with open(input_file,'r') as f:

    first = f.readline().rstrip()

    second = f.readline().rstrip()
    
    numbers = [int(first), int(second)]

    # Create output folder if it does not exist
    output_folder = 'outputs/'
    os.makedirs(output_folder, exist_ok=True)
    
    # Call function and write the result to the output file
    with open(os.path.join(output_folder,"sum_known.txt"),'w') as output_file:
        output_file.write(str(total_function(numbers)))

In [None]:
### Read all lines and store
# You do not know how many numbers you have

import os

# Read a file with an unknown number of lines and write their sum to an output file
input_file = 'inputs/unknown_length_data_input.txt'

with open(input_file,'r') as f:
    
    numbers = []
    for line in f:
        numbers.append(int(line.rstrip()))

    # Create output folder if it does not exist
    output_folder = 'outputs/'
    os.makedirs(output_folder, exist_ok=True)
    
    # Call function and write the result to the output file
    with open(os.path.join(output_folder,"sum_unknown.txt"),'w') as output_file:
        output_file.write(str(total_function(numbers)))
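
The cells above depend on files in an `inputs/` folder. As a self-contained sketch of the same read, sum, and write pattern, the example below creates its own sample file in a temporary directory. The file names and sample numbers are made up for illustration, and skipping blank lines is an added assumption.

```python
import os
import tempfile

def total_function(numbers_list):
    return sum(numbers_list)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Write a small sample input file (contents are made up)
    input_path = os.path.join(tmp_dir, 'numbers.txt')
    with open(input_path, 'w') as f:
        f.write('3\n9\n\n12\n')

    # Read every line, skipping blank ones, and convert to int
    with open(input_path, 'r') as f:
        numbers = [int(line) for line in f if line.strip()]

    # Write the total to an output file
    output_path = os.path.join(tmp_dir, 'sum.txt')
    with open(output_path, 'w') as f:
        f.write(str(total_function(numbers)))

    # Read the result back to confirm what was written
    with open(output_path, 'r') as f:
        result_text = f.read()

print(numbers, result_text)
```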

# Pandas: Software for Data Manipulation and analysis

https://www.datacamp.com/community/tutorials/pandas-tutorial-dataframe-python

In [None]:
# Import pandas and numpy libraries
import pandas as pd
import numpy as np

import warnings
warnings.filterwarnings('ignore')

### Create a pandas dataframe from a dictionary

In [None]:
my_flower_dict = {'flowers': ['rose', 'daisy', 'sunflower'],
                  'colors': ['red', 'white', 'yellow'],
                  'blooming_season': ['spring', 'spring', 'summer']}
flower_df = pd.DataFrame(my_flower_dict)

flower_df

In [None]:
# Pandas Series: like a column in a table, one-dimensional array
flower_colors = flower_df['colors']
display(type(flower_colors))
display(flower_colors)

# Convert back to dataframe
flower_colors_df = flower_colors.to_frame()
display(type(flower_colors_df))
flower_colors_df

In [None]:
# Assign DataFrame index to flower type
flower_df.set_index('flowers', inplace=True)
flower_df

In [None]:
# Get list of columns
columns = flower_df.columns.to_list()
print(columns)

In [None]:
# Get Unique list of values within a column
print(flower_df['blooming_season'].unique())

# Get Counts of the set of values within a column
print(flower_df['blooming_season'].value_counts())

### Pandas Read in tabular data with header

#### Source: https://www.kaggle.com/srikarkashyap/analyzing-healthcare-data-tutorial

In [None]:
df = pd.read_csv('inputs/pandas_healthcare_input_data.csv', index_col=0)
df.head(2)

In [None]:
# Print Summary Statistics of the numeric columns
df.describe()

In [None]:
# Print data type info by column
df.info()

In [None]:
# Get list of columns
df_columns = df.columns.to_list()
print(df_columns)

In [None]:
# Display counts of each value in the sex column
df['SEX'].value_counts()
# Do you see any issues with this data?

In [None]:
# Both 'Male' and 'MALE' are listed; fix that, and rename Male(Child) and Female(Child) to Boy and Girl
mappings = {'MALE':'Male', 'FEMALE': 'Female', 'Male(Child)': 'Boy', 'Female(Child)' :'Girl'}
df['SEX'] = df['SEX'].replace(mappings)
df['SEX'].value_counts()

In [None]:
# Plot bar graph of counts
df['SEX'].value_counts().plot.bar()

In [None]:
# Find mean, median, and standard deviation of 'AGE'
print('Mean: {}'.format(df['AGE'].mean().round(2)))
print('Median: {}'.format(df['AGE'].median().round(2)))
print('Standard Deviation: {}'.format(df['AGE'].std().round(2)))

display(df['AGE'].plot.box())

## DateTime Indexing
Originally developed for financial data, the time series tools in the pandas library can also be used with health/biology time series data. A time series is any data set where the values are measured at different points in time, sampled either uniformly or irregularly.

#### Source: https://www.dataquest.io/blog/tutorial-time-series-analysis-with-pandas/
#### Data: Open Power System Data (OPSD) from Germany

In [None]:
opsd_daily = pd.read_csv('inputs/pandas_datetime_input_data.csv')

In [None]:
# Get number of rows and columns
display(opsd_daily.shape)

# Get list of columns
opsd_columns = opsd_daily.columns.to_list()
print(opsd_columns)

# View last 3 rows
display(opsd_daily.tail(3))

# Show data types
display(opsd_daily.dtypes)

In [None]:
# Convert Date column to DateTime object
opsd_daily['Date'] = pd.to_datetime(opsd_daily['Date'])

# Show data types
display(opsd_daily.dtypes)

# Set index to Date
opsd_daily = opsd_daily.set_index('Date')

# View top 3 rows
display(opsd_daily.head(3))

In [None]:
# Add columns with year, month, and weekday name to the dataframe
opsd_daily['Year'] = opsd_daily.index.year
opsd_daily['Month'] = opsd_daily.index.month
opsd_daily['Weekday Name'] = opsd_daily.index.day_name()

# Display a random sampling of 3 rows
opsd_daily.sample(3, random_state=0)

In [None]:
# Select data for a single day
display(opsd_daily.loc['2017-08-10'])

# Select data for a range of days
display(opsd_daily.loc['2017-08-10': '2017-08-12'])

### Visualizing with Matplotlib

In [None]:
import matplotlib.pyplot as plt

opsd_daily['Consumption'].plot(linewidth=0.5, figsize=(8,4))

In [None]:
cols_plot = ['Consumption', 'Solar', 'Wind']

axes = opsd_daily[cols_plot].plot(marker='.', alpha=0.5, linestyle='None', figsize=(11, 9), subplots=True)

for ax in axes:
    ax.set_ylabel('Daily Totals (GWh)')

In [None]:
# Resample by looking at weekly median

# Specify the data columns we want to include (i.e. exclude Year, Month, Weekday Name)
data_columns = ['Consumption', 'Wind', 'Solar', 'Wind+Solar']

# Resample to weekly frequency, aggregating with median
opsd_weekly_median = opsd_daily[data_columns].resample('W').median()
display(opsd_weekly_median.head(3))

# Plot daily and weekly resampled time series together
# Start and end of the date range to extract
start, end = '2017-01', '2017-06'

fig, ax = plt.subplots(figsize=(8,4))
ax.plot(opsd_daily.loc[start:end, 'Solar'],
        marker='.', linestyle='-', linewidth=0.5, label='Daily')
ax.plot(opsd_weekly_median.loc[start:end, 'Solar'],
        marker='o', markersize=8, linestyle='-', label='Weekly Median Resample')
ax.set_ylabel('Solar Production (GWh)')
ax.legend()
plt.show()

# Unit Testing
Based on the Rubik's Code tutorial: https://rubikscode.net/2021/05/24/test-driven-development-tdd-with-python/

A unit test is a piece of code that tests another piece of code.

### Rick & Morty Example
Many versions of the characters Rick & Morty exist in different dimensions/universes. The Citadel is a place where all the different Ricks and Mortys have formed a society. We want to be able to assign Ricks and Mortys a universe, and add residents to the Citadel.

<br> First we need to create a Rick class and a Morty class. So we first write the test and then the class.

In [None]:
import unittest

class RickTests(unittest.TestCase):
    def test_universe(self):
        rick = Rick(111)
        self.assertEqual(rick.universe, 111)

In [None]:
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False) ## arguments allow for running in jupyter notebook

We have not defined the Rick class yet, so the test should fail with an error!

In [None]:
class Rick(object):
    def __init__(self, universe):
        self.universe = universe

Now the unit test can create a Rick instance and should pass!

In [None]:
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False) ## arguments allow for running in jupyter notebook

Now create the Morty test and implementation classes.

In [None]:
class MortyTests(unittest.TestCase):
    def test_universe(self):
        morty = Morty(111)
        self.assertEqual(morty.universe, 111)

In [None]:
class Morty(object):
    def __init__(self, universe):
        self.universe = universe

In [None]:
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False) ## arguments allow for running in jupyter notebook

#### Cool, both tests now pass!


Now we want to write two tests that check:
1. Returning a list of all residents
2. Adding of new residents

In [None]:
class CitadelTests(unittest.TestCase):
    def test_get_all_residents(self):
        citadel = Citadel()
        residents = citadel.get_all_residents()
        self.assertCountEqual(residents, [])
        
    def test_add_resident(self):
        citadel = Citadel()
        rick = Rick(111)
        morty = Morty(111)
        
        citadel.add_resident(rick)
        citadel.add_resident(morty)
        residents = citadel.get_all_residents()
        
        self.assertEqual(residents[0], rick)
        self.assertEqual(residents[1], morty)

Next we need to implement a Citadel class that contains methods to get the list of residents and to add new ones.

In [None]:
class Citadel(object):
    def __init__(self):
        self._residents = []
        
    def get_all_residents(self):
        return self._residents
    
    def add_resident(self, resident):
        self._residents.append(resident)

Now we run the tests to confirm that all methods return the expected values.

In [None]:
if __name__ == '__main__':
    unittest.main(argv=['first-arg-is-ignored'], exit=False) ## arguments allow for running in jupyter notebook

#### Tutorial continues to show how you can then turn all Ricks with assigned Mortys into <font color=green size =8>pickles!</font>
https://rubikscode.net/2021/05/24/test-driven-development-tdd-with-python/

# Testing & Debugging
#### Author: Owen Chapman

In [None]:
################
# Code execution is sequential*
################

def some_function(string):
    return string+string
def some_other_function(string):
    s=nested_function(string)
    return s+s
def nested_function(string):
    return "asdf"+string

foo = "foo"
bar = "bar"
foobar = foo+bar
foobar = some_function(foobar)
foo = some_other_function(bar)
foo = 11

# What are foo, bar and foobar now?
# Solution:
print(foo)
print(bar)
print(foobar)

In [None]:
######################
# Bug-finding exercise
######################
# From rosalind.info http://rosalind.info/problems/ba3a/
# kmer composition
def kmer_composition(string,k):
    '''
    Given a string and an integer k, return all k-mers in the string.
    Inputs:
        string (str)
        k (int): length of the k-mer
        NOTE: THIS IS A BUGFIXING EXERCISE, THIS FUNCTION IS INCORRECT AS WRITTEN.
    '''
    for i in range(len(string)- k):
        print(string[i:i+k])

kmer_composition("CAATCCAAC",5)

In [None]:
######################
# Reading the stacktrace
######################
import math
def sqrt(value):
    return math.sqrt(value)
sqrt("sixteen")

In [None]:
#################
# Raise a warning
#################
import warnings
warnings.simplefilter(action="default") # Re-enable warnings: an earlier cell called warnings.filterwarnings('ignore')

# Raise a warning
def double(value):
    if isinstance(value, str):
        warnings.warn("Input was a string. Result may be unexpected.") 
    return value+value
print(double(4))
print(double("four"))
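
Warnings do not stop the program, so they are easy to miss. The sketch below uses `warnings.catch_warnings(record=True)` from the standard library to capture a warning and inspect it, which can be handy when testing that a function warns as intended.

```python
import warnings

def double(value):
    if isinstance(value, str):
        warnings.warn("Input was a string. Result may be unexpected.")
    return value + value

# Record warnings raised inside this block instead of printing them
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is filtered out
    result = double("four")

print(result)
print(len(caught), caught[0].category.__name__, caught[0].message)
```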

In [None]:
#############################
# Raising warnings and errors 
#############################
# Raise an error
def only_accepts_strings(value):
    if not isinstance(value,str):
        raise TypeError("This function requires string input.")

only_accepts_strings(5)
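
An error stops the program unless the caller handles it. The sketch below wraps the call in `try`/`except` so the program can continue. Here `only_accepts_strings` is changed to return its input, and `safe_call` is a hypothetical helper; both are assumptions made for this illustration.

```python
def only_accepts_strings(value):
    if not isinstance(value, str):
        raise TypeError("This function requires string input.")
    return value  # assumption: return the input so there is a result to show

def safe_call(value):
    """Call only_accepts_strings and turn a TypeError into a message."""
    try:
        return only_accepts_strings(value)
    except TypeError as err:
        return 'Error: ' + str(err)

print(safe_call('ok'))
print(safe_call(5))
```

Catching only `TypeError`, rather than every exception, keeps unrelated bugs visible.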