<a href="https://colab.research.google.com/github/grettadarmstrong/github-slideshow/blob/master/AIBootCampIntro2Python.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **APPLIED PYTHON 101**

**Authors**: 
- Dr. Jany Chan, The Ohio State University
- Dr. Chaitanya Kulkarni, The Ohio State University
- Prof. Raghu Machiraju, The Ohio State University

---

## Context 

The material here was developed by the authors for a [professional masters course in data analytics](https://tdai.osu.edu/education/masters-translational-data-analytics). Enrolled students come from a wide range of academic backgrounds: MDs, PharmDs, MBAs, etc. The goal of that program is to teach **_data storytelling_** in context.

---

## Objectives
The main goal of this notebook is to extend and apply concepts covered in the prerequisite [Kaggle Python tutorial](https://www.kaggle.com/learn/python). If you have prior experience coding in Python, then you can skip straight to the exercises in **Part B**. If you are less familiar with Python, we strongly suggest exploring each section below and running the provided example code blocks. **Focus on understanding why each line of code works and how the output can be changed**: you can add a new block using the "+ Code" button to test your own modifications.  

**Learning to code in Python is similar to learning to speak a new language. It will take time and hands-on practice.** Googling unknown terms and phrases is a useful strategy to help you quickly find answers and debug errors. Sites like [StackOverflow](https://stackoverflow.com/), [Towards Data Science](https://towardsdatascience.com/), and [Kaggle](https://www.kaggle.com/) usually have high-quality posts and troubleshooting suggestions, especially related to data analysis and machine learning applications. 

Additionally, **we strongly encourage you to use the pertinent Scarlet Carmen pages to discuss problems and points of uncertainty**. If you're stuck on a problem, it's very likely that someone else in the course is having (or has had) a similar issue.

---

**Note**: Additional (optional) background information is provided for you in the blue URLs.

**Note**: Since you'll need to modify the code in this Colab Notebook, please make a copy of the Notebook for your use. Go to **File > Save a copy in Drive** and then work on that copy.

---



# [S] Part A: Opening the black box: Data Structures to Functions
## Motivating Question: What happens when we need to write our own code???
Using pre-built packages is easy, but writing our own code from scratch can quickly become much more challenging. Here, we'll take a bottom-up approach, starting with the primitive data structures and operators, then the built-in data structures, both of which form the foundation of Python code. This section is an application of the concepts covered in the Kaggle Python tutorial and requires a basic understanding of Python. Future modules will build upon the concepts highlighted here.


---


#### Organization of Python Code: Data Structures to Functions
- Packages or libraries
- Modules
- Classes
- **Functions**: a reusable, modular workflow in Python code
- **Scripts or code snippets**: single-use, simple code or code fragments
- **Data Structures**
  - **Primitive**
  - **Non-primitive**


---

**Note**: Machine Learning requires humans (i.e. specifically us as data analysts) to **translate human-understandable data**, like words, images, and audio, **into machine-understandable formats**, such as data structures, numbers, and network graphs. This notebook focuses on several basic data transformations and manipulations that can occur in Python.

## A.1 Quick Python review:

You are probably familiar with the primitive data structures from R, which are very similar to the Python primitives: 
* integers
* floats
* strings
* Boolean (`True` and `False`)
* Special case: `None`



In [None]:
# Integers
4
4376

# Floating-point numbers
0.34
1.1352476
10.0

# Strings
'This is a string!'
'a'
'12 cats'
'12334235624'

# Boolean
True
False

# Special Case
None

### A.1a Type Casting

We can convert between some data representations using type casting. Check the data type using the function `type()`. 

**Note**: Excel data is notorious for mixing numerics and strings, which needs to be resolved prior to analysis.

In [None]:
x = 123
y = '123'

print(x, type(x))
print(y, type(y))

# Currently, Line 9 throws an error. Use type casting in Line 9 to resolve the
# conflict; there are 2 expected answers: (1) 123123 and (2) 246.
z = x + y
print(z, type(z))



123 <class 'int'>
123 <class 'str'>
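
As a hedged illustration of the Excel note above, the sketch below cleans a small, made-up column that mixes numbers and numeric strings. The names `raw_column`, `cleaned`, and `label` are hypothetical, and the example assumes every entry can be converted with `float()`.

```python
# Hypothetical column exported from a spreadsheet: numbers and numeric strings are mixed
raw_column = [12, '7', '3.5', 40.0]

# Cast every entry to a float so the values can be used in arithmetic
cleaned = []
for value in raw_column:
    cleaned.append(float(value))

print(cleaned)
print(sum(cleaned))

# Casting in the other direction turns a number into text
label = 'Total: ' + str(sum(cleaned))
print(label)
```

If an entry cannot be converted (for example `'12 cats'`), `float()` raises a `ValueError`, which is usually a sign that the data needs manual inspection.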


### A.1b Operators

Basic Python operators fall into 7 categories: 

* Arithmetic Operators - `+`, `-`, etc.
* Relational Operators - `!=` "is not equal", `==` "is equal", `<`, `<=`, etc.
* Assignment Operators - `=`
* Logical Operators - `and`, `or` are used to combine relational operations
* Membership Operators - `in`, `not in` are used to check if A is in B
* Identity Operators - `is`, `is not` checks if A and B are the same object
* Bitwise Operators - (outside the scope of this course)
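
The categories listed above can be sketched with one short example per category. The variables `age`, `degrees`, and `same_list` are made up for illustration.

```python
age = 34
degrees = ['MD', 'MBA']

# Arithmetic: evaluates to a number
print(age + 1)            # 35
print(7 // 2, 7 % 2)      # floor division and remainder: 3 1

# Relational: evaluates to True or False
print(age >= 18)          # True

# Logical: combines relational operations
print(age >= 18 and age < 65)   # True

# Membership: checks if an element is in a collection
print('PharmD' in degrees)      # False
print('PharmD' not in degrees)  # True

# Identity: checks whether two names refer to the same object
same_list = degrees
print(same_list is degrees)     # True
```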


---

Python also uses a shortcut to perform arithmetic and assignment operations in one step, since this is a very common action when coding.



![picture](https://drive.google.com/uc?id=16SFddjb55H_ugjkVvtcqASDsqxI1i_Up)

In [None]:
# Expressions are a combination of variables, values, and operators that can be
# evaluated as a single value
3**2
True and False
'dog' + 'toy'
4 / (12 + 3.2) * 5 - 10

# Statements do something (e.g. assign the value of an expression to a variable)
import math
result = 3**2


In [None]:
import math

# Assignment statements link a variable to a Python object using `=`. 
# Nearly everything in Python is an object.
x = 5
my_var = 'Hello, World!'
myVar = 12.6457 + math.pi
var1 = True

# Special case: when basic arithmetic is used to modify the value of a variable
# and then assign the new value to the original variable 
x = x + 1

# This operation can also be written using the following shortcut
x += 1

# Why does x = 7?
x



7

It's very important to understand the difference between the identity operator `is` and the relational operator `==`. **They are different operations and using them incorrectly can lead to subtle, erroneous outputs in your analyses.** 

* `is` returns `True` if two variables refer to the same object (stored in memory). 
* `==` returns `True` if the objects referred to by the variables are equivalent in value.

In [None]:
from copy import copy

# Here, both `a` and `b` point to same object in memory --> the list [1, 2, 3]
a = [1, 2, 3]
b = a

print(b is a )  # `a` and `b` refer to the same object
print(b == a)   # `a` and `b` have the same values

# Because `a` and `b` both refer to the same list, changes to one affect the other
a[0] = 'Oops we overwrote the original data'

print(f'\nWe made a change to a: {a}')
print(f'But the changes also affect b: {b}')
print('because they refer to the same object in memory.\n')





True
True

We made a change to a: ['Oops we overwrote the original data', 2, 3]
But the changes also affect b: ['Oops we overwrote the original data', 2, 3]
because they refer to the same object in memory.



In [None]:
# Now, we make a copy of `a` and assign it to `b`. 
# Keep in mind that we need 2x memory because there are 2 separate objects
a = [1, 2, 3]
b = copy(a)

print(b is a)  # `b` is NOT the same object as `a`
print(b == a)  # `b` has the same values as `a`

# Since `a` and `b` refer to different objects...
a[0] = 'Oops we overwrote the original data'

print(f'\nWe made a change to a: {a}')
print(f'This does NOT affect b: {b}')
print('because they refer to 2 different objects in memory.')


False
True

We made a change to a: ['Oops we overwrote the original data', 2, 3]
This does NOT affect b: [1, 2, 3]
because they refer to 2 different objects in memory.


### A.1c Membership Operators

Recall that the membership operator `in` allows us to quickly check if an element is in a collection. List comprehension allows you to use a shorter syntax when you want to create a new list based on the values of an existing list depending on a specific condition. These two concepts can be combined into an efficient method for filtering or creating a new collection. You can use the same approach for Strings, Sets, and Dictionaries.

In [None]:
# To quickly check if an object is present in a collection, use the membership 
# operator 'in'. As a conditional expression, this is formatted as `<part> in <whole>`
# and evaluates to either True or False

list1 = [0, 1, 2, 3, 4, 3, 2, 1, 0]
print(3 in list1)

name = 'Applied Machine Learning'
search_term = 'Machine'
print(search_term in name)



True
True
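
The text above mentions combining `in` with a list comprehension to filter a collection. A minimal sketch of that pattern follows; the names `allowed`, `records`, `matches`, and `vowels` are hypothetical examples.

```python
allowed = {'MD', 'MBA', 'PharmD'}                 # hypothetical lookup collection
records = ['MD', 'PhD', 'MBA', 'JD', 'PharmD']    # hypothetical data

# List comprehension: keep only the records that appear in `allowed`
matches = [r for r in records if r in allowed]
print(matches)

# The equivalent FOR loop, written out step by step
matches_loop = []
for r in records:
    if r in allowed:
        matches_loop.append(r)
print(matches_loop == matches)

# The same pattern works on strings: collect the vowels in a phrase
vowels = [ch for ch in 'Applied Machine Learning' if ch.lower() in 'aeiou']
print(len(vowels))
```

The comprehension reads as "give me `r`, for each `r` in `records`, if `r` is in `allowed`", which is the same logic as the loop in a single line.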


### A.1d Chaining and Relational Operators


In [None]:
n = 100

# The comparison in Line 4...
result1 = 1 < n and n < 200

# Is the same as the comparison in Line 7.
result2 = 1 < n < 200

print(f'Line 4 evaluates to {result1}, and Line 7 evaluates to {result2}')



Line 4 evaluates to True, and Line 7 evaluates to True
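
Chained comparisons are often used for range checks. The sketch below uses a hypothetical helper, `in_valid_range()`, and assumes scores are valid from 0 to 100 inclusive.

```python
def in_valid_range(score):
    # Hypothetical helper: True when 0 <= score <= 100
    return 0 <= score <= 100

print(in_valid_range(87))    # True
print(in_valid_range(-5))    # False
print(in_valid_range(100))   # True

# Chains can mix operators; each adjacent pair is joined with `and`
a, b, c = 1, 2, 2
print(a < b == c)            # same as (a < b) and (b == c): True
```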


## A.2 Control or Logic Flow in Python code

### A.2a Review - IF/ELSE, WHILE, and FOR loops

Python is an interpreted programming language. The code is executed sequentially line by line. However, there are control structures that can cause the Python interpreter to skip or repeat parts (AKA blocks) of code. The most common ones are `if/else` statements, `while` loops, and `for` loops. 

Let's use `if/else` as an example. 

* The first line always contains a comparison operation that evaluates to either `True` or `False`. 
* If `True`, the Python interpreter is directed to a new block of code (note the indentation!) and begins executing the new block. When it reaches the end of the new block, it returns to the next line in the original block. 
* If `False`, then the interpreter skips the indented block and continues executing the current block.

Note: **Pay attention to the indentation of each line.** Blocks can be nested inside other blocks.

In [None]:
from random import randint

x = 8
result = "Thursday"

# If/Else Statement - careful with setting your condition: only 1 block will be
# executed in an if/else
if x > 0:
    result = "Friday"


if x > 5:
    result = "Saturday"
else:
    result = "Tuesday"





print(f"Since x = {x}: homework is due on {result}")


Since x = 8: homework is due on Saturday


Let's say we need to perform a task multiple times. Here, we can use `while` loops. 

* The first line always contains at least one comparison operation that evaluates to either `True` or `False`. 
* If `True`, the Python interpreter is directed to a new block of code (note the indentation!) and begins executing the new block. When it reaches the end of the new block, it jumps back to the first line and checks if the condition is still `True`. 
* If `False`, the interpreter skips the indented block and continues to the next line.

In [None]:
from random import randint

# While Loop - careful with setting your stop condition: infinite loops can run forever!
x = randint(5, 10)
y = 1
while x > 0 and y < 6:
    print(x, y)
    x -= 1  # Here, we decrease x by 1
    y += 2  # Here, we increase y by 2

print('We have exited the While loop!')

6 1
5 3
4 5
We have exited the While loop!


`For` loops work in a similar fashion; however, they only loop a specified number of times rather than checking for a condition. 

* The first line specifies the number of times, usually by setting a range using the format `(start value, stop value, step size)`. 
* The stopping point is EXCLUDED. 
* If the step size isn't specified, a default value of 1 is used.

For example, `range(5, 10)` would start at 5, increment by 1, and stop after reaching 9.

In [None]:
# For Loop - no conditional, will always enter the indented code block

for x in range(10):
    print(x)
    
# What happens if you don't specify the starting value?





0
1
2
3
4
5
6
7
8
9
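
The `(start value, stop value, step size)` format described above can be checked by converting a few ranges to lists. This is a small sketch; the specific values are arbitrary.

```python
print(list(range(5, 10)))      # start=5, stop=10 (excluded), default step=1
print(list(range(0, 10, 3)))   # step size of 3
print(list(range(10, 0, -2)))  # a negative step counts down; the stop is still excluded
print(list(range(3)))          # only the stop value is given
```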


`For` loops can also be formatted as:

for `my_variable_name` in `my_data_structure`

In [None]:
# For loops can also be used to iterate through a collection of Python objects
my_list = ['cat', 'dog', 1, 100, 50.5050]  # Here, we use a Python data structure called a `list`

for x in my_list:  # Each object in the list is temporarily assigned to the variable `x`
    print(x)       # Which allows us to "do something" with x inside the FOR loop


cat
dog
1
100
50.505


In [None]:
# This is a code snippet with 3 levels of indentation and 2 loops:
# - WHILE block
# - FOR block
# - IF block
# - ELIF block
# - ELSE block

x = 6
print(f"x = {x}: before WHILE\n")

while x > 0:
    for i in range(x):
        if i > 3:
            print(f"x = {x}, i = {i}: IF block")
        elif i <= 3 and i > 1:
            print(f"x = {x}, i = {i}:   ELIF block")
        else:
            print(f"x = {x}, i = {i}:       ELSE block")
    print()
    x -= 2
print(f"x = {x}: after WHILE\n")

# Can you trace the logic of each block? 
# What happens if you change the conditional expressions?

x = 6: before WHILE

x = 6, i = 0:       ELSE block
x = 6, i = 1:       ELSE block
x = 6, i = 2:   ELIF block
x = 6, i = 3:   ELIF block
x = 6, i = 4: IF block
x = 6, i = 5: IF block

x = 4, i = 0:       ELSE block
x = 4, i = 1:       ELSE block
x = 4, i = 2:   ELIF block
x = 4, i = 3:   ELIF block

x = 2, i = 0:       ELSE block
x = 2, i = 1:       ELSE block

x = 0: after WHILE



### A.2b Functions

A function is a block of code which only runs when it is called. You can pass data (known as parameters, arguments, or input) into a function, which can then transform or modify the data and return an output or result.

This further complicates how code is evaluated. Normally, this occurs sequentially line by line. As we saw earlier, that sequential order can be modified by conditional loops and IF/ELSE blocks. Now, if a function call is present, that causes code evaluation to jump to the function before returning to the next sequential line.

For example, see the code block below. Code evaluation always starts on Line 1. But when we reach the function call on Line 24, which calls `gets_a_raise()`, the Python interpreter jumps to Line 5 and evaluates the code through Line 7, after which it jumps back to finish Line 24.


---


**Note**: There are several conventional methods for organizing Python code. The following is commonly used for shorter pieces of code, like scripts:
1. Import statements
2. Definitions for custom classes and functions
3. Variable assignments
4. The bulk of the code

The main reason for this is to enable easy identification and access to the parts that you might change later on.

In [None]:
# Import statements - usually doing this once per notebook is sufficient
from random import randint

# Sometimes we need a custom function
def gets_a_raise(start_date):
    print('\tIn the function:', start_date)
    return start_date < 2015


# Define our variables
employees = ['Alice', 'Bob', 'Charlie', 'Dave']
hired = []

# Generate our data
for e in employees:
    year = randint(1990, 2021)
    hired.append(year)

print(hired)

# Apply our analysis
for i, year in enumerate(hired):
    print('For loop:', i)
    if gets_a_raise(year):
        print(employees[i], 'gets a raise.')
    else:
        print(employees[i], 'does not get a raise')
    print('\tOut of the function:', year)

# Note the use of comments to outline what you are doing. This helps you:
#   (1) plan out how to approach coding
#   (2) document what you've done for future reference


[1998, 1993, 1999, 2003]
For loop: 0
	In the function: 1998
Alice gets a raise.
	Out of the function: 1998
For loop: 1
	In the function: 1993
Bob gets a raise.
	Out of the function: 1993
For loop: 2
	In the function: 1999
Charlie gets a raise.
	Out of the function: 1999
For loop: 3
	In the function: 2003
Dave gets a raise.
	Out of the function: 2003


### A.2c Variable Scope

When is a variable accessible by the code? This is determined by its **scope**.


*   Built-in
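    * This is the widest scope (i.e. these names can be used anywhere in any program without needing to be defined or imported). 
    * Examples: built-in functions and types such as `print()`, `len()`, `range()`, and `int`. (Keywords such as `def` and `while` are part of the language syntax itself and are likewise always available.)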

*   Global scope
    * Can be explicitly set by the keyword `global`
    * Variables defined outside of any function are accessible from anywhere within the specific program that defined them.
    * Examples: 'employees' and 'hired' are global variables

*   Enclosing scope
    * This is specific to nested functions, which are rare, complex cases and we will not cover this.

*   Local scope
    * Variables defined within a function
    * Only accessible during a function call

![picture](https://drive.google.com/uc?id=1HgXTUWyG0C6w8eR9izk1Qr80_PmqYS5I)

In [None]:
# Import statements - usually doing this once per notebook is sufficient
# Keywords like `import` and `def` are always available, as are built-in names like `print`
from random import randint

# Sometimes we need a custom function
def gets_a_raise(start_date):
    #start_date = 800  # This is a local variable
    print('\tInside the function:', start_date)
    return start_date < 2015


# Define our variables. 
# These are global variables
employees = ['Alice', 'Bob', 'Charlie', 'Dave']
hired = []

# Generate our data
for e in employees:
    year = randint(1990, 2021)
    hired.append(year)

print(hired)
print()

# Apply our "analysis"
for i, year in enumerate(hired):
    print('For loop:', i)
    print(employees[i], 'gets a raise.' if gets_a_raise(year) else 'does not get a raise')
    print('\tOut of the function:', year)
    print()

# Remove the `#` from the beginning of Line 7.
# What happens if you try to access the variable start_date outside of the function?
# Can you access the variable `hired` inside the function?

[2014, 2006, 1996, 2005]

For loop: 0
	Inside the function: 2014
Alice gets a raise.
	Out of the function: 2014

For loop: 1
	Inside the function: 2006
Bob gets a raise.
	Out of the function: 2006

For loop: 2
	Inside the function: 1996
Charlie gets a raise.
	Out of the function: 1996

For loop: 3
	Inside the function: 2005
Dave gets a raise.
	Out of the function: 2005
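
The difference between global and local scope can be sketched with a smaller, separate example. The names `bonus`, `add_bonus`, `total`, and `caught_name_error` are hypothetical and are not part of the course code above.

```python
bonus = 500   # global variable (hypothetical)

def add_bonus(salary):
    # `total` is local to the function; `bonus` is read from the global scope
    total = salary + bonus
    return total

print(add_bonus(1000))

# `total` only existed during the function call, so it is not defined out here
caught_name_error = False
try:
    print(total)
except NameError as err:
    caught_name_error = True
    print('NameError:', err)
```

Reading a global variable inside a function works without any extra syntax; the `global` keyword is only needed when a function must reassign a global name.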



## Part B. Exercises with Non-primitive Data Structures

**Note**: Stuck at any point? Look up the following resource [Lists, Tuples, Sets, Dictionaries](https://learning.oreilly.com/library/view/python-for-data/9781491957653/ch03.html#tut_data_structures).

Also, one of the key strengths of Python is the ability to use dynamic typing to transform data from one data structure to a more useful data structure. What are the key characteristics, features, and functions associated with each data structure?

### B.1 Lists
**Note**: Be familiar with slicing as this will be important for handling data in basic Python strings, NumPy arrays, and Pandas DataFrames in later modules.

![picture](https://drive.google.com/uc?id=1BbZQMwTyp6kg_ISVAY5sPnRBKWfrlaAx)


In [None]:
# A list is an ordered sequence of elements denoted by `[]`. 
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# We can access each element by its index, which runs from 0 to n-1 for a list of n elements.
var1 = list1[0]
print(f'Using indexing (Line 5), the first element is {var1}\n')


# What about negative indexing?
var2 = list1[-1]
print(f'Using negative indexing (Line 10), the last element is {var2}\n')


# Slicing a list - Access a range of elements from [start : stop)
# Note: slicing INCLUDES the start index but EXCLUDES the end index.
slice1 = list1[2:5]
print(f'We can isolate part of a list via slicing (Line 16): {slice1}\n')


# If the slice starts at Index 0, the start position can be implicit
slice2 = list1[:5]  # equivalent to `list1[0:5]`

# If the slice ends with the last index, the end position can be implicit
slice3 = list1[5:]  # equivalent to `list1[5:10]`

print(f'In special cases, the start or end index can be implied: ')
print(f'See Line 21 for {slice2} which spans Index 0 through 4, excluding the end at Index 5')
print(f'and Line 24 for {slice3} which spans Index 5 through the end.\n')



Using indexing (Line 5), the first element is 0

Using negative indexing (Line 10), the last element is 9

We can isolate part of a list via slicing (Line 16): [2, 3, 4]

In special cases, the start or end index can be implied: 
See Line 21 for [0, 1, 2, 3, 4] which spans Index 0 through 4, excluding the end at Index 5
and Line 24 for [5, 6, 7, 8, 9] which spans Index 5 through the end.



In [None]:
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Slicing actually has a third value (step size), formatted [start : stop : step]
print('To skip every other value:')
skip_even = list1[::2]
print(skip_even)


# Using list1, how would you skip the even numbers?
print('\nWhat if we want the values in the odd indices:')

skip_odd = list1[::]  # Modify this line
print(skip_odd)


# A negative step size can be used to reverse the values in a list
print('\nTo quickly reverse a list, use negative indexing:')
print(list1[::])  # What start, stop, and step values are implied here?
print(list1[::-1])



To skip every other value:
[0, 2, 4, 6, 8]

What if we want the values in the odd indices:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

To quickly reverse a list, use negative indexing:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


In [None]:
list1 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
list2 = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

# We can use FOR loops to iterate through the elements in a list.
for item in list1:
    if item % 2 != 0:
        item += 100  # This only rebinds the loop variable `item`; list1 itself is unchanged
print(list1, '\n')


# We can also iterate through a list by index. 
# Note: Python functions can be nested; they are evaluated from the innermost 
# function to the outermost function. The function `len()` is given `list1` as
# an input. This returns the length of list1 (i.e. 10), which is then used as an 
# input for the function `range()`. This returns a sequence of integers from 0
# up to, but not including, the input value, 10.
for i in range(len(list1)):
    if i % 2 == 1: list1[i] += 200
print(list1, '\n')  


# Sometimes we need to access both the value and the index of an element, 
# usually for cross-referencing or indexing into another list. The `enumerate()`
# function is a useful trick here:
for i, item in enumerate(list1):
    if 2 < item < 6:
        print(f'At index {i}, the value in list1 is {item}')
        print(f'and the corresponding value in list2 is {list2[i]}.\n')


# Points to ponder:
# 1. Why is using `range(len(list1))` better than using `range(10)`?
# 2. Here, we chose to use a FOR loop. Could this be done using a WHILE loop?
# 3. How can we do this in 1 line of code using list comprehension?


[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] 

[0, 201, 2, 203, 4, 205, 6, 207, 8, 209] 

At index 4, the value in list1 is 4
and the corresponding value in list2 is e.





---



#### Problem 1 (mild):

Find the smallest number in the list. 

**Input:**

```
nums = [4, 16, -21, -49, 3, 821, 8, 27, -2, 74, -81, 0, 5]
```

**Expected Outputs:**

```
-81
```

In [None]:
# Note the keyword `def`. Write a function that RETURNS the smallest number in a list
# Hint: Consider using a FOR or WHILE loop
def find_smallest(my_list):
    # Enter your code here
    return min(my_list)  # placeholder: lists have no `.min()` method; try replacing this with your own loop


#------------------------
# Now call your function using the list `nums` as a parameter
nums = [4, 16, -21, -49, 3, 821, 8, 27, -2, 74, -81, 0, 5]





---



#### Problem 2 (medium):

Given two lists, return a list of all elements that are common (without duplicates) between the two. 

**Input:**

```
a = [2, 15, 8, 10, 1, 18]
b = [6, 2, 10, 19, 8, 4, 2, 20, 2]
```

**Expected Output:**
```
[2, 8, 10]
```

In [None]:
def common_ele(lista, listb):
    # Enter your code here


    
    return


#------------------------
# Now call your function using the lists `a` and `b` as parameters
a = [2, 15, 8, 10, 1, 18]
b = [6, 2, 10, 19, 8, 4, 2, 20, 2]






---



#### Problem 3 (spicy):
For any given list of characters, check if it is a palindrome or not.

> Palindrome: A palindrome is a word, number, phrase, or other sequence of characters which reads the same backward as forward, such as _madam_, _racecar_.

**Input:**

```
s1 = ['n', 'o', 'o', 'n']
s2 = ['k', 'a', 'y', 'a', 'k']
s3 = ['p', 'a', 'l', 'i', 'n', 'd', 'r', 'o', 'm', 'e']
```

**Expected Output:**

```
noon is a Palindrome
kayak is a Palindrome
palindrome is not a Palindrome
```

In [None]:
# Either a WHILE or FOR loop can work here
# Either print or return can work here
# To concatenate all the strings in a list, use this: ''.join(my_list_here)
def palindrome(input):
    # Enter your code here


    
    return




#------------------------
# Now call your function using the inputs below
s1 = ['n', 'o', 'o', 'n']
s2 = ['k', 'a', 'y', 'a', 'k']
s3 = ['p', 'a', 'l', 'i', 'n', 'd', 'r', 'o', 'm', 'e']







---
### B.2 Sets


In [None]:
# A set is an unordered collection of UNIQUE elements denoted by `{}`. 
# Note: Sets are unordered; we cannot use an index to access an item.
set1 = {'biology', 'chemistry'}
set2 = {1999, 2000, '2021', '2050'}


# We can add elements using `.add()`
set1.add(2022)
print(f'Add 2022: {set1}\n')


# We cannot concatenate sets because they are unordered, instead we union sets
new_set = set1.union(set2)
new_set


Add 2022: {'chemistry', 'biology', 2022}



{1999, 2000, '2021', 2022, '2050', 'biology', 'chemistry'}

In [None]:
# If we know an element is in a set, we can explicitly remove it.
if 2021 in set1:  # Recall the keyword 'in' from Operators
    set1.remove(2021)
    print(f'Remove 2021: {set1}\n')
else:
    print('2021 was not found.\n')  # Why didn't we find 2021?


# Note: The IF/ELSE block above can be condensed using a ternary operator:
print (f'Remove 2021: {set1}\n' if 2021 in set1 else '2021 was not found.\n')


# Or we can remove an arbitrary element (sets are unordered, so we cannot choose which one)
if len(set1) > 0:  
    var = set1.pop()
    print(f'We removed \'{var}\' from {set1}', '\n')


# Points to Ponder
# 1. Why is it important to check if an element is in a collection before removing it?
# 2. Why isn't `2021` found in the set? 
# 3. Why do we need to check the size/length of a collection before removing?


2021 was not found.

2021 was not found.

We removed 'chemistry' from {'biology', 2022} 
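
Besides `.union()`, sets support other standard operations. The sketch below uses made-up course sets (`spring` and `autumn`) and sorts the results only so that the printed order is predictable.

```python
spring = {'biology', 'chemistry', 'statistics'}   # hypothetical course sets
autumn = {'statistics', 'physics', 'biology'}

print(sorted(spring | autumn))   # union, same as spring.union(autumn)
print(sorted(spring & autumn))   # intersection: elements in both sets
print(sorted(spring - autumn))   # difference: elements only in `spring`

# `.discard()` removes an element if present and does nothing otherwise,
# so no membership check is needed beforehand
spring.discard('history')
print(len(spring))
```

In contrast to `.discard()`, calling `.remove()` on a missing element raises a `KeyError`, which is why the cell above checks membership first.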



---
#### Problem 1 (mild): 

Given a list, check if it has duplicates. Hint: Consider type casting. What is a key feature of sets that is different from lists?

**Input:**

```
b = [31, 11, 1, 9, 72, 4, 3, 20, 31]
```

**Expected Output:**
```
True
```

In [None]:
def has_duplicates(list1):
    # Enter your code here


    return 

#------------------------
# Now call your function using the inputs below
b = [31, 11, 1, 9, 72, 4, 3, 20, 31]



---
#### Problem 2 (medium):
The set `s` originally contained numbers from `1` to `n`, where `n` is the length of `nums`. 

As we were shifting our data from a set to a list, there was a data entry error: one number in `nums` was duplicated thus overwriting the original value. In this case, `3` was duplicated and the value `5` was lost. Note that the size of the set (cardinality) remains the same.

Given the erroneous list `nums` and the original data in `s`, can you write a function that identifies the duplicated number and the missing number?

**Input:**
```
s = {1, 2, 3, 4, 5}
nums = [1, 2, 3, 3, 4]
```
**Expected Output:**

```
3 5
```
**Explanation**: The first number returned is the one that was duplicated, in this case `3`. Because there are now two `3`s, the last number in the set was displaced. The second number returned is that displaced value, `5`.


---


**Note**: Your function should be a general solution (i.e. it should work on any set with the same type of error)
```
s = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
nums = [1, 2, 3, 4, 5, 6, 6, 7, 9, 10, 11]
```
**Expected Output:**

```
6 8
```



In [None]:
# Convert the list `nums` into a set
# Take the set difference `s - set(nums)` to find the element of `s` that is missing from `nums`
# To find the duplicated number, adapt your solution to Problem 1 of Sets (checking for duplicates).

def fix_nums(set1, list1):
    # Enter your code here, Lines 7-8 are placeholders
    duplicate = None
    displaced = None



    # Note: you can return multiple objects
    return duplicate, displaced

#------------------------
# Now call your function using the inputs below
s = {1, 2, 3, 4, 5}
nums = [1, 2, 3, 3, 4]

# Note: you'll need to assign 2 variables to this function
dup, dis = fix_nums(s, nums)


---
#### Optional - Problem 3 (extra spicy): 
(Extension of Problem 2)

Given an unordered list containing numbers repeating exactly _**K**_ times and one unique number, find that unique number.

For example: for K = 2, and for the given list below
```
list1 = [5, 6, 9, 6, 10, 5, 10]
```
`5`, `6`, and `10` repeat exactly `K = 2` times, but `9` is unique. So `9` is the answer.

**Input:**
```
K = 4
list1 = [5, 23, 6, 10, 10, 411, 23, 6, 11, 23, 6, 5, 411, 10, 2, 5, 10, 2, 2, 23, 6, 5, 411, 411, 2]
```

**Expected Output:**
```
11
```

> Hint: How can you leverage the fact that duplicate numbers always repeat exactly K times ?

In [None]:
# K * SUM(SET(LIST1)) - SUM(LIST1) = (K-1) * UNIQUE_NUMBER
# UNIQUE_NUMBER = (K * SUM(SET(LIST1)) - SUM(LIST1)) / (K-1)

# First find sum of elements in list1 
# Then convert list1 to a set 
# Use the built-in function `sum()` to get the total value of the elements in the set
# Compute the unique_number using the formula above
def find_unique(k, my_list):
    # Enter your code here


    return

#------------------------
# Now call your function using the inputs below
K = 4
list1 = [5, 23, 6, 10, 10, 411, 23, 6, 11, 23, 6, 5, 411, 10, 2, 5, 10, 2, 2, 23, 6, 5, 411, 411, 2]





### B.3 Tuples
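
As a brief orientation before the problems, here is a minimal sketch of how tuples behave. The variable names (`point`, `name`, `price`, `updated`) are illustrative only.

```python
# A tuple is an ordered, IMMUTABLE sequence denoted by `()`
point = ('item1', 1.20)

# Indexing and slicing work the same way as for lists
print(point[0])
print(point[-1])

# Tuples can be unpacked into separate variables
name, price = point
print(name, price)

# Tuples cannot be modified in place
try:
    point[1] = 9.99
except TypeError as err:
    print('TypeError:', err)

# To "change" a tuple, build a new one from pieces of the old one
updated = point[:1] + (9.99,)
print(updated)
```

Note the trailing comma in `(9.99,)`: without it, Python treats the parentheses as plain grouping rather than a one-element tuple.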

#### Problem 1 (mild):

Write a function to find the tuple by searching for `n` in a given list.

**Input:**
```
n = 5.12
l = [('item1', 1.20), ('item2', 5.12), ('item3', 4.58)]
```

**Expected Output:**
```
('item2', 5.12)
```

In [None]:
# Loop over the list `l` 
# Check if the second element matches n
# If it matches return that tuple.

def find_n(list1, find_me):

    return


#------------------------
# Now call your function using the inputs below
n = 5.12
l = [('item1', 1.20), ('item2', 5.12), ('item3', 4.58)]




#### Problem 2 (medium):
Write a function to reverse a tuple

**Input:**

```
tup = (13, 24, 1, 5, 18)
```

**Output:**
```
tup = (18, 5, 1, 24, 13)
```

In [None]:
# Note: this should work with tuples of any length
# Note: a key feature of tuples is that they are IMMUTABLE

def flip(tuple1):

    return

#------------------------
# Now call your function using the inputs below
tup = (13, 24, 1, 5, 18)





#### Optional - Problem 3 (extra spicy):

Given a list of integers, and a number, find all good triplets.

A triplet `(l[i], l[j], l[k])` is **good** if the following conditions are true:

- `0 <= i < j < k < len(l)`
- `|l[i] - l[j]| <= n`
- `|l[j] - l[k]| <= n`
- `|l[i] - l[k]| <= n` 

where `|x|` denotes the absolute value of `x`.

Return a list of all the **good** triplets, each as a tuple.

Input:

```
n = 3
l = [4, 1, 0, 1, 7, 9]
```

Expected Output:

```
[(4, 1, 1), (1, 0, 1)]
```


In [None]:
# Use 3 nested for loops with indices i, j, k
# Within each loop, access the elements of list list[i], list[j],list[k]
# Check if the triplet conditions are met
# Identify the good i,j,k combinations and form tuples
# Add them to a list to return

def good_trip(abs_value, list1):
    # Enter your code here


    return

#------------------------
# Now call your function using the inputs below
n = 3
l = [4, 1, 0, 1, 7, 9]




### B.4 Dictionaries

In [None]:
# A dictionary is an indexed collection of key:value pairs denoted by {}
my_dict = {   # key: value
  "coffee": "latte",
  "flavor": "peppermint",
  "size": "extra large"
}

# We can access a value by its key
x = my_dict["coffee"]
print(x)

# Values can be changed in a similar manner
my_dict["coffee"] = "espresso"
print(my_dict)

# And new elements can be added
my_dict["refill"] = "yes"
print(my_dict)

# Or removed... what should be done before removing something from a collection?
my_dict.pop("size")
print(my_dict)



latte
{'coffee': 'espresso', 'flavor': 'peppermint', 'size': 'extra large'}
{'coffee': 'espresso', 'flavor': 'peppermint', 'size': 'extra large', 'refill': 'yes'}
{'coffee': 'espresso', 'flavor': 'peppermint', 'refill': 'yes'}
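
One possible answer to the question in the last comment of the cell above: check that the key exists before removing it, because `pop()` on a missing key raises a `KeyError`. A minimal sketch, using a hypothetical dictionary `order`:

```python
# Hypothetical dictionary that has no "size" key
order = {"coffee": "espresso", "flavor": "peppermint"}

# Option 1: test for membership first
if "size" in order:
    order.pop("size")

# Option 2: give pop() a default, which is returned when the key is missing
removed = order.pop("size", None)
print(removed)

# Without either safeguard, removing a missing key raises KeyError
try:
    order.pop("size")
    raised = False
except KeyError:
    raised = True
print(raised)
```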


In [None]:
# Note: For dictionaries, we need to assign 2 variables: the first for the key
# and the second for the value.
for x, y in my_dict.items():
  print(x, y) 




coffee espresso
flavor peppermint
refill yes
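
Besides `.items()`, a dictionary can be looped over by key or by value alone, and `sorted()` returns its keys in alphabetical order. A short sketch on a hypothetical dictionary `prices`:

```python
# Hypothetical example data
prices = {"mocha": 4.5, "americano": 3.0, "latte": 4.0}

# Looping over a dictionary directly yields its keys
for drink in prices:
    print(drink)

# .values() yields only the values
total = sum(prices.values())
print(total)

# sorted() returns the keys as a new, alphabetically ordered list
sorted_keys = sorted(prices)
for drink in sorted_keys:
    print(drink, prices[drink])
```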


#### Problem 1 (mild):

Convert the two lists into a dictionary.

**Input:**
```
keys = ['alpha', 'beta', 'charlie']
values = ['a', 'b', 'c']
```
**Expected Output:**

```
d = {'alpha': 'a', 'beta': 'b', 'charlie': 'c'}
```
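
If `zip()` is new to you, the sketch below shows what it produces on two hypothetical lists, `names` and `ages`, without building a dictionary from them.

```python
# Hypothetical example data, unrelated to the exercise input
names = ["Ada", "Grace", "Alan"]
ages = [36, 45, 41]

# zip() pairs up elements by position; list() makes the pairs visible
pairs = list(zip(names, ages))
print(pairs)

# Each pair can be unpacked into two variables inside a FOR loop
for name, age in zip(names, ages):
    print(name, "is", age)
```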


In [None]:
# You can use the function `zip()` to create (key, value) tuple pairs
# Create an empty dictionary
# Use a FOR loop to fill the dictionary with values

# No need for a function, just write a code snippet for this question
keys = ['alpha', 'beta', 'charlie']
values = ['a', 'b', 'c']







#### Problem 2 (medium):

Write a function `histogram()` that takes a string and builds a frequency listing of the characters it contains. Represent the frequency listing as a Python dictionary and print it in sorted order by key.

Input:
```
s = "oneredpaperclip"
```
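Output (the dictionary itself, shown in order of first appearance; the printed listing should be sorted by key):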
```
{'o': 1, 'n': 1, 'e': 3, 'r': 2, 'd': 1, 'p': 3, 'a': 1, 'c': 1, 'l': 1, 'i': 1}
```
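
A common counting pattern is to start from an empty dictionary and add 1 each time an item is seen. The sketch below applies it to a hypothetical list of words, `votes`, rather than to the characters of a string. It relies on `.get(key, 0)`, which returns 0 for a key that has not been stored yet.

```python
# Hypothetical example data, unrelated to the exercise input
votes = ["tea", "coffee", "tea", "water", "tea"]

counts = {}
for vote in votes:
    # Look up the current count (0 if unseen), then store the incremented value
    counts[vote] = counts.get(vote, 0) + 1
print(counts)
```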

In [None]:
# Collect all characters from string
# Count them
# Create the dictionary
# Use the letters as keys and their counts as values

def histogram(string1):
    # Enter your code here


    return


#------------------------
# Now call your function using the input below
s = "oneredpaperclip"








---


# Python 101: Putting it all together


---


We noted that Python:


1.   is object-oriented
2.   encourages code modularity
3.   allows dynamic typing
4.   allows dynamic binding

**But what does all this really mean in practice?**


In [None]:
# Almost everything in Python is an object, meaning variables, functions, data
# structures, etc. can each be treated as one 'blob' without worrying about its
# parts. The inputs set1, list2, and the basic string3 are all objects.
set1 = {n * 3 for n in range(20000, 200000, 7) if n % 6 == 0}
list2 = [m + 'no' for m in 'bibbity bobitty boo']
string3 = 'racecar'
s1 = ['n', 'o', 'o', 'n']
s2 = ['k', 'a', 'y', 'a', 'k']
s3 = ['p', 'a', 'l', 'i', 'n', 'd', 'r', 'o', 'm', 'e']

# I can then create a list (another object) and populate it with these datasets
# without needing to consider the individual elements of each.
my_inputs = [set1, list2, string3, s1, s2, s3]

# An example of code modularity is grouping all the palindrome code into a single
# function. Here's a more complex palindrome function that can handle multiple 
# input types:
def palindrome(data):  # named `data` so the built-in input() is not shadowed
    exit_flag = False
    for item in data:
        if not isinstance(item, str):  # Dynamic typing: `item` can be rebound to a value of another type
            item = str(item)
        if len(item) == 1:  # Single characters: test the whole collection instead
            item, exit_flag = data, True  # Dynamic binding allows (re-)assignment "on-the-fly"
        if item == item[::-1]:
            print(item if isinstance(item, str) else ''.join(item), 'is a Palindrome')
        if exit_flag:
            break
    print()

# Putting this all together, we can write flexible code that scales efficiently  
# with data. The list `my_inputs` could contain hundreds of different data points. 
# Instead of having to write code explicitly for each input, we just need 3 lines. 
for n, dataset in enumerate(my_inputs):  # `dataset` avoids shadowing the built-in input()
    print(f'Dataset {n+1} of type {type(dataset)} contains {len(dataset)} items')
    palindrome(dataset)


# In the next modules, we'll cover advanced data structures that scale even more 
# efficiently with "Big Data" (e.g. NumPy arrays and Pandas DataFrames). 



Dataset 1 of type <class 'set'> contains 4286 items
477774 is a Palindrome
85158 is a Palindrome
225522 is a Palindrome
62226 is a Palindrome
87678 is a Palindrome
64746 is a Palindrome
450054 is a Palindrome

Dataset 2 of type <class 'list'> contains 19 items
ono is a Palindrome
ono is a Palindrome
ono is a Palindrome

Dataset 3 of type <class 'str'> contains 7 items
racecar is a Palindrome

Dataset 4 of type <class 'list'> contains 4 items
noon is a Palindrome

Dataset 5 of type <class 'list'> contains 5 items
kayak is a Palindrome

Dataset 6 of type <class 'list'> contains 10 items
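
To see dynamic typing and modularity in isolation, here is a much smaller sketch. The helper `is_palindrome` is a hypothetical simplification, not the function used above: it accepts any value, converts it to a string, and compares that string with its reverse.

```python
def is_palindrome(value):
    """Return True if the string form of `value` reads the same in both directions."""
    text = str(value)
    return text == text[::-1]

# The same name can be bound to objects of different types over time
thing = 12321
print(type(thing).__name__, is_palindrome(thing))
thing = "kayak"
print(type(thing).__name__, is_palindrome(thing))

# One reusable function handles a list of mixed types (hypothetical data)
mixed = [477774, "noon", 3.14, "palindrome"]
results = [is_palindrome(item) for item in mixed]
print(results)
```

Keeping the check in one small function means the test can be reused or changed in a single place.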

