# MODULE 1: Python Fundamentals

## Part A. Objects

Objects are things that hold a value. For example, we can create "BLAHBLAH", and it holds a value.

Objects work kind of like the 'nouns' of a Pythonic sentence.

Somewhere in computer memory, Python stores this *instance* of "BLAHBLAH".

But how can we keep referring back to the original object that we created?

Variables are names that point to objects. (They are not objects themselves; think of them as labels or pointers.)

A variable tells the computer, and us, "this is the thing we want to look at, change, or reference."

This gives us a lot of flexibility: we can point a name at a different object at any time, and we can *mutate* objects that allow it (like lists). Strings cannot be mutated, so "changing" a string really means making a new one.
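Here is a small sketch of the idea that variables are names pointing at objects. The names `groceries`, `same_groceries`, `shout`, and `quiet` are made up for this example.

```python
# two names, one list object
groceries = ["eggs", "milk"]
same_groceries = groceries          # no copy is made; both names point at one list
same_groceries.append("rice")       # mutate the list through the second name
print(groceries)                    # the change shows up under the first name too
print(groceries is same_groceries)  # True: both names refer to the same object

# strings cannot be mutated, so methods like lower() build a NEW string
shout = "BLAH"
quiet = shout.lower()
print(shout, quiet)                 # the original string is untouched
```

Lists can be changed in place, so every name pointing at the list sees the change. A string method hands back a new string and leaves the old one alone.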


In [1]:
# FOR EXAMPLE
"BLAHBLAH"

'BLAHBLAH'

In [12]:
our_original_string = "BLAH"
print("OUR ORIGINAL STRING:", our_original_string)

first_char = our_original_string[0]
print('FIRST CHAR VARIABLE VALUE:', first_char)

print('we can also change our pointer to something completely different.')

first_char = 100
print("FIRST CHAR VARIABLE VALUE:", first_char)

print('but our original string stays the same:', our_original_string)

print('or maybe i want to change my original string:')
# I can make it all lowercase
our_original_string = our_original_string.lower()
print('OUR ORIGINAL STRING:', our_original_string)

OUR ORIGINAL STRING: BLAH
FIRST CHAR VARIABLE VALUE: B
we can also change our pointer to something completely different.
FIRST CHAR VARIABLE VALUE: 100
but our original string stays the same: BLAH
or maybe i want to change my original string:
OUR ORIGINAL STRING: blah


There are some common rules and good habits when naming variables:
- Names can't start with a number.
- Avoid reusing Python's built-in names like "int" and "str", or built-in functions like "len". Python will let you do it, but the built-in stops working under that name. These words highlight differently when you type them, so you'll know.
- Try not to be ambiguous with variable names, like using single letters or generic words like "foo" or "x". Vague names get harder to follow as your code gets more complicated.

### STRINGS basics



In [13]:
# INDEXING A STRING
i = 0  # TODO FOR YOU: change i and run the code to see how it affects the cell output
test_string = 'abcdef'
test_string[i]

'a'

In [16]:
# BASIC built-in string methods https://www.w3schools.com/python/python_ref_string.asp 
# try to walk through 2 of these methods on this test string:
test_string_2 = '  AbCdEfGG      '
# case: lower()/upper()
# find: index()
# strip white space: lstrip/rstrip/strip
# replace a value in the string: replace()
# count number of a substring: count()
# length: len()
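As a rough illustration of the methods listed above, here is a sketch on a different string, so the exercise on `test_string_2` is still yours to try. The name `demo_string` is made up for this example.

```python
demo_string = '  Hello World  '

lowered = demo_string.lower()                # every letter lowercase
position = demo_string.index('W')            # position of the first 'W'
stripped = demo_string.strip()               # whitespace removed from both ends
swapped = stripped.replace('World', 'there') # replace one substring with another
how_many_l = demo_string.count('l')          # how many times 'l' appears

print(repr(lowered))
print(position)
print(repr(stripped), len(stripped))
print(swapped)
print(how_many_l)
```

None of these calls change `demo_string` itself. Each one returns a new value, which is why the results are saved under new names.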



### LISTS & Dictionaries

Strings are similar to lists, and in some languages they are actually the same type of object (i.e., a string is a list of characters). In Python they are separate types: both can be indexed, but only lists can be changed in place.

Lists are basically like the lists we know in real life: grocery lists, class lists, Excel sheets.

We can add, remove, or locate elements in a list.


We use '[]' to make an empty list.


We can use '[1, 2, 3]' to make a list containing 1, 2, and 3!


In [18]:
example_empty_list = []
example_empty_list

[]

In [19]:

example_simple_list = [1, 2, 'a', 'b']
example_simple_list

[1, 2, 'a', 'b']

In [20]:

example_simple_list2 = [test_string_2]
example_simple_list2

['  AbCdEfGG      ']

In [None]:
# COMMON LIST OPERATIONS

# length: len()
# removing: pop(), remove()
# ordering: reverse(), sort()
# adding: append() vs extend()
# accessing: using bracket indexing
# updating: using bracket indexing
# max/min: max(), min()
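The operations listed above might look like this in practice. This is only a sketch, and the list `fruits` is made up for the example.

```python
fruits = ['pear', 'apple', 'fig']

fruits.append('kiwi')              # append adds ONE element to the end
fruits.extend(['lime', 'plum'])    # extend adds EACH element of another list
first = fruits[0]                  # accessing with bracket indexing
fruits[0] = 'peach'                # updating with bracket indexing
last = fruits.pop()                # removes and returns the last element
fruits.remove('fig')               # removes the first element equal to 'fig'
fruits.sort()                      # sorts in place (alphabetical for strings)
fruits.reverse()                   # reverses in place

print(first, last)
print(fruits, len(fruits))
print(max([3, 9, 4]), min([3, 9, 4]))
```

Note that `sort()`, `reverse()`, `append()`, and `extend()` change the list itself and return `None`, so there is no need to assign their result to anything.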


### DICTs

In [65]:
empty_dictionary = {}
single_dictionary = {'id': 3004}
simple_dictionary = {'id': 3004,
                     'name': 'potato',
                     'attributes': ['brown', 'small', 'yummy when fried']}
simple_dictionary_2 = {1:100,
                       2:200,
                       3:300,
                       4:400,}


In [None]:
# simple dictionary operations
# indexing: dictionary[key] looks up a value; dictionary[key] = value sets one
# grabbing keys() and values()
# inverse a dictionary using: 
# loop through a dictionary using dictionary.items()

# ** IMPORTANT RULE WITH DICTS ** you can't have two identical keys in a dictionary, but you CAN have different keys point to the same value.
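A short sketch of these dictionary operations follows. The dictionary `prices` is made up for the example, and the dictionary comprehension used to invert it is just one possible approach, not necessarily the one intended in the notes above.

```python
prices = {'apple': 2, 'fig': 5}

fig_price = prices['fig']      # look up a value by its key
prices['kiwi'] = 4             # a new key adds a new entry
prices['apple'] = 3            # an existing key is overwritten (keys are unique)

print(list(prices.keys()))
print(list(prices.values()))

for fruit, price in prices.items():   # loop over key/value pairs
    print(fruit, 'costs', price)

# one possible way to invert: swap each key with its value
by_price = {price: fruit for fruit, price in prices.items()}
print(by_price)
```

Inverting only works cleanly when the values are all different. If two keys share a value, the later one overwrites the earlier one in the inverted dictionary.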


## Part B. Operations

- operations with numbers
- operations with strings
- operations with functions
- loops

In [21]:
import numpy as np # import the numpy library for some math functions
import math

# SIMPLE MATH
# addition, subtraction, multiplication, division, power, mod (aka remainder)


In [22]:
# try 1 + 1
addition = ...
print('RESULT SHOULD BE 2. addition =', addition)



you can also add strings and lists!

In [24]:
#string concatenation using "+" operator
 
'a' + 'b'

'ab'

In [23]:
# list extension using the "+" operator

[1, 2, 3] + [4]

[1, 2, 3, 4]

In [None]:
# try 10-29

subtraction = ...
print('RESULT SHOULD BE -19. subtraction =', subtraction)

In [None]:
# try 3*2*2
multiplication = ...
print('RESULT SHOULD BE 12. multiplication =', multiplication)

In [None]:
# try divide.

# first: try a regular divide 3/2
regular_divide = ...
print('result should be 1.5. regular_divide = ', regular_divide)

# second: try an integer divide with two slashes (this rounds down to the nearest integer) "3//2"
floor_divide = ...
print('result should be 1. floor_divide =', floor_divide)

In [28]:
# power

# first: regularly 10**2
starstar_power = ...

# second: example using the numpy library. set base and power variables to pass the assertion below.
base = ...
power = ...

math_power = np.power(base, power)

print(starstar_power, math_power)
assert starstar_power == math_power

TypeError: unsupported operand type(s) for ** or pow(): 'ellipsis' and 'ellipsis'
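The error above is what the cell prints while the `...` placeholders are still unfilled. For reference, here is a sketch of each arithmetic operator using different numbers than the exercises, so the answers are not given away. The variable names are made up for the example.

```python
import math

total = 7 + 5              # addition
difference = 7 - 12        # subtraction
product = 4 * 5            # multiplication
quotient = 7 / 2           # regular divide always gives a decimal (float)
floored = 7 // 2           # floor divide rounds DOWN to an integer
floored_negative = -7 // 2 # rounding down means -3.5 becomes -4, not -3
remainder = 7 % 2          # mod: what is left over after dividing
squared = 2 ** 5           # power with the ** operator
squared_math = math.pow(2, 5)  # math.pow always returns a float

print(total, difference, product, quotient)
print(floored, floored_negative, remainder)
print(squared, squared_math)
```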

### FUNCTIONS!

Most of the time we want to do things that are more complex than one-liners.

We can build functions to package up repetitive code and make our lives easier.

In [30]:
email_data = ['unread'] * 5 # a list of 5 unread emails
email_data

['unread', 'unread', 'unread', 'unread', 'unread']

we want to go through and mark each one as read...

we COULD do the simple way and go through each entry to change it manually like this:

In [32]:
email_data_2 = email_data.copy()
email_data_2[0] = 'read'
email_data_2[1] = 'read'
email_data_2[2] = 'read'
email_data_2[3] = 'read'
email_data_2[4] = 'read'
email_data_2

['read', 'read', 'read', 'read', 'read']

This does the trick, but what if we have hundreds or even thousands of emails?

What if the thing we are looking at is infinite? 

What we can do is write functions and loops!

Functions have the following format:

<code>def function_name(inputs):
    *function operations* 
</code>

In [33]:
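def mark_read(email_idx):
    # note: this always edits the global email_data list
    email_data[email_idx] = 'read'
    
def mark_all_read(data):
    for i in range(len(data)): # for every index from 0 up to (but not including) len(data)
        mark_read(i) # marks email_data[i], so pass in email_data itself as data
    return data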
        

we can call functions like so:

In [35]:
mark_read(0)
email_data

['read', 'unread', 'unread', 'unread', 'unread']

In [36]:
mark_all_read(email_data)

['read', 'read', 'read', 'read', 'read']
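One thing to notice: `mark_read` always edits the global `email_data` list, no matter which list is handed to `mark_all_read`. A possible variant, sketched below, passes the list in explicitly so the same functions work on any list. The names `mark_read_in`, `mark_all_read_in`, `inbox_a`, and `inbox_b` are made up for this sketch.

```python
def mark_read_in(data, email_idx):
    data[email_idx] = 'read'       # edits whichever list was passed in

def mark_all_read_in(data):
    for i in range(len(data)):     # every index from 0 to len(data) - 1
        mark_read_in(data, i)
    return data

inbox_a = ['unread'] * 3
inbox_b = ['unread'] * 2

mark_all_read_in(inbox_a)          # only inbox_a is changed
print(inbox_a)
print(inbox_b)
```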

What if we want to add a condition: if the email is already read, we toggle it back to unread?

We can use if statements!

Also, the great thing about functions is that you only need to write them once, and you can reuse them anywhere after they have been defined.

In [41]:
def mark_unread(email_idx):
    email_data[email_idx] = 'unread'

## !Writing in pseudocode!

Writing out in plain English how you want to approach the solution, before you write the function, is good practice!

So for example: say that for the function "mark_opposite" we want to go through every data entry and mark each read email as unread and each unread email as read.

Our pseudocode might look something like:

    for every index in the email list...

        if it is already read,

            mark as unread

        otherwise

            mark as read
            
    return edited list



In [38]:
def mark_opposite(data):
    for i in range(len(data)): # for every index from the range 0 to end of the list...
        if data[i] == 'read': # if it is already read
            mark_unread(i) # mark unread
        else: # otherwise
            mark_read(i) # mark as read
    return data # return the list

*see how the comments (#...blah...) follow the pseudocode!*

In [42]:
mark_opposite(email_data)

['unread', 'unread', 'unread', 'unread', 'unread']

In [44]:
mark_read(1)
email_data

['unread', 'read', 'unread', 'unread', 'unread']

In [45]:
mark_opposite(email_data)

['read', 'unread', 'read', 'read', 'read']

if statements can be very powerful tools!!


In [52]:
myFirstName = 'Jiyoo' # enter in your name.

if myFirstName.lower() == 'doug':
    print('my last name is Parada.')
elif myFirstName.lower() == 'kelly': # THE ELIF! it's a contraction of "else if"
    print('my last name is Zhen.')
elif myFirstName.lower() == 'krishna':
    print('my last name is Parekh.')
elif myFirstName.lower() == 'clara':
    print('my last name is Mangali.')
else:
    print(myFirstName+', I don\'t know who you are!')

Jiyoo, I don't know who you are!


Is there any way you could have made this code more efficient?

yes!

In [55]:
# ONE WAY:::
myFirstName = "Jiyoo"
fn_lower = myFirstName.lower()
last_name = ''
if fn_lower == 'doug':
    last_name = 'Parada'
elif fn_lower == 'kelly': # THE ELIF! it's a contraction of "else if"
    last_name = 'Zhen'
elif fn_lower == 'krishna':
    last_name = 'Parekh'
elif fn_lower == 'clara':
    last_name = 'Mangali'


if last_name: # an empty string counts as False, so this runs only if we found a name
    print("My last name is " + last_name + ".")
else:
    print(myFirstName+', I don\'t know who you are!')
    
    

Jiyoo, I don't know who you are!


We can actually make this even MORE "black box" by using dictionaries!

Dictionaries are very useful for creating indexable collections:
we can index by words and numbers instead of just by the position of the object.

More on dictionaries:

In [56]:

first_names = ['doug', 'kelly', 'krishna', 'clara', 'jiyoo', 'victoria']
last_names = ['parada', 'zhen', 'parekh', 'mangali', 'jeong', 'robinson']
first_and_last = {} # empty dictionary

# lets fill the dictionary with our names:

for i in range(len(first_names)):
    fn = first_names[i]
    ln = last_names[i]
    first_and_last[fn] = ln 
    print('iteration', i, ":", first_and_last)
    

iteration 0 : {'doug': 'parada'}
iteration 1 : {'doug': 'parada', 'kelly': 'zhen'}
iteration 2 : {'doug': 'parada', 'kelly': 'zhen', 'krishna': 'parekh'}
iteration 3 : {'doug': 'parada', 'kelly': 'zhen', 'krishna': 'parekh', 'clara': 'mangali'}
iteration 4 : {'doug': 'parada', 'kelly': 'zhen', 'krishna': 'parekh', 'clara': 'mangali', 'jiyoo': 'jeong'}
iteration 5 : {'doug': 'parada', 'kelly': 'zhen', 'krishna': 'parekh', 'clara': 'mangali', 'jiyoo': 'jeong', 'victoria': 'robinson'}


In [57]:

myFirstName = "Jiyoo"
fn_lower = myFirstName.lower()
keys = first_and_last.keys() # .keys() gives us a view of the dictionary's keys (wrap it in list() for a real list).
print('keys:', keys)
if fn_lower in first_and_last: # "in" checks the keys, so calling .keys() here is optional
    print("my last name is", first_and_last[fn_lower])
else:
    print(myFirstName+', I don\'t know who you are!')
    

keys: dict_keys(['doug', 'kelly', 'krishna', 'clara', 'jiyoo', 'victoria'])
my last name is jeong
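Another way to handle a missing name is the dictionary method `get()`, which returns `None` instead of raising an error when the key is absent. The sketch below uses a shortened copy of the dictionary, and the helper function `describe` is made up for this example.

```python
first_and_last = {'doug': 'parada', 'kelly': 'zhen', 'jiyoo': 'jeong'}

def describe(first_name):
    # get() returns None when the key is not in the dictionary
    last_name = first_and_last.get(first_name.lower())
    if last_name is None:
        return first_name + ", I don't know who you are!"
    return "my last name is " + last_name

print(describe("Jiyoo"))
print(describe("Sam"))
```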


We can also simplify making a dictionary by "zipping" two lists together.

In [64]:
fnln_zip = zip(first_names, last_names)
print(list(fnln_zip))

[('doug', 'parada'), ('kelly', 'zhen'), ('krishna', 'parekh'), ('clara', 'mangali'), ('jiyoo', 'jeong'), ('victoria', 'robinson')]


In [63]:
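dict(zip(first_names, last_names)) # zip again: a zip object is used up after one pass, so fnln_zip is empty once list() has read it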

{'doug': 'parada',
 'kelly': 'zhen',
 'krishna': 'parekh',
 'clara': 'mangali',
 'jiyoo': 'jeong',
 'victoria': 'robinson'}
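A detail worth knowing about `zip`: the object it returns can only be read through once. The sketch below shows what happens on a second pass. The lists `letters` and `numbers` are made up for this example.

```python
letters = ['a', 'b', 'c']
numbers = [1, 2, 3]

pairs = zip(letters, numbers)
first_pass = list(pairs)     # reads every pair out of the zip object
second_pass = dict(pairs)    # nothing is left to read, so this is empty

print(first_pass)
print(second_pass)

# calling zip() again gives a fresh object to build the dictionary from
fresh = dict(zip(letters, numbers))
print(fresh)
```

If you need the pairs more than once, either call `zip()` again or save `list(zip(...))` under a name and reuse that list.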

Dictionaries are a stepping stone to dataframes, so it's good to get familiar with them!

## PART C. Mini Project


### building a simple spam bot!
an email is NEVER spam if ONE OF THE FOLLOWING IS TRUE:
- it is from one of our only two friends: americancultures@berkeley.edu or jiyoojeong@berkeley.edu
- it has "[OFFICIAL]" in the subject


(crudely) we'll consider an email spam if ANY OF THE FOLLOWING IS TRUE:
- the subject has any numbers or symbols. -- see the 're' library
- the subject has more than 100 characters.
- the subject has any typos (spellcheck using spellchecker library.) https://towardsdatascience.com/textblob-spelling-correction-46321fc7f8b8 
- the email address is from outside @berkeley.edu

If any of the above is true, we will say there is a 70% chance that it is a spam email.
If none of the above is true and it is not a never-spam email, we will say there is a 10% chance that it is a spam email.
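To get a feel for the building blocks before filling in the skeleton, here is a rough sketch of how some of the individual checks might be written. The helper `has_number_or_symbol` and the sample subjects are made up for this example, and treating "anything that is not an English letter or whitespace" as a number or symbol is an assumption, not the only reasonable definition.

```python
import re

def has_number_or_symbol(subject):
    # assumption: anything other than A-Z, a-z, or whitespace counts
    return re.search(r"[^A-Za-z\s]", subject) is not None

plain = has_number_or_symbol("Lunch tomorrow")
shouty = has_number_or_symbol("WIN $1000 NOW!!!")
too_long = len("Lunch tomorrow") > 100
from_berkeley = "someone@gmail.com".endswith("@berkeley.edu")

print(plain, shouty, too_long, from_berkeley)
```

`re.search` returns a match object when the pattern is found and `None` otherwise, which is why the helper compares against `None`.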

Project Skeleton Code:

In [None]:
import sys
!{sys.executable} -m pip install pyspellchecker  # the package that provides the "spellchecker" module

from spellchecker import SpellChecker # helps spellcheck.
import pickle # packaging library for the data :) (built into Python, so no install needed)
import random
random.seed(123) # this sets the randomness to a fixed sequence of randomness (*ask me if ur curious)

In [None]:
# before doing anything... explore the data

#This is a LIST of dictionaries with the same keys:
#ie... [{email data 1}, {email data 2}, {email data 3}, ... etc.]
# each email data looks like: {'email':___, 'subject':____}

In [None]:
# build a function that checks if it is NEVER SPAM
def never_spam(email_dat):
    '''
    input: email_dat : a dictionary with email metadata.
    output: boolean : True if the email is never spam.
    '''
    
    sender_email = email_dat[...]
    subject = email_dat[...]
    if ... : # it is from one of our only two friends: americancultures@berkeley.edu or jiyoojeong@berkeley.edu
        return True
    elif ...: # it has "[OFFICIAL]" in the subject
        return True
    else:
        return False


In [None]:
# build a function that checks if it is spam
def maybe_spam(email_dat):
    '''
    input: email_dat : a dictionary with email metadata.
    output: boolean : True if the email has any of the following spam conditions.
    
    hint:: the spellchecker library is already imported for you. All you have to do is:
        
    '''
    
    sender_email = email_dat[...]
    subject = email_dat[...]
    
    if ... : #the subject has any numbers or symbols. -- see the 're' library's re.search
        return True
    elif ...: # the subject has more than 100 characters.
        return True
    
    # spellchecker set up
    spell = SpellChecker()
    misspelled = spell.unknown(...) # takes a list of words; returns the set of words it does not recognize.
    number_misspelled = len(...)
    if ... : # the subject has any typos (spellcheck using the spellchecker library.)
        return True
    elif ... : # the email address is from outside @berkeley.edu
        return True
        return True
    else:
        return False
    
     

Now we have two functions: one that tells us to mark an email as real regardless of other conditions, and one that tells us whether it meets any of the spam conditions.

How should we use these to filter through every email in our list?

In [None]:
import random # random.random() gives a random decimal x with 0 <= x < 1

status = []
for email in inbox: # for every email in our list
    if never_spam(...): # if it is never spam
        status.append('good') # mark as good
    elif maybe_spam(...): # if it meets a spam condition: 70% chance of spam
        if random.random() < .7:
            status.append('spam')
        else:
            status.append('good')
    else: # otherwise: 10% chance of spam
        if random.random() < .1:
            status.append('spam')
        else:
            status.append('good')
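To see why comparing a random decimal against .7 gives roughly a 70% chance, here is a small simulation sketch. It uses the standard `random` module, and the number of trials is an arbitrary choice for the example.

```python
import random

random.seed(123)  # fixed seed so the run is repeatable

trials = 10_000
spam_count = 0
for _ in range(trials):
    if random.random() < 0.7:   # true for about 70% of the draws
        spam_count += 1

spam_fraction = spam_count / trials
print(spam_fraction)  # close to 0.7, but not exactly 0.7
```

Each call to `random.random()` lands anywhere between 0 and 1 with equal likelihood, so it falls below 0.7 about 70% of the time.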
        
        