## Underscores

In truth we won't worry much about underscores, though I wanted to provide a bit of info on their use in Python, specifically in relation to the double-underscored attributes and methods we saw when running the dir() function.

#### Double Underscore
* these refer to "magic methods" (also called "dunder" methods), meaning Python calls them behind the scenes, and in most cases we don't want to change how they function.  think of them as Python setting up some nuts and bolts to make your programs run
* for instance, two lists can be concatenated and two integers can be added (the same holds for several other datatypes) using the + symbol or \_\_add\_\_, though using the symbol is far easier.  when we do list + list, the + sign tells Python to call the \_\_add\_\_ method.
* \_\_init\_\_ is the initialization method for an object, which we will use, but not alter the underlying functionality of
* \_\_iter\_\_ returns an iterator, so this is the method that for loops and the iter() function call on an iterable
* \_\_new\_\_ is called to create a new object; Python then passes that object to \_\_init\_\_ to initialize it
* for brevity I will leave it there, but we won't need to alter the functionality of these methods

In [55]:
a = [1,2]
b = [3,4]
a + b

[1, 2, 3, 4]

In [56]:
a.__add__(b)

[1, 2, 3, 4]
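To make the dispatch a bit more concrete, here is a minimal sketch using a made-up class. `Money`, `amount` and `calls` are hypothetical names used only for illustration. It suggests how + ends up calling \_\_add\_\_, and that \_\_new\_\_ runs before \_\_init\_\_ whenever an object is created.

```python
calls = []  # records the order in which Python invokes the magic methods


class Money:  # hypothetical class, for illustration only
    def __new__(cls, amount):
        calls.append("__new__")    # step 1: the object is created
        return super().__new__(cls)

    def __init__(self, amount):
        calls.append("__init__")   # step 2: the new object is initialized
        self.amount = amount

    def __add__(self, other):
        calls.append("__add__")    # invoked by the + operator
        return Money(self.amount + other.amount)


total = Money(1) + Money(2)
print(total.amount)
print(calls)
```

again, this is only to show what Python does behind the scenes; in this class we will mostly be writing \_\_init\_\_ and leaving the rest alone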

to the second point, there isn't a great way to tell the difference between methods and attributes in dir(), though we can make use of the getmembers function from inspect, which returns (name, value) pairs.  in the full output, methods like extend, append etc. are listed as functions

In [57]:
from inspect import getmembers

In [58]:
getmembers(lst)

[('__add__', <method-wrapper '__add__' of list object at 0x7fe0b8c1f6e0>),
 ('__class__', list),
 ('__contains__',
  <method-wrapper '__contains__' of list object at 0x7fe0b8c1f6e0>),
 ('__delattr__',
  <method-wrapper '__delattr__' of list object at 0x7fe0b8c1f6e0>),
 ('__delitem__',
  <method-wrapper '__delitem__' of list object at 0x7fe0b8c1f6e0>),
 ('__dir__', <function list.__dir__()>),
 ('__doc__',
  'Built-in mutable sequence.\n\nIf no argument is given, the constructor creates a new empty list.\nThe argument must be an iterable if specified.'),
 ('__eq__', <method-wrapper '__eq__' of list object at 0x7fe0b8c1f6e0>),
 ('__format__', <function list.__format__(format_spec, /)>),
 ('__ge__', <method-wrapper '__ge__' of list object at 0x7fe0b8c1f6e0>),
 ('__getattribute__',
  <method-wrapper '__getattribute__' of list object at 0x7fe0b8c1f6e0>),
 ('__getitem__', <function list.__getitem__>),
 ('__gt__', <method-wrapper '__gt__' of list object at 0x7fe0b8c1f6e0>),
 ('__hash__', None)

if we make a quick dummy class (we'll cover this in lecture 3) and give it an attribute called name, we can see the tuple of attribute name and value using the getmembers function.

In [59]:
class Dog:
    def __init__(self, name):
        self.name = name
        
d = Dog(name = "Brian")
getmembers(d)

[('__class__', __main__.Dog),
 ('__delattr__',
  <method-wrapper '__delattr__' of Dog object at 0x7fe08823d610>),
 ('__dict__', {'name': 'Brian'}),
 ('__dir__', <function Dog.__dir__()>),
 ('__doc__', None),
 ('__eq__', <method-wrapper '__eq__' of Dog object at 0x7fe08823d610>),
 ('__format__', <function Dog.__format__(format_spec, /)>),
 ('__ge__', <method-wrapper '__ge__' of Dog object at 0x7fe08823d610>),
 ('__getattribute__',
  <method-wrapper '__getattribute__' of Dog object at 0x7fe08823d610>),
 ('__gt__', <method-wrapper '__gt__' of Dog object at 0x7fe08823d610>),
 ('__hash__', <method-wrapper '__hash__' of Dog object at 0x7fe08823d610>),
 ('__init__',
  <bound method Dog.__init__ of <__main__.Dog object at 0x7fe08823d610>>),
 ('__init_subclass__', <function Dog.__init_subclass__>),
 ('__le__', <method-wrapper '__le__' of Dog object at 0x7fe08823d610>),
 ('__lt__', <method-wrapper '__lt__' of Dog object at 0x7fe08823d610>),
 ('__module__', '__main__'),
 ('__ne__', <method-wrappe
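since the full getmembers output is long, one possible way to narrow it down is sketched below. this is an illustration rather than the only approach: the `speak` method is a hypothetical addition to the Dog class, and the filtering rules (bound methods via inspect.ismethod, non-callable values without a double-underscore prefix for attributes) are assumptions chosen to keep the example short.

```python
from inspect import getmembers, ismethod


class Dog:
    def __init__(self, name):
        self.name = name

    def speak(self):  # hypothetical method added for this example
        return "{} says woof".format(self.name)


d = Dog(name="Brian")

# bound methods that were written in Python on the class
methods = [name for name, value in getmembers(d, predicate=ismethod)]

# non-callable members whose names are not double-underscored
attributes = [name for name, value in getmembers(d)
              if not callable(value) and not name.startswith("__")]

print(methods)
print(attributes)
```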

The underscore is also used in Python as a naming convention, in some cases to mean "throwaway."  Below we are saying we don't care about 2, the middle value.  the variable is technically assigned, but by convention a reader takes _ to mean the value isn't relevant to our program

In [152]:
x, _, y = (1, 2, 3)
print(x)
print(_)
print(y)

1
2
3


here the _ is being used as the iteration variable; by convention this says we don't actually care about the values in the range(10), we just want to repeat something 10 times

In [146]:
for _ in range(10):
    print(_)

0
1
2
3
4
5
6
7
8
9


In [1]:
# for _ in range(10):
    # do something 10 times, but we don't need the 0,1,2,3, etc.

the same holds here, as we are saying we don't really care about what the function returns

In [148]:
def my_func():
    return 2

In [151]:
_  = my_func()
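a slightly more realistic sketch of the throwaway convention is below, where the value bound to _ is never read. the `records` data and the other variable names are made up for illustration.

```python
# hypothetical (name, state, age) records
records = [("Brian", "NY", 30), ("Jane", "CA", 25)]

# keep the name and age, ignore the middle field
ages = {name: age for name, _, age in records}

# repeat an action three times without using the loop counter
greetings = []
for _ in range(3):
    greetings.append("hi")

print(ages)
print(greetings)
```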

if you want to get rather in depth, you can look up name mangling in Python, which is another use of underscores when writing classes, but that is outside the scope of this class

## pass vs. continue

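remember that continue skips the rest of the current iteration and jumps to the next iteration of the innermost loop it sits in, whereas pass does nothing at all and execution simply carries on to the next line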

In [5]:
a = [1,[3,4]]
c_outer = 0

for i in a:
    
    # this just prints the first element of our list
    # initially the value is 1
    print("-"*75) # divide up our outer loop printing
    print("First Loop:{}".format(i))
    
    # since the first value of the list a is a number
    # this will be false and we will go to the else which is a pass
    # so we increment the counter and print the last statement 
    # but when i = [3,4] we drop into the second loop
    if isinstance(i, list):
        
        # consider we are on element 2 where
        # i = [3,4]
        for i2 in i:
            # initially i2 is 3
            # so we continue
            # this will go back up to the line
            # for i2 in i (the current loop we are in)
            # and finish the iteration of [3,4]
            # notice that the continue here
            # doesn't put us back to for i in a
            # it puts us back to the current iteration loop
            # we can see this because c_outer is incremented and printed
            if i2 == 3:
                continue
            else:
                print("Second Loop:{}".format(i2))
            
    # in this case it's a pass
    # so notice the code below the else is still executed
    else:
        pass
        
    c_outer+=1
    print("Outer loop iteration count:{}".format(c_outer))

---------------------------------------------------------------------------
First Loop:1
Outer loop iteration count:1
---------------------------------------------------------------------------
First Loop:[3, 4]
Second Loop:4
Outer loop iteration count:2


if we change the pass in the bottom else to a continue, we see that for the first iteration of the outer loop we skip incrementing c_outer and skip printing its value

In [4]:
a = [1,[3,4]]
c_outer = 0

for i in a:
    
    # this just prints the first element of our list
    # initially the value is 1
    print("-"*75)
    print("First Loop:{}".format(i))
    
    # since the first value of the list a is a number
    # this will be false and we will go to the else which is now a continue
    # so we skip the counter increment and the last print statement
    # but when i = [3,4] we drop into the second loop
    if isinstance(i, list):
        
        # consider we are on element 2 where
        # i = [3,4]
        for i2 in i:
            # initially i2 is 3
            # so we continue
            # this will go back up to the line
            # for i2 in i
            # and finish the iteration of [3,4]
            # notice that the continue here
            # doesn't put us back to for i in a
            # it puts us back to the current iteration loop
            # most nested or lowest level
            if i2 == 3:
                continue
            else:
                print("Second Loop:{}".format(i2))
            
    # in this case it's a continue
    # so notice the next piece of the code is not executed
    
    # if we change this to continue, when i initially is 1
    # the loop will shoot back up to for i in a and continue
    # missing the bottom print and counter increment
    # so in this sense, we only partially complete the outer loop
    else:
        continue
        
    c_outer+=1
    print("Outer loop iteration count:{}".format(c_outer))
    

---------------------------------------------------------------------------
First Loop:1
---------------------------------------------------------------------------
First Loop:[3, 4]
Second Loop:4
Outer loop iteration count:1


control flow of a loop can be tricky.  the best way to learn it will be to write some loops and print output so you can visually trace what is happening
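as a smaller case to trace by hand, the sketch below puts the two keywords side by side in a single loop. the function name `run` and its flag are hypothetical, chosen only so both behaviors can be compared on the same data.

```python
def run(use_continue):
    """Collect the numbers the loop body reaches after the if block."""
    seen = []
    for n in [1, 2, 3]:
        if n == 2:
            if use_continue:
                continue   # jump straight to the next n, skipping the append
            else:
                pass       # do nothing, fall through to the append
        seen.append(n)
    return seen


print(run(use_continue=False))
print(run(use_continue=True))
```

with pass all three numbers are collected, while with continue the 2 is skipped because the append is never reached on that iteration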

## quick overview of comprehension
these are just concise and usually faster ways to write for loops that build a collection

In [7]:
lst = []
for i in range(10):
    lst.append(i)
lst

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

notice the wording is basically the same, just ordered a bit differently and put on one line

In [8]:
# as a comprehension
lst = [i for i in range(10)]

we are compacting the for loop syntax into brackets (meaning we are using a comprehension to make a list).  we can build a dictionary or set the same way, and a tuple by passing the same expression to tuple().  simply put, comprehensions let us compact our loop code and often run it more efficiently.  these will show up often as you continue to write and read Python code

In [16]:
# we can use the same expression to build other collections
# like tuples, dictionaries or sets
tup = tuple(i for i in range(10))
print(tup)

dct = dict((idx,i+1) for idx,i in enumerate(range(10)))
print(dct)

s = set(i for i in range(10))
print(s)

(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
{0: 1, 1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 8, 8: 9, 9: 10}
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
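dictionaries and sets also have their own comprehension syntax using curly braces, and a condition can be added at the end to filter items. a short sketch is below; the variable names are just for illustration. note that parentheses alone give a generator expression rather than a tuple, which is why tuple() is needed.

```python
# dictionary comprehension: key: value pairs inside curly braces
squares = {i: i * i for i in range(5)}

# set comprehension with a filter condition at the end
evens = {i for i in range(10) if i % 2 == 0}

# parentheses create a generator expression, not a tuple
gen = (i for i in range(3))
as_tuple = tuple(gen)

print(squares)
print(evens)
print(as_tuple)
```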


we can also apply functions with comprehensions

In [15]:
lst = [-1,-2,-3,-4]
a = [abs(x) for x in lst]
a

[1, 2, 3, 4]

the techniques above will make up the majority of the coding within assignment 1.  we will cover them far more in our second lecture, but for those that want to get started ahead of time, the above should help

## map, comprehensions, for loops and time
* as we discussed, runtime and concise code matter
* most likely, there are going to be runtime gains when using comprehensions and map vs. loops
* not to mention you get concise code with a comprehension
* Once you learn comprehensions, you will not bother thinking about using map much, as comprehensions are rather simple and efficient to run
* plus, they are more pythonic
* but the concept of mapping will come up later on when using the multiprocessing library, so it is important to understand the idea of "mapping" a function to a collection of data
* the below prints out the seconds needed to run each chunk of code.  note that the comprehension cells build the list inside the timed section, so their times include that extra work

In [19]:
import time

In [45]:
lst = list(range(10000000))
start = time.time()
a = list(map(abs,lst))
end = time.time()
print(end - start)

0.24928903579711914


In [46]:
lst = list(range(10000000))
b = []
start = time.time()
for i in lst:
    b.append(abs(i))
end = time.time()
print(end - start)

1.2070269584655762


In [47]:
lst = list(range(10000000))
start = time.time()
for idx,i in enumerate(lst):
    lst[idx]= abs(i)
end = time.time()
print(end - start)

1.4653120040893555


In [48]:
# comprehension
start = time.time()
b = [abs(x) for x in list(range(10000000))]
end = time.time()
print(end - start)

0.7413120269775391


the same comparison with a user-defined function (UDF)

In [50]:
def my_func(x):
    return abs(x)

In [51]:
lst = list(range(10000000))
start = time.time()
a = list(map(my_func,lst))
end = time.time()
print(end - start)

0.8321022987365723


In [52]:
lst = list(range(10000000))
b = []
start = time.time()
for i in lst:
    b.append(my_func(i))
end = time.time()
print(end - start)

1.619018793106079


In [53]:
lst = list(range(10000000))
start = time.time()
for idx,i in enumerate(lst):
    lst[idx]= my_func(i)
end = time.time()
print(end - start)

1.9668538570404053


In [54]:
start = time.time()
b = [my_func(x) for x in list(range(10000000))]
end = time.time()
print(end - start)

1.2628250122070312


once you start working with larger datasets and writing more in depth programs, timing steps and keeping track of runtimes will be common, as will trying to find the most efficient way to write your code in terms of both runtime and conciseness.  the above is a pretty simple example, but with a more complicated function or more complicated dataset, these approaches may scale differently
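for repeatable timings, the standard library's timeit module is one option. the sketch below is an assumption about how the comparison could be set up, not a rerun of the numbers above: the list is smaller, it is built once outside the timed code, and the function names are made up. the printed times will vary by machine.

```python
import timeit

data = list(range(100_000))  # built once, outside the timed code


def with_map():
    return list(map(abs, data))


def with_loop():
    out = []
    for x in data:
        out.append(abs(x))
    return out


def with_comprehension():
    return [abs(x) for x in data]


for func in (with_map, with_loop, with_comprehension):
    seconds = timeit.timeit(func, number=5)  # total seconds for 5 runs
    print(func.__name__, round(seconds, 4))
```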

## dictionary searches
* dictionaries aren't generally used for nested searching
* pandas has a json_normalize function that flattens nested JSON into a dataframe, but we will set that aside for illustrative purposes

In [105]:
# if we know our keys, we might be making use of a dictionary like this, in a lookup fashion
lookup = {
    "item_a":"doritos",
    "item_b":"lays"
}

for i in ["item_a", "item_b", "item_b"]:
    print(lookup[i])

doritos
lays
lays
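if a key might be missing, indexing with lookup[i] raises a KeyError. one common alternative is the dictionary's get method with a default, sketched below. "item_c" and the "unknown" default are hypothetical values chosen for illustration.

```python
lookup = {
    "item_a": "doritos",
    "item_b": "lays"
}

# "item_c" is a hypothetical key that is not in the dictionary
found = [lookup.get(key, "unknown") for key in ["item_a", "item_c"]]
print(found)
```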


to do some custom searching we can use a for loop or we can use list comprehension

In [110]:
my_dict = {
    
    1:{"first_name":"Brian"},
    2:{"first_name":"Jane", "last_name":"Doe"},
    3:{"first_name":"John", "last_name":"Doe2"}
}

In [114]:
# this does let us search that nested dictionary

for k,v in my_dict.items():
    # this is  saying if "last_name" is a key in our dictionary
    # which in this case, v is one of those nested dictionaries
    # and the last_name value is Doe, then print the k,v
    if "last_name" in v and v["last_name"] == "Doe":     
        print(k,v)

2 {'first_name': 'Jane', 'last_name': 'Doe'}


in this case comprehensions aren't as concise, but we can solve this by writing a custom UDF and then using it inside a comprehension

In [97]:
[k for k,v in my_dict.items()]

[1, 2, 3]

In [98]:
[v for k,v in my_dict.items()]

[{'first_name': 'Brian'},
 {'first_name': 'Jane', 'last_name': 'Doe'},
 {'first_name': 'John', 'last_name': 'Doe2'}]

In [160]:
def my_func(k,v,search_term):
    if "last_name" in v and v["last_name"] == search_term:
        return (k,v)
    else:
        return False

In [161]:
search = "Doe"
[my_func(k,v,search) for k,v in my_dict.items()]

[False, (2, {'first_name': 'Jane', 'last_name': 'Doe'}), False]

In [162]:
# filter(None, iterable) drops falsy items such as False, None, 0 and empty containers
search = "Doe"
list(filter(None, [my_func(k,v,search) for k,v in my_dict.items()]))

[(2, {'first_name': 'Jane', 'last_name': 'Doe'})]
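as an alternative sketch, the condition can sit inside the comprehension itself, which avoids the False placeholders and the filter step. this assumes the same my_dict as above and uses the dictionary's get method, which returns None when "last_name" is missing. the name `matches` is just for illustration.

```python
my_dict = {
    1: {"first_name": "Brian"},
    2: {"first_name": "Jane", "last_name": "Doe"},
    3: {"first_name": "John", "last_name": "Doe2"}
}

search = "Doe"

# v.get("last_name") is None for entries without that key, so they never match
matches = [(k, v) for k, v in my_dict.items() if v.get("last_name") == search]
print(matches)
```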