# Jupyter Notebook

This is a Jupyter Notebook, which is basically just a super fancy Python shell.

You may have "cells" that can either be text (like this one) or executable Python code. Notebooks are really nice because they allow you to rapidly develop Python code by writing small bits of code, testing their output, and moving on to the next bit; this interactive nature of the notebook is a huge plus to professional Python developers. 

It's also nice, because it's really easy to share your code with others and surround it with text to tell a story! 

# Colaboratory
Colaboratory is a service provided by Google that takes a Jupyter Notebook (a file in the standard `.ipynb` format) and lets users edit and run the code in the notebook for free!

This notebook is write-protected, so you cannot edit the notebook that the whole class will look at. You can, however, open the notebook in "playground mode", which lets you make edits to a temporary copy. If you want to save the changes you made, follow the instructions that appear when you try to save; they will copy the notebook to your Google Drive.

# Setup
Make sure you run the following cells before trying to run any of the later cells. You do not need to understand what they are doing; they are just a way to make sure the files we want to use are stored on the computer running this notebook.


In [7]:
import pandas as pd



In [2]:
import requests

def save_file(url, file_name):
  r = requests.get(url)
  with open(file_name, 'wb') as f:
    f.write(r.content)

save_file('https://courses.cs.washington.edu/courses/cse163/19sp/' +
          'files/lectures/04-08/bee-movie.txt', 'bee-movie.txt')
save_file('https://courses.cs.washington.edu/courses/cse163/19sp/' +
          'files/lectures/04-08/mobydick.txt', 'mobydick.txt')

ModuleNotFoundError: No module named 'requests'

# List
First we reviewed how to create and index into a list.

In [3]:
l = [1, 2, 3]
print(l)
print(l[1])

[1, 2, 3]
2


It's very common that you don't want to explicitly write out all of the numbers by hand, but rather compute them in a loop. We learned that there are many methods you can call on lists:

Method | Description
-------|------------
list.append(x) | Adds x to the end
list.extend(xs) | Adds all elements in xs at the end
list.insert(i, x) | Inserts x at index i
list.remove(x) | Removes the first instance of x
list.pop([i]) | Removes the value at index i (default: last)
list.clear() | Removes all values
list.index(x) | Returns the index of the given value
list.reverse() | Reverses the elements
list.sort() | Sorts the elements
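
As a quick sketch of the methods in the table, the cell below applies each one to a small list. The list name `nums` and its values are made up for illustration.

```python
nums = [3, 1, 2]
nums.append(4)             # [3, 1, 2, 4]
nums.extend([5, 6])        # [3, 1, 2, 4, 5, 6]
nums.insert(0, 9)          # [9, 3, 1, 2, 4, 5, 6]
nums.remove(9)             # removes the first 9 -> [3, 1, 2, 4, 5, 6]
last = nums.pop()          # removes and returns 6 -> [3, 1, 2, 4, 5]
first = nums.pop(0)        # removes and returns 3 -> [1, 2, 4, 5]
position = nums.index(4)   # 2
nums.reverse()             # [5, 4, 2, 1]
nums.sort()                # [1, 2, 4, 5]
print(nums, last, first, position)

nums.clear()               # []
print(nums)
```

Note that all of these except `index` and `pop` return `None` and change the list in place.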

And we used `append` to build up a list.


# [0,1,2,3,4,5]


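A queue is a data structure that hands items back in the order they were added, so it is good for keeping things in order.

`pop()` removes and returns one element of a list: the last one by default, or the one at index `i` for `pop(i)`.

[D C B A ] --> x.pop()

Remove vs. pop

`remove(x)`: based on value (removes the first instance of x)

`pop(i)`: based on index

With a priority queue, we can set the priority of each item.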



In [10]:
from queue import PriorityQueue
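
A minimal sketch of how `pop` and `remove` differ on a list, and of how a `PriorityQueue` hands items back. The letters, task names, and priority numbers are invented for illustration.

```python
from queue import PriorityQueue

letters = ['D', 'C', 'B', 'A']
popped = letters.pop()       # by position: removes and returns the last element
letters.remove('C')          # by value: removes the first 'C'
print(popped, letters)       # A ['D', 'B']

pq = PriorityQueue()
pq.put((2, 'write code'))    # (priority, item) tuples; the lowest number comes out first
pq.put((1, 'read notes'))
pq.put((3, 'take a break'))

order = []
while not pq.empty():
    priority, task = pq.get()
    order.append(task)
print(order)                 # ['read notes', 'write code', 'take a break']
```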

In [24]:
numbers = []
for i in range(1, 11):
  if i % 2 == 0:
    numbers.append(i ** 2)
  
print(numbers)

[4, 16, 36, 64, 100]


This pattern of
1. Looping over a sequence
2. Potentially filtering out certain values
3. Making some computation with the loop variable
4. Putting this computed value in a list

is so common that Python provides a concise syntax, called a list comprehension, to do all of these steps in one line.

In [35]:
even_squares = [i ** 2 for i in range(1, 11) if i % 2 == 0]
print(even_squares)

[4, 16, 36, 64, 100]


In [38]:
#[x + 1 if x >= 45 else x+5 for x in xs]


[i ** 2 if i % 2 == 0 else i ** 3 for i in range(1,11)]

[1, 4, 27, 16, 125, 36, 343, 64, 729, 100]
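
The two comprehensions above place `if` in different spots, and the difference is easy to miss. Here is a sketch with made-up variable names: an `if` at the end filters which values are kept, while `if ... else` at the front chooses what to compute for every value.

```python
values = range(1, 7)

# Filter: an `if` at the end drops the odd numbers entirely
filtered = [i ** 2 for i in values if i % 2 == 0]

# Conditional expression: every number is kept, but odd ones are cubed instead
mapped = [i ** 2 if i % 2 == 0 else i ** 3 for i in values]

print(filtered)  # [4, 16, 36]
print(mapped)    # [1, 4, 27, 16, 125, 36]
```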

In [27]:
even_square = []
for i in range(1,11):
    i = i**2
    if i % 2 == 0:
        even_square.append(i)
print(even_square)
        

[4, 16, 36, 64, 100]


You can use the `in` keyword to check whether a value is in a structure, as shown below.

In [34]:
if 64 in even_squares:
  print('Found it!')

print("\n"+ str(64 in even_squares))

Found it!

True


# Text Data Analysis

We wanted to count the number of unique words in a file. Our first attempt used a list to keep track of all the unique words we had seen so far.

In [39]:
def count_unique(file_name):
  words = list()  # used this syntax to make a new list instead
  with open(file_name) as file:
    for line in file.readlines():
      for word in line.split():
        if word not in words:
          words.append(word)
  return len(words)

In [41]:
set([1,2,3,4,4,4,5])

#duplicate ones go away.



{1, 2, 3, 4, 5}

But we found that the list-based `count_unique` took too long to run on large files!

In [42]:
%%time
# %%time reports how long the cell takes to run
print(count_unique('bee-movie.txt'))

4104
CPU times: user 200 ms, sys: 2.18 ms, total: 202 ms
Wall time: 201 ms


In [43]:
%%time
count_unique('mobydick.txt')

CPU times: user 11.2 s, sys: 8.69 ms, total: 11.2 s
Wall time: 11.2 s


32553

We discussed how the algorithm was slowed down by the `in` check. To see whether a list contains an element, Python has to start from the front and potentially go all the way to the end. This can be very slow, because the list is scanned again for every word in the file.
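
To make the cost concrete, here is a rough sketch of the kind of scan a list has to do for `in`. The function `list_contains` and the list `seen` are hypothetical stand-ins written for illustration; Python's built-in check is implemented differently under the hood, but it still looks at the elements one by one.

```python
def list_contains(items, target):
    """Returns (found, number of elements examined) for a front-to-back scan."""
    checked = 0
    for item in items:
        checked += 1
        if item == target:
            return True, checked
    return False, checked

seen = ['call', 'me', 'ishmael', 'some', 'years', 'ago']
print(list_contains(seen, 'me'))      # (True, 2)
print(list_contains(seen, 'whale'))   # (False, 6)
```

A word that is not in the list yet, which is exactly the case where `count_unique` appends, costs a full pass over everything seen so far.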

We introduced the idea of a `set` and saw that it greatly improves the performance of our algorithm since it was written to be highly optimized for these membership queries. 

In [47]:
def count_unique(file_name):
  words = set()  # only stores distinct values, and membership checks are much faster
  with open(file_name) as file:
    for line in file.readlines():
      for word in line.split():
        words.add(word)
  return len(words)

In [50]:
%%time
print(count_unique('bee-movie.txt'))  # much faster with a set: roughly 200 ms down to about 5 ms

4104
CPU times: user 5.08 ms, sys: 729 µs, total: 5.81 ms
Wall time: 5.22 ms


In [51]:
%%time
count_unique('mobydick.txt')

CPU times: user 55.4 ms, sys: 3.32 ms, total: 58.7 ms
Wall time: 57.9 ms


32553

# Set

We also investigated some basic things about sets: how to create them, how to use a "set comprehension" (much like a list comprehension), and the fact that you cannot index into a set since sets don't have indices.

In [62]:
s = {1, 2 , 3, 2}
print(s)

{1, 2, 3}


In [63]:
list_evens = [x % 2 for x in range(20)]
print(list_evens)
set_evens = {x % 2 for x in range(20)}
print(set_evens)

[0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
{0, 1}


In [65]:
set_evens[1]

# a set has no indices; it just stores values

TypeError: 'set' object is not subscriptable
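
If you do need to look at the elements of a set, a sketch of two common options is to loop over it or to convert it to a sorted list first. The set `colors` below is made up for illustration.

```python
colors = {'red', 'green', 'blue'}

# Option 1: loop over the set (the order is not guaranteed)
for color in colors:
    print(color)

# Option 2: build a sorted list, which does support indexing
ordered = sorted(colors)
print(ordered[0])   # blue
```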

In [83]:
s = set()
s.add((1000,"b"))
s.add((1, "c"))

s.add((10, "c"))
s = list(s)

for i in s:
    print(i)
    
    
sorted(s, key=lambda i: [i[1],i[0]])


(1, 'c')
(10, 'c')
(1000, 'b')


[(1000, 'b'), (1, 'c'), (10, 'c')]

# Dictionaries

Our next big data structure is a generalization of a list that allows almost any type of data for the index (we call them keys). Keys must be hashable, so strings, numbers, and tuples work but lists do not. We saw an example below.

In [None]:
d = dict()  # Makes an empty dictionary
d['a'] = 1  # Associates 1 to the key 'a'
d['b'] = 2  # Associates 2 to the key 'b'
d['z'] = 14
d['hello world'] = 5
print(d)

You can get the value for a particular key using the indexing syntax. If the key is not present, Python raises a `KeyError`.

In [None]:
print(d['hello world'])  # Gets the value associated to that key

In [None]:
print(d['foo'])
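
If you are not sure a key is present, a sketch of two ways to avoid the `KeyError` is to check with `in` first or to use `dict.get`, which returns a default value instead of raising. The dictionary `scores` is a small made-up example.

```python
scores = {'a': 1, 'b': 2}

if 'foo' in scores:               # `in` checks the keys of a dictionary
    print(scores['foo'])
else:
    print('no such key')

value = scores.get('foo')         # None when the key is missing
fallback = scores.get('foo', 0)   # or a default you choose
print(value, fallback)            # None 0
```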

Keys in a dictionary are unique. If you assign a key that already exists to a new value, it will re-associate that key to the new value rather than adding a second copy of that key in the dictionary. For example, we can change the mapping for 'b':

In [None]:
d['b'] = 1
print(d)

Then we saw one last example of building up a dictionary in a loop.

In [2]:
words = ['I', 'saw', 'a', 'dog', 'today']

lengths = {}
for word in words:
  lengths[word] = len(word)
  
print(lengths)

{'I': 1, 'saw': 3, 'a': 1, 'dog': 3, 'today': 5}


## Advanced: Dictionary Comprehension
You might ask yourself, can I use a comprehension to make a dictionary? The answer is yes! It looks exactly the same as a set comprehension, but you specify key-value pairs.

In [3]:
lengths = {word: len(word) for word in words}
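
As with list comprehensions, a dictionary comprehension can also filter. The sketch below repeats the `words` list so that it runs on its own; the name `long_words` and the length cutoff are made up for illustration.

```python
words = ['I', 'saw', 'a', 'dog', 'today']

lengths = {word: len(word) for word in words}
print(lengths)     # {'I': 1, 'saw': 3, 'a': 1, 'dog': 3, 'today': 5}

# Keep only the words longer than one character
long_words = {word: len(word) for word in words if len(word) > 1}
print(long_words)  # {'saw': 3, 'dog': 3, 'today': 5}
```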