> **IMPORTANT:** Every week, you will be solving exercises in a Jupyter notebook that looks like this one. Because you are cloning a GitHub repository that only I can push to, you should **NOT EDIT** any of the files you pull from GitHub. Instead, either make a new notebook and write your solutions there, or make a copy of this notebook and **save it somewhere else** on your computer, not inside the `caobd_s19` folder that you cloned, and write your answers there. **If you don't follow this advice your solutions may be overwritten and lost**.

# Week 1: Coding with data in Python

We start out with the basics. The exercises today cover:

* Writing Python code and Markdown in Jupyter notebooks
* Introductory Python
* Getting some data from Reddit

**Feedback:** I'm always trying to improve. If you find errors or have concerns you can voice them safely and anonymously at https://ulfaslak.com/vent. You can also send me an email at ulfaslak@gmail.com or talk to me in class. I care about everything you have to say.

## Exercises

### Part 1: Know thy notebook

This document is what we call a *Jupyter notebook*. We will be using these extensively throughout the course so **READ THIS CLOSELY**. There are two basic things you need to know about Jupyter notebooks:

1. A notebook is nothing but a list of cells. A cell can either be a **code cell** or a **Markdown cell**. Code cells are for writing executable code, and Markdown cells (like this one) are for explaining things in text and making your notebook more readable. A typical workflow that you will soon get used to is something like: solving a problem with some code in a *code cell* and explaining your reasoning or the results you obtained in a *Markdown cell*. You can toggle cell type when you are in *command mode* by pressing <kbd>y</kbd> for code and <kbd>m</kbd> for Markdown. **Try to do that**. Change this Markdown cell to a code cell, and change it back again. What happens if you execute (<kbd>shift</kbd>+<kbd>enter</kbd>) this cell as a code cell, compared to when it is a Markdown cell?

2. The notebook has two *modes*: **edit mode** and **command mode**. You enter command mode by pressing <kbd>esc</kbd> or clicking outside a cell, and edit mode by clicking a cell and pressing <kbd>enter</kbd> or double-clicking a cell. When you're in edit mode, the outline of the current cell turns green (not with `jupyter lab`, though, there the bar is always blue) and whatever you type into your keyboard goes into that cell, whether it is a code or Markdown cell. [Here](http://maxmelnick.com/2016/04/19/python-beginner-tips-and-tricks.html)'s a nice rundown of the different commands you can use. **Beware of <kbd>x</kbd> and <kbd>d</kbd>**. Read the full list of hotkeys by pressing <kbd>h</kbd> in command mode to figure out why.

>*Heads up:* Because we'll be using Jupyter notebooks so much in this course, I strongly recommend investing 5 minutes more than you would normally, playing around with cell types, modes and hotkeys. It will save you heaps of time down the road.

When you run a code cell by pressing <kbd>shift</kbd> + <kbd>enter</kbd>, the code gets evaluated by the Python interpreter installed on your computer. The interpreter always returns some output, so unless you store it in a variable, it gets printed below the cell. In general, you will use code cells for doing analysis and working with data.

*Markdown* is a simple markup language for formatting text (similar to *HTML* or $\LaTeX$, which you may know). You will typically use it for writing explanations about how you solve the exercises and the results you get, and styling your notebook with sections and subsections. It can do **bold**, *italics* and $\LaTeX$ formatting (for equations), and much much more. You can read about the Markdown language [here](http://daringfireball.net/projects/markdown/).

Below is your first exercise. The exercises are numbered by the convention `[session]`.`[section]`.`[problem]`.`[subproblem]`. For example, exercise 4.2.3.1 is in week 4, section 2, problem 3, and subproblem 1.

>**Ex. 1.0.1**: In the Markdown cell below, write a short text that shows that you can:
>* Create sections
>* Write words in bold and italics
>* Write an equation in LaTeX formatting
>* Create bullet lists
>* Create [hyperlinks](https://en.wikipedia.org/wiki/Hyperlink)

>*Hint: Remember to execute the cell (<kbd>shift</kbd>+<kbd>enter</kbd>) so the Markdown gets rendered.*

[Answer to Ex. 1.0.1]

*This is a sentence in italics.*

**This is a sentence in bold.**

>This is a brand new section. 

>Here is a new section with a list of my favorite rappers
+ Kanye West
+ Childish Gambino
+ Earl Sweatshirt


$a^2 = b^2 + \text{Gatorade}$

>Watch this video: [Jorja Smith - A Colors Show](https://www.youtube.com/watch?v=fYwRsJAPfec)



### Part 2: Essential Python (DSFS Chapter 2)

These exercises take you through some very basic Python functionality. Use them to calibrate your expectations: If you find them hard, you must spend some more time getting up to speed (see the [preparation goals](https://canvas.disabroad.org/courses/2500/pages/sessions) for today's session).

>**Ex. 1.1.1**: Create a list `a` that contains the numbers from $1$ to $1110$ (including $1$ and $1110$), incremented by one, using the `range` function.

In [3]:
# [Answer to Ex. 1.1.1]
my_list = list(range(1,1111))

>**Ex. 1.1.2**: Show that you understand [slicing](http://stackoverflow.com/questions/509211/explain-pythons-slice-notation) in Python by extracting a list `b` with the numbers from $760$ to $769$ (including both) from the list created above.

In [6]:
# [Answer to Ex. 1.1.2]
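b = my_list[759:769]  # value 760 sits at index 759; the stop index 769 is excluded
print(b)
# Nested lists don't support NumPy-style c[:, 0] indexing, so pick the first item of each sublist instead
c = [[760, 761], [762, 763, 764], [765, 766, 767, 768, 769]]
print([row[0] for row in c])

[760, 761, 762, 763, 764, 765, 766, 767, 768, 769]
[760, 762, 765]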
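
Slice bounds are easy to get off by one, so here is a minimal sketch of how to derive them rather than count by hand. The names `numbers`, `start`, `stop` and `b_check` are illustrative and not part of the exercise. Because the list starts at 1, the value `v` sits at index `v - 1`, and a slice always excludes its stop index.

```python
# Illustrative sketch: the list holds 1..1110, so value v lives at index v - 1.
numbers = list(range(1, 1111))

start = numbers.index(760)     # index of the first value we want
stop = numbers.index(769) + 1  # slices exclude the stop index, so go one past

b_check = numbers[start:stop]
print(b_check)

# A third slice argument sets the step, e.g. every other number in the window.
print(numbers[start:stop:2])
```

Looking the indices up with `index` is slower than writing `numbers[759:769]` directly, but it makes the intent explicit while you are still getting used to the notation.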

>**Ex. 1.1.3**: Define a function that takes as input a number $x$ and outputs the number multiplied by the sum of itself and three, $f(x) = x(x+3)$.

In [12]:
# [Answer to Ex. 1.1.3]
def quickMaths(number):
    return number*(number+3)
    
print(quickMaths(5))

40


>**Ex. 1.1.4**: Apply this function to every element of the list `b` using a `for` loop and append the results to a new list `c`. Print `c`.

In [14]:
# [Answer to Ex. 1.1.4]
c = []
for number in b:
    c.append(quickMaths(number))
print(c)

[579880, 581404, 582930, 584458, 585988, 587520, 589054, 590590, 592128, 593668]


>**Ex. 1.1.5**: Do the exact same thing using a *list comprehension*.

In [16]:
# [Answer to Ex. 1.1.5]
comp_list = [quickMaths(i) for i in b]
print(comp_list)

[579880, 581404, 582930, 584458, 585988, 587520, 589054, 590590, 592128, 593668]


>**Ex. 1.1.6**: Write the numbers in `c` to a text file with one number per line.

In [19]:
# [Answer to Ex. 1.1.6]
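# The with-block closes the file when done, so everything is flushed to disk
with open("myfile.txt", "w") as f:
    for number in c:
        # Add a newline so each number ends up on its own line
        f.write(str(number) + "\n")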
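
One number per line requires writing a newline after each value, and the file should be closed so the data is flushed to disk. Below is a minimal sketch of that pattern that also reads the file back as a check. The temporary folder and the file name `numbers.txt` are assumptions, chosen so the example does not touch your working folder.

```python
import os
import tempfile

values = [579880, 581404, 582930]  # the first three numbers from `c`

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "numbers.txt")  # hypothetical file name

    # The with-block closes the file even if an error occurs while writing.
    with open(path, "w") as out_file:
        for value in values:
            out_file.write(f"{value}\n")  # the newline puts each number on its own line

    # Read the file back to confirm there is one number per line.
    with open(path) as in_file:
        lines = in_file.read().splitlines()

print(lines)
```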

>**Ex. 1.1.7**: Show that you understand how strings work in Python. You should:
>
>1. Add a comment above each line of code that explains it.
>2. Find all the lines where **a string** is put into a string. How many are there?
>3. Explain the difference between `%d`, `%s` and `%r`.
>
>[Source](https://learnpythonthehardway.org/book/ex6.html)

In [30]:
# This is an example of a comment
# %d is a placeholder that inserts a number, formatted as an integer, into the string
x = "There are %d types of people." % 10
# two string declarations
binary = "binary"
do_not = "don't"
# Same idea as above, except %s inserts strings; the tuple in parentheses supplies one value per placeholder
# 1
y = "Those who know %s and those who %s." % (binary, do_not)

print(x)
print(y)


print("I said: %r." % x)
#2
print("I also said: '%s'." % y)

#initializes boolean variable
hilarious = False
#initialize string variable with placeholder for string representation of variable at %r
joke_evaluation = "Isn't that joke so funny?! %r"

#inserts hilarious variable into the print statement
print(joke_evaluation % hilarious)

#initializing two new string variables
w = "This is the left side of..."
e = "a string with a right side."

#printing w with e appended onto it
print(w + e)



There are 10 types of people.
Those who know binary and those who don't.
I said: 'There are 10 types of people.'.
I also said: 'Those who know binary and those who don't.'.
Isn't that joke so funny?! False
This is the left side of...a string with a right side.


[Answer to Ex. 1.1.7.2]
there are two instances where a string is put into a string

[Answer to Ex. 1.1.7.3]
`%r` inserts the object's `repr()` (for a string, this includes the quotes), `%s` inserts its `str()`, and `%d` formats a number as a decimal integer.
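
To see the three placeholders side by side, here is a small sketch. The variable names are illustrative only.

```python
text = "don't"

as_str = "%s" % text   # uses str(): the text as-is
as_repr = "%r" % text  # uses repr(): adds quotes, as the value would appear in code
as_int = "%d" % 3.9    # formats as an integer: the fractional part is dropped

print(as_str)
print(as_repr)
print(as_int)
```

`%r` is mostly useful for debugging, because the quotes make it obvious that the value is a string and reveal any stray whitespace.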

>**Ex. 1.1.8**: Why does `5 // 2 == 2` in Python 3.7? How is division different between Python 2 and 3?

In [33]:
# [Answer to Ex. 1.1.8]
print(5 // 2)
print(5 / 2)
print("Python 3 performs true division with / even if both numbers are integers, so we must use floor division (//) if we don't want the decimal value. In Python 2, / between two integers already drops the decimal part.")

2
2.5
Python 3 performs true division with / even if both numbers are integers, so we must use floor division (//) if we don't want the decimal value. In Python 2, / between two integers already drops the decimal part.
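
A few more cases make the behaviour of the two operators in Python 3 concrete. This is only a sketch; the names `quotient` and `remainder` are illustrative.

```python
print(5 / 2)     # true division always returns a float
print(5 // 2)    # floor division rounds down to the nearest integer
print(-5 // 2)   # it rounds toward negative infinity, not toward zero
print(5.0 // 2)  # with a float operand the result is a floored float

# divmod returns the floor quotient and the remainder in one call.
quotient, remainder = divmod(5, 2)
print(quotient, remainder)
```

In Python 2, `5 / 2` evaluates to `2` unless the file starts with `from __future__ import division`.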


>**Ex. 1.1.9**: What is the point of using `try` and `except`? Write some code that shows how to use these.

In [34]:
# [Answer to Ex. 1.1.9]
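# try/except lets the program handle an error and keep running instead of crashing
try:
    # Opening a file that does not exist raises FileNotFoundError
    open("smack.txt")
except FileNotFoundError:
    # Catch only the error we expect instead of silencing every exception
    print("that didn't work")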

that didn't work
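
As a slightly fuller sketch of the same idea, the hypothetical helper `read_first_line` below catches one specific exception and uses an `else` branch for the code that should only run when nothing went wrong. The file name is an assumption and is expected not to exist.

```python
def read_first_line(path):
    """Return the first line of a file, or None if the file does not exist."""
    try:
        handle = open(path)
    except FileNotFoundError:
        # Only the error we expect is caught; other bugs still surface.
        print(f"could not find {path}")
        return None
    else:
        # Runs only when the try block raised no exception.
        with handle:
            return handle.readline().rstrip("\n")


result = read_first_line("this_file_should_not_exist.txt")
print(result)
```

Naming the exception matters: a bare `except:` would also hide typos and other bugs inside the `try` block.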


>**Ex 1.1.10**: Last week you learned about `dict`s, so now you're ready for `defaultdict`s.
>1. What is a `defaultdict`? How would you say it is different from a normal Python `dict`?
>2. Write some code that takes a list of tuples:
>
>        l = [("a", 1), ("b", 3), ("a", None), ("c", False), ("b", True), ("a", None)]
>
>     And produces a `defaultdict` object
>
>        defaultdict(<class 'list'>, {'a': [1, None, None], 'c': [False], 'b': [3, True]})

>*Hint: you can import `defaultdict` from `collections`*

In [98]:
# [Answer to Ex. 1.1.10]
from collections import defaultdict
# Main difference: looking up a missing key in a defaultdict never raises a KeyError,
# because it calls its default factory (a function that takes no arguments) to create the value
l = [("a", 1), ("b", 3), ("a", None), ("c", False), ("b", True), ("a", None)]
swag = defaultdict(list)
print(swag)
for k,val in l:
    swag[k].append(val)

print(swag)




defaultdict(<class 'list'>, {})
defaultdict(<class 'list'>, {'a': [1, None, None], 'b': [3, True], 'c': [False]})
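
For comparison, here is a minimal sketch that groups the same kind of pairs with a plain `dict` and with a `defaultdict`. The names `pairs`, `plain` and `grouped` are illustrative. It also shows one caveat: simply looking up a missing key adds it to a `defaultdict`.

```python
from collections import defaultdict

pairs = [("a", 1), ("b", 3), ("a", None)]

plain = {}
for key, value in pairs:
    plain.setdefault(key, []).append(value)  # a dict needs the default spelled out

grouped = defaultdict(list)
for key, value in pairs:
    grouped[key].append(value)  # a missing key is created with list() automatically

print(plain == grouped)  # same contents either way

# Caveat: merely looking up a missing key inserts it into a defaultdict.
grouped["z"]
print("z" in grouped)
print("z" in plain)
```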


>**Ex 1.1.11**: Take a list `a = list("justreadtheinstructions")` and
>1. count the number of times each element occurs using `Counter`,
>2. report the two most common elements.

>*Hint: you can import `Counter` from `collections`*

In [107]:
# [Answer to Ex. 1.1.11]
from collections import Counter

a = list("justreadtheinstructions")

print(Counter(a).most_common())
ls = Counter(a).most_common(2)
print("these are the two most common: " + str(ls))

[('t', 4), ('s', 3), ('u', 2), ('r', 2), ('e', 2), ('i', 2), ('n', 2), ('j', 1), ('a', 1), ('d', 1), ('h', 1), ('c', 1), ('o', 1)]
these are the two most common: [('t', 4), ('s', 3)]


>**Ex 1.1.12**: Take another list `b = list("ofcourseistillloveyou")` and
>1. get the `set` of characters that exist in both `a` and `b` (intersection),
>2. get the `set` of characters that exist in either `a` or `b` (union), and
>3. compute the [Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index) between the distinct elements in `a` and `b`.

>*Hint: use the `set` function to get a `set`-type object of distinct elements from a list*

In [112]:
# [Answer to Ex. 1.1.12]
b = list("ofcourseistillloveyou")
b_set = set(b)
a_set = set(a)
inter = a_set.intersection(b_set)
union = (a_set.union(b_set))
jac = len(inter) / len(union)
print(inter)
print(union)
print(jac)

{'c', 't', 'i', 's', 'r', 'u', 'e', 'o'}
{'y', 'h', 'f', 'c', 'd', 'a', 'r', 'e', 'l', 'o', 'j', 'v', 't', 'i', 's', 'n', 'u'}
0.47058823529411764
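
The same computation can be wrapped in a small reusable function. This is a sketch, not a library routine: `jaccard` is a hypothetical helper, and returning `1.0` for two empty collections is an assumption made here to avoid dividing by zero.

```python
def jaccard(first, second):
    """Jaccard similarity between the distinct elements of two iterables."""
    first_set, second_set = set(first), set(second)
    union = first_set | second_set  # | is the operator form of .union()
    if not union:
        return 1.0  # assumption: two empty collections count as identical
    return len(first_set & second_set) / len(union)  # & is .intersection()


similarity = jaccard("justreadtheinstructions", "ofcourseistillloveyou")
print(similarity)
```

The two strings share 8 distinct characters out of 17 in total, so the function returns 8/17.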


### Part 3: A little bit of real data

>**Ex. 1.2.1**: Learn about JSON by reading the **[wikipedia page](https://en.wikipedia.org/wiki/JSON)**. Then answer the following questions in the cell below. 
>
>1. What do the letters stand for?
>2. What is `json`?
>3. Why is `json` superior to `xml`? (... or why not?)

[Answers to Ex. 1.2.1.1-3]

1. JavaScript Object Notation
2. JSON is a lightweight, text-based format for storing and exchanging data. It represents data as key-value pairs (objects) and ordered lists (arrays), which can be nested inside each other. It's nice to work with.
3. JSON is easier to read and usually faster to parse because it has no opening and closing tags, so it takes fewer characters to represent the same data. The biggest convenience is that JSON can be parsed with a built-in JavaScript function (`JSON.parse`), while XML requires an XML parser.
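
In Python, the standard `json` module converts between JSON text and ordinary Python objects. Below is a minimal round-trip sketch; the `record` dictionary is made-up example data.

```python
import json

# Made-up example data covering the main JSON value types.
record = {"title": "A post", "score": 12, "tags": ["news", "tv"], "pinned": False, "flair": None}

text = json.dumps(record)    # Python object -> JSON text
restored = json.loads(text)  # JSON text -> Python object

print(text)
print(restored == record)
```

Note how Python's `False` and `None` become `false` and `null` in the JSON text and are converted back when the text is loaded.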

>**Ex. 1.2.2**: Working with JSON files
>1. Use [`requests`](https://www.google.dk/search?q=python+requests+get+json&gws_rd=cr&ei=M5OdWaewD8Ti6AS54J24Bg), or another Python module, to store **[this data](https://www.reddit.com/r/gameofthrones/.json)** in a new variable `data`.
>2. What is the [type](https://stackoverflow.com/questions/2225038/determine-the-type-of-an-object) of `data`?

In [166]:
# [Answer to Ex. 1.2.2.1]
import requests

# Identify the client with custom request headers
headers = {
    'User-Agent': 'Nate Uses Agents',
    'From': 'nshirley22@gmail.com'
}
data = requests.get('https://www.reddit.com/r/gameofthrones/.json', headers=headers)
# `data is dict` only checks whether data is the dict class itself, so use type() instead
print(type(data))

<class 'requests.models.Response'>


[Answer to Ex. 1.2.2.2]
`data` is a `requests` `Response` object; its body (`data.text`) is JSON-formatted text.

>**Ex. 1.2.3**: Let's try to inspect the data you retrieved. 
>
>1. Use the `json` module to print your data variable as a string with `indent=4`.
>2. The data is a dictionary, a type of Python object that stores data as key-value pairs. Print the keys.
>
>*Hint: 1. Use the `json` function `dumps`. 2. Call `.keys()` on the variable.*

In [173]:
# [Answer to Ex. 1.2.3.1]
import json
parsed = json.loads(data.text)
print(json.dumps(parsed, indent=4))


{
    "kind": "Listing",
    "data": {
        "modhash": "",
        "dist": 27,
        "children": [
            {
                "kind": "t3",
                "data": {
                    "approved_at_utc": null,
                    "subreddit": "gameofthrones",
                    "selftext": "",
                    "author_fullname": "t2_xw2hq",
                    "saved": false,
                    "mod_reason_title": null,
                    "gilded": 0,
                    "clicked": false,
                    "title": "[SPOILERS] Game of Thrones | Season 8 | Official Tease: Crypts of Winterfell (HBO)",
                    "link_flair_richtext": [
                        {
                            "e": "text",
                            "t": "News"
                        }
                    ],
                    "subreddit_name_prefixed": "r/gameofthrones",
                    "hidden": false,
                    "pwls": 6,
                    "link_flair_css_class

In [177]:
# [Answer to Ex. 1.2.3.2]
print(type(parsed))
print(parsed.keys())

<class 'dict'>
dict_keys(['kind', 'data'])


>**Ex. 1.2.4**: The URL reveals that the data is from reddit/r/gameofthrones, but can you recover that information from the data? Give your answer by 'keying' into the JSON data using square brackets.

>*Hint: 'Keying' is a word I just made up. By it, I mean the following. Consider a JSON object such as:*
>
>        my_json_obj = {
>            'cats': {
>                'awesome': ['Missy'],
>                'useless': ['Kim', 'Frank', 'Sandy']
>            },
>            'dogs': {
>                'awesome': ['Finn', 'Dolores', 'Fido', 'Casper'],
>                'useless': []
>            }
>        }
>
>*I can get the list of useless cats by keying into `my_json_obj` like such:*
>
>        >>> my_json_obj['cats']['useless']
>        Out [ ]: ['Kim', 'Frank', 'Sandy']
>
>*`my_json_obj['cats']` returns the dictionary `{'awesome': ['Missy'], 'useless': ['Kim', 'Frank', 'Sandy']}` and getting '`useless`' from that eventually gives us `['Kim', 'Frank', 'Sandy']`. If any of those list items were a list or a dictionary themselves, we could have kept keying deeper into the structure.*

In [181]:
# [Answer to Ex. 1.2.4]
print(parsed['data']['children'][0]['data']['subreddit_name_prefixed'])

r/gameofthrones
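
The sketch below repeats the keying on a hypothetical miniature of a Reddit listing, so it runs without a network connection. The `listing` dictionary is invented for illustration; real responses contain many more fields.

```python
# Hypothetical miniature of a Reddit listing; real responses have many more fields.
listing = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"subreddit": "gameofthrones", "title": "First post"}},
            {"kind": "t3", "data": {"subreddit": "gameofthrones", "title": "Second post"}},
        ]
    },
}

# Square brackets walk down one level at a time: dict key, list index, dict key...
first_subreddit = listing["data"]["children"][0]["data"]["subreddit"]
print(first_subreddit)

# Collecting the field from every post shows they all come from the same place.
subreddits = {child["data"]["subreddit"] for child in listing["data"]["children"]}
print(subreddits)
```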


>**Ex 1.2.5**: Write two `for` loops (or list comprehensions for extra street cred) that:
>1. Count the number of spoilers.
>2. Print only the headlines that aren't spoilers.

In [1]:
# [Answer to Ex. 1.2.5.1]
tmp = parsed['data']['children']
counter = 0
for entry in tmp:
    title = entry['data']['title']
    # Upper-casing the title matches [SPOILERS], [Spoilers] and any other capitalisation
    if '[SPOILERS]' in title.upper():
        counter += 1
print("there are " + str(counter) + " spoilers")
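
For the list-comprehension variant mentioned in the exercise, here is a minimal sketch. It uses a hypothetical list of `titles` instead of the live Reddit data, and the helper `is_spoiler` is an illustrative name.

```python
# Hypothetical titles in the same style as the subreddit's tags.
titles = [
    "[SPOILERS] Season 8 teaser",
    "[NO SPOILERS] I drew Tyrion Lannister",
    "[Spoilers] A theory about the ending",
    "[No Spoilers] Theme performed on a handpan",
]


def is_spoiler(title):
    """True when the title carries a [SPOILERS] tag in any capitalisation."""
    return "[SPOILERS]" in title.upper()


# True counts as 1 when summed, so this counts the spoiler titles.
spoiler_count = sum(is_spoiler(title) for title in titles)
safe_titles = [title for title in titles if not is_spoiler(title)]

print(spoiler_count)
print(safe_titles)
```

With the real data, `titles` could be built as `[entry['data']['title'] for entry in parsed['data']['children']]`.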

In [206]:
# [Answer to Ex. 1.2.5.2]
for entry in tmp:
    title = entry['data']['title']
    # Keep only titles without a [SPOILERS] tag, whatever its capitalisation
    if '[SPOILERS]' not in title.upper():
        print(title)

[NO SPOILERS] Petition for my best friends fiancé with colon cancer to see the final season early.
[NO SPOILERS] Daenerys inspired hair for my wedding day!
[NO SPOILERS] How to talk to girls.
[No Spoilers] Dear GRRM &amp; D+D. Thank you for... Valar Morgulis????
[No spoilers] GOT risk while repeating the series!
[NO SPOILERS] Drowned God — a little drawing a did a bit ago with ink.
[NO SPOILERS] My girlfriend drew this for my birthday! All freehanded!
[NO SPOILERS] The wife with the (real) Khal Drogo’s Arakh.
[NO SPOILERS] - I drew Sansa Stark and felt like sharing it here :)
[No Spoilers] Game of Thrones Theme performed on a Handpan
[No Spoilers] “The Hound” for an upcoming GOT themed art show I’m in.
[NO SPOILERS] Sketch of Arya Stark on my wall from 2 years ago
[NO SPOILERS] Question about the Nights Watch.
[NO SPOILERS] Game Of Thrones Cover | ONE STRING guitar! | Igor Presnyakov
[No Spoilers] I drew Tyrion Lannister
[No Spoilers] My cello cover of the Rains of Castamere :)
[No Spo