> **IMPORTANT:** Every week, you will be solving exercises in a Jupyter notebook that looks like this one. Because you are cloning a GitHub repository that only I can push to, you should **NOT EDIT** any of the files you pull from GitHub. Instead, either make a new notebook and write your solutions in there, or make a copy of this notebook and **save it somewhere else** on your computer, not inside the `practical_data_science` folder that you cloned, so you can write your answers in there. **If you don't follow this advice, your solutions may be overwritten and lost**.

# Week 1: Coding with data in Python

We start out with the basics. The exercises in this session cover:

* Writing Python code and Markdown in Jupyter notebooks
* Introductory Python
* Getting some data from Reddit

## Exercises

### Part 1: Know thy notebook

This document is what we call a *Jupyter notebook*. We will be using these extensively throughout the course so **READ THIS CLOSELY**. If you understand how notebooks work, you will save yourself lots of time and frustration throughout this course!

There are two basic things you need to know about Jupyter notebooks:

1. A notebook is nothing but a list of cells. A cell can either be a **code cell** or a **Markdown cell**. Code cells are for writing executable code, and Markdown cells (like this one) are for explaining things in text and making your notebook more readable. A typical workflow that you will soon get used to, is something like: solving a problem with some code in a *code cell* and explaining your reasoning or the results you obtained in a *Markdown cell*. You can toggle cell type when you are in *command mode* by pressing <kbd>y</kbd> for code and <kbd>m</kbd> for Markdown. **Try to do that**. Change this *Markdown* cell to a *code* cell, and change it back again. What happens if you execute (<kbd>shift</kbd>+<kbd>enter</kbd>) when this cell is a code cell, compared to when it is a Markdown cell?

2. The notebook has two *modes*: **edit mode** and **command mode**. You enter command mode by pressing <kbd>esc</kbd> or clicking outside a cell, and edit mode by clicking a cell and pressing <kbd>enter</kbd> or double clicking a cell. When you're in edit mode, the left border of the current cell turns green (not with `jupyter lab`, though, there the bar is always blue) and whatever you type into your keyboard goes into that cell, whether it is a code or Markdown cell. [Here](http://maxmelnick.com/2016/04/19/python-beginner-tips-and-tricks.html)'s a nice rundown of the different commands you can use. **Beware of <kbd>x</kbd> and <kbd>dd</kbd>**. Read the full list of hotkeys by pressing <kbd>h</kbd> in command mode to figure out why.

>*Heads up:* Because we'll be using Jupyter notebooks so much in this course, I strongly recommend investing 5 more minutes playing around with cell types, modes and hotkeys. It will save you heaps of time down the road. Above all, make sure you have read and understood these ^ two points!

When you run a code cell by pressing <kbd>shift</kbd> + <kbd>enter</kbd>, the code gets evaluated by the Python interpreter installed on your computer. The value of the last expression in the cell gets printed below the cell, unless you store it in a variable (or the value is `None`). In general, you will use code cells for doing analysis and working with data.

*Markdown* is a simple markup language for formatting text (like *HTML* or $\LaTeX$). You will typically use it for writing explanations about how you solve the exercises and the results you get, and styling your notebook with sections and subsections. It can do **bold**, *italics* and $\LaTeX$ formatting (for equations), and much much more. You can read about the Markdown language [here](http://daringfireball.net/projects/markdown/).

Below is your first exercise. The exercises are numbered by the convention `[session]`.`[section]`.`[problem]`.`[subproblem]`. For example, exercise 4.2.3.1 is in week 4, section 2, problem 3, and subproblem 1.

### Part 2: Essential Python ([DSFS](https://www.oreilly.com/library/view/data-science-from/9781492041122/) Chapter 2)

>**Ex. 1.0.1**: In the Markdown cell below, write a short text that shows that you can:
>* Create sections
>* Write words in bold and italics
>* Write an equation in LaTeX formatting
>* Create bullet lists
>* Create [hyperlinks](https://en.wikipedia.org/wiki/Hyperlink)

>*Hint: Remember to execute the cell (<kbd>shift</kbd>+<kbd>enter</kbd>) so the Markdown gets rendered.*

[Answer to Ex. 1.0.1] (next 5 rows)

This is a new section.

<b> BOLDTEXT </b>

$1 + 1 = 2$

<ul>
    <li>Apple</li>
    <li>Orange</li>
    <li>Watermelon</li>
</ul>

[YouTube](https://www.youtube.com/)

These exercises take you through some very basic Python. Use them to calibrate your expectations: If you find them hard, you must spend some more time getting up to speed (see the preparation goals for today's session on Canvas).

>**Ex. 1.1.1**: Create a list `a` that contains the numbers from $0$ to $1110$ (including $0$ and $1110$), incremented by one, using the `range` function.

In [1]:
#Answer to Ex. 1.1.1
a = [i for i in range(1111)]

print(a)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221,

>**Ex. 1.1.2**: Show that you understand [slicing](http://stackoverflow.com/questions/509211/explain-pythons-slice-notation) in Python by extracting a list `b` with the numbers from $760$ to $769$ (including both) from the list created above.

In [2]:
#Answer to Ex. 1.1.2
b = a[760:770]

print(b)

[760, 761, 762, 763, 764, 765, 766, 767, 768, 769]
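
As a small sketch of how slice bounds behave, the example below uses a short hypothetical list `nums` (not the exercise's `a`). The start index is included, the stop index is excluded, negative indices count from the end, and an optional third value sets the step.

```python
# Hypothetical example list, chosen only for illustration
nums = list(range(10))

first_three = nums[:3]    # indices 0, 1, 2
middle = nums[3:6]        # indices 3, 4, 5 (the stop index 6 is excluded)
last_two = nums[-2:]      # negative indices count from the end
every_other = nums[::2]   # a third value sets the step

print(first_three, middle, last_two, every_other)
```

The same rule explains the answer above: `a[760:770]` stops just before index 770, so 769 is the last element included.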


>**Ex. 1.1.3**: Define a function that takes as input a number $x$ and outputs the number multiplied by itself plus three: $f(x) = x(x+3)$.

In [31]:
#Answer to Ex. 1.1.3
def f(x):
    # Return x multiplied by (x + 3)
    return x * (x + 3)

# Example call with x = 10
print(f(10))

130


>**Ex. 1.1.4**: Apply this function to every element of the list `b` using a `for` loop and append the results to a new list `c`. Print `c`.

In [33]:
#Answer to Ex. 1.1.4
c = []
for i in b:
    c.append(i*(i+3))
    
print(c)

[579880, 581404, 582930, 584458, 585988, 587520, 589054, 590590, 592128, 593668]


>**Ex. 1.1.5**: Do the exact same thing using a *list comprehension*.

In [34]:
#Answer to Ex. 1.1.5
c = [(i*(i+3)) for i in b]

print(c)

[579880, 581404, 582930, 584458, 585988, 587520, 589054, 590590, 592128, 593668]
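
To make the equivalence between the two previous answers explicit, here is a minimal sketch that builds the same list both ways. The names `f`, `values`, `looped` and `comprehended` are illustrative and not part of the exercise.

```python
def f(x):
    # The function from Ex. 1.1.3: x multiplied by (x + 3)
    return x * (x + 3)

# Hypothetical input values, kept short so the result is easy to check
values = [1, 2, 3]

# Version 1: explicit for loop with append
looped = []
for v in values:
    looped.append(f(v))

# Version 2: the same computation as a list comprehension
comprehended = [f(v) for v in values]

print(looped)
print(looped == comprehended)
```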


>**Ex. 1.1.6**: Write the numbers in `c` to a text file with one number per line.

In [35]:
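#Answer to Ex. 1.1.6
# Write one number per line; the with block closes (and flushes) the file
with open("1.1.6_Answer.txt", "w") as f:
    for i in c:
        f.write(str(i) + "\n")

# Reopen the file and print its contents to check the result
with open("1.1.6_Answer.txt", "r") as f:
    print(f.read())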

579880
581404
582930
584458
585988
587520
589054
590590
592128
593668
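
A possible round trip for this exercise, sketched under the assumption that you also want to read the numbers back as integers. The list `numbers`, the file name `numbers.txt` and the use of a temporary directory are illustrative choices, not part of the exercise.

```python
import os
import tempfile

# Hypothetical data: the first three values of c from the exercise
numbers = [579880, 581404, 582930]

# Assumption: a temporary directory is used so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "numbers.txt")

    # Write one number per line
    with open(path, "w") as f:
        f.writelines(f"{n}\n" for n in numbers)

    # Read the lines back; int() ignores the trailing newline
    with open(path) as f:
        read_back = [int(line) for line in f]

print(read_back)
```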



>**Ex. 1.1.7**: Show that you understand how strings work in Python. You should:
>
>1. Add a comment above each line of code that explains it.
>2. Find all the lines where **a string** is put into a string. How many are there?
>3. Explain the difference between `%d`, `%s` and `%r`.
>
>[Source](https://learnpythonthehardway.org/book/ex6.html)

In [37]:
#Answer to Ex. 1.1.7.1

# This is an example of a comment

# Creating a sentence that we can specify a value for
x = "There are %d types of people." % 10
# Creating a variable for binary to be used later
binary = "binary"
# Creating a variable for do not to be used later
do_not = "don't"
# Inserting binary and do_not into their respective %s placeholders
y = "Those who know %s and those who %s." % (binary, do_not)

# Print the values of x and y
print(x)
print(y)

# Print the sentence while inserting x
print("I said: %r." % x)
# Print the sentence while inserting y
print("I also said: '%s'." % y)

# Set a boolean for hilarious
hilarious = False
# Create a template string with a %r placeholder to be filled in later
joke_evaluation = "Isn't that joke so funny?! %r"

# Print the joke and evaluation 
print(joke_evaluation % hilarious)

# Create variables for w and e
w = "This is the left side of..."
e = "a string with a right side."

# Concatenate w and e in a print statement
print(w + e)


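#Answer to Ex. 1.1.7.2
# A string is formatted into another string on 3 lines: the assignment
# to y (which inserts two strings) and the two "I said" print calls,
# for 4 inserted strings in total. The x line inserts an integer, the
# joke line inserts a boolean, and w + e concatenates two strings.

#Answer to Ex. 1.1.7.3
# %d formats an integer, %s inserts str() of the value (the readable
# form, without quotes for a string), and %r inserts repr() of the
# value (the canonical form, with quotes for a string).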

There are 10 types of people.
Those who know binary and those who don't.
I said: 'There are 10 types of people.'.
I also said: 'Those who know binary and those who don't.'.
Isn't that joke so funny?! False
This is the left side of...a string with a right side.
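
The difference between the three placeholders is easiest to see side by side. The sketch below uses two hypothetical values, `count` and `word`, and formats each with one placeholder.

```python
# Hypothetical values chosen only for illustration
count = 10
word = "binary"

as_int = "%d" % count    # formats an integer
as_str = "%s" % word     # uses str(word): no quotes
as_repr = "%r" % word    # uses repr(word): quotes included

print(as_int)    # 10
print(as_str)    # binary
print(as_repr)   # 'binary'
```

This is also why the "I said" line in the output above shows quotes around the sentence: `%r` inserts the `repr` of the string.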


>**Ex. 1.1.8**: Why does `5 // 2 == 2` in Python 3? What does `5 / 2` give?

In [38]:
#Answer to Ex. 1.1.8
print(5//2)
print(5/2)

# Double slash is floor division: it divides and rounds the result
# down to the nearest whole number, so 5 // 2 gives 2.
# Single slash is true division: in Python 3 it always returns a
# float, so 5 / 2 gives 2.5.


2
2.5
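
Floor division and truncation only differ for negative results, which a short sketch can show. The variable names below are illustrative.

```python
floor_pos = 5 // 2          # 2
floor_neg = -5 // 2         # -3: floor rounds toward negative infinity
true_div = 5 / 2            # 2.5
exact_div = 4 / 2           # 2.0: still a float, even when the result is whole
truncated = int(-5 / 2)     # -2: int() truncates toward zero instead

print(floor_pos, floor_neg, true_div, exact_div, truncated)
```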


>**Ex. 1.1.9**: Explain the point of using `try` and `except` statements? Write some code that shows how to use these.
>
> *Hint: You will do a lot of Googling in this course. If you don't already know how to use `try` and `except`, start Googling now.*

In [8]:
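#Answer to Ex. 1.1.9

x = True
y = False

try:
    # assert raises an AssertionError when the condition is false
    assert x == y
    print("x and y are equivalent.")
except AssertionError:
    # Handle the error here instead of letting it stop the program
    print("x and y are not equivalent.")

# try and except statements are for handling exceptions (runtime
# errors). The code in the try block runs first; if it raises an
# exception of the named type, the except block runs instead of
# the program crashing.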

x and y are not equivalent.
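
A more typical use of `try` and `except` is guarding an operation that can fail on bad input. The sketch below assumes a hypothetical helper, `parse_int`, that converts text to an integer and falls back to `None` when the conversion fails.

```python
def parse_int(text):
    """Return text converted to int, or None if it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError for strings such as "abc"
        return None

print(parse_int("42"))
print(parse_int("abc"))
```

Catching a specific exception type (`ValueError` here) rather than using a bare `except` keeps unrelated errors visible.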


>**Ex 1.1.10**: `dict`s and `defaultdict`s.
>1. What is a `defaultdict`? How would you say it is different from a normal Python `dict`?
>2. Write some code that takes a list of tuples:
>
>        l = [("a", 1), ("b", 3), ("a", None), ("c", False), ("b", True), ("a", None)]
>
>     And produces a `defaultdict` object
>
>        defaultdict(<class 'list'>, {'a': [1, None, None], 'c': [False], 'b': [3, True]})
>
>*Hint: you can import `defaultdict` from `collections`. Your code should be a for loop that loops over the tuples in `l` and updates an initially empty defaultdict, iteration after iteration.*

In [39]:
#Answer to Ex. 1.1.10.1

# The primary difference between a dict and a defaultdict is what
# happens when a key is missing: a dict raises a KeyError, while a
# defaultdict calls the factory function it was created with (here
# list), stores the result under that key, and returns it.


# [Answer to Ex. 1.1.10.2]
from collections import defaultdict

l = [("a", 1), ("b", 3), ("a", None), ("c", False), ("b", True), ("a", None)]
# Missing keys start out as an empty list
d = defaultdict(list)

# Append each value to the list stored under its key
for i, j in l:
    d[i].append(j)

print(d)



defaultdict(<class 'list'>, {'a': [1, None, None], 'b': [3, True], 'c': [False]})
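
To see the difference from a normal `dict` directly, here is a minimal sketch that tries the same operation on both. The names `plain` and `grouped` and the key `"missing"` are illustrative.

```python
from collections import defaultdict

plain = {}
try:
    plain["missing"].append(1)
except KeyError:
    print("a plain dict raises KeyError for a missing key")

grouped = defaultdict(list)
grouped["missing"].append(1)   # list() is called to create the default value
print(dict(grouped))

# The same grouping with a plain dict needs setdefault (or an explicit check)
plain.setdefault("missing", []).append(1)
print(plain)
```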


>**Ex 1.1.11**: Take a list `a = list("justreadtheinstructions")` and
>1. count the number of times each element occurs using `Counter`,
>2. report the two most common elements
>
>*Hint: you can import `Counter` from `collections`. `Counter` has a method called `most_common` that you can use.*

In [40]:
#Answer to Ex. 1.1.11.1-2

from collections import Counter

a = list("justreadtheinstructions")
# Count how many times each character occurs
count = Counter(a)

print(count)
# most_common(2) returns the two most frequent elements with their counts
print(count.most_common(2))


Counter({'t': 4, 's': 3, 'u': 2, 'r': 2, 'e': 2, 'i': 2, 'n': 2, 'j': 1, 'a': 1, 'd': 1, 'h': 1, 'c': 1, 'o': 1})
[('t', 4), ('s', 3)]


>**Ex 1.1.12**: Take another list `b = list("ofcourseistillloveyou")` and
>1. get the `set` of characters that exist in both `a` and `b` (intersection),
>2. get the `set` of characters that exist in either `a` or `b` (union), and
>3. compute the [Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index) between the distinct elements in `a` and `b`.
>
>*Hint: use the `set` function to get a `set`-type object of distinct elements from a list. Sets support a [number of different operations](https://snakify.org/en/lessons/sets/#section_4).*

In [12]:
#Answer to Ex. 1.1.12.1-3
b = list("ofcourseistillloveyou")

# Create sets of a and b
a_set = set(a)
b_set = set(b)

# Get the intersection, union, and Jaccard similarity
# (intersection() and union() already return sets)
intersection_ab = a_set.intersection(b_set)
union_ab = a_set.union(b_set)
Jaccard_sim = len(intersection_ab) / len(union_ab)

# Print the results
print(" " + str(a_set) + "\n", str(b_set) + "\n", str(intersection_ab) + "\n", str(union_ab) + "\n", str(Jaccard_sim))


 {'i', 'd', 't', 'r', 'h', 'j', 'c', 'o', 'n', 's', 'a', 'u', 'e'}
 {'y', 'i', 't', 'r', 's', 'c', 'l', 'f', 'o', 'v', 'u', 'e'}
 {'i', 't', 'u', 'o', 'c', 's', 'r', 'e'}
 {'i', 'd', 'j', 'c', 'l', 'f', 'o', 's', 'a', 'r', 'v', 'y', 't', 'u', 'n', 'h', 'e'}
 0.47058823529411764
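
The Jaccard similarity of two sets is $J(A, B) = |A \cap B| / |A \cup B|$. One way to package the steps above is a small function. `jaccard` is a hypothetical helper, and returning `1.0` for two empty inputs is an assumption (the ratio is undefined in that case).

```python
def jaccard(first, second):
    """Jaccard similarity between the distinct elements of two iterables."""
    set_a, set_b = set(first), set(second)
    union = set_a | set_b           # | is the set union operator
    if not union:
        # Assumption: treat two empty inputs as identical
        return 1.0
    return len(set_a & set_b) / len(union)   # & is the set intersection operator

similarity = jaccard("justreadtheinstructions", "ofcourseistillloveyou")
print(similarity)
```

With the two strings from the exercises, the intersection has 8 elements and the union has 17, so the result is 8/17, matching the value printed above.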


### Part 3: A little bit of real data

>**Ex. 1.2.1**: Learn about JSON by reading the **[wikipedia page](https://en.wikipedia.org/wiki/JSON)**. Then answer the following questions in the cell below. 
>
>1. What do the letters stand for?
>2. What is JSON?
>3. Why is JSON superior to XML? (... or why not?)

In [13]:
#Answer to Ex. 1.2.1.1-3

# JSON stands for JavaScript Object Notation. It is a lightweight,
# text-based format for storing and exchanging data as nested
# objects (key-value pairs) and arrays, with a syntax derived from
# JavaScript. Compared to XML it is less verbose and maps directly
# onto native data structures such as dicts and lists, which is why
# many web APIs use it and why databases such as MongoDB store
# JSON-like documents. XML, on the other hand, supports attributes,
# namespaces and comments, so neither format is strictly superior.
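
To make the mapping between JSON text and Python objects concrete, here is a minimal sketch using the standard `json` module. The JSON string `text` is made up for illustration.

```python
import json

# Hypothetical JSON document, written as a Python string
text = '{"name": "Missy", "awesome": true, "toys": ["ball", "mouse"], "owner": null}'

# json.loads parses JSON text into Python objects:
# object -> dict, array -> list, true/false -> True/False, null -> None
record = json.loads(text)
print(type(record))
print(record["toys"][0])

# json.dumps goes the other way and can pretty-print with indent
print(json.dumps(record, indent=4))
```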

>**Ex. 1.2.2**: Working with JSON files
>1. Use [`requests`](https://www.google.dk/search?q=python+requests+get+json&gws_rd=cr&ei=M5OdWaewD8Ti6AS54J24Bg), or another Python module, to store **[this data](https://www.reddit.com/r/gameofthrones/.json)** in a new variable `data`.
>2. Show that `data` is a `dict` type object.

In [15]:
#Answer to Ex. 1.2.2.1-2
import requests
import json

data = requests.get("https://www.reddit.com/r/gameofthrones/.json")
data.encoding = 'utf-8'
data_dict = json.loads(data.text)

print(" 1.2.2.1 " + str(data) + "\n", "1.2.2.2 " + str(isinstance(data_dict, dict)))

 1.2.2.1 <Response [200]>
 1.2.2.2 True


>**Ex. 1.2.3**: Let's try to inspect the data you retrieved. 
>
>1. Use the `json` module to print your data variable as a string with `indent=4`.
>2. Print the keys of `data`.
>
>*Hint: 1. Use the `json` function `dumps`. 2. Call `.keys()` on the variable.*

In [16]:
#Answer to Ex. 1.2.3.1
import json

data_printed = json.dumps(data_dict, indent=4)

print(data_printed)

{
    "kind": "Listing",
    "data": {
        "after": "t3_sw9xyr",
        "dist": 26,
        "modhash": "",
        "geo_filter": null,
        "children": [
            {
                "kind": "t3",
                "data": {
                    "approved_at_utc": null,
                    "subreddit": "gameofthrones",
                    "selftext": "Please post and discuss tier lists here so they don't clog up the front page. Thank you!",
                    "author_fullname": "t2_n2ti4",
                    "saved": false,
                    "mod_reason_title": null,
                    "gilded": 0,
                    "clicked": false,
                    "title": "[SPOILERS] Tier List MegaThread",
                    "link_flair_richtext": [
                        {
                            "e": "text",
                            "t": "Spoilers"
                        }
                    ],
                    "subreddit_name_prefixed": "r/gameofthrones",
          

In [17]:
#Answer to Ex. 1.2.3.2

# Top-level keys of the parsed response
data_keys = data_dict.keys()

print(data_keys)
# Keys of the first post in the listing
print(data_dict['data']['children'][0]['data'].keys())

dict_keys(['kind', 'data'])
dict_keys(['approved_at_utc', 'subreddit', 'selftext', 'author_fullname', 'saved', 'mod_reason_title', 'gilded', 'clicked', 'title', 'link_flair_richtext', 'subreddit_name_prefixed', 'hidden', 'pwls', 'link_flair_css_class', 'downs', 'thumbnail_height', 'top_awarded_type', 'hide_score', 'name', 'quarantine', 'link_flair_text_color', 'upvote_ratio', 'author_flair_background_color', 'subreddit_type', 'ups', 'total_awards_received', 'media_embed', 'thumbnail_width', 'author_flair_template_id', 'is_original_content', 'user_reports', 'secure_media', 'is_reddit_media_domain', 'is_meta', 'category', 'secure_media_embed', 'link_flair_text', 'can_mod_post', 'score', 'approved_by', 'is_created_from_ads_ui', 'author_premium', 'thumbnail', 'edited', 'author_flair_css_class', 'author_flair_richtext', 'gildings', 'content_categories', 'is_self', 'mod_note', 'created', 'link_flair_type', 'wls', 'removed_by_category', 'banned_by', 'author_flair_type', 'domain', 'allow_live_

>**Ex. 1.2.4**: The URL reveals that the data is from reddit/r/gameofthrones, but can you recover that information from the data? Give your answer by 'keying' into the dictionary using square brackets.
>
>*Hint: 'Keying' is a word I just made up. By it, I mean the following. Consider a nested dictionary like:*
>
>        my_json_obj = {
>            'cats': {
>                'awesome': ['Missy'],
>                'useless': ['Kim', 'Frank', 'Sandy']
>            },
>            'dogs': {
>                'awesome': ['Finn', 'Dolores', 'Fido', 'Casper'],
>                'useless': []
>            }
>        }
>
>*I can get the list of useless cats by keying into `my_json_obj` like such:*
>
>        >>> my_json_obj['cats']['useless']
>        Out [ ]: ['Kim', 'Frank', 'Sandy']
>
>*`my_json_obj['cats']` returns the dictionary `{'awesome': ['Missy'], 'useless': ['Kim', 'Frank', 'Sandy']}` and getting '`useless`' from that eventually gives us `['Kim', 'Frank', 'Sandy']`. If any of those list items were a list or a dictionary themselves, we could have kept keying deeper into the structure.*
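
The hint's example can be run as is. The sketch below repeats `my_json_obj` from the hint and also shows two things the hint does not cover: indexing into a list inside the structure, and using `.get` for a key that may be absent (the `'birds'` key is hypothetical).

```python
my_json_obj = {
    'cats': {
        'awesome': ['Missy'],
        'useless': ['Kim', 'Frank', 'Sandy']
    },
    'dogs': {
        'awesome': ['Finn', 'Dolores', 'Fido', 'Casper'],
        'useless': []
    }
}

# Key into the nested dictionaries with square brackets
useless_cats = my_json_obj['cats']['useless']

# Lists inside the structure are indexed by position
first_awesome_dog = my_json_obj['dogs']['awesome'][0]

# .get returns a default instead of raising KeyError for a missing key
birds = my_json_obj.get('birds', {})

print(useless_cats, first_awesome_dog, birds)
```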

In [19]:
#Answer to Ex. 1.2.4
import json

print(data_dict['data']['children'][0]['data']['subreddit'])

gameofthrones


>**Ex 1.2.5**: Write two `for` loops (or list comprehensions) which:
>1. Count the number of spoilers.
>2. Only prints headlines that aren't spoilers.

In [25]:
#Answer to Ex. 1.2.5.1
num_spoilers = 0

for i in data_dict['data']['children']:
    if i['data']['spoiler']:
        num_spoilers += 1

print(num_spoilers)
        

14


In [30]:
#Answer to Ex. 1.2.5.2
non_spoiled_headlines = [i['data']['title'] for i in data_dict['data']['children'] if not i['data']['spoiler']]

for i in non_spoiled_headlines:
    print(i)

[SPOILERS] Pictures of some of the cast when they were younger…I’ll let you guess who’s who!
[No Spoilers] They say the journey beats the destination. But if the last season of Game of Thrones was as bad as everyone says it was, is it worth even starting to watch the series if I know I'm going to be disappointed in the end?
[No Spoilers] I went to the Great Sept of Baelor in Girona, Spain! (2018)
[No Spoilers] I visited the peaceful town of Essaouira in Morocco where some scenes of the show were filmed
[No Spoilers] check out my Renfair mug it’s the Iron Throne! It was 2019’s commemorative mug but they had some left so they sold them at the front this year with the other mugs and memorabilia
[NO SPOILERS] I recently visited a couple of SE1 EP1 locations in Tollymore Forrest Northern Ireland
[NO SPOILERS] bran starks lookalike??????????
[No Spoilers] What game of thrones painting is the best? Competition
[No Spoilers] I don't know.....
[NO SPOILERS] What Went Wrong With GoT: Fan Theory
