In [None]:
#%conda install -c conda-forge rise

# Basics of Programming
## Some full examples to get started

## The building blocks of a program are surprisingly small
* Data: numbers, characters, strings, etc.
* Functions and Operators: `+, -, abs(), sqrt(), print(), ...`
* Conditionals: `if/else` statements
* Loops: `for/while/...` loops which allow you to work with multiple items
* Data structures: list of items, matrix of items, lookup dictionaries,...
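
As a rough illustration, all five building blocks can show up in just a few lines of Python. The names and values below (`scores`, `labels`) are made up for this sketch and have nothing to do with the dataset used later.

```python
# Data: numbers and strings (values made up for illustration)
scores = [3, -7, 12]          # data structure: a list of numbers
labels = {}                   # data structure: an empty lookup dictionary

# Loop: visit every item in the list
for score in scores:
    size = abs(score)         # function call: absolute value
    # Conditional: choose a label depending on the value
    if size > 5:
        labels[score] = "big"
    else:
        labels[score] = "small"

total = sum(scores) + 1       # function call plus the + operator
print(labels, total)
```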

## Let's download a file containing Game Of Thrones data and explore it

In [None]:
data_file_location = "../../datasets/deaths-in-gameofthrones/game-of-thrones-deaths-data.csv"

### Sit back and get your first look at Python
The point of this lecture is just to show you what Python looks like and to let you meet Python features for the first time. We will get to know their names and characteristics throughout the rest of the semester.

**You are NOT expected to learn anything yet. Just open your brain and let new words, concepts, and syntax flow in.**

BTW, does everyone know Game of Thrones?

In [None]:
[line for line in open(data_file_location, 'r', encoding='utf8')][:10]

### Open the file and look at the contents
Let's write very simple programs to operate on this file
1. Count the number of lines in the file
2. How many people did Arya kill?
3. Find the names of everyone killed by Jaime.
4. How many people were killed in total?

## Simplified example: count the number of murders
**Pseudo code** (not real code)
1. Set variable `counter` equal to zero
2. Keep going through lines of the file  
    2a. For every new line, add 1 to `counter`
3. When done with lines in file, show `counter` 

## Real python: count the number of murders

In [None]:
counter = 0
file = open(data_file_location, 'r', encoding='utf8')

for line in file:
    counter = counter + 1


file.close()
print(counter)

* Use `with open(file_name) as file` in the future
* Notice the indentation
* Notice that we have to close the file (real-world python has a solution for this)
* Notice the loop
* Find function calls
* Find variables
* What are types and scopes of variables?
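
One way to start exploring the last question is to ask Python directly with the built-in `type()` function. The sketch below uses a small in-memory stand-in for the file (`sample_lines`, made up for illustration) so that it runs without the dataset.

```python
# Hypothetical stand-in for the lines of a file (not the real dataset)
sample_lines = ["first line\n", "second line\n", "third line\n"]

counter = 0
for line in sample_lines:
    counter = counter + 1

print(type(counter))   # counter holds a whole number (int)
print(type(line))      # line holds text (str)
print(counter)

# A for loop does not create its own scope in Python: `line` is still
# available here and holds the last item that was visited
print(repr(line))
```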

#### Python provides a way to avoid having to `close` files

In [None]:
counter = 0

with open(data_file_location, 'r', encoding='utf8') as file:
    for line in file:
        counter = counter + 1

print(counter)

## Interesting dataset, let's see what it looks like

In [None]:
list(open(data_file_location, 'r', encoding='utf8'))[:5]

In [None]:
#Sneak peek at DS specific library
import pandas as pd
pd.read_csv(data_file_location).head()

https://github.com/washingtonpost/data-game-of-thrones-deaths

## How many people did Arya Stark and Jon Snow kill? **Pseudo code**
1. Set variable `jon` equal to zero
2. Set variable `arya` equal to zero

3. Keep going through lines of the file  
    3a. For every new line, `split` at commas  
    3b. If the 5th column (index 4) is "Arya Stark" then add 1 to variable `arya`  
    3c. If the 5th column (index 4) is "Jon Snow" then add 1 to variable `jon`
4. When done with lines in file, show `arya` and `jon` 

## How many people did Arya Stark and Jon Snow kill? **Real code**

In [None]:
jon  = 0 # variable containing Jon's score
arya = 0 # variable containing Arya's score

with open(data_file_location, 'r', encoding='utf8') as file:
    # Go through each line in the file
    for line in file:
        columns = line.split(',')  # separate line into columns
        if columns[4] == "Arya Stark": arya = arya + 1
        if columns[4] == "Jon Snow":
            jon = jon + 1

print("Arya killed", arya, "people")
print("Jon killed", jon, "people")


In [None]:
line

In [None]:
line.split(',')

In [None]:
#Sneak peek at "Python For Analytics" -- more declarative
data_df = pd.read_csv(data_file_location)
data_df[data_df.killer.isin(["Arya Stark", "Jon Snow"])].killer.value_counts()

* Use `with open(file_name) as file`
* Notice that we are accessing the 5th item by using the index 4
* Notice that the if statements can be 1 or 2 lines
* Find function calls
* Find variables
* What are the types and scopes of variables
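
To see how `split` and zero-based indexing work together without opening the file, here is a minimal sketch on a single made-up line. The contents and column layout of `sample_line` are assumptions for illustration only and may not match the real CSV.

```python
# Made-up line; the values and column layout are hypothetical
sample_line = "1,1,Episode Title,Some Victim,Some Killer\n"

columns = sample_line.strip().split(',')  # strip() drops the trailing newline

print(len(columns))   # number of columns in this line
print(columns[0])     # index 0 is the 1st item
print(columns[4])     # index 4 is the 5th item
print(columns[-1])    # negative indexes count from the end

# A one-line if and a two-line if behave the same way
hits = 0
if columns[4] == "Some Killer": hits = hits + 1
if columns[4] == "Some Killer":
    hits = hits + 1
print(hits)
```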

## Who did Jaime Lannister kill: **Pseudo code**
1. Create an empty list which will hold the names of people Jaime killed

2. Keep going through lines of the file  
    2a. For every new line, `split` at commas  
    2b. If the 5th column (index 4) is "Jaime Lannister" then get the name of the person killed from the 4th column (index 3) and add it to the list
3. When done with lines in file, show the list of people killed by Jaime

## Who did Jaime Lannister kill: **Real code**

In [None]:
killed = list() # list data structure

with open(data_file_location, 'r', encoding='utf8') as file:
    for line in file:
        tokens = line.split(',')
        if tokens[4] == "Jaime Lannister":
            name_of_killed = tokens[3]
            killed.append(name_of_killed)

print(killed)

In [None]:
#Sneak peek at "Python For Analytics" -- more declarative
data_df = pd.read_csv(data_file_location)
print(list(data_df[data_df.killer == 'Jaime Lannister'].character_killed))

* Use `with open(file_name) as file`
* Use variables, instead of tokens with hard-coded numbers
* How many people did he kill? (len)
* Who was the first, last person he killed? (killed[0], killed[-1])
* Who were the first three people he killed? (killed[0:3])
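
The list operations named in these bullets can be tried on a small list without reading the file. This is only a sketch: the names are placeholders rather than results from the dataset, and the column positions `KILLED_COLUMN` and `KILLER_COLUMN` are assumed for illustration.

```python
# Hypothetical list of names (placeholders, not results from the dataset)
killed = ["Person A", "Person B", "Person C", "Person D"]

print(len(killed))    # how many names are in the list
print(killed[0])      # first item
print(killed[-1])     # last item
print(killed[0:3])    # slice: items at index 0, 1 and 2

# Named variables instead of hard-coded numbers (positions are assumed)
KILLED_COLUMN = 3
KILLER_COLUMN = 4

tokens = "1,1,Episode Title,Person E,Some Killer".split(',')
if tokens[KILLER_COLUMN] == "Some Killer":
    killed.append(tokens[KILLED_COLUMN])

print(len(killed))    # the list grew by one item
```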

## Final example: How many people did _any one_ kill? **Pseudo code**
1. Create an empty dictionary which will hold look-up keys and values

2. Keep going through lines of the file

    2a. For every new line, `split` at commas

    2b. Look up the killer's name in the 5th column (index 4) and add 1 to the value

3. When done with lines in file, show the contents of the dictionary

## How many people did _any one_ kill? **Real code**

In [None]:
killers = dict() # dictionary data structure

with open(data_file_location, 'r', encoding='utf8') as file:
    for line in file:
        tokens = line.split(',')
        if tokens[4] in killers: kill_count = killers[tokens[4]]
        else: kill_count = 0
        kill_count = kill_count + 1
        killers[tokens[4]] = kill_count

killers

In [None]:
#Sneak peek at "Python For Analytics" -- more declarative
data_df = pd.read_csv(data_file_location)
data_df.killer.value_counts()

# Create a new file, containing names of all the killers

In [None]:
killers = set() # set data structure: remembers the names already written

output_file_location = "../../datasets/deaths-in-gameofthrones/killers.csv"

with open(data_file_location, 'r', encoding='utf8') as input_file, open(output_file_location, 'w', encoding='utf8') as output_file:
    for line in input_file:
        tokens = line.split(',')
        killer = tokens[4]
        if killer not in killers: # write each killer only once
            killers.add(killer)
            output_file.write(killer + '\n') # notice that we have to explicitly write '\n' for new lines


In [None]:
data_df = pd.read_csv(data_file_location)
data_df[['killer']].drop_duplicates().sort_values('killer').to_csv('../../datasets/deaths-in-gameofthrones/killers_pd.csv', index=False)
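
A set keeps only one copy of each value, which is what makes it handy for a list of unique names. The sketch below shows that idea together with file writing. It uses placeholder names and writes to a temporary file (`killers_demo.txt`, a made-up name) so that nothing in the datasets folder is touched.

```python
import os
import tempfile

# Made-up names (placeholders); a set keeps only one copy of each
names = ["Killer B", "Killer A", "Killer B", "Killer A", "Killer C"]
unique_names = set(names)

# Write to a temporary file so the real datasets folder is left alone
output_path = os.path.join(tempfile.mkdtemp(), "killers_demo.txt")
with open(output_path, 'w', encoding='utf8') as output_file:
    for name in sorted(unique_names):     # sorted() returns an ordered list
        output_file.write(name + '\n')    # '\n' ends each line

# Read the file back to check what was written
with open(output_path, 'r', encoding='utf8') as check_file:
    written = check_file.read()
print(written)
```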

Improve this:
1. Assign `tokens[4]` to a variable
2. Use `+=` to increment
3. Use `get(key, default)` (mention `defaultdict(int)`)
4. Use `killers.items()` in a for loop for better formatting
5. Notice that "killer" is actually a column heading!
6. Use `with open(file_name) as file`

7. How many did Jaime kill again?
8. How many killers are there?
9. How many dead are there?
10. Who killed most people?
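
Several of the improvements and questions above can be sketched on a few in-memory lines instead of the real file. Everything in `sample_lines` is made up, including the heading layout, and `KILLER_COLUMN` is an assumed position, so treat this as one possible shape for the improved code rather than the final answer.

```python
from collections import defaultdict

# Made-up lines standing in for the file; names and layout are hypothetical
sample_lines = [
    "season,episode,title,character_killed,killer\n",
    "1,1,Title,Victim 1,Killer A\n",
    "1,2,Title,Victim 2,Killer B\n",
    "1,3,Title,Victim 3,Killer A\n",
]
KILLER_COLUMN = 4  # assumed position of the killer column

killers = {}
for line in sample_lines[1:]:                     # [1:] skips the heading line
    killer = line.strip().split(',')[KILLER_COLUMN]
    killers[killer] = killers.get(killer, 0) + 1  # get() supplies 0 for new names

# defaultdict(int) produces the same counts without calling get()
killers_dd = defaultdict(int)
for line in sample_lines[1:]:
    killers_dd[line.strip().split(',')[KILLER_COLUMN]] += 1

# items() yields (key, value) pairs for nicer printing
for name, count in killers.items():
    print(name, "killed", count)

print(len(killers))                     # how many killers there are
print(sum(killers.values()))            # how many dead there are
print(max(killers, key=killers.get))    # who killed the most people
```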


Reference:
Deaths in Game of Thrones: https://data.world/makeovermonday/2019w27