# Basics of Programming
## Some full examples to get started

## The building blocks of a program are surprisingly small
* Data: numbers, characters, strings, etc.
* Functions and Operators: `+, -, abs(), sqrt(), print(), ...`
* Conditionals: `if/else` statements
* Loops: `for/while/...` loops which allow you to work with multiple items
* Data structures: list of items, matrix of items, lookup dictionaries,...
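
As a rough illustration, the tiny script below touches each of these building blocks once. The names (`threshold`, `temperatures`, `labels`, `warm_days`) and all the values are made up for this sketch; they are not part of the lecture's dataset.

```python
import math

# Data: a number and a string
threshold = 20.5
city = "Seattle"

# Data structure: a list of items
temperatures = [18.0, 22.5, 25.0, 16.0]

# Data structure: a lookup dictionary
labels = {True: "warm", False: "cool"}

warm_days = 0

# Loop: work with multiple items, one at a time
for temp in temperatures:
    # Conditional: decide what to do with each item
    if temp > threshold:
        warm_days = warm_days + 1  # Operator: +
    print(temp, labels[temp > threshold])  # Function: print()

# Functions: abs() and math.sqrt()
print(city, "had", warm_days, "warm days")
print(abs(-3), math.sqrt(16))
```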

## Let's download a file and explore it
### Then run it in VS Code

In [1]:
src_url = "https://raw.githubusercontent.com/washingtonpost/data-game-of-thrones-deaths/master/game-of-thrones-deaths-data.csv"
data_file_location = "../../datasets/deaths-in-gameofthrones/game-of-thrones-deaths-data.csv"

In [2]:
%time
import requests
import os.path

#This is just boilerplate code, not for instruction...ignore meeee
if not os.path.exists(data_file_location):
  print("Downloading...")
  content = requests.get(src_url, allow_redirects=True).content
  # 'with' closes the file for us once the data is written
  with open(data_file_location, 'wb') as out_file:
    out_file.write(content)
  print("Finished downloading")
else:
  print("File already exists, not downloading")


Wall time: 0 ns
File already exists, not downloading


### Sit back and get your first look at Python
The point of this lecture is just to show you what Python looks like. You will meet several Python features here for the first time; we will get to know their names and characteristics throughout the rest of the semester.

BTW, does everyone know Game of Thrones?

### Open the file and look at the contents
Let's write very simple programs to operate on this file
1. Count the number of lines in the file
2. How many people did Arya kill?
3. Find the names of everyone Jaime killed.
4. How many people were killed in total?

## Simplified example: count lines in a file
**Pseudo code** (not real code)
1. Set variable `counter` equal to zero
2. Keep going through lines of the file  
    2a. For every new line, add 1 to `counter`
3. When done with lines in file, show `counter` 

## Real Python: count lines in a file

In [3]:
counter = 0
file = open(data_file_location, 'r', encoding='utf8')

for line in file:
  counter = counter + 1

file.close()
print(counter)


6888


* Use `with open(file_name) as file`
* Notice the indentation
* Notice that we have to close the file (real-world Python has a solution for this)
* Notice the loop
* Find function calls
* Find variables
* What are types and scopes of variables?
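
The `with open(...)` suggestion from the list above could look roughly like this. To keep the sketch runnable on its own, it first writes a three-line stand-in file to a temporary folder; `sample_path` and its contents are assumptions for illustration, not the real data file.

```python
import os
import tempfile

# Hypothetical stand-in for the real data file: a three-line sample
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", encoding="utf8") as file:
    file.write("order,season,episode,character_killed,killer\n")
    file.write("1,1,1,Waymar Royce,White Walker\n")
    file.write("2,1,1,Gared,White Walker\n")

counter = 0

# The file is closed automatically when the indented block ends,
# so there is no file.close() to forget
with open(sample_path, "r", encoding="utf8") as file:
    for line in file:
        counter += 1

print(counter)
```

With the real dataset you would pass `data_file_location` to `open` instead of `sample_path`.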

## Interesting dataset, let's see what it looks like

In [4]:
list(open(data_file_location, 'r', encoding='utf8'))[:5]

['order,season,episode,character_killed,killer,method,method_cat,reason,location,allegiance,importance\n',
 '1,1,1,Waymar Royce,White Walker,Ice sword,Blade,Unknown,Beyond the Wall,"House Royce, Night’s Watch",2\n',
 '2,1,1,Gared,White Walker,Ice sword,Blade,Unknown,Beyond the Wall,Night’s Watch,2\n',
 '3,1,1,Will,Ned Stark,Sword (Ice),Blade,Deserting the Night’s Watch,Winterfell,Night’s Watch,2\n',
 '4,1,1,Stag,Direwolf,Direwolf teeth,Animal,Unknown,Winterfell,None,1\n']

In [8]:
#Sneak peek at DS specific library
import pandas as pd
pd.read_csv(data_file_location).head()

| | order | season | episode | character_killed | killer | method | method_cat | reason | location | allegiance | importance |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 1 | Waymar Royce | White Walker | Ice sword | Blade | Unknown | Beyond the Wall | House Royce, Night’s Watch | 2.0 |
| 1 | 2 | 1 | 1 | Gared | White Walker | Ice sword | Blade | Unknown | Beyond the Wall | Night’s Watch | 2.0 |
| 2 | 3 | 1 | 1 | Will | Ned Stark | Sword (Ice) | Blade | Deserting the Night’s Watch | Winterfell | Night’s Watch | 2.0 |
| 3 | 4 | 1 | 1 | Stag | Direwolf | Direwolf teeth | Animal | Unknown | Winterfell | | 1.0 |
| 4 | 5 | 1 | 1 | Direwolf | Stag | Antler | Animal | Unknown | Winterfell | | 1.0 |


Data source: https://github.com/washingtonpost/data-game-of-thrones-deaths

## Another example: How many people did Arya Stark and Jon Snow kill?
**Pseudo code**
1. Set variable `jon` equal to zero
2. Set variable `arya` equal to zero

3. Keep going through lines of the file  
    3a. For every new line, `split` at commas  
    3b. If the 5th column (index 4, the killer) is "Arya Stark" then add 1 to variable `arya`  
    3c. If the 5th column (index 4, the killer) is "Jon Snow" then add 1 to variable `jon`
4. When done with lines in file, show `arya` and `jon` 

## Real Python: How many people did Arya Stark and Jon Snow kill?

In [5]:
jon = 0 #variable containing Jon's score
arya = 0 #variable containing Arya's score

#Open file
file = open(data_file_location, encoding='utf8')

#Go through each line in file
for line in file:
  tokens = line.split(',') #separate line into columns
  if tokens[4]=="Arya Stark": arya = arya + 1
  if tokens[4]=="Jon Snow": 
    jon = jon + 1

file.close()
print("Arya killed", arya, "people")
print("Jon killed", jon, "people")


Arya killed 1278 people
Jon killed 112 people


In [None]:
#Sneak peek at "Python For Analytics" -- more declarative
data_df = pd.read_csv(data_file_location)
data_df[data_df.killer.isin(["Arya Stark", "Jon Snow"])].killer.value_counts()

* Use `with open(file_name) as file`
* Notice that we are accessing the 5th item (the `killer` column) by using the index 4, because counting starts at 0
* Notice that the `if` statements can be written on 1 or 2 lines
* Find function calls
* Find variables
* What are the types and scopes of variables?
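
One possible cleanup of the counting cell, using `with open(...)` and a named column index, is sketched below. The sample rows, the victim names, and the constant `KILLER` are invented for this example so that it runs on its own; they are not taken from the real file.

```python
import os
import tempfile

# Made-up sample rows in the same column order as the start of the real file
sample_rows = [
    "order,season,episode,character_killed,killer\n",
    "1,1,1,Victim A,Arya Stark\n",
    "2,1,1,Victim B,Jon Snow\n",
    "3,1,2,Victim C,Arya Stark\n",
]
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", encoding="utf8") as file:
    file.writelines(sample_rows)

KILLER = 4  # index 4 is the 5th column, because counting starts at 0

arya = 0
jon = 0

with open(sample_path, encoding="utf8") as file:
    for line in file:
        # Strip the trailing newline, then separate the line into columns
        tokens = line.rstrip("\n").split(",")
        if tokens[KILLER] == "Arya Stark":
            arya += 1
        if tokens[KILLER] == "Jon Snow":
            jon += 1

print("Arya killed", arya, "people")
print("Jon killed", jon, "people")
```

The `rstrip("\n")` is only needed here because `killer` is the last column of the shortened sample; in the full file more columns follow it.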

## Who did Jaime Lannister kill: **Pseudo code**
1. Create a list which will hold the names of people Jaime killed

2. Keep going through lines of the file  
    2a. For every new line, `split` at commas  
    2b. If the 5th column (index 4, the killer) is "Jaime Lannister" then get the name of the person killed from the 4th column (index 3) and add it to the list
3. When done with lines in file, show the list of people killed by Jaime 

## Who did Jaime Lannister kill: **Real code**

In [6]:
killed = [] # list data type

file = open(data_file_location, "r", encoding='utf8')

for line in file:
  tokens = line.split(',')
  if tokens[4]=="Jaime Lannister":
    name_of_killed = tokens[3]
    killed.append(name_of_killed)

file.close()
print(killed)

['Jory Cassel', 'Alton Lannister', 'Torrhen Karstark', 'Martell soldier', 'Aerys II Targaryen', 'Olenna Tyrell', 'Dothraki rider', 'Dothraki rider', 'Dothraki rider', 'Dothraki rider', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Euron Greyjoy']


In [9]:
#Sneak peek at "Python For Analytics" -- more declarative
data_df = pd.read_csv(data_file_location)
print(list(data_df[data_df.killer == 'Jaime Lannister'].character_killed))

['Jory Cassel', 'Alton Lannister', 'Torrhen Karstark', 'Martell soldier', 'Aerys II Targaryen', 'Olenna Tyrell', 'Dothraki rider', 'Dothraki rider', 'Dothraki rider', 'Dothraki rider', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Wight', 'Euron Greyjoy']


* Use `with open(file_name) as file`
* Use named variables instead of `tokens` with hard-coded index numbers
* How many people did he kill? (`len(killed)`)
* Who were the first and last people he killed? (`killed[0]`, `killed[-1]`)
* Who were the first three people he killed? (`killed[0:3]`)
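
The list questions above can be tried out directly on the output printed earlier. This sketch rebuilds that same list by hand (using `*` to repeat the duplicated entries) so that it runs without the data file.

```python
# The same 23 names that the cell above printed, rebuilt by hand.
# ["Wight"] * 12 makes a list with "Wight" repeated 12 times.
killed = (
    ["Jory Cassel", "Alton Lannister", "Torrhen Karstark",
     "Martell soldier", "Aerys II Targaryen", "Olenna Tyrell"]
    + ["Dothraki rider"] * 4
    + ["Wight"] * 12
    + ["Euron Greyjoy"]
)

print(len(killed))   # how many people he killed
print(killed[0])     # the first person (indexes start at 0)
print(killed[-1])    # the last person (negative indexes count from the end)
print(killed[0:3])   # the first three people (index 3 is not included)
```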

## Final example: How many people did _any one_ kill?
**Pseudo code**
1. Create a dictionary which will hold look-up keys and values

2. Keep going through lines of the file
    2a. For every new line, `split` at commas
    2b. Look up the killer's name from the 5th column (index 4) and add 1 to the value

3. When done with lines in file, show the contents of the dictionary

## Real Python: How many people did _any one_ kill?

In [11]:
killers = {} # dictionary data type

file = open(data_file_location, "r", encoding='utf8')

for line in file:
  tokens = line.split(',')
  if tokens[4] in killers: kill_count = killers[tokens[4]]
  else: kill_count = 0
  kill_count = kill_count + 1
  killers[tokens[4]] = kill_count

file.close()
killers

{'killer': 1,
 'White Walker': 5,
 'Ned Stark': 5,
 'Direwolf': 1,
 'Stag': 1,
 'Lysa Arryn': 1,
 'Dothraki man': 1,
 'Summer': 5,
 'Sandor “the Hound” Clegane': 59,
 'Gregor “the Mountain” Clegane': 10,
 'Tribesman': 2,
 'Bronn': 27,
 'Tyrion Lannister': 5,
 'Rodrik Cassel': 2,
 'Lannister soldier': 32,
 'Jory Cassel': 3,
 'Jaime Lannister': 23,
 'Robb Stark': 3,
 'Theon Greyjoy': 33,
 'Khal Drogo': 2,
 'Boar': 1,
 'City watch guard': 6,
 'Meryn Trant': 1,
 'Arya Stark': 1278,
 'Jon Snow': 112,
 'Night’s Watch brother': 41,
 'Mirri Maz Duur': 2,
 'Jorah Mormont': 65,
 'Hill tribesman': 2,
 'Ilyn Payne': 1,
 'Daenerys Targaryen': 24,
 'None': 477,
 'Melisandre “the Red Woman” of Asshai': 8,
 'Janos Slynt': 1,
 'Unknown (possible rival Dothraki men)': 1,
 'Yoren': 5,
 'Amory Lorch': 1,
 'Polliver': 1,
 'Grey Wind': 2,
 'The Tickler': 1,
 'Brienne of Tarth': 28,
 'Jaqen H’ghar': 4,
 'Qhorin Halfhand': 1,
 'Peasants': 1,
 'Osha': 2,
 'Unknown': 67,
 'Pyat Pree': 11,
 'Accident': 1,
 'Bara

In [12]:
#Sneak peek at "Python For Analytics" -- more declarative
data_df = pd.read_csv(data_file_location)
data_df.killer.value_counts()

Wight                                        1602
Drogon                                       1426
Arya Stark                                   1278
None                                          477
Rhaegal                                       273
                                             ... 
Gerold Hightower                                1
Rickard Karstark                                1
Night’s Watch brothers (final blow: Olly)       1
Nymeria Sand                                    1
Roose Bolton                                    1
Name: killer, Length: 149, dtype: int64

Improve this:
1. Assign `tokens[4]` to a variable
2. Use `+=` to increment
3. Use `get(key, default)` (mention `defaultdict(int)`)
4. Use `killers.items()` in a `for` loop for better formatting
5. Notice that "killer" is actually a column heading!
6. Use `with open(file_name) as file`

7. How many did Jaime kill again?
8. How many killers are there?
9. How many dead are there?
10. Who killed the most people?
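
A minimal sketch of how several of these improvements might fit together is shown below. It runs on a few made-up rows rather than the real file; `sample_rows`, `sample_path`, `KILLER`, and the victim names are all assumptions for illustration.

```python
import os
import tempfile

# Made-up sample rows in the same column order as the start of the real file
sample_rows = [
    "order,season,episode,character_killed,killer\n",
    "1,1,1,Victim A,Arya Stark\n",
    "2,1,1,Victim B,Jon Snow\n",
    "3,1,2,Victim C,Arya Stark\n",
]
sample_path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(sample_path, "w", encoding="utf8") as file:
    file.writelines(sample_rows)

KILLER = 4  # index of the killer column

killers = {}  # dictionary: killer name -> number of kills

with open(sample_path, encoding="utf8") as file:
    next(file)  # skip the header row so "killer" is not counted as a name
    for line in file:
        tokens = line.rstrip("\n").split(",")
        killer = tokens[KILLER]
        # get() returns 0 when the name is not in the dictionary yet
        killers[killer] = killers.get(killer, 0) + 1

# items() yields (key, value) pairs, one per killer
for name, count in killers.items():
    print(name, "killed", count)

print("Number of killers:", len(killers))
print("Number of dead:", sum(killers.values()))
print("Top killer:", max(killers, key=killers.get))
```

If `killers` were created as `collections.defaultdict(int)` instead of `{}`, missing names would start at 0 automatically and the counting line could shrink to `killers[killer] += 1`.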


Reference:
Deaths in Game of Thrones: https://data.world/makeovermonday/2019w27