# Python for Data Science - week 1

#### The Notebook environment
Jupyter Notebooks are a widely used tool for data science. They contain both code and rich-text elements like paragraphs, equations and charts. Notebooks are both (a) human-readable documents used to present analysis and results; and (b) executable documents that your computer can run.


#### Preliminary exercise - get to know Colab

Notebooks comprise two types of cells:
* _Code cells._ These contain executable commands in Python.
* _Text ('markdown') cells._ These include plain text, or you can use [markdown](https://commonmark.org/help/) to add formatting.

__EXERCISE:__ Spend a couple of minutes learning to navigate Colab. Perform the following:
 * Add a new code cell
 * Write your first program: `print("hello world!")`
 * Run the program two ways: using CTRL-ENTER and SHIFT-ENTER

#### Keyboard shortcuts

Action | Colab Shortcut
---|---
Execute current cell | `<CTRL-ENTER>`
Execute current cell and move to next cell | `<SHIFT-ENTER>`
Insert cell above | `<CTRL-M> <A>`
Insert cell below | `<CTRL-M> <B>`
Convert cell to code | `<CTRL-M> <Y>`
Convert cell to Markdown | `<CTRL-M> <M>`
Autocomplete | `<TAB>`
Go from edit to "command" mode | `<ESC>`
Go from "command" to edit mode | `<ENTER>`
<p align="center"><b>Note:</b> On OS X use `<COMMAND>` instead of `<CTRL>`</p>

#### 1. Variables and basic math in Python

#### 1.1 Math operators

In [None]:
# add two integers
2 + 2

In [None]:
# multiply two integers
2 * 2

In [None]:
# spaces don't matter here, but keep them consistent (PEP8 good practice)
2*3   +   10

In [None]:
# divide two integers (the result of / is always a float, here 2.0)
6 / 3

In [None]:
# raise 2 to the 4th power
2 ** 4

| Symbol | Task Performed |
|----|---|
| +  | Addition |
| -  | Subtraction |
| /  | Division |
| *  | Multiplication |
| **  | Exponentiation (to the power of) |
| %  | Mod (remainder after division) |
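
The operators in the table can be tried together in one cell. The sketch below is illustrative only; the variable names are made up for this example, and `//` (floor division) is not in the table but is included for contrast with `/`.

```python
# Arithmetic operators from the table, applied to small integers
total = 7 + 2        # addition
difference = 7 - 2   # subtraction
product = 7 * 2      # multiplication
quotient = 6 / 3     # division always returns a float in Python 3
power = 2 ** 4       # 2 raised to the 4th power
remainder = 7 % 2    # mod: the remainder after dividing 7 by 2

# Not in the table above: floor division discards the fractional part
floored = 7 // 2

print(total, difference, product, quotient, power, remainder, floored)
# prints: 9 5 14 2.0 16 1 3
```

Note that `6 / 3` gives `2.0` rather than `2`, because `/` always produces a float.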

#### 1.2 Working with variables

In [None]:
# variables, such as x here, contain values and their values can vary
x = 5

In [None]:
# to inspect a variable's value, type its name on the last line of a cell
x

In [None]:
# you can perform calculations on variables
x + 3

In [None]:
# what's the value of x now?
x

In [None]:
# to update the value of a variable, you have to do assignment again
x = x + 3

In [None]:
# what's the value of x now?
x

In [None]:
# create a new variable y from a calculation involving x
y = x + 2
y

In [None]:
# a cell only displays the value of its last line, so only y is shown here
x
y

In [None]:
# use the print() function to output value(s) to the console
print(x)
print(y)

In [None]:
# separate two values by commas to output on the same line
print(x, y)

In [None]:
# you can also print the output of an expression
print(x * y)

NOTE: Use valid variable names:
* Variable names can contain letters, numbers, and the underscore character.
* You can't begin variable names with a digit, or use any of Python's _reserved words_ (e.g. False, None, else, class, ...).
* Avoid reusing the names of built-in functions (e.g. zip, print, len). Python allows it, but the built-in is then hidden.
* Don't accidentally use a space in the middle of a variable name!

| Result | Variable name |
|----|----|
| Valid | my_float, xyz_123, zip_code |
| Error! | my float, 123_xyz, class |
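
One way to check a name before using it is sketched below. It relies on the standard-library `keyword` module and the string method `.isidentifier()`; the helper `is_valid_name` is a hypothetical name invented for this example. Note that `zip` is the name of a built-in function rather than a reserved word, so Python accepts it as a variable name, although assigning to it hides the built-in.

```python
import keyword

# Candidate names, mostly taken from the table above
candidates = ["my_float", "xyz_123", "zip_code", "my float", "123_xyz", "class", "zip"]

def is_valid_name(name):
    """Return True if `name` can legally be used as a variable name."""
    # isidentifier() checks the characters; iskeyword() rejects reserved words
    return name.isidentifier() and not keyword.iskeyword(name)

for name in candidates:
    print(name, "->", is_valid_name(name))
```

To see the full list of reserved words in your version of Python, run `print(keyword.kwlist)`.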

#### 1.3 Getting help and using tab complete

In [None]:
# get IPython help on an expression by putting ? after it
len?

In [None]:
# use tab complete to fill in the rest of statements, functions, methods
prin

In [None]:
# also use it to complete variables or functions that you defined yourself
name_of_course = "Python for Data Science"
name_

#### 2. Data types 1: int, float, string, Boolean
In Python, everything -- booleans, integers, floats, dictionaries and functions -- is implemented as an object. Each object has a type, which determines what can be done with the data it contains. For instance, if an object is of type _int_, you can add it to another _int_.

In statically typed languages like C++, the programmer has to declare the type of a variable before using it. In Python, you don't declare types: a variable takes the type of whatever object is assigned to it, and that type is determined at run-time.
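
As a small illustration of types being determined at run-time, the sketch below assigns three different kinds of object to the same name. The variable names are made up for this example.

```python
# The same name can be bound to objects of different types over time
value = 10
first_type = type(value)      # int

value = 4.25
second_type = type(value)     # float

value = "ten"
third_type = type(value)      # str

print(first_type, second_type, third_type)
# prints: <class 'int'> <class 'float'> <class 'str'>
```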

In [None]:
# integers are whole numbers
x = 10
type(x)

In [None]:
# floats are floating point (or decimal) numbers
y = 4.25
type(y)

In [5]:
# strings are sequences of characters, denoted by single or double quotes
course_name = 'Python for Data Science'

In [None]:
# YOUR ACTION

# Create a dictionary with the characteristics of your favorite pet (eg. size, breed, favorite food)

#### 2.1. Manipulating strings

In [6]:
# pass string directly to print function

print("Hello world")

Hello world


In [7]:
# or pass a variable to print function

print(course_name)

Python for Data Science


In [12]:
# Add (concatenate) two strings together
date = "April 2019"

full_title = course_name + ": " + date
full_title

'Python for Data Science: April 2019'

In [8]:
# how long is the string?

len(course_name)

23

In [6]:
# extract part of a string (a slice) with []
# specify the offsets: my_string[start_point:end_point]
course_name[:6]

'Python'

In [8]:
# get a sub-string between two index positions
course_name[7:10]

'for'

In [9]:
# from offset 11 (the 12th character) to the end
course_name[11:]

'Data Science'
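
The slicing cells above depend on two conventions: offsets start at 0, and a slice stops just before its end offset. The sketch below restates them in one place. The negative offset in the last example is an extra not covered above, and the variable names are made up for this illustration.

```python
course_name = 'Python for Data Science'

# Offsets start at 0, so the first character is at offset 0
first_char = course_name[0]        # 'P'

# A slice includes the start offset and stops just before the end offset
first_word = course_name[:6]       # offsets 0 to 5: 'Python'
middle_word = course_name[7:10]    # offsets 7 to 9: 'for'

# Negative offsets count back from the end of the string
last_word = course_name[-7:]       # the final 7 characters: 'Science'

print(first_char, first_word, middle_word, last_word)
# prints: P Python for Science
```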

In [10]:
# Data types are defined as classes. Classes have methods attached to them, which you can access with dot notation.
# Example: strings have a method '.split()' that returns a list of component parts. 

full_title.split()

['Python', 'for', 'Data', 'Science:', 'April', '2019']

In [13]:
# Or we could replace characters using the .replace() method
full_title.replace('1','2')

'Python for Data Science: April 2029'

In [14]:
# YOUR ACTION
# Use IPython help to understand the .split() method better. What parameters does it take? What can it do? Next, split full_title using a colon instead of a space as the separator. This should return a list containing two items. Inspect both items in turn using Python's square bracket indexing notation.

#### 2.2 Converting between types
Often you need to convert variables to other types, especially to make them work together. Use the _int()_, _str()_ or _float()_ functions to convert to these data types.

In [None]:
# you can change the type of a variable
my_float = 4.3765
type(my_float)

In [None]:
# changing a float to an integer lops off everything after the decimal point (it does not round)

int(my_float)

In [None]:
# you can't concatenate a string and an integer: this cell raises a TypeError

address = "1808 H ST NW, DC"
WB_zip = 20037

address + " " + WB_zip

In [None]:
# instead, change the integer to a string first
WB_zip = str(WB_zip)
type(WB_zip)

In [None]:
# does it work now?
address + " " + WB_zip

#### 3. Introducing more complex data types: lists, tuples and dictionaries
We already looked at integers, floats and strings. Think of these as atoms. Next, we will look at data types that combine those basic types in more complex ways. Lists, tuples and dictionaries are containers for other pieces of data; think of these as molecules.

|Data structure | Properties | Syntax |
|----|----|----|
|List | Ordered, mutable sequence | mylist = [1, 2, 3] |
|Tuple | Ordered, immutable sequence | mytuple = (1, 2, 3) |
|Set | Unordered, mutable collection of unique values | myset = {1, 2, 3} |
|Dictionary | Mutable collection of key-value pairs | mydict = {'first_value': 1, 'second_value': 2} |
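
A minimal sketch of the four containers in the table, showing what "mutable" and "immutable" mean in practice. The names follow the table, except `myset` and `tuple_error`, which are made up for this example.

```python
mylist = [1, 2, 3]                                # list: ordered, mutable
mytuple = (1, 2, 3)                               # tuple: ordered, immutable
myset = {1, 2, 2, 3}                              # set: the duplicate 2 is dropped
mydict = {'first_value': 1, 'second_value': 2}    # dictionary: key-value pairs

mylist[0] = 99            # allowed: lists can be changed in place

tuple_error = None
try:
    mytuple[0] = 99       # not allowed: tuples cannot be changed
except TypeError as err:
    tuple_error = str(err)
    print("Tuple error:", tuple_error)

print(mylist)                   # [99, 2, 3]
print(len(myset))               # 3
print(mydict['second_value'])   # 2
```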


In [None]:
# a list is a collection of elements, denoted by square brackets
Welsh_towns = ['Cardiff', 'Prestatyn', 'Aberystwyth']

In [None]:
# a dictionary is a collection of key:value pairs, denoted by curly brackets
town_populations = {'Cardiff': 340000, 'Swansea': 230000, 'Aberystwyth': 16700}

#### 3.1 Manipulating lists
Lists are ordered sequences denoted by square brackets. They're helpful when your data has a meaningful order, and may need to be changed in place. You can put strings, floats, integers, or any of Python's more complex data types into a list.

In [20]:
# define a list with []
weekdays = ['monday','tuesday','wednesday','thursday','friday']

In [None]:
# get an item using mylist[offset]
weekdays[3]

In [None]:
# change an item using mylist[offset]
weekdays[3] = 'thursday - remember Python class!'
weekdays

In [None]:
# slicing: extract items by offset range
weekdays[2:4]

In [21]:
# add an item to a list with append()
weekdays.append('Saturday')

In [17]:
# concatenate two lists
odds = [1,3,5]
evens = [2,4,6]
all_nums = odds + evens
all_nums

[1, 3, 5, 2, 4, 6]

In [18]:
# Lists, like other data types, have methods associated with them. These are accessed through dot notation.
# Use tab complete to find helpful methods! 

all_nums.sor

Signature: odds.sort(*, key=None, reverse=False)
Docstring: Stable sort *IN PLACE*.
Type:      builtin_function_or_method


In [24]:
# test for a value in your list
'Saturday' in weekdays

True

In [None]:
# use .remove() to clean up the weekdays list

weekdays.remove('Saturday')
weekdays

#### 3.2 Dictionaries
Dictionaries are collections of key-value pairs. Unlike lists, the order of items doesn't matter, and they aren't selected by an offset. To use dictionaries, you specify a unique key to associate with each value. Dictionaries are mutable, so you can add, delete or change their elements.
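
The basic dictionary operations can be sketched as follows. The starting values are copied from the `town_populations` example earlier in this notebook; `'Exampletown'` and its figures are hypothetical placeholders invented for this illustration.

```python
# Population figures copied from the town_populations example earlier
town_populations = {'Cardiff': 340000, 'Swansea': 230000, 'Aberystwyth': 16700}

# look up a value by its key
cardiff_pop = town_populations['Cardiff']

# add a new key-value pair (hypothetical town and figure)
town_populations['Exampletown'] = 12000

# change the value stored under an existing key (hypothetical figure)
town_populations['Exampletown'] = 12500

# delete a key-value pair
del town_populations['Aberystwyth']

# test whether a key is present
has_cardiff = 'Cardiff' in town_populations

print(cardiff_pop, has_cardiff)    # 340000 True
print(town_populations)
```

Looking up a key that does not exist raises a `KeyError`, so testing with `in` first is a common precaution.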

In [None]:
# create a dictionary


#### 4. Intro to logic and control flow

Definition of **control flow**:
* In a simple script, program execution starts at the top and executes each instruction in order. 
* **Control flow** statements can cause the execution to loop and skip instructions based on conditions.

#### 4.1 Logic operators

We test conditions using logic (comparison) operators.

| Symbol | Task Performed |
|----|---|
| == | True if equal to |
| != | True if not equal to |
| < | True if less than |
| > | True if greater than |

In [20]:
# NOTE: We assign values to variables using '='
a = 5
b = 7

In [21]:
# But compare them using '=='
a == b

False

In [22]:
# Logic expressions evaluate to True or False (datatype: Boolean)

test = b > a

test

True

In [23]:
type(test)

bool

#### 4.2 Conditional statements with if

My pet Python is a vegetarian. She will test whether variable 'food' is 'burger', 'chicken' or 'veg', then decide whether to eat.

Do this with 'if', 'elif' (else if), and 'else'.

In [24]:
food = 'burger'

In [25]:
if food == 'veg':
    print('yum')
elif food == 'chicken':
    print('hmm maybe')
elif food == 'burger':
    print('no thanks')
else:
    print('more information needed')

no thanks


NOTE: Here's how the structure works:
* start with an 'if' statement, specifying the logical test to apply
* make sure your 'if' statement ends with :
* **indent the conditional code block.** Whatever code should be executed if the condition is true, indent it consistently (PEP 8 recommends four spaces per level).

Test additional actions using 'elif', and any other actions with 'else'.

#### 4.3 For loops
A for loop runs a block of code repeatedly "for" each item in a range of numbers, or in a list. End the declaration with : and remember to indent the subsidiary code.
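
A minimal sketch of looping over a range of numbers, to complement the list-based example in this section. The names `squares` and `number` are made up for this illustration. Note that `range(1, 6)` stops just before 6.

```python
# Loop over a range of numbers: range(1, 6) yields 1, 2, 3, 4, 5
squares = []
for number in range(1, 6):
    squares.append(number ** 2)    # indented: runs once per number

print(squares)    # [1, 4, 9, 16, 25]
```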

In [None]:
days = ['Mon','Tue','Wed','Thu','Fri','Sat','Sun']

for day in days:
    if day == 'Sat':
        location = '--> Beach!'
    elif day == 'Sun':
        location = '--> My sofa!'
    else:
        location = '--> MC5-215B'
    print(day, location)

# Part 2: Exercises
Pick one of these exercises. Try to complete it within the time allotted.

#### EXERCISE 1: Age calculator
    
Write a program that will:
1. Define a variable 'birth_year'
2. Define a variable 'current_year'
3. Calculate the person's age from these two
4. Print the output in format ("You are x years old.")

In [None]:
# YOUR CODE HERE:

#### EXERCISE 2: Sports star salaries

1. Define three variables, player, season_length, and annual_salary. Just use a Google search to find some values.
2. Write a script to calculate how much the player earns (a) per month, and (b) for each day in the season.
3. Print your output neatly.
4. Your team won the league so your player gets a 33% bonus. Recalculate.

**BONUS POINTS**: find a way to limit the salary to two decimal places when printing it.

In [28]:
# YOUR CODE HERE:

#### EXERCISE 3: Higher or lower?

Write a program to:
1. Generate a random number (kept secret from the user)
2. Ask the user to guess the number
3. Tell them if they were correct, too high, or too low.

BONUS POINTS: limit the number of guesses to 5.

In [100]:
# YOUR CODE HERE:

import random
random_number = None                     # replace None; hint: try the random.randint() function
guess_number = input()                   # note: input() returns a string



#### EXERCISE 4: FizzBuzz

Write a script to:

* Print out the numbers from 1 to 20, replacing each number with 'Fizz' if it is divisible by 3, 'Buzz' if it is divisible by 5, and 'FizzBuzz' if it is divisible by both 3 and 5.

Hint: the 'mod' operator, denoted %, is widely used to check divisibility. Example: 10 % 2 == 0.
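
The divisibility check in the hint can be sketched as follows. This is only an illustration of `%`, not a solution to the exercise, and the variable names are made up for this example.

```python
# A number is divisible by another when the remainder (mod) is zero
number = 10

divisible_by_2 = number % 2 == 0    # 10 divided by 2 leaves no remainder
divisible_by_3 = number % 3 == 0    # 10 divided by 3 leaves a remainder of 1

print(divisible_by_2, divisible_by_3)    # True False
```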

(If you have any questions about a Python function, just Google it!)

In [None]:
# YOUR CODE HERE: