# UFCFVQ-15-M Programming for Data Science
# Week 2 Jupyter Notebook 
# Python Variables and Data Types


## Goals
This notebook has been created to familiarise you with Python Variables and Data Types. Most of the code needed to progress through this Notebook has been provided for you. However, there are several coding tasks that you will need to complete by writing the code yourself.

The topics in this notebook include:
* Basic programming features in Python: variables, data types and values
* Translating formulas from mathematical notation to Python
* Declaring strings and manipulating them
* Creating mathematical expressions in Python

## Variables
Variables are containers for storing data values. They are created by assigning a value to them using the assignment operator `=`. Unlike languages such as Java or C++, Python does not require you to explicitly declare a variable's data type. Python works this out implicitly from the value being assigned: assign an integer and the variable holds an integer; assign a string and it holds a string. Later in this Notebook, we will look at the different kinds of data we can store in a variable, known as its data type.

In [None]:
x = 2     # integer assignment
y = 5     # integer assignment
xy = 'Hey'   # string assignment

In fact, a Python variable's type can change during its lifetime by assigning a value of a different data type. However, you should avoid doing this as it can be confusing to anyone else reading your source code.

In [None]:
x = 1
print('The value of x is', x)

x = 2.5
print('Now the value of x is', x)

x = 'hello there'
print('Now it is ', x)

Multiple variables can be assigned with the same value.

In [None]:
x = y = 1

In [None]:
print(x, y)

### Variable Scope
Normally, you would execute cells in the order they are included in the Notebook. However, please be aware that it is the order of execution of cells that is important in a Jupyter Notebook, not necessarily the order in which they appear. Python will remember all the code that was run previously, including any variables you have defined, irrespective of the order in the Notebook. Therefore if you define variables lower down the Notebook and then (re)run cells further up, those defined further down will still be present. Variables persist between cells as you have already no doubt experienced. This is one of the most useful facilities within the Jupyter system. It allows you to break down complex data science investigations into smaller units of work.

To show how this works try to execute the first of the following code cells. You should see an error showing that the variable `message` does not exist. Now execute the second code cell. Try executing the first code cell again. This time the variable `message` is defined.

In [None]:
print(message)

In [None]:
message="Hello World!"

### Variable Names
A variable can have a short name (like x and y) or a more descriptive name (age, carname, total_volume). Rules for Python variables:
* A variable name must start with a letter or the underscore character
* A variable name cannot start with a number
* A variable name can only contain alpha-numeric characters and underscores (A-Z, a-z, 0-9 and _)
* Variable names are case-sensitive (age, Age and AGE are three different variables)
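
The naming rules above can be checked in code. The following is a minimal sketch using the string method `isidentifier()`; the candidate names are made up purely for illustration.

```python
# Candidate names chosen purely for illustration
valid_plain = "total_volume".isidentifier()    # True: letters and underscores only
valid_underscore = "_hidden".isidentifier()    # True: may start with an underscore
starts_with_digit = "2nd_place".isidentifier() # False: starts with a number
has_hyphen = "my-name".isidentifier()          # False: a hyphen is not allowed
print(valid_plain, valid_underscore, starts_with_digit, has_hyphen)

# Case sensitivity: these are two separate variables
age = 30
Age = 40
print(age, Age)
```

Note that `isidentifier()` only checks the character rules. It also returns `True` for words such as `class`, which Python reserves for its own use.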

Variable names with more than one word can be difficult to read. There are a couple of common techniques for making these kinds of variable names more readable:
#### Camel Case
Each word, except the first, starts with a capital letter.
#### Snake Case
Each word is separated by an underscore character `_`

In [None]:
# camelCase
myFirstName="Dave"

# snake_case
my_first_name = "Dave"

However, there are words that we should not use as variable names because these words already have special meaning in Python.

##### Reserved Words
Python will raise an error if you try to assign a value to any of these keywords and so you <strong>must</strong> avoid these as variable names.

|  |  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|---|
| False | class | finally | is | return | None | continue | for | lambda |
| True | def | from | nonlocal | while | and | del | global | not |
| as | elif | if | or | yield | assert | else | import | pass |
| except | in | raise | try | with | break | async | await |  |
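
The exact set of reserved words depends on the Python version you are running. As a small sketch, the standard library's `keyword` module can list them and test a candidate name; `carname` is simply an example name.

```python
import keyword

print(keyword.kwlist)  # every reserved word in the Python version you are running

class_is_reserved = keyword.iskeyword("class")      # True: cannot be a variable name
carname_is_reserved = keyword.iskeyword("carname")  # False: safe to use
print(class_is_reserved, carname_is_reserved)
```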

##### Built-in Function Names
There are several functions which are included in the standard Python library. Do not use the names of these functions as variable names otherwise the reference to the built-in function will be lost. For example, do not use `sum()`, `min()`, `max()`, `list()` or `sorted()` as a variable name. See the full list of <a href="https://docs.python.org/3/library/functions.html" target="_blank">built-in functions</a>.

### <font color='red'><u>Worksheet Exercises</u></font>
1. Create a variable named `carname` and assign the string value `"Volvo"` to it
2. Create a variable named `x` and assign the value `50` to it
3. Print the sum of 5 + 10, using two variables: `x` and `y`
4. Create a variable called `z`, assign `x + y` to it, and print the result.
5. Remove the illegal characters in the variable name: `2my-first_name = "John"`
6. Assign the same value `"Orange"` to three variables (`x`, `y` and `z`) in one line of code.
7. Choose a descriptive variable name for each of the following and give an example initial value:
    * the total revenue for a given year
    * the average house price in a given district
    * the number of characters in a text file
    * the manufacturer of a given car

In [None]:
# add your exercise solutions here

### Special Jupyter Commands
Jupyter Notebook has several special commands called magic commands. There is no need for you to learn all of these commands, but the following one is quite useful when editing and executing multiple code cells in a Notebook: `%whos`. This command prints the name, type and data of all active variables. Additional information may be printed for other data structures such as lists, tuples, sets and dictionaries.

In [None]:
%whos

### A note on the print() function
The `print()` function prints the specified message to the screen, or other standard output device. The message can be a string or any other object; the object is converted into a string before it is written to the screen. The values passed to the function are called arguments and must be separated by a `,`. By default, `print()` displays all arguments on the same line with a single space between them and then adds a new line at the end.

In [None]:
name="Dave"
age=42
print(name,"is",age,"years old") # create a string with inserted variable values

The syntax for this function is as follows:

| <div align="center">Parameter</div> | <div align="center">Description</div> |
|----|---|
| <div align="center">object(s)</div>  | <div align="center">Any object, and as many as you like</div> |
| <div align="center">sep='separator'</div>  | <div align="center">Optional. Specify how to separate multiple objects. ' ' is default.</div> |
| <div align="center">end='end'</div>  | <div align="center">Optional. Specify what to print at the end</div> |

In [None]:
anInt = 10
aFloat = 3.14
aString = "Hello"
print(anInt,aFloat,aString,sep=',') # print three objects separated by a ,
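
The `end` parameter from the table works in a similar way to `sep`. Here is a brief sketch, with an arbitrary choice of ending text, showing two `print()` calls sharing one line:

```python
x = 1
y = 2
print(x, end=" -> ")   # end replaces the default new line, so the next print continues on this line
print(y)               # displays: 1 -> 2
```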

### <font color='red'><u>Worksheet Exercises</u></font>
1. Create three integer variables, `x`, `y` and `z` with values `1`, `2` and `3` respectively. Print all three variables on the same line
2. Now, print the same three variables as above, but this time separated by `***` instead of a space
3. Finally, print the same three variables as above, but this time terminate the string with an `!` exclamation mark

In [None]:
# add your exercise solutions here

## Data Types
Values in Python have an associated data type. Different types can do different things. Python's main built-in data types are:

| <div align="center">Category</div> | <div align="center">Data Type</div> |
|---|---|
| <div align="center">Textual</div> | <div align="center">str</div> |
| <div align="center">Numeric</div> | <div align="center">int, float, complex</div> |
| <div align="center">Sequence</div> | <div align="center">list, tuple, range</div> |
| <div align="center">Mapping</div> | <div align="center">dict</div> |
| <div align="center">Set</div> | <div align="center">set, frozenset</div> |
| <div align="center">Boolean</div> | <div align="center">bool</div> |
| <div align="center">Binary</div> | <div align="center">bytes, bytearray, memoryview</div> |

We will look at all of these data types as we progress through the module. Note that Python also lets you create new data types, and many of the libraries you will use for Data Science do just this, giving many more possibilities. For now, we will focus on the textual, numeric and boolean types.

### Null Values
Sometimes we need to represent the absence of data. In Python, we use the special value `None` for this.

In [None]:
result = None
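
To check whether a variable currently holds `None`, the usual idiom is the `is` keyword rather than `==`. A minimal sketch, where `is_missing` is just an illustrative name:

```python
result = None
is_missing = result is None   # True: result holds no data yet
print(is_missing)

result = 42
is_missing = result is None   # False: result now holds a value
print(is_missing)
```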

### A note on data types
As you begin to develop Python programs it can sometimes be useful to find out what type of data a variable is storing. Python provides the built-in function `type()` for this purpose.

In [None]:
x = 5 # define x as an int with value 5
type(x)
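
As a further sketch, `type()` can be applied to any value, and it reflects the change when a variable is re-assigned to a value of a different type. The values below are arbitrary examples.

```python
print(type(5))         # <class 'int'>
print(type(5.0))       # <class 'float'>
print(type("5"))       # <class 'str'>
print(type(5 > 3))     # <class 'bool'>

x = 5
type_before = type(x)  # int
x = "five"             # re-assigning x changes the type of value it stores
type_after = type(x)   # str
print(type_before, type_after)
```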

### Numeric Data Types
There are three numeric types in Python: `int`, `float` and `complex`.
#### int
Used to represent a whole number, positive or negative, without a fraction or decimal, of unlimited length.

In [None]:
x=1
y=35656222554887711
z=-3255522

#### float
Used to represent a floating point number, positive or negative, containing one or more decimals.

In [None]:
x = 1.10
y = 1.0
z = -35.59

Optionally, a float can also be written in scientific notation, with an `e` or `E` to indicate the power of 10.

In [None]:
x = 35e3
y = 12E4
z = -87.7e100

#### complex
Used to represent complex numbers, which are formed from a real part and an imaginary part. The imaginary part is written with a `j`.

In [None]:
x = 3+5j
y = 5j
z = -5j

### Arithmetic Operators
Values of all three numeric data types can be used in arithmetic expressions. Below is a list of Python's arithmetic operators, together with the built-in function `abs()`:

| Symbol | Task Performed |
|----|---|
| +  | addition |
| -  | subtraction |
| /  | division |
| %  | modulus |
| *  | multiplication |
| //  | floor division |
| **  | to the power of |
| abs() | absolute value |

Some examples:

In [None]:
1+2 # addition

In [None]:
2-1 # subtraction

In [None]:
1/2 # division

In [None]:
15%10 # modulus - the remainder of the division

In [None]:
1*2 # multiplication

In [None]:
15//10 # floor - rounds the result down to the nearest whole number

In [None]:
5**2 # power

In [None]:
abs(-1) # absolute value
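
Two details of the division operators are easy to miss: `/` always produces a float, and `//` rounds down even for negative numbers. The sketch below uses arbitrary values to show both.

```python
true_division = 10 / 2      # 5.0 : / always produces a float, even when the result is whole
floor_positive = 7 // 2     # 3   : 3.5 rounded down
floor_negative = -7 // 2    # -4  : -3.5 rounded down (towards negative infinity)
remainder = 7 % 2           # 1   : what is left over after 7 // 2
print(true_division, floor_positive, floor_negative, remainder)
```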

##### Rules of operator precedence
The arithmetic operators follow the rules of precedence you might have learned as "PEMDAS". From highest to lowest priority:

* Parentheses
* Exponentiation
* Multiplication and Division (equal priority, evaluated left to right)
* Addition and Subtraction (equal priority, evaluated left to right)

So in the following expression the multiplication happens first:

In [None]:
1 + 2 * 3

If that's not what you want, you can use parentheses to make the order of operations explicit:

In [None]:
(1 + 2) * 3
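
A few more cases are sketched below with arbitrary numbers. Multiplication and division share the same priority and are evaluated left to right, as are addition and subtraction, while exponentiation is applied before a leading minus sign.

```python
left_to_right = 8 / 2 * 4     # (8 / 2) * 4 = 16.0, not 8 / (2 * 4)
same_level = 10 - 4 + 3       # (10 - 4) + 3 = 9, not 10 - (4 + 3)
power_first = 2 * 3 ** 2      # 2 * (3 ** 2) = 18
negated_power = -3 ** 2       # -(3 ** 2) = -9; write (-3) ** 2 to get 9
print(left_to_right, same_level, power_first, negated_power)
```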

#### Mathematical Expressions
Variables can be used in calculations (also known as expressions) as if they were the values. For example, we could add `1` and `2` together and store the result in `z`. Or, we could set variable `x` equal to `1` and variable `y` equal to `2`, add these two variables together and store the result in `zz`. This is one of the strengths of programming. We can capture the behaviour of adding two values together without fixing which two values are added. This way we can use this addition functionality for any two numeric values we wish simply by setting `x` and `y` accordingly. We can go further and build algebraic formulae into a program, thereby gaining all the benefits of the mathematical way of thinking.

In [None]:
z = 1 + 2  # add 1 and 2 together and store in a new variable z
print(z)

x = 1
y = 2
zz = x + y  # add x and y together and store in a new variable zz
print(zz)

### <font color='red'><u>Worksheet Exercise</u></font>
Now let's use variables to solve a problem involving mathematical calculation. Suppose we have the following formula for computing compound interest [from Wikipedia](https://en.wikipedia.org/wiki/Compound_interest#Periodic_compounding):

$V=P\left(1+{\frac {r}{n}}\right)^{nt}$

where:

* $P$ is the original principal sum
* $V$ is the total accumulated value
* $r$ is the nominal annual interest rate
* $n$ is the compounding frequency
* $t$ is the overall length of time the interest is applied (expressed using the same time units as $r$, usually years).

Suppose a principal amount of \$1,500 is deposited in a bank paying an annual interest rate of 4.3\%, compounded quarterly.
Then the balance after 6 years is found by using the formula above, with the following values:

In [None]:
P = 1500
r = 0.043
n = 4
t = 6

We can compute the total accumulated value by translating the mathematical formula into Python syntax:

In [None]:
P * (1 + r/n)**(n*t)

Suppose the same amount of \$1,500 is compounded biennially, so `n = 1/2`.  
What would the total value be after 6 years?  Hint: we expect the answer to be a bit less than the previous answer.

In [None]:
# add your exercise solution here

### <font color='red'><u>More Exercises</u></font>
You may need to use a search engine to find the formulas for the following exercises:
1. Create a Python expression to convert a weight given in pounds to kilograms
2. Create a Python expression to convert a temperature given in fahrenheit to celsius
3. Use the quadratic formula to solve the following: $-x^2+8x-1=0$. NOTE: to do this you will need to use the math library. Add the following code to the beginning of your solution: `import math`. You can now use the `sqrt()` function, e.g. `math.sqrt(25)` provides the square root of 25.

In [None]:
# add your exercise solutions here

#### Arithmetic Assignment Operators
Assignment operators can be combined with arithmetic operators to perform an arithmetic operation on a variable value and then assign the results back into the same variable. For example, `x+=3` is equivalent to `x=x+3`. 

| Symbol | Task Performed |
|----|---|
| +=  | addition plus assignment |
| -=  | subtraction plus assignment |
| *=  | multiplication plus assignment |
| /=  | division plus assignment |
| %=  | modulus plus assignment |
| //=  | floor division plus assignment |
| **=  | to the power of plus assignment |

In [None]:
x=10
x+=3
x
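
The other operators in the table behave in the same way. As a short sketch, the cell below applies several of them in turn to one variable, with the running value shown in the comments.

```python
x = 10
x += 3     # 13
x -= 1     # 12
x *= 2     # 24
x //= 5    # 4
x **= 2    # 16
x %= 7     # 2
print(x)
```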

### Boolean Data Type
A boolean variable can take two values only: `True` or `False`. In programming you often need to know if an expression is true or false. This will be of use in flow control operations. We will look at these in Week 3. For now, it is useful to understand the relational operators that are available for comparing two values:

| Symbol | Task Performed |
|----|---|
| == | equal to |
| !=  | not equal to |
| < | less than |
| > | greater than |
| <=  | less than or equal to |
| >=  | greater than or equal to |

In [None]:
z=1

In [None]:
z==1 # equality

In [None]:
z!=2 # inequality

In [None]:
z>1 # greater than

In [None]:
z<=1 # less than or equal to
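
The result of a comparison is itself a boolean value, so it can be stored in a variable like any other value. A minimal sketch, where the variable names and the temperature are made up for illustration:

```python
temperature = 21
is_warm = temperature > 18        # True
is_freezing = temperature <= 0    # False
print(is_warm, is_freezing)
print(type(is_warm))              # <class 'bool'>

# Remember: = assigns a value, == compares two values
matches = (temperature == 21)     # True
print(matches)
```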

### Textual Data Type
Strings are sequences of characters which can be stored either as a constant or in a variable. Strings must be enclosed in quotation marks (either `'...'` or `"..."`) for the data to be recognized as a string. This is called a string literal. We will briefly return to strings in Week 4 when looking at the List datatype, as there are some similarities between strings and Lists in the way that Python can be used to access individual elements.

In [None]:
# Declaring a string variable using "
string = "This is a python string"

print(string)

In [None]:
# Declaring a string variable using '
another_string = 'This is another python string'

print(another_string)

In fact, in Python you can define a multiline string literal using three quote marks at either end of the string, i.e. `'''...'''` or `"""..."""`. This will preserve new lines and any spaces in the text.

In [None]:
# Declaring a string that spans multiple lines using """
span_string= """Lorem ipsum dolor sit amet,
    consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua."""

print(span_string)

#### Finding the length of a string
The `len()` function returns the length of the string.

In [None]:
string = "programming"
string_len = len(string)
print(string_len)

#### String Concatenation
Concatenation is the process of appending one string to the end of another string. You concatenate strings by using the `+` operator. 

In [None]:
firstName = "Dave"
surname = "Wyatt"
fullName = firstName + " " + surname # concatenate two variables with a space separator
print(fullName)

However, you must be careful when using the `+` to ensure both operands are strings. The following example results in a TypeError because we are trying to add a string and a number together. It is possible to fix this error by converting the number to a string using the `str()` function, i.e. `message = "Hello" + str(2)`. Try this fix. There is more information about converting between data types at the end of this Notebook.

In [None]:
message = "Hello" + 2 # attempt to concatenate a string and integer
print(message)

#### Escape sequences in Python using strings
In Python strings, the backslash `\` is a special character, also called the "escape" character. It is used in representing certain whitespace characters such as `\t` (tab) and `\n` (new line). Finally, `\` can be used to escape itself: `\\` is the literal backslash character. Here is a list of the common escape characters:

| Escape Code | Result |
|---|---|
| \\' | single quote |
| \\" | double quote |
| \\n  | new line |
| \\r | carriage return |
| \\t | tab |
| \\b  | backspace |
| \\f  | form feed |
| \\\\  | backslash itself |

In [None]:
# This string uses the escape sequence \" to include double quotes
string = "This is a \"Week 2 of the Module\""
print(string)
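
Some of the other escape codes from the table are sketched below. The strings are made-up examples; note that an escape sequence such as `\\` is written with two characters but stored as one.

```python
print("Name:\tDave")            # a tab between the two words
print("Line one\nLine two")     # printed on two separate lines
print('It\'s week 2')           # an escaped single quote inside single quotes

folder = "C:\\temp"             # each \\ is stored as a single backslash
print(folder)                   # C:\temp
folder_length = len(folder)     # 7 characters: C : \ t e m p
print(folder_length)
```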

#### Indexing a single character in a string
The characters (individual letters, numbers, and so on) in a string are ordered. For example, the string `AB` is not the same as `BA`. Because of this ordering, we can treat the string as a list of characters. Each position in the string (1st, 2nd, etc) is given a number. This number is called an index or sometimes a subscript. Indices are numbered from 0. Use the position’s index in `[]` square brackets to get the character at that position.

In [None]:
atomic_element="helium"
print(atomic_element[0]) # print the 1st character in the string
print(atomic_element[1]) # print the 2nd character in the string
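
Because indices start at 0, the last character sits at index `len(string) - 1`. As a small sketch, Python also accepts negative indices, which count backwards from the end of the string:

```python
atomic_element = "helium"
last_index = len(atomic_element) - 1        # 5, because indices start at 0
last_char = atomic_element[last_index]      # 'm'
also_last_char = atomic_element[-1]         # 'm' : -1 means the last character
print(last_index, last_char, also_last_char)
```

Asking for an index that does not exist, such as `atomic_element[6]` here, raises an `IndexError`.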

#### Selecting a substring
A part of a string is called a substring. A substring can be as short as a single character. A slice is a part of a string. We take a slice by using `[start:stop]`, where `start` is replaced with the index of the first character we want and `stop` is replaced with the index of the character just after the last character we want. Therefore, the difference between `stop` and `start` is the slice’s length. Taking a slice does not change the contents of the original string. Instead, the slice is a copy of part of the original string.

In [None]:
atomic_element="sodium"
print(atomic_element[0:3]) # print a substring made up of the first three characters

In fact, you can omit `start` or `stop` and Python will automatically use the beginning or end of the string respectively.

In [None]:
atomic_element="hydrogen"
print(atomic_element[:5]) # omitting start - uses 0
print(atomic_element[5:]) # omitting stop - uses 8 (length of string)
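
The relationship between `start`, `stop` and the length of a slice can be checked directly. A minimal sketch with arbitrarily chosen indices:

```python
atomic_element = "sodium"
middle = atomic_element[2:5]     # the characters at indices 2, 3 and 4
print(middle)                    # diu
middle_length = len(middle)      # 3, which is stop - start (5 - 2)
print(middle_length)
print(atomic_element)            # sodium : the original string is unchanged
```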

NOTE: the full slicing syntax is much more flexible than this brief introduction to the idea. We will return to slicing in Week 4 when we focus on the Python data structures.

### <font color='red'><u>Worksheet Exercises</u></font>
1. Given the following string `Albert Einstein`, use the slicing syntax to build the string `Alstein Einbert`
2. Given a string of odd length (such as `Universal`), use slicing to display the middle three characters, e.g. `ver`
3. Given 2 strings, `s1` and `s2`, create a new string, `s3`, by appending `s2` in the middle of `s1`

In [None]:
# add your exercise solutions here

#### Common string methods
There are dozens of methods available for the string object. It is beyond the scope of this session to show them all. So, here are a few to try out:

##### `split()`
The `split()` method splits the string into smaller substrings and returns them as a Python list. You can specify which separator to use (such as a `,` or `\t`). The default separator is any whitespace.

In [None]:
string = "Now is the time for all good men to come to the aid of the party"
substrings = string.split()
print(substrings)
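
Passing a separator changes where the string is split. The sketch below uses a hypothetical comma-separated record; the values are invented for illustration.

```python
record = "Dave,42,Bristol"      # hypothetical comma-separated record
fields = record.split(",")      # split on commas instead of whitespace
print(fields)                   # ['Dave', '42', 'Bristol']
field_count = len(fields)       # 3
print(field_count)
```

Every item in the resulting list is still a string, including `'42'`.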

##### `strip()`
The `strip()` method removes any whitespace from both the start and the end of a string.

In [None]:
string = "     programming is easy    "
string = string.strip()
print(string)

##### `replace()`
The `replace()` method is used for changing string content.

In [None]:
Money = '$113,678'
print(Money)

In [None]:
Money = Money.replace('$', '£') # replace the $ sign for a £ sign
print(Money)
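
Two points are worth seeing in action: `replace()` changes every occurrence it finds, and it returns a new string rather than altering the original. A short sketch with a made-up price:

```python
price = "$1,113,678"                  # made-up value for illustration
no_commas = price.replace(",", "")    # every comma is replaced, not just the first
print(no_commas)                      # $1113678
print(price)                          # $1,113,678 : the original string is unchanged
```

This is why the example above assigns the result back to the same variable.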

In [None]:
round(2.34) # round() is a built-in function, not a string method; it rounds to the nearest whole number

##### `find()`
The `find()` method returns the index of the first occurrence of the specified value, or -1 if the value is not found.

In [None]:
string = "programming"
print(string.find('m')) # find the first m
print(string.find('z')) # find the first z - there are none, so -1 is returned

##### `format()`
In Python, the `format()` method can be used for handling complex string formatting more efficiently. The method provides functionality for complex variable substitutions and value formatting. Programmers place one or more placeholder fields into a string. Placeholder fields are defined by a pair of curly braces `{}`. The values we wish to insert into the placeholders are then passed as arguments to the `format()` method.

###### Placeholders
The following example shows a single placeholder `{}` inserted into the `message` string which is then formatted with the value `Programming for Data Science`.

In [None]:
# a single placeholder example
message = "This Notebook is part of the {} module learning materials" # create a string with placeholder {}
formatted_message = message.format("Programming for Data Science") # use the format method to add a value
print(formatted_message)

The following example shows multiple placeholders inserted into the `message` string, which are then formatted with corresponding values.

In [None]:
# a multiple placeholders example
message = "This Notebook is part of the {} module learning materials for Week {} of the module" 
formatted_message = message.format("Programming for Data Science", 2)
print(formatted_message)

When using multiple placeholders, we can also include positional arguments in the formatted string. NOTE: positional arguments used in the formatted strings begin at `0` not `1`. This means the first argument in the `format()` method call replaces the placeholder `{0}` in the formatted string.

In [None]:
# a multiple placeholders example with positional arguments
message_positional_arguments = "The first argument is {0} and the second argument is {1}"
formatted_message = message_positional_arguments.format("Programming for Data Science", 2)
print(formatted_message)

Or, we can also include keyword arguments in the formatted string.

In [None]:
# a multiple placeholders example with keyword arguments
message_keyword_arguments = "The first argument is {first} and the second argument is {second}"
formatted_message = message_keyword_arguments.format(first="Programming for Data Science", second=2)
print(formatted_message)

###### Value Formatting
Additional information can be included within the placeholder `{}` brackets to provide greater control over how the inserted value will be formatted. To do this you must add a `:` after any positional or keyword argument and then include the value formatting argument, e.g. `{0:d}` states that the first argument should be displayed as a decimal integer. NOTE: there is no requirement to use positional or keyword arguments, so the previous example could also be written `{:d}`.

One really useful technique when formatting text is to pad the output with extra spaces. For instance, you might wish to output data in a tabular format. To do this you must add a `:` to the placeholder and include the minimum size of the output field, e.g. `{0:10}` would force the format method to allocate 10 characters for the first field even if the value was only 1 or 2 characters long. If the value is larger than the stated field width it is NOT truncated.

In [None]:
msg="a is {:10}" # set the minimum width of the output field to 10 characters
print(msg.format(2.3)) # a value smaller than 10 characters

msg="a is {:10}"
print(msg.format("123456789012")) # a value larger than 10 characters

There are many other ways to affect the formatting of a value. The table below shows a few of the options available to you:

| Value Format | Description |
|---|---|
|`:<`|Left align the value (within the available space)|
|`:>`|Right align the value (within the available space)|
|`:^`|Center align the value (within the available space)|
|`:+`|Use to indicate if the result is positive or negative|
|`:-`|Use to indicate negative values only|
|`:d`|Decimal integer format|
|`:e`|Scientific format with a lower case e|
|`:f`|Fixed point format - requires additional information|

Here are some examples of these:

In [None]:
txt = "We have {:<8} chickens." # left align the text within the field of width 8
print(txt.format(49))

txt = "We have {:>8} chickens." # right align the text within the field of width 8
print(txt.format(49))

txt = "We have {:^8} chickens." # center align the text within the field of width 8
print(txt.format(49))

txt = "The temperature is between {:+} and {:+} degrees celsius." # indicate positive and negative values
print(txt.format(-3, 7))

txt = "The temperature is between {:-} and {:-} degrees celsius." # indicate negative values only
print(txt.format(-3, 7))

txt = "We have {:e} chickens." # output in scientific notation
print(txt.format(49))

txt = "The price is {:.2f} dollars." # specify the number of digits after the decimal point (in this case 2)
print(txt.format(45))
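
These options can be combined within a single placeholder. The sketch below, using made-up values, joins an alignment, a field width, a `,` thousands separator (an option not listed in the table above) and a fixed number of decimal places:

```python
row = "{:<10}{:>12,.2f}"             # left-aligned text, then a right-aligned number
line = row.format("Tea", 1234.5)     # "Tea" and 1234.5 are made-up values
print(line)
line_length = len(line)              # 22 : the two field widths added together
print(line_length)
```

Using the same widths on every row keeps the columns lined up, which is the idea behind printing data in a tabular format.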

### <font color='red'><u>Worksheet Exercises</u></font>
1. Using the `format()` method and three `{}` curly brackets, print the numbers 1, 2 and 3 each separated with a comma, i.e. `1,2,3`
2. Adjust the following code snippet to output the correct answer: `txt="One year has {} months, {} weeks and {} days.".format(52, 365, 12)`. NOTE: you should change the contents of the string NOT the format method.
3. Using `format()` and fixed width fields (such as `{:10}`) reproduce the following table:

| Product | Qtr 1 | Qtr 2 | Grand Total |
|---------|-------|-------|-------------|
|Chocolate|£744.60|£162.56|£907.16|
|Gummibarchen|£5,079.60|£1,249.20|£6,328.80|
|Scottish Longbreads|£1,267.50|£1,062.50|£2,330.00|
|Sir Rodney's Scones|£1,418.00|£756.00|£2,174.00|
|Tarte au sucre|£4,728.00|£4,547.92|£9,275.92|
|Chocolate Biscuits|£943.89|£349.60|£1,293.49|
|Totals|£14,181.59|£8,127.78|£22,309.37|


In [None]:
# add your exercise solutions here

### Converting between Data Types
There will be occasions when the value stored in a variable is of the wrong data type. For example, assume that we have a mathematical expression (`area = (base/2)*height`) that calculates the area of a triangle but that the variables `base` and `height` are strings. This is not an unrealistic scenario. Data comes in many different forms. To perform this calculation without error, the programmer will need to convert the string representation of each value into a float before it can be used in the expression. Execute the code below and you will see an error is produced.

In [None]:
base="4.6"
height="5.3"
area = (base/2)*height
print("Area of the triangle is", area)

#### Converting to a float
To convert an integer or a string to a floating-point number use the `float()` function.

In [None]:
x = float("1.234")  # convert a string into a float
y = float(4321)  # convert an integer into a float

type(x), type(y) # the type of x and y are both float

Using the `float()` function, we can now edit and execute the example expression for calculating the area of a triangle:

In [None]:
base="4.6"
height="5.3"
area = (float(base)/2)*float(height)
print("Area of the triangle is", area)

#### Converting to an integer
To convert a float or a string to an integer use the `int()` function.

In [None]:
x = int("1234")  # convert a string into an int
y = int(1.234)  # convert a float into an int

type(x), type(y) # the type of x and y are both int
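
When `int()` converts a float, it discards the decimal part rather than rounding. The following sketch, with arbitrary values, compares this with `round()` and shows one way to convert a string that contains a decimal point:

```python
truncated_positive = int(1.9)     # 1  : the decimal part is discarded, not rounded
truncated_negative = int(-1.9)    # -1 : truncation is towards zero
rounded = round(1.9)              # 2  : use round() for the nearest whole number
from_text = int(float("1.234"))   # 1  : int("1.234") on its own raises a ValueError
print(truncated_positive, truncated_negative, rounded, from_text)
```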

#### Converting to a string
To convert a float or an integer to a string use the `str()` function.

In [None]:
x = str(1234)  # convert an integer into a string
y = str(1.234)  # convert a float into a string

type(x), type(y) # the type of x and y are both str