
![Cloud-First](../image/CloudFirst.png) 


# SIT742: Modern Data Science


**(Module: Python Foundations for Big Data)**

---
- Materials in this module include resources collected from various open-source online repositories.
- You are free to use, change and distribute this package.
- If you find any issue or bug in this document, please submit an issue at [tulip-lab/sit742](https://github.com/tulip-lab/sit742/issues)


Prepared by **SIT742 Teaching Team**

---

## Session 2C:  Basic data types

1. [String](#cell_string)

2. [Number](#cell_number)

3. [Data conversion and comparison](#cell_conversion)

4. [Input and output](#cell_input)


<a id = "cell_close"></a>



In this part, you will gain a better understanding of Python's basic data types. We will
look at the **string** and **number** data types in this section. Also covered are:

- Data conversion
- Data comparison
- Receiving input from users and displaying results effectively

You will be guided through completing a simple program that receives input from a user,
processes the information, and displays the results in a specific format.

## 1 String

A string is a *sequence of characters*. We use strings in almost every Python
program. As we saw in the **"Hello, World!"** example, strings can be specified
using single quotes **'**. The **print()** function can be used to display a string.

In [None]:
print('Hello, World!')

We can also use a variable to store the string value, and use the variable in the
**print()** function.

In [None]:
# Assign a string to a variable 
text = 'Hello, World!'
print(text)

A *variable* is basically a name that represents (or refers to) some value. We use **=**
to assign a value to a variable before we use it. Variable names are chosen by the programmer
so that the program is easy to understand. Variable names are *case sensitive*.
They can consist of letters, digits and underscores. However, they cannot begin with a digit.
For example, **plan9** and **plan_9** are valid names, whereas **9plan** is not.
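
The naming rules above can be checked with a short sketch. It relies on the built-in string method `str.isidentifier()`; the variable names `valid_1`, `valid_2` and `valid_3` are hypothetical and used only for this illustration. Note that this method checks the spelling rules only, so it also accepts reserved words such as `class`.

```python
# Test the three example names from the text against Python's naming rules
valid_1 = 'plan9'.isidentifier()     # letters followed by a digit
valid_2 = 'plan_9'.isidentifier()    # underscores are allowed
valid_3 = '9plan'.isidentifier()     # begins with a digit

print(valid_1, valid_2, valid_3)
```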

In [None]:
text = 'Hello, World!'

In [None]:
# With the print() function, the content is displayed without quotation marks
print(text)

With a variable, we can also display its value without the **print()** function. Note that
this does not work in a Python script (i.e. in a **.py** file). It only works in interactive mode (i.e. in the notebook).

In [None]:
# Without the print() function, the quotation marks are displayed together with the content
text 

Returning to how strings are written, problems arise when you need to include a quotation
mark in the text.
The example below uses the typographic apostrophe (’), which looks similar to the single quotation mark (') but is a different character.
Running it raises a **SyntaxError** (older Python 3 releases report "SyntaxError: invalid character in identifier"; the exact wording depends on the Python version). Try replacing each apostrophe with a single quotation mark and run it again.

In [None]:
text = ’What’ s your name ’

<details><summary><u><b><font color="Blue">Click here for solution</font></b></u></summary>
```python
    text = ' What\'s your name?'
    print(text)
```
</details>

Strings in double quotes **"** work exactly the same way as strings in single quotes.
By mixing the two types, it is easy to include a quotation mark in the text.

In [None]:
text = "What's your name?"
print(text)

Alternatively, you can use:

In [None]:
text = '"What is the problem?", he asked.'
print(text)

You can specify multi-line strings using triple quotes  (**"""** or **'''**). In this way, single
quotes and double quotes can be used freely in the text.
Here is one example:

In [None]:
multiline = '''This is a test for multiline. This is the first line. 
This is the second line. 
I asked, "What's your name?"'''
print(multiline)

Notice the difference when the variable is displayed without the **print()** function: the line breaks appear as **\n** escape sequences inside a single quoted string.

In [None]:
multiline = '''This is a test for multiline. This is the first line. 
This is the second line. 
I asked, "What's your name?"'''
multiline

Another way to include special characters, such as single quotes, is with the help of
escape sequences, which begin with a backslash **\\**. For example, you can specify a single quote using **\\'** as follows.

In [None]:
string = 'What\'s your name?'
print(string)

There are many other escape sequences (see Section 2.4.1 in the [official Python 3.1 documentation](https://docs.python.org/3.1/reference/lexical_analysis.html)). The two most useful ones are covered here.

First, an escape sequence can represent the backslash itself, e.g. **\\\\**

In [None]:
path = 'c:\\windows\\temp'
print(path)

Second, an escape sequence can be used to specify a two-line string. Apart from using a triple-quoted
string as shown previously, you can use **\n** to indicate the start of a new line.

In [None]:
multiline = 'This is a test for multiline. This is the first line.\nThis is the second line.'
print(multiline)
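
One point worth checking is that an escape sequence is stored as a single character, even though it is typed as two. The sketch below uses the built-in `len()` function to count characters; the variable names `two_lines`, `path_length` and `two_lines_length` are hypothetical, introduced only for this illustration.

```python
path = 'c:\\windows\\temp'
two_lines = 'first\nsecond'

# Each '\\' in the source becomes one backslash character in the string
path_length = len(path)

# '\n' also counts as one character (the line break)
two_lines_length = len(two_lines)

print(path_length, two_lines_length)
```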

To manipulate strings, the following two operators are the most useful:
* **+** is used to concatenate two strings or string variables;
* **\*** is used to concatenate several copies of the same string.

In [None]:
print('Hello, ' + 'World' * 3)

Below is another example of string concatenation based on  variables that store strings.

In [None]:
name = 'World'
greeting = 'Hello'
print(greeting + ', ' + name + '!')

With variables, changing part of the text is very easy.

In [None]:
name

In [None]:
greeting

In [None]:
# Changing part of the text is easy
greeting = 'Good morning' 
print(greeting + ', ' + name + '!')

## 2 Number

There are two types of numbers that are used most frequently: integers and floats. As we
would expect, the standard mathematical operations can be applied to these two types. Please
try the following expressions. Note that **\*\*** is the exponent operator, which performs
exponentiation (power) calculation.

In [None]:
2 + 3

In [None]:
3 * 5

In [None]:
# 3 to the power of 4
3 ** 4 

Among the number operations, division deserves a closer look. In Python 3, **/** performs true division and always returns a float, even when the result is a whole number.

In [None]:
15 / 5

In [None]:
14 / 5

**//** performs floor division. It discards the fractional part by rounding the result down to the nearest whole number, that is, toward the left on the number line.

In [None]:
14 // 5

In [None]:
# Negatives move left on number line. The result is -3 instead of  -2
-14 // 5 

The modulus operator **%** returns the remainder of a division. Pay attention when a negative number is involved.

In [None]:
14 % 5

In [None]:
# Hint:  -14 // 5 equals -3
#        (-3) * 5 +  ? = -14

-14 % 5 
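
Floor division and modulus are tied together by one relationship, which the hint above relies on: the quotient times the divisor plus the remainder gives back the original number. The following sketch checks that relationship and also shows the built-in `divmod()` function, which returns both results as a pair. The variable names are hypothetical and chosen only for this illustration.

```python
a = -14
b = 5

quotient = a // b      # floor division
remainder = a % b      # the remainder takes the sign of the divisor b

# The two results always satisfy: a == quotient * b + remainder
identity_holds = (quotient * b + remainder == a)
print(quotient, remainder, identity_holds)

# divmod() returns the quotient and the remainder together
pair = divmod(a, b)
print(pair)
```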

*Operator precedence* is a rule that affects how an expression is evaluated. As we learned in high school, multiplication is done before addition, e.g. in **2 + 3 * 4**. This means the multiplication operator has higher precedence than the addition operator.

For your reference, the precedence table in the Python reference manual specifies the evaluation order in Python. For a complete precedence table, check the heading "Python Operators Precedence" in this [Python tutorial](http://www.tutorialspoint.com/python/python_basic_operators.htm).


However, when an expression becomes confusing, it is far better to use parentheses **()** to specify
the precedence explicitly. This makes the program more readable.

Here are some examples on operator precedence:

In [None]:
2 + 3 * 4

In [None]:
(2 + 3) * 4

In [None]:
2 + 3 ** 2

In [None]:
(2 + 3) ** 2

In [None]:
-(4+3)+2
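
One case where parentheses are especially helpful is a negative sign combined with the exponent operator. As a small illustration (the variable names are hypothetical), the exponent operator is applied before the minus sign on its left:

```python
# ** binds more tightly than the unary minus on its left,
# so this is evaluated as -(2 ** 2)
without_parentheses = -2 ** 2

# Parentheses make the negative number the base of the power
with_parentheses = (-2) ** 2

print(without_parentheses)
print(with_parentheses)
```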

As with strings, variables can be used to store numbers so that they are easy to manipulate.

In [None]:
x = 3
y = 2
x + 2

In [None]:
# 'total' is used instead of 'sum' to avoid hiding Python's built-in sum() function
total = x + y
total

In [None]:
x * y

A common pattern is to run a math operation on a variable and then assign the result back to the same variable. There is a shortcut for such an expression.

In [None]:
x = 2
x = x * 3
x

This is equivalent to:

In [None]:
x = 2
# Note there is no space between '*' and '='
x *= 3
x
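
The same shortcut exists for the other arithmetic operators introduced in this section. A minimal sketch, with each line annotated with its longer equivalent:

```python
x = 10
x += 5     # same as x = x + 5
x -= 3     # same as x = x - 3
x //= 4    # same as x = x // 4
x **= 2    # same as x = x ** 2
print(x)
```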

## 3 Data conversion and comparison

So far, we have seen three types of data: integer, float, and string. A data type determines which operations are possible on a value and how the value is stored. Later pracs will introduce more data types, such as tuple, list and dictionary.

To obtain the data type of a variable or a value, we can use the built-in function **type()**,
whereas functions such as **str()**, **int()** and **float()** are used to convert data from one type to another. Check the following examples on the usage of these functions:

In [None]:
type('Hello, world!')

In [None]:
input_Value = '45.6'
type(input_Value)

In [None]:
weight = float(input_Value)
weight
type(weight)

Note that Python reports an error (a **ValueError**) when the data cannot be converted by the conversion function.

In [None]:
input_Value = 'David'
weight = float(input_Value)
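
The conversion functions can also be chained. The sketch below is an illustration only: the variable names are hypothetical, and the `try`/`except` statement used to catch the error is a feature not otherwise covered in this session.

```python
input_value = '45.6'

as_float = float(input_value)     # string -> float
as_int = int(as_float)            # float -> int, the fractional part is dropped
as_text = str(as_int)             # int -> string

print(as_float, as_int, as_text)

# int() cannot parse a string that contains a decimal point,
# so int('45.6') raises ValueError; try/except is one way to guard against it
try:
    int(input_value)
    conversion_ok = True
except ValueError:
    conversion_ok = False

print(conversion_ok)
```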

Comparing two values can help a program make decisions. The result of a comparison is either **True** or **False**. These are the two values of the *Boolean* type.

In [None]:
5 > 10

In [None]:
type(5 > 10)

In [None]:
# Double equal sign is also used for comparison
10.0 == 10

Check the following examples on comparison of two strings.

In [None]:
'cat' < 'dog'

In [None]:
# Strings are compared character by character, from the beginning to the end,
# based on each character's ASCII code value.
# All uppercase letters have smaller ASCII codes than lowercase letters.
'cat' < 'Dog'

In [None]:
'apple' < 'apricot'
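
To see why these comparisons turn out the way they do, the character codes can be inspected with the built-in `ord()` function. This is a small sketch; the variable names are hypothetical.

```python
# ord() returns the code of a single character
code_c = ord('c')
code_D = ord('D')
code_d = ord('d')
print(code_c, code_D, code_d)

# 'cat' < 'Dog' is decided by the first characters alone: 'c' versus 'D'
first_char_result = 'c' < 'D'
word_result = 'cat' < 'Dog'
print(first_char_result, word_result)

# 'apple' and 'apricot' both start with 'ap', so the third characters decide
third_char_result = 'p' < 'r'
print(third_char_result)
```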

There are three logical operators, *not*, *and* and *or*, which can  be applied to the boolean values. 

In [None]:
# Are both condition #1 and condition #2 True?
3 < 4  and 7 < 8

In [None]:
# Is either condition #1 or condition #2 True?
3 < 4  or 7 > 8

In [None]:
# Are both condition #1 and condition #2 False?
not ((3 > 4) or (7 > 8))
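
The last expression can be written in an equivalent way: saying that neither condition is True is the same as saying that each condition is False. A minimal sketch, using hypothetical variable names to hold the two Boolean results:

```python
a = 3 > 4      # condition #1
b = 7 > 8      # condition #2

# "not (a or b)" and "(not a) and (not b)" always give the same result
left = not (a or b)
right = (not a) and (not b)

print(left, right)
```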

## 4 Input and output

All programming languages provide features for interacting with the user. Python provides the *input()* function to get input. It waits for the user to type some input and press return. We can show a prompt to the user by putting a message inside the function's parentheses; it must be a string or a string variable. The text that was typed can be saved in a variable. Here is one example:

In [None]:
nInput  = input('Enter your number here:\n')

However, be aware that the input received from the user is treated as a string, even
if the user entered a number. The following **print()** call raises an error (a **TypeError**).

In [None]:
print(nInput + 3)

The input needs to be converted to an integer before the math operation can be performed, as shown below, because a string cannot be added to an integer directly. They are two entirely different types of data.

In [None]:
print(int(nInput) + 3)
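
The difference between the two types can be seen without waiting for keyboard input. In the sketch below, the hypothetical variable `raw_text` stands in for the string that **input()** would return if the user typed 12.

```python
# raw_text stands in for the string returned by input()
raw_text = '12'

# Convert to an integer before doing arithmetic
number = int(raw_text)
result = number + 3
print(result)

# Between two strings, + concatenates instead of adding
joined = raw_text + '3'
print(joined)
```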

After the user's input is accepted, messages need to be displayed to the user accordingly. String concatenation is one way to display messages that incorporate variable values.

In [None]:
name = 'David'
print('Hello, ' + name)

Another way of achieving this is to use the **print()** function with *string formatting*. This requires the *string formatting operator*, the percent (**%**) sign.

In [None]:
name = 'David'
print('Hello, %s' % name)

Here is another example with two variables:

In [None]:
name = 'David'
age = 23
print('%s is %d years old.' % (name, age))

Notice that the two variables, **name** and **age**, that supply the values are placed at the end of the statement and enclosed in parentheses.

Within the quotation marks, **%s** and **%d** specify formatting for a string and an integer respectively.
The following table shows a selected set of symbols that can be used along with %.

<table width="304" border="1">
  <tr>
    <th width="112" scope="col">Format symbol</th>
    <th width="176" scope="col">Conversion</th>
  </tr>
  <tr>
    <td>%s</td>
    <td>String</td>
  </tr>
  <tr>
    <td>%d</td>
    <td>Signed decimal integer</td>
  </tr>
  <tr>
    <td>%f</td>
    <td>Floating point real number</td>
  </tr>
</table>

There are extra characters that can be used together with the above symbols:

<table width="400" border="1">
  <tr>
    <th width="100" scope="col">Symbol</th>
    <th width="300" scope="col">Functionality</th>
  </tr>
  <tr>
    <td>-</td>
    <td>Left justification</td>
  </tr>
  <tr>
    <td>+</td>
    <td>Display the sign</td>
  </tr>
  <tr>
    <td>m.n</td>
    <td>m is the minimum total width; n is the number of digits to display after the decimal point</td>
  </tr>
</table>

Here are more examples that use the above specifiers:

In [None]:
# With %f, the output is right-justified by default.
# As a result, white spaces are added to the left of the number.
# 10.4 means a minimum width of 10 with 4 digits after the decimal point
print('Output a float number: %10.4f' % (3.5))

In [None]:
# A plus sign after % means the sign is always shown, even for positive numbers
# A zero after the plus sign means leading zeros are used to fill the width of 5
print('Output an integer: %+05d' % (23))
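
The remaining specifiers from the tables can be tried in the same way. The sketch below stores each formatted string in a variable before printing it; the variable names and the value of `weight` are hypothetical, and the `|` character is added only to make the end of each field visible.

```python
name = 'David'
weight = 72.4567

# '-' left-justifies the value within a width of 10
left_aligned = '%-10s|' % name

# Without '-', the value is right-justified in the same width
right_aligned = '%10s|' % name

# '.2' keeps two digits after the decimal point (the value is rounded)
two_decimals = '%.2f' % weight

# '+' shows the sign explicitly
signed = '%+d' % 23

print(left_aligned)
print(right_aligned)
print(two_decimals)
print(signed)
```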