# Data Types

<div align="center"> <img src="https://raw.githubusercontent.com/eitanlees/ISC-3313/master/images/plant_diagram.gif" width="700"/></div>

When discussing Python variables and objects, we mentioned the fact that all Python objects have type information attached. 

Here we'll briefly walk through the built-in simple types offered by Python.

We say "simple types" to contrast with several compound types, which will be discussed in the following section.

**Python Scalar Types**

- ``int``     : integers (i.e., whole numbers) such as ``x = 1``                            
- ``float``   : floating-point numbers (i.e., real numbers) such as ``x = 1.0`` 
- ``complex`` : Complex numbers (i.e., numbers with real and imaginary parts) such as ``x = 1 + 2j``
- ``str``     : String: characters or text such as ``x = 'abc'``
- ``NoneType``: Special object indicating nulls such as  ``x = None``
- ``bool``    : Boolean: True/False values such as ``x = True``

We'll take a quick look at each of these in turn.

## Integers
The most basic numerical type is the integer.
Any number without a decimal point is an integer:

In [1]:
x = 1
type(x)

int

Python integers are actually quite a bit more sophisticated than integers in languages like ``C``.

C integers are fixed-precision, and usually overflow at some value (often near $2^{31}$ or $2^{63}$, depending on your system).

Python integers are variable-precision, so you can do computations that would overflow in other languages:

In [2]:
x = 2 ** 200
print(x+1)

1606938044258990275541962092341162602522202993782792835301377


Another convenient feature of Python integers is that by default, division up-casts to floating-point type:

In [3]:
type(5 / 2)

float

Note that this upcasting is a feature of Python 3; in Python 2, as in many statically-typed languages such as C, integer division discards any fractional part and always returns an integer:
``` python
# Python 2 behavior
>>> 5 / 2
2
```
To recover this behavior in Python 3, you can use the floor-division operator:

In [4]:
5 // 2

2
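
One detail worth knowing is that ``//`` rounds toward negative infinity rather than toward zero, so results with negative operands differ from C-style truncation. A small sketch (the variable names are illustrative only):

```python
# Floor division rounds down (toward negative infinity), not toward zero
positive = 7 // 2     # 3
negative = -7 // 2    # -4, not -3

# divmod returns the floor quotient and the remainder together
quotient, remainder = divmod(7, 2)

print(positive, negative, quotient, remainder)
```

The built-in ``divmod`` is handy when both the quotient and the remainder are needed at once.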

## Floating-Point Numbers
The floating-point type can store fractional numbers.
They can be defined either in standard decimal notation, or in exponential notation:

In [5]:
x = 0.000005
y = 5e-6
print(x == y)

True


In [6]:
x = 1400000.00
y = 1.4e6
print(x == y)

True


In the exponential notation, the ``e`` or ``E`` can be read "...times ten to the...",
so that ``1.4e6`` is interpreted as $~1.4 \times 10^6$.

An integer can be explicitly converted to a float with the ``float`` constructor:

In [7]:
float(1)

1.0

### Aside: Floating-point precision
One thing to be aware of with floating point arithmetic is that its precision is limited, which can cause equality tests to be unstable. 

For example:

In [8]:
0.1 + 0.2 == 0.3

False

It turns out that this behavior is not unique to Python, but is due to the fixed-precision format of the binary floating-point storage used by most, if not all, scientific computing platforms.

All programming languages using floating-point numbers store them in a fixed number of bits, and this leads some numbers to be represented only approximately.

We can see this by printing the three values to high precision:

In [9]:
print("0.1 = {0:.17f}".format(0.1))
print("0.2 = {0:.17f}".format(0.2))
print("0.3 = {0:.17f}".format(0.3))

0.1 = 0.10000000000000001
0.2 = 0.20000000000000001
0.3 = 0.29999999999999999


We're accustomed to thinking of numbers in decimal (base-10) notation, so that each fraction must be expressed as a sum of powers of 10:
$$
1/8 = 1\cdot 10^{-1} + 2\cdot 10^{-2} + 5\cdot 10^{-3}
$$

In base 10, we write this as the familiar decimal expression $0.125$.

Computers usually store values in binary notation, so that each number is expressed as a sum of powers of 2:
$$
1/8 = 0\cdot 2^{-1} + 0\cdot 2^{-2} + 1\cdot 2^{-3}
$$
In a base-2 representation, we can write this $0.001_2$, where the subscript 2 indicates binary notation.

The value $0.125 = 0.001_2$ happens to be one number which both binary and decimal notation can represent in a finite number of digits.

In the familiar base-10 representation of numbers, you are probably familiar with numbers that can't be expressed in a finite number of digits.

For example, dividing $1$ by $3$ gives, in standard decimal notation:
$$
1 / 3 = 0.333333333\cdots
$$

The 3s go on forever: that is, to truly represent this quotient, the number of required digits is infinite!

Similarly, there are numbers for which binary representations require an infinite number of digits.
For example:
$$
1 / 10 = 0.00011001100110011\cdots_2
$$

Just as decimal notation requires an infinite number of digits to perfectly represent $1/3$, binary notation requires an infinite number of digits to represent $1/10$.

Python internally rounds these representations to 52 bits beyond the first nonzero bit on most systems.
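
One way to see the value that is actually stored is to convert the float to a ``Decimal``, which performs the conversion exactly. This is only a sketch using the standard-library ``decimal`` module; the variable names are illustrative.

```python
from decimal import Decimal

# Converting a float to Decimal exposes the exact binary value that is stored
stored = Decimal(0.1)
intended = Decimal("0.1")   # built from a string, so exactly one tenth

print(stored)
print(stored == intended)

# The stored float is a ratio of integers with a power-of-two denominator
numerator, denominator = (0.1).as_integer_ratio()
print(denominator == 2 ** 55)
```

The stored value is very close to, but not exactly, one tenth.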

This rounding error for floating-point values is a necessary evil of working with floating-point numbers.

The best way to deal with it is to always keep in mind that floating-point arithmetic is approximate.

**Never** rely on exact equality tests with floating-point values.
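
A common alternative to exact equality is a tolerance-based comparison. The sketch below uses ``math.isclose`` from the standard library; the tolerances chosen here are illustrative and should be adapted to the problem at hand.

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because of rounding error
exact_match = (total == 0.3)

# math.isclose compares within a relative tolerance (default rel_tol=1e-09)
close_match = math.isclose(total, 0.3)

# When comparing against zero, an absolute tolerance is needed
near_zero = math.isclose(1e-12, 0.0, abs_tol=1e-9)

print(exact_match, close_match, near_zero)
```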

## Complex Numbers
Complex numbers are numbers with real and imaginary (floating-point) parts.
We've seen integers and real numbers before; we can use these to construct a complex number:

In [10]:
complex(1, 2)

(1+2j)

Alternatively, we can use the "``j``" suffix in expressions to indicate the imaginary part:

In [11]:
1 + 2j

(1+2j)

Complex numbers have a variety of interesting attributes and methods, which we'll briefly demonstrate here:

In [12]:
c = 3 + 4j

In [13]:
c.real  # real part

3.0

In [14]:
c.imag  # imaginary part

4.0

In [15]:
c.conjugate()  # complex conjugate

(3-4j)

In [16]:
abs(c)  # magnitude, i.e. sqrt(c.real ** 2 + c.imag ** 2)

5.0
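
For functions of complex numbers, the standard library provides the ``cmath`` module. As a brief sketch, assuming only that module:

```python
import cmath

c = 3 + 4j

# Polar form: magnitude and phase angle (in radians)
magnitude, phase = cmath.polar(c)

# cmath.sqrt accepts negative inputs and returns a complex result
root = cmath.sqrt(-1)

print(magnitude, phase, root)
```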

## String Type
Strings in Python are created with single or double quotes:

In [17]:
message = "what do you like?"
response = 'spam'

Python has many extremely useful string functions and methods.

Here are a few of them:

In [18]:
# length of string
len(response)  

4

In [19]:
# Make upper-case. See also str.lower()
response.upper()

'SPAM'

In [20]:
# Capitalize. See also str.title()
message.capitalize()

'What do you like?'

## Exercise

Write a Boolean expression that is True if two strings are identical in a case-insensitive manner.

For example, if the two strings are "Hello" and "HeLLO", your code should evaluate to `True`.

In [21]:
string_1 = "Hello"
string_2 = "HeLLO"

# Your code here #
#----------------#
test = False
#----------------#

print(f'The strings {string_1} and {string_2} are similar: {test}')

The strings Hello and HeLLO are similar: False


## Exercise

Use the `.count()` method to count the number of times the letter `o` and the word `and` appear in the following quote.

In [22]:
famous_quote = "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal."

# Change the code here #
#----------------------#
number_of_o = 0
number_of_and = 0
#----------------------#


print(f'The letter o appeared {number_of_o} times')
print(f'The word and appeared {number_of_and} times')

The letter o appeared 0 times
The word and appeared 0 times


More string operations:

In [23]:
# concatenation with +
message + response

'what do you like?spam'

In [24]:
# multiplication is multiple concatenation
5 * response

'spamspamspamspamspam'

In [25]:
# Access individual characters (zero-based indexing)
message[0]

'w'

We will discuss indexing soon.

## Aside: f-strings

Often after a calculation you want to print the results. 

Python 3.6 introduced a special kind of string called an _f-string_ (short for "formatted string"), which allows variable placeholders to be placed right inside the string.

Consider an example where we want to print a person's name and age as follows:
    
    My name is John and I am 20 years old

In [26]:
name = "John"
age = 20

greeting = 'My name is ' + name + ' and I am ' + str(age) + ' years old'

print(greeting)

My name is John and I am 20 years old


In [27]:
name = 'John'
age = 20

greeting = f'My name is {name} and I am {age} years old'

print(greeting)

My name is John and I am 20 years old
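
The curly brackets are not limited to plain variable names; they can hold arbitrary expressions. A short sketch, reusing the illustrative ``name`` and ``age`` values from above:

```python
name = 'John'
age = 20

# Arithmetic and method calls are evaluated inside the braces
next_year = f'{name} will be {age + 1} next year'
shout = f'{name.upper()}!'

print(next_year)
print(shout)
```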


## More Formatting

In addition to including variables directly in strings we can also pass formatting information directly in the curly brackets. 

The syntax is 

```python
f'This is a formatted number {<variable>:.<precision><type>}'
```

In [28]:
x = 1618.03398875

f'My number is {x:.3f}'

'My number is 1618.034'

## Common Format Types

- `b` Binary format. Outputs the number in base 2.
- `f` Fixed point. Displays the number as a fixed-point number.
- `e` Exponent notation. Prints the number in scientific notation using the letter ‘e’ to indicate the exponent.
- `%` Percentage. Multiplies the number by 100 and displays in fixed ('f') format, followed by a percent sign.
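
As a quick sketch of these format types in use (the values below are arbitrary and chosen only for illustration):

```python
value = 1234.5678
count = 10
fraction = 0.256

binary = f'{count:b}'          # '1010'
fixed = f'{value:.2f}'         # '1234.57'
exponent = f'{value:.3e}'      # '1.235e+03'
percent = f'{fraction:.1%}'    # '25.6%'

print(binary, fixed, exponent, percent)
```

Note that the precision is optional; the binary format, for instance, is applied to integers and takes no precision.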

## Exercise

- Print pi to the 5th decimal place
- Print 1/42 in exponential notation
- Print 13 in binary format
- Print 2/5 as a percentage with no decimal place

In [29]:
from math import pi

# Your Code Here #
#----------------#
print(f'')
print(f'')
print(f'')
print(f'')
#----------------#







## None Type
Python includes a special type, the ``NoneType``, which has only a single possible value: ``None``. 

For example:

In [30]:
type(None)

NoneType

You'll see ``None`` used in many places, but perhaps most commonly it is used as the default return value of a function.

For example, the ``print()`` function in Python 3 does not return anything, but we can still catch its value:

In [31]:
return_value = print('abc')

abc


In [32]:
print(return_value)

None


Likewise, any function in Python with no return value is, in reality, returning ``None``.
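
To illustrate, here is a minimal sketch with a hypothetical function ``greet`` that has no ``return`` statement. The comparison uses ``is None``, which is the conventional way to test for ``None``.

```python
def greet(name):
    # This hypothetical function prints but has no return statement
    print(f'Hello, {name}')

result = greet('World')

# The conventional test for None uses `is`, not `==`
print(result is None)
```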

## Boolean Type
The Boolean type is a simple type with two possible values: ``True`` and ``False``, and is returned by comparison operators discussed previously:

In [33]:
result = (4 < 5)
result

True

In [34]:
type(result)

bool

Keep in mind that the Boolean values are case-sensitive: unlike some other languages, ``True`` and ``False`` must be capitalized!

In [35]:
print(True, False)

True False


Booleans can also be constructed using the ``bool()`` object constructor: values of any other type can be converted to Boolean via predictable rules.

For example, any numeric type is False if equal to zero, and True otherwise:

In [36]:
bool(2014)

True

In [37]:
bool(0)

False

In [38]:
bool(3.1415)

True

The Boolean conversion of ``None`` is always False:

In [39]:
bool(None)

False

For strings, ``bool(s)`` is False for empty strings and True otherwise:

In [40]:
bool("")

False

In [41]:
bool("abc")

True

For sequences, which we'll see in the next section, the Boolean representation is False for empty sequences and True for any other sequences:

In [42]:
bool([1, 2, 3])

True

In [43]:
bool([])

False
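
These rules look only at whether a value is zero or empty, not at what it appears to say. A few cases that sometimes surprise newcomers, shown as a sketch with illustrative variable names:

```python
# Any non-empty string is True, whatever its text
text_false = bool("False")
text_zero = bool("0")

# A list holding a single falsy element is still non-empty
list_with_zero = bool([0])

# Zero of any numeric type, and any empty container, is False
float_zero = bool(0.0)
empty_tuple = bool(())

print(text_false, text_zero, list_with_zero, float_zero, empty_tuple)
```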

## Review 

**Python Scalar Types**
- ``int``     : integers (i.e., whole numbers) such as ``x = 1``                            
- ``float``   : floating-point numbers (i.e., real numbers) such as ``x = 1.0`` 
- ``complex`` : Complex numbers (i.e., numbers with real and imaginary parts) such as ``x = 1 + 2j``
- ``str``     : String: characters or text such as ``x = 'abc'``
- ``NoneType``: Special object indicating nulls such as  ``x = None``
- ``bool``    : Boolean: True/False values such as ``x = True``

# Built-In Data Structures

Python also has several built-in compound types, which act as containers for other types.

- ``list`` for example ``[1, 2, 3]`` which represents an 'Ordered collection'
- ``tuple`` for example ``(1, 2, 3)`` which represents an 'Immutable ordered collection'
- ``dict`` for example ``{'a':1, 'b':2, 'c':3}`` which represents a '(key, value) mapping'


## Lists
Lists are the basic *ordered* and *mutable* data collection type in Python.

They can be defined with comma-separated values between square brackets; for example, here is a list of the first several prime numbers:

In [44]:
L = [2, 3, 5, 7]

Lists have a number of useful properties and methods available to them.
Here we'll take a quick look at some of the more common and useful ones:

In [45]:
# Length of a list
len(L) 

4

In [46]:
# Append a value to the end
L.append(11)
L

[2, 3, 5, 7, 11]

In [47]:
# Addition concatenates lists
L + [13, 17, 19]

[2, 3, 5, 7, 11, 13, 17, 19]

In [48]:
# sort() method sorts in-place
L = [2, 5, 1, 6, 3, 4]
L.sort()
L

[1, 2, 3, 4, 5, 6]

In addition, there are many more built-in list methods; they are well-covered in Python's [online documentation](https://docs.python.org/3/tutorial/datastructures.html).
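
One point that often causes confusion is the difference between the in-place ``sort()`` method and the built-in ``sorted()`` function. A brief sketch (the list contents are arbitrary):

```python
values = [7, 2, 5, 3]

# sorted() returns a new list and leaves the original untouched
ordered = sorted(values)
original_after_sorted = list(values)

# sort() reorders the list in place and returns None
result = values.sort()

print(ordered, original_after_sorted, values, result)
```

Because ``sort()`` returns ``None``, writing ``values = values.sort()`` would discard the list.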

While we've been demonstrating lists containing values of a single type, one of the powerful features of Python's compound objects is that they can contain objects of *any* type, or even a mix of types. For example:

In [49]:
L = [1, 'two', 3.14, [0, 3, 5]]

This flexibility is a consequence of Python's dynamic type system.

Creating such a mixed sequence in a statically-typed language like C can be much more of a headache!

We see that lists can even contain other lists as elements.

So far we've been considering manipulations of lists as a whole; another essential piece is the accessing of individual elements.

This is done in Python via *indexing* and *slicing*, which we'll explore next.

### List indexing and slicing
Python provides access to elements in compound types through *indexing* for single elements, and *slicing* for multiple elements.

As we'll see, both are indicated by a square-bracket syntax.
Suppose we return to our list of the first several primes:

In [50]:
L = [2, 3, 5, 7, 11]

Python uses *zero-based* indexing, so we can access the first and second elements using the following syntax:

In [51]:
L[0]

2

In [52]:
L[1]

3

Elements at the end of the list can be accessed with negative numbers, starting from -1:

In [53]:
L[-1]

11

In [54]:
L[-2]

7

You can visualize this indexing scheme this way:

<div align="center"><img src="https://raw.githubusercontent.com/eitanlees/ISC-3313/master/images/list-indexing.png" width="900"/></div>

In this case, ``L[2]`` returns ``5``, because that is the value at index ``2``.

Where *indexing* is a means of fetching a single value from the list, *slicing* is a means of accessing multiple values in sub-lists.

It uses a colon to indicate the start point (inclusive) and end point (non-inclusive) of the sub-list.

For example, to get the first three elements of the list, we can write:

In [55]:
L[0:3]

[2, 3, 5]

Notice where ``0`` and ``3`` lie in the preceding diagram, and how the slice takes just the values between the indices.
If we leave out the first index, ``0`` is assumed, so we can equivalently write:

In [56]:
L[:3]

[2, 3, 5]

Similarly, if we leave out the last index, it defaults to the length of the list.
Thus, the last three elements can be accessed as follows:

In [57]:
L[-3:]

[5, 7, 11]

Finally, it is possible to specify a third integer that represents the step size; for example, to select every second element of the list, we can write:

In [58]:
L[::2]  # equivalent to L[0:len(L):2]

[2, 5, 11]

A particularly useful version of this is to specify a negative step, which will reverse the list:

In [59]:
L[::-1]

[11, 7, 5, 3, 2]

Both indexing and slicing can be used to set elements as well as access them.

The syntax is as you would expect:

In [60]:
L[0] = 100
print(L)

[100, 3, 5, 7, 11]


In [61]:
L[1:3] = [55, 56]
print(L)

[100, 55, 56, 7, 11]


A very similar slicing syntax is also used in many data science-oriented packages, including NumPy and Pandas (which we will cover later).
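
Two further properties of list slices are worth keeping in mind: a slice produces a new list, and slice bounds that run past the end are clipped instead of raising an error. A minimal sketch, redefining ``L`` for clarity:

```python
L = [2, 3, 5, 7, 11]

# A full slice makes a new list, so changing the copy leaves L alone
copy_of_L = L[:]
copy_of_L[0] = 100

# Slice bounds past the end are clipped rather than raising an error
clipped = L[3:100]

# Indexing a single element past the end, by contrast, raises IndexError
try:
    L[100]
    out_of_range = False
except IndexError:
    out_of_range = True

print(L, copy_of_L, clipped, out_of_range)
```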

## Exercise

Suppose you have a list of the digits 1 through 9.

How could you slice the list so as to:
- include [4, 5, 6]?
- include [5, 4, 3]?
- include [3, 6, 9]?

In [62]:
numbers = [1,2,3,4,5,6,7,8,9]

# Your Code Here #
#----------------#
example_1 = numbers[:]
example_2 = numbers[::]
example_3 = numbers[::]
#----------------#

print('Example 1: ', example_1)
print('Example 2: ', example_2)
print('Example 3: ', example_3)

Example 1:  [1, 2, 3, 4, 5, 6, 7, 8, 9]
Example 2:  [1, 2, 3, 4, 5, 6, 7, 8, 9]
Example 3:  [1, 2, 3, 4, 5, 6, 7, 8, 9]


## Tuples
Tuples are in many ways similar to lists, but they are defined with parentheses rather than square brackets:

In [63]:
t = (1, 2, 3)

They can also be defined without any brackets at all:

In [64]:
t = 1, 2, 3
print(t)

(1, 2, 3)


Like the lists discussed before, tuples have a length, and individual elements can be extracted using square-bracket indexing:

In [65]:
len(t)

3

In [66]:
t[0]

1

The main distinguishing feature of tuples is that they are *immutable*: this means that once they are created, their size and contents cannot be changed:

In [67]:
# Uncomment for error
# t[1] = 100

In [68]:
# Uncomment for error
# t.append(4)
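
To see these errors without stopping the notebook, the attempts can be wrapped in ``try``/``except``. This is only a sketch; the variable names are illustrative.

```python
t = (1, 2, 3)

# Item assignment on a tuple raises TypeError
try:
    t[1] = 100
except TypeError as err:
    assignment_error = str(err)

# Tuples have no append method at all, so the call raises AttributeError
try:
    t.append(4)
except AttributeError:
    has_append = False

print(assignment_error)
print(has_append, t)
```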

Tuples are often used in a Python program; a particularly common case is in functions that have multiple return values.

For example, the ``as_integer_ratio()`` method of floating-point objects returns a numerator and a denominator; this dual return value comes in the form of a tuple:

In [69]:
x = 0.125
x.as_integer_ratio()

(1, 8)

These multiple return values can be individually assigned as follows:

In [70]:
numerator, denominator = x.as_integer_ratio()
print(numerator / denominator)

0.125


The indexing and slicing logic covered earlier for lists works for tuples as well, along with a host of other methods.
Refer to the online [Python documentation](https://docs.python.org/3/tutorial/datastructures.html) for a more complete list of these.

## Dictionaries
Dictionaries are extremely flexible mappings of keys to values, and form the basis of much of Python's internal implementation.

They can be created via a comma-separated list of ``key:value`` pairs within curly braces:

In [71]:
numbers = {'one':1, 'two':2, 'three':3}
type(numbers)

dict

Items are accessed and set via the indexing syntax used for lists and tuples, except here the index is not a zero-based position but a valid key in the dictionary:

In [72]:
# Access a value via the key
numbers['two']

2

New items can be added to the dictionary using indexing as well:

In [73]:
# Set a new key:value pair
numbers['ninety'] = 90
print(numbers)

{'one': 1, 'two': 2, 'three': 3, 'ninety': 90}


Keep in mind that dictionary items are looked up by key, not by position. As of Python 3.7, dictionaries preserve the order in which keys were inserted (as the output above shows), but earlier versions made no ordering guarantee, and no version supports positional indexing.

Lookup by key allows dictionaries to be implemented very efficiently, so that random element access is very fast, regardless of the size of the dictionary (if you're curious how this works, read about the concept of a *hash table*).

The [Python documentation](https://docs.python.org/3/library/stdtypes.html) has a complete list of the methods available for dictionaries.
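
A few dictionary operations come up often enough to be worth a quick sketch: membership tests, safe lookup with ``get``, and listing the keys. The dictionary below is the same illustrative one used earlier.

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# Membership tests look at keys, not values
has_two = 'two' in numbers

# get() returns None (or a supplied default) instead of raising KeyError
missing = numbers.get('four')
fallback = numbers.get('four', 0)

# Converting to a list yields the keys in insertion order (Python 3.7+)
keys = list(numbers)

print(has_two, missing, fallback, keys)
```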

## Exercise

Create a dictionary with the following keys:
- "Florida"
- "Georgia"
- "Texas"
- "New York"

Assign to each key a tuple of at least two cities in the corresponding state.

In [74]:
state_cities = dict()
## Your Code Here ##
#------------------#

#------------------#
print(state_cities)

{}


## More Specialized Data Structures

Python contains several other data structures that you might find useful; these can generally be found in the built-in ``collections`` module.

The collections module is fully-documented in [Python's online documentation](https://docs.python.org/3/library/collections.html), and you can read more about the various objects available there.
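
As a small taste, the sketch below uses three objects from ``collections``: ``Counter``, ``defaultdict``, and ``namedtuple``. The names ``Point``, ``groups``, and ``letter_counts`` are illustrative only.

```python
from collections import Counter, defaultdict, namedtuple

# Counter tallies hashable items, here the letters of a string
letter_counts = Counter('banana')
top_letter = letter_counts.most_common(1)   # [('a', 3)]

# defaultdict creates a default value the first time a key is accessed
groups = defaultdict(list)
groups['primes'].append(2)

# namedtuple builds a tuple type whose fields can be accessed by name
Point = namedtuple('Point', ['x', 'y'])
p = Point(x=1, y=2)

print(top_letter, dict(groups), p.x, p.y)
```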

## Review

- Lists
    - Most importantly "Slicing"
- Tuples
- Dictionaries