## Data Types in Python

In computing, a *data type* refers to the way in which a value is stored in the computer's memory, and to the types of calculations that can be performed on it.  The following data types can be used in base Python:

* **boolean**
* **integer**
* **float**
* **string**
* **list**
* **None**
* complex
* object
* set
* dictionary

Here we will only focus on the **bolded** data types.  In this notebook, we will be using data types from base Python, as well as some data types from the numpy and pandas libraries.  Therefore we import the numpy library next.

In [1]:
import numpy as np

Let's connect these base Python data types to the *variable types* that we learned about in the [Variable Types video](https://www.coursera.org/learn/understanding-visualization-data/lecture/iDodZ/variable-types).  Recall that a "variable type" in this context refers to the type of information encoded in the variable, and informs the statistical analyses that can be performed on it.  While there is a relationship between data types in computing and variable types in statistics, they are distinct ideas, and there is no one-to-one mapping between them.

###  The quantitative (numerical) variable type

Quantitative variables usually represent an amount that can be measured in the real world.  It is possible to do arithmetic when working with quantitative variables. As a result, statistical summaries like the mean (average value) make sense.  Sometimes, it is useful to distinguish two types of quantitative variables:

* Discrete -- a variable that can only take on a limited range of values, e.g. only positive integers
* Continuous -- a variable that can in theory represent any real number, or a quantitative value measured to arbitrarily high precision

It is often (but not always) the case that discrete data are represented by the computer with integers, and continuous data are represented by the computer with float values.

In base Python a single "literal" number is stored as an integer or as a "float" value based on whether it is expressed with a decimal point.  We can see this in the following examples:

In [2]:
type(4)

int

In [3]:
type(4.)

float

In [4]:
type(0)

int

In [5]:
type(-3)

int
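A decimal point is not the only marker of a float literal: scientific notation also produces a float.  The short sketch below (variable names are illustrative, not from the original notebook) also shows explicit conversion with `int()` and `float()`.

```python
# Scientific notation produces a float, even when the value is a whole number
sci = 1e3
print(type(sci))             # <class 'float'>

# int() truncates toward zero; float() converts an integer to a float
truncated = int(4.7)
as_float = float(4)
print(truncated, as_float)   # 4 4.0
```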

**Floats**

Floating point values are the main numeric data type used to represent quantitative data, or any numeric value that is not a whole number.

In [None]:
3/5

0.6

Recall that `**` represents exponentiation in Python.

In [None]:
6*10**(-1)

0.6000000000000001
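The trailing digit in this result arises because most decimal fractions, such as 0.1, cannot be stored exactly in binary floating point.  As a minimal sketch, the standard library's `math.isclose` can be used to compare floats within a tolerance instead of relying on `==`:

```python
import math

a = 6 * 10**(-1)   # computed value, 0.6000000000000001
b = 3 / 5          # 0.6

print(a == b)              # False: the two results differ slightly
print(math.isclose(a, b))  # True: equal within a small relative tolerance
```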

In [None]:
type(3/5)

float

The floor division operator `//` divides two numbers and rounds the result down to the nearest whole number; when both operands are integers, the result is an integer.

In [None]:
print(3//5)
type(3//5)

0


int
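Note that `//` rounds down (toward negative infinity) rather than toward zero, which matters for negative operands, and that it returns a float if either operand is a float.  A brief illustration:

```python
print(7 // 2)      # 3
print(-7 // 2)     # -4, rounded down rather than toward zero
print(7.0 // 2)    # 3.0, a float because one operand is a float

# divmod returns the floor quotient and the remainder together
quotient, remainder = divmod(7, 2)
print(quotient, remainder)   # 3 1
```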

In [None]:
type(np.pi)

float

In [None]:
type(4.0)

float

In Python 3, the `/` operator always returns a float, even when both operands are integers and the result is a whole number.  For example, below we calculate the mean of a sequence of integers using base Python:

In [None]:
numbers = [2, 3, 4, 5]
print(sum(numbers)/len(numbers))
type(sum(numbers)/len(numbers))

3.5


float

We can do something similar using NumPy.  Note that here we obtain a "numpy float" which is distinct from a base Python float.  For most purposes we can treat these two varieties of floats as being equivalent.

In [None]:
numbers = np.r_[2, 3, 4, 5] # creates a one-dimensional array from the given values
print(numbers.mean())
print(type(numbers.mean()))

3.5
<class 'numpy.float64'>


In [8]:
numbers = np.r_[2, 3.2, 4, 5] # a single float value makes the whole array floating point
print(numbers.mean())
print(type(numbers.mean()))

3.55
<class 'numpy.float64'>
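The relationship between NumPy floats and base Python floats can be checked directly.  The sketch below assumes NumPy is installed and imported as `np`, as earlier in this notebook; the variable names are illustrative.

```python
import numpy as np

ints = np.array([2, 3, 4, 5])
mixed = np.array([2, 3.2, 4, 5])

# A single float in the input makes the whole array floating point
print(ints.dtype.kind)    # i (an integer dtype)
print(mixed.dtype)        # float64

m = ints.mean()
print(isinstance(m, float))   # True: numpy.float64 is a subclass of float
print(type(float(m)))         # <class 'float'> after explicit conversion
```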


### Categorical (or qualitative) variable types

In statistics, there are two main variants of a categorical variable:

* *Nominal* variables have no ordering, e.g. what country was a person born in, or whether a person is of age 65 years or older.

* *Ordinal* variables have an ordering, e.g. how many times has a person been involved in a traffic accident, or how strongly does a person support a policy (e.g. strongly oppose, neutral, strongly support).

In base Python, Integer (int), Boolean (bool), and String (str) data types are often used to represent nominal values.

Ordinal values may be represented by numbers, but it is important to remember that these numbers are codes that do not contain any quantitative information.
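To illustrate, the sketch below codes the support levels from the example above as integers.  The mapping `support_codes` and the sample responses are hypothetical; the codes preserve the ordering, but the spacing between them is arbitrary.

```python
# Hypothetical integer codes for an ordinal variable
support_codes = {"strongly oppose": 1, "neutral": 2, "strongly support": 3}

responses = ["neutral", "strongly support", "strongly oppose", "neutral"]
coded = [support_codes[r] for r in responses]
print(coded)    # [2, 3, 1, 2]

# Order comparisons are meaningful for ordinal codes ...
print(support_codes["neutral"] < support_codes["strongly support"])   # True

# ... and sorting by code recovers the ordering of the labels
print(sorted(set(responses), key=support_codes.get))
# ['strongly oppose', 'neutral', 'strongly support']
```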

**Boolean**

A Boolean variable has two possible values: "True" and "False" (in Python the capitalization of these terms is important).  A "Boolean expression" is an expression involving comparison operators (<, <=, >, >=, ==, !=) that evaluates to a Boolean value.

In [None]:
# Boolean
type(True)

bool

In [None]:
# Print the result of two Boolean expressions
print(6 < 5)
print(5 < 6)

# Print the type of a Boolean expression's result
print(type(6 < 5))

False
True
<class 'bool'>


Boolean expressions are often used in "if blocks" to control program flow:

In [9]:
if 6 < 5:
    print("Yes!")
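Because `6 < 5` is `False`, the block above prints nothing.  A minimal sketch with an `else` branch makes both outcomes visible (the variable `message` is illustrative):

```python
if 6 < 5:
    message = "Yes!"
else:
    message = "No!"   # this branch runs because 6 < 5 is False
print(message)        # No!
```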

Square brackets [...] create a literal list.  The list here contains only values of Boolean type.  See below for further discussion of "None" and "is".

In [12]:
myList = [True, 6<5, 1==3, None is None] # "is" tests whether two names refer to the same object
print(myList)
for element in myList:
    print(type(element))

[True, False, False, True]
<class 'bool'>
<class 'bool'>
<class 'bool'>
<class 'bool'>


Python converts Boolean values to integers when doing arithmetic: False is converted to 0 and True is converted to 1.

In [None]:
print(sum(myList)/len(myList))
type(sum(myList)/len(myList))

0.5


float
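This conversion gives a convenient way to count, or to compute the proportion of, values that satisfy a condition.  A small sketch with made-up values:

```python
values = [2, 7, 4, 9, 1]            # hypothetical data
above_three = [v > 3 for v in values]
print(above_three)                  # [False, True, True, True, False]

count = sum(above_three)            # each True counts as 1
proportion = count / len(values)
print(count, proportion)            # 3 0.6
```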

**Strings**

A string is a single "text" value of arbitrary length.  Technically, text in Python 3 is encoded using a scheme called [Unicode](https://en.wikipedia.org/wiki/Unicode).  Characters from nearly every human language can be a part of a Unicode string.

Single or double quotes are equivalent in Python and can be used to create string literals.

In [None]:
type("This sentence makes sense")

str

In [None]:
type('This sentence makes sense')

str

Note that the back-tick character cannot be used to create a string literal.

In [None]:
# This is not allowed
# x = `invalid`

Triple quotes can be used when you want a literal string to span multiple lines.  Try this with a single pair of quotation marks and you will see that it fails.

In [None]:
print("""This sentence makes
sense""")

This sentence makes 
sense
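An ordinary single-line string can hold the same text if the line break is written as the escape sequence `\n`.  A quick check (variable names are illustrative):

```python
multi = """This sentence makes
sense"""
single = "This sentence makes\nsense"

print(multi == single)       # True: both contain one newline character

# Triple single quotes also work for multi-line literals
also_multi = '''This sentence makes
sense'''
print(also_multi == multi)   # True
```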


A Python expression in quotation marks is a string and is not evaluated as code by the Python interpreter.

In [None]:
type("np.pi")

str
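The same holds for arithmetic written inside quotes.  As a brief sketch, a numeric string has to be converted explicitly with `int()` or `float()` before it can be used in a calculation:

```python
expr = "3 + 4"
print(expr)          # 3 + 4, the text itself rather than 7
print(len(expr))     # 5 characters, including the spaces

number_text = "3.5"
print(float(number_text) * 2)   # 7.0 after explicit conversion
print(number_text * 2)          # 3.53.5, since * repeats a string
```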

It does not make sense to take the average of string values, so an error results.

In [None]:
x = np.asarray(['dog', 'koala', 'goose'])
# This is not allowed:
# x.mean()

**NoneType**

None is a special value that is a placeholder representing "no meaningful value". It is returned by functions that do not explicitly return a value, and is often returned by functions that cannot produce a value for certain inputs.

In [None]:
type(None)

NoneType

None can be compared using "is" or "==", but conventionally "is" is preferred:

In [None]:
None is None
None == None

True
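A function without an explicit `return` statement returns `None`.  The sketch below uses a hypothetical helper, `find_first_negative`, to show the usual `is None` check:

```python
def find_first_negative(values):
    """Return the first negative value, or None if there is none."""
    for v in values:
        if v < 0:
            return v
    # Falling off the end of the function returns None implicitly

result = find_first_negative([3, 1, 4])
if result is None:
    print("No negative value found")

print(find_first_negative([3, -1, 4]))   # -1
```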

None cannot be used in arithmetic:

In [None]:
noneList = [None]*5
# This is not allowed:
# sum(noneList)/len(noneList)
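To see the error without stopping the notebook, the attempt can be wrapped in `try`/`except`.  The sketch below also shows one common workaround, dropping the `None` values before averaging; the `readings` data are made up for illustration.

```python
none_list = [None] * 5
try:
    sum(none_list) / len(none_list)
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# One common workaround: drop the None values before averaging
readings = [2.0, None, 4.0, None, 6.0]    # hypothetical data
valid = [r for r in readings if r is not None]
print(sum(valid) / len(valid))            # 4.0
```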

**Lists**

A list can hold values (possibly of different types) in sequence.

In [None]:
myList = [1, 1.1, "This is a string", None]
for element in myList:
    print(type(element))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'NoneType'>


As we have seen, arithmetic operations can only be used with numeric values:

In [None]:
# This is not allowed:
# sum(myList)/len(myList)

In [None]:
myList = [1, 2, 3]
for element in myList:
    print(type(element))
sum(myList)/len(myList) # note that this outputs a float

<class 'int'>
<class 'int'>
<class 'int'>


2.0

Elements of lists and vectors can be accessed by position, noting that Python always counts from zero:

In [13]:
myList = ['third', 'first', 'medium', 'small', 'large', 'medium']
myList[0]

'third'
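Beyond position zero, negative positions count from the end of the list, and a slice `start:stop` selects a range that excludes the `stop` position.  A short sketch using the same values (the name `sizes` is illustrative):

```python
sizes = ['third', 'first', 'medium', 'small', 'large', 'medium']

print(sizes[1])      # first   (the second element)
print(sizes[-1])     # medium  (the last element)
print(sizes[1:3])    # ['first', 'medium']  (positions 1 and 2; 3 is excluded)
print(len(sizes))    # 6
```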

You can invoke certain "methods" on a list, which may compute a result using the list, or change the contents of the list.

In [14]:
myList.count('medium')

2

In [None]:
myList.sort()
myList

['first', 'large', 'medium', 'medium', 'small', 'third']
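The `sort` method changes the list in place and returns `None`.  When the original order should be kept, the built-in `sorted` function returns a new list instead.  A minimal comparison (variable names are illustrative):

```python
original = ['third', 'first', 'medium', 'small', 'large', 'medium']

ordered = sorted(original)       # returns a new sorted list
print(ordered)
# ['first', 'large', 'medium', 'medium', 'small', 'third']
print(original[0])               # third: the original list is unchanged

returned = original.sort()       # sorts in place ...
print(returned)                  # None: ... and returns None
print(original == ordered)       # True
```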

There are more data types available when using libraries such as pandas and NumPy, which we will introduce to you as we use them.