# Data Types in Python
---

In computing, a _data type_ refers to the way in which a value is stored in the computer's memory and the type of calculations that can be performed on it. The following data types can be used in Python:

* **boolean**
* **integer**
* **float**
* **string**
* **list**
* **None**
* complex
* object
* set
* dictionary

This notebook focuses on the bolded data types, using those built into Python as well as some data types from the **numpy** and **pandas** libraries.
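
As a quick check of the list above, `type()` can be applied to one example value of each data type. The example values are arbitrary choices made for illustration.

```python
# One arbitrary example value for each data type listed above
examples = {
    "boolean": True,
    "integer": 42,
    "float": 3.14,
    "string": "text",
    "list": [1, 2, 3],
    "None": None,
    "complex": 2 + 3j,
    "object": object(),
    "set": {1, 2, 3},
    "dictionary": {"key": "value"},
}

# Print the name of the Python type of each example value
for name, value in examples.items():
    print(f"{name}: {type(value).__name__}")
```

The type names printed are `bool`, `int`, `float`, `str`, `list`, `NoneType`, `complex`, `object`, `set` and `dict`.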

In [None]:
import numpy as np

A **variable type** refers to the kind of information encoded in a variable, and it guides which statistical analyses are appropriate for it.

## The Quantitative Variable Type
---

Quantitative variables usually represent an amount that can be measured or counted in the real world. Arithmetic on quantitative variables is meaningful, so statistical summaries such as the **mean** make sense. It is useful to distinguish two types of quantitative variables:

* Discrete - a variable that can take only separate, countable values, such as the number of children in a family

* Continuous - a variable that can, in theory, take any real value within a range, such as height or temperature
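
As a minimal sketch with made-up data, a discrete variable such as a count is naturally stored as integers, while a continuous measurement such as height is stored as floats. The mean makes sense for both, even though the mean of a count need not be a whole number.

```python
# Made-up data for illustration only
siblings = [0, 2, 1, 3, 1]                    # discrete: counts, stored as ints
heights_m = [1.62, 1.75, 1.80, 1.68, 1.71]    # continuous: heights in metres, stored as floats

mean_siblings = sum(siblings) / len(siblings)
mean_height = sum(heights_m) / len(heights_m)

print(type(siblings[0]), mean_siblings)       # <class 'int'> 1.4
print(type(heights_m[0]), round(mean_height, 3))
```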

In Python, a single number can be stored as an integer or as a float, depending on how the number is expressed.

In [None]:
print(type(4))
print(type(4.))
print(type(0))
print(type(-3))

<class 'int'>
<class 'float'>
<class 'int'>
<class 'int'>


### Floats

Floating-point values are the main numeric data type for representing continuous quantitative data, and more generally any numeric value that is not a whole number.

In [None]:
print(3/5)
print(2 * 10 ** (-1))
print(type(3/5))

0.6
0.2
<class 'float'>
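
Floats are stored in binary with finite precision, so many decimal fractions can only be approximated. A small sketch of the consequence, using the standard library's `math.isclose` to compare floats with a tolerance:

```python
import math

total = 0.1 + 0.2
print(total)                     # 0.30000000000000004
print(total == 0.3)              # False: both sides are binary approximations
print(math.isclose(total, 0.3))  # True: equal within a relative tolerance
```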


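The floor division operator // divides two numbers and rounds the result down to the nearest whole number. When both operands are integers the result is an integer; for positive operands this is the same as dropping the remainder.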

In [None]:
print(8//5)
print(type(8//5))

print(type(np.pi))
print(type(5.1))

1
<class 'int'>
<class 'float'>
<class 'float'>
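
`//` rounds the quotient down (towards negative infinity), which differs from simply dropping the remainder only when an operand is negative. A short sketch, together with the `%` (remainder) operator and `divmod`, which returns both parts at once:

```python
print(8 // 5)        # 1
print(-8 // 5)       # -2: -1.6 is rounded down, not truncated to -1
print(8 % 5)         # 3: the remainder discarded by 8 // 5
print(-8 % 5)        # 2: chosen so that (-8 // 5) * 5 + (-8 % 5) == -8
print(divmod(8, 5))  # (1, 3)
print(8.0 // 5)      # 1.0: the result is a float if either operand is a float
```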


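In Python 3, the division operator / always returns a float, even when both operands are integers and the result is a whole number (for example, 4 / 2 evaluates to 2.0). Combining an int with a float in arithmetic also produces a float.

For example, calculating the mean of a list of integers yields a float: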

In [None]:
numbers = [10, 2, -5, 7]
print("Mean: {}".format(sum(numbers) / len(numbers)))
print(type(sum(numbers) / len(numbers)))

Mean: 3.5
<class 'float'>
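
Going the other way, `int()` converts a float to an integer by truncating towards zero, whereas `round()` returns the nearest integer (exact halves go to the even neighbour). A short sketch reusing the numbers above:

```python
mean = sum([10, 2, -5, 7]) / 4    # 3.5, a float
print(int(mean))                  # 3: int() truncates towards zero
print(int(-mean))                 # -3: truncation, not rounding down
print(round(mean))                # 4: nearest integer
print(round(2.5))                 # 2: exact halves round to the even integer
print(mean.is_integer())          # False
print((4 / 2).is_integer())       # True: 2.0 has no fractional part
```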


The same calculation can be done with NumPy. Note that the result here has type 'numpy.float64', which is distinct from Python's built-in 'float'; however, for most purposes the two behave the same.

In [None]:
numbers = np.array([10, 2, -5, 7])
print("Mean: {}".format(numbers.mean()))
print(type(numbers.mean()))

Mean: 3.5
<class 'numpy.float64'>
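
One reason the two kinds of float are interchangeable is that `numpy.float64` is a subclass of Python's built-in `float`. When a plain Python float is needed, for example by code that checks `type(x) is float`, the `.item()` method converts it:

```python
import numpy as np

numbers = np.array([10, 2, -5, 7])
mean = numbers.mean()

print(type(mean))               # <class 'numpy.float64'>
print(isinstance(mean, float))  # True: numpy.float64 subclasses float
print(type(mean.item()))        # <class 'float'>
print(mean + 1.5)               # 5.0: mixes freely with built-in floats
```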


## Categorical Variable Types
---

In statistics, there are two main kinds of categorical variable:

* Nominal - categories with no natural ordering, such as blood type
* Ordinal - categories with a natural ordering, such as small, medium and large

In Python, integer (int), boolean (bool), and string (str) data types are often used to represent nominal values.

Ordinal values may be represented by numbers, but it is important to remember that these numbers are codes: they record the order of the categories, but differences or ratios between them carry no quantitative meaning.
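
A minimal sketch of ordinal coding with hypothetical size categories and responses: comparisons between codes and the median category are meaningful, while arithmetic such as averaging the codes is not.

```python
import statistics

# Hypothetical ordinal coding; the numbers only record the order of the categories
size_codes = {"small": 1, "medium": 2, "large": 3}
code_to_size = {code: size for size, code in size_codes.items()}

responses = ["large", "small", "medium", "large", "medium"]   # made-up data
codes = [size_codes[r] for r in responses]
print(codes)                                        # [3, 1, 2, 3, 2]

# Comparing codes compares the underlying order
print(size_codes["small"] < size_codes["large"])    # True

# median_low always returns one of the observed codes, so it maps back to a category
median_code = statistics.median_low(codes)
print(code_to_size[median_code])                    # medium
```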

### Boolean

A Boolean variable has two possible values: **True** and **False**. A Boolean expression is an expression that evaluates to a Boolean value, such as a comparison using one of the operators <, <=, >, >=, == or !=.


In [None]:
# Boolean
print(type(True))

# Print the result of two Boolean expressions
print(6 < 5)
print(5 < 6)

<class 'bool'>
False
True
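
Comparisons can be combined with the logical operators `and`, `or` and `not`, and Python also allows chained comparisons such as `0 < x < 10`. A few examples with an arbitrary value of `x`:

```python
x = 5  # arbitrary example value

in_range = 0 < x < 10         # chained: same as (0 < x) and (x < 10)
print(in_range)               # True
print(x != 5)                 # False: != tests for inequality
print(not (x == 5))           # False
print(x < 0 or x > 3)         # True: only one side needs to be True
```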


Boolean expressions are often used in 'if' statements to control program flow.

In [None]:
if 6 < 5:
  print("6 is less than 5.")
else:
  print("6 is not less than 5.")

6 is not less than 5.
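
When there are more than two cases, `elif` adds further conditions, which are checked in order until one is True. A sketch using a hypothetical helper function `sign_label`:

```python
def sign_label(n):
    """Hypothetical helper: describe the sign of a number."""
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

# Prints: -3 negative, 0 zero, 6 positive (one per line)
for value in [-3, 0, 6]:
    print(value, sign_label(value))
```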


Square brackets [...] create a list. The list here contains only values of Boolean type.

In [None]:
myList = [True, 6<5, 1==3, None is None]
print(myList)

for element in myList:
  print(type(element))

[True, False, False, True]
<class 'bool'>
<class 'bool'>
<class 'bool'>
<class 'bool'>


Python treats Boolean values as integers in arithmetic, because bool is a subclass of int: False counts as 0 and True counts as 1.

In [None]:
print(sum(myList)/len(myList))
print(type(sum(myList) / len(myList)))

0.5
<class 'float'>
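
Summing a list of Booleans therefore counts its True values, and their mean is the proportion of True values. A sketch with made-up exam scores and a hypothetical pass mark of 60:

```python
scores = [55, 72, 90, 48, 66]        # made-up data
passed = [s >= 60 for s in scores]   # one Boolean per score (hypothetical pass mark 60)
print(passed)                        # [False, True, True, False, True]

n_passed = sum(passed)               # True counts as 1, False as 0
print(n_passed)                      # 3
print(n_passed / len(passed))        # 0.6: proportion of scores that passed
```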


### Strings

A string is a piece of text of arbitrary length. In Python 3, a string is a sequence of Unicode code points, so it can hold characters from almost any writing system.

Single or double quotes are equivalent in Python and can be used to create string literals.

In [None]:
print(type('This sentence makes sense.'))
print(type("This sentence makes sense."))

<class 'str'>
<class 'str'>
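
Having both quote styles is convenient when the text itself contains a quote character. Because Python 3 strings are Unicode, `len` counts characters rather than bytes; encoding to UTF-8 (one common byte encoding) can take more bytes than there are characters:

```python
print("It's easy")            # double quotes allow ' inside the string
print('She said "hi"')        # single quotes allow " inside the string

word = "café"
print(len(word))                   # 4 characters
print(len(word.encode("utf-8")))   # 5 bytes: 'é' needs two bytes in UTF-8
print(word.upper())                # CAFÉ
```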


Triple quotes can be used when you want a literal string to span multiple lines.

In [None]:
print("""This sentence makes
sense. But, does it actually?
I guess so!""")

This sentence makes
sense. But, does it actually?
I guess so!
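
A triple-quoted string simply contains newline characters at its line breaks, so it is equal to an ordinary string written with the `\n` escape sequence:

```python
multi = """line one
line two"""
escaped = "line one\nline two"

print(multi == escaped)         # True
print(multi.count("\n"))        # 1 newline character
print(multi.splitlines())       # ['line one', 'line two']
```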


A Python expression in quotation marks is a string and is not evaluated as code by the Python interpreter.

In [None]:
print(type('np.pi'))

<class 'str'>
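
Likewise, digits inside quotes are text, not numbers: `+` joins strings end to end, and a string has to be converted with `int()` or `float()` before arithmetic can be done on it.

```python
print(2 + 3)                    # 5
print("2" + "3")                # 23: + joins the two strings
print(int("2") + int("3"))      # 5: convert to int first
value = float("2.5")
print(value * 2, type(value))   # 5.0 <class 'float'>
```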


### NoneType

None is a special value that acts as a placeholder meaning 'no meaningful value'. It is returned by functions that have no return statement, and by some functions that cannot produce a result for certain inputs.

In [None]:
print(type(None))

<class 'NoneType'>
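
A short sketch of where `None` typically shows up: a function without a `return` statement, an in-place list method such as `sort`, and `dict.get` with a missing key. The function `greet` and the data are hypothetical examples.

```python
def greet(name):
    """Hypothetical function: prints a greeting but has no return statement."""
    print("Hello, " + name)

result = greet("Ada")
print(result)                    # None

ages = {"Ada": 36}               # made-up data
print(ages.get("Bob"))           # None: "Bob" is not a key

print([3, 1, 2].sort())          # None: sort() changes the list in place
```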


None can be compared using 'is' or '==', although PEP 8 recommends 'is None' (or 'is not None') as the idiomatic check.

In [None]:
print(None is None)
print(None == None)

True
True
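
In practice `is None` is the safer check, because a class can redefine `==`, whereas `is` tests identity and cannot be overridden. A sketch with a deliberately odd, hypothetical class:

```python
class AlwaysEqual:
    """Hypothetical class whose == returns True for any comparison."""
    def __eq__(self, other):
        return True

obj = AlwaysEqual()
print(obj == None)   # True: misleading, because __eq__ was overridden
print(obj is None)   # False: identity check, always reliable
```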


None cannot be used in arithmetic.

In [None]:
# None cannot be used in arithmetic: None + None raises a TypeError
try:
    None + None
except TypeError as err:
    print("TypeError:", err)

TypeError: unsupported operand type(s) for +: 'NoneType' and 'NoneType'
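
Since `None` breaks arithmetic, one simple approach when it marks missing values is to filter those entries out before computing a summary. A minimal sketch with made-up data:

```python
values = [4, None, 7, None, 10]   # made-up data; None marks a missing value

# Keep only the entries that are not None
present = [v for v in values if v is not None]
print(present)                      # [4, 7, 10]
print(sum(present) / len(present))  # 7.0
```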


### Lists

A list is an ordered sequence of values, which can be of different types.

In [None]:
myList = [1, 1.1, "This is a string", None]

for element in myList:
  print(type(element))

<class 'int'>
<class 'float'>
<class 'str'>
<class 'NoneType'>
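
Lists can also grow: `append` adds an element at the end, `len` gives the number of elements, and the `in` operator tests membership. A sketch that rebuilds the mixed list from above:

```python
myList = [1, 1.1, "This is a string", None]
myList.append(True)                     # add one element at the end
print(len(myList))                      # 5
print("This is a string" in myList)     # True
print(myList)                           # [1, 1.1, 'This is a string', None, True]
```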


In [None]:
myList = [10, 20, 25]

for element in myList:
  print(type(element))

print("Mean: {}".format(sum(myList) / len(myList)))

<class 'int'>
<class 'int'>
<class 'int'>
Mean: 18.333333333333332


Elements of lists (and of NumPy arrays) can be accessed by position using square brackets. Python counts positions from zero, so index 0 refers to the first element:

In [None]:
myList = ['third', 'first', 'medium', 'small', 'large']
myList[0]

'third'
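
Negative positions count from the end of the list, and a slice `start:stop` selects a range of positions, stopping just before `stop`:

```python
myList = ['third', 'first', 'medium', 'small', 'large']
print(myList[1])     # first
print(myList[-1])    # large: the last element
print(myList[1:3])   # ['first', 'medium']: positions 1 and 2
```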

Lists have methods: some compute a result from the list without changing it, such as count, while others change the list itself, such as sort, which reorders it in place.

In [None]:
myList = [-15, 100, 0, 1, 25, 2, 7, 10, 95, 11, 0]

print("Frequency of '0' in list: {}".format(myList.count(0)))
print("List before sorting: {}".format(myList))
myList.sort()
print("List after sorting: {}".format(myList))

Frequency of '0' in list: 2
List before sorting: [-15, 100, 0, 1, 25, 2, 7, 10, 95, 11, 0]
List after sorting: [-15, 0, 0, 1, 2, 7, 10, 11, 25, 95, 100]
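
The built-in `sorted` function is the non-modifying counterpart of `sort`: it returns a new sorted list and leaves the original alone. Both accept `reverse=True` for descending order:

```python
myList = [-15, 100, 0, 1, 25, 2, 7, 10, 95, 11, 0]

ascending = sorted(myList)        # new list; myList is unchanged
print(ascending[:3])              # [-15, 0, 0]
print(myList[:3])                 # [-15, 100, 0]

print(myList.index(100))          # 1: position of the first 100
myList.sort(reverse=True)         # sorts myList itself, in descending order
print(myList[:3])                 # [100, 95, 25]
```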
