# Built-In Types: Simple Values

When discussing Python variables and objects, we mentioned the fact that all Python objects have type information attached. Here we'll briefly walk through the built-in simple types offered by Python.
We say "simple types" to contrast with several compound types, which will be discussed in the following section.

Python's simple types are summarized in the following table:

<br />
<center>**Python Scalar Types**</center>

| Type        | Example        | Description                                                  |
|-------------|----------------|--------------------------------------------------------------|
| ``int``     | ``x = 1``      | integers (i.e., whole numbers)                               |
| ``float``   | ``x = 1.0``    | floating-point numbers (i.e., real numbers)                  |
| ``complex`` | ``x = 1 + 2j`` | Complex numbers (i.e., numbers with real and imaginary part) |
| ``bool``    | ``x = True``   | Boolean: True/False values                                   |
| ``str``     | ``x = 'abc'``  | String: characters or text                                   |
| ``NoneType``| ``x = None``   | Special object indicating nulls                              |

We'll take a quick look at each of these in turn.

## Integers
The most basic numerical type is the integer.
Any number without a decimal point is an integer:

In [None]:
x = 1
type(x)

Python integers are actually quite a bit more sophisticated than integers in languages like ``C``.
C integers are fixed-precision, and usually 
<b><a href="https://en.wikipedia.org/wiki/Integer_overflow" target="_blank">overflow</a></b>
at some value (often near $2^{31}$ or $2^{63}$, depending on your system).
Python integers are **variable-precision**, so you can do computations that would overflow in other languages:

In [None]:
2 ** 200

Another convenient feature of Python integers is that by default, division up-casts to floating-point type:

In [None]:
5 / 2

Note that this upcasting is a feature of Python 3; in Python 2, like in many statically-typed languages such as C, integer division truncates any decimal and always returns an integer:
``` python
# Python 2 behavior
>>> 5 / 2
2
```
To recover this behavior in Python 3, you can use the floor-division "double slash" operator:

In [None]:
5 // 2

Finally, note that although Python *2.x* had both an ``int`` and ``long`` type, Python 3 combines the behavior of these two into a single ``int`` type.
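Returning to the floor-division operator: it rounds toward negative infinity rather than toward zero, which only becomes visible with negative operands. A minimal sketch of the difference (the variable names here are illustrative, not part of the text above):

```python
# Floor division rounds down (toward negative infinity), not toward zero.
pos = 5 // 2       # 2
neg = -5 // 2      # -3, not -2

# int() truncates toward zero instead.
trunc = int(-5 / 2)   # -2

# divmod() returns the floor quotient and the remainder together.
quotient, remainder = divmod(5, 2)

print(pos, neg, trunc, quotient, remainder)
```

If you need truncation toward zero for negative values, converting the true-division result with ``int()`` is one option.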

## Floating-Point Numbers
The floating-point type can store fractional numbers.
They can be defined either in standard decimal notation, or in exponential notation:

In [None]:
x = 0.000005
y = 5e-6
print(x == y)

In [None]:
x = 1400000.00
y = 1.4e6
print(x == y)

In the exponential notation, the ``e`` or ``E`` can be read "...times ten to the...",
so that ``1.4e6`` is interpreted as $~1.4 \times 10^6$.

An integer can be explicitly converted to a float with the ``float`` constructor; the ``int`` and ``str`` constructors similarly convert values to integers and strings:

In [None]:
float(1)
int(3.5)
str(987)

### Aside: Floating-point precision
One thing to be aware of with floating point arithmetic is that its precision is limited, which can **cause equality tests to be <span style="color:red;">unstable</span>**. For example:

In [None]:
(0.1 + 0.2) == 0.3

Why is this the case? It turns out that it is not a behavior unique to Python, but is due to the fixed-precision format of the binary floating-point storage used by most, if not all, scientific computing platforms.

All programming languages using floating-point numbers store them in a fixed number of bits, and this leads some numbers to be represented only approximately.
We can see this by printing the three values to high precision:

In [None]:
print("0.1 = {0:.17f}".format(0.1))
print("0.2 = {0:.17f}".format(0.2))
print("0.3 = {0:.17f}".format(0.3))

By printing so many decimal places, we see that these floats are actually stored as **approximations** of the decimal values.

In the familiar base-10 representation of numbers, you are probably familiar with numbers that can't be expressed in a finite number of digits.
For example, dividing $1$ by $3$ gives, in standard decimal notation:
$$
1 / 3 = 0.333333333\cdots
$$
The 3s go on forever: that is, to truly represent this quotient, the number of required digits is infinite!

Similarly, there are numbers for which binary representations require an infinite number of digits.
For example:
$$
1 / 10 = 0.00011001100110011\cdots
$$
Just as decimal notation requires an infinite number of digits to perfectly represent $1/3$, binary notation requires an infinite number of digits to represent $1/10$.
On most systems, Python internally rounds these representations to 52 bits beyond the first nonzero bit.

This rounding error for floating-point values is **a necessary evil** of working with floating-point numbers.
The best way to deal with it is to always keep in mind that floating-point arithmetic is approximate, and *never* rely on exact equality tests with floating-point values.
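In place of an exact equality test, a tolerance-based comparison is usually what is wanted. The sketch below uses ``math.isclose`` from the standard library; the tolerance in the manual check is an arbitrary choice for illustration and should be picked to suit your data:

```python
import math

total = 0.1 + 0.2

# Exact comparison fails because of rounding error.
exact = (total == 0.3)

# math.isclose compares within a relative tolerance (default rel_tol=1e-09).
close = math.isclose(total, 0.3)

# A manual absolute-tolerance check is another common option.
manual = abs(total - 0.3) < 1e-9

print(exact, close, manual)
```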

## Complex Numbers
Complex numbers are numbers with real and imaginary (floating-point) parts.
We've seen integers and real numbers before; we can use these to construct a complex number:

In [None]:
complex(1, 2)

Alternatively, we can use the "``j``" suffix in expressions to indicate the imaginary part:

In [None]:
a = 1 + 2j
a

Complex numbers have a variety of interesting attributes and methods, but they will be seldom used in this course.
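For reference, here is a short sketch of a few of those attributes and methods. The value ``3 + 4j`` is chosen only because its magnitude is a round number:

```python
c = 3 + 4j

print(c.real)         # real part, stored as a float
print(c.imag)         # imaginary part, stored as a float
print(c.conjugate())  # complex conjugate
print(abs(c))         # magnitude
```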

## Strings
Strings in Python are created with single or double quotes:

In [None]:
message = "what do you like?"
response = 'spam'

Python has many extremely useful string functions and methods; here are a few of them:

In [None]:
# length of string
len(response)

In [None]:
# Make upper-case. See also str.lower()
print(response.upper())
response

In [None]:
# Capitalize. See also str.title()
message.capitalize()

In [None]:
# concatenation with +
message + response

In [None]:
# multiplication is multiple concatenation!!!
5 * response

In [None]:
# Access individual characters (zero-based indexing)
print(message[0])
print(message[-1]) # get last character without knowing how long the str is

For more discussion of indexing in Python, see the section on lists.

## None Type
Python includes a special type, the ``NoneType``, which has only a single possible value: ``None``. For example:

In [None]:
type(None)

You'll see ``None`` used in many places, but perhaps most commonly it is used as the **default return value of a function**.
For example, the ``print()`` function in Python 3 does not return anything, but we can still catch its value:

In [None]:
return_value = print('abc')

In [None]:
print(return_value)

Likewise, any function in Python with no return value is, in reality, returning ``None``.
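The same can be checked with a function of our own. In this sketch, ``greet`` is a hypothetical function invented for illustration:

```python
def greet(name):
    # No return statement, so the call evaluates to None.
    print("hello,", name)

result = greet("world")

# Identity comparison with "is" is the idiomatic test for None.
print(result is None)
```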

## Boolean Type
The Boolean type is a simple type with two possible values: ``True`` and ``False``, and is returned by comparison operators discussed previously:

In [None]:
result = (4 < 5)
print(result)

In [None]:
type(result)

Keep in mind that the Boolean values are case-sensitive: unlike some other languages, ``True`` and ``False`` must be capitalized!

In [None]:
print(True, False)

Booleans can also be constructed using the ``bool()`` object constructor: values of any other type can be converted to Boolean via predictable rules.
For example, any numeric type is False if equal to zero, and True otherwise:

In [None]:
bool(2014)

In [None]:
bool(0)

In [None]:
bool(3.1415)

The Boolean conversion of ``None`` is always False:

In [None]:
bool(None)

For strings, ``bool(s)`` is False for empty strings and True otherwise:

In [None]:
bool("")

In [None]:
bool("abc")

For sequences, which we'll see in the next section, the Boolean representation is False for empty sequences and True for any other sequence:

In [None]:
bool([1, 2, 3])

In [None]:
bool([])

This holds regardless of what is inside the sequence:

In [None]:
list1 = [False]
list2 = [0]
list3 = [None]

print(bool(list1))
print(bool(list2))
print(bool(list3))

print(bool(list3[0]))
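These conversion rules are also what an ``if`` statement applies to its condition, so an explicit ``bool()`` call is rarely needed. A small sketch, where ``describe`` is a hypothetical helper written for this example:

```python
def describe(value):
    # "if value" applies the same rules as bool(value).
    if value:
        return "truthy"
    return "falsy"

samples = [0, 2014, 0.0, "", "abc", None, [], [0]]
results = [describe(s) for s in samples]
print(results)
```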

# Built-In Data Structures

We have seen Python's simple types: ``int``, ``float``, ``complex``, ``bool``, ``str``, and so on.
Python also has several built-in **compound types**, which group together variables of other types in different ways.
These compound types are:

| Type Name | Example                   |Description                            |
|-----------|---------------------------|---------------------------------------|
| ``list``  | ``[1, 2, 3]``             | Ordered collection                    |
| ``tuple`` | ``(1, 2, 3)``             | Immutable ordered collection          |
| ``dict``  | ``{'a':1, 'b':2, 'c':3}`` | (key, value) mapping                  |
| ``set``   | ``{1, 2, 3}``             | Unordered collection of unique values |

As you can see, round, square, and curly **brackets** have distinct meanings when it comes to the type of collection produced.
We'll take a quick tour of these data structures here.

## Lists
Lists are the basic *ordered* and *mutable* data collection type in Python.
They can be defined with comma-separated values between square brackets; for example, here is a list of the first several prime numbers:

In [None]:
L = [2, 3, 5, 7]

Lists have a number of useful properties and methods available to them.
Here we'll take a quick look at some of the more common and useful ones:

In [None]:
# Length of a list
len(L)

In [None]:
# Append a value to the end
L.append(11)
L

In [None]:
# Addition concatenates lists
L + [13, 17, 19]

In [None]:
# sort() method sorts in-place
L = [2, 5, 1, 6, 3, 4]
L.sort()
L

In addition, there are many more built-in list methods; they are well-covered in Python's [online documentation](https://docs.python.org/3/tutorial/datastructures.html).

While we've been demonstrating lists containing values of a single type, Python's compound objects can contain objects of *any* type, or even a mix of types. For example:

In [None]:
L = [1, 'two', 3.14, [0, 3, 5]]
[x**2 for x in range(10)]
L

This flexibility is a consequence of Python's dynamic type system.
Creating such a mixed sequence in a statically-typed language like C can be much more of a headache!

We see that lists can even contain other lists as elements.

Such type flexibility is an essential piece of what makes Python code relatively quick and easy to write.

So far we've been considering manipulations of lists as a whole; another essential piece is the accessing of individual elements.
This is done in Python via *indexing* and *slicing*, which we'll explore next.

### List indexing and slicing
Python provides access to elements in compound types through *indexing* for single elements, and *slicing* for multiple elements.
As we'll see, both are indicated by a square-bracket syntax.
Suppose we return to our list of the first several primes:

In [None]:
L = [2, 3, 5, 7, 11]

Python uses *zero-based* indexing, so we can access the first and second elements using the following syntax:

In [None]:
L[0]

In [None]:
L[1]

Elements at the end of the list can be accessed with negative numbers, starting from -1:

In [None]:
L[-1]

In [None]:
L[-2]

You can visualize this indexing scheme this way:

![List Indexing Figure](fig/list-indexing.png)

Here values in the list are represented by large numbers in the squares; list indices are represented by small numbers above and below.
In this case, ``L[2]`` returns ``5``, because that is the value at index ``2``.

Where *indexing* is a means of fetching a single value from the list, **slicing** is a means of accessing multiple values in sub-lists.
It uses a colon to indicate the start point (inclusive) and end point (non-inclusive) of the sub-list.
For example, to get the first three elements of the list, we can write:

In [None]:
L[0:3]

Notice where ``0`` and ``3`` lie in the preceding diagram, and how the slice takes just the values between the indices.
If we leave out the first index, ``0`` is assumed, so we can equivalently write:

In [None]:
L[:3]

Similarly, if we leave out the last index, it defaults to the length of the list.
Thus, the last three elements can be accessed as follows:

In [None]:
L[-3:]

Finally, it is possible to specify a third integer that represents the step size; for example, to select every second element of the list, we can write:

In [None]:
L[::2]  # equivalent to L[0:len(L):2]

A particularly useful version of this is to specify a **negative step**, which will reverse the list:

In [None]:
L[::-1]

Both indexing and slicing can be used to set elements as well as access them.
The syntax is as you would expect:

In [None]:
L[0] = 100
print(L)

In [None]:
#L[1:2] = [55, 56]
L[1] = [55, 56]
print(L)
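Assigning a list to a single index and assigning it to a slice give different results, which is easy to overlook. The following sketch contrasts the two on copies of the prime list; the names ``nested`` and ``spliced`` are illustrative only:

```python
primes = [2, 3, 5, 7, 11]

# Index assignment replaces one element; assigning a list nests it.
nested = primes.copy()
nested[1] = [55, 56]

# Slice assignment replaces the selected range with the items of the iterable.
spliced = primes.copy()
spliced[1:3] = [55, 56]

print(nested)
print(spliced)
```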

A very similar slicing syntax is also used in many data science-oriented Python packages, including NumPy and Pandas.


## Tuples
Tuples are in many ways similar to lists, but they are defined with parentheses rather than square brackets:

In [None]:
t = (1, 2, 3)

They can actually also be defined without any brackets at all:

In [None]:
t = 1, 2, 3
print(t)

Like the lists discussed before, tuples have a length, and individual elements can be extracted using square-bracket indexing:

In [None]:
len(t)

In [None]:
t[0]

The main distinguishing feature of tuples is that they are **immutable**: this means that once they are created, their size and contents cannot be changed:

In [None]:
t[1] = 4

In [None]:
t.append(4)
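Both of these statements raise an exception. A minimal sketch that catches the two errors and shows one common workaround, building a new tuple by concatenation:

```python
t = (1, 2, 3)

try:
    t[1] = 4
except TypeError as err:
    # Item assignment is not supported by tuples.
    print("TypeError:", err)

try:
    t.append(4)
except AttributeError as err:
    # Tuples have no append method.
    print("AttributeError:", err)

# Build a new tuple instead of modifying the old one.
t2 = t + (4,)
print(t2)
```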

Tuples are often used in a Python program; a particularly common case is in functions that have multiple return values.
For example, the ``as_integer_ratio()`` method of floating-point objects returns a numerator and a denominator; this dual return value comes in the form of a tuple:

In [None]:
x = 0.125
x.as_integer_ratio()

These multiple return values can be individually assigned as follows:

In [None]:
numerator, denominator = x.as_integer_ratio()
print(numerator / denominator)
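The same pattern works for functions you write yourself. In this sketch, ``min_max`` is a hypothetical function used only to illustrate returning and unpacking a tuple:

```python
def min_max(values):
    # Returning "a, b" packs both results into a single tuple.
    return min(values), max(values)

lo, hi = min_max([3, 1, 4, 1, 5])
print(lo, hi)

# Tuple packing and unpacking also allows swapping without a temporary variable.
lo, hi = hi, lo
print(lo, hi)
```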

The indexing and slicing logic covered earlier for lists works for tuples as well, along with a host of other methods.
Refer to the online [Python documentation](https://docs.python.org/3/tutorial/datastructures.html) for a more complete list of these.

## Dictionaries
Dictionaries, sometimes called hashes or associative arrays in other languages, are extremely flexible mappings of keys to values. Dictionaries are one of the **most powerful** and effective aspects of Python! Mastering Python includes mastering dictionaries.

They can be created via a comma-separated list of ``key:value`` pairs within curly braces:

In [None]:
numbers = {'one':1, 'two':2, 'three':3}

Items are accessed and set via the indexing syntax used for lists and tuples, except here the index is not a zero-based integer value but a valid key in the dictionary:

In [None]:
# Access a value via the key
numbers['two']

New items can be added to the dictionary using indexing as well:

In [None]:
# Set a new key:value pair
numbers['ninety'] = 90
print(numbers)

Keep in mind that <span style='color:red;'>dictionaries are indexed by key, not by position</span>: as of Python 3.7 a dictionary remembers the order in which its keys were inserted (earlier versions made no such guarantee), but there is no integer index into that order.
Dictionaries are implemented so that element access by key is very fast, regardless of the size of the dictionary (if you're curious how this works, read about the concept of a *hash table*).

**Looping** over a dictionary is straightforward: iteration visits the keys in insertion order, and each key is always paired with its own value.

In [None]:
for key in numbers:
    print(key, "-->", numbers[key])

print("---")

for key, value in numbers.items(): # .items is a handy method
    print(key, "maps to", value)

The [Python documentation](https://docs.python.org/3/library/stdtypes.html) has a complete list of the methods available for dictionaries.
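A few of those methods come up constantly. The sketch below shows membership testing, ``get()`` with a default, and the ``keys()`` and ``values()`` views; the key ``'four'`` is deliberately absent:

```python
numbers = {'one': 1, 'two': 2, 'three': 3}

# Membership tests check the keys.
has_two = 'two' in numbers

# get() returns a default instead of raising KeyError for a missing key.
four = numbers.get('four', 0)

# keys() and values() are views over the dictionary's contents.
keys = list(numbers.keys())
values = list(numbers.values())

print(has_two, four, keys, values)
```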

## Sets

The fourth basic collection is the set, which is an unordered collection of unique items.
They are defined much like lists and tuples, except they use the curly brackets of dictionaries:

In [None]:
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

If you're familiar with the mathematics of sets, you'll be familiar with operations like the union, intersection, difference, symmetric difference, and others.
Python's sets have all of these operations built-in, via methods or operators.
For each, we'll show the two equivalent methods:

In [None]:
# union: items appearing in either
primes | odds      # with an operator
primes.union(odds) # equivalently with a method

In [None]:
# intersection: items appearing in both
primes & odds             # with an operator
primes.intersection(odds) # equivalently with a method

In [None]:
# difference: items in primes but not in odds
primes - odds           # with an operator
primes.difference(odds) # equivalently with a method

In [None]:
# symmetric difference: items appearing in only one set
primes ^ odds                     # with an operator
primes.symmetric_difference(odds) # equivalently with a method
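Because a notebook cell displays only its last expression, the equivalence of the operator and method forms is not visible in the cells above. This sketch checks it explicitly and shows one difference between the two forms: the methods accept any iterable, while the operators require sets on both sides.

```python
primes = {2, 3, 5, 7}
odds = {1, 3, 5, 7, 9}

same_union = (primes | odds) == primes.union(odds)
same_inter = (primes & odds) == primes.intersection(odds)
same_diff = (primes - odds) == primes.difference(odds)
same_sym = (primes ^ odds) == primes.symmetric_difference(odds)
print(same_union, same_inter, same_diff, same_sym)

# Methods accept any iterable, such as a list.
extended = primes.union([11, 13])
print(extended)

# Converting a list to a set drops duplicate values.
unique = set([1, 2, 2, 3, 3, 3])
print(unique)
```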

Many more set methods and operations are available.
Refer to Python's [online documentation](https://docs.python.org/3/library/stdtypes.html) for a complete reference.

# String Manipulation

One place where the Python language really shines is in the manipulation of strings.
This section will cover some of Python's built-in string methods and formatting operations. Formatting and manipulating strings is one of the most common tasks a data scientist performs.

Strings in Python can be defined using either single or double quotations (they are functionally equivalent):

In [None]:
x = 'a string'
y = "a string"
x == y
#x is y

In addition, it is possible to define multi-line strings using a triple-quote syntax:

In [None]:
multiline = """
one
\t\ttwo
three
"""
print(multiline)
#multiline

Since quotes are used to mark the beginning and ending of strings, if you wish to use those characters inside a string you need to "escape" them with a backslash (`\`). This tells Python not to interpret these quotes as the end of the string.

In [None]:
print("Jon said, \"Hello World\".")
print('Andrea replied, "That\'s nice!"')

## Simple String Manipulation in Python

For basic manipulation of strings, Python's built-in string methods can be extremely convenient.

We introduced Python's string type and a few of these methods earlier; here we'll dive a bit deeper.

### Formatting strings: Adjusting case

Python makes it quite easy to adjust the case of a string.
Here we'll look at the ``upper()``, ``lower()``, ``capitalize()``, ``title()``, and ``swapcase()`` methods, using the following messy string as an example:

In [None]:
fox = "tHe qUICk bROWn fOx."

To convert the entire string into upper-case or lower-case, you can use the ``upper()`` or ``lower()`` methods respectively:

In [None]:
fox.upper()

In [None]:
fox.lower()

A common formatting need is to capitalize just the first letter of each word, or perhaps the first letter of each sentence.
This can be done with the ``title()`` and ``capitalize()`` methods:

In [None]:
fox.title()

In [None]:
fox.capitalize()

The cases can be swapped using the ``swapcase()`` method:

In [None]:
fox.swapcase()

### Formatting strings: Adding and removing spaces

Another common need is to remove spaces (or other characters) from the beginning or end of the string.
The basic method of removing characters is the ``strip()`` method, which strips whitespace from the beginning and end of the line:

In [None]:
line = '         this is the content         '
line.strip()

To remove just space to the right or left, use ``rstrip()`` or ``lstrip()`` respectively:

In [None]:
line.rstrip()

In [None]:
line.lstrip()

To remove characters other than spaces, you can pass the desired character to the ``strip()`` method:

In [None]:
num = "000000000000435000"
num.strip('0')
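Two details are worth a quick sketch: ``lstrip()`` and ``rstrip()`` take the same argument, and the argument is treated as a set of characters to remove rather than as a literal prefix or suffix. The second string below is a made-up example:

```python
num = "000000000000435000"

# Strip zeros from one side only.
left = num.lstrip('0')
right = num.rstrip('0')
print(left)
print(right)

# Every leading or trailing 'x' or 'y' is removed, in any order.
cleaned = "xyxhello worldyx".strip('xy')
print(cleaned)
```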

The opposite of this operation, adding spaces or other characters, can be accomplished using the ``center()``, ``ljust()``, and ``rjust()`` methods.

For example, we can use the ``center()`` method to center a given string within a given number of spaces:

In [None]:
line = "this is the content"
line.center(30)

Similarly, ``ljust()`` and ``rjust()`` will left-justify or right-justify the string within spaces of a given length:

In [None]:
line.ljust(30)

In [None]:
line.rjust(30)

All these methods additionally accept any character which will be used to fill the space.
For example:

In [None]:
'435'.rjust(10, '0')

Because zero-filling is such a common need, Python also provides ``zfill()``, which is a special method to left-pad a string with zeros:

In [None]:
'435'.zfill(10)
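For unsigned digits, ``zfill()`` gives the same result as ``rjust()`` with a ``'0'`` fill character. The two differ when the string starts with a sign, as this short sketch shows:

```python
plain_zfill = '435'.zfill(10)
plain_rjust = '435'.rjust(10, '0')
print(plain_zfill, plain_rjust)

# zfill() keeps a leading sign in front of the padding; rjust() does not.
signed_zfill = '-435'.zfill(10)
signed_rjust = '-435'.rjust(10, '0')
print(signed_zfill, signed_rjust)
```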

### Finding and replacing substrings

If you want to find occurrences of a certain character in a string, the ``find()``/``rfind()``, ``index()``/``rindex()``, and ``replace()`` methods are the best built-in methods.

``find()`` and ``index()`` are very similar, in that they search for the first occurrence of a character or substring within a string, and return the index of the substring:

In [None]:
line = 'the quick brown fox jumped over a lazy dog'
line.find('fox')

In [None]:
line.index('fox')

How many unique letters of the alphabet appear in ``line``?
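One way to count the distinct letters in ``line`` is to collect them into a set, since a set keeps only unique items. This is a sketch of one possible approach; the names ``letters`` and ``missing`` are illustrative:

```python
line = 'the quick brown fox jumped over a lazy dog'

# Keep alphabetic characters only, discarding the spaces.
letters = {ch for ch in line if ch.isalpha()}
print(len(letters))

# Which lowercase letters never appear?
missing = set('abcdefghijklmnopqrstuvwxyz') - letters
print(missing)
```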

The only difference between ``find()`` and ``index()`` is their behavior when the search string is not found; ``find()`` returns ``-1``, while ``index()`` raises a ``ValueError``:

In [None]:
line.find('bear')

In [None]:
line.index('bear')
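Which method to use depends on how you want to handle a missing substring. The sketch below shows the three usual options; note that the ``-1`` returned by ``find()`` is also a valid index (the last character), so it should be checked before being used to index the string:

```python
line = 'the quick brown fox jumped over a lazy dog'

# Option 1: the "in" operator, when only a yes/no answer is needed.
has_bear = 'bear' in line

# Option 2: find() and an explicit check for -1.
position = line.find('bear')
found = position != -1

# Option 3: index() with exception handling.
try:
    line.index('bear')
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

print(has_bear, position, found, raised)
```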

The related ``rfind()`` and ``rindex()`` work similarly, except they search for the first occurrence from the end rather than the beginning of the string:

In [None]:
line.rfind('a')

For the special case of checking for a substring at the beginning or end of a string, Python provides the ``startswith()`` and ``endswith()`` methods:

In [None]:
line.endswith('dog')

In [None]:
line.startswith('fox')

To go one step further and replace a given substring with a new string, you can use the ``replace()`` method.
Here, let's replace ``'brown'`` with ``'red'``:

In [None]:
line.replace('brown', 'red')

The ``replace()`` method returns a new string, and will replace all occurrences of the input:

In [None]:
line.replace(' ', '-')
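``replace()`` also accepts an optional third argument that limits how many occurrences are replaced. A brief sketch, which also confirms that the original string is left unchanged:

```python
line = 'the quick brown fox jumped over a lazy dog'

# Replace only the first two spaces.
dashed = line.replace(' ', '-', 2)
print(dashed)

# Strings are immutable, so "line" itself is untouched.
print(line)
```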

### Splitting and partitioning strings

If you would like to find a substring *and then* split the string based on its location, the ``partition()`` and/or ``split()`` methods are what you're looking for.
Both will return a sequence of substrings.

The ``partition()`` method returns a tuple with three elements: the substring before the first instance of the split-point, the split-point itself, and the substring after:

In [None]:
line.partition('fox')

The ``rpartition()`` method is similar, but searches from the right of the string.

The ``split()`` method is perhaps more useful; it finds *all* instances of the split-point and returns the substrings in between.
The default is to split on any whitespace, returning a list of the individual words in a string:

In [None]:
line.split()
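``split()`` can also be given an explicit separator and a maximum number of splits. In this sketch the comma-separated ``record`` string is made up for illustration:

```python
record = "alice,42,engineer"  # hypothetical comma-separated record

# Split on an explicit separator.
fields = record.split(',')
print(fields)

# Limit the number of splits with the second argument.
name, rest = record.split(',', 1)
print(name, rest)

# With an explicit separator, empty fields are preserved...
with_sep = "a,,b".split(',')
# ...whereas the default mode collapses runs of whitespace.
default = "a  b".split()
print(with_sep, default)
```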

A related method is ``splitlines()``, which splits on newline characters.
Let's do this with a Haiku, popularly attributed to the 17th-century poet Matsuo Bashō (松尾 芭蕉):

In [None]:
haiku = """matsushima-ya
aah matsushima-ya
matsushima-ya"""

haiku.splitlines()

### Joining strings

Note that if you would like to undo a ``split()``, you can use the ``join()`` method, which returns a string built from a splitpoint and an iterable:

In [None]:
'--'.join(['1', '2', '3'])

A common pattern is to use the special character ``"\n"`` (newline) to join together lines that have been previously split, and recover the input:

In [None]:
print("\n".join(['matsushima-ya', 'aah matsushima-ya', 'matsushima-ya']))
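As a sketch of the round trip, splitting a sentence on whitespace and joining with a single space recovers the input, provided the words were separated by single spaces to begin with. Note also that ``join()`` expects strings, so other values need converting first:

```python
line = 'the quick brown fox jumped over a lazy dog'

# Split into words, then rebuild the sentence.
words = line.split()
rebuilt = ' '.join(words)
print(rebuilt == line)

# Non-string items must be converted before joining.
numbers = [1, 2, 3]
joined = ', '.join(str(n) for n in numbers)
print(joined)
```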

## Format Strings

In the preceding methods, we have learned how to extract values from strings, and to manipulate strings themselves into desired formats.
Another use of string methods is to manipulate string *representations* of values of other types.
Of course, string representations can always be found using the ``str()`` function; for example:

In [None]:
pi = 3.14159
str(pi)

In [None]:
"The value of pi is " + str(pi)

However, a more flexible way to do this is to use *format strings*, which are strings with special markers (noted by curly braces) into which string-formatted values will be inserted.
Here is a basic example:

In [None]:
"The value of pi is {}".format(pi)

Inside the ``{}`` marker you can also include information on exactly *what* you would like to appear there.
If you include a number, it will refer to the index of the argument to insert:

In [None]:
"First letter: {0}. Last letter: {1}.".format('A', 'Z')

If you include a string, it will refer to the key of any keyword argument:

In [None]:
"""First letter: {first}. Last letter: {last}.""".format(last='Z', first='A')

Finally, for numerical inputs, you can include format codes which control how the value is converted to a string.
For example, to print a number as a floating point with three digits after the decimal point, you can use the following:

In [None]:
"pi = {0:.3f}".format(pi)

As before, here the "``0``" refers to the index of the value to be inserted.
The "``:``" marks that format codes will follow.
The "``.3f``" encodes the desired precision: three digits beyond the decimal point, floating-point format.

This style of format specification is very flexible, and the examples here barely scratch the surface of the formatting options available.
For more information on the syntax of these format strings, see the [Format Specification](https://docs.python.org/3/library/string.html#formatspec) section of Python's online documentation.

In [None]:
a_number = 1 / 3
percentage = "{:.20%}".format(a_number)
print(percentage)
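Since Python 3.6, the same format codes can also be used in *f-strings*, where the expression is written directly inside the braces. The sketch below repeats the earlier examples in that style; the variable names ``first`` and ``last`` are chosen to mirror the keyword example:

```python
pi = 3.14159
first, last = 'A', 'Z'

# Equivalent to "pi = {0:.3f}".format(pi)
formatted_pi = f"pi = {pi:.3f}"
print(formatted_pi)

# Names are looked up in the surrounding scope.
letters = f"First letter: {first}. Last letter: {last}."
print(letters)

# Any expression can appear before the colon.
percentage = f"{1 / 3:.1%}"
print(percentage)
```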

## Special characters

Above we saw a special character, the newline, represented in Python strings as `\n`. While displayed in source code with two characters, `\` and `n`, when paired they are interpreted as a single character representing what would be emitted when pressing the enter or return key.

In [None]:
print(len("\n"))

Another special character often encountered is the tab (`\t`).
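Like the newline, the tab counts as a single character. If you want the backslash to be kept literally, a raw string (prefixed with ``r``) turns off escape processing. A minimal sketch:

```python
# A tab separates the two columns when printed.
print("name\tvalue")

# The escape sequence is one character long.
tab_length = len("\t")

# In a raw string, the backslash and the "t" remain two separate characters.
raw_length = len(r"\t")

print(tab_length, raw_length)

# repr() shows the escape sequence instead of rendering it.
print(repr("a\tb"))
```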