### Type Casting

Often we want to convert a value of one data type into another while preserving (as much as possible) the data itself. This is not always possible without loss: some cast operations preserve the data, and some do not.

For example, we can cast an `int` to a `float` by using the `float()` function in Python:

In [1]:
a = 10
print(a, type(a))

10 <class 'int'>


In [2]:
b = float(a)
print(b, type(b))

10.0 <class 'float'>


We can cast a `float` to an `int` by using the `int()` function:

In [3]:
a = 10.5
print(a, type(a))

10.5 <class 'float'>


In [4]:
b = int(a)
print(b, type(b))

10 <class 'int'>


If we cast a float to an int, we may lose data, as was the case here.

When we cast a float to an int using the `int()` function, Python uses **truncation**: it drops everything after the decimal point and keeps only the integer portion of the float (in other words, it rounds toward zero).
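
Truncation is easiest to see with a negative number. The short sketch below compares `int()` with `math.floor()` and `round()`; the variable names are only for illustration.

```python
import math

# int() truncates toward zero, so the fractional part is simply dropped.
positive = int(10.9)
negative = int(-10.9)
print(positive, negative)  # 10 -10

# math.floor() always rounds down, and round() rounds to the nearest integer,
# so both differ from truncation for this negative value.
floored = math.floor(-10.9)
rounded = round(-10.9)
print(floored, rounded)  # -11 -11
```

If you need rounding rather than truncation, `int()` is probably not the function you want.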

This was a very simple type cast (or type conversion).

We can also type cast collection objects.

For example, we have the following functions in Python:
- `list()`
- `tuple()`
- `str()`
- `set()`

These functions can handle a variety of arguments of various types.
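
As a quick sketch, here are the four functions applied to the same small list (the name `data` is just an example):

```python
data = [3, 1, 3, 2]

as_tuple = tuple(data)    # same items, in the same order, as a tuple
as_set = set(data)        # duplicates are dropped and order is not guaranteed
as_str = str(data)        # a string representation of the whole list
as_list = list(as_tuple)  # and back to a list again

print(as_tuple)  # (3, 1, 3, 2)
print(as_set)    # {1, 2, 3}
print(as_str)    # [3, 1, 3, 2]
print(as_list)   # [3, 1, 3, 2]
```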

You should refer to the Python docs here for more info: https://docs.python.org/3/library/functions.html

Or you can invoke help in an interactive Python shell this way:

In [5]:
help(list)

Help on class list in module builtins:

class list(object)
 |  list() -> new empty list
 |  list(iterable) -> new list initialized from iterable's items
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __delitem__(self, key, /)
 |      Delete self[key].
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getattr(self, name).
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(self, value, /)
 |      Return self>value.
 |  
 |  __iadd__(self, value, /)
 |      Implement self+=value.
 |  
 |  __imul__(self, value, /)
 |      Implement self*=value.
 |  
 |  __init__(self, /, *args, **kwargs)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __iter__(self, /)
 |      Implement iter(self).
 |  
 |  __l

As you can see, the `list()` function is actually just the initializer for the `list` class (we'll talk about classes and initializers later). For now, just think of it as a function that takes certain arguments and returns a list object based on those arguments.

The `list()` function can take no argument, returning an empty list object:

In [6]:
list()

[]

Or it can take any **iterable**. We'll discuss iterables in general later. For now, all the collection types we have seen (list, tuple, string, set, dictionary) are iterables. Numbers are **not** iterables.

So, aside from creating lists using literals, we can also create them using the `list()` function. Since the argument to `list()` can be another collection type such as a tuple, set, etc., we are essentially **casting** that iterable to a list.
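
To see what "not iterable" means in practice, the sketch below passes a number to `list()` and catches the resulting exception (exception handling is a topic of its own, so treat the `try`/`except` here as scaffolding):

```python
try:
    list(10)  # an int is not an iterable, so this raises a TypeError
except TypeError as ex:
    error_message = str(ex)
    print('TypeError:', error_message)

# To get a list that contains the number, use a list literal instead.
single = [10]
print(single)  # [10]
```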

Here are a few examples:

In [7]:
a = (10, 20, 30)
print(a, type(a))

b = list(a)
print(b, type(b))

(10, 20, 30) <class 'tuple'>
[10, 20, 30] <class 'list'>


In [8]:
a = 'python'
print(a, type(a))

b = list(a)
print(b, type(b))

python <class 'str'>
['p', 'y', 't', 'h', 'o', 'n'] <class 'list'>


Dictionaries are iterable, but by default the **keys** are used:

In [9]:
a = {'a': 1, 'b': 2, 'c': 3}

b = list(a)
print(b, type(b))

['a', 'b', 'c'] <class 'list'>
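
If you want the values or the key/value pairs instead, one option is to cast the dictionary's `values()` or `items()` views, which are also iterables:

```python
a = {'a': 1, 'b': 2, 'c': 3}

keys = list(a)             # iterating a dict yields its keys
values = list(a.values())  # the values, in insertion order
items = list(a.items())    # (key, value) tuples

print(keys)    # ['a', 'b', 'c']
print(values)  # [1, 2, 3]
print(items)   # [('a', 1), ('b', 2), ('c', 3)]
```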


The `str()` function is very flexible in terms of acceptable arguments, but the results may not always be what you expect.

In [10]:
help(str)

Help on class str in module builtins:

class str(object)
 |  str(object='') -> str
 |  str(bytes_or_buffer[, encoding[, errors]]) -> str
 |  
 |  Create a new string object from the given object. If encoding or
 |  errors is specified, then the object must expose a data buffer
 |  that will be decoded using the given encoding and error handler.
 |  Otherwise, returns the result of object.__str__() (if defined)
 |  or repr(object).
 |  encoding defaults to sys.getdefaultencoding().
 |  errors defaults to 'strict'.
 |  
 |  Methods defined here:
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __contains__(self, key, /)
 |      Return key in self.
 |  
 |  __eq__(self, value, /)
 |      Return self==value.
 |  
 |  __format__(...)
 |      S.__format__(format_spec) -> str
 |      
 |      Return a formatted version of S as described by format_spec.
 |  
 |  __ge__(self, value, /)
 |      Return self>=value.
 |  
 |  __getattribute__(self, name, /)
 |      Return getatt

Its argument is an arbitrary object, so you can actually pass any object to it, including numbers, collections, and custom objects.

But the output will be a string *representation* of the object: whatever the object defines its representation should look like.

For numbers it's quite straightforward:

In [11]:
str(-10), str(3.14), str(1+1j)

('-10', '3.14', '(1+1j)')

As you can see, each output is a string containing the characters that best represent the number we passed in as an argument.

But what about collection objects?

Well, we've actually been using this every time we used the `print()` function!

The representation for a list `[1, 2, 3]` is precisely a string that looks like that!

In [12]:
a = [1, 2, 3]
print(a, type(a))

b = str(a)
print(b, type(b))

[1, 2, 3] <class 'list'>
[1, 2, 3] <class 'str'>


Be careful: once you have converted the list to a string, everything you see in the representation is a character in the string, including the brackets, the commas, and the spaces.

For example:

In [13]:
a = [1, 2, 3]
print(a, len(a), type(a))

b = str(a)
print(b, len(b), type(b))

c = list(b)
print(c, len(c), type(c))

[1, 2, 3] 3 <class 'list'>
[1, 2, 3] 9 <class 'str'>
['[', '1', ',', ' ', '2', ',', ' ', '3', ']'] 9 <class 'list'>
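
So `list(str(a))` is not the inverse of `str(a)`. As an aside, if you really do need to turn such a string back into a list, one option in the standard library is `ast.literal_eval`, which parses a string containing a Python literal. This is a sketch that assumes the string holds only simple literals such as numbers, strings, and lists:

```python
import ast

a = [1, 2, 3]
b = str(a)  # '[1, 2, 3]', a string of 9 characters

# literal_eval parses the text of a Python literal and rebuilds the object.
restored = ast.literal_eval(b)
print(restored, len(restored), type(restored))
print(restored == a)  # True
```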


Casting to sets can be very useful to determine the unique elements in an iterable.

For example, to determine the unique characters in a string we would do it this way:

In [14]:
a = "I'm a lumberjack and I'm OK, I sleep all night and I work all day."
print('total characters:', len(a))

s = set(a)
print('unique characters:', s)
print('unique count:', len(s))

total characters: 66
unique characters: {'r', 'e', 'm', 't', 'K', 'a', 'k', 'n', '.', 'O', "'", 'p', 'c', 'o', ' ', 'l', 'u', 'I', 'i', 'h', 'b', 'y', 's', 'g', 'j', ',', 'd', 'w'}
unique count: 28


Note that this includes non-letter characters such as spaces and punctuation marks.
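
The same idea applies to any iterable, not only strings. Here is a minimal sketch with a list of numbers (the names are illustrative). Since sets have no guaranteed order, `sorted()` is used when a predictable order matters:

```python
numbers = [3, 1, 2, 3, 1, 1]

unique_numbers = set(numbers)  # duplicates are discarded
print(len(numbers), len(unique_numbers))  # 6 3

# sorted() accepts any iterable and returns a new sorted list.
ordered = sorted(unique_numbers)
print(ordered)  # [1, 2, 3]
```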

If we do not want to differentiate between upper and lower case characters, we could use the `lower()` method of strings:

In [15]:
a_lower = a.lower()
print(a)
print(a_lower)

I'm a lumberjack and I'm OK, I sleep all night and I work all day.
i'm a lumberjack and i'm ok, i sleep all night and i work all day.


Next we can get our set of unique characters:

In [16]:
unique_chars = set(a_lower)
print(unique_chars)

{'r', 'e', 'm', 't', 'a', 'k', 'n', '.', "'", 'p', 'c', 'o', ' ', 'l', 'u', 'i', 'h', 'b', 'y', 's', 'g', 'j', ',', 'd', 'w'}


Finally, we can use set intersection to remove any non-letter character. First we define the set of valid characters:

In [17]:
valid_chars = {'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 
               'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 
               'y', 'z'}

Then we take the intersection:

In [18]:
unique_chars = unique_chars & valid_chars
print(unique_chars)

{'r', 'e', 'm', 't', 'a', 'k', 'n', 'p', 'c', 'o', 'l', 'u', 'i', 'h', 'b', 'y', 's', 'g', 'j', 'd', 'w'}


So this approach works, but writing out the set of all valid characters was rather tedious!

Remember that strings are iterables, and the `set()` function can therefore handle strings:

In [19]:
valid_chars = set('abcdefghijklmnopqrstuvwxyz')
print(valid_chars)

{'r', 'e', 'm', 't', 'a', 'x', 'f', 'z', 'k', 'n', 'p', 'c', 'v', 'o', 'l', 'u', 'i', 'h', 'b', 'y', 'g', 's', 'q', 'j', 'd', 'w'}


This was much easier. Even better would be to use the `string` module. Modules in Python are extra bits of functionality that are not directly available by default. Think of them as libraries.

To use them, we have to **import** them. We can either import the entire library, or just specific bits from the library.

In [20]:
import string

Now that the module is imported, we can use **dot notation** to access functionality inside that module.

In particular, the `string` module has an attribute called `ascii_lowercase` that is simply a string containing all the lowercase letters of the Latin alphabet:

In [21]:
string.ascii_lowercase

'abcdefghijklmnopqrstuvwxyz'

So now we could make our earlier code simpler this way:

In [22]:
valid_chars = set(string.ascii_lowercase)
print(valid_chars)

{'r', 'e', 'm', 't', 'a', 'x', 'f', 'z', 'k', 'n', 'p', 'c', 'v', 'o', 'l', 'u', 'i', 'h', 'b', 'y', 'g', 's', 'q', 'j', 'd', 'w'}


We can also import just specific functionality from a module. For example, we could import only the `ascii_lowercase` attribute from the `string` module:

In [23]:
from string import ascii_lowercase

We can now use `ascii_lowercase` directly in our code, without that dot notation:

In [24]:
print(set(ascii_lowercase))

{'r', 'e', 'm', 't', 'a', 'x', 'f', 'z', 'k', 'n', 'p', 'c', 'v', 'o', 'l', 'u', 'i', 'h', 'b', 'y', 'g', 's', 'q', 'j', 'd', 'w'}
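
Putting the pieces together, the steps above could be wrapped up as follows. The function name `unique_letters` is hypothetical, and functions themselves are covered later, so treat this as a sketch rather than the definitive way to do it:

```python
from string import ascii_lowercase


def unique_letters(text):
    """Return the set of distinct ASCII letters in text, ignoring case."""
    # Lowercase the text, cast it to a set, then keep only the letters.
    return set(text.lower()) & set(ascii_lowercase)


a = "I'm a lumberjack and I'm OK, I sleep all night and I work all day."
letters = unique_letters(a)

print(sorted(letters))  # sorted() gives a predictable order for display
print(len(letters))     # 21
```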
