# Python and working with Data

## Introduction

In this section we begin to look at how Python, like other programming languages, represents, interprets, and works with data.
- We will start by asking what data is
- How it is expressed in forms such as statements
- How we can interact with data using operators
- How we can supply data to the compute layer and how we can get data back
- We will look at something called a data type and understand its relevance
- Finally, we will look at some useful constructs in Python that can contain data and apply functionality to it.


## What is data

Data, as far as a computer is concerned, is a series of electrical signals that are either there or not there. That doesn't mean much to us, though; we don't think in binary. Those signals only mean something once we tell the processor how to interpret them. For example:

`01000001` is a series of binary values that can mean the letter 'A' or the number 65, depending on what we tell the computer it means.

In [6]:
# Example of the same data read in two ways
data = '01000001'
# What is that as a number, and as a character?
print(f"the sequence {data} as a decimal value is {int(data, 2)}")
print(f"the sequence {data} as a character value is {chr(int(data, 2))}")

the sequence 01000001 as a decimal value is 65
the sequence 01000001 as a character value is A
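
The round trip can be sketched in the other direction too. This is a minimal illustration using the built-ins `ord()` and `format()`; the variable names are only for this example.

```python
# The same eight bits, read as a number and as a character.
bits = '01000001'

as_number = int(bits, 2)        # interpret the bits as a base-2 integer
as_character = chr(as_number)   # interpret that integer as a character code

print(as_number)      # 65
print(as_character)   # A

# Going the other way: from character, to number, to bits.
back_to_number = ord('A')                      # the number behind the character
back_to_bits = format(back_to_number, '08b')   # binary digits, zero-padded to 8 places

print(back_to_bits)   # 01000001
```

The bits never change; only the interpretation we ask for does.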


## Data Types

What you have just seen is the idea of types. Types are fundamental to all computing: a single piece of data of a given type is termed a scalar. It helps to know what data types there are because they are useful in different contexts. For example, a whole number such as 10 is known as an integer, and because it is an integer with a value, I can add to it or subtract from it.

In [8]:
number = 10
new_number = number - 1
print(f"The result of {number} - 1 is {new_number}")

The result of 10 - 1 is 9


Well, that's pretty well understood: we had a piece of data of type int (integer), and because it is in that form, and the computer knows it is in that form, I can work on it numerically.

You can determine what type a piece of data is in Python by using a function called `type()`. *You can tell it's a function because it has parentheses after it.*

Example : 

In [10]:
print(type(new_number))


<class 'int'>


### What happened
In the code example I passed `new_number`, which was made in the previous code block, as what is known as an argument to a function called `type()`.
It's the job of `type()` to look at the thing in the brackets and return what the computer sees its type as.
As an aside, the names that were used were arbitrary; I could have called them `bill` and `ben` instead of `number` and `new_number`. The result came from a mathematical operation on data of type int, and it was a new value stored in a space in memory that I called `new_number`.

For us to be able to see the result of `type(new_number)` I had to pass it to another function, `print()`, which happily puts whatever came out of `type()` onto the line below the cell. The output was `<class 'int'>`.

That means the data type of the binary value stored in the memory location that I called `new_number` is an int (don't worry about `class` for the moment).
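
As a small sketch of the same idea across several kinds of value, the cell below asks `type()` about four different literals. The `float` and `bool` types have not been introduced yet; they appear here only as illustration, and all variable names are made up for this example.

```python
whole = 10
fraction = 2.5
text = 'ten'
flag = True

# type() returns the type; __name__ gives just its short name
for value in (whole, fraction, text, flag):
    print(f"{value!r} is of type {type(value).__name__}")

# The returned type can be stored and compared like any other value
kind = type(whole)
print(kind is int)   # True
```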


### Ex-1 Working with data

In [1]:
from Tester import Tester
session = Tester()

session.run_test('001')

## Changing types
Python is a language that can dynamically infer what type of data it is being given from the nature of that data.
From the exercise you should be able to ask the interpreter what type it sees some data as. However, data in its original form may not suit our needs. For example, an integer value (int) is not a string (str), yet the `print()` function needs a string to print. Even so, run the next cell and see what happens:


In [53]:
# here's a number
number = 5
# prove it
print(type(number))
# print it
print(number)

# oh, it printed without complaint

<class 'int'>
5


What happened? Well, `print()` does indeed need a string, and `number` is not a string type. When `print()` ran, it asked `number` to represent itself as a string by calling a special method on `int` called `__str__`, and the response was a string version of `number`. All good.
Except where we have something like the following:

In [55]:
#get a number from the user
number = input("Give me a number")
number_to_add = 5
print(f"The type of number is {type(number)} and the type of number_to_add is {type(number_to_add)} lets try and print the sum")
print(number + number_to_add)

Give me a number3
The type of number is <class 'str'> and the type of number_to_add is <class 'int'> lets try and print the sum


TypeError: can only concatenate str (not "int") to str

What happened? The first thing to note is that the interpreter is complaining about the type with a `TypeError`, and the last line tells you what the actual problem is: it cannot concatenate a string and an integer. (Concatenation is a string operation in which two strings are joined end to end.) Because `number` is a string, `+` was treated as concatenation, and an integer cannot be joined onto a string.

We wanted to add them as numbers, so we need to bring both pieces of data into the same domain. This is a numeric operation, so both need to be numeric.

The process of changing a scalar from one type to another is known as type casting (also called type conversion). Let's see that last operation again with casting:

In [56]:
#get a number from the user
number = input("Give me a number")
number_to_add = 5
print(f"The type of number is {type(number)} and the type of number_to_add is {type(number_to_add)} we cant add this")
number = int(number)  # Change the type from str (that's what input() gives back) to int and assign it back to number
print(f"After the cast the type of number is {type(number)} and the type of number_to_add is {type(number_to_add)} let's add this")
print(number + number_to_add)

Give me a number4
The type of number is <class 'str'> and the type of number_to_add is <class 'int'> we cant add this
After the cast the type of number is <class 'int'> and the type of number_to_add is <class 'int'> let's add this
9


Great, that worked. But how did `print()` manage to print 9 when that is a number? Simple: because it is an `int`, it has a `__str__` method that returns its string representation.
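
Casting can fail when the text does not look like the target type. The following is a minimal sketch of guarding against that; `to_int` is a hypothetical helper written for this example, not a built-in, and `try`/`except` has not been covered yet.

```python
def to_int(text):
    """Return text converted to int, or None if it is not a whole number."""
    try:
        return int(text)
    except ValueError:
        # int() raises ValueError when the string is not a valid integer
        return None

print(to_int("4"))         # 4
print(to_int(" 12 "))      # 12 (surrounding spaces are ignored)
print(to_int("four"))      # None
print(to_int("3.5"))       # None (not a whole number)

# Other built-in conversions work the same way
print(float("3.5"))        # 3.5
print(str(9) + " lives")   # 9 lives
```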

Review the casting 

In [None]:
#TODO plumb in exercises

## Statements with Python

Statements in Python are units of code that perform some action, such as assigning a value to a variable, looping over a set of values, or calling a function. A statement is a complete instruction that the interpreter can execute.

In Python, a statement normally ends at the end of its line; no closing character is needed.
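
A short sketch of the kinds of statement described above. The names are arbitrary, and the last two cases (a statement spread over several lines inside brackets, and two statements separated by a semicolon) go slightly beyond the text and are included only for illustration.

```python
total = 0                      # an assignment statement

for value in [1, 2, 3]:        # a for statement; it owns the indented block
    total += value             # an augmented assignment statement

print(total)                   # a statement that calls a function; prints 6

# Inside brackets, one statement may run over several lines
long_sum = (1 + 2 + 3
            + 4 + 5)

# A semicolon can separate two statements on one line (legal, but rarely used)
a = 1; b = 2

print(long_sum, a + b)         # 15 3
```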


### Ex-2 Working with statements

In [None]:
session.run_test('002')

## Operators

In computer languages we use operators to operate on units of data. The standard operators, from a mathematical context, include:
- Brackets `()`
- Orders (powers) `**`
- Division `/`
- Multiplication `*`
- Addition `+`
- Subtraction `-`

However, this is only a small subset of the operators that are available. Additionally, as we will see shortly, some of these operators behave differently depending on the type of data they are working on.
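
As a quick sketch, the cell below tries each operator on integers and then shows `+` and `*` changing meaning for strings and lists. The `//` and `%` operators are not in the list above; they are included as two further examples of what is available.

```python
# Arithmetic on numbers
print((2 + 3) * 4)   # 20   brackets are evaluated first
print(2 ** 3)        # 8    orders (powers)
print(7 / 2)         # 3.5  division always gives a float
print(7 // 2)        # 3    floor division
print(7 % 2)         # 1    remainder
print(2 + 3 * 4)     # 14   multiplication happens before addition

# The same operators on other types
print(2 + 3)         # 5        numeric addition
print('2' + '3')     # 23       string concatenation
print('ab' * 3)      # ababab   string repetition
print([1, 2] + [3])  # [1, 2, 3] list concatenation
```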

Let's try the ones above first and see the results.

### Ex-3 Working with simple statements

In [2]:
session.run_test('003')


AttributeError: 'Tester' object has no attribute 'run_test'

## Type specific methods 
Each of the types we have looked at so far has its own methods (functions) that are appropriate to it. If we want to know what they are, we can examine the type using a function called `dir()`.

Run the next cell to see what is available for `int`.

In [52]:
dir(int)

['__abs__',
 '__add__',
 '__and__',
 '__bool__',
 '__ceil__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floor__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__le__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 '__xor__',
 'as_integer_ratio',
 'bit_length',
 'conjugate',
 'denominator',
 'from_bytes',
 'imag',
 'numerator',
 'real',
 'to_bytes']

What happened? `dir(int)` produced a list of names, some of which have leading and trailing double underscores, for example `__str__`. These are special methods, and we will not look at them here. However, when we print an integer, `__str__` is used to send back a string representation of it (more on strings in a moment).

The public names are those that do not start with an underscore, for example `as_integer_ratio`. To access them we need a data object of type `int`; if we put a `.` after it (the dot accessor) we can reach these public methods. Run the next cell.

In [16]:
number = 15
help(number.as_integer_ratio)

Help on built-in function as_integer_ratio:

as_integer_ratio() method of builtins.int instance
    Return integer ratio.
    
    Return a pair of integers, whose ratio is exactly equal to the original int
    and with a positive denominator.
    
    >>> (10).as_integer_ratio()
    (10, 1)
    >>> (-10).as_integer_ratio()
    (-10, 1)
    >>> (0).as_integer_ratio()
    (0, 1)



Notice that it returns a pair of values (a tuple) in the form (numerator, denominator). Floats have the same method; run the next cell, where `number` is the float 2.5.

In [17]:
number = 2.5
number.as_integer_ratio()

(5, 2)
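
The `dir()` output is long, so one way to cut it down to the public names is to filter out anything that starts with an underscore. This is only a sketch; the exact names printed depend on your Python version, and `public_int_names` is a name chosen for this example.

```python
# Keep only the names that do not start with an underscore
public_int_names = [name for name in dir(int) if not name.startswith('_')]
print(public_int_names)

# Use the dot accessor on an int object to call two of them
number = 15
print(number.bit_length())        # 4, because 15 is 1111 in binary
print(number.as_integer_ratio())  # (15, 1)
```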

Now let's try the same with a string (`str`). Use `dir()` and `help()` to find the public methods. Can you convert the string in the next cell to all lower-case letters?

In [18]:
test_string = 'IM ALL UPPER BUT HUMILITY SAYS I SHOULD BE ALL LOWER'
# find out about test_string using dir()

# use help() to find documentation on any of the public methods that may look like they could do the job

# test your assertions to see if you found the correct method to lower the capitals. 

# Bonus point, can you assign the result back to test_string run the next cell to test if you got it right


In [19]:
# Only if you did the bonus point 
if test_string.islower():
    print(f"Well done \n{test_string} is all lower")
    
else:
    print(f"not quite, there are one or more caps in {test_string}")

not quite, there are one or more caps in IM ALL UPPER BUT HUMILITY SAYS I SHOULD BE ALL LOWER


### Note on strings
Strings are a very widely used data type because they are a human-readable way of getting data in and out of the compute layer. As you will have seen, there is an extensive set of methods attached to the type. These are best explored when working with inputs and outputs.

## Working with Input and Output
As previously mentioned, the ability to put things into the compute layer and then get something meaningful out of the processing it does is pretty fundamental.

Broadly, people accept that text in and text out is the standard way of doing this. That is correct to an extent, but consider a TV remote control: it has compute, and its inputs are the buttons on it. Its output is an infra-red signal pattern sent to the TV, and the TV in turn has an infra-red input that it decodes into actions such as changing the channel.

For now, though, let's focus on the standard approach.

All languages have some method of input and output, the default being the keyboard for input and the screen for output. Python provides two functions for this:
`input()`
`print()`

Although we haven't covered functions in detail, the `input()` function reads what the user types and hands it back as a string, which can be assigned to a variable. Note that `input()` can accept a string in the brackets, which it uses as a prompt.

`print()` takes one or more values, converts them to strings, and writes them to the screen.

Run the next cell as an example

In [28]:
thing_the_user_types = input("You are the user, type something ")
print(f'''The data type passed to the variable thing_the_user_types is {type(thing_the_user_types)} \nand it's value which is in the memory location called thing_the_user_types is {thing_the_user_types}''')

You are the user, type something F
The data type passed to the variable thing_the_user_types is <class 'str'> 
and it's value which is in the memory location called thing_the_user_types is F
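
`print()` has a couple of optional settings that are handy when shaping output. The sketch below uses its `sep` and `end` keyword arguments; the values `name` and `age` are made up for the example, and `input()` is left out so the cell runs without a user.

```python
name = 'Ada'
age = 36

# Several values are separated by a space by default
print('Name:', name, 'Age:', age)              # Name: Ada Age: 36

# sep changes what goes between the values
print(name, age, sep=' | ')                    # Ada | 36

# end changes what is written after the last value (a newline by default)
print('no newline here', end='')
print(' ... so this continues the same line')

# An f-string builds one string from several values before printing
print(f"{name} will be {age + 1} next year")   # Ada will be 37 next year
```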


### Ex-4 Working with inputs and outputs

## Data Containers

Having looked at some of the data types, it's useful to review some of the containers that are available. In Python these are extremely feature rich and highly useful.
The container types we will review are:
- Lists
- Dictionaries
- Sets


## Lists
In Python a list is a collection of values stored as an ordered sequence. The nearest equivalent in other languages is the array. A Python list can contain values of any type that Python supports, and the types can be mixed within one list.
A list can be declared empty and then have items appended to it, or it can be created with items already in it.

Run the next cell.


In [32]:
empty_list = []
list_with_things = [10,'phil', 33.33, 'coffee']
another_way_of_making_list = list((100,10,2,5,6))

print(empty_list)

print(list_with_things)

print(another_way_of_making_list)

[]
[10, 'phil', 33.33, 'coffee']
[100, 10, 2, 5, 6]


### Useful things a list can do
As with the data types, lists have their own methods that make them useful.
Run the next cell.

In [33]:
dir(list_with_things)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

Among the list's public methods are those that give control over the container, for example:
- `append()` adds an item to the end of the list
- `clear()` removes every item from the list
- `extend()` appends the items of another list (or any iterable) to the list
- `insert()` puts a new item into the list at a given position
- `pop()` removes the last item by default and returns it (so it can be assigned to a variable)
- `sort()` sorts the items in place, and can be configured with `key` and `reverse`

A fuller list of functionality is available in the list.Cheatsheet notebook.

For now, the useful elements of this container are the ability to search, sort, and develop new structures.
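
The methods listed above can be sketched in a single cell. The list `drinks` and its contents are invented for this example; the comments show the state of the list after each step.

```python
drinks = ['tea', 'coffee']

drinks.append('water')             # ['tea', 'coffee', 'water']
drinks.insert(1, 'juice')          # ['tea', 'juice', 'coffee', 'water']
drinks.extend(['milk', 'cocoa'])   # adds both items to the end

last = drinks.pop()                # removes and returns the last item
print(last)                        # cocoa

drinks.sort()                      # sorts in place, alphabetically
print(drinks)                      # ['coffee', 'juice', 'milk', 'tea', 'water']

# Searching
print(drinks[0], drinks[-1])       # coffee water (first and last by position)
print(drinks.index('milk'))        # 2
print('tea' in drinks)             # True

# Developing a new structure: sorted() returns a new list and leaves drinks alone
by_length = sorted(drinks, key=len)
print(by_length)                   # ['tea', 'milk', 'juice', 'water', 'coffee']
```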


## A note on string as a collection
Just one point on the string type: a string is a collection of characters, and as such SOME list-like activities also work on it. To prove the point, look at the next cell, which loops through the elements of a string:

In [34]:
my_string = "This is my line of text"

for character in my_string:
    print(character)

T
h
i
s
 
i
s
 
m
y
 
l
i
n
e
 
o
f
 
t
e
x
t


As can be seen, each character is a separate element within the string. Note though, as we will see, that this does not mean a string can be changed in place as a list can.
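
A minimal sketch of that difference: reading a character by position works, assigning to a position raises a `TypeError`, and the usual workaround is to build a new string. The names `changed` and `letters` are only for this example.

```python
my_string = "This is my line of text"

# Reading by position works, just as with a list
print(my_string[0])     # T
print(len(my_string))   # 23

# Changing a character in place does not
try:
    my_string[0] = 't'
except TypeError as error:
    print(f"Cannot change a string in place: {error}")

# Instead, build a new string from pieces of the old one
changed = 't' + my_string[1:]
print(changed)          # this is my line of text

# Or convert to a list, change it, and join it back together
letters = list(my_string)
letters[0] = 't'
print(''.join(letters)) # this is my line of text
```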

## Dictionaries
Dictionaries in Python are known as hashmaps (or hash tables) in other languages. The concept is pretty simple: unlike the list type, where we reference elements by their position, in a dictionary we look up a value by the key that is assigned to it.
For this to work, the keys in a dictionary must be unique; assigning to a key that already exists replaces its value.
This means that a dictionary is made up of key:value pairs. Let's observe it:

In [43]:
#A dictionary type 
my_dictionary = {} #an empty dictionary is declared
my_dictionary['name'] = 'Phil' # an element with the key name is added with the value Phil
print(my_dictionary['name']) # the value that is in the dictionary element with the key name is printed


Phil


In [45]:
#review of methods
dir(my_dictionary)

['__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__ror__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

### More complex dictionary
The example above creates a simple empty dictionary and adds one element to it. We can put whatever Python supports into the value position, including lists or indeed other dictionaries. In either case, if you want to work with the content you need to be aware of the structures in it.


In [39]:
#complex dictionary:
my_complex_dictionary = {'first_names': ['joe', 'sarah', 'millie'], 'last_names': ['clark', 'bennet', 'richardson'], 'jobs': ['clerk', 'manager', 'tech lead']}

# iterating through the dictionary the first way
for key, value in my_complex_dictionary.items():
    print(key)
    for val in value:
        print(f"{val} ", end='')
    print('')


first_names
joe sarah millie 
last_names
clark bennet richardson 
jobs
clerk manager tech lead 


In [40]:
#iterating through the dictionary the second way
for key in my_complex_dictionary.keys():
    print(key)
    for val in my_complex_dictionary[key]:
        print(f"{val} ", end='')
    print('')

first_names
joe sarah millie 
last_names
clark bennet richardson 
jobs
clerk manager tech lead 


In [41]:
print(my_complex_dictionary.keys())
print(my_complex_dictionary.values())

dict_keys(['first_names', 'last_names', 'jobs'])
dict_values([['joe', 'sarah', 'millie'], ['clark', 'bennet', 'richardson'], ['clerk', 'manager', 'tech lead']])
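
A few more dictionary behaviours, sketched under made-up data: assigning to an existing key overwrites it, `get()` avoids an error for a missing key, and `zip()` (a built-in not covered above) lets us read across parallel lists held as values. The dictionaries `person` and `staff` are invented for this example.

```python
person = {'name': 'Phil'}
person['name'] = 'Philip'      # same key, so the old value is replaced
person['role'] = 'trainer'     # new key, so a new pair is added

print(person)                  # {'name': 'Philip', 'role': 'trainer'}
print(len(person))             # 2

# get() returns None (or a default you supply) when the key is missing
print(person.get('email'))                  # None
print(person.get('email', 'not recorded'))  # not recorded
print('role' in person)                     # True

# Reading across lists stored as values
staff = {'first_names': ['joe', 'sarah'], 'jobs': ['clerk', 'manager']}
for first_name, job in zip(staff['first_names'], staff['jobs']):
    print(f"{first_name} is a {job}")
```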


A fuller list of functionality is available in the dictionaries.Cheatsheet notebook. 

## Sets 
Sets are another useful container type. They behave like the circles of a Venn diagram: each set is a group of unique items, and comparing sets shows which items they share and which belong to only one.

Here is an example 

![image.png](attachment:image.png)

In [42]:
front_end = {'Arthur', 'Aoife', 'Peter', 'Maisy', 'John'}
back_end = {'Callum', 'Aoife', 'Peter', 'Maisy'}
dev_ops = {'John', 'Maisy'}


Notice that the syntax is similar to that of the dictionary container, but there are no key:value pairs, just individual elements in each set.

Determining whether sets have members in common, or members exclusive to one of them, is a function of the set container type. Let's review the methods:

In [46]:
dir(front_end)

['__and__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__iand__',
 '__init__',
 '__init_subclass__',
 '__ior__',
 '__isub__',
 '__iter__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__or__',
 '__rand__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__ror__',
 '__rsub__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__xor__',
 'add',
 'clear',
 'copy',
 'difference',
 'difference_update',
 'discard',
 'intersection',
 'intersection_update',
 'isdisjoint',
 'issubset',
 'issuperset',
 'pop',
 'remove',
 'symmetric_difference',
 'symmetric_difference_update',
 'union',
 'update']

In [47]:
# who's both a front end and a back end dev
front_end.intersection(back_end)

{'Aoife', 'Maisy', 'Peter'}

In [48]:
# who's a back end dev and dev ops
back_end.intersection(dev_ops)

{'Maisy'}

In [49]:
# who's front end or back end but not devops
front_end.union(back_end).difference(dev_ops)

{'Aoife', 'Arthur', 'Callum', 'Peter'}

In [50]:
# who is in front end or back end, but not both (dev_ops is not considered here)
front_end.symmetric_difference(back_end)

{'Arthur', 'Callum', 'John'}
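
The set methods also have operator forms, which some people find easier to read. The sketch below repeats the three sets so it runs on its own, and then, as an assumption about what "single domain" should mean, counts how many teams each person is in using `collections.Counter` from the standard library.

```python
from collections import Counter

front_end = {'Arthur', 'Aoife', 'Peter', 'Maisy', 'John'}
back_end = {'Callum', 'Aoife', 'Peter', 'Maisy'}
dev_ops = {'John', 'Maisy'}

both = front_end & back_end        # same as front_end.intersection(back_end)
either = front_end | back_end      # same as front_end.union(back_end)
front_only = front_end - back_end  # same as front_end.difference(back_end)
one_of_two = front_end ^ back_end  # same as front_end.symmetric_difference(back_end)

# sorted() gives a list in a predictable order for printing
print(sorted(both))         # ['Aoife', 'Maisy', 'Peter']
print(sorted(front_only))   # ['Arthur', 'John']
print(sorted(one_of_two))   # ['Arthur', 'Callum', 'John']

# Who is in all three teams?
all_three = front_end & back_end & dev_ops
print(all_three)            # {'Maisy'}

# Who is in exactly one team, taking all three teams into account?
membership = Counter()
for team in (front_end, back_end, dev_ops):
    membership.update(team)     # adds 1 for each name in the team
single_team = {name for name, count in membership.items() if count == 1}
print(sorted(single_team))  # ['Arthur', 'Callum']
```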

In [51]:
# Power hack 
# Given a list with duplicated data, how do I make a list of unique data?
# (Note: a set is unordered, so the original order is not kept)
start_list = ["Berlin", "London", "Frankfurt", "Paris", "Berlin", "Amsterdam", "Dusseldorf", "London"]
unique_list = list(set(start_list))

print(start_list)
print(unique_list)

['Berlin', 'London', 'Frankfurt', 'Paris', 'Berlin', 'Amsterdam', 'Dusseldorf', 'London']
['Paris', 'Frankfurt', 'Amsterdam', 'Berlin', 'London', 'Dusseldorf']
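
Notice that the unique list came back in a different order, because sets do not keep order. If the order matters, one alternative (a sketch, not the only way) is to use `dict.fromkeys()`, since dictionary keys are unique and dictionaries remember insertion order in Python 3.7 and later.

```python
start_list = ["Berlin", "London", "Frankfurt", "Paris", "Berlin", "Amsterdam", "Dusseldorf", "London"]

# Keys are unique and keep the order in which they were first seen
unique_in_order = list(dict.fromkeys(start_list))
print(unique_in_order)
# ['Berlin', 'London', 'Frankfurt', 'Paris', 'Amsterdam', 'Dusseldorf']

# If alphabetical order is what you want, sort the set instead
unique_sorted = sorted(set(start_list))
print(unique_sorted)
# ['Amsterdam', 'Berlin', 'Dusseldorf', 'Frankfurt', 'London', 'Paris']
```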
