# Introduction to Python 3: The Basic Data Types
## Luca de Alfaro
Copyright Luca de Alfaro, 2018-21.  CC-BY-NC License.



Prepared on: Fri Jul 30 17:08:01 2021

This is a book chapter; it is not a homework assignment.  
Do not submit it as a solution to a homework assignment; you would receive no credit.


## Integers, floats, strings, booleans, and ... None!

In Python, numbers can be integers or floats.

In [1]:
x = 1 # int
y = 1. # float


You can add, subtract, and multiply numbers; the result is an integer if and only if both operands are integers.

In [2]:
print(x + y)
print(x + 1)


2.0
2
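
If you want to check this rule for yourself, a small sketch like the following prints the type of each result. The variable names `int_sum`, `mixed_sum`, and `mixed_product` are mine, chosen only for illustration.

```python
x = 1    # int
y = 1.   # float

int_sum = x + 1        # int + int
mixed_sum = x + y      # int + float
mixed_product = x * y  # int * float

# An int result appears only when both operands are ints.
print(type(int_sum))
print(type(mixed_sum))
print(type(mixed_product))
```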


In Python 3, division between integers generates a float.
This remedies a long-standing "trap" in Python 2, where 1 / 2 = 0, because division between integers returned an integer: the quotient. 

In [3]:
x/2


0.5

In Python 3, integer division (quotient) is written `//`, and remainder is written `%`.

In [4]:
print(7 // 3)
print(7 % 3)


2
1
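
Quotient and remainder fit together: multiplying the quotient by the divisor and adding the remainder gives back the original number. The sketch below checks this, and also shows the built-in `divmod`, which returns both at once. The variable names are illustrative. Note that `//` rounds towards minus infinity, which matters for negative numbers.

```python
a, b = 7, 3

q = a // b   # quotient
r = a % b    # remainder

# The two operators are tied together by this identity.
print(a == q * b + r)

# divmod returns the pair (quotient, remainder) in one call.
q2, r2 = divmod(a, b)
print(q2, r2)

# With a negative operand, // rounds down (towards minus infinity),
# and % takes the sign of the divisor.
neg_q = -7 // 3
neg_r = -7 % 3
print(neg_q, neg_r)
```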


There are also strings in Python.  They can be delimited with either `"` or `'`.

In [5]:
s = 'A string'
t = "It's nice to be able to choose the delimiters"
print(t)


It's nice to be able to choose the delimiters


Another basic data type is the boolean.  Its two values are `True` and `False`.


In [6]:
b = True
print(not b)


False


Relational operators, obviously enough, return a boolean result:

In [7]:
print(4 < 8)
print(4 == 8)


True
False
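
Since comparisons produce booleans, you can store their results in variables and combine them with `and`, `or`, and `not`. Here is a small sketch; the variable `age` and the thresholds are made up for illustration.

```python
age = 25  # hypothetical value, just for the example

is_adult = age >= 18
in_range = 18 <= age < 65   # comparisons can be chained
different = age != 30

print(is_adult)
print(in_range)
print(is_adult and not different)
print(is_adult or different)
```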


There's a special value in Python that means "no value".  It's called `None`.

In [8]:
c = None
print(c)


None


I know it seems funny to have a value for denoting no value, but it turns out 
to be incredibly useful.  For instance, it is very often the case that you 
want to return some result, and have the option to say that there is no 
new result. None is also a value that is very commonly used to denote that
an option is not used, or that a variable has not been initialized. 
We will see more examples of it later.  
In programming languages, None is more! 
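
To make the "no result" use concrete, here is a minimal sketch. The function `find_first_negative` is hypothetical, invented for this example; it returns `None` when there is nothing to return. The usual way to test for `None` is with `is`, rather than `==`.

```python
def find_first_negative(numbers):
    """Return the first negative number in the list, or None if there is none."""
    for n in numbers:
        if n < 0:
            return n
    return None

result = find_first_negative([3, 5, 8])
if result is None:
    print("No negative number found")

found = find_first_negative([3, -2, 8])
print(found)
```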

The operators `+`, `-`, `*`, and `/` can also be combined with assignment, using the following shorthand:

In [9]:
x = 2
x = x + 1
x += 1 # Same as above
print(x)
x *= 3 # Do I need to explain this?
print(x)


4
12
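
The same shorthand exists for the other arithmetic operators. The sketch below, with made-up values, shows a few of them; note that `/=` produces a float, just like `/` does.

```python
x = 12
x -= 2    # same as x = x - 2
print(x)
x //= 3   # integer division, the result stays an int
print(x)
x /= 2    # true division, the result becomes a float
print(x)
```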


## Strings

Strings can be built using either `'` or `"` as delimiters.
If you use `'`, the string can contain `"` inside, and vice versa.

In [10]:
s = 'A string'
t = "It's nice to be able to choose the delimiters"


You can use `+` to concatenate strings:

In [11]:
print(s + " " + t)


A string It's nice to be able to choose the delimiters


You can split a string according to spaces:

In [12]:
l = t.split()
print(l)


["It's", 'nice', 'to', 'be', 'able', 'to', 'choose', 'the', 'delimiters']


Or you can split it according to any character:

In [13]:
t.split('a')


["It's nice to be ", 'ble to choose the delimiters']
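
One subtlety is worth a quick sketch: `split()` with no argument splits on runs of whitespace, while `split(' ')` splits on every single space, so consecutive spaces produce empty strings. You can also pass a second argument to limit the number of splits. The strings below are made up for illustration.

```python
spaced = "a  b   c"   # two spaces, then three spaces

print(spaced.split())      # runs of whitespace count as one separator
print(spaced.split(' '))   # every single space is a separator

# The optional second argument limits the number of splits performed.
pair = "key=value=more".split('=', 1)
print(pair)
```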

You can also put back together a string you have split, using `.join()`. Yes, it's weird; had I invented the `.join` operation, I would have defined it as an operation of lists (rather than strings), so that one would write `l.join(' ')` rather than `' '.join(l)`.  But once you learn it, you get used to it.

In [14]:
' '.join(l)


"It's nice to be able to choose the delimiters"

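The string on which you call `.join()` is the separator, so you can join with anything you like. One thing to keep in mind is that the items being joined must all be strings. Here is a small sketch with made-up lists:

```python
words = ['nice', 'to', 'choose']

dashed = '-'.join(words)
print(dashed)

# join only accepts strings, so numbers must be converted first.
numbers = [1, 2, 3]
listed = ', '.join(str(n) for n in numbers)
print(listed)

# Splitting and joining on the same separator gives back the original string.
print('-'.join(dashed.split('-')) == dashed)
```
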
A string can also be addressed as if it were a list of its characters, using indexing and slicing:

In [15]:
t[10:]


'to be able to choose the delimiters'
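
Here is a short sketch of a few more indexing and slicing forms on the same string. Negative indices count from the end of the string.

```python
t = "It's nice to be able to choose the delimiters"

first = t[0]        # first character
last = t[-1]        # last character
prefix = t[:4]      # from the start up to, but not including, index 4
word = t[5:9]       # characters at indices 5, 6, 7, 8
suffix = t[-10:]    # the last 10 characters

print(first, last)
print(prefix)
print(word)
print(suffix)
```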

### Unicode and bytes

We have already seen strings.  Strings in Python 3 are not merely sequences of bytes.  A single byte can encode only one of 256 characters, and there are many more than 256 characters in the world's languages. To allow people to write in their native languages, in Python 3 strings consist of Unicode characters.

In [16]:
s = "Ouvrez la fenêtre, s'il vous plaît"
print(s)
print(type(s))


Ouvrez la fenêtre, s'il vous plaît
<class 'str'>


If you have a Unicode string, you can "encode" it into a byte sequence, that is, a `bytes` object.  In UTF-8, ASCII characters are encoded as a single byte each, while the funny (non-ASCII) characters become sequences of several bytes.  Let's try it: 

In [17]:
bs = s.encode('utf8')
print(type(bs))


<class 'bytes'>


The `'utf8'` above specifies the 'encoding', that is, the way in which the
funny characters are encoded into bytes.  I will talk more about this later; for now, let's see what happened:

In [18]:
print(bs)


b"Ouvrez la fen\xc3\xaatre, s'il vous pla\xc3\xaet"


You see the beauty and power of Python 3?  How it turns our
request `"Ouvrez la fenêtre, s'il vous plaît"` into a beautiful
byte string?  Ah the beauty of computer science!  Oh the pinnacles of 
accomplishment reached after centuries of striving! 
You can go back from byte strings (denoted by the little `'b'`) 
to unicode:

In [19]:
ut = bs.decode('utf8')
print(ut)


Ouvrez la fenêtre, s'il vous plaît
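
One way to convince yourself that a string and its encoding are different objects is to compare their lengths. In the sketch below, the string has two non-ASCII characters, each of which takes two bytes in UTF-8, so the byte sequence is two elements longer than the string.

```python
s = "Ouvrez la fenêtre, s'il vous plaît"
bs = s.encode('utf8')

# len counts characters for a str, and bytes for a bytes object.
print(len(s))
print(len(bs))

# Encoding followed by decoding gives back the original string.
print(bs.decode('utf8') == s)
```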


Ok.  So basically, the same thing has two representations: one as a _plain_ Unicode string, and one in encoded form as a byte sequence. If you know the UTF-8 byte sequence for a character, you can write those bytes directly, and decode them to get the character in a string:

In [20]:
bv = b"Sleep well \xe2\x9d\xa4"
print(bv.decode('utf8'))


Sleep well ❤


You see? The little heart was obtained by decoding the hex sequence
of bytes e2, 9d, a4 into the graphical symbol corresponding to 
that sequence, which happens to be a heart.  Cute eh? 
One third of computer science is devoted to sending cute emojis to 
loved ones.  And this is natural and good, as humans are social beings.
Don't ask about the other two thirds. 
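
It may help to separate two numbers that are easy to confuse: the Unicode code point of a character, and the bytes that encode it in UTF-8. The heart is code point U+2764, and the three bytes e2, 9d, a4 are its UTF-8 encoding. A small sketch, using the built-ins `ord` and `chr`:

```python
heart = "\u2764"   # the heart, written via its Unicode code point

# ord gives the code point of a character; chr goes the other way.
code_point = ord(heart)
print(hex(code_point))
print(chr(0x2764) == heart)

# The UTF-8 encoding of that single character is three bytes long.
encoded = heart.encode('utf8')
print(encoded)
print(len(heart), len(encoded))
```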

Now, let's take a look at what we have done. 
We can print the types of expressions: 

In [21]:
print(type(s))
print(type(bs))


<class 'str'>
<class 'bytes'>


Now, this is all great.  Except for one thing.  Recall the `'utf8'` argument in:

In [22]:
print("I love you ❤".encode('utf8'))


b'I love you \xe2\x9d\xa4'


What is that utf8?  It is the table of correspondence between byte sequences
and symbols, the table associating in this case the little heart with 
the byte sequence e2, 9d, a4.  The problem is that there is MORE than one
table of correspondence.  The idea of having more than one encoding strikes me
as downright asinine.  If one standard is good, then two standards are better, right?
Well, a mission to Mars was lost due to imperial to SI unit conversion, so apparently not.
The trouble with having more than one such correspondence is that the choice matters:

In [23]:
print(s.encode('iso-8859-1'))
print(s.encode('utf-8'))


b"Ouvrez la fen\xeatre, s'il vous pla\xeet"
b"Ouvrez la fen\xc3\xaatre, s'il vous pla\xc3\xaet"


So you need to know which encoding is used.  And the BIG trouble is that on the
internet, when somebody sends you a message, they often don't tell you which 
encoding they are using, or often they simply lie or get it wrong. 
It's one of those ideas that's supposed to be great in theory, but is 
bad in practice.  So the sad truth is that you typically hope that
whoever sends you bytes (because bytes is all that can be sent over a wire)
either uses utf8, or tells you honestly the encoding. 
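
To see what goes wrong when the encoding is misidentified, here is a minimal sketch. Decoding with the wrong table either silently produces garbage or raises an error, depending on the bytes. The `errors='replace'` option of `decode` is one way to avoid the exception, at the price of losing the undecodable characters.

```python
original = "Ouvrez la fenêtre"

utf8_bytes = original.encode('utf-8')
latin1_bytes = original.encode('iso-8859-1')

# Decoding UTF-8 bytes as ISO-8859-1 "works", but yields garbage.
garbled = utf8_bytes.decode('iso-8859-1')
print(garbled)

# Decoding ISO-8859-1 bytes as UTF-8 fails: the byte \xea is not
# followed by the continuation bytes that UTF-8 requires.
try:
    latin1_bytes.decode('utf-8')
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
lenient = latin1_bytes.decode('utf-8', errors='replace')
print(lenient)
```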

Oh, the default in Python is utf-8, so you can omit it: 

In [24]:
bv.decode()


'Sleep well ❤'

### More on bytes

The implementation of bytes in Python 3 suffers from some truly unfortunate
problems.  For instance, you can slice bytes just fine: 

In [25]:
bs[3:8]


b'rez l'

But if you ask for a single byte, you get -- surprise! -- an integer!

In [26]:
bs[3]


114

This is a truly poor design choice: for strings, indexing gives a one-character string, so `s[3]` and `s[3:4]` have the same type, whereas for bytes `bs[3]` is an integer while `bs[3:4]` is a bytes object.
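
If you need a single byte as a bytes object rather than as an integer, a one-element slice does the job. The sketch below shows this on a short byte string, along with conversions in both directions.

```python
bs = b"Ouvrez"

as_int = bs[3]        # indexing gives an int
as_bytes = bs[3:4]    # a one-element slice gives bytes

print(as_int)
print(as_bytes)

# From an int back to a one-byte bytes object, or to a character.
print(bytes([as_int]) == as_bytes)
print(chr(as_int))

# Iterating over bytes also yields ints.
values = list(bs[:3])
print(values)
```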