# Introduction to Python 3: The Basic Data Types
## Luca de Alfaro
Copyright Luca de Alfaro, 2018-21.  CC-BY-NC License.



Prepared on: Tue Aug  3 11:57:15 2021

This is a book chapter; it is not a homework assignment.  
Do not submit it as a solution to a homework assignment; you would receive no credit.


## Integers, floats, strings, booleans, and ... None!

In Python, there are numbers, which can be integers or floats. 

In [1]:
x = 1 # int
y = 1. # float


You can add, subtract, and multiply numbers; the result is an integer if and only if both operands are integers. 

In [2]:
print(x + y)
print(x + 1)


2.0
2
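To check the rule above, you can ask Python for the type of each result with the built-in `type()`. A small sketch (the variables are the same `x` and `y` as above):

```python
x = 1    # int
y = 1.   # float

print(type(x + 1))   # <class 'int'>: int + int
print(type(x * 3))   # <class 'int'>: int * int
print(type(x + y))   # <class 'float'>: int + float
print(type(x - y))   # <class 'float'>: int - float
```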


In Python 3, division between integers generates a float.
This remedies a long-standing "trap" in Python 2, where `1 / 2` evaluated to `0`, because division between integers returned an integer: the quotient. 

In [3]:
x / 2


0.5
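Note that `/` produces a float even when the division is exact; if you want an integer result, use the integer division operator introduced next. A quick sketch:

```python
print(6 / 3)         # 2.0: a float, even though 3 divides 6 exactly
print(type(6 / 3))   # <class 'float'>
print(6 // 3)        # 2: integer division gives an int
```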

In Python 3, integer division (the quotient) is written `//`, and the remainder is written `%`.

In [4]:
print(7 // 3)
print(7 % 3)


2
1
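One detail worth knowing: `//` rounds toward minus infinity (it is *floor* division), and `%` is defined so that `(a // b) * b + a % b == a` always holds. With a negative operand, this gives results that may surprise you if you expect rounding toward zero. The built-in `divmod()` returns both the quotient and the remainder at once. A sketch:

```python
print(-7 // 3)        # -3: the floor of -2.33..., not -2
print(-7 % 3)         # 2: the remainder has the sign of the divisor
print(divmod(7, 3))   # (2, 1)
print(divmod(-7, 3))  # (-3, 2)

# The identity that ties // and % together:
a, b = -7, 3
print((a // b) * b + a % b == a)  # True
```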


There are also strings in Python.  They can be delimited with either `"` or `'`.

In [5]:
s = 'A string'
t = "It's nice to be able to choose the delimiters"
print(t)


It's nice to be able to choose the delimiters


Another basic data type is the boolean, which has two values: `True` and `False`.


In [6]:
b = True
print(not b)


False


Relational operators, obviously enough, have boolean results:

In [7]:
print(4 < 8)
print(4 == 8)


True
False
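Booleans are combined with the operators `not` (used above), `and`, and `or`; this is how relational tests are usually put together. A short sketch:

```python
x = 5
print(x > 0 and x < 10)   # True: both conditions hold
print(x < 0 or x == 5)    # True: the second condition holds
print(not (x == 5))       # False
```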


There's a special value in Python that means "no value".  It's called `None`. 

In [8]:
c = None
print(c)


None


I know it seems funny to have a value that denotes no value, but it turns out 
to be incredibly useful.  For instance, a function very often needs to return 
a result, but also needs a way to say that there is no result. 
`None` is also very commonly used to denote that an option is not used, or 
that a variable has not been initialized. 
We will see more examples of it later.  
In programming languages, None is more! 
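Here is a minimal sketch of the "there may be no result" use of `None`. The function `first_negative` is a hypothetical example, not something built into Python: it returns the first negative number in a list, or `None` if there is none. The idiomatic way to test for `None` is `is None`:

```python
def first_negative(numbers):
    """Returns the first negative number in numbers, or None if there is none."""
    for n in numbers:
        if n < 0:
            return n
    return None  # No negative number was found.

r = first_negative([3, 1, 4])
if r is None:
    print("No negative numbers")
else:
    print("Found", r)

print(first_negative([3, -1, 4]))  # -1
```

A function that ends without executing a `return` statement returns `None` as well, so the final `return None` could be omitted; writing it out just makes the intent explicit.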

The operators `+`, `-`, `*`, and `/` can also be used in the following shorthand form:

In [9]:
x = 2
x = x + 1
x += 1 # Same as above
print(x)
x *= 3 # Do I need to explain this?
print(x)


4
12
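The same shorthand works for `//` and `%` too, and `+=` also works on strings, where it concatenates. Note that `/=` turns an integer into a float, exactly as `/` does. A sketch:

```python
x = 12
x -= 2    # x = x - 2   -> 10
x //= 3   # x = x // 3  -> 3
x %= 2    # x = x % 2   -> 1
x /= 2    # x = x / 2   -> 0.5 (now a float)
print(x)  # 0.5

greeting = "Hello"
greeting += ", world"   # += concatenates strings
print(greeting)         # Hello, world
```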


## Strings

Strings can be built using either `'` or `"` as delimiters.
If you use `'`, the string can contain `"` inside, and vice versa.

In [10]:
s = 'A string'
t = "It's nice to be able to choose the delimiters"
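Going the other way round, double quotes can appear freely inside a single-quoted string. The delimiters are not part of the string itself, so `'abc'` and `"abc"` are the very same string. A small sketch:

```python
a = 'She said "hello"'   # double quotes inside a single-quoted string
b = "It's here"          # a single quote inside a double-quoted string
print(a)
print(b)
print('abc' == "abc")    # True: the delimiter is not part of the string
```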


You can use `+` to concatenate strings:

In [11]:
print(s + " " + t)


A string It's nice to be able to choose the delimiters
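Note that `+` concatenates strings only with other strings: adding a number to a string raises a `TypeError`. You need to convert the number to a string first, with `str()`. A sketch:

```python
n = 3
try:
    msg = "I had " + n + " coffees"   # str + int is not allowed
except TypeError as e:
    print("TypeError:", e)

msg = "I had " + str(n) + " coffees"  # convert the number explicitly
print(msg)  # I had 3 coffees
```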


You can split a string on whitespace:

In [12]:
l = t.split()
print(l)


["It's", 'nice', 'to', 'be', 'able', 'to', 'choose', 'the', 'delimiters']


Or you can split it on any separator, such as a single character:

In [13]:
t.split('a')


["It's nice to be ", 'ble to choose the delimiters']
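There is a subtle difference between `split()` with no argument and `split(' ')`. Without an argument, any run of consecutive whitespace (spaces, tabs, newlines) counts as a single separator. With an explicit separator, every occurrence counts, so two consecutive separators produce an empty string between them. A sketch:

```python
print("a  b".split())      # ['a', 'b']
print("a  b".split(' '))   # ['a', '', 'b']
print("a\tb c".split())    # ['a', 'b', 'c']: tabs count as whitespace too
```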

You can also put a split string back together using `.join()`. Yes, it's weird: had I invented the `.join` operation, I would have defined it as an operation on lists (rather than strings), so that one would write `l.join(' ')` rather than `' '.join(l)`.  But once you learn it, you get used to it. 

In [14]:
' '.join(l)


"It's nice to be able to choose the delimiters"
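The string on which you call `join()` is the separator, and it can be anything, including the empty string. Joining and then splitting with the same separator gets you back the original list. A sketch:

```python
colors = ['red', 'green', 'blue']
print(', '.join(colors))   # red, green, blue
print('-'.join(colors))    # red-green-blue
print(''.join(colors))     # redgreenblue

# join() and split() with the same separator undo each other:
joined = ', '.join(colors)
print(joined.split(', ') == colors)  # True
```

All the elements of the list must be strings: `' '.join([1, 2])` raises a `TypeError`.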

A string can also be addressed as if it were a list of its characters, using indexing and slicing:

In [15]:
t[10:]


'to be able to choose the delimiters'

In [16]:
t[5:10]


'nice '

In [17]:
t[100:]


''
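You can also index a single character, and negative indices count from the end. Two differences from slicing are worth noting: indexing past the end raises an `IndexError` (while slicing past the end just returns `''`), and strings are immutable, so you cannot assign to one of their characters. A sketch:

```python
t = "It's nice to be able to choose the delimiters"
print(t[0])     # I: the first character
print(t[-1])    # s: the last character
print(t[-10:])  # delimiters: the last 10 characters
print(t[:4])    # It's: the first 4 characters

# Slicing past the end returns '', but indexing past the end is an error:
try:
    t[100]
except IndexError as e:
    print("IndexError:", e)

# Strings are immutable: you cannot assign to a character.
try:
    t[0] = 'i'
except TypeError as e:
    print("TypeError:", e)
```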

### Unicode and bytes

Strings in Python represent sequences of _characters_, that is, symbols in some language.  Here's a string that corresponds to the word "coffee" in Italian. 

In [18]:
s = "Caffé"
print(s)
print(type(s))


Caffé
<class 'str'>


If you have a string, you can "encode" it into a sequence of bytes.  In the UTF-8 encoding used below, "usual" characters (e.g., the unaccented Latin letters) are represented with just one byte, but accented letters and many other symbols are represented using more than one byte: 

In [19]:
bs = s.encode('utf-8')
print(bs)
print(type(bs))


b'Caff\xc3\xa9'
<class 'bytes'>


The `'utf-8'` above specifies the _encoding_, that is, the way in which
characters are encoded into bytes.  
Note that "Caffé" is 5 characters long, while its byte encoding is 6 bytes long, as it takes two bytes to encode the accented e: 

In [20]:
print(s, "length:", len(s))
print(bs, "length:", len(bs))


Caffé length: 5
b'Caff\xc3\xa9' length: 6
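To see where the extra bytes come from, you can encode each character separately. In UTF-8, a character takes between 1 and 4 bytes; here the accented e takes 2 and the heart symbol (which reappears below) takes 3. A sketch:

```python
text = "Caffé ❤"
for c in text:
    # Prints each character and the number of bytes of its UTF-8 encoding.
    print(repr(c), len(c.encode('utf-8')))

print(len(text))                   # 7 characters
print(len(text.encode('utf-8')))   # 10 bytes
```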


To obtain the string back from the byte sequence, you _decode_ it: 

In [21]:
ut = bs.decode('utf-8')
print(ut)


Caffé


This is all good.  The problem is that _there are multiple encodings_!  You see that `'utf-8'`?  That's the most common encoding, but there are others.  And if you decode a byte sequence with the wrong encoding, you get gibberish:  

In [22]:
bs.decode('iso-8859-1')


'CaffÃ©'
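Gibberish is not the only possible outcome: some byte sequences are not valid UTF-8 at all, and decoding them raises a `UnicodeDecodeError`. For instance, the ISO-8859-1 encoding represents é as the single byte `e9`, which is not valid on its own in UTF-8. The `errors` argument of `decode()` lets you substitute the replacement character U+FFFD instead of failing. A sketch:

```python
latin = "Caffé".encode('iso-8859-1')
print(latin)   # b'Caff\xe9'

try:
    latin.decode('utf-8')
except UnicodeDecodeError as e:
    print("UnicodeDecodeError:", e)

print(latin.decode('utf-8', errors='replace'))  # U+FFFD replaces the bad byte
print(latin.decode('iso-8859-1'))               # Caffé: the right encoding works
```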

The problem is that a byte sequence does not contain a description of its encoding.  So when you receive a byte sequence over a network link, or read it from a file, you have to hope you know which encoding was used to produce it.  

If you know the UTF-8 bytes that encode a character, you can write them in a byte sequence, and decode it to get the character in a string:

In [23]:
bv = b"Sleep well \xe2\x9d\xa4"
print(bv.decode('utf-8'))


Sleep well ❤
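The bytes `e2 9d a4` are the UTF-8 encoding of the Unicode code point U+2764. Python also lets you work with code points directly: via the `\u` escape in string literals, and via the built-ins `chr()` (code point to character) and `ord()` (character to code point). A sketch:

```python
heart = "\u2764"                # the character with code point U+2764
print(heart)                    # ❤
print(chr(0x2764) == heart)     # True
print(hex(ord(heart)))          # 0x2764
print(heart.encode('utf-8'))    # b'\xe2\x9d\xa4': the same bytes used above
```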


Above, the little heart was obtained by decoding the hex sequence
of bytes e2, 9d, a4 into the graphical symbol corresponding to 
that sequence, which happens to be a heart.  Cute, eh? 
One third of computer science is devoted to sending cute emojis to 
loved ones.  And this is natural and good, as humans are social beings.
Don't ask about the other two thirds. 

The default encoding for `decode()` (and for `encode()`) is `utf-8`, so you can omit it: 

In [24]:
bv.decode()


'Sleep well ❤'
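Since `str.encode()` also defaults to UTF-8, a round trip can be written without naming the encoding at all. A sketch:

```python
s = "Caffé"
bs = s.encode()            # same as s.encode('utf-8')
print(bs)                  # b'Caff\xc3\xa9'
print(bs.decode() == s)    # True: decode() defaults to UTF-8 too
```

This default applies to these two methods; other functions, such as `open()` for text files, may use a platform-dependent default, so there it is safer to pass the encoding explicitly.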