# Strings and Operations

Strings in Python are created with single or double quotes:

## Overview

In [7]:
message = "Do you like Batman?"
response = 'Yes'

Python has many extremely useful string functions and methods; here are a few of them:

In [8]:
# length of string
len(response)

3

In [9]:
# concatenation with +
message + " " + response

'Do you like Batman? Yes'

In [10]:
# multiplication is multiple concatenation
"Batman! " + "Na"*30

'Batman! NaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNaNa'

### String formatting

Strings can be formatted in several ways in Python. Here is a summary:

| Formatting option | Description                      | Comments                            |
|-------------------|----------------------------------|-------------------------------------|
| % operator        | "Old Style" String Formatting    | Discouraged in new code             |
| str.format()      | "New Style" String Formatting    | Recommended in Python 3.5 and below |
| f-strings         | String Interpolation / f-Strings | Recommended in Python 3.6+          |

In [11]:
name = "Daniel"

### "Old Style" String Formatting

In [12]:
# check the % operator
"My name is %s" % name

'My name is Daniel'

### "New Style" String Formatting

In [13]:
# check the str.format option
"My name is {}".format(name)

'My name is Daniel'

In [14]:
"My name is {name} and I live in {place}".format(name=name, place="Madrid")

'My name is Daniel and I live in Madrid'

### String Interpolation / f-Strings

In [15]:
# check the f-strings option
f"My name is {name}"

'My name is Daniel'

> ***EXAMPLE 1***
>
> Compute $A=\frac{3^5-4}{2\pi}$ and $B=4A$, then use an f-string to print both results within the following sentence:
>
> $$\text{The result of A is --- and the result of B is ---}$$


In [22]:
import math

A = (3**5-4)/(2*math.pi)
B = 4*A

print(f"The result of A is {round(A, ndigits=2)} and the result of B is {round(B, ndigits=2)}")

The result of A is 38.04 and the result of B is 152.15
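As an alternative to calling `round`, an f-string can carry a format specifier after a colon. The sketch below repeats the calculation from Example 1 and formats both values with `.2f`; the variable name `sentence` is only illustrative.

```python
import math

A = (3**5 - 4) / (2 * math.pi)
B = 4 * A

# ".2f" means fixed-point notation with two digits after the decimal point
sentence = f"The result of A is {A:.2f} and the result of B is {B:.2f}"
print(sentence)
```

Unlike `round`, the format specifier always prints exactly two decimals, so a value such as `2.5` would be shown as `2.50`.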


In [20]:
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.



## Slicing strings

To slice a string, we use square brackets `[]`, keeping in mind that Python uses zero-based indexing (the first character is at position 0).

| Slicing      | Outcome                                                                                           |
|--------------|---------------------------------------------------------------------------------------------------|
| `str[0]`     | First character                                                                                   |
| `str[-1]`    | Last character                                                                                    |
| `str[n:]`    | Substring from index `n` (included) to the end                                                    |
| `str[:m]`    | Substring from the beginning up to index `m` (not included)                                       |
| `str[n:m]`   | Substring from index `n` (included) up to index `m` (not included)                                |
| `str[n:m:h]` | Every `h`-th character from index `n` (included) up to index `m` (not included)                   |

In [1]:
# In Python, all sequences are indexed starting at 0 (zero-based indexing)

"philippe"[0]

'p'

In [2]:
# A negative index counts positions from the end of the string

"philippe"[-3]

'p'

In [3]:
"philippe"[-1]

'e'

In [4]:
# Slicing in Python is done with ":"

"daniel"[:3]  # This means "from the beginning (position 0) up to position 3 (not included)"

'dan'

In [5]:
"daniel"[-3:] # This means "from the third-to-last position to the end"

'iel'

In [28]:
"daniel"[1:6:2] # This means "from position 1 up to position 6 (not included), in steps of two"

'ail'

> ***EXAMPLE 2***
>
> ```
> name = "Daniel" (6 characters)
>
> -> name[4:6]
> -> name[1:10]
> -> name[-3:-1]
> -> name[-3:1]
> -> name[1:6:2]
> -> name[1::2]
> -> name[-1:-6:-2]
> -> name[::2]
> -> name[::-1]
> ```
 

In [7]:
name = "Daniel"
print(name[4:6])
print(name[1:10])
print(name[-3:-1])
print(name[-3:1])
print(name[1:6:2])
print(name[1::2])
print(name[-1:-6:-2])
print(name[::2])
print(name[::-1])

el
aniel
ie

ail
ail
lia
Dne
leinaD
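Two results in Example 2 deserve a closer look: `name[1:10]` runs past the end of the string, and `name[-3:1]` returns an empty string. The following sketch contrasts this forgiving behaviour of slices with plain indexing; the variable names `clamped`, `empty` and `message` are illustrative only.

```python
name = "Daniel"

# A slice that runs past the end is silently clamped to the string length
clamped = name[1:10]
print(clamped)

# A slice whose start lies after its stop (with a positive step) is empty
empty = name[-3:1]
print(repr(empty))

# Indexing a single position outside the string raises an error instead
try:
    name[10]
except IndexError as error:
    message = str(error)
    print(message)
```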


## Built-In Methods

Python provides many powerful built-in string methods that make our life easier:

| Method           | Description                                                                                   |
|------------------|-----------------------------------------------------------------------------------------------|
| `str.capitalize` | Converts the first character to upper case                                                    |
| `str.center`     | Returns a centered string                                                                     |
| `str.count`      | Returns the number of times a specified value occurs in a string                              |
| `str.endswith`   | Returns True if the string ends with the specified value                                      |
| `str.find`       | Searches the string for a specified value and returns its position, or -1 if not found        |
| `str.isalnum`    | Returns True if all characters in the string are alphanumeric                                 |
| `str.isalpha`    | Returns True if all characters in the string are in the alphabet                              |
| `str.isascii`    | Returns True if all characters in the string are ascii characters                             |
| `str.isdecimal`  | Returns True if all characters in the string are decimals                                     |
| `str.isdigit`    | Returns True if all characters in the string are digits                                       |
| `str.islower`    | Returns True if all characters in the string are lower case                                   |
| `str.isnumeric`  | Returns True if all characters in the string are numeric                                      |
| `str.isspace`    | Returns True if all characters in the string are whitespaces                                  |
| `str.istitle`    | Returns True if the string follows the rules of a title                                       |
| `str.isupper`    | Returns True if all characters in the string are upper case                                   |
| `str.join`       | Joins the elements of an iterable into a string, using the string as separator                |
| `str.ljust`      | Returns a left justified version of the string                                                |
| `str.lower`      | Converts a string into lower case                                                             |
| `str.lstrip`     | Returns a left trim version of the string                                                     |
| `str.replace`    | Returns a string where a specified value is replaced with another value                       |
| `str.rfind`      | Searches the string for a specified value and returns the last position of where it was found |
| `str.rjust`      | Returns a right justified version of the string                                               |
| `str.rstrip`     | Returns a right trim version of the string                                                    |
| `str.split`      | Splits the string at the specified separator, and returns a list                              |
| `str.splitlines` | Splits the string at line breaks and returns a list                                           |
| `str.startswith` | Returns True if the string starts with the specified value                                    |
| `str.strip`      | Returns a trimmed version of the string                                                       |
| `str.swapcase`   | Swaps cases, lower case becomes upper case and vice versa                                     |
| `str.title`      | Converts the first character of each word to upper case                                       |
| `str.upper`      | Converts a string into upper case                                                             |
| `str.zfill`      | Fills the string with a specified number of 0 values at the beginning                         |

In [41]:
# the split method

name = "Daniel Sierra Ramos"
name.split()

['Daniel', 'Sierra', 'Ramos']

In [42]:
help(name.split)

Help on built-in function split:

split(sep=None, maxsplit=-1) method of builtins.str instance
    Return a list of the words in the string, using sep as the delimiter string.
    
    sep
      The delimiter according which to split the string.
      None (the default value) means split according to any whitespace,
      and discard empty strings from the result.
    maxsplit
      Maximum number of splits to do.
      -1 (the default value) means no limit.
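The `sep` and `maxsplit` parameters described in the help text can be tried out as follows. This is a minimal sketch; the sample string `record` and the other variable names are made up for illustration.

```python
record = "Daniel,Sierra,Ramos,Madrid"

# Split on an explicit separator instead of whitespace
fields = record.split(",")
print(fields)

# Stop after the first two splits; the remainder stays in one piece
limited = record.split(",", maxsplit=2)
print(limited)

# With an explicit separator, empty strings are kept in the result
with_gaps = "a,,b".split(",")
print(with_gaps)
```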



In [43]:
# the join method

name = ["Daniel", "Sierra", "Ramos"]
" ".join(name)

'Daniel Sierra Ramos'

In [44]:
# the strip method

name = "      Daniel Sierra         "
print(name.strip())
print(name.lstrip())
print(name.rstrip())

Daniel Sierra
Daniel Sierra         
      Daniel Sierra


In [45]:
# the capitalize and title methods

name = "daniel sierra"
print(name.capitalize())
print(name.title())

Daniel sierra
Daniel Sierra


In [46]:
# the upper, lower and swapcase

name = "DaNiEl sIeRrA"
print(name.upper())
print(name.lower())
print(name.swapcase())

DANIEL SIERRA
daniel sierra
dAnIeL SiErRa


In [47]:
# the justification methods

name = " Daniel Sierra "

print(name.ljust(10, "-"))
print(name.ljust(30, "-"))
print(name.rjust(30, "-"))
print(name.center(30, "*"))

 Daniel Sierra 
 Daniel Sierra ---------------
--------------- Daniel Sierra 
******* Daniel Sierra ********


In [48]:
# the zfill method

name = "Daniel Sierra"

print(name.zfill(10))
print(name.zfill(30))

Daniel Sierra
00000000000000000Daniel Sierra


In [49]:
# the find method

name = "Daniel Sierra"

name.find("Sierra")

7

In [50]:
# the count method

name = "Daniel Sierra"

name.count("Sie")

1
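A few more methods from the table can be explored in the same way. The sketch below is only illustrative (the variable names are not part of the original notebook) and shows `replace`, `startswith`, `endswith`, `find`, `rfind` and `isdigit`.

```python
name = "Daniel Sierra"

# replace returns a new string; the original is left unchanged
replaced = name.replace("Sierra", "Ramos")
print(replaced)

# startswith / endswith return booleans
print(name.startswith("Dan"), name.endswith("ra"))

# find returns -1 when the substring is absent; rfind searches from the right
missing = name.find("xyz")
last_a = name.rfind("a")
print(missing, last_a)

# isdigit checks that every character is a digit
print("2024".isdigit(), "20.24".isdigit())
```

Because strings are immutable, none of these methods modifies `name`; each one returns a new value.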

> ***EXAMPLE 3***
>
> 1. Create a new variable `full_name` with your full name
> 2. Transform your full name to lower case
> 3. Reverse the order of the characters
> 4. Print the following sentence: `Hello, my reversed order name is <your full name>`

In [57]:
full_name = "Daniel Sierra"
print(f"Hello, my reversed order name is {full_name.lower()[::-1]}")

Hello, my reversed order name is arreis leinad


# Built-In Data Structures

We have seen Python's simple types: ``int``, ``float``, ``complex``, ``bool``, ``str``, and so on.
Python also has several built-in compound types, which act as containers for other types.
These compound types are:

| Type Name     | Example                   | Description                                                    |
|---------------|---------------------------|----------------------------------------------------------------|
| ``list``      | ``[1, 2, 3]``             | Ordered collection                                             |
| ``tuple``     | ``(1, 2, 3)``             | Immutable ordered collection                                   |
| ``dict``      | ``{'a':1, 'b':2, 'c':3}`` | Mapping of keys to values (insertion-ordered since Python 3.7) |
| ``set``       | ``{1, 2, 3}``             | Unordered collection of unique values                          |
| ``frozenset`` | ``frozenset({1, 2, 3})``  | Unordered immutable collection of unique values                |

As you can see, round, square, and curly brackets have distinct meanings when it comes to the type of collection produced.
We'll take a quick tour of these data structures here.
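As a quick check of how the brackets map to types, the following sketch builds one value of each kind and prints its type name. The names `samples`, `unique` and `empty` are chosen here for illustration.

```python
samples = [[1, 2, 3], (1, 2, 3), {"a": 1, "b": 2}, {1, 2, 3}, frozenset({1, 2, 3})]
for value in samples:
    print(type(value).__name__, value)

# Sets drop duplicate values
unique = set([1, 1, 2, 3, 3])
print(unique)

# An empty pair of curly brackets creates a dict, not a set
empty = {}
print(type(empty).__name__)
```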

## Lists
Lists are the basic *ordered* and *mutable* data collection type in Python.
They can be defined with comma-separated values between square brackets; for example, here is a list of the first several prime numbers:

In [58]:
L = [2, 3, 5, 7]

# but also
L = list((2, 3, 5, 7))
print(L)

[2, 3, 5, 7]


In [59]:
type(L)

list

In [60]:
# joining several lists

I = [4, 5]
L+I

[2, 3, 5, 7, 4, 5]

In [61]:
# replicating a list
L*3

[2, 3, 5, 7, 2, 3, 5, 7, 2, 3, 5, 7]

In [62]:
# lists with mixed types

L = [1, 'two', 3.14, [0, 3, 5]]

### List indexing and slicing
Python provides access to elements in compound types through *indexing* for single elements, and *slicing* for multiple elements.
As we'll see, both are indicated by a square-bracket syntax.
Suppose we return to our list of the first several primes:

In [63]:
L = [2, 3, 5, 7, 11]

Python uses *zero-based* indexing, so we can access the first and second elements using the following syntax:

In [64]:
L[0]

2

In [65]:
L[1]

3

Elements at the end of the list can be accessed with negative numbers, starting from -1:

In [66]:
L[-1]

11

In [67]:
L[-2]

7

You can visualize this indexing scheme this way:

Here values in the list are represented by large numbers in the squares; list indices are represented by small numbers above and below.
In this case, ``L[2]`` returns ``5``, because that is the value at index ``2``.

Where *indexing* is a means of fetching a single value from the list, *slicing* is a means of accessing multiple values in sub-lists.
It uses a colon to indicate the start point (inclusive) and end point (non-inclusive) of the sub-list.
For example, to get the first three elements of the list, we can write:

In [68]:
L[0:3]

[2, 3, 5]

Notice where ``0`` and ``3`` lie in the preceding diagram, and how the slice takes just the values between the indices.
If we leave out the first index, ``0`` is assumed, so we can equivalently write:

In [69]:
L[:3]

[2, 3, 5]

Similarly, if we leave out the last index, it defaults to the length of the list.
Thus, the last three elements can be accessed as follows:

In [70]:
L[-3:]

[5, 7, 11]

Finally, it is possible to specify a third integer that represents the step size; for example, to select every second element of the list, we can write:

In [71]:
L[::2]  # equivalent to L[0:len(L):2]

[2, 5, 11]

A particularly useful version of this is to specify a negative step, which will reverse the list:

In [72]:
L[::-1]

[11, 7, 5, 3, 2]

Both indexing and slicing can be used to set elements as well as access them.
The syntax is as you would expect:

In [73]:
L[0] = 100
print(L)

[100, 3, 5, 7, 11]


In [74]:
L[1:3] = [55, 56]
print(L)

[100, 55, 56, 7, 11]
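Slice assignment does not require the replacement to have the same length as the slice it replaces. The sketch below uses a separate list called `items` (a name introduced here, so that `L` is left alone) to show a list growing and shrinking this way.

```python
items = [100, 55, 56, 7, 11]

# Replace two elements with three: the list grows by one
items[1:3] = [1, 2, 3]
after_growth = list(items)  # snapshot for comparison
print(after_growth)

# Assigning an empty list to a slice removes those elements
items[1:4] = []
print(items)
```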


A very similar slicing syntax is also used in many data science-oriented packages, including NumPy and Pandas (mentioned in the introduction).

Now that we have seen Python lists and how to access elements in ordered compound types, let's review the most common list operations before moving on to the other compound data types mentioned earlier.

### Built-In methods for lists

| Method         | Description                                              |
|----------------|----------------------------------------------------------|
| `list.sort`    | Sorts the list in place, in ascending order by default.  |
| `list.append`  | Adds one element to the end of the list.                 |
| `list.extend`  | Adds all elements of an iterable to the end of the list. |
| `list.index`   | Returns the index of the first occurrence of a value.    |
| `max(list)`    | Built-in function: returns the largest item.             |
| `min(list)`    | Built-in function: returns the smallest item.            |
| `sum(list)`    | Built-in function: returns the sum of the items.         |
| `len(list)`    | Built-in function: returns the number of items.          |
| `list.clear`   | Removes all the elements from the list.                  |
| `list.insert`  | Inserts an element at the given position.                |
| `list.count`   | Returns the number of elements equal to a given value.   |
| `list.pop`     | Removes and returns the element at the given position.   |
| `list.remove`  | Removes the first item equal to a given value.           |
| `list.reverse` | Reverses the order of the list in place.                 |
| `list.copy`    | Returns a shallow copy of the list.                      |

In [75]:
L = [11, 2, 45, 1, 0, 13]

In [76]:
# get the number of elements in a list

len(L)

6

In [80]:
L = [1,2,3,4]

In [81]:
# sorting a list

L.sort()
print(L)

L.sort(reverse=True)
print(L)

[1, 2, 3, 4]
[4, 3, 2, 1]



In [78]:
h = L.sort()

In [83]:
type(h)

NoneType


```{note}
The `sort` method works in place: the list itself is modified and the method returns `None`. To get a new sorted list and leave the original unchanged, use the built-in `sorted` function instead.
```

In [None]:
sorted(L)
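To make the contrast with `list.sort` concrete, here is a small sketch of `sorted`. It works on a separate list named `values` (an illustrative name) so that `L` is not affected, and also shows the `reverse` and `key` parameters.

```python
values = [11, 2, 45, 1, 0, 13]

ascending = sorted(values)                 # new list, values is untouched
descending = sorted(values, reverse=True)  # same keyword as list.sort
print(ascending)
print(descending)
print(values)

# key= takes a function that computes the value to sort by
words = ["pear", "fig", "banana"]
by_length = sorted(words, key=len)
print(by_length)
```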

In [84]:
# min and max

print(f"This is the minimum value of L: {min(L)}")
print(f"This is the maximum value of L: {max(L)}")

# and sum (just for numbers)
print(f"This is the sum of L: {sum(L)}")  

This is the minimum value of L: 1
This is the maximum value of L: 4
This is the sum of L: 10


In [85]:
# apply max to a list of strings (strings are compared alphabetically)
max(["a", "b", "ar"])

'b'

````{warning}
Be careful with mixed-type lists like `L = [1, 'two', 3.14, [0, 3, 5]]`.

They cannot be sorted, because Python does not define an ordering between unrelated types such as `str` and `int`:
```trace
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [18], in <cell line: 1>()
----> 1 L.sort()

TypeError: '<' not supported between instances of 'str' and 'int'
```
````

In [86]:
# dynamically changing a list

L = [11, 2, 45, 1, 0, 13]

# append just one element
L.append(34)
print(L)

# append several elements
L.extend([4, 9, 2])
print(L)

# insert a value at a given position
L.insert(3, "A")
print(L)

[11, 2, 45, 1, 0, 13, 34]
[11, 2, 45, 1, 0, 13, 34, 4, 9, 2]
[11, 2, 45, 'A', 1, 0, 13, 34, 4, 9, 2]


In [87]:
# remove values by position
L.pop(3)

'A'

In [88]:
# remove the first occurrence of a value
L.remove(34)
print(L)

[11, 2, 45, 1, 0, 13, 4, 9, 2]


```{note}
All these methods for dynamically changing a list work in place.
```
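The remaining methods from the table can be sketched in the same spirit. The list `values` and the other names below are illustrative; note how `copy` gives an independent list that is unaffected by later in-place changes.

```python
values = [11, 2, 45, 2, 0]

first_two = values.index(2)   # position of the first 2
how_many = values.count(2)    # number of times 2 appears
print(first_two, how_many)

backup = values.copy()        # independent shallow copy
values.reverse()              # in place, returns None
reversed_values = list(values)
print(reversed_values)
print(backup)

values.clear()                # in place, empties the list
print(values, backup)
```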

## Tuples
Tuples are in many ways similar to lists, but they are defined with parentheses rather than square brackets:

In [89]:
t = (1, 2, 3)

# but also
t = tuple([1,2,3])

They can also be defined without any brackets at all:

In [90]:
t = 1, 2, 3
print(t)

(1, 2, 3)


Like the lists discussed before, tuples have a length, and individual elements can be extracted using square-bracket indexing:

In [91]:
len(t)

3

In [92]:
t[0]

1

The main distinguishing feature of tuples is that they are *immutable*: this means that once they are created, their size and contents cannot be changed:

In [93]:
t[1] = 4

TypeError: 'tuple' object does not support item assignment

In [94]:
t.append(4)

AttributeError: 'tuple' object has no attribute 'append'
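Since a tuple cannot be modified, "changing" one in practice means building a new tuple. The sketch below illustrates this, along with one caveat: immutability applies to the tuple itself, not to mutable objects stored inside it. The names `t_extended`, `t_replaced` and `nested` are illustrative.

```python
t = (1, 2, 3)

# Build new tuples instead of modifying the existing one
t_extended = t + (4,)              # the trailing comma makes a one-element tuple
t_replaced = t[:1] + (4,) + t[2:]  # same as t, but with 4 at index 1
print(t_extended, t_replaced, t)

# Immutability is shallow: a list stored inside a tuple can still be modified
nested = (1, [2, 3])
nested[1].append(4)
print(nested)
```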

Tuples are often used in a Python program; a particularly common case is in functions that have multiple return values.
For example, the ``as_integer_ratio()`` method of floating-point objects returns a numerator and a denominator; this dual return value comes in the form of a tuple:

In [95]:
x = 0.125
x.as_integer_ratio()

(1, 8)

These multiple return values can be individually assigned as follows:

In [97]:
# unpacking tuples

numerator, denominator = x.as_integer_ratio()
print(numerator / denominator)

0.125
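Tuple unpacking appears in several other everyday patterns. As a brief sketch (all variable names are illustrative), the built-in `divmod` also returns a tuple, two variables can be swapped in one line, and a starred name collects the leftover items.

```python
quotient, remainder = divmod(17, 5)   # divmod returns a 2-tuple
print(quotient, remainder)

# Swapping two variables relies on tuple packing and unpacking
a, b = 1, 2
a, b = b, a
print(a, b)

# A starred name collects the remaining items into a list
first, *rest = (2, 3, 5, 7)
print(first, rest)
```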


## Built-in iterator functions: `range`, `zip` and `enumerate`

### `range`

The `range` function allows us to build sequences of numbers quickly and flexibly:

In [98]:
seq = range(10)

# returns an object of type range
print(seq)

range(0, 10)


In [None]:
# IMPORTANT: The result of the range function is neither a list nor a tuple, but a range object (a lazy sequence)

type(seq)

In [99]:
# to print the result of range, just transform it to a list
seq = list(seq)

print(seq)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [100]:
seq = list(range(1, 100, 2)) # "stop" value is not included

In [101]:
print(seq)

[1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39, 41, 43, 45, 47, 49, 51, 53, 55, 57, 59, 61, 63, 65, 67, 69, 71, 73, 75, 77, 79, 81, 83, 85, 87, 89, 91, 93, 95, 97, 99]
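Converting to a list is only needed for display. A range object already supports `len`, indexing and membership tests without storing its elements, as the following sketch suggests (the names `odds` and `countdown` are illustrative).

```python
odds = range(1, 100, 2)

print(len(odds))               # number of elements, no list is built
print(odds[0], odds[-1])       # indexing works as with lists
print(51 in odds, 50 in odds)  # membership test

# A negative step counts down; the stop value is still excluded
countdown = list(range(5, 0, -1))
print(countdown)
```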


### `enumerate`
The `enumerate` function lets us attach an incrementing counter to the elements of a given iterable (a list, for example). The result is an iterator of `(index, element)` pairs.

In [102]:
seq = ["zero", "one", "two", "three", "four"]
print(seq)

['zero', 'one', 'two', 'three', 'four']


In [103]:
enum_seq = enumerate(seq)
print(list(enum_seq))

[(0, 'zero'), (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')]
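In practice `enumerate` is mostly used inside a `for` loop. The sketch below shows that pattern, the optional `start` argument, and the fact that the resulting iterator can only be consumed once; the variable names are chosen for illustration.

```python
names = ["zero", "one", "two"]

# start= changes the first counter value (the default is 0)
numbered = list(enumerate(names, start=1))
print(numbered)

# Typical use: unpack each (index, element) pair in a for loop
lines = []
for position, word in enumerate(names):
    lines.append(f"{position}: {word}")
print(lines)

# An enumerate object is consumed once; a second pass yields nothing
pairs = enumerate(names)
first_pass = list(pairs)
second_pass = list(pairs)
print(len(first_pass), len(second_pass))
```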


### `zip`
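The `zip` function pairs up the elements of two or more iterables position by position. It stops as soon as the shortest input is exhausted, so extra elements in longer inputs are dropped.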

In [106]:
seq1 = ["Madrid", "Berlin", "Lisbon", "London","A"]
seq2 = ["Spain", "Germany", "Portugal", "UK"]
print(seq1)
print(seq2)

['Madrid', 'Berlin', 'Lisbon', 'London', 'A']
['Spain', 'Germany', 'Portugal', 'UK']


In [107]:
joint = zip(seq1, seq2)
print(list(joint))

[('Madrid', 'Spain'), ('Berlin', 'Germany'), ('Lisbon', 'Portugal'), ('London', 'UK')]
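Note that the unmatched `"A"` was dropped from the result. The following sketch shows three related patterns: building a dictionary from a `zip`, keeping unmatched items with `itertools.zip_longest` from the standard library, and reversing the pairing with `zip(*...)`. The variable names are illustrative.

```python
from itertools import zip_longest

cities = ["Madrid", "Berlin", "Lisbon", "London", "A"]
countries = ["Spain", "Germany", "Portugal", "UK"]

# dict() accepts an iterable of (key, value) pairs, so zip fits directly
city_of = dict(zip(countries, cities))
print(city_of["Spain"])

# zip_longest pads the shorter input instead of dropping the extra item
padded = list(zip_longest(cities, countries, fillvalue=None))
print(padded[-1])

# zip(*pairs) reverses the operation and returns tuples
pairs = list(zip(cities, countries))
unzipped_cities, unzipped_countries = zip(*pairs)
print(unzipped_cities)
```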
