
# **Python Language Basics**

In [None]:
# Introspection
# Using a question mark (?) before or after a variable will display some general information about the object:

b = [1, 2, 3]
b?

In [None]:
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b

In [None]:
add_numbers?

In [None]:
# A final use of ? is searching the IPython namespace.
# Combining a number of characters with the wildcard (*) shows all names matching the wildcard expression.

import numpy as np
np.*load*?

## **Language Semantics**

The Python language design is distinguished by its emphasis on readability, simplicity, and explicitness. Some people go so far as to liken it to “executable pseudocode.”

**Indentation, not braces**
Python uses whitespace (tabs or spaces) to structure code instead of using braces as in many other languages like R, C++, Java, and Perl. Consider a for loop from a sorting algorithm:

In [None]:
array = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
less, greater = [], []
pivot = 5
for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)

less, greater

([1, 2, 3, 4], [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])

A colon denotes the start of an indented code block after which all of the code must be indented by the same amount until the end of the block.

I strongly recommend using four spaces as your default indentation and replacing tabs with four spaces.

As you can see by now, Python statements also do not need to be terminated by semicolons. Semicolons can be used, however, to separate multiple statements on a single line:

a = 5; b = 6; c = 7

An important characteristic of the Python language is the consistency of its object model. Every number, string, data structure, function, class, module, and so on exists in the Python interpreter in its own “box,” which is referred to as a Python object.
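
As a minimal sketch of what this consistency means in practice, the snippet below inspects a number, a string, a list, a function, and a module in exactly the same way. The names `describe` and `examples` are illustrative and do not come from the original notebook.

```python
import math

def describe(obj):
    # Every Python value has a type and is an instance of `object`
    return type(obj).__name__, isinstance(obj, object)

examples = [5, "foo", [1, 2], describe, math]
for item in examples:
    print(describe(item))
```

Because functions and modules are objects too, they can be stored in a list and passed as arguments like any other value.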

**Comments**
Any text preceded by the hash mark (pound sign) # is ignored by the Python interpreter. This is often used to add comments to code.

**Function calls**
You call functions using parentheses, passing zero or more arguments and optionally assigning the returned value to a variable.
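
A small illustration of the call syntax, using only built-in functions; the variable names are chosen here for the example.

```python
# Call with zero arguments and assign the returned value
empty = list()

# Call with one argument
length = len("foo")

# Call with several arguments
largest = max(3, 7, 5)

print(empty, length, largest)
```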

**Dynamic references, strong types**
Variables in Python have no inherent type associated with them; a variable can refer to a different type of object simply by doing an assignment.

In [None]:
a = 5

print(type(a))

a = "foo"

print(type(a))

<class 'int'>
<class 'str'>
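
The "strong types" half of the heading means that every object still has a specific type, and Python generally does not convert between unrelated types implicitly. A brief sketch, with illustrative variable names:

```python
# Mixing str and int with + raises TypeError instead of guessing a conversion
try:
    "5" + 5
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Explicit conversion works in either direction
as_number = int("5") + 5
as_text = "5" + str(5)
print(as_number, as_text)

# Implicit conversion does happen between numeric types
mixed = 4.5 / 2
print(type(mixed))
```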


In [None]:
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError: # not iterable
        return False

print(isiterable(False))
print(isiterable(5))
print(isiterable('String'))
print(isiterable([1,2,3]))

False
False
True
True


To check if two variables refer to the same object, use the is keyword. Use is not to check that two objects are not the same:

In [None]:
a = [1, 2, 3]
b = a
c = list(a)

print(a is b)
print(a is c)

True
False
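
Note that `is` compares identity, while `==` compares values. The sketch below repeats the setup from the cell above to show `is not` and `==` side by side, and adds the common idiom of testing for `None` with `is`.

```python
a = [1, 2, 3]
c = list(a)  # list() always builds a new list

print(a is not c)  # the two names refer to different objects
print(a == c)      # but their contents are equal

z = None
print(z is None)
```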


In [None]:
z = None
print(type(z))
print(z)

<class 'NoneType'>
None


In [None]:
c = """
This is a longer string that
spans multiple lines
"""
print(c)


This is a longer string that
spans multiple lines



In [None]:
# Many Python objects can be converted to a string using the str function:

a = 5.6
s = str(a)
print(s)

5.6


In [None]:
# raw string
s = r"this\has\no\special\characters"
s

'this\\has\\no\\special\\characters'

In [None]:
template = "{0:.2f} {1:s} are worth US${2:d}"
template.format(88.46, "Argentine Pesos", 1)

'88.46 Argentine Pesos are worth US$1'

In [None]:
amount = 10
rate = 88.46
currency = "Pesos"

result = f"{amount} {currency} is worth US${amount / rate}"
result

'10 Pesos is worth US$0.11304544426859599'

In [None]:
val = "español"
val_utf8 = val.encode("utf-8")
print(val_utf8)
print(type(val_utf8))

b'espa\xc3\xb1ol'
<class 'bytes'>


In [None]:
val2 = val_utf8.decode("utf-8")
print(val2)
print(type(val2))

print(val.encode("latin1"))

print(val.encode("utf-16"))

print(val.encode("utf-16le"))

español
<class 'str'>
b'espa\xf1ol'
b'\xff\xfee\x00s\x00p\x00a\x00\xf1\x00o\x00l\x00'
b'e\x00s\x00p\x00a\x00\xf1\x00o\x00l\x00'


## **Dates and times**

In [None]:
from datetime import datetime, date, time

dt = datetime(2011, 10, 29, 20, 30, 21)
print(dt.day)
print(dt.minute)
print(dt.date())
print(dt.time())

print(dt.strftime("%Y-%m-%d %H:%M"))

dt2 = datetime(2011, 11, 15, 22, 30)

delta = dt2 - dt
print(delta)
print(type(delta))

print(dt + delta)


29
30
2011-10-29
20:30:21
2011-10-29 20:30
17 days, 1:59:39
<class 'datetime.timedelta'>
2011-11-15 22:30:00


In [None]:
dt_hour = dt.replace(minute=0, second=0)

## **Control Flow**

Python has several built-in keywords for conditional logic, loops, and other standard control flow concepts found in other programming languages.

### **if, elif, and else**

In [None]:
x = -7
if x < 0:
    print("x is negative")

x is negative


In [None]:
x = 5

if x < 0:
    print("x is negative")
elif x == 0:
    print("x is equal to zero")
elif 0 < x < 5:
    print("x is positive but smaller than 5")
else:
    print("x is positive and larger than or equal to 5")

x is positive and larger than or equal to 5


In [None]:
5 > 3 > 2 > 0

True

In [None]:
sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue # skipping the rest of the block
    total += value

print(total)

12


In [None]:
sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break # exiting the block altogether
    total_until_5 += value
print(total_until_5)

# The break keyword only terminates the innermost for loop;
# any outer for loops will continue to run:

for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j))

13
(0, 0)
(1, 0)
(1, 1)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)
(3, 3)


In [None]:
iterator = [(1,2,3),(4,5,6),(7,8,9)]
for a, b, c in iterator:
    print(a + b + c)

6
15
24


### **while loops**
A while loop specifies a condition and a block of code that is to be executed until the condition evaluates to False or the loop is explicitly ended with break:

In [None]:
x = 256
total = 0
while x > 0:
    if total > 500:
        break
    print("total = " + str(total) + " + " + str(x) + " = " + str(total + x))
    total += x
    print("New value for x = " + "str(x // 2) = " + str(x // 2))
    x = x // 2

total = 0 + 256 = 256
New value for x = str(x // 2) = 128
total = 256 + 128 = 384
New value for x = str(x // 2) = 64
total = 384 + 64 = 448
New value for x = str(x // 2) = 32
total = 448 + 32 = 480
New value for x = str(x // 2) = 16
total = 480 + 16 = 496
New value for x = str(x // 2) = 8
total = 496 + 8 = 504
New value for x = str(x // 2) = 4


### **pass**
pass is the “no-op” (or “do nothing”) statement in Python. It can be used in blocks where no action is to be taken (or as a placeholder for code not yet implemented); it is required only because Python uses whitespace to delimit blocks:

In [None]:
if x < 0:
    print("negative!")
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print("positive!")

positive!


### **range**
The range function generates a sequence of evenly spaced integers:

In [None]:
range(10)
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [None]:
list(range(0, 20, 2))

[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]

In [None]:
# A common use of range is for iterating through sequences by index:
seq = [1, 2, 3, 4]
for i in range(len(seq)):
    print(f"element {i}: {seq[i]}")

element 0: 1
element 1: 2
element 2: 3
element 3: 4


# **Data Structures and Sequences**

Python’s data structures are simple but powerful. Mastering their use is a critical part of becoming a proficient Python programmer. We start with tuple, list, and dictionary, which are some of the most frequently used built-in types.

## **Tuple**

In [None]:
tup = (4, 5, 6)
print(tup)
print(type(tup))

# also
tup = 4, 5, 6
print(tup)
print(type(tup))

(4, 5, 6)
<class 'tuple'>
(4, 5, 6)
<class 'tuple'>


In [None]:
tup = tuple("string")
print(tup)
print(tup[1])

('s', 't', 'r', 'i', 'n', 'g')
t


In [None]:
nested_tup = (4, 5, 6), (7, 8)
print(nested_tup[0])
print(nested_tup[1][0])

(4, 5, 6)
7


In [None]:
tup = tuple(['foo', [1, 2], True])
tup[1].append(3)
tup

('foo', [1, 2, 3], True)

In [None]:
t = (4, None, 'foo') + (6, 0) + ('bar',)
print(t)
t = (3,5) * 15
print(t)
print(len(t))

(4, None, 'foo', 6, 0, 'bar')
(3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5, 3, 5)
30


In [None]:
# A common use of variable unpacking is iterating over sequences of tuples or lists:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


In [None]:
values = 1, 2, 3, 4, 5
a, b, *rest = values
rest

# This rest bit is sometimes something you want to discard;
# there is nothing special about the rest name.
# As a matter of convention, many Python programmers
# will use the underscore (_) for unwanted variables:
a, b, *_ = values

In [None]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

4

## **List**
In contrast with tuples, lists are variable length and their contents can be modified in place. Lists are mutable. You can define them using square brackets [] or using the list type function:

In [None]:
# Elements can be appended to the end of the list with the append method:

l = ['Alpha','Beta','Gamma']
l.append('Delta')
print(l)
l.remove('Beta')
print(l)
l.insert(1, 'Beta')
print(l)

l.append('Epsilon')
print(l)

E = l.pop()
print(l)
print(E)

['Alpha', 'Beta', 'Gamma', 'Delta']
['Alpha', 'Gamma', 'Delta']
['Alpha', 'Beta', 'Gamma', 'Delta']
['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon']
['Alpha', 'Beta', 'Gamma', 'Delta']
Epsilon
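
The cell above builds its list with square brackets. The `list` type function mentioned at the start of this section can be sketched as follows; the variable names are illustrative.

```python
tup = ("foo", "bar", "baz")

# list() builds a new list from any iterable, such as a tuple
b_list = list(tup)
b_list[1] = "peekaboo"  # lists can be modified in place
print(b_list)

# list() is also a common way to materialize an iterator such as range
gen_list = list(range(5))
print(gen_list)
```

The original tuple is left untouched, since `list` copies the elements into a new object.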


In [None]:
print('Zeta' in l)
print('Alpha' in l)

False
True


In [None]:
print(l)

['Alpha', 'Beta', 'Gamma', 'Delta']


In [None]:
l2 = ['Epsilon', 'Zeta', 'Eta', 'Theta', 'Iota']
l = l + l2
print(l)

['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon', 'Zeta', 'Eta', 'Theta', 'Iota']


In [None]:
l.extend(['Kappa', 'Lambda'])
print(l)

l.extend(['Mu'])
print(l)

['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon', 'Zeta', 'Eta', 'Theta', 'Iota', 'Kappa', 'Lambda']
['Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon', 'Zeta', 'Eta', 'Theta', 'Iota', 'Kappa', 'Lambda', 'Mu']


In [None]:
# Sorting

l = [5,25,17,36,2,81,102,9,4,15]
l.sort()
print(l)

l.sort(reverse=True)
print(l)

l =  ["saw", "small", "He", "foxes", "six"]
l.sort(key=len)
print(l)

[2, 4, 5, 9, 15, 17, 25, 36, 81, 102]
[102, 81, 36, 25, 17, 15, 9, 5, 4, 2]
['He', 'saw', 'six', 'small', 'foxes']


In [None]:
# Slicing

seq = [7, 2, 3, 7, 5, 6, 0, 1]
print(seq[1:5])

seq[3:5] = [6, 3]

print(seq)


[2, 3, 7, 5]
[7, 2, 3, 6, 3, 6, 0, 1]


In [None]:
# Either the start or stop can be omitted, in which case they default to the start of the sequence and the end of the sequence, respectively
print(seq[:5])

print(seq[3:])

# Negative indices slice the sequence relative to the end
seq[-4:]

seq[-6:-2]

[7, 2, 3, 6, 3]
[6, 3, 6, 0, 1]


[3, 6, 3, 6]

In [None]:
# A step can also be used after a second colon to, say, take every other element:
print(seq[::2])

#A clever use of this is to pass -1, which has the useful effect of reversing a list or tuple:
print(seq[::-1])

[7, 3, 3, 0]
[1, 0, 6, 3, 6, 3, 2, 7]


## **Dictionary**
The dictionary or dict may be the most important built-in Python data structure. In other programming languages, dictionaries are sometimes called hash maps or associative arrays. A dictionary stores a collection of key-value pairs, where key and value are Python objects. Each key is associated with a value so that a value can be conveniently retrieved, inserted, modified, or deleted given a particular key. One approach for creating a dictionary is to use curly braces {} and colons to separate keys and values:

In [None]:
greek_alphabet = {
    "Alpha": "α",
    "Beta": "β",
    "Gamma": "γ",
    "Delta": "δ",
    "Epsilon": "ε",
}
print(greek_alphabet)

{'Alpha': 'α', 'Beta': 'β', 'Gamma': 'γ', 'Delta': 'δ', 'Epsilon': 'ε'}


In [None]:
greek_alphabet['Alpha']

'α'

In [None]:
# You can access, insert, or set elements using the same syntax as for accessing elements of a list or tuple
d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1[7] = "an integer"
print(d1['b'])

[1, 2, 3, 4]


In [None]:
# You can check if a dictionary contains a key using the same syntax used for checking whether a list or tuple contains a value:
print("b" in d1)

# You can delete values using either the del keyword or the pop method (which simultaneously returns the value and deletes the key):

d1[5] = "some value"
print(d1)

d1["dummy"] = "another value"
print(d1)

del d1[5]
print(d1)

ret = d1.pop("dummy")
print(ret)
print(d1)


True
{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value'}
{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 5: 'some value', 'dummy': 'another value'}
{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer', 'dummy': 'another value'}
another value
{'a': 'some value', 'b': [1, 2, 3, 4], 7: 'an integer'}


In [None]:
# The keys and values methods give you iterators of the dictionary’s
# keys and values, respectively. The order of the keys depends on the
# order of their insertion, and these methods output the keys
# and values in the same respective order:

print(list(d1.keys()))
print(list(d1.values()))

print(d1.items())

# If you need to iterate over both the keys and values, you can use the items method to iterate over the keys and values as 2-tuples:
for k, v in d1.items():
    print(k, v)


['a', 'b', 7]
['some value', [1, 2, 3, 4], 'an integer']
dict_items([('a', 'some value'), ('b', [1, 2, 3, 4]), (7, 'an integer')])
a some value
b [1, 2, 3, 4]
7 an integer


In [None]:
# Creating dictionaries from sequences

cyrillic_letters = ['А', 'Б', 'В', 'Г', 'Д', 'Е']
latin_letters = ['A', 'B', 'V', 'G', 'D', 'E']

alphabet_dict = {}
for cyrillic, latin in zip(cyrillic_letters, latin_letters):
    alphabet_dict[cyrillic] = latin

print(alphabet_dict)


# Since a dictionary is essentially a collection of 2-tuples, the dict function accepts any iterable of 2-tuples, such as a list or a zip object

tuples = zip(range(5), reversed(range(5)))
print(tuples)
print(dict(tuples))

{'А': 'A', 'Б': 'B', 'В': 'V', 'Г': 'G', 'Д': 'D', 'Е': 'E'}
<zip object at 0x7f48083e3940>
{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}


By default, get returns None if the key is not present, while pop raises an exception. When setting values, the values in a dictionary may themselves be another kind of collection, such as a list. For example, you could categorize a list of words by their first letters as a dictionary of lists:

In [None]:
words = ["apple", "bat", "bar", "atom", "book"]

by_letter = {}

for word in words:
  letter = word[0]
  if letter not in by_letter:
    by_letter[letter] = [word]
  else:
    by_letter[letter].append(word)

print(by_letter)

value = by_letter.get('c', 'NOT FOUND')
print(value)

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
NOT FOUND
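
The difference between get and pop for a missing key can be sketched as follows. The dictionary `d` and the key name are made up for this example.

```python
d = {"a": 1}

print(d.get("missing"))     # no default given, so None is returned
print(d.get("missing", 0))  # an explicit default is returned instead

# pop without a default raises KeyError for a missing key
try:
    d.pop("missing")
except KeyError as exc:
    missing_key = exc.args[0]
    print("KeyError:", missing_key)

# pop also accepts a default, which avoids the exception
popped = d.pop("missing", "fallback")
print(popped)
```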


In [None]:
by_letter = {}
for word in words:
  letter = word[0]
  by_letter.setdefault(letter, []).append(word)

print(by_letter)

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}


The built-in collections module has a useful class, defaultdict, which makes this even easier. To create one, you pass a type or function for generating the default value for each slot in the dictionary:

In [None]:
from collections import defaultdict

by_letter = defaultdict(list)
for word in words:
  by_letter[word[0]].append(word)

print(by_letter)

defaultdict(<class 'list'>, {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']})


**Valid dictionary key types**
While the values of a dictionary can be any Python object, the keys generally have to be immutable objects like scalar types (int, float, string) or tuples (all the objects in the tuple need to be immutable, too). The technical term here is hashability. You can check whether an object is hashable (can be used as a key in a dictionary) with the hash function:

In [None]:
print(hash("string"))
print(hash((1, 2, (2, 3))))
# hash((1, 2, [2, 3])) # fails because lists are mutable

6329962141983128110
-9209053662355515447
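
The failing case that is commented out above can be checked safely by catching the TypeError that hash raises. The helper name `is_hashable` is an assumption made for this sketch and is not a built-in.

```python
def is_hashable(obj):
    # hash() raises TypeError for unhashable objects such as lists
    try:
        hash(obj)
        return True
    except TypeError:
        return False

print(is_hashable((1, 2, (2, 3))))  # tuple containing only hashable items
print(is_hashable((1, 2, [2, 3])))  # tuple containing a list
print(is_hashable([1, 2, 3]))
```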


In [None]:
# To use a list as a key, one option is to convert it to a tuple, which can be hashed as long as its elements also can be:

d = {}
d[tuple([1, 2, 3])] = 5
print(d)

{(1, 2, 3): 5}


## **Set**
A set is an unordered collection of unique elements. A set can be created in two ways: via the set function or via a set literal with curly braces:

In [None]:
set([2, 2, 2, 1, 3, 3])

{1, 2, 3}
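
The second way of creating a set, the literal with curly braces, gives the same result. A short sketch with illustrative variable names:

```python
from_function = set([2, 2, 2, 1, 3, 3])
from_literal = {2, 2, 2, 1, 3, 3}
print(from_function == from_literal)

# {} is an empty dict, not an empty set; use set() for an empty set
empty_set = set()
print(type({}), type(empty_set))
```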

In [None]:
# Sets support mathematical set operations like union, intersection, difference, and symmetric difference. Consider these two example sets:

a = {1, 2, 3, 4, 5}

b = {3, 4, 5, 6, 7, 8}

# The union of these two sets is the set of distinct elements occurring in either set. This can be computed with either the union method or the | binary operator:

print(a.union(b))
print(a | b)

print(a.intersection(b))
print(a & b)



{1, 2, 3, 4, 5, 6, 7, 8}
{1, 2, 3, 4, 5, 6, 7, 8}
{3, 4, 5}
{3, 4, 5}
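
Difference and symmetric difference, which are mentioned above but not demonstrated, follow the same pattern of a method plus a binary operator. The sketch reuses the two example sets; `diff` and `sym_diff` are names chosen here.

```python
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Elements in a that are not in b
diff = a - b
print(diff == a.difference(b))

# Elements in either set but not in both
sym_diff = a ^ b
print(sym_diff == a.symmetric_difference(b))

print(sorted(diff), sorted(sym_diff))
```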


In [None]:
a_set = {1, 2, 3, 4, 5}
print({1, 2, 3}.issubset(a_set))

print(a_set.issuperset({1, 2, 3}))

# Sets are equal if and only if their contents are equal:
print({1, 2, 3} == {3, 2, 1})


True
True
True


## **Built-In Sequence Functions**
Python has a handful of useful sequence functions that you should familiarize yourself with and use at any opportunity.

In [None]:
# enumerate
# It’s common when iterating over a sequence to want to keep track of the index of the current item. A do-it-yourself approach would look like:

l = ['a', 'b', 'c', 'd', 'e']
i = 0
for value in l:
    print(i, value)
    i += 1

# enumerate yields (i, value) tuples, so no manual counter is needed:
for i, value in enumerate(l):
    print(i, value)

0 a
1 b
2 c
3 d
4 e
0 a
1 b
2 c
3 d
4 e


In [None]:
# sorted
# The sorted function returns a new sorted list from the elements of any sequence

print(sorted([7, 1, 2, 6, 0, 3, 2]))

sorted("horse race")

# zip
# zip “pairs” up the elements of a number of lists, tuples, or other sequences to create an iterator of tuples:

seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
zipped = zip(seq1, seq2)
print(list(zipped))

for index, (a, b) in enumerate(zip(seq1, seq2)):
    print(f"{index}: {a}, {b}")

[0, 1, 2, 2, 3, 6, 7]
[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
0: foo, one
1: bar, two
2: baz, three


In [None]:
# reversed
# reversed iterates over the elements of a sequence in reverse order

print(list(reversed(range(10))))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


## **List, Set, and Dictionary Comprehensions**

List comprehensions are a convenient and widely used Python language feature. They allow you to concisely form a new list by filtering the elements of a collection and transforming the elements that pass the filter, all in one concise expression.

In [None]:
strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]

['BAT', 'CAR', 'DOVE', 'PYTHON']
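
To make the filter and transform steps explicit, the comprehension above can be written as an ordinary for loop. This is only a sketch of the equivalence; `result` is a name chosen for the example.

```python
strings = ["a", "as", "bat", "car", "dove", "python"]

result = []
for x in strings:
    if len(x) > 2:                 # the filter condition
        result.append(x.upper())   # the transforming expression

print(result == [x.upper() for x in strings if len(x) > 2])
```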

In [None]:
unique_lengths = {len(x) for x in strings}
print(unique_lengths)
type(unique_lengths)

{1, 2, 3, 4, 6}


set

In [None]:
loc_mapping = {value: index for index, value in enumerate(strings)}
loc_mapping

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}

In [None]:
# Nested list comprehensions

all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

names_of_interest = []
for names in all_data:
    enough_as = [name for name in names if name.count("a") >= 2]
    names_of_interest.extend(enough_as)

print(names_of_interest)

# You can actually wrap this whole operation up in a single nested list comprehension, which will look like:

result = [name for names in all_data for name in names
              if name.count("a") >= 2]

print(result)

# At first, nested list comprehensions are a bit hard to wrap your head around.
# The for parts of the list comprehension are arranged according to the order of nesting,
# and any filter condition is put at the end as before.

['Maria', 'Natalia']
['Maria', 'Natalia']

In [None]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
print(flattened)

[1, 2, 3, 4, 5, 6, 7, 8, 9]


# **Functions**

Functions are the primary and most important method of code organization and reuse in Python. As a rule of thumb, if you anticipate needing to repeat the same or very similar code more than once, it may be worth writing a reusable function. Functions can also help make your code more readable by giving a name to a group of Python statements.

Functions are declared with the def keyword. A function contains a block of code, optionally using the return keyword:

In [None]:
def my_function(x, y):
    return x + y

print(my_function(2,3))

5


In [None]:
def function_without_return(x):
    print(x)

result = function_without_return("hello!")

print(result)

hello!
None


In [None]:
# Each function can have positional arguments and keyword arguments.
# Keyword arguments are most commonly used to specify default values or optional arguments.
# Here we will define a function with an optional z argument with the default value 1.5

def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

print(my_function2(5, 6, z=0.7))
print(my_function2(3.14, 7, 3.5))
print(my_function2(10, 20))


0.06363636363636363
35.49
45.0


## **Namespaces, Scope, and Local Functions**

Functions can access variables created inside the function as well as those outside the function in higher (or even global) scopes. An alternative and more descriptive name describing a variable scope in Python is a namespace. Any variables that are assigned within a function by default are assigned to the local namespace. The local namespace is created when the function is called and is immediately populated by the function’s arguments. After the function is finished, the local namespace is destroyed (with some exceptions that are outside the purview of this chapter).

In [None]:
a = []

def func():
    a = []
    for i in range(5):
        a.append(i)
func()
print(a)

func()
print(a)

[]
[]


In [None]:
a = []

def func():
  global a
  a = []
  for i in range(5):
      a.append(i)

func()
print(a)

func()
print(a)

[0, 1, 2, 3, 4]
[0, 1, 2, 3, 4]


## **Returning Multiple Values**

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()

print(a, b, c)

# the function is actually just returning one object,
# a tuple, which is then being unpacked into the result variables.
# In the preceding example, we could have done this instead:

return_value = f()
print(return_value)

5 6 7
(5, 6, 7)


In [None]:
states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
            "south   carolina##", "West virginia?"]

import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub("[!#?]", "", value)
        value = value.title()
        result.append(value)
    return result

print(clean_strings(states))

# An alternative approach that you may find useful is to
# make a list of the operations you want to apply to a
# particular set of strings

def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result

print(clean_strings(states, clean_ops))

# A more functional pattern like this enables you to easily modify
# how the strings are transformed at a very high level.
# The clean_strings function is also now more reusable and generic.

['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South   Carolina', 'West Virginia']
['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South   Carolina', 'West Virginia']

## **Anonymous (Lambda) Functions**

Python supports so-called anonymous or lambda functions, which are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword, which has no meaning other than “we are declaring an anonymous function”:

In [None]:
def apply_to_list(some_list, f):
   return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]

equiv_anon = lambda x: x * 2

ints = apply_to_list(ints, lambda x: x * 2)

ints

[8, 0, 2, 10, 12]

In [None]:
strings = ["foo", "card", "bar", "aaaa", "abab"]
strings.sort(key=lambda x: len(set(x)))

strings

['aaaa', 'foo', 'abab', 'bar', 'card']

## **Generators**

Many objects in Python support iteration, such as over objects in a list or lines in a file. This is accomplished by means of the iterator protocol, a generic way to make objects iterable.

A generator is a convenient way, similar to writing a normal function, to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators can return a sequence of multiple values by pausing and resuming execution each time the generator is used. To create a generator, use the yield keyword instead of return in a function.

In [None]:
def squares(n=10):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(5)

print(gen)

for x in gen:
    print(x, end=" ")



<generator object squares at 0x7f48082085f0>
1 4 9 16 25 
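
The pausing and resuming described above becomes visible when the generator is stepped manually with the built-in next function instead of a for loop. A minimal sketch, redefining squares so the block is self-contained; `first`, `second`, `third`, and `exhausted` are names chosen here.

```python
def squares(n=10):
    for i in range(1, n + 1):
        yield i ** 2

gen = squares(3)
first = next(gen)   # runs the body until the first yield
second = next(gen)  # resumes right after the previous yield
third = next(gen)
print(first, second, third)

# Once the function body finishes, the generator is exhausted
try:
    next(gen)
except StopIteration:
    exhausted = True
    print("generator exhausted")
```

A for loop does the same thing internally and stops quietly when StopIteration is raised.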

## **Generator expressions**
Another way to make a generator is by using a generator expression. This is a generator analogue to list, dictionary, and set comprehensions. To create one, enclose what would otherwise be a list comprehension within parentheses instead of brackets:

In [None]:
gen = (x ** 2 for x in range(10))

for x in gen:
    print(x, end=" ")


0 1 4 9 16 25 36 49 64 81 

In [None]:
print(sum(x ** 2 for x in range(100)))
print(dict((i, i ** 2) for i in range(5)))

328350
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}


## **itertools module**

The standard library itertools module has a collection of generators for many common data algorithms. For example, groupby takes any sequence and a function, grouping consecutive elements in the sequence by return value of the function.

In [None]:
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

# Use a separate loop variable so the names list is not overwritten
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))  # group is an iterator

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
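
Because groupby only groups consecutive elements, "A" appears twice in the output above. If one group per key is wanted, a common approach is to sort by the same key first. The sketch below assumes that goal; `grouped` is a name chosen for the example.

```python
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

# Sorting by the grouping key makes equal keys adjacent
sorted_names = sorted(names, key=first_letter)

grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted_names, first_letter)
}
print(grouped)
```

sorted is stable, so names that share a first letter keep their original relative order within each group.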


In [None]:
for x in itertools.combinations(['a', 'b', 'c', 'd'], 2):
  print(x, end='')

print()

for x in itertools.permutations(['a', 'b', 'c', 'd'], 2):
  print(x, end='')

('a', 'b')('a', 'c')('a', 'd')('b', 'c')('b', 'd')('c', 'd')
('a', 'b')('a', 'c')('a', 'd')('b', 'a')('b', 'c')('b', 'd')('c', 'a')('c', 'b')('c', 'd')('d', 'a')('d', 'b')('d', 'c')

## **Errors and Exception Handling**

In [None]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

print(attempt_float('s'))

s


In [None]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

print(attempt_float('s'))

s


In [None]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
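
The cell above defines the last version without calling it. A short usage sketch follows, repeating the definition so the block is self-contained; the inputs are illustrative. float raises ValueError for a string it cannot parse and TypeError for an argument that is neither a string nor a number, such as a tuple.

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

print(attempt_float("1.2345"))     # parses as a float
print(attempt_float("something"))  # ValueError is caught, input returned
print(attempt_float((1, 2)))       # TypeError is caught, input returned
```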

In [None]:
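# path and write_to_file are placeholders defined here so the cell runs
path = "tmp_example.txt"

def write_to_file(handle):
    handle.write("some text\n")

f = open(path, mode="w")

try:
    write_to_file(f)
except Exception:
    print("Failed")
else:
    print("Succeeded")
finally:
    f.close()

Succeeded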

# **Files and the Operating System**

In [None]:
# To write text to a file, you can use the file’s write or writelines methods.
# For example, we could write the text of examples/segismundo.txt (held here in the list t) to tmp.txt with no blank lines like so

t = ['Sueña el rico en su riqueza,\n',
 'que más cuidados le ofrece;\n',
 'sueña el pobre que padece\n',
 'su miseria y su pobreza;\n',
 'sueña el que a medrar empieza,\n',
 'sueña el que afana y pretende,\n',
 'sueña el que agravia y ofende,\n',
 'y en el mundo, en conclusión,\n',
 'todos sueñan lo que son,\n',
 'aunque ninguno lo entiende.\n']

with open("tmp.txt", mode="w", encoding="utf-8") as handle:
    handle.writelines(x for x in t if len(x) > 1)

with open("tmp.txt", encoding="utf-8") as f:
    lines = f.readlines()

In [None]:
# To open a file for reading or writing, use the built-in open function with either a relative or absolute file path and an optional file encoding

path = "tmp.txt"
f = open(path, encoding="utf-8")

# By default, the file is opened in read-only mode "r". We can then treat the file object f like a list and iterate over the lines like so:



In [None]:
for line in f:
    print(line)

lines = [x.rstrip() for x in open(path, encoding="utf-8")]

Sueña el rico en su riqueza,

que más cuidados le ofrece;

sueña el pobre que padece

su miseria y su pobreza;

sueña el que a medrar empieza,

sueña el que afana y pretende,

sueña el que agravia y ofende,

y en el mundo, en conclusión,

todos sueñan lo que son,

aunque ninguno lo entiende.



In [None]:
# When you use open to create file objects, it is recommended to close the file when you are finished with it.
# Closing the file releases its resources back to the operating system

f.close()

with open(path, encoding="utf-8") as f:
  lines = [x.rstrip() for x in f]

f1 = open(path)
print(f1.read(10))
f2 = open(path, mode="rb")  # Binary mode
print(f2.read(10))

# The read method advances the file object position by the number of bytes read. tell gives you the current position

print(f1.tell())

f1.close()
f2.close()



Sueña el r
b'Sue\xc3\xb1a el '
11


In [None]:
# Even though we read 10 characters from the file f1 opened in text mode,
# the position is 11 because it took that many bytes to decode 10 characters
# using the default encoding. You can check the default encoding in the sys module

import sys

sys.getdefaultencoding()

'utf-8'

## **Bytes and Unicode with Files**
UTF-8 is a variable-length Unicode encoding, so when I request some number of characters from the file, Python reads enough bytes (which could be as few as 10 or as many as 40 bytes) from the file to decode that many characters. If I open the file in "rb" mode instead, read requests that exact number of bytes:

In [None]:
with open(path, mode="rb") as f:
  data = f.read(10)

print(data)

"""
Depending on the text encoding, you may be able to decode the bytes to a str object yourself, but only if each of the encoded Unicode characters is fully formed
"""

print(data.decode("utf-8"))

data[:4].decode("utf-8")

b'Sue\xc3\xb1a el '
Sueña el 


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 3: unexpected end of data

In [None]:
sink_path = "sink.txt"

with open(path) as source:
  with open(sink_path, "x", encoding="iso-8859-1") as sink:
    sink.write(source.read())

with open(sink_path, encoding="iso-8859-1") as f:
  print(f.read(10))

Sueña el r


In [None]:
# Beware using seek when opening files in any mode other than binary.
# If the file position falls in the middle of the bytes defining a Unicode character,
# then subsequent reads will result in an error

f = open(path, encoding='utf-8')
print(f.read(5))

print(f.seek(4))

print(f.read(1))

Sueña
4


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb1 in position 0: invalid start byte

In [None]:
f.close()