A tuple is a fixed-length, immutable sequence of Python objects which,
once assigned, cannot be changed. The easiest way to create one is with
a comma-separated sequence of values wrapped in parentheses:

In [1]:
tup = (4, 5, 6)
tup

(4, 5, 6)

In many contexts, the parentheses can be omitted, so here we could
also have written:

In [2]:
tup = 4, 5, 6
tup

(4, 5, 6)

You can convert any sequence or iterator to a tuple by invoking tuple:

In [3]:
tuple([4, 0, 2])
tup = tuple('string')
tup

('s', 't', 'r', 'i', 'n', 'g')

Elements can be accessed with square brackets [], as with most other
sequence types. Sequences are 0-indexed in Python:

In [4]:
tup[0]

's'

When you’re defining tuples within more complicated expressions, it’s
often necessary to enclose the values in parentheses, as in this
example of creating a tuple of tuples:

In [5]:
nested_tup = (4, 5, 6), (7, 8)
nested_tup

((4, 5, 6), (7, 8))

In [6]:
nested_tup[0]


(4, 5, 6)

In [7]:
nested_tup[1]

(7, 8)

While the objects stored in a tuple may be mutable themselves, once
the tuple is created it’s not possible to modify which object is stored in
each slot:

In [9]:
tup = tuple(['foo', [1, 2], True])
tup[2] = False

TypeError: 'tuple' object does not support item assignment

If an object inside a tuple is mutable, such as a list, you can modify it in
place:

In [10]:
tup[1].append(3)
tup

('foo', [1, 2, 3], True)

You can concatenate tuples using the + operator to produce longer
tuples:

In [11]:
(4, None, 'foo') + (6, 0) + ('bar',)

(4, None, 'foo', 6, 0, 'bar')

Multiplying a tuple by an integer, as with lists, has the effect of
concatenating that many copies of the tuple:

In [12]:
('foo', 'bar') * 4

('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')

Unpacking tuples
If you try to assign to a tuple-like expression of variables, Python will
attempt to unpack the value on the righthand side of the equals sign:

In [13]:
tup = (4, 5, 6)
a, b, c = tup
b

5

Even sequences with nested tuples can be unpacked:

In [14]:
tup = 4, 5, (6, 7)
a, b, (c, d) = tup
d

7

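Using this functionality, you can easily swap variable names, a task that
in many languages requires a temporary variable. In Python, the swap can
be done like this: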

In [15]:
a, b = 1, 2
a


1

In [16]:
b


2

In [17]:
b, a = a, b
a


2

In [None]:
b

A common use of variable unpacking is iterating over sequences of
tuples or lists:

In [18]:
seq = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


There are some situations where you may want to “pluck” a few
elements from the beginning of a tuple. There is a special syntax that
can do this, *rest, which is also used in function signatures to capture
an arbitrarily long list of positional arguments:

In [19]:
values = 1, 2, 3, 4, 5
a, b, *rest = values
a


1

In [20]:
b


2

In [21]:
rest

[3, 4, 5]

This rest bit is sometimes something you want to discard; there is
nothing special about the rest name. As a matter of convention, many
Python programmers will use the underscore (_) for unwanted variables:

In [16]:
a, b, *_ = values

Tuple methods:
Since the size and contents of a tuple cannot be modified, it is very light
on instance methods. A particularly useful one (also available on
lists) is count, which counts the number of occurrences of a value:

In [17]:
a = (1, 2, 2, 2, 3, 4, 2)
a.count(2)

List:
In contrast with tuples, lists are variable length and their contents can
be modified in place. Lists are mutable. You can define them using
square brackets [] or using the list type function:

In [18]:
a_list = [2, 3, 7, None]

tup = ("foo", "bar", "baz")
b_list = list(tup)
b_list


In [None]:
b_list[1] = "peekaboo"
b_list

The list built-in function is frequently used in data processing as a way
to materialize an iterator or generator expression:

In [22]:
gen = range(10)
gen


range(0, 10)

In [23]:
list(gen)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Adding and removing elements:
Elements can be appended to the end of the list with the append
method:

In [20]:
b_list.append("dwarf")
b_list

Using insert you can insert an element at a specific location in the list:

In [24]:
b_list.insert(1, "red")
b_list

::WARNING::
insert is computationally expensive compared with append, because references
to subsequent elements have to be shifted internally to make room for the new
element. If you need to insert elements at both the beginning and end of a
sequence, you may wish to explore collections.deque, a double-ended queue,
which is optimized for this purpose and found in the Python Standard Library.
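
As a rough illustration of the deque mentioned in the warning, here is a minimal sketch. The variable names and sample values are assumptions chosen to echo the list examples above:

```python
from collections import deque

queue = deque(["bar", "baz"])
queue.appendleft("foo")   # add at the front without shifting other elements
queue.append("dwarf")     # add at the end, like list.append
first = queue.popleft()   # remove and return the front element
remaining = list(queue)   # materialize what is left as a regular list
```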

The inverse operation to insert is pop, which removes and returns an
element at a particular index:

In [22]:
b_list.pop(2)
b_list

Elements can be removed by value with remove, which locates the first
such value and removes it from the list:

In [23]:
b_list.append("foo")
b_list
b_list.remove("foo")
b_list

Check if a list contains a value using the in keyword:

In [24]:
"dwarf" in b_list

The keyword not can be used to negate in:

In [25]:
"dwarf" not in b_list

Concatenating and combining lists:
Similar to tuples, adding two lists together with + concatenates them:

In [26]:
[4, None, "foo"] + [7, 8, (2, 3)]

If you have a list already defined, you can append multiple elements to
it using the extend method:

In [27]:
x = [4, None, "foo"]
x.extend([7, 8, (2, 3)])
x

::WARNING::
Note that list concatenation by addition is a comparatively expensive
operation since a new list must be created and the objects copied
over. Using extend to append elements to an existing list, especially if
you are building up a large list, is usually preferable.
![image.png](attachment:image.png)
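
The two ways of building up a list described in the warning can be sketched as follows. The list_of_lists data and the variable names are hypothetical, and no timing is measured here:

```python
list_of_lists = [[1, 2], [3, 4], [5, 6]]  # hypothetical example data

# Preferred: extend grows the existing list in place
everything = []
for chunk in list_of_lists:
    everything.extend(chunk)

# Concatenative alternative: a brand-new list is built on every pass
everything_concat = []
for chunk in list_of_lists:
    everything_concat = everything_concat + chunk
```

Both loops produce the same contents; the difference is only in how much copying happens along the way.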

Sorting
You can sort a list in place (without creating a new object) by calling its
sort method:

In [25]:
a = [7, 2, 5, 1, 3]
a.sort()
a

[1, 2, 3, 5, 7]

sort has a few options that will occasionally come in handy. One is the
ability to pass a secondary sort key, that is, a function that produces
a value to use to sort the objects. For example, we could sort a
collection of strings by their lengths:

In [29]:
b = ["saw", "small", "He", "foxes", "six"]
b.sort(key=len)
b

Slicing:
You can select sections of most sequence types by using slice
notation, which in its basic form consists of start:stop passed to the
indexing operator []:

In [30]:
seq = [7, 2, 3, 7, 5, 6, 0, 1]
seq[1:5]

Slices can also be assigned with a sequence:

In [31]:
seq[3:5] = [6, 3]
seq

While the element at the start index is included, the stop index is not
included, so that the number of elements in the result is stop - start.
Either the start or stop can be omitted, in which case they default to
the start of the sequence and the end of the sequence, respectively:

In [32]:
seq[:5]


In [None]:
seq[3:]

Negative indices slice the sequence relative to the end:

In [33]:
seq[-4:]


In [None]:
seq[-6:-2]

In the figure, the indices are
shown at the “bin edges” to help show where the slice selections start
and stop using positive or negative indices.
![image.png](attachment:image.png)
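
The "bin edges" idea can also be tried directly in code. This is a minimal sketch using a hypothetical six-character string, where edges are numbered 0 to 6 from the left and -6 to -1 from the right:

```python
s = "HELLO!"  # hypothetical example sequence of length 6

middle = s[2:4]        # characters between edges 2 and 4
from_end = s[-5:-2]    # characters between edges -5 and -2
same_slice = s[1:4]    # edge -5 coincides with edge 1, and edge -2 with edge 4
```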

A step can also be used after a second colon to, say, take every
other element:

In [34]:
seq[::2]

A clever use of this is to pass -1, which has the useful effect of
reversing a list or tuple:

In [35]:
seq[::-1]

Dictionary:
The dictionary or dict may be the most important built-in Python data
structure. A dictionary stores a collection of key-value pairs, where key and value are Python
objects. Each key is associated with a value so that a value can be
conveniently retrieved, inserted, modified, or deleted given a particular
key. One approach for creating a dictionary is to use curly braces {}
and colons to separate keys and values:

In [36]:
empty_dict = {}
d1 = {"a": "some value", "b": [1, 2, 3, 4]}
d1

You can access, insert, or set elements using the same syntax as for
accessing elements of a list or tuple:

In [37]:
d1[7] = "an integer"
d1


In [None]:
d1["b"]

You can check if a dictionary contains a key using the same syntax
used for checking whether a list or tuple contains a value:

In [38]:
"b" in d1

You can delete values using either the del keyword or the pop method
(which simultaneously returns the value and deletes the key):

In [39]:
d1[5] = "some value"
d1


In [None]:
d1["dummy"] = "another value"
d1


In [None]:
del d1[5]
d1


In [None]:
ret = d1.pop("dummy")
ret


In [None]:
d1

The keys and values methods give you iterators of the dictionary’s
keys and values, respectively. The order of the keys depends on the
order of their insertion, and these functions output the keys and values
in the same respective order:

In [40]:
list(d1.keys())


In [None]:
list(d1.values())

If you need to iterate over both the keys and values, you can use the
items method to iterate over the keys and values as 2-tuples:

In [41]:
list(d1.items())

You can merge one dictionary into another using the update method.
The update method changes dictionaries in place, so any existing keys
in the data passed to update will have their old values discarded.

In [42]:
d1.update({"b": "foo", "c": 12})
d1

Creating dictionaries from sequences
It’s common to occasionally end up with two sequences that you want
to pair up element-wise in a dictionary. As a first cut, you might write
code like this:
![image.png](attachment:image.png)
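
A minimal sketch of what such a first cut could look like, assuming two hypothetical sequences key_list and value_list (the names and values are illustrative, not taken from the original):

```python
key_list = ["a", "b", "c"]   # hypothetical keys
value_list = [1, 2, 3]       # hypothetical values

mapping = {}
for key, value in zip(key_list, value_list):
    mapping[key] = value     # pair up the sequences element-wise
```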
Since a dictionary is essentially a collection of 2-tuples, the dict
function accepts a list of 2-tuples:

In [26]:
tuples = zip(range(5), reversed(range(5)))
tuples
mapping = dict(tuples)
mapping

{0: 4, 1: 3, 2: 2, 3: 1, 4: 0}

Default values
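A common pattern is to look up a key and fall back to a default when the key is missing. A minimal sketch, using a hypothetical dictionary some_dict, of how get and pop accept that default as a second argument:

```python
some_dict = {"a": 1}  # hypothetical example data

# Explicit membership test with a fallback
if "b" in some_dict:
    value = some_dict["b"]
else:
    value = 0

# Equivalent lookups using the optional default argument
value_get = some_dict.get("b", 0)   # missing key, so the default is returned
value_pop = some_dict.pop("b", 0)   # same fallback, and nothing is removed
missing = some_dict.get("b")        # no default given, so this is None
```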

The dictionary methods get and pop can take a default value to return
when a key is missing. Without one, get will return None if the key is not
present, while pop will raise an exception.

With setting values, it may be that the values in a dictionary are another
kind of collection, like a list. For example, you could imagine categorizing
a list of words by their first letters as a dictionary of lists:

In [44]:
words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}

for word in words:
    letter = word[0]
    if letter not in by_letter:
        by_letter[letter] = [word]
    else:
        by_letter[letter].append(word)

by_letter

The setdefault dictionary method can be used to simplify this workflow.
The preceding for loop can be rewritten as:

In [45]:
by_letter = {}
for word in words:
    letter = word[0]
    by_letter.setdefault(letter, []).append(word)
by_letter

The built-in collections module has a useful class, defaultdict, which
makes this even easier. To create one, you pass a type or function for
generating the default value for each slot in the dictionary:

In [46]:
from collections import defaultdict
by_letter = defaultdict(list)
for word in words:
    by_letter[word[0]].append(word)

While the values of a dictionary can be any Python object, the keys
generally have to be immutable objects like scalar types (int, float,
string) or tuples (all the objects in the tuple need to be immutable, too).
The technical term here is hashability. You can check whether an
object is hashable (can be used as a key in a dictionary) with the hash
function:

In [47]:
hash("string")


In [None]:
hash((1, 2, (2, 3)))


In [None]:
hash((1, 2, [2, 3])) # fails because lists are mutable
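
Because hash raises TypeError for unhashable objects, the check can be wrapped in a small helper. A minimal sketch; is_hashable is a hypothetical name, not a built-in:

```python
def is_hashable(obj):
    # hash() raises TypeError when obj, or something nested in it, is unhashable
    try:
        hash(obj)
    except TypeError:
        return False
    return True

checks = [is_hashable("string"),
          is_hashable((1, 2, (2, 3))),
          is_hashable((1, 2, [2, 3]))]
```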

To use a list as a key, one option is to convert it to a tuple, which can
be hashed as long as its elements also can be:

In [48]:
d = {}
d[tuple([1, 2, 3])] = 5
d

Set:
A set is an unordered collection of unique elements. A set can be
created in two ways: via the set function or via a set literal with curly
braces:

In [49]:
set([2, 2, 2, 1, 3, 3])


In [None]:
{2, 2, 2, 1, 3, 3}

Sets support mathematical set operations like union, intersection,
difference, and symmetric difference. Consider these two example
sets:

In [50]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

The union of these two sets is the set of distinct elements occurring in
either set. This can be computed with either the union method or the |
binary operator:

In [51]:
a.union(b)
a | b

The intersection contains the elements occurring in both sets. The &
operator or the intersection method can be used:

In [52]:
a.intersection(b)
a & b

![image.png](attachment:image.png)
NOTE
If you pass an input that is not a set to methods like union and intersection,
Python will convert the input to a set before executing the operation. When using
the binary operators, both objects must already be sets.
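
The difference described in the note can be checked with a short sketch. The variable names from_method and operator_failed are introduced here only for illustration:

```python
a = {1, 2, 3, 4, 5}

# The method accepts any iterable and converts it before the operation
from_method = a.union([4, 5, 6])

# The operator requires a set on both sides and raises TypeError otherwise
try:
    a | [4, 5, 6]
except TypeError:
    operator_failed = True
else:
    operator_failed = False
```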

All of the logical set operations have in-place counterparts, which
enable you to replace the contents of the set on the left side of the
operation with the result. For very large sets, this may be more
efficient:

In [53]:
c = a.copy()
c |= b
c


In [None]:
d = a.copy()
d &= b
d

Like dictionary keys, set elements generally must be immutable, and
they must be hashable (which means that calling hash on a value does
not raise an exception). In order to store list-like elements (or other
mutable sequences) in a set, you can convert them to tuples:

In [None]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
my_set

You can also check if a set is a subset of (is contained in) or a
superset of (contains all elements of) another set:

In [55]:
a_set = {1, 2, 3, 4, 5}
{1, 2, 3}.issubset(a_set)


In [None]:
a_set.issuperset({1, 2, 3})

Sets are equal if and only if their contents are equal:

In [56]:
{1, 2, 3} == {3, 2, 1}

![image.png](attachment:image.png)

sorted:
The sorted function returns a new sorted list from the elements of any
sequence:

In [57]:
sorted([7, 1, 2, 6, 0, 3, 2])


In [None]:
sorted("horse race")

zip:
zip “pairs” up the elements of a number of lists, tuples, or other
sequences to create an iterator of tuples, which you can materialize with
list:

In [58]:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
zipped = zip(seq1, seq2)


In [None]:
list(zipped)

zip can take an arbitrary number of sequences, and the number of
elements it produces is determined by the shortest sequence:

In [59]:
seq3 = [False, True]
list(zip(seq1, seq2, seq3))

A common use of zip is simultaneously iterating over multiple
sequences, possibly also combined with enumerate:

In [60]:
for index, (a, b) in enumerate(zip(seq1, seq2)):
    print(f"{index}: {a}, {b}")


reversed:
reversed iterates over the elements of a sequence in reverse order:

In [61]:
list(reversed(range(10)))

![image.png](attachment:image.png)

In [62]:
strings = ["a", "as", "bat", "car", "dove", "python"]
[x.upper() for x in strings if len(x) > 2]
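
The cell above is a list comprehension with a filter condition. As a rough guide, it can be read as shorthand for the following loop (a sketch; result and comprehension are names chosen here for illustration):

```python
strings = ["a", "as", "bat", "car", "dove", "python"]

result = []
for x in strings:
    if len(x) > 2:                # the filter condition
        result.append(x.upper())  # the expression being collected

comprehension = [x.upper() for x in strings if len(x) > 2]
```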

![image.png](attachment:image.png)

Suppose we wanted
a set containing just the lengths of the strings contained in the
collection; we could easily compute this using a set comprehension:

In [63]:
unique_lengths = {len(x) for x in strings}
unique_lengths

We could also express this more functionally using the map function,
introduced shortly:

In [64]:
set(map(len, strings))

As a simple dictionary comprehension example, we could create a
lookup map of these strings for their locations in the list:

In [65]:
loc_mapping = {value: index for index, value in enumerate(strings)}
loc_mapping

Nested list comprehensions
Suppose we have a list of lists containing some English and Spanish
names:

In [66]:
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

Suppose we wanted to get a single list containing all names with two or
more a’s in them. We could certainly do this with a simple for loop:

In [67]:
names_of_interest = []
for names in all_data:
    enough_as = [name for name in names if name.count("a") >= 2]
    names_of_interest.extend(enough_as)
names_of_interest

You can actually wrap this whole operation up in a single nested list
comprehension, which will look like:

In [68]:
result = [name for names in all_data for name in names
          if name.count("a") >= 2]
result

At first, nested list comprehensions are a bit hard to wrap your head
around. The for parts of the list comprehension are arranged according
to the order of nesting, and any filter condition is put at the end as
before. Here is another example where we “flatten” a list of tuples of
integers into a simple list of integers:

In [69]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
flattened

Keep in mind that the order of the for expressions would be the same if
you wrote a nested for loop instead of a list comprehension:

In [70]:
flattened = []

for tup in some_tuples:
    for x in tup:
        flattened.append(x)

It’s important to distinguish the syntax just shown from a list
comprehension inside a list comprehension, which is also perfectly
valid:

In [71]:
[[x for x in tup] for tup in some_tuples]

Functions:
Functions are declared with the def keyword. A function contains a
block of code with an optional use of the return keyword:

In [72]:
def my_function(x, y):
    return x + y

When a line with return is reached, the value or expression after return
is sent to the context where the function was called, for example:

In [73]:
my_function(1, 2)


In [None]:
result = my_function(1, 2)
result

There is no issue with having multiple return statements. If Python
reaches the end of a function without encountering a return statement,
None is returned automatically. For example:

In [74]:
def function_without_return(x):
    print(x)

result = function_without_return("hello!")
print(result)

Each function can have positional arguments and keyword arguments.
Keyword arguments are most commonly used to specify default values
or optional arguments. The main restriction on function arguments is that
the keyword arguments must follow the positional arguments (if any).
Here we will define a function with an optional z argument with the
default value 1.5:

In [75]:
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

While keyword arguments are optional, all positional arguments must be
specified when calling a function.
You can pass values to the z argument with or without the keyword
provided, though using the keyword is encouraged:

In [76]:
my_function2(5, 6, z=0.7)


In [None]:
my_function2(3.14, 7, 3.5)


In [None]:
my_function2(10, 20)

Functions can access variables created inside the function as well as
those outside the function in higher (or even global) scopes. An
alternative name for a variable scope in Python is a namespace. Any
variables assigned within a function are, by default, assigned to the local
namespace. The local namespace is created when the function is called
and is immediately populated by the function’s arguments. After the
function is finished, the local namespace is destroyed (with some
exceptions that are outside the purview of this chapter). Consider the
following function:

In [7]:
def func_1():
    a = []
    for i in range(5):
        a.append(i)

When func_1() is called, the empty list a is created, five elements are
appended, and then a is destroyed when the function exits. Suppose
instead we had declared a as follows:

In [8]:
a = []
def func_2():
    for i in range(5):
        a.append(i)

In [9]:
func_1()
a


[]

In [11]:
func_1()
a

[]

In [12]:
func_2()
a

[0, 1, 2, 3, 4]

In [13]:
func_2()
a

[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]

Assigning variables outside of the function’s scope is possible, but
those variables must be declared explicitly using either the global or
nonlocal keywords. nonlocal allows a function to modify variables
defined in a higher-level scope that is not global. The following example
uses global:

In [15]:
a = None
def bind_a_variable():
    global a
    a = []
bind_a_variable()
print(a)

[]
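
For comparison, nonlocal plays the same role for a variable in an enclosing function rather than the global scope. A minimal sketch, with hypothetical names make_counter and increment:

```python
def make_counter():
    count = 0  # lives in make_counter's scope, not the global scope

    def increment():
        nonlocal count  # rebind the enclosing function's variable
        count += 1
        return count

    return increment

counter = make_counter()
first = counter()
second = counter()
```

Without the nonlocal declaration, the assignment inside increment would be treated as creating a new local variable, and the `count += 1` line would fail.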


Returning Multiple Values:
You can return multiple values
from a function with simple syntax. 
Here’s an example:

In [23]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c
f()

(5, 6, 7)
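
The function is really returning a single object, a tuple, which the caller can unpack into separate names. A minimal sketch of both ways of consuming the result (return_value is a name introduced here for illustration):

```python
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

# Unpack the returned tuple into separate names
a, b, c = f()

# Or keep the tuple whole, since it is one object
return_value = f()
```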

A potentially attractive alternative to returning multiple values
like before might be to return a dictionary instead:

In [None]:
def f():
    a = 5
    b = 6
    c = 7
    return {"a": a, "b": b, "c": c}

Functions Are Objects:
Suppose we were doing some data cleaning and needed to apply a bunch
of transformations to the following list of strings:

In [27]:
states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]

One way to cleanse data is to use built-in string methods along with the re
standard library module for regular expressions:

In [28]:
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub("[!#?]", "", value)
        value = value.title()
        result.append(value)
    return result

In [29]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

An alternative approach that you may find useful is to make a list of the
operations you want to apply to a particular set of strings:

In [83]:
def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result

In [84]:
clean_strings(states, clean_ops)

You can use functions as arguments to other functions like the built-in
map function, which applies a function to a sequence of some kind. map
can be used as an alternative to list comprehensions without any filter:

In [85]:
for x in map(remove_punctuation, states):
    print(x)

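Anonymous (Lambda) Functions:
Python supports anonymous, or lambda, functions, which consist of a
single expression whose result is the return value. They are defined with
the lambda keyword, which has no meaning other than “we are declaring
an anonymous function”: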

In [86]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

It’s often less typing (and clearer) to pass a
lambda function as opposed to writing a full-out function declaration or
even assigning the lambda function to a local variable. Consider this
example:

In [30]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

You could also have written:

In [31]:
[x * 2 for x in ints]

[8, 0, 2, 10, 12]

As another example, suppose you wanted to sort a collection of strings
by the number of distinct letters in each string:

In [32]:
strings = ["foo", "card", "bar", "aaaa", "abab"]

In [33]:
strings.sort(key=lambda x: len(set(x)))
strings

['aaaa', 'foo', 'abab', 'bar', 'card']

Generators:
Iterating over a dictionary yields the dictionary keys:

In [90]:
some_dict = {"a": 1, "b": 2, "c": 3}
for key in some_dict:
    print(key)

When you write for key in some_dict, the Python interpreter first
attempts to create an iterator out of some_dict:

In [91]:
dict_iterator = iter(some_dict)
dict_iterator

An iterator is any object that will yield objects to the Python interpreter
when used in a context like a for loop. Most methods expecting a list
or list-like object will also accept any iterable object. 

In [92]:
list(dict_iterator)

A generator is a convenient way, similar to writing a normal function, to
construct a new iterable object. Whereas normal functions execute and
return a single result at a time, generators can return a sequence of
multiple values by pausing and resuming execution each time the
generator is used. To create a generator, use the yield keyword instead
of return in a function:

In [93]:
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

When you actually call the generator, no code is immediately executed:

In [94]:
gen = squares()
gen

It is not until you request elements from the generator that it begins
executing its code:

In [95]:
for x in gen:
    print(x, end=" ")

Another way to make a generator is by using a generator expression.
This is a generator analogue to list, dictionary, and set comprehensions.
To create one, enclose what would otherwise be a list comprehension
within parentheses instead of brackets:

In [37]:
gen = (x ** 2 for x in range(100))
gen

<generator object <genexpr> at 0x0000029E9DA64D60>

This is equivalent to the following more verbose generator:

In [40]:
def _make_gen():
    for x in range(100):
        yield x ** 2
gen = _make_gen()


Generator expressions can be used instead of list comprehensions as
function arguments in some cases:

In [97]:
sum(x ** 2 for x in range(100))


In [None]:
dict((i, i ** 2) for i in range(5))

itertools module:
The standard library itertools module has a collection of generators for
many common data algorithms. For example, groupby takes any
sequence and a function, grouping consecutive elements in the
sequence by return value of the function. Here’s an example:

In [46]:
import itertools
def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

for letter, names in itertools.groupby(names, first_letter):
    print(letter, list(names)) # names is a generator

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
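
Note that "A" appears twice in the output because groupby only groups consecutive elements. If one group per key is wanted, a common approach is to sort by the same key first. A minimal sketch (sorted_names and grouped are names introduced here for illustration):

```python
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

# Sorting by the grouping key makes equal keys adjacent
sorted_names = sorted(names, key=first_letter)
grouped = {letter: list(group)
           for letter, group in itertools.groupby(sorted_names, first_letter)}
```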


![image.png](attachment:image.png)


Errors and Exception Handling:
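In data analysis applications, many functions work only on certain kinds
of input. As an example, Python’s float function can cast a string to a
floating-point number, but it fails with ValueError on improper inputs: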

In [47]:
float("1.2345")


1.2345

In [None]:
float("something")

Suppose we wanted a version of float that fails gracefully, returning the
input argument:

In [50]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

In [51]:
attempt_float("1.2345")


1.2345

In [52]:
attempt_float("something")

'something'

You might notice that float can raise exceptions other than ValueError:

In [53]:
float((1, 2))

TypeError: float() argument must be a string or a real number, not 'tuple'

You might want to suppress only ValueError, since a TypeError (the
input was not a string or numeric value) might indicate a legitimate bug in
your program. To do that, write the exception type after except:

In [103]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

In [104]:
attempt_float((1, 2))

You can catch multiple exception types by writing a tuple of exception
types instead (the parentheses are required):

In [105]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x

In some cases, you may not want to suppress an exception, but you
want some code to be executed regardless of whether or not the code
in the try block succeeds. To do this, use finally:

In [None]:
f = open(path, mode="w")
try:
    write_to_file(f)
finally:
    f.close()

Here, the file object f will always get closed. Similarly, you can have
code that executes only if the try: block succeeds using else:

In [None]:
f = open(path, mode="w")
try:
    write_to_file(f)
except:
    print("Failed")
else:
    print("Succeeded")
finally:
    f.close()
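
To see the order in which these clauses run without needing a file, here is a minimal self-contained sketch. The function parse_and_log and its events list are hypothetical, introduced only for illustration:

```python
def parse_and_log(text):
    events = []
    try:
        value = float(text)
    except ValueError:
        events.append("failed")     # runs only if float() raised ValueError
        value = None
    else:
        events.append("succeeded")  # runs only if the try block did not raise
    finally:
        events.append("cleanup")    # runs in both cases
    return value, events

ok = parse_and_log("1.5")
bad = parse_and_log("something")
```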

To open a file for reading or writing, use the built-in open function with
either a relative or absolute file path and an optional file encoding:

In [106]:
path = "examples/segismundo.txt"
f = open(path, encoding="utf-8")

By default, the file is opened in read-only mode "r". We can then treat
the file object f like a list and iterate over the lines like so:

for line in f:
    print(line)

The lines come out of the file with the end-of-line (EOL) markers
intact, so you’ll often see code to get an EOL-free list of lines in a file
like:

In [107]:
lines = [x.rstrip() for x in open(path, encoding="utf-8")]
lines

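When you use open to create file objects, it is good practice to close the
file when you are finished with it. Closing the file releases its
resources back to the operating system: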

In [108]:
f.close()

One of the ways to make it easier to clean up open files is to use the
with statement, which automatically closes the file when the with block
exits:

In [109]:
with open(path, encoding="utf-8") as f:
    lines = [x.rstrip() for x in f]
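
That the with statement closes the file can be checked through the file object's closed attribute. A minimal sketch that writes its own throwaway file, so it does not depend on the example data; the file contents and variable names are assumptions:

```python
import os
import tempfile

# Create a small temporary file to read back
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("first line\nsecond line\n")
    tmp_path = tmp.name

with open(tmp_path, encoding="utf-8") as f:
    lines = [x.rstrip() for x in f]

closed_after_with = f.closed  # the name f still exists, but the file is closed
os.remove(tmp_path)           # clean up the temporary file
```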

![image.png](attachment:image.png)

For readable files, some of the most commonly used methods are read,
seek, and tell. read returns a certain number of characters from the file:

In [110]:
f1 = open(path)
f1.read(10)


In [None]:
f2 = open(path, mode="rb")  # Binary mode
f2.read(10)

The read method advances the file object position by the number of
bytes read. tell gives you the current position:

In [111]:
f1.tell()


In [None]:
f2.tell()

You can check the default
encoding in the sys module:

In [112]:
import sys
sys.getdefaultencoding()

seek changes the file position to the indicated byte in the file:

In [113]:
f1.seek(3)


In [None]:
f1.read(1)


In [None]:
f1.tell()

Lastly, we remember to close the files:

In [114]:
f1.close()


In [None]:
f2.close()

To write text to a file, you can use the file’s write or writelines
methods:

In [115]:
path



In [None]:
with open("tmp.txt", mode="w") as handle:
    handle.writelines(x for x in open(path) if len(x) > 1)

with open("tmp.txt") as f:
    lines = f.readlines()

lines

![image.png](attachment:image.png)

Bytes and Unicode with Files:
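The default behavior for Python files (whether readable or writable) is
text mode, which means that you work with Python strings (i.e., Unicode).
This contrasts with binary mode, which you can obtain by appending b to
the file mode. Revisiting the file (which contains non-ASCII characters
with UTF-8 encoding) from the previous section, we have: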

In [None]:
with open(path) as f:
    chars = f.read(10)
chars

In [None]:
len(chars)

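UTF-8 is a variable-length Unicode encoding, so when I request some
number of characters from the file, Python reads enough bytes (which
could be as few as 10 or as many as 40 bytes) to decode that many
characters. If I open the file in "rb" mode instead, read requests that
exact number of bytes: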

In [None]:
with open(path, mode="rb") as f:
    data = f.read(10)
data

Depending on the text encoding, you may be able to decode the bytes
to a str object yourself, but only if each of the encoded Unicode
characters is fully formed:

In [None]:
data.decode("utf-8")

In [None]:
data[:4].decode("utf-8")
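
The effect of cutting a multi-byte character in half can be reproduced without the example file. A minimal sketch using a hypothetical five-character sample string:

```python
text = "Sueña"                 # hypothetical sample with one non-ASCII character
data = text.encode("utf-8")    # "ñ" is encoded as the two bytes 0xC3 0xB1

whole = data.decode("utf-8")   # every character is complete, so this succeeds

try:
    data[:4].decode("utf-8")   # the slice ends after the first byte of "ñ"
except UnicodeDecodeError:
    partial_failed = True
else:
    partial_failed = False
```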

Text mode, combined with the encoding option of open, provides a
convenient way to convert from one Unicode encoding to another:

In [54]:
sink_path = "sink.txt"
with open(path) as source:
    with open(sink_path, "x", encoding="iso-8859-1") as sink:
        sink.write(source.read())


In [None]:
with open(sink_path, encoding="iso-8859-1") as f:
    print(f.read(10))

If the file position falls in the middle of the bytes defining a Unicode
character, then subsequent reads will result in an error:

In [None]:
f = open(path, encoding='utf-8')
f.read(5)

In [None]:
f.seek(4)

In [None]:
f.read(1)

In [None]:
f.close()

In [121]:
import os
os.remove(sink_path)