### Notes on Chapter 3: Built-In Data Structures, Functions, and Files

#### Data Structures and Sequences

##### Tuple

* A tuple is a fixed-length, immutable sequence of Python objects: once created, it cannot be modified.
* Tuples are created by comma-separated sequences of values, which can be enclosed in parentheses. Parentheses can be omitted in many contexts.

In [1]:
tup = (4, 5, 6)
print(tup)

(4, 5, 6)


* Tuples can be created from any sequence or iterator by calling the built-in tuple() function.

In [3]:
print(tuple([4, 0, 2]))
tup = tuple('string')
print(tup)

(4, 0, 2)
('s', 't', 'r', 'i', 'n', 'g')


* Elements in a tuple are accessed using square brackets [], similar to most other sequence types.

In [4]:
tup[0]

's'

* Tuples can contain mutable objects such as lists. The object stored in each slot cannot be replaced, but a mutable object inside a tuple can still be modified in place.
* Concatenate tuples with +, repeat with *.

In [7]:
print((4, None, 'foo') + (6, 0) + ('bar',))
print(('foo', 'bar') * 4)

(4, None, 'foo', 6, 0, 'bar')
('foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'bar')
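To make the point about mutable elements concrete, here is a small sketch with an illustrative tuple of our own (`tup_with_list` is a made-up name). The list inside the tuple can be changed in place, but assigning a new object to one of the tuple's slots raises a `TypeError`:

```python
tup_with_list = ("foo", [1, 2], True)

# The list stored in slot 1 is itself mutable, so it can be extended in place
tup_with_list[1].append(3)
print(tup_with_list)  # ('foo', [1, 2, 3], True)

# Replacing what a slot refers to is not allowed
try:
    tup_with_list[2] = False
except TypeError as exc:
    print("TypeError:", exc)
```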


* Tuples support unpacking: a, b, c = (4, 5, 6).

In [8]:
tup = (4, 5, 6)
a, b, c = tup
print(b)

5


* Nested tuples can be unpacked: a, b, (c, d) = 4, 5, (6, 7).
* Swapping variables can be done via unpacking: b, a = a, b.

In [10]:
a, b = 1, 2
print(a)
print(b)

b, a = a, b
print(a)
print(b)

1
2
2
1
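The nested unpacking pattern from the bullet above works the same way; a minimal sketch, reusing the values from that bullet:

```python
tup = 4, 5, (6, 7)

# The inner parentheses on the left mirror the nested tuple on the right
a, b, (c, d) = tup
print(d)  # 7
```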


* Unpacking can be used in loop iterations over tuple sequences.

In [12]:
seq = [(1,2,3), (4,5,6), (7,8,9)]

for a, b, c in seq:
    print(f'a={a}, b={b}, c={c}')

a=1, b=2, c=3
a=4, b=5, c=6
a=7, b=8, c=9


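* Python has a `*rest` syntax for capturing an arbitrarily long remainder of a tuple as a list.
* By convention, use `_` (or `*_`) for values you want to discard while unpacking. `_` is an ordinary variable name, so it still holds those values.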

In [14]:
values = 1, 2, 3, 4, 5
a, b, *rest = values
print(rest)

a, b, *_ = values
print(_)

[3, 4, 5]
[3, 4, 5]


* Tuples have a count() method for counting value occurrences.

In [15]:
a = (1, 2, 2, 2, 3, 4, 2)
print(a.count(2))

4


##### Lists

* Lists in Python are mutable, variable-length sequences defined using square brackets [] or list().
* You can convert a tuple to a list: b_list = list(("foo", "bar", "baz")).
* Modify a list element by its index: b_list[1] = "peekaboo".

In [1]:
a_list = [2, 3, 7, None]

tup = ('foo', 'bar', 'baz')
b_list = list(tup)
print(b_list)

b_list[1] = 'peekaboo'
print(b_list)

['foo', 'bar', 'baz']
['foo', 'peekaboo', 'baz']


* Build a list from any iterable, such as a range or a generator: list(range(10)).
* Use append() to add an element to the end of the list.
* Use insert() to place an element at a specific position. This is relatively expensive compared to append(), because every element after the insertion point must be shifted.
* Use pop() to remove and return the element at a given index (the last element by default).
* Use remove() to delete an element by value; it removes only the first occurrence.
* Use the in keyword to check whether a list contains a value: "dwarf" in b_list.

In [4]:
# Range
gen = range(10)
print(gen)
print(f'This is a list generated from the range: {list(gen)}')

# Adding and removing elements (b_list is ['foo', 'peekaboo', 'baz'] from the previous cell)
b_list.append('dwarf')
print(f'This is a list after appending one element: {b_list}')

b_list.insert(1, 'red')
print(f'This is a list after inserting one element: {b_list}')

b_list.pop(2)
print(f'This is a list after removing one element at a particular index: {b_list}')

b_list.append('foo')
print(f'This is a list after appending one element: {b_list}')

b_list.remove('foo')
print(f'This is a list after removing one element by value: {b_list}')

print("dwarf" in b_list)

range(0, 10)
This is a list generated from the range: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
This is a list after appending one element: ['foo', 'peekaboo', 'baz', 'dwarf']
This is a list after inserting one element: ['foo', 'red', 'peekaboo', 'baz', 'dwarf']
This is a list after removing one element at a particular index: ['foo', 'red', 'baz', 'dwarf']
This is a list after appending one element: ['foo', 'red', 'baz', 'dwarf', 'foo']
This is a list after removing one element by value: ['red', 'baz', 'dwarf', 'foo']
True


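* Concatenate lists with +, which creates a new list, or append multiple elements to an existing list in place with extend(), which is usually cheaper.
* Sort a list in place with sort(). Pass a function through the key= parameter to sort by a value derived from each element, for example key=len to sort by length.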

In [5]:
# Concatenating and combining lists
print([4, None, 'foo'] + [7, 8, (2, 3)])

x = [4, None, 'foo']
x.extend([7, 8, (2, 3)])
print(x)

# Sorting
a = [7, 2, 5, 1, 3]
a.sort()
print(a)

# Sorting by length
b = ['saw', 'small', 'He', 'foxes', 'six']
b.sort(key=len)
print(b)

[4, None, 'foo', 7, 8, (2, 3)]
[4, None, 'foo', 7, 8, (2, 3)]
[1, 2, 3, 5, 7]
['He', 'saw', 'six', 'small', 'foxes']


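* Slice lists with start:stop notation inside square brackets. The element at start is included and the element at stop is not. You can also assign a sequence to a slice to replace that portion of the list.
* Add a step after a second colon to take every nth element: seq[::2] takes every other element, and a step of -1 (seq[::-1]) reverses a list or tuple.
* Checking whether a list contains a value requires a linear scan, and inserting or removing elements near the front of a list shifts every later element, so these operations can be much slower than the equivalent set or dictionary operations. Use collections.deque when you need fast appends and pops at both ends.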

In [6]:
# Slicing
seq = [7, 2, 3, 7, 5, 6, 0, 1]
print(seq[1:5])

seq[3:4] = [6, 3]
print(seq)

# Negative indices
print(seq[-4:])
print(seq[-6:-2])

# Step
print(seq[::2])
print(seq[::-1])

[2, 3, 7, 5]
[7, 2, 3, 6, 3, 5, 6, 0, 1]
[5, 6, 0, 1]
[6, 3, 5, 6]
[7, 3, 3, 6, 1]
[1, 0, 6, 5, 3, 6, 3, 2, 7]
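As a brief sketch of the `collections.deque` alternative mentioned above (the contents are illustrative), a deque supports adding and removing at either end without shifting the remaining elements:

```python
from collections import deque

queue = deque([2, 3, 4])
queue.appendleft(1)  # add to the front
queue.append(5)      # add to the back
print(queue)         # deque([1, 2, 3, 4, 5])

first = queue.popleft()  # remove from the front
last = queue.pop()       # remove from the back
print(first, last, queue)  # 1 5 deque([2, 3, 4])
```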


##### Dictionaries

<b>1. Creation and Access</b>

Dictionaries in Python are created using curly braces {} with keys and values separated by colons :. They can also be created by the dict() constructor.

In [19]:
empty_dict = {}

d1 = {'a': 'some value', 'b': [1, 2, 3, 4]}
print(d1)
d2 = dict(a='some value', b=[1, 2, 3, 4])
print(d2)

{'a': 'some value', 'b': [1, 2, 3, 4]}
{'a': 'some value', 'b': [1, 2, 3, 4]}


Elements in a dictionary can be accessed, inserted, or modified using the square bracket notation.

In [20]:
print(d1['a'])

d1["c"] = "new value"
print(d1["c"])

some value
new value


<b>2. Checking for Keys and Deleting Entries</b>

To check if a dictionary contains a key, you can use the in keyword.

In [10]:
print("a" in d1)

True


You can delete dictionary entries using the `del` keyword or the `pop()` method. The `pop()` method returns the value of the key being removed.

In [11]:
del d1["c"]
removed_value = d1.pop("a")
print(removed_value)

some value


<b>3. Iterating over Dictionaries</b>

The `keys()`, `values()`, and `items()` methods return view objects over the dictionary's keys, values, and key-value pairs, respectively. Wrap a view in `list()` to get a list.

In [12]:
print(list(d1.keys()))
print(list(d1.values()))
print(d1.items())

['b']
[[1, 2, 3, 4]]
dict_items([('b', [1, 2, 3, 4])])


<b>4. Merging Dictionaries</b>

The `update()` method merges another dictionary into this one in place. If a key already exists, its value is replaced by the value from the dictionary passed to `update()`.

In [13]:
d1.update({'b': 'foo', 'c': 12})
print(d1)

{'b': 'foo', 'c': 12}


<b>5. Default values</b>

The `get()` and `pop()` methods accept a default value to return when the key is not present. Without a default, `get()` returns `None`, while `pop()` raises a `KeyError`.

In [14]:
print(d1.get("non_existent_key", "default_value"))

default_value
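The sketch below, using a small made-up dictionary named `sample`, contrasts how `pop()` and `get()` behave for a missing key with and without a default:

```python
sample = {"x": 1}

# pop() with a default returns it instead of raising when the key is missing
print(sample.pop("missing", None))  # None

# Without a default, pop() raises KeyError for a missing key
try:
    sample.pop("missing")
except KeyError as exc:
    print("KeyError:", exc)  # KeyError: 'missing'

# get() never raises; it falls back to None unless another default is given
print(sample.get("missing"))  # None
```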


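The `setdefault()` method and the `defaultdict` class from the `collections` module simplify building dictionaries whose values start from a default, such as grouping items into lists. A `defaultdict(list)` creates an empty list automatically the first time a missing key is accessed: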

In [15]:
from collections import defaultdict

d = defaultdict(list)
words = ["apple", "bat", "bar", "atom", "book"]
for word in words:
    d[word[0]].append(word)

print(dict(d))

{'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
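The same grouping can be written with a plain dictionary and `setdefault()`, which inserts the default only when the key is missing and then returns the stored value. A minimal sketch with the same word list:

```python
words = ["apple", "bat", "bar", "atom", "book"]
by_letter = {}
for word in words:
    # setdefault stores an empty list the first time a letter is seen,
    # then returns the list kept under that key so we can append to it
    by_letter.setdefault(word[0], []).append(word)

print(by_letter)  # {'a': ['apple', 'atom'], 'b': ['bat', 'bar', 'book']}
```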


<b>6. Hashability and Key types</b>

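Dictionary keys must be hashable. In practice this means immutable objects: strings, integers, floats, tuples, and frozensets can serve as dictionary keys, but lists and sets cannot. A tuple is hashable only if all of its elements are hashable. You can use the `hash()` function to check whether an object is hashable; it raises a `TypeError` for unhashable objects. (String hashes are randomized per interpreter session, so the value you see will differ from the one below.)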

In [16]:
print(hash("string"))

-894698539344563496
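To check hashability without crashing, you can wrap `hash()` in a `try` block. `is_hashable` below is a hypothetical helper written for illustration, not a built-in:

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds (hypothetical helper for illustration)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

print(is_hashable((1, 2, (2, 3))))  # True: every element is hashable
print(is_hashable((1, 2, [2, 3])))  # False: the tuple contains a list
print(is_hashable([1, 2]))          # False: lists are unhashable
```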


To use a list as a key, it must first be converted into a tuple.

In [17]:
d = {}
d[tuple([1, 2, 3])] = 5
print(d)

{(1, 2, 3): 5}


##### Sets
1. Creation and Access

A set is an unordered collection of unique elements. Sets can be created with the `set()` function or with a set literal in curly braces {}. Note that empty braces {} create an empty dictionary, so use `set()` for an empty set.

In [1]:
s1 = set([2, 2, 2, 1, 3, 3])
s2 = {2, 2, 2, 1, 3, 3} # another way to create the same set
print(s1)
print(s2)

{1, 2, 3}
{1, 2, 3}


2. Set Operations

Sets in Python support mathematical set operations such as union (|), intersection (&), difference (-), and symmetric difference (^).

In [15]:
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6, 7, 8}

# Union
print("The union of these two sets is the set\nof distinct elements occurring in either set.")
print(a.union(b))
print(a | b)
print("")
# Intersection
print("The intersection of these two sets is\nthe set containing elements occurring in both sets.")
print(a.intersection(b))
print(a & b)
print("")
# Difference
print("The difference of these two sets is the\nset containing elements occurring in the first set but not the second.")
print(a.difference(b))
print(a - b)
print("")
# Symmetric difference
print("The symmetric difference of these two sets\nis the set containing elements occurring in either set but not both.")
print(a.symmetric_difference(b))
print(a ^ b)

The union of these two sets is the set
of distinct elements occurring in either set.
{1, 2, 3, 4, 5, 6, 7, 8}
{1, 2, 3, 4, 5, 6, 7, 8}

The intersection of these two sets is
the set containing elements occurring in both sets.
{3, 4, 5}
{3, 4, 5}

The difference of these two sets is the
set containing elements occurring in the first set but not the second.
{1, 2}
{1, 2}

The symmetric difference of these two sets
is the set containing elements occurring in either set but not both.
{1, 2, 6, 7, 8}
{1, 2, 6, 7, 8}


3. In-Place Operations

All the logical set operations have in-place counterparts which allow you to replace the contents of the set on the left side of the operation with the result. This can be more efficient for large sets.

In [3]:
c = a.copy()
c |= b
print(c)

d = a.copy()
d &= b
print(d)

{1, 2, 3, 4, 5, 6, 7, 8}
{3, 4, 5}


4. Adding and Removing Elements

You can add elements to a set with the add() method and remove them with the remove() method, which raises a KeyError if the element is not present. If the element might be missing, use discard() instead, which does nothing in that case.

In [4]:
a.add(6)
print(a)

a.remove(6)
print(a)

{1, 2, 3, 4, 5, 6}
{1, 2, 3, 4, 5}
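A short sketch contrasting `discard()` and `remove()` for an element that is not in the set (the set `s` is illustrative):

```python
s = {1, 2, 3}

s.discard(10)  # no error: 10 is not in the set, so nothing happens
print(s)       # {1, 2, 3}

try:
    s.remove(10)  # raises KeyError because 10 is missing
except KeyError:
    print("remove() raised KeyError")
```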


5. Hashability and Element Types

Similar to dictionary keys, set elements must be hashable. Immutable objects like strings, integers, tuples, and frozensets can be elements of a set. Lists and dictionaries cannot be elements of a set due to their mutability.

In [5]:
my_data = [1, 2, 3, 4]
my_set = {tuple(my_data)}
print(my_set)

{(1, 2, 3, 4)}


6. Set Comparisons

You can compare sets using methods such as `issubset()`, `issuperset()`, and `isdisjoint()`. Two sets are considered equal if their contents are equal, regardless of order.

In [7]:
a_set = {1, 2, 3, 4, 5}
print({1, 2, 3}.issubset(a_set))
print(a_set.issuperset({1, 2, 3}))
print({1, 2, 3} == {3, 2, 1})
print(a_set.isdisjoint({6, 7, 8}))  

True
True
True
True


#### Built-In Sequence Functions

1. `enumerate()`

The `enumerate()` function iterates over a sequence while keeping track of position: it yields `(index, value)` pairs, so you don't need to maintain a counter variable yourself.

In [16]:
collection = ["a", "b", "c"]
for i, value in enumerate(collection):
    print(f'Index {i} has value {value}')

Index 0 has value a
Index 1 has value b
Index 2 has value c


2. `sorted()`

The `sorted()` function returns a new sorted list from the elements of any iterable, leaving the original unchanged. It accepts the same `key` and `reverse` arguments as the list `sort()` method.

In [17]:
print(sorted([7, 1, 2, 6, 0, 3, 2]))
print(sorted("horse race"))

[0, 1, 2, 2, 3, 6, 7]
[' ', 'a', 'c', 'e', 'e', 'h', 'o', 'r', 'r', 's']
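A sketch of `sorted()` with the `key` and `reverse` arguments, also confirming that the input list is left untouched (the word list is reused from the earlier sorting example):

```python
original = ["saw", "small", "He", "foxes", "six"]

# Longest first; ties keep their original relative order because the sort is stable
by_length_desc = sorted(original, key=len, reverse=True)
print(by_length_desc)  # ['small', 'foxes', 'saw', 'six', 'He']
print(original)        # ['saw', 'small', 'He', 'foxes', 'six']
```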


3. `zip()`

The `zip()` function pairs up the elements of several iterables (lists, tuples, etc.). It returns an iterator of tuples in which the i-th tuple contains the i-th element from each argument, which is why the examples wrap it in `list()`. The number of tuples produced is determined by the shortest input iterable.

In [18]:
seq1 = ["foo", "bar", "baz"]
seq2 = ["one", "two", "three"]
seq3 = [False, True]

print(list(zip(seq1, seq2)))
print(list(zip(seq1, seq2, seq3)))

[('foo', 'one'), ('bar', 'two'), ('baz', 'three')]
[('foo', 'one', False), ('bar', 'two', True)]
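Because `zip()` returns a lazy iterator, its result can be consumed only once. A minimal sketch with illustrative inputs:

```python
pairs = zip(["foo", "bar"], ["one", "two"])
print(pairs)        # a zip object, not a list

print(list(pairs))  # [('foo', 'one'), ('bar', 'two')]
print(list(pairs))  # [] because the iterator is already exhausted
```

If you need the pairs more than once, store `list(zip(...))` in a variable instead.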


4. `reversed()`

The `reversed()` function returns an iterator that yields the elements of a sequence in reverse order. It doesn't modify the original sequence, and it doesn't build a new one either: the elements are produced lazily as you iterate.

In [20]:
print(list(reversed(range(10))))

[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]


Note: Since reversed() returns an iterator, you need to use a function like list() to see the contents immediately.

5. Combination of Functions

These functions can often be used together for more complex operations. For example, `zip()` can be combined with `enumerate()` to iterate over multiple sequences at once while still keeping track of the index.

In [21]:
for i, (a, b) in enumerate(zip(seq1, seq2)):
    print(f'{i}: {a}, {b}')

0: foo, one
1: bar, two
2: baz, three


#### List, Set, and Dictionary Comprehensions

1. List comprehensions:

List comprehensions are a concise way to build a new list from an iterable in a single expression, optionally filtering the elements and transforming each one. The basic form is `[expr for value in collection if condition]`.

Here's an example that keeps only the strings longer than two characters and converts them to uppercase.

In [22]:
strings = ["a", "as", "bat", "car", "dove", "python"]
result = [x.upper() for x in strings if len(x) > 2]
print(result)

['BAT', 'CAR', 'DOVE', 'PYTHON']
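For comparison, here is the same computation written as an explicit loop; the comprehension above is shorthand for this pattern:

```python
strings = ["a", "as", "bat", "car", "dove", "python"]

result_loop = []
for x in strings:
    if len(x) > 2:                     # the filter condition
        result_loop.append(x.upper())  # the transformation

print(result_loop)  # ['BAT', 'CAR', 'DOVE', 'PYTHON']
```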


2. Set comprehensions:

These are similar to list comprehensions, but they create a set instead of a list. Sets, by nature, contain unique elements.

Here's an example where we're finding the unique lengths of the strings.

In [23]:
unique_lengths = {len(x) for x in strings}
print(unique_lengths)

{1, 2, 3, 4, 6}


3. Dictionary comprehensions:

These are similar to list and set comprehensions, but they create a dictionary instead. Each key-value pair is generated in a loop.

Here's an example where we're creating a lookup map of the strings for their locations in the list.

In [24]:
loc_mapping = {val: index for index, val in enumerate(strings)}
print(loc_mapping)

{'a': 0, 'as': 1, 'bat': 2, 'car': 3, 'dove': 4, 'python': 5}


4. Nested list comprehensions:

A list comprehension can contain another list comprehension in it. This is useful when dealing with lists of lists.

For instance, suppose we have a list of lists of names and want a single list containing every name with two or more lowercase 'a's. A single nested list comprehension handles this, with its `for` clauses written in the same order as the equivalent nested `for` loops:

In [26]:
all_data = [["John", "Emily", "Michael", "Mary", "Steven"],
            ["Maria", "Juan", "Javier", "Natalia", "Pilar"]]

result = [name for names in all_data for name in names if name.count("a") >= 2]
print(result) 

['Maria', 'Natalia']


Or another example where we "flatten" a list of tuples into a list:

In [27]:
some_tuples = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
flattened = [x for tup in some_tuples for x in tup]
print(flattened) 

[1, 2, 3, 4, 5, 6, 7, 8, 9]


5. List comprehension inside a list comprehension:

In this case, instead of flattening the data, we maintain the same level of nesting as the original data.

Here's an example:

In [28]:
nested_list = [[x for x in tup] for tup in some_tuples]
print(nested_list)

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]


#### Functions

A function in Python is defined with the `def` keyword, followed by the function name, a parenthesized list of parameters, and a colon. The indented block that follows is the function body.

Here's a simple Python function that takes two arguments and returns their sum:

In [29]:
def my_function(x, y):
    return x + y

result = my_function(1, 2)
print(result)

3


If a function doesn't explicitly return a value using the return statement, it will return None by default:

In [31]:
def function_without_return(x):
    print(x)

result = function_without_return("hello!")
print(result)

hello!
None


A function can take positional arguments and keyword arguments. Parameters without a default value must always be supplied, either by position or by name. Parameters with a default value in the function definition, such as `z=1.5` below, are optional: if you leave them out, the default is used. Keyword arguments can be given in any order, but in a call they must come after any positional arguments:

In [32]:
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

print(my_function2(5, 6, z=0.7)) 
print(my_function2(3.14, 7, 3.5))
print(my_function2(10, 20))

0.06363636363636363
35.49
45.0


1. Namespaces and Scope

A <b>namespace</b> is essentially a container where names are mapped to objects. These names can be variables, functions, and other objects. Namespaces are essential in Python to avoid naming conflicts.

There are various namespaces like:

* Local namespace: Specific to a function or a method. Created when a function is called and cleared once the function exits.

* Global namespace: Specific to a module. Created when a module is imported.

* Built-in namespace: Exists as long as the Python interpreter is running. It contains the built-in functions and exceptions.

The <b>scope</b> of a variable determines where in the code that name is visible. Python looks a name up in the following order of namespaces: local → enclosing → global → built-in. This lookup order is known as the LEGB rule.

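In [37]:
# Here `a` is local: it is created on each call and discarded when func() returns
def func():
    a = []
    for i in range(5):
        a.append(i)

# Here `a` is never assigned inside func(), so the name resolves to the global list
a = []
def func():
    for i in range(5):
        a.append(i)

func()
print(a)

[0, 1, 2, 3, 4]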

<b>Global and nonlocal Keywords</b>

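The `global` and `nonlocal` keywords let a function assign to variables in an outer scope. `global` binds a name at module level, while `nonlocal` binds a name in the nearest enclosing function scope, which is useful in nested functions. Without either keyword, assigning to a name inside a function always creates a local variable.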

In [39]:
a = None
def bind_a_variable():
    global a
    a = []

bind_a_variable()
print(a)

[]
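The example above covers `global`; a minimal sketch of `nonlocal` in a nested function follows. `make_counter` and `increment` are hypothetical names chosen for illustration:

```python
def make_counter():
    count = 0  # lives in the enclosing function's scope

    def increment():
        nonlocal count  # rebind the enclosing `count` instead of creating a local
        count += 1
        return count

    return increment

counter = make_counter()
counter()
print(counter())  # 2
```

Without the `nonlocal` line, `count += 1` would raise `UnboundLocalError`, because the assignment makes `count` local to `increment`.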


2. Returning Multiple Values

A function can return multiple values: Python packs them into a single tuple, which the caller can unpack into separate variables.

In [40]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

a, b, c = f()

3. Functions are objects

Python treats functions as objects, allowing for powerful constructs not as easily achieved in other languages. Consider a scenario involving messy user-submitted survey data represented as a list of strings:

In [41]:
states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]

<b>Cleaning Strings Using Functions</b>

Cleaning this data involves tasks like:
* Removing leading and trailing whitespace.
* Eliminating punctuation symbols.
* Standardizing capitalization.

A common approach utilizes built-in string methods and the re module for regular expressions:

In [43]:
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub("[!#?]", "", value)
        value = value.title()
        result.append(value)
    return result

print(clean_strings(states))

['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South   Carolina', 'West Virginia']


<b>Cleaning Using a List of Operations</b>

An alternative strategy revolves around creating a list of transformation operations:

In [44]:
def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings2(strings, ops):
    result = []
    for value in strings:
        for function in ops:
            value = function(value)
        result.append(value)
    return result

print(clean_strings2(states, clean_ops))

['Alabama', 'Georgia', 'Georgia', 'Georgia', 'Florida', 'South   Carolina', 'West Virginia']


This approach separates the transformations from the core function, enhancing reusability and generality.

<b>Applying Functions as Arguments</b>

Functions can be passed as arguments to other functions, exemplified by the `map` function. `map` applies a specified function to each element of a sequence:

In [45]:
for x in map(remove_punctuation, states):
    print(x)

   Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia


In Python 3, `map` returns a lazy iterator, and it can serve as an alternative to a list comprehension when no filtering is needed.

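4. Anonymous (Lambda) Functions

Lambda functions are anonymous functions written as a single expression with the `lambda` keyword; the value of that expression is the return value. They are handy for short data transformations and for passing as arguments to other functions.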

In [47]:
def short_function(x):
    return x * 2

equiv_anon = lambda x: x * 2

print(short_function(2))
print(equiv_anon(2))  # same result as the named function

4
4


In [48]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

ints = [4, 0, 1, 5, 6]

print(apply_to_list(ints, lambda x: x * 2))

[8, 0, 2, 10, 12]


5. Generators

Many Python objects support iteration through the iterator protocol, a generic way to make objects iterable. For instance, iterating over a dictionary yields its keys:

In [52]:
some_dict = {"a": 1, "b": 2, "c": 3}

for key in some_dict:
    print(key)

a
b
c


The Python interpreter creates an iterator from `some_dict` when you use a `for` loop:

In [51]:
dict_iterator = iter(some_dict)
dict_iterator

<dict_keyiterator at 0x111856ef0>
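To see the protocol at work, you can call `next()` on the iterator yourself, which is roughly what a `for` loop does behind the scenes. A minimal sketch:

```python
some_dict = {"a": 1, "b": 2, "c": 3}
it = iter(some_dict)

print(next(it))  # a
print(next(it))  # b
print(next(it))  # c

# Once exhausted, next() raises StopIteration; a for loop handles this for you
try:
    next(it)
except StopIteration:
    print("iterator exhausted")
```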

An iterator is any object that yields objects to the Python interpreter when used in a context like a `for` loop. Most functions that expect a list, such as `min`, `max`, `sum`, `list`, and `tuple`, also accept any iterable object.

A generator is a convenient way to construct a new iterable object. Whereas a normal function executes and returns a single result, a generator returns a sequence of values lazily, pausing after each one until the next is requested. To create a generator, use the `yield` keyword instead of `return` in a function:

In [55]:
def squares(n=10):
    print("Generating squares from 1 to {0}".format(n ** 2))
    for i in range(1, n + 1):
        yield i ** 2

gen = squares()
print(gen)

# Generators don't execute immediately upon creation; they run only when elements are requested.
# Request elements using a for loop:

for x in gen:
    print(x, end=" ")

<generator object squares at 0x111803dd0>
Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

<b>Generator Expressions</b>

Generator expressions provide a way to create generators using syntax similar to list comprehensions. Enclose a comprehension in parentheses to create a generator expression:

In [57]:
gen = (x ** 2 for x in range(100))
print(gen)

for x in gen:
    print(x, end=" ")

<generator object <genexpr> at 0x10c116740>
0 1 4 9 16 25 36 49 64 81 100 121 144 169 196 225 256 289 324 361 400 441 484 529 576 625 676 729 784 841 900 961 1024 1089 1156 1225 1296 1369 1444 1521 1600 1681 1764 1849 1936 2025 2116 2209 2304 2401 2500 2601 2704 2809 2916 3025 3136 3249 3364 3481 3600 3721 3844 3969 4096 4225 4356 4489 4624 4761 4900 5041 5184 5329 5476 5625 5776 5929 6084 6241 6400 6561 6724 6889 7056 7225 7396 7569 7744 7921 8100 8281 8464 8649 8836 9025 9216 9409 9604 9801 

Generator expressions can be passed directly as function arguments in place of list comprehensions. Because they produce values one at a time instead of building an intermediate list, they can save memory when there are many elements.
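A short sketch of passing generator expressions straight into functions that accept any iterable:

```python
# sum() consumes the generator expression; no intermediate list is built
total = sum(x ** 2 for x in range(100))
print(total)  # 328350

# dict() also accepts a generator of key-value pairs
squares_map = dict((i, i ** 2) for i in range(5))
print(squares_map)  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
```

When the generator expression is the only argument, the extra parentheses can be dropped, as in the `sum` call.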

<b>`itertools` Module</b>

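The `itertools` module in the standard library provides a collection of generators for many common data algorithms. For example, `groupby` takes any sequence and a key function, and groups consecutive elements that produce the same key. Only consecutive elements are grouped, which is why "A" appears twice in the output below: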

In [58]:
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

# `group` is an iterator over the consecutive names that share this first letter
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group))

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
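If you want one group per key regardless of position, a common approach is to sort by the same key function first so equal keys become adjacent. A sketch using the same names:

```python
import itertools

def first_letter(x):
    return x[0]

names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

# Sorting with the grouping key makes all names with the same first letter adjacent
grouped = {
    letter: list(group)
    for letter, group in itertools.groupby(sorted(names, key=first_letter), first_letter)
}
print(grouped)  # {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```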


<b>Errors and Exception Handling</b>

* Handling Python errors or exceptions gracefully is crucial for building robust programs.
* In data analysis applications, certain functions only work with specific input types.
* As an example, Python's float function can convert a string to a float, but it raises a `ValueError` on invalid input.

<b>Handling Exceptions with try and except</b>

* You can handle exceptions using the try and except blocks.
* To create a version of float that fails gracefully, enclose it in a try/except block.
* Code in the except block is executed when an exception occurs inside the try block.
* The except part can be refined to catch specific exceptions (e.g., `ValueError`, `TypeError`).
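A minimal sketch of a `float` conversion that fails gracefully. The helper name `attempt_float` is our own choice for illustration; it returns the input unchanged when conversion fails:

```python
def attempt_float(x):
    """Convert x to float, returning x unchanged if it is not a valid number string."""
    try:
        return float(x)
    except ValueError:
        # Only reached when float() raised ValueError inside the try block
        return x

print(attempt_float("1.2345"))     # 1.2345
print(attempt_float("something"))  # something
```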

<b>Handling Specific Exceptions</b>

* `float` can raise exceptions other than `ValueError`; for example, passing it a tuple raises a `TypeError`.
* You can name the specific exception type you want to handle after `except`.
* This lets you suppress only the errors you expect, while other exceptions, which may signal genuine bugs, still propagate.
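The sketch below (again using an illustrative helper named `attempt_float`) handles only `ValueError`, so a `TypeError` raised for a tuple input is not swallowed and reaches the caller:

```python
def attempt_float(x):
    try:
        return float(x)
    except ValueError:  # only ValueError is handled here
        return x

print(attempt_float("not a number"))  # not a number

# A tuple makes float() raise TypeError, which the handler above does not catch
try:
    attempt_float((1, 2))
except TypeError as exc:
    print("TypeError propagated:", exc)
```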

<b>Handling Multiple Exceptions</b>
* You can handle multiple exception types by specifying a tuple of exception types in `except`.
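Listing both exception types in a tuple handles either one; a sketch with the same illustrative `attempt_float` helper:

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):  # either exception type is handled
        return x

print(attempt_float("1.5"))   # 1.5
print(attempt_float("abc"))   # abc
print(attempt_float((1, 2)))  # (1, 2)
```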

<b>Using finally and else</b>
* The finally block executes code regardless of whether an exception was raised.
* For instance, closing a file using finally ensures it's closed, even if an error occurs.
* You can use the else block to execute code only if the try block succeeds.
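A sketch of `else` and `finally` together. To stay self-contained it writes to a throwaway file created with `tempfile.mkstemp`, an assumption of this example rather than part of the notes:

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on any existing path
fd, tmp_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

handle = open(tmp_path, mode="w", encoding="utf-8")
try:
    handle.write("some text\n")
except OSError:
    print("Failed")
else:
    print("Succeeded")  # runs only if the try block raised nothing
finally:
    handle.close()      # runs whether or not an exception occurred

print(handle.closed)  # True
os.remove(tmp_path)
```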


#### Files and the Operating System

To open a file for reading or writing, use the built-in `open` function with either a relative or absolute file path and an optional file encoding:

In [66]:
path = "../data/segismundo.txt"
f = open(path, encoding="utf-8")

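By default, the file is opened in read-only mode "r". You can then iterate over the file object line by line. Each line keeps its end-of-line marker, and `print` adds another newline, which is why the output below is double-spaced: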

In [67]:
for line in f:
    print(line)

f.close()

Sueña el rico en su riqueza,

que más cuidados le ofrece;



sueña el pobre que padece

su miseria y su pobreza;



sueña el que a medrar empieza,

sueña el que afana y pretende,

sueña el que agravia y ofende,



y en el mundo, en conclusión,

todos sueñan lo que son,

aunque ninguno lo entiende.





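Opening the file in a `with` statement closes it automatically when the block exits, even if an exception is raised. Here `rstrip()` also removes the end-of-line markers: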

In [68]:
with open(path, encoding="utf-8") as f:
    lines = [x.rstrip() for x in f]

Use the read method to read a certain number of characters from the file:

In [69]:
f1 = open(path, encoding="utf-8")
data = f1.read(10)
print(data)

Sueña el r


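The `tell` method gives the current position. Although only 10 characters were read, the position is 11, because the character "ñ" takes two bytes in UTF-8: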

In [70]:
position = f1.tell()
print(position)

11


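To write text to a file, use the file's `write` or `writelines` methods. The cell below copies every line of the source file that is longer than a bare newline into tmp.txt:

In [71]:
with open(path, encoding="utf-8") as src, open("tmp.txt", mode="w", encoding="utf-8") as handle:
    # writelines accepts any iterable of strings, here a generator expression
    handle.writelines(x for x in src if len(x) > 1)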

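Python file modes include "r" (read only, the default), "w" (write only; creates a new file or truncates an existing one), "x" (write only; creates a new file and fails if it already exists), and "a" (append to the end of the file, creating it if necessary). Add "b" for binary mode or "t" for text mode (the default), as in "rb" or "wt".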
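A sketch contrasting the "x" and "a" modes. It works inside a temporary directory from `tempfile.TemporaryDirectory`, and the file name `example.txt` is hypothetical:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "example.txt")  # hypothetical file name

    # "x" creates the file, and refuses to open it if it already exists
    with open(target, mode="x", encoding="utf-8") as handle:
        handle.write("first line\n")

    try:
        open(target, mode="x", encoding="utf-8")
    except FileExistsError:
        print("second open in 'x' mode raised FileExistsError")

    # "a" appends to the end instead of truncating the file
    with open(target, mode="a", encoding="utf-8") as handle:
        handle.write("second line\n")

    with open(target, encoding="utf-8") as handle:
        contents = handle.read()

print(contents)
```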

In text mode, files are read as Unicode strings. In binary mode ("rb"), files are read as raw bytes:

In [72]:
with open(path, encoding="utf-8") as f:
    chars = f.read(10)  # Read 10 characters

with open(path, mode="rb") as f:
    data = f.read(10)  # Read 10 bytes

When decoding bytes to strings, make sure the bytes contain only complete encoded characters. Here the 10 bytes happen to end on a character boundary, so they decode cleanly:

In [73]:
decoded_chars = data.decode("utf-8")  # Decode bytes to string
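The sketch below uses an in-memory string instead of the file to show what happens when a byte slice cuts a multibyte character in half:

```python
encoded = "Sueña".encode("utf-8")
print(encoded)  # b'Sue\xc3\xb1a'

# Both bytes of "ñ" are included, so this slice decodes cleanly
print(encoded[:5].decode("utf-8"))  # Sueñ

# This slice keeps only the first byte of "ñ", so decoding fails
try:
    encoded[:4].decode("utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)
```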

Finally, close `f1`, which was opened without a `with` statement:

In [74]:
f1.close()