# Chapter 3: Built-in Data Structures, Functions, and Files
This chapter of the book discusses Python's built-in features for data manipulation, which are essential for effective data analysis, even when using powerful libraries like pandas and NumPy. The chapter focuses on *data structures, functions, and file objects.*

# Data Structure

## Tuple
### Coding Question:
* Create a tuple named coordinates containing the latitude and longitude of a location of your choice (e.g., 34.0522, -118.2437). Then, write code to access and print only the latitude value from the tuple.

### Conceptual Questions:
* Why are tuples preferred over lists when you need to ensure that data remains unchanged?
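  * A tuple does not support item assignment, so its contents cannot be changed by accident, whereas a list lets any code that holds it change its items.
* Explain the significance of immutability in the context of data integrity and program safety.
  * Immutability guarantees that a tuple still refers to the same items it was created with. Values such as coordinates can therefore be passed around the program, or used as dictionary keys, without the risk that another part of the code alters them.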


In [3]:
example_tuple = (34.0522, -118.2437)  # latitude/longitude
latitude = example_tuple[0]
# print(latitude)

# example_tuple[0] = 123  #TypeError: 'tuple' object does not support item assignment

## List

### Coding Questions:
* Create a list named colors with at least three different color names. Add a new color to the end of the list. Then, remove the second color from the list.
* Write a Python program that takes a list of numbers as input and returns a new list containing only the even numbers from the input list.
### Conceptual Questions:
* How does the mutability of lists make them suitable for scenarios requiring data modification and updates?
  * Mutability allows items to be added, removed, or reassigned in place, so lists suit data that is expected to change after creation.
* What are the advantages and disadvantages of using lists compared to tuples?
  * Pros: If the data was created or input incorrectly, it can be corrected in place
  * Cons: If a value is updated accidentally, the change goes through without any protection


In [6]:
colors = ["red", "green", "blue"]
colors.append("white")  # add a new color to the end of the list
removed_color = colors.pop(1)  # remove the second color from the list
# print(colors)



In [8]:
def filter_even_numbers(numbers):
    even_numbers = [num for num in numbers if num % 2 == 0]
    return even_numbers

input_list = [int(x) for x in input("Enter number separated by spaces: ").split()]  # allow users to input numbers and separate the input into a list
result = filter_even_numbers(input_list)
print(result)

Enter number separated by spaces:  3 5 7 9 2 4 10000 0


[2, 4, 10000, 0]


## Built-in Sequence Functions
### Coding Questions:
* Use the enumerate function to print the index and value of each item in a list of fruits.
* Given two lists of the same length, use the zip function to create a new list of tuples, where each tuple contains the corresponding elements from the two input lists.
### Conceptual Question:
* Explain how using built-in sequence functions like enumerate, sorted, zip, and reversed can improve code readability and efficiency.
  * The *enumerate() function adds a counter to an iterable* and returns it as an *enumerate* object. It is useful when both the index and the value are needed in a loop, and it avoids maintaining a manual counter variable.
  * The *zip() function* combines multiple iterables into tuples, creating a new iterator in which corresponding elements from the input iterables are paired together. The output stops at the shortest iterable.

#### Note: An iterable is any object that can be looped over (e.g. list, tuple, string, dict, set, etc.)

In [9]:
fruits = ["apple", "banana", "orange"]

for index, fruit in enumerate(fruits):
    print(f"Index: {index}: {fruit}")

Index: 0: apple
Index: 1: banana
Index: 2: orange


In [10]:
# Attention! The output from zip stops at the 3rd element because one of the iterables is shorter than the others
names = ["Rex", "Chou", "Dita", "Vigante"]
nationalities = ["TW", "TW", "LV"]
ages = [24, 21, 21, 19]

# plural list names keep the loop variables from overwriting the lists
for name, nationality, age in zip(names, nationalities, ages):
    print(f"{name}, {nationality}, {age}")

Rex, TW, 24
Chou, TW, 21
Dita, LV, 21
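
The conceptual question above also names `sorted` and `reversed`, which the cells above do not exercise. A minimal sketch of both, using a made-up list of fruit names (all variable names here are illustrative only):

```python
fruits = ["banana", "kiwi", "apple"]

# sorted() returns a new list and leaves the original untouched
alphabetical = sorted(fruits)
by_length = sorted(fruits, key=len)  # key= controls the sort criterion

# reversed() returns a lazy iterator, so wrap it in list() to materialize it
backwards = list(reversed(fruits))

print(alphabetical)
print(by_length)
print(backwards)
print(fruits)  # still in its original order
```

Both functions work on any sequence and avoid manual index arithmetic, which is where most of the readability gain comes from.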


## dict

### Introduction
* What is a dictionary?
  * (also known as a hash map or associative array) It is a flexible data structure in Python that stores a collection of key-value pairs. Keys must be hashable, which in practice means immutable objects like strings, numbers, or tuples with immutable elements; values can be any Python object.
 
* What are the pros?
  * Fast lookups and flexible key-value storage, which makes it efficient for retrieving and manipulating data by unique identifier
 
* When do you use it?
  * Dictionaries are very common in industry. A software company can use a dict to store a user profile, an e-commerce site can use one to store product information, and a language learning app can use one to store the translation of a sentence.
    
### Coding Questions:
* Create a dictionary named student that stores the name, age, and grade of a student. Print the student's age from the dictionary.
* Write a function that takes a dictionary as input and returns a new dictionary with the keys and values swapped.
  * ValueError: too many values to unpack (expected 2). The error occurred because I tried to unpack directly from the dict with *for key, value in student_dict*, but iterating over a dictionary yields only its keys by default. To iterate over key-value pairs, use the dictionary's *.items()* method.
  * Note: *student_dict.keys()* iterates over keys; *student_dict.values()* iterates over values
### Conceptual Questions:
* Why are dictionaries useful for representing data with key-value relationships, such as in configuration settings, database records, or user profiles?
  * Dictionaries are advantageous for representing data with key-value relationships *because they enable swift and efficient retrieval of values based on their unique keys*. This makes them well-suited for scenarios like configuration settings where a specific setting (value) can be accessed using its corresponding name (key), or database records where data can be retrieved using a primary key.
* What data types are permissible as keys in a dictionary?
  * Keys in Python dictionaries must be *hashable*, which for built-in types means *immutable objects*: the key's value cannot change after it is created. So keys can be scalar types (int, float, string) or a tuple whose elements are all hashable themselves.
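
To make the key rule concrete, here is a small sketch of what is accepted and what is rejected as a key. The dictionary contents and variable names are made up for illustration, and the exact wording of the error message differs between Python versions:

```python
# Hashable objects (str, int, float, tuples of hashables) work as keys
city_coordinates = {("Los Angeles", "US"): (34.0522, -118.2437)}
print(city_coordinates[("Los Angeles", "US")])

# A list is mutable and therefore unhashable, so it is rejected as a key
try:
    bad_dict = {["Los Angeles", "US"]: (34.0522, -118.2437)}
except TypeError as error:
    key_error_message = str(error)
    print(f"TypeError: {key_error_message}")

# A tuple that contains a list is also unhashable
try:
    hash(("Los Angeles", ["US"]))
    tuple_with_list_hashable = True
except TypeError:
    tuple_with_list_hashable = False
print(tuple_with_list_hashable)
```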

In [11]:
def dict_value_swap(student_dict):
    new_dict = {}
    for key, value in student_dict.items():
        new_dict[value] = key
    return new_dict

In [12]:
student = {"name": "Rex",
           "age": 24,
           "grade": 8.5}
student_age = student['age']

In [13]:
inverse_student_dict = dict_value_swap(student)
print(inverse_student_dict)

{'Rex': 'name', 24: 'age', 8.5: 'grade'}
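
One caveat with swapping keys and values is that the values must be hashable, and duplicate values collapse into a single key. The sketch below shows this with a dict comprehension as an alternative to the loop above; `swap_keys_values` and `grades` are hypothetical names introduced only for this example:

```python
def swap_keys_values(source):
    # Equivalent to the loop version, written as a dict comprehension
    return {value: key for key, value in source.items()}

student = {"name": "Rex", "age": 24, "grade": 8.5}
print(swap_keys_values(student))

# Two students share the grade 8.5, so the later key overwrites the earlier one
grades = {"Rex": 8.5, "Dita": 8.5, "Chou": 9.0}
swapped = swap_keys_values(grades)
print(swapped)
```

If losing entries is not acceptable, one option is to map each value to a list of keys instead of a single key.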


## set

### Introduction
* unordered collection of *unique* elements, meaning it automatically removes duplicates

### Coding Questions:

* Create two sets, set1 and set2, containing some common and unique numbers. Use set operations to find:
  * The union of the two sets
  * The intersection of the two sets 
  * The elements in set1 but not in set2
  * The elements in either set but not in both
  * Check whether all elements of one set are included in another
* Write a program that takes a list of words as input and returns a set containing only the unique words. 

### Conceptual Question:
* When might you choose to use a set over a list or a dictionary?
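  * Use a set when only unique values matter and order does not: removing duplicates, fast membership tests, and set operations such as union and intersection.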
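
As a rough illustration of the two most common reasons to pick a set, the sketch below deduplicates a list and runs membership tests. The page names are invented for the example:

```python
visited_pages = ["home", "about", "home", "pricing", "about"]

# Deduplicate: a set keeps one copy of each value (order is not preserved)
unique_pages = set(visited_pages)
print(len(visited_pages), len(unique_pages))

# Membership tests on a set use hashing instead of scanning every element
has_pricing = "pricing" in unique_pages
has_careers = "careers" in unique_pages
print(has_pricing, has_careers)
```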

In [23]:
set_a = {1, 2, 3}
set_b = {3, 4, 5}

# union_set = set_a | set_b
union_set = set_a.union(set_b)

print("Set A:", set_a)
print("Set B:", set_b)
print("Union of A and B:", union_set)

# intersection_set = set_a & set_b
intersection_set = set_a.intersection(set_b)

print("Intersection of A and B:", intersection_set)

# difference_set = set_a - set_b
difference_set = set_a.difference(set_b)

print("Elements in set_a but not in set_b:", difference_set)

symmetric_difference = set_a ^ set_b
print("Elements in either of the sets, but not in both:", symmetric_difference)

is_subset = set_a <= set_b
print("Whether all elements in set_a are contained in set_b: ", is_subset)

Set A: {1, 2, 3}
Set B: {3, 4, 5}
Union of A and B: {1, 2, 3, 4, 5}
Intersection of A and B: {3}
Elements in set_a but not in set_b: {1, 2}
Elements in either of the sets, but not in both: {1, 2, 4, 5}
Whether all elements in set_a are contained in set_b:  False


In [28]:
def set_convertor(ls:list):
    return set(ls)

example_list = ["Hi", "Hi", "Hi I am", "Hi I am", "Hi I AM"]
set_convertor(example_list)

{'Hi', 'Hi I AM', 'Hi I am'}

## List, Set, and Dict Comprehensions
List, set, and dict comprehensions are concise ways to create lists, sets, or dictionaries in Python using a single expression, usually a for loop with optional conditional logic. They enhance code readability and reduce the need for boilerplate loops, making them ideal for compact and expressive transformations.

The structure of the list comprehension (source: https://www.freecodecamp.org/news/list-comprehension-in-python/) 
![image.png](attachment:6d304891-5bd0-4c7d-8eb1-db6788412eeb.png)

### Coding Questions:

* Use a list comprehension to create a list of the squares of all even numbers from 1 to 10. 
* Use a dictionary comprehension to create a dictionary that maps the numbers from 1 to 5 to their corresponding Roman numeral representations (e.g., 1: "I", 2: "II", etc.).
  * How to create a dictionary through a dict comprehension? *{key: value for item in iterable}*

### Conceptual Question:

* How do comprehensions enhance code conciseness and expressiveness compared to traditional loop-based approaches for creating lists, sets, and dictionaries?
  * Pros
      * Conciseness: It simplifies the code by reducing multiple lines of loops into a single, readable line
      * Performance: It is slightly faster than traditional loops in many cases
      * Readability: Provide a clean way to describe transformations or filtering logic
  * Cons
    * Complexity for Beginners
    * Limited Debugging: Harder to debug compared to traditional loop because of lack of intermediate steps

In [34]:
squared_even_number = [i ** 2 for i in range(1, 11) if i % 2 == 0]
print(squared_even_number)

[4, 16, 36, 64, 100]


In [36]:
roman_numerals = ["I", "II", "III", "IV", "V"]
num_to_roman = {i: roman_numerals[i - 1] for i in range(1, 6)}
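
The section title also covers set comprehensions, which the cells above do not show. A minimal sketch, assuming a made-up word list, that puts a set comprehension next to its loop equivalent and adds a dict comprehension with a condition:

```python
words = ["Apple", "banana", "apple", "Cherry", "BANANA"]

# Set comprehension: braces with a single expression
unique_lowercase = {word.lower() for word in words}

# Equivalent explicit loop, for comparison
unique_lowercase_loop = set()
for word in words:
    unique_lowercase_loop.add(word.lower())

print(unique_lowercase == unique_lowercase_loop)

# Dict comprehension with a condition: length of each word longer than 5 letters
long_word_lengths = {word: len(word) for word in unique_lowercase if len(word) > 5}
print(long_word_lengths)
```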

# Functions

## Namespaces, Scope, and Local Functions

### Introduction
A nested function in Python is *a function defined inside another function*. It can access variables in the containing (enclosing) function's scope, and it can be used to hide implementation details or to group related helpers (e.g. in inventory management, add_item() and remove_item() are related and can be grouped together).

### Coding Question:

* Define a function called calculate_area that takes the length and width of a rectangle as arguments and returns its area. Within the function, define a nested function called is_square that checks if the rectangle is a square and prints a message accordingly.

### Conceptual Questions:

* Explain the concept of variable scope in Python, differentiating between local and global scope. 
* What are the potential benefits and drawbacks of using nested functions?
  * Pros
    * Encapsulation: Allows related functionality to be grouped together, improving code organization and readability. (E.g. calculating the area and checking whether the shape is a square are two related functions)
    * Scope Management: Nested functions can *access variables from the parent function*, providing a clean way to manage local data.
  * Cons
    * Reduced Flexibility: The nested functions are accessible only from within the enclosing function
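
The scope question above can be sketched in code. The example assumes it runs as a normal script or notebook cell, so that `counter` is a module-level (global) name; all function names are hypothetical:

```python
counter = 0  # global (module-level) name

def increment_local():
    counter = 100  # creates a new local name; the global is untouched
    return counter

def increment_global():
    global counter  # opt in to rebinding the module-level name
    counter += 1
    return counter

def make_counter():
    count = 0  # lives in the enclosing function's scope
    def next_value():
        nonlocal count  # rebind the enclosing variable, not a new local
        count += 1
        return count
    return next_value

print(increment_local(), counter)
print(increment_global(), counter)
tick = make_counter()
print(tick(), tick())
```

Assignment inside a function creates a local name by default; `global` and `nonlocal` are the explicit ways to rebind a name from an outer scope.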

In [42]:
def calculate_area(length, width):
    def is_square():
        return length == width

    shape_type = "square" if is_square() else "rectangle"
    print(f"It is a {shape_type}.")
   
    return length * width

area = calculate_area(10, 5)
print(f"Its area is {area}.")

It is a rectangle.
Its area is 50.


## Returning Multiple Values
### Coding Question:

* Write a function that calculates the sum, difference, product, and quotient of two numbers and returns all four results. 

### Conceptual Question:

* How does the ability to return multiple values from a function improve code organization and readability?
  * Simplifies Function Interfaces: Allows a function to serve as a compact and clear interface for complex operations, reducing the need for multiple function calls or parameters
  * Enhances Code Readability: Related results are grouped together in one return value instead of being split across several variables or functions.

In [52]:
def math_operation(num1, num2):
    sum_result = num1 + num2
    difference_result = num1 - num2
    product_result = num1 * num2
    quotient_result = num1 / num2 if num2 != 0 else float("inf") # handle division by zero; positive infinity
    return (sum_result, difference_result, product_result, quotient_result)
results = math_operation(10, 0)
print(results)

(10, 10, 0, inf)
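
The returned tuple can also be unpacked directly at the call site, which is usually where the readability benefit shows up. A short sketch that restates `math_operation` in compact form so the block runs on its own:

```python
def math_operation(num1, num2):
    # Returning several values separated by commas builds a single tuple
    quotient = num1 / num2 if num2 != 0 else float("inf")
    return num1 + num2, num1 - num2, num1 * num2, quotient

# Unpack the tuple into four names at the call site
total, difference, product, quotient = math_operation(10, 4)
print(total, difference, product, quotient)

# Use _ for values you do not need
_, _, product_only, _ = math_operation(3, 5)
print(product_only)
```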


## Functions Are Objects (Still Quite Difficult to Understand)

### Introduction
* What does "functions are objects" mean?
  * Functions are first-class citizens. This means they can be assigned to variables, passed as arguments, or returned from other functions. This feature enhances flexibility and allows for more advanced programming patterns.
* What are the pros and cons?
  * Pros
      * increased flexibility: functions can be dynamically changed or composed, enabling more modular and reusable code.
      * higher abstraction: allows more advanced functional programming techniques like decorators and closures.
  * Cons
    * Complexity: Using functions as objects can make the code harder to read and understand for beginners.
    * Debugging can be more challenging: Tracking function behavior becomes more complex when they are passed around as objects.
* When do you use it?
  * when you need to create higher-order functions, apply decorators, or implement more complex control flows that benefit from the flexibility and abstraction they offer.
### Coding Question:

* Create a list of functions, each performing a different mathematical operation (e.g., addition, subtraction, multiplication). Use a loop to iterate through the list of functions, applying each function to a set of input numbers and printing the results.

### Conceptual Question:

* How does treating functions as first-class objects enhance the flexibility and power of Python as a programming language? 

In [19]:
# A function as an object

def greet(name):
    return f"Hello, {name}!"

greeting = greet
print(greeting("Rex"))

# A function as an argument

def apply_function(f, arg):
    return f(arg)
    
result = apply_function(greet, "Rex")
print(result)

# Return a function from another function
def make_adder(x):
    def adder(y):
        return x + y
    return adder

add_five = make_adder(5)  # add_five is the inner function adder, a closure that remembers x = 5
print(add_five(10))  # here 10 is an argument for y from adder()

Hello, Rex!
Hello, Rex!
15
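
The coding question in this section asks for a list of functions applied in a loop, which the cell above does not cover. One possible sketch, where `add` and `subtract` are hypothetical helpers and `operator.mul` is the standard-library function form of `*`:

```python
import operator

def add(a, b):
    return a + b

def subtract(a, b):
    return a - b

# Functions are ordinary objects, so they can be stored in a list
operations = [add, subtract, operator.mul]

results = {}
for operation in operations:
    # __name__ is the name the function was defined with
    results[operation.__name__] = operation(6, 3)
    print(f"{operation.__name__}(6, 3) = {results[operation.__name__]}")
```

Adding a new operation only requires appending another function to the list; the loop itself stays unchanged.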


## Anonymous (Lambda) Functions
### Introduction

* What is a lambda function?
  * A small, anonymous function in Python defined using the *lambda* keyword. It is useful for creating simple, one-line functions without needing a full *def* block. Lambda functions are often used when a function is needed *temporarily*, such as for sorting, filtering, or as an argument to higher-order functions.

* What is the benefit of using Lambda functions?
  * Pros
    * Concise and quick to write
    * Useful for short, one-off functions
  * Cons
    * Might reduce readability if overused
    * Limited to single expression, making complex operations hard to manage
* When will you use it?
  * simple operation or passing a quick function to another function (e.g. *map*, *filter*, *sorted*)

### Coding Question:

* Use a lambda function to square a number. Assign the lambda function to a variable and call it with an input value. 

### Conceptual Question:

* When might you choose to use a lambda function instead of a regular named function? 

In [1]:
# lambda function defined separately
square = lambda x: x ** 2

result = square(4)
print(result)

16


In [3]:
# lambda function passed inline as an argument
numbers = list(range(1, 11))

even_numbers = filter(lambda x: x % 2 == 0, numbers)  # a function, and an iterable

print(list(even_numbers))

[2, 4, 6, 8, 10]
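
Besides `filter`, the notes above mention `sorted` and `map` as typical places for a lambda. A brief sketch with made-up (name, age) pairs:

```python
students = [("Rex", 24), ("Chou", 21), ("Dita", 21), ("Vigante", 19)]

# key= receives each element and returns the value to sort by
by_age = sorted(students, key=lambda student: student[1])
print(by_age)

# map applies the lambda to every element; list() materializes the result
names_upper = list(map(lambda student: student[0].upper(), students))
print(names_upper)
```

When the same logic is needed in more than one place, a named `def` function is usually the clearer choice.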


## Currying: Partial Argument Application

### Introduction

* What is currying?
  * Strictly speaking, currying breaks a function with multiple arguments into a series of functions that each take a single argument. In everyday Python the term usually refers to partial argument application: fixing some arguments with a helper like *functools.partial*. Either way, it lets us create specialized versions of a general function by pre-filling some arguments.

* What are pros and cons?
  * Pros: Improves reusability and readability
  * Cons: Makes code harder to follow if overused

* When will you use it?
  * Use it when you want to reuse a function *with some arguments pre-filled*, such as in callback functions or functional programming tasks

### Coding Question:

* Use the partial function from the functools module to create a new function that calculates the area of a triangle, given its base and height. The new function should only require the height as input, with the base pre-set to a specific value.

### Conceptual Question:

* Explain how currying can be useful for creating specialized functions from more general ones.

In [4]:
from functools import partial

In [12]:
# An example

def power(base, exponent):
    return base ** exponent

# create a specialized version that always squares a number
square = partial(power, exponent=2)

result = square(5)
# print(result)

# Coding Question
def calculate_triangle_area(base, height):
    return base * height / 2

fixed_base_triangle = partial(calculate_triangle_area, base=5)
result = fixed_base_triangle(height=10)
print(result)

25.0
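
To show the difference between partial application and currying in the strict sense, here is a small side-by-side sketch. `triangle_area` and `curried_triangle_area` are hypothetical names used only in this example:

```python
from functools import partial

def triangle_area(base, height):
    return base * height / 2

# Partial application: fix base now, supply height later
area_with_base_5 = partial(triangle_area, 5)
print(area_with_base_5(10))

# Currying in the strict sense: a chain of one-argument functions
def curried_triangle_area(base):
    def with_height(height):
        return triangle_area(base, height)
    return with_height

print(curried_triangle_area(5)(10))

# partial objects expose what was pre-filled, which helps when debugging
print(area_with_base_5.func.__name__, area_with_base_5.args)
```

Both approaches specialize a general function; `partial` tends to be the more common choice in Python because it needs no extra nested definitions.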


## Generators

### Introduction

* What is the generator?
  * A special type of *iterable* that produces values on-the-fly using the *yield* keyword, instead of storing them all in memory at once. This makes generators memory-efficient, especially for large datasets, as they generate each value only when needed. You use generators when working with large data sequences or streams where memory optimization is crucial.
 
* What are the pros and cons?
  * Pros
      * Memory-efficient: Avoids storing large data structures in memory
      * Lazy Evaluation: Generates values only as required
      * Easier to implement than manual iterator classes
  * Cons
      * Once exhausted, a generator cannot be reused without recreating it
      * Debugging is harder due to the one-at-a-time value generation
* When do you use it?
  * For large data processing, streaming data, or whenever you need a series of values but don't want to store them all at once

### Coding Question:

* Write a generator function that yields the Fibonacci sequence up to a given limit. Use a loop to print the first 10 numbers generated by the generator.

Image (Source: https://mathmonks.com/fibonacci-sequence)
![image.png](attachment:c77c6fcf-8657-4779-9937-d2bfeadf109e.png)

### Conceptual Question:

* What are the advantages of using generators for working with large datasets or sequences, compared to traditional data structures like lists?
  * Memory Efficiency: No need to save the entire dataset in memory
  * Lazy Evaluation: Values are generated on demand, which means computation happens only when needed.
  * Stream Processing

In [15]:
# An example

def generate_squares(n):
    for i in range(1, n + 1):
        yield i ** 2

squares = generate_squares(5)
# for square in squares:
#     print(square)

# Coding Question (Original Solution)
def generate_fibonacci(n):
    a = 0
    b = 1
    c = 0
    for i in range(n):
        if i == 0:
            yield a
        elif i == 1:
            yield b
        else:
            c = a + b
            a = b
            b = c
            yield c

# Coding Question (2nd Solution) 
def generate_fibonacci(n):
    a, b = 0, 1
    for _ in range(n):
        yield a  # yield the current number first, and then update later
        a, b = b, a + b  # update a and b for the next Fibonacci number

fib_gen = generate_fibonacci(10)
for number in fib_gen:
    print(number)

0
1
1
2
3
5
8
13
21
34
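
The coding question asks for a sequence "up to a given limit", which can also be read as a limit on the values rather than on the count. A possible sketch under that reading, with `fibonacci_up_to` as a hypothetical helper and `itertools.islice` used to take the first 10 values:

```python
from itertools import islice

def fibonacci_up_to(limit):
    """Yield Fibonacci numbers that do not exceed limit."""
    a, b = 0, 1
    while a <= limit:
        yield a
        a, b = b, a + b

# islice takes the first 10 values without building the whole sequence
first_ten = list(islice(fibonacci_up_to(1000), 10))
print(first_ten)

# With a small limit the generator stops on its own before 10 values
small = list(fibonacci_up_to(20))
print(small)
```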


## Errors and Exception Handling

### Introduction
* What is it?
  * It refers to the process of managing runtime errors in a program. When an error occurs, it can interrupt the program’s execution, so exception handling uses mechanisms like *try*, *except*, and *finally* blocks to gracefully manage these errors, *allowing the program to continue running or recover from the error.*

* What are pros and cons?
  * Pros
      * Increased robustness: Programs become more resilient and can handle unexpected issues
      * Better user experience: Users are less likely to encounter crashes or abrupt terminations
  * Cons
      * Performance Overhead: Handling exceptions can introduce slight performance costs.

* When do you use it?
  * when dealing with *risky operations* (e.g., file I/O, network requests) or in situations where an error could cause a program to *crash*.

### Coding Question:

* Write a function that takes two numbers as input and performs division. Implement error handling to catch the ZeroDivisionError and print a custom error message. 

### Conceptual Question:

* Explain the importance of exception handling in writing robust and reliable Python programs.
  * Instead of letting the program crash, exception handling lets a function report an error message when it cannot do what it was intended to do. In this way, the program is more robust and reliable.

In [16]:
def divide(a, b):
    try:
        result = a / b  # Potentially risky operation
    except ZeroDivisionError:
        print("Error: Division by zero is undefined.")
        result = None
    # return after the try block: a return inside finally would swallow any other exception
    return result

print(divide(10, 2))
print(divide(10, 0))

5.0
Error: Division by zero is undefined.
None
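
The introduction mentions *try*, *except*, and *finally*; there is also an optional *else* clause. The sketch below combines all four, with `parse_and_divide` as a hypothetical helper that is not part of the original exercise:

```python
def parse_and_divide(numerator_text, denominator_text):
    """Parse two strings as numbers and divide them."""
    try:
        numerator = float(numerator_text)
        denominator = float(denominator_text)
        result = numerator / denominator
    except ValueError:
        # Raised by float() when the text is not a number
        print("Error: both inputs must be numbers.")
        return None
    except ZeroDivisionError:
        print("Error: Division by zero is undefined.")
        return None
    else:
        # Runs only when the try block raised nothing
        return result
    finally:
        # Runs in every case, just before the function actually returns
        print("Finished processing.")

print(parse_and_divide("10", "4"))
print(parse_and_divide("10", "0"))
print(parse_and_divide("ten", "4"))
```

Catching specific exception types, rather than a bare `except`, keeps unrelated bugs visible instead of silently hiding them.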
