## Chapter 1: Foundations for Efficiencies

As a data scientist, the majority of your time should be spent gleaning actionable insights from data. Whether you're cleaning and curating a messy dataset, deploying a machine learning model, or creating a sleek data visualization, the code you write should be a helpful tool to quickly get you where you need to go - not something that leaves you waiting around. In this course, you'll learn how to write cleaner, faster, and more efficient Python code. We'll explore how to time and profile your code in order to find potential bottlenecks. Then, you'll practice eliminating these bottlenecks, and other bad design patterns, using Python's Standard Library, NumPy, and pandas. After completing this course, you'll have everything you need to start writing elegant and efficient Python code! But first, let's explore what is meant by efficient Python code.

In this chapter, we will explore the following concepts:

    1.1 What is Efficient Code?
Efficiency in Python code for data scientists revolves around writing code that is not only fast but also follows Pythonic principles. We'll explore the concept of efficiency in data science and how it aligns with Pythonic practices.

    1.2 Building with Built-ins
Python's Standard Library offers a treasure trove of built-in functions and modules. Leveraging these can significantly enhance code efficiency. We'll delve into examples of how to use built-ins to streamline your code.

    1.3 The Power of NumPy Arrays
NumPy, a fundamental library for scientific computing, provides powerful array operations. We'll explore how using NumPy arrays can optimize numerical operations and enhance the overall performance of your code.

### 1.1 Zen of Python

PEP 20, also known as "The Zen of Python," is a collection of guiding principles for writing computer programs in the Python language. Created by Tim Peters, PEP 20 aims to capture the design philosophy of Python and provide developers with a set of guiding principles to follow. These principles are meant to emphasize the importance of code readability, simplicity, and explicitness.

In [5]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


Here is ChatGPT's explanation of each principle, with a bit more detail and context:

1. **Beautiful is better than ugly**: Emphasizes the importance of writing code that is aesthetically pleasing and easy to understand. Code should be crafted with care to enhance its visual appeal and maintainability.

2. **Explicit is better than implicit**: Encourages developers to be clear and explicit in their code rather than relying on hidden or implicit behaviors. This promotes transparency and reduces the chances of misunderstandings.

3. **Simple is better than complex**: Advocates for simplicity in code design. While complexity may be necessary, it should not be introduced unnecessarily. Simple code is often easier to understand, debug, and maintain.

4. **Complex is better than complicated**: Acknowledges that there are situations where complexity is unavoidable. However, it should be clear and straightforward rather than needlessly convoluted.

5. **Flat is better than nested**: Advises against excessive nesting in code. Flat structures are generally more readable and make it easier to follow the logic.

6. **Sparse is better than dense**: Recommends using whitespace and structural elements to make the code more readable. Avoid cramming too much information into a small space.

7. **Readability counts**: Reiterates the importance of code being easy to read. Given that code is read more often than it is written, prioritizing readability aids collaboration and maintenance.

8. **Special cases aren't special enough to break the rules**: Discourages the creation of exceptions or special cases that deviate significantly from established coding practices. Consistency is valuable.

9. **Although practicality beats purity**: Acknowledges that in the real world, practical solutions may be more important than strictly adhering to idealistic principles. Pragmatism is encouraged.

10. **Errors should never pass silently, unless explicitly silenced**: Promotes explicit error handling. Unhandled errors can lead to unexpected behavior, so it's crucial to either handle them appropriately or explicitly indicate when silence is intentional.

11. **In the face of ambiguity, refuse the temptation to guess**: Advocates for clarity when faced with uncertain situations. Avoid making assumptions and prioritize clear communication of intent.

12. **There should be one-- and preferably only one --obvious way to do it**: Encourages a singular, clear approach to problem-solving. Reducing ambiguity and standardizing solutions fosters consistency.

13. **Although that way may not be obvious at first unless you're Dutch**: A humorous nod to Guido van Rossum, the creator of Python. It acknowledges that the "obvious" way might not always be immediately apparent.

14. **Now is better than never**: Encourages progress and action. Avoid unnecessary delays and aim for incremental improvements rather than waiting for a perfect solution.

15. **Although never is often better than right now**: Recognizes the value of careful consideration and planning. Rushed decisions can lead to mistakes, and sometimes it's better to take the time to find a well-thought-out solution.

16. **If the implementation is hard to explain, it's a bad idea**: Emphasizes the importance of code that is easy to explain and understand. Code should be clear, reducing the need for convoluted explanations.

17. **If the implementation is easy to explain, it may be a good idea**: Contrasts with the previous point, suggesting that simplicity in explanation is a positive indicator of code quality.

18. **Namespaces are one honking great idea -- let's do more of those!**: Celebrates the concept of namespaces, which help organize and encapsulate code. Encourages the use of namespaces for clarity and avoiding naming conflicts.

These principles collectively capture the philosophy of Python, promoting code that is clear, readable, and maintainable, fostering a collaborative and efficient development process.
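
To make a few of these principles concrete, the sketch below contrasts a nested function with a flatter one ("Flat is better than nested") and raises an explicit error instead of quietly returning nothing ("Errors should never pass silently"). The function names and the sample inputs are illustrative assumptions; they are not part of PEP 20.

```python
def parse_age_nested(text):
    # Nested version: every check adds a level of indentation,
    # and bad input is silently turned into None
    if text is not None:
        if text.strip().isdigit():
            return int(text)
        else:
            return None
    else:
        return None


def parse_age_flat(text):
    # Flat version: guard clauses keep the main path unindented,
    # and bad input raises an explicit error
    if text is None:
        raise ValueError("age is missing")
    cleaned = text.strip()
    if not cleaned.isdigit():
        raise ValueError(f"age must be a whole number, got {text!r}")
    return int(cleaned)


print(parse_age_flat(" 42 "))

try:
    parse_age_flat("forty-two")
except ValueError as err:
    # The error is handled on purpose: the "explicitly silenced" case
    print(f"Skipped record: {err}")
```

Whether raising or returning None is the better choice depends on the caller; the point of the sketch is only that the choice is visible in the code.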

#### 1.1 Exercises

Suppose you wanted to collect the names in the list below that have six letters or more. In other programming languages, the typical approach is to create an index variable (i), use i to iterate over the list, and use an if statement to collect the names with six letters or more:

In [1]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

In [2]:
# Print the list created using the Non-Pythonic approach
i = 0
new_list = []
while i < len(names):
    if len(names[i]) >= 6:
        new_list.append(names[i])
    i += 1
print(new_list)

['Kramer', 'Elaine', 'George', 'Newman']


In [3]:
# Print the list created by looping over the contents of names
better_list = []
for name in names:
    if len(name) >= 6:
        better_list.append(name)
print(better_list)

['Kramer', 'Elaine', 'George', 'Newman']


In [4]:
# Print the list created by using list comprehension
best_list = [name for name in names if len(name) >= 6]
print(best_list)

['Kramer', 'Elaine', 'George', 'Newman']
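
One rough way to compare the three approaches above is the standard library's timeit module. The sketch below wraps each approach in a function (the function names are hypothetical) and times repeated calls. The repetition count is an arbitrary assumption, and timings differ by machine and Python version, so no expected timings are shown.

```python
import timeit

names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']


def with_index():
    # Non-Pythonic approach: manual index variable
    i = 0
    new_list = []
    while i < len(names):
        if len(names[i]) >= 6:
            new_list.append(names[i])
        i += 1
    return new_list


def with_loop():
    # Loop directly over the contents of names
    better_list = []
    for name in names:
        if len(name) >= 6:
            better_list.append(name)
    return better_list


def with_comprehension():
    # List comprehension
    return [name for name in names if len(name) >= 6]


# All three approaches must agree before their speed is worth comparing
assert with_index() == with_loop() == with_comprehension()

for func in (with_index, with_loop, with_comprehension):
    # number=100_000 is an arbitrary choice for this sketch
    seconds = timeit.timeit(func, number=100_000)
    print(f"{func.__name__}: {seconds:.3f} s")
```

On a five-item list the differences are small; the gap generally becomes more noticeable as the input grows.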


### 1.2 Built-ins

Python's Standard Library offers a treasure trove of built-in functions and modules. Leveraging these can significantly enhance code efficiency. We'll delve into examples of how to use built-ins to streamline your code. We should default to using a built-in solution (if one exists) rather than developing our own.

1. Built-in function: range()
This is a handy tool whenever we want to create a sequence of numbers. Suppose we wanted to create a list of integers from zero to ten. We can provide range with a start and stop value to create this sequence. Or, we can provide just a stop value assuming that we want our sequence to start at zero. Notice that the stop value is exclusive, or up to but not including this value. Also note the range function returns a range object, which we can convert into a list and print.
    
    > list(range(0,11))
    > list(range(11))
    >> [0,1,2,3,4,5,6,7,8,9,10]
    
    range can also accept a start, stop, and step value (in that order). In this block of code, we tell range to create a sequence of numbers starting at two, ending at ten, and incrementing by two.
    
    > list(range(2,11,2))
    >> [2,4,6,8,10]
    
2. Built-in function: enumerate()
Another useful built-in function is enumerate. enumerate creates an index-item pair for each item in the object provided. For example, calling enumerate on the list letters produces a sequence of indexed values. Similar to range, enumerate returns an enumerate object, which can also be converted into a list and printed.
    
    > letters = ['a','b','c']
    > indexed_letters = enumerate(letters)
    > list(indexed_letters)
    >> [(0,'a'), (1,'b'), (2,'c')]
     
    We can also specify the starting index of enumerate with the keyword argument start. Here, we tell enumerate to start the index at five by passing start equals five into the function call.

    > indexed_letters = enumerate(letters, start=5)
    > list(indexed_letters)
    >> [(5,'a'), (6,'b'), (7,'c')]
     
3. Built-in function: map()
map applies a function to each element in an object. Notice that map takes two arguments; first, the function you'd like to apply and second, the object you'd like to apply that function on. Here, we use map to apply the built-in function round to each element of the nums list.

    > nums = [1.5,2.3,3.4,4.6,5.0]
    > rnd_nums = map(round,nums)
    > list(rnd_nums)
    >> [2,2,3,5,5]
    
    map can also be used with a lambda, or, an anonymous function. Notice here, that we can use map and a lambda expression to apply a function, which we've defined on the fly, to our original list nums. The map function provides a quick and clean way to apply a function to an object iteratively without writing a for loop.

    > nums = [1,2,3,4,5]
    > sqrd_nums = map(lambda x: x**2, nums)
    > list(sqrd_nums)
    >> [1,4,9,16,25]
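
The three built-ins above can be tried together as a single runnable cell. This is a minimal sketch that reuses the values from the snippets; the variable names on the left-hand side (zero_to_ten, evens, and so on) are hypothetical.

```python
# range: the stop value is exclusive, and the result is a lazy range object
print(range(0, 11))                  # prints range(0, 11), not a list
zero_to_ten = list(range(11))        # start defaults to 0
evens = list(range(2, 11, 2))        # start, stop, step
print(zero_to_ten)
print(evens)                         # [2, 4, 6, 8, 10]

# enumerate: pairs each item with an index, optionally starting elsewhere
letters = ['a', 'b', 'c']
indexed = list(enumerate(letters))
indexed_from_five = list(enumerate(letters, start=5))
print(indexed)                       # [(0, 'a'), (1, 'b'), (2, 'c')]
print(indexed_from_five)             # [(5, 'a'), (6, 'b'), (7, 'c')]

# map: applies a function to every element and returns a lazy map object
nums = [1.5, 2.3, 3.4, 4.6, 5.0]
rounded = list(map(round, nums))
print(rounded)                       # [2, 2, 3, 5, 5]

# map with a lambda defined on the fly
squared = list(map(lambda x: x ** 2, [1, 2, 3, 4, 5]))
print(squared)                       # [1, 4, 9, 16, 25]
```

Each of these objects only produces a visible list once it is passed to list() or unpacked with the star operator.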

#### 1.2 Exercises


In [6]:
# Create a range object that goes from 0 to 5
nums = range(6)
print(type(nums))

# Convert nums to a list
nums_list = list(nums)
print(nums_list)

# Create a new list of odd numbers from 1 to 11 by unpacking a range object
nums_list2 = [*range(1,12,2)]
print(nums_list2)

<class 'range'>
[0, 1, 2, 3, 4, 5]
[1, 3, 5, 7, 9, 11]


In [7]:
# Rewrite the for loop to use enumerate
indexed_names = []
for i,name in enumerate(names):
    index_name = (i,name)
    indexed_names.append(index_name) 
print(indexed_names)

# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i,name) for i,name in enumerate(names)]
print(indexed_names_comp)

# Unpack an enumerate object with a starting index of one
indexed_names_unpack = [*enumerate(names, 1)]
print(indexed_names_unpack)

[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(0, 'Jerry'), (1, 'Kramer'), (2, 'Elaine'), (3, 'George'), (4, 'Newman')]
[(1, 'Jerry'), (2, 'Kramer'), (3, 'Elaine'), (4, 'George'), (5, 'Newman')]


In [8]:
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# Use map to apply str.upper to each element in names
names_map  = map(str.upper, names)

# Print the type of the names_map
print(type(names_map))

# Unpack names_map into a list
names_uppercase = [*(names_map)]

# Print the list created above
print(names_uppercase)

<class 'map'>
['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']
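
One detail worth keeping in mind with the cell above: a map object is an iterator, so it can be consumed only once. The sketch below repeats the unpacking step to show this; the names first_pass and second_pass are hypothetical.

```python
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

names_map = map(str.upper, names)

# The first unpacking consumes every item the map object can produce
first_pass = [*names_map]
print(first_pass)

# The second unpacking finds the iterator already exhausted
second_pass = [*names_map]
print(second_pass)   # []
```

If the results are needed more than once, storing them in a list right away avoids this surprise.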


### 1.3 The Power of NumPy Arrays
NumPy, a fundamental library for scientific computing, provides powerful array operations. We'll explore how using NumPy arrays can optimize numerical operations and enhance the overall performance of your code.

1. NumPy array overview
NumPy arrays provide a fast and memory-efficient alternative to Python lists. Typically, we import NumPy as np and use np.array to create a NumPy array. 

    > import numpy as np
    > nums_np = np.array(range(5))
    >> array([0,1,2,3,4])
    
    **NumPy arrays are homogeneous**, which means that they must contain elements of the same type. We can see the type of each element using the .dtype attribute. 
    
    > nums_np.dtype
    >> dtype('int64')
    
    Suppose we created an array using a mixture of types. Here, we create the array nums_float using the integers one and three and the float two point five. Can you spot the difference in the output? The integers now have a trailing dot in the array. That's because NumPy converted the integers to floats to retain the array's homogeneous nature. Using .dtype, we can verify that the elements in this array are floats. 
    
    > nums_float = np.array([1,2.5,3])
    >> array([1.,2.5,3.])
    
    > nums_float.dtype
    >> dtype('float64')
    
    Homogeneity allows NumPy arrays to be more memory efficient and faster than Python lists. Requiring all elements be the same type eliminates the overhead needed for data type checking.
    
2. NumPy array broadcasting
When analyzing data, you'll often want to perform operations over entire collections of values quickly. Say, for example, you'd like to square each number within a list of numbers. It'd be nice if we could simply square the list, and get a list of squared values returned. Unfortunately, Python lists don't support these types of calculations.

    We could square the values using a list by writing a for loop or using a list comprehension. But neither of these approaches is the most efficient way of doing this.

    > [num ** 2 for num in nums]

    Here lies the second advantage of NumPy arrays: their broadcasting functionality. **NumPy arrays vectorize operations**, so they are performed on all elements of an object at once. This allows us to efficiently perform calculations over entire arrays. Notice that by squaring the array nums_np created earlier, all elements are squared at once.
    
    > nums_np ** 2
    >> array([0,1,4,9,16])
   
3. NumPy array indexing
Another advantage of NumPy arrays is their indexing capabilities. When comparing basic indexing between a one-dimensional array and list, the capabilities are identical.

    When using two-dimensional arrays and lists, the advantages of arrays are clear. To return the second item of the first row in our two-dimensional object, the array syntax is [0,1]. The analogous list syntax, [0][1], is a bit more verbose, as you have to surround both the zero and the one with square brackets. 
    
    lists
    > nums2[0][1]
    
    arrays
    > nums2_np[0,1]
    
    To return the first column of values in our two-dimensional object, the array syntax is [:,0]. Lists don't support this type of syntax, so we must use a list comprehension to return columns.
    
    > [row[0] for row in nums2]
    >> [1,4]
    
    > nums2_np[:,0]
    >> array([1,4])
    
4. NumPy array boolean indexing
NumPy arrays also have a special technique called boolean indexing. Suppose we wanted to gather only the positive numbers from an array such as the one below. With an array, we can create a boolean mask using a simple inequality. Indexing the array is as simple as enclosing this inequality in square brackets.

    > nums_np = np.array([-2,-1,0,1,2])
    > nums_np > 0
    >> array([False, False, False, True, True])
    
    > nums_np[nums_np > 0]
    >> array([1,2])
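
The four NumPy ideas above can be sketched as one runnable cell. The contents of nums2 are an assumption: the notes only show that its first column is 1 and 4, so the remaining values here are made up for illustration. The array signed_np and the other result names are likewise hypothetical.

```python
import numpy as np

# 1. Homogeneity: mixing ints and a float yields an all-float array
nums_np = np.array(range(5))
nums_float = np.array([1, 2.5, 3])
print(nums_np.dtype)      # an integer dtype; the exact width can vary by platform
print(nums_float.dtype)   # float64

# 2. Vectorized operations: the whole array is squared in one expression
squared = nums_np ** 2
print(squared)

# 3. Indexing: compare a nested list with a 2-D array
nums2 = [[1, 2, 3],       # assumed values, only the first column is known
         [4, 5, 6]]
nums2_np = np.array(nums2)
print(nums2[0][1], nums2_np[0, 1])        # second item of the first row
first_col_list = [row[0] for row in nums2]
first_col_np = nums2_np[:, 0]
print(first_col_list, first_col_np)

# 4. Boolean indexing: a mask built from an inequality selects elements
signed_np = np.array([-2, -1, 0, 1, 2])   # assumed sample data
mask = signed_np > 0
positives = signed_np[mask]
print(mask)
print(positives)
```

The list comprehension and the [:, 0] slice return the same column; the array version simply states the intent in the index itself.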

#### 1.3 Exercises

In [12]:
import numpy as np
nums = np.array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [13]:
# Print second row of nums
print(nums[1,:])

# Print all elements of nums that are greater than six
print(nums[nums > 6])

# Double every element of nums
nums_dbl = nums * 2
print(nums_dbl)

# Replace the third column of nums with a new column that adds 1 to each item in the original column.
nums[:,2] = nums[:,2] + 1
print(nums)

[ 6  7  8  9 10]
[ 7  8  9 10]
[[ 2  4  6  8 10]
 [12 14 16 18 20]]
[[ 1  2  4  4  5]
 [ 6  7  9  9 10]]


You have a list of guests (the names list). Each guest, for whatever reason, has decided to show up to the party in 10-minute increments. For example, Jerry shows up to Festivus 10 minutes after the party starts, Kramer shows up 20 minutes after it starts, and so on.

You want to write a few simple lines of code, using the built-ins we have covered, to welcome each of your guests and let them know how many minutes late they are to your party. Note that NumPy has been imported into your session as np and the names list has been loaded as well.

In [14]:
# Create a list of arrival times
arrival_times = [*range(10, 60, 10)]

print(arrival_times)

[10, 20, 30, 40, 50]


You realize your clock is three minutes fast. Convert the arrival_times list into a numpy array (called arrival_times_np) and use NumPy broadcasting to subtract three minutes from each arrival time.

In [15]:
# Create a list of arrival times
arrival_times = [*range(10,60,10)]

# Convert arrival_times to an array and update the times
arrival_times_np = np.array(arrival_times)
new_times = arrival_times_np - 3

print(new_times)

[ 7 17 27 37 47]


In [16]:
# Use list comprehension and enumerate to pair guests to new times
guest_arrivals = [(names[i],time) for i,time in enumerate(new_times)]

print(guest_arrivals)

[('Jerry', 7), ('Kramer', 17), ('Elaine', 27), ('George', 37), ('Newman', 47)]
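
The exercise above asks for enumerate, but the same pairing can also be written with the built-in zip, which this chapter has not covered. The sketch below swaps zip in as an alternative, not as the intended solution. It rebuilds names and new_times so that it runs on its own, and it calls .tolist() so the times are plain Python integers.

```python
import numpy as np

names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
new_times = np.array([*range(10, 60, 10)]) - 3

# zip pairs items by position, so no index variable is needed
guest_arrivals = list(zip(names, new_times.tolist()))

print(guest_arrivals)
# [('Jerry', 7), ('Kramer', 17), ('Elaine', 27), ('George', 37), ('Newman', 47)]
```

zip stops at the shorter of its inputs, so a mismatch in length between guests and times would drop entries without raising an error.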


In [29]:
def welcome(guest_arrivals):
    return "Welcome to Festivus "+str(guest_arrivals[0])+"... You're "+str(guest_arrivals[1])+" min late."
                 
# Map the welcome function to each (guest,time) pair
welcome_map = map(welcome, guest_arrivals)

guest_welcomes = [*welcome_map]
print(*guest_welcomes, sep='\n')

Welcome to Festivus Jerry... You're 7 min late.
Welcome to Festivus Kramer... You're 17 min late.
Welcome to Festivus Elaine... You're 27 min late.
Welcome to Festivus George... You're 37 min late.
Welcome to Festivus Newman... You're 47 min late.
