<h3><a href="https://github.com/mclix85/datacamp" target="_blank">View Source Code</a></h3>

<h3>Course Description</h3>

As a Data Scientist, the majority of your time should be spent gleaning actionable insights from data -- not waiting for your code to finish running. Writing efficient Python code can help reduce runtime and save computational resources, ultimately freeing you up to do the things you love as a Data Scientist. In this course, you'll learn how to use Python's built-in data structures, functions, and modules to write cleaner, faster, and more efficient code. We'll explore how to time and profile code in order to find bottlenecks. Then, you'll practice eliminating these bottlenecks, and other bad design patterns, using Python's Standard Library, NumPy, and pandas. After completing this course, you'll have the necessary tools to start writing efficient Python code!



# Foundations for efficiencies

<p class="chapter__description">
    In this chapter, you'll learn what it means to write efficient Python code. You'll explore Python's Standard Library, learn about NumPy arrays, and practice using some of Python's built-in tools.  This chapter builds a foundation for the concepts covered ahead.
  </p>



## Welcome!





### Pop quiz: what is efficient

<div class=""><p>In the context of this course, what is meant by <em>efficient Python code</em>?</p></div>



- [x] Code that executes quickly for the task at hand, minimizes the memory footprint and follows Python's coding style principles.
- [ ] Code that has a fast runtime, consumes a small amount of memory and can be verbose/hard to interpret (readability doesn't matter).
- [ ] Code that returns a correct result regardless of the execution time and resource consumption.



<p class="dc-completion-pane__message dc-u-maxw-100pc">Correct! Writing efficient Python code minimizes runtime and memory usage while also following the idioms in the <em>Zen of Python</em>.</p>



### A taste of things to come


<div class>
<p>In this exercise, you'll explore both the <em>Non-Pythonic</em> and <em>Pythonic</em> ways of looping over a list. </p>
<pre><code>names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
</code></pre>
<p>Suppose you wanted to collect the names in the above list that have six letters or more. In other programming languages, the typical approach is to create an index variable (<code>i</code>), use <code>i</code> to iterate over the list, and use an if statement to collect the names with six letters or more:</p>
<pre><code>i = 0
new_list = []
while i &lt; len(names):
    if len(names[i]) &gt;= 6:
        new_list.append(names[i])
    i += 1
</code></pre>
<p>Let's explore some more <em>Pythonic</em> ways of doing this.</p>
</div>
<div class="exercise--instructions__content"><p>Print the list, <code>new_list</code>, that was created using a <em>Non-Pythonic</em> approach.</p></div>
<div>


In [None]:
# edited/added
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# Print the list created using the Non-Pythonic approach
i = 0
new_list = []
while i < len(names):
    if len(names[i]) >= 6:
        new_list.append(names[i])
    i += 1
print(new_list)


</div>



<div class="exercise--instructions__content"><p>A more <em>Pythonic</em> approach would loop over the contents of <code>names</code>, rather than using an index variable. Print <code>better_list</code>.</p></div>
<div>


In [None]:
# Print the list created by looping over the contents of names
better_list = []
for name in names:
    if len(name) >= 6:
        better_list.append(name)
print(better_list)


</div>



<div class="exercise--instructions__content"><p>The best <em>Pythonic</em> way of doing this is by using list comprehension. Print <code>best_list</code>.</p></div>
<div>


In [None]:
# Print the list created by using list comprehension
best_list = [name for name in names if len(name) >= 6]
print(best_list)


</div>



<p class="">Great work! Don't get too caught up in the coding concepts just yet (you'll practice using lists, for loops, and list comprehensions later on). The important thing to notice here is that following some of Python's guiding principles allows you to write cleaner and more efficient code. <br> <br> Remember, <code>Pythonic code == efficient code</code>. You'll explore these, and other, Pythonic concepts later on in the course, but for now, this is just a taste of things to come!</p>
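To check the "Pythonic code == efficient code" claim on your own machine, the sketch below times the index-based loop against the list comprehension using the standard library's `timeit` module. The function names `non_pythonic` and `pythonic` are illustrative and not part of the exercise, and the measured times will differ from one machine to another.

```python
import timeit

names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

def non_pythonic(names):
    # Index-based while loop, as in the first approach
    i = 0
    new_list = []
    while i < len(names):
        if len(names[i]) >= 6:
            new_list.append(names[i])
        i += 1
    return new_list

def pythonic(names):
    # List comprehension, as in the last approach
    return [name for name in names if len(name) >= 6]

# Total seconds for 100,000 calls of each function
loop_seconds = timeit.timeit(lambda: non_pythonic(names), number=100_000)
comp_seconds = timeit.timeit(lambda: pythonic(names), number=100_000)

print(f"while loop:         {loop_seconds:.4f} s")
print(f"list comprehension: {comp_seconds:.4f} s")
```

Both functions return the same list, so any difference in the printed times comes from how the loop is written rather than from what it computes.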



### Zen of Python


<div class>
<p>In the video, we covered the <em>Zen of Python</em> written by Tim Peters, which lists 19 idioms that serve as guiding principles for any Pythonista. Python has hundreds of <em>Python Enhancement Proposals</em>, commonly referred to as <em>PEPs</em>. The <em>Zen of Python</em> is one of these <em>PEPs</em> and is documented as <a href="https://www.python.org/dev/peps/pep-0020/">PEP20</a>.</p>
<p>One little Easter Egg in Python is the ability to print the <em>Zen of Python</em> using the command <code>import this</code>.  Let's take a look at one of the idioms listed in these guiding principles.</p>
<p>Type and run the command <code>import this</code> within your IPython console and answer the following question:</p>
<hr>
<p>What is the 7th idiom of the Zen of Python?</p>
</div>



- [ ] Flat is better than nested.
- [ ] Beautiful is better than ugly.
- [x] Readability counts.
- [ ] Python is the best programming language ever.



<p class="">That's correct! Python has a design philosophy that emphasizes readability. Throughout the course, you'll see that writing efficient Python code goes hand in hand with writing code that is easy to understand. Faster code is good, but faster &amp; readable code is best!</p>
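If you would rather look up an idiom programmatically than count lines in the console, here is a minimal sketch. It assumes CPython's `this` module, which stores the text ROT13-encoded in an attribute named `s`; that attribute is an implementation detail rather than a documented API, so treat this as a curiosity.

```python
import codecs
import contextlib
import io

# Importing `this` prints the Zen; redirect stdout so the example stays quiet
with contextlib.redirect_stdout(io.StringIO()):
    import this

# Decode the ROT13-encoded text, then drop the title line and blank lines
zen_lines = codecs.decode(this.s, "rot13").splitlines()
aphorisms = [line for line in zen_lines[1:] if line]

print(len(aphorisms))
print(aphorisms[6])  # the 7th idiom (index 6)
```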




## Building with built-ins





### Built-in practice: range()


<div class>
<p>In this exercise, you will practice using Python's built-in function <code>range()</code>. Remember that you can use <code>range()</code> in a few different ways:</p>
<p><strong>1)</strong> Create a sequence of numbers from 0 to a stop value (which is <em>exclusive</em>). This is useful when you want to create a simple sequence of numbers starting at zero:</p>
<pre><code>range(stop)</code>

<code># Example
list(range(11))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
</code></pre>
<p><strong>2)</strong> Create a sequence of numbers from a start value to a stop value (which is <em>exclusive</em>) with a step size. This is useful when you want to create a sequence of numbers that increments by some value other than one. For example, a list of even numbers:</p>
<pre><code>range(start, stop, step)</code>

<code># Example
list(range(2,11,2))

[2, 4, 6, 8, 10]
</code></pre>
</div>

<li>Create a <em>range object</em> that starts at zero and ends at five. Only use a <code>stop</code> argument.</li>
<li>Convert the <code>nums</code> variable into a list called <code>nums_list</code>.</li>
<li>Create a new list called <code>nums_list2</code> that starts at <strong>one</strong>, ends at <strong>eleven</strong>, and increments by <strong>two</strong> by unpacking a <em>range object</em> using the star character (<code>*</code>).</li>
<div>


In [None]:
# Create a range object that goes from 0 to 5
nums = range(6)
print(type(nums))

# Convert nums to a list
nums_list = list(nums)
print(nums_list)


</div>
<div>


In [None]:
# Create a new list of odd numbers from 1 to 11 by unpacking a range object
nums_list2 = [*range(1,12,2)]
print(nums_list2)


</div>



<p class="">Nicely done! Notice that using Python's <code>range()</code> function allows you to create a <em>range object</em> of numbers without explicitly typing them out. You can convert the <em>range object</em> into a list by using the <code>list()</code> function or by unpacking it into a list using the star character (<code>*</code>). Cool!</p>
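One reason `range()` is worth knowing is that a range object does not store its numbers. The sketch below compares the size of a range object with the size of the list unpacked from it; the exact byte counts printed depend on your Python build.

```python
import sys

nums = range(1_000_000)
nums_list = [*nums]

# A range object keeps only start, stop, and step, so its size stays small
print(sys.getsizeof(nums))
print(sys.getsizeof(nums_list))

# It still supports len(), indexing, and membership tests without building a list
print(len(nums), nums[-1], 999_999 in nums)

# list() and star-unpacking produce the same list
odds = range(1, 12, 2)
print(list(odds) == [*odds])
```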



### Built-in practice: enumerate()


<div class>
<p>In this exercise, you'll practice using Python's built-in function <code>enumerate()</code>. This function is useful for obtaining an indexed list. For example, suppose you had a list of people that arrived at a party you are hosting. The list is ordered by arrival (Jerry was the first to arrive, followed by Kramer, etc.):</p>
<pre><code>names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
</code></pre>
<p>If you wanted to attach an index representing a person's arrival order, you <em>could</em> use the following for loop:</p>
<pre><code>indexed_names = []
for i in range(len(names)):
    index_name = (i, names[i])
    indexed_names.append(index_name)

[(0,'Jerry'),(1,'Kramer'),(2,'Elaine'),(3,'George'),(4,'Newman')]
</code></pre>
<p>But, that's not the most efficient solution. Let's explore how to use <code>enumerate()</code> to make this more efficient.</p>
</div>

<li>Instead of using <code>for i in range(len(names))</code>, update the for loop to use <code>i</code> as the index variable and <code>name</code> as the iterator variable and use <code>enumerate()</code>.</li>
<li>Rewrite the previous for loop using <code>enumerate()</code> and list comprehension to create a new list, <code>indexed_names_comp</code>.</li>
<li>Create another list (<code>indexed_names_unpack</code>) by using the star character (<code>*</code>) to unpack the <em>enumerate object</em> created from using <code>enumerate()</code> on <code>names</code>. This time, <strong>start the index for</strong> <code>enumerate()</code> <strong>at one instead of zero.</strong>
</li>
<div>


In [None]:
# Rewrite the for loop to use enumerate
indexed_names = []
for i,name in enumerate(names):
    index_name = (i,name)
    indexed_names.append(index_name) 
print(indexed_names)


</div>
<div>


In [None]:
# Rewrite the above for loop using list comprehension
indexed_names_comp = [(i,name) for i,name in enumerate(names)]
print(indexed_names_comp)


</div>
<div>


In [None]:
# Unpack an enumerate object with a starting index of one
indexed_names_unpack = [*enumerate(names, 1)]
print(indexed_names_unpack)


</div>



<p class="">Awesome! Using Python's built-in <code>enumerate()</code> function allows you to create an index for each item in the object you give it. You can use list comprehension, or even unpack the <em>enumerate object</em> directly into a list, to write a nice simple one-liner.</p>
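The starting index can also be passed by keyword, which some readers find clearer than a bare `1`. The sketch below uses `start=1` to build arrival labels and a name-to-position lookup; `arrival_order` and `order_by_name` are illustrative names, not part of the exercise.

```python
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

# start=1 is the keyword form of enumerate(names, 1)
arrival_order = [f"{position}. {name}" for position, name in enumerate(names, start=1)]
print(arrival_order)

# The same pairs can feed a dict comprehension for lookups by name
order_by_name = {name: position for position, name in enumerate(names, start=1)}
print(order_by_name['Elaine'])
```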



### Built-in practice: map()


<div class>
<p>In this exercise, you'll practice using Python's built-in <code>map()</code> function to apply a function to every element of an object. Let's look at a list of party guests:</p>
<pre><code>names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']
</code></pre>
<p>Suppose you wanted to create a new list (called <code>names_uppercase</code>) that converted all the letters in each name to uppercase. You could accomplish this with the for loop below:</p>
<pre><code>names_uppercase = []

for name in names:
  names_uppercase.append(name.upper())

['JERRY', 'KRAMER', 'ELAINE', 'GEORGE', 'NEWMAN']
</code></pre>
<p>Let's explore using the <code>map()</code> function to do this more efficiently in one line of code.</p>
</div>

<li>Use <code>map()</code> and the method <code>str.upper()</code> to convert each name in the list <code>names</code> to uppercase. Save this to the variable <code>names_map</code>.</li>
<li>Print the data type of <code>names_map</code>.</li>
<li>Unpack the contents of <code>names_map</code> into a list called <code>names_uppercase</code> using the star character (<code>*</code>).</li>
<li>Print <code>names_uppercase</code> and observe its contents.</li>
<div>


In [None]:
# Use map to apply str.upper to each element in names
names_map = map(str.upper, names)

# Print the type of the names_map
print(type(names_map))


</div>
<div>


In [None]:
# Unpack names_map into a list
names_uppercase = [*names_map]

# Print the list created above
print(names_uppercase)


</div>



<p class="">Well done! You used Python's built-in <code>map()</code> function to apply the <code>str.upper()</code> method to each element in the <code>names</code> object. Later on in the course, you'll explore how using <code>map()</code> can provide some efficiency improvements to your code.</p>
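One detail that is easy to miss: a map object is an iterator, so it can be consumed only once. The short sketch below shows what happens when the same map object is unpacked twice, and includes the equivalent list comprehension for comparison.

```python
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

names_map = map(str.upper, names)

first_unpack = [*names_map]
second_unpack = [*names_map]  # the iterator is already exhausted here

print(first_unpack)
print(second_unpack)

# A list comprehension gives the same result as the first unpack
names_comp = [name.upper() for name in names]
print(names_comp == first_unpack)
```

If you need the results more than once, unpack the map object into a list first and reuse that list.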



## The power of NumPy arrays





### Practice with NumPy arrays


<div class>
<p>Let's practice slicing <code>numpy</code> arrays and using NumPy's broadcasting concept. Remember, broadcasting refers to a <code>numpy</code> array's ability to vectorize operations, so they are performed on all elements of an object at once.</p>
<p>A two-dimensional <code>numpy</code> array has been loaded into your session (called <code>nums</code>) and printed into the console for your convenience. <code>numpy</code> has been imported into your session as <code>np</code>.</p>
</div>
<div>


In [None]:
# edited/added
import numpy as np
nums = np.array([[ 1,  2,  3,  4,  5], [ 6,  7,  8,  9, 10]])


</div>


<li>Print the second row of <code>nums</code>.</li>
<li>Print the items of <code>nums</code> that are greater than six.</li>
<li>Create <code>nums_dbl</code> that doubles each number in <code>nums</code>.</li>
<li>Replace the third column in <code>nums</code> with a new column that adds <code>1</code> to each item in the original column.</li>
<div>


In [None]:
# Print second row of nums
print(nums[1,:])

# Print all elements of nums that are greater than six
print(nums[nums > 6])


</div>
<div>


In [None]:
# Double every element of nums
nums_dbl = nums * 2
print(nums_dbl)


</div>
<div>


In [None]:
# Replace the third column of nums
nums[:,2] = nums[:,2] + 1
print(nums)


</div>





<div class=""><p>When compared to a list object, what are two advantages of using a <code>numpy</code> array?</p></div>

- [ ] A <code>numpy</code> array is the only data structure that can be used with the <code>numpy</code> package and often has less verbose indexing syntax.
- [x] A <code>numpy</code> array contains homogeneous data types (which reduces memory consumption) and provides the ability to apply operations on all elements through broadcasting.
- [ ] A <code>numpy</code> array supports boolean indexing and has much better one-dimensional indexing capabilities.
- [ ] Both a list object and a <code>numpy</code> array are identical.



<p class="">Well done! You're slicing <code>numpy</code> arrays like a pro and learning how to take advantage of NumPy's broadcasting concept. Using <code>numpy</code> arrays allows you to take advantage of an array's memory efficient nature and easily perform mathematical operations on your data.</p>
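To make the contrast with lists concrete, the sketch below applies the same `* 2` operation to a list and to an array, and shows how an array settles on a single data type. It assumes NumPy is installed; the variable names are illustrative.

```python
import numpy as np

nums_list = [1, 2, 3, 4, 5]
nums_np = np.array(nums_list)

# Multiplying a list repeats it; multiplying an array doubles each element
print(nums_list * 2)
print(nums_np * 2)

# A list needs an explicit loop or comprehension for element-wise math
print([num * 2 for num in nums_list])

# An array holds one data type; mixing ints with a float upcasts every element
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)
```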



### Bringing it all together: Festivus!


<div class>
<p>In this exercise, you will be throwing a party—a Festivus if you will! </p>
<p>You have a list of guests (the <code>names</code> list). Each guest, for whatever reason, has decided to show up to the party in 10-minute increments. For example, Jerry shows up to Festivus 10 minutes into the party's start time, Kramer shows up 20 minutes into the party, and so on and so forth.</p>
<p>We want to write a few simple lines of code, using the built-ins we have covered, to welcome each of your guests and let them know how many minutes late they are to your party. Note that <code>numpy</code> has been imported into your session as <code>np</code> and the <code>names</code> list has been loaded as well.</p>
<p>Let's welcome your guests!</p>
</div>
<div>


In [None]:
# edited/added
def welcome_guest(guest_and_time):
    """
    Returns a welcome string for the guest_and_time tuple.
    
    Args:
        guest_and_time (tuple): The guest and time tuple to create
            a welcome string for.
            
    Returns:
        welcome_string (str): A string welcoming the guest to Festivus.
        'Welcome to Festivus {guest}... You're {time} min late.'
    
    """
    guest = guest_and_time[0]
    arrival_time = guest_and_time[1]
    welcome_string = "Welcome to Festivus {}... You're {} min late.".format(guest,arrival_time)
    return welcome_string


</div>


<li>Use <code>range()</code> to create a list of arrival times (10 through 50 incremented by 10). Create the list <code>arrival_times</code> by unpacking the <em>range object</em>.</li>
<div>


In [None]:
# Create a list of arrival times
arrival_times = [*range(10, 60, 10)]

print(arrival_times)


</div>

<li>You realize your clock is three minutes fast. Convert the <code>arrival_times</code> list into a <code>numpy</code> array (called <code>arrival_times_np</code>) and use NumPy broadcasting to subtract three minutes from each arrival time.</li>
<div>


In [None]:
# Convert arrival_times to an array and update the times
arrival_times_np = np.array(arrival_times)
new_times = arrival_times_np - 3

print(new_times)


</div>

<li>Use list comprehension with <code>enumerate()</code> to pair each guest in the <code>names</code> list to their updated arrival time in the <code>new_times</code> array. You'll need to use the index variable created from using <code>enumerate()</code> on <code>new_times</code> to index the <code>names</code> list.</li>
<div>


In [None]:
# Use list comprehension and enumerate to pair guests to new times
guest_arrivals = [(names[i],time) for i,time in enumerate(new_times)]

print(guest_arrivals)


</div>

<li>A function named <code>welcome_guest()</code> has been pre-loaded into your session. Use <code>map()</code> to apply this function to each element of the <code>guest_arrivals</code> list and save it as the variable <code>welcome_map</code>.</li>
<div>


In [None]:
# Map the welcome_guest function to each (guest,time) pair
welcome_map = map(welcome_guest, guest_arrivals)

guest_welcomes = [*welcome_map]
print(*guest_welcomes, sep='\n')


</div>



<p class="">Congratulations and happy Festivus! You're using Python built-ins like a pro and well on your way to writing efficient Python code. Believe it or not, there is a way to make these simple lines of code even more efficient! We'll cover this in a future chapter, so continue on to see how!</p>
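As one possible variation (not necessarily the improvement the course has in mind), the pairing step can be written with `zip()` so that no index variable is needed. To keep the sketch self-contained, it redefines `welcome_guest()` and replaces the NumPy subtraction with a plain list comprehension; both are assumptions made for illustration.

```python
names = ['Jerry', 'Kramer', 'Elaine', 'George', 'Newman']

def welcome_guest(guest_and_time):
    """Return a welcome string for a (guest, minutes_late) tuple."""
    guest, arrival_time = guest_and_time
    return "Welcome to Festivus {}... You're {} min late.".format(guest, arrival_time)

# Plain-Python stand-in for the NumPy step: subtract three minutes from each time
new_times = [minutes - 3 for minutes in range(10, 60, 10)]

# zip() pairs the two sequences directly, element by element
guest_arrivals = [*zip(names, new_times)]

guest_welcomes = [*map(welcome_guest, guest_arrivals)]
print(*guest_welcomes, sep='\n')
```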



# Timing and profiling code

<p class="">In this chapter, you will learn how to gather and compare runtimes between different coding approaches. You'll practice using the <code>line_profiler</code> and <code>memory_profiler</code> packages to profile your code base and spot bottlenecks. Then, you'll put what you've learned into practice by replacing these bottlenecks with efficient Python code.</p>



## Examining runtime





### Using %timeit: your turn!


<div class>
<p>You'd like to create a list of integers from 0 to 50 using the <code>range()</code> function. However, you are unsure whether using list comprehension or unpacking the <em>range object</em> into a list is faster. Let's use <code>%timeit</code> to find the best implementation.</p>
<p>For your convenience, a reference table of time orders of magnitude is provided below (faster at the top). </p>
<table>
<thead><tr>
<th>symbol</th>
<th>name</th>
<th>unit (s)</th>
</tr></thead>
<tbody>
<tr>
<td>ns</td>
<td>nanosecond</td>
<td>10<sup>-9</sup>
</td>
</tr>
<tr>
<td>µs (us)</td>
<td>microsecond</td>
<td>10<sup>-6</sup>
</td>
</tr>
<tr>
<td>ms</td>
<td>millisecond</td>
<td>10<sup>-3</sup>
</td>
</tr>
<tr>
<td>s</td>
<td>second</td>
<td>10<sup>0</sup>
</td>
</tr>
</tbody>
</table>
</div>

<li>Use list comprehension and <code>range()</code> to create a list of integers from 0 to 50 called <code>nums_list_comp</code>.</li>
<div>


In [None]:
# Create a list of integers (0-50) using list comprehension
nums_list_comp = [num for num in range(51)]
print(nums_list_comp)


</div>

<li>Use <code>range()</code> to create a list of integers from 0 to 50 and unpack its contents into a list called <code>nums_unpack</code>.</li>

<div>


In [None]:
# Create a list of integers (0-50) by unpacking range
nums_unpack = [*range(51)]
print(nums_unpack)


</div>

<div class=""><p>Use <code>%timeit</code> <strong>within your IPython console</strong> (i.e. <strong>not</strong> within the script.py window) to compare the runtimes for creating a list of integers from 0 to 50 using list comprehension vs. unpacking the <em>range object</em>. Don't include the <code>print()</code> statements when timing.</p>
<p><strong>Which method was faster?</strong></p></div>



- [ ] List comprehension was faster than unpacking <code>range()</code>.
- [x] Unpacking the <em>range object</em> was faster than list comprehension.
- [ ] Both methods had the same runtime.



<p class="">Nice work! You used <code>%timeit</code> to gather and compare runtimes! Although list comprehension is a useful and powerful tool, sometimes unpacking an object can save time and looks a little cleaner.</p>
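`%timeit` only works inside IPython. If you are running a plain Python script, a rough equivalent can be sketched with the standard library's `timeit` module, as below. The choice of 100,000 executions is arbitrary, and the numbers you see will depend on your machine and Python version.

```python
import timeit

LOOPS = 100_000

# timeit.timeit() returns the total seconds taken by LOOPS executions
comp_seconds = timeit.timeit("[num for num in range(51)]", number=LOOPS)
unpack_seconds = timeit.timeit("[*range(51)]", number=LOOPS)

# Convert the totals to nanoseconds per execution for easier comparison
print(f"list comprehension: {comp_seconds / LOOPS * 1e9:.0f} ns per loop")
print(f"unpacking range:    {unpack_seconds / LOOPS * 1e9:.0f} ns per loop")
```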



### Using %timeit: specifying number of runs and loops


<div class>
<p>A list of 480 superheroes has been loaded into your session (called <code>heroes</code>). You'd like to analyze the runtime for converting this <code>heroes</code> list into a set. Instead of relying on the default settings for <code>%timeit</code>, you'd like to use only 5 runs and 25 loops per run.</p>
<p><strong>What is the correct syntax for <code>%timeit</code> with 5 runs and 25 loops per run?</strong></p>
</div>



- [ ] <code>timeit -runs5 -loops25 set(heroes)</code>
- [ ] <code>%%timeit -r5 -n25 set(heroes)</code>
- [ ] <code>%timeit set(heroes), 5, 25</code>
- [x] <code>%timeit -r5 -n25 set(heroes)</code>



<p class="">Correct! <code>%timeit</code> lets you specify the number of runs and number of loops you want to consider with the <code>-r</code> and <code>-n</code> flags. You can use <code>-r5</code> and <code>-n25</code> to specify 5 iterations each with 25 loops when calculating the average and standard deviation of runtime for your code.</p>
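Outside IPython, the same idea of runs and loops can be approximated with `timeit.repeat()`, where `repeat` plays the role of `-r` and `number` plays the role of `-n`. The `heroes` list below is a made-up stand-in for the course data, so the timings are only illustrative.

```python
import statistics
import timeit

# Hypothetical stand-in for the 480-name heroes list
heroes = [f"hero_{i % 400}" for i in range(480)]

# repeat=5 mirrors -r5 (runs); number=25 mirrors -n25 (loops per run)
run_totals = timeit.repeat(lambda: set(heroes), repeat=5, number=25)

# Each entry is the total time for 25 loops, so divide to get the per-loop time
per_loop = [total / 25 for total in run_totals]

mean_us = statistics.mean(per_loop) * 1e6
stdev_us = statistics.stdev(per_loop) * 1e6
print(f"{mean_us:.2f} µs ± {stdev_us:.2f} µs per loop (5 runs, 25 loops each)")
```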



### Using %timeit: formal name or literal syntax


<div class>
<p>Python allows you to create data structures using <strong>either</strong> a <em>formal name</em> or a <em>literal syntax</em>. In this exercise, you'll explore how using a <em>literal syntax</em> for creating a data structure can speed up runtimes.</p>
<table>
<thead><tr>
<th>data structure</th>
<th>formal name</th>
<th>literal syntax</th>
</tr></thead>
<tbody>
<tr>
<td>list</td>
<td><code>list()</code></td>
<td><code>[]</code></td>
</tr>
<tr>
<td>dictionary</td>
<td><code>dict()</code></td>
<td><code>{}</code></td>
</tr>
<tr>
<td>tuple</td>
<td><code>tuple()</code></td>
<td><code>()</code></td>
</tr>
</tbody>
</table>
</div>

<li>Create an empty list called <code>formal_list</code> using the formal name (<code>list()</code>).</li>
<li>Create an empty list called <code>literal_list</code> using the literal syntax (<code>[]</code>).</li>
<div>


In [None]:
# Create a list using the formal name
formal_list = list()
print(formal_list)


</div>
<div>


In [None]:
# Create a list using the literal syntax
literal_list = []
print(literal_list)


</div>

<li>Print out the type of <code>formal_list</code> and <code>literal_list</code> to show that both naming conventions create a list.</li>
<div>


In [None]:
# Print out the type of formal_list
print(type(formal_list))


</div>
<div>


In [None]:
# Print out the type of literal_list
print(type(literal_list))


</div>

<div class=""><p>Use <code>%timeit</code> <strong>in your IPython console</strong> to compare runtimes between creating a list using the formal name (<code>list()</code>) and the literal syntax (<code>[]</code>). Don't include the <code>print()</code> statements when timing.</p>
<p><strong>Which naming convention is faster?</strong></p></div>
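To see why the two spellings can differ in runtime, it helps to look at the bytecode CPython generates for each. In the sketch below, `opnames` is a hypothetical helper written for this illustration; the exact instruction names vary between Python versions, but the formal name involves a name lookup and a call, while the literal compiles to a list-building instruction.

```python
import dis

def opnames(expression):
    """Return the bytecode instruction names CPython compiles an expression to."""
    code = compile(expression, "<example>", "eval")
    return [instruction.opname for instruction in dis.get_instructions(code)]

formal_ops = opnames("list()")
literal_ops = opnames("[]")

# The formal name must look up `list` and then call it
print(formal_ops)

# The literal syntax builds the list directly
print(literal_ops)
```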



<div>


In [None]:
# edited/added
import pandas as pd
heroes = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=1531075102&single=true&output=csv", header = None).iloc[:,0].tolist()
publishers = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=1911812864&single=true&output=csv", header = None).iloc[:,0].tolist()

def get_publisher_heroes(heroes, publishers, desired_publisher):

    desired_heroes = []

    for i,pub in enumerate(publishers):
        if pub == desired_publisher:
            desired_heroes.append(heroes[i])

    return desired_heroes
  
def get_publisher_heroes_np(heroes, publishers, desired_publisher):

    heroes_np = np.array(heroes)
    pubs_np = np.array(publishers)

    desired_heroes = heroes_np[pubs_np == desired_publisher]

    return desired_heroes


</div>
<li>Use the <code>get_publisher_heroes()</code> function and the <code>get_publisher_heroes_np()</code> function to collect heroes from the Star Wars universe. The <code>desired_publisher</code> for Star Wars is <code>'George Lucas'</code>.</li>
<div>


In [None]:
# Use get_publisher_heroes() to gather Star Wars heroes
star_wars_heroes = get_publisher_heroes(heroes, publishers, 'George Lucas')

print(star_wars_heroes)
print(type(star_wars_heroes))


</div>
<div>


In [None]:
# Use get_publisher_heroes_np() to gather Star Wars heroes
star_wars_heroes_np = get_publisher_heroes_np(heroes, publishers, 'George Lucas')

print(star_wars_heroes_np)
print(type(star_wars_heroes_np))


</div>

<div class=""><ul>
<li><strong>Within your IPython console</strong>, load the <code>line_profiler</code> and use <code>%lprun</code> to profile the two functions for line-by-line runtime. When using <code>%lprun</code>, use each function to gather the Star Wars heroes as you did in the previous step. After you've finished profiling, answer the following question:</li>
</ul>
<p><strong>Which function has the fastest runtime?</strong></p></div>



- [ ] <code>get_publisher_heroes()</code> is faster.
- [x] <code>get_publisher_heroes_np()</code> is faster.
- [ ] Both functions have the same runtime.



<div class=""><ul>
<li><strong>Within your IPython console</strong>, load the <code>memory_profiler</code> and use <code>%mprun</code> to profile the two functions for line-by-line memory consumption.</li>
</ul>
<p>The <code>get_publisher_heroes()</code> function and <code>get_publisher_heroes_np()</code> function have been saved within a file titled <code>hero_funcs.py</code> (i.e., you can import both functions from <code>hero_funcs</code>). When using <code>%mprun</code>, use each function to gather the Star Wars heroes as you did in the previous step. After you've finished profiling, answer the following question:</p>
<p><strong>Which function uses the least amount of memory?</strong></p></div>



- [ ] <code>get_publisher_heroes()</code> consumes less memory.
- [ ] <code>get_publisher_heroes_np()</code> consumes less memory.
- [x] Both functions have the same memory consumption.



<div class=""><p>Based on your runtime profiling and memory allocation profiling, which function would you choose to gather Star Wars heroes?</p></div>



- [ ] I would use <code>get_publisher_heroes()</code>.
- [x] I would use <code>get_publisher_heroes_np()</code>.
- [ ] I could use either function since their runtimes, and memory usage were identical.



<p class="">The Force is strong with this one! You're timing and profiling like a true Jedi. Now that you have the tools to evaluate code efficiencies, it's time to put them to use and start writing efficient Python code.</p>
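`%lprun` and `%mprun` need the third-party `line_profiler` and `memory_profiler` packages and an IPython session. When those are not available, the standard library's `tracemalloc` module gives a coarser view of memory use. The sketch below is a swapped-in alternative, not a line-by-line profiler, and it runs on a tiny made-up dataset rather than the course's 480-entry lists.

```python
import tracemalloc

# Hypothetical miniature data; the course's real lists hold 480 entries
heroes = ['Batman', 'Luke Skywalker', 'Yoda', 'Spider-Man']
publishers = ['DC Comics', 'George Lucas', 'George Lucas', 'Marvel Comics']

def get_publisher_heroes(heroes, publishers, desired_publisher):
    # Collect every hero whose publisher matches desired_publisher
    desired_heroes = []
    for i, pub in enumerate(publishers):
        if pub == desired_publisher:
            desired_heroes.append(heroes[i])
    return desired_heroes

# Trace Python memory allocations made while the function runs
tracemalloc.start()
star_wars_heroes = get_publisher_heroes(heroes, publishers, 'George Lucas')
current_bytes, peak_bytes = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(star_wars_heroes)
print(f"peak traced allocation: {peak_bytes} bytes")
```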



# Gaining efficiencies

<p class="">This chapter covers more complex efficiency tips and tricks. You'll learn a few useful built-in modules for writing efficient code and practice using set theory.  You'll then learn about looping patterns in Python and how to make them more efficient.</p>



## Efficiently combining, counting, and iterating





### Combining Pokémon names and types


<div class>
<p>Three lists have been loaded into your session from a dataset that contains 720 Pokémon:</p>
<ul>
<li>The <code>names</code> list contains the names of each Pokémon.</li>
<li>The <code>primary_types</code> list contains the corresponding <strong>primary</strong> type of each Pokémon.</li>
<li>The <code>secondary_types</code> list contains the corresponding <strong>secondary</strong> type of each Pokémon (<code>nan</code> if the Pokémon has only one type).</li>
</ul>
<p>We want to combine each Pokémon's name and types together so that you can easily see a description of each Pokémon. Practice using <code>zip()</code> to accomplish this task.</p>
</div>
<div class="exercise--instructions__content"><p>Combine the <code>names</code> list and the <code>primary_types</code> list into a new list object (called <code>names_type1</code>).</p></div>
<div>


In [None]:
# edited/added
names = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=728081830&single=true&output=csv", header = None).iloc[:,0].tolist()
primary_types = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=1599592048&single=true&output=csv", header = None).iloc[:,0].tolist()
secondary_types = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=1185014273&single=true&output=csv", header = None).iloc[:,0].tolist()

# Combine names and primary_types
names_type1 = [*zip(names, primary_types)]

print(*names_type1[:5], sep='\n')


</div>

<div class="exercise--instructions__content"><p>Combine <code>names</code>, <code>primary_types</code>, and <code>secondary_types</code> (in that order) using <code>zip()</code> and unpack the <em>zip object</em> into a new list.</p></div>
<div>


In [None]:
# Combine all three lists together
names_types = [*zip(names, primary_types, secondary_types)]

print(*names_types[:5], sep='\n')


</div>

<div class="exercise--instructions__content"><p>Use <code>zip()</code> to combine the <strong>first five items</strong> from the <code>names</code> list and the <strong>first three items</strong> from the <code>primary_types</code> list.</p></div>
<div>


In [None]:
# Combine five items from names and three items from primary_types
differing_lengths = [*zip(names[:5], primary_types[:3])]

print(*differing_lengths, sep='\n')


</div>



<p class="">Good job! You practiced using <code>zip()</code> to combine multiple objects together. This is a nice function that allows you to easily combine two or more objects. <br> <br> Did you notice that if you provide <code>zip()</code> with objects of differing lengths, it will only combine until the shortest object is exhausted?</p>
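When dropping the unmatched items is not what you want, `itertools.zip_longest()` from the standard library pads the shorter input instead. The sample values and the `'unknown'` fill value below are made up for illustration.

```python
from itertools import zip_longest

# Hypothetical sample values standing in for the first entries of each list
names = ['Abomasnow', 'Abra', 'Absol', 'Accelgor', 'Aerodactyl']
primary_types = ['Grass', 'Psychic', 'Dark']

# zip() stops as soon as the shorter input runs out
truncated = [*zip(names, primary_types)]

# zip_longest() keeps going and fills the gaps with fillvalue
padded = [*zip_longest(names, primary_types, fillvalue='unknown')]

print(len(truncated), len(padded))
print(padded[-1])
```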



### Counting Pokémon from a sample


<div class>
<p>A sample of 500 Pokémon has been generated, and three lists from this sample have been loaded into your session:</p>
<ul>
<li>The <code>names</code> list contains the names of each Pokémon in the sample.</li>
<li>The <code>primary_types</code> list contains the corresponding <strong>primary</strong> type of each Pokémon in the sample.</li>
<li>The <code>generations</code> list contains the corresponding <strong>generation</strong> of each Pokémon in the sample.</li>
</ul>
<p>You want to quickly gather a few counts from these lists to better understand the sample that was generated. Use <code>Counter</code> from the <code>collections</code> module to explore what types of Pokémon are in your sample, what generations they come from, and how many Pokémon have a name that starts with a specific letter.</p>
<p><code>Counter</code> has already been imported into your session for convenience.</p>
</div>
<div>


In [None]:
# edited/added
from collections import Counter
names = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=1939281460&single=true&output=csv", header = None).iloc[:,0].tolist()
primary_types = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=642695943&single=true&output=csv", header = None).iloc[:,0].tolist()
generations = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=1609546095&single=true&output=csv", header = None).iloc[:,0].tolist()


</div>
<li>Collect the count of each primary type from the sample.</li>
<li>Collect the count of each generation from the sample.</li>
<li>Use list comprehension to collect the first letter of each Pokémon in the <code>names</code> list. Save this as <code>starting_letters</code>.</li>
<li>Collect the count of starting letters from the <code>starting_letters</code> list. Save this as <code>starting_letters_count</code>.</li>
<div>


In [None]:
# Collect the count of primary types
type_count = Counter(primary_types)
print(type_count, '\n')


</div>
<div>


In [None]:
# Collect the count of generations
gen_count = Counter(generations)
print(gen_count, '\n')


</div>
<div>


In [None]:
# Use list comprehension to get each Pokémon's starting letter
starting_letters = [name[0] for name in names]

# Collect the count of Pokémon for each starting_letter
starting_letters_count = Counter(starting_letters)
print(starting_letters_count)


</div>



<p class="">Great job! You used <code>Counter</code> from the <code>collections</code> module to better understand the sample of 500 Pokémon that was generated. The sample's most common Pokémon type was <code>'Water'</code> and the sample's least common Pokémon types were <code>'Ghost'</code> and <code>'Dark'</code>. Did you also notice that most of the Pokémon in the sample came from generation <code>5</code> and had a starting letter of <code>'S'</code>?</p>
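Reading the most and least common items off a printed `Counter` works, but `Counter.most_common()` returns them already sorted. The sketch below uses a small made-up list (an assumption, not the 500-Pokémon sample) to show the idea.

```python
from collections import Counter

# Illustrative sample (assumed values, not the course data)
sample_types = ['Water', 'Fire', 'Water', 'Grass', 'Water', 'Fire']

type_count = Counter(sample_types)

# most_common(n) returns the n highest counts as (item, count) tuples
top_type, top_count = type_count.most_common(1)[0]
print(top_type, top_count)

# A generator expression can feed Counter directly, skipping the intermediate list
letter_count = Counter(name[0] for name in ['Squirtle', 'Snorlax', 'Pikachu'])
print(letter_count['S'])

# Missing keys return 0 instead of raising a KeyError
print(type_count['Ghost'])
```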



### Combinations of Pokémon


<div class>
<p>Ash, a Pokémon trainer, encounters a group of five Pokémon. These Pokémon have been loaded into a list within your session (called <code>pokemon</code>) and printed into the console for your convenience.</p>
<p>Ash would like to try to catch some of these Pokémon, but his Pokédex can only store <strong>two</strong> Pokémon at a time. Let's use <code>combinations</code> from the <code>itertools</code> module to see what the possible pairs of Pokémon are that Ash could catch.</p>
</div>
<div>


In [None]:
# edited/added
pokemon = ['Geodude', 'Cubone', 'Lickitung', 'Persian', 'Diglett']


</div>
<li>Import <code>combinations</code> from <code>itertools</code>.</li>
<li>Create a <em>combinations object</em> called <code>combos_obj</code> that contains all possible pairs of Pokémon from the <code>pokemon</code> list. A pair has <code>2</code> Pokémon.</li>
<li>Unpack <code>combos_obj</code> into a list called <code>combos_2</code>.</li>
<li>Ash upgraded his Pokédex so that it can now store <strong>four</strong> Pokémon. Use <code>combinations</code> to collect all possible combinations of <code>4</code> different Pokémon. Save these combinations <strong>directly into a list</strong> called <code>combos_4</code> using the star character (<code>*</code>).</li>
<div>


In [None]:
# Import combinations from itertools
from itertools import combinations

# Create a combination object with pairs of Pokémon
combos_obj = combinations(pokemon, 2)
print(type(combos_obj), '\n')


</div>
<div>


In [None]:
# Convert combos_obj to a list by unpacking
combos_2 = [*combos_obj]
print(combos_2, '\n')


</div>
<div>


In [None]:
# Collect all possible combinations of 4 Pokémon directly into a list
combos_4 = [*combinations(pokemon, 4)]
print(combos_4)


</div>



<p class="">Awesome! You used <code>combinations()</code> from <code>itertools</code> to collect various combination-tuples from a list. <code>combinations()</code> allows you to specify any size of combinations by passing an integer as the second argument. Ash has <code>10</code> combination options when his Pokédex can store only two Pokémon. He has <code>5</code> combination options when his Pokédex can store four Pokémon.</p>
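The counts quoted above can be cross-checked against the binomial coefficient. This is a small sketch that assumes Python 3.8 or later, where `math.comb` is available.

```python
from itertools import combinations
from math import comb

pokemon = ['Geodude', 'Cubone', 'Lickitung', 'Persian', 'Diglett']

for size in (2, 4):
    n_combos = len([*combinations(pokemon, size)])
    # math.comb(n, k) is the number of ways to choose k items from n
    print(size, n_combos, comb(len(pokemon), size))
```

Both columns agree: choosing 2 from 5 gives 10 options, and choosing 4 from 5 gives 5.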



## Set theory





### Comparing Pokédexes


<div class>
<p>Two Pokémon trainers, Ash and Misty, would like to compare their individual collections of Pokémon. Let's see what Pokémon they have in common and what Pokémon Ash has that Misty does not.</p>
<p>Both Ash and Misty's Pokédex (their collection of Pokémon) have been loaded into your session as lists called <code>ash_pokedex</code> and <code>misty_pokedex</code>. They have been printed into the console for your convenience.</p>
</div>
<div>


In [None]:
# edited/added
ash_pokedex = ['Pikachu', 'Bulbasaur', 'Koffing', 'Spearow', 'Vulpix', 'Wigglytuff', 'Zubat', 'Rattata', 'Psyduck', 'Squirtle'] 
misty_pokedex = ['Krabby', 'Horsea', 'Slowbro', 'Tentacool', 'Vaporeon', 'Magikarp', 'Poliwag', 'Starmie', 'Psyduck', 'Squirtle']


</div>
<li>Convert both lists (<code>ash_pokedex</code> and <code>misty_pokedex</code>) to sets called <code>ash_set</code> and <code>misty_set</code> respectively.</li>
<li>Find the Pokémon that both Ash and Misty have in common using a set method.</li>
<li>Find the Pokémon that are within Ash's Pokédex but <strong>are not</strong> within Misty's Pokédex with a set method.</li>
<li>Use a set method to find the Pokémon that are unique to <strong>either</strong> Ash or Misty (i.e., the Pokémon that exist in <strong>exactly one</strong> of the Pokédexes but not both).</li>
<div>


In [None]:
# Convert both lists to sets
ash_set = set(ash_pokedex)
misty_set = set(misty_pokedex)

# Find the Pokémon that exist in both sets
both = ash_set.intersection(misty_set)
print(both)


</div>
<div>


In [None]:
# Find the Pokémon that Ash has, and Misty does not have
ash_only = ash_set.difference(misty_set)
print(ash_only)


</div>
<div>


In [None]:
# Find the Pokémon that are in only one set (not both)
unique_to_set = ash_set.symmetric_difference(misty_set)
print(unique_to_set)


</div>



<p class="">Great work! Using sets lets you do some cool comparisons between objects without the need to write a for loop. With a few lines of code, you were able to see that both Ash and Misty have <code>'Psyduck'</code> and <code>'Squirtle'</code> in their Pokédex. You were also able to see that Ash has <code>8</code> Pokémon that Misty does not have.</p>
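The set methods used above also have operator equivalents. The sketch below uses shortened Pokédexes (an assumption made to keep the output small) to show that both spellings give the same result.

```python
# Shortened, illustrative Pokédexes (not the full lists from the exercise)
ash_set = {'Pikachu', 'Bulbasaur', 'Psyduck', 'Squirtle'}
misty_set = {'Krabby', 'Horsea', 'Psyduck', 'Squirtle'}

both = ash_set & misty_set          # same as ash_set.intersection(misty_set)
ash_only = ash_set - misty_set      # same as ash_set.difference(misty_set)
in_one_only = ash_set ^ misty_set   # same as ash_set.symmetric_difference(misty_set)

print(sorted(both))
print(sorted(ash_only))
print(sorted(in_one_only))
```

One difference worth knowing: the methods accept any iterable (such as a list), while the operators require both operands to be sets.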



### Searching for Pokémon


<div class>
<p>Two Pokémon trainers, Ash and Brock, have a collection of ten Pokémon each. Each trainer's Pokédex (their collection of Pokémon) has been loaded into your session as lists called <code>ash_pokedex</code> and <code>brock_pokedex</code> respectively.</p>
<p>You'd like to see if certain Pokémon are members of either Ash or Brock's Pokédex.</p>
<p>Let's compare using a <code>set</code> versus using a <code>list</code> when performing this membership testing.</p>
</div>
<div>


In [None]:
# edited/added
brock_pokedex = ['Onix', 'Geodude', 'Zubat', 'Golem', 'Vulpix', 'Tauros', 'Kabutops', 'Omastar', 'Machop', 'Dugtrio']


</div>
<li>Convert Brock's Pokédex list (<code>brock_pokedex</code>) to a set called <code>brock_pokedex_set</code>.</li>
<div>


In [None]:
# Convert Brock's Pokédex to a set
brock_pokedex_set = set(brock_pokedex)
print(brock_pokedex_set)


</div>
<li>Check if <code>'Psyduck'</code> is in Ash's Pokédex list (<code>ash_pokedex</code>) and if <code>'Psyduck'</code> is in Brock's Pokédex <strong>set</strong> (<code>brock_pokedex_set</code>).</li>
<div>


In [None]:
# Check if Psyduck is in Ash's list and Brock's set
print('Psyduck' in ash_pokedex)
print('Psyduck' in brock_pokedex_set)


</div>
<li>Check if <code>'Machop'</code> is in Ash's Pokédex list (<code>ash_pokedex</code>) and if <code>'Machop'</code> is in Brock's Pokédex <strong>set</strong> (<code>brock_pokedex_set</code>).</li>
<div>


In [None]:
# Check if Machop is in Ash's list and Brock's set
print('Machop' in ash_pokedex)
print('Machop' in brock_pokedex_set)


</div>

<div class=""><p><strong>Within your IPython console</strong>, use <code>%timeit</code> to compare membership testing for <code>'Psyduck'</code> in <code>ash_pokedex</code>, <code>'Psyduck'</code> in <code>brock_pokedex_set</code>, <code>'Machop'</code> in <code>ash_pokedex</code>, and <code>'Machop'</code> in <code>brock_pokedex_set</code> (a total of <strong>four different timings</strong>).</p>
<p>Don't include the <code>print()</code> function. Only time the commands that you wrote <strong>inside</strong> the <code>print()</code> function in the previous steps. </p>
<p><strong>Which membership testing was faster?</strong></p></div>



- [ ] Using a list is faster than using a set for membership testing in all four cases.
- [ ] Membership testing using a list and using a set has the same runtime in all four cases.
- [x] Membership testing using a set is faster than using a list in all four cases.



<p class="">Awesome! Membership testing is much faster when you use sets. Did you notice that using a set for membership testing is faster than using a list regardless of whether the item you are checking is in the set? Checking for 'Psyduck' (which was not in Brock's set) is still faster than checking for 'Psyduck' in Ash's list!</p>
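Outside an IPython console, the same comparison can be sketched with the standard-library `timeit` module. The absolute numbers depend on your machine, so none are shown here. The reason a set tends to win is that it is backed by a hash table, so a lookup takes roughly constant time on average, while a list has to be scanned item by item.

```python
import timeit

ash_pokedex = ['Pikachu', 'Bulbasaur', 'Koffing', 'Spearow', 'Vulpix',
               'Wigglytuff', 'Zubat', 'Rattata', 'Psyduck', 'Squirtle']
brock_pokedex_set = {'Onix', 'Geodude', 'Zubat', 'Golem', 'Vulpix',
                     'Tauros', 'Kabutops', 'Omastar', 'Machop', 'Dugtrio'}

# Time each membership test; only the relative difference is meaningful
list_time = timeit.timeit(lambda: 'Psyduck' in ash_pokedex, number=100_000)
set_time = timeit.timeit(lambda: 'Psyduck' in brock_pokedex_set, number=100_000)

print(f'list: {list_time:.4f}s  set: {set_time:.4f}s')
```

With only ten items the gap is small. It grows as the collection gets larger, because the list scan is proportional to the list's length.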



### Gathering unique Pokémon


<div class>
<p>A sample of 500 Pokémon has been created <strong>with replacement</strong> (meaning a Pokémon could be selected more than once and duplicates exist within the sample).</p>
<p>Three lists have been loaded into your session:</p>
<ul>
<li>The <code>names</code> list contains the names of each Pokémon in the sample.</li>
<li>The <code>primary_types</code> list contains the corresponding <strong>primary</strong> type of each Pokémon in the sample.</li>
<li>The <code>generations</code> list contains the corresponding <strong>generation</strong> of each Pokémon in the sample.</li>
</ul>
<p>The below function was written to gather unique values from each list:</p>
<pre><code>def find_unique_items(data):
    uniques = []

    for item in data:
        if item not in uniques:
            uniques.append(item)

    return uniques
</code></pre>
<p>Let's compare the above function to using the <code>set</code> data type for collecting unique items.</p>
</div>
<div>


In [None]:
# edited/added
from collections import Counter
names = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=425640626&single=true&output=csv", header = None).iloc[:,0].tolist()
primary_types = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=553528834&single=true&output=csv", header = None).iloc[:,0].tolist()
generations = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=800099022&single=true&output=csv", header = None).iloc[:,0].tolist()

def find_unique_items(data):
    uniques = []

    for item in data:
        if item not in uniques:
            uniques.append(item)

    return uniques


</div>
<li>Use the provided function to collect the unique Pokémon in the <code>names</code> list. Save this as <code>uniq_names_func</code>.</li>
<div>


In [None]:
# Use the provided function to collect unique Pokémon names
uniq_names_func = find_unique_items(names)
print(len(uniq_names_func))


</div>
<li>Use a <code>set</code> data type to collect the unique Pokémon in the <code>names</code> list. Save this as <code>uniq_names_set</code>.</li>
<div>


In [None]:
# Convert the names list to a set to collect unique Pokémon names
uniq_names_set = set(names)
print(len(uniq_names_set))


</div>
<div>


In [None]:
# Check that both unique collections are equivalent
print(sorted(uniq_names_func) == sorted(uniq_names_set))


</div>

<div class=""><p><strong>Within your IPython console</strong>, use <code>%timeit</code> to compare the <code>find_unique_items()</code> function with using a <code>set</code> data type to collect unique Pokémon character names in <code>names</code>.</p>
<p><strong>Which approach was faster?</strong></p></div>



- [x] Using a <code>set</code> to collect unique values is faster.
- [ ] Using the provided function that uses a loop to gather unique items is faster.
- [ ] Both approaches have the same runtime.



<li>Use the most efficient approach for gathering unique items to collect the unique Pokémon types (from the <code>primary_types</code> list) and Pokémon generations (from the <code>generations</code> list).</li>
<div>


In [None]:
# Use the best approach to collect unique primary types and generations
uniq_types = set(primary_types) 
uniq_gens = set(generations)
print(uniq_types, uniq_gens, sep='\n') 


</div>



<p class="">Nice work! Using a <code>set</code> data type to collect unique values is much faster than using a for loop (like in the <code>find_unique_items()</code> function). Since a set is defined as a collection of distinct elements, it is an efficient way to collect unique items from an existing object. Here you took advantage of a <code>set</code> to find the distinct Pokémon from the sample (eliminating duplicate Pokémon) and saw what unique Pokémon types and generations were included in the sample.</p>
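One behavioral difference is worth keeping in mind: `find_unique_items()` returns items in first-seen order, while a `set` makes no promise about order. If order matters, a possible alternative (not covered in the exercise, so treat it as an aside) is `dict.fromkeys()`, which keeps insertion order in Python 3.7+. The names below are made up for illustration.

```python
# Illustrative sample with duplicates (assumed values)
names = ['Pikachu', 'Eevee', 'Pikachu', 'Mew', 'Eevee']

# A set removes duplicates but does not guarantee any particular order
uniq_names_set = set(names)

# dict keys are unique and keep insertion order (Python 3.7+)
uniq_names_ordered = list(dict.fromkeys(names))

print(len(uniq_names_set))
print(uniq_names_ordered)
```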



## Eliminating loops





### Gathering Pokémon without a loop


<div class>
<p>A list containing 720 Pokémon has been loaded into your session as <code>poke_names</code>. Another list containing each Pokémon's corresponding generation has been loaded as <code>poke_gens</code>.</p>
<p>A for loop has been created to filter the Pokémon that belong to generation one or two, and collect the number of letters in each Pokémon's name:</p>
<pre><code>gen1_gen2_name_lengths_loop = []

for name,gen in zip(poke_names, poke_gens):
    if gen &lt; 3:
        name_length = len(name)
        poke_tuple = (name, name_length)
        gen1_gen2_name_lengths_loop.append(poke_tuple)
</code></pre>
</div>
<div>


In [None]:
# edited/added
poke_names = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=728081830&single=true&output=csv", header = None).iloc[:,0].tolist()
poke_gens = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=1561338519&single=true&output=csv", header = None).iloc[:,0].tolist()
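gen1_gen2_name_lengths_loop = []

# Run the original for loop so its output can be compared with the new approach
for name,gen in zip(poke_names, poke_gens):
    if gen < 3:
        gen1_gen2_name_lengths_loop.append((name, len(name)))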


</div>
<li>Eliminate the above for loop using list comprehension and the <code>map()</code> function:
<ul>
<li>Use list comprehension to collect each Pokémon that belongs to generation 1 or generation 2. Save this as <code>gen1_gen2_pokemon</code>.</li>
<li>Use the <code>map()</code> function to collect the number of letters in each Pokémon's name within the <code>gen1_gen2_pokemon</code> list. Save this <em>map object</em> as <code>name_lengths_map</code>.</li>
<li>Combine <code>gen1_gen2_pokemon</code> and <code>name_lengths_map</code> into a list called <code>gen1_gen2_name_lengths</code>.</li>
</ul>
</li>
<div>


In [None]:
# Collect Pokémon that belong to generation 1 or generation 2
gen1_gen2_pokemon = [name for name,gen in zip(poke_names, poke_gens) if gen < 3]

# Create a map object that stores the name lengths
name_lengths_map = map(len, gen1_gen2_pokemon)

# Combine gen1_gen2_pokemon and name_lengths_map into a list
gen1_gen2_name_lengths = [*zip(gen1_gen2_pokemon, name_lengths_map)]

print(gen1_gen2_name_lengths_loop[:5])
print(gen1_gen2_name_lengths[:5])


</div>



<p class="">Great job! You successfully used list comprehension and the <code>map()</code> function to eliminate a for loop. If you compared runtimes between the for loop and using list comprehension with a <code>map()</code> function, you'd see that the for loop took quite a bit longer. <br> <br> If you're an experienced Pythonista, you may have noticed that you could replace the entire for loop with one list comprehension: <code>[(name, len(name)) for name,gen in zip(poke_names, poke_gens) if gen &lt; 3]</code></p>
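To confirm that the three spellings are interchangeable, here is a short sketch on a four-item sample (assumed values, not the 720-Pokémon lists).

```python
# Illustrative sample (assumed values)
poke_names = ['Bulbasaur', 'Chikorita', 'Treecko', 'Pikachu']
poke_gens = [1, 2, 3, 1]

# 1. Original for loop
loop_result = []
for name, gen in zip(poke_names, poke_gens):
    if gen < 3:
        loop_result.append((name, len(name)))

# 2. List comprehension plus map()
gen1_gen2_pokemon = [name for name, gen in zip(poke_names, poke_gens) if gen < 3]
map_result = [*zip(gen1_gen2_pokemon, map(len, gen1_gen2_pokemon))]

# 3. A single list comprehension
single_result = [(name, len(name)) for name, gen in zip(poke_names, poke_gens) if gen < 3]

print(loop_result == map_result == single_result)
print(single_result)
```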



### Pokémon totals and averages without a loop


<div class>
<p>A list of 720 Pokémon has been loaded into your session as <code>names</code>. Each Pokémon's corresponding statistics have been loaded as a NumPy array called <code>stats</code>. Each row of <code>stats</code> corresponds to a Pokémon in <code>names</code> and each column represents an individual Pokémon stat (<code>HP</code>, <code>Attack</code>, <code>Defense</code>, <code>Special Attack</code>, <code>Special Defense</code>, and <code>Speed</code>, respectively).</p>
<p>You want to gather each Pokémon's total stat value (i.e., the sum of each row in <code>stats</code>) and each Pokémon's average stat value (i.e., the mean of each row in <code>stats</code>) so that you can find the strongest Pokémon.</p>
<p>The below for loop was written to collect these values:</p>
<pre><code>poke_list = []

for pokemon,row in zip(names, stats):
    total_stats = np.sum(row)
    avg_stats = np.mean(row)
    poke_list.append((pokemon, total_stats, avg_stats))
</code></pre>
</div>
<div>


In [None]:
# edited/added
names = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=728081830&single=true&output=csv", header = None).iloc[:,0].tolist()
stats_df = pd.read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vSoeJegBxXU61LsAJQ4sPPrw99EJXBccEfiC3fhrj6TQCnhZ5Q4J8P7oCP-TtR1W6Z3d9TMcCYc32Xy/pub?gid=99951674&single=true&output=csv")
stats = stats_df.to_numpy()
poke_list = []

for pokemon,row in zip(names, stats):
    total_stats = np.sum(row)
    avg_stats = np.mean(row)
    poke_list.append((pokemon, total_stats, avg_stats))


</div>
<li>Replace the above for loop using NumPy:<ul>
<li>Create a total stats array (<code>total_stats_np</code>) using the <code>.sum()</code> method and specifying the correct axis.</li>
<li>Create an average stats array (<code>avg_stats_np</code>) using the <code>.mean()</code> method and specifying the correct axis.</li>
<li>Create a final output list (<code>poke_list_np</code>) by combining the <code>names</code> list, the <code>total_stats_np</code> array, and the <code>avg_stats_np</code> array.</li>
</ul>
</li>
<div>


In [None]:
# Create a total stats array
total_stats_np = stats.sum(axis=1)

# Create an average stats array
avg_stats_np = stats.mean(axis=1)

# Combine names, total_stats_np, and avg_stats_np into a list
poke_list_np = [*zip(names, total_stats_np, avg_stats_np)]

print(poke_list_np == poke_list, '\n')
print(poke_list_np[:3])
print(poke_list[:3], '\n')
top_3 = sorted(poke_list_np, key=lambda x: x[1], reverse=True)[:3]
print('3 strongest Pokémon:\n{}'.format(top_3))


</div>



<p class="">Great work! You used NumPy's <code>.sum()</code> and <code>.mean()</code> methods with a specific axis to eliminate a for loop. With this approach, you were able to quickly see that 'GroudonPrimal Groudon', 'KyogrePrimal Kyogre', and 'Arceus' were the strongest Pokémon in your list based on total stats. <br><br> If you were to gather run times, the for loop would have taken <em>milliseconds</em> to execute while the NumPy approach would have taken <em>microseconds</em> to execute. This is quite an improvement!</p>
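If the choice of `axis` feels arbitrary, a tiny array makes it easier to see. In the sketch below (values are made up), each row is one Pokémon and each column is one stat, mirroring the layout of `stats`.

```python
import numpy as np

# Two Pokémon (rows) with three stats each (columns); values are illustrative
stats = np.array([[45, 49, 49],
                  [60, 62, 63]])

print(stats.sum(axis=1))    # collapses the columns: one total per row (per Pokémon)
print(stats.mean(axis=1))   # one mean per row
print(stats.sum(axis=0))    # collapses the rows: one total per column (per stat)
```

A handy way to remember it: the axis you pass is the one that disappears from the result's shape.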



## Writing better loops





### One-time calculation loop


<div class>
<p>A list of integers that represents each Pokémon's generation has been loaded into your session as <code>generations</code>. You'd like to gather the counts of each generation and determine what percentage of the total each generation accounts for.</p>
<p>The below loop was written to accomplish this task:</p>
<pre><code>for gen,count in gen_counts.items():
    total_count = len(generations)
    gen_percent = round(count / total_count * 100, 2)
    print(
      'generation {}: count = {:3} percentage = {}'
      .format(gen, count, gen_percent)
    )
</code></pre>
<p>Let's make this loop more efficient by moving a one-time calculation outside the loop.</p>
</div>

<li>Import <code>Counter</code> from the <code>collections</code> module.</li>
<li>Use <code>Counter()</code> to collect the count of each generation from the <code>generations</code> list. Save this as <code>gen_counts</code>.</li>
<li>Write a better for loop that places a <strong>one-time</strong> calculation outside (above) the loop. Use the exact same syntax as the original for loop (simply copy and paste the one-time calculation above the loop).</li>
<div>


In [None]:
# Import Counter
from collections import Counter

# Collect the count of each generation
gen_counts = Counter(generations)

# Improve for loop by moving one calculation above the loop
total_count = len(generations)

for gen,count in gen_counts.items():
    gen_percent = round(count / total_count * 100, 2)
    print('generation {}: count = {:3} percentage = {}'
          .format(gen, count, gen_percent))


</div>



<p class="">Well done! You spotted a calculation that could be moved outside a loop to make the loop more efficient. Since the total count is now calculated just once (and not with each loop iteration), you can expect to see an efficiency gain with your new loop. When writing a loop is unavoidable, be sure to analyze the loop and move any one-time calculations outside.</p>
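To see how much repeated work the original loop did, the sketch below counts how often the total is computed in each version. The `total()` helper and the `calls` dictionary are hypothetical, added only for this illustration, and the data is a small made-up list.

```python
from collections import Counter

# Illustrative data (assumed values)
generations = [1, 1, 2, 3, 3, 3]
gen_counts = Counter(generations)

calls = {'inside': 0, 'outside': 0}

def total(where):
    # Hypothetical helper that records how often the total is computed
    calls[where] += 1
    return len(generations)

# Original pattern: the total is recomputed on every iteration
for gen, count in gen_counts.items():
    total_count = total('inside')
    gen_percent = round(count / total_count * 100, 2)

# Improved pattern: the total is computed once, above the loop
total_count = total('outside')
for gen, count in gen_counts.items():
    gen_percent = round(count / total_count * 100, 2)

print(calls)
```

The saving is one call per iteration, so it scales with the number of items looped over.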



### Holistic conversion loop


<div class>
<p>A list of all possible Pokémon types has been loaded into your session as <code>pokemon_types</code>. It's been printed in the console for convenience.</p>
<p>You'd like to gather all the possible pairs of Pokémon types. You want to store each of these pairs in an individual list with an enumerated index as the first element of each list. This allows you to see the total number of possible pairs and provides an indexed label for each pair.</p>
<p>The below loop was written to accomplish this task:</p>
<pre><code>enumerated_pairs = []

for i,pair in enumerate(possible_pairs, 1):
    enumerated_pair_tuple = (i,) + pair
    enumerated_pair_list = list(enumerated_pair_tuple)
    enumerated_pairs.append(enumerated_pair_list)
</code></pre>
<p>Let's make this loop more efficient using a holistic conversion.</p>
</div>
<div>


In [None]:
# edited/added
pokemon_types = ['Bug', 'Dark', 'Dragon', 'Electric', 'Fairy', 'Fighting', 'Fire', 'Flying', 'Ghost', 'Grass', 'Ground', 'Ice', 'Normal', 'Poison', 'Psychic', 'Rock', 'Steel', 'Water']


</div>
<li>
<code>combinations</code> from the <code>itertools</code> module has been loaded into your session. Use it to create a list called <code>possible_pairs</code> that contains all possible pairs of Pokémon types (each pair has <code>2</code> Pokémon types).</li>
<li>Create an empty list called <code>enumerated_tuples</code> above the for loop.</li>
<li>Within the for loop, append each <code>enumerated_pair_tuple</code> to the empty list you created in the above step. </li>
<li>Use a built-in function to convert each tuple in <code>enumerated_tuples</code> to a list.</li>
<div>


In [None]:
# Collect all possible pairs using combinations()
possible_pairs = [*combinations(pokemon_types, 2)]

# Create an empty list called enumerated_tuples
enumerated_tuples = []

# Append each enumerated_pair_tuple to the empty list above
for i,pair in enumerate(possible_pairs, 1):
    enumerated_pair_tuple = (i,) + pair
    enumerated_tuples.append(enumerated_pair_tuple)

# Convert all tuples in enumerated_tuples to a list
enumerated_pairs = [*map(list, enumerated_tuples)]
print(enumerated_pairs)


</div>



<p class="">Great job! Rather than converting each tuple to a list <em>within</em> the loop, you used the <code>map()</code> function to convert tuples to lists all at once outside of a loop. You're getting the hang of writing efficient loops! Remember, you want to avoid looping as much as possible when writing Python code. In cases where looping is unavoidable, be sure to check your loops for one-time calculations and holistic conversions to make them more efficient.</p>
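As a follow-up sketch (not part of the exercise), the remaining loop can also be folded into a list comprehension that builds each list directly. Three types are used here instead of the full list to keep the output short.

```python
from itertools import combinations

# Shortened type list for illustration
pokemon_types = ['Bug', 'Dark', 'Dragon']
possible_pairs = [*combinations(pokemon_types, 2)]

# Loop that builds tuples, followed by one holistic conversion
enumerated_tuples = []
for i, pair in enumerate(possible_pairs, 1):
    enumerated_tuples.append((i,) + pair)
enumerated_pairs = [*map(list, enumerated_tuples)]

# Loop-free variant: unpack each pair straight into a new list
enumerated_pairs_comp = [[i, *pair] for i, pair in enumerate(possible_pairs, 1)]

print(enumerated_pairs)
print(enumerated_pairs == enumerated_pairs_comp)
```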



### Bringing it all together: Pokémon z-scores


<div class>
<p>A list of 720 Pokémon has been loaded into your session as <code>names</code>. Each Pokémon's corresponding Health Points are stored in a NumPy array called <code>hps</code>. You want to analyze the Health Points using the <a href="https://en.wikipedia.org/wiki/Standard_score">z-score</a> to see how many standard deviations each Pokémon's HP is from the mean of all HPs.</p>
<p>The below code was written to calculate the HP z-score for each Pokémon and gather the Pokémon with the highest HPs based on their z-scores:</p>
<pre><code>poke_zscores = []

for name,hp in zip(names, hps):
    hp_avg = hps.mean()
    hp_std = hps.std()
    z_score = (hp - hp_avg)/hp_std
    poke_zscores.append((name, hp, z_score))
</code></pre>
<pre><code>highest_hp_pokemon = []

for name,hp,zscore in poke_zscores:
    if zscore &gt; 2:
        highest_hp_pokemon.append((name, hp, zscore))
</code></pre>
</div>
<div>


In [None]:
# edited/added
hps = stats_df.HP.values
len(hps)


</div>
<li>Use NumPy to eliminate the for loop used to create the z-scores.</li>
<li>Then, combine the <code>names</code>, <code>hps</code>, and <code>z_scores</code> objects together into a list called <code>poke_zscores2</code>.</li>
<div>


In [None]:
# Calculate the total HP avg and total HP standard deviation
hp_avg = hps.mean()
hp_std = hps.std()

# Use NumPy to eliminate the previous for loop
z_scores = (hps - hp_avg)/hp_std

# Combine names, hps, and z_scores
poke_zscores2 = [*zip(names, hps, z_scores)]
print(*poke_zscores2[:3], sep='\n')


</div>
<li>Use list comprehension to replace the for loop used to collect Pokémon with the highest HPs based on their z-score.</li>
<div>


In [None]:
# Use list comprehension with the same logic as the highest_hp_pokemon code block
highest_hp_pokemon2 = [(name, hp, zscore) for name,hp,zscore in poke_zscores2 if zscore > 2]
print(*highest_hp_pokemon2, sep='\n')


</div>

<div class=""><p>Use <code>%%timeit</code> (<em>cell magic mode</em>) <strong>within your IPython console</strong> to compare the runtimes between the original code blocks and the new code you developed using NumPy and list comprehension.</p>
<p><strong>Don't include the <code>print()</code> statements when timing.</strong> You should include <strong>ten lines of code</strong> when timing the original code blocks and <strong>five lines of code</strong> when timing the new code you developed. You may need to press <code>SHIFT+ENTER</code> after entering <code>%%timeit</code> to get to a new line within your IPython console.</p>
<p><strong>Which approach was faster?</strong></p></div>



- [ ] The total time for executing both of the original code blocks was faster.
- [x] The total time for executing the updated solution using NumPy and list comprehension was faster.
- [ ] Both approaches had the same execution time.



<p class="">Great job! You're Catching 'Em All (efficiencies that is). You eliminated two loops using NumPy broadcasting and list comprehension. Did you notice how much faster the approach you developed was compared to the original loops? What a great improvement! <br><br> Remember the techniques you've learned throughout this chapter as you continue writing Python code outside this course. Keep in mind the built-in functions and modules you covered to eliminate loops and remember to check your unavoidable loops for things that can be moved outside.</p>
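The broadcasting step can be tried on a handful of values. In this sketch the names and HP values are made up for illustration. Note that `.std()` on a NumPy array computes the population standard deviation (`ddof=0`) by default.

```python
import numpy as np

# Made-up HP values for illustration (not the course data)
names = ['Pidgey', 'Rattata', 'Zubat', 'Oddish', 'Meowth', 'Chansey']
hps = np.array([40, 45, 50, 55, 60, 250])

# The scalar mean and std are broadcast across the whole array, so no loop is needed
hp_avg = hps.mean()
hp_std = hps.std()
z_scores = (hps - hp_avg) / hp_std

# Boolean mask: True where the z-score exceeds 2
mask = z_scores > 2
highest_hp = [name for name, keep in zip(names, mask) if keep]
print(highest_hp)
```

By construction, z-scores have a mean of 0 and a standard deviation of 1, which is a quick sanity check on the calculation.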



# Basic pandas optimizations

<p class="">This chapter offers a brief introduction on how to efficiently work with pandas DataFrames. You'll learn the various options you have for iterating over a DataFrame. Then, you'll learn how to efficiently apply functions to data stored in a DataFrame.</p>



## Intro to pandas DataFrame iteration





### Iterating with .iterrows()


<div class>
<p>In the video, we discussed that <code>.iterrows()</code> returns each DataFrame row as a tuple of (index, <code>pandas</code> Series) pairs. But, what does this mean? Let's explore with a few coding exercises.</p>
<p>A <code>pandas</code> DataFrame has been loaded into your session called <code>pit_df</code>. This DataFrame contains the stats for the Major League Baseball team named the Pittsburgh Pirates (abbreviated as <code>'PIT'</code>) from the year 2008 to the year 2012. It has been printed into your console for convenience.</p>
</div>
<div class="exercise--instructions__content"><p>Use <code>.iterrows()</code> to loop over <code>pit_df</code> and print each row. Save the first item from <code>.iterrows()</code> as <code>i</code> and the second as <code>row</code>.</p></div>
<div>


In [None]:
# edited/added
baseball_df = pd.read_csv("https://assets.datacamp.com/production/repositories/3581/datasets/779033fb8fb5021aee9ff46253980abcbc5851f3/baseball_stats.csv")
pit_df = baseball_df[baseball_df.Team == 'PIT']

# Iterate over pit_df and print each row
for i,row in pit_df.iterrows():
    print(row)


</div>

<div class="exercise--instructions__content"><p>Add <strong>two</strong> lines to the loop: one <em>before</em> <code>print(row)</code> to print each index variable and one <em>after</em> to print each row's type.</p></div>
<div>


In [None]:
# Iterate over pit_df and print each index variable, row, and row type
for i,row in pit_df.iterrows():
    print(i)
    print(row)
    print(type(row))


</div>

<div class="exercise--instructions__content"><p>Instead of using <code>i</code> and <code>row</code> in the for statement to store the output of <code>.iterrows()</code>, use <strong>one</strong> variable named <code>row_tuple</code>.</p></div>
<div>


In [None]:
# Use one variable instead of two to store the result of .iterrows()
for row_tuple in pit_df.iterrows():
    print(row_tuple)


</div>

<div class="exercise--instructions__content"><p>Add a line in the for loop to print the type of each <code>row_tuple</code>.</p></div>
<div>


In [None]:
# Print the row and type of each row
for row_tuple in pit_df.iterrows():
    print(row_tuple)
    print(type(row_tuple))


</div>



<p class="">Nice work! Since <code>.iterrows()</code> returns each DataFrame row as a tuple of (index, <code>pandas</code> Series) pairs, you can either split this tuple and use the index and row-values separately (as you did with <code>for i,row in pit_df.iterrows()</code>), or you can keep the result of <code>.iterrows()</code> in the tuple form (as you did with <code>for row_tuple in pit_df.iterrows()</code>).<br><br>If using <code>i,row</code>, you can access things from the row using square brackets (i.e., <code>row['Team']</code>). If using <code>row_tuple</code>, you would have to specify which element of the tuple you'd like to access before grabbing the team name (i.e., <code>row_tuple[1]['Team']</code>). <br><br> With either approach, using <code>.iterrows()</code> will still be substantially faster than using <code>.iloc</code> as you saw in the video.</p>
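The two access patterns described above can be sketched on a tiny DataFrame. The frame below is made up for illustration (it is not the real `pit_df`), and a non-default index is used so the index values are easy to tell apart from row positions.

```python
import pandas as pd

# Tiny illustrative frame (assumed values, not the baseball data)
df = pd.DataFrame({'Team': ['PIT', 'PIT'], 'Year': [2011, 2012]}, index=[10, 11])

# Pattern 1: unpack the tuple in the for statement
for i, row in df.iterrows():
    print(i, row['Team'], row['Year'])

# Pattern 2: keep the tuple and index into it
for row_tuple in df.iterrows():
    print(row_tuple[0], row_tuple[1]['Team'])

# Each item yielded by .iterrows() is a plain tuple of (index, Series)
first_tuple = next(df.iterrows())
print(type(first_tuple), type(first_tuple[1]))
```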



### Run differentials with .iterrows()


<div class>
<p>You've been hired by the San Francisco Giants as an analyst—congrats! The team's owner wants you to calculate a metric called the <em>run differential</em> for each season from the year 2008 to 2012. This metric is calculated by subtracting the total number of runs a team allowed in a season from the team's total number of runs scored in a season. <code>'RS'</code> means runs scored and <code>'RA'</code> means runs allowed.</p>
<p>The below function calculates this metric:</p>
<pre><code>def calc_run_diff(runs_scored, runs_allowed):

    run_diff = runs_scored - runs_allowed

    return run_diff
</code></pre>
<p>A DataFrame has been loaded into your session as <code>giants_df</code> and printed into the console. Let's practice using <code>.iterrows()</code> to add a <em>run differential</em> column to this DataFrame.</p>
</div>
<div>


In [None]:
# edited/added
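# .copy() makes giants_df an independent DataFrame, which avoids a possible
# SettingWithCopyWarning when the 'RD' column is added later
giants_df = baseball_df[(baseball_df.Team == 'SFG') & (baseball_df.Year.between(2008,2012))][['Team', 'League', 'Year', 'RS', 'RA', 'W', 'G', 'Playoffs']].copy()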

def calc_run_diff(runs_scored, runs_allowed):

    run_diff = runs_scored - runs_allowed

    return run_diff


</div>
<li>Create an empty list called <code>run_diffs</code> that will be used to store the <em>run differentials</em> you will calculate.</li>
<div>


In [None]:
# Create an empty list to store run differentials
run_diffs = []


</div>
<li>Write a for loop that uses <code>.iterrows()</code> to loop over <code>giants_df</code> and collects each row's runs scored and runs allowed.</li>
<div>


In [None]:
# Create an empty list to store run differentials
run_diffs = []

# Write a for loop and collect runs allowed and runs scored for each row
for i,row in giants_df.iterrows():
    runs_scored = row['RS']
    runs_allowed = row['RA']


</div>
<li>Add a line to the for loop that uses the provided function to calculate each row's <em>run differential</em>.</li>
<div>


In [None]:
# Create an empty list to store run differentials
run_diffs = []

# Write a for loop and collect runs allowed and runs scored for each row
for i,row in giants_df.iterrows():
    runs_scored = row['RS']
    runs_allowed = row['RA']
    
    # Use the provided function to calculate run_diff for each row
    run_diff = calc_run_diff(runs_scored, runs_allowed)


</div>
<li>Add a line to the loop that appends each row's <em>run differential</em> to the <code>run_diffs</code> list.</li>
<div>


In [None]:
# Create an empty list to store run differentials
run_diffs = []

# Write a for loop and collect runs allowed and runs scored for each row
for i,row in giants_df.iterrows():
    runs_scored = row['RS']
    runs_allowed = row['RA']
    
    # Use the provided function to calculate run_diff for each row
    run_diff = calc_run_diff(runs_scored, runs_allowed)
    
    # Append each run differential to the output list
    run_diffs.append(run_diff)

giants_df['RD'] = run_diffs
print(giants_df)


</div>



<p class="">Great job! Take a look at the <code>giants_df</code> DataFrame with the new run differential column (<code>'RD'</code>) you created (it has been printed in the console).<br><br>The <code>'Playoffs'</code> column tells you if a team made the playoffs for a given season. A <code>1</code> means that the team made the playoffs in that season and a <code>0</code> means the team did not make the playoffs in that season.<br><br>Did you notice that in the seasons with the highest run differentials the Giants made the playoffs? In fact, in both of these seasons (2010 and 2012), the San Francisco Giants not only made the playoffs but also won the World Series! Cool!</p>



## Another iterator method: .itertuples()





### Iterating with .itertuples()


<div class>
<p>Remember, <code>.itertuples()</code> returns each DataFrame row as a special data type called a <strong>namedtuple</strong>. You can look up an attribute within a namedtuple with a special syntax. Let's practice working with namedtuples.</p>
<p>A <code>pandas</code> DataFrame has been loaded into your session called <code>rangers_df</code>. This DataFrame contains the stats (<code>'Team'</code>, <code>'League'</code>, <code>'Year'</code>, <code>'RS'</code>, <code>'RA'</code>, <code>'W'</code>, <code>'G'</code>, and <code>'Playoffs'</code>) for the Major League Baseball team named the Texas Rangers (abbreviated as <code>'TEX'</code>).</p>
</div>
<div class="exercise--instructions__content"><p>Use <code>.itertuples()</code> to loop over <code>rangers_df</code> and print each row.</p></div>
<div>


In [None]:
# edited/added
rangers_df = baseball_df[(baseball_df.Team == 'TEX') & (baseball_df.Year.between(1973,2012))][['Team', 'League', 'Year', 'RS', 'RA', 'W', 'G', 'Playoffs']]

# Loop over the DataFrame and print each row
for row in rangers_df.itertuples():
    print(row)


</div>

<div class="exercise--instructions__content"><p>Loop over <code>rangers_df</code> with <code>.itertuples()</code> and save each row's <code>Index</code>, <code>Year</code>, and Wins (<code>W</code>) attribute as <code>i</code>, <code>year</code>, and <code>wins</code>.</p></div>
<div>


In [None]:
# Loop over the DataFrame and print each row's Index, Year and Wins (W)
for row in rangers_df.itertuples():
    i = row.Index
    year = row.Year
    wins = row.W
    print(i, year, wins)


</div>

<div class="exercise--instructions__content"><p>Now, loop over <code>rangers_df</code> and print these values <strong>only for those rows</strong> where the Rangers made the playoffs.</p></div>
<div>


In [None]:
# Loop over the DataFrame and print each row's Index, Year and Wins (W)
for row in rangers_df.itertuples():
    i = row.Index
    year = row.Year
    wins = row.W

    # Check if rangers made Playoffs (1 means yes; 0 means no)
    if row.Playoffs == 1:
        print(i, year, wins)


</div>



<p class="">Awesome! You're getting the hang of using <code>.itertuples()</code>. Remember, you need to use the <em>dot</em> syntax for referencing an attribute in a <strong>namedtuple</strong>.<br><br> You can create a new variable using a row's dot reference (as you did when storing <code>row.Index</code> as the variable <code>i</code>). Or you can use the row's dot reference directly to perform calculations and checks. Notice that you did not have to save <code>row.Playoffs</code> to a new variable in your check statement (you were able to use <code>row.Playoffs</code> directly in your check).<br><br> Did you notice the pattern in the Texas Rangers' playoff appearances? There are only six, in two distinct clusters (one from 1996 to 1999 and one from 2010 to 2012).</p>
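
To make the namedtuple behavior concrete, here is a small sketch. It assumes a made-up DataFrame (`demo_df`, with an arbitrary index of 10 and 11) rather than `rangers_df`, and shows the field names, dot access, positional access, and what happens when square brackets with a column name are tried:

```python
import pandas as pd

# Hypothetical data for illustration only (not the course's rangers_df)
demo_df = pd.DataFrame({'Year': [2011, 2012],
                        'W': [96, 93],
                        'Playoffs': [1, 1]},
                       index=[10, 11])

first = next(demo_df.itertuples())

# The namedtuple's fields are 'Index' followed by the column names
print(first._fields)

# Dot syntax looks up a field by name
print(first.Index, first.Year, first.W)

# Positional indexing and getattr() also work on a namedtuple
same_year = first[1]
same_wins = getattr(first, 'W')

# Looking up a column name with square brackets (as with .iterrows()) fails,
# because a namedtuple only accepts integer positions inside brackets
bracket_lookup_failed = False
try:
    first['Year']
except TypeError:
    bracket_lookup_failed = True
print(bracket_lookup_failed)
```

`getattr()` can be handy when the column name is only known at runtime, for example when it is stored in a variable.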



### Run differentials with .itertuples()


<div class>
<p>The New York Yankees have made a trade with the San Francisco Giants for your analyst contract. You're a hot commodity! Your new boss has seen your work with the Giants and now wants you to do something similar with the Yankees data. He'd like you to calculate <em>run differentials</em> for the Yankees from the year 1962 to the year 2012 and find the season in which they had the best <em>run differential</em>.</p>
<p>You've remembered the function you used when working with the Giants and quickly write it down:</p>
<pre><code>def calc_run_diff(runs_scored, runs_allowed):

    run_diff = runs_scored - runs_allowed

    return run_diff
</code></pre>
<p>Let's use <code>.itertuples()</code> to loop over the <code>yankees_df</code> DataFrame (which has been loaded into your session) and calculate <em>run differentials</em>.</p>
</div>
<div>


In [None]:
# edited/added
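# .copy() makes yankees_df an independent DataFrame, which avoids a possible
# SettingWithCopyWarning when the 'RD' column is added later
yankees_df = baseball_df[(baseball_df.Team == 'NYY') & (baseball_df.Year.between(1962,2012))][['Team', 'League', 'Year', 'RS', 'RA', 'W', 'G', 'Playoffs']].copy()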


</div>
<li>Use <code>.itertuples()</code> to loop over <code>yankees_df</code> and grab each row's runs scored and runs allowed values.</li>
<div>


In [None]:
run_diffs = []

# Loop over the DataFrame and calculate each row's run differential
for row in yankees_df.itertuples():
    
    runs_scored = row.RS
    runs_allowed = row.RA


</div>
<li>Now, calculate each row's <em>run differential</em> using <code>calc_run_diff()</code>. Be sure to append each row's <em>run differential</em> to <code>run_diffs</code>.</li>
<div>


In [None]:
run_diffs = []

# Loop over the DataFrame and calculate each row's run differential
for row in yankees_df.itertuples():
    
    runs_scored = row.RS
    runs_allowed = row.RA

    run_diff = calc_run_diff(runs_scored, runs_allowed)
    
    run_diffs.append(run_diff)


</div>
<li>Append a new column called <code>'RD'</code> to the <code>yankees_df</code> DataFrame that contains the <em>run differentials</em> you calculated.</li>
<div>


In [None]:
run_diffs = []

# Loop over the DataFrame and calculate each row's run differential
for row in yankees_df.itertuples():
    
    runs_scored = row.RS
    runs_allowed = row.RA

    run_diff = calc_run_diff(runs_scored, runs_allowed)
    
    run_diffs.append(run_diff)

# Append new column
yankees_df['RD'] = run_diffs
print(yankees_df)


</div>

<div class=""><ul>
<li>In what year within your DataFrame did the New York Yankees have the highest <em>run differential</em>?</li>
</ul>
<p><strong>You'll need to rerun the code that creates the <code>'RD'</code> column if you'd like to analyze the DataFrame with code rather than looking at the console output.</strong></p></div>



- [ ] In <strong>2011</strong> (with a <em>Run Differential</em> of <strong>210</strong>)
- [x] In <strong>1998</strong> (with a <em>Run Differential</em> of <strong>309</strong>)
- [ ] In <strong>1962</strong> (with a <em>Run Differential</em> of <strong>503</strong>)
- [ ] In <strong>1985</strong> (with a <em>Run Differential</em> of <strong>315</strong>)



<p class="">Great job! You used <code>.itertuples()</code> to help the Yankees calculate <em>run differentials</em>. Remember, using <code>.itertuples()</code> is just like using <code>.iterrows()</code> except it tends to be faster. You also have to use a <em>dot</em> reference when looking up attributes with <code>.itertuples()</code>.<br><br> You found that the Yankees' highest <em>run differential</em> was in 1998. Did you know they actually hold the record for the highest <em>run differential</em> in an MLB season (411 in the year 1939 where they scored 967 runs and allowed 556)? Wow!</p>
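
The speed difference can be checked outside of IPython with the standard library's `timeit` module. The sketch below assumes synthetic data (`demo_df`) and two hypothetical helper functions; actual timings depend on your machine and pandas version, so none are quoted here:

```python
import timeit
import pandas as pd

# Synthetic data for illustration: every row has a run differential of 50
demo_df = pd.DataFrame({'RS': range(700, 1200), 'RA': range(650, 1150)})

def diffs_iterrows(df):
    # Square-bracket lookup on each row Series
    return [row['RS'] - row['RA'] for _, row in df.iterrows()]

def diffs_itertuples(df):
    # Dot lookup on each namedtuple
    return [row.RS - row.RA for row in df.itertuples()]

# Both approaches should produce the same values
print(diffs_iterrows(demo_df) == diffs_itertuples(demo_df))

# Time five full passes over the DataFrame with each approach
t_rows = timeit.timeit(lambda: diffs_iterrows(demo_df), number=5)
t_tuples = timeit.timeit(lambda: diffs_itertuples(demo_df), number=5)
print(f'.iterrows():   {t_rows:.4f} s')
print(f'.itertuples(): {t_tuples:.4f} s')
```

One likely reason for the gap is that `.iterrows()` builds a new Series for every row, while `.itertuples()` builds lighter-weight namedtuples.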



## pandas alternative to looping





### Analyzing baseball stats with .apply()


<div class>
<p>The Tampa Bay Rays want you to analyze their data.</p>
<p>They'd like the following metrics:</p>
<ul>
<li>The sum of each column in the data</li>
<li>The total number of runs scored in a year (<code>'RS'</code> + <code>'RA'</code> for each year)</li>
<li>The <code>'Playoffs'</code> column in text format rather than using <code>1</code>'s and <code>0</code>'s</li>
</ul>
<p>The below function can be used to convert the <code>'Playoffs'</code> column to text:</p>
<pre><code>def text_playoffs(num_playoffs): 
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No' 
</code></pre>
<p>Use <code>.apply()</code> to get these metrics. A DataFrame (<code>rays_df</code>) has been loaded and printed to the console. This DataFrame is indexed on the <code>'Year'</code> column.</p>
</div>
<div class="exercise--instructions__content"><p>Apply <code>sum()</code> to each <strong>column</strong> of <code>rays_df</code> to collect the sum of each column. Be sure to specify the correct <code>axis</code>.</p></div>
<div>


In [None]:
# edited/added
def text_playoffs(num_playoffs): 
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No' 

rays_df = baseball_df[baseball_df.Team == 'TBR'][['Year', 'RS', 'RA', 'W', 'Playoffs']].set_index('Year')
rays_df.index.names = [None]

# Gather sum of all columns
stat_totals = rays_df.apply(sum, axis=0)
print(stat_totals)


</div>

<div class="exercise--instructions__content"><p>Apply <code>sum()</code> to each <strong>row</strong> of <code>rays_df</code>, only looking at the <code>'RS'</code> and <code>'RA'</code> columns, and specify the correct <code>axis</code>.</p></div>
<div>


In [None]:
# Gather total runs scored in all games per year
total_runs_scored = rays_df[['RS', 'RA']].apply(sum, axis=1)
print(total_runs_scored)


</div>

<div class="exercise--instructions__content"><p>Use <code>.apply()</code> and a <code>lambda</code> function to apply <code>text_playoffs()</code> to each <strong>row</strong>'s <code>'Playoffs'</code> value of the <code>rays_df</code> DataFrame.</p></div>
<div>


In [None]:
# Convert numeric playoffs to text by applying text_playoffs()
textual_playoffs = rays_df.apply(lambda row: text_playoffs(row['Playoffs']), axis=1)
print(textual_playoffs)


</div>



<p class="">Great work! The <code>.apply()</code> method lets you apply functions to all rows or columns of a DataFrame by specifying an axis.<br><br>If you've been using <code>pandas</code> for some time, you may have noticed that a better way to find these stats would use the <code>pandas</code> built-in <code>.sum()</code> method.<br><br> You could have used <code>rays_df.sum(axis=0)</code> to get columnar sums and <code>rays_df[['RS', 'RA']].sum(axis=1)</code> to get row sums.<br><br> You could have also used <code>.apply()</code> <strong>directly</strong> on a Series (or column) of the DataFrame. For example, you could use <code>rays_df['Playoffs'].apply(text_playoffs)</code> to convert the <code>'Playoffs'</code> column to text.</p>
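
The equivalences described above can be verified on a small example. This sketch assumes a made-up DataFrame (`demo_df`) indexed by year in place of `rays_df`; the numbers are illustrative only:

```python
import pandas as pd

def text_playoffs(num_playoffs):
    if num_playoffs == 1:
        return 'Yes'
    else:
        return 'No'

# Hypothetical stand-in for rays_df
demo_df = pd.DataFrame({'RS': [700, 800, 750],
                        'RA': [650, 820, 700],
                        'Playoffs': [1, 0, 1]},
                       index=[2010, 2011, 2012])

# Column sums: .apply(sum, axis=0) versus the built-in .sum() method
col_sums_apply = demo_df.apply(sum, axis=0)
col_sums_builtin = demo_df.sum(axis=0)

# Row sums over 'RS' and 'RA' only
row_sums_apply = demo_df[['RS', 'RA']].apply(sum, axis=1)
row_sums_builtin = demo_df[['RS', 'RA']].sum(axis=1)

# Row-wise .apply() with a lambda versus .apply() directly on the column
text_via_rows = demo_df.apply(lambda row: text_playoffs(row['Playoffs']), axis=1)
text_via_series = demo_df['Playoffs'].apply(text_playoffs)

print((col_sums_apply == col_sums_builtin).all())
print((row_sums_apply == row_sums_builtin).all())
print((text_via_rows == text_via_series).all())
```

Applying the function directly to the `'Playoffs'` Series avoids building a full row object for every year, so it is usually the lighter option when only one column is needed.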



### Settle a debate with .apply()


<div class>
<p>Word has gotten to the Arizona Diamondbacks about your awesome analytics skills. They'd like for you to help settle a debate amongst the managers. One manager claims that the team has made the playoffs every year they have had a win percentage of <code>0.50</code> or greater. Another manager says this is not true.</p>
<p>Let's use the below function and the <code>.apply()</code> method to see which manager is correct.</p>
<pre><code>def calc_win_perc(wins, games_played):
    win_perc = wins / games_played
    return np.round(win_perc,2)
</code></pre>
<p>A DataFrame named <code>dbacks_df</code> has been loaded into your session.</p>
</div>
<div>


In [None]:
# edited/added
def calc_win_perc(wins, games_played):
    win_perc = wins / games_played
    return np.round(win_perc,2)
  
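# .copy() makes dbacks_df an independent DataFrame, which avoids a possible
# SettingWithCopyWarning when the 'WP' column is added later
dbacks_df = baseball_df[(baseball_df.Team == 'ARI') & (baseball_df.Year.between(1998,2012))][['Team', 'League', 'Year', 'RS', 'RA', 'W', 'G', 'Playoffs']].copy()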


</div>
<li>Print the first five rows of the <code>dbacks_df</code> DataFrame to see what the data looks like.</li>
<div>


In [None]:
# Display the first five rows of the DataFrame
print(dbacks_df.head())


</div>
<li>Create a <code>pandas</code> Series called <code>win_percs</code> by <em>applying</em> the <code>calc_win_perc()</code> function to each <strong>row</strong> of the DataFrame with a <code>lambda</code> function.</li>
<div>


In [None]:
# Create a win percentage Series 
win_percs = dbacks_df.apply(lambda row: calc_win_perc(row['W'], row['G']), axis=1)
print(win_percs, '\n')


</div>
<li>Create a new column in <code>dbacks_df</code> called <code>WP</code> that contains the win percentages you calculated in the above step.</li>
<div>


In [None]:
# Append a new column to dbacks_df
dbacks_df['WP'] = win_percs
print(dbacks_df, '\n')


</div>
<div>


In [None]:
# Display dbacks_df where WP is greater than or equal to 0.50
print(dbacks_df[dbacks_df['WP'] >= 0.50])


</div>

<div class=""><ul>
<li>Which manager was correct in their claim?</li>
</ul></div>



- [ ] The manager who claimed the team <strong>made</strong> the playoffs every year they've had a win percentage of <code>0.50</code> or greater.
- [x] The manager who claimed the team <strong>has not made</strong> the playoffs every year they've had a win percentage of <code>0.50</code> or greater.
- [ ] Both managers are crazy! The Arizona Diamondbacks have never made the playoffs.



<p class="">Nicely done! Using the <code>.apply()</code> method with a <code>lambda</code> function allows you to apply a function to a DataFrame without the need to write a for loop.<br><br>Sadly, the second manager was correct. In 2000, 2003, 2008, and 2012, the Arizona Diamondbacks had a win percentage greater than or equal to 0.50 but still <strong>did not</strong> make the playoffs.</p>
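
Rather than reading the printed rows by eye, the first manager's claim can be tested with a boolean filter that looks for counterexamples. This is a sketch under stated assumptions: `demo_df`, its `'Season'` labels, and its numbers are invented for illustration and are not the Diamondbacks data.

```python
import numpy as np
import pandas as pd

def calc_win_perc(wins, games_played):
    win_perc = wins / games_played
    return np.round(win_perc, 2)

# Hypothetical seasons; values are made up for illustration
demo_df = pd.DataFrame({'Season': ['A', 'B', 'C'],
                        'W': [90, 70, 84],
                        'G': [162, 162, 162],
                        'Playoffs': [1, 0, 0]})

# Same pattern as the exercise: apply calc_win_perc() to each row
demo_df['WP'] = demo_df.apply(lambda row: calc_win_perc(row['W'], row['G']), axis=1)

# A counterexample is a season with WP >= 0.50 that missed the playoffs
winning_season = demo_df['WP'] >= 0.50
missed_playoffs = demo_df['Playoffs'] == 0
counterexamples = demo_df[winning_season & missed_playoffs]

# The claim "every winning season made the playoffs" holds only if
# no counterexample exists
claim_holds = counterexamples.empty
print(counterexamples)
print(claim_holds)
```

In this made-up data, season `'C'` has a winning record but no playoff appearance, so the claim fails.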



## Optimal pandas iterating





### Replacing .iloc with underlying arrays


<div class>
<p>Now that you have a better grasp of a DataFrame's internals, let's update one of your previous analyses to leverage a DataFrame's underlying arrays. You'll revisit the win percentage calculations you performed row by row with the <code>.iloc</code> indexer:</p>
<pre><code>def calc_win_perc(wins, games_played):
    win_perc = wins / games_played
    return np.round(win_perc,2)

win_percs_list = []

for i in range(len(baseball_df)):
    row = baseball_df.iloc[i]

    wins = row['W']
    games_played = row['G']

    win_perc = calc_win_perc(wins, games_played)

    win_percs_list.append(win_perc)

baseball_df['WP'] = win_percs_list
</code></pre>
<p>Let's update this analysis to use arrays instead of <code>.iloc</code>. A DataFrame (<code>baseball_df</code>) has been loaded into your session.</p>
</div>

<li>Use <em>the right method</em> to collect the underlying <code>'W'</code> and <code>'G'</code> arrays of <code>baseball_df</code> and pass them <strong>directly into</strong> the <code>calc_win_perc()</code> function. Store the result as a variable called <code>win_percs_np</code>.</li>
<div>


In [None]:
# Use the W array and G array to calculate win percentages
win_percs_np = calc_win_perc(baseball_df['W'].values, baseball_df['G'].values)


</div>
<li>Create a new column in <code>baseball_df</code> called <code>'WP'</code> that contains the win percentages you just calculated.</li>
<div>


In [None]:
# Use the W array and G array to calculate win percentages
win_percs_np = calc_win_perc(baseball_df['W'].values, baseball_df['G'].values)

# Append a new column to baseball_df that stores all win percentages
baseball_df['WP'] = win_percs_np

print(baseball_df.head())


</div>

<div class=""><p>Use <code>timeit</code> in <em>cell magic mode</em> <strong>within your IPython console</strong> to compare the runtimes between the old code block using <code>.iloc</code> and the new code you developed using NumPy arrays.</p>
<p><strong>When timing, don't include the code that defines the <code>calc_win_perc()</code> function or the <code>print()</code> statements</strong>.</p>
<p>You should include <strong>eight lines of code</strong> when timing the old code block and <strong>two lines of code</strong> when timing the new code you developed. You may need to press <code>SHIFT+ENTER</code> when using <code>timeit</code> in <em>cell magic mode</em> to get to a new line within your IPython console.</p>
<p><strong>Which approach was faster?</strong></p></div>



- [ ] The original code with <code>.iloc</code> is much faster than using NumPy arrays
- [ ] The old code block with <code>.iloc</code> and the new code with NumPy arrays have similar runtimes.
- [x] The NumPy array approach is faster than the <code>.iloc</code> approach.



<p class="">Great job! You're knocking it out of the park! Using a DataFrame's underlying arrays to perform calculations can really speed up your code and yields some significant efficiency gains. Did you notice that the NumPy array approach was not just faster, but that it also used fewer lines of code and was easier to read?</p>
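
The same comparison can be run as a plain script with the standard library's `timeit` module instead of the `%%timeit` magic. The sketch below assumes randomly generated data (`demo_df`) and two hypothetical wrapper functions. It uses `.to_numpy()`, which the pandas documentation recommends over `.values`; for these numeric columns both return a NumPy array.

```python
import timeit
import numpy as np
import pandas as pd

def calc_win_perc(wins, games_played):
    win_perc = wins / games_played
    return np.round(win_perc, 2)

# Synthetic stand-in for baseball_df: 1,000 seasons of 162 games each
rng = np.random.default_rng(0)
demo_df = pd.DataFrame({'W': rng.integers(40, 120, size=1000), 'G': 162})

def win_percs_iloc(df):
    # Row-by-row lookup with .iloc, as in the original code block
    win_percs_list = []
    for i in range(len(df)):
        row = df.iloc[i]
        win_percs_list.append(calc_win_perc(row['W'], row['G']))
    return win_percs_list

def win_percs_arrays(df):
    # One vectorized call on the underlying NumPy arrays
    return calc_win_perc(df['W'].to_numpy(), df['G'].to_numpy())

# Both approaches should agree on every value
print(np.allclose(win_percs_iloc(demo_df), win_percs_arrays(demo_df)))

t_iloc = timeit.timeit(lambda: win_percs_iloc(demo_df), number=3)
t_arrays = timeit.timeit(lambda: win_percs_arrays(demo_df), number=3)
print(f'.iloc loop:   {t_iloc:.4f} s')
print(f'NumPy arrays: {t_arrays:.4f} s')
```

Exact timings vary by machine, which is why the script prints them rather than quoting numbers here.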



### Bringing it all together: Predict win percentage


<div class>
<p>A <code>pandas</code> DataFrame (<code>baseball_df</code>) has been loaded into your session. For convenience, a dictionary describing each column within <code>baseball_df</code> has been printed into your console. You can reference these descriptions throughout the exercise.</p>
<p>You'd like to attempt to <em>predict</em> a team's win percentage for a given season by using the team's total runs scored in a season (<code>'RS'</code>) and total runs allowed in a season (<code>'RA'</code>) with the following function:</p>
<pre><code>def predict_win_perc(RS, RA):
    prediction = RS ** 2 / (RS ** 2 + RA ** 2)
    return np.round(prediction, 2)
</code></pre>
<p>Let's compare the approaches you've learned to calculate a <em>predicted win percentage</em> for each season (or row) in your DataFrame.</p>
</div>
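
Before comparing approaches, it may help to work through the formula once by hand. The sketch below uses made-up run totals (800 scored, 700 allowed) and then passes whole arrays through the same function, since the arithmetic operators broadcast over NumPy arrays:

```python
import numpy as np

def predict_win_perc(RS, RA):
    prediction = RS ** 2 / (RS ** 2 + RA ** 2)
    return np.round(prediction, 2)

# Single season with illustrative values:
# 800**2 = 640000 and 700**2 = 490000,
# so 640000 / (640000 + 490000) is roughly 0.566, which rounds to 0.57
single = predict_win_perc(800, 700)
print(single)

# The same function works element-wise on arrays without any loop
rs = np.array([800, 700, 750])
ra = np.array([700, 800, 750])
many = predict_win_perc(rs, ra)
print(many)
```

A team that scores exactly as many runs as it allows gets a prediction of 0.5, and swapping `RS` and `RA` mirrors the prediction around 0.5.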
<div>


In [None]:
# edited/added
def predict_win_perc(RS, RA):
    prediction = RS ** 2 / (RS ** 2 + RA ** 2)
    return np.round(prediction, 2)


</div>
<li>Use a for loop and <code>.itertuples()</code> to predict the win percentage for each row of <code>baseball_df</code> with the <code>predict_win_perc()</code> function. Save each row's predicted win percentage as <code>win_perc_pred</code> and append each to the <code>win_perc_preds_loop</code> list.</li>
<div>


In [None]:
win_perc_preds_loop = []

# Use a loop and .itertuples() to collect each row's predicted win percentage
for row in baseball_df.itertuples():
    runs_scored = row.RS
    runs_allowed = row.RA
    win_perc_pred = predict_win_perc(runs_scored, runs_allowed)
    win_perc_preds_loop.append(win_perc_pred)


</div>
<li>Apply <code>predict_win_perc()</code> to each row of the <code>baseball_df</code> DataFrame using a <code>lambda</code> function. Save the predicted win percentage as <code>win_perc_preds_apply</code>.</li>
<div>


In [None]:
win_perc_preds_loop = []

# Use a loop and .itertuples() to collect each row's predicted win percentage
for row in baseball_df.itertuples():
    runs_scored = row.RS
    runs_allowed = row.RA
    win_perc_pred = predict_win_perc(runs_scored, runs_allowed)
    win_perc_preds_loop.append(win_perc_pred)

# Apply predict_win_perc to each row of the DataFrame
win_perc_preds_apply = baseball_df.apply(lambda row: predict_win_perc(row['RS'], row['RA']), axis=1)


</div>
<li>Calculate the predicted win percentages by passing the underlying <code>'RS'</code> and <code>'RA'</code> <strong>arrays</strong> from <code>baseball_df</code> into <code>predict_win_perc()</code>. Save these predictions as <code>win_perc_preds_np</code>.</li>
<div>


In [None]:
win_perc_preds_loop = []

# Use a loop and .itertuples() to collect each row's predicted win percentage
for row in baseball_df.itertuples():
    runs_scored = row.RS
    runs_allowed = row.RA
    win_perc_pred = predict_win_perc(runs_scored, runs_allowed)
    win_perc_preds_loop.append(win_perc_pred)

# Apply predict_win_perc to each row of the DataFrame
win_perc_preds_apply = baseball_df.apply(lambda row: predict_win_perc(row['RS'], row['RA']), axis=1)

# Calculate the win percentage predictions using NumPy arrays
win_perc_preds_np = predict_win_perc(baseball_df['RS'].values, baseball_df['RA'].values)
baseball_df['WP_preds'] = win_perc_preds_np
print(baseball_df.head())


</div>

<div class=""><p>Compare runtimes <strong>within your IPython console</strong> between <strong>all three</strong> approaches used to calculate the predicted win percentages.</p>
<p>Use <strong><code>%%timeit</code></strong> (<em>cell magic mode</em>) to time the <strong>six lines of code</strong> (not including comment lines) for the <code>.itertuples()</code> approach. You may need to press <code>SHIFT+ENTER</code> after entering <code>%%timeit</code> to get to a new line within your IPython console.</p>
<p>Use <strong><code>%timeit</code></strong> (<em>line magic mode</em>) to time the <code>.apply()</code> approach and the NumPy array approach separately. Each has only <strong>one line of code</strong> (not including comment lines). </p>
<p><strong>What is the order of approaches from fastest to slowest?</strong></p></div>



- [ ] The <code>.apply()</code> with a <code>lambda</code> function was the <strong>fastest</strong>, followed by the <code>.itertuples()</code> approach, and the array approach was <strong>slowest</strong>.
- [x] Using NumPy arrays was the <strong>fastest</strong> approach, followed by the <code>.itertuples()</code> approach, and the <code>.apply()</code> approach was <strong>slowest</strong>.
- [ ] The <code>.itertuples()</code> approach was <strong>fastest</strong>, followed by the array approach, and the <code>.apply()</code> approach was <strong>slowest</strong>.
- [ ] All three approaches had comparable runtimes.



<p class="">Great job! That's a home run! You practiced using three different approaches to iterate over a <code>pandas</code> DataFrame and perform calculations. Did you notice that the <code>.itertuples()</code> approach beat the <code>.apply()</code> approach? Even though both these implementations can be useful, you should default to using a DataFrame's underlying arrays to perform calculations.<br><br>Take a look at your win percentage predictions (column <code>'WP_preds'</code>) and compare them to the actual win percentages (column <code>'WP'</code>). Not bad!<br><br>You've done a great job throughout the course! Now, you are well on your way to writing efficient Python and <code>pandas</code> code!</p>



## Congratulations!

### Congratulations!

Congratulations on completing the course! Now, you have the necessary tools to start writing efficient Python code!

### What you have learned

Over the four chapters of this course, you have learned what writing efficient code truly means, and that writing Pythonic code often yields efficient code. You've explored Python's Standard Library and practiced using built-in functions like range, enumerate, and map. You know the power of NumPy arrays and can use them for fast, efficient calculations. You're a whiz at using magic commands like %timeit and know how to profile your code with the line_profiler and memory_profiler packages. You've also applied more advanced techniques to gain efficiencies by using built-in functions like zip, built-in modules like itertools and collections, and a branch of mathematics called set theory. Finally, you explored looping patterns in Python and why they are not always the most efficient approach to solving problems. You successfully eliminated loops in your code and even learned how to efficiently iterate over pandas DataFrames.

### Well done!

Well done! It has been an absolute pleasure working with you! Thank you for taking the course, and I hope to see you again in the future!
