# An introduction to for-loops

## aka how we go from specific code to general code

As we saw with the concordance exercise from last week, for-loops (and any type of loop, really) can be very useful but also a bit confusing to create. In this notebook, I want to give you an example of what I mean when I say "make the code work on one example, then copy-paste it into a loop and make it general". <br>
We'll also go through how we build code from the bottom up when we only have an idea of where we want to end up, such as solving some type of problem or getting a specific output. <br>
Let's go!

### make a list of books and their length 

This exercise might sound familiar, and that's because it is! I have stolen it from Ethan's concordance exercise last week, because it is a good example to start with I think :-) <br>
So if we imagine Ethan hadn't been as nice as he is, and he had just written a challenge asking us to _"make a list of book titles and the corresponding book length"_, how would we go about that?

In [None]:
# what's the first step ? 

# packages
%pip install nltk 

import nltk
import ssl


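# workaround for SSL certificate errors that can stop nltk.download from working:
# if this Python version has ssl._create_unverified_context, use it as the default context
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context

# data
nltk.download('gutenberg')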

# now that that is sorted, let's do it

It is easy to get overwhelmed when looking at some sort of end product, because you want to immediately start writing loops and stuff. But! It is usually easier and faster to start with (in this case) a single book and make sure you have code that works, which we can then generalize to all the books.

In [None]:
# so let's start with the first book in the list of books: 

# making a list of the books we want to work with - what is in it?
books = nltk.corpus.gutenberg.fileids()
print(books)

In [None]:
# how do we get the first book in the list? and what is in it?
book1 = books[0]
print(book1)

# how do we get any specific book in the list? (index 6 is the seventh book, since we count from 0)
book2 = books[6]
print(book2)

In [None]:
# now, the challenge said to have a list of titles, so how do we get the title from this single book?
title1 = book1
print(title1)

In [None]:
# alright, we have the title, now we want the length - how do we do that? 
# We steal some code! Specifically, Ethan's

book1_length = len(book1) # what happens when I run this code? why won't this give us the result we want?
print(book1_length)

# answer: it gives us the number of characters in the string 'austen-emma.txt'
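To make the difference concrete, here is a small sketch of how `len()` behaves on a string versus a list. The `sample_text` value is just a short stand-in for a whole book, so that the example runs without the corpus.

```python
# len() counts whatever the object is made of:
# characters for a string, items for a list

file_name = 'austen-emma.txt'   # the file name string from the cell above
sample_text = 'emma woodhouse, handsome, clever, and rich'  # short stand-in for a book

name_length = len(file_name)                # characters in the file name
text_length = len(sample_text)              # characters in the text (spaces included)
word_count = len(sample_text.split(' '))    # items in the list of words

print(name_length, text_length, word_count)
```

So the same function gives very different answers depending on whether we hand it a file name, a raw text, or a list of words.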

In [None]:
# so what we want to do instead is:

book1 = nltk.corpus.gutenberg.raw(title1) 

# remember that before running the above code, title1 contained the same as book1
# but now that we have overwritten book1, what does each of them hold?

print(title1)
print(book1[:50]) # slicing because the output is otherwise too big
len(book1)
# why does this output look the way it does?
# answer: it's counting the number of characters in the whole book (including spaces)

In [None]:
# but we want to count words! 
# so, let's steal some more code!

# we add this line again, so that we can run this cell over and over without error
# (which would be due to overwriting)
book1 = nltk.corpus.gutenberg.raw(title1) 

# From Ethan's notebook:

# make all characters lowercase
book1 = book1.lower()

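# remove the "\n" and "\r" characters, which indicate line breaks in the text (newlines)
# note: removing them outright glues the last word of each line to the first word of the next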
book1 = book1.replace('\n', '') 
book1 = book1.replace('\r', '') 


# split up the text into a long list of individual words
book1 = book1.split(' ')

# How should we modify this code to work for us? 
# answer: we swapped out all instances of 'bible' with 'book1'

In [None]:
# notice again that we have overwritten book1 - so when we print this out, the output is now different
print(book1[100:150]) 

In [None]:
# and what happens now when we do this?

book1_length = len(book1)
print(book1_length)

# yaaay!! the number of words!! (with some artefacts due to the way we removed line breaks and split on spaces, but that is beside the point)
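If you are curious where artefacts like these can come from, here is a small sketch on a made-up two-line sample (not taken from the corpus). Calling `split()` with no argument is shown as one possible alternative; it is not what the notebook uses.

```python
# made-up sample: note the double space and the newline
sample = 'she was  the youngest\nof the two daughters'

# deleting the newline glues 'youngest' and 'of' into one word
glued = sample.replace('\n', '').split(' ')

# replacing the newline with a space keeps the words apart,
# but split(' ') still yields an empty string for the double space
spaced = sample.replace('\n', ' ').split(' ')

# split() with no argument splits on any run of whitespace
clean = sample.split()

print(len(glued), glued)
print(len(spaced), spaced)
print(len(clean), clean)
```

None of this changes the for-loop logic that follows; it only affects how exact the word counts are.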

### So far so good! We have code that now prints the title and the length of a specific book - let's make it a for-loop

In [None]:
# let's try a simple thing
for a in books:
    print(a)
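The name after `for` is just a variable that holds one item of the list at a time, so `a` could be any name. A minimal sketch, with a hypothetical three-item list standing in for `books`:

```python
sample_books = ['book-one.txt', 'book-two.txt', 'book-three.txt']  # hypothetical file names

# the loop does the same as running the two indented lines by hand
# for sample_books[0], sample_books[1] and sample_books[2]
seen = []
for file_name in sample_books:
    print(file_name)
    seen.append(file_name)
```

Picking a descriptive name such as `file_name` or `title` makes it easier to paste in code that was written for a single book.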


In [None]:
# now we copy our code and modify it, and then check whether that works
for i in books:
    title = i
    book = nltk.corpus.gutenberg.raw(title)
    book = book.lower()
    book = book.replace('\n', '') 
    book = book.replace('\r', '') 
    book = book.split(' ')
    book_length = len(book)
    
    print(title, book_length)



In [None]:
# and now we get the whole thing: the two lists zipped together
titles = []
book_lengths = []

for title in books: 
    book = nltk.corpus.gutenberg.raw(title)
    book = book.lower()
    book = book.replace('\n', '') 
    book = book.replace('\r', '') 
    book = book.split(' ')
    book_length = len(book)
    
    titles.append(title)
    book_lengths.append(book_length)

output = list(zip(titles, book_lengths))
output
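`zip` pairs up the items of the two lists by position, and `list` turns the result into a list of tuples. Here is a sketch with made-up numbers (not real book lengths); the dictionary at the end is an optional alternative, not part of the exercise.

```python
sample_titles = ['book-one.txt', 'book-two.txt']   # hypothetical titles
sample_lengths = [1200, 850]                       # made-up lengths

# pair the first title with the first length, the second with the second, ...
pairs = list(zip(sample_titles, sample_lengths))
print(pairs)

# each pair is a tuple, so we can unpack it in a loop
for title, length in pairs:
    print(title, length)

# alternative: a dictionary lets us look up a length by title
lengths_by_title = dict(pairs)
print(lengths_by_title['book-two.txt'])
```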

In [None]:
# advanced – I didn't go through this in class, but it might be of some interest to some of you
# DON'T PANIC IF YOU DON'T GET THIS, there's a reason I call it 'advanced' :-) 

output = []

for title in books:
    book = nltk.corpus.gutenberg.raw(title)
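    # same steps as before, chained on one line
    # note: line breaks become spaces here instead of being removed, so the counts come out higher than in the cell above
    book = book.lower().replace('\n', ' ').replace('\r', ' ').split(' ')
    # title[:-4] drops the last four characters, i.e. the '.txt' extension
    output.append((title[:-4], len(book)))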

output
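Another way to go from specific to general is to wrap the single-book steps in a function and call it once per book. The sketch below assumes a hypothetical dictionary `sample_texts` in place of the NLTK corpus, so that it runs without any downloads; `count_words` and `strip_extension` are names made up for this example.

```python
import os

# hypothetical stand-in for the corpus: file name -> raw text
sample_texts = {
    'book-one.txt': 'It was a\nshort book',
    'book-two.txt': 'An even shorter one',
}

def count_words(raw_text):
    """Lowercase the text, turn line breaks into spaces, and count the pieces."""
    words = raw_text.lower().replace('\n', ' ').replace('\r', ' ').split(' ')
    return len(words)

def strip_extension(file_name):
    """Drop the file extension; like title[:-4], but not tied to '.txt'."""
    return os.path.splitext(file_name)[0]

# a list comprehension is a compact way to write the append-in-a-loop pattern
output = [(strip_extension(name), count_words(text))
          for name, text in sample_texts.items()]
print(output)
```

The function holds the "works on one example" code, and the loop (here a comprehension) is the part that makes it general.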
