# More on the * operator
In the previous notebook we used the "\*" operator to "unlist" a list of lists, and that was fairly confusing. I'd like to give a few more examples of what it does.

In [52]:
# Using the * operator takes elements in a list out of the list so we can do things with them
a = ['Every', 'good', 'boy', 'deserves', 'fudge']
print(a)
print(*a)

['Every', 'good', 'boy', 'deserves', 'fudge']
Every good boy deserves fudge
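To make "taking elements out of the list" a bit more concrete, here is a small sketch. The function `count_arguments` is a hypothetical helper made up for this illustration (it is not part of the original notebook); it simply reports how many separate arguments it was handed.

```python
def count_arguments(*args):
    # Hypothetical helper: accepts any number of arguments
    # and returns how many it received.
    return len(args)

a = ['Every', 'good', 'boy', 'deserves', 'fudge']

# Without the *, the function receives ONE argument: the whole list.
print(count_arguments(a))

# With the *, the function receives FIVE arguments: one per word.
print(count_arguments(*a))

# So print(*a) behaves like typing every element out by hand.
print(*a)
print(a[0], a[1], a[2], a[3], a[4])
```

The last two calls print the same line, which is why the brackets and quotes disappear in the output of `print(*a)`.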


## Range
We can make a list of numbers by specifying a "range", that is, its starting and ending points. Remember, Python counts in offsets from zero (zero-indexed), so the tenth integer in a range that starts at zero is the number 9.

## Function arguments
The inputs we feed into a function are called "arguments". If we give "range" one argument (in this case "10"), it interprets this as a request for integers from zero up to, but not including, that argument. In this case, because Python is zero-indexed, the argument "10" gives us the integers 0 through 9.

In [53]:
a = list(range(10))
print(a)

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


## Two arguments for "range"

If we give range two arguments, it interprets these as the starting and stopping points. The start is included; the stop is not.

In [55]:
# Make a list of numbers from 1 up to (but not including) 6
a = list(range(1, 6))
print(a)

[1, 2, 3, 4, 5]


## Three arguments for "range"

If we give range three arguments, it interprets the first and second as the starting and stopping indices, and the third one as the "step size".

In [56]:
#Make a list of integers between index 0 and 20, in jumps of 5
a = list(range(0, 20, 5))
print(a)

#Make a list of integers between index 0 and 20, in jumps of 2
b = list(range(0, 20, 2))
print(b)

[0, 5, 10, 15]
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
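A short sketch of two details that are easy to trip over with the three-argument form: the stop value itself is never included, and the step may be negative. The variable names below are made up for this illustration.

```python
# The stop value (20) is excluded, so the list ends at 15.
up_to_20 = list(range(0, 20, 5))
print(up_to_20)

# To include 20, the stop has to be pushed past it.
through_20 = list(range(0, 21, 5))
print(through_20)

# A negative step counts downward; the stop (0) is still excluded.
countdown = list(range(10, 0, -2))
print(countdown)
```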


## Dynamically generated arguments

In some situations it is useful to pass a variable as an argument instead of typing a number directly into the range function call. Why would you want to do this? In the case of "range", the function expects a certain number of input arguments, but the values themselves do not have to be typed in by hand. Sometimes we don't know ahead of time what the arguments we want to plug into a function are going to be; they might be extracted from our data, for instance. In a case like this, we can enter variables as arguments.

In [69]:
# generate a random integer between 2 and 10 (both ends included)
from random import randint
random_number = randint(2,10)
print(random_number)

# make a list of integers with the random number as input
a = list(range(random_number))
print(a)

7
[0, 1, 2, 3, 4, 5, 6]
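The random number above stands in for a value we only learn at run time. As a sketch of the "extracted from our data" case, the example below uses the length of a word list as the argument to `range`. The list `words` is made-up sample data for this illustration.

```python
# Hypothetical sample data; in practice this might come from a transcript.
words = ['far', 'out', 'in', 'the', 'uncharted', 'backwaters']

# The argument to range is computed from the data, not typed in by hand.
n_words = len(words)
positions = list(range(n_words))
print(positions)

# Each position can then be used to look up the matching word.
for i in positions:
    print(i, words[i])
```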


## Lists as arguments

This works great, but sometimes it would be nice to input the elements of a list as the arguments to a function. Why would we want to do this? There are situations in which a function will accept any number of arguments (unlike range, which wants between 1 and 3 arguments). These arguments could be generated in some way by our script, perhaps based on the number of lines in our transcript or the number of words in an utterance, and we don't know ahead of time how many there will be. In a situation like this, it would be nice to collect our arguments in a list and then feed them into the function. This is what we want to do in the n-gram function below, and it is the reason we will need the * operator. So what happens if we try to input a list of numbers as arguments to "range"?

In [71]:
# make a list of integers starting and stopping at the indices in "start_and_stop"
start_and_stop = [1, 6]
print(start_and_stop)
a = list(range(start_and_stop))
print(a)

[1, 6]


TypeError: 'list' object cannot be interpreted as an integer

## Error!
Python doesn't like this, because "range" wants integers: we can't feed it a list. That gives it indigestion! On the other hand, if we use a "\*", it takes the integers in the list out of the list and feeds them into the "range" function, just the way we want.

In [72]:
# make a list of integers starting and stopping at the indices in "start_and_stop"
start_and_stop = [1, 6]
a = list(range(*start_and_stop))
print(a)

[1, 2, 3, 4, 5]
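As a quick check, the sketch below compares the starred call with two ways of writing the same thing by hand. The names `unpacked`, `typed_out`, and `explicit` are made up for this illustration.

```python
start_and_stop = [1, 6]

# Let the * pull the two integers out of the list...
unpacked = list(range(*start_and_stop))

# ...which does the same job as picking them out one at a time...
typed_out = list(range(start_and_stop[0], start_and_stop[1]))

# ...or typing the numbers in directly.
explicit = list(range(1, 6))

print(unpacked == typed_out == explicit)
```

All three lists are identical; the starred version simply saves us from spelling out each index.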


As long as the number of integers in the input list is within what "range" expects (one to three), this works.

In [74]:
# make a list of integers that starts, stops, and steps by the values given in "start_stop_step"
start_stop_step = [0, 20, 5]
print(list(range(*start_stop_step)))

[0, 5, 10, 15]
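The flip side is that the \* does no checking of its own: if the list holds more (or fewer) items than "range" accepts, Python raises a TypeError. The sketch below shows this; `try_range` is a hypothetical helper written only for this illustration.

```python
def try_range(args):
    """Hypothetical helper: return list(range(*args)), or None if range rejects the arguments."""
    try:
        return list(range(*args))
    except TypeError as err:
        # range complains when it gets the wrong number of arguments
        print('TypeError:', err)
        return None

print(try_range([0, 20, 5]))     # three arguments: fine
print(try_range([0, 20, 5, 2]))  # four arguments: too many for range
print(try_range([]))             # no arguments: too few for range
```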


## Using text arguments for zip function 
Now let's look at the n-gram problem again. First we need to get our text ready for input

In [75]:
from string import punctuation as pnc
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small, unregarded yellow sun'
text = text.lower()
text = ''.join(x for x in text if x not in pnc)
text = text.split()
print(text)

['far', 'out', 'in', 'the', 'uncharted', 'backwaters', 'of', 'the', 'unfashionable', 'end', 'of', 'the', 'western', 'spiral', 'arm', 'of', 'the', 'galaxy', 'lies', 'a', 'small', 'unregarded', 'yellow', 'sun']


Next we want to make a dynamically generated list of lists, in which each list is our text offset by one more word than the list before it. Because we could feed any size text into this script and pick any "n", we don't necessarily know ahead of time what the lists in "a" will look like: the number of lists is set by "n", and their lengths depend on the number of words in the text.

In [80]:
n = 3
a = [text[i:] for i in range(n)]

# show us the new lists, separated by a blank line, so we can see what we are working with
for val in a:
    print(val)
    print(' ')

['far', 'out', 'in', 'the', 'uncharted', 'backwaters', 'of', 'the', 'unfashionable', 'end', 'of', 'the', 'western', 'spiral', 'arm', 'of', 'the', 'galaxy', 'lies', 'a', 'small', 'unregarded', 'yellow', 'sun']
 
['out', 'in', 'the', 'uncharted', 'backwaters', 'of', 'the', 'unfashionable', 'end', 'of', 'the', 'western', 'spiral', 'arm', 'of', 'the', 'galaxy', 'lies', 'a', 'small', 'unregarded', 'yellow', 'sun']
 
['in', 'the', 'uncharted', 'backwaters', 'of', 'the', 'unfashionable', 'end', 'of', 'the', 'western', 'spiral', 'arm', 'of', 'the', 'galaxy', 'lies', 'a', 'small', 'unregarded', 'yellow', 'sun']
 


Now we want to zip all of these lists together, so that we can get groups of words (n-grams). If we want a tri-gram, we could do this:

In [81]:
b = zip(a[0], a[1], a[2])

for val in b:
    print(val)

('far', 'out', 'in')
('out', 'in', 'the')
('in', 'the', 'uncharted')
('the', 'uncharted', 'backwaters')
('uncharted', 'backwaters', 'of')
('backwaters', 'of', 'the')
('of', 'the', 'unfashionable')
('the', 'unfashionable', 'end')
('unfashionable', 'end', 'of')
('end', 'of', 'the')
('of', 'the', 'western')
('the', 'western', 'spiral')
('western', 'spiral', 'arm')
('spiral', 'arm', 'of')
('arm', 'of', 'the')
('of', 'the', 'galaxy')
('the', 'galaxy', 'lies')
('galaxy', 'lies', 'a')
('lies', 'a', 'small')
('a', 'small', 'unregarded')
('small', 'unregarded', 'yellow')
('unregarded', 'yellow', 'sun')
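Notice that the output stops at ('unregarded', 'yellow', 'sun') even though the three lists have different lengths. That is because zip stops as soon as its shortest input runs out. The sketch below shows the same thing on a shortened word list; `words`, `offsets`, and `lengths` are names made up for this illustration.

```python
# Shortened, hypothetical text so the lists are easy to read.
words = ['far', 'out', 'in', 'the', 'uncharted']
n = 3

# Each list starts one word later than the one before, so each is one word shorter.
offsets = [words[i:] for i in range(n)]
lengths = [len(x) for x in offsets]
print(lengths)

# zip stops when the shortest list (the last one) is used up.
trigrams = list(zip(offsets[0], offsets[1], offsets[2]))
print(trigrams)

# So the number of n-grams is len(words) - n + 1.
print(len(trigrams) == len(words) - n + 1)
```

For the 24-word text above, the same arithmetic gives 24 - 3 + 1 = 22 tri-grams, which matches the 22 tuples printed.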


That works great if we know we want tri-grams every time. But it would be even sweeter if we could just decide on the fly how many words we want grouped together. If we want a 7-gram, we don't want to have to type in:

    b = zip(a[0], a[1], a[2], a[3], ...........etc)
    
This is where the \* comes in. It takes all of these lists out of our list of lists and feeds them into zip, which zips them together. Now all we have to do is change the line "n = " to whatever number we want, and Python takes care of the rest.

In [82]:
n = 3
a = zip(*[text[i:] for i in range(n)])
print(*a)

('far', 'out', 'in') ('out', 'in', 'the') ('in', 'the', 'uncharted') ('the', 'uncharted', 'backwaters') ('uncharted', 'backwaters', 'of') ('backwaters', 'of', 'the') ('of', 'the', 'unfashionable') ('the', 'unfashionable', 'end') ('unfashionable', 'end', 'of') ('end', 'of', 'the') ('of', 'the', 'western') ('the', 'western', 'spiral') ('western', 'spiral', 'arm') ('spiral', 'arm', 'of') ('arm', 'of', 'the') ('of', 'the', 'galaxy') ('the', 'galaxy', 'lies') ('galaxy', 'lies', 'a') ('lies', 'a', 'small') ('a', 'small', 'unregarded') ('small', 'unregarded', 'yellow') ('unregarded', 'yellow', 'sun')
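One thing to watch for, assuming Python 3 (which the `print(*a)` calls in this notebook imply): zip returns a one-shot iterator rather than a list, so once its tuples have been printed or listed, it is empty. The sketch below uses made-up names (`words`, `pairs`, `first_pass`, `second_pass`) to show this.

```python
words = ['far', 'out', 'in', 'the']  # shortened, hypothetical text
n = 2

pairs = zip(*[words[i:] for i in range(n)])

# The first pass through the zip object uses it up...
first_pass = list(pairs)
print(first_pass)

# ...so a second pass finds nothing left.
second_pass = list(pairs)
print(second_pass)
```

If the n-grams are needed more than once, wrapping the call as `list(zip(...))` keeps them around.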


In [83]:
n = 4
a = zip(*[text[i:] for i in range(n)])
print(*a)

('far', 'out', 'in', 'the') ('out', 'in', 'the', 'uncharted') ('in', 'the', 'uncharted', 'backwaters') ('the', 'uncharted', 'backwaters', 'of') ('uncharted', 'backwaters', 'of', 'the') ('backwaters', 'of', 'the', 'unfashionable') ('of', 'the', 'unfashionable', 'end') ('the', 'unfashionable', 'end', 'of') ('unfashionable', 'end', 'of', 'the') ('end', 'of', 'the', 'western') ('of', 'the', 'western', 'spiral') ('the', 'western', 'spiral', 'arm') ('western', 'spiral', 'arm', 'of') ('spiral', 'arm', 'of', 'the') ('arm', 'of', 'the', 'galaxy') ('of', 'the', 'galaxy', 'lies') ('the', 'galaxy', 'lies', 'a') ('galaxy', 'lies', 'a', 'small') ('lies', 'a', 'small', 'unregarded') ('a', 'small', 'unregarded', 'yellow') ('small', 'unregarded', 'yellow', 'sun')


In [89]:
# we can even generate a rand-gram :-)
from random import randint
n = randint(3,len(text))
print('This is a ' + str(n) + '-gram')

a = zip(*[text[i:] for i in range(n)])
print(*a)

This is a 23-gram
('far', 'out', 'in', 'the', 'uncharted', 'backwaters', 'of', 'the', 'unfashionable', 'end', 'of', 'the', 'western', 'spiral', 'arm', 'of', 'the', 'galaxy', 'lies', 'a', 'small', 'unregarded', 'yellow') ('out', 'in', 'the', 'uncharted', 'backwaters', 'of', 'the', 'unfashionable', 'end', 'of', 'the', 'western', 'spiral', 'arm', 'of', 'the', 'galaxy', 'lies', 'a', 'small', 'unregarded', 'yellow', 'sun')
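Since the only thing that changes between these cells is "n", the zip line can be wrapped in a small function. This is just one possible sketch; the function name `ngrams` and the sample list `words` are made up for this illustration.

```python
def ngrams(words, n):
    """Return a list of n-grams (tuples of n consecutive words) from a list of words."""
    # Build n copies of the word list, each starting one word later,
    # then let * hand them all to zip at once.
    return list(zip(*[words[i:] for i in range(n)]))

words = ['far', 'out', 'in', 'the']  # shortened, hypothetical text

print(ngrams(words, 2))
print(ngrams(words, 4))

# Asking for more words per group than the text contains gives an empty list.
print(ngrams(words, 5))
```

Returning a list rather than the bare zip object means the result can be printed, counted, and looped over as many times as needed.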
