# Chapter 8. Lists

## 8.1. Range
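**range(n)** returns a sequence of numbers from 0 to one less than the parameter. In Python 3 the result is a lazy **range** object, not a list; wrap it in **list()** to see the values.

list(range(4)) = [0, 1, 2, 3]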

In [32]:
print(range(4))

range(0, 4)
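
The output above shows the range object itself rather than its contents. A minimal sketch of how to see the numbers, and of the optional start and step arguments (the variable names are only for illustration):

```python
# range() is lazy; list() materializes the values.
numbers = list(range(4))
print(numbers)                  # [0, 1, 2, 3]

# range(start, stop) and range(start, stop, step) are also available.
from_two = list(range(2, 6))
every_third = list(range(0, 10, 3))
print(from_two)                 # [2, 3, 4, 5]
print(every_third)              # [0, 3, 6, 9]
```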


In [34]:
friends = ['Vladimir', 'Joseph', 'the Iron Felix']
print(len(friends))
print(range(len(friends)))

3
range(0, 3)


Two loops that visit the same elements in the same order:

In [40]:
friends = ['Vladimir', 'Joseph', 'Iron Felix']

for friend in friends:
    print('Happy October Revolution Day,', friend + '!')

for i in range(len(friends)):
    friend = friends[i]
    print('Long live,', friend + '!')

Happy October Revolution Day, Vladimir!
Happy October Revolution Day, Joseph!
Happy October Revolution Day, Iron Felix!
Long live, Vladimir!
Long live, Joseph!
Long live, Iron Felix!
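
When both the index and the element are needed, the built-in enumerate() function is a common middle ground between the two loops above. A small sketch, with the list numbered collected only so the result can be inspected:

```python
friends = ['Vladimir', 'Joseph', 'Iron Felix']

# enumerate() yields (index, element) pairs, so no manual indexing is needed.
numbered = []
for i, friend in enumerate(friends):
    numbered.append(f'{i}: {friend}')
    print(i, friend)
```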


## 8.2. Loop Operations

### List Methods - DIR

In [16]:
x = list()
print(type(x))
print(dir(x))

<class 'list'>
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


### Building a List from Scratch

In [18]:
# first we create an empty list
stuff = list()

# then we start adding the elements using the .append method
stuff.append('the book')
stuff.append(9)
stuff.append(7.4)

# the list stays in order, new elements are added at the end of the list
print(stuff)

['the book', 9, 7.4]


### Is something IN the List? TRUE or FALSE

In [21]:
# do not name a variable "list": it would shadow the built-in list() used in other cells
numbers = [45, 75, 2, -2, 3, 94, 5, 34]
9 in numbers

False

### SORT Method
Lists are ordered, and they can be sorted, i.e. rearranged in place with the **.sort** method.

In [24]:
friends = ['Vladimir', 'Joseph', 'Iron Felix']
friends.sort()
print(friends)
print(friends[0])

['Iron Felix', 'Joseph', 'Vladimir']
Iron Felix
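
The .sort method changes the list itself and returns None. If the original order must be kept, the built-in sorted() function returns a new list instead. A short sketch of the difference:

```python
friends = ['Vladimir', 'Joseph', 'Iron Felix']

ordered = sorted(friends)      # new list, the original is untouched
print(ordered)                 # ['Iron Felix', 'Joseph', 'Vladimir']
print(friends)                 # ['Vladimir', 'Joseph', 'Iron Felix']

result = friends.sort()        # sorts in place and returns None
print(result)                  # None
print(friends)                 # ['Iron Felix', 'Joseph', 'Vladimir']
```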


### AVERAGE of the list items
1. using our own loop with a running total,
2. using a list with the built-in sum() and len() functions.

In [3]:
# this technique uses little memory
# because the loop only keeps a running total and a count
total = 0
count = 0
while True:
    inp = input("Enter a number: ")
    if inp == 'done': break
    value = float(inp)
    total = total + value
    count = count + 1

average = total / count
print('Average:', average)

Enter a number: 1
Enter a number: 3
Enter a number: 5
Enter a number: done
Average: 3.0


In [1]:
# it uses more memory because it has to keep all the inputs in memory
numlist = list()
while True:
    inp = input("Enter a number: ")
    if inp == 'done': break
    value = float(inp)
    numlist.append(value)

average = sum(numlist) / len(numlist)
print('Average:', average)

Enter a number: 6
Enter a number: 90
Enter a number: 7
Enter a number: done
Average: 34.333333333333336
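
Both cells divide by the number of inputs, so typing done right away raises a ZeroDivisionError. One possible guard is sketched below; average is a hypothetical helper name, and returning None for an empty list is an assumption about the desired behaviour:

```python
def average(values):
    """Return the mean of values, or None when the list is empty."""
    if not values:               # nothing was entered, avoid dividing by zero
        return None
    return sum(values) / len(values)

print(average([1.0, 3.0, 5.0]))  # 3.0
print(average([]))               # None
```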


## 8.3. Strings vs Lists

### .SPLIT Method

object**.split()** - with nothing in the parentheses, any run of whitespace is used as the delimiter

In [1]:
abc = "I can do that"
stuff = abc.split()
print(stuff)
for i in stuff:
    print(i)
print('Length:', len(stuff))
print('stuff[1]:', stuff[1])


['I', 'can', 'do', 'that']
I
can
do
that
Length: 4
stuff[1]: can
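
A delimiter can also be passed explicitly. The sketch below, using made-up strings, shows how that differs from the default whitespace splitting:

```python
line = 'first;second;third'
print(line.split())        # ['first;second;third'], no whitespace to split on
parts = line.split(';')
print(parts)               # ['first', 'second', 'third']

spaced = 'a  b   c'
default_split = spaced.split()       # runs of spaces count as one delimiter
explicit_split = spaced.split(' ')   # every single space is a delimiter
print(default_split)       # ['a', 'b', 'c']
print(explicit_split)      # ['a', '', 'b', '', '', 'c']
```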


In [1]:
fhand = open('text.txt', 'r', encoding='utf-8')
for line in fhand:
    line = line.rstrip()
    if not line.startswith("З"): continue
    words = line.split()
    print(words)

['Задремал', 'было,', 'но', 'в', 'кухне', 'заплакал', 'братнин', 'ребенок.']
['Зарылся', 'головой', 'в', 'горячую', 'подушку,', 'в', 'уши', 'назойливо', 'сочится:']


### The DOUBLE Split Pattern
This is a typical way of parsing data: split the line into words, then split one of those words again. For a cleaner technique, see **"Parsing and Extracting"** above.

In [3]:
line = "lefthand67@yandex.ru Sat Sept 24"

words = line.split()
print(words)

email = words[0]
print(email)

host = email.split('@')
print(host[1])

['lefthand67@yandex.ru', 'Sat', 'Sept', '24']
lefthand67@yandex.ru
yandex.ru
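
The double split assumes that the line is not empty and that its first word contains an @ sign; otherwise an index is out of range. A defensive variant might look like this (host_of is a hypothetical helper, and returning None for unusable lines is an assumption):

```python
def host_of(line):
    """Return the host part of the first word on the line, or None."""
    words = line.split()
    if not words or '@' not in words[0]:
        return None                       # empty line or no address up front
    return words[0].split('@')[1]

print(host_of('lefthand67@yandex.ru Sat Sept 24'))   # yandex.ru
print(host_of(''))                                   # None
```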


# Chapter 9. Dictionaries

## List VS Dictionary
- A list is a linear collection of values that stay in order.
- A dictionary is a "bag" of values, each with its own label: key-value pairs.

_NB!_ dict**[key]** = **value**

You can assign more than just integers as the value in a dictionary pair:

In [99]:
new_dict = {}
new_dict['alpha'] = 100
new_dict['beta'] = 'gamma'
print(new_dict)

{'alpha': 100, 'beta': 'gamma'}


### Counting

In [2]:
# counting the frequency of the list elements

host_count = {}
long_list = ['localhost', 'collab.sakaiproject.org', 'iupui.edu', 
             'collab.sakaiproject.org', 'iupui.edu', 'iupui.edu', 
             'umich.edu', 'collab.sakaiproject.org', 'nakamura.uits.iupui.edu', 
             'collab.sakaiproject.org', 'collab.sakaiproject.org']

for name in long_list:
    if name not in host_count:
        host_count[name] = 1
    else:
        host_count[name] = host_count[name] + 1
            
print('\nFREQUENCY of the host names:\n', host_count, '\n\nCheck the .Get Method block for a more convenient technique')



FREQUENCY of the host names:
 {'localhost': 1, 'collab.sakaiproject.org': 5, 'iupui.edu': 3, 'umich.edu': 1, 'nakamura.uits.iupui.edu': 1} 

Check the .Get Method block for a more convenient technique


### .GET Method
The **.get** method covers a common pattern: check whether a **key** is already in a dictionary and fall back to a **default value** (e.g. 0) if it is not. This avoids the KeyError traceback that dict[key] raises for a missing key.

dict.**get**(key, default)

Two equivalent operations:

In [25]:
# x receives the value stored under the key, or 0 if the key is missing
# (name and host_count come from the Counting cell above)

# 1) the if/else check
if name in host_count:
    x = host_count[name]
else:
    x = 0

# 2) GET
x = host_count.get(name, 0)
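
To make the traceback problem concrete, here is a small sketch with a made-up one-entry dictionary. Indexing a missing key raises KeyError, while .get returns the default and leaves the dictionary unchanged:

```python
host_count = {'iupui.edu': 3}

try:
    host_count['umich.edu']              # missing key
except KeyError as err:
    print('KeyError:', err)              # KeyError: 'umich.edu'

missing = host_count.get('umich.edu', 0)
present = host_count.get('iupui.edu', 0)
print(missing, present)                  # 0 3
print(host_count)                        # {'iupui.edu': 3}
```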

### IDIOM: Counting the frequency of the list elements with the .GET method

In [27]:
host_count = {}
long_list = ['localhost', 'collab.sakaiproject.org', 'iupui.edu', 
             'collab.sakaiproject.org', 'iupui.edu', 'iupui.edu', 
             'umich.edu', 'collab.sakaiproject.org', 'nakamura.uits.iupui.edu', 
             'collab.sakaiproject.org', 'collab.sakaiproject.org']

for name in long_list:
    host_count[name] = host_count.get(name, 0) + 1  # .get returns the current count, or 0 the first time a name is seen
            
print('\nFREQUENCY of the host names:\n', host_count)



FREQUENCY of the host names:
 {'localhost': 1, 'collab.sakaiproject.org': 5, 'iupui.edu': 3, 'umich.edu': 1, 'nakamura.uits.iupui.edu': 1}
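
The standard library also offers collections.Counter, which packages the same counting idiom. This is an alternative to the .get loop, not the technique used in these notes; the shortened list is sample data:

```python
from collections import Counter

long_list = ['localhost', 'iupui.edu', 'umich.edu', 'iupui.edu']

# The .get idiom from above, for comparison.
manual = {}
for name in long_list:
    manual[name] = manual.get(name, 0) + 1

counted = Counter(long_list)          # counts every element in one call
print(counted['iupui.edu'])           # 2
print(dict(counted) == manual)        # True
```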


## Counting words in text
### Counting Pattern
1. split the line into words,
2. loop through the words,
3. use a dictionary to track the count of each word independently.

In [5]:
counts = {}
# You can type input('Enter the file name: ') instead of 'text.txt'
fhand = open('text.txt', 'r', encoding='utf-8')
for line in fhand:
    line = line.rstrip().lower()  # strip trailing whitespace, then lowercase
    words = line.split()    
#     print('Words:', words)
    for word in words:
        word = word.strip(',')
        word = word.strip('.')
        word = word.strip(';')
        word = word.strip('?')
        word = word.strip('!')
        word = word.strip('?!')
        word = word.strip('...')
        word = word.strip('"')
        word = word.strip(':')
        word = word.strip('(')
        word = word.strip(')')
        counts[word] = counts.get(word, 0) + 1
# print('Length: ', len(counts), '\nCounts:', counts)

print('Move to the next cell')

Move to the next cell
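
Note that .strip treats its argument as a set of characters, not as a sequence, so strip('?!') removes any leading or trailing ? and ! in any order. A sketch of stripping with a single call follows, assuming ASCII punctuation is enough; string.punctuation does not include characters such as « or », and the result can differ slightly from the chain of calls above:

```python
import string

print('wow?!'.strip('?!'))    # wow
print('!?wow'.strip('?!'))    # wow

line = 'Hello, world! (It works...) "Really?"'   # made-up sample line
words = [w.strip(string.punctuation) for w in line.lower().split()]
print(words)                  # ['hello', 'world', 'it', 'works', 'really']
```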


### Retrieving lists of Keys and Values

In [4]:
# NB! Run the previous cell first

largest_count = 0
most_freq_word = ''
for word, count in counts.items():
    if count > largest_count:
        largest_count = count        
#         print(largest_count, word)
#     print(count, word)            
print('Count: {}'.format(largest_count))

freq_words = []
for word, count in counts.items():
    if (count == largest_count) and (word not in freq_words):
        freq_words.append(word)
print('Quantity of words:', len(freq_words), '\nList of most frequent words:\n', freq_words)

Count: 1925
Quantity of words: 1 
List of most frequent words:
 ['и']
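
The two loops above can be condensed with the built-in max() function and a list comprehension. A sketch on a small made-up dictionary with a tie for first place:

```python
counts = {'the': 5, 'of': 3, 'and': 5}   # sample data, not from text.txt

largest_count = max(counts.values())
freq_words = [word for word, count in counts.items() if count == largest_count]

print(largest_count)   # 5
print(freq_words)      # ['the', 'and']
```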


In [5]:
jjj = {'chuck': 1, 'fred': 42, 'jan': 100, '19': 19, 26: 72}
print('list(jjj):', list(jjj))
print(jjj.keys())
print(jjj.values())
print(jjj.items(), '<= these are the tuples')

# print only the keys that cannot be converted to a number
for key, value in jjj.items():
    try:
        float(key)
        continue
    except ValueError:
        print(key)

list(jjj): ['chuck', 'fred', 'jan', '19', 26]
dict_keys(['chuck', 'fred', 'jan', '19', 26])
dict_values([1, 42, 100, 19, 72])
dict_items([('chuck', 1), ('fred', 42), ('jan', 100), ('19', 19), (26, 72)]) <= these are the tuples
chuck
fred
jan


### Counting Word Frequency using a Dictionary

In [9]:
# dfile = input('Enter the default file name to make your life easier: ')  # activate for a script

def most_freq_word():

    """
    The Most Frequent Word Function
    searches for the most frequent word(s) in the document, ignoring the case of
    the letters in words.
    It shows the "Largest count" number (i.e. the value in the dictionary) and lists all
    the words (keys) that have this value.
    
    The function ignores strings that can be converted to floats or integers.
    
    NOTE! You can set the default file at the prompt by activating the line above the function code
    and deactivating the first line of the function.
    
    >>> Example:
    
    Enter the file name (press Enter to open the default file): 
    Opened by default: mbox-short.txt

    Largest count: 352
    jan

    Quantity of words: 1
    Most frequent words: ['jan'] <<<
    
    """

    dfile = 'mbox-short.txt'  # the file opened by default (Enter key); deactivate when using as a script

    fname = input('Enter the file name (press Enter to open the default file): ')
    if len(fname) < 1:
        fname = dfile
        print('Opened by default:', dfile)
    fhand = open(fname, 'r', encoding='utf-8')

    counts = {}
    for line in fhand:
        line = line.rstrip().lower()  # strip trailing whitespace, then lowercase
        words = line.split()
    #     print('Words:', words)
        for word in words:
            word = word.strip(',')
            word = word.strip('.')
            word = word.strip(';')
            word = word.strip('?')
            word = word.strip('!')
            word = word.strip('?!')
            word = word.strip('"')
            word = word.strip(':')
            word = word.strip('(')
            word = word.strip(')')
            word = word.strip('[')
            word = word.strip(']')
            try:                        # this filters out ints and floats
                float(word)
                continue
            except ValueError:          # only non-numeric words reach the counts dictionary
                if len(word) > 0:
                    counts[word] = counts.get(word, 0) + 1
    # print('\nLength: {}\nCounts: {}'.format(len(counts),counts))

    largest_count = None                # searching for the largest value (i.e. count of the words)
    # print('\nLargest count search:')  # activate only together with print(count, word)
    for word, count in counts.items():
        if largest_count is None or count > largest_count:
            largest_count = count
    #         print(largest_count, word)
    #     print(count, word)
    print('\nLargest count: {}'.format(largest_count))

    freq_words = []                     # most frequent words' list
    for word, count in counts.items():
        if (count == largest_count) and (word not in freq_words):
            freq_words.append(word)
            # print(word)
    return '\nQuantity of words: {}\nMost frequent words: {}.\nDone'.format(len(freq_words), freq_words)


In [10]:
print(most_freq_word())

Enter the file name (press Enter to open the default file): 
Opened by default: mbox-short.txt

Largest count: 352

Quantity of words: 1
Most frequent words: ['jan'].
Done


# Chapter 10. Tuples
Tuples are like lists, but they are **immutable**: like strings, they cannot be changed after creation.

Tuples are generally more efficient than lists in terms of **memory use** and **performance**. So when you are making a "temporary variable", tuples are preferred over lists.
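
A quick sketch of both claims: item assignment on a tuple raises TypeError, and a tuple takes no more memory than a list with the same elements. The sizes come from sys.getsizeof, which reports CPython implementation details that vary between versions, so treat the numbers as indicative only:

```python
import sys

point = (4, 'Fred')
try:
    point[0] = 5                  # tuples do not support item assignment
except TypeError as err:
    print('TypeError:', err)

# Shallow object sizes in bytes; exact values depend on the interpreter.
tuple_size = sys.getsizeof((1, 2, 3))
list_size = sys.getsizeof([1, 2, 3])
print(tuple_size, list_size)
```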

### Assignment of tuples
You can put a tuple of variables on the left-hand side of an assignment statement; each variable receives the element from the _same_ position on the right-hand side. This convenient Python feature is often called tuple unpacking.

##### NB! Parentheses can be omitted.

In [1]:
(x, y) = (4, 'Fred')
print(y)

Fred


In [4]:
x, y = 4, 'Fred'
print(x)

4
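
Two common uses of tuple assignment are swapping variables without a temporary one and unpacking pairs inside a for loop. A minimal sketch with made-up values:

```python
a, b = 1, 2
a, b = b, a              # the right-hand side is evaluated first, then unpacked
print(a, b)              # 2 1

pairs = [('chuck', 1), ('fred', 42)]
names = []
for name, number in pairs:   # each tuple is unpacked into two variables
    names.append(name)
    print(name, number)
```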


Compare the methods of lists and tuples (at the end of each dir listing):

In [4]:
l = list()
dir(l)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [2]:
t = tuple()
dir(t)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'count',
 'index']

### Tuples are COMPARABLE

Python compares the elements pairwise, from left to right, and stops at the first pair that differs; that pair decides whether the result is True or False. If the elements are equal, it moves on to the next pair.

In [3]:
(0, 1, 2) < (5, 2, 1)

True

In [6]:
('Jones', 'Sally') < ('Jones', 'Sam')

True

In [7]:
('Jones', 'Sally') == ('Jones', 'Sam')

False
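
Because tuples compare element by element, a list of tuples sorts by the first element and uses the later elements to break ties. A small sketch with made-up names:

```python
people = [('Jones', 'Sam'), ('Adams', 'Zoe'), ('Jones', 'Sally')]
ordered = sorted(people)
print(ordered)
# [('Adams', 'Zoe'), ('Jones', 'Sally'), ('Jones', 'Sam')]

# If one tuple is a prefix of the other, the shorter one is smaller.
prefix_is_smaller = (1, 2) < (1, 2, 3)
print(prefix_is_smaller)   # True
```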

### SORTING lists of tuples

This way we can get a **sorted version of a dictionary**.

First we sort the dictionary by key using:
- the dict.**items()** method and
- the **sorted()** function.

In [10]:
d = {'a': 10, 'b':1, 'c':22}
print(d.items())
print(sorted(d.items()), '\n')

for k, v in sorted(d.items()):
    print(k, v)

dict_items([('a', 10), ('b', 1), ('c', 22)])
[('a', 10), ('b', 1), ('c', 22)] 

a 10
b 1
c 22


### SORT by VALUES instead of key

We need to create a list of tuples of the form (value, key) and then sort it:

In [41]:
c = {'a': 10, 'b':1, 'c':22}

tmp = list()
for key, val in c.items():
    tmp.append( (val, key) )
print(tmp)

tmp = sorted(tmp, reverse=True)
print(tmp)

[(10, 'a'), (1, 'b'), (22, 'c')]
[(22, 'c'), (10, 'a'), (1, 'b')]


### Shorter version

using a **list comprehension**:

In [23]:
c = {'a': 10, 'b':1, 'c':22}

print( sorted( [ (v, k) for k, v in c.items() ], reverse=True ) )

[(22, 'c'), (10, 'a'), (1, 'b')]
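
Another option, not used in these notes, is to keep the (key, value) order and tell sorted() which element to compare through its key parameter. A sketch on the same dictionary:

```python
c = {'a': 10, 'b': 1, 'c': 22}

# item is a (key, value) tuple; item[1] selects the value for comparison.
by_value = sorted(c.items(), key=lambda item: item[1], reverse=True)
print(by_value)   # [('c', 22), ('a', 10), ('b', 1)]
```

The tuples stay in (key, value) form, so no flipping is needed when printing them.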


### Top ten most common words in the text

In [40]:
with open('text.txt', 'r', encoding='utf-8') as fhand:  # context manager to close the file when done
    counts = dict()
    for line in fhand:
        line = line.rstrip().lower()  # strip trailing whitespace, then lowercase
        words = line.split()
        for word in words:
            if word.isalpha():
                counts[word] = counts.get(word, 0) + 1

lst = sorted( [(val, key) for key, val in counts.items()], reverse=True )

for val, key in lst[:10]:
    print(f'{key} - {val}', end='; ')

и - 1811; не - 1020; на - 927; в - 926; а - 710; с - 566; что - 442; как - 406; он - 377; я - 324; 
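
For comparison, collections.Counter from the standard library has a most_common method that returns the top entries directly. This is an alternative sketch on a made-up sentence, not the method used above:

```python
from collections import Counter

text = 'the cat and the dog and the bird'   # sample text instead of text.txt

counts = Counter(word for word in text.lower().split() if word.isalpha())
top = counts.most_common(2)                 # list of (word, count) pairs
print(top)                                  # [('the', 3), ('and', 2)]
```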