# Python 101 @ SzISz III.

---

In [1]:
from szisz import *
BASE = '../data/'

---

# Deus ex Python

You've downloaded a series complete with subtitles, but the video and  
subtitle filenames don't match! Write a function that renames the  
mismatched subtitles!

In [2]:
# hint:
#  useful functions:
#  - download(_name, _season, _episodes, _mismatch)
#  - list_files(directory)
#  - str.lower() (built-in string method)
#  - find_episode_number(filename)
#  - rename_subtitle(original, new, target_dir)
def rename(directory):
    files = list_files(directory)
    avis = [avi for avi in files if avi[-4:].lower() == '.avi']
    srts = [srt for srt in files if srt[-4:].lower() == '.srt']
    for avi in avis:
        ep_number = find_episode_number(avi)
        for srt in srts:
            if find_episode_number(srt) == ep_number:
                rename_subtitle(srt, avi[:-4] + '.srt', directory)
                break

In [4]:
rename('super_series')
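
The helpers used above come from the course's `szisz` module, whose implementation is not shown here. As a rough, self-contained sketch of the same matching idea, the snippet below works on a plain list of names instead of a directory. `episode_key` and `plan_renames` are hypothetical stand-ins, and the `S01E02`-style marker is only an assumption about how the files are named.

```python
import re

# Hypothetical stand-in for find_episode_number(): assumes names contain
# a marker such as "S01E02" (upper or lower case).
EPISODE_RE = re.compile(r's(\d+)e(\d+)', re.IGNORECASE)


def episode_key(filename):
    """Return (season, episode) as ints, or None if no marker is found."""
    found = EPISODE_RE.search(filename)
    if found is None:
        return None
    return int(found.group(1)), int(found.group(2))


def plan_renames(filenames):
    """Map each subtitle name to the name of the video with the same episode."""
    videos = {}
    for name in filenames:
        if name.lower().endswith('.avi') and episode_key(name) is not None:
            videos[episode_key(name)] = name
    plan = {}
    for name in filenames:
        if not name.lower().endswith('.srt'):
            continue
        key = episode_key(name)
        if key in videos:
            # keep the video's base name, swap the extension
            plan[name] = videos[key][:-4] + '.srt'
    return plan


files = ['Super.Series.S01E01.avi', 'super_series_s01e01_hun.SRT',
         'Super.Series.S01E02.avi', 'super-series-s01e02.srt']
renames = plan_renames(files)
for old_name in sorted(renames):
    print(old_name + ' -> ' + renames[old_name])
```

Building the lookup of videos first means every subtitle is matched with a single dictionary access instead of a nested loop.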

---

# File I/O

Reading from a file is really easy:

In [5]:
# we need a filename
filename = BASE + 'text.txt'

In [6]:
# and a mode
mode = 'r' # r stands for reading
# and we have to open it for reading
my_file = open(filename, mode) 
# we can read from it directly:
for line in my_file:
    print line

Wake up Neo!

Follow the white rabbit!


In [7]:
my_file.seek(0, 0) # jump back to the beginning of the file, see help(file.seek)
# or read every line into a list:
lines_as_list = my_file.readlines()
print lines_as_list

['Wake up Neo!\r\n', 'Follow the white rabbit!']


In [8]:
my_file.seek(0, 0)
# or read the whole file as string:
lines_as_string = my_file.read()
print lines_as_string

Wake up Neo!
Follow the white rabbit!


In [9]:
# we can do it either way... BUT!
# DO NOT FORGET TO CLOSE IT once you have finished working with it!
my_file.close()

Pretty easy, huh? What about writing into a file?

In [10]:
mode = 'w' # as you can guess, w stands for writing ;) (it overwrites the existing content!)
my_file = open(filename, mode) 
# we can write into it directly:
my_file.write('You take the red pill, you stay in Wonderland, '
              'and I show you how deep the rabbit hole goes...')
# again, don't forget to close the file
my_file.close()

There is more! Do you find it cumbersome to open and close the file by hand?

In [11]:
# You do not have to worry about it!
mode = 'r' 
with open(filename, mode) as my_file: 
    for line in my_file.readlines(): 
        print line 
# aaaaand it's closed ;)

You take the red pill, you stay in Wonderland, and I show you how deep the rabbit hole goes...
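
A quick way to convince yourself that `with` really closes the file is to look at the file object's `closed` attribute. This is only a small sketch: it writes to a throwaway temporary file (the path is an assumption for the demo) and avoids the `print` statement so that it also runs on Python 3.

```python
import os
import tempfile

# Work on a throwaway file so nothing in the data directory is touched.
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(path, 'w') as my_file:
    my_file.write('Follow the white rabbit!')
    inside = my_file.closed   # still open inside the block

outside = my_file.closed      # closed as soon as the block ended
print(inside)
print(outside)
```

The file is closed even if the code inside the block raises an exception, which is the main reason to prefer `with` over a manual `close()`.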


Can we add content to existing files?

In [12]:
# Yes, we can!
mode = 'a' # a stands for append
with open(filename, mode) as my_file:
    my_file.write('Remember, all I\'m offering is the truth, nothing more...')
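
The difference between the `'w'` and `'a'` modes is easiest to see side by side. The sketch below uses a temporary file (a made-up path, not one of the course files) and is written to run on Python 3 as well.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes_demo.txt')

with open(path, 'w') as my_file:      # 'w' creates or truncates the file
    my_file.write('first\n')
with open(path, 'w') as my_file:      # truncates again: 'first' is gone
    my_file.write('second\n')
with open(path, 'a') as my_file:      # 'a' keeps what is already there
    my_file.write('third\n')

with open(path, 'r') as my_file:
    lines = my_file.read().splitlines()
print(lines)
```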

---

# CSV files

But... We want to read in some CSV files. Do we really need to go through  
all the hassle of handling commas, quotation marks and the like by hand?

In [13]:
# of course not! someone already wrote that for us!
import csv
filename = BASE + 'text.csv'
mode = 'r' # on Python 2 the csv docs recommend binary mode ('rb'/'wb') where it makes a difference
# read it!
with open(filename, mode) as my_file:
    # we have to create a csv reader in order to read
    # and we have to specify the delimiter and the quote character
    # or the dialect.
    my_csv = csv.reader(my_file, delimiter=';', quotechar='"')
    # we can read out the rows easily from the file
    for row in my_csv:
        # you get each row as a list
        print row
        
# write it!
mode = 'w'
with open(filename, mode) as my_file:
    # we'll need a writer
    # the arguments are the same as before
    my_csv = csv.writer(my_file, delimiter=';', quotechar='"')
    # we need some data to save:
    data = [['Smith', 'Smith', 'Smith', 'Smith'],
            ['Smith', 'Smith', 'Smith', 'Smith']]
    # then write each row into the file,
    # one-by-one
    for row in data:
        my_csv.writerow(row)

['Neo', 'Trinity', 'Morpheus', 'Switch']
['Apoc', 'Cypher', 'Mouse', 'Tank']
[]
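
To see what the `delimiter` and `quotechar` arguments actually buy us, here is a small round trip with fields that contain the delimiter and the quote character themselves. It is a sketch only: the sample rows are made up, `io.StringIO` stands in for a real file, and the snippet assumes Python 3 (on Python 2 the `csv` module works on byte strings instead).

```python
import csv
import io

# Made-up rows: the second one contains a ';' and double quotes.
rows = [['Neo', 'Trinity'], ['semi;colon', 'say "hi"']]

# io.StringIO behaves like a text file kept in memory.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=';', quotechar='"')
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading with the same settings restores the original fields.
read_back = list(csv.reader(io.StringIO(text), delimiter=';', quotechar='"'))
print(read_back)
```

The writer quotes only the fields that need it, and the reader undoes the quoting, so no manual escaping is required.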


---

# Unicode madness

Writing in exotic languages can cause problems, and we need to handle them.  
Originally, the ASCII character set gave us only 128 characters to work with.

In [14]:
print_image('http://www.asciitable.com/index/asciifull.gif', 'net')

But then the problem was addressed with the Unicode character set.  
It currently contains more than 100k characters, including tens of  
thousands of CJK ideographs (kanji among them). Klingon and Tolkien's  
Elvish scripts, sadly, have not made it into the official standard. Long  
story short, we should use the UTF-8 character encoding when working with  
text files.

In [15]:
# We need a module from Python's standard library
import codecs

In [16]:
filename = BASE + 'unicodetext.txt'
mode = 'r'
encoding = 'utf-8'
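# and use its functions to work with files:
with codecs.open(filename, mode, encoding) as my_unicode_file:
    # read() returns the whole file as one unicode string
    # (joining readlines() with '\n' would double every line break)
    content = my_unicode_file.read()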
print content
print repr(content)
print type(content)

Árvíztűrő tükörfúrógép
u'\xc1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p'
<type 'unicode'>


In [17]:
mode = 'w'
with codecs.open(filename, mode, encoding) as my_unicode_file:
    my_unicode_file.write(u'Árvíztűrő tükörfúrógép')

In [18]:
# Encode the unicode string into a UTF-8 byte string
# (despite the variable name, these bytes are not plain ASCII)
ascii_content = content.encode('utf-8')
print ascii_content
print repr(ascii_content)
print type(ascii_content)

Árvíztűrő tükörfúrógép
'\xc3\x81rv\xc3\xadzt\xc5\xb1r\xc5\x91 t\xc3\xbck\xc3\xb6rf\xc3\xbar\xc3\xb3g\xc3\xa9p'
<type 'str'>


In [19]:
# Decode the UTF-8 byte string back into a unicode string
unicode_content = ascii_content.decode('utf-8')
print unicode_content
print repr(unicode_content)
print type(unicode_content)

Árvíztűrő tükörfúrógép
u'\xc1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p'
<type 'unicode'>
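
The difference between characters and bytes becomes clearer when you count them. The sketch below reuses the same sample text, written with escapes so it does not depend on the encoding of the source file, and sticks to syntax that also works on Python 3.

```python
# The sample sentence from above, spelled with escape sequences.
text = u'\xc1rv\xedzt\u0171r\u0151 t\xfck\xf6rf\xfar\xf3g\xe9p'

utf8_bytes = text.encode('utf-8')
print(len(text))        # number of characters
print(len(utf8_bytes))  # number of bytes: each accented letter here takes two

# Plain ASCII cannot represent the accented letters at all.
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)

# Decoding with the same codec gives back the original text.
round_trip = utf8_bytes.decode('utf-8')
print(round_trip == text)
```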


---

## Let's see how deep the rabbit hole goes!

First, we write our own fake "download" function.

In [24]:
import os  # needed for os.mkdir below

def download(name='hyper_series', series=2, episodes=6):
    # create directory
    os.mkdir(name)
    # create files
    for s in range(1, series+1):
        for e in range(1, episodes+1):
            filename = './{name}/{name}S{serie:0>2}E{episode:0>2}.avi'.format(name=name, serie=s, episode=e)
            with open(filename, 'w') as fptr:
                fptr.write(filename)

In [25]:
download()
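
The file names above rely on the format specification `0>2`. A minimal sketch of what it does, using the same template but only building the strings, without creating any files:

```python
template = '{name}S{serie:0>2}E{episode:0>2}.avi'

# '0>2' means: right-align the value in a field of width 2, padding with '0'.
first = template.format(name='hyper_series', serie=1, episode=6)
wide = template.format(name='hyper_series', serie=2, episode=12)
print(first)
print(wide)
```

Values that are already two digits wide are left untouched, so the names sort correctly up to episode 99.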

Next, merge the rows of "matching.csv" that share the same ID: join the names and sum the numeric columns.

In [38]:
# read the data from the "matching.csv"
# add the appropriate values together
data = []
# read csv content
with open(BASE + 'matching.csv', 'r') as csvfile:
    CSV = csv.reader(csvfile, delimiter=';')
    for row in CSV:
        data.append(row)
# collect matching rows to a dictionary
match = {}
for row in data[1:]:
    # setdefault creates the list the first time an ID is seen
    match.setdefault(row[0], []).append(row[1:])
# aggregate the data
merged = [data[0]]
# row_id instead of id, so the built-in id() is not shadowed
for row_id, rows in match.iteritems():
    val1 = []
    val2 = []
    val3 = []
    for row in rows:
        val1.append(row[0])
        val2.append(int(row[1]))
        val3.append(int(row[2]))
    merged.append([row_id, ' & '.join(val1), sum(val2), sum(val3)])
# display outcome    
merged

[['ID', 'VAL1', 'VAL2', 'VAL3'],
 ['10', 'Artur Kiraly', 5, 30],
 ['1', 'Neo & Trinity', 15, 76],
 ['3', 'Bendeguz', 7, 50],
 ['2', 'Bud & Terence', 14, 55],
 ['5', 'A lovagok akik azt mondjak, hogy ni!', 14, 25],
 ['4', 'Son Goku & Krilin & Ifju Satan & Zselialis Teknos', 67, 172],
 ['7', 'Brian', 18, 21],
 ['6', 'Superman', 17, 49],
 ['9', 'Batman & Robin', 11, 76],
 ['8', 'Tom & Jerry', 29, 109]]
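
Note that the rows above come out in dictionary order, not in ID order. One possible variation, sketched here on made-up sample rows rather than on the real "matching.csv", groups with `collections.defaultdict` and sorts the IDs numerically. It is written to run on Python 3 as well.

```python
from collections import defaultdict

# Made-up sample rows in the same shape as above: ID, VAL1, VAL2, VAL3.
header = ['ID', 'VAL1', 'VAL2', 'VAL3']
rows = [['1', 'a', '1', '10'],
        ['2', 'b', '2', '20'],
        ['1', 'c', '3', '30']]

# Group the remaining columns of each row under its ID.
groups = defaultdict(list)
for row in rows:
    groups[row[0]].append(row[1:])

merged = [header]
# Sort by numeric ID so the output order is predictable.
for row_id in sorted(groups, key=int):
    names, val2, val3 = zip(*groups[row_id])
    merged.append([row_id, ' & '.join(names),
                   sum(int(v) for v in val2), sum(int(v) for v in val3)])
print(merged)
```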