# Day 2

## Topics

* If statement
* Lists, List Comprehensions & Iteration Patterns
* Dictionaries
* Working with Files
* Writing Command-line Applications
* Regular Expressions

## Conditional Expressions & If Statement

In [1]:
marks = 75

In [3]:
marks > 35

True

In [4]:
filename = "hello.py"

In [5]:
filename.endswith(".py")

True

In [6]:
"lo" in filename

True

In [7]:
"lo" not in filename

False

In [8]:
def get_ext(filename):
    if "." in filename:
        return filename.split(".")[1]
    else:
        return None

In [9]:
"a" in ["a", "b", "c", "d"]

True

Let's write a function `is_vowel` to check if a character is a vowel.

In [10]:
def is_vowel(c):
    vowels = ['a', 'e', 'i', 'o', 'u']
    return c in vowels

In [11]:
is_vowel('a')

True

In [12]:
is_vowel('x')

False

### Combining Conditional Expressions

#### `not`

In [13]:
filename = "hello.py"

In [14]:
filename.endswith(".py")

True

In [15]:
not filename.endswith(".py")

False

In [16]:
"hell" in filename

True

In [17]:
"hell" not in filename

False

#### `and` and `or`

In [18]:
# make sure filename does not have ".." and ends with ".txt"
".." not in filename and filename.endswith(".txt")

False

Please note that `and` and `or` are short-circuit operators.

In `a and b`, `b` is evaluated only if `a` evaluates to true.
Similarly, in `a or b`, `b` is evaluated only if `a` evaluates to false.
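
A minimal sketch of short-circuiting, using a hypothetical helper `noisy` (not part of these notes) that records every call, so we can see which operands were actually evaluated:

```python
calls = []

def noisy(value):
    # Hypothetical helper: remember that we were called, then return the value.
    calls.append(value)
    return value

# The right operand of `and` is skipped because the left one is False.
result_and = noisy(False) and noisy("right of and")

# The right operand of `or` is skipped because the left one is True.
result_or = noisy(True) or noisy("right of or")

print(result_and, result_or)
print(calls)  # only the two left operands were evaluated
```

Neither of the right-hand strings shows up in `calls`, because those calls never happened.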

In [19]:
filename

'hello.py'

In [20]:
# doom() is never called.
filename.endswith(".txt") and doom()

False

In [21]:
# how about this?
filename.endswith(".txt") and doom(

SyntaxError: incomplete input (4004568105.py, line 2)

The values `0`, `""`, `[]`, `{}`, etc. are considered equivalent to False. Everything else is considered equivalent to True.
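
A small sketch that checks this with the built-in `bool`. The two lists are illustrative; `None` and `0.0` are included as further values that count as false:

```python
# Values that behave like False in a condition.
falsy_values = [0, 0.0, "", [], {}, None]

# Values that behave like True, even though some of them look "empty-ish".
truthy_values = [1, -1, "0", " ", [0], {"x": 0}]

for value in falsy_values + truthy_values:
    print(repr(value), "->", bool(value))
```

Note that a non-empty container is true even when everything inside it is false, as `[0]` shows.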

In [24]:
orders = [1, 2, 3]

In [25]:
if orders:
    print("You have", len(orders), "orders")

You have 3 orders


### If Statement

In [26]:
n = 25

In [27]:
if n % 2 == 0:
    print("even")
else:
    print("odd")

odd


In [28]:
n = 123

In [29]:
if n < 10:
    print(n, "is a single digit number")
elif n < 100:
    print(n, "is a two digit number")
else:
    print(n, "is a big number")

123 is a big number


In [30]:
def check_number(n):
    if n < 10:
        print(n, "is a single digit number")
    elif n < 100:
        print(n, "is a two digit number")
    else:
        print(n, "is a big number")

In [31]:
check_number(1)

1 is a single digit number


In [32]:
check_number(12)

12 is a two digit number


In [33]:
check_number(123)

123 is a big number


In [35]:
%load_problem minimum

In [37]:
# your code here
def minimum(a, b):
    if a < b:
        return a
    else:
        return b



In [38]:
%verify_problem minimum

✓ minimum(3, 7)
✓ minimum(33, 7)
✓ 1 + minimum(3, 7)
✓ 1 + minimum(3, 3)
🎉 Congratulations! You have successfully solved problem minimum!!


In [36]:
%load_problem minimum3

In [41]:
def minimum3(a, b, c):
    # ab = minimum(a, b)
    # return minimum(ab, c)
    return minimum(minimum(a, b), c)

In [42]:
%verify_problem minimum3

✓ minimum3(1, 2, 3)
✓ minimum3(3, 2, 1)
✓ minimum3(3, 1, 2)
✓ minimum3(2, 1, 3)
✓ minimum3(2, 3, 1)
✓ minimum3(1, 1, 1)
✓ 1 + minimum3(1, 1, 1)
🎉 Congratulations! You have successfully solved problem minimum3!!


## Lists

In [43]:
x = ['a', 'b', 'c', 'd']

In [44]:
x

['a', 'b', 'c', 'd']

In [45]:
len(x)

4

In [46]:
x[0]

'a'

### For Loop

In [47]:
names = ["alice", "bob", "charlie", "dave"]

In [48]:
for name in names:
    print("Hello", name)

Hello alice
Hello bob
Hello charlie
Hello dave


In [49]:
for n in [1, 2, 3, 4, 5]:
    print(n)

1
2
3
4
5


### range

The built-in function `range` can be used to create a sequence of numbers.

In [50]:
range(5)

range(0, 5)

In [51]:
for i in range(5):
    print(i)

0
1
2
3
4


In [52]:
list(range(5))

[0, 1, 2, 3, 4]

In [56]:
list(range(5)) # numbers from 0 to 5 (end is not included)

[0, 1, 2, 3, 4]

In [57]:
list(range(2, 5))

[2, 3, 4]

In [58]:
list(range(2, 20, 3)) 

[2, 5, 8, 11, 14, 17]
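
`range` also accepts a negative step, which counts downwards. A small sketch (the variable names are only for illustration):

```python
countdown = list(range(5, 0, -1))      # start at 5, stop before 0
evens_desc = list(range(10, 0, -2))    # even numbers from 10 down to 2
empty = list(range(5, 0))              # default step is +1, so nothing is produced

print(countdown)
print(evens_desc)
print(empty)
```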

### Modifying and Growing Lists

In [59]:
x = ['a', 'b', 'c', 'd']

In [60]:
x[0]

'a'

In [61]:
x[0] = 'aa'

In [62]:
x

['aa', 'b', 'c', 'd']

In [63]:
x.append('e')

In [64]:
x

['aa', 'b', 'c', 'd', 'e']

Please note that `append` modifies the list in place and returns `None`.

In [65]:
x = [1, 2, 3, 4]

In [66]:
x = x.append(5)

In [67]:
print(x)

None
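
A short sketch of the two safe alternatives: call `append` on its own line, or build a new list with `+` when the original must stay unchanged.

```python
x = [1, 2, 3, 4]
x.append(5)          # modifies x in place; do not assign the result
print(x)

y = x + [6]          # creates a new list and leaves x untouched
print(y)
print(x)
```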


#### Example: Squares

Let's write a function `squares` that takes a list of numbers as argument and computes the square of each one of them.

```
>>> squares([1, 2, 3, 4, 5])
[1, 4, 9, 16, 25]
```

In [70]:
def squares(numbers):
    result = []
    for n in numbers:
        result.append(n*n)
    return result

In [71]:
squares([1, 2, 3, 4, 5])

[1, 4, 9, 16, 25]

In [72]:
%load_problem evens

In [None]:
# your code here





### List Indexing

In [73]:
x = ['a', 'b', 'c', 'd', 'e']

In [74]:
x[0]

'a'

How do you get the last element?

In [75]:
x[len(x)-1]

'e'

In [76]:
x[-1]

'e'

```
+----+----+----+----+
|  a |  b |  c |  d |
+----+----+----+----+
|  0 |  1 |  2 |  3 | --> regular index
+----+----+----+----+
| -4 | -3 | -2 | -1 | <-- negative index
+----+----+----+----+
```

In [77]:
def get_last_word(sentence):
    return sentence.split()[-1]

In [78]:
get_last_word("one two three")

'three'

### List Slicing

In [79]:
x = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight"]

In [80]:
x[0:2] # from index 0 to index 2 (end is not included)

['zero', 'one']

In [81]:
x[:2] # up to index 2 (end is not included)

['zero', 'one']

In [82]:
x[2:] # index 2 onwards

['two', 'three', 'four', 'five', 'six', 'seven', 'eight']

In [83]:
x[1:6] # from index 1 to index 6

['one', 'two', 'three', 'four', 'five']

In [84]:
x[1:6:2] # from index 1 to index 6, take every 2nd element

['one', 'three', 'five']

How about all but the first element?

In [85]:
x[1:]

['one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight']

How about all except the last one?

In [86]:
x[:-1]

['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven']

How to get the last two elements?

In [87]:
x[-1:-3]

[]

In [88]:
x[3:1]

[]

In [89]:
x[-2:]

['seven', 'eight']
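
The two empty results above come from the direction of the slice: with the default step of 1, a slice is empty whenever its start is not before its stop. A sketch of how a negative step changes this, reusing the same list:

```python
x = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight"]

print(x[-1:-3])      # start (last) is after stop, step is +1: nothing selected
print(x[-1:-3:-1])   # step -1 walks backwards: last two, in reverse order
print(x[-2:])        # last two, in the original order
```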

How to get the elements in the reverse order?

In [90]:
x[::-1]

['eight', 'seven', 'six', 'five', 'four', 'three', 'two', 'one', 'zero']

In [91]:
x[:] # copy of the list

['zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight']

### List Comprehensions

In [92]:
def squares(numbers):
    result = []
    for n in numbers:
        result.append(n*n)
    return result

In [93]:
numbers = [1, 2, 3, 4, 5]

In [94]:
squares(numbers)

[1, 4, 9, 16, 25]

In [95]:
[n*n for n in numbers]

[1, 4, 9, 16, 25]

In [96]:
result = [n*n for n in numbers]

In [97]:
result

[1, 4, 9, 16, 25]

In [98]:
[n*n for n in numbers if n % 2 == 0]

[4, 16]

In [99]:
# compute the sum of squares of all even numbers below one million
sum([n*n for n in range(1000000) if n%2 == 0])

166666166667000000

Let's unpack it a bit.

    result = [expr for a_var in a_list]
    result = [expr for a_var in a_list if some_cond]

They are equivalent to:

    result = []
    for a_var in a_list:
        result.append(expr)

    result = []
    for a_var in a_list:
        if some_cond:
            result.append(expr)
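
To check the equivalence on concrete data, here is a small sketch with an illustrative list and condition (squares of the even numbers):

```python
numbers = [1, 2, 3, 4, 5, 6]

# Comprehension form.
squares_of_evens = [n * n for n in numbers if n % 2 == 0]

# Equivalent loop form.
result = []
for n in numbers:
    if n % 2 == 0:
        result.append(n * n)

print(squares_of_evens)
print(result == squares_of_evens)
```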
    

Find all Python files in the current directory.

In [100]:
import os

In [101]:
os.listdir() # all files

['add.py',
 'ls.py',
 '.ipynb_checkpoints',
 'sq.py',
 'sentences.txt',
 'day1.ipynb',
 'three.txt',
 'echo.py',
 'args.py',
 'day2.ipynb']

In [102]:
[f for f in os.listdir() if f.endswith(".py")]

['add.py', 'ls.py', 'sq.py', 'echo.py', 'args.py']

In [103]:
os.path.getsize("add.py")

19

In [104]:
!ls -l add.py

-rw-r--r-- 1 jupyter-anand jupyter-anand 19 May 13 07:51 add.py


What if we want to find the total size of all Python files in the current directory?

In [105]:
[f for f in os.listdir() if f.endswith(".py")]

['add.py', 'ls.py', 'sq.py', 'echo.py', 'args.py']

In [106]:
[os.path.getsize(f) for f in os.listdir() if f.endswith(".py")]

[19, 61, 69, 31, 28]

In [107]:
sum([os.path.getsize(f) for f in os.listdir() if f.endswith(".py")])

208

### Iteration Patterns

#### Iterating over a list

In [109]:
x = ['a', 'b', 'c', 'd']

In [110]:
for a in x:
    print(a, a.upper())

a A
b B
c C
d D


In [111]:
[a.upper() for a in x]

['A', 'B', 'C', 'D']

#### Iterating over a sequence of numbers

In [113]:
for i in range(5):
    print(i*i)

0
1
4
9
16


In [114]:
[i*i for i in range(5)]

[0, 1, 4, 9, 16]

#### Iterating over two lists together

In [115]:
names = ["a", "b", "c", "d"]
scores = [10, 20, 30, 40]

In [116]:
for i in range(len(names)):
    print(names[i], scores[i])

a 10
b 20
c 30
d 40


This works, but it is not very Pythonic. The built-in `zip` pairs up the elements of the two lists directly.

In [117]:
for name, score in zip(names, scores):
    print(name, score)

a 10
b 20
c 30
d 40


In [118]:
zip(names, scores)

<zip at 0x7582266c1800>

In [119]:
list(zip(names, scores))

[('a', 10), ('b', 20), ('c', 30), ('d', 40)]

In [120]:
x, y = ('a', 10)

In [121]:
x

'a'

In [122]:
y

10

In [123]:
list(zip(['a', 'b', 'c'], [1, 2, 3, 4]))

[('a', 1), ('b', 2), ('c', 3)]
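
`zip` stops at the shortest input, so the extra `4` above is silently dropped. When that is not wanted, one option is `itertools.zip_longest` from the standard library, which pads the shorter input with a fill value. A small sketch (the fill value `"?"` is an arbitrary choice):

```python
from itertools import zip_longest

letters = ["a", "b", "c"]
numbers = [1, 2, 3, 4]

print(list(zip(letters, numbers)))
print(list(zip_longest(letters, numbers, fillvalue="?")))
```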

In [124]:
%load_problem vector-add

In [None]:
# your code here





In [125]:
v1 = [1, 2, 3, 4]
v2 = [10, 20, 30, 40]

In [126]:
zip(v1, v2)

<zip at 0x75822756fd80>

In [127]:
for x, y in zip(v1, v2):
    print(x, y)

1 10
2 20
3 30
4 40


#### Looping over the index and the element together

In [128]:
chapters = [
    "Getting Started",
    "Functions",
    "Lists",
    "Dictionaries"
]

How to print the chapter number and chapter title together?

In [129]:
for i in range(len(chapters)):
    print("Chapter", i+1, ":", chapters[i])

Chapter 1 : Getting Started
Chapter 2 : Functions
Chapter 3 : Lists
Chapter 4 : Dictionaries


In [130]:
for i, title in enumerate(chapters):
    print(f"Chapter {i}: {title}")

Chapter 0: Getting Started
Chapter 1: Functions
Chapter 2: Lists
Chapter 3: Dictionaries


In [131]:
for i, title in enumerate(chapters):
    print(f"Chapter {i+1}: {title}")

Chapter 1: Getting Started
Chapter 2: Functions
Chapter 3: Lists
Chapter 4: Dictionaries


In [132]:
for i, title in enumerate(chapters, start=1):
    print(f"Chapter {i}: {title}")

Chapter 1: Getting Started
Chapter 2: Functions
Chapter 3: Lists
Chapter 4: Dictionaries


### Command-line Arguments

Let's improve our echo program to print all the command-line arguments passed to it, instead of just the first one.

In [140]:
%%file echo.py
import sys
args = sys.argv[1:]
# print(args)
print(" ".join(args))

Overwriting echo.py


In [141]:
!echo hello world

hello world


In [142]:
!python echo.py hello world

hello world


In [143]:
%load_problem sum-of-arguments

In [None]:
%%file sum.py
# your code here





### Sorting Lists

In [145]:
names = ["alice", "dave", "charlie", "bob"]

In [146]:
names.sort() # sorts in-place

In [147]:
names

['alice', 'bob', 'charlie', 'dave']

In [148]:
names = ["alice", "dave", "charlie", "bob"]

In [149]:
sorted(names)

['alice', 'bob', 'charlie', 'dave']

In [150]:
names

['alice', 'dave', 'charlie', 'bob']

How to sort by length?

In [151]:
sorted(names, key=len)

['bob', 'dave', 'alice', 'charlie']

In [152]:
sorted(names, key=len, reverse=True)

['charlie', 'alice', 'dave', 'bob']
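
The `key` function can also return a tuple, which sorts by the first item and breaks ties with the second. A sketch that sorts by length and then alphabetically (the helper name `by_length_then_name` and the list of names are made up for this example):

```python
names = ["dave", "alice", "bob", "carol", "eve"]

def by_length_then_name(name):
    # Tuples compare element by element: length first, then the name itself.
    return (len(name), name)

print(sorted(names, key=by_length_then_name))
```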

Find the top 5 largest files in the current directory.

In [153]:
files = os.listdir()

In [154]:
files

['add.py',
 'ls.py',
 '.ipynb_checkpoints',
 'sq.py',
 'sentences.txt',
 'day1.ipynb',
 'three.txt',
 'echo.py',
 'args.py',
 'day2.ipynb']

In [155]:
sorted(files)

['.ipynb_checkpoints',
 'add.py',
 'args.py',
 'day1.ipynb',
 'day2.ipynb',
 'echo.py',
 'ls.py',
 'sentences.txt',
 'sq.py',
 'three.txt']

In [157]:
sorted(files, key=os.path.getsize, reverse=True)[:5]

['day1.ipynb', 'day2.ipynb', '.ipynb_checkpoints', 'sentences.txt', 'sq.py']

In [158]:
sorted(os.listdir(), key=os.path.getsize, reverse=True)[:5]

['day1.ipynb', 'day2.ipynb', '.ipynb_checkpoints', 'sentences.txt', 'sq.py']

## Dictionaries

Dictionaries are used to store name-value pairs.

In [159]:
d = {"x": 1, "y": 2, "z": 3}

In [160]:
d

{'x': 1, 'y': 2, 'z': 3}

In [161]:
d["x"]

1

In [162]:
d["x"] = 11

In [163]:
d

{'x': 11, 'y': 2, 'z': 3}

In [164]:
d['w'] = 10

In [165]:
d

{'x': 11, 'y': 2, 'z': 3, 'w': 10}

### Dictionary Usage Patterns

There are two common patterns to use dictionaries.

1. as a record
2. as a lookup table

In [166]:
# as a record
person = {
    "name": "Alice",
    "email": "alice@example.com",
    "phone": "987654312"
}

In [167]:
person["name"]

'Alice'

In [168]:
person["email"]

'alice@example.com'

In [169]:
# lookup table
phone_numbers = {
    "alice": 1234,
    "bob": 2345
}

In [170]:
phone_numbers["alice"]

1234

In [171]:
"bob" in phone_numbers

True

In [172]:
"charlie" in phone_numbers

False

#### Example: Greeting in multiple languages

Let's write a function `greet` to greet a person in any language.

If we have to greet in just English, we could write this as:

In [173]:
def greet(name):
    print("Hello", name)

In [174]:
greet("Alice")

Hello Alice


Let's add support for multiple languages.

In [175]:
def greet(name, lang):
    if lang == "en":
        print("Hello", name)
    elif lang == "hi":
        print("Namaste", name)
    elif lang == "kn":
        print("Namaskara", name)


In [178]:
greet("Alice", "hi")

Namaste Alice


We have to modify the code if we want to add a new language, which is not nice. Let's try to move the translations out of the greet function.

In [182]:
prefixes = {
    "en": "Hello",
    "hi": "Namaste",
    "kn": "Namaskara",
    "it": "Ciao"
}

In [183]:
def greet(name, lang):
    prefix = prefixes[lang]
    print(prefix, name)

In [184]:
greet("Alice", "hi")

Namaste Alice


In [185]:
greet("Alice", "it")

Ciao Alice


We can go one step further and move the translations into a text file.

In [186]:
%%file greetings.txt
en Hello
hi Namaste
kn Namaskara
it Ciao

Writing greetings.txt


In [187]:
prefixes = {}
for line in open("greetings.txt"):
    lang, prefix = line.strip().split()
    prefixes[lang] = prefix

In [188]:
prefixes

{'en': 'Hello', 'hi': 'Namaste', 'kn': 'Namaskara', 'it': 'Ciao'}

In [189]:
dict([("x", 1), ("y", 2)])

{'x': 1, 'y': 2}

In [190]:
[line.strip().split() for line in open("greetings.txt")]

[['en', 'Hello'], ['hi', 'Namaste'], ['kn', 'Namaskara'], ['it', 'Ciao']]

In [191]:
dict([line.strip().split() for line in open("greetings.txt")])

{'en': 'Hello', 'hi': 'Namaste', 'kn': 'Namaskara', 'it': 'Ciao'}

In [192]:
%load_problem read-prices

In [None]:
# your code here





### Common operations on dictionaries

In [193]:
phone_numbers

{'alice': 1234, 'bob': 2345}

In [194]:
"alice" in phone_numbers

True

In [195]:
"dave" in phone_numbers

False

#### `get`

The `get` method takes two arguments: the key and a default value. If the key is present, it returns the corresponding value; otherwise, it returns the default value. When the default is omitted, `get` returns `None` for a missing key.

In [196]:
phone_numbers.get("alice", "-")

1234

In [197]:
phone_numbers.get("dave", "-")

'-'

In [198]:
phone_numbers

{'alice': 1234, 'bob': 2345}

In [199]:
ph = phone_numbers.get("dave")

In [200]:
print(ph)

None


In [201]:
phone_numbers.get("dave") or "-"

'-'
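
The last form, `get(key) or default`, behaves differently from `get(key, default)` when the stored value is itself falsy (for example `0` or an empty string). A small sketch with a made-up `stock` dictionary:

```python
stock = {"apple": 0, "banana": 12}

print(stock.get("apple", "-"))      # key exists, so the stored 0 is returned
print(stock.get("apple") or "-")    # 0 is falsy, so `or` picks "-"
print(stock.get("cherry", "-"))     # key missing, default is returned
```

When `0` or `""` are legitimate values, passing the default to `get` is the safer choice.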

#### `setdefault`

The `setdefault` method works like `get`, but it also adds the key with the default value to the dictionary when the key is missing.

In [202]:
d = {"x": 1, "y": 2}

In [203]:
d.setdefault("x", 0)

1

In [204]:
d

{'x': 1, 'y': 2}

In [205]:
d.setdefault("z", 0)

0

In [206]:
d

{'x': 1, 'y': 2, 'z': 0}
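
A common use of `setdefault` is grouping: create an empty list the first time a key is seen, then append to it. A sketch that groups words by their first letter (the data is illustrative):

```python
words = ["apple", "avocado", "banana", "blueberry", "cherry"]

groups = {}
for word in words:
    # setdefault returns the existing list, or inserts and returns a new one.
    groups.setdefault(word[0], []).append(word)

print(groups)
```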

#### `update`

In [208]:
d1 = {"x": 1, "y": 2}

In [209]:
d2 = {"x": 11, "z": 3}

In [210]:
d1.update(d2)

In [211]:
d1

{'x': 11, 'y': 2, 'z': 3}
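
`update` modifies `d1` in place. If the original dictionaries must stay unchanged, one alternative is to build a new dictionary by unpacking both; as with `update`, values from the later dictionary win. A sketch:

```python
d1 = {"x": 1, "y": 2}
d2 = {"x": 11, "z": 3}

merged = {**d1, **d2}   # new dictionary; keys from d2 override those from d1

print(merged)
print(d1)               # unchanged
```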

### Iterating over dictionaries

In [212]:
d = {"x": 1, "y": 2, "z": 3}

In [213]:
d.keys()

dict_keys(['x', 'y', 'z'])

In [214]:
d.values()

dict_values([1, 2, 3])

In [215]:
d.items()

dict_items([('x', 1), ('y', 2), ('z', 3)])

#### Iterating over keys

In [217]:
for k in d.keys():
    print(k)

x
y
z


In [218]:
for k in d:
    print(k)

x
y
z


In [219]:
[k.upper() for k in d]

['X', 'Y', 'Z']

In [221]:
for k in d:
    print(k, d[k])

x 1
y 2
z 3


#### Iterating over values

In [220]:
for v in d.values():
    print(v)

1
2
3


#### Iterating over both keys and values together

In [222]:
for k, v in d.items():
    print(k, v)

x 1
y 2
z 3


### Example: Marks of a student

In [223]:
marks = {
    "english": 87,
    "science": 78,
    "maths": 68
}

In [224]:
marks.items()

dict_items([('english', 87), ('science', 78), ('maths', 68)])

In [227]:
for subject, score in marks.items():
    print(subject, score)
print("---")
print("Total", sum(marks.values()))

english 87
science 78
maths 68
---
Total 233


In which subject did the student score the maximum marks?

In [228]:
max(marks.values())

87

In [229]:
[subject for subject, score in marks.items() if score==max(marks.values())]

['english']

In [230]:
def get_score(subject):
    return marks[subject]
    
max(marks, key=get_score)

'english'

In [231]:
max(marks, key=marks.get)

'english'

## Sets

A set is an unordered collection of unique elements.

In [232]:
x = {1, 2, 3}

In [233]:
1 in x

True

In [234]:
type(x)

set

How to create an empty set?

In [235]:
x = set() # {} creates empty dictionary

How to find unique elements in a list?

In [237]:
names = ["a", "b", "d", "c", "a"]

Let's try a naive approach first.

In [238]:
def unique(values):
    result = []
    for v in values:
        if v not in result:
            result.append(v)
    return result

In [239]:
unique(names)

['a', 'b', 'd', 'c']

In [243]:
%time unique(range(1000));

CPU times: user 1.79 ms, sys: 0 ns, total: 1.79 ms
Wall time: 1.8 ms


In [244]:
%time unique(range(10000));

CPU times: user 182 ms, sys: 0 ns, total: 182 ms
Wall time: 181 ms


In [246]:
%time unique(range(20000));

CPU times: user 1.42 s, sys: 0 ns, total: 1.42 s
Wall time: 1.47 s


In [247]:
%time unique(range(40000));

CPU times: user 3.61 s, sys: 0 ns, total: 3.61 s
Wall time: 3.64 s


In [248]:
%time unique(range(80000));

CPU times: user 18.9 s, sys: 0 ns, total: 18.9 s
Wall time: 19.1 s


Let's try with sets and see the performance again.

In [249]:
def unique2(values):
    return list(set(values))

In [252]:
%time unique2(range(20000));

CPU times: user 841 µs, sys: 0 ns, total: 841 µs
Wall time: 853 µs


In [253]:
%time unique2(range(40000));

CPU times: user 819 µs, sys: 0 ns, total: 819 µs
Wall time: 822 µs


In [254]:
%time unique2(range(80000));

CPU times: user 4.7 ms, sys: 0 ns, total: 4.7 ms
Wall time: 4.63 ms


In [258]:
%timeit unique2(range(100000));

3.59 ms ± 219 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [259]:
%timeit unique2(range(200000));

8.14 ms ± 282 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [262]:
%timeit unique2(range(1000000));

65.8 ms ± 3.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


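The issue is that checking whether an item is present in a list is expensive: Python has to scan the list element by element, while a set uses hashing to find the item directly.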

In [268]:
numbers = list(range(1000000))
numbers_set = set(numbers)
n = -1

In [269]:
%timeit n in numbers

3.57 ms ± 107 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [270]:
%timeit n in numbers_set

30.1 ns ± 1.41 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)
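
One caveat with `unique2`: converting to a set does not keep the original order of the elements. A sketch of a variant (called `unique3` here, a name not used elsewhere in these notes) that keeps the order of the naive version while using a set for the fast membership check:

```python
def unique3(values):
    seen = set()      # fast membership checks
    result = []       # keeps first-seen order
    for v in values:
        if v not in seen:
            seen.add(v)
            result.append(v)
    return result

print(unique3(["a", "b", "d", "c", "a"]))
```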


#### Dictionary & Set Comprehensions

In [272]:
{f: os.path.getsize(f) for f in os.listdir()}

{'add.py': 19,
 'ls.py': 61,
 '.ipynb_checkpoints': 4096,
 'sq.py': 69,
 'sentences.txt': 101,
 'day1.ipynb': 85466,
 'three.txt': 14,
 'echo.py': 67,
 'args.py': 28,
 'day2_files': 4096,
 'day2.ipynb': 97806,
 'greetings.txt': 42}

In [273]:
{f.split(".")[1] for f in os.listdir() if "." in f}

{'ipynb', 'ipynb_checkpoints', 'py', 'txt'}

In [274]:
[f.split(".")[1] for f in os.listdir() if "." in f]

['py',
 'py',
 'ipynb_checkpoints',
 'py',
 'txt',
 'ipynb',
 'txt',
 'py',
 'py',
 'ipynb',
 'txt']

In [275]:
set([f.split(".")[1] for f in os.listdir() if "." in f])

{'ipynb', 'ipynb_checkpoints', 'py', 'txt'}
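
Note that `f.split(".")[1]` picks the text after the first dot, so a name like `archive.tar.gz` would give `tar`, and the hidden directory `.ipynb_checkpoints` is reported as an extension. The standard library function `os.path.splitext` handles both cases. A sketch with made-up file names:

```python
import os

filenames = ["add.py", "archive.tar.gz", ".ipynb_checkpoints", "README"]

for name in filenames:
    root, ext = os.path.splitext(name)
    print(name, "->", repr(ext))

# Collect the distinct non-empty extensions.
extensions = {os.path.splitext(name)[1] for name in filenames}
extensions.discard("")
print(sorted(extensions))
```

Unlike the `split` version, `splitext` keeps the leading dot in the extension.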

## Writing Professional Command-line Applications

In [277]:
!cat files/words.txt

one
one two
one two three
one two three four
one two three four five
two three four five
three four five
four five
five
one-two-three-four-five-six-seven


In [276]:
!wc files/words.txt

 10  26 154 files/words.txt


In [278]:
!wc -l files/words.txt

10 files/words.txt


In [279]:
!wc --help

Usage: wc [OPTION]... [FILE]...
  or:  wc [OPTION]... --files0-from=F
Print newline, word, and byte counts for each FILE, and a total line if
more than one FILE is specified.  A word is a non-zero-length sequence of
printable characters delimited by white space.

With no FILE, or when FILE is -, read standard input.

The options below may be used to select which counts are printed, always in
the following order: newline, word, character, byte, maximum line length.
  -c, --bytes            print the byte counts
  -m, --chars            print the character counts
  -l, --lines            print the newline counts
      --files0-from=F    read input from the files specified by
                           NUL-terminated names in file F;
                           If F is - then read names from standard input
  -L, --max-line-length  print the maximum display width
  -w, --words            print the word counts
      --total=WHEN       when to print a line with total counts;
                 

### Example: hello.py

Let's write a simple program that says hello given a name.

In [285]:
%%file hello.py
import argparse

p = argparse.ArgumentParser()
p.add_argument("name", help="name to greet")
p.add_argument("-r", "--repeats", type=int, default=1,
               help="number of times to repeat the message")

args = p.parse_args()
print(args)

for i in range(args.repeats):
    print("Hello", args.name)

Overwriting hello.py


In [286]:
!python hello.py Alice

Namespace(name='Alice', repeats=1)
Hello Alice


In [287]:
!python hello.py

usage: hello.py [-h] [-r REPEATS] name
hello.py: error: the following arguments are required: name


In [288]:
!python hello.py --help

usage: hello.py [-h] [-r REPEATS] name

positional arguments:
  name                  name to greet

options:
  -h, --help            show this help message and exit
  -r REPEATS, --repeats REPEATS
                        number of times to repeat the message


In [289]:
!python hello.py -r 3 Alice

Namespace(name='Alice', repeats=3)
Hello Alice
Hello Alice
Hello Alice


In [290]:
%load_problem skip-cmd

In [311]:
%%file skip.py
import argparse
p = argparse.ArgumentParser()
p.add_argument("filename", help="name of the file to read")
p.add_argument("-n", type=int, default=5, 
               help="number of lines to skip")
args = p.parse_args()
# print(args)

lines = open(args.filename).readlines()
lines = lines[args.n:]
# print(lines)
for line in lines:
    print(line, end="")


Overwriting skip.py


In [312]:
!python skip.py -n 3 files/ten.txt 

4
5
6
7
8
9
10


In [313]:
!python skip.py --help

usage: skip.py [-h] [-n N] filename

positional arguments:
  filename    name of the file to read

options:
  -h, --help  show this help message and exit
  -n N        number of lines to skip


In [314]:
%verify_problem skip-cmd

✓ python skip.py /opt/files/ten.txt
✓ python skip.py -n 8 /opt/files/ten.txt
✓ python skip.py -n 2 /opt/files/abcd.txt
🎉 Congratulations! You have successfully solved problem skip-cmd!!


#### Boolean Flags

Most flags take an argument, but some flags don't need one. For example, `-n` of `head` takes a number, while `-l` of `wc` takes nothing.

In [315]:
!head -n 5 files/ten.txt

1
2
3
4
5


In [316]:
!wc -l files/ten.txt

10 files/ten.txt


Let's add a boolean flag `-u` or `--uppercase` to convert the message into upper case.

In [322]:
%%file hello2.py
import argparse

p = argparse.ArgumentParser()
p.add_argument("name", help="name to greet")
p.add_argument("-r", "--repeats", type=int, default=1,
               help="number of times to repeat the message")
p.add_argument("-u", "--uppercase", action="store_true", 
               default=False,
               help="display the message in uppercase")

args = p.parse_args()
print(args)

msg = "Hello " + args.name

if args.uppercase:
    msg = msg.upper()

for i in range(args.repeats):
    print(msg)

Overwriting hello2.py


In [323]:
!python hello2.py --help

usage: hello2.py [-h] [-r REPEATS] [-u] name

positional arguments:
  name                  name to greet

options:
  -h, --help            show this help message and exit
  -r REPEATS, --repeats REPEATS
                        number of times to repeat the message
  -u, --uppercase       display the message in uppercase


In [325]:
!python hello2.py Alice -r 3 --uppercase

Namespace(name='Alice', repeats=3, uppercase=True)
HELLO ALICE
HELLO ALICE
HELLO ALICE
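
While developing, it can be handy to try a parser without going to the shell: `parse_args` accepts an explicit list of arguments. A sketch that rebuilds the parser of `hello2.py` inside a function (`build_parser` is a name invented here):

```python
import argparse

def build_parser():
    p = argparse.ArgumentParser()
    p.add_argument("name", help="name to greet")
    p.add_argument("-r", "--repeats", type=int, default=1,
                   help="number of times to repeat the message")
    p.add_argument("-u", "--uppercase", action="store_true",
                   help="display the message in uppercase")
    return p

parser = build_parser()

# Flag absent: store_true defaults to False.
args = parser.parse_args(["Alice"])
print(args.uppercase, args.repeats)

# Flag present: no value follows -u.
args_upper = parser.parse_args(["Alice", "-u", "-r", "2"])
print(args_upper.uppercase, args_upper.repeats)
```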


## Working with Files

We have already seen how to read files.

In [326]:
!cat files/three.txt

one
two
three


In [327]:
open("files/three.txt").read()

'one\ntwo\nthree\n'

In [328]:
open("files/three.txt").readlines()

['one\n', 'two\n', 'three\n']

### Writing to a File

To write to a file, we need to open it in write mode.

In [329]:
f = open("a.txt", "w")
f.write("one\n")
f.write("two\n")
f.close()

In [330]:
open("a.txt").read()

'one\ntwo\n'

It is important to close a file after writing. Written data is usually held in a buffer and may not appear in the file until the file is flushed or closed.

In [335]:
!ls -l a.txt

-rw-r--r-- 1 jupyter-anand jupyter-anand 8 May 14 11:19 a.txt


In [336]:
f = open("a.txt", "w")

In [337]:
!ls -l a.txt

-rw-r--r-- 1 jupyter-anand jupyter-anand 0 May 14 11:20 a.txt


In [338]:
f.write("one\n")
f.write("two\n")

4

In [339]:
!ls -l a.txt

-rw-r--r-- 1 jupyter-anand jupyter-anand 0 May 14 11:20 a.txt


In [340]:
f.close()

In [341]:
!ls -l a.txt

-rw-r--r-- 1 jupyter-anand jupyter-anand 8 May 14 11:20 a.txt
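
When a file has to stay open, `flush` pushes the buffered data out without closing it. A sketch using a temporary directory so that no existing file is touched (the file name is arbitrary):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    f = open(path, "w", newline="\n")   # newline="\n" keeps "\n" as one byte on every platform
    f.write("one\n")
    f.write("two\n")
    f.flush()                            # hand the buffered data to the operating system
    size_after_flush = os.path.getsize(path)
    f.close()

print(size_after_flush)
```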


#### Appending to a file

To append to an existing file, we need to open it in append (`a`) mode.

In [342]:
f = open("a.txt", "a")
f.write("three\n")
f.write("four\n")
f.close()

In [343]:
!cat a.txt

one
two
three
four


### The `with` statement

In [350]:
with open("a.txt", "w") as f:
    f.write("one\n")
    f.write("two\n")
# f gets closed automatically

In [351]:
open("a.txt").read()

'one\ntwo\n'
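
The main benefit of `with` is that the file is closed even when the block raises an exception. A sketch that simulates a failure in the middle of writing (a temporary directory is used so that no real file is touched):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.txt")
    try:
        with open(path, "w") as f:
            f.write("one\n")
            raise ValueError("simulated failure")
    except ValueError:
        pass

    closed_after_error = f.closed          # the file was closed on the way out
    with open(path) as g:
        content = g.read()                 # data written before the error was flushed

print(closed_after_error)
print(repr(content))
```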

#### Binary Data and Binary Files

Python has a `bytes` type for binary data, just as it has `str` for text.

In [344]:
a = "hello"

In [345]:
type(a)

str

In [346]:
b = b"hello"

In [347]:
type(b)

bytes

In [348]:
data = b"hello\x01\x02\x03"

In [349]:
data[0]

104
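
Text and bytes are converted into each other with `encode` and `decode`. A sketch, assuming UTF-8 as the encoding (the sample text is made up):

```python
text = "héllo"

encoded = text.encode("utf-8")     # str -> bytes
print(encoded)
print(len(text), len(encoded))     # 5 characters, but 6 bytes: "é" takes two bytes in UTF-8

decoded = encoded.decode("utf-8")  # bytes -> str
print(decoded == text)

print(list(b"hi"))                 # indexing or iterating over bytes gives integers
```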

In [353]:
with open("data.bin", "wb") as f:
    f.write(data)

In [354]:
!ls -l data.bin

-rw-r--r-- 1 jupyter-anand jupyter-anand 8 May 14 11:26 data.bin


In [359]:
open("data.bin", "rb").read()

b'hello\x01\x02\x03'

In [360]:
!ls -l files/python.png

-rw-rw-r-- 1 jupyter-anand jupyter-anand 11155 Nov 15 11:11 files/python.png


In [361]:
# this will fail
data = open("files/python.png").read()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

In [364]:
data = open("files/python.png", "rb").read()

In [365]:
type(data)

bytes

In [366]:
data[:100]

b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x02Y\x00\x00\x00\xcb\x08\x06\x00\x00\x00]\xc9\x86&\x00\x00\x00\x04sBIT\x08\x08\x08\x08|\x08d\x88\x00\x00\x00\tpHYs\x00\x00\n\xf0\x00\x00\n\xf0\x01B\xac4\x98\x00\x00\x00\x16tEXtCreation Time\x0006/05/04'

## The `pathlib` module

In [367]:
from pathlib import Path

In [372]:
root = Path("files")

In [373]:
root

PosixPath('files')

In [374]:
root.absolute()

PosixPath('/opt/zeomega-python-2024/live-notes/files')

In [375]:
!ls files

10.txt		     empty.txt		orders2.txt  ten.txt
1234.txt	     extcount		orders.txt   three.txt
50.txt		     five.txt		poems	     tryst.txt
abc		     images		prices.txt   words2.txt
abcd.txt	     images.zip		python.png   words.txt
animals.txt	     large-files	quotes.txt   zen-of-python.txt
blake.txt	     leading-space.txt	rsync
bumper-stickers.txt  names.txt		sort
bytes.bin	     numbers		sumfile


In [376]:
root.joinpath("ten.txt")

PosixPath('files/ten.txt')

In [377]:
p = root.joinpath("ten.txt")

In [378]:
p

PosixPath('files/ten.txt')
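
Paths can also be joined with the `/` operator, which is equivalent to `joinpath`. A sketch using `PurePosixPath`, which only manipulates path strings and never touches the file system (the nested file name `a.txt` is hypothetical):

```python
from pathlib import PurePosixPath

root = PurePosixPath("files")

p1 = root.joinpath("ten.txt")
p2 = root / "ten.txt"               # same result as joinpath
print(p1 == p2)

nested = root / "poems" / "a.txt"   # several parts can be chained
print(nested)
print(nested.with_suffix(".md"))    # replace the extension
```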

In [379]:
p.name

'ten.txt'

In [380]:
p.parent

PosixPath('files')

In [381]:
p.is_file()

True

In [382]:
p.exists()

True

In [383]:
p.suffix

'.txt'

In [384]:
p.stem

'ten'

In [385]:
p.read_text()

'1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n'

In [386]:
p.stat()

os.stat_result(st_mode=33204, st_ino=1053127, st_dev=2048, st_nlink=1, st_uid=1003, st_gid=1005, st_size=21, st_atime=1715680973, st_mtime=1700046666, st_ctime=1715599054)

In [387]:
p.stat().st_size

21

Let's say we want to find the largest file in the files directory.

In [388]:
files = os.listdir("files")

In [389]:
files

['poems',
 'words.txt',
 'prices.txt',
 'quotes.txt',
 'sumfile',
 'animals.txt',
 'abcd.txt',
 '10.txt',
 'empty.txt',
 '50.txt',
 'bumper-stickers.txt',
 'ten.txt',
 'names.txt',
 'five.txt',
 'large-files',
 'numbers',
 'three.txt',
 'orders2.txt',
 'zen-of-python.txt',
 'tryst.txt',
 'images.zip',
 'python.png',
 'bytes.bin',
 'rsync',
 'leading-space.txt',
 'sort',
 'words2.txt',
 'blake.txt',
 '1234.txt',
 'extcount',
 'orders.txt',
 'abc',
 'images']

In [390]:
os.path.getsize("words.txt")

FileNotFoundError: [Errno 2] No such file or directory: 'words.txt'

In [392]:
path = os.path.join("files", "words.txt")
os.path.getsize(path)

154

In [393]:
root

PosixPath('files')

In [395]:
paths = [p for p in root.iterdir() if p.is_file()]

In [396]:
def getsize(p):
    return p.stat().st_size

In [397]:
max(paths, key=getsize)

PosixPath('files/images.zip')
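
The same steps as a self-contained sketch that can be run anywhere: it creates a few files of known sizes in a temporary directory (the names and sizes are made up) and picks the largest one.

```python
import tempfile
from pathlib import Path

def getsize(p):
    return p.stat().st_size

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "small.txt").write_bytes(b"x" * 10)
    (root / "large.bin").write_bytes(b"x" * 1000)
    (root / "medium.txt").write_bytes(b"x" * 100)
    (root / "subdir").mkdir()          # directories are skipped by is_file()

    paths = [p for p in root.iterdir() if p.is_file()]
    largest = max(paths, key=getsize)
    largest_name = largest.name
    largest_size = getsize(largest)

print(largest_name, largest_size)
```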