# Everything in Python is an object

Python is an object-oriented programming language, i.e. it is built around objects.

An object is simply a collection of data (variables) and methods (functions) that act on those data.

A class is a blueprint for that object, i.e. it defines the structure of the object.

## The basic datatypes!

The builtin data types you've been using are all objects!

In [10]:
# When you've been converting datatypes,
# You've actually been calling the constructors of these built-in types
print(type(str))
print(type(int))
print(type(bool))

<class 'type'>
<class 'type'>
<class 'type'>


In [11]:
print(type('abcde'))
print(type(2.0))
print(type(2))

<class 'str'>
<class 'float'>
<class 'int'>


In [13]:
# The dir() function returns the names of all attributes and methods of the given object, without their values.
dir(2.0)

['__abs__',
 '__add__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getformat__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__le__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rmod__',
 '__rmul__',
 '__round__',
 '__rpow__',
 '__rsub__',
 '__rtruediv__',
 '__set_format__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 'as_integer_ratio',
 'conjugate',
 'fromhex',
 'hex',
 'imag',
 'is_integer',
 'real']

In [14]:
print(2.01.is_integer())
print(2.0.is_integer())

False
True


### Even functions are objects!

* Functions are objects as well. 
* They are nothing special: when you call them, you are just calling their ``__call__`` method.
* You can assign functions to variables, put them in lists, pass them as arguments to other functions - anything you can do with a number or a string.
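
Here is a small sketch of the last point, storing functions in a list and passing them as arguments. The helper names ``shout``, ``whisper`` and ``apply_to`` are made up for this illustration.

```python
# Hypothetical helper functions, used only for this illustration
def shout(text):
    return text.upper() + "!"

def whisper(text):
    return text.lower() + "..."

def apply_to(func, text):
    # func is just an object we were handed; calling it runs its logic
    return func(text)

# functions stored in a list, like any other value
styles = [shout, whisper]
results = [apply_to(style, "Hello") for style in styles]
print(results)
```

Notice that we write ``shout`` and not ``shout()`` when putting it in the list: without the brackets we are referring to the function object itself, not calling it.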

In [22]:
def greeter(name):
    print("Hello "+name)

In [23]:
print(type(greeter))

print(dir(greeter))

<class 'function'>
['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']


In [24]:
x=greeter
print(x.__name__)
print(x.__code__)

greeter
<code object greeter at 0x7fac7425a7c0, file "<ipython-input-22-80612c9eab1d>", line 1>


In [25]:
# Two ways to call a function; the first syntax is the one normally used
x("Prateek")
x.__call__("Eckovation")

Hello Prateek
Hello Eckovation


### Don't get intimidated, it's just data + logic!

I hope this doesn't confuse you! It's very simple,
* In Python, everything is an object - this means that everything packages some data + some logic with it.
* For example, a number packages the number's value & lots of logic defining what the absolute value of the number is, how to add two numbers, how to take powers etc. Just like our Family class earlier.
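
To make the "data + logic" idea concrete, here is a small sketch that calls a number's methods directly. You would normally never write it this way; it is only meant to show that the familiar operators end up in methods of the ``int`` object.

```python
n = -7
# abs(n) calls the number's __abs__ method behind the scenes
print(abs(n), n.__abs__())
# n + 3 calls __add__, n ** 2 calls __pow__
print(n + 3, n.__add__(3))
print(n ** 2, n.__pow__(2))
```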

### Commonly used data structures to store objects/data: lists, sets, tuples, dictionaries

## 1. Lists

In [26]:
# Lists are obviously objects
mylist=[22,34,2, 56,77]
# Check out the index method
print(mylist)
print(mylist.index(34))


[22, 34, 2, 56, 77]
1


In [27]:
list_mix = [1, 'hello', 'prateek', 5.0]
# List indexing, starts from 0 and goes to length - 1
# Length can be calculated using len()
print(list_mix[0])
print(list_mix[1])

1
hello


In [28]:
# Let's check its type using the built-in type() function
print(type(mylist))
print(type(list_mix))

<class 'list'>
<class 'list'>


In [29]:
# We can use dir to see all attributes
dir(mylist)

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

In [34]:
# Using the built-in len() function and negative indexing on a list.

print('length:'+ str(len(mylist)))
print(mylist)
print(mylist[-2])

length:5
[22, 34, 2, 56, 77]
56


In [35]:
mylist.sort()
print(mylist)

[2, 22, 34, 56, 77]


### Note on variable assignments and garbage collection

In [37]:
# The id() function returns an integer that identifies the given object. Every object in Python has its own id,
# unique for as long as the object lives. In CPython the id is the object's memory address, so it will usually differ between runs.

x=[22,33]
print(id(x))
y=x
print(id(y))

140378719599040
140378719599040


* Clearly, the earlier logic for variable assignments holds true for all objects.
* A variable is just a name for an object.
* In the above example, when we write ``y=x``, now ``y`` is another name for the list ``x`` refers to.

In [39]:
x=5
print(y)

print(id(x))
print(id(y))

[22, 33]
94114616708544
140378719599040


* We can now make x refer to something else...
* We can still refer to the list via ``y``.

In [40]:
y=2
print(id(y))

94114616708448


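* But now, no name refers to the list any more, so the list is gone.
* Once an object has no more names (references), Python is free to delete it. CPython, the standard interpreter, does so right away.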
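
If you want to watch this happen, here is a minimal sketch using the standard ``weakref`` module. The ``Box`` class is hypothetical and only exists because plain lists cannot be weakly referenced. The immediate clean-up shown at the end is CPython behaviour; other Python implementations may delete the object later.

```python
import weakref

class Box:
    """A tiny hypothetical class; plain lists cannot be weakly referenced."""
    def __init__(self, items):
        self.items = items

x = Box([22, 33])
y = x                      # second name for the same object
watcher = weakref.ref(x)   # lets us peek without keeping the object alive

x = 5                      # one name gone, the Box is still reachable via y
still_alive = watcher() is not None

y = 2                      # last name gone
# In CPython the object is reclaimed right away; other implementations may wait
gone = watcher() is None
print(still_alive, gone)
```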

# More on Strings

* So you now know that strings are objects as well. 
* So the first thing you should ask is : Does it have any interesting methods?

You can use ``dir`` to find out!

In [45]:
dir("bdkjbcksn")

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isascii',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']


* Wow, lots of functions. 
* But you're going to need them - because you will need to work with strings a lot!

**Checking strings**

In [46]:
print("New Delhi".startswith("new"))
print("225".isdigit())
print("Attributes".islower())

False
True
False


**Transforming strings**

In [47]:
print("PratEEK".lower())
print("PRAteek".upper())
print("praTEEK".capitalize())

prateek
PRATEEK
Prateek


### Some methods very useful for data processing

In [49]:
x="This is a sentence I want to break into parts"
parts=x.split()
print(parts)
print(x.split("s"))

['This', 'is', 'a', 'sentence', 'I', 'want', 'to', 'break', 'into', 'parts']
['Thi', ' i', ' a ', 'entence I want to break into part', '']


In [51]:
print(" ".join(parts))

This is a sentence I want to break into parts


In [24]:
x="    Remove all the extra space    "
print(x)
print("|"+x.lstrip()+"|")
print("|"+x.rstrip()+"|")
print("|"+x.strip()+"|")

    Remove all the extra space    
|Remove all the extra space    |
|    Remove all the extra space|
|Remove all the extra space|


In [61]:
x="I want to capitalize every a in here"
print(x.replace("capitalize","-"))

I want to - every a in here


Tip :
* Practice manipulating strings - very useful for software development. 
* But even for ML, you have to read files which are strings. You can't escape strings!

In [54]:
x = "/home/prateek/lectures/revision1.ipynb"
print(x)
print(x.replace("ipynb", "pdf"))

/home/prateek/lectures/revision1.ipynb
/home/prateek/lectures/revision1.pdf


### But can you change strings?

In [57]:
# you can do this with lists
x = [12, 1 ,"abc"]
print(x[1])
x[1]="one"
print(x)

1
[12, 'one', 'abc']


In [58]:
# can you do the same thing with strings?
x="abcdefgh"
print(x)
x[0]="z"

abcdefgh


TypeError: 'str' object does not support item assignment

* No, strings are immutable objects - you can't edit them once they have been created.
* Nothing complicated: anything that seems to change a string actually builds a new one.
* Numbers and booleans are also immutable objects.

In [62]:
# but then doesn't this mutate the string?
x="Ravina"
print(x.lower())
print(x)

ravina
Ravina


Nope, it's just returning a new string.
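
So how do you get a "changed" string? You build a new one from pieces of the old one. A minimal sketch:

```python
x = "abcdefgh"
# slicing and concatenation create a brand new string object
z = "z" + x[1:]
print(z)
print(x)
# replace() with a count of 1 is another option when you know the character
print(x.replace("a", "z", 1))
```

The original ``x`` is untouched in both cases. If you want to keep the result, assign it to a name.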

# File I/O

* In this section you will learn how to read and write to files.
* For this section, the files that you are dealing with will be text files.

###  A Note on files 

* Generally, there are two kinds of files - binary and text files.
* .py, .txt, .c - anything that can be read in a plain text editor is a text file.
* .docx files look like documents full of text, but they are actually binary: a .docx is a ZIP archive that contains XML files. If you're curious you can try unzipping one afterwards! :)
* .jpeg, .jpg, .exe, .mp3, .wav etc. are generally binary files - although they might have some text content.
* In the end everything is made up of sequences of bytes - a byte is an 8-bit number like 01010101 (it could be written in hex or octal, it doesn't matter, it's still an 8-bit number).
* But when the file is a text file, the bytes are mapped to characters through an encoding such as ASCII or UTF-8 (an encoding of Unicode).
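
A quick sketch of the characters-versus-bytes idea, using UTF-8 as the encoding. The sample word is just an assumption chosen because it contains one non-ASCII character.

```python
text = "h\u00e9llo"             # the word "héllo", written with an escape
raw = text.encode("utf-8")      # str -> bytes
print(raw)
print(len(text), len(raw))      # 5 characters, but 6 bytes
print(raw.decode("utf-8"))      # bytes -> str
```

The accented letter needs two bytes in UTF-8, which is why the byte count is larger than the character count.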

### Reading from a file

* The ``open()`` function returns a file object that exposes methods to read the file.
* How does it actually read the file from the disk? Don't know, don't care! That's the beauty of abstraction (hiding implementation details behind an interface).

There are two ways you can open a file.

In [63]:
input_file=open("test_file.txt","r")

print(input_file.read())
print('----------------')
input_file.close()

This is a test file for python revision 2.
-Prateek Manocha
-Eckovation


----------------


* When you open a file, this is a transaction between Python and the OS - Windows/Mac/Linux. 
* The OS gives Python a **handle** or connection to the file. 
* It's Python's responsibility to **close that connection**.
* If you forget to close files, **bad things might happen**. For example, your program might become slow, especially if you're opening too many files. See this [SO post](https://stackoverflow.com/questions/25070854/why-should-i-close-files-in-python) for more examples. 
* In general, it's just bad programming.

So I recommend you always use this syntax

In [68]:
# the file will be automatically closed once you reach the end of the block
with open("test_file.txt","r") as input_file:
    print(input_file.readline())

This is a test file for python revision 2.



In [69]:
print(input_file.readline())

ValueError: I/O operation on closed file.
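
For the curious: the ``with`` block behaves roughly like the ``try``/``finally`` pattern sketched below. The temporary file and its contents are assumptions, used only so the sketch does not depend on ``test_file.txt``.

```python
import os
import tempfile

# a throwaway file, so the sketch does not depend on test_file.txt
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as fo:
    fo.write("first line\nsecond line\n")

input_file = open(path, "r")
try:
    first = input_file.readline()
finally:
    # runs even if readline() raises, so the handle is always released
    input_file.close()

print(repr(first))
print(input_file.closed)
```

The ``with`` version is shorter and harder to get wrong, which is why it is the recommended one.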

* What is the second argument to open()?
* It's the mode of opening the file.
* "r" mode = open for reading. Actually this is same as "rt" = read text
* "rb" is for reading binary files

* But now lets look at the methods that let us actually read a text file.
* While reading a file, a "line" is a string that ends with a newline character i.e. "\n".

In [70]:
with open("test_file.txt","r") as fi:
    lines=[]
    # readline returns "" at the end of the file
    line=fi.readline()
    while line !="":
        lines.append(line)
        line=fi.readline()
print(lines)

['This is a test file for python revision 2.\n', '-Prateek Manocha\n', '-Eckovation\n', '\n']


* Notice, the newlines are preserved.

In [71]:
with open("test_file.txt","rt") as fi:
    lines=fi.readlines()
print(lines)

['This is a test file for python revision 2.\n', '-Prateek Manocha\n', '-Eckovation\n', '\n']


* You can also pass a size hint to ``readlines()``: it keeps reading whole lines and stops once the total size read so far exceeds the hint.

In [73]:
with open("test_file.txt","rt") as fi:
    # read whole lines until their total size exceeds this hint
    lines=fi.readlines(64)
print(lines)
# curious people can check out below syntax
# its called "list comprehension"
print([len(line) for line in lines])

['This is a test file for python revision 2.\n', '-Prateek Manocha\n', '-Eckovation\n']
[43, 17, 12]


In [75]:
with open("test_file.txt","rt") as fi:
    # you can read character by character
    i=0
    while i <10:
        # read just 2 characters at a time (in text mode, read() counts characters)
        print(fi.read(2))
        i+=1

Th
is
 i
s 
a 
te
st
 f
il
e 


Finally the most common way to read a file, using a for loop!

In [80]:
with open("test_file.txt","rt") as fi:
    i=0
    for line in fi:
        print(line)
        if i==0:
            break
        i+=1

This is a test file for python revision 2.



### Writing to a file

* The difference between reading and writing is just the **mode** and the methods!
* With the "w" mode, ``open`` will open a file for writing.
* If the file doesn't exist, a new file will be created.

In [84]:
with open("outfile.txt","w") as fo:
    fo.write("Hello my name is Prateek\n")
    fo.write("Unless we add \n There is no ")
    fo.write("automatic insertion of newline")

Now let's see what we wrote.

In [85]:
with open("outfile.txt","r") as fi:
    print(fi.read())

Hello my name is Prateek
Unless we add 
 There is no automatic insertion of newline


In [86]:
with open("outfile.txt","w") as fo:
    fo.write("This will overwrite the original stuff.\n")
    
with open("outfile.txt","r") as fi:
    print(fi.read())

This will overwrite the original stuff.



* "w" mode deletes everything in the file when it opens it.
* So what do we do, are we forced to write everything at one go?
* Nope, we can use the "a" or append mode

In [87]:
with open("outfile.txt","a") as fo:
    fo.write("This will be added to the original stuff")

with open("outfile.txt","r") as fi:
    print(fi.read())

This will overwrite the original stuff.
This will be added to the original stuff


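* Mode matters! You can't use ``read()`` on a file opened with "w" or "a" mode.
* You can't use ``write()`` on a file opened with "r".

In [88]:
# writing works in "w" mode; write() returns the number of characters written
# calling fi.read() here instead would raise io.UnsupportedOperation
with open("outfile","w") as fi:
    print(fi.write("Won't work"))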

10
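
To actually see the mode restriction in action, here is a small sketch that tries the "wrong" operation in each mode and catches the resulting error. The temporary file is an assumption so that nothing in your working directory gets touched.

```python
import io
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w") as fo:
    fo.write("some text\n")
    try:
        fo.read()                      # reading from a write-only file
    except io.UnsupportedOperation as err:
        read_error = str(err)

with open(path, "r") as fi:
    try:
        fi.write("more text")          # writing to a read-only file
    except io.UnsupportedOperation as err:
        write_error = str(err)

print(read_error)
print(write_error)
```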


# More on lists

So you already know how to 

* create lists
* index lists
* slice lists
* update value at an index of a list

Here's some more important stuff

### Appending to lists

In [89]:
empty_list=[]
print(dir(empty_list))

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']


In [90]:
print(empty_list)

empty_list.append(2)
empty_list.append(3.0)
empty_list.append("first")
print(empty_list)

[]
[2, 3.0, 'first']


### Concatenating lists

You can add 2 (or more) lists together, and they are concatenated just like strings.

In [92]:
l1=[25, 27, 28] 
l2=[2,5,1]
print(l1+l2+l1+l1)

[25, 27, 28, 2, 5, 1, 25, 27, 28, 25, 27, 28]


You can even multiply lists!

In [93]:
l3=l1*3   # Equivalent to l3 = l1 + l1 + l1
print(l3)

[25, 27, 28, 25, 27, 28, 25, 27, 28]


But I wanted to show you this because it doesn't exactly work as expected...

In [98]:
x=[[22,33],12,[22]]
y=x*3
print(y)
x[0][0]=11
print(x)
print(y)

[[22, 33], 12, [22], [22, 33], 12, [22], [22, 33], 12, [22]]
[[11, 33], 12, [22]]
[[11, 33], 12, [22], [11, 33], 12, [22], [11, 33], 12, [22]]
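
What is going on? Multiplying a list does not copy the objects inside it, it only repeats the references to them. A minimal sketch of the effect, and of one common way around it (a list comprehension that builds a fresh inner list each time):

```python
inner = [22, 33]
repeated = [inner] * 3
# all three slots point at the very same list object
print(repeated[0] is repeated[1] is repeated[2])

inner[0] = 11
print(repeated)

# a list comprehension builds a fresh inner list on every pass
independent = [[22, 33] for _ in range(3)]
independent[0][0] = 11
print(independent)
```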


In [99]:
y1 = x.copy()
print(y1)
x[0] = 1
print(y1)

[[11, 33], 12, [22]]
[[11, 33], 12, [22]]


You can even do this...

In [100]:
x=[99,4]
print(x)
x+="hello" # same as x.extend("hello"), i.e. x = x + ['h','e','l','l','o']
print(x)

[99, 4]
[99, 4, 'h', 'e', 'l', 'l', 'o']


* Why? I'll leave that answer to you.
* Hint: why can you use a for loop on both strings and lists?
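
If you want to experiment with the hint, this sketch may help. It shows that ``+=`` on a list behaves like ``extend()`` and accepts anything you can loop over, while plain ``+`` only accepts another list.

```python
x = [99, 4]
x.extend("hi")            # extend() accepts any iterable, like += does
print(x)

x += (1, 2)               # a tuple works too
print(x)

try:
    x = x + "hi"          # plain + insists on another list
except TypeError as err:
    print("TypeError:", err)
```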

### How to delete an item?

In [104]:
# remove() is a method of the list class; it deletes the first matching value
# del is a Python statement (not a function); it can delete names, list items or slices of a list
x=[9,5,6,8,2,44,1, 9]
print(x)
x.remove(9)
x.remove(9)
print(x)
x=[9,5,6,8,2,44,1]
del x[1]
print(x)
x=[9,5,6,8,2,44,1]
del x[1:2]
print(x)

[9, 5, 6, 8, 2, 44, 1, 9]
[5, 6, 8, 2, 44, 1]
[9, 6, 8, 2, 44, 1]
[9, 6, 8, 2, 44, 1]


### Note on operators (Advanced)

* You might have seen that lists support the + and * operators and thought, oh, this must just be how lists are.
* Nope! It's all defined somewhere. The creators wanted lists to behave this way.
* In fact, operators like + are implemented as **methods** on the class they work on.
* Try looking at the objects using ``dir`` and guessing which methods implement the operators.

In [107]:
x=[44]
x=x.__add__([22])  # x = x + [22]
print(x)

x=[44]
x=x+[22]
print(x)

x=[44]
x+=[22]  # in-place version, handled by __iadd__
print(x)

[44, 22]
[44, 22]
[44, 22]
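
The same trick works for your own classes. Below is a minimal sketch with a hypothetical ``Money`` class (not part of any library) that defines ``__add__`` and ``__mul__`` so that ``+`` and ``*`` work on it.

```python
class Money:
    """Hypothetical class, only here to show operator methods."""
    def __init__(self, rupees):
        self.rupees = rupees

    def __add__(self, other):
        # called for: self + other
        return Money(self.rupees + other.rupees)

    def __mul__(self, times):
        # called for: self * times
        return Money(self.rupees * times)

    def __repr__(self):
        return "Money(" + str(self.rupees) + ")"

total = Money(50) + Money(25)
print(total)
print(Money(10) * 3)
```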


# Other datastructures in Python

### Sets

Let's look at sets.

* Sets are like lists - they are collections of items.
* But there is no order.
* And there can be just one copy of each item.
* Just like mathematical sets!

In [109]:
# create a set from scratch using curly braces {} (lists use [])
x = {1, 2, 6, 7, 2, 7, 6}
print(x)

{1, 2, 6, 7}


In [111]:
# from a list
y_list = [1,2,3,1,1,13,4]
print(y_list)
print(set(y_list))

[1, 2, 3, 1, 1, 13, 4]
{1, 2, 3, 4, 13}


* You can do interesting operations on sets

In [113]:
x1={1,2,3,4,5,6}
x2={2,4,6,8}

* Union

In [114]:
# Read as x1 or x2
print(x1 | x2)  

{1, 2, 3, 4, 5, 6, 8}


* Intersection

In [115]:
# Read as x1 and x2
print(x1 & x2)  

{2, 4, 6}


* Set Difference (remove the elements of one set from another)

In [119]:
# Read as x2 minus x1, then x1 minus x2
print(x2 - x1)  
print(x1 - x2)

{8}
{1, 3, 5}


* Check for membership

In [122]:
myset={1,2,4,6,7}
print(1 in myset)
print(2 in myset)
print(8 not in myset)

True
True
True


* You can convert from a set to a list by calling the ``list()`` constructor.

In [60]:
print(x1)
x1_list = list(x1)
print(x1_list)

{1, 2, 3, 4, 5, 6}
[1, 2, 3, 4, 5, 6]


* You can loop through a set using a for loop.
* But **remember the order of the set is not guaranteed**

In [125]:
# sets can have heterogeneous elements too
myset={"abc",1,1.0,3,3,3,5,6,"abc"}
for element in myset:
    print(element)

1
3
5
abc
6


### Tuples

* Tuples are an ordered collection of objects.
* They are immutable - once created, they can't be changed.

In [126]:
x=(1,2,4,5)
print(x)
# you can index tuples
print(x[2])
# you can slice tuples
print(x[2:4])

(1, 2, 4, 5)
4
(4, 5)


In [128]:
# you can't change tuples
x[0]=2

TypeError: 'tuple' object does not support item assignment

* Remember you saw tuples in the functions section?
* If there are multiple return values, they are packed into a tuple first.

In [129]:
def return_initials(first_name,middle_name,last_name):
    return first_name[0],middle_name[0],last_name[0]

print(return_initials("John","Winston","Lennon"))

('J', 'W', 'L')
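
The reverse also works: a tuple can be unpacked into separate names in one assignment. A small sketch, reusing the same function:

```python
def return_initials(first_name, middle_name, last_name):
    return first_name[0], middle_name[0], last_name[0]

result = return_initials("John", "Winston", "Lennon")
print(type(result))

# unpacking: one name per element of the tuple
first, middle, last = result
print(first + "." + middle + "." + last + ".")
```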


In [130]:
# You can also apply for loop on tuples
print(x)
for el in x:
    print(el)

(1, 2, 4, 5)
1
2
4
5


### Dictionaries

* After lists, dictionaries are **the most important pythonic datastructure!**
* Dictionaries are maps from **key** to **value**.

In [131]:
student_ages={ "Jibran" : 24,
                "Maria" : 22,
                "Kaustav" : 23,
                "Somali": 22,
                "Devika": 25,
                "Jibran": 21
               }

* Keys are **unique**: a key can hold only one value, so the second ``"Jibran"`` entry above overwrites the first.
* You can see the unique keys in a dictionary by calling the ``keys()`` method.
* Note that ``keys()`` does not return a list!

In [132]:
dir(student_ages)

['__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'clear',
 'copy',
 'fromkeys',
 'get',
 'items',
 'keys',
 'pop',
 'popitem',
 'setdefault',
 'update',
 'values']

In [133]:
print(student_ages.keys())
# keys() is not a list, but you can convert it to one
print(list(student_ages.keys()))
print(list(student_ages.values()))

dict_keys(['Jibran', 'Maria', 'Kaustav', 'Somali', 'Devika'])
['Jibran', 'Maria', 'Kaustav', 'Somali', 'Devika']
[21, 22, 23, 22, 25]


You can **read and update** the value of a key by indexing the dictionary

In [135]:
# The first value got overwritten.
print(student_ages["Jibran"])
student_ages["Jibran"]=48
print(student_ages)
student_ages["prateek"] = 23
print(student_ages)

21
{'Jibran': 48, 'Maria': 22, 'Kaustav': 23, 'Somali': 22, 'Devika': 25}
{'Jibran': 48, 'Maria': 22, 'Kaustav': 23, 'Somali': 22, 'Devika': 25, 'prateek': 23}


In [136]:
# You can add a new key just by indexing and assigning a value
student_ages["Shubham"]=27
print(student_ages)

{'Jibran': 48, 'Maria': 22, 'Kaustav': 23, 'Somali': 22, 'Devika': 25, 'prateek': 23, 'Shubham': 27}


You can check for membership (w.r.t keys)

In [140]:
print("Jibran" in student_ages)
print(408 in student_ages.values())

True
False


* You can also loop through a dictionary
* But the loop gives you the **keys**, not the values.

In [141]:
print(student_ages['prateek'])

23


In [142]:
for key in student_ages:
    print(key,student_ages[key])

Jibran 48
Maria 22
Kaustav 23
Somali 22
Devika 25
prateek 23
Shubham 27
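
Two dictionary methods that often make this kind of code shorter are ``items()`` and ``get()``. A minimal sketch, using a smaller made-up dictionary so it runs on its own:

```python
student_ages = {"Maria": 22, "Kaustav": 23, "Devika": 25}

# items() yields (key, value) tuples, so we can unpack them in the loop
for name, age in student_ages.items():
    print(name, age)

# get() returns a default instead of raising KeyError for a missing key
print(student_ages.get("Maria"))
print(student_ages.get("Nobody", "unknown"))
```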


* The keys and values can all be heterogeneous. Anything goes!

In [143]:
chaos = { # list within dictionary
        "key1" : [1,2,4,6],
         # dictionary within dictionary
           10: {"hello":True,1.2 : "watsup"},
        # tuples
        (1,2) : "tuples"
        }
print(chaos)

{'key1': [1, 2, 4, 6], 10: {'hello': True, 1.2: 'watsup'}, (1, 2): 'tuples'}


Well, almost anything... 

In [78]:
print(chaos.keys())

chaos[[21]]="Will this work?"

dict_keys(['key1', 10, (1, 2)])


TypeError: unhashable type: 'list'

In [82]:
chaos[{21:"hello"}]="Will this work?"

TypeError: unhashable type: 'dict'

* The reason has to do with mutability and its relationship with hashing. You can read [this StackOverflow post](https://stackoverflow.com/questions/42203673/in-python-why-is-a-tuple-hashable-but-not-a-list) for more on this.
* For now, to be safe, just stick to using numbers, booleans, strings and tuples (of those) as keys for your dictionary.
* Any object is safe as a value.
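
You can check whether something is usable as a key with the built-in ``hash()`` function. This sketch also shows a detail that is easy to miss: a tuple is only hashable if everything inside it is hashable too.

```python
print(hash((1, 2)) == hash((1, 2)))   # equal tuples always hash the same

try:
    hash([1, 2])
except TypeError as err:
    print("TypeError:", err)

# a tuple is only hashable if everything inside it is hashable
try:
    hash((1, [2, 3]))
    nested_ok = True
except TypeError:
    nested_ok = False
print(nested_ok)
```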

### Example : Representing data with lists and dictionaries

**With lists and dictionaries, you can represent PRACTICALLY ANY TYPE OF DATA**

In [144]:
family = {
    
    "father" : {
        "name" : "Joseph D'Souza",
        "age": 55,
        "hobbies": ["cricket","tabla"],
        "employed":True
        
    },
    "mother" : {
        "name" : "Mary D'Souza",
        "age": 50,
        "hobbies": ["guitar","tennis"],
        "employed":True
    },
    "children": [
        {
            "name" : "Mary D'Souza",
            "age": 22,
            "gender" : "female",
            "hobbies": ["guitar","drums"],
            "employed":True
        },
         {
            "name" : "Aron D'Souza",
            "age": 15,
            "gender" : "male",
            "hobbies": ["sketching","swimming"],
            "employed":False
        }
    ],
    "home_address": "Lumbini Avenue, Gachibowli, Hyderabad"
}

In [152]:
print(family['father'])
print(type(family['father']))
print(family['father']['name'])
print(family['mother']['name'])
print(family['children'][0]['name'])

{'name': "Joseph D'Souza", 'age': 55, 'hobbies': ['cricket', 'tabla'], 'employed': True}
<class 'dict'>
Joseph D'Souza
Mary D'Souza
Mary D'Souza


* People coming from JavaScript might find this combination of dictionaries and lists similar to JSON - JavaScript Object Notation.
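
The resemblance is close enough that the standard ``json`` module can convert between the two. A minimal sketch, using one record shaped like the ones in the family example:

```python
import json

record = {
    "name": "Aron D'Souza",
    "age": 15,
    "hobbies": ["sketching", "swimming"],
    "employed": False,
}

text = json.dumps(record)      # Python objects -> JSON text
print(text)
restored = json.loads(text)    # JSON text -> Python objects
print(restored == record)
```

Notice the small differences in the printed text: JSON uses double quotes for strings and writes ``false`` instead of ``False``.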

# More on for loops

* Last time we saw the for loop, you didn't know about functions so we couldn't introduce two important functions.

In [154]:
# The range() function returns a sequence of numbers, 
# starting from 0 by default, and increments by 1 (by default), 
# and stops before a specified number.

# range(start, stop, step)
print(list(range(0,10)))
print(list(range(0,11, 2)))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 2, 4, 6, 8, 10]


In [155]:
for i in range(2,5):
    print(i)

2
3
4


In [156]:
for i in range(2,10,2):
    print(i)

2
4
6
8


In [157]:
# range is an iterable, but not a list.
print(type(range))
print(range(0,10))
# Note :  you can use range to create a list though
print(list(range(0,10)))

<class 'type'>
range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]


In [159]:
# many times it's useful to have the index while iterating
# enumerate gives us (index, element) pairs, one for each element of the list
colors=["red","blue","green","yellow"]
for i,color in enumerate(colors):
    print(i,color)

0 red
1 blue
2 green
3 yellow


* One last point I want to make is that for loops are absolutely 100% representable by while loops.
* Then why do we need the for loop?
* For readability! Consider which of the following examples makes the meaning of the code clearer

In [160]:
students=["jibran","rukmini","ravi","john"]
print("Students of our class")
i=0
while i < len(students):
    print(students[i].capitalize())
    i+=1

Students of our class
Jibran
Rukmini
Ravi
John


In [92]:
students=["jibran","rukmini","ravi","john"]
print("Students of our class")
for student in students:
    print(student.capitalize())

Students of our class
Jibran
Rukmini
Ravi
John


# Practical use case : Reading a data file

* Let's first read a few lines of the data file

In [93]:
with open("MOCK_DATA.csv","r") as fi:
    i=0
    for line in fi:
        print(line)
        if i==5:
            break
        i+=1

id,first_name,last_name,email,gender,ip_address

1,Imojean,Fidock,ifidock0@ebay.com,Female,37.151.142.118

2,Brnaby,Belch,bbelch1@cbslocal.com,Male,80.158.217.23

3,Basile,Killby,bkillby2@sphinn.com,Male,27.36.169.115

4,Sargent,Jakeway,sjakeway3@ted.com,Male,153.39.159.130

5,Ruthi,Calbert,rcalbert4@weebly.com,Female,28.90.57.225



* First line represents the data fields.
* Second line onwards, every line contains the profile of one person.
* Let's decide the data format of one person.

In [94]:
person={
    "id" : 1,
    "first_name":"Imojean",
    "last_name":"Fidock",
    "email":{
        "local_part":"ifidock0",
        "domain":"ebay.com"
    },
    "gender": "Female",
    "ip_address":[37,151,142,118]
}

* This seems like a nice format for getting the data.
* Now let's parse one line, after that we just need to loop over the file.

In [95]:
with open("MOCK_DATA.csv","r") as fi:
    fi.readline()
    line=fi.readline()

print("|"+line+"|")
line=line.strip()
print("|"+line+"|")
parts=line.split(",")
print(parts)
person={}
person["id"]=int(parts[0])
person["first_name"]=parts[1]
person["last_name"]=parts[2]

emailparts=parts[3].split("@")
print(emailparts)
person["email"]={"local_part":emailparts[0],"domain":emailparts[1]}

person["gender"]=parts[4]
person["ip_address"]=[]
ipparts=parts[5].split(".")
for n in ipparts:
    person["ip_address"].append(int(n))
print(person)

|1,Imojean,Fidock,ifidock0@ebay.com,Female,37.151.142.118
|
|1,Imojean,Fidock,ifidock0@ebay.com,Female,37.151.142.118|
['1', 'Imojean', 'Fidock', 'ifidock0@ebay.com', 'Female', '37.151.142.118']
['ifidock0', 'ebay.com']
{'id': 1, 'first_name': 'Imojean', 'last_name': 'Fidock', 'email': {'local_part': 'ifidock0', 'domain': 'ebay.com'}, 'gender': 'Female', 'ip_address': [37, 151, 142, 118]}


In [96]:
persons=[]
with open("MOCK_DATA.csv","r") as fi:
    # discard first line
    fi.readline()
    for line in fi:
        line=line.strip()
        parts=line.split(",")
        
        
        person={}
        person["id"]=int(parts[0])
        person["first_name"]=parts[1]
        person["last_name"]=parts[2]
        emailparts=parts[3].split("@")
        person["email"]={"local_part":emailparts[0],"domain":emailparts[1]}
        person["gender"]=parts[4]
        person["ip_address"]=[]
        ipparts=parts[5].split(".")
        for n in ipparts:
            person["ip_address"].append(int(n))
        
        persons.append(person)

In [97]:
persons[0:3]

[{'id': 1,
  'first_name': 'Imojean',
  'last_name': 'Fidock',
  'email': {'local_part': 'ifidock0', 'domain': 'ebay.com'},
  'gender': 'Female',
  'ip_address': [37, 151, 142, 118]},
 {'id': 2,
  'first_name': 'Brnaby',
  'last_name': 'Belch',
  'email': {'local_part': 'bbelch1', 'domain': 'cbslocal.com'},
  'gender': 'Male',
  'ip_address': [80, 158, 217, 23]},
 {'id': 3,
  'first_name': 'Basile',
  'last_name': 'Killby',
  'email': {'local_part': 'bkillby2', 'domain': 'sphinn.com'},
  'gender': 'Male',
  'ip_address': [27, 36, 169, 115]}]
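
Splitting on commas by hand is fine for this file, but it breaks as soon as a field itself contains a comma. The standard ``csv`` module handles such cases. Below is a sketch of the same parsing with ``csv.DictReader``; the two data rows are copied from the sample above and ``io.StringIO`` stands in for the real file, and the name ``persons_sketch`` is made up so it does not clash with ``persons``.

```python
import csv
import io

# two rows copied from the sample above; io.StringIO stands in for the real file
sample = io.StringIO(
    "id,first_name,last_name,email,gender,ip_address\n"
    "1,Imojean,Fidock,ifidock0@ebay.com,Female,37.151.142.118\n"
    "2,Brnaby,Belch,bbelch1@cbslocal.com,Male,80.158.217.23\n"
)

persons_sketch = []
for row in csv.DictReader(sample):      # the header row becomes the dict keys
    local_part, domain = row["email"].split("@")
    persons_sketch.append({
        "id": int(row["id"]),
        "first_name": row["first_name"],
        "last_name": row["last_name"],
        "email": {"local_part": local_part, "domain": domain},
        "gender": row["gender"],
        "ip_address": [int(n) for n in row["ip_address"].split(".")],
    })

print(persons_sketch[0])
```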

* Now, you can do all sorts of cool stuff with this data.
* For example, you might want to know how many people have an email address at each domain.

In [98]:
domains_count={}
for person in persons:
    dom=person["email"]["domain"]
    if dom not in domains_count:
        domains_count[dom]=0
    domains_count[dom]+=1

In [99]:
print(domains_count)

{'ebay.com': 1, 'cbslocal.com': 1, 'sphinn.com': 1, 'ted.com': 1, 'weebly.com': 1, 'jigsy.com': 1, 'si.edu': 1, 'kickstarter.com': 1, '360.cn': 1, 'google.com.br': 1, 'ustream.tv': 1, 'cornell.edu': 1, 'disqus.com': 1, 'bloglovin.com': 1, 'shinystat.com': 1, 'tuttocitta.it': 1, 'amazon.com': 1, 'smh.com.au': 1, 'globo.com': 1, 'vistaprint.com': 1, 'wikia.com': 1, 'who.int': 1, 'latimes.com': 1, 'google.cn': 1, 'gravatar.com': 1, 'booking.com': 1, 'ox.ac.uk': 1, 'mozilla.com': 1, 'stumbleupon.com': 1, 'cdc.gov': 1, 'hibu.com': 1, 'mapy.cz': 1, 'wikipedia.org': 1, 'flavors.me': 1, 'bandcamp.com': 1, 'hugedomains.com': 1, 'columbia.edu': 1, 'storify.com': 1, 'eepurl.com': 1, 'ocn.ne.jp': 1, 'istockphoto.com': 1, 'alexa.com': 1, 'geocities.com': 1, 'chron.com': 1, 'instagram.com': 1, 'seesaa.net': 1, 'about.me': 1, 'dropbox.com': 1, 'smugmug.com': 1, 'google.co.jp': 1}
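
Counting things is so common that the standard library has a ready-made tool for it, ``collections.Counter``. A small sketch with a made-up list of domains (in the notebook, the list would be built from ``persons``):

```python
from collections import Counter

# a small stand-in list; in the notebook this would come from persons
domains = ["ebay.com", "ted.com", "ebay.com", "weebly.com", "ebay.com"]

domains_count = Counter(domains)
print(domains_count["ebay.com"])
print(domains_count.most_common(1))
```

A ``Counter`` is a dictionary underneath, so everything you learned about dictionaries above still applies to it.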


# Python Modules and Import Statements

* What is a module?
* It's a collection of functions and/or classes.
* Like functions and classes, it is just a larger unit of "packaging".
* Programmers love to package logic at many levels - in fact there is one more level above modules - a "package"!

* Any .py file is automatically a module.
* Let's look at the file ``mymodules.py``.
* It's got some interesting functions and a Cat class. How do I get it here?
* We can use the ``import`` statement.
* There are a few ways we can do this.

**Import the module by name, then we can access its functions and classes as its attributes**

Or you can import **everything**. 
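
The contents of ``mymodules.py`` are not shown here, so this sketch uses the standard library's ``math`` module as a stand-in. The same import styles should work for any module, including your own .py files.

```python
# 1. import the module, then reach into it with a dot
import math
print(math.sqrt(16))

# 2. import just the names you need
from math import sqrt, pi
print(sqrt(16), pi)

# 3. import a module under a shorter alias
import math as m
print(m.floor(2.7))

# 4. "from math import *" pulls in every public name;
#    it works, but makes it hard to tell where a name came from
```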

### Python Standard Library

* Python has a bunch of built-in modules, which you have to import, but don't need to install.

In [105]:
import os

# you can use this to create directories, check if a path exists etc
# basically command line stuff
print(os.getcwd())
print(os.listdir(".."))
print(os.path.exists("./mymodules.py"))
print(os.path.isfile("./mymodules.py"))

/home/prateek/Documents/projects/eckovation/revision 2
['revision 1', 'revision 2']
False
False


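* ``os`` is a module that exposes a submodule, ``os.path``. A module made up of submodules (which may have sub-submodules etc.) is called a package; ``os`` behaves like one here, although technically it is a single module.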
* ``os.path`` is a submodule. ``os.path.exists`` is a function.
* How are packages created? --> Advanced topic.

In [162]:
# make copies of objects
import copy

# normal assignment
x=[22,56,88]
y=x
print(y)
x[0]=2
print(x,y)



[22, 56, 88]
[2, 56, 88] [2, 56, 88]


In [163]:
# copy 
x=[22,56,88]
y=copy.copy(x)
print(y)
x[0]=2
print(x,y)

[22, 56, 88]
[2, 56, 88] [22, 56, 88]


* Exercise : What is shallow copy vs deep copy?
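As a hint for the exercise, here is a small sketch with a nested list (the variable names are just for illustration). The difference only shows up when the list contains other mutable objects:

```python
import copy

nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)       # new outer list, but the SAME inner lists
deep = copy.deepcopy(nested)      # new outer list AND new inner lists

nested[0][0] = 99                 # change an inner list in place
nested.append([5, 6])             # change the outer list

print(nested)    # [[99, 2], [3, 4], [5, 6]]
print(shallow)   # [[99, 2], [3, 4]]
print(deep)      # [[1, 2], [3, 4]]
```

The shallow copy did not see the appended item, but it did see the change inside the shared inner list. The deep copy saw neither.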

In [172]:
# time related functions 
import time

start_time=time.time()
counter=0
for i in range(0,20000):
    counter+=2
print("It takes ",time.time()-start_time,"seconds to do 20000 loops!")

It takes  0.00500798225402832 seconds to do 20000 loops!
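For timing short pieces of code, ``time.perf_counter()`` is usually a better fit than ``time.time()``, because it is a high-resolution clock meant for measuring intervals. A minimal sketch of the same loop:

```python
import time

start = time.perf_counter()      # clock intended for measuring short intervals
counter = 0
for i in range(20000):
    counter += 2
elapsed = time.perf_counter() - start

print(f"It takes {elapsed:.6f} seconds to do 20000 loops!")
```

The exact number will differ from run to run and from machine to machine.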


In [173]:
# save objects as binary files
import pickle

with open("persons.p","wb") as fo:
    pickle.dump(persons,fo)
    
with open("persons.p","rb") as fi:
    unpickled=pickle.load(fi)
print(unpickled[:3])

NameError: name 'persons' is not defined

In [170]:
# save objects as text files
import json

with open("persons.json","wt") as fo:
    json.dump(persons,fo)

with open("persons.json","rt") as fi:
    unjsoned=json.load(fi)
print(unjsoned[:3])

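NameError: name 'persons' is not defined

* Both cells above fail with ``NameError`` because ``persons`` is not defined in this session. Create it (or re-run the cell that creates it) before dumping.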
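Here is a self-contained sketch of both round trips. The ``persons`` list below is hypothetical sample data (the notebook's own ``persons`` is not shown), and the files are written to a temporary folder so nothing is left behind:

```python
import json
import os
import pickle
import tempfile

# Hypothetical sample data, only for illustration.
persons = [
    {"name": "Asha", "age": 31},
    {"name": "Ben", "age": 25},
    {"name": "Chen", "age": 42},
    {"name": "Dana", "age": 19},
]

with tempfile.TemporaryDirectory() as folder:
    pickle_path = os.path.join(folder, "persons.p")
    json_path = os.path.join(folder, "persons.json")

    # Binary round trip with pickle.
    with open(pickle_path, "wb") as fo:
        pickle.dump(persons, fo)
    with open(pickle_path, "rb") as fi:
        unpickled = pickle.load(fi)

    # Text round trip with json.
    with open(json_path, "wt") as fo:
        json.dump(persons, fo)
    with open(json_path, "rt") as fi:
        unjsoned = json.load(fi)

print(unpickled[:3])
print(unjsoned == persons)   # True
```

``json`` only handles basic types (dicts, lists, strings, numbers, booleans, ``None``) but produces a readable text file. ``pickle`` handles most Python objects, but you should only unpickle files you trust.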

**Other very useful modules**

* random : random number generation
* re : regular expressions (regex) - useful for finding or checking complicated patterns in strings.
* datetime : related to date and time.
* sys : command line args + other stuff.
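A quick taste of each of these modules. The inputs are made up for illustration, and the random result is left without an expected value on purpose:

```python
import random
import re
import sys
from datetime import date, timedelta

# random: seed the generator so the "random" result repeats between runs.
random.seed(0)
dice_roll = random.randint(1, 6)      # an integer from 1 to 6, both included
print(dice_roll)

# re: pull every run of digits out of a string.
numbers = re.findall(r"\d+", "room 12, floor 3")
print(numbers)                        # ['12', '3']

# datetime: date arithmetic that understands month lengths.
next_day = date(2024, 1, 31) + timedelta(days=1)
print(next_day)                       # 2024-02-01

# sys: sys.argv is the list of command line arguments (script name first).
print(type(sys.argv))                 # <class 'list'>
```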

# Error Handling in Python

* You have already seen many examples of errors and exceptions.

In [175]:
x=[1,2,3]
x[5]

IndexError: list index out of range

In [178]:
x={"a":22,"b":33}
print(x['e'])

KeyError: 'e'

In [180]:
int(5.1)

5

In [114]:
int("abcd")

ValueError: invalid literal for int() with base 10: 'abcd'

* An error can stop the execution of your program. 
* In most cases this is what you want to happen! If your car's engine starts smoking, you're not going to want to keep driving it, right?
* But in some cases, you don't want it to stop the program. 
* That's because you know **how to handle the error** i.e. you have a plan for what to do if that error happens.

* In that case, you can handle an error like this.

In [182]:
x=input("Enter an integer\n")
try :
    y=int(x)
except Exception:
    print("The string is not an integer, try again!")

Enter an integer
abcd
The string is not an integer, try again!


* Now there is one thing wrong with this example.
* ``Exception`` is the base class of almost all built-in errors -
   > Every error is an exception.
* So in this case, no matter what the error, it'll be handled by your code - but that's not good! You don't know how to handle TheComputerIsOverheatingError!
* So it's better to refer to the error more specifically.
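Because errors are classes, you can check how they relate to each other with ``issubclass``. A small sketch using only built-in exception types:

```python
# ValueError and IndexError both inherit from Exception.
print(issubclass(ValueError, Exception))              # True
print(issubclass(IndexError, Exception))              # True
print(issubclass(IndexError, LookupError))            # True

# A few "stop the program" signals deliberately sit outside Exception.
print(issubclass(KeyboardInterrupt, Exception))       # False
print(issubclass(KeyboardInterrupt, BaseException))   # True

# The full chain of parent classes for ValueError.
print([cls.__name__ for cls in ValueError.__mro__])
# ['ValueError', 'Exception', 'BaseException', 'object']
```

An ``except`` clause catches the class you name and all of its subclasses, which is why ``except Exception`` catches so much.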

In [186]:
x=input("Enter an integer below 3\n")
print(type(x))
mylist=[1,2,3]
try :
    y=int(x)
    print(mylist[y])
except ValueError:
    print("The string is not an integer, try again!")

Enter an integer below 3
5
<class 'str'>


IndexError: list index out of range

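* In this case only ``ValueError`` is handled, so the ``IndexError`` still stops the program. If you had caught ``Exception`` instead, the ``IndexError`` would have been handled the same way as the ``ValueError``, with a misleading message.
* But how do you handle both?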

In [119]:
x=input("Enter an integer below 3\n")
mylist=[1,2,3]
try :
    y=int(x)
    print(mylist[y])
except ValueError:
    print("The string is not an integer, try again!")
except IndexError:
    print("The integer is an invalid index.")

Enter an integer below 3
p
The string is not an integer, try again!
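When two errors should get the same treatment, you can also list them in one ``except`` clause as a tuple. A minimal sketch, where ``item_at`` is a hypothetical helper that takes the text as an argument instead of calling ``input()``:

```python
def item_at(raw_text, items):
    """Return items[int(raw_text)], or None if the text is not a usable index."""
    try:
        index = int(raw_text)
        return items[index]
    except (ValueError, IndexError) as err:
        # err holds the exception object, so the message can say what went wrong.
        print(f"Could not use {raw_text!r}: {type(err).__name__}")
        return None

mylist = [1, 2, 3]
print(item_at("1", mylist))   # 2
print(item_at("p", mylist))   # reports a ValueError, then prints None
print(item_at("5", mylist))   # reports an IndexError, then prints None
```

Use separate ``except`` clauses when each error needs its own message or recovery, and the tuple form when one response fits both.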


* You can ``raise`` your own error.

In [189]:
x=input("Enter an integer below 100\n")
try :
    y=int(x)
except ValueError:
    print("The string is not an integer, try again!")
else:
    # runs only if int(x) succeeded, so y is guaranteed to exist here
    if y>100:
        raise Exception("Number is greater than 100")

Enter an integer below 100
1000


Exception: Number is greater than 100

* Usually you'll do this either because you want the program to terminate, or because you want the code calling your code to handle the exception, since you don't know how to handle it or can't for some reason.
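Here is a sketch of the second case: a function raises an error and the calling code decides what to do about it. ``NumberTooLargeError`` and ``parse_small_number`` are hypothetical names made up for this example:

```python
class NumberTooLargeError(ValueError):
    """Hypothetical custom error for numbers above the allowed limit."""


def parse_small_number(text, limit=100):
    number = int(text)             # raises ValueError for non-numeric text
    if number > limit:
        raise NumberTooLargeError(f"{number} is greater than {limit}")
    return number


# The calling code chooses how to react to each kind of failure.
for text in ["42", "1000", "abcd"]:
    try:
        print(parse_small_number(text))
    except NumberTooLargeError as err:   # the more specific class goes first
        print("Too large:", err)
    except ValueError as err:
        print("Not an integer:", err)
```

Defining your own error class lets the caller tell your error apart from every other ``ValueError``.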

# Conclusion

**When you get stuck**

* Google and StackOverflow are your best friends.
* But don't forget the [Python docs](https://www.python.org/doc/) - very useful especially for looking up function and module contents.
* ``dir()``, ``help()``, ``type()`` - use functions like these to let the Python interpreter help you.
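For example, here is one way to explore an unfamiliar object from inside the interpreter (a small sketch using a plain string):

```python
value = "abcde"
print(type(value))                    # <class 'str'>

# dir() lists attribute names; filtering out the double-underscore ones
# leaves the methods you are most likely to call directly.
public_names = [name for name in dir(value) if not name.startswith("__")]
print(public_names[:5])

# help(str.upper) prints the full help page; the docstring is the short version.
print(str.upper.__doc__)
```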

**Code style**

* Name all functions and variables in snake case, e.g. ``list_of_fruits``.
* Name all classes in CapWords (also called Pascal case), e.g. ``MyAwesomeClass``.
* Name variables according to their function - what do they do?
* Your code should read like English. This is possible with smart variable naming.
* Use 4 spaces for indentation. This is the Python standard. I am a rebel - I like tabs. 
* Comment liberally.
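A tiny sketch that follows these conventions. ``FruitBasket`` and its method are made-up names, chosen so the last lines read almost like a sentence:

```python
class FruitBasket:                          # class name in CapWords
    def __init__(self, list_of_fruits):     # arguments in snake case
        self.list_of_fruits = list_of_fruits

    def count_fruits(self):                 # the name says what the method does
        return len(self.list_of_fruits)


basket = FruitBasket(["apple", "mango", "banana"])
if basket.count_fruits() > 2:
    print("The basket is full")
```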

**Resources**

* [Conda cheatsheet](https://docs.conda.io/projects/conda/en/4.6.0/_downloads/52a95608c49671267e40c689e0bc00ca/conda-cheatsheet.pdf)
* I like [TutorialsPoint](https://www.tutorialspoint.com/python/index.htm) tutorials for most things, to brush up on basic syntax and concepts. Don't expect deep philosophies or the "whys".
* [Edabit](https://edabit.com/challenges) has a nice collection of Python problems - but really there are loads of such websites.