# ICT 778-005 Day 9: Object Oriented Programming

In this lecture we will expand on classes and cover some of the concepts in the pandas library.

# Person Class

Assume we want to create a class to keep track of people's info. For each person added we want to keep track of an id, name, address, age, salary, sex, and occupation.

To do so, we first create a class called `person` and then add the attributes for this class in its `__init__()` method.

In [36]:
class person:
    
    def __init__(self,pId,pName,pAddress,pAge,pSalary,pSex,pOccupation):
        self.id=pId
        self.name=pName
        self.address=pAddress
        self.age=pAge
        self.salary=pSalary
        self.sex=pSex
        self.occupation=pOccupation
    def __str__(self):
        return str(self.id) + "-"+self.name

Now let's read data from a file and create a list of people

In [29]:
import csv
people=[]
with open('people.csv', 'r') as file:
    reader = csv.reader(file, delimiter=',')
    next(reader, None) #remove header
    for row in reader:
        people.append(person(int(row[0]),row[1],row[2],int(row[3]),float(row[4]),row[5],row[6]))

for pp in people:
    print(pp)
        


1-Mike
2-Kristine
3-Chris
4-sam
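The contents of `people.csv` are not shown in these notes. As a rough sketch, assuming the columns are id, name, address, age, salary, sex, and occupation in that order (inferred from the indices `row[0]` to `row[6]` used in the loop), the parsing step can be tried on an in-memory file. The rows and the `sample` and `rows` names below are made up for illustration.

```python
import csv
import io

# Hypothetical stand-in for people.csv; the column order is an assumption
# inferred from row[0]..row[6] in the reading loop.
sample = io.StringIO(
    "id,name,address,age,salary,sex,occupation\n"
    "1,Mike,Calgary-Canada,18,4000,m,Accountant\n"
    "2,Kristine,Edmonton-Canada,25,5200.5,f,Engineer\n"
)

reader = csv.reader(sample, delimiter=',')
next(reader, None)  # skip the header row
rows = []
for row in reader:
    # csv.reader yields every field as a string, so numbers are converted explicitly
    rows.append((int(row[0]), row[1], row[2], int(row[3]), float(row[4]), row[5], row[6]))

for r in rows:
    print(r)
```

Because every field arrives as a string, a non-numeric value in the id, age, or salary column would raise a `ValueError` at the `int()` or `float()` call.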


Now, let's add a new person to the list if and only if this person has not already been added. Two entries count as the same person when their ids match. To express that, we change what the `==` operator means for `person` objects by overriding the `__eq__()` method in the class. The `in` and `not in` operators on a list use `==` to compare elements, so they will follow the same rule.

In [28]:

class person:
    
    def __init__(self,pId,pName,pAddress,pAge,pSalary,pSex,pOccupation):
        self.id=pId
        self.name=pName
        self.address=pAddress
        self.age=pAge
        self.salary=pSalary
        self.sex=pSex
        self.occupation=pOccupation
    def __str__(self):
        return str(self.id) + "-"+self.name
    
    def __eq__(self,other):
        # Two people are considered equal when their ids match
        if not isinstance(other,person):
            return NotImplemented
        return self.id==other.id

In [26]:
pp=person(1,"Mike","Calgary-Canada",18,4000,"m","Accountant")


if pp not in people:
    people.append(pp)

for pp in people:
    print(pp)

1-Mike
2-Kristine
3-Chris
4-sam
4-Sali
1-Mike


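In the output above, Mike was added even though the list already contained a person with the same id. This cell was run before `__eq__()` was defined, so `not in` fell back to the default comparison, which only treats an object as equal to itself.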


In [30]:
pp=person(1,"Mike","Calgary-Canada",18,4000,"m","Accountant")


if pp not in people:
    people.append(pp)

for pp in people:
    print(pp)

1-Mike
2-Kristine
3-Chris
4-sam


In the example above, Mike was not added because the list already contains a person with the same id. This time the class defines `__eq__()`, so the membership test compares ids.
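The two outcomes can be reproduced side by side. The following is a minimal sketch with two hypothetical, trimmed-down classes (`PersonNoEq` and `PersonWithEq` are not part of the lecture code) that differ only in whether `__eq__()` is defined.

```python
class PersonNoEq:
    """Hypothetical class without __eq__: == falls back to object identity."""
    def __init__(self, pId, pName):
        self.id = pId
        self.name = pName


class PersonWithEq(PersonNoEq):
    """Same data, but two objects are equal when their ids match."""
    def __eq__(self, other):
        if not isinstance(other, PersonWithEq):
            return NotImplemented
        return self.id == other.id


no_eq_list = [PersonNoEq(1, "Mike")]
with_eq_list = [PersonWithEq(1, "Mike")]

# `in` compares the candidate against each list element with ==
found_without_eq = PersonNoEq(1, "Mike") in no_eq_list    # distinct objects, not found
found_with_eq = PersonWithEq(1, "Mike") in with_eq_list   # same id, found

print(found_without_eq, found_with_eq)
```

One side effect worth knowing: a class that defines `__eq__()` without also defining `__hash__()` becomes unhashable, so its objects cannot be stored in a `set` or used as dictionary keys.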

The address attribute is a composite attribute, which means its value can be split into a city and a country. How can we represent that?

We can create another class called `address` with two attributes, namely `city` and `country`, and store an `address` object inside each `person`.

In [32]:
class address:
    
    def __init__(self,city,country):
        self.city=city
        self.country=country
    def __str__(self):
        return self.city+"-"+self.country

In [33]:
import csv
people=[]
with open('people.csv', 'r') as file:
    reader = csv.reader(file, delimiter=',')
    next(reader, None) #remove header
    for row in reader:
        # Split "City-Country" once and strip stray spaces from both parts
        city,country=[part.strip() for part in row[2].split("-",1)]
        pAddress=address(city,country)
        people.append(person(int(row[0]),row[1],pAddress,int(row[3]),float(row[4]),row[5],row[6]))

In [35]:
for p in people:
    print(p.address.city)

Calgary
Edmonton
Vancouver
Vtoronto
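The splitting step can also be isolated in a small function. This is only a sketch: `parse_address` is a hypothetical helper, not part of the lecture code, and it assumes the address text always has the form `City-Country`.

```python
class address:
    def __init__(self, city, country):
        self.city = city
        self.country = country

    def __str__(self):
        return self.city + "-" + self.country


def parse_address(text):
    """Hypothetical helper: turn 'City-Country' text into an address object."""
    # partition splits at the first "-" only and does not raise if it is missing
    city, _, country = text.partition("-")
    return address(city.strip(), country.strip())


a = parse_address(" Calgary - Canada ")
print(a.city)
print(a.country)
print(a)
```

Using `partition` instead of indexing into `split("-")` means a value without a hyphen yields an empty country rather than an `IndexError`.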


Can we have a class called `people` in which we can manipulate people's info, such as adding a new person, removing a person, checking whether we have any people, etc.?

In [None]:
import random  # imported at module level so the methods below can see it

class people:
    
    def __init__(self):
        self.people=[]
    def removePerson(self,person):
        if person in self.people:
            self.people.remove(person)
    def addPerson(self,person):
        if person not in self.people:
            self.people.append(person)
    def checkIfEmpty(self):
        return len(self.people)==0
    def shufflePeople(self):
        # random.shuffle reorders the list in place and returns None
        random.shuffle(self.people)
    
        

## Inheritance


Inheritance is a way of creating a new class that reuses the attributes and methods of an existing class. The new class is called the derived (or child) class, and the existing class is called the parent class.

Now let's assume some of the people are plumbers and we need to keep track of that. We can create a class called `plumber` that inherits all the methods of the `people` class and can also define or override its own.

In [37]:
import random  # imported at module level so the methods below can see it

class people:
    
    def __init__(self):
        self.people=[]
    def removePerson(self,person):
        if person in self.people:
            self.people.remove(person)
    def addPerson(self,person):
        if person not in self.people:
            self.people.append(person)
    def checkIfEmpty(self):
        return len(self.people)==0
    def shufflePeople(self):
        # random.shuffle reorders the list in place and returns None
        random.shuffle(self.people)
    def whoIsThis(self):
        print("I am a person")

In [39]:
class plumber(people):
    def __init__(self):
        super().__init__()
    def whoIsThis(self):
        print("I am a plumber")

In [41]:
plum=plumber()
plum.whoIsThis()
print(len(plum.people))

I am a plumber
0
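To see which parts of `plumber` come from the parent, the sketch below uses a trimmed-down `people` class and plain strings as stand-ins for `person` objects. Both simplifications are assumptions made to keep the example short, and `whoIsThis()` returns its text here instead of printing it.

```python
class people:
    def __init__(self):
        self.people = []

    def addPerson(self, person):
        if person not in self.people:
            self.people.append(person)

    def whoIsThis(self):
        return "I am a person"


class plumber(people):
    # Only whoIsThis is overridden; __init__ and addPerson come from people
    def whoIsThis(self):
        return "I am a plumber"


plum = plumber()
plum.addPerson("Mike")   # inherited method
plum.addPerson("Mike")   # duplicate, ignored by the inherited check

print(plum.whoIsThis())
print(len(plum.people))
print(isinstance(plum, people))
```

Since `plumber` here does not define `__init__()`, Python calls the parent's version automatically. An explicit `super().__init__()` is only needed when the child class defines its own `__init__()`.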


In general, object-oriented programming helps keep a program organized and easier to understand. Code can be shared and reused through classes and inheritance, and data abstraction lets a class keep its data behind its own methods.

# Brief Intro To Pandas

Pandas is one of the major Python packages for data analysis. In this lecture, we will cover the foundations of this library

You should first make sure to import the library

Data loaded into pandas often comes from a CSV (comma-separated values) file.

In this lecture, we will use `books.csv` uploaded on D2L. We'll import the Pandas package and then read the data into Jupyter.

In [2]:
import pandas as pd
import numpy as np 

bookdata = pd.read_csv('books.csv')

Now `books.csv` has been loaded as a `DataFrame` object, which lets us use the methods and attributes that a DataFrame provides.

We can view the `bookdata` object by typing the variable name `bookdata` in a Jupyter cell.

In [3]:
bookdata

Unnamed: 0,Title,Author,Genre,Pages,Publisher
0,Fundamentals of Wavelets,"Goswami, Jaideva",signal_processing,228,Wiley
1,Data Smart,"Foreman, John",data_science,235,Wiley
2,God Created the Integers,"Hawking, Stephen",mathematics,197,Penguin
3,Superfreakonomics,"Dubner, Stephen",economics,179,HarperCollins
4,Orientalism,"Said, Edward",history,197,Penguin
...,...,...,...,...,...
206,Structure and Randomness,"Tao, Terence",mathematics,252,
207,Image Processing with MATLAB,"Eddins, Steve",signal_processing,241,
208,Animal Farm,"Orwell, George",fiction,180,
209,"Idiot, The","Dostoevsky, Fyodor",fiction,197,


To print the first 5 rows we can use the `head()` method.

In [11]:
bookdata.head()

Unnamed: 0,Title,Author,Genre,Pages,Publisher
0,Fundamentals of Wavelets,"Goswami, Jaideva",signal_processing,228,Wiley
1,Data Smart,"Foreman, John",data_science,235,Wiley
2,God Created the Integers,"Hawking, Stephen",mathematics,197,Penguin
3,Superfreakonomics,"Dubner, Stephen",economics,179,HarperCollins
4,Orientalism,"Said, Edward",history,197,Penguin


To print the column names we can use the `columns` attribute.

In [12]:
bookdata.columns

Index(['Title', 'Author', 'Genre', 'Pages', 'Publisher'], dtype='object')

To display only certain rows, use slicing. We'll display rows 8 through 15; as with Python lists, the end of the slice (16) is not included.

In [4]:
bookdata[8:16]

Unnamed: 0,Title,Author,Genre,Pages,Publisher
8,Image Processing & Mathematical Morphology,"Shih, Frank",signal_processing,241,CRC
9,How to Think Like Sherlock Holmes,"Konnikova, Maria",psychology,240,Penguin
10,Data Scientists at Work,Sebastian Gutierrez,data_science,230,Apress
11,Slaughterhouse Five,"Vonnegut, Kurt",fiction,198,Random House
12,Birth of a Theorem,"Villani, Cedric",mathematics,234,Bodley Head
13,Structure & Interpretation of Computer Programs,"Sussman, Gerald",computer_science,240,MIT Press
14,"Age of Wrath, The","Eraly, Abraham",history,238,Penguin
15,"Trial, The","Kafka, Frank",fiction,198,Random House
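The slice boundaries can be checked on a small frame. The sketch below uses a hypothetical 20-row DataFrame in place of `bookdata`.

```python
import pandas as pd

# Hypothetical 20-row frame standing in for bookdata
df = pd.DataFrame({"Pages": range(100, 120)})

subset = df[8:16]

print(subset.index.tolist())  # row labels kept by the slice
print(len(subset))            # number of rows kept
```

The slice keeps 8 rows, labelled 8 to 15, which matches the output shown for `bookdata[8:16]`.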


We can also display specific columns of the dataframe by using a similar technique to accessing dictionary values by key. Instead of specifying a key, we now specify the column name.

In [9]:
# Display all of the book authors.
file_authors = bookdata['Author']
file_authors

0        Goswami, Jaideva
1           Foreman, John
2        Hawking, Stephen
3         Dubner, Stephen
4            Said, Edward
              ...        
206          Tao, Terence
207         Eddins, Steve
208        Orwell, George
209    Dostoevsky, Fyodor
210      Dickens, Charles
Name: Author, Length: 211, dtype: object

In [14]:
# Display all of the book authors AND publishers.
file_authors_and_publishers = bookdata[['Author','Publisher']]
file_authors_and_publishers

Unnamed: 0,Author,Publisher
0,"Goswami, Jaideva",Wiley
1,"Foreman, John",Wiley
2,"Hawking, Stephen",Penguin
3,"Dubner, Stephen",HarperCollins
4,"Said, Edward",Penguin
...,...,...
206,"Tao, Terence",
207,"Eddins, Steve",
208,"Orwell, George",
209,"Dostoevsky, Fyodor",
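The two selections return different kinds of objects, which explains why their outputs look different. A minimal sketch, using a two-row frame built from the first rows shown earlier (the `df`, `one_column`, and `two_columns` names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Author": ["Goswami, Jaideva", "Foreman, John"],
    "Publisher": ["Wiley", "Wiley"],
    "Pages": [228, 235],
})

one_column = df["Author"]                    # single name -> Series
two_columns = df[["Author", "Publisher"]]    # list of names -> DataFrame

print(type(one_column).__name__)
print(type(two_columns).__name__)
print(two_columns.shape)
```

A single column name gives a one-dimensional `Series`, while a list of names gives a `DataFrame`, even if the list contains only one name.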


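We can select which rows and columns to print using `iloc`, which takes integer positions: the first argument selects rows and the second selects columns. Here we take rows 8 through 15 and the columns at positions 2 and 3 (`Genre` and `Pages`).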

In [20]:
bookdata.iloc[8:16,[2,3]]

Unnamed: 0,Genre,Pages
8,signal_processing,241
9,psychology,240
10,data_science,230
11,fiction,198
12,mathematics,234
13,computer_science,240
14,history,238
15,fiction,198
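For comparison, `loc` performs the same kind of selection by label instead of by position. The sketch below uses a small hypothetical frame (three rows copied from the book data) to show the two forms selecting the same cells.

```python
import pandas as pd

df = pd.DataFrame({
    "Title": ["Data Smart", "Orientalism", "Superfreakonomics"],
    "Genre": ["data_science", "history", "economics"],
    "Pages": [235, 197, 179],
})

# iloc: integer positions, end of the row slice excluded
by_position = df.iloc[0:2, [1, 2]]

# loc: labels, end of the row slice included
by_label = df.loc[0:1, ["Genre", "Pages"]]

print(by_position.equals(by_label))
```

Note the difference in the row slices: `iloc[0:2]` stops before position 2, while `loc[0:1]` includes the row labelled 1.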


From looking at the displayed data, we can see several missing entries, which are represented by 'not a number' or `NaN`. If we want to remove these entries, we may define a new dataset and call the `dropna` method of our `bookdata` object.

The argument of the `dropna` method is specified as `axis = 0`, which means drop any *row* with missing values. If you specify `axis = 1`, you will drop any *column* with missing values.

In [21]:
file_nomissing = bookdata.dropna(axis = 0)
file_nomissing

Unnamed: 0,Title,Author,Genre,Pages,Publisher
0,Fundamentals of Wavelets,"Goswami, Jaideva",signal_processing,228,Wiley
1,Data Smart,"Foreman, John",data_science,235,Wiley
2,God Created the Integers,"Hawking, Stephen",mathematics,197,Penguin
3,Superfreakonomics,"Dubner, Stephen",economics,179,HarperCollins
4,Orientalism,"Said, Edward",history,197,Penguin
...,...,...,...,...,...
114,Rationality & Freedom,"Sen, Amartya",economics,213,Springer
115,Clash of Civilizations and Remaking of the Wor...,"Huntington, Samuel",history,228,Simon&Schuster
116,Uncommon Wisdom,"Capra, Fritjof",nonfiction,197,Fontana
117,One,"Bach, Richard",nonfiction,172,Dell
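The effect of the two `axis` values is easiest to see on a tiny frame. This sketch uses three rows in the style of the book data, one of which has a missing publisher; the frame itself is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Title": ["Data Smart", "Animal Farm", "Orientalism"],
    "Pages": [235, 180, 197],
    "Publisher": ["Wiley", np.nan, "Penguin"],
})

rows_dropped = df.dropna(axis=0)   # removes the row with the missing Publisher
cols_dropped = df.dropna(axis=1)   # removes the Publisher column instead

print(rows_dropped.shape)
print(cols_dropped.columns.tolist())
print(df.shape)                    # dropna returns a new frame; df is unchanged
```

`dropna` does not modify the original DataFrame, which is why the notebook assigns the result to a new variable.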


Now we'll make a quick histogram of the remaining data by number of pages. In this case, our histogram will plot how many books have total pages within given ranges.
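The histogram cell itself is not included in this section. As a hedged sketch of the idea, the code below counts books per page range with `numpy.histogram` on a small hypothetical frame; the bin edges are arbitrary choices for illustration, not values from the lecture.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for file_nomissing; the real data comes from books.csv
books = pd.DataFrame({"Pages": [228, 235, 197, 179, 197, 241, 240, 230]})

# Count how many books fall into each page range (bin).
# Every bin includes its left edge; only the last bin also includes its right edge.
counts, edges = np.histogram(books["Pages"], bins=[170, 190, 210, 230, 250])

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left}-{right} pages: {n} book(s)")

# With matplotlib installed, a comparable plot can be drawn with:
# books["Pages"].plot.hist(bins=4)
```

The counts are exactly what a histogram plots as bar heights, so printing them is a quick way to check a chart.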