![Cloud-First](../image/CloudFirst.png) 



# SIT742: Modern Data Science
**(Module: Python Foundations for Big Data)**

---
- Materials in this module include resources collected from various open-source online repositories.
- You are free to use, change and distribute this package.
- If you find any issue or bug in this document, please submit an issue at [tulip-lab/sit742](https://github.com/tulip-lab/sit742/issues)


Prepared by **SIT742 Teaching Team**

---


## Session 2H: Python Packages 

In this session, we will learn how to use Python packages to manipulate data and files.

## Content



### Part 1 Python packages

1.1 [Standard Library](#standlib)

1.2 [Third Party Packages](#3rdparty)

1.3 [Importing a module](#importmod) 


### Part 2 Python Simple IO

2.1 [Input](#input)

2.2 [Output](#output)


### Part 3 Datetime Module

3.1 [Time](#time)

3.2 [Date](#date)

3.3 [Timedelta](#timedelta)

3.4 [Formatting and Parsing](#parsing)



---
## <span style="color:#0b486b">1. Python packages</span>

After completing the previous Python sessions, you should know the syntax and semantics of the Python language. To code efficiently, you should also learn about Python libraries and their packages. Python’s standard library is very extensive, offering a wide range of facilities as indicated [here](https://docs.python.org/3/library/). The library contains built-in modules (written in C) that provide access to system functionality such as file I/O that would otherwise be inaccessible to Python programmers, as well as modules written in Python that provide standardized solutions for many problems that occur in everyday programming. Look at the [Python Standard Library Manual](https://docs.python.org/3/library/) to read more.

In addition to the standard library, there is a large and growing collection of third-party components (from individual programs and modules to packages and entire application development frameworks), available from the [Python Package Index](https://pypi.org) (PyPI).

<a id = "standlib"></a>

### <span style="color:#0b486b">1.1 Standard libraries</span>

For a complete list of the modules in the Python standard library and their documentation, see the [Python Manual](https://docs.python.org/3/library/). A few worth mentioning are:

* ``math`` for numeric and math-related functions and data types
* ``urllib`` for fetching data across the web
* ``datetime`` for manipulating dates and times
* ``pickle`` for serializing and deserializing data structures, which lets us save variables to disk and load them back later (in Python 3, ``pickle`` automatically uses the fast C implementation that Python 2 provided separately as ``cPickle``)
* ``os`` for operating system dependent functions
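As a minimal sketch of what `pickle` does, the snippet below saves a dictionary to a file and loads it back. The file name `sit742_demo.pkl` and the use of the system temporary directory are illustrative assumptions. Only unpickle data you trust, because loading a pickle can execute arbitrary code.

```python
import os
import pickle
import tempfile

data = {'name': 'John', 'scores': [90, 85, 77]}

# Hypothetical file name, placed in the system's temporary directory
path = os.path.join(tempfile.gettempdir(), 'sit742_demo.pkl')

# Serialize: pickle.dump writes bytes, so the file is opened in binary mode ('wb')
with open(path, 'wb') as f:
    pickle.dump(data, f)

# Deserialize: pickle.load reads the bytes back into a new Python object
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored)
print(restored == data)   # True: the round trip preserves the contents
```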

<a id = "3rdparty"></a>

### <span style="color:#0b486b">1.2 Third party packages</span>

There are thousands of third party packages, each developed for a special task. Some of the useful libraries for data science are:

* ``numpy`` is probably the most fundamental package for efficient scientific computing in Python
* ``scipy`` is one of the core packages for scientific computations
* ``pandas`` is a library for working with table-like data structures called DataFrames
* ``matplotlib`` is a comprehensive plotting library
* ``BeautifulSoup`` is an HTML and XML parser
* ``scikit-learn`` is a widely used, general-purpose machine learning library for Python
* ``nltk`` is a toolkit for natural language processing

---
<a id = "importmod"></a>
### <span style="color:#0b486b">1.3 Importing a module</span>

To use a module, first you have to ``import`` it. There are different ways to import a module:

* `import my_module`
* `from my_module import my_function`
* `from my_module import my_function as func`
* `from my_module import submodule`
* `from my_module import submodule as sub`
* `from my_module import *`

**`'import my_module'`** imports the module `'my_module'` and creates a reference to it in the namespace. For example `'import math'` imports the module `'math'` into the namespace. After importing the module this way, you can use the dot operator `(.)` to refer to the objects defined in the module. For example `'math.exp()'` refers to function `'exp()'` in module `'math'`.

In [None]:
import math

x = 2
y1 = math.exp(x)
y2 = math.log(x)

print("e^{} is {} and log({}) is {}".format(x, y1, x, y2))

**`'from my_module import my_function'`** imports only the function `'my_function'` from the module `'my_module'` into the namespace. This way you have access to neither the module itself (since you have not imported it) nor its other objects; only the object you imported is available.

You can use a comma to import multiple objects.

In [None]:
from math import exp

x = 2
y = exp(x)  # no need to math.exp()

print("e^{} is {}".format(x, y))
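
As mentioned above, a comma lets a single `from ... import` statement bring in several names. A small sketch using `exp`, `log` and the constant `pi` from `math`:

```python
from math import exp, log, pi   # three names from one module in a single statement

x = 2
print("e^{} is {}".format(x, exp(x)))
print("log({}) is {}".format(x, log(x)))
print("pi is {}".format(pi))
```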

**`'from my_module import my_function as func'`** imports the function `'my_function'` from module `'my_module'` but binds it in the namespace under the name `'func'`. The same `as` renaming also works for submodules. For example, you will see later that it is almost a convention nowadays to import `matplotlib.pyplot` as `plt`.

In [None]:
# you can change the name of the imported object
from math import exp as myfun

x = 2
y = myfun(x)

print("e^{} is {}".format(x, y))
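
The list above also includes `from my_module import submodule as sub`. As a standard-library illustration, `urllib` is a package and `parse` is one of its submodules; the alias `up` is an arbitrary choice for this sketch.

```python
# Import the submodule 'parse' from the package 'urllib' under a short alias
from urllib import parse as up

url = "https://docs.python.org/3/library/datetime.html"
parts = up.urlparse(url)   # split the URL into its components

print(parts.scheme)   # https
print(parts.netloc)   # docs.python.org
print(parts.path)     # /3/library/datetime.html
```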

**`'from my_module import *'`** imports all the public objects defined in `'my_module'` into the namespace. Therefore after this statement you can simply use the plain name of the object to refer to it and there is no need to use the dot operator:

In [None]:
from math import *

x = 2
y1 = exp(x)
y2 = log(x)

print("e^{} is {} and log({}) is {}".format(x, y1, x, y2))
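
Because `from my_module import *` copies every public name into the namespace, it can silently replace names you already have, which is why PEP 8 recommends avoiding wildcard imports. A minimal sketch: `math` defines its own `pow()`, which shadows the built-in `pow()` after the star import.

```python
import builtins

print(builtins.pow(2, 3))   # the built-in pow returns an int: 8

from math import *          # this also brings math.pow into the namespace

print(pow(2, 3))            # now resolves to math.pow, which returns a float: 8.0
print(pow is builtins.pow)  # False: the built-in has been shadowed
```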

**Exercise 1:** 

1. Import the `math` module from the Python standard library
2. Define a variable and assign it an integer value smaller than 20
3. Use the `factorial()` function from the `math` module to calculate the factorial of the variable
4. Print its value

In [None]:
#Put your code here

<details><summary><u><b><font color="Blue">Click here for the solution to Exercise 1</font></b></u></summary>

```python
import math

n = 10
# The factorial of n is the product of all whole numbers from 1 up to n:
# factorial(n) = n * (n-1) * (n-2) * ... * 1
print(math.factorial(n))
```
</details>


**Exercise 2:**

1. Write a function that takes an integer variable and returns its factorial.
2. Use this function to find the factorial of the variable defined in Exercise 1
3. Do your answers match?

In [None]:
#Put your code here

<details><summary><u><b><font color="Blue">Click here for the solution to Exercise 2</font></b></u></summary>

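```python
def my_factorial(n):
    # Base case: 0! = 1! = 1 (using n <= 1 also stops the recursion for n = 0)
    if n <= 1:
        return 1
    # Recursive case: n! = n * (n-1)!
    return n * my_factorial(n - 1)

print(my_factorial(10))
```
</details>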
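
The recursive solution mirrors the mathematical definition, but each call adds a stack frame, so a very large `n` can exceed Python's default recursion limit. A minimal loop-based sketch (the name `iterative_factorial` is illustrative) avoids that and can be checked against `math.factorial()`:

```python
import math

def iterative_factorial(n):
    """Return n! for a non-negative integer n using a loop instead of recursion."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for k in range(2, n + 1):   # multiply 2 * 3 * ... * n; 0! and 1! stay 1
        result *= k
    return result

# Compare with the library version for every n allowed in Exercise 1 (n < 20)
print(all(iterative_factorial(n) == math.factorial(n) for n in range(20)))   # True
```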

---

## <span style="color:#0b486b">2. Python simple input/output</span>

<a id = "input"></a>

### <span style="color:#0b486b">2.1 Input</span>

`input()` displays an optional prompt, waits for the user to type a line of text, and returns that line as a string (without the trailing newline).

In [None]:
x = input('What is your name? ')

print("x is {}".format(type(x)))
print("Your name is {}".format(x))
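
Because `input()` always returns a string, numbers typed by the user must be converted before doing arithmetic. A sketch in which literal strings stand in for what `input()` would return, so it runs without prompting:

```python
# Literal strings stand in for what input() would return
a = "2"
b = "3"

print(type(a))             # <class 'str'>
print(a + b)               # 23 (string concatenation)
print(int(a) + int(b))     # 5 (integer addition after conversion)
print(float("0.5") * 2)    # 1.0
```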

**Exercise 3:**

1. Use `input()` to take a float value between -1 and 1 from the user
2. Use the function `acos()` from `math` to find the arc cosine of it
3. Print the value of the variable and its arc cosine

In [None]:
#Put your code here

<details><summary><u><b><font color="Blue">Click here for the solution to Exercise 3</font></b></u></summary>

```python
import math

x = input('Enter a real number between -1 and 1: ')
# acos() returns the angle (in radians, between 0 and pi) whose cosine is x
y = math.acos(float(x))
print('acos({}) = {}'.format(x, y))
```
</details>



As we know, the domain of the [arc cosine function][acos] is [-1, 1]. So what happens if the user enters a value outside the domain (smaller than -1 or greater than 1)? In that case `math.acos()` raises a `ValueError` ("math domain error").

To avoid this exception, check that the value is in range before passing it to `acos()`, and display an appropriate message if it is not.

[acos]: http://mathworld.wolfram.com/InverseCosine.html

In [None]:
# Add an if-else check to the Exercise 3 solution
import math

x = input('Enter a real number between -1 and 1: ')
x = float(x)
if -1 <= x <= 1:
    y = math.acos(x)
    print('acos({}) = {}'.format(x,y))
else:
    print('Out of range')
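
The check above still fails if the user types something that is not a number, because `float()` itself raises `ValueError`. One way to handle both problems is a small helper built on `try`/`except`. The name `safe_acos` is hypothetical, and it takes the raw string as a parameter (instead of calling `input()` itself) so it can be tested without a keyboard; in the notebook you would call `safe_acos(input('Enter a real number between -1 and 1: '))`.

```python
import math

def safe_acos(text):
    """Return acos of the number in `text`, or None if it is not a number in [-1, 1]."""
    try:
        x = float(text)          # raises ValueError for input such as 'abc'
    except ValueError:
        print('Not a number:', text)
        return None
    if not -1 <= x <= 1:         # acos is only defined on [-1, 1]
        print('Out of range:', x)
        return None
    return math.acos(x)

print(safe_acos('0.5'))   # about 1.0472 (pi/3 radians)
print(safe_acos('2'))     # prints 'Out of range: 2.0', then None
print(safe_acos('abc'))   # prints 'Not a number: abc', then None
```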

<a id = "output"></a>

### <span style="color:#0b486b">2.2 Output</span>

The basic way to produce output is the `print()` function. To print several objects on the same line, separated by spaces, pass them as separate arguments with commas between them.

In [None]:
name = "John"
msg = "Hello"

print(msg)
print(msg, name)

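By default, `print()` ends its output with a newline. The `end` argument replaces that newline with another string, so several calls can print on the same line. See the [print()](https://docs.python.org/3/library/functions.html#print) documentation for the full signature.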

In [None]:
print("Sample using end=','")
for i in range(10):
    print(i, end=',')
print("\nSample using end=' '")
for i in range(10):
    print(i, end=' ')
print('\nSample without the end argument')
for i in range(10):
    print(i)
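
`print()` also accepts a `sep` argument, which replaces the single space it places between arguments. A short sketch combining `sep` with `end`, plus `*` to unpack a list into separate arguments:

```python
print(1, 2, 3)                                   # default separator: 1 2 3
print(1, 2, 3, sep=', ')                         # 1, 2, 3
print('2017', '07', '30', sep='-', end='!\n')    # 2017-07-30!

days = ['Mon', 'Tue', 'Wed']
print(*days, sep=' | ')                          # Mon | Tue | Wed
```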

---
## <span style="color:#0b486b">3. datetime module</span>


The datetime module includes functions and classes for date and time parsing, formatting, and arithmetic.

<a id = "time"></a>

### <span style="color:#0b486b">3.1 Time</span>

Time values are represented with the time class. Times have attributes for hour, minute, second, and microsecond. They can also include time zone information.

In [None]:
import datetime

t = datetime.time(11, 21, 33)
print(t)
print('hour  :', t.hour)
print('minute:', t.minute)
print('second:', t.second)
print('microsecond:', t.microsecond)
print('tzinfo:', t.tzinfo)
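
The example above has no time zone, so `tzinfo` is `None`. A sketch of a time that does carry time zone information, using a fixed UTC+10:00 offset chosen only for illustration (roughly AEST, ignoring daylight saving):

```python
import datetime

# A fixed offset of +10 hours from UTC (illustrative choice)
utc_plus_10 = datetime.timezone(datetime.timedelta(hours=10))

t = datetime.time(11, 21, 33, tzinfo=utc_plus_10)
print(t)              # 11:21:33+10:00
print(t.tzinfo)       # UTC+10:00
print(t.utcoffset())  # 10:00:00
```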

<a id = "date"></a>

### <span style="color:#0b486b">3.2 Date</span>

Calendar date values are represented with the date class. Instances have attributes for year, month, and day.

In [None]:
import datetime

today = datetime.date.today()
print(today)
print('ctime:', today.ctime())
print('tuple:', today.timetuple())
print('ordinal:', today.toordinal())
print('Year:', today.year)
print('Mon :', today.month)
print('Day :', today.day)
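
Two `date` methods number the days of the week differently: `weekday()` uses Monday = 0 to Sunday = 6, while `isoweekday()` uses Monday = 1 to Sunday = 7. A sketch with an arbitrary fixed date (10 October 2008, a Friday):

```python
import datetime

d = datetime.date(2008, 10, 10)   # arbitrary example date
print(d.isoformat())    # 2008-10-10
print(d.weekday())      # 4 (Monday is 0)
print(d.isoweekday())   # 5 (Monday is 1)
```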

Another way to create a new date instance is to call the `replace()` method of an existing date. For example, you can change the year while leaving the day and month unchanged. The original date is not modified; `replace()` returns a new object.

In [None]:
import datetime

d1 = datetime.date(2013, 3, 12)
print('d1:', d1)

d2 = d1.replace(year=2015)
print('d2:', d2)
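
One thing to watch with `replace()`: the resulting date must exist. A sketch using 29 February, which only exists in leap years:

```python
import datetime

leap_day = datetime.date(2016, 2, 29)

# Works: 2020 is also a leap year, so 29 February exists
print(leap_day.replace(year=2020))   # 2020-02-29

# Fails: 2019 is not a leap year, so replace() raises ValueError
try:
    leap_day.replace(year=2019)
except ValueError as err:
    print('ValueError:', err)
```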

**Exercise 4:**

1. Write a piece of code that tells you the day of the week on which you were born.
2. How about this year? On which day of the week does your birthday fall?

In [None]:
#Put your code here

<details><summary><u><b><font color="Blue">Click here for the solution to Exercise 4</font></b></u></summary>

```python
import datetime

day_of_week = {0: 'Monday',
               1: 'Tuesday',
               2: 'Wednesday',
               3: 'Thursday',
               4: 'Friday',
               5: 'Saturday',
               6: 'Sunday'}
# A list would work just as well, because weekday() returns 0-6:
# days_of_week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

print('Today is', day_of_week[datetime.date.today().weekday()])

# Assume the birthday is 10 October 2008
my_birthdate = datetime.date(2008, 10, 10)
print('I was born on', day_of_week[my_birthdate.weekday()])

# Move the birthday to 2019 (treated as "this year" in this solution)
t2 = my_birthdate.replace(year=2019)
print('and my birthday this year is on a', day_of_week[t2.weekday()])
```
</details>
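
Instead of typing out the day names, you could index the standard library's `calendar.day_name` with `weekday()`, or ask `strftime('%A')` for the name. A minimal sketch, assuming the default (English, C) locale, because both follow the locale's time settings:

```python
import calendar
import datetime

my_birthdate = datetime.date(2008, 10, 10)          # same assumed birthday as above
print(calendar.day_name[my_birthdate.weekday()])     # Friday
print(my_birthdate.strftime('%A'))                   # Friday
```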


<a id = "timedelta"></a>

### <span style="color:#0b486b">3.3 timedelta</span>
Using `replace()` is not the only way to calculate future or past dates. The `datetime` module also supports basic arithmetic on date and datetime values through the `timedelta` class.

In [None]:
today = datetime.datetime.today()
print(today)

tomorrow = today + datetime.timedelta(days=1)  
print(tomorrow)
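
Subtracting one date from another also works and yields a `timedelta`, whose `days` attribute gives the gap in whole days. A sketch with arbitrary example dates:

```python
import datetime

start = datetime.date(2017, 7, 3)    # arbitrary example dates
end = datetime.date(2017, 7, 30)

gap = end - start                    # date - date gives a timedelta
print(gap)                           # 27 days, 0:00:00
print(gap.days)                      # 27

# timedelta also accepts weeks, hours, minutes, seconds, ...
print(start + datetime.timedelta(weeks=2))   # 2017-07-17
```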

**Exercise 5:**

Rewrite Exercise 4 using `timedelta` arithmetic instead of `replace()`.

In [None]:
#Put your code here

<details><summary><u><b><font color="Blue">Click here for the solution to Exercise 5</font></b></u></summary>

```python
import datetime

day_of_week = {0: 'Monday',
               1: 'Tuesday',
               2: 'Wednesday',
               3: 'Thursday',
               4: 'Friday',
               5: 'Saturday',
               6: 'Sunday'}
# A list would work just as well, because weekday() returns 0-6:
# days_of_week = ['Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun']

print('Today is', day_of_week[datetime.date.today().weekday()])

# Assume the birthday is 10 October 2008
my_birthdate = datetime.date(2008, 10, 10)
print('I was born on', day_of_week[my_birthdate.weekday()])

# Reach the 2019 birthday by adding 11 years' worth of days.
# Two of those one-year spans (the ones containing 29 February 2012 and
# 29 February 2016) are 366 days long; the other nine are 365 days long.
t2 = my_birthdate + datetime.timedelta(days=365) * 9 + datetime.timedelta(days=366) * 2
print('and my birthday this year is on a', day_of_week[t2.weekday()])
```
</details>


You should find that this gives the same result as Exercise 4.

You can also use comparison operators with datetime objects: a later moment compares as greater than an earlier one.

In [None]:
tomorrow > today
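
Because dates and datetimes are ordered, built-ins such as `sorted()`, `min()` and `max()` work on them directly. (One caveat: ordering comparisons between a naive datetime, which has no `tzinfo`, and an aware one raise `TypeError`.) A small sketch with arbitrary example dates:

```python
import datetime

dates = [datetime.date(2019, 10, 10),
         datetime.date(2008, 10, 10),
         datetime.date(2017, 7, 30)]

for d in sorted(dates):        # earliest first
    print(d)
print('latest:', max(dates))   # latest: 2019-10-10
```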

<a id = "parsing"></a>

### <span style="color:#0b486b">3.4 Formatting and Parsing</span>

The default string representation of a datetime object (what `print()` shows) uses an ISO 8601 layout, `YYYY-MM-DD HH:MM:SS.mmmmmm`, with a space between the date and the time. Alternate formats can be generated using `strftime()`. Similarly, if your input data includes timestamp values parsable with `time.strptime()`, then `datetime.strptime()` is a convenient way to convert them to datetime instances.

In [None]:
today = datetime.datetime.today()
print('ISO     :', today)
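
To compare `str()` with the `isoformat()` method explicitly, here is a sketch with a fixed, arbitrary timestamp. Note that both omit the microsecond part when it is zero:

```python
import datetime

dt = datetime.datetime(2017, 7, 30, 12, 13, 5)   # arbitrary example, microsecond = 0
print(str(dt))          # 2017-07-30 12:13:05 (space separator)
print(dt.isoformat())   # 2017-07-30T12:13:05 ('T' separator)
```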

Create a string from a datetime object with `strftime()`:

In [None]:
str_format = "%a %b %d %H:%M:%S %Y"
s = today.strftime(str_format)
print('strftime:', s)

Parse the string back into a datetime object with `strptime()`:

In [None]:
print(s)

d = datetime.datetime.strptime(s, str_format)
print(d)
print('strptime:', d.strftime(str_format))
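
A `strftime()`/`strptime()` round trip only preserves the fields that appear in the format string. With the format used above, microseconds are lost, so the parsed value is generally not equal to the original `today`. A sketch with a fixed timestamp:

```python
import datetime

str_format = "%a %b %d %H:%M:%S %Y"
original = datetime.datetime(2017, 7, 30, 12, 13, 5, 123456)   # includes microseconds

s = original.strftime(str_format)
parsed = datetime.datetime.strptime(s, str_format)

print(s)
print(parsed == original)                          # False: the format has no microsecond field
print(parsed == original.replace(microsecond=0))   # True
```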

In [None]:
# Define a string variable "s" holding the date 07/03/2017
s = "07/03/2017"
# Month first, then day (US style), so this string means 3 July 2017
str_format = "%m/%d/%Y"


d = datetime.datetime.strptime(s, str_format)
print(d)
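
The format string decides how an ambiguous date is read. In Australia, 07/03/2017 would usually mean 7 March, which needs a day-first format. A sketch parsing the same string both ways:

```python
import datetime

s = "07/03/2017"
us_style = datetime.datetime.strptime(s, "%m/%d/%Y")   # month first
au_style = datetime.datetime.strptime(s, "%d/%m/%Y")   # day first

print(us_style.date())   # 2017-07-03
print(au_style.date())   # 2017-03-07
```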

**Exercise 6:**

You have the string "7/30/2017 - 12:13". How do you convert it into a datetime object?

In [None]:
#Put your code here

<details><summary><u><b><font color="Blue">Click here for the solution to Exercise 6</font></b></u></summary>

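```python
import datetime

s = '7/30/2017 - 12:13'
# %m and %d also accept single-digit values such as '7'
str_format = "%m/%d/%Y - %H:%M"
t = datetime.datetime.strptime(s, str_format)
print(t)   # 2017-07-30 12:13:00
```
</details>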
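
If a string does not match the format exactly, `strptime()` raises `ValueError`. When input may arrive in several layouts, one option is to try a few formats in turn. A sketch of such a helper; the name `parse_timestamp` and its default list of formats are hypothetical:

```python
import datetime

def parse_timestamp(s, formats=("%m/%d/%Y - %H:%M", "%m/%d/%Y")):
    """Try each format in turn; return the first successful parse, or None."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(s, fmt)
        except ValueError:   # raised when s does not match fmt
            continue
    return None

print(parse_timestamp("7/30/2017 - 12:13"))   # 2017-07-30 12:13:00
print(parse_timestamp("7/30/2017"))           # 2017-07-30 00:00:00
print(parse_timestamp("30/7/2017"))           # None: there is no month 30
```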
