# Python Programming for Beginners in Data Science

# 1. Standard libraries

## 1.1 DateTime Module

__Summary:__ Date and time calculations can quickly become unmanageable if we have to do them ourselves without help from a library. Python's standard library includes the __datetime__ module to deal with this. There are four main objects in this module: Date, Time, TimeDelta and DateTime.

### 1.1.1 Date Object

First, we need to import the library in order to use it. To create a date we can use:

In [10]:
from datetime import date

stored_date = date(2020,1,22)
today = date.today()

today


datetime.date(2020, 4, 27)

Also, we can extract a specific day, month or year from an already initialized date:

In [16]:
today_day = today.day
today_month = today.month

today_month


4

We can replace the day, month or year of a date with __.replace()__, which returns a new date:

In [17]:
today = today.replace(year = 2019)
today


datetime.date(2019, 4, 27)

To find out which day of the week a date falls on, we can use the __.weekday()__ method, where the week starts with Monday (index = 0) and ends with Sunday (index = 6):

In [19]:
today = date.today()
today.weekday()

0
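
As a small sketch of the numbering described above, the following uses fixed dates instead of today's date, so that the result does not depend on when it is run. The variable names are only illustrative, and `isoweekday()` is an additional standard method that counts from 1 to 7 instead.

```python
from datetime import date

monday = date(2020, 4, 27)        # fixed example date (a Monday)
sunday = date(2020, 5, 3)         # the Sunday of the same week

print(monday.weekday())           # 0 -> Monday
print(sunday.weekday())           # 6 -> Sunday
print(monday.isoweekday())        # 1 -> ISO numbering runs from 1 (Monday) to 7 (Sunday)
```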

We can also format the date as we want with __date.strftime()__. For example:

 - __'%a'__  is the abbreviated weekday name (Mon, Tue, Wed, ...)
 - __'%A'__  is the full weekday name (Monday, Tuesday, Wednesday, ...)
 - __'%w'__  shows the weekday as a number, where Sunday is 0 and Saturday is 6 (0, 1, 2, ..., 6)
 - __'%d'__  displays the day of the month, zero-padded to two digits (01, 05, 11, ... )
 - __'%-d'__  the day of the month without zero-padding (1, 5, 11, ... ); this is a platform-specific extension that usually works on Linux and macOS but not on Windows
 - __'%b'__  abbreviated month name (Jan, Feb, ... )
 - __'%B'__  full month name (January, February, ... )
 - __'%m'__  month number, zero-padded to two digits (01, 05, 11, ... )
 - __'%y'__  last two digits of the year (97, 99, 01, 20, ... )
 - __'%Y'__  whole year number (1997, 1999, 2001, 2020, ... )

In [21]:
from datetime import datetime

timestamp = 1528797322
date_time = datetime.fromtimestamp(timestamp)

print("Date time object:", date_time)

d = date_time.strftime("%m/%d/%Y, %H:%M:%S")
print("Output 2:", d)	

d = date_time.strftime("%d %b, %Y")
print("Output 3:", d)

d = date_time.strftime("%d %B, %Y")
print("Output 4:", d)

d = date_time.strftime("%I%p")
print("Output 5:", d)


Date time object: 2018-06-12 06:55:22
Output 2: 06/12/2018, 06:55:22
Output 3: 12 Jun, 2018
Output 4: 12 June, 2018
Output 5: 06AM
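
The same format codes also work directly on a Date object. The sketch below sticks to the numeric codes, so the output does not depend on the language settings of the computer; the date and the variable names are only an illustration. Note that '%w' counts from Sunday = 0, unlike .weekday().

```python
from datetime import date

d = date(2020, 4, 7)                # Tuesday, 7 April 2020

padded = d.strftime("%d/%m/%y")     # zero-padded day and month, two-digit year
full = d.strftime("%Y-%m-%d")       # four-digit year
weekday_number = d.strftime("%w")   # Sunday is 0, so Tuesday is 2

print(padded)                       # 07/04/20
print(full)                         # 2020-04-07
print(weekday_number)               # 2
print(d.weekday())                  # 1, because .weekday() counts from Monday = 0
```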


### 1.1.2 TimeDelta object

__Summary:__ A TimeDelta object represents the difference between two Date objects or two DateTime objects (plain Time objects do not support subtraction). We also use timedelta objects to add or subtract a number of days/hours/minutes to or from an existing Date or DateTime object.

For example:

In [25]:
from datetime import date, timedelta

today = date( 2020, 4, 27 )
nine_days_later = today + timedelta( days = 9 )

nine_days_later


datetime.date(2020, 5, 6)

Another example could be to answer the following question: How many days do I have left to finish a task with a deadline?

In [29]:
today = date.today()
task = date( 2020 , 5 , 8 )

days_left = task - today
days_left


datetime.timedelta(days=11)

Comparing two dates gives a boolean value (not a timedelta), e.g.:

In [33]:
boolean_dt = task > today
type(boolean_dt)

bool

__Important:__ __timedelta()__ only accepts weeks, days, hours, minutes, seconds, milliseconds and microseconds between the parentheses; it does not accept months or years, because their length is not fixed.
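
A short sketch of this restriction, using a fixed start date so that the printed values are reproducible. The variable names are only illustrative.

```python
from datetime import date, timedelta

start = date(2020, 4, 27)

# Several units can be combined; they are normalised to days, seconds and microseconds.
delta = timedelta(weeks=1, days=2, hours=12)
print(delta)                        # 9 days, 12:00:00
print(delta.total_seconds())        # 820800.0

print(start + timedelta(weeks=2))   # 2020-05-11

# Months are not accepted as a keyword argument.
try:
    timedelta(months=1)
except TypeError as error:
    print("TypeError:", error)
```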

__Exercise:__ Calculate the number of days between today and your next birthday.

In [40]:
def days_for_birthday(birthday):
    
    today = date.today()
    # Has this year's birthday not happened yet?
    if ( birthday.month > today.month ) or ( ( birthday.month == today.month ) and ( birthday.day > today.day) ):
        birthday = birthday.replace( year = today.year )
    else:
        # Already passed (or it is today): count towards next year's birthday
        birthday = birthday.replace( year = today.year + 1 )
    return ( birthday - today )

birthday = date( 1997, 9, 3 )
how_many_days = days_for_birthday(birthday)

how_many_days


datetime.timedelta(days=129)
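
Because the result above depends on the day the cell is run, it can be handy to pass the reference date in as a parameter. The following is only a sketch: the name `days_until_birthday` and its `today` parameter are hypothetical, a birthday that falls on the reference date counts as 0 days away, and birthdays on 29 February are not handled (`.replace()` would raise a ValueError in a non-leap year).

```python
from datetime import date

def days_until_birthday(birthday, today):
    """Return the timedelta from `today` to the next occurrence of `birthday`."""
    next_birthday = birthday.replace(year=today.year)
    if next_birthday < today:
        # This year's birthday has already passed, so use next year's.
        next_birthday = next_birthday.replace(year=today.year + 1)
    return next_birthday - today

birthday = date(1997, 9, 3)
print(days_until_birthday(birthday, date(2020, 4, 27)))   # 129 days, 0:00:00
print(days_until_birthday(birthday, date(2020, 10, 1)))   # 337 days, 0:00:00
```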

### 1.1.3 Time object

__Summary:__ It works much like Date, except that it holds hours, minutes, seconds and microseconds instead of the date parameters. It is imported with __from__ datetime __import__ time, and it offers the following attributes and methods:

 - now = __time(__ 13, 12, 21, 32000 __)__
 - now __.hour__
 - now __.minute__
 - now __.second__
 - now __.microsecond__
 - now __.replace(__ hour = 2 __)__
 - now __.strftime(__ '%I' __)__

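Where the most useful formats are:

 - __'%H'__ for the hour on a 24-hour clock as a zero-padded decimal (00, 01, 15, ... )
 - __'%I'__ for the hour on a 12-hour clock as a zero-padded decimal (01, 02, 11, 12)
 - __'%p'__ for the AM/PM marker that usually accompanies '%I'
 - __'%M'__ for the minute as a zero-padded decimal (00, 01, 59, ... )
 - __'%S'__ for the second as a zero-padded decimal (00, 01, 59, ... )
 - __'%f'__ for the microsecond as a six-digit zero-padded decimal (000000, 032000, ... )

Note that the case of the letter does not switch the padding on or off: lowercase '%h' and '%i' are not standard hour codes, and lowercase '%m' is the month, not the minute. On Linux and macOS the unpadded hour can usually be obtained with '%-H' or '%-I', but this is a platform-specific extension.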
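
A minimal sketch of a Time object and its standard format codes, using the same example value as in the list of attributes above. The AM/PM text assumes the default (English) locale.

```python
from datetime import time

now = time(13, 12, 21, 32000)

print(now.hour, now.minute, now.second, now.microsecond)  # 13 12 21 32000
print(now.strftime("%H:%M:%S.%f"))    # 13:12:21.032000
print(now.strftime("%I:%M %p"))       # 01:12 PM in the default locale
print(now.replace(hour=2))            # 02:12:21.032000
```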

The separate __time__ module offers another useful function, __time.time()__. It gives us the number of seconds that have passed since a fixed reference date, the 1st of January of 1970 (1/1/1970), known as the 'epoch'. This is useful for writing functions that measure the time elapsed since a particular event.

For this we have to import the time module with: __import__ time.

Then, to know how many seconds have elapsed since 1/1/1970, we can run the following code:

In [121]:
import time

time.time()

1588022706.495981

It can be used for measuring the runtime of code. This is useful for optimization: by timing each piece of code, we can determine which piece is the one causing runtime problems. For example:

In [125]:
before = time.time()
total = 0  # avoid the name "sum", which would shadow the built-in function
for i in range(10000000):
    total += i
after = time.time()
time_taken = after - before
time_taken


1.7179932594299316
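
The same idea can be wrapped in a small helper so that several pieces of code are timed in the same way. This is only a sketch: the names `timed`, `sum_with_loop` and `sum_with_formula` are hypothetical, and it uses `time.perf_counter()`, a standard clock intended for measuring short durations, instead of `time.time()`.

```python
import time

def timed(function, *args):
    """Run function(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = function(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

def sum_with_loop(n):
    total = 0
    for i in range(n):
        total += i
    return total

def sum_with_formula(n):
    return n * (n - 1) // 2           # closed form for 0 + 1 + ... + (n - 1)

loop_result, loop_time = timed(sum_with_loop, 1_000_000)
formula_result, formula_time = timed(sum_with_formula, 1_000_000)

print(loop_result == formula_result)  # True
print(f"loop: {loop_time:.4f} s, formula: {formula_time:.6f} s")
```

The measured times vary from run to run and from machine to machine, so they are best compared against each other rather than read as absolute values.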

We can also use the __time.localtime()__ function to give us the local date and time in the form of a tuple-like struct_time: (year, month, day, hour, minute, second, weekday, year_day, isdst)

In [127]:
today = time.localtime()
today

time.struct_time(tm_year=2020, tm_mon=4, tm_mday=27, tm_hour=18, tm_min=31, tm_sec=42, tm_wday=0, tm_yday=118, tm_isdst=0)
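
The fields of this structure can be read by name or by position. The sketch below also uses `time.gmtime(0)`, a standard function that converts a number of seconds since the epoch to UTC, to show where the count starts; the variable names are only illustrative.

```python
import time

epoch_start = time.gmtime(0)            # second 0 of the epoch, expressed in UTC
print(epoch_start.tm_year, epoch_start.tm_mon, epoch_start.tm_mday)  # 1970 1 1

now = time.localtime()
print(now.tm_year, now.tm_hour)         # fields can be read by name ...
print(now[0] == now.tm_year)            # ... or by position, like a tuple -> True
print(time.strftime("%d-%b-%Y", now))   # the time module has its own strftime function
```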

### 1.1.4 DateTime object

__Summary:__ It's basically a combination of Date and Time. As before, we can use the __datetime()__ constructor and also __datetime.now()__. And of course we can still use timedeltas.

In [133]:
import datetime as dt

now = dt.datetime.now()
now


datetime.datetime(2020, 4, 27, 18, 37, 11, 931691)

In [142]:
date_1 = dt.datetime( 2020, 5, 12 )  # named date_1 so it does not shadow the imported date class
print(date_1)

date_2 = dt.datetime( 2020, 5, 12, 5, 32, 59)
print(date_2)

print(date_2.minute) # Do not confuse with minimum (.min)

date_3 = dt.datetime.now()
date_3.strftime('%d-%b-%Y')


2020-05-12 00:00:00
2020-05-12 05:32:59
32


'27-Apr-2020'

With the function __.strftime()__ we managed to convert a datetime value into a specifically formatted string. However, we can also use another function called __.strptime()__, which allows us to convert a specifically formatted string into an actual datetime object. It is the inverse of the .strftime() method.

In [148]:
string_date = 'Dec-18-2019'
print(string_date)

actual_date = dt.datetime.strptime( string_date, '%b-%d-%Y' )
actual_date


Dec-18-2019


datetime.datetime(2019, 12, 18, 0, 0)
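
A small sketch of this inverse relationship. It uses numeric format codes only, so it does not depend on the language settings of the computer, and it also shows what happens when the string does not match the format. The variable names are only illustrative.

```python
from datetime import datetime

text = "18-12-2019 05:32"
fmt = "%d-%m-%Y %H:%M"

parsed = datetime.strptime(text, fmt)   # string -> datetime
print(parsed)                           # 2019-12-18 05:32:00
print(parsed.strftime(fmt) == text)     # True: strftime undoes strptime

try:
    datetime.strptime("2019/12/18", fmt)   # does not match the format
except ValueError as error:
    print("ValueError:", error)
```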

## 1.2 Read & Write files

Text files, image files and other kinds of files are commonly accessed to work with data. Usually in Python, and specifically in Machine Learning, we will use dedicated libraries for reading and writing files in standard formats (.csv, .tsv, .xlsx, etc). However, sometimes the data we access is not in one of these formats, and we will have to read it ourselves and convert it into a standard format for later use.

Some libraries, such as __Pandas__ or __NumPy__, are optimized to work with standardized files. However, in this lecture, we are going to see how to work with other, non-standardized formats.

### 1.2.1 File types

Typically there are two types of files: text and binary files. To open a file, we use the __open(__"location\file_name.txt"__,__ "mode"__)__ function and __.close()__ for closing it. To check if a file has been closed, we can use the __.closed__ boolean attribute. The mode can be:

 - __'r'__  for reading a file (the default).
 - __'w'__  for writing a file; an existing file is overwritten.
 - __'x'__  for exclusive creation; it fails if the file already exists.
 - __'a'__  for appending to the end of an existing file.
 - __'t'__  for text mode (the default).
 - __'b'__  for binary mode.
 - __'+'__  for both reading and writing a file.

It's important to note that, inside a normal Python string, a backslash (\\) in a Windows path has to be written as a double backslash (\\\\). On Mac or Linux, paths use the forward slash, which needs no escaping.

Then, if we open the file in __r__ mode (meant for reading the file), this only creates a file object that points to the file and assigns it to a variable; the contents are not read yet. We can use var_of_openfile __.read()__ to read the whole file into a string. This can be useful for printing the file.

For reading a single line of the file we can use var_of_openfile __.readline()__. Calling this function repeatedly gives us the following line each time. Once all the lines have been read, it returns an empty string. To read every line and print it we can also use the following code:

In [None]:
f = open ( "\\file_folder\\file_name.txt" , "r" )

for line in f:
    print(line, end="")  # each line already ends with a newline character
    
f.close()

Pictures, for example, are binary files and cannot be read with this code. You cannot display an image with print; for that we need a different kind of tool, such as an image viewer.

We can guard against errors, e.g. forgetting to close a file, with the __'with ... as'__ statement. It closes the file automatically, even if an exception occurs. For example:

In [None]:
with open ( "\\file_folder\\file_name.txt" , "a" ) as file_var:
    exception_example = 1/0  # raises ZeroDivisionError
    file_var.write("a new line to the file")  # never reached

# The with statement closes the file automatically, even though an exception occurred.
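
To try the modes and reading functions described above without touching real data, the following sketch writes to a file in a temporary folder. The file name `example.txt` is hypothetical, and `tempfile` and `os` are standard modules used here only to create and clean up that folder.

```python
import os
import tempfile

# Hypothetical file inside a temporary folder, so no real data is touched.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "example.txt")   # os.path.join picks the right separator

with open(path, "w") as f:            # "w" creates (or overwrites) the file
    f.write("first line\n")

with open(path, "a") as f:            # "a" adds to the end
    f.write("second line\n")

with open(path, "r") as f:
    first = f.readline()              # "first line\n"
    rest = f.read()                   # everything that is left
    nothing = f.readline()            # "" once the end has been reached

print(repr(first), repr(rest), repr(nothing))
print(f.closed)                       # True: the with block closed the file

os.remove(path)
os.rmdir(folder)
```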

## 1.3 Math libraries

In Machine Learning (ML) or Deep Learning (DL), most of the time we will not use the standard math library of Python. This is because we will usually be working with libraries designed for Data Science, such as NumPy or Pandas, and the math functions work on single numbers rather than on whole arrays. To use them, we have to __import__ math.

Some of the basic __math functions__ are:

 - __ceil()__ rounds a real number up to the closest integer that is greater than or equal to it.
 - __floor()__ rounds a real number down to the closest integer that is lower than or equal to it.
 - __trunc()__ discards the fractional part of a real number, so it converts to an integer.
 - __exp()__ is the exponential function: e^(x).
 - __pow(m,n)__ computes m^n.
 - __sqrt()__ is the square root of a number.
 - __log()__ is the log in base e. If we want another base, we can add it after a comma: log(7,10) is the log of 7 in base 10.
 - __pi__ = 3.1415...
 - __e__ = 2.7182...
 - __sin()__ sine function, which takes the angle in __radians__; if we have the angle in degrees we can convert it first with radians().
 - __radians()__ converts a number in degrees to radians.
 - __cos()__ cosine function.
 - __tan()__ tangent function.
 - __sum()__ is a built-in function (not part of math) that adds up the items of a list or other iterable; sum(range(n + 1)) is the sum of all the integers between 0 and n.
 - __fsum()__ is an accurate sum of floating point numbers, useful with lists of real numbers; we would use fsum(list).
 - __gcd()__ determines the greatest common divisor of two integers.
 - __factorial()__ determines the factorial of a number: 5! = 120.
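
A quick sketch that calls several of the functions listed above. The input values are arbitrary examples.

```python
import math

print(math.ceil(2.3), math.floor(2.7), math.trunc(-2.7))   # 3 2 -2
print(math.floor(-2.7))                                     # -3 (floor and trunc differ for negatives)
print(math.pow(2, 10), math.sqrt(16))                       # 1024.0 4.0
print(math.log(math.e), math.log(100, 10))                  # 1.0 2.0
print(math.sin(math.radians(90)))                           # 1.0
print(math.gcd(12, 18), math.factorial(5))                  # 6 120
print(math.fsum([0.1] * 10))                                # 1.0
```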
 

# 2. NumPy

## 2.1 What is NumPy?

NumPy is the fundamental package for scientific computing with Python; it stands for Numeric Python. NumPy contains among other things:

 - A powerful N-dimensional array object
 - Sophisticated (broadcasting) functions
 - Tools for integrating C/C++ and Fortran code
 - Useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an __efficient__ multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases. 


## 2.2 What makes NumPy faster?

Lists in Python are data structures that manage data as if it were a column with several rows. Each element of a list is stored in RAM, but not contiguously with the other elements, so for each calculation we want to do with all the elements of the list, we have to access multiple memory locations that are not arranged next to each other.

However, NumPy stores all the elements of an array contiguously; when elements are added or appended, it creates a new array, copies everything into it and discards the prior one. This has the advantage that the computer does not have to look up the memory location in which each individual element is stored.

Another reason is that, under the hood, NumPy is implemented in the C programming language, which makes its array operations faster than equivalent loops written in Python itself.
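
A rough way to see the difference is to time the same operation on a list and on an array. This is only a sketch with illustrative variable names: the exact times depend on the machine, so no specific speed-up is claimed here.

```python
import time
import numpy as np

n = 1_000_000
python_list = list(range(n))
numpy_array = np.arange(n, dtype=np.int64)

start = time.perf_counter()
list_result = [x * 2 for x in python_list]    # Python-level loop over separate objects
list_time = time.perf_counter() - start

start = time.perf_counter()
array_result = numpy_array * 2                # one vectorised operation on a contiguous block
array_time = time.perf_counter() - start

print(numpy_array.flags["C_CONTIGUOUS"])      # True
print(list_result[-1] == array_result[-1])    # True: same values either way
print(f"list: {list_time:.4f} s, array: {array_time:.4f} s")
```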


## 2.3 NumPy Arrays

It is the most fundamental structure of NumPy. An array is similar to a list, but it is not limited to one dimension; it can have any number n of dimensions. One-dimensional arrays are called vectors, two-dimensional arrays are called matrices, and arrays with more than two dimensions are called multi-dimensional arrays.

Some examples could be as the ones presented below:

In [6]:
import numpy as np

one_d = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
two_d = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                  [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]])

print(one_d)
print()
print(two_d)

[ 1  2  3  4  5  6  7  8  9 10 11 12]

[[ 1  2  3  4  5  6  7  8  9 10]
 [11 12 13 14 15 16 17 18 19 20]]


We can also create a two-dimensional array from more than one one-dimensional arrays:

In [8]:
a = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
b = np.array([11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22])

c = np.array([a,b])
c

array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22]])

To get the dimensions of an array, we can use the __.shape__ attribute, which gives us a tuple with the size of each of the n dimensions. To create arrays more quickly, instead of using the range() function, we can use __np.arange()__, which works the same way but returns an array.

Another function we can use is __.reshape()__. This is useful if we want to take an array of m elements with n dimensions and change it into another array of m elements with p dimensions (p != n). For instance:

In [11]:
a = np.arange(1,13)
a

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [13]:
a.shape

(12,)

In [14]:
a.reshape(4,3)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])
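
A few more reshapes of the same 12-element array, as a sketch. It also shows the `-1` placeholder, which asks NumPy to work out one size by itself, and what happens when the number of elements does not fit the requested shape.

```python
import numpy as np

a = np.arange(1, 13)              # 12 elements

b = a.reshape(2, 6)
c = a.reshape(3, -1)              # -1 lets NumPy work out the missing size (4 here)
d = a.reshape(2, 3, 2)            # three dimensions

print(b.shape, c.shape, d.shape)  # (2, 6) (3, 4) (2, 3, 2)
print(a.shape)                    # (12,) -> a itself keeps its shape

try:
    a.reshape(5, 3)               # 15 slots cannot hold 12 elements
except ValueError as error:
    print("ValueError:", error)
```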

## 2.4 Array operations

There are two main types of operations that can be done with arrays in NumPy: element-wise operations and aggregate operations.


### 2.4.1 Element-wise operations

When we sum lists in Python, we create an even longer list. For example:

In [20]:
a = [1, 2, 3] 
b = [-1, 1, 4]

c = a + b
c

[1, 2, 3, -1, 1, 4]

But it would be very useful to have some sort of matrix operations instead. That's why NumPy has these element-wise operations. When we sum NumPy arrays, the elements are added position by position.

In [21]:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([-1, 1, 4])

c = a + b
c

array([0, 3, 7])

The same thing happens with multiplication. There is also another phenomenon called __broadcasting__, which happens when you try to sum a one-dimensional array of length p (p != 1) with a one-dimensional array of length 1. In this case, the second array is extended to fit the size of the first array and then summed element-wise. It's important that the second array is a __single value__; for one-dimensional arrays of different lengths, any other length raises an error.

In [24]:
a = np.array([1, 5, 2])
b = np.array([3])

c = a + b
c

array([4, 8, 5])
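
A few more cases, as a sketch: broadcasting also applies to plain numbers and between a two-dimensional array and a one-dimensional array whose length matches the rows. The last case shows two one-dimensional arrays whose lengths are not compatible.

```python
import numpy as np

a = np.array([1, 5, 2])

print(a + 3)                      # [4 8 5] -> a plain number is broadcast as well
print(a * np.array([2]))          # [ 2 10  4]

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
print(matrix + a)                 # the row [1 5 2] is added to every row of the matrix
# [[ 2  7  5]
#  [ 5 10  8]]

try:
    a + np.array([1, 2])          # lengths 3 and 2 are not compatible
except ValueError as error:
    print("ValueError:", error)
```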

### 2.4.2 Aggregate operations

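These operations combine more than one number at a time into a single result. We can sum the numbers along a certain axis, taking into consideration that summing along axis 0 collapses the rows and gives one result per __column__, while summing along axis 1 collapses the columns and gives one result per __row__.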

For example, if we want to sum all the elements of the array, or the elements of each row or each column:

In [30]:
a = np.array([[1, 2, 3],
              [4, 5, 6]])

b = np.sum(a)  # This will sum all the elements in the array.
c = np.sum(a, axis = 1) # This will sum all the elements of each row.
d = np.sum(a, axis = 0) # This will sum all the elements of each column.

print(b)
print()
print(c)
print()
print(d)

21

[ 6 15]

[5 7 9]


We can also do the same with the functions: __np.max()__, __np.min()__, __np.mean()__, etc.
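
For instance, on the same 2x3 array, a sketch of these functions with and without an axis:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(np.max(a))                  # 6
print(np.max(a, axis=0))          # [4 5 6]    -> one maximum per column
print(np.min(a, axis=1))          # [1 4]      -> one minimum per row
print(np.mean(a))                 # 3.5
print(np.mean(a, axis=1))         # [2. 5.]    -> one mean per row
```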

## 2.5 Array indexing and slicing


### 2.5.1 Array indexing

Indexing in arrays works almost exactly the same as with Python's lists. We have the same 0-based index and we can refer to the end of the array with -1. The same works for two-dimensional arrays.

In [31]:
a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a[1,0], " is the same as ", a[-1, -3])

4  is the same as  4


### 2.5.2 Array slicing

It's basically the same as slicing lists, but with more dimensions. An example explains it best.

In [32]:
a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12],
              [13, 14, 15, 16]])

a

array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12],
       [13, 14, 15, 16]])

What should we do to extract the center 2x2 matrix?

In [34]:
b = a[1:3,1:3]
b

array([[ 6,  7],
       [10, 11]])

And an extra exercise, how can we convert it to a 1x4 array?

In [35]:
c = b.reshape(1,4)
c

array([[ 6,  7, 10, 11]])
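
Some further slices of the same 4x4 matrix, as a sketch. It is rebuilt here with np.arange() and .reshape() so that the block runs on its own.

```python
import numpy as np

a = np.arange(1, 17).reshape(4, 4)

print(a[1, :])                    # [5 6 7 8]      -> second row
print(a[:, 1])                    # [ 2  6 10 14]  -> second column
print(a[:2, -2:])                 # top-right 2x2 block
# [[3 4]
#  [7 8]]
print(a[::2, ::2])                # every other row and column
# [[ 1  3]
#  [ 9 11]]
```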

## 2.6 Array manipulation

We can __delete__, __append__ or __insert__ particular columns or rows.


### 2.6.1 Appending arrays

How do we append a row or column to an already existing matrix? Using the __np.append()__ function.

In [52]:
b = np.array([2, 3, 4, 5])

append_row_wise = np.append(a, [b], axis = 0)

print(append_row_wise)


[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]
 [13 14 15 16]
 [ 2  3  4  5]]


In [57]:
c = b.reshape(4,-1)  # Transform the 4-length row into a column

append_col_wise = np.append(a, c, axis = 1)
# In this case, it is not necessary to put [] around c, as it is already a two-dimensional array

print(append_col_wise)

[[ 1  2  3  4  2]
 [ 5  6  7  8  3]
 [ 9 10 11 12  4]
 [13 14 15 16  5]]


### 2.6.2 Inserting arrays

We will use the __np.insert()__ function instead of append. You have to select the row/column index at which you would like your array to be inserted, and the axis.

For example:

In [59]:
a = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15],
              [16, 17, 18, 19, 20],
              [21, 22, 23, 24, 25]])

b = np.array([-1, -2, -3, -4, -5])

b_in_third_row = np.insert(a, 2, b, axis = 0 )

print(b_in_third_row)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [-1 -2 -3 -4 -5]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]


In [62]:
b_in_fourth_column = np.insert(a, 3, b, axis = 1 )
print(b_in_fourth_column)

[[ 1  2  3 -1  4  5]
 [ 6  7  8 -2  9 10]
 [11 12 13 -3 14 15]
 [16 17 18 -4 19 20]
 [21 22 23 -5 24 25]]


### 2.6.3 Deleting arrays

We are now going to use the __np.delete()__ function, to which we have to pass the array we want to delete a row/column from, the index number of that row/column, and the axis.

For example, we will delete the inserted column in the previous example:

In [63]:
b_in_fourth_column = np.delete(b_in_fourth_column, 3, axis = 1)
print(b_in_fourth_column)

[[ 1  2  3  4  5]
 [ 6  7  8  9 10]
 [11 12 13 14 15]
 [16 17 18 19 20]
 [21 22 23 24 25]]
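
One detail worth keeping in mind is that these three functions return a new array and leave the original untouched, which is why the results above were assigned to a variable. A short sketch on a small array, with illustrative variable names:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

appended = np.append(a, [[7, 8, 9]], axis=0)  # new row at the end
inserted = np.insert(a, 1, [0, 0], axis=1)    # new column at index 1
deleted = np.delete(a, 0, axis=0)             # first row removed

print(appended.shape, inserted.shape, deleted.shape)  # (3, 3) (2, 4) (1, 3)
print(a.shape)                                # (2, 3) -> the original is untouched

# Without an axis, np.append flattens both inputs first.
print(np.append(a, [7, 8]))                   # [1 2 3 4 5 6 7 8]
```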
