# Python Primer

**Author: Jessica Cervi**


### Introduction to Jupyter Notebook

If you are reading this line, it means you have successfully opened a `Jupyter Notebook`. If you are viewing this Notebook and do not have a `Python 3` kernel running, please review the Jupyter Notebooks video or seek technical help from Programme Support.

![](kernel.png)

`Jupyter Notebook` is an application that offers plenty of functionality, which can be extended with plug-ins and other modifications. It can incorporate text, LaTeX equations and images, and execute Python code.

You might have noticed that the basic structure of a Jupyter Notebook consists of cells. The main ones you will be using are *code cells* and *markdown cells*. Code cells are designed to accept code, while markdown cells are designed to accept text.

Below is an empty code cell.

Note the `In [ ]` to its left: that's the clearest indication that a cell is ready to read and execute code.

*These* words, on the other hand, appear in a `markdown` cell. Below is an empty `markdown` cell.

`Markdown` cells can be used to integrate text, figures, etc. into the Notebook. 

When navigating through the various cells in a `Notebook`, you will use two modes: `Command Mode` and `Edit Mode`.  

In `Command Mode`, cells will feature a blue border, whereas `Edit Mode` features cells with a green border.

When in `Command Mode`, double-clicking on a cell or hitting `enter` will activate `Edit Mode`. In `Edit Mode`, hitting `esc` will return to Command Mode.  

A full list of keyboard shortcuts available in both `Command` and `Edit` modes can be found under the `<Help>` menu at the top of your screen.  

A few important and convenient `Command Mode` shortcuts are as follows:  
- `<a>` creates a new cell [a]bove the current cell
- `<b>` creates a new cell [b]elow the current cell  
- `<d , d>` will delete the selected cell(s)
- `<s>` or `<ctrl-s>` will save the notebook   

In `Edit Mode`, the important shortcuts are as follows:
- `<shift-enter>` runs the current cell and highlights the next cell.
- `<alt-enter>` runs the current cell and creates a new cell below.

Use this cell to practice switching between `Command` and `Edit` modes; try running the cell and creating new cells.  

---
When executing a cell, you will see the `In [ ]` change, for example, to `In [1]`.
After running a cell, the number next to the `In` will increment indicating a cell has run.



## Introduction to Python

In this part of the Notebook, we will review some basic concepts of the Python programming language, such as data types, functions, conditional statements and loops. Additionally, we will review some of the functionalities of powerful Python packages, such as `NumPy` and `pandas`.

### Python as a calculator

Python can simply be thought of as a big calculator. As a calculator, it features a number of built-in operators:

- `+` for addition
- `-` for subtraction
- `*` for multiplication
- `/` for division
- `**` for exponents, e.g. `2**4` is $2^4$  
- `//` for floor division e.g.  `9/4 = 2.25; 9//4 = 2; 9/2 = 4.5, 9//2 = 4`
- `%` as the modulo operator. e.g. `9%3 = 0; 9%2 = 1; 9%4 = 1; 9%5 = 4`, and in general, `n1%n2` returns the remainder from `n1/n2`

Run the cell below to see how this works.

In [1]:
2+2, 1-3, 5.2*3, 9/2, 9//2, 2**4

(4, -2, 15.600000000000001, 4.5, 4, 16)
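The floor division and modulo operators listed above are linked: for integers with a non-zero divisor, `n1 == (n1 // n2) * n2 + n1 % n2` always holds, and the built-in `divmod()` returns both results at once. A minimal sketch, with numbers chosen for illustration:

```python
n1, n2 = 9, 4

quotient = n1 // n2      # floor division: 2
remainder = n1 % n2      # modulo: 1

# The two results always recombine into the original number
print(quotient * n2 + remainder == n1)   # True

# divmod() returns (n1 // n2, n1 % n2) as a tuple
print(divmod(n1, n2))                    # (2, 1)

# Floor division rounds towards negative infinity, not towards zero
print(-9 // 4, -9 % 4)                   # -3 3
```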

Python can also print out messages to the 'console' using the `print()` command.  
Run the following:

In [2]:
print("Hello world")

Hello world


### Python data types 
Python has some basic, built-in data types that you will encounter throughout this course.
- `int` is an integer, such as 1.
- `float` is a float or a number with a decimal point, such as 1.0.
- `str` is a string: text enclosed in single or double quotation marks, such as `'hello'` or `"hello"`.
- `bool` is a boolean, which takes the value `True` or `False`.
- `tuple` is a tuple consisting of comma-separated elements enclosed within parentheses `()`. It is immutable.
- `list` is a list consisting of comma-separated elements enclosed within square brackets `[]`. Lists may include elements of multiple data types.
- `dict` is a dictionary consisting of comma separated `key: value` pairs enclosed within curly brackets `{}`. Example: `{'key1': 1, 'key2': 2}`.


Below are some examples of the basic data types:

In [3]:
print(7, "is a ", type(7))
print(7.1, "is a ", type(7.1))
print("Joe", "is a ", type("Joe"))
print(True, "is a ", type(True))

7 is a  <class 'int'>
7.1 is a  <class 'float'>
Joe is a  <class 'str'>
True is a  <class 'bool'>
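Strings can be written with either single or double quotes, and `type()` works the same way on the collection types listed above. The values below are purely illustrative:

```python
# Single and double quotes both create str objects
print(type('hello'), type("hello"))   # <class 'str'> <class 'str'>
print('hello' == "hello")              # True

# The collection types from the list above
print(type((1, 2)))                    # <class 'tuple'>
print(type([1, "two", 3.0]))           # <class 'list'>
print(type({'key1': 1, 'key2': 2}))    # <class 'dict'>
```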


### Variable assignment  

Variables may be assigned (almost) any name by the user. The [reserved words](https://www.google.com/search?client=opera&ei=a4q1W7juKua0jwS30ZO4BA&q=python+keywords+3.6+list&oq=python+keywords+3.6+list&gs_l=psy-ab.3..33i22i29i30l10.6879.7300..7427...0.0..0.180.641.1j4......0....1..gws-wiz.......0i22i30.FR38nCgXD0g) cannot be used as variable names, and the [names of builtin functions](https://docs.python.org/3.6/library/functions.html) should be avoided, because reusing one hides (shadows) the built-in function. In general, Python style guides suggest using `lower_case_words_connected_by_underscores_for_variable_names`.

Assignment in Python occurs using the `=` operator, as seen in the next cell.

In [4]:
var = 174
print(var)

174
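The standard-library `keyword` module lists Python's reserved words, so a candidate name can be checked before use. The sketch below also shows why reusing a built-in name such as `list` is discouraged: the assignment hides the built-in function until the variable is deleted. The name `my_values` is only illustrative.

```python
import keyword

print(keyword.iskeyword("for"))       # True: reserved, cannot be a variable name
print(keyword.iskeyword("my_var"))    # False: a valid snake_case name

# Reusing a built-in name is legal, but it shadows the built-in function
list = [1, 2, 3]          # 'list' now refers to this variable
# list("abc") here would raise TypeError: 'list' object is not callable
del list                  # remove the variable; the built-in is visible again
my_values = list("abc")   # the built-in list() works again
print(my_values)          # ['a', 'b', 'c']
```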


### Tuples

A tuple is a collection of Python objects enclosed in parentheses and separated by commas. Tuples are indexed, can contain nested objects and repeated values, and are immutable: once a tuple is created, its elements cannot be changed.

An example of a tuple is given in the code cell below.

In [5]:
my_tuple = ("Hello", 3, "London", 28)
print(my_tuple)
print(type(my_tuple))

('Hello', 3, 'London', 28)
<class 'tuple'>
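Because tuples are immutable, trying to change an element raises a `TypeError`. Two further details are worth knowing: a one-element tuple needs a trailing comma, and a tuple can be unpacked into separate variables. A short illustration using the same tuple:

```python
my_tuple = ("Hello", 3, "London", 28)

# Tuples are immutable: item assignment raises a TypeError
try:
    my_tuple[0] = "Goodbye"
except TypeError as err:
    print("TypeError:", err)

# A single-element tuple needs a trailing comma
single = ("London",)
not_a_tuple = ("London")      # just a string in parentheses
print(type(single), type(not_a_tuple))   # <class 'tuple'> <class 'str'>

# Unpacking assigns each element to its own variable
greeting, number, city, age = my_tuple
print(city)                   # London
```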


#### A note about indexing and slicing in Python

_Indexing_ means accessing individual elements. To access an element of a tuple or a list, use square-bracket notation: the name of the object followed by the element's position (its index) in `[]`. Negative indices count from the end.

For example:

In [6]:
tuple_index = (1,2,3,4,5,7)

# The first element in Python has index zero!
first_elem = tuple_index[0]
print(first_elem)

#Access last element
last_elem = tuple_index[-1]
print(last_elem)

1
7


_Slicing_ an object gives us another object instead of a single element.

A slice specifies a start index and an end index separated by a colon `:`, and returns a new object of the same type (slicing a tuple gives a tuple, slicing a list gives a list). Keep in mind that the end index is excluded: the slice contains the elements from the start index up to, but not including, the end index. For example:

In [7]:
sliced_tuple = tuple_index[2:5]
print(sliced_tuple)

(3, 4, 5)
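Either index can be omitted, negative indices count from the end, and an optional third value sets the step (`start:end:step`). A brief sketch using the same `tuple_index`:

```python
tuple_index = (1, 2, 3, 4, 5, 7)

print(tuple_index[:3])    # (1, 2, 3)  start omitted: begins at index 0
print(tuple_index[3:])    # (4, 5, 7)  end omitted: runs to the last element
print(tuple_index[-3:])   # (4, 5, 7)  the last three elements
print(tuple_index[::2])   # (1, 3, 5)  every second element
print(tuple_index[::-1])  # (7, 5, 4, 3, 2, 1)  reversed
```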


### Lists

A list is a collection of elements enclosed in square brackets and separated by commas. Lists are indexed, can contain nested objects and repeated values and, unlike tuples, are mutable.

It can have any number of items and they may be of different types (integer, float, string, etc.).

An example of a list is given in the code cell below.

In [8]:
my_list = [1,3.4, "Alex", "Machine Learning"]
print(my_list)
print(type(my_list))

[1, 3.4, 'Alex', 'Machine Learning']
<class 'list'>
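Because lists are mutable, elements can be replaced, added and removed in place. A brief sketch using a few common `list` methods:

```python
my_list = [1, 3.4, "Alex", "Machine Learning"]

my_list[0] = 2              # replace an element by index
my_list.append("Python")    # add an element to the end
removed = my_list.pop(1)    # remove and return the element at index 1 (3.4)

print(my_list)        # [2, 'Alex', 'Machine Learning', 'Python']
print(removed)        # 3.4
print(len(my_list))   # 4
```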


### Dictionaries

A dictionary can be created by placing comma-separated `key: value` pairs within curly braces.

Each entry in a dictionary pairs a `key` with its corresponding `value`. Values can be of any data type and can be duplicated, whereas keys must be unique and immutable. Unlike tuples and lists, which are indexed by position, dictionaries are indexed by key.

An example of a dictionary is given in the code cell below.

In [9]:
my_dict = {"name": "John", "age": 27, "city" : "Seattle", "salary" : 4382.78}
print(my_dict)
print(type(my_dict))

{'name': 'John', 'age': 27, 'city': 'Seattle', 'salary': 4382.78}
<class 'dict'>
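Values are looked up by key rather than by position. The sketch below reads, updates and adds entries, uses `.get()` to supply a default for a missing key, and loops over the key-value pairs with `.items()`. The `"department"` and `"country"` keys are illustrative only:

```python
my_dict = {"name": "John", "age": 27, "city": "Seattle", "salary": 4382.78}

print(my_dict["name"])                    # John
my_dict["age"] = 28                       # update an existing key
my_dict["department"] = "Sales"           # add a new key-value pair

# my_dict["country"] would raise a KeyError; .get() returns a default instead
print(my_dict.get("country", "unknown"))  # unknown

# Iterate over every key-value pair
for key, value in my_dict.items():
    print(key, "->", value)
```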


### Python functions
Python has built-in functions and allows users to create user-defined functions. In this course, you will use both, and starter code will be provided. A function definition consists of a header, which names the function and lists its parameters (arguments), and a body.

Here's how to define a function:

- Keyword `def` marks the start of a function header followed by the function name to uniquely identify it. Add parameters (arguments) through which we pass values to a function. They are optional. End the header with a colon `:`.
- Optional documentation string (docstring) describes what the function does.
- Instructions make up the function body. Statements must be indented (usually four spaces).
- An optional return statement returns a value from the function.


#### Sample function
Try running the sample functions below. Be sure to run the first code cell, which defines the function, before running the second code cell to see what the function does.

In [10]:
def hello_name(name):
    x = "Hello, " + name
    print(x)

In [11]:
hello_name("Jack")

Hello, Jack


In [12]:
def cube_power(x):
    """
    Function to compute the cube 
    of a number x
    """
    ans = x**3
    return ans

In [13]:
print(cube_power(2))

8
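Parameters can also be given default values, which makes the corresponding argument optional when the function is called. The `power` function below is a hypothetical generalisation of `cube_power`; its docstring can be read back through the `__doc__` attribute (or with `help()`).

```python
def power(x, n=3):
    """Return x raised to the power n (n defaults to 3, i.e. the cube)."""
    return x**n

print(power(2))          # 8, uses the default n=3
print(power(2, 5))       # 32
print(power(x=3, n=2))   # 9, arguments passed by keyword
print(power.__doc__)     # prints the docstring
```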


### Lambda functions

In Python, anonymous functions (or lambda functions) are functions without a name. They are defined using the `lambda` keyword and have the following syntax:

``` Python
lambda arguments: expression
```

- Lambda functions can take any number of arguments but are syntactically restricted to a single expression, which is evaluated and returned.

Below, there's an example of a lambda function that returns the cube of a number x:

In [14]:
cube = lambda x : x**3
print(cube(2))

8
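Lambda functions are most often passed as short, throwaway arguments to other functions. For example, the built-in `sorted()` accepts a `key` function; the sketch below sorts a list of illustrative words by length and also shows a lambda with two arguments:

```python
words = ["pandas", "NumPy", "Python", "scikit-learn", "Jupyter"]

# Sort by length instead of alphabetically (ties keep their original order)
by_length = sorted(words, key=lambda w: len(w))
print(by_length)   # ['NumPy', 'pandas', 'Python', 'Jupyter', 'scikit-learn']

# A lambda with two arguments
add = lambda a, b: a + b
print(add(3, 4))   # 7
```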


### If - elif - else statement

In a Python program, the if statement is how you perform if-then decision-making. It allows for conditional execution of a statement or group of statements based on the value of an expression.

**if**: The if statement is the simplest decision-making statement. It decides whether a statement or block of statements will be executed: if a certain condition is true, the block is executed; otherwise, it is skipped.

The basic syntax is given below:

```python
if condition:
    # Executes this block if
    # condition is true
```
where `condition` is an expression evaluated in a boolean context. The indented block of code below the if statement is executed only if the condition is true.

In [15]:
#if statement example

i = 10
if (i > 15):
    print("10 is greater than 15")
print("10 is less than 15")

10 is less than 15


Notice that because `i` is not greater than 15, the indented block *inside* the if statement is skipped. The final `print` is not indented, so it is not part of the if statement and runs regardless, printing '10 is less than 15'.

**if-else**: we can incorporate the else statement with the if statement to execute a block of code when the condition is false.


```python
if (condition):
    # Executes this block if
    # condition is true
else:
    # Executes this block if
    # condition is false
```


Observe the example below:

In [26]:
#if-else statement example

i = 10
if (i > 15): 
    print ("10 is greater than 15") 
else:
    print ("10 is 5 units smaller than 15") 

10 is 5 units smaller than 15


Notice that the block of code following the else statement is executed, as the condition in the if statement is false.

**if-elif-else:** In this case, the conditions are evaluated from the top down. As soon as one of them is true, the block associated with it is executed, and the rest of the ladder is bypassed. If none of the conditions is true, the final **else** block is executed.

    
```python
if (condition1):
    statement
elif (condition2):
    statement
elif (condition3):
    statement   
else:
    statement
```

Observe the example below:

In [27]:
i = 20
if (i == 10): 
    print ("i is 10") 
elif (i == 15): 
    print ("i is 15") 
elif (i == 20): 
    print ("i is 20") 
else: 
    print ("i is not present")

i is 20


Because we have defined `i` to be equal to 20, the first two conditions (`i == 10` and `i == 15`) are false and are bypassed, and the block under the second `elif` (`i == 20`) is executed.
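The same if-elif-else ladder is often wrapped in a function so it can be reused. The hypothetical `describe_number` below returns a label based on the sign of its argument; only the first branch whose condition is true runs:

```python
def describe_number(n):
    """Return a label describing the sign of n (illustrative helper)."""
    if n > 0:
        return "positive"
    elif n < 0:
        return "negative"
    else:
        return "zero"

for value in [7, -2, 0]:
    print(value, "is", describe_number(value))
```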

## Loops

### For loop

In Python, **for loops** are used for sequential traversal, iterating over the items of a sequence such as a list, a string or an array.

The basic syntax of a for loop is given below:

```python
for i in (range or list):
    # Executes this block once for each
    # item in the given range or list
```

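For loops can iterate over a sequence of numbers using the `range` function. `range(start, stop)` produces the integers from `start` up to, but not including, `stop`, so `range(4,8)` yields 4, 5, 6 and 7. Observe the example below: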


In [2]:
for x in range(4,8):
    print(x**2)

16
25
36
49


For loops can also iterate over lists. Observe the example below:

In [5]:
vegetables = ["tomatoes", "salad", "potatoes", "peppers"]
for v in vegetables:
    print(v)

tomatoes
salad
potatoes
peppers
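When the position of each item is also needed, the built-in `enumerate()` yields `(index, item)` pairs, and `range()` accepts an optional third argument for the step. A brief sketch with the same list:

```python
vegetables = ["tomatoes", "salad", "potatoes", "peppers"]

# enumerate() yields (index, item) pairs, starting from 0 by default
for i, v in enumerate(vegetables):
    print(i, v)

# range(start, stop, step): stop is excluded, giving 0, 2, 4, 6
evens = list(range(0, 8, 2))
print(evens)   # [0, 2, 4, 6]
```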


### While loops

A while loop statement in Python repeatedly executes a target statement as long as a given condition is true. Make sure your while loop has an exit condition: if the condition is always true, you will get stuck in an infinite loop!

The basic syntax of a while loop is:

```python
while (condition):
    # Repeats this block as long as
    # the condition is true
```

Observe the example below:

In [8]:
count = 0
while (count < 9):
    print('The count is:', count)
    count = count + 1

The count is: 0
The count is: 1
The count is: 2
The count is: 3
The count is: 4
The count is: 5
The count is: 6
The count is: 7
The count is: 8
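While loops are most useful when the number of iterations is not known in advance. In the sketch below (numbers chosen for illustration), the loop counts how many times 100 can be floor-divided by 2 before it reaches 1:

```python
n = 100
steps = 0
while n > 1:          # exit condition: stop once n reaches 1
    n = n // 2        # 100 -> 50 -> 25 -> 12 -> 6 -> 3 -> 1
    steps += 1
print(steps)          # 6
```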


### Python packages
Python is an open-source language with many packages for data science. Packages are collections of ready-made functions designed for a specific purpose. In this course, we will use three popular Python packages.
- **pandas**: this is a popular package for data manipulation and analysis. In pandas, you will use the series and the dataframe, the two ways that pandas organises data. You can think of a series as a single list of data and a dataframe as a collection of columns and rows, which is similar to a spreadsheet. 
- **NumPy**: this is a powerful package for mathematical calculations. In NumPy, data is organised into arrays, which enable us to perform calculations on the data.
- **SkLearn** (scikit-learn): this is a powerful package for machine learning, providing tools for tasks such as classification, regression, clustering and model evaluation.

### Documentation and resources
You can access relevant documentation at the links below for additional learning. We also recommend the resources listed below to help you ask questions, troubleshoot and connect with the data science and machine learning communities. 
- Python documentation: https://docs.python.org/3.3/reference/index.html
- pandas documentation: https://pandas.pydata.org/docs/getting_started/index.html#getting-started
- NumPy documentation: https://docs.scipy.org/doc/
- SkLearn documentation: https://scikit-learn.org/stable/user_guide.html
- Stack Overflow: https://stackoverflow.com/questions/tagged/python

### NumPy basics

`NumPy` is the core library for scientific computing in Python. It provides a high-performance, multidimensional array object, and tools for working with these arrays.


#### Arrays

A `NumPy` array is arguably the most powerful object in this library.

An array is a grid of values, all of the same type. The number of dimensions is the rank of the array (available as its `ndim` attribute); the shape of an array is a tuple of integers giving the size of the array along each dimension.

We can initialise NumPy arrays from nested Python lists, and access elements using square brackets. See below:


In [16]:
import numpy as np        #importing and aliasing NumPy

a = np.array([1, 2, 3])   # Create a rank 1 array (1 dimensional)
print(type(a))            # Prints "<class 'numpy.ndarray'>"
print(a.shape)            # Prints "(3,)"
print(a[0], a[1], a[2])   # Prints "1 2 3"
a[0] = 5                  # Change an element of the array
print(a)                  # Prints "[5 2 3]"


b = np.array([[1,2,3],[4,5,6]])    # Create a rank 2 array (2 dimensional)
print(b.shape)                     # Prints "(2, 3)"
print(b[0, 0], b[0, 1], b[1, 0])   # Prints "1 2 4"

<class 'numpy.ndarray'>
(3,)
1 2 3
[5 2 3]
(2, 3)
1 2 4
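Besides `shape`, every array exposes its number of dimensions through `ndim` and its element type through `dtype`, and `reshape()` rearranges the same values into a new shape. A minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [4, 5, 6]])
print(a.ndim, b.ndim)    # 1 2
print(b.dtype)           # an integer dtype, e.g. int64 (platform-dependent)

# np.arange(6) gives [0 1 2 3 4 5]; reshape arranges it as 2 rows x 3 columns
c = np.arange(6).reshape(2, 3)
print(c.shape)           # (2, 3)
print(c)
```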


NumPy also provides many functions to create arrays:


In [17]:

a = np.zeros((2,2))   # Create an array of all zeros
print(a)              # Prints "[[0. 0.]
                      #          [0. 0.]]"

b = np.ones((1,2))    # Create an array of all ones
print(b)              # Prints "[[1. 1.]]"

c = np.full((2,2), 7)  # Create a constant array (integer dtype, since 7 is an int)
print(c)               # Prints "[[7 7]
                       #          [7 7]]"

d = np.eye(2)         # Create a 2x2 identity matrix
print(d)              # Prints "[[1. 0.]
                      #          [0. 1.]]"

e = np.random.random((2,2))  # Create an array filled with random values in [0, 1)
print(e)                     # Output differs on every run

[[0. 0.]
 [0. 0.]]
[[1. 1.]]
[[7 7]
 [7 7]]
[[1. 0.]
 [0. 1.]]
[[0.75541531 0.74231258]
 [0.56012948 0.51371359]]
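Two other common constructors are `np.arange()` (evenly spaced values with a given step, end excluded) and `np.linspace()` (a given number of evenly spaced values, end included). For random values, NumPy's newer generator interface created with `np.random.default_rng()` accepts a seed, which makes results reproducible; the seed value 42 below is arbitrary.

```python
import numpy as np

print(np.arange(0, 10, 2))    # [0 2 4 6 8]
print(np.linspace(0, 1, 5))   # five values from 0 to 1: 0, 0.25, 0.5, 0.75, 1

# Seeded generator: the same seed gives the same numbers on every run
rng = np.random.default_rng(42)
r = rng.random((2, 2))        # 2x2 array of floats in [0, 1)
print(r.shape)                # (2, 2)
```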


#### Array indexing

NumPy offers several ways to index arrays. Here, we will cover just some of them. For a more comprehensive explanation, refer to the [NumPy basics documentation](https://docs.scipy.org/doc/numpy/user/basics.html).

**Slicing**: similar to Python lists, NumPy arrays can be sliced. Since arrays may be multidimensional, you must specify a slice for each dimension of the array:

In [18]:
# Create the following rank 2 array with shape (3, 4)
# [[ 1  2  3  4]
#  [ 5  6  7  8]
#  [ 9 10 11 12]]
a = np.array([[1,2,3,4], [5,6,7,8], [9,10,11,12]])

# Use slicing to pull out the subarray consisting of the first 2 rows
# and columns 1 and 2; b is the following array of shape (2, 2):
# [[2 3]
#  [6 7]]
b = a[:2, 1:3]

# A slice of an array is a view into the same data, so modifying it
# will modify the original array.
print(a[0, 1])   # Prints "2"
b[0, 0] = 77     # b[0, 0] is the same piece of data as a[0, 1]
print(a[0, 1])   # Prints "77"

2
77
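If you need an independent subarray rather than a view, call `.copy()` on the slice; changes to the copy then leave the original untouched. A minimal sketch with the same array:

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

b = a[:2, 1:3].copy()          # an independent copy, not a view
b[0, 0] = 77
print(a[0, 1])                 # 2: the original array is unchanged
print(b[0, 0])                 # 77
print(np.shares_memory(a, b))  # False
```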


**Integer array indexing:** when you index a NumPy array using slicing, the resulting array view will always be a subarray of the original array. In contrast, integer array indexing allows you to construct arbitrary arrays using the data from another array. Here is an example:

In [17]:
a = np.array([[1,2], [3, 4], [5, 6]])

# An example of integer array indexing.
# The returned array will have shape (3,)
print(a[[0, 1, 2], [0, 1, 0]])  # Prints "[1 4 5]"

# The above example of integer array indexing is equivalent to this:
print(np.array([a[0, 0], a[1, 1], a[2, 0]]))  # Prints "[1 4 5]"

# When using integer array indexing, you can reuse the same
# element from the source array:
print(a[[0, 0], [1, 1]])  # Prints "[2 2]"

# Equivalent to the previous integer array indexing example
print(np.array([a[0, 1], a[0, 1]]))  # Prints "[2 2]"

[1 4 5]
[1 4 5]
[2 2]
[2 2]
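Another form, boolean array indexing, selects the elements where a condition holds. A comparison such as `a > 2` produces an array of `True`/`False` values with the same shape as `a`, which can then be used as an index:

```python
import numpy as np

a = np.array([[1, 2], [3, 4], [5, 6]])

mask = a > 2          # True where the element is greater than 2
print(a[mask])        # [3 4 5 6]: a 1D array of the selected elements

# The condition can also be written inline
print(a[a % 2 == 0])  # [2 4 6]
```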


#### Array math

Basic mathematical functions operate elementwise on arrays, and are available both as operators and as functions in the NumPy module.

In [18]:
x = np.array([[1,2],[3,4]], dtype=np.float64)
y = np.array([[5,6],[7,8]], dtype=np.float64)

# Elementwise sum; both produce the array
# [[ 6.0  8.0]
#  [10.0 12.0]]
print(x + y)
print(np.add(x, y))

# Elementwise difference; both produce the array
# [[-4.0 -4.0]
#  [-4.0 -4.0]]
print(x - y)
print(np.subtract(x, y))

# Elementwise product; both produce the array
# [[ 5.0 12.0]
#  [21.0 32.0]]
print(x * y)
print(np.multiply(x, y))

# Elementwise division; both produce the array
# [[ 0.2         0.33333333]
#  [ 0.42857143  0.5       ]]
print(x / y)
print(np.divide(x, y))

# Elementwise square root; produces the array
# [[ 1.          1.41421356]
#  [ 1.73205081  2.        ]]
print(np.sqrt(x))

[[ 6.  8.]
 [10. 12.]]
[[ 6.  8.]
 [10. 12.]]
[[-4. -4.]
 [-4. -4.]]
[[-4. -4.]
 [-4. -4.]]
[[ 5. 12.]
 [21. 32.]]
[[ 5. 12.]
 [21. 32.]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[0.2        0.33333333]
 [0.42857143 0.5       ]]
[[1.         1.41421356]
 [1.73205081 2.        ]]
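Note that `*` is an elementwise product, not matrix multiplication. For the matrix product, NumPy provides the `@` operator (equivalently `np.dot` or `np.matmul`). Reductions such as `sum()` can also be applied along a chosen axis: `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row).

```python
import numpy as np

x = np.array([[1, 2], [3, 4]], dtype=np.float64)
y = np.array([[5, 6], [7, 8]], dtype=np.float64)

# Matrix product: entry (i, j) is row i of x times column j of y
# [[1*5 + 2*7, 1*6 + 2*8],
#  [3*5 + 4*7, 3*6 + 4*8]]  ->  [[19. 22.]
#                                [43. 50.]]
print(x @ y)
print(np.dot(x, y))      # same result

print(x.sum())           # 10.0
print(x.sum(axis=0))     # [4. 6.]  column sums
print(x.sum(axis=1))     # [3. 7.]  row sums
```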


Of course, NumPy is an extremely powerful library and it would be impossible to cover all of its functionalities and capabilities in this Notebook. We encourage you to visit the [NumPy webpage](https://docs.scipy.org/doc/numpy/index.html) for a more detailed explanation.

### pandas basics

pandas is an open-source Python library providing high-performance data manipulation and analysis tools built on its powerful data structures.

Some interesting pandas functionalities include:

- Fast and efficient dataframe object with default and customised indexing
- Tools for loading data into in-memory data objects from different file formats
- Data alignment and integrated handling of missing data
- Reshaping and pivoting of data sets
- Label-based slicing, indexing and subsetting of large data sets
- Insertion and deletion of columns
- Grouping of data for aggregation and transformation
- High-performance merging and joining of data



pandas deals with the following three data structures:

- Series - 1D labelled homogeneous array, size immutable.
- Dataframe - General 2D labelled, size-mutable tabular structure with potentially heterogeneously typed columns.
- Panel - 3D labelled structure; it has been deprecated and removed from recent versions of pandas.

This course and primer focus mainly on series and dataframes.

#### Building pandas series and dataframes

A pandas series can be created using the following constructor:

``` python
pandas.Series(data, index, dtype, name, copy)
```


For example, if data is an ndarray, then the index passed must be of the same length. If no index is passed, then by default the index will be `range(n)`, where `n` is the array length, i.e., `0, 1, 2, ..., n-1`.

In [4]:
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
s    # display the series

0    a
1    b
2    c
3    d
dtype: object
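An index can also be passed explicitly, in which case values are accessed by label, while `.iloc` still provides access by integer position. The labels and values below are illustrative:

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=["x", "y", "z"])
print(s["y"])            # 20, access by label
print(s.iloc[0])         # 10, access by position
print(s.index.tolist())  # ['x', 'y', 'z']
```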

A pandas dataframe can be created from various inputs, like the following:
- Lists
- Dicts
- Series
- NumPy ndarrays

It uses the following constructor:

``` python
pandas.DataFrame(data, index, columns, dtype, copy)
```

For example:

In [5]:
import pandas as pd
data = {'Name':['Tom', 'Jack', 'Steve', 'Ricky'],'Age':[28,34,29,42]}
df = pd.DataFrame(data)
df

|   | Name | Age |
|---|------|-----|
| 0 | Tom | 28 |
| 1 | Jack | 34 |
| 2 | Steve | 29 |
| 3 | Ricky | 42 |
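Once a dataframe exists, columns are selected by name and rows can be filtered with a boolean condition. A short sketch with the same data; the `Senior` column is a hypothetical addition for illustration:

```python
import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)

ages = df["Age"]                  # a single column is a Series
print(ages.mean())                # 33.25

over_30 = df[df["Age"] > 30]      # keep only the rows where the condition is True
print(over_30["Name"].tolist())   # ['Jack', 'Ricky']

df["Senior"] = df["Age"] >= 40    # add a new (hypothetical) boolean column
print(df)
```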


Additionally, pandas allows you to import data sets saved with different formats (`.csv`, `.pickle`, etc.) into a dataframe using functions like `.read_csv()`, `.read_pickle()` and so on. You can find a comprehensive list of pandas functions to read data [here](https://pandas.pydata.org/pandas-docs/stable/reference/io.html).

#### Exploratory data analysis (EDA)

EDA plays a critical role in understanding the what, why, and how of the problem statement. It’s first in the order of operations that a data analyst will perform when handed a new data source and problem statement.

EDA is an approach for analysing data sets by summarising their main characteristics, often with summary statistics and visualisations. It is a crucial step before building a model, because the insights it uncovers help in developing robust algorithms.

Let’s try to break down this definition and understand the different use cases for EDA:

- First and foremost, EDA provides a stage for breaking down problem statements into smaller experiments which can help understand the data set
- EDA provides relevant insights which help analysts make key business decisions
- EDA provides a platform to run all thought experiments and ultimately guides us towards making a critical decision

In this section, we will review some basic pandas commands to perform EDA.

Imagine we have a data set `data.csv` that contains data from different marketing campaigns. We use the function `read_csv()` to read the file into a dataframe `df`.

In [14]:
import pandas as pd
df = pd.read_csv("data.csv")
df

|   | ad_id | reporting_start | reporting_end | campaign_id | fb_campaign_id | age | gender | interest1 | interest2 | interest3 | impressions | clicks | spent | total_conversion | approved_conversion |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 708746 | 17/08/2017 | 17/08/2017 | 916 | 103916 | 30-34 | M | 15 | 17 | 17 | 7350.000000 | 1 | 1.43 | 2.0 | 1.0 |
| 1 | 708749 | 17/08/2017 | 17/08/2017 | 916 | 103917 | 30-34 | M | 16 | 19 | 21 | 17861.000000 | 2 | 1.82 | 2.0 | 0.0 |
| 2 | 708771 | 17/08/2017 | 17/08/2017 | 916 | 103920 | 30-34 | M | 20 | 25 | 22 | 693.000000 | 0 | 0.00 | 1.0 | 0.0 |
| 3 | 708815 | 30/08/2017 | 30/08/2017 | 916 | 103928 | 30-34 | M | 28 | 32 | 32 | 4259.000000 | 1 | 1.25 | 1.0 | 0.0 |
| 4 | 708818 | 17/08/2017 | 17/08/2017 | 916 | 103928 | 30-34 | M | 28 | 33 | 32 | 4133.000000 | 1 | 1.29 | 1.0 | 1.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1138 | 1314410 | 19/08/2017 | 19/08/2017 | 45-49 | F | 109 | 111 | 114 | 1129773 | 252 | 358.189997 | 13 | 2.00 | | |
| 1139 | 1314411 | 19/08/2017 | 19/08/2017 | 45-49 | F | 110 | 111 | 116 | 637549 | 120 | 173.880003 | 3 | 0.00 | | |
| 1140 | 1314412 | 19/08/2017 | 19/08/2017 | 45-49 | F | 111 | 113 | 117 | 151531 | 28 | 40.289999 | 2 | 0.00 | | |
| 1141 | 1314414 | 17/08/2017 | 17/08/2017 | 45-49 | F | 113 | 114 | 117 | 790253 | 135 | 198.710000 | 8 | 2.00 | | |


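The **describe()** method returns a pandas dataframe of descriptive statistics that summarise the central tendency, dispersion and shape of each numerical column's distribution, excluding NaN values: the count, mean, standard deviation (`std`), minimum, 25th, 50th and 75th percentiles, and maximum. The three main numerical measures for the center of a distribution are the mode, the mean ($\mu$) and the median ($M$). The mode is the most frequently occurring value, the mean is the average value and the median is the middle value. `describe()` reports the mean and the median (the `50%` row), but not the mode, which can be computed separately with the `mode()` method.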

In [9]:
df.describe()

|   | ad_id | interest1 | interest2 | interest3 | impressions | clicks | spent | total_conversion | approved_conversion |
|---|---|---|---|---|---|---|---|---|---|
| count | 1143.0 | 1143.0 | 1143.0 | 1143.0 | 1143.0 | 1143.0 | 1143.0 | 761.0 | 761.0 |
| mean | 987261.1 | 33.884514 | 118060.6 | 42.474191 | 68725.0 | 11.629921 | 17.59776 | 2.161629 | 0.768725 |
| std | 193992.8 | 27.560263 | 267050.6 | 48.987248 | 206702.3 | 27.347899 | 48.418711 | 4.062201 | 1.656445 |
| min | 708746.0 | 2.0 | 3.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 777632.5 | 16.0 | 22.0 | 19.0 | 144.265 | 1.0 | 0.0 | 1.0 | 0.0 |
| 50% | 1121185.0 | 26.0 | 33.0 | 27.0 | 3142.0 | 2.0 | 1.53 | 1.0 | 0.0 |
| 75% | 1121804.0 | 32.0 | 98894.0 | 38.0 | 27864.0 | 8.0 | 8.54 | 2.0 | 1.0 |
| max | 1314415.0 | 120.0 | 2286228.0 | 421.0 | 3052003.0 | 340.0 | 639.949998 | 60.0 | 21.0 |
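To see how the three measures of center differ, and how they relate to the `describe()` output (where the `50%` row is the median), here is a small illustrative series containing one outlier:

```python
import pandas as pd

s = pd.Series([1, 2, 2, 3, 10])

print(s.mean())            # 3.6, pulled upwards by the outlier 10
print(s.median())          # 2.0, the middle value once sorted
print(s.mode().tolist())   # [2], the most frequent value (mode() returns a Series)

stats = s.describe()
print(stats["50%"] == s.median())   # True
```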


The **shape** attribute returns a tuple with the number of rows (entries) and the number of columns in the data set.

In [11]:
df.shape

(1143, 15)

The **info()** method prints information about a dataframe including the index dtype, column dtypes, non-null values and memory usage.

In [15]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1143 entries, 0 to 1142
Data columns (total 15 columns):
ad_id                  1143 non-null int64
reporting_start        1143 non-null object
reporting_end          1143 non-null object
campaign_id            1143 non-null object
fb_campaign_id         1143 non-null object
age                    1143 non-null object
gender                 1143 non-null object
interest1              1143 non-null int64
interest2              1143 non-null int64
interest3              1143 non-null int64
impressions            1143 non-null float64
clicks                 1143 non-null int64
spent                  1143 non-null float64
total_conversion       761 non-null float64
approved_conversion    761 non-null float64
dtypes: float64(4), int64(5), object(6)
memory usage: 134.1+ KB
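A few other commands are commonly used in a first pass over a new data set: `head()` shows the first rows, `isna().sum()` counts missing values per column, `value_counts()` tallies categories, and `groupby()` aggregates by group. Since `data.csv` is not reproduced here, the sketch below uses a small hypothetical dataframe whose column names mimic the campaign data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few of the campaign columns
df = pd.DataFrame({
    "age": ["30-34", "30-34", "45-49", "45-49", "45-49"],
    "gender": ["M", "M", "F", "F", "M"],
    "clicks": [1, 2, 252, 120, 8],
    "total_conversion": [2.0, 2.0, np.nan, 3.0, np.nan],
})

print(df.head(3))                          # first three rows
print(df.isna().sum())                     # missing values per column
print(df["gender"].value_counts())         # number of rows per category
print(df.groupby("age")["clicks"].sum())   # total clicks per age group
```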


Of course, pandas is a very powerful library with many other functions. Exploring all of them is beyond our scope. For this reason, we will leave it up to you to go explore its many other functionalities as you learn more about data science and machine learning.

### A note about other Python libraries 

Python is a very powerful and popular programming language that is used for analysis in many fields. For this reason, many third-party libraries have been developed to accommodate the needs of different researchers, scientists and analysts.

When it comes to data science, another popular library is `scikit-learn`. Again, this is a very powerful package that offers a wide range of functionality. For this reason, we encourage you to visit the links above for the documentation of each library.

Additionally, each solution to the group exercises in the assignments you will see in this course is designed as a learning experience and will guide you through the steps of each algorithm implemented. We hope you will enjoy this learning experience and become passionate about machine learning and its algorithms!