# Introduction to Python

* Link: https://learn.datacamp.com/courses/intro-to-python-for-data-science

## Course Description

Python is a general-purpose programming language that is becoming ever more popular for data science. Companies worldwide are using Python to harvest insights from their data and gain a competitive edge. Unlike other Python tutorials, this course focuses on Python specifically for data science. In our Introduction to Python course, you’ll learn about powerful ways to store and manipulate data, and helpful data science tools to begin conducting your own analyses. Start DataCamp’s online Python curriculum now.

# Chapter 1 - Python Basics

An introduction to the basic concepts of Python. Learn how to use Python interactively and by using a script. Create your first variables and acquaint yourself with Python's basic data types.

## Hello Python!

Python is a pretty versatile language. It can be used in many applications:
* When you want to do some quick calculations.
* When you want to develop a database-driven website for your new business.
* When your boss asks you to clean and analyze the results of the latest satisfaction survey.

### `print()`

In [2]:
print(7 + 10)

print(5 / 8)

17
0.625


### Using comments (#)

In [3]:
# This is a comment
print("Test")

Test


### Arithmetical operations

* Addition: `+`
* Subtraction: `-`
* Division: `/`
* Multiplication: `*`
* Exponentiation: `**`
  * This operator raises the number to its left to the power of the number to its right. For example `4**2` will give `16`.
* Modulo: `%`
  * This operator returns the remainder of the division of the number to the left by the number on its right. For example `18 % 7` equals `4`.

In [4]:
print(5 + 5) # Addition
print(5 - 5) # Subtraction
print(3 * 5) # Multiplication
print(10 / 2) # Division
print(18 % 7) # Modulo
print(4 ** 2) # Exponentiation

# How much is your $100 worth after 7 years?
print(100*(1.1**7))

10
0
15
5.0
4
16
194.87171000000012


## Variables and Types

### Variables

* Specific, case-sensitive name
* Call up value through variable name

In [5]:
height = 1.79
weight = 68.7
bmi = weight / height ** 2
print(bmi)

21.44127836209856


### Python Types

* `int` : Integer, a number without a fractional part.
* `float` : Floating point, a real number.
  * It has both an integer and a fractional part, separated by a point.
* `str` : String, a type to represent text.
  * You can use single or double quotes to build a string.
* `bool` : Boolean, a type to represent logical values.
  * Can only be `True` or `False` (the capitalization is important!).

In [6]:
print(type(bmi))

<class 'float'>


In [7]:
day_of_week = 5
print(type(day_of_week))

<class 'int'>


In [8]:
x = "body mass index"
y = 'this works too'
print(type(y))

<class 'str'>


In [9]:
z = True
print(type(z))

<class 'bool'>


### Behavior of operations with different variable types

* Different types have different behaviors

In [10]:
2 + 3

5

In [11]:
'ab' + 'cd'

'abcd'
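
As a further illustration of how operators depend on the types involved, here is a minimal sketch that is not part of the original notebook. The variable names (`total`, `repeated`, `flags`, `label`) are made up for this example. It also shows that adding a `str` and an `int` raises a `TypeError` instead of converting implicitly.

```python
# Same operators, different behavior depending on the operand types
total = 2 + 3.5        # int + float gives a float
repeated = 'ab' * 3    # str * int repeats the string
flags = True + True    # bool behaves like an integer (True == 1)

print(total)
print(repeated)
print(flags)

# Mixing str and int with + is not allowed
mixed_error = None
try:
    'height: ' + 179
except TypeError as err:
    mixed_error = type(err).__name__
    print(mixed_error)

# Convert explicitly instead
label = 'height: ' + str(179)
print(label)
```

The explicit `str()` call is what makes the last line work, which is the motivation for the conversion functions.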

### Type conversions

* `int()` : Convert a variable with another type to `int`. 
* `float()` : Convert a variable with another type to `float`.
* `bool()` : Convert a variable with another type to `bool`.
* `str()` : Convert a variable with another type to `str`.

In [14]:
# Using int()

x = 10.3 # type float
y = '100' # type string
z = True
print(int(x))
print(int(y))
print(int(z))

10
100
1


In [15]:
# Using str()

a = 100 # type int
b = 123.56 # type float
c = False # type bool
print(str(a))
print(str(b))
print(str(c))

100
123.56
False
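
The cells above cover `int()` and `str()`. The sketch below, added here for illustration with hypothetical variable names, does the same for `float()` and `bool()`. Two details are easy to miss: `bool()` of any non-empty string is `True`, and `int()` truncates instead of rounding.

```python
# Using float()
from_int = float(10)         # int to float
from_text = float('3.14')    # numeric string to float
print(from_int)
print(from_text)

# Using bool(): zero and empty values are False, everything else is True
zero_is_false = bool(0)
empty_is_false = bool('')
text_is_true = bool('False')  # True, because the string is not empty
print(zero_is_false, empty_is_false, text_is_true)

# int() drops the fractional part; it does not round
truncated_pos = int(10.9)
truncated_neg = int(-10.9)
print(truncated_pos, truncated_neg)
```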


# Chapter 2 - Python Lists

Learn to store, access, and manipulate data in lists: the first step toward efficiently working with huge amounts of data.

## Python Lists

* List format: `[a, b, c]`
* Why use lists:
  * Name a collection of values
  * Contain any type
  * Contain different types

In [18]:
my_list = [1.73, 1.68, 1.71, 1.89]
print(my_list)

[1.73, 1.68, 1.71, 1.89]


In [19]:
# Simple list
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
print(fam)

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]


In [20]:
# List of lists
fam2 = [["liz", 1.73], ["emma", 1.68], ["mom", 1.71], ["dad", 1.89]]
print(fam2)

[['liz', 1.73], ['emma', 1.68], ['mom', 1.71], ['dad', 1.89]]


### List type

In [21]:
print(type(my_list))
print(type(fam))
print(type(fam2))

<class 'list'>
<class 'list'>
<class 'list'>


## Subsetting Lists

* Each element inside a list has an index associated with it.
* The first element has index `0`, the second has index `1`, and so on.
* Also, to access the last element of the list we can use index `-1`.
  * Using `-2` will access the element before the last one, and so on.

In [24]:
# Select a specific value in the list
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
print(fam)

# Access
print(fam[3]) # fourth element
print(fam[4]) # fifth element
print(fam[-1]) # last element

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
1.68
mom
1.89


### Subsetting a list of lists

In [35]:
x = [["a", "b", "c"],
     ["d", "e", "f"],
     ["g", "h", "i"]]
print(x[0][0]) # First list, first position
print(x[2][0]) # Third list, first position
print(x[2][-1]) # Third list, last position
print(x[-1][-1]) # Last list, last position

a
g
i
i


### List slicing

* It is possible to select part of a list, creating an entirely new list.
* To do that we can use slicing, by specifying the `start` and `end` of the slice.
  * Slicing is done in the format: `my_list[start:end]`
  * `start` is inclusive
  * `end` is exclusive
* If `start` or `end` is not specified, the slice extends to the corresponding end of the list.
  * Example 1: `my_list[:5]` slices from index 0 to index 4
  * Example 2: `my_list[3:]` slices from index 3 to the last index
  * Example 3: `my_list[:]` slices from index 0 to the last index

In [29]:
print(fam) # Entire list

print(fam[3:5]) # Sliced list
print(fam[0:5]) # Sliced list
print(fam[0:-1]) # Sliced list
print(fam[2:]) # Sliced list
print(fam[:]) # Sliced list

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
[1.68, 'mom']
['liz', 1.73, 'emma', 1.68, 'mom']
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad']
['emma', 1.68, 'mom', 1.71, 'dad', 1.89]
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]


In [31]:
x = ["a", "b", "c", "d"]

# Same result
print(x[1])
print(x[-3]) 

b
b
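
The relationship between positive and negative indices can be checked directly. The following sketch is an illustration added to these notes (the names `last_two` and `without_last` are hypothetical). It assumes only the indexing and slicing rules described above: index `i < 0` refers to the same element as `len(x) + i`, and a slice is a new list.

```python
x = ["a", "b", "c", "d"]

# A negative index i refers to the same element as len(x) + i
for i in range(-len(x), 0):
    print(i, len(x) + i, x[i], x[len(x) + i])

# Slices accept negative bounds as well
last_two = x[-2:]
without_last = x[:-1]
print(last_two)
print(without_last)

# A slice is a new list, so changing it leaves x untouched
last_two[0] = "z"
print(last_two)
print(x)
```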


## Manipulating Lists

* We can:
  * Change list elements
  * Add list elements
  * Remove list elements

### Changing list elements

In [40]:
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
print(fam)

fam[7] = 2.15
print(fam)

fam[0:2] = ["eduard", 1.99]
print(fam)

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 2.15]
['eduard', 1.99, 'emma', 1.68, 'mom', 1.71, 'dad', 2.15]


In [45]:
x = ["a", "b", "c", "d"]
print(x)

x[1] = "r"
print(x)

x[2:] = ["s", "t"]
print(x)

['a', 'b', 'c', 'd']
['a', 'r', 'c', 'd']
['a', 'r', 's', 't']


### Adding and removing elements

In [43]:
fam = ["liz", 1.73, "emma", 1.68, "mom", 1.71, "dad", 1.89]
print(fam)

# Adding elements
fam2 = fam + ["me", 1.79]
print(fam2)

# Removing elements
del(fam2[0])
print(fam2)

['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89]
['liz', 1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me', 1.79]
[1.73, 'emma', 1.68, 'mom', 1.71, 'dad', 1.89, 'me', 1.79]


## Creating lists with `list()`

To copy a list we have to use `list()` or `[:]`. Using just `=` makes both names point to the same list, which causes problems when manipulating values.

In [49]:
# Wrong way to do it
list1 = [1, 2, 3, 4, 5, 6]
list2 = list1 # Point to the same values as list1
print(list1)
print(list2)

list2[0] = 100
print(list1) # The value in list1 also changes, because both lists point to the same values in memory.
print(list2)

[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
[100, 2, 3, 4, 5, 6]
[100, 2, 3, 4, 5, 6]


In [50]:
# First right way to do it
list1 = [1, 2, 3, 4, 5, 6]
list2 = list(list1) # Creates a whole new list of values
print(list1)
print(list2)

list2[0] = 100
print(list1) # list1 is unchanged, because list2 is an independent copy.
print(list2)

[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
[100, 2, 3, 4, 5, 6]


In [51]:
# Second right way to do it
list1 = [1, 2, 3, 4, 5, 6]
list2 = list1[:] # Slicing creates a whole new list of values
print(list1)
print(list2)

list2[0] = 100
print(list1) # list1 is unchanged, because list2 is an independent copy.
print(list2)

[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
[1, 2, 3, 4, 5, 6]
[100, 2, 3, 4, 5, 6]
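
One way to check whether two names refer to the same list is the `is` operator. The sketch below is an addition to the course notes, not something the course itself shows. It also points out a caveat: `list()` and `[:]` make shallow copies, so inner lists (as in a list of lists) are still shared. The standard-library function `copy.deepcopy()` is one option when a fully independent copy is needed.

```python
import copy

list1 = [1, 2, 3]
alias = list1          # same list object
copy1 = list(list1)    # new list with the same elements
copy2 = list1[:]       # new list with the same elements

print(alias is list1)  # same object
print(copy1 is list1)  # different object
print(copy1 == list1)  # equal contents

# Caveat: list() and [:] copy only the outer list.
# The inner lists are shared between the original and the copy.
fam2 = [["liz", 1.73], ["emma", 1.68]]
shallow = list(fam2)
shallow[0][1] = 1.80
print(fam2[0])         # the original sees the change

# copy.deepcopy() copies the inner lists too
deep = copy.deepcopy(fam2)
deep[1][1] = 1.70
print(fam2[1])         # the original is unchanged
```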


# Chapter 3 - Functions and Packages

You'll learn how to use functions, methods, and packages to efficiently leverage the code that brilliant Python developers have written. The goal is to reduce the amount of code you need to solve challenging problems!

## Functions

* Functions are pieces of reusable code
* They solve a particular task
* You can call a function instead of writing code yourself
* Examples of functions: `print()`, `max()`, `min()`, `round()`, `type()`, `bool()`, and many others.


In [52]:
# Using a function
fam = [1.73, 1.68, 1.71, 1.89]
max(fam) # Return maximum value in the list

1.89

In [53]:
# Using function type()
print(type("This is a string"))

<class 'str'>


In [54]:
# Using function round()
print(round(123.56))

124


In [55]:
# Using function len()
print(len("What is the size of this string?"))

32


In [56]:
# Using function sorted()
full = [11.25, 18.0, 20.0, 10.75, 9.50, 8.50, 12.40]
full_sorted = sorted(full, reverse=True)
print(full_sorted)

[20.0, 18.0, 12.4, 11.25, 10.75, 9.5, 8.5]
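
Many built-in functions accept optional arguments, as `sorted()` does with `reverse=True` above. As a small supplementary sketch (variable names are illustrative only), the code below shows the optional second argument of `round()` and confirms that `sorted()` leaves the original list untouched.

```python
fam = [1.73, 1.68, 1.71, 1.89]

tallest = max(fam)
shortest = min(fam)
count = len(fam)
print(tallest, shortest, count)

# round() takes an optional second argument: the number of decimals
rounded_default = round(1.68)   # no decimals given: returns an int
rounded_one = round(1.68, 1)    # one decimal: returns a float
print(rounded_default, rounded_one)

# sorted() returns a new list; the original keeps its order
ascending = sorted(fam)
print(ascending)
print(fam)
```

Calling `help(round)` in an interactive session shows the documentation for a function, including its optional arguments.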


## Methods

* Methods are functions that belong to objects
* Methods are accessed using a `.`
* Example: `my_list.index("a")` will return the index of element `'a'`

### `list` methods

* `index()` : return the index of an element
* `count()` : count the number of occurrences of the element inside the list
* `append()` : add element to the end of the list
* `remove()` : remove element from list
* `reverse()` : reverses the order of the elements in the list it is called on  

In [68]:
fam = [1.73, 1.68, 1.71, 1.89, 1.73]
print(fam.index(1.89)) # Return index
print(fam.count(1.73)) # Return number of occurrences

3
2


In [67]:
fam = [1.73, 1.68, 1.71, 1.89, 1.73]
print(fam)
fam.append(1.96)
print(fam)
fam.remove(1.96)
print(fam)

[1.73, 1.68, 1.71, 1.89, 1.73]
[1.73, 1.68, 1.71, 1.89, 1.73, 1.96]
[1.73, 1.68, 1.71, 1.89, 1.73]


In [72]:
fam = [1.73, 1.68, 1.71, 1.99, 1.57]
print(fam)
fam.reverse()
print(fam)

[1.73, 1.68, 1.71, 1.99, 1.57]
[1.57, 1.99, 1.71, 1.68, 1.73]
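
A detail worth spelling out is that `append()`, `remove()` and `reverse()` change the list in place and return `None`. The sketch below is an illustration added to these notes (the names `result`, `heights` and `missing` are hypothetical). It also shows that `remove()` deletes only the first match and that `index()` raises a `ValueError` when the element is absent.

```python
fam = [1.73, 1.68, 1.71, 1.99, 1.57]

# reverse() modifies the list in place and returns None
result = fam.reverse()
print(result)
print(fam)

# Common mistake: the line below would replace the list with None
# fam = fam.reverse()

# remove() deletes only the first matching element
heights = [1.73, 1.68, 1.73]
heights.remove(1.73)
print(heights)

# index() raises ValueError when the element is missing
missing = False
try:
    heights.index(2.00)
except ValueError:
    missing = True
    print("2.0 is not in the list")
```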


### `str` methods

* `replace()` : replace every occurrence of a substring with another string
* `index()`: return the index of the first occurrence of a substring
* `upper()`: put the string in upper case

In [63]:
my_name = "João Gross"
print(my_name.replace("s", "Z"))
print(my_name.index("G"))
print(my_name.upper())

João GroZZ
5
JOÃO GROSS
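
Unlike list methods such as `append()`, string methods never change the string they are called on, because strings are immutable. Each call returns a new string. The following sketch, added for illustration with made-up variable names, shows this and also that the returned strings can be chained.

```python
my_name = "João Gross"

upper_name = my_name.upper()
replaced = my_name.replace("s", "Z")
print(upper_name)
print(replaced)
print(my_name)   # the original string is unchanged

# Methods can be chained because each one returns a new string
chained = my_name.replace("Gross", "G.").upper()
print(chained)

# count() is available on strings as well as on lists
n_o = my_name.count("o")
print(n_o)
```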


## Packages

* Packages are collections of scripts that already have functions, methods and types implemented, so you don't have to implement them all over again.
* There are thousands of packages available
  * Numpy
  * Matplotlib
  * Scikit-learn

### Install package

* Link: http://pip.readthedocs.org/en/stable/installing/
* Download `get-pip.py`
* Terminal:
  * `python3 get-pip.py`
  * `pip3 install numpy`

### Import packages

In [75]:
import numpy as np

np_array = np.array([1,2,3])
print(np_array)
print(type(np_array))

[1 2 3]
<class 'numpy.ndarray'>


In [79]:
import math
r = 1
circumference = 2 * math.pi * r
print(circumference)
area = math.pi * r**2
print(area)

6.283185307179586
3.141592653589793
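
There is more than one way to write an import. The sketch below is a supplementary example, using only the standard-library `math` module, that compares three common forms. Which one to prefer is a matter of style: the prefixed forms make it clearer where a name comes from.

```python
# Import the whole module and use its name as a prefix
import math
area1 = math.pi * 2 ** 2

# Import the module under a shorter alias
import math as m
area2 = m.pi * 2 ** 2

# Import only the names you need; no prefix is required
from math import pi, sqrt
area3 = pi * 2 ** 2

print(area1 == area2 == area3)
root = sqrt(16)
print(root)
```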


# Chapter 4 - Numpy

NumPy is a fundamental Python package to efficiently practice data science. Learn to work with powerful tools in the NumPy array, and get started with data exploration.

## Numpy

### List Recap

* Powerful
* Collection of values
* Hold different types
* Change, add, remove
* Need for Data Science
  * Mathematical operations over collections
  * Speed

### Working with lists

In [80]:
# We can't make a single division operation between two lists
height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
weight / height

TypeError: unsupported operand type(s) for /: 'list' and 'list'

### Solution: Use Numpy

* Numeric Python
* Alternative to Python List: Numpy Array
* Calculations over entire arrays
* Easy and Fast
* Installation
  * In the terminal: `pip3 install numpy`

In [82]:
import numpy as np

height = [1.73, 1.68, 1.71, 1.89, 1.79]
weight = [65.4, 59.2, 63.6, 88.4, 68.7]
np_height = np.array(height)
np_weight = np.array(weight)

bmi = np_weight / np_height ** 2
print(bmi)

[21.85171573 20.97505669 21.75028214 24.7473475  21.44127836]


### Numpy: remarks

* Every Numpy array must contain elements of a single type
* Numpy arrays differ in behaviour from lists

In [85]:
# A Numpy array must have all elements of the same type!
np.array([1.0, "is", True]) # All elements converted to str

array(['1.0', 'is', 'True'], dtype='<U32')

In [84]:
# Different behaviours
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])

print(python_list * 2)
print(numpy_array * 2)

[1, 2, 3, 1, 2, 3]
[2 4 6]
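
The two remarks above also apply to numeric input and to the `+` operator. The following sketch is an illustration that is not in the original notebook and assumes NumPy is installed. Mixed numeric values are converted to a common type, and `+` concatenates lists but adds arrays element by element.

```python
import numpy as np

# Mixed numeric input is converted to a single common type
mixed = np.array([1, 2.5, True])
print(mixed)
print(mixed.dtype)

# + concatenates lists but adds arrays element by element
python_list = [1, 2, 3]
numpy_array = np.array([1, 2, 3])
list_sum = python_list + python_list
array_sum = numpy_array + numpy_array
print(list_sum)
print(array_sum)
```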


### Numpy subsetting

In [90]:
print(bmi)
print(bmi[0])
print(bmi > 23)
print(bmi[bmi > 23])

[21.85171573 20.97505669 21.75028214 24.7473475  21.44127836]
21.85171572722109
[False False False  True False]
[24.7473475]
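
A boolean array built from one array can be used to subset another array of the same length. The sketch below is a supplementary example that reuses the height and weight values from earlier cells; the thresholds and the names `light` and `in_range` are arbitrary choices for illustration. It also combines two conditions with `np.logical_and()`.

```python
import numpy as np

np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
bmi = np_weight / np_height ** 2

# The comparison yields one boolean per element
light = bmi < 21.5
print(light)

# The mask can subset any array of the same length
light_heights = np_height[light]
print(light_heights)

# Combine two conditions element by element
in_range = np.logical_and(bmi > 21, bmi < 22)
print(bmi[in_range])
n_in_range = int(in_range.sum())   # True counts as 1
print(n_in_range)
```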


## 2D Numpy Arrays

In [91]:
# One dimensional array
import numpy as np
np_height = np.array([1.73, 1.68, 1.71, 1.89, 1.79])
np_weight = np.array([65.4, 59.2, 63.6, 88.4, 68.7])
print(type(np_height))
print(type(np_weight))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>


In [92]:
# Two dimensional array
import numpy as np
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])
print(np_2d)

[[ 1.73  1.68  1.71  1.89  1.79]
 [65.4  59.2  63.6  88.4  68.7 ]]


### Subsetting 2D Numpy Arrays

In [98]:
print(np_2d[0]) # First row

print(np_2d[1,:]) # Second row, all elements

print(np_2d[0][2]) # First row, third element

print(np_2d[0,2]) # First row, third element

print(np_2d[:,1:3]) # All rows, second and third columns

[1.73 1.68 1.71 1.89 1.79]
[65.4 59.2 63.6 88.4 68.7]
1.71
1.71
[[ 1.68  1.71]
 [59.2  63.6 ]]


## Numpy: Basic Statistics

In [104]:
np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

print(np.mean(np_2d))
print(np.median(np_2d))
print(np.std(np_2d))
print(np_2d.sum())

35.41
30.545
34.406168923610196
354.09999999999997
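
The statistics above are computed over all ten values, so heights and weights are mixed together. In practice it is usually more informative to summarize each row separately. The sketch below is an addition to these notes, assuming the same `np_2d` array. It does so by subsetting a row and, alternatively, with the `axis` argument. `np.corrcoef()` is included as one more summary function that NumPy provides.

```python
import numpy as np

np_2d = np.array([[1.73, 1.68, 1.71, 1.89, 1.79],
                  [65.4, 59.2, 63.6, 88.4, 68.7]])

# Statistics for one row at a time
mean_height = np.mean(np_2d[0])
mean_weight = np.mean(np_2d[1])
print(mean_height)
print(mean_weight)

# axis=1 computes one value per row in a single call
row_means = np.mean(np_2d, axis=1)
print(row_means)

# Correlation matrix between heights and weights
corr = np.corrcoef(np_2d[0], np_2d[1])
print(corr)
```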


### Generate data

* Arguments for `np.random.normal()`
  1. distribution mean
  2. distribution standard deviation
  3. number of samples

In [107]:
import numpy as np
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_city = np.column_stack((height, weight))

print(np_city)
print(np_city.shape) # 5000 rows, 2 columns

[[ 1.72 64.57]
 [ 2.25 52.6 ]
 [ 1.73 72.89]
 ...
 [ 2.2  59.93]
 [ 1.64 65.65]
 [ 1.49 43.47]]
(5000, 2)
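
Because the data is random, the numbers printed above will differ on every run. A common way to make the draws repeatable is to fix the seed first. The sketch below is an illustration added to these notes; the seed value `123` is an arbitrary choice. It also checks that the sample mean and standard deviation land close to the parameters passed to `np.random.normal()`.

```python
import numpy as np

# Fixing the seed makes the random draws repeatable
np.random.seed(123)   # 123 is an arbitrary choice
first = np.round(np.random.normal(1.75, 0.20, 5), 2)

np.random.seed(123)
second = np.round(np.random.normal(1.75, 0.20, 5), 2)

print(first)
print(np.array_equal(first, second))

# With many samples, the statistics approach the requested parameters
np.random.seed(123)
height = np.random.normal(1.75, 0.20, 5000)
print(np.mean(height))
print(np.std(height))
```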
