# Introduction to Python

**Author:** 'Felipe Millacura'

**Date:** '13th December 2020'

## Learning Objectives

* Be able to use Jupyter notebooks
* Get exposed to the basics of Python
* Understand the differences between Python and other programming languages
* Be able to use Python packages

## Very short feedback loop and adaptation cycle

A common characteristic in **agile software development** is the daily stand-up (a daily scrum in the Scrum framework). In a brief session, team members report to each other what they did the previous day toward their team's iteration goal, what they intend to do today toward the goal, and any roadblocks or impediments they can see to the goal.
- How are you?
- How is the course going?






In [2]:
import time

def countdown(t):
    while t > 0:
        print(t)
        t -= 1
        time.sleep(1)
        
    print("Time Off!")
        
countdown(30)

30
29
28
...
2
1
Time Off!


![agile_sd](https://www.cleart.com/wp-content/uploads/2017/11/what-is-agile-software-development.jpg)

## Introduction

1. Python’s popularity & high salary
2. Python is used in Data Science
3. Python’s scripting & automation
4. Python used with Big Data
5. Python supports Testing
6. Computer Graphics in Python
7. Python used in Artificial Intelligence
8. Python in Web Development
9. Python is portable & extensible
10. Python is simple & easy to learn

SOURCE: [towardsdatascience.com](https://towardsdatascience.com/top-10-reasons-why-you-need-to-learn-python-as-a-data-scientist-e3d26539ec00)

### Simplicity

Python is one of the easiest languages to start your programming journey with, and its simplicity does not limit what you can do with it.

What gives Python such flexibility? There are multiple factors:

* Python is a free and open-source language
* It is a high-level programming language
* Python is interpreted
* It has an enormous community

In addition, Python code is quick to write. Just compare these two examples written in Java and Python:

![python_java](https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2018/01/comparison-top-reasons-to-learn-python-Edureka.png)

### Scalability

Python scales well, from small scripts to large applications, and the range of things you can do with it keeps growing.

Python's flexibility is very useful for solving problems in app development.

New updates keep making problems easier to solve. Python is also a good option for newcomers, because there are often several ways to solve the same problem.

Even if you have a team of non-Python programmers who know C++ design patterns, Python will be better for them in terms of the time needed to develop code and verify its correctness.

This is faster because you don't spend time hunting for memory leaks, waiting for compilation, or debugging segmentation faults.

### Libraries and Frameworks

Thanks to its popularity, Python has hundreds of libraries and frameworks, which are a great addition to your development process. They save a lot of manual work and can sometimes replace a whole hand-written solution.

As a Data Scientist, you will find that many of these libraries focus on Data Analytics and Machine Learning, and there is also strong support for Big Data. This is a strong argument for learning Python as your first language.

Some of these libraries are given below:

**Pandas**

Pandas is great for data analysis and data handling: it provides flexible tools for loading, cleaning and manipulating tabular data.

**NumPy**

NumPy is a free library for numerical computing. It provides fast multi-dimensional arrays along with high-level mathematical functions that operate on them.

**SciPy**

SciPy is a library for scientific and technical computing. It can be used for optimization, data transformation, linear algebra, special functions, and more.


## Anaconda

Anaconda is a distribution of packages built for data science. It comes with conda, a package and environment manager. You'll be using conda to create environments for isolating your projects that use different versions of Python and/or different packages. You'll also use it to install, uninstall, and update packages in your environments. Using Anaconda has made my life working with data much more pleasant.

### What is Anaconda Distribution?

Anaconda, through its `conda` tool, lets you manage (install, upgrade, or uninstall) packages and environments to use with Python. It makes installing packages simple, and its virtual environments let you work on multiple projects conveniently.

Even if you already have Python installed, it will be beneficial to use Anaconda/Miniconda because:

* Anaconda comes with a bunch of data science packages, so you'll be all set to start working with data.
* Using conda to manage your packages and environments will reduce future issues when dealing with the various libraries you'll be using.

### Python Packages

A package is a collection of modules, where each module contains a set of class and function definitions. After installing a particular package, you can import it and use the functions defined in it.
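To make the package/module distinction concrete, here is a small sketch using `urllib`, a package from Python's standard library. It is chosen purely as an illustration (it is not one of the data science packages above): the package contains modules such as `urllib.parse`, and we import that module and call one of its functions.

```python
import urllib.parse  # urllib is a package; parse is one of its modules

# Packages have a __path__ attribute (they are folders of modules);
# plain modules do not.
print(hasattr(urllib, "__path__"))        # True
print(hasattr(urllib.parse, "__path__"))  # False

# Call a function defined inside the urllib.parse module
parts = urllib.parse.urlparse("https://www.anaconda.com/products/individual")
print(parts.netloc)  # www.anaconda.com
print(parts.path)    # /products/individual
```

The same dotted `package.module` pattern applies to third-party packages installed with conda.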

When you install Anaconda, a set of commonly used packages is installed by default. You can install additional packages whenever you need them.

### Anaconda Distribution

Anaconda is a fairly large download (~500 MB) because it comes with Python's most common data science packages. You can download it from:

https://www.anaconda.com/products/individual#Downloads

Among other things, the distribution includes **Anaconda Navigator**, a graphical user interface that lets you launch installed applications, such as Jupyter Notebook or the VS Code editor. We will learn more about the notebook in the next section. See a snapshot of Anaconda Navigator below:

![Anaconda GUI](https://video.udacity-data.com/topher/2020/September/5f741d48_screenshot-2020-09-26-at-5.03.28-pm/screenshot-2020-09-26-at-5.03.28-pm.png)


## Jupyter Notebook

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualisations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modelling, data visualisation, machine learning, and much more.

There are two versions of Python available: Python 2 and Python 3. Python 3 was an update to Python 2 that isn't backwards compatible. So code written in Python 2 won't necessarily run in Python 3. Python 2 is no longer supported (as of January 2020!) so if you have the choice you should avoid using it.

We have installed Jupyter Notebook via the Anaconda Distribution, which includes everything we need, from Python 3.7 to all the data science related packages.

>"The open-source Anaconda Distribution is the easiest way to perform Python/R data science and machine learning on Linux, Windows, and Mac OS X."

You can start up Jupyter Notebook by typing

```
jupyter notebook
```

in your terminal. 

This will start a notebook server in the terminal's current directory. If you want to make your notebooks somewhere other than your home directory, use `cd` to move to the directory you want before running the command.

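- Note that if the terminal shows a "command not found" error, you may have to add the following line to your bash profile (this assumes Anaconda was installed in the default `anaconda3` folder in your home directory):

```
export PATH=$PATH:$HOME/anaconda3/bin
```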

Once you have Jupyter Notebook up and running in your browser, we can create a new notebook by clicking on `New -> Notebook/Python 3`

In Jupyter, we work in cells: each cell can hold programming logic, graphs, descriptive text and much more. A notebook lets you mix formatted text and code, because each cell is either a Markdown cell or a Python code cell.

Keyboard shortcuts work a little differently in Jupyter Notebooks. There are two modes: an editing mode and a command mode.

* Press `esc` to enter command mode.
* Press `enter` to enter editing mode.

In editing mode you can write code or markdown and you can run the cell by pressing `shift + enter`.
In command mode you have access to a huge range of keyboard shortcuts, including:

* The arrow keys will move you between cells
* `a` : create a new cell above
* `b` : create a new cell below
* `dd` : pressing `d` twice, quickly will delete your current cell
* `m` : change a code cell into a markdown cell
* `y`: change a markdown cell into a code cell

[Here are some more handy shortcuts](https://www.cheatography.com/weidadeyue/cheat-sheets/jupyter-notebook/)


## Basics

Python was designed as a general-purpose programming language. Its usefulness for data science grew gradually, thanks to a strong community that built extensions and packages for this purpose.

The more tools you know the easier it is to pick one that fits the end goal perfectly!

## Variable assignment

In Python, variable assignment is straightforward: we use the `=` sign.

```{python}
x = 3
print(x)
```

You could use `print()` to print out values from variables, but in Jupyter Notebook the value of the last expression in a cell is displayed automatically:

```{python}
x = 3
x
```

- Note that anything you declare in a cell will be available in the global scope. If you intend to hide something from being exposed, you have to place it into a function.
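A minimal sketch of that difference: `x` is defined at the top level and is visible everywhere, while `hidden` only exists inside the function. The function name `add_hidden_value` is a hypothetical example name.

```python
x = 3  # defined at the top level of a cell, so it is global


def add_hidden_value():
    hidden = 10  # local variable: only exists while the function runs
    return x + hidden  # functions can still read global variables


total = add_hidden_value()
print(total)                  # 13
print("hidden" in globals())  # False: the local name never leaked out
```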


In [5]:
x = 3

y = 67


print(y)

67


## Primitive Datatypes

Let's compare the different primitive datatypes in Python!

The `text` equivalent is called a string (`str`), and can be either double- or single-quoted:



In [7]:
"Hello world!"
'Hello world'

'Hello world'
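Only the last expression in the cell above is displayed. A short sketch showing that the quote style does not change the value, and why having both styles is handy (the variable names are illustrative):

```python
double = "Hello world"
single = 'Hello world'
print(double == single)  # True: both quote styles create the same string
print(type(double))      # <class 'str'>

# Using one kind of quote lets you put the other kind inside the string
quote = "It's easy"
print(quote)  # It's easy
```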

The `logical` datatype is called `bool`, standing for boolean - `True` or `False`. This will be returned from logical operations:


In [12]:
print(4 == [4])

False


- Note that only the first letter is capitalised: typing `FALSE` instead of `False` raises `NameError: name 'FALSE' is not defined`.

Comparison operators are much the same as in other programming languages: `>`, `<`, `>=`, `<=`, `==` and `!=`. Arithmetic operators work as you would expect too; the `%` operator gives the remainder after integer division, also known as modulo.


In [13]:
10 / 3

3.3333333333333335

In [14]:
10 % 3


1

An interesting feature is that you can use the keywords `and` and `or` to link logical operations, and this is quite characteristic of Python: it is widely regarded as a very 'human readable' language.
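A quick sketch of these keywords in action, along with `not`, which negates a boolean. The last line shows chained comparisons, another readable Python feature:

```python
x = 7

print(x > 5 and x < 10)  # True: both conditions hold
print(x < 5 or x == 7)   # True: at least one condition holds
print(not x == 7)        # False: not flips the result of x == 7

# Comparisons can also be chained
print(5 < x < 10)        # True, same as (5 < x) and (x < 10)
```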

The value commonly known as `NULL` in other languages is called `None` in Python, and it indicates a lack of value. Its datatype is `NoneType`:


In [20]:
x = None
print(x)

type(x)

None


NoneType
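When checking for `None`, the idiomatic style (recommended by PEP 8) is to use `is None` rather than `== None`. A small sketch:

```python
x = None

print(x is None)      # True
print(x is not None)  # False

# None counts as False in a boolean context, as do 0 and ""
print(bool(None))            # False
print(bool(0), bool(""))     # False False
```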

Again, for the `numeric` datatype we differentiate between integers (`int`) and floating point numbers (`float`).

- Note that dividing two `int`s with `/` always returns a `float` in Python 3, even when the result is a whole number.


In [50]:
2 / 3

0.6666666666666666
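If you want a whole-number result, Python has a separate floor division operator, `//`. A short sketch comparing the two:

```python
exact = 6 / 3          # / always returns a float in Python 3
floor = 7 // 2         # // is floor (integer) division
negative_floor = -7 // 2

print(exact)           # 2.0
print(floor)           # 3
print(type(floor))     # <class 'int'>
print(negative_floor)  # -4: floor division rounds down, towards minus infinity
```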


You can figure out the datatype of a value by using the `type()` function. This is the equivalent of `class()` in R.


In [24]:
print(4==[4])
type(4)



False


int

## Lists

Python `list`s are indicated by square brackets:



In [25]:
my_items = ["key", "wallet", "phone", "mask", 4]
my_items


['key', 'wallet', 'phone', 'mask', 4]

A Python list can hold any mix of datatypes, so it is up to you to keep track of whether a list holds homogeneous or heterogeneous data. Also, in R everything is a vector: a single number is just a one-element numeric vector. Python, however, distinguishes between a single value and a one-element list, as you can see below.


In [57]:
5 == [5]

False

Python `list`s are **0-indexed**, whereas some other programming languages, such as R, are **1-indexed**. All this means is that in Python you start counting list entries at zero, so the first item is at index `0`, the second at index `1`, and so on.

To access an item from a `list`, just use the item's index in square brackets after the variable it's stored in:


In [28]:
my_items[4]

4

A great feature of Python is that negative indices let you access elements from the end of a list: index `-1` is the last item, `-2` the second-to-last, and so on:

In [30]:
my_items[-2]

'mask'

Slicing a `list` is another useful feature: it lets us extract a subset of the data, using two indices separated by a colon:


In [32]:
my_items[0:4]


['key', 'wallet', 'phone', 'mask']

- Note that slicing is left-inclusive but right-exclusive, so the item at index `0` above is in the output, but the item at index `4` is not. You can also leave out the first or the second index to start from the beginning of the list, or to include everything through to the end, respectively:


In [34]:
my_items[:2]




['key', 'wallet']

In [35]:
my_items[2:]

['phone', 'mask', 4]
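Slices also combine with negative indices and accept an optional third number, the step. A short sketch using the same `my_items` list (redefined here so the block runs on its own):

```python
my_items = ["key", "wallet", "phone", "mask", 4]

print(my_items[-2:])   # ['mask', 4]: the last two items
print(my_items[::2])   # ['key', 'phone', 4]: every second item
print(my_items[::-1])  # [4, 'mask', 'phone', 'wallet', 'key']: a reversed copy

# Slicing returns a new list, so the original is unchanged
print(my_items)        # ['key', 'wallet', 'phone', 'mask', 4]
```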

To make a sequence of numbers, use the function `range()`. The two numbers you give `range()` define the start and end. Just like slicing, this is left-inclusive and right-exclusive.


In [36]:
range(1, 10)

range(1, 10)

The function `range()` actually produces an object of type 'range'. Don't worry just now about what 'range' objects are, but just realise that you may need to use `list()` to convert them to lists. 


In [37]:
type(range(5, 10))

range
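A sketch of converting a `range` into a list with `list()`. It also shows the optional third argument, which sets the step between numbers:

```python
numbers = range(5, 10)
print(numbers)                # range(5, 10): a range object, not a list
print(list(numbers))          # [5, 6, 7, 8, 9]
print(len(numbers))           # 5

# range(start, stop, step): every second number from 0 up to (not including) 10
print(list(range(0, 10, 2)))  # [0, 2, 4, 6, 8]
```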



If you provide only the end point, `range()` will automatically start at zero.


In [40]:
list(range(100))


[0,
 1,
 2,
 ...
 98,
 99]

**Task - 5 minutes**

1. Create a list with the following data, and assign it to a variable named `things`:

   - 'Apple'
   - 'Grape'
   - 101
   - 'Onion'
   - False

   Make sure that you don't use quotes around 101 and False.

2. Extract the first element of the list.

3. Extract the last element of the list.

4. Extract everything except the last element of the list.

5. What is the type of:

   1. the 3rd element?
   2. the 4th element?
   3. the 5th element?

6. Create a list with the following data, using the `range()` function.

   - 11
   - 12
   - 13
   - 14
   - 15

**Solution**

In [None]:
# Create a list with the following data, and assign it to a variable named things:

things = ["Apple", "Grape", 101, "Onion", False]
things

In [None]:
# Extract the first element of the list.

things[0]

In [None]:
# Extract the last element of the list.

things[-1]

In [None]:
# Extract everything except the last element of the list.

things[:4]

# Or:

things[:-1]

In [None]:
#What is the type of:

# the 3rd element?

print(type(things[2]))

# the 4th element?

print(type(things[3]))

# the 5th element?

print(type(things[4]))

In [None]:
# Create a list with the following data, using the `range()` function.

list(range(11, 16))

## Dictionaries

The Python `dict` (dictionary) corresponds to what some other languages call a `hash`, a `hashmap`, or an `associative array`; in R, the closest equivalent is a named `list`.

A `dict` is just a set of key-value pairs, where the keys must be unique. We create a `dict` using curly brackets, like so:

In [41]:
person = {
        "name": "Marcela",
        "age": 23
}
person

{'name': 'Marcela', 'age': 23}


What do you think will happen in the following case? 

In [43]:
person2 = {
        "name": "Marcela",
        "surname": "Millacura",
        "age": 29,
        "age2": 29.5,
        "name": "Felipe"
}

person2

{'name': 'Felipe', 'surname': 'Millacura', 'age': 29, 'age2': 29.5}

The second use of a key `"name"` *overwrites* the first key value pair!

- Note that keys are usually strings, as this makes it easy to give every value a unique, descriptive key. Any immutable (hashable) value, such as a number or a tuple, can also be used as a key.

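You can access items in a similar way to lists: use square brackets, but reference the relevant key instead of a position.

In [48]:
person2["age"]

29

- Note that you can only look up one key at a time: `person2[["age", "age2"]]` raises `TypeError: unhashable type: 'list'`, because a list cannot be used as a key.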
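Two other common ways to work with keys are the `in` operator and the `.get()` method, which returns `None` (or a default you choose) instead of raising an error when a key is missing. A short sketch; `"height"` is a hypothetical key that is not in the dict:

```python
person2 = {"name": "Felipe", "surname": "Millacura", "age": 29, "age2": 29.5}

print("age" in person2)                  # True: "in" checks the keys
print(person2.get("height"))             # None: missing key, but no error
print(person2.get("height", "unknown"))  # unknown: our chosen default

# To fetch several values, look each key up separately
print((person2["age"], person2["age2"]))  # (29, 29.5)
```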

To find out how many items there are in a `list`, or key-value pairs in a `dict`, we can use the `len()` function:


In [49]:
len(person2)

4

In [50]:
len(my_items)

5

## Using built-in functions

Functions in Python work very similarly to those in other programming languages. We call a function by putting brackets after the function name (along with any required or desired arguments):

In [54]:
sum([1, 2, 3, "string"])

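TypeError: unsupported operand type(s) for +: 'int' and 'str'

- Note that `sum()` can only add up numbers: the string in the list causes the `TypeError` above, whereas `sum([1, 2, 3])` returns `6`.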
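A few more built-in functions that follow the same calling pattern, sketched on a small list of numbers:

```python
numbers = [4, 1, 3]

print(sum(numbers))     # 8
print(max(numbers))     # 4
print(min(numbers))     # 1
print(len(numbers))     # 3
print(round(2 / 3, 2))  # 0.67: round to 2 decimal places
```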

One major difference could be that Python is a very object-oriented language: almost everything is an object in Python. This means that, on occasion, functions are not called in a 'functional' way.
In Python, we often call functions that 'live' on an object. These functions are called **methods**: methods are functions that live on objects.

Here's a 'real world' example! Imagine we have two `Dog` objects: `fido` and `rex`. Both of these `Dog` objects have a `bark()` method, because all `Dog`s can bark!

In Python we call a method on an object with syntax like `fido.bark()`: this will make `fido` bark. Similarly, `rex.bark()` will make `rex` bark. Each of the `Dog` objects has its own `bark()` method 'living' on the object.
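For the curious, here is a minimal sketch of how such a `Dog` could be written. The `Dog` class and its `bark()` method are hypothetical, made up for this illustration; defining your own classes is not needed for the rest of the lesson.

```python
class Dog:
    """A hypothetical Dog class used only to illustrate methods."""

    def __init__(self, name):
        self.name = name  # each Dog object stores its own name

    def bark(self):
        # A method: a function that lives on the object and can use its data
        return f"{self.name} says woof!"


fido = Dog("Fido")
rex = Dog("Rex")
print(fido.bark())  # Fido says woof!
print(rex.bark())   # Rex says woof!
```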

Let's see a more useful example:

In [59]:
animals = ["dog", "cat", "cow", "eagle"]

print(animals)

animals.sort()

print(animals)


['dog', 'cat', 'cow', 'eagle']
['cat', 'cow', 'dog', 'eagle']


Here, the `sort()` method 'lives' on the `animals` list. Think of all Python lists as coming equipped with this method, along with many others.

You might have noticed another major difference in the code above. In Python, methods often act **in-place**, i.e. they change the original object rather than creating a copy.
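A sketch contrasting the in-place `sort()` method with the built-in `sorted()` function, which returns a new list and leaves the original untouched. `birds` is an illustrative name; `animals` ends up in the same sorted state as above.

```python
animals = ["dog", "cat", "cow", "eagle"]

result = animals.sort()  # sorts the list in place...
print(result)            # None: ...and returns nothing
print(animals)           # ['cat', 'cow', 'dog', 'eagle']

# sorted() returns a new sorted list instead
birds = ["eagle", "crow", "albatross"]
print(sorted(birds))     # ['albatross', 'crow', 'eagle']
print(birds)             # ['eagle', 'crow', 'albatross']: unchanged
```

A common beginner bug is writing `animals = animals.sort()`, which replaces the list with `None`.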


Adding or removing items from a list is done with the following methods:

* `pop()` : remove the last item, and return it
* `remove()` : remove a specified item
* `append()` : add an element to the end of a list
* `extend()` : add each element of another list to the end of a list

All of these operate in-place.

In [61]:
animals.pop()
# removes the last item ('eagle') and returns it
animals

['cat', 'cow', 'dog']

In [62]:
animals.append(5)
# adds number 5 to list
animals

['cat', 'cow', 'dog', 5]

In [63]:
animals.remove(5)
# removes number 5 from list
animals

['cat', 'cow', 'dog']

`append()` and `extend()` behave differently when given a list.

`append()` directly adds the list as an element:

In [64]:
animals.append([7, 3, 99])
animals

['cat', 'cow', 'dog', [7, 3, 99]]

`extend()`, on the other hand, adds each element of the given list to the list:


In [69]:
animals.remove([7, 3, 99])  # undo the append() from the previous cell

animals.extend([7, 3, 99])
animals


['cat', 'cow', 'dog', 7, 3, 99]

Adding or updating key-value pairs in a dict can be done with the `update({"existing key": "updated value", "new key": "new value"})` method, and removing them with the `pop(keyname)` method.

In [70]:
person2.update({"name": "Francisca", "preferred music": ["rock", "salsa"]})
# person2.pop("age")  # uncomment to remove the "age" key
person2

{'name': 'Francisca',
 'surname': 'Millacura',
 'age': 29,
 'age2': 29.5,
 'preferred music': ['rock', 'salsa']}

- Note that even though the method `pop()` exists on both collection types, they behave differently, and most methods will be datatype-exclusive. Refer to the documentation for help with this.
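A short sketch of how `pop()` differs between the two types: on a list it takes an optional index (the last item by default), while on a dict it takes a key and can return a default if the key is missing. The names `numbers` and `scores` are illustrative.

```python
numbers = [10, 20, 30]
last = numbers.pop()    # list.pop(): no argument removes the last item
first = numbers.pop(0)  # or pass an index
print(last, first, numbers)  # 30 10 [20]

scores = {"Ana": 7, "Ben": 9}
ben = scores.pop("Ben")      # dict.pop() needs a key, not an index
print(ben, scores)           # 9 {'Ana': 7}
print(scores.pop("Zoe", 0))  # 0: default returned when the key is missing
```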


Below is a list that should contain the first 5 prime numbers:

```{python}
primes = [1, 2, 3, 5, 7]
```

1. Oops, 1 isn't prime. Remove it from the list.

2. Now add the next 3 prime numbers (11, 13, 17).

3. Now use the method `reverse()` to reverse the list. Print out the reversed list.
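One possible solution, using the list methods covered above (other approaches work too, e.g. `append()` three times instead of `extend()`):

```python
primes = [1, 2, 3, 5, 7]

primes.remove(1)             # 1. remove the non-prime 1
primes.extend([11, 13, 17])  # 2. add the next three primes
primes.reverse()             # 3. reverse the list in place
print(primes)                # [17, 13, 11, 7, 5, 3, 2]
```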


## Importing

Even though Anaconda comes with hundreds of packages installed, you can only use a package's functionality after importing it. You can import and use packages as follows:


In [74]:
import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/MinCiencia/Datos-COVID19/master/output/producto25/CasosActualesPorComuna_std.csv')

In [75]:
df

|  | Region | Codigo region | Comuna | Codigo comuna | Poblacion | Fecha | Casos actuales |
|---|---|---|---|---|---|---|---|
| 0 | Arica y Parinacota | 15 | Arica | 15101.0 | 247552.0 | 2020-04-13 | 88.0 |
| 1 | Arica y Parinacota | 15 | Camarones | 15102.0 | 1233.0 | 2020-04-13 | 0.0 |
| 2 | Arica y Parinacota | 15 | General Lagos | 15202.0 | 810.0 | 2020-04-13 | 0.0 |
| 3 | Arica y Parinacota | 15 | Putre | 15201.0 | 2515.0 | 2020-04-13 | 0.0 |
| 4 | Arica y Parinacota | 15 | Desconocido Arica y Parinacota |  |  | 2020-04-13 |  |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 26455 | Magallanes y la Antartica | 12 | San Gregorio | 12104.0 | 681.0 | 2020-12-11 | 0.0 |
| 26456 | Magallanes y la Antartica | 12 | Timaukel | 12303.0 | 282.0 | 2020-12-11 | 0.0 |
| 26457 | Magallanes y la Antartica | 12 | Torres del Paine | 12402.0 | 1021.0 | 2020-12-11 | 1.0 |
| 26458 | Magallanes y la Antartica | 12 | Desconocido Magallanes |  |  | 2020-12-11 | 0.0 |


In [78]:
df[(df["Comuna"]=="Antofagasta")]

|  | Region | Codigo region | Comuna | Codigo comuna | Poblacion | Fecha | Casos actuales |
|---|---|---|---|---|---|---|---|
| 15 | Antofagasta | 2 | Antofagasta | 2101.0 | 425725.0 | 2020-04-13 | 50.0 |
| 393 | Antofagasta | 2 | Antofagasta | 2101.0 | 425725.0 | 2020-04-15 | 55.0 |
| 771 | Antofagasta | 2 | Antofagasta | 2101.0 | 425725.0 | 2020-04-17 | 63.0 |
| 1149 | Antofagasta | 2 | Antofagasta | 2101.0 | 425725.0 | 2020-04-20 | 88.0 |
| 1527 | Antofagasta | 2 | Antofagasta | 2101.0 | 425725.0 | 2020-04-24 | 128.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 24585 | Antofagasta | 2 | Antofagasta | 2101.0 | 425725.0 | 2020-11-27 | 121.0 |
| 24963 | Antofagasta | 2 | Antofagasta | 2101.0 | 425725.0 | 2020-11-30 | 120.0 |
| 25341 | Antofagasta | 2 | Antofagasta | 2101.0 | 425725.0 | 2020-12-04 | 117.0 |
| 25719 | Antofagasta | 2 | Antofagasta | 2101.0 | 425725.0 | 2020-12-07 | 106.0 |
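The filter `df[(df["Comuna"]=="Antofagasta")]` works in two steps: the comparison builds a boolean Series with one `True`/`False` value per row, and indexing the DataFrame with that Series keeps only the `True` rows. The sketch below makes the intermediate mask visible on a tiny hand-made DataFrame; `df_small` is a hypothetical name and its values are copied from a few rows shown above. It assumes pandas is installed, as it is with Anaconda.

```python
import pandas as pd

# A tiny hypothetical DataFrame with two of the columns from the COVID data
df_small = pd.DataFrame({
    "Comuna": ["Arica", "Antofagasta", "Antofagasta"],
    "Casos actuales": [88.0, 50.0, 55.0],
})

mask = df_small["Comuna"] == "Antofagasta"  # boolean Series, one value per row
print(mask.tolist())                        # [False, True, True]

antofagasta = df_small[mask]                # keep only the rows where mask is True
print(len(antofagasta))                     # 2
print(antofagasta["Casos actuales"].sum())  # 105.0
```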


If you want to install a new package, you use the following command **in the terminal**.

In [None]:
!conda install #something

For example

In [79]:
!conda install -y requests  # -y answers "yes" to the confirmation prompt



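`conda` is the Anaconda package manager. Note that you do not install packages with Python code itself: `conda` is a command-line tool. In a notebook, the `!` prefix passes a line to the system shell, which is why the commands above can be run from a cell.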