# Introduction to Data Analysis & Python

## Lecture 1: Overview & Introduction to Python

### September 17, 2019
### Tristan Glatard

![xkcd](img/python.png)
Source: [xkcd](https://xkcd.com/353)

# 1. Meet Python

![python logo](https://www.python.org/static/img/python-logo.png)

* First public release: 1991 (Python 1.0: 1994)
* Python 2.0: 2000
* *Python 3.0: 2008*



## Start the Python interpreter...

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

* Simply type `python` and check the "prompt":
```
$ python
Python 3.7.3 (default, May 11 2019, 00:38:04) 
[GCC 9.1.1 20190503 (Red Hat 9.1.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
```

## ... and write your first Python program

In [11]:
2 + 90

92

# Have you heard of or used any other programming language?

## Motivations for Python vs other languages

* Python is very popular in data science and engineering: check [SciPy](https://scipy.org), [scikit-learn](http://scikit-learn.org)
* Python is free software ([as in freedom](https://www.fsf.org))
* Python is portable, available for all major operating systems
* Python is a versatile language, "[the second best language for everything](http://pypl.github.io/PYPL.html)"
* Python has a lively online community, active on [Stack Overflow](https://stackoverflow.com) and many other forums


## Other notes
* Python is an interpreted language
* Python is an object-oriented language

We focus on Python 3. Be careful: there are important differences between Python 3 and Python 2.


# Today's goals

* Introductions (meet Python, me, you, datasets, the course)
* Working in the Python shell
* Creating and comparing variables of different types

# 2. Meet me :)

* [Tristan Glatard](https://users.encs.concordia.ca/~tglatard)
* Associate Professor, Computer Science and Software Engineering, Concordia
* [Big Data Infrastructures for Neuroinformatics](http://slashbin.ca)
* Lots of Python!
[![github](img/tristan.png)](http://github.com/glatard)

# 3. Meet you

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

* Fill in the [ice-breaker](https://docs.google.com/forms/d/1IfjXffSZxcdwbNT71DUaVkN7bwjFEQ0QYbxoC1DZVR0/viewform?edit_requested=true) survey
* Any expectations for the course?

# 4. Meet the course 

## Objectives
* Gain basic understanding of tasks in “Data Analysis”
* Learn a foundation for data science
* Develop basic skills in Python

## Resources
* [Moodle page](https://moodle.concordia.ca/moodle/course/view.php?id=121207)
* [Slack channel](https://moodle.concordia.ca/moodle/mod/url/view.php?id=2094798)
* [Course outline](https://moodle.concordia.ca/moodle/mod/resource/view.php?id=2094809)

## Keys to success

* Repetition, repetition, repetition, repetition, repetition, repetition…
* Failure
* Asking questions
* Googling


# 5. Meet datasets

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

1. Look at the [scikit-learn toy datasets](https://scikit-learn.org/stable/datasets/index.html#toy-datasets)
2. Choose one dataset
3. Write two sentences to describe it (and be ready to read them to the class)
4. Propose a question that your dataset could help answer
5. Write it [here](https://docs.google.com/document/d/1eWuvAvBdCvYkpGgdWBEXoNuFHxMoJh9gPNX4m-OHmdY/edit)

# 6. Core Python

## Numbers

Python can manipulate various kinds of numbers:
* Integers
* Floating-point numbers
* Booleans 

In [12]:
1  # an integer

1

In [13]:
2.0  # a floating-point number

2.0

In [14]:
True  # a boolean

True

... and it can do operations on them:

## Arithmetic: +, -, /, *, **

In [15]:
2 ** 3

8
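
As a small illustration of the operators listed above, here is a sketch (not part of the original slides) that applies each of them to the same two integers. It also shows `//` and `%`, which are not in the list above but are closely related to `/`.

```python
# Arithmetic operators applied to two integers
total = 7 + 2        # addition
difference = 7 - 2   # subtraction
product = 7 * 2      # multiplication
quotient = 7 / 2     # true division: always returns a float in Python 3
power = 7 ** 2       # exponentiation

# Two related operators that are not in the list above
floor_quotient = 7 // 2  # floor division: quotient rounded down
remainder = 7 % 2        # remainder of the division

print(total, difference, product, quotient, power)  # 9 5 14 3.5 49
print(floor_quotient, remainder)                     # 3 1
```

The behaviour of `/` is one of the differences between Python 2 and Python 3: in Python 2, `7 / 2` between integers returns `3`.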

## Basic math: abs(), max(), min(), sum()

In [16]:
abs(-1)

1

In [17]:
max(2, 5)

5

In [18]:
sum([4, 5])

9

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

In your Python shell, create numbers and try out a few functions.

# Comparisons

## ==, <, >, is

In [19]:
1 == 1.0

True

In [20]:
(1 == 2) > 3

False

In [21]:
1 is 1.0

False

In [22]:
(1+4) > 3

True

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

Try a few comparisons. Pay attention to **==** vs **is** vs **=**.
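
The difference between the three operators can be sketched with lists (the variable names are only illustrative): `=` assigns a name to a value, `==` compares values, and `is` checks whether two names refer to the very same object.

```python
a = [1, 2, 3]  # "=" assigns: a now refers to a new list
b = [1, 2, 3]  # b refers to another list with the same content
c = a          # c refers to the same list object as a

same_value = (a == b)   # "==" compares values
same_object = (a is b)  # "is" compares identities
alias = (a is c)

print(same_value)   # True
print(same_object)  # False
print(alias)        # True
```

Recent Python versions (3.8 and later) emit a `SyntaxWarning` when `is` is used with a literal such as `1 is 1.0`, since `==` is almost always what is meant there.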

# *math* module

A *module* is a collection of Python functions that can be imported on demand. The *math* module is one of the most common ones:

In [23]:
import math  # imports the module; its functions are accessed as math.<name>
math.cos(1)

0.5403023058681398

In [24]:
from math import sin # imports a particular function in the module
sin(1) # note the absence of math.

0.8414709848078965

In [25]:
from math import log as l # imports and renames a function
l(1)

0.0

# Getting help!

* *help()* is a built-in function to provide help on modules and functions.

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

* Look at `help(math)` and try a few functions. Pay attention to function parameters.
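
To illustrate why function parameters matter, here is a short sketch with a few `math` functions. The particular choice of functions is arbitrary; `math.log` is included because it accepts an optional second parameter, the base.

```python
import math

root = math.sqrt(16)             # one parameter
rounded_down = math.floor(2.7)   # one parameter, returns an integer
natural_log = math.log(math.e)   # one parameter: natural logarithm
log_base_2 = math.log(8, 2)      # two parameters: logarithm of 8 in base 2

print(root)          # 4.0
print(rounded_down)  # 2
# Floating-point results are best compared with a tolerance
print(math.isclose(natural_log, 1.0))  # True
print(math.isclose(log_base_2, 3.0))   # True
```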


# Strings

In Python, strings can be single-, double-, or triple-quoted:


In [26]:
'Hello," World!'  # a single-quoted string containing a double quote

'Hello," World!'

In [30]:
'Hello, World!'  # here is a single-quoted string

'Hello, World!'

In [31]:
'''A triple-quoted string with a double quote: "'''

'A triple-quoted string with a double quote: "'

In [32]:
# A triple quoted string 
'''Dear <user>, "  ' ' "" '' "" ''
Please stop trying to break in our system, 
you are really not supposed to do this!
'''

'Dear <user>, "  \' \' "" \'\' "" \'\'\nPlease stop trying to break in our system, \nyou are really not supposed to do this!\n'

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> Create strings containing the following text: 
1. What a nice day!
2. Hi, it's me!
3. Jane said: "let's get some coffee"
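
One possible way to write these three strings is sketched below. The variable names are only illustrative, and other quoting choices work equally well.

```python
s1 = "What a nice day!"
s2 = "Hi, it's me!"  # double quotes allow an apostrophe inside
s3 = '''Jane said: "let's get some coffee"'''  # triple quotes allow both kinds
s4 = 'Jane said: "let\'s get some coffee"'     # or escape the apostrophe with \

print(s1)
print(s2)
print(s3)
print(s3 == s4)  # True: both spellings produce the same string
```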

# Slicing (bonus)

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

Try this:

In [33]:
"Hello world"[3:6]

'lo '
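
A slice `s[start:stop]` keeps the characters from index `start` up to, but not including, index `stop`, with indices starting at 0. A few more cases, as a sketch:

```python
s = "Hello world"

middle = s[3:6]      # characters at indices 3, 4 and 5
first_word = s[:5]   # omitted start: from the beginning
last_word = s[6:]    # omitted stop: until the end
last_char = s[-1]    # negative indices count from the end

print(repr(middle))  # 'lo '
print(first_word)    # Hello
print(last_word)     # world
print(last_char)     # d
```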

# Data types and conversions

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

* Try to compare two strings with > or <: what happens?
* Try to "add" two strings with +: what happens?
* Try to "add" a string and a number: what happens?
* What is the difference between "2" + "3" and 2 + 3?

Function `type()` can be used to get the type of a value:

In [34]:
type("3")

str

The following built-in functions can be used to convert values between types: int(), float(), str(). For instance:
    

In [35]:
int("3")

3

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/>

* Using type conversion, modify the following program so that it returns "5":

In [36]:
int("3"+"2")

32
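
The questions above can be explored with the sketch below, which also shows one possible answer to the last exercise. `+` concatenates strings but adds numbers, so each string has to be converted before the addition.

```python
concatenated = "2" + "3"      # + on strings concatenates
added = 2 + 3                 # + on numbers adds
fixed = int("3") + int("2")   # convert each string first, then add
as_text = str(fixed)          # convert back if a string is needed

# Mixing a string and a number with + raises a TypeError
try:
    "2" + 3
    mixed_error = None
except TypeError as err:
    mixed_error = str(err)

print(concatenated)  # 23
print(added)         # 5
print(fixed)         # 5
print(repr(as_text)) # '5'
print("TypeError:", mixed_error)
```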

# Variables

Variables store values. They are *assigned* values with **=** (not **==**). Names may contain letters, digits, and underscores, must not start with a digit, and must not be a reserved keyword such as `for` or `class`.

Variables can change type throughout their lifetime: 

In [37]:
a = 1  # a is an integer
type(a)

int

In [38]:
a = 1.0 # now a is a float
type(a)

float

In [40]:
b = "Hello"

type(b)


str
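
Whether a name is acceptable for a variable can be checked from Python itself. The sketch below uses `str.isidentifier()` and the standard `keyword` module; the candidate names are made up for the example.

```python
import keyword

candidates = ["total", "_count2", "2fast", "my-var", "class"]

# A valid variable name is an identifier that is not a reserved keyword
results = {}
for candidate in candidates:
    results[candidate] = candidate.isidentifier() and not keyword.iskeyword(candidate)

for name, is_valid in results.items():
    print(name, is_valid)
```

Only `total` and `_count2` are reported as valid: `2fast` starts with a digit, `my-var` contains a hyphen, and `class` is a keyword.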

# Printing variables

In [41]:
a = 1
print(a)  # the very useful print function

1


In [42]:
print("Here is the value of a: {0}, and the value of b: '{1}'".
       format(a, b))  # the useful format function

Here is the value of a: 1, and the value of b: 'Hello'


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 

1. Define two variables a and b
2. Perform an operation on a and b
3. Format and print the result of the operation as follows: "{a} multiplied by {b} is {result}".


In [1]:
a = 3
b = 7
c = a * b
print('{} multiplied by {} is {}'.format(a, b, c))

3 multiplied by 7 is 21
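
As an alternative to `format()`, Python 3.6 and later also support formatted string literals ("f-strings"), where expressions are written directly inside the braces. This is a sketch of the same exercise in that style; it is not the form used in the rest of these slides.

```python
a = 3
b = 7

# An f-string evaluates the expressions inside the braces
message = f"{a} multiplied by {b} is {a * b}"
print(message)  # 3 multiplied by 7 is 21

# A format specification after ":" controls the presentation, here 2 decimals
ratio_text = "{:.2f}".format(a / b)
print(ratio_text)  # 0.43
```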


# String functions


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 

* Look at `help(str)` and try out a few functions such as: replace, find, strip, split, startswith
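
For reference, here is a sketch of what these string methods do on a small example. String methods return new values; the original string is never modified.

```python
s = "  Hello, World!  "

t = s.strip()                         # removes leading and trailing whitespace
replaced = t.replace("World", "Python")
position = t.find("World")            # index of the first occurrence
missing = t.find("xyz")               # -1 when the substring is absent
words = t.split(", ")                 # splits on the given separator
starts = t.startswith("Hello")

print(repr(t))    # 'Hello, World!'
print(replaced)   # Hello, Python!
print(position)   # 7
print(missing)    # -1
print(words)      # ['Hello', 'World!']
print(starts)     # True
```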

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 

* Write a greeter program that prints "Hello {name}" where `{name}` is replaced by the content of variable `name`.



In [2]:
name = 'Peter'
print("Hello {}".format(name))

Hello Peter


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 

**Bonus**
* Download a news article about a famous character, put it in a string, and write a program that replaces all occurrences of their name with your first name.

# User input

Function *input()* can get input from the user:

In [1]:
# A simple echo program
a = input("Enter a number:")
print("echo:", a)

Enter a number:123
echo: 123


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 

* Update your greeter program to print "Hello {name}" where `{name}` is obtained from user input.

In [3]:
name = input("Enter your name: ")
print("Hello {}".format(name))

Enter your name: Mary
Hello Mary
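
Note that `input()` always returns a string, even when the user types digits, so a conversion is needed before doing arithmetic. In the sketch below, the variable `raw_text` stands in for what `input()` would return, and `double` is a hypothetical helper written for this example.

```python
def double(raw_text):
    """Convert the text typed by the user to an integer and double it."""
    return 2 * int(raw_text)

# Stand-in for: raw_text = input("Enter a number: ")
raw_text = "123"

repeated = raw_text * 2      # on a string, * repeats the text
doubled = double(raw_text)   # after conversion, * multiplies

print(repeated)  # 123123
print(doubled)   # 246
```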


# Date and time

The `time` module provides functions to deal with time:

In [3]:
import time
print("Sleeping...")
time.sleep(2)  # sleeps for 2 seconds
print("Done!")

Sleeping...
Done!


The `datetime` module provides types to represent dates and times:

In [4]:
import datetime
datetime.datetime.now()

datetime.datetime(2019, 9, 24, 19, 1, 12, 207054)

In [46]:
print(datetime.datetime.now())

2019-09-20 15:20:07.366527


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 

* Try to add or subtract `datetime` variables

* **Bonus**: Write an alarm clock: 
    1. Define an alarm date/time in a variable named `alarm`
    2. Sleep until the alarm date/time
    3. Print "Drrrinnng!!!"

In [12]:
from time import sleep
from datetime import datetime
alarm = datetime(2019, 10, 3, 21, 7, 30)
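# total_seconds() also counts whole days; max() avoids a negative delay if the alarm is already past
sleep(max(0, (alarm - datetime.now()).total_seconds()))
print('Drrrinnng!!!')

Drrrinnng!!!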
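
Subtracting two `datetime` values gives a `timedelta`, and adding a `timedelta` to a `datetime` gives a new `datetime`. The sketch below uses fixed, made-up dates so that the result is reproducible. It also shows why `total_seconds()` is usually the safer choice when computing a delay: the `seconds` attribute ignores whole days.

```python
from datetime import datetime, timedelta

start = datetime(2019, 9, 17, 17, 45)
end = start + timedelta(days=1, hours=2)  # datetime + timedelta gives a datetime
delta = end - start                       # datetime - datetime gives a timedelta

print(end)                    # 2019-09-18 19:45:00
print(delta.days)             # 1
print(delta.seconds)          # 7200: only the part beyond whole days
print(delta.total_seconds())  # 93600.0: the full duration in seconds
```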


# Tuples

Tuples are collections of values:

In [6]:
t = (1, 2, 3, 4, 5, 6, 7) # tuple of integers
print(t)

(1, 2, 3, 4, 5, 6, 7)


In [9]:
print(t[0])  # accessing items in a tuple

1


In [16]:
u = ("Hello", 1.0, 2, "one more", "error" ) # tuple of heterogeneous values
print(u)

('Hello', 1.0, 2, 'one more', 'error')


In [22]:
concordia, b, c, d, mcgill = u # unpacking
mcgill

'error'

Tuples are immutable, so assigning to an item raises an error:

In [11]:
u[0] = 2

TypeError: 'tuple' object does not support item assignment
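
Since a tuple cannot be changed in place, a "modified" tuple has to be built as a new one. A minimal sketch, with illustrative variable names:

```python
u = ("Hello", 1.0, 2)

# Item assignment fails on a tuple
try:
    u[0] = 2
    message = None
except TypeError as err:
    message = str(err)

# Build a new tuple from a one-item tuple and a slice of the old one
v = (2,) + u[1:]

print(message)
print(v)  # (2, 1.0, 2)
print(u)  # ('Hello', 1.0, 2): the original is unchanged
```

The trailing comma in `(2,)` is what makes it a tuple; `(2)` is just the integer 2.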

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 

* Modify your greeter program so that it greets a person defined by a tuple (name, age). For instance, from ("John", 21) it should print "Hello John, you are 21 years old".

In [13]:
person = ("Jane", 58)
print("Hello {}, you are {} years old".format(person[0], person[1]))

Hello Jane, you are 58 years old


# Lists

Lists resemble tuples, but they are mutable:

In [23]:
l = [ 1, "a", 3.0 ]
print(l)

[1, 'a', 3.0]


In [24]:
l[1] = 'hello'
print(l)

[1, 'hello', 3.0]


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 

* Try to add two lists: what happens?
* Try to add a list and a number: what happens?
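
What these two experiments show can be sketched as follows: `+` between two lists builds a new, concatenated list, while `+` between a list and a number is an error. To add a single item to a list, `append()` is the usual tool.

```python
first = [1, 2]
second = [3, 4]

combined = first + second  # a new list; first and second are unchanged

# A list and a number cannot be added
try:
    first + 5
    error = None
except TypeError as err:
    error = str(err)

first.append(5)  # adds one item at the end, in place

print(combined)  # [1, 2, 3, 4]
print("TypeError:", error)
print(first)     # [1, 2, 5]
```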



<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 
* Look at `help(list)` and try a few methods: sort, reverse, count
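
A sketch of these three methods on a small list follows. Note that `sort()` and `reverse()` modify the list in place and return `None`, whereas the built-in function `sorted()` returns a new list and leaves the original untouched.

```python
values = [3, 1, 2, 3]

threes = values.count(3)   # number of occurrences of 3
ordered = sorted(values)   # new sorted list; values is unchanged here
snapshot = list(values)    # copy of values before the in-place changes

values.sort()              # sorts in place, ascending
after_sort = list(values)
values.reverse()           # reverses in place

print(threes)      # 2
print(ordered)     # [1, 2, 3, 3]
print(snapshot)    # [3, 1, 2, 3]
print(after_sort)  # [1, 2, 3, 3]
print(values)      # [3, 3, 2, 1]
```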


<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 
1. Create a list of integers, for instance `[12, 4, 6, 8, 7]`, and assign it to variable `numbers`.
2. Print the second largest integer in `numbers`.

In [18]:
numbers = [12, 4, 6, 8, 7]
numbers.sort()  # sorts the list in place, in ascending order
print(numbers[-2])


8

# Slicing (bonus)

Try this:

In [54]:
a = [1, 2, 3, 4, 5]
a[2:4]

[3, 4]

In [55]:
a[2:4] = ["three", "four"]
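
The assignment above produces no output, so here is a sketch that prints the list afterwards. Assigning to a slice replaces that part of the list, and the replacement does not need to have the same length.

```python
a = [1, 2, 3, 4, 5]

a[2:4] = ["three", "four"]   # replaces the items at indices 2 and 3
after_replace = list(a)

a[1:3] = []                  # assigning an empty list removes the slice
after_remove = list(a)

print(after_replace)  # [1, 2, 'three', 'four', 5]
print(after_remove)   # [1, 'four', 5]
```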

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 
Let a be a list:
1. Create a list containing all the elements in a, except the last 2.
2. Create a list containing only the first 2 elements of a, in reverse.

In [19]:
a = ["one", "two", "three", "four", "five"]
a_except_last_2 = a[:-2]
print(a_except_last_2)

['one', 'two', 'three']


In [33]:
b = a[0:2]
b.reverse()  # reverses b in place
print(b)

['two', 'one']
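
As an alternative sketch, a slice can take a third value, the step, and a step of -1 walks through the list backwards. This gives the same result without calling `reverse()`:

```python
a = ["one", "two", "three", "four", "five"]

first_two_reversed = a[:2][::-1]  # take the first 2 items, then reverse them
whole_reversed = a[::-1]          # a reversed copy of the whole list

print(first_two_reversed)  # ['two', 'one']
print(whole_reversed)      # ['five', 'four', 'three', 'two', 'one']
print(a)                   # slicing never modifies the original list
```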


# Dictionaries

Dictionaries associate *keys* and *values*:

In [28]:
john = { "age": 21, "name": "John", 12: "somewhere" }
john["name"]

'John'
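
A few more common dictionary operations, as a sketch. The keys `"city"` and `"email"` and their values are made up for this example.

```python
john = {"age": 21, "name": "John"}

john["city"] = "Montreal"   # assigning to a new key adds an entry
john["age"] = 22            # assigning to an existing key replaces its value

has_email = "email" in john            # membership test on keys
email = john.get("email", "unknown")   # get() returns a default instead of failing

print(john)       # {'age': 22, 'name': 'John', 'city': 'Montreal'}
print(has_email)  # False
print(email)      # unknown
```

Accessing a missing key with square brackets, as in `john["email"]`, raises a `KeyError`.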

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Antu_task-complete.svg/1024px-Antu_task-complete.svg.png" width=50 align=left/> 

* Create a dictionary where keys are person names, and values are person ages
* Use it to print the age of a person whose name is specified in a variable
* For instance, `print(d[name])` should print the age of John when `name="John"`

In [34]:
directory = { "Mary": 20, "John": 21, "Jane": 48, "Paul": 63 }
name = "Jane"
print(directory[name])

48


# Back to today's goals

* Introductions
* Working in the Python shell
* Creating and comparing variables of different types

# References (bonus)

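Variables are references to objects. With a mutable object such as a list, a change made through one name is visible through every other name bound to it: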

In [58]:
a = ['a', 'b', 'c']
b = a # b is a reference, i.e., an 'alias' for a
b[0] = 'qwerty' # Modify b
print(a) # a is modified too

['qwerty', 'b', 'c']


In [59]:
c = a[:] # c is an independent copy of a
c[0] = 'trewq' # Modify c
print(a) # a isn't modified

['qwerty', 'b', 'c']
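
One caveat, sketched below under the assumption that the list contains other lists: a slice copy such as `a[:]` is *shallow*, meaning that the outer list is copied but the inner lists are still shared. The standard `copy` module provides `deepcopy()` for a fully independent copy.

```python
import copy

a = [["x", "y"], "c"]

alias = a                  # same object as a
shallow = a[:]             # new outer list, but the inner list is shared
deep = copy.deepcopy(a)    # fully independent copy

a[0][0] = "changed"        # modify the inner list through a

print(alias is a)      # True
print(shallow is a)    # False
print(shallow[0][0])   # changed: the inner list is shared with a
print(deep[0][0])      # x: the deep copy is unaffected
```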
