<center>

<h1>Intro to Data Analysis in Python</h1>

<h2>PyLadies Vancouver Workshop</h2>

<img src="img/logo-pyladies.jpeg" width="200px"></img>

</center>

## About Me

Hi, I'm Jennifer!

- Environmental scientist at UBC
- Used to code in Matlab and it was crushing my soul
- Switched to Python, fell in love, and never looked back!*

<small>*It was actually a little more complicated than that, as I'll explain shortly</small>

## Agenda

#### 1. Getting oriented

- Navigating the Python world as a data geek
- Human-centered, interactive tools: IPython and Jupyter
- Quick recap of Python basics

#### 2. The power of Pandas

- Loading and summarizing spreadsheet data
- Creating graphs
- Diving deeper into data analysis

#### 3. Onwards and upwards

- Visual storytelling with data: a brief tour of the Python landscape
- Next steps, ideas, and inspiration

## Navigating the Python world...
### ... as a data geek


## PyData Ecosystem

[Image: Python logo]

Core Python comes with built-in functions (e.g. `print()`, `abs()`, `sorted()`) and built-in libraries (e.g. `random`, `datetime`).

## IPython Shell

In [25]:
print('Hello world!')

Hello world!


## Built-in functions

In [32]:
round?

In [33]:
round(3.14159, 2)

3.14

In [31]:
sorted?

In [28]:
range?

In [29]:
range(10)

range(0, 10)

In [30]:
list(range(10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
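
As the output above shows, `range(10)` starts at 0 and stops before 10. A small sketch of the optional start and step arguments follows; the variable names are made up for illustration.

```python
# range(start, stop) counts from start up to, but not including, stop
one_to_five = list(range(1, 6))
print(one_to_five)

# range(start, stop, step) skips ahead by step each time
evens = list(range(2, 11, 2))
print(evens)
```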

## Python Basics

In no particular order yet:

- Variable types - integer, float, string, Boolean, `None`, list
- Basic math operations, reassigning variables (e.g. `a = a + 1`)
- Functions and methods, `sorted()`, chaining methods
- Lists - indexing, slicing, concatenation and other operations, `append()`, `extend()`, `remove()`, `range()`, copies vs. references, nested lists, square brackets vs. parentheses, membership testing with `in`
- Tuples and `zip()`, `set()`
- Loops - iterating over lists, counting with `enumerate()`, indentation
- Conditionals and logic - `if` statements, logical operators, `in`, `and`, `or`, `not`
- Dictionaries
- Strings - indexing, slicing, concatenation, long strings, `upper()`, `lower()`, `capitalize()`, `startswith()`, `endswith()`, `find()`, `replace()`
- Defining functions - e.g. `shouting()`, which converts text to upper case and replaces periods with exclamation points
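
Here is a minimal sketch of the `shouting()` idea from the list above. It assumes the function should upper-case the text and turn every period into an exclamation point; the exact behaviour is not spelled out in this outline, and the sample message is made up.

```python
def shouting(text):
    """Return text in upper case with periods replaced by exclamation points."""
    # Chain two string methods: upper() runs first, then replace()
    return text.upper().replace('.', '!')


message = 'I love Python. It is great.'
print(shouting(message))
```

Chaining works here because `upper()` returns a new string, which `replace()` then operates on.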

In [52]:
x = -20
print(x + 2)

-18


In [51]:
abs(x)

20

In [76]:
animals = ['dog', 'cat', 'rabbit', 'duck', 'goose']
print(animals)
print(sorted(animals))

['dog', 'cat', 'rabbit', 'duck', 'goose']
['cat', 'dog', 'duck', 'goose', 'rabbit']


In [77]:
mammals = animals[:3]
mammals

['dog', 'cat', 'rabbit']

In [78]:
mammals[1] = 'CAT'
mammals

['dog', 'CAT', 'rabbit']

In [79]:
animals

['dog', 'cat', 'rabbit', 'duck', 'goose']
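
The cells above show that a slice creates a new list, so changing `mammals` leaves `animals` untouched. For contrast, here is a small sketch of plain assignment, which only adds a second name for the same list. The names `pets`, `pets_alias`, and `pets_copy` are made up for illustration.

```python
pets = ['dog', 'cat', 'rabbit']

# Plain assignment does not copy: both names refer to the same list
pets_alias = pets
pets_alias[1] = 'CAT'
print(pets)        # the original list shows the change too

# Slicing the whole list (or calling pets.copy()) makes an independent copy
pets_copy = pets[:]
pets_copy[0] = 'DOG'
print(pets)        # not affected by the edit to pets_copy
print(pets_copy)
```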

## Built-in libraries

In [1]:
import calendar

In [8]:
calendar.isleap?

In [9]:
calendar.isleap(2018)

False

In [24]:
calendar.weekday?

In [12]:
calendar.weekday(2018, 5, 26)

5
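
The `5` returned above is a day number: the `calendar` module counts weekdays from 0 (Monday) to 6 (Sunday). As a small sketch, the number can be turned into a name with `calendar.day_name`. The names it returns depend on the current locale, and the variable names here are made up for illustration.

```python
import calendar

day_number = calendar.weekday(2018, 5, 26)

# day_name is a sequence of weekday names, indexed from Monday = 0
day_label = calendar.day_name[day_number]
print(day_number, day_label)
```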

In [14]:
from datetime import date
date.today()

datetime.date(2018, 5, 22)

In [7]:
import random

# Random decimal number in the interval [0, 1) (including 0, excluding 1)
x1 = random.random()
x1

0.5221095490284775

In [9]:
# Random integer in the range [a,b], including both end points
num = random.randint(1, 100)
print(num)


94


In [36]:
import pandas as pd

In [37]:
pd.read_csv?

In [80]:
weather = pd.read_csv('./data/weather_YVR.csv')
weather

Unnamed: 0,Datetime,Conditions,Temperature (C),Relativehumidity(%)
0,2018-05-21 00:00:00,Mostly Cloudy,12.9,78
1,2018-05-21 01:00:00,Mostly Cloudy,12.4,79
2,2018-05-21 02:00:00,Mostly Cloudy,12.8,83
3,2018-05-21 03:00:00,Cloudy,12.6,85
4,2018-05-21 04:00:00,Cloudy,12.3,84
5,2018-05-21 05:00:00,Cloudy,12.2,83
6,2018-05-21 06:00:00,Cloudy,12.3,83
7,2018-05-21 07:00:00,Mostly Cloudy,12.6,80
8,2018-05-21 08:00:00,Mostly Cloudy,13.1,79
9,2018-05-21 09:00:00,Mostly Cloudy,13.6,81


In [53]:
weather.head()

Unnamed: 0,Datetime,Conditions,Temperature (C),Relativehumidity(%)
0,2018-05-21 00:00:00,Mostly Cloudy,12.9,78
1,2018-05-21 01:00:00,Mostly Cloudy,12.4,79
2,2018-05-21 02:00:00,Mostly Cloudy,12.8,83
3,2018-05-21 03:00:00,Cloudy,12.6,85
4,2018-05-21 04:00:00,Cloudy,12.3,84


In [54]:
weather.head(3)

Unnamed: 0,Datetime,Conditions,Temperature (C),Relativehumidity(%)
0,2018-05-21 00:00:00,Mostly Cloudy,12.9,78
1,2018-05-21 01:00:00,Mostly Cloudy,12.4,79
2,2018-05-21 02:00:00,Mostly Cloudy,12.8,83


In [55]:
weather.tail(7)

Unnamed: 0,Datetime,Conditions,Temperature (C),Relativehumidity(%)
17,2018-05-21 17:00:00,Mainly Sunny,18.7,62
18,2018-05-21 18:00:00,Mainly Sunny,18.4,58
19,2018-05-21 19:00:00,Mainly Sunny,17.7,62
20,2018-05-21 20:00:00,Mainly Sunny,16.8,66
21,2018-05-21 21:00:00,Mainly Clear,14.0,80
22,2018-05-21 22:00:00,Mainly Clear,14.8,75
23,2018-05-21 23:00:00,Clear,13.5,76


In [56]:
weather.sample(4)

Unnamed: 0,Datetime,Conditions,Temperature (C),Relativehumidity(%)
22,2018-05-21 22:00:00,Mainly Clear,14.8,75
6,2018-05-21 06:00:00,Cloudy,12.3,83
4,2018-05-21 04:00:00,Cloudy,12.3,84
16,2018-05-21 16:00:00,Partly Cloudy,18.4,66


In [82]:
weather.describe()

Unnamed: 0,Temperature (C),Relativehumidity(%)
count,24.0,24.0
mean,14.770833,75.541667
std,2.248377,7.785042
min,12.2,58.0
25%,12.75,72.25
50%,14.0,78.5
75%,16.5,81.5
max,18.7,85.0


In [83]:
weather.max()

Datetime               2018-05-21 23:00:00
Conditions                   Partly Cloudy
Temperature (C)                       18.7
Relativehumidity(%)                     85
dtype: object

In [84]:
weather['Conditions'].value_counts()

Mostly Cloudy    11
Mainly Sunny      4
Cloudy            4
Partly Cloudy     2
Mainly Clear      2
Clear             1
Name: Conditions, dtype: int64

## Filters

In [85]:
mainly_sunny = weather['Conditions'] == 'Mainly Sunny'
mainly_sunny

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17     True
18     True
19     True
20     True
21    False
22    False
23    False
Name: Conditions, dtype: bool

In [86]:
weather[mainly_sunny]

Unnamed: 0,Datetime,Conditions,Temperature (C),Relativehumidity(%)
17,2018-05-21 17:00:00,Mainly Sunny,18.7,62
18,2018-05-21 18:00:00,Mainly Sunny,18.4,58
19,2018-05-21 19:00:00,Mainly Sunny,17.7,62
20,2018-05-21 20:00:00,Mainly Sunny,16.8,66


In [87]:
mainly_clear = weather['Conditions'] == 'Mainly Clear'
weather[mainly_clear]

Unnamed: 0,Datetime,Conditions,Temperature (C),Relativehumidity(%)
21,2018-05-21 21:00:00,Mainly Clear,14.0,80
22,2018-05-21 22:00:00,Mainly Clear,14.8,75
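
Filters like the ones above can also be combined. The sketch below uses `|` for "or" and `&` for "and" on a small stand-in dataframe built from a few rows of the output above, so that it runs without the CSV file. The name `sample`, the filter names, and the 17 C threshold are made up for illustration.

```python
import pandas as pd

# Small stand-in for the weather data, using rows shown in the output above
sample = pd.DataFrame({
    'Conditions': ['Mainly Sunny', 'Mainly Sunny', 'Mainly Clear',
                   'Mainly Clear', 'Clear'],
    'Temperature (C)': [18.7, 16.8, 14.0, 14.8, 13.5],
})

is_sunny = sample['Conditions'] == 'Mainly Sunny'
is_clear = sample['Conditions'] == 'Mainly Clear'
is_warm = sample['Temperature (C)'] > 17

# | means "or": rows that are Mainly Sunny or Mainly Clear
sunny_or_clear = sample[is_sunny | is_clear]

# & means "and": rows that are Mainly Sunny and warmer than 17 C
sunny_and_warm = sample[is_sunny & is_warm]

print(sunny_or_clear)
print(sunny_and_warm)
```

Building each filter as a named variable first sidesteps the need for extra parentheses around each comparison when combining them.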


In [88]:
temp_C = 23
temp_F = (1.8 * temp_C) + 32
# Note: the parentheses are not required here, but using them is a good habit
print(f'{temp_C} degrees C equals {temp_F} degrees F')

23 degrees C equals 73.4 degrees F


In [89]:
weather['Temperature (F)'] = (1.8 * weather['Temperature (C)']) + 32
weather.head()

Unnamed: 0,Datetime,Conditions,Temperature (C),Relativehumidity(%),Temperature (F)
0,2018-05-21 00:00:00,Mostly Cloudy,12.9,78,55.22
1,2018-05-21 01:00:00,Mostly Cloudy,12.4,79,54.32
2,2018-05-21 02:00:00,Mostly Cloudy,12.8,83,55.04
3,2018-05-21 03:00:00,Cloudy,12.6,85,54.68
4,2018-05-21 04:00:00,Cloudy,12.3,84,54.14


In [90]:
weather['Temperature (F)']

0     55.22
1     54.32
2     55.04
3     54.68
4     54.14
5     53.96
6     54.14
7     54.68
8     55.58
9     56.48
10    57.20
11    59.72
12    60.62
13    61.34
14    61.52
15    63.32
16    65.12
17    65.66
18    65.12
19    63.86
20    62.24
21    57.20
22    58.64
23    56.30
Name: Temperature (F), dtype: float64

In [91]:
weather2 = weather

In [92]:
weather2['Conditions'] = 'Raining cats and dogs'
weather2

Unnamed: 0,Datetime,Conditions,Temperature (C),Relativehumidity(%),Temperature (F)
0,2018-05-21 00:00:00,Raining cats and dogs,12.9,78,55.22
1,2018-05-21 01:00:00,Raining cats and dogs,12.4,79,54.32
2,2018-05-21 02:00:00,Raining cats and dogs,12.8,83,55.04
3,2018-05-21 03:00:00,Raining cats and dogs,12.6,85,54.68
4,2018-05-21 04:00:00,Raining cats and dogs,12.3,84,54.14
5,2018-05-21 05:00:00,Raining cats and dogs,12.2,83,53.96
6,2018-05-21 06:00:00,Raining cats and dogs,12.3,83,54.14
7,2018-05-21 07:00:00,Raining cats and dogs,12.6,80,54.68
8,2018-05-21 08:00:00,Raining cats and dogs,13.1,79,55.58
9,2018-05-21 09:00:00,Raining cats and dogs,13.6,81,56.48


In [93]:
# Our original dataframe is modified too!
# If we try to re-run any of the code in earlier cells, such as
# weather['Conditions'].value_counts(), the results will be wrong
weather

Unnamed: 0,Datetime,Conditions,Temperature (C),Relativehumidity(%),Temperature (F)
0,2018-05-21 00:00:00,Raining cats and dogs,12.9,78,55.22
1,2018-05-21 01:00:00,Raining cats and dogs,12.4,79,54.32
2,2018-05-21 02:00:00,Raining cats and dogs,12.8,83,55.04
3,2018-05-21 03:00:00,Raining cats and dogs,12.6,85,54.68
4,2018-05-21 04:00:00,Raining cats and dogs,12.3,84,54.14
5,2018-05-21 05:00:00,Raining cats and dogs,12.2,83,53.96
6,2018-05-21 06:00:00,Raining cats and dogs,12.3,83,54.14
7,2018-05-21 07:00:00,Raining cats and dogs,12.6,80,54.68
8,2018-05-21 08:00:00,Raining cats and dogs,13.1,79,55.58
9,2018-05-21 09:00:00,Raining cats and dogs,13.6,81,56.48
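
One way to avoid this surprise is to make an independent copy with the dataframe's `copy()` method rather than assigning a second name. The sketch below uses a small stand-in dataframe built from the first three rows shown earlier, so that it runs without the CSV file. The names `original` and `independent` are made up for illustration.

```python
import pandas as pd

# Small stand-in for the weather data (first three rows shown earlier)
original = pd.DataFrame({
    'Conditions': ['Mostly Cloudy', 'Mostly Cloudy', 'Mostly Cloudy'],
    'Temperature (C)': [12.9, 12.4, 12.8],
})

# copy() creates an independent dataframe instead of a second name
independent = original.copy()
independent['Conditions'] = 'Raining cats and dogs'

print(original['Conditions'].tolist())     # still the original conditions
print(independent['Conditions'].tolist())  # only the copy was changed
```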
