# Intro to Pandas
by Ryan Orsinger

## Module 1: Intro to pandas series
- Creating series from Python collections
- Describing
- Doing math on series of numbers
- Filtering series
- Operating on series of strings
- Using built-in series attributes and methods

## What is pandas?
- The leading data analysis library for Python
- Built for acquiring, cleaning, organizing, and analyzing data

## So what?
- Pandas is ubiquitous for data tasks in Python
- Pandas is also *fast*: its vectorized operations usually run much faster than equivalent loops in base Python
- Enables accomplishing more with less code
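
To make "more with less code" concrete, here is a small sketch that doubles a list of numbers two ways: with a base Python list comprehension and with series arithmetic. The names `prices_list`, `doubled_list`, and `doubled_series` are made up for this illustration, and the sketch shows the difference in code only, not a timing comparison.

```python
import pandas as pd

# Hypothetical numbers, used only for illustration
prices_list = [10, 20, 30]

# Base Python: visit each element explicitly
doubled_list = [price * 2 for price in prices_list]

# pandas: the operator applies to every element at once
doubled_series = pd.Series(prices_list) * 2

print(doubled_list)             # [20, 40, 60]
print(doubled_series.tolist())  # [20, 40, 60]
```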

### Pandas Series Part 1 of 3
- Creating series objects
- Assigning series
- Doing math on series
- Describing a series

In [1]:
import pandas as pd

In [2]:
pd.Series([7, 8, 9])

0    7
1    8
2    9
dtype: int64

In [3]:
# Assigning a series to a variable 
results = pd.Series([True, False, True])
results

0     True
1    False
2     True
dtype: bool

In [4]:
# A series can hold values of any Python data type
colors = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
colors = pd.Series(colors)
colors

0       red
1    orange
2    yellow
3     green
4      blue
5    indigo
6    violet
dtype: object

In [5]:
# We can pass a range to make a series of numbers
numbers = pd.Series(range(-3, 3))
numbers

0   -3
1   -2
2   -1
3    0
4    1
5    2
dtype: int64

In [6]:
# We can do arithmetic on entire series with our math operators
numbers + 1

0   -2
1   -1
2    0
3    1
4    2
5    3
dtype: int64

In [7]:
# Pandas, like Python, follows PEMDAS order of operations
numbers * 2 + 5

0   -1
1    1
2    3
3    5
4    7
5    9
dtype: int64

In [8]:
# Notice how Python's built-in operators work on the entire series
numbers ** 2

0    9
1    4
2    1
3    0
4    1
5    4
dtype: int64

In [9]:
# We can take the square root by raising to the 1/2 power
# Negative numbers have no real square root, so they produce NaN
numbers ** (1/2)

0         NaN
1         NaN
2         NaN
3    0.000000
4    1.000000
5    1.414214
dtype: float64
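
The `NaN` entries above are missing values. As a minimal sketch, one way to find out how many there are is the `.isna()` method, which marks each missing value with `True`. The variable name `roots` is an assumption introduced here for illustration.

```python
import pandas as pd

numbers = pd.Series(range(-3, 3))

# "roots" is a hypothetical name for the square-root result
roots = numbers ** (1/2)

# .isna() returns True for each NaN; summing counts the True values
print(roots.isna().sum())  # 3
```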

In [10]:
# Notice that arithmetic does not change the original series
numbers

0   -3
1   -2
2   -1
3    0
4    1
5    2
dtype: int64

In [11]:
# Assigning the result of an operation to a new variable
tripled = numbers * 3
tripled

0   -9
1   -6
2   -3
3    0
4    3
5    6
dtype: int64

In [12]:
prices = pd.Series([1.30, 2.50, 2.50, 5.60, 10.10])
prices

0     1.3
1     2.5
2     2.5
3     5.6
4    10.1
dtype: float64

In [13]:
# Reassigning a variable to overwrite the values with the result of an operation
prices = prices * .8
prices

0    1.04
1    2.00
2    2.00
3    4.48
4    8.08
dtype: float64

In [14]:
sequence = pd.Series([7, 8, 8, 9, 9, 9])
sequence

0    7
1    8
2    8
3    9
4    9
5    9
dtype: int64

In [15]:
# The .index attribute returns information about the index
# Zero-based integer indexes are the default
# Pandas can also use strings and dates as index values
sequence.index

RangeIndex(start=0, stop=6, step=1)
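
The comment above mentions that index values can also be strings. A small sketch of that idea follows; the labels `"a"`, `"b"`, `"c"` and the name `labeled` are made up for illustration.

```python
import pandas as pd

# Hypothetical string labels passed through the index argument
labeled = pd.Series([7, 8, 9], index=["a", "b", "c"])

print(labeled.index.tolist())  # ['a', 'b', 'c']

# Values can then be looked up by label
print(labeled["b"])  # 8
```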

In [16]:
# The .dtype attribute returns the data type of the values in the series
sequence.dtype

dtype('int64')

In [17]:
# The .values attribute returns only the values of the series, as a NumPy array
sequence.values

array([7, 8, 8, 9, 9, 9])
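
The `array(...)` in that output is a NumPy array. As a quick check, the sketch below confirms the type, and also shows the `.to_numpy()` method, which returns the same kind of object for a series like this one.

```python
import numpy as np
import pandas as pd

sequence = pd.Series([7, 8, 8, 9, 9, 9])

# .values hands back a NumPy array
print(type(sequence.values))  # <class 'numpy.ndarray'>

# .to_numpy() is a method that also returns a NumPy array
print(isinstance(sequence.to_numpy(), np.ndarray))  # True
```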

In [18]:
# The .shape attribute returns a tuple with the number of elements in the series
sequence.shape

(6,)

In [19]:
# .value_counts() returns a frequency count of values
# In the result, the index holds each distinct value and the values hold its count
sequence.value_counts()

9    3
8    2
7    1
dtype: int64
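
If proportions are more useful than raw counts, `.value_counts()` accepts a `normalize=True` argument. A minimal sketch, using the hypothetical name `proportions` for the result:

```python
import pandas as pd

sequence = pd.Series([7, 8, 8, 9, 9, 9])

# normalize=True returns each value's share of the total instead of its count
proportions = sequence.value_counts(normalize=True)

# 9 appears 3 times out of 6
print(proportions.loc[9])  # 0.5
```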

In [20]:
# Mode is the most frequently occurring value in a dataset
sequence.mode()

0    9
dtype: int64

In [21]:
# Median is the middle value of the sorted data
# With an even number of values, it is the average of the two middle values
sequence.median()

8.5

In [22]:
# Mean is the arithmetic average
sequence.mean()

8.333333333333334

In [23]:
# Standard deviation is a measure of spread
sequence.std()

0.816496580927726
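
By default, `.std()` computes the *sample* standard deviation, which divides by `n - 1`. Passing `ddof=0` divides by `n` instead, giving the population standard deviation. The sketch below compares the two; the names `sample_std` and `population_std` are made up for illustration.

```python
import pandas as pd

sequence = pd.Series([7, 8, 8, 9, 9, 9])

# Default (ddof=1): divide the squared deviations by n - 1
sample_std = sequence.std()

# ddof=0: divide the squared deviations by n
population_std = sequence.std(ddof=0)

print(round(sample_std, 4))      # 0.8165
print(round(population_std, 4))  # 0.7454
```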

In [24]:
# The .describe method outputs descriptive statistics
prices.describe()

count    5.000000
mean     3.520000
std      2.849842
min      1.040000
25%      2.000000
50%      2.000000
75%      4.480000
max      8.080000
dtype: float64
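
`.describe()` also works on a series of strings, where it reports a different set of statistics. A minimal sketch, using a made-up series named `shades`:

```python
import pandas as pd

# Hypothetical string data, used only for illustration
shades = pd.Series(["red", "red", "blue"])
summary = shades.describe()

print(summary["count"])   # 3  (number of values)
print(summary["unique"])  # 2  (number of distinct values)
print(summary["top"])     # red  (most frequent value)
print(summary["freq"])    # 2  (how often the top value occurs)
```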

### Exercise check-in, part 1 of 3
- Create a series named `a` that is the numbers `[1, 2, 3, 4, 5]`
- Create a series named `b` that is the numbers `[1, 1, 2, 3, 5]`
- Square `a` and reassign to the variable `a`
- Square `b` and reassign to the variable `b`
- Add the squared `a` and `b`. Assign the result to a variable named `sum_of_squares`
- Now take the square root of that sum (*hint*: raising to the 0.5 power takes the square root)

In [25]:
# Create a series named "a" and assign it the numbers [1, 2, 3, 4, 5]

In [26]:
# Create a series named "b" and assign it the numbers [1, 1, 2, 3, 5]

In [27]:
# Square all the numbers in a and reassign the result to a

In [28]:
# Square all the numbers in b and reassign the result to b

In [29]:
# Create a series named sum_of_squares that holds the sum of the squared a and b

In [30]:
# Take the square root of sum_of_squares