## AGENDA

1. Series
2. Creating them
3. Retrieving with '[]', '.loc', '.iloc'
4. Indexes and how they affect retrieval
5. Broadcasting and retrieving

In [None]:
import pandas as pd

In [1]:
from pandas import Series

In [2]:
s = Series([10,20,30,40,50])

In [3]:
s

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [4]:
s = Series([10,20,30.5,40,50])

In [5]:
s

0    10.0
1    20.0
2    30.5
3    40.0
4    50.0
dtype: float64

## Lists vs. series

A list can contain any objects we want. Its elements are traditionally all of the same type, but they don't have to be.

In a series, all values *must* be of the same dtype.
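
To make the "same dtype" rule concrete, here is a small sketch of what pandas does when the values you pass in aren't all of the same type. The variable names are just for illustration; the exact integer dtype can depend on your platform, so the comments only describe the general behavior.

```python
from pandas import Series

ints = Series([10, 20, 30])         # only integers, so an integer dtype
mixed = Series([10, 20, 30.5])      # one float turns every value into a float
texty = Series([10, 'abc', 30.5])   # no common numeric type, so dtype is object

print(ints.dtype)
print(mixed.dtype)
print(texty.dtype)
```

In other words, pandas doesn't refuse mixed input. It picks one dtype that can hold everything, which is why the 10 above came back as 10.0.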

In [6]:
# How can I retrieve from a series?

s[0]

10.0

In [7]:
s[1]

20.0

In [8]:
type(s[1])

numpy.float64

In [9]:
s[-1] #can I get the final value?

KeyError: -1

## Don't use '[]' by themselves on a series!

Yes, '[]' will work when you're working with a series. But it's a bad habit to get into, especially since once we start to work with data frames, we'll be using '[]' to refer to the columns, rather than the rows.

What do you use instead?

You can use '.loc' and '.iloc'.

With the default index (0, 1, 2, ...), these are (almost) identical in behavior, because each value's index label is the same as its position. However, we will soon see that they don't have to be.

The basic idea is that you use '.loc' with '[]' after it, and the index label you want inside of the '[]'.

In [10]:
s.loc[0]

10.0

In [11]:
s.loc[1]

20.0

In [12]:
# What, then, is .iloc if .loc uses the index?
# .iloc uses the position, starting with 0

s.iloc[0]

10.0

In [13]:
s.iloc[1]

20.0
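
Earlier, `s[-1]` failed with a KeyError. Here is a short sketch of how the same question ("can I get the final value?") plays out with '.iloc' and '.loc'. The names `last`, `first_two` and `loc_failed` are only for this example.

```python
from pandas import Series

s = Series([10, 20, 30.5, 40, 50])

# .iloc works by position, so negative positions count from the end,
# just as they do with a list
last = s.iloc[-1]
first_two = s.iloc[:2]   # positional slice; the end position is excluded

# .loc works by index label, and there is no label -1 in the default index
try:
    s.loc[-1]
    loc_failed = False
except KeyError:
    loc_failed = True

print(last)
print(first_two)
print(loc_failed)
```

This is one of the places where '.loc' and '.iloc' already differ, even with the default index.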

## Methods we can run on our series

- 'min'
- 'max'
- 'mean'
- 'std'
- 'count' (how many non-NaN values are there)
- 'median'
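
Here is a quick sketch of calling each of these methods on a small series. The second series, `with_gap`, is made up for this example to show how 'count' skips NaN values.

```python
from pandas import Series

s = Series([10, 20, 30, 40, 50])

print(s.min())      # smallest value
print(s.max())      # largest value
print(s.mean())     # arithmetic mean
print(s.std())      # sample standard deviation
print(s.count())    # number of non-NaN values
print(s.median())   # middle value

# count ignores NaN, so it can be smaller than the length of the series
with_gap = Series([10, float('nan'), 30])
print(len(with_gap), with_gap.count())
```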

## Exercise: Weather report

1. Define a series containing the forecast maximum temperature where you live for each of the coming 10 days.
2. What will be the mean temperature?
3. What will be the median? Are the two significantly different, and does that matter?

In [18]:
temp = Series([70,75,80,84,65,70,72,75,73,69])

In [19]:
temp.mean()

73.3

In [20]:
temp.median()

72.5
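
One way to think about the "does that matter?" question is to see what a single extreme value does to each number. This is only a sketch: the 120-degree day in `temp_outlier` is invented for illustration and isn't part of the exercise data.

```python
from pandas import Series

temp = Series([70, 75, 80, 84, 65, 70, 72, 75, 73, 69])

# Same forecast, but with the 84 replaced by a made-up extreme value
temp_outlier = Series([70, 75, 80, 120, 65, 70, 72, 75, 73, 69])

print(temp.mean(), temp.median())
print(temp_outlier.mean(), temp_outlier.median())
```

The mean moves noticeably with the outlier, while the median stays where it was. When the two are far apart, that's often a hint that a few values are pulling the mean in one direction.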

## Setting the index

An index in a Pandas series can be of almost any data type, and can contain whatever values you want.

You can (almost) think of it as a dictionary, but with even fewer restrictions on what it can contain.
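
As a rough sketch of the dictionary comparison, here is a series with string labels, plus one with a repeated label, which a dictionary's keys wouldn't allow. The labels and the names `prices` and `dup` are made up for this example.

```python
from pandas import Series

# String labels: .loc looks up by label, .iloc still looks up by position
prices = Series([10, 20, 30], index=['a', 'b', 'c'])
print(prices.loc['b'])
print(prices.iloc[1])

# Unlike dictionary keys, index labels may repeat
dup = Series([1, 2, 3], index=['x', 'x', 'y'])
print(dup.loc['x'])   # both values labeled 'x' come back, as a series
```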

In [23]:
s = Series([10,20,30,40,50])
s.index  # what is the index of my series?

RangeIndex(start=0, stop=5, step=1)

In [25]:
# I can replace the index by assigning to it
# so long as I assign the right number of values, that's fine

s.index = [2,3,6,8,10]

In [26]:
s

2     10
3     20
6     30
8     40
10    50
dtype: int64

In [29]:
s.loc[6]

30

In [30]:
s.iloc[6]

IndexError: single positional indexer is out-of-bounds
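
The two results above are where '.loc' and '.iloc' stop being interchangeable. Here is a small sketch that spells out the difference with the same index; the names `by_label` and `by_position` are just for illustration.

```python
from pandas import Series

s = Series([10, 20, 30, 40, 50], index=[2, 3, 6, 8, 10])

by_label = s.loc[6]       # the value whose index label is 6
by_position = s.iloc[2]   # the value at position 2, i.e. the third one

print(by_label, by_position)

# There are only 5 values, so the valid positions run from 0 through 4
try:
    s.iloc[6]
except IndexError as err:
    print('IndexError:', err)
```

So `s.loc[6]` and `s.iloc[2]` reach the same value here, one by its label and the other by where it sits.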

In [31]:
x = 'abcd'
x.upper()

'ABCD'

In [33]:
str.upper(x)

'ABCD'

In [34]:
type(s)

pandas.core.series.Series

In [35]:
type(pd.core.series.Series.index)

NameError: name 'pd' is not defined

In [36]:
s.index = list('abcd')
s

ValueError: Length mismatch: Expected axis has 5 elements, new values have 4 elements
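
The error comes from assigning 4 labels to a series with 5 values. A minimal sketch of the same assignment with the right number of labels (the fifth label, 'e', is simply my choice here):

```python
from pandas import Series

s = Series([10, 20, 30, 40, 50])

# Four labels for five values: pandas refuses the assignment
try:
    s.index = list('abcd')
except ValueError as err:
    print('ValueError:', err)

# Five labels for five values works
s.index = list('abcde')
print(s)
print(s.loc['c'])
```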