# Agenda

1. Series
2. Creating them
3. Retrieving with `[]`, `.loc`, `.iloc`
4. Indexes and how they affect retrieval
5. Broadcasting and retrieving

In [1]:
import pandas as pd

In [2]:
from pandas import Series

In [3]:
s = Series([10, 20, 30, 40, 50])

In [4]:
s

0    10
1    20
2    30
3    40
4    50
dtype: int64

In [5]:
s = Series([10, 20, 30.5, 40, 50])

In [6]:
s

0    10.0
1    20.0
2    30.5
3    40.0
4    50.0
dtype: float64

# Lists vs series

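A list can contain any objects we want. By convention its items are all of the same type, but they don't have to be.

In a series, all values *must* be of the same dtype. That is why including `30.5` above turned every value into a float: the whole series got the dtype `float64`.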
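Here is a minimal sketch of what "same dtype" means in practice. The variable names are made up for illustration, and the exact integer dtype can depend on your platform. pandas picks one dtype that can hold every value, and falls back to `object` when no numeric dtype fits:

```python
import pandas as pd

ints = pd.Series([10, 20, 30])          # all integers: an integer dtype
floats = pd.Series([10, 20, 30.5])      # one float forces every value to float64
mixed = pd.Series([10, 'abc', 30.5])    # no common numeric type: dtype is object

print(ints.dtype, floats.dtype, mixed.dtype)
print(floats.iloc[0])   # the integer 10 is now stored as a float
```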

In [7]:
# How can I retrieve from a series?

s[0]

10.0

In [8]:
s[1]

20.0

In [9]:
type(s[1])

numpy.float64

In [10]:
s[-1]  # can I get the final value?

KeyError: -1

# Don't use `[]` by themselves on a series!

Yes, `[]` will often work on a series. But it's a bad habit to get into, especially since when we start to work with data frames, we'll be using `[]` to refer to the columns, rather than to the rows.

What do you use instead?

You can use `.loc` and `.iloc`.

Right now, with the default index that numbers the values from 0, these are (almost) identical in behavior. However, we will soon see that they don't have to be.

The basic idea is that you put `[]` after `.loc`, with the index value you want inside the `[]`.

In [11]:
s.loc[0]

10.0

In [12]:
s.loc[1]

20.0

In [13]:
# what, then, is .iloc if .loc uses the index?
# .iloc uses the position, starting with 0

s.iloc[0]

10.0

In [14]:
s.iloc[1]

20.0
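This also answers the earlier question about getting the final value. A small sketch, assuming the same series with its default index: `.iloc` accepts negative positions the way a list does, while `.loc` looks for an index value of `-1` and doesn't find one.

```python
import pandas as pd

s = pd.Series([10, 20, 30.5, 40, 50])

last = s.iloc[-1]        # position -1 counts from the end, as with a list
print(last)

try:
    s.loc[-1]            # .loc looks for the *label* -1, which doesn't exist
except KeyError as e:
    print('KeyError:', e)
```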

# Methods we can run on our series

- `min`
- `max`
- `mean`
- `std`
- `count` (how many non-NaN values are in there)
- `median`
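A quick sketch of these methods on a small series. The values are made up, and one of them is `NaN` to show how `count` differs from `len`:

```python
import numpy as np
import pandas as pd

temps = pd.Series([10, 20, np.nan, 40])   # hypothetical data with one missing value

print(temps.min())      # smallest value, ignoring NaN
print(temps.max())      # largest value, ignoring NaN
print(temps.mean())     # mean of the three non-NaN values
print(temps.median())   # middle of the three non-NaN values
print(temps.std())      # sample standard deviation
print(temps.count())    # 3: NaN is not counted
print(len(temps))       # 4: len counts every element
```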

# Exercise: Weather report

1. Define a series containing the forecast max temperature where you live for each of the coming 10 days.
2. What will be the mean temperature?
3. What will be the median? Are they significantly different, and does that matter?


In [15]:
s = Series([33, 38, 27, 24, 23, 24, 25, 27, 32, 35])
s

0    33
1    38
2    27
3    24
4    23
5    24
6    25
7    27
8    32
9    35
dtype: int64

In [16]:
s.mean()

28.8

In [17]:
s.sum() / s.count()

28.8

In [18]:
s.median()

27.0
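One way to think about whether the difference matters: the mean is pulled by extreme values, while the median mostly isn't. A sketch, assuming a hypothetical heat spike where the 38 in the forecast is replaced by 58:

```python
import pandas as pd

# Same forecast as above, but with a made-up extreme value (58 instead of 38)
spike = pd.Series([33, 58, 27, 24, 23, 24, 25, 27, 32, 35])

print(spike.mean())     # the single hot day raises the mean
print(spike.median())   # the middle of the sorted values is unchanged
```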

# Setting the index

An index in a pandas series can contain almost any values you want: integers, strings, dates, and so on. (Technically, any hashable object will do.)

You can (almost) think of the index as the keys of a dictionary, but with even fewer restrictions on what it can contain. As we'll see, index values can even repeat.
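To make the dictionary comparison concrete, here is a small sketch with made-up data. Passing a dict to `Series` turns its keys into the index, and `in` checks the index the way it checks keys on a dict:

```python
import pandas as pd

prices = {'apple': 10, 'banana': 20, 'cherry': 30}   # hypothetical data
s = pd.Series(prices)      # dict keys become the index, in insertion order

print(s.index)
print(s.loc['banana'])     # look up by index value, much like prices['banana']
print('banana' in s)       # `in` checks the index, like a dict checks its keys
print(20 in s)             # False: 20 is a value, not an index entry
```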

In [19]:
s = Series([10, 20, 30, 40, 50])
s.index  # what is the index on my series?

RangeIndex(start=0, stop=5, step=1)

In [20]:
# I can replace the index by assigning to it
# so long as I assign the right number of values, that's fine

s.index = [2,4,6,8,10]

In [21]:
s

2     10
4     20
6     30
8     40
10    50
dtype: int64

In [22]:
# now we can see the difference between .loc and .iloc

s.loc[6]

30

In [23]:
s.iloc[6]

IndexError: single positional indexer is out-of-bounds
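A short sketch tying the two together, using the same series. `s.index.get_loc` reports the position of an index value, so we can reach the same element either way:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=[2, 4, 6, 8, 10])

by_label = s.loc[6]              # the element whose index value is 6
position = s.index.get_loc(6)    # where that index value sits, counting from 0
by_position = s.iloc[position]   # the same element, reached by position

print(by_label, position, by_position)
```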

In [24]:
x = 'abcd'
x.upper()

'ABCD'

In [25]:
str.upper(x)

'ABCD'

In [26]:
type(s)

pandas.core.series.Series

In [28]:
type(pd.core.series.Series.index)

pandas._libs.properties.AxisProperty

In [29]:
s.index = list('abcde')
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [30]:
s.loc['b']

20

In [31]:
s.loc['c']

30

In [32]:
s.iloc[1]

20

In [33]:
s.iloc[2]

30

In [34]:
s.iloc[3]

40

In [37]:
# I can always use : to show a range
s.iloc[1:4]  # up to and *NOT* including

b    20
c    30
d    40
dtype: int64

In [38]:
s.loc['b':'d']   # up to and *INCLUDING*

b    20
c    30
d    40
dtype: int64

When I say

    s.iloc[a:b]

we get up to and *not* including `b`. But if I say

    s.loc[a:b]

we get up to and *including* `b`!
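The difference is easiest to trip over when the index contains integers, because the two slices look identical. A sketch, assuming the `[2, 4, 6, 8, 10]` index from earlier:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=[2, 4, 6, 8, 10])

by_label = s.loc[2:6]       # index values 2 through 6, *including* 6
by_position = s.iloc[2:6]   # positions 2, 3, 4; a stop past the end is cut off

print(list(by_label))
print(list(by_position))
```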



In [39]:
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [41]:
s = Series([10, 20, 30, 40, 50])

s.loc[2]

30

In [42]:
s.loc[4]

50

In [43]:
# what if I want both of them?
s.loc[ [2,4] ]     # this is known as "fancy indexing" -- I can retrieve more than one value at a time

2    30
4    50
dtype: int64

In [44]:
s.loc[ [2,4] ].mean()

40.0

In [45]:
# what if our series has a custom index?

s.index = list('abcde')

s.loc[['b', 'd']]

b    20
d    40
dtype: int64
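Fancy indexing works with `.iloc` too, and the result follows the list you pass, including its order and any repeats. A minimal sketch on the same series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list('abcde'))

by_position = s.iloc[[1, 3]]          # positions 1 and 3: same elements as ['b', 'd']
reordered = s.loc[['d', 'b', 'd']]    # results follow the order of the list, repeats included

print(by_position)
print(reordered)
```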

In [46]:
# I can set the index when I create the series by passing the keyword argument index= and a list of values

s = Series([10, 20, 30, 40, 50],
           index=list('abcde'))
s

a    10
b    20
c    30
d    40
e    50
dtype: int64

In [47]:
# how about this?

s = Series([10, 20, 30, 40, 50],
           index=list('abcab'))
s

a    10
b    20
c    30
a    40
b    50
dtype: int64

In [48]:
# you can have an index that repeats!

s.loc['a']

a    10
a    40
dtype: int64

In [49]:
s.loc[['a', 'b']]

a    10
a    40
b    20
b    50
dtype: int64

In [50]:
s.loc['a':'c']  # slice from a-c

KeyError: "Cannot get left slice bound for non-unique label: 'a'"
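The slice fails because the repeated index values are scattered, so pandas can't tell where `'a'` starts. One possible workaround, sketched here, is to sort the index first; with equal index values next to each other, the slice has a clear start and end:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=list('abcab'))

print(s.index.is_unique)                 # False: 'a' and 'b' each appear twice

s_sorted = s.sort_index()                # equal index values are now adjacent
print(s_sorted.index.is_monotonic_increasing)
print(s_sorted.loc['a':'b'])             # every 'a' element and every 'b' element
```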

In [51]:
s.index

Index(['a', 'b', 'c', 'a', 'b'], dtype='object')

In [52]:
list(s.index)

['a', 'b', 'c', 'a', 'b']
