## Querying Series

In [1]:
# A pandas Series can be queried either by index position or by index label. If you don't give an 
# index to the series when creating it, the position and the label are effectively the same values. 
# To query by numeric location, starting at zero, use the iloc attribute. To query by the index 
# label, use the loc attribute.

# Let's start with an example. We'll use students enrolled in classes, stored in a dictionary.

import pandas as pd

student_classes = {'Alice': 'Physics',
                  'Jack': 'Chemistry',
                  'Molly' : 'English',
                  'Sam' : 'History'}

s = pd.Series(student_classes)
s

Alice      Physics
Jack     Chemistry
Molly      English
Sam        History
dtype: object

In [3]:
# So, for this series, if we wanted to see the fourth entry we would use the iloc 
# attribute with the parameter 3.

s.iloc[3]

'History'

In [4]:
# If we wanted to see what class Molly has, we would use the loc attribute with a parameter 
# of 'Molly'.
s.loc['Molly']

'English'
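
To tie this back to the opening remark about a series created without an index, here is a minimal sketch. The `letters` series is a made-up example, not part of the course data. With the default integer index, the position and the label coincide, so `iloc` and `loc` return the same entry.

```python
import pandas as pd

# Hypothetical series created without an explicit index, so pandas
# assigns the default integer labels 0, 1, 2.
letters = pd.Series(['a', 'b', 'c'])

by_position = letters.iloc[1]  # second entry, counted from zero
by_label = letters.loc[1]      # entry whose index label is 1

print(by_position, by_label)
```

Once the index holds anything other than 0, 1, 2, ... in order, the two lookups can diverge, which is why the distinction matters.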

In [5]:
# Keep in mind that iloc and loc are not methods, they are attributes. So you don't use 
# parentheses to query them, but square brackets instead, which is called the indexing operator. 
# In Python this calls the object's __getitem__ or __setitem__ method, depending on whether the 
# expression is being read or assigned to.

# This might seem a bit confusing if you're used to languages where encapsulation of attributes, 
# variables, and properties is common, such as in Java.
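
As a rough illustration of the get and set behaviour described above, the sketch below rebuilds the student series under the assumed name `classes` and works on a copy called `updated`, so the original `s` is left alone. Calling `__getitem__` directly is only for demonstration and is not something you would normally write.

```python
import pandas as pd

# Same data as the student series above, under a separate (assumed) name.
classes = pd.Series({'Alice': 'Physics',
                     'Jack': 'Chemistry',
                     'Molly': 'English',
                     'Sam': 'History'})

# Reading with square brackets is a call to __getitem__ on the loc indexer.
same_read = classes.loc['Molly'] == classes.loc.__getitem__('Molly')

# Assigning with square brackets is a call to __setitem__ instead.
updated = classes.copy()
updated.loc['Molly'] = 'Biology'

print(same_read, updated.loc['Molly'], classes.loc['Molly'])
```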

In [7]:
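# Pandas tries to make our code a bit more readable and provides a sort of smart syntax using 
# the indexing operator directly on the series itself. For instance, if you pass in an integer 
# while the index labels are not integers, the operator behaves as if you were querying via the 
# iloc attribute. Recent pandas releases emit a FutureWarning for this positional fallback and 
# recommend writing s.iloc[3] instead.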
s[3]

'History'

In [8]:
# If you pass in a label, such as a string, it will query as if you wanted to use the 
# label-based loc attribute.
s['Molly']

'English'

In [10]:
# So what happens if your index is a list of integers? This is a bit complicated and Pandas can't 
# determine automatically whether you're intending to query by index position or index label. So 
# you need to be careful when using the indexing operator on the Series itself. The safer option 
# is to be more explicit and use the iloc or loc attributes directly.

# Here's an example using classes and their class codes, where the classes are indexed by 
# class code, in the form of integers.


class_code = {99: 'Physics',
             100: 'Chemistry',
             101: 'English',
             102: 'History'}

s = pd.Series(class_code)
s

99       Physics
100    Chemistry
101      English
102      History
dtype: object

In [13]:
# If we try to call s[0] we get a KeyError because there's no item in the series with 
# an index label of zero. Instead, we have to call iloc explicitly if we want the first item.

# s[0] would raise a KeyError
s.iloc[0]

'Physics'
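
The failing lookup can be observed safely by catching the exception. This is a minimal sketch that rebuilds the class code series under the assumed name `codes`; the `outcome` variable is only there to record what happened.

```python
import pandas as pd

# Same class code data as above, under a separate (assumed) name.
codes = pd.Series({99: 'Physics',
                   100: 'Chemistry',
                   101: 'English',
                   102: 'History'})

# With an integer index, codes[0] is treated as a label lookup,
# and there is no label 0 in this series.
try:
    codes[0]
    outcome = 'found'
except KeyError:
    outcome = 'KeyError'

first_by_position = codes.iloc[0]  # explicit positional lookup
physics_by_label = codes.loc[99]   # explicit label lookup

print(outcome, first_by_position, physics_by_label)
```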

In [14]:
# So s[0] doesn't call s.iloc[0] underneath as one might expect; instead it 
# raises an error.

# Now we know how to get data out of the series, let's talk about working with the data. A common 
# task is to want to consider all of the values inside of a series and do some sort of 
# operation. This could be trying to find a certain number, or summarizing data or transforming 
# the data in some way.

In [15]:
# A typical programmatic approach to this would be to iterate over all the items in the series, 
# and invoke the operation one is interested in. For instance, we could create a Series of 
# integers representing student grades, and compute the average grade.

In [18]:
grades = pd.Series([90, 80, 70, 60])

total = 0
for grade in grades:
    total += grade
print(total/len(grades))

75.0


In [19]:
# This works, but it's slow. Modern computers can perform many operations simultaneously, 
# particularly, though not only, mathematical ones.

# Pandas and the underlying numpy libraries support a method of computation called vectorization. 
# Vectorization works with most of the functions in the numpy library, including the sum function.

In [20]:
import numpy as np

total = np.sum(grades)
print(total/len(grades))

75.0
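
As a quick sanity check, the sketch below puts the loop and the vectorized call side by side, and adds the Series' own `mean` method as a third option, which the text above does not use. All three give the same average for this small series. The sketch checks agreement only and makes no claim about timing.

```python
import numpy as np
import pandas as pd

grades = pd.Series([90, 80, 70, 60])

# Explicit Python loop, as in the iterative approach above.
loop_total = 0
for grade in grades:
    loop_total += grade
loop_mean = loop_total / len(grades)

# Vectorized sum via numpy, as in the cell above.
numpy_mean = np.sum(grades) / len(grades)

# The Series' own vectorized method (an addition to the original example).
method_mean = grades.mean()

print(loop_mean, numpy_mean, method_mean)
```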
