# Pandas Series

"pandas" is a Python package providing data structures to work on relational and labeled data. It is designed to be efficient and intuitive.

The two main data structures in pandas are <b>Series</b> and <b>DataFrame</b>. A Series is a one-dimensional labeled array, while a DataFrame is a two-dimensional labeled table. This module focuses on Series.

The convention is to import pandas as <i>pd</i>.

In [None]:
import pandas as pd

First, we load the data set <i>students.csv</i> (adjust the path below to match where the file is stored), store it in a DataFrame called <i>df</i>, and use the students' <i>Name</i> column as the index for easy identification. We'll talk more about pandas in the next module.

In [None]:
df = pd.read_csv('/content/drive/MyDrive/AMA/03a_PandasSeries/students.csv', index_col='Name')
# The index column can also be given by position, e.g. index_col=0

In [None]:
df

In [None]:
df.head()

Note that NaN indicates a missing value.

# Series

In this lecture, we will focus mostly on the column <i>hw1</i>. Let's make a Series of hw1 scores.

A Series is a one-dimensional array of data (<b>values</b>) and an associated array of data labels (<b>index</b>). In this example, the <b>index</b> is the student name and the <b>value</b> is the score in hw1.
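
To make the values/index pairing concrete, here is a minimal sketch that builds a small Series by hand. The names and scores are hypothetical and are not taken from <i>students.csv</i>.

```python
import pandas as pd

# Hypothetical names and scores, for illustration only
demo = pd.Series([7.5, 9.0, 4.0], index=['Ana', 'Ben', 'Cleo'], name='hw1')

print(demo.index.tolist())  # the labels: ['Ana', 'Ben', 'Cleo']
print(demo.tolist())        # the values: [7.5, 9.0, 4.0]
print(demo['Ben'])          # look up a value by its label: 9.0
```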

You can access a column of the DataFrame using either bracket notation **[' ']** or dot notation **.**

In [None]:
hw1 = df['hw1']
# Or equivalently:
# hw1 = df.hw1

In [None]:
hw1

In [None]:
type(hw1)

## Properties of a Series: index and values

The `index` attribute returns the labels as an Index object, and the `values` attribute returns the data as a NumPy ndarray.

In [None]:
hw1.index

In [None]:
type(hw1.index)

In [None]:
hw1.values

In [None]:
type(hw1.values)

The length of hw1, a.k.a., the number of elements:

In [None]:
len(hw1)

## Summary statistics using the describe() method

In [None]:
hw1.describe()
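
One detail worth keeping in mind when reading the summary: `len` counts every element, while the `count` entry reported by `describe()` counts only the non-missing ones. A small sketch with hypothetical scores (not the course data):

```python
import numpy as np
import pandas as pd

# Hypothetical scores; Ben's score is missing
demo = pd.Series([7.5, np.nan, 4.0, 9.0], index=['Ana', 'Ben', 'Cleo', 'Dev'])

summary = demo.describe()
print(len(demo))         # 4: every element, missing or not
print(summary['count'])  # 3.0: only the non-missing values
print(demo.count())      # 3: same count, as a method
```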

<div class="alert alert-block alert-info"> 
**Tech Note**: 
You can also use ***tab*** for **auto-completion**. Type the beginning of a function name, then press ***tab***. It will complete the function name or bring up a pop-up window listing the matching choices.
</div>

<div class="alert alert-block alert-info"> 
**Tech Note**: To bring up the **on-line help** for a particular function, type the function name, then press ***shift-tab***. 
</div>

## Aggregate functions (max, min, mean, ...)

An aggregate function performs a calculation on a set of values, and returns a single value. `pandas.Series` offers several such aggregate functions.

The maximum and minimum grades among all students:

In [None]:
hw1.max()

In [None]:
hw1.min()

The average grade among all students

In [None]:
hw1.mean()
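
Since the data contains NaN, it helps to know that these aggregate functions skip missing values by default. The sketch below uses hypothetical scores to show the default behavior and the `skipna=False` alternative.

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one missing value
demo = pd.Series([8.0, np.nan, 4.0])

print(demo.max())               # 8.0
print(demo.mean())              # 6.0: the NaN is ignored
print(demo.mean(skipna=False))  # nan: the NaN propagates
```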

<div class="alert alert-block alert-info"> 
**Tech Note**: To see which functions and attributes are available for an object (in this case **hw1**, a **Series**), type ***hw1.*** then press ***tab***.
</div>

Exercise: read the above tech note, and find the function to calculate the median grade 

Exercise: The sum of all grades

## Selection

## <i>.iloc[...]</i>: position-based selection 

Selects elements by position. It works like indexing a Python list, <b>slices</b> included, with one addition: we can also pass a list of positions to retrieve several elements at once.

#### Using one index value

Access the 4th value (position 3, since positions start at 0). It returns a single value.

In [None]:
hw1.iloc[3]

Exercise: access the last value.

In [None]:
hw1.iloc[-1]

#### Using a list of positions or a slice

Retrieve multiple values: 1st, 2nd and 5th.

In [None]:
hw1.iloc[[0,1,4]]

**Caution!**
+ The above code returns a Series object, not a single value.
+ Selecting with a list of positions returns a copy of the selected data, not a view.

<div class="alert alert-block alert-info"> 
**Tech Note** : Python uses the [ ] operator for both indexing and for constructing a list. The outer [  ] in hw1.iloc[[0,1,4]] is performing the indexing, and the inner is creating a list.
</div>
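
The difference between one position and a list of positions can be sketched as follows, using hypothetical names and scores. The last lines suggest that, in this sketch, changing the selection leaves the original Series untouched.

```python
import pandas as pd

# Hypothetical names and scores, for illustration only
demo = pd.Series([7.5, 9.0, 4.0, 6.0, 8.5],
                 index=['Ana', 'Ben', 'Cleo', 'Dev', 'Eli'])

one = demo.iloc[3]              # a single value
several = demo.iloc[[0, 1, 4]]  # a Series with three elements

print(one)                      # 6.0
print(several.index.tolist())   # ['Ana', 'Ben', 'Eli']

# Changing the selection does not change the original Series
several.iloc[0] = 0.0
print(demo.iloc[0])             # 7.5
```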

Retrieve all elements from the 3rd to the 7th (included). It returns a Series. <b>Caution!</b> As with Python lists, the slice `2:7` below starts at position 2 and stops before position 7, so it selects positions 2 through 6.

In [None]:
hw1.iloc[2:7]
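
A short sketch of the positional slice rule on a hypothetical five-element Series: the start position is included and the stop position is excluded.

```python
import pandas as pd

# Hypothetical names and scores, for illustration only
demo = pd.Series([7.5, 9.0, 4.0, 6.0, 8.5],
                 index=['Ana', 'Ben', 'Cleo', 'Dev', 'Eli'])

part = demo.iloc[1:3]       # positions 1 and 2; position 3 is excluded
print(part.index.tolist())  # ['Ben', 'Cleo']
print(len(part))            # 2
```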

Exercise: Select the "second to last" student of the Series. Make sure to retrieve both the name and the grade.

## <i>s[...]</i>: index-based selection 

Selects elements using the index (a label value, a slice of label values, or a Boolean selection). It is like accessing a dictionary by key, with one big difference: we can also select values using <b>slices</b> and <b>Boolean selection</b>.

#### Using a label value

Find Luci's hw1 grade.

In [None]:
hw1['Luci']

#### Using a slice of label values (rarely used)

Find the grades from Luci's to Michael's. Unlike a positional slice, a slice of labels includes both endpoints.

In [None]:
hw1['Luci':'Michael']
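
The endpoint behavior of label slices can be checked on a small hypothetical Series (the names below are made up). The label slice keeps its last label, while the positional slice over the same stretch needs a stop value one past the last position.

```python
import pandas as pd

# Hypothetical names and scores, for illustration only
demo = pd.Series([7.5, 9.0, 4.0, 6.0, 8.5],
                 index=['Ana', 'Ben', 'Cleo', 'Dev', 'Eli'])

by_label = demo['Ben':'Dev']     # 'Dev' is included
by_position = demo.iloc[1:4]     # position 4 is excluded

print(by_label.index.tolist())     # ['Ben', 'Cleo', 'Dev']
print(by_position.index.tolist())  # ['Ben', 'Cleo', 'Dev']
```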

Exercise: What is Michael's hw1 score?

Now change `[]` to `[[]]` in the code and observe how the output differs from above:

## Boolean selection

The comparison operators >, <, >=, <=, ==, != can be used to create a Series of Booleans that identifies the elements whose values satisfy a certain condition.

<b>Problem</b>: Find the students whose grade is greater than or equal to 6

First, create a boolean Series

In [None]:
hw1 >= 6

Second, select only those students who have a "True" in the boolean Series above

In [None]:
hw1[hw1>=6]

We can combine multiple conditions using `&` for AND and `|` for OR. Each condition must be wrapped in parentheses. For example, select those students whose hw1 score is less than 5 or greater than 9:


In [None]:
hw1[(hw1<5)|(hw1>9)]
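
The sketch below, on hypothetical scores, stores the Boolean Series in a variable (here called `mask`, an illustrative name) so it can be reused. The parentheses matter because `&` and `|` bind more tightly than the comparison operators. The `~` operator negates a mask.

```python
import pandas as pd

# Hypothetical names and scores, for illustration only
demo = pd.Series([3.0, 9.5, 6.0, 7.0], index=['Ana', 'Ben', 'Cleo', 'Dev'])

mask = (demo >= 5) & (demo <= 9)  # True where the score is between 5 and 9

print(mask.sum())                 # 2: number of True entries
print(demo[mask].index.tolist())  # ['Cleo', 'Dev']
print(demo[~mask].index.tolist()) # ['Ana', 'Ben']: the negated selection
```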

Exercise: Compute the average hw1 grade among those students whose grade is less than or equal to 6


## More Series methods

### rank

Ranks each element by its value (by default, low values get low rank numbers). It does **NOT** reorder the Series, and the rank number is **NOT** the original value.

In [None]:
hw1.rank()
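
To see what the default ranking does with ties, here is a sketch on hypothetical scores in which two students share the same grade. With the default settings, tied values receive the average of the ranks they span.

```python
import pandas as pd

# Hypothetical names and scores; Ana and Cleo are tied
demo = pd.Series([7.0, 9.0, 7.0, 4.0], index=['Ana', 'Ben', 'Cleo', 'Dev'])

ranks = demo.rank()
print(ranks.tolist())        # [2.5, 4.0, 2.5, 1.0]
print(ranks.index.tolist())  # ['Ana', 'Ben', 'Cleo', 'Dev']: order unchanged
```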

### idxmax and idxmin

Find the index label (here, the student's name) of the element with the maximum or minimum value.


In [None]:
hw1.idxmax()

In [None]:
hw1.idxmin()
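
Because `idxmax` and `idxmin` return a label rather than a value, the result can be fed back into a label-based selection. A sketch on hypothetical scores, which also shows that the first label is returned when the extreme value occurs more than once:

```python
import pandas as pd

# Hypothetical names and scores; Ben and Cleo share the top score
demo = pd.Series([7.0, 9.0, 9.0, 4.0], index=['Ana', 'Ben', 'Cleo', 'Dev'])

best = demo.idxmax()
print(best)           # Ben: the first label with the maximum value
print(demo[best])     # 9.0: the value stored under that label
print(demo.idxmin())  # Dev
```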

### sort_values

Sort by values


In [None]:
hw1.sort_values()
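
A few properties of `sort_values` are easy to miss: it returns a new Series and leaves the original order alone, `ascending=False` reverses the order, and missing values are placed last by default. A sketch on hypothetical scores:

```python
import numpy as np
import pandas as pd

# Hypothetical names and scores; Ben's score is missing
demo = pd.Series([7.0, np.nan, 4.0, 9.0], index=['Ana', 'Ben', 'Cleo', 'Dev'])

low_to_high = demo.sort_values()
high_to_low = demo.sort_values(ascending=False)

print(low_to_high.index.tolist())  # ['Cleo', 'Ana', 'Dev', 'Ben']
print(high_to_low.index.tolist())  # ['Dev', 'Ana', 'Cleo', 'Ben']
print(demo.index.tolist())         # ['Ana', 'Ben', 'Cleo', 'Dev']: unchanged
```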

### sort_index

Sort by index

In [None]:
hw1.sort_index()

### nlargest and nsmallest

Finds the n items with largest or smallest value


In [None]:
hw1.nlargest(4)

In [None]:
hw1.nsmallest(3)

### head and tail

Returns the first (or last) n elements in positional order.


In [None]:
hw1.head(3)

In [None]:
hw1.tail(4)

## Exercises

Explore the parameters of the method "rank" to solve this question. Find the rank of each student (1=best, 10=worst) and deal with ties in the way that makes most sense to you. *Hint:* use `ascending=False, method='min'`

Who got the 4th highest grade? Return both name and grade. (there are multiple ways to solve this)

Retrieve the row of the person who comes last in alphabetical order.

Retrieve the name only of the person who comes last in alphabetical order.

Retrieve the grade only of the person who comes last in alphabetical order.

Among those whose name starts with ‘J’, who got the highest grade?

## Operations on one Series

### Operations between a scalar and a Series

Operations between a Series and a scalar (a single number) are performed element-wise on the values.

<b>Example</b>: It's Christmas time! As a gift, we want to increase everyone's grade by 5. What will the new grades be?

In [None]:
hw1 + 5

What if we wanted to multiply each grade by 2?

In [None]:
hw1 * 2
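
Two things follow from the element-wise rule: the operation produces a new Series and leaves the original untouched, and a missing value stays missing. A sketch on hypothetical scores:

```python
import numpy as np
import pandas as pd

# Hypothetical names and scores; Ben's score is missing
demo = pd.Series([7.0, np.nan, 4.0], index=['Ana', 'Ben', 'Cleo'])

bumped = demo + 5
print(bumped['Ana'])  # 12.0
print(bumped['Ben'])  # nan: a missing grade stays missing
print(demo['Ana'])    # 7.0: the original Series is unchanged
```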

### abs

Returns the absolute value of all values

In [None]:
hw1.abs()

## Operations between two Series

Operations between two Series are performed element-wise on those elements with the same index label.

Let's create a Series of the hw2 grades. Remember that we have a DataFrame object, <i>df</i>.

In [None]:
hw2 = df['hw2']
hw2

The operation is executed between elements *with the same index label*. For example, let's add up hw1 and hw2 grades.

In [None]:
hw1 + hw2
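
Matching by label rather than by position has a visible consequence when the two Series do not share the same labels. In the sketch below, the Series `first` and `second` and their contents are hypothetical: labels present in only one of them produce NaN, and `Series.add` with `fill_value` is one way to treat the absent grade as 0 instead.

```python
import pandas as pd

# Hypothetical grades; only Ben appears in both Series
first = pd.Series([1.0, 2.0], index=['Ana', 'Ben'])
second = pd.Series([10.0, 20.0], index=['Ben', 'Cleo'])

total = first + second
print(total.index.tolist())  # ['Ana', 'Ben', 'Cleo']
print(total['Ben'])          # 12.0: 2.0 + 10.0, matched by label
print(total['Ana'])          # nan: Ana has no grade in `second`

filled = first.add(second, fill_value=0)
print(filled.tolist())       # [1.0, 12.0, 20.0]
```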

Compute each student's average grade across hw1 and hw2:

In [None]:
(hw1 + hw2)/2

## Exercises

<p>The average grade of hw1 is too low. We want to normalize it to 8. To this end, do the following <b>in one single command</b>:
<ol>
<li>decrease everyone's grade by the average grade (this will set the new average to 0)</li>
<li>increase everyone's grade by 8</li>
</ol>
</p>

To verify it, compute the average of the normalized grades and check that it equals 8.

Compute the average grade between hw1 and hw2 of each student. Which student has the average closest to 6.7?
