# Pandas Series

- matplotlib, numpy, **pandas**, seaborn, scikit-learn, stats, scipy, ... 

## Intro

### In this lesson, you will learn about...

- Pandas Series
- Attributes
- Binning values
- Summarizing a series
- Vectorized operation using a user-defined function

### By the end of this lesson, you should be able to...

- Create a new series
- Perform vectorized operations on a series
- Access attributes of a series
- Describe values of a series (.describe, .value_counts)
- Peek into the series (.head, .tail, .sample)
- Sort values (.sort_values, .sort_index)
- Test for values in the series (.isin, .any, .all)
- Perform string manipulation (.str)
- Apply a user defined function to all items in a series (.apply)
- Bin continuous data to convert it to discrete (pd.cut)
- Plot series values (.plot)

### Agenda

1. About Pandas Series
2. Series Part 1
    - Create a Series
    - Vectorized Operations
    - Series Attributes: .index, .values, .dtype, .name, .size, .shape
    - Series Methods: .head, .tail, .sample, .astype, .value_counts, .describe, .nlargest, .nsmallest, .sort_values, .sort_index
3. Exercises, part I
4. Series Part II
    - Indexing and Subsetting
    - Series Attribute: .str
    - Series Methods: .any, .all, .isin, .apply
5. Exercises, part II
6. Series Part III
    - Binning
    - Plotting
7. Exercises, part III

## 1. About Pandas Series

A pandas Series object is a one-dimensional, labeled array made up of an index and data of a single data type. Unless you supply your own labels, the index is autogenerated and starts at 0.

A couple of important things to note about a Series:

- If I try to make a pandas Series using multiple data types like int and string values, the data will be converted to the same object data type; the int values will lose their int functionality.

- A pandas Series can be created in several ways; we will look at a few of these ways below. However, **it will most often be created by selecting a single column from a pandas DataFrame, in which case the Series retains the same index as the DataFrame.** We will dive into this in the next two lessons: DataFrames and Advanced DataFrames.
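
As a small sketch of the second point, the made-up DataFrame below shows a column selected as a Series keeping the DataFrame's index. The names `pets_df` and `weight_series`, and the row labels, are illustrative assumptions and not part of the lesson data.

```python
import pandas as pd

# Hypothetical data, used only for illustration
pets_df = pd.DataFrame(
    {'weight': [4.2, 30.5, 0.3]},
    index=['cat', 'dog', 'hamster'],
)

# Selecting a single column returns a Series
weight_series = pets_df['weight']

print(type(weight_series))
print(weight_series.index.equals(pets_df.index))  # the Series shares the DataFrame's index
```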

______


Numpy vs. Pandas

- NumPy: Python library for representing n-dimensional arrays. 
- Pandas: Python library, built upon NumPy, for representing series and dataframes, which are tabular structures. 
___________ 

Series vs. Dataframes

- Series: a one-dimensional, labeled array. A series has row names but no column name.   

- Dataframes: 2-d structures that represent datasets. Imagine a table with rows and columns. A dataframe has row names and column names. 

______

Series vs. List 

- Series contains an index, which can be thought of as a row name (often is a row number), which is a way to reference items. The index is stored with other meta-information (information about the series).   

- A Series' elements are of a specific data type. The data type is inferred, but can be manually specified. 
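
To make the contrast with a list concrete, here is a minimal sketch that gives a Series custom labels and an explicit data type. The name `temps` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical temperatures, labeled by day instead of by position
temps = pd.Series([71, 68, 75], index=['mon', 'tue', 'wed'], dtype='float64')

print(temps['tue'])   # look up a value by its label
print(temps.index)    # the labels are stored with the Series
print(temps.dtype)    # the data type we specified
```

A plain Python list could only offer `temps_list[1]`; the label `'tue'` has no equivalent there.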

_____ 

## 2. Series Part I

- Create a Series
- Series data types
- Vectorized Operations
- Series Attributes: .index, .values, .dtype, .name, .size, .shape
- Series Methods: .head, .tail, .sample, .astype, .value_counts, .describe, .nlargest, .nsmallest, .sort_values, .sort_index

Import Pandas

`import pandas as pd`

In [2]:
import pandas as pd
import numpy as np
from pydataset import data

### Create a Series

In practice, a Series will most often be created by selecting a single column from a pandas DataFrame, in which case the Series retains the same index as the DataFrame. The examples below create a Series in four ways:

1. from a list
2. from a numpy array
3. from a dictionary
4. from a dataframe

**From a List**

In [3]:
my_list = [2, 3, 5]
type(my_list)

list

Using an index to access a value in a list is possible, but those indices are integers representing location and cannot be changed to be a name, datetime, etc. 

In [4]:
my_list[0]

2

Create series from list, similar to how you would convert a list to an array with `np.array(my_list)`, using `pd.Series(my_list)`.    

*Notice how the `S` is capitalized.*

In [5]:
my_series = pd.Series(my_list)

What kind of object is that? 

In [6]:
type(my_series)

pandas.core.series.Series

What's inside the series?

In [7]:
my_series

0    2
1    3
2    5
dtype: int64

- 3 rows, with the row indices (or row names) as [0, 1, 2]
- the values are [2, 3, 5]
- the datatype is int64 (i.e. will store LARGE integers)


**From an array**

In [8]:
my_array = np.array([8.0, 13.0, 21.0])

# create series from array
my_series = pd.Series(my_array)

type(my_series)

pandas.core.series.Series

In [9]:
my_series

0     8.0
1    13.0
2    21.0
dtype: float64

- 3 rows, with the row indices as [0, 1, 2]
- the values are [8.0, 13.0, 21.0]
- the datatype is float64

**From a dictionary**

In [10]:
labeled_series = pd.Series({'a' : 0, 'b' : 1.5, 'c' : 2, 'd': 3.5, 'e': 4, 'f': 5.5})
labeled_series
# you can create a series from a dictionary
# my_dict = 
# pd.Series(my_dict)

a    0.0
b    1.5
c    2.0
d    3.5
e    4.0
f    5.5
dtype: float64

**From a dataframe**

In [11]:
sleep_df = data('sleepstudy')
sleep_df.head()

| | Reaction | Days | Subject |
|:---|:---|:---|:---|
| 1 | 249.56 | 0 | 308 |
| 2 | 258.7047 | 1 | 308 |
| 3 | 250.8006 | 2 | 308 |
| 4 | 321.4398 | 3 | 308 |
| 5 | 356.8519 | 4 | 308 |


option 1: `.column_name`

In [12]:
sleep_series = sleep_df.Reaction
type(sleep_series)
sleep_series
# my_series
# column name not in quotation marks "" 

1      249.5600
2      258.7047
3      250.8006
4      321.4398
5      356.8519
         ...   
176    329.6076
177    334.4818
178    343.2199
179    369.1417
180    364.1236
Name: Reaction, Length: 180, dtype: float64

option 2: single bracket `[]`

In [13]:
sleep_series = sleep_df['Reaction'] # column name here must be in quotation marks
type(sleep_series)

pandas.core.series.Series

In the next lesson, we will learn about dataframes, but notice if I use double brackets to select the column, I end up with a dataframe, not a series. 

In [14]:
my_dataframe_that_resembles_a_series = sleep_df[['Reaction']]
type(my_dataframe_that_resembles_a_series)

pandas.core.frame.DataFrame

You can also see the difference between a series and a dataframe in the display type when they are printed, as seen below...

In [15]:
sleep_series

1      249.5600
2      258.7047
3      250.8006
4      321.4398
5      356.8519
         ...   
176    329.6076
177    334.4818
178    343.2199
179    369.1417
180    364.1236
Name: Reaction, Length: 180, dtype: float64

In [16]:
my_dataframe_that_resembles_a_series

| | Reaction |
|:---|:---|
| 1 | 249.5600 |
| 2 | 258.7047 |
| 3 | 250.8006 |
| 4 | 321.4398 |
| 5 | 356.8519 |
| ... | ... |
| 176 | 329.6076 |
| 177 | 334.4818 |
| 178 | 343.2199 |
| 179 | 369.1417 |


#### Summary

From a list, array, dictionary: 
- `myseries = pd.Series(<list or array or dictionary>)`

From existing dataframe: 

- `myseries = df['col_for_series']`
- `myseries = df.col_for_series`

### Pandas data types

Data types you will see in series and dataframes: 

- int: integer, whole number values  
- float: decimal numbers  
- bool: true or false values  
- object: strings  
- category: a fixed set of string values  

A Series gets its data type in one of two ways:

1. inferring it from the data
2. converting it with `astype()`

#### Inferring

In [17]:
pd.Series([True, False, True])

0     True
1    False
2     True
dtype: bool

In [18]:
pd.Series(['I', 'Love', 'Codeup'])

0         I
1      Love
2    Codeup
dtype: object

In [19]:
my_series = pd.Series([1, 3, 'five'])
my_series

0       1
1       3
2    five
dtype: object

In [20]:
# filter out 'five' from the series and reassign
my_new_series = my_series[my_series != 'five']

my_new_series

0    1
1    3
dtype: object

#### Using astype()

In [21]:
my_new_series.astype('int')

0    1
1    3
dtype: int64

What would happen if we tried to change a series to a datatype that it cannot convert the values to? 

In [22]:
my_series.astype('int') 

ValueError: invalid literal for int() with base 10: 'five'

The sleep subject column in the Sleep dataframe is an ID representing a person/subject; therefore, we should store the values as an 'object' (string). 

In [23]:
sleep_subj_series = sleep_df['Subject'].astype('str')
sleep_subj_series

1      308
2      308
3      308
4      308
5      308
      ... 
176    372
177    372
178    372
179    372
180    372
Name: Subject, Length: 180, dtype: object

#### Summary

- Pandas will infer datatypes
- You can change datatypes upon creating the series, `pd.Series(mylist).astype('int')`, or later using `astype(x)`, where x can be 'float', 'int', 'str', e.g. `myseries.astype('str')`
- `astype('str')` produces a series whose dtype is displayed as object. 
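
As a brief sketch of the summary above, the example below converts numbers stored as text to numeric types with `astype`. The names `text_numbers`, `as_ints`, and `as_floats` are illustrative, and the explicit `'int64'` and `'float64'` type names are used here only to keep the result the same across platforms.

```python
import pandas as pd

# Hypothetical values that arrived as text
text_numbers = pd.Series(['1', '2', '3'])

as_ints = text_numbers.astype('int64')    # text -> integers
as_floats = as_ints.astype('float64')     # integers -> decimals

print(as_ints.dtype, as_floats.dtype)
print(as_ints.sum())        # arithmetic now works on the converted values
print(text_numbers[0])      # the original Series still holds text
```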

### Vectorized Operations

Like numpy arrays, pandas series are vectorized by default. E.g., we can easily use the basic arithmetic operators to manipulate every element in the series.

1. arithmetic operations
2. comparison operations

In [26]:
fibi_series = pd.Series([0, 1, 1, 2, 3, 5, 8])

fibi_series.head()

0    0
1    1
2    1
3    2
4    3
dtype: int64

In [27]:
fibi_series + 1

0    1
1    2
2    2
3    3
4    4
5    6
6    9
dtype: int64

In [28]:
fibi_series/2

0    0.0
1    0.5
2    0.5
3    1.0
4    1.5
5    2.5
6    4.0
dtype: float64

In [29]:
fibi_series >= 5

0    False
1    False
2    False
3    False
4    False
5     True
6     True
dtype: bool

In [32]:
(fibi_series >= 3) & (fibi_series % 2 == 0)

0    False
1    False
2    False
3    False
4    False
5    False
6     True
dtype: bool

#### Summary

- Just as in NumPy, we can perform operations on each element in the series by simply applying an operator to the whole series (s + 1, s/2, s == 3, etc.); the operation is evaluated for every element. 

- a series is always returned
- a series of booleans if we are giving condition statements. 
- a series of transformed values if we are doing an arithmetic operation. 
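
To see what vectorization saves us, this sketch compares a vectorized operation with the equivalent loop over a plain Python list. The variable names are illustrative.

```python
import pandas as pd

numbers = pd.Series([0, 1, 1, 2, 3, 5, 8])

# Vectorized: one expression, applied to every element
doubled = numbers * 2
is_big = numbers >= 5

# The same work on a plain Python list needs an explicit loop
doubled_list = [n * 2 for n in numbers.tolist()]

print(doubled.tolist() == doubled_list)
print(is_big.sum())   # True counts as 1, so this counts the matches
```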

### Series Attributes

**Attributes** return useful information about a Series' properties; they don't perform operations or calculations with the Series. Attributes are easily accessible using dot notation like we will see in the examples below. Jupyter Notebook allows you to quickly access a list of available attributes by pressing the tab key after the series name followed by a period or dot; this is called dot notation or attribute access.

There are several components that make up a pandas Series, and I can easily access each component by using attributes.

`.index`

The index allows us to reference items in the series. In our fibi_series, the index consists of the numbers 0-6.

In [33]:
fibi_series.index

RangeIndex(start=0, stop=7, step=1)

`.values`

The values are my data.

In [34]:
# The values are stored in a NumPy array. Hello vectorized operations!

fibi_series.values

array([0, 1, 1, 2, 3, 5, 8])

`.dtype`

The dtype is the data type of the elements in the Series. In our fibi_series, the data type is int64; it was inferred from the data we used.

Pandas has several main data types we will work with:

- int: integer, whole number values
- float: decimal numbers
- bool: true or false values
- object: strings
- category: a fixed and limited set of string values

In [35]:
fibi_series.dtype

dtype('int64')

`.name`

The name is an optional human-friendly name for the Series.

Our Series doesn't have a name, but we can give it one:

In [36]:
fibi_series.name = 'Fibonacci'
fibi_series

0    0
1    1
2    1
3    2
4    3
5    5
6    8
Name: Fibonacci, dtype: int64

`.size`

The .size attribute returns an int representing the number of rows in the Series. NULL values are included.

In [37]:
fibi_series.size

7

`.shape`

The .shape attribute returns a tuple representing the rows and columns when used on a two-dimensional structure like a DataFrame, but it can also be used on a Series, where it returns a one-element tuple holding the number of rows. NULL values are included.

In [None]:
fibi_series.shape
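
As a small sketch of how `.size` and `.shape` treat missing values, the example below uses a hypothetical Series named `with_null` and compares both attributes with the `.count()` method, which skips nulls.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
with_null = pd.Series([1.0, np.nan, 3.0])

print(with_null.size)     # total number of elements, nulls included
print(with_null.shape)    # a one-element tuple for a Series
print(with_null.count())  # count() skips the missing value
```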

### Series Methods

**Methods** used on pandas Series objects often return new Series objects; most also offer parameters with default settings designed to keep the user from mutating the original Series objects. (inplace=False)

If I want to save any manipulations or transformations I make on my Series, I can either assign the Series to a variable or adjust my parameters (inplace=True).
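
A minimal sketch of the first option, assigning the result to a variable, using a made-up Series named `scores`:

```python
import pandas as pd

scores = pd.Series([3, 1, 2])

# sort_values returns a new Series; scores itself is unchanged
scores.sort_values()
print(scores.tolist())

# Assign the result to a variable to keep it
sorted_scores = scores.sort_values()
print(sorted_scores.tolist())
```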

___________________

- `.head()`: returns the first 5 rows of the series by default (fewer if the series is shorter)

In [None]:
fibi_series.head()

- `.tail()`: returns the last 5 rows of the series

In [None]:
fibi_series.tail()

- `.sample()`: returns a random sample of rows in the Series; n = 1 by default. Again, the index is retained.


In [39]:
sleep_df = data('sleepstudy')
sleep_days_series = sleep_df.Days

In [40]:
sleep_days_series.sample(5)

135    4
86     5
110    9
84     3
173    2
Name: Days, dtype: int64

- `.value_counts()`: count number of records/items/rows containing each unique value (think "group by")

In [41]:
sleep_days_series.value_counts()

0    18
1    18
2    18
3    18
4    18
5    18
6    18
7    18
8    18
9    18
Name: Days, dtype: int64

In SQL, this would look like: 

```sql
select Days, count(Subject) from my_df group by Days;
```

#### Descriptive stats

Pandas has a number of methods that can be used to view summary statistics about our data. The table below [taken from here](https://pandas.pydata.org/pandas-docs/stable/basics.html#descriptive-statistics) provides a summary of some of the most commonly used methods.

| Function | Description |
|:----------|:----------------|
| count | Number of non-NA observations |
| sum | Sum of values |
| mean | Mean of values |
| median | Arithmetic median of values |
| min | Minimum |
| max | Maximum |
| mode | Mode |
| abs | Absolute Value |
| std | Bessel-corrected sample standard deviation |
| quantile | Sample quantile (value at %) |

In [42]:
sleep_df = data('sleepstudy')
sleep_reaction_time_series = sleep_df.Reaction

In [43]:
{
    'count': sleep_reaction_time_series.count(),
    'sum': sleep_reaction_time_series.sum(),
    'mean': sleep_reaction_time_series.mean(),
    'median': sleep_reaction_time_series.median()
    
}

{'count': 180,
 'sum': 53731.42049999999,
 'mean': 298.50789166666664,
 'median': 288.6508}

- `.describe()`: returns a series of descriptive statistics on a pandas Series. The information it returns depends on the data type of the elements in the Series.

In [45]:
sleep_reaction_time_series.describe()

count    180.000000
mean     298.507892
std       56.328757
min      194.332200
25%      255.375825
50%      288.650800
75%      336.752075
max      466.353500
Name: Reaction, dtype: float64

In [None]:
print(fibi_series)
fibi_series.describe()

`.nlargest()`, `.nsmallest()`

These methods allow me to return the n largest or n smallest values from a pandas Series. I can set the keep parameter to first, last, or all to deal with duplicate largest or smallest values; this is quite handy.

The default argument for keep, 'first', is shown in the first example below; the second example uses 'all'.

In [None]:
fibi_series.nlargest(n=3, keep='first')

In [None]:
fibi_series.nsmallest(n=2, keep='all')
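
To illustrate how `keep` handles ties, this sketch rebuilds the Fibonacci values in a fresh Series (named `fib` here so it stands alone) and compares `keep='first'` with `keep='all'`.

```python
import pandas as pd

fib = pd.Series([0, 1, 1, 2, 3, 5, 8])

# keep='first' (the default) returns exactly n rows
first_two = fib.nsmallest(n=2, keep='first')

# keep='all' also returns every row tied with the n-th smallest value
with_ties = fib.nsmallest(n=2, keep='all')

print(len(first_two), len(with_ties))
print(with_ties.tolist())
```

Because the value 1 appears twice, `keep='all'` can return more than n rows.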

`.sort_values()`, `.sort_index()`

These are handy methods that allow you to either sort your Series values or index respectively in ascending or descending order.

I can use the parameters for these methods to customize my sorts to meet my needs.

In [46]:
# ascending = True is default and doesn't need to be included. 
sleep_reaction_time_series.sort_values(ascending=True)

22     194.3322
21     199.0539
13     202.9778
14     204.7070
12     205.2658
         ...   
9      430.5853
57     454.1619
99     455.8643
100    458.9167
10     466.3535
Name: Reaction, Length: 180, dtype: float64

In [49]:
fibi_series.sort_values(ascending=False)

6    8
5    5
4    3
3    2
1    1
2    1
0    0
Name: Fibonacci, dtype: int64

In [48]:
sleep_reaction_time_series.sort_index(ascending=False)

180    364.1236
179    369.1417
178    343.2199
177    334.4818
176    329.6076
         ...   
5      356.8519
4      321.4398
3      250.8006
2      258.7047
1      249.5600
Name: Reaction, Length: 180, dtype: float64

## 3. Exercises Part I

Make a file named pandas_series.py or pandas_series.ipynb for the following exercises.

Use pandas to create a Series named fruits from the following list:

`["kiwi", "mango", "strawberry", "pineapple", "gala apple", "honeycrisp apple", "tomato", "watermelon", "honeydew", "kiwi", "kiwi", "kiwi", "mango", "blueberry", "blackberry", "gooseberry", "papaya"]`


Use Series attributes and methods to explore your fruits Series.

1. Determine the number of elements in fruits.

2. Output only the index from fruits.

3. Output only the values from fruits.

4. Confirm the data type of the values in fruits.

5. Output only the first five values from fruits. Output the last three values. Output two random values from fruits.

6. Run the .describe() on fruits to see what information it returns when called on a Series with string values.

7. Run the code necessary to produce only the unique string values from fruits.

8. Determine how many times each unique string value occurs in fruits.

9. Determine the string value that occurs most frequently in fruits.

10. Determine the string value that occurs least frequently in fruits.



_________________________
_________________________

## 4. Series Part II

- Indexing and subsetting
- The .str Attribute
- Methods: .any, .all, .isin, .apply

### Subsetting & Indexing

We can select subsets of our data using index labels, index position, or boolean sequences (list, array, Series).

We can also pass a sequence of boolean values to the indexing operator, []; that sequence could be a list or array, but it can also be another pandas Series **if the index of the boolean Series matches the original Series.**
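
The three selection styles mentioned above can be sketched on a small labeled Series. The name `prices` and its labels are hypothetical; `.loc` selects by label and `.iloc` by position.

```python
import pandas as pd

# Hypothetical labeled Series
prices = pd.Series([1.25, 0.5, 3.0], index=['apple', 'banana', 'cherry'])

by_label = prices.loc['banana']          # select by index label
by_position = prices.iloc[0]             # select by integer position
by_mask = prices[[True, False, True]]    # select with a boolean list

print(by_label, by_position)
print(by_mask)
```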

In [51]:
pi_series = pd.Series([3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5])
booleans = pi_series > 5
booleans

0     False
1     False
2     False
3     False
4     False
5      True
6     False
7      True
8     False
9     False
10    False
dtype: bool

In [52]:
pi_series[booleans]

5    9
7    6
dtype: int64

I can simply pass my conditional expression into the indexing operator, too.

In [53]:
pi_series[pi_series > 5]

5    9
7    6
dtype: int64

We can create compound logical statements to narrow or expand our subsetting options. 
Wrap parentheses around each comparison. The pipe `|` character is for OR, and the `&` is for AND. 


In [54]:
# Find the numbers that are even or greater than 5
pi_series[(pi_series % 2 == 0) | (pi_series > 5)]

2    4
5    9
6    2
7    6
dtype: int64

In [55]:
# Alternative syntax without the parentheses
is_even = pi_series % 2 == 0
greater_than_five = pi_series > 5

pi_series[is_even | greater_than_five]

2    4
5    9
6    2
7    6
dtype: int64

In [56]:
# Find the numbers that are even AND less than 5
pi_series[(pi_series % 2 == 0) & (pi_series < 5)]

2    4
6    2
dtype: int64

Let's use subsetting to find all response times that are in the 4th quartile of response times. 

1. identify the value at the 75th percentile using `describe` and subsetting using the row name '75%'. 
2. subset the series using that value in a conditional statement (where 'values' > 'q3')

In [59]:
sleep_reaction_time_series.describe()
# Rescale the values by dividing by 60, treating them as seconds -> minutes.
# (The sleepstudy documentation lists Reaction in milliseconds, so read
# the result as a rescaled value rather than true minutes.)
sleep_reaction_minutes_series = sleep_reaction_time_series/60

# describe() returns a Series
type(sleep_reaction_minutes_series.describe())



pandas.core.series.Series

1. identify the value at the 75th percentile using `describe` and subsetting using the row name '75%'. 

In [61]:
q3 = sleep_reaction_minutes_series.describe()['75%']
q3

5.612534583333333

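2. Subset the series to the 4th quartile only, using that value in a conditional statement (where 'values' > 'q3'). Note that the `.head()` call below keeps only the first five matching rows, so the summary that follows describes just those five values rather than the whole quartile.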

In [65]:
q4_series = sleep_reaction_minutes_series[sleep_reaction_minutes_series > q3].head()
q4_series.describe()

count    5.000000
mean     6.835615
std      0.708026
min      5.947532
25%      6.370063
50%      6.911502
75%      7.176422
max      7.772558
Name: Reaction, dtype: float64
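
As an alternative sketch, the `.quantile` method from the descriptive statistics table returns the same cutoff directly. The Series `values` below is made-up data, not the sleep study data.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])

q3_from_describe = values.describe()['75%']
q3_from_quantile = values.quantile(0.75)

# Both approaches give the same cutoff
print(q3_from_describe == q3_from_quantile)

# Keep every value above the cutoff (no .head(), so nothing is dropped)
top_quarter = values[values > q3_from_quantile]
print(top_quarter.tolist())
```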

### The .str Attribute

In addition to vectorized arithmetic operations, pandas also provides us with a way to vectorize string manipulation. Once we access the .str attribute, we can apply a string method to each string value in a Series. Performing string manipulation like this does not mutate my original Series; I have to assign my manipulation to a variable if I want to keep it.

For example, we can call the .lower method, which will convert each string value in a Series to lowercase.

In [68]:
ds_team_series = pd.Series(['Adam', 'Adam', 'Andrew', 'Carina', 'John', 'John', 
                            'Madeleine', 'Misty', 'Margaret', 'Ryan', 'Tasha'
                           ])

ds_team_series.str.lower()

0          adam
1          adam
2        andrew
3        carina
4          john
5          john
6     madeleine
7         misty
8      margaret
9          ryan
10        tasha
dtype: object

In [69]:
ds_team_series = ds_team_series.str.replace('rgaret', 'ggie')
# replaces value, Margaret to Maggie
ds_team_series

0          Adam
1          Adam
2        Andrew
3        Carina
4          John
5          John
6     Madeleine
7         Misty
8        Maggie
9          Ryan
10        Tasha
dtype: object

In [70]:
string_series = pd.Series(['Hello', 'CodeuP', 'StUDenTs'])
string_series


0       Hello
1      CodeuP
2    StUDenTs
dtype: object

In [71]:
string_series.str.lower()


0       hello
1      codeup
2    students
dtype: object

In [72]:
string_series.str.replace('e', '_')


0       H_llo
1      Cod_uP
2    StUD_nTs
dtype: object

In [73]:
# Since each method returns a Series, I can use method chaining like this.

string_series.str.lower().str.replace('e', '_')


0       h_llo
1      cod_up
2    stud_nts
dtype: object

In [76]:
# I can even use method chaining and indexing!

string_series[string_series.str.lower().str.startswith('h')]

# the conversion to lower only happened in the mask not for the whole series.  see how it's inside the brackets?


0    Hello
dtype: object

In [77]:
# Notice my original string_series is not mutated. 

string_series


0       Hello
1      CodeuP
2    StUDenTs
dtype: object
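
Two more `.str` methods follow the same pattern. This is a short sketch using `.str.len` and `.str.contains` on the same three strings, rebuilt here as `words` so the example stands alone.

```python
import pandas as pd

words = pd.Series(['Hello', 'CodeuP', 'StUDenTs'])

lengths = words.str.len()                       # number of characters in each string
has_de = words.str.lower().str.contains('de')   # boolean Series, usable as a mask

print(lengths.tolist())
print(words[has_de].tolist())
```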

### More Series Methods

- `.any()`: returns a single boolean...do any values in the series meet the condition? 

In [78]:
(fibi_series > 3).any()

True

- `.all()`: returns a single boolean...do all values in the series meet the condition? 

In [80]:
(fibi_series > 3).all()

False

- `.isin()`: compares each item in the series to a list of known values (strings here, but numbers work too). Is the item in your series found in the list? Returns a series of boolean values. 

In [84]:
# Use `isin()` to tell whether each value is in a set of known values. 
vowels = list('aeiouy')
letters = list('abcdefghijkeliminnow')
letters_series = pd.Series(letters)


In [85]:
letters_series


0     a
1     b
2     c
3     d
4     e
5     f
6     g
7     h
8     i
9     j
10    k
11    e
12    l
13    i
14    m
15    i
16    n
17    n
18    o
19    w
dtype: object

In [86]:
letters_series.isin(vowels).value_counts()

False    13
True      7
dtype: int64

In [88]:
letters_series[letters_series.isin(vowels)]

0     a
4     e
8     i
11    e
13    i
15    i
18    o
dtype: object
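
`.isin()` is not limited to strings. The sketch below uses hypothetical numeric IDs (`order_ids` and `flagged` are made-up names) and also shows `~`, which flips a boolean mask.

```python
import pandas as pd

order_ids = pd.Series([101, 102, 103, 104])
flagged = [102, 104]   # hypothetical list of known values

in_flagged = order_ids.isin(flagged)

print(order_ids[in_flagged].tolist())    # values found in the list
print(order_ids[~in_flagged].tolist())   # ~ flips the mask: values not in the list
```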

- - - 



- `.apply()`: apply a function to each item in a series. 

    1. define a function and pass it: series.apply(fcn)
    2. use a lambda: series.apply(lambda n: ...)
    
1. Define the function, then call .apply(your_function)

Below we define a function, even_or_odd, then reference that function when we call .apply. Notice that when we reference the even_or_odd function, we are not calling the function, rather, we are passing the even_or_odd function itself to the .apply method as an argument, which pandas will then call on every element of the Series.

In [89]:
def even_or_odd(n):
    '''
    this function takes a number and returns a string indicating 
    whether the passed number is even or odd
    '''
    if n % 2 == 0:
        return 'even'
    else:
        return 'odd'

fibi_series.apply(even_or_odd)

0    even
1     odd
2     odd
3    even
4     odd
5     odd
6    even
Name: Fibonacci, dtype: object

2. Use a lambda

It is also very common to see lambda functions used along with .apply. We could re-write the above example with a lambda function like so:

In [90]:
fibi_series.apply(lambda n: 'even' if n % 2 == 0 else 'odd')

0    even
1     odd
2     odd
3    even
4     odd
5     odd
6    even
Name: Fibonacci, dtype: object

Going back to the series where we were looking for the 4th quartile, let's create a new series that contains the label 'q4' if the value is above the 3rd quartile, or 'q1-q3' if not. 

In [91]:
sleep_reaction_minutes_series.apply(lambda n: 
                                    'q4' if n > q3 else 'q1-q3')

1      q1-q3
2      q1-q3
3      q1-q3
4      q1-q3
5         q4
       ...  
176    q1-q3
177    q1-q3
178       q4
179       q4
180       q4
Name: Reaction, Length: 180, dtype: object
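
When the cutoff should not be hard-coded inside the function, one option is the `args` parameter of `.apply`, which passes extra positional arguments to the function. In this sketch, `label_by_cutoff` and `values` are hypothetical names.

```python
import pandas as pd

def label_by_cutoff(n, cutoff):
    '''Return 'high' if n is above cutoff, otherwise 'low'.'''
    return 'high' if n > cutoff else 'low'

values = pd.Series([2, 9, 4, 7])

# args supplies the extra positional argument that follows each element
labels = values.apply(label_by_cutoff, args=(5,))
print(labels.tolist())
```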

## 5. Exercises Part II

Explore more attributes and methods while you continue to work with the fruits Series.

1. Capitalize all the string values in fruits.

2. Count the letter "a" in all the string values (use string vectorization).

3. Output the number of vowels in each and every string value.

4. Write the code to get the longest string value from fruits.

5. Write the code to get the string values with 5 or more letters in the name.

6. Find the fruit(s) containing the letter "o" two or more times.

7. Write the code to get only the string values containing the substring "berry".

8. Write the code to get only the string values containing the substring "apple".

9. Which string value contains the most vowels?

## 6. Series Part III

- Binning
- Plotting

### Numerical to Categorical Values - Binning & Cutting

`pd.cut(series, bins=n)` puts numerical values into discrete bins. 

We can either specify the number of bins to create, and pandas will create bins of equal width, or we can specify the bin edges ourselves by passing a list of bin edges or cutoffs.

In [92]:
# create bins of equal intervals
reaction_bins_series = pd.cut(sleep_reaction_minutes_series, 4)

In [93]:
reaction_bins_series.value_counts()

(4.372, 5.506]    75
(3.234, 4.372]    53
(5.506, 6.639]    44
(6.639, 7.773]     8
Name: Reaction, dtype: int64

In [97]:
# specify bins to create
reaction_bins_series = pd.cut(sleep_reaction_minutes_series, [3, 4, 5, 6, 7, 8])
reaction_bins_series

1      (4, 5]
2      (4, 5]
3      (4, 5]
4      (5, 6]
5      (5, 6]
        ...  
176    (5, 6]
177    (5, 6]
178    (5, 6]
179    (6, 7]
180    (6, 7]
Name: Reaction, Length: 180, dtype: category
Categories (5, interval[int64, right]): [(3, 4] < (4, 5] < (5, 6] < (6, 7] < (7, 8]]

In [95]:
reaction_bins_series.value_counts()

(4, 5]    74
(5, 6]    51
(3, 4]    28
(6, 7]    22
(7, 8]     5
Name: Reaction, dtype: int64
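
`pd.cut` also accepts a `labels` argument that names each bin instead of showing the interval. The sketch below uses made-up values and label names; by default each bin includes its right edge, so 5.0 falls in the first bin.

```python
import pandas as pd

minutes = pd.Series([3.5, 4.2, 5.0, 6.8, 7.1])

# Four edges define three bins: (3, 5], (5, 7], (7, 8]
speed = pd.cut(minutes, bins=[3, 5, 7, 8], labels=['fast', 'medium', 'slow'])

print(speed.tolist())
print(speed.value_counts().to_dict())
```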

`value_counts(bins=n)`

The `value_counts` method can also be valuable here. It has a parameter named `bins`, which will allow us to quickly bin and group our data at the same time if that is our desired end goal.

In [98]:
sleep_reaction_minutes_series.value_counts(bins=5)

(4.146, 5.052]    70
(5.052, 5.959]    48
(3.233, 4.146]    35
(5.959, 6.866]    20
(6.866, 7.773]     7
Name: Reaction, dtype: int64

### Plotting

The .plot() method allows us to quickly visualize the data in a Series. It's built on top of Matplotlib. 

- By default, `.plot()` draws a line plot of the values against the index.

- We can also customize our plot using the parameters of the .plot method or by using Matplotlib if we like. We will look at examples of both ways below.

Check the [docs](https://pandas.pydata.org/pandas-docs/version/0.24.2/reference/api/pandas.Series.plot.html) here for more on the .plot() method.

In [None]:
# With no arguments, .plot() draws a line plot, and it 
# might not tell the story we want.

nums_series = pd.Series([1, 5, 5, 5, 10, 20, 100, 40])
nums_series.plot()

In [None]:
# So, here we specify the type of plot we would like 
# Matplotlib to use.
nums_series.plot.hist()

In [None]:
# Use the parameters of the .plot method to customize my chart.

(
    pd.Series(['a', 'b', 'a', 'c', 'b', 'a', 'd', 'a'])
    .value_counts()
    .plot.bar(title='Example Pandas Visualization',
              rot=0,
              color='firebrick',
              ec='black',
              width=.9)
    .set(xlabel='Letter', ylabel='Frequency')
)
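
The same chart can also be customized through Matplotlib, because `.plot.bar` returns a Matplotlib Axes object. This is a sketch rather than the lesson's own code: `letter_counts` and `ax` are illustrative names, and the `Agg` backend line is only there so the snippet runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

letter_counts = pd.Series(['a', 'b', 'a', 'c', 'b', 'a', 'd', 'a']).value_counts()

# .plot.bar returns a Matplotlib Axes object
ax = letter_counts.plot.bar(rot=0)

# Customize with Matplotlib methods instead of .plot parameters
ax.set_title('Example Pandas Visualization')
ax.set_xlabel('Letter')
ax.set_ylabel('Frequency')
plt.tight_layout()
```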

## 7. Exercises Part III

Use pandas to create a Series named letters from the following string. The easiest way to make this string into a pandas Series is to use `list` to convert each individual letter into a single string in a basic Python list.

`'hnvidduckkqxwymbimkccexbkmqygkxoyndmcxnwqarhyffsjpsrabtjzsypmzadfavyrnndndvswreauxovncxtwzpwejilzjrmmbbgbyxvjtewqthafnbkqplarokkyydtubbmnexoypulzwfhqvckdpqtpoppzqrmcvhhpwgjwupgzhiofohawytlsiyecuproguy'`

1. Which letter occurs the most frequently in the letters Series?

2. Which letter occurs the least frequently?

3. How many vowels are in the Series?

4. How many consonants are in the Series?

5. Create a Series that has all of the same letters but uppercased.

6. Create a bar plot of the frequencies of the 6 most commonly occurring letters.

Use pandas to create a Series named numbers from the following list:

`['$796,459.41', '$278.60', '$482,571.67', '$4,503,915.98', '$2,121,418.3', '$1,260,813.3', '$87,231.01', '$1,509,175.45', '$4,138,548.00', '$2,848,913.80', '$594,715.39', '$4,789,988.17', '$4,513,644.5', '$3,191,059.97', '$1,758,712.24', '$4,338,283.54', '$4,738,303.38', '$2,791,759.67', '$769,681.94', '$452,650.23']`

1. What is the data type of the numbers Series?

2. How many elements are in the numbers Series?

3. Perform the necessary manipulations by accessing Series attributes and methods to convert the numbers Series to a numeric data type.

4. Run the code to discover the maximum value from the Series.

5. Run the code to discover the minimum value from the Series.

6. What is the range of the values in the Series?

7. Bin the data into 4 equally sized intervals or bins and output how many values fall into each bin.

8. Plot the binned data in a meaningful way. Be sure to include a title and axis labels.

Use pandas to create a Series named exam_scores from the following list:

`[60, 86, 75, 62, 93, 71, 60, 83, 95, 78, 65, 72, 69, 81, 96, 80, 85, 92, 82, 78]`

1. How many elements are in the exam_scores Series?

2. Run the code to discover the minimum, the maximum, the mean, and the median scores for the exam_scores Series.

3. Plot the Series in a meaningful way and make sure your chart has a title and axis labels.

4. Write the code necessary to implement a curve for your exam_scores Series and save this as curved_grades. Add the necessary points to the highest grade to make it 100, and add the same number of points to every other score in the Series as well.

5. Use a method to convert each of the numeric values in the curved_grades Series into a categorical value of letter grades. For example, 86 should be a 'B' and 95 should be an 'A'. Save this as a Series named letter_grades.

6. Plot your new categorical letter_grades Series in a meaningful way and include a title and axis labels.

## More Practice

Revisit the exercises from https://gist.github.com/ryanorsinger/f7d7c1dd6a328730c04f3dc5c5c69f3a.

After you complete each set of Series exercises, use any extra time you have to pursue the challenge below. You can work on these in the same notebook or file as the Series exercises or create a new practice notebook you can work in a little every day to keep your python and pandas skills sharp by trying to solve problems in multiple ways. These are not a part of the Series exercises grade, so don't worry if it takes you days or weeks to meet the challenge.

**Challenge yourself to be able to...**

- solve each using vanilla python.

- solve each using list comprehensions.

- solve each by using a pandas Series for the data structure instead of lists and using vectorized operations instead of loops and list comprehensions.